This is Spinn3r's official weblog, where we discuss new product direction, feature releases, and all our cool news.

Spinn3r is a web service for indexing the blogosphere. We provide raw access to every blog post being published, in real time. We provide the data so you can focus on building your application or mashup.

Spinn3r handles all the difficult tasks of running a spider/crawler including spam prevention, language categorization, ping indexing, and trust ranking.

If you'd like to read more about Spinn3r you could read our Founder's blog or check out Tailrank - our memetracker.

Spinn3r is proudly hosted by ServerBeach.

Yahoo Extends Meta Crawl Tags via HTTP Headers

Yahoo has extended support for crawler control to HTTP headers (at least in Slurp):

Today we're announcing support for tags that give webmasters even more flexibility over which pages and documents are crawled and indexed by Yahoo! Search. Specifically, we're extending our support of page level exclusion tags -- NOINDEX, NOARCHIVE, NOSNIPPET, NOFOLLOW -- to provide additional control for archiving and summarization of ANY file type. Previously, these page level tags could only be expressed within html pages through the META directive (for e.g. <META NAME="Slurp" CONTENT="NOARCHIVE">), but based on feedback from our webmasters, Yahoo! now enables these tags to be expressed through X-Robots-Tag directive in the http header, giving webmasters the flexibility to achieve exclusions on PDF, Word documents, PowerPoint, video, and other file types,

I think we'll go ahead and implement this in Spinn3r.
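
As a rough idea of what honoring the header could look like in a crawler, here is a minimal sketch. It is not Spinn3r's implementation: the function names are hypothetical, and it assumes each X-Robots-Tag value is a comma-separated, case-insensitive list limited to the four directives named in Yahoo's announcement.

```python
# Directives named in Yahoo's announcement; anything else is ignored here.
KNOWN_DIRECTIVES = {"noindex", "noarchive", "nosnippet", "nofollow"}


def parse_x_robots_tag(header_values):
    """Collect known directives from one or more X-Robots-Tag header values.

    Assumption: each value is a comma-separated list such as
    "noindex, nofollow", matched case-insensitively.
    """
    found = set()
    for value in header_values:
        for token in value.split(","):
            directive = token.strip().lower()
            if directive in KNOWN_DIRECTIVES:
                found.add(directive)
    return found


def may_index(header_values):
    """Return False when any header value carries NOINDEX."""
    return "noindex" not in parse_x_robots_tag(header_values)
```

Taking a list of values rather than a single string allows for a response that repeats the header. Because the check only looks at response headers, it works the same way for a PDF or a video as for an HTML page.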

This seems like a reasonable extension. I wish they had designed their Robots-Nocontent extension to be this easy to parse.

As currently specified, Robots-Nocontent requires a context-free parser, which is difficult to write.
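
To illustrate why nesting is the hard part, here is a minimal sketch that drops text inside elements marked with class="robots-nocontent", using Python's standard html.parser. The class name NoContentStripper and the function visible_text are hypothetical. The sketch assumes well-formed markup in which every non-void element is explicitly closed; real-world pages often break that assumption, which is where most of the difficulty lies.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoContentStripper(HTMLParser):
    """Collects text that is not inside a robots-nocontent element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside an excluded element, every nested open tag goes deeper.
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the text of `html` with robots-nocontent regions removed."""
    parser = NoContentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

The depth counter is what a flat pattern match cannot provide: the parser has to remember how many elements it has opened inside the excluded region before it can tell which closing tag ends it.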

Comments

gordongreg

This is a nice article. Yahoo shows a URL reference and a cached link, but no snippet. Clicking the cached link returns the cached page.
