Yahoo Extends Meta Crawl Tags via HTTP Headers
Yahoo has extended support for crawler control to HTTP headers (at least in Slurp):
Today we're announcing support for tags that give webmasters even more flexibility over which pages and documents are crawled and indexed by Yahoo! Search. Specifically, we're extending our support of page level exclusion tags -- NOINDEX, NOARCHIVE, NOSNIPPET, NOFOLLOW -- to provide additional control for archiving and summarization of ANY file type. Previously, these page level tags could only be expressed within html pages through the META directive (for e.g. <META NAME="Slurp" CONTENT="NOARCHIVE">), but based on feedback from our webmasters, Yahoo! now enables these tags to be expressed through X-Robots-Tag directive in the http header, giving webmasters the flexibility to achieve exclusions on PDF, Word documents, PowerPoint, video, and other file types,
I think we'll go ahead and implement this in Spinn3r.
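As a rough idea of what honoring the header might involve on the crawler side, here is a minimal Python sketch. It is not Spinn3r's code; the function names (`parse_x_robots_tag`, `robots_directives`) and the representation of headers as a list of `(name, value)` pairs are assumptions made for illustration, and it only recognizes the four directives named in the announcement.

```python
# Directives named in Yahoo's announcement; anything else is ignored here.
KNOWN_DIRECTIVES = frozenset({"noindex", "noarchive", "nosnippet", "nofollow"})


def parse_x_robots_tag(header_values):
    """Collect recognized directives from one or more X-Robots-Tag values.

    Values are treated as comma-separated, case-insensitive tokens.
    """
    directives = set()
    for value in header_values:
        for token in value.split(","):
            token = token.strip().lower()
            if token in KNOWN_DIRECTIVES:
                directives.add(token)
    return directives


def robots_directives(headers):
    """Extract directives from a response's headers.

    `headers` is assumed to be a list of (name, value) pairs, because
    header names are case-insensitive and a header may appear more than once.
    """
    values = [value for name, value in headers if name.lower() == "x-robots-tag"]
    return parse_x_robots_tag(values)


if __name__ == "__main__":
    response_headers = [
        ("Content-Type", "application/pdf"),
        ("X-Robots-Tag", "NOARCHIVE, nosnippet"),
    ]
    found = robots_directives(response_headers)
    print("may archive:", "noarchive" not in found)
    print("may index:", "noindex" not in found)
```

Because the check only looks at response headers, it applies equally to a PDF, a Word document, or a video file, which is the point of the extension.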
This seems like a reasonable extension. I wish they had designed their Robots-Nocontent extension so that it was easier to parse.
The way it's currently written, you have to use a context-free parser, and those are difficult to write.
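To illustrate why nesting matters here, the following Python sketch drops text inside any element whose `class` attribute contains `robots-nocontent`. It leans on the standard library's `html.parser` and a depth counter rather than a hand-written parser. The class and function names are hypothetical, and it assumes reasonably well-formed markup: an unclosed non-void tag inside an excluded region would throw the depth count off.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect depth.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class NoContentFilter(HTMLParser):
    """Keeps text that is outside robots-nocontent elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.kept = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside an excluded element, every nested tag adds a level.
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.kept.append(data)


def indexable_text(html):
    """Return whitespace-normalized text outside excluded regions."""
    parser = NoContentFilter()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.kept).split())


if __name__ == "__main__":
    page = (
        '<p>Keep</p>'
        '<div class="robots-nocontent">Ad <span>nested<br>more</span> text</div>'
        '<p>also keep</p>'
    )
    print(indexable_text(page))
```

The depth counter is what a flat, pattern-based scan cannot provide: the excluded region ends at the matching close tag, not at the first close tag encountered.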
This is a nice article. Yahoo shows a URL reference and cached link, but no snippet. Clicking on the cached link returns the cached page.
Posted by: gordongreg | September 08, 2008 at 01:41 AM