Search Engines & SEO Blog
Yahoo with new crawler version
Johannes Beus
In the course of restructuring large parts of its search infrastructure, Yahoo has finally turned its attention to its crawler. The world's largest Hadoop installation, which Yahoo uses as the foundation for its web search (10,000 CPUs, 5 petabytes of disk space), is now being filled by “Slurp/3.0”. The crawler is already active, and I was able to observe it in the wild in web server logs:

llf320021.crawl.yahoo.net - - [15/Apr/2008:03:16:05 +0200] "GET /news/ HTTP/1.0" 200 34962 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"

Yahoo seems to have addressed some of the criticism directed at the “old” Slurp versions. The new version appears to crawl in a far more planned and bandwidth-conserving way. Hopefully quirks such as dropping the trailing slash on directory URLs are finally a thing of the past.

The new crawler works from new IP addresses. Those of you who run IP-based cloaking should therefore now, at the latest, think about switching to the “DNS-ReverseDNS-method” to identify search engine crawlers.
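For readers who have not used the “DNS-ReverseDNS-method” before, here is a minimal Python sketch of the idea: resolve the visiting IP to a hostname, check that the hostname belongs to the crawler's domain, then resolve that hostname forward again and confirm it maps back to the same IP. The suffix `.crawl.yahoo.net` is an assumption taken from the single log line above, and the function name `is_verified_crawler` and its parameters are hypothetical, chosen only for this illustration.

```python
import socket

# Assumption: genuine Slurp hosts resolve under this domain, as suggested
# by the hostname in the log line above. Adjust to what you actually see.
TRUSTED_SUFFIXES = (".crawl.yahoo.net",)


def is_verified_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Return True if `ip` passes a reverse-then-forward DNS check.

    The lookup functions are parameters only so the check can be tried
    out without network access; by default the socket module is used.
    """
    # Step 1: reverse lookup (IP -> hostname).
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    # Step 2: the hostname must end with a trusted crawler domain.
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(suffixes):
        return False

    # Step 3: forward lookup (hostname -> IPs) must contain the original IP.
    # This guards against someone faking the reverse DNS record of their own IP.
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

In a real setup you would probably cache the result per IP, since two DNS lookups on every request are expensive. A user-agent match alone proves nothing, because anyone can send the Slurp string.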














