Johannes Beus
In his blog, Adam Doppelt has compiled some interesting data comparing the crawling behavior of Google's and Yahoo's bots. For this he analyzed the logfiles of urbanspoon.com, a restaurant guide for the USA. While Googlebot crawls the site at a nice, steady rate, Yahoo causes extreme peaks. Another surprising finding is that Google requested only 1.4 percent of all pages twice, compared with 38 percent for Yahoo. I fully agree with Doppelt's conclusion that Yahoo still has a long way to go to catch up with the market leader.

When I scan the logfiles of some of our larger projects for the “behavior” of Yahoo's Slurp, I find many oddities. What irritates me most is that Yahoo generally drops trailing slashes, regardless of how the site is linked. Instead of accessing /directory/subdirectory/ it will first access /directory/subdirectory. If the server configuration and all mod_rewrite rules are correct, the server answers with a 302 redirect to the correct directory; otherwise the result is duplicate content or an error page. The behavior of search engine crawlers is one of the few areas in which Google's 90% market dominance can be seen as beneficial: you do not really have to worry much about the flawed programming of the other search engines' crawlers.
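If you want to run the same kind of check against your own logfiles, a minimal sketch could look like the one below. It is not Doppelt's method. It assumes the logs are in Apache's Combined Log Format, identifies a bot by a substring of the user agent, and counts a page as "requested again" when its path shows up more than once. The names `LOG_LINE`, `bot_hits`, `repeat_share` and `slashless_requests` are made up for this illustration, and treating a final path segment without a dot as a directory is only a rough guess.

```python
import re
from collections import Counter

# Assumed input: Apache Combined Log Format, e.g.
# host ident user [time] "GET /path HTTP/1.1" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def bot_hits(lines, bot_token):
    """Yield (path, status) for every request whose user agent contains bot_token."""
    token = bot_token.lower()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and token in m.group("agent").lower():
            yield m.group("path"), int(m.group("status"))

def repeat_share(lines, bot_token):
    """Share of distinct paths that the bot requested more than once (0.0 if none)."""
    counts = Counter(path for path, _ in bot_hits(lines, bot_token))
    if not counts:
        return 0.0
    repeated = sum(1 for n in counts.values() if n > 1)
    return repeated / len(counts)

def slashless_requests(lines, bot_token):
    """List (path, status) for requests that look like a directory without trailing slash.

    Heuristic (an assumption): the last path segment is non-empty and has no dot.
    """
    found = []
    for path, status in bot_hits(lines, bot_token):
        bare = path.split("?", 1)[0]      # ignore the query string
        last = bare.rsplit("/", 1)[-1]    # empty if the path ends with "/"
        if last and "." not in last:
            found.append((bare, status))
    return found
```

Called with the lines of a logfile and a token such as "Slurp" or "Googlebot", `repeat_share` gives a figure comparable in spirit to the percentages above, and the status codes returned by `slashless_requests` show whether the server answered the slashless requests with a redirect, a normal page or an error.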

A nice find on the side: in their article on Internet jargon, Spiegel Online singled out the following sentence as typical of German Internet jargon: “The Hijacking-problem could be easily avoided with a 301 header-redirect.”


Johannes Beus, founder and CEO of SISTRIX, has been interested in the optimisation of websites for search engines since 2001. In 2003 he started to regularly publish summaries of his evaluations and share his thoughts on the SEO sector on one of the oldest German SEO blogs.
Johannes Beus - on Thu (06/28/2007) at 11:58 AM
