Search Engines & SEO Blog
Googlebot's groundhog day
Johannes Beus
I admit it: I suffer from an advanced case of logfile addiction. A tail-and-grep variation for one of my projects can almost always be found on my right-hand screen. Early in the morning it's Microsoft's unhurried spider, after the first cup of coffee it's usually Google, and if I feel like being really crazy, I will even look at the behavior of the Yahoo robot in text form from time to time. I am probably not the only one who has noticed that Google exhibits by far the most "intelligent" crawling behavior: pages are crawled and put into the index depending on their update frequency as well as their general importance. Interestingly enough, a Google patent has now been granted that deals with exactly this background of crawler behavior.

Applied for in the middle of 2003 and granted as US patent number 7,308,643 in November 2007, it describes which criteria a search engine operator can use to decide which URLs are crawled and how often. The authors divide the URLs in a search engine's index into three distinct categories. In the base category, where all addresses start out, the crawl frequency is chosen so that every address is refreshed once within a defined time frame; as far as I remember, in 2003 the Google index was still renewed about once a month. Above that is a category in which URLs are updated daily, and one step further up is a "real-time" category, where crawling is supposed to happen much more frequently.

Which category a URL is sorted into depends on two factors: PageRank and the update frequency of the site. Both assumptions have been on the table for a long time but, as far as I know, have never before been confirmed.
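To make the three categories a little more tangible, here is a rough Python sketch of how a URL might be sorted into one of them. Everything concrete in it is my own assumption and not taken from the patent: the names `UrlStats` and `crawl_tier`, the scaling of both factors to a range of 0 to 1, the equal weighting and the two thresholds are purely illustrative.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the patent text discussed here gives no numbers.
DAILY_THRESHOLD = 0.5
REALTIME_THRESHOLD = 0.8


@dataclass
class UrlStats:
    """Illustrative per-URL data; both values are assumed to be scaled to 0..1."""
    pagerank: float
    change_rate: float  # share of past crawls on which the content had changed


def crawl_tier(stats: UrlStats) -> str:
    """Sort a URL into one of the three categories.

    The equal weighting of the two factors is an assumption made
    for this sketch, not something the patent confirms.
    """
    score = 0.5 * stats.pagerank + 0.5 * stats.change_rate
    if score >= REALTIME_THRESHOLD:
        return "real-time"
    if score >= DAILY_THRESHOLD:
        return "daily"
    return "base"


if __name__ == "__main__":
    print(crawl_tier(UrlStats(pagerank=0.9, change_rate=0.9)))
    print(crawl_tier(UrlStats(pagerank=0.6, change_rate=0.6)))
    print(crawl_tier(UrlStats(pagerank=0.2, change_rate=0.1)))
```

With this kind of weighting, a completely static page (`change_rate` of 0) can only move up through its PageRank, so its crawl frequency would tell you something about that one factor alone.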
The patent also describes how the search engine crawler can adjust the crawl frequency depending on the time between the request and the response from the server; slow servers can therefore become a problem from an SEO point of view. If we consider that the innovation cycle in the search engine sector is considerably shorter than the four years that have passed since the patent was filed, and if I factor in the Googlebot "behavior" I see in my logs, then it seems that these rigid categories no longer exist. Still, if you have a site whose content is rather static, so that one of the two factors mentioned above drops out, it may be sensible to take a closer look at your site's crawl frequency if you wish to have more information than the 0 to 10 in your toolbar ...
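If you would rather let a script do the looking than stare at tail and grep all day, something along these lines might be a starting point. It is only a sketch under a few assumptions: the log is in the common Apache "combined" format, Googlebot is recognised purely by its user-agent string (which can be faked), and the function names `googlebot_visits` and `mean_interval_hours` are my own inventions.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the timestamp, requested path and user agent of a line in
# Apache "combined" log format (assumed; adjust for other formats).
LOG_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_visits(lines):
    """Collect the timestamps of all Googlebot requests, grouped by path."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            visits[match.group("path")].append(ts)
    return visits


def mean_interval_hours(timestamps):
    """Average gap between consecutive visits in hours, or None if fewer than two."""
    if len(timestamps) < 2:
        return None
    ordered = sorted(timestamps)
    span = (ordered[-1] - ordered[0]).total_seconds()
    return span / (len(ordered) - 1) / 3600


if __name__ == "__main__":
    # "access.log" is a placeholder path for your own server log.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for path, stamps in sorted(googlebot_visits(log).items()):
            print(path, len(stamps), mean_interval_hours(stamps))
```

A path that Googlebot fetches every few hours and one it fetches every few weeks are clearly being treated differently, whatever the categories are called internally these days.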














