Johannes Beus
Looking through some server log files, I noticed that even Microsoft and Google are no longer reluctant to use dubious methods to advertise their search engines. In the past, referrer spam, the automated visiting of websites with the referrer set to the advertised site, was reserved for erotic and Viagra sites; now it seems that the two search engine giants want to boost their name recognition this way, too.
All jokes aside, both Microsoft and Google are checking sites for referrer-based cloaking. To do this, they access the sites with a referrer that looks as if the visitor had just come from their search engine. For Google, the log files look like this:
crawl-66-249-66-243.googlebot.com - - [07/Feb/2008:13:10:35 +0100] "GET /news/234-neue-msn-suche-online.html HTTP/1.1" 200 8223 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
74.125.16.67 - - [07/Feb/2008:13:34:52 +0100] "GET /news/234-neue-msn-suche-online.html HTTP/1.1" 200 8223 "http://www.google.com/search?q=abc" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
First comes the normal Googlebot with the usual characteristics; soon thereafter, a visitor shows up whose IP just happens to belong to Google, with a referrer suggesting that he was searching for “abc”. Google then compares the two responses and decides whether the page is cloaking. Microsoft is a little less amateurish about it and sends referrers that still have some connection to the site's topic or, at least, come from a rotation of several search queries. Let's hope that not every search engine comes up with the idea of testing like this in the future, as that would make server log files completely useless.
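To keep such checks from skewing referrer statistics, one could filter them out when evaluating the logs. The following is only a minimal sketch, assuming the logs are in the Combined Log Format shown above. The network `74.125.0.0/16` and the host list are illustrative assumptions chosen to cover the sample entry, not a complete or authoritative list of search engine addresses, and the function names are hypothetical.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Combined Log Format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: illustrative values only, taken from the sample log entry.
SEARCH_ENGINE_NETWORKS = [ipaddress.ip_network("74.125.0.0/16")]
SEARCH_REFERRER_HOSTS = {"www.google.com"}

SAMPLE_LINES = [
    'crawl-66-249-66-243.googlebot.com - - [07/Feb/2008:13:10:35 +0100] '
    '"GET /news/234-neue-msn-suche-online.html HTTP/1.1" 200 8223 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '74.125.16.67 - - [07/Feb/2008:13:34:52 +0100] '
    '"GET /news/234-neue-msn-suche-online.html HTTP/1.1" 200 8223 '
    '"http://www.google.com/search?q=abc" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) '
    'Gecko/20060909 Firefox/1.5.0.7"',
]


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_referrer_check(entry):
    """True if a search referrer arrives from an IP inside a listed network."""
    referrer_host = urlparse(entry["referrer"]).hostname
    if referrer_host not in SEARCH_REFERRER_HOSTS:
        return False
    try:
        client_ip = ipaddress.ip_address(entry["host"])
    except ValueError:
        # The host field holds a resolved hostname rather than an IP address.
        return False
    return any(client_ip in network for network in SEARCH_ENGINE_NETWORKS)


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        entry = parse_line(line)
        if entry and is_referrer_check(entry):
            print("suspected referrer check:", entry["host"], entry["referrer"])
```

Run against the two entries above, this flags only the second one: the regular Googlebot request carries no referrer, while the second request combines a Google search referrer with a client address from the listed network. Matching on IP ranges is a deliberate simplification; the ranges would have to be maintained by hand.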