Google Caffeine: first impressions of the next generation

Johannes Beus
From today on, Google lets you take a look at the finished pieces of its new foundation, which goes by the working title Caffeine. The first reports and blog posts on the subject often describe it as a reaction to the Yahoo/Microsoft deal. I think, however, that Google has learned a lot in the 3.5 years since the last infrastructure update went live, that there are at least some new requirements, and that they simply want to implement them in this new system.

Since no one else seems to have done this yet, I took a closer look at the beta and tried to extract some interesting data from it. The diagrams and tables are based on about 10,000 English keywords, each of which was queried on both Google.com and the Caffeine sandbox, and the results were then compared. At the moment, the beta only seems to work reliably in English; German keywords often return odd or incorrect results.

The first thing I looked at was how quickly Google finds and returns results. While Google has always been known for not wasting time searching its index, it still seems to have made some major improvements:


The chart plots response time against the share of queries answered within that time. This makes the area under the curves the interesting part: the larger it is, the faster the index, and here Caffeine is noticeably faster than the already quick Google. Both curves also show an interesting shape (warning: everything from this point on is unsubstantiated guesswork): I suspect this is the result we would see if there were both a primary and a second (“supplemental”) index.
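To make the "area under the curve" comparison concrete, here is a minimal sketch, assuming you have a list of measured response times in milliseconds per index. The function names and the sample values are hypothetical and not the data behind the chart. The curve is the empirical cumulative distribution (share of queries answered within t ms), and its area over a fixed window is larger for the faster index.

```python
from bisect import bisect_right


def share_answered_within(times_ms, threshold_ms):
    """Empirical CDF: fraction of queries answered in <= threshold_ms."""
    ordered = sorted(times_ms)
    return bisect_right(ordered, threshold_ms) / len(ordered)


def area_under_cdf(times_ms, max_ms, step_ms=1):
    """Approximate the area under the empirical CDF on [0, max_ms)
    with a left Riemann sum. A larger area means more queries were
    answered early, i.e. the index is faster overall."""
    ordered = sorted(times_ms)
    n = len(ordered)
    total = 0.0
    for t in range(0, max_ms, step_ms):
        total += bisect_right(ordered, t) / n * step_ms
    return total


# Hypothetical response times in ms, for illustration only.
live_times = [180, 220, 250, 310, 400]
beta_times = [120, 150, 170, 230, 260]

print(share_answered_within(beta_times, 200))  # share of beta queries done within 200 ms
print(area_under_cdf(live_times, 500), area_under_cdf(beta_times, 500))
```

Using a fixed upper bound (here 500 ms) keeps the two areas comparable; a curve with two distinct steep sections would be the kind of shape the two-index guess above refers to.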

The second evaluation concerns the number of documents found for a search query. This diagram shows which index reported more results:


As the chart shows, the Google beta returns more pages than the live index about two-thirds of the time. Only about 2.2 percent of the queries have exactly the same number of documents found. This should especially affect rankings in the long tail, since the algorithm will have many more pages to sift through, which should lead to better results.
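The breakdown above boils down to a three-way comparison per keyword. A minimal sketch, assuming the reported result counts have already been collected as (live, beta) pairs; the function name and the toy numbers are hypothetical:

```python
from collections import Counter


def compare_result_counts(pairs):
    """pairs: iterable of (live_count, beta_count), one per keyword.
    Returns the share of keywords where the beta index reported
    more, fewer, or exactly the same number of results."""
    tally = Counter()
    for live, beta in pairs:
        if beta > live:
            tally["beta_more"] += 1
        elif beta < live:
            tally["beta_fewer"] += 1
        else:
            tally["equal"] += 1
    total = sum(tally.values())
    # Counter returns 0 for missing keys, so every category is present.
    return {k: tally[k] / total for k in ("beta_more", "beta_fewer", "equal")}


# Toy data, not the measured counts.
print(compare_result_counts([(100, 200), (300, 150), (50, 50), (10, 20)]))
```

Applied to the roughly 10,000 keyword pairs, "beta_more" would correspond to the two-thirds figure and "equal" to the 2.2 percent.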

Because the beta-index results cannot be compared with the familiar data from the German Google index, I calculated a kind of “small” Visibility Index for them. Here are some well-known domains and how their visibility in the beta compares with the current Google index.

Well-known domains: change in visibility, Caffeine beta vs. current Google index
Domain                Change
wikipedia.org         +8.02%
yahoo.com             -16.67%
amazon.com            +27.87%
tripadvisor.com       -3.49%
nextag.com            -3.66%
google.com            +0.25%
youtube.com           -1.09%
about.com             -3.38%
shopstyle.com         -6.56%
bizrate.com           -3.71%
yelp.com              +29.17%
ebay.com              -16.25%
zappos.com            +21.90%
blogspot.com          +28.28%
consumersearch.com    +0.23%
answers.com           +16.26%
facebook.com          +14.37%
linkedin.com          -36.17%
shopping.com          -3.84%
aol.com               -21.12%
epinions.com          -10.21%
merchantcircle.com    -27.14%
virtualtourist.com    +46.94%
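The exact formula behind this "small" Visibility Index is not given here, so the following is only an illustrative sketch of how such a score and the percentage changes in the table could be computed. It assumes a hypothetical weight per top-10 position (these are NOT the real SISTRIX weights), sums the weights of every appearance of a domain across all keywords, and compares the beta score with the live score. Domain matching is simplified to exact string equality.

```python
# Hypothetical weight per ranking position 1-10; NOT the SISTRIX weighting.
POSITION_WEIGHTS = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.06,
                    6: 0.05, 7: 0.04, 8: 0.03, 9: 0.03, 10: 0.02}


def visibility(rankings, domain):
    """rankings: dict keyword -> list of domains in result order (position 1 first).
    Sums the positional weight of every top-10 appearance of `domain`."""
    score = 0.0
    for results in rankings.values():
        for position, d in enumerate(results[:10], start=1):
            if d == domain:
                score += POSITION_WEIGHTS[position]
    return score


def percent_change(live_score, beta_score):
    """Relative change of the beta score versus the live score, in percent."""
    return (beta_score - live_score) / live_score * 100


# Toy rankings for two keywords, for illustration only.
live = {"kw1": ["a.com", "b.com"], "kw2": ["b.com", "a.com"]}
beta = {"kw1": ["a.com", "b.com"], "kw2": ["a.com", "b.com"]}
print(percent_change(visibility(live, "a.com"), visibility(beta, "a.com")))
```

In this toy case a.com moves from position 2 to position 1 for one keyword, so its score rises from 0.45 to 0.60, a gain of about 33 percent; a table row like "+8.02%" has the same meaning, aggregated over the whole keyword set.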

Contrary to the “perceived assumption”, Wikipedia would get yet another boost. Among the large content portals, Yahoo (-16.67%) and AOL (-21.12%) may well be the biggest losers (if the weighting stays the same as in the current beta index).
Johannes Beus - on Wed (08/12/2009) at 00:59 AM
