The Web, sites in German, sites from Germany

Johannes Beus
Thomas asked me whether I had ever analysed how the origin of an IP address or the top-level domain affects rankings on Google Germany. I had not, but it is an interesting subject, so here are the results. In each case I ran the tests for the three search options “the web”, “sites in German” and “sites from Germany”, and the differences between them allow some interesting observations.

Search: the web

First, the evaluation by top-level domain. As for all the following tests, we took 10,000 keywords and analysed the first 100 results for each, which makes about 1 million data points per diagram. We make no claim that the data are correct, and even less that the conclusions we draw from them are; everybody should judge this for themselves. The diagram shows the share of each top-level domain among the search results at each position.
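To make the method concrete, here is a minimal Python sketch of the per-position aggregation, assuming the SERPs are available as a dict mapping each keyword to its ranked list of result URLs. The grouping into DE, CNO and “other”, the function names and the `exclude_hosts` option (used to leave out a domain such as Wikipedia.org while keeping the other results at their original positions) are illustrative assumptions, not the script behind these diagrams.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

CNO = {"com", "net", "org"}  # the "CNO" group (Com/Net/Org)


def tld_group(host):
    """Map a host name to a coarse group: "de", "cno" or "other" (assumed grouping)."""
    tld = host.rsplit(".", 1)[-1]
    if tld == "de":
        return "de"
    return "cno" if tld in CNO else "other"


def tld_share_by_position(serps, exclude_hosts=()):
    """serps: dict keyword -> list of result URLs, position 1 first.

    Returns dict position -> {group: share}; the shares at each position sum to 1.
    Results on excluded hosts (and their subdomains) are skipped, but the
    remaining results keep their original positions.
    """
    counts = defaultdict(Counter)
    for urls in serps.values():
        for position, url in enumerate(urls, start=1):
            host = (urlsplit(url).hostname or "").lower()
            if any(host == h or host.endswith("." + h) for h in exclude_hosts):
                continue
            counts[position][tld_group(host)] += 1
    shares = {}
    for position in sorted(counts):
        total = sum(counts[position].values())
        shares[position] = {g: n / total for g, n in counts[position].items()}
    return shares
```

Calling `tld_share_by_position(serps, exclude_hosts=("wikipedia.org",))` would give the Wikipedia-free variant of the diagram.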

[Figure: Top-level domains]

It is nice to see that the shares of the domains are fairly constant; only in the first ranks does the share of DE domains drop while CNO (.com/.net/.org) gains considerable ground. If we redraw the diagram with every domain except Wikipedia.org, we can see why: Wikipedia's colossal strength, which I mentioned a few days ago, is enough to distort the assessment of even such a fairly broad data base. Neither Amazon.com nor ebay.de pulls the SERPs as strongly as Wikipedia does. This is quite useful when drawing conclusions from the diagram, since otherwise you might believe that DE domains enjoy a certain edge in the SERPs; Wikipedia.org puts this thought to rest. For completeness, and for the statistics fans, here is an overview of the SERP-wide share of the individual domains.

[Figure: Domains, top 7]
[Figure: Domains 8-30]

A second test looks at the origin of the IP address. I assume it is common knowledge that every host has an IP address and that this IP can be mapped to the country in which the host is located. For the European region these allocations are made by RIPE, so I used their data.
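A minimal sketch of the IP-to-country matching, assuming RIPE's pipe-separated delegated statistics format (`registry|cc|type|start|value|date|status`, where `value` is the number of addresses for IPv4 records) and non-overlapping ranges. Any address not found in the table is labelled “international”, mirroring the “international” line in the diagram. The class name and the sample records are hypothetical; the sample uses documentation address ranges, not real allocations.

```python
import bisect
import ipaddress


class Ipv4CountryTable:
    """IPv4 -> country lookup built from RIPE-style delegated statistics lines."""

    def __init__(self, lines):
        ranges = []
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            fields = line.split("|")
            # Records have at least 7 fields; this skips the version line,
            # summary lines (cc "*") and non-IPv4 records.
            if len(fields) < 7 or fields[2] != "ipv4" or fields[1] in ("", "*"):
                continue
            start = int(ipaddress.IPv4Address(fields[3]))
            count = int(fields[4])  # IPv4 records: number of addresses
            ranges.append((start, start + count - 1, fields[1].upper()))
        ranges.sort()
        self._ranges = ranges
        self._starts = [r[0] for r in ranges]

    def lookup(self, ip):
        """Country code for ip, or "international" if no RIPE range contains it."""
        n = int(ipaddress.IPv4Address(ip))
        # Last range starting at or before n; it matches only if n is inside it.
        i = bisect.bisect_right(self._starts, n) - 1
        if i >= 0 and n <= self._ranges[i][1]:
            return self._ranges[i][2]
        return "international"
```

Sorting once and using binary search keeps each lookup logarithmic, which matters at roughly a million result pages.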

[Figure: IP addresses]

The “international” line represents all IP addresses that are not managed by RIPE; in this case they came almost exclusively from the USA. It is hard to see in the small diagram because of the scaling, but the large diagram or, at the latest, the raw data show that German IP addresses appear noticeably more often in the top ranks, while US IPs become clearly rarer there. Whether this is because operators of websites with German IPs do “better” SEO, or because the origin of an IP matters to the algorithm, is a question everyone has to answer for themselves.

Search: sites in German

While the general web search could still find documents that were not in German, the search for “sites in German” finds only German ones. What is interesting in this context is how Google determines the language of a site, and whether you can gain a ranking advantage by giving hints that point to the right language. I checked the HTTP headers that indicate the language (“Content-Language”) on the one hand and meta tags on the other.
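The language hints can be extracted with the Python standard library alone. This sketch assumes the page's response headers are available as a dict and its HTML as a string; treating `<meta http-equiv="content-language">` and `<meta name="language">` as the relevant meta tags, and the helper names, are assumptions for illustration.

```python
from html.parser import HTMLParser


def _split_langs(value):
    """Split a value such as "de, en" into normalised language tags."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


class MetaLanguageParser(HTMLParser):
    """Collects language values declared in <meta> tags (assumed tag forms)."""

    def __init__(self):
        super().__init__()
        self.languages = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        a = {name: (value or "") for name, value in attrs}
        if (a.get("http-equiv", "").lower() == "content-language"
                or a.get("name", "").lower() == "language"):
            self.languages += _split_langs(a.get("content", ""))


def language_hints(headers, html):
    """Return (header_langs, meta_langs) for one fetched page."""
    header_langs = []
    for name, value in headers.items():
        if name.lower() == "content-language":
            header_langs += _split_langs(value)
    parser = MetaLanguageParser()
    parser.feed(html)
    parser.close()
    return header_langs, parser.languages


def declares_german(headers, html):
    """True if the header or a meta tag declares German ("de", "de-*" or "german")."""
    header_langs, meta_langs = language_hints(headers, html)
    return any(lang == "de" or lang.startswith("de-") or lang == "german"
               for lang in header_langs + meta_langs)
```

Splitting the results by `declares_german` and comparing each group's share per position is one way to look for a ranking effect.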

[Figure: HTTP header and meta tags]

As we can see: nothing. Putting the language into the HTTP header or the meta tag has no effect on the SERPs. This should not surprise most readers, but since I already had the data... As before, we removed Wikipedia from the HTTP header assessment because it would have distorted the results too much. You should still provide this information, since here we only measured whether providing it has a positive effect on the ranking. If Google fails to recognise a site as German, because it has too few words or too much English text, the site will not even make it into the “sites in German” index.

Search: sites from Germany

While it is still fairly easy to reason about how to get into the index behind the “sites in German” search, it gets more difficult with the “sites from Germany” search. If you take the share of pages with a German IP in the “sites from Germany” index and compare it with the share in the general web search, you will see that the IP seems to be a possible factor for inclusion in this index.
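The comparison boils down to two fractions. A minimal sketch, assuming each index is represented by the list of IPs of its result pages and that some `country_of` function (hypothetical here, for example backed by the RIPE data) maps an IP to a country code:

```python
from typing import Callable, Iterable, Optional, Tuple


def german_ip_share(ips: Iterable[str], country_of: Callable[[str], Optional[str]]) -> float:
    """Fraction of result IPs that country_of maps to "DE" (0.0 for no results)."""
    ips = list(ips)
    if not ips:
        return 0.0
    return sum(1 for ip in ips if country_of(ip) == "DE") / len(ips)


def compare_indexes(web_ips, from_germany_ips, country_of) -> Tuple[float, float, float]:
    """Return (share in the web search, share in "sites from Germany", difference)."""
    web = german_ip_share(web_ips, country_of)
    from_de = german_ip_share(from_germany_ips, country_of)
    return web, from_de, from_de - web
```

A noticeably higher share in the second index is consistent with the IP being an inclusion factor, but on its own it does not prove it.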

[Figure: Share of German IP addresses]

This idea is supported by the fact that the German Wikipedia, which is hosted in the Netherlands on a Dutch IP, is not included in this index. That other factors must also decide on inclusion in the index of German sites is shown by the fact that not 100% of the sites in it have a German IP. In particular, sites with “international” IP addresses, such as those from the USA, get into the index if they have a DE domain. Another line of thought is that the origin of the links pointing to a site has an effect. After consulting the server that computes these statistics and taking a quick look at the thermometer, I am refraining from that evaluation for now: checking only 100 backlinks per page would already mean 100 million data points, not something to take on at 90 °F.

Conclusion

It is not surprising that this is not an easy subject that can be settled with one of those popular Digg top-10 lists. At the moment the best advice seems to be to set many of the signals that tell Google a site's origin correctly. The top-level domain does not seem to play a huge role, and setting the language in the HTTP header and the meta tags to German does not hurt anyone and might help. It also seems sensible to choose a hosting provider that offers native IP addresses for the respective country. In the future it would be interesting to evaluate how the origin of a site's links affects its ranking, though this would mean a large amount of data.
Johannes Beus - on Fri (07/20/2007) at 13:03
