Supplemental Index – second class websites?

Ever since its introduction in 2003, the so-called “supplemental index” has been the subject of numerous questions and discussions: What exactly is the supplemental index, why is a site included in it, how can it get out again, and what repercussions does the supplemental index have on a site's ranking? Since this blog post ended up being rather comprehensive, I decided to split it into three posts over three days.

Google themselves write that the “supplemental results” originate in the “supplemental index”. This is where all those sites that do not fully meet the requirements for the normal index are stored. As an example, they note that a URL may contain too many parameters to be included in the first index, in which case the site will be found in the second index. The assignment to the first or the second index is made automatically.

Even though Google has removed the counter from their homepage that displayed the size of their index, we can assume that there are notably more than 10 billion websites in the Google index. If we are to believe the estimates of the experts on Wikipedia, then the “deep web”, the part of the Internet that is not indexed by search engines like Google, should be around 500 billion websites. Since even Google is subject to technological constraints, and since more indexed sites do not necessarily mean a better index, they have to consider which sites to include. It seems that Google is counting on a two-tiered approach: a first index and a second, also called supplemental, index. Because the qualifications for inclusion in the first index are quite restrictive, sites that might still contain the wanted information are not accepted there. The second index, where the acceptance criteria are lower, is for just these sites.

To figure out whether, and if so how many, pages of a domain are in Google's supplemental index at the moment, you can currently (Google changes this from time to time) use the following query: “site:domain.tld ***-gjfhgh”. For this domain the query returns around 40 pages at the moment, while there are about 770 in the first index. It is extremely rare for a domain to have no pages whatsoever in the supplemental index; there are always some to be found, even for Wikipedia and Google themselves. The ratio between the pages in the first and second index can be used as an indicator of possible problems with the domain. To get it, divide the number of pages in the supplemental index by the number of pages in the first index. The closer this number is to 1, the more pages of the domain are affected. While this domain is doing rather well with around 6% of all pages, link catalogues like “linkheim.de”, for example, have values of 70 percent and more. For extremely large sites with a couple of hundred thousand pages, this method sadly no longer works reliably: at that size, Google's “guess” at the number of pages in the supplemental index is no longer accurate.
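The ratio described above is simple to compute once you have the two page counts. Here is a minimal Python sketch; the function name `supplemental_ratio` is my own for illustration, and the two counts still have to be read off the Google result pages by hand, since nothing here queries Google.

```python
def supplemental_ratio(supplemental_pages: int, first_index_pages: int) -> float:
    """Divide the supplemental-index page count by the first-index page count.

    Both counts are assumed to be the (estimated) result counts that
    Google reports for the two queries; they are entered manually.
    """
    if supplemental_pages < 0:
        raise ValueError("supplemental_pages must not be negative")
    if first_index_pages <= 0:
        raise ValueError("first_index_pages must be positive")
    return supplemental_pages / first_index_pages


# Figures quoted in the post for this domain: about 40 supplemental
# pages and about 770 pages in the first index.
ratio = supplemental_ratio(40, 770)
print(f"{ratio:.1%}")  # prints "5.2%"
```

With the rounded counts from the text this comes out at roughly 5 percent, in the same range as the figure quoted above. Keep in mind that Google's counts are estimates, so the result is an indicator and not an exact measurement.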

Tomorrow we will continue with the possible reasons that can cause a relocation from the first index to the second.


Part I: Supplemental index – second class websites?
Part II: Supplemental index – how did I get here?
Part III: Supplemental index – how can I escape the Google Hell?
Johannes Beus - on Tue (06/05/2007) at 10:14 AM
