Johannes Beus
As we saw yesterday, nearly every domain has a certain number of pages in the supplemental index. The goal should be to keep this number as low as possible. If we look through the Wikipedia pages that are in the second index, we can see that Google is already quite good at identifying pages that add little value: they are mostly templates, user pages, and articles with little content or with content that already exists elsewhere. This brings us to the factors that can cause a website's pages to end up in the supplemental index:
Duplicate Content
Duplicate content has been a relevant topic for years and will remain one for years to come. Google, not least on behalf of its searchers, has no interest in search results where the top-ranked pages all offer exactly the same content. The third Amazon affiliate shop or the fourth Wikipedia clone gets boring quickly, and by the time searchers reach a second SERP page full of copies at the latest, they will go and search elsewhere. This is why Google tries to identify such content. It is not limited to entire websites; even individual text excerpts can cause trouble. Although so-called external duplicate content (content that already exists on other domains) is the bigger problem, internal duplicate content can also cause trouble once it exceeds a certain amount.
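Google does not disclose how its duplicate content filters actually work. To make the idea of partially duplicated text excerpts more concrete, here is a minimal sketch of one common textbook technique: splitting texts into overlapping word sequences ("shingles") and comparing them with Jaccard similarity. The shingle length `k=3` and all function names are assumptions chosen for illustration, not anything Google has published.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 = disjoint, 1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(text_a: str, text_b: str, k: int = 3) -> float:
    """Jaccard similarity of the k-word shingle sets of two texts."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog near the river bank"
    copy = "The quick brown fox jumps over the lazy dog near the old mill"
    print(f"{similarity(original, copy):.2f}")
```

The two sample sentences share 9 of their 13 distinct shingles, so a copied passage scores high even when only its ending has been changed. Real search engines work at a far larger scale and typically use hashing tricks on top of this idea.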
Duplicate content is especially common in the title and meta description tags, and in both places it is a serious problem. Many CMS, blog, and shop systems fill these tags with identical or very similar text across many pages. Since both tags carry considerable weight in the duplicate content filters, this can quickly lead to problems.
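A quick way to check your own project for this problem is to group pages by their title and meta description. The sketch below uses only Python's standard-library `html.parser`; the sample URLs and the normalization (collapsing whitespace, ignoring case) are assumptions, and it only catches identical values, not merely similar ones.

```python
from __future__ import annotations

from collections import defaultdict
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collects the <title> text and the meta description of one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so trivial variations still match."""
    return " ".join(value.split()).lower()


def find_duplicates(pages: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """Map 'title'/'description' to {normalized value: URLs}, keeping only shared values."""
    groups: dict[str, dict[str, list[str]]] = {
        "title": defaultdict(list),
        "description": defaultdict(list),
    }
    for url, html in pages.items():
        parser = HeadTagParser()
        parser.feed(html)
        parser.close()
        for field, value in (("title", parser.title), ("description", parser.description)):
            key = normalize(value)
            if key:  # skip pages where the tag is missing or empty
                groups[field][key].append(url)
    return {
        field: {value: urls for value, urls in by_value.items() if len(urls) > 1}
        for field, by_value in groups.items()
    }
```

Fed with a dictionary of URL to HTML (for example from a crawl of your own site), `find_duplicates` returns only the titles and descriptions that appear on more than one page, which is usually a short list worth fixing by hand or in the CMS template.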
Bad internal linking
Internal linking and link structure are among the most neglected topics in on-page optimization, yet also among the most complex. The goal should be that every page is reachable via short paths and that the internal links pointing to a page reflect its value within the project. This may sound easy at first, but it becomes quite complicated as the number of pages grows, and it is often one reason for an unfavorable ratio between pages in the first and second index.
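One way to check whether pages are reachable through short paths is to treat the internal links as a graph and run a breadth-first search from the homepage, which yields each page's click depth; counting the distinct pages that link to a URL gives a crude view of how internal links are distributed. This sketch assumes you already have the link graph (for example from a crawl) as a dictionary mapping each URL to the URLs it links to; the sample site is hypothetical.

```python
from __future__ import annotations

from collections import deque


def click_depths(links: dict[str, list[str]], start: str) -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from `start` to every reachable page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit in BFS order is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def inlink_counts(links: dict[str, list[str]]) -> dict[str, int]:
    """Number of distinct other pages that link to each page."""
    counts: dict[str, int] = {}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts


if __name__ == "__main__":
    site = {
        "/": ["/category", "/about"],
        "/category": ["/product-1", "/"],
        "/product-1": ["/product-2", "/"],
        "/product-2": [],
        "/about": ["/"],
        "/old-page": ["/"],  # links out, but nothing links to it
    }
    depths = click_depths(site, "/")
    print(depths)
    print("unreachable:", sorted(set(site) - set(depths)))
    print(inlink_counts(site))
```

Pages with a large click depth, pages that are unreachable from the homepage, and important pages with very few internal links are the first candidates for a closer look.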
Little to no content
Pages that consist of only a few words of text plus the usual HTML framework of navigation and footer are often affected. This makes sense: with so little content, the chance that it is exactly what someone is searching for is rather small. Usually these are pages that were never meant to rank in search engines: templates, contact pages, graphics-heavy pages, or forum member profiles.
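To find candidates for this category, you can count the words on a page while ignoring the surrounding framework. This sketch assumes that the framework is marked up with `<head>`, `<nav>`, `<header>`, and `<footer>` elements (many older templates do not do this), and the 150-word threshold is an arbitrary illustrative value, not a known Google limit.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

# Assumption: text inside these elements is site framework or code, not page content.
SKIP_TAGS = {"head", "nav", "header", "footer", "script", "style"}


class ContentWordCounter(HTMLParser):
    """Counts words in text that is not nested inside any SKIP_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # how many SKIP_TAGS elements we are currently inside
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.words += len(re.findall(r"\w+", data))


def content_word_count(html: str) -> int:
    """Words outside the skipped framework elements."""
    parser = ContentWordCounter()
    parser.feed(html)
    parser.close()
    return parser.words


def is_thin(html: str, min_words: int = 150) -> bool:
    """Hypothetical threshold: flag pages with fewer than min_words content words."""
    return content_word_count(html) < min_words
```

A contact page whose body holds little more than "Call us today." counts only those three words once navigation and footer are excluded, and is flagged accordingly.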
URL structure
Google loves static HTML pages: their content does not change too often, they are usually more valuable than dynamic pages, they send proper HTTP headers that Googlebot can interpret, and there is no risk of Googlebot getting caught in badly programmed dynamic databases. As Google itself writes, the number of parameters in a dynamic page's URL influences whether the page shows up in the first or the second index.
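To see which of your URLs carry many parameters, the query string can be parsed with Python's `urllib.parse`. The cut-off of two parameters below is a hypothetical value for illustration; Google has not published a specific limit.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlsplit


def query_parameter_count(url: str) -> int:
    """Number of query-string parameters in a URL (parameters with empty values included)."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def flag_parameter_heavy(urls: list[str], max_params: int = 2) -> list[str]:
    """Return the URLs with more than max_params parameters (hypothetical threshold)."""
    return [url for url in urls if query_parameter_count(url) > max_params]
```

Applied to a URL list from a crawl or log file, this separates static-looking addresses such as `/shop/shoes.html` from URLs like `index.php?cat=3&sort=price&page=2&sessionid=abc`, which are good candidates for rewriting or for dropping superfluous parameters.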
Incoming links
As so often in SEO, the quality and quantity of incoming links let you push your project's individual limits a bit. The saying that large sites can get away with a lot is no accident. Anyone who thinks they can launch their complete project catalog, with a few hundred thousand pages, into the index on the strength of three or four backlinks from an article directory should not be surprised if most of those pages end up in the supplemental index, if they make it into the index at all.
Tomorrow we will finally get down to the good stuff: possible strategies for escaping Google Hell, or at least for noticeably reducing the number of pages stuck there.
Part I: Supplemental index - second-class websites?
Part II: Supplemental index - how did I get here?
Part III: Supplemental index - how can I escape Google Hell?