Less Is More – How Too Many Indexed Pages Can Damage Your Domain

· 16. November 2016 · 2 Comments
Juan Gonzalez
I studied Regional Studies of Latin America at the University of Cologne, Germany, majoring in “Business Informatics”. I also studied Business Administration and am currently doing a Master’s in International Business Administration. I am fascinated by SEO and the people who make it possible.

An ancient Indian minister, Sissa ibn Dahir (Sessa), invented the board game of chess in order to direct the attention of his ruler, Shihram, to the problems in his country. The ruler was enthusiastic about the game, and Sessa was allowed to decide how he wanted to be compensated.

Sessa asked for nothing but rice, distributed as follows: 1 grain of rice on the first square of the board, 2 grains on the second square, 4 on the third, 8 on the fourth, and so on, doubling each time up to square number 64. The ruler laughed it off as a small prize for a brilliant invention.

What the ruler did not consider was that in the end this would add up to more rice than exists world-wide (even today): 18,446,744,073,709,551,615 grains.
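As a quick sanity check, the total can be reproduced in a few lines of Python. This is only a sketch of the arithmetic in the story; the variable names are chosen here for illustration.

```python
# Grains on each of the 64 squares: 1, 2, 4, 8, ... doubling on every square.
grains_per_square = [2 ** square for square in range(64)]

total = sum(grains_per_square)

# The sum of a doubling series is one less than the next power of two.
assert total == 2 ** 64 - 1

print(f"{total:,}")  # 18,446,744,073,709,551,615
```

Note that the last square alone holds more grains than the first 63 squares combined, which is what makes this kind of growth so easy to underestimate.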

This exercise can be used to demonstrate how quickly exponential sequences grow. Something very similar happens if you use filter parameters within the URLs on your domains.

If you have 1 product in 10 different colours, in 10 different sizes and with 10 different prices, you can suddenly have 10 × 10 × 10 = 1,000 new URLs for the same product, with no additional value. If you allow Google to index URLs with low-quality content, they will negatively affect your rankings for the entire domain.
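To see how quickly filter combinations pile up, here is a minimal Python sketch. The base URL, the parameter names (`colour`, `size`, `price`) and the filter values are all made up for illustration; a real shop will build its URLs differently.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical filter values: 10 colours, 10 sizes, 10 price bands.
colours = [f"colour-{i}" for i in range(1, 11)]
sizes = [f"size-{i}" for i in range(1, 11)]
prices = [f"price-{i}" for i in range(1, 11)]

BASE_URL = "https://shop.example/product/123"  # placeholder, not a real shop

# One URL per combination of colour, size and price filter.
filter_urls = [
    f"{BASE_URL}?{urlencode({'colour': c, 'size': s, 'price': p})}"
    for c, s, p in product(colours, sizes, prices)
]

print(len(filter_urls))  # 1000
print(filter_urls[0])
```

Every additional filter multiplies the total again: a fourth filter with 10 values would already mean 10,000 URLs for the same product.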

Let’s look at an example.

Showing Google More Irrelevant Pages Decreases a Domain’s Importance

Comparison between Screwfix.com, Homebase.co.uk and Diy.com (B&Q)

As you can see, the home improvement website Homebase.co.uk has a visibility score of 102 points, higher than its competitors Screwfix.com (97 points) and Diy.com (76 points). The interesting part is that Google only has 189,000 URLs from Homebase.co.uk in its index.

Screwfix.com has 2.5 times as many URLs indexed (478,000), and Diy.com has an incredible 36 times as many URLs in Google’s index (6,770,000).

If we compare the number of keywords that these domains rank for with their own content, we can see that Diy.com needs 6.7 million URLs to rank for 340,986 keywords, and only 79,584 of those keywords reach the top 10 of the search results.

Homebase.co.uk has 36 times fewer URLs than Diy.com but ranks for more keywords (342,071), and 103,735 of those keywords are within the top 10 of the search results.

Homebase.co.uk not only has fewer URLs and more keywords than Diy.com; those keywords also rank much better.
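The comparison becomes even clearer when the figures are put in relation to each other. The following Python sketch uses only the numbers quoted above; the two ratios (keywords per 1,000 indexed URLs and the share of keywords in the top 10) are an illustrative way of summarising them, not an official metric.

```python
# Figures quoted in the article. Screwfix.com is left out because
# its keyword counts are not given in the text.
domains = {
    "Homebase.co.uk": {"urls": 189_000, "keywords": 342_071, "top10": 103_735},
    "Diy.com": {"urls": 6_770_000, "keywords": 340_986, "top10": 79_584},
}


def summarise(stats):
    """Return keywords per 1,000 indexed URLs and the top-10 share in percent."""
    keywords_per_1000_urls = stats["keywords"] / stats["urls"] * 1000
    top10_share = stats["top10"] / stats["keywords"] * 100
    return round(keywords_per_1000_urls, 1), round(top10_share, 1)


for name, stats in domains.items():
    per_1000, share = summarise(stats)
    print(f"{name}: {per_1000} keywords per 1,000 URLs, {share}% in the top 10")
```

With these numbers, Homebase.co.uk comes out at roughly 1,810 keywords per 1,000 indexed URLs, while Diy.com manages only about 50.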

How URLs on Diy.com get indexed

I’m using these domains simply because a tool like a “drill” is, after all, just a “drill”. That makes the following examples easier to understand. On fashion websites, where filter problems are also very common, a normal t-shirt might be listed as a golf shirt, a polo shirt, a v-neck, a blouse, and much more. Let’s stick with drills.

So, I found 26 “Combi Drills” on Diy.com. You can filter these 26 products by Availability, Price, Rating, Brand, Voltage, Batteries, Watts, or even by Weight (if you really want to).

Filtering these 26 “Combi Drills” by “18V”, I get 22 products on Diy.com, but Google has 232 indexed URLs that use this filter:

Drills filtered by “Voltage” on Diy.com

If 232 indexed URLs for 22 products are not enough for you, just take a look at the number of indexed URLs for the “Weight” filter. You will find 1,220 URLs:

The following search result hits the nail on the head when it comes to the main problem for Diy.com. This URL contains the filters price, voltage, weight and cordless:

Example for a search result using more than one filter

One of the biggest challenges for big websites is getting the most current and relevant content indexed by Google. Google sets an individual limit for each domain when it comes to crawling and indexing pages: how many URLs should be crawled per day, and how many of them deserve a place in Google’s search results? This is why it is so important to use your resources as intelligently and productively as possible.

Please also keep in mind that the better a URL fits the search request, the more Google will trust this source. Many different URLs for one and the same product, without any new value, make life really hard for Google, because it becomes difficult to decide which URL is the most relevant and should show up in the rankings. This will cause Google to lose trust in the website, which will then cause the rankings to go down.
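If you want to check your own domain for this pattern, one simple approach is to group a list of crawled or indexed URLs by their address without the filter parameters. The Python sketch below assumes that filters are passed in the query string; the example URLs, paths and parameter names are hypothetical, and a site that encodes filters in the path would need different logic.

```python
from collections import Counter
from urllib.parse import urlsplit


def variants_per_page(urls):
    """Count how many URLs share the same scheme, host and path."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        # Drop the query string and fragment so filter variants collapse together.
        base = f"{parts.scheme}://{parts.netloc}{parts.path}"
        counts[base] += 1
    return counts


# Hypothetical URLs; the paths and parameter names are made up for illustration.
crawled = [
    "https://shop.example/drills/combi-drills",
    "https://shop.example/drills/combi-drills?voltage=18V",
    "https://shop.example/drills/combi-drills?voltage=18V&cordless=yes",
    "https://shop.example/drills/combi-drills?weight=2kg&price=50-100",
    "https://shop.example/drills/hammer-drills",
]

# Pages with the most URL variants are listed first.
for base, count in variants_per_page(crawled).most_common():
    print(count, base)
```

Pages that show up with dozens or hundreds of variants are the first candidates to look at when you suspect that filter URLs are bloating the index.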

On the other hand, an increase in indexed pages can be viewed in a positive light if it is accompanied by an increase in the number of keywords and good rankings.

Conclusion

18,446,744,073,709,551,615 grains of rice are a lot. Quite a lot.

Enough to feed 100 tons of rice to every single human on Planet Earth. That’s 1 kg of rice per day per human for 275 long years. And economically speaking, it is more than a millennium’s worth of global rice production (Source: http://www.dedoimedo.com/life/rice.html).


Comments

Jon   
18. November 2016, 18:47

Interesting post.

No doubt having irrelevant pages indexed isn’t a good idea, but I’m not a big fan of proving a point by looking at certain factors in isolation and ignoring others – they don’t exist in isolation in the real world.

Homebase may have a better link profile or a hundred other things.

Not that I disagree with the point you’re making, I just don’t think the elements you showed in your table are necessarily the full picture.

Jon

Juan Gonzalez   
23. November 2016, 10:06

Hi Jon,

Thank you very much for visiting us!

You are right, it very seldom comes down to one isolated factor. The problem here is that I want to make as many people as possible aware of how problematic it can be to ignore indexing issues, without overburdening the article by going into detail about every aspect of these three domains. You can be sure that I did not ignore the overall picture while analyzing the domains. I would like to give you a clearer picture: https://www.sistrix.com/wp-content/uploads/2016/11/Bildschirmfoto-2016-11-22-um-11.08.00.png

As you can see, Homebase.co.uk actually has a weaker link profile than Diy.com in every metric (for the link comparison we use our own data and the data of our partner Majestic). And still, Diy.com has lower visibility on Google.

Have a nice and successful day!

Juan
