Pagination as a source of internal duplicate content

Johannes Beus
Blog and content management systems such as WordPress, or sites built on frameworks like Ruby on Rails, usually split content that does not fit on one page across several pages and offer navigation by page number at the bottom of each page. This behavior, which is useful and tidy for the user, can pose a problem for search engines. If, for example, I were to run this site as a WordPress blog, this text could be found under several different URLs.


The original content is thus published on a large number of additional pages. If we take a look at how many WordPress blogs have the specific “/page/2” in the Google index, we quickly get an idea of the scope of this problem. To make things worse, as Gerald put it so nicely, Googlebot never has a current overview of the whole website. An article that was on the front page a moment ago may show up on the second page shortly afterwards, which confuses the search engine crawler even more.
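The shifting described above can be illustrated with a small sketch. It assumes a blog that lists posts newest first with a fixed number of posts per page. The `/page/N/` pattern follows the WordPress convention mentioned above; the function name `listing_url` and the page size of 10 are assumptions made only for this illustration.

```python
def listing_url(posts_newer, per_page=10):
    """Return the paginated listing URL on which a post currently appears.

    posts_newer: number of posts published after this one (0 = newest).
    per_page:    posts shown per listing page (assumed value, not from the article).
    """
    page = posts_newer // per_page + 1
    # Page 1 is the front page itself; later pages use the /page/N/ pattern.
    return "/" if page == 1 else f"/page/{page}/"


# The same article changes its listing URL as newer posts are published.
for newer in (0, 9, 10, 25):
    print(newer, listing_url(newer))
```

Every time a new post appears, each older article slides one position down, so a crawler that fetched `/page/2/` yesterday may find different articles there today.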

By now, search engines, with Google leading the way, handle this problem rather well and detect the URL of the original content with decent reliability. But search engine optimization, as the name suggests, is not about getting by somehow but about finding the best possible solution, so any website operator confronted with this problem should try to solve it. In this blog, for example, we solved it by showing only the 10 most recent articles on the blog's start page; all other content can be found at exactly one URL each. The archive and tag pages show only the headlines, each linking to the article's URL.
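The approach described above could be sketched roughly as follows. This is not the code this blog actually runs; the `Post` class, the function names and the sample URLs are hypothetical and only show the idea: full text appears on the start page for the newest 10 posts, while archive and tag listings carry nothing but a headline and a link.

```python
from dataclasses import dataclass


@dataclass
class Post:
    url: str    # the single URL under which the full article lives
    title: str
    body: str


def front_page(posts, limit=10):
    """Full entries for only the `limit` most recent posts (posts are newest first)."""
    return [(p.url, p.title, p.body) for p in posts[:limit]]


def archive_page(posts):
    """Headline and link only; the article body is never repeated here."""
    return [(p.url, p.title) for p in posts]


# Hypothetical sample data: 25 posts, newest first.
posts = [Post(f"/post-{i}/", f"Title {i}", f"Full text {i}") for i in range(25, 0, -1)]

print(len(front_page(posts)))   # number of full-text entries on the start page
print(archive_page(posts)[:2])  # first two headline-only archive entries
```

Because the listing functions for archives and tags never receive the body, older articles cannot accidentally be republished in full on those pages.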
Johannes Beus - on Mon (06/18/2007) at 17:38 PM
