
Search engines come up with a solution for the non-existent duplicate content problem

There is no duplicate content problem, at least not if we are to believe the tenor of a posting on the Google Webmaster blog in September of last year. When it comes to interpreting this matter, Google has shown the flexibility of an aardvark's nose and, together with Yahoo and Microsoft, has introduced a solution for the (non-existent) duplicate content problem. The trio imagines that a new hint in every web page's source code, in which website operators specify the “correct” URL, will help to avoid duplicate content, which arises when the same web page can be reached through more than one URL.

The problem here is that pages which can be reached through more than one URL usually only come into existence if the site is dynamically generated. In the blog post, Google gives a great example of this: both URLs, example.com/shop.php?item=seo and example.com/shop.php?item=seo&category=spam, show the exact same article with the exact same content. Besides unclean programming on the site, another frequent cause is that URLs change over successive software versions but are still supposed to remain backward compatible.
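
To make Google's example concrete, here is a minimal Python sketch of how a site could reduce both URLs to a single form. It assumes that `item` is the only query parameter that changes what the page shows. The name `canonical_url`, the constant `CONTENT_PARAMS` and the `http://` scheme in the usage line are illustrative assumptions, not part of the announcement.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: only these query parameters change the content of the page.
CONTENT_PARAMS = ("item",)


def canonical_url(url: str) -> str:
    """Return the URL with every non-content query parameter dropped."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in query if key in CONTENT_PARAMS]
    # Rebuild the URL without the fragment and without the dropped parameters.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("http://example.com/shop.php?item=seo&category=spam"))
```

Which parameters count as relevant is exactly the knowledge the application has to bring along. The sketch only shows that once this knowledge exists, the reduction itself is trivial.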

This announcement has caused some euphoria in the SEO sector. SEOmoz was so thrilled that it called this the biggest development since the introduction of XML sitemaps. As the tone of the last few sentences and the headline may already have given away, I take a more nuanced view, which I would like to explain below:

There already is a solution
For ages, it has been possible to redirect wrong URLs to the correct version with 301 redirects. Search engines follow these instructions and take the destination page into the index. The actual problem lies elsewhere: most web applications do not know their correct URL, so they cannot distinguish between the requested and the correct URL and issue a 301 redirect when the two differ. Since the newly introduced tag also requires the website to know its “correct” URL, I do not consider it an improvement.
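
As a rough sketch of that point: once an application does know its correct URL, deciding on a 301 redirect is a single comparison. The function name `check_request` and its return shape are hypothetical and not tied to any particular web framework.

```python
from typing import Optional, Tuple


def check_request(requested_url: str, correct_url: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header value) for a request.

    Assumption: the application can already work out correct_url by itself.
    """
    if requested_url != correct_url:
        # Permanent redirect: search engines index the destination page.
        return 301, correct_url
    # The request already uses the correct URL, so the page is served normally.
    return 200, None


print(check_request("http://example.com/shop.php?item=seo&category=spam",
                    "http://example.com/shop.php?item=seo"))
```

The comparison is the easy part. The work lies in producing `correct_url`, and the new tag needs that value just as much as the redirect does.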

New links are scattered
Pages are linked using the URL under which they are opened in the browser. This should not come as much of a surprise, but under the new proposal it has a negative impact. Let's start with an example of a solution using a 301 redirect:

[Figure: example of a 301 redirect]

The web surfer gets redirected to the correct URL, sees it in the browser's address bar and will link or quote that URL. This redirection is missing, though, if the “Canonical” tag is set:

[Figure: the same example with the “Canonical” tag set instead of a redirect]

New links to this one page are not pooled on that page but spread over the different versions, all of which are online under different URLs. In the end you are left to trust that the search engines will transfer the link juice as completely as possible to your main page.
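
The difference between the two approaches can be sketched in a few lines of Python. The sketch models what a visitor ends up seeing in the address bar, which is the URL they are likely to link to. The function names and the dictionary layout are illustrative assumptions. The tag is written as a `link` element with `rel="canonical"` in the page head.

```python
from html import escape


def with_redirect(requested_url: str, correct_url: str) -> dict:
    """301 approach: the browser always ends up on the correct URL."""
    status = 200 if requested_url == correct_url else 301
    return {"status": status, "address_bar": correct_url, "head": ""}


def with_canonical_tag(requested_url: str, correct_url: str) -> dict:
    """Tag approach: the page is served under the requested URL, plus a hint."""
    tag = '<link rel="canonical" href="{}">'.format(escape(correct_url, quote=True))
    return {"status": 200, "address_bar": requested_url, "head": tag}


requested = "http://example.com/shop.php?item=seo&category=spam"
correct = "http://example.com/shop.php?item=seo"
print(with_redirect(requested, correct)["address_bar"])
print(with_canonical_tag(requested, correct)["address_bar"])
```

With the redirect, every visitor sees and copies the same URL. With the tag, the duplicate URL stays in the address bar and only the search engine reads the hint.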

The proposal by Google, Yahoo and Microsoft alleviates symptoms rather than eliminating their cause. New and proprietary HTML tags are introduced even though there are already established ways of solving this problem.
Johannes Beus - on Sat (02/14/2009) at 11:00 AM
