Duplicate Content means that content is accessible through multiple URLs. This so-called Duplicate Content should be categorically avoided. Each piece of content on a website must only be accessible through one single URL. Otherwise, Google is put on the spot and has to decide which URL to display in the rankings and which positive ranking signals to assign to which URL.
What is Duplicate Content?
Duplicate Content, often abbreviated as “DC”, is the existence of identical content under more than one URL, on one or more websites. A distinction is made between Internal Duplicate Content and External Duplicate Content.
Internal Duplicate Content can be created simply by having the same content accessible through multiple URLs on the same website.
External Duplicate Content may occur when a website is available in multiple language versions, but appears with more than one language version in the search results for a specific search market (for example in Germany on google.de).
- Please check out: Is it possible to identify Duplicate Content through the Visibility Index history?
Internal and External Duplicate Content
Duplicate Content may be either internal or external. Internal Duplicate Content is limited to your own domain / hostname, whereas External Duplicate Content spans domains: the same content shows up on two or more domains.
Example of Internal Duplicate Content
Online shops often have to deal with Duplicate Content. A very common case is that product detail pages are also accessible through a URL that does not contain the corresponding category.
Search engines frequently index both versions, for example when both URLs are linked to externally. Another cause can be an inconsistent internal linking strategy.
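One way to spot this kind of internal duplication is to fingerprint the content of each crawled URL and group the URLs that share a fingerprint. The following is only a minimal sketch: the shop URLs and page bodies are hypothetical, and an exact hash will only catch pages that are identical after whitespace and case are normalised, not near-duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html_text: str) -> str:
    """Return a hash of the page text with whitespace and case normalised."""
    normalised = re.sub(r"\s+", " ", html_text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose content has the same fingerprint."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical shop URLs: the same product page with and without the category.
pages = {
    "https://shop.example/shoes/red-sneaker": "<h1>Red Sneaker</h1>  <p>Size 42</p>",
    "https://shop.example/red-sneaker": "<h1>Red Sneaker</h1> <p>Size 42</p>",
    "https://shop.example/shoes/blue-boot": "<h1>Blue Boot</h1> <p>Size 43</p>",
}
duplicates = find_duplicates(pages)
```

Here `duplicates` contains a single group holding the two red-sneaker URLs, which is the pair that should be consolidated into one URL.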
Example of External Duplicate Content
Many websites can be accessed through multiple domain names. There is nothing wrong with that, as long as all other domain versions redirect to the corresponding main domain using a 301-redirect.
If this is not the case, Google is confronted with different domains that all have the same content. This makes it hard for Google-Bot to determine the relevancy of each individual page, which can lead to ranking problems for the website.
In this example, radio-sws.de is the desired main domain. The content of the website radio-sws.de can be found in identical form on three other domains. This is how multiple domain names for one website create Duplicate Content. Google cannot always be sure which of the four domains is the most relevant for the subject and therefore alternates between them in the rankings.
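The 301-redirect to the main domain can be sketched as a small decision function that a web server or application could call for every request. This is only an illustration under stated assumptions: the function name `redirect_target`, the alias host `radio-sws.example` in the usage note, and the use of `https` are hypothetical; only the main domain radio-sws.de comes from the example above.

```python
from typing import Optional

MAIN_HOST = "radio-sws.de"  # main domain from the example in the text


def redirect_target(host: str, path: str, main_host: str = MAIN_HOST) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    # Host names are case-insensitive and may carry a trailing dot.
    if host.lower().rstrip(".") == main_host:
        return None
    # Assumption: the main domain is served over https.
    # The path is kept so that every page maps to exactly one URL.
    return f"https://{main_host}{path}"
```

For a request to the hypothetical alias `radio-sws.example` with the path `/programm`, the function returns `https://radio-sws.de/programm`, which the server would send as the `Location` header of a 301 response. For a request that already targets radio-sws.de it returns `None`, and the page is served normally.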
What Duplicate Content is not
If a piece of content is available in multiple language versions, for example in German and in English, then this is not defined as Duplicate Content. Citations or cited paragraphs are also not identified as Duplicate Content.
If you cite from other pieces of content, please remember to use the correct semantic markup in the source code:
<blockquote>The cited text belongs here - <cite>The name of the cited author or the source belongs here</cite></blockquote>
Why is Duplicate Content a problem?
From Google’s point of view, Duplicate Content can be compared to attempted fraud, while also impeding Google in their quest to find the best possible results for the user.
Google tries hard to index and show pages with distinct information. […] However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results. – Google Search Console Help
Due to this, the existence of Duplicate Content may cause Google to penalise the domain and should be taken very seriously by webmasters.
Duplicate Content does not always end in a punishment from Google, and certainly not immediately, but a persistent DC-problem can cause lasting damage to a website. Over time, it may lead to a penalty or to problems with the indexing of the website. Duplicate Content can also be responsible for fluctuations in the rankings on the SERPs (Search Engine Result Pages), because it is unclear to Google which page offers the most relevant content for the current search request. As a result, Google switches the ranking URLs around while trying to find the best fit.
Google attempts to solve Duplicate Content problems on their own
When a DC-problem exists, Google tries to identify which content is the most relevant for the user’s actual search request and to return this result in the SERPs. During indexing, Google will also try to identify the best possible version (URL) of the content and, if possible, index only this one.
If the rankings and traffic for a website remain consistent even though the website has a DC-problem, and perhaps even shows fluctuations in the number of indexed pages, then the Duplicate Content problem does not need to be your top priority for the time being.
Checking your own website for Duplicate Content
The SISTRIX Optimizer offers an automated on-page analysis of your websites and shows you all SEO-relevant errors. Each type of error comes with its own explanation and concrete guidelines for improvement, which helps to introduce you to the on-page optimisation of your website. Duplicate Content errors (Duplicate Content Found) are shown in detail for each URL.
Additional information on this subject:
- Duplicate content – Google Search Console Help