What is Duplicate Content?
Duplicate content is content that appears in more than one place on the internet and is either identical or, as Google puts it, “appreciably similar”. Duplicate content is a problem for search engines because it makes it difficult for them to decide which version is most relevant to a given search query.
Although not technically a penalty, duplicate content issues will negatively impact your search engine rankings. This is especially true when dealing with large amounts of duplicate pages.
Let’s explore why duplicate content can occur, some best practices, and how to fix the most common duplicate content issues.
How Does Duplicate Content Happen?
A report from Ahrefs shows common duplicate content issues:
The most common ways duplicate content gets created are:
URL Variations
URL parameters added by analytics and other click-tracking code can create several URLs that all serve the same content, causing duplicate content problems.
The same applies if your site assigns session IDs to each visitor in the URL, or if you have printer-friendly versions of some content. Where possible, generate these variations through scripts so that each piece of content lives at a single indexable URL.
HTTP vs. HTTPS or WWW vs. non-WWW pages
When multiple versions of your website are live at once (for example, http:// and https://, or www and non-www), search engines may index them separately, causing duplicate content problems.
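One common way to consolidate these versions is a server-level redirect. Here is a minimal sketch, assuming your site runs on Apache with mod_rewrite enabled; yoursite.com is a placeholder for your own domain:

```apache
# .htaccess — send every http:// and non-www request to the single
# canonical https://www.yoursite.com version (301 = permanent redirect)
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.yoursite.com/$1 [L,R=301]
```

If you prefer the non-www version, adjust the host condition and target accordingly; the point is to pick one version and redirect all others to it.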
Scraped or Copied Content
Product descriptions on product pages are often one of the main sources of duplicate content for eCommerce sites, since the same manufacturer copy frequently appears across many pages and stores. Scrapers that republish your content are another common source of duplicate content.
How Does Duplicate Content Affect SEO?
Every SEO guide for beginners or experts will tell you that search engines do not like duplicate content. Users also don’t enjoy duplicate content that much. That should be enough for you to go ahead and fix your duplicate content issues.
There are three main issues you could potentially face if your website has too much duplicate content.
1. Decrease in Organic Traffic
This one is obvious: if Google and other search engines have trouble deciding which of your duplicate pages to rank, your rankings will suffer, and so will your traffic.
When several pages carry the same content, they compete against each other and split ranking signals, so all of them struggle to rank.
2. Penalties or Deindexing
Although extremely rare, Google has said that duplicate content can be grounds for penalties or even complete deindexing of a website. When this happens, it is usually because of deliberate content scraping or copying.
3. Fewer Indexed Pages
If your website has lots of duplicate pages, Google may exhaust its crawl budget on them and decide not to index some pages at all. This issue is most common on eCommerce sites, where parameterized product URLs multiply quickly.
Duplicate Content Best Practices
Taking care of your duplicate content issues will help your rankings and also improve your domain authority.
Keep an Eye Out for the Same Content on Different URLs
Product pages on eCommerce sites are the most common place where duplicate content gets created. Where possible, serve every variation of a product (size, color, tracking parameters) from a single URL, or point all the variations at one canonical URL.
Check Indexed Pages
Checking your indexed pages is as easy as performing a search on Google: just type “site:yoursite.com”. You can also go to Google Search Console and review your indexed pages there.
The number you find through either method should roughly match the number of pages you deliberately created on your website. If you see an inexplicably large number, many of those extra pages are likely auto-generated duplicates.
Make Sure Your Site Redirects Correctly
We already talked about different live versions of your site causing duplicate content problems. Pick one canonical version (for example, https://www.yoursite.com) and redirect all the other versions to it.
Use 301 Redirects
If deleting duplicate pages is not an option, 301 redirects are a really easy way to fix duplicate content issues on your website. The tip is to redirect all duplicate content pages back to the original. This way, search engines will only index the original content, helping the original page to rank better.
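For a single duplicate page, the redirect can be a one-line rule. A minimal sketch, assuming an Apache server; both paths here are hypothetical placeholders:

```apache
# .htaccess — permanently redirect a duplicate page to the original
Redirect 301 /old-duplicate-page/ https://www.yoursite.com/original-page/
```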
Use the Canonical Tag
Using the rel=canonical tag is another easy fix for duplicate content issues. The tag tells Google and other search engines which page is the original, so the duplicates are ignored and only the original is indexed.
Canonical tags can easily be added without help from your developer, and Google actually prefers them over blocking pages with duplicate content.
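The tag is a single line in the HTML head of each duplicate or parameterized version of a page; the URL below is a placeholder for your own original page:

```html
<!-- Inside the <head> of every duplicate version of the page -->
<link rel="canonical" href="https://www.yoursite.com/original-page/">
```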
Keep An Eye Out For Similar Content
Duplicate content is not necessarily identical content. As discussed above, Google’s definition of duplicate content includes “appreciably similar” content. This is usually not an issue for most websites. But if you’re serious about ranking and awesome content, plan to write 100% unique content for every one of your pages.
There are various tools out there to help you find duplicate content. These tools crawl your website and produce a report of the pages with duplicate content. Siteliner, SEMrush, and Ahrefs are a few examples of tools you could use.
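The core idea behind these tools is simple: fetch each page and compare the bodies. Here is a minimal sketch in Python that flags exact duplicates by hashing page bodies; the URLs and HTML are made up for illustration, and a real audit would fetch live pages with a crawler:

```python
import hashlib

# Hypothetical crawl results: URL -> HTML body (placeholders, not real pages)
pages = {
    "https://example.com/shoes?utm_source=ad": "<h1>Red Shoes</h1>",
    "https://example.com/shoes": "<h1>Red Shoes</h1>",
    "https://example.com/about": "<h1>About Us</h1>",
}

def find_duplicate_groups(pages):
    """Group URLs whose HTML bodies hash to the same digest (exact duplicates only)."""
    by_digest = {}
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        by_digest.setdefault(digest, []).append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]

print(find_duplicate_groups(pages))
# -> [['https://example.com/shoes', 'https://example.com/shoes?utm_source=ad']]
```

Commercial tools go further and detect “appreciably similar” pages with fuzzy matching, but exact-hash grouping already catches the URL-parameter duplicates discussed above.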
This is a screenshot of Moz Pro showing duplicate content issues:
There might be cases where you have pages with very similar content that can be consolidated into one awesome page with lots of valuable information.
Let’s say you have three different pages dealing with the same topic but from a different angle. For example:
- How Marketing and Sales Complement Each Other
- Marketing for Sales Enablement
- Marketing Best Practices to Boost Sales
You could create one page with all the content, for example:
- Ultimate Guide to Marketing and Sales Enablement
After removing the duplicate content and redirecting all URLs to the new super page, it should rank better than the old ones.
Noindex WordPress Tag or Category Pages
WordPress automatically generates tag and category pages. Add a “noindex” tag to these pages so that they can continue to exist and be useful to your users, but don’t get indexed by search engines.
Alternatively, set up your WordPress so these pages are not generated at all.
Meta Robots Noindex
The meta robots tag can be added to a page’s HTML head to exclude it from a search engine’s index. The “noindex,follow” combination allows search engines to crawl the links on the page while keeping the page itself out of the index.
<head>…[other code that might be in your document’s HTML head]…<meta name="robots" content="noindex,follow">…[other code that might be in your document’s HTML head]…</head>
Remember that search engines want to crawl everything in order to catch errors and other issues in your code, so let them crawl these pages and use the noindex tag rather than blocking them.
Duplicate content is relatively easy to fix and handle, but it can also get out of control if you don’t keep an eye on it. Make sure to follow the guidelines above and:
- Maintain consistency in your internal linking.
- Make sure your syndicated content links back to the original content.
- Add a self-referential rel=canonical link to your existing pages, so that scrapers who copy your HTML end up pointing back to your original page.
Continue to learn more with Outreach Frog and take your SEO to the next level. Cliche, yes, but true.