December 21, 2022
by Sofia

How to check for duplicate content using JetOctopus

Duplicate content is one of the common reasons for a site's low visibility in the SERP. When a website has many pages with non-unique content, search engines cannot easily determine which page to index and rank. As a result, Googlebot may choose on its own which page to show in the search results, or it may not show any of the pages.

What is duplicate content?

Duplicate content is the same information, expressed with the same words, tags and other HTML elements, that is available at different URLs. When JetOctopus analyzes the pages of your website, it reports pages as duplicates when all the text elements inside the <body> are the same.

This covers both pages with the same texts and completely identical listings with the same headings, product names, etc. All of these types of duplicate content can be found using JetOctopus.
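To illustrate the idea of "the same text elements inside the <body>", here is a minimal sketch of exact-duplicate detection. It is not how JetOctopus works internally; the class and function names (`BodyTextExtractor`, `body_fingerprint`, `group_duplicates`) are hypothetical, and the normalization (lowercasing, collapsing whitespace, ignoring scripts and styles) is an assumption made for this example.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects text found inside <body>, skipping scripts and styles."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and self.skip_depth == 0:
            self.chunks.append(data)


def body_fingerprint(html):
    """Hash the normalized body text (assumed normalization, see lead-in)."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def group_duplicates(pages):
    """Take {url: html} and return lists of URLs that share the same body text."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[body_fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

In this sketch, two URLs end up in the same group only when their body text matches exactly after normalization. Differences in the <head>, such as the title, are ignored.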

Why you need to monitor duplicate content

There are several reasons why you should ensure that your website doesn’t have much duplicate content.

First, it will be difficult for search engines to determine the relevant page, so Googlebot may index a completely different URL than you expected.

Second, link equity will be split across several pages, so internal and external links will have a weaker effect or no effect at all.

Third, pages with duplicate content are less visible in search results. To learn what visibility is and how it works, read the article What is ranking in GSC reports and how to analyze this metric. In total, all versions will rank less often in Google, and as a result you will get less organic traffic.

How to find duplicate URLs with JetOctopus

1. Start crawling your website. To do this, log in, select the desired project and click the “New crawl” button. You can also select any other completed crawl to analyze.

Pay attention to the crawl settings: choose whether you want to crawl only indexable pages or everything on your website. You can also use the include/exclude option.

More information: How to configure a crawl.

2. Wait for the crawl to complete. In the meantime, you can read the article on how to check for duplicate titles and meta descriptions.

3. Go to crawl results – “Duplication” dashboard. Select the “Content” report.

4. On the “Content duplication overview” chart, you will find a comparison of the content of all scanned pages, including the number of pages with duplicate content.

All sections of the chart are clickable, so you can go straight to the data table with detailed results. We recommend analyzing each case separately; the reasons are explained below.

5. On the “Duplication by indexability” diagram, you will see the ratio of indexable and non-indexable pages with duplicate content.

By the way, if you need information only about duplicate content on indexable pages, use the built-in indexable segment.

You can also create your own URL segments and analyze duplicate content in each individual segment. 

More information: How to use segments.

If you need to analyze similar content, click on the appropriate segment of the “Content duplication overview” diagram and set the desired percentage of similarity in the filters.
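To make the idea of a "percentage of similarity" more concrete, here is a rough sketch that compares two texts using word shingles and Jaccard similarity. This is only one common way to measure near-duplicates, chosen for illustration. The source does not say how JetOctopus calculates similarity, and the function names and the shingle size of 3 are assumptions.

```python
import re


def shingles(text, size=3):
    """Split text into overlapping word sequences ("shingles") of a given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, expressed as a percentage."""
    a = shingles(text_a, size)
    b = shingles(text_b, size)
    if not a and not b:
        return 100.0  # two empty texts are treated as identical
    return 100.0 * len(a & b) / len(a | b)


if __name__ == "__main__":
    first = "the quick brown fox jumps"
    second = "the quick brown fox sleeps"
    print(similarity_percent(first, second))  # 50.0
```

With a measure like this, a threshold such as "at least 90% similar" separates near-duplicates from pages that only share a few phrases.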

What to pay attention to when analyzing pages with duplicate content

All duplicate content on your website should be viewed in context, as some duplicate content on a site is normal. How serious the problem is depends on the number of duplicated pages, the scale of the site and other technical characteristics.

When analyzing duplicate content, pay attention to the following points:

  • whether the duplicated pages are indexable: if only one of the pages with duplicate content is indexable and the others are non-indexable, search engines will have no problem choosing the page to display in the SERP;
  • whether the pages are blocked in the robots.txt file: if so, search engines will not have access to this content;
  • whether they have correct canonicals: if pages with duplicate content have correct canonicals, search engines can properly consolidate the content and rank the right page;
  • whether hreflangs are present: similar content can sit on different local versions of pages, so check for hreflangs in the page code.
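The checks in the list above can also be sketched in code for a single page. The example below is a simplified illustration, not a JetOctopus feature: `SeoSignalParser` and `duplicate_signals` are hypothetical names, the page HTML and robots.txt content are passed in as strings, and the canonical is compared to the URL as an exact string without any URL normalization.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class SeoSignalParser(HTMLParser):
    """Reads meta robots, canonical and hreflang tags from page HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False
        self.hreflangs = {}

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True
        elif tag == "link":
            rels = attr.get("rel", "").lower().split()
            if "canonical" in rels:
                self.canonical = attr.get("href")
            elif "alternate" in rels and attr.get("hreflang"):
                self.hreflangs[attr["hreflang"]] = attr.get("href")


def duplicate_signals(url, html, robots_txt, user_agent="Googlebot"):
    """Collect the signals worth checking for a page with duplicate content."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    parser = SeoSignalParser()
    parser.feed(html)
    parser.close()
    return {
        "blocked_by_robots": not robots.can_fetch(user_agent, url),
        "noindex": parser.noindex,
        "canonical": parser.canonical,
        # Assumption: exact string comparison, no URL normalization.
        "canonical_points_elsewhere": parser.canonical is not None
        and parser.canonical != url,
        "hreflangs": parser.hreflangs,
    }
```

A duplicate that is non-indexable or canonicalized to the main version is usually less of a concern than one that is indexable with no canonical at all, so these signals help decide which cases to look at first.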

About Sofia
Technical SEO specialist. Sofia has almost 10 years of experience, including the last 5 years in JavaScript SEO. She is convinced that SEO is a very technical part of digital marketing, and that you can't do effective SEO without logs and in-depth data analysis.


