Work with Rel=canonical Like a Pro


Mar 12, 2019


Canonical tags were created to fix duplicate content issues. In a nutshell, if you have three duplicate (or nearly identical) pages, you should pick just one of them to be shown in Google. Canonicalization is a way to help Googlebot decide which page to show in the SERP. However, rel=canonical tags don't help search engines unless you use them properly. This tutorial walks you through how to use JetOctopus to audit rel=canonical tags quickly and efficiently across a website.
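To make the tag itself concrete, here is a minimal Python sketch (standard library only, all names hypothetical) that extracts the rel=canonical URL from a page's HTML — the same attribute every check in this tutorial is built on:

```python
# Minimal sketch: pull the rel=canonical URL out of a page's HTML
# using only the Python standard library.
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            d = dict(attrs)
            if (d.get("rel") or "").lower() == "canonical":
                self.canonical = d.get("href")

def extract_canonical(html: str):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical

html = '<html><head><link rel="canonical" href="https://example.com/page"></head></html>'
print(extract_canonical(html))  # https://example.com/page
```

A real crawler handles edge cases this sketch ignores (relative hrefs, the HTTP `Link` header, multiple conflicting tags), but the core lookup is this simple.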

How to find non-canonical pages which are blocked from indexation

Google's guidelines tell us that a non-canonical page (one whose rel=canonical tag points to another page) should be open to indexation. Still, it's a common situation for the rel=canonical tag to be added to all pages indiscriminately, whether those pages are open to indexation or not. Here is an easy way to find such pages with JetOctopus:

  1. If the crawler finds the above-mentioned problem, you will see a Non-canonical page is not allowed for indexation notification in the Indexation/Problems section. Once you click on the number of pages in this section, you get to the DataTable with the list of problematic URLs. You can export the data in Excel or CSV format.
  2. Choose filters in DataTable:

  1. Is Canonical Page = No
  2. Status Code = 200
  3. Is Meta Tag Indexable = No

How to find self-canonical pages

Adding a rel=canonical tag whose URL equals the URL of the current page is a good practice that helps avoid accidental duplicate content issues. Keep in mind, however, that a self-canonical page shouldn't be blocked with a meta robots noindex tag. Here is how to find self-canonical pages which are blocked by a meta tag:

Choose filters in DataTable:

  1. Status Code = 200
  2. Is Self Canonical = Yes
  3. Is Meta tag Indexable = No
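Both noindex conflicts described so far — a non-canonical page that is noindexed, and a self-canonical page that is noindexed — boil down to one rule: a page carrying a rel=canonical tag should not be blocked by meta robots noindex at the same time. A hedged sketch, assuming crawl records with hypothetical field names:

```python
# Illustrative sketch (field names are assumptions): a page that carries a
# rel=canonical tag -- whether it points to itself or to another URL --
# should not simultaneously be blocked by a meta robots noindex.
def noindex_conflict(url, canonical, status_code, robots_meta):
    """Return a problem label, or None if the page is consistent."""
    if status_code != 200 or "noindex" not in robots_meta.lower():
        return None
    if canonical == url:
        return "self-canonical page is noindexed"
    return "non-canonical page is noindexed"

print(noindex_conflict("/shoes", "/shoes", 200, "noindex, follow"))       # self-canonical page is noindexed
print(noindex_conflict("/shoes?sort=price", "/shoes", 200, "noindex"))    # non-canonical page is noindexed
print(noindex_conflict("/shoes", "/shoes", 200, "index, follow"))         # None
```

The two DataTable filter sets above are exactly these two branches, applied across the whole crawl.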

How to find all canonized pages

A canonized page is a page that the rel=canonical tags of other pages point to. It's crucial for a canonized page to return a 200 status code, to be allowed for indexation, and not to canonize another page itself.

Find all canonized pages in DataTable: set the filter Is Canonized Page = Yes.

Then you can filter these pages to meet your individual requirements:

  1. Status Code != 200 - shows pages with any status code except 200, so there can be various redirects, 404 and 500 errors.
  2. Is Canonical Page = No - shows canonized pages that are themselves non-canonical, in other words canonical chains.
  3. Is Meta Tag Indexable = No - shows pages that are blocked from indexation by a meta tag.

The above-mentioned cases are usually treated as bugs in technical SEO and must be fixed.
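The three checks can be sketched in a few lines of Python. The data layout below (a dict of crawl records with `status`, `canonical`, and `indexable` fields) is an assumption for illustration, not JetOctopus's actual format:

```python
# Illustrative sketch: given crawl records, find canonized pages --
# targets of other pages' canonicals -- that fail any of the three checks:
# non-200 status, canonical chain, or blocked by meta tag.
pages = {
    "/a":     {"status": 200, "canonical": "/a", "indexable": True},
    "/b":     {"status": 301, "canonical": "/b", "indexable": True},
    "/c":     {"status": 200, "canonical": "/d", "indexable": True},
    "/a?x=1": {"status": 200, "canonical": "/a", "indexable": True},
    "/b?x=1": {"status": 200, "canonical": "/b", "indexable": True},
    "/c?x=1": {"status": 200, "canonical": "/c", "indexable": True},
}

# A page is canonized when some *other* page's canonical points to it.
canonized = {p["canonical"] for url, p in pages.items() if p["canonical"] != url}

issues = []
for target in sorted(canonized):
    rec = pages.get(target)
    if rec is None:
        continue  # the canonical target was not crawled
    if rec["status"] != 200:
        issues.append((target, "non-200 status code"))
    if rec["canonical"] != target:
        issues.append((target, "canonical chain"))
    if not rec["indexable"]:
        issues.append((target, "blocked by meta tag"))

print(issues)  # [('/b', 'non-200 status code'), ('/c', 'canonical chain')]
```

Here /b is canonized but redirects, and /c is canonized yet canonizes /d in turn — a chain; both would surface in the DataTable filters above.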

How to find non-canonical pages which canonize pages that are blocked from indexation

Canonized pages must always be open to indexation. You can use the above-mentioned How to find all canonized pages method to find these pages, or choose the following filters in DataTable:

  1. Is Canonical Page = No
  2. Canonical Target is Indexable = No

How to find non-canonical pages which canonize pages with a non-200 status code

Canonized pages must respond with a 200 status code. To see canonized pages with other status codes, choose the following filters:

  1. Is Canonical Page = No
  2. Canonical Target Status Code != 200
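These last two filter recipes look at the same relationship from the other side: instead of starting from the canonized page, you start from the non-canonical page and inspect its target. A sketch under the same assumed data layout (field names invented for illustration):

```python
# Illustrative sketch: for every non-canonical page, look up its canonical
# target and flag targets that don't respond with a 200 or are noindexed.
pages = {
    "/p?page=2": {"status": 200, "canonical": "/p", "indexable": True},
    "/p":        {"status": 404, "canonical": "/p", "indexable": True},
    "/q?ref=ad": {"status": 200, "canonical": "/q", "indexable": True},
    "/q":        {"status": 200, "canonical": "/q", "indexable": False},
}

problems = []
for url, rec in pages.items():
    target = rec["canonical"]
    if target == url:
        continue  # self-canonical: nothing to look up
    t = pages.get(target, {})
    if t.get("status") != 200:
        problems.append((url, "canonical target status != 200"))
    if not t.get("indexable", True):
        problems.append((url, "canonical target not indexable"))

print(sorted(problems))
```

Both flagged pages here send Googlebot to a dead end: one canonizes a 404, the other canonizes a noindexed page.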

How to find the pages to which most non-canonical pages point (the most canonized pages)

You can filter pages by values in DataTable, for instance:

  1. Count of In Canonicals > 100
  2. Count of In Canonicals = 1

Canonicalization has something in common with link building: page A may be canonized by only one non-canonical page, while page B is canonized by 100k different pages. This is a widespread situation on e-commerce websites.
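The In Canonicals count can be reproduced from raw crawl data in a few lines (the page-to-canonical mapping below is invented for illustration):

```python
# Illustrative sketch: count incoming canonicals per target page --
# the equivalent of sorting by an "In Canonicals" column.
from collections import Counter

canonicals = {          # page -> its canonical target (example data)
    "/red-shoes?size=36": "/red-shoes",
    "/red-shoes?size=37": "/red-shoes",
    "/red-shoes?size=38": "/red-shoes",
    "/blue-hat?utm=x":    "/blue-hat",
}

# Only count canonicals coming from *other* pages, not self-canonicals.
in_canonicals = Counter(
    target for page, target in canonicals.items() if page != target
)

print(in_canonicals.most_common())  # [('/red-shoes', 3), ('/blue-hat', 1)]
```

Sorting this counter descending gives the most canonized pages first — typically category or product pages collecting all their filter and tracking-parameter variants.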

Also, with the help of Setup Column you can add the In Canonicals column and sort pages by this column. There you can see all the pages that canonize the selected page.

Further optimization

A wrong rel=canonical implementation can lead to huge SEO issues. We have a client-oriented philosophy, so if you have any questions about technical SEO in general, or rel=canonical tags in particular, feel free to drop us a line at serge@jetoctopus.com.

Read more: Why Partial Technical Analysis Harms Your Website.

ABOUT THE AUTHOR

Serge Bezborodov is the CTO of JetOctopus. He is a professional programmer with 9+ years of experience. He has worked with aggregators - vacancies, real estate, cars - for 5 years. He also has experience in DB architecture design and query optimization. Serge has crawled more than 160M pages and 40TB of data with JetOctopus.
