July 10, 2023
by Sofia

How to find canonicalized URLs in XML sitemaps and why it matters

Sitemaps are a good way to tell search engines like Google and Bing which pages on your website matter. They also signal content updates on those pages and announce newly created pages.

Sitemap analysis is especially useful for detecting canonicalized URLs. If your sitemaps link to non-indexable or broken pages, your website’s crawl rate can suffer, because search bots may spend their crawl budget on non-indexable and canonicalized pages instead of indexable ones. A canonicalized page is one whose rel=canonical link points to another page, which signals to search engines that the page is a duplicate and shouldn’t be indexed.
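To make the definition concrete, here is a minimal Python sketch of how a page can be classified as canonicalized from its HTML. The names `CanonicalParser`, `canonical_target` and `is_canonicalized`, and the example.com URLs, are assumptions made for illustration; this is not how JetOctopus works internally, and the URL comparison is deliberately simplified.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="canonical nofollow"
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]


def canonical_target(page_url, html):
    """Return the absolute canonical URL declared in the HTML, or None."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return None
    # Resolve relative hrefs against the page URL
    return urljoin(page_url, parser.canonical)


def is_canonicalized(page_url, html):
    """True if the page declares a canonical that points to a different URL.

    Simplifying assumption: URLs are compared as strings, ignoring only
    a trailing slash.
    """
    target = canonical_target(page_url, html)
    return target is not None and target.rstrip("/") != page_url.rstrip("/")
```

For example, a page at `https://example.com/shoes?color=red` whose HTML contains `<link rel="canonical" href="https://example.com/shoes">` would be reported as canonicalized, while `https://example.com/shoes` itself would not.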

How to detect canonicalized URLs in sitemaps 

Initiate a sitemap crawl: click the “New Crawl” button and choose the “Sitemaps Only” crawl mode. In the URL field, enter your website’s homepage link. Next, specify sitemaps. In the “Sitemaps” field, enter the URLs of the sitemaps containing the canonicalized pages you want to identify.

Next, configure the necessary crawl settings and start the crawl. 

Wait for the crawl to complete. The JetOctopus SEO scanner is fast, so a sitemap crawl may well finish before you have time to drink a coffee.

Once the crawl is complete, open its results and select “Sitemap URLs”.

Afterward, add the “Page crawl” dataset, click the “+Add filter” button, and opt for the “Is Canonical Page” – “No” filter. 

Alternatively, use the “Indexation” report, since this crawl contains only pages from sitemaps. In the “Indexation” report, select “Canonicals” and click the “Non-canonical pages” chart segment, then add the “Found in Sitemap” – “Yes” filter to open a data table listing all canonicalized pages found in the sitemaps.

This list will encompass all pages detected in the sitemap that contain canonical links pointing to other URLs. 

The “Sitemap URLs” datatable is best used to detect canonicalized links if you ran a full website crawl with sitemaps.
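If you want to cross-check the result outside the crawler, the same logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sitemap XML is already loaded into a string, and `canonicals` is a hypothetical dictionary (for example, built from a crawl export) that maps each page URL to the canonical URL declared on that page. The function names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return every <loc> value listed under <url> in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def canonicalized_in_sitemap(sitemap_xml, canonicals):
    """Return (url, canonical) pairs for sitemap URLs that point elsewhere.

    `canonicals` is an assumed input: a dict mapping page URL to the
    canonical URL found on that page (or None if the page has none).
    """
    flagged = []
    for url in sitemap_urls(sitemap_xml):
        target = canonicals.get(url)
        if target and target != url:
            flagged.append((url, target))
    return flagged
```

Every pair returned by `canonicalized_in_sitemap` is a sitemap entry whose page declares a different URL as canonical, which is the same set of pages the filters above are meant to surface.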

What if your sitemaps contain canonicalized links?

First, review the resulting URLs and check whether their canonicals point to other pages intentionally or by mistake. If a page should be indexed and its canonical wrongly points to another page, fix the canonical.

Otherwise, the canonicalized URLs in the sitemaps need to be replaced. If sitemaps are generated by developers, or if you use a sitemap plugin, you should contact the developers about which URLs to remove from your sitemaps.

If you generate sitemaps yourself and upload them to your website, you must generate a new sitemap that does not contain canonicalized URLs. In addition to canonicalized URLs, broken pages that return a status code other than 200, pages blocked from indexing and pages blocked from being scanned by the robots.txt file should be removed from the sitemaps.
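If you build sitemaps with your own script, the exclusion rules above can be expressed as a simple filter. The following Python sketch assumes you already have crawl data for each page; the `PageInfo` fields and the function names are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Namespace defined by the sitemaps.org protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class PageInfo:
    """Assumed per-page crawl data; the field names are illustrative."""
    url: str
    status: int
    canonical: Optional[str] = None   # canonical URL declared on the page
    noindex: bool = False             # blocked from indexing
    robots_blocked: bool = False      # disallowed in robots.txt


def belongs_in_sitemap(page):
    """Keep only 200 OK, indexable, crawlable, self-canonical pages."""
    if page.status != 200:
        return False
    if page.noindex or page.robots_blocked:
        return False
    if page.canonical and page.canonical != page.url:
        return False
    return True


def build_sitemap(pages):
    """Return sitemap XML (as a string) containing only the pages that pass."""
    ET.register_namespace("", NS)  # serialize with a default xmlns
    urlset = ET.Element(f"{{{NS}}}urlset")
    for page in pages:
        if belongs_in_sitemap(page):
            url_el = ET.SubElement(urlset, f"{{{NS}}}url")
            ET.SubElement(url_el, f"{{{NS}}}loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")
```

Given a list of `PageInfo` records, `build_sitemap` drops canonicalized, broken, noindex and robots.txt-blocked pages and returns the remaining URLs as sitemap XML that you can save and upload.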

About Sofia
Technical SEO specialist. Sofia has almost 10 years of experience, the last 5 of them in JavaScript SEO. She is convinced that SEO is a very technical part of digital marketing and that you can't do effective SEO without logs and in-depth data analysis.
