July 4, 2023
by Sofia

How to identify and fix broken pages in XML sitemaps

Discovering broken links in sitemaps is a critical task for website owners and SEOs. Sitemaps are one of the roadmaps search bots rely on: they tell bots which pages exist on your website and which content was updated recently. A well-maintained sitemap can significantly improve your website’s visibility in search engines. However, broken pages, non-indexable pages, or pages blocked by robots.txt in your sitemaps can be detrimental, because search bots spend your website’s crawl budget on them.

We encountered such a scenario while assisting a large website with technical analysis using JetOctopus. The sitemaps contained numerous broken and non-indexable pages, and the search bot was wasting resources crawling them. Once we fixed the broken pages in the XML sitemaps, the website’s crawl budget improved dramatically. Let’s look at how to identify broken links in sitemaps.

How to identify broken links in your sitemaps?

To identify broken links in sitemaps, initiate a sitemap crawl. Simply click the “New crawl” button and select the “Sitemap only” crawl mode. Enter your website’s homepage URL in the designated field, and provide the links to the sitemaps you wish to examine in the “Sitemaps” field.

Configure any other necessary crawl settings specific to your website, and commence the crawl.

If you want additional insights into the URLs within the sitemaps, such as their internal linking, their relationship with other pages on your website, and whether they are orphans, run a full crawl instead: enable the “Process sitemap” checkbox and, in the Advanced Settings, enter the required sitemaps in the “Sitemaps” field.
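To make the idea of an orphan page concrete, here is a minimal Python sketch. It assumes an orphan is a sitemap URL that no other crawled page links to. The function name `find_orphans` and the shape of the input data are hypothetical, chosen for illustration, and do not reflect how JetOctopus computes orphans internally.

```python
def find_orphans(sitemap_urls, internal_links):
    """Return sitemap URLs that no other crawled page links to.

    sitemap_urls: iterable of URLs listed in the sitemaps.
    internal_links: iterable of (source_url, target_url) pairs found
    during a full crawl (hypothetical input format).
    """
    # Ignore self-links: a page linking to itself is still unreachable.
    linked_targets = {
        target for source, target in internal_links if source != target
    }
    return sorted(set(sitemap_urls) - linked_targets)
```

For example, if the sitemap lists `/a`, `/b`, and `/c` but the crawl only found links pointing to `/a` and `/b`, the function reports `/c` as an orphan.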

Once the crawl is complete, navigate to the crawl results and access the “Sitemaps” dashboard. Here, you will find a dedicated data table displaying non-200 status codes in the sitemaps.

Review each listed page. If any pages return a 5xx status code, consider running a recrawl before taking action, as 5xx response codes are often temporary.
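If you want to spot-check a sitemap outside of JetOctopus, the same logic can be sketched in a few lines of Python using only the standard library. This is a rough illustration, not a replacement for a crawl: the function names and the status labels (`"broken"`, `"recheck"`, and so on) are assumptions made for this example, and redirects are deliberately not followed so that 301 and 302 responses stay visible.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None stops urllib from following the redirect,
        # so the 3xx response surfaces as an HTTPError instead.
        return None


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def classify_status(code):
    """Map a status code to a label (labels are illustrative)."""
    if code == 200:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "broken"
    if 500 <= code < 600:
        return "recheck"  # often temporary, so check again later
    return "other"
```

A typical use would be to loop over `parse_sitemap_urls(xml_text)`, call `fetch_status` on each URL, and list everything whose label is not `"ok"`. Note that some servers reject HEAD requests, and connection failures raise `urllib.error.URLError`, which this sketch does not handle.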

How to fix broken URLs in the sitemap?

Now, what should you do with broken (non-200 status code) links in XML sitemaps?

If you come across 404, 301, 302, or other non-200 pages in your sitemaps, it is imperative to remove them promptly. If your sitemaps are generated automatically or through a plugin, reach out to the developers for assistance. Alternatively, if you manually upload sitemaps, you can generate a new one using JetOctopus, replacing the sitemap containing the broken links.
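For manually maintained sitemaps, removing the bad entries can also be scripted. The sketch below assumes you already have a mapping of URL to status code, for example from an export of crawl results. The function name `drop_non_200` and the mapping format are hypothetical. URLs with no recorded status are kept, on the assumption that they have not been checked yet.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize with the sitemap namespace as the default (no "ns0:" prefixes).
ET.register_namespace("", NS)


def drop_non_200(xml_text, status_by_url):
    """Return sitemap XML without <url> entries whose status is not 200.

    status_by_url: dict mapping URL -> HTTP status code (hypothetical
    input format). URLs missing from the dict are left in place.
    """
    root = ET.fromstring(xml_text)
    for url_el in list(root.findall(f"{{{NS}}}url")):
        loc = url_el.findtext(f"{{{NS}}}loc", default="").strip()
        if status_by_url.get(loc, 200) != 200:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")
```

The result is a string you can save as the new sitemap file. The returned string has no XML declaration, so add one when writing the file if you need it.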

Conclusion

Identifying and resolving broken pages within your XML sitemaps is crucial for optimizing your website’s crawlability and ensuring an efficient use of your crawl budget. By utilizing the sitemap analysis capabilities of JetOctopus, you can streamline this process and maintain a healthy website structure. Keep a vigilant eye on your sitemaps, and regularly perform the necessary checks and updates to enhance your website’s search engine visibility.

About Sofia
Technical SEO specialist. Sofia has almost 10 years of experience, the last 5 of them in JavaScript SEO. She is convinced that SEO is a very technical part of digital marketing, and that you can't do effective SEO without logs and in-depth data analysis.
