Finding broken links in sitemaps is a critical task for website owners and SEOs. Sitemaps act as a roadmap for search bots, telling them which pages exist on your website and which content has been updated recently. When properly optimized, sitemaps can improve your website’s visibility in search engines. However, broken pages, non-indexable pages, or pages blocked by robots.txt in your sitemaps are detrimental because they waste your website’s crawl budget.
We encountered exactly this scenario while assisting a large website with technical analysis in JetOctopus. The sitemaps contained numerous broken and non-indexable pages, so the search bot was wasting resources crawling them. Once the broken pages in the XML sitemaps were fixed, the website’s crawl budget improved dramatically. Let’s look at how to identify broken links in sitemaps.
To identify broken links in sitemaps, initiate a sitemap crawl. Simply click the “New crawl” button and select the “Sitemap only” crawl mode. Enter your website’s homepage URL in the designated field, and provide the links to the sitemaps you wish to examine in the “Sitemaps” field.
Configure any other necessary crawl settings specific to your website, and commence the crawl.
If you want additional insights into the URLs in the sitemaps, such as their internal linking, their relationship with other pages on your website, and whether they are orphan pages, run a full crawl instead: enable the “Process sitemap” checkbox and enter the required sitemaps in the “Sitemaps” field in the Advanced Settings.
Once the crawl is complete, navigate to the crawl results and access the “Sitemaps” dashboard. Here, you will find a dedicated data table displaying non-200 status codes in the sitemaps.
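To make the idea behind this report concrete, here is a minimal Python sketch of the same kind of check: read the URLs from a sitemap and list those that do not return 200. It is not how JetOctopus works internally. The function names (extract_urls, fetch_status, find_non_200) are hypothetical, and the sketch assumes a plain urlset sitemap (not a sitemap index) and a server that answers HEAD requests the same way as GET requests.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so 301/302 are reported as they are."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a URL (assumption: HEAD is supported)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (including unfollowed redirects) arrive as HTTPError.
        return err.code


def find_non_200(urls, get_status=fetch_status):
    """Return (url, status) pairs for every URL that does not answer 200."""
    problems = []
    for url in urls:
        status = get_status(url)
        if status != 200:
            problems.append((url, status))
    return problems
```

In practice you would download the sitemap, pass its text to extract_urls, and hand the result to find_non_200. The get_status parameter exists so that the check can be tried with a stub instead of live requests.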
Review the listed pages in detail. If any pages return a 5xx status code, consider recrawling them, as 5xx responses often indicate a temporary server problem.
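The recheck logic for 5xx pages can be sketched in Python as follows. This is an illustration only: recheck_server_errors is a hypothetical helper, get_status stands for whatever function you use to request a URL and return its status code, and the number of attempts and the delay are arbitrary assumptions.

```python
import time
from typing import Callable


def recheck_server_errors(
    url: str,
    get_status: Callable[[str], int],
    attempts: int = 3,
    delay_seconds: float = 30.0,
) -> int:
    """Request a URL again while it keeps answering with a 5xx code.

    Returns the last status seen. Any non-5xx answer (200, 404, 301, ...)
    is returned immediately, because only server errors are assumed to be
    temporary.
    """
    status = get_status(url)
    for _ in range(attempts - 1):
        if not 500 <= status <= 599:
            break
        time.sleep(delay_seconds)  # give the server time to recover
        status = get_status(url)
    return status
```

A URL that still returns 5xx after the last attempt should be treated like any other broken entry in the sitemap.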
Now, what should you do with broken (non-200 status code) links in XML sitemaps?
If you come across 404, 301, 302, or other non-200 pages in your sitemaps, remove them promptly. For redirecting URLs, you can list the final destination URL instead, provided it returns 200. If your sitemaps are generated automatically or by a plugin, ask the developers for assistance. Alternatively, if you upload sitemaps manually, you can generate a new one with JetOctopus and replace the sitemap that contains the broken links.
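If you maintain the sitemap file by hand, the cleanup step can also be scripted. The Python sketch below is not the JetOctopus sitemap generator. It is a hypothetical helper (drop_broken_urls) that assumes you already have a mapping of URL to status code, for example from an exported crawl report, and it removes every entry with a known non-200 status from a urlset sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def drop_broken_urls(sitemap_xml: str, statuses: dict[str, int]) -> str:
    """Return the sitemap XML without entries whose known status is not 200.

    URLs that are missing from `statuses` are kept, because nothing is
    known about them (a design assumption of this sketch).
    """
    # Serialize with a default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", NS)
    root = ET.fromstring(sitemap_xml)
    for url_el in list(root.findall(f"{{{NS}}}url")):
        loc = url_el.findtext(f"{{{NS}}}loc", default="").strip()
        if statuses.get(loc, 200) != 200:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")
```

The returned string does not include an XML declaration, so add one (or write the tree out with your own serializer settings) before uploading the file.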
Identifying and fixing broken pages in your XML sitemaps is crucial for optimizing your website’s crawlability and ensuring efficient use of your crawl budget. The sitemap analysis in JetOctopus streamlines this process and helps you maintain a healthy website structure. Keep an eye on your sitemaps, and check and update them regularly to support your website’s search engine visibility.