May 18, 2022
by Sofia

How to check XML Sitemaps with JetOctopus

Sitemaps are one of the most important sources of new URLs for search engines. Search engines scan the URLs in sitemaps regularly, and URLs that meet their requirements can be indexed. However, if sitemaps contain broken links, redirects (3xx), error pages (4xx), and so on, this will negatively affect your crawl budget. Therefore, it is wise to check the health of your sitemaps regularly.

With JetOctopus, you can check all of your sitemaps or just a single sitemap.

How to submit sitemaps for testing in JetOctopus

There are two ways to check sitemaps in JetOctopus:

  • crawl the site with sitemaps as an additional source of URLs;
  • scan only the URLs from your sitemaps.

If you need to compare or analyze data from both the crawl and the sitemaps, select the first mode. Among other benefits, it will help you find orphan pages.

To check only sitemaps, select the second mode.

Sitemaps as an additional source of URLs during the crawl

Go to “New Crawl” and activate the “Process sitemaps” checkbox in the “Basic settings”. JetOctopus will search for your sitemaps itself, for example, at the address https://example.com/sitemap.xml.


If your sitemaps are located at a custom URL, you need to add it (in addition to activating the “Process sitemaps” checkbox). To do this, go to “Advanced settings” and enter the absolute URLs of the sitemaps in the “Sitemaps” list. Each URL should be on a new line.
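Before pasting a list into the “Sitemaps” field, it can help to confirm that every entry is an absolute URL. The following is a minimal sketch, not part of JetOctopus: the function name `split_sitemap_list` and the sample URLs are assumptions made for illustration.

```python
from urllib.parse import urlparse


def split_sitemap_list(text: str) -> tuple[list[str], list[str]]:
    """Split a one-URL-per-line list into absolute URLs and rejected entries."""
    absolute, rejected = [], []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        parts = urlparse(entry)
        # An absolute URL needs both a scheme and a host.
        if parts.scheme in ("http", "https") and parts.netloc:
            absolute.append(entry)
        else:
            rejected.append(entry)
    return absolute, rejected


# Hypothetical list, one sitemap URL per line.
raw = """https://example.com/sitemap.xml
/sitemaps/products.xml
https://example.com/sitemaps/blog.xml
"""
ok, bad = split_sitemap_list(raw)
print("absolute:", ok)
print("needs fixing:", bad)
```

Here the relative path `/sitemaps/products.xml` ends up in the rejected list and should be rewritten as a full URL before you submit it.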


When JetOctopus finishes scanning the links on the pages, it will process the sitemaps from the list.

You can add sitemap index files: JetOctopus will process all sitemaps from the index file.
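To illustrate the difference between a sitemap index and a regular sitemap, here is a small sketch that uses Python's standard XML parser. It is not how JetOctopus processes files internally; `read_sitemap` is a hypothetical helper and the XML is a made-up sample that follows the sitemaps.org format.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def read_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index" or "urlset", list of <loc> values) for a sitemap document."""
    root = ET.fromstring(xml_text)
    if root.tag == SITEMAP_NS + "sitemapindex":
        kind = "index"  # <loc> values point to other sitemap files
    elif root.tag == SITEMAP_NS + "urlset":
        kind = "urlset"  # <loc> values point to pages
    else:
        raise ValueError(f"not a sitemap document: {root.tag}")
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


# Made-up sitemap index with two child sitemaps.
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemaps/products.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemaps/blog.xml</loc></sitemap>
</sitemapindex>"""

kind, locs = read_sitemap(index_xml)
print(kind, locs)
```

For an index file, each returned location is itself a sitemap that has to be fetched and parsed in turn.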

When the crawl is finished, you can limit the analysis to URLs from the sitemaps with one click. Go to the desired dataset. In the “Join Dataset” block, select “+Sitemap URLs”.


Next, select which URLs you want to analyze: only those found in the sitemaps, or those not present in the sitemaps.
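Conceptually, this comparison is a set operation over two lists of URLs. The sketch below shows the idea with plain Python sets. It assumes you have exported both lists yourself; the function name `compare_sources` and the URLs are hypothetical, and this is not the JetOctopus implementation.

```python
def compare_sources(crawl_urls: set[str], sitemap_urls: set[str]) -> dict[str, set[str]]:
    """Group URLs by where they were found: crawl, sitemaps, or both."""
    return {
        # In sitemaps but never reached by following links (orphan candidates).
        "only_in_sitemaps": sitemap_urls - crawl_urls,
        # Reached by the crawl but missing from every sitemap.
        "not_in_sitemaps": crawl_urls - sitemap_urls,
        # Found by both sources.
        "in_both": crawl_urls & sitemap_urls,
    }


# Hypothetical exports of crawled URLs and sitemap URLs.
crawled = {"https://example.com/", "https://example.com/shop", "https://example.com/about"}
in_sitemaps = {"https://example.com/", "https://example.com/shop", "https://example.com/landing"}

groups = compare_sources(crawled, in_sitemaps)
for name, urls in groups.items():
    print(name, sorted(urls))
```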


Only XML Sitemap audit

To check only URLs from sitemaps, select “Only Sitemap” mode in the “Basic settings”.


Next, go to the “Advanced settings” and specify your own list of absolute URLs of sitemaps in the “Sitemaps” field. You can also add sitemap index files to this list. JetOctopus will process all sitemaps from the index files.


What to look for when auditing sitemaps

JetOctopus has a separate dashboard for quick analysis of sitemaps.

Go to the menu “Crawler” – “Sitemaps”. Here you will find general information about the sitemaps.

  1. Unique URLs in sitemaps – the number of unique links in sitemaps. If you know that you have more URLs in the sitemaps than shown here, make sure the URLs in your sitemaps are not duplicated.
  2. Orphan pages – pages that are found only in sitemaps. The JetOctopus crawler did not find links to them in the code of your website. We recommend adding internal links to these pages (internal linking is important for pages to be indexed). In addition, users cannot reach these pages from your website. They can find them only in the SERP, which reduces the chances of conversion.
  3. Sitemap files – the number of files processed by JetOctopus.
  4. Avg URLs in file – the average number of links in one sitemap. The limit is 50,000 URLs per file: if the average is above it, at least some of your sitemaps are too large for search engines to process.
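The relationship between these metrics can be shown with a short calculation. This is only a sketch under assumptions: the data structure (a mapping from sitemap file to its URLs), the function name `summarize`, and the sample URLs are hypothetical, and JetOctopus may compute its dashboard values differently.

```python
def summarize(files: dict[str, list[str]]) -> dict:
    """Compute simple sitemap metrics from a mapping of sitemap file -> listed URLs."""
    all_urls = [url for urls in files.values() for url in urls]
    unique = set(all_urls)
    return {
        "sitemap_files": len(files),
        "listed_urls": len(all_urls),
        "unique_urls": len(unique),
        # A gap between listed and unique URLs means some URLs are duplicated.
        "duplicates": len(all_urls) - len(unique),
        "avg_urls_per_file": len(all_urls) / len(files) if files else 0.0,
    }


# Hypothetical data: the first file lists one URL twice.
files = {
    "https://example.com/sitemaps/products.xml": [
        "https://example.com/p/1",
        "https://example.com/p/2",
        "https://example.com/p/1",
    ],
    "https://example.com/sitemaps/blog.xml": [
        "https://example.com/blog/a",
        "https://example.com/blog/b",
    ],
}
stats = summarize(files)
print(stats)
```

In this sample, five URLs are listed but only four are unique, which is the kind of mismatch the “Unique URLs in sitemaps” metric helps you notice.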

You can also explore “Sitemaps problems by depth” and “URL Distribution by depth”. We recommend paying attention to the “URL Distribution” chart. Ideally, the percentage of URLs found in both the web crawl and the sitemaps is 100%.


Lists of orphan pages and non-200 pages are available for analysis. Sitemaps should only contain absolute URLs with 200 response codes. Everything else needs to be fixed.
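If you export sitemap URLs together with their status codes, the rule above can be applied with a few lines of code. The sketch below assumes a simple list of `(url, status_code)` pairs. That input format, the function name `sitemap_issues`, and the sample rows are all hypothetical and not a JetOctopus export format.

```python
from collections import defaultdict
from urllib.parse import urlparse


def sitemap_issues(rows: list[tuple[str, int]]) -> dict[str, list[str]]:
    """Group sitemap URLs that break the 'absolute URL with a 200 code' rule."""
    issues = defaultdict(list)
    for url, status in rows:
        parts = urlparse(url)
        if not (parts.scheme in ("http", "https") and parts.netloc):
            issues["relative_url"].append(url)
        if status != 200:
            # Bucket by status class, e.g. 301 -> "3xx", 404 -> "4xx".
            issues[f"{status // 100}xx"].append(url)
    return dict(issues)


# Hypothetical rows: (URL from a sitemap, response code).
rows = [
    ("https://example.com/", 200),
    ("https://example.com/old-page", 301),
    ("https://example.com/deleted", 404),
    ("/relative-path", 200),
]
problems = sitemap_issues(rows)
for issue, urls in problems.items():
    print(issue, urls)
```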


How to check sitemaps structure

To check that all sitemap files are working properly, go to the data table and select “Sitemap files”. In this report, you will find a list of sitemaps processed by JetOctopus. Analyze the following metrics for each sitemap:

  • Number of URLs – cannot be more than 50,000;
  • Filesize – no more than 50MB (uncompressed);
  • Status code – if the sitemap URL returns a non-200 code, it is not available for scanning by search engines.
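These three checks are easy to express in code. The following is a minimal sketch that applies the limits listed above to one sitemap file. The function name `check_sitemap_file` and its parameters are assumptions for illustration, and 50MB is taken here as 50 × 1024 × 1024 bytes.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, measured on the uncompressed file


def check_sitemap_file(url_count: int, size_bytes: int, status_code: int) -> list[str]:
    """Return a list of problems for one sitemap file (empty list means it passes)."""
    problems = []
    if status_code != 200:
        problems.append(f"status code is {status_code}, expected 200")
    if url_count > MAX_URLS:
        problems.append(f"{url_count} URLs, limit is {MAX_URLS}")
    if size_bytes > MAX_BYTES:
        problems.append(f"{size_bytes} bytes uncompressed, limit is {MAX_BYTES}")
    return problems


# A file exactly at the URL limit passes; one URL over does not.
print(check_sitemap_file(50_000, 10 * 1024 * 1024, 200))
print(check_sitemap_file(50_001, 10 * 1024 * 1024, 200))
```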


Configure additional columns to see the “Error Message” (the reason why a sitemap was unavailable), “Date lastmod” (the value of the lastmod tag) and “Date Crawled” (when JetOctopus crawled your sitemap).


All URLs that we found in sitemaps are in a separate data table “Sitemap URLs”.


About Sofia
Technical SEO specialist. Sofia has almost 10 years of experience, including the last 5 years in JavaScript SEO. She is convinced that SEO is a very technical part of digital marketing, and that without logs and in-depth data analysis, you can't do effective SEO.
