Sitemap as a True Damager

Nov 12, 2018


How to find and fix problems in the website's structure and optimize a sitemap both for Googlebot and for users

About the e-commerce website:

  1. E-commerce website with 22+ years of experience
  2. 1 mln pages
  3. 1.3 mln monthly visits

The main challenge:

To find and fix problems in the website's structure and optimize a sitemap both for Googlebot and for users.

What was done?

  1. The website was crawled with JetOctopus to find all technical bugs in the website’s structure.
  2. The log-line analyzer was used to understand how Googlebot crawls the pages listed in the sitemap.

What problems were detected?

  1. 1 mln pages that aren't in the website's structure but are regularly visited by Googlebot, so crawl budget is wasted on unknown content.
  2. 320K pages that aren't indexed by Google, even though these pages are valuable for the website. Only 180K pages are effectively crawled by Googlebot.
  3. Around 320K links to out-of-stock products in the sitemap.
  4. Incorrect logic of adding new products to the sitemap (new products used to take up to 2 weeks to get indexed).
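The first problem above, pages outside the site structure that Googlebot still visits, can be detected by diffing Googlebot hits in the access logs against the URLs found by a crawl. A minimal sketch (the log format and all sample URLs are hypothetical; JetOctopus does this internally, this is only an illustration of the idea):

```python
import re

# Matches the request path in a common/combined-format access-log line,
# but only when the user agent contains "Googlebot".
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" .*Googlebot')

def orphan_pages(log_lines, structure_urls):
    """Return paths Googlebot requested that are absent from the crawled site structure."""
    hits = set()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m:
            hits.add(m.group("path"))
    return hits - set(structure_urls)

logs = [
    '1.2.3.4 - - [12/Nov/2018] "GET /old-product HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '1.2.3.4 - - [12/Nov/2018] "GET /category/shoes HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
print(orphan_pages(logs, ["/category/shoes"]))  # {'/old-product'}
```

Pages that show up in the result set are candidates for noindexing, redirecting, or reintegrating into the structure.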

The recommendations we gave:

  1. Change the logic of sitemap generation. For e-commerce websites it's crucial to show new products in search results as soon as they appear on the website. Googlebot regularly analyses the website's sitemap to index fresh content, so an up-to-date sitemap should be regenerated constantly (we recommend doing it every week).
  2. Resubmit new sitemaps through Google Search Console. Decide which pages on the website should be crawled by Google, and determine the canonical version of each page. You can create your sitemap manually or choose from a number of third-party tools to generate your sitemap for you.
  3. Get rid of out-of-stock products in the sitemap.

    Google's Matt Cutts suggested that e-commerce sites with hundreds of thousands of pages set the date a page will expire using the unavailable_after meta tag. This way, when a product is added, you can immediately set when that product page will expire based on an auction date or a go-stale date.

  4. Add a new block with the most popular and priority products to each product category. Proper internal linking makes it easy for Googlebot to crawl your webpages.
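Recommendations 1 and 3 can be sketched together: regenerate the sitemap from the product catalog, skipping out-of-stock items, and stamp expiring product pages with the unavailable_after robots directive. This is an illustrative sketch only; the product dictionaries, URLs, and dates are all hypothetical, and a real generator would also split the file at the 50,000-URL sitemap limit:

```python
from xml.etree import ElementTree as ET

def build_sitemap(products):
    """Render a sitemap containing only in-stock products, with <lastmod> dates."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for p in products:
        if not p["in_stock"]:
            continue  # out-of-stock products never reach the sitemap
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = p["loc"]
        ET.SubElement(url, "lastmod").text = p["updated"]  # W3C date, e.g. 2018-11-12
    return ET.tostring(urlset, encoding="unicode")

def unavailable_after_tag(expire_str):
    """Robots meta tag telling Google to drop the page after a known go-stale date."""
    return f'<meta name="robots" content="unavailable_after: {expire_str}">'

products = [
    {"loc": "https://shop.example/p/1", "updated": "2018-11-12", "in_stock": True},
    {"loc": "https://shop.example/p/2", "updated": "2018-11-01", "in_stock": False},
]
xml = build_sitemap(products)
```

Running the generator weekly (as recommended above) keeps fresh products in front of Googlebot while stale inventory drops out automatically.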

The JetOctopus team recommends reviewing the Google Search Console guide, which explains how to submit a new sitemap for crawling. It also contains useful information on how to solve common sitemap problems.
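Besides submitting the sitemap in the Search Console UI, resubmission can be automated by pinging the endpoint Google provided for this at the time of writing. A minimal sketch (the sitemap URL is hypothetical; the function only builds the ping URL, which you would then fetch after each weekly regeneration):

```python
from urllib.parse import quote

GOOGLE_PING = "https://www.google.com/ping?sitemap="

def sitemap_ping_url(sitemap_url):
    """Build the ping URL that asks Google to re-fetch the given sitemap."""
    return GOOGLE_PING + quote(sitemap_url, safe="")

print(sitemap_ping_url("https://example.com/sitemap.xml"))
```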


Key takeaways:

  1. For e-commerce sites, correct sitemap generation is one of the most crucial things: new products should be added immediately, and out-of-stock products should be removed from the current sitemap. It is a real opportunity to increase sales.
  2. Accurate work with canonical and non-canonical tags is your silver bullet.
  3. Internal linking structure is a powerful tool. Don't underestimate it.

Get more useful info: 2 Different Realities: Your Site Structure & How Google Perceives It

About the author

Serge Bezborodov is the CTO of JetOctopus. He is a professional programmer with 9+ years of experience. He worked with aggregators for 5 years - vacancies, real estate, cars. He also has experience in designing DB architecture and query optimization. Serge has crawled more than 160 mln pages, or 40 TB of data, with JetOctopus.

Auto classified, 20m pages crawled
What problem was your SEO department working on when you decided to try our crawler?
We needed to detect all possible errors in no time because Google Search Console shows only 1000 results per day.

What problems did you find?
That's quite a broad question. We managed to detect old, unsupported pages and errors related to them. We also found a large number of duplicated pages and pages returning a 404 response code.

How quickly did you implement the required changes after the crawl?
We are still implementing them because the website is large and there are lots of errors on it. There are currently 4 teams working on the website. In view of this fact we have to assign each particular error to each particular team and draw up individual statements of work.

And what were the results?
It's quite difficult to measure results right now because we constantly work on the website and make changes. But a higher scan frequency by bots would mean the changes are productive. However, around a month and a half ago we enabled indexing of all the paginated pages, and this has already affected our statistics.

Having seen the crawl report, what was the most surprising thing you found? (Were there errors you’d never thought you’d find?)
I was surprised to find so many old, unsupported pages that are outside the website's structure. There was also a large number of 404 pages. We are really glad we've managed to get a breakdown of the website's subdirectories; that helped us decide which team to start working with first.

You have worked with different crawlers. Can you compare JetOctopus with the others and assess it?
Every crawler looks for errors and finds them. The main point is the balance between the scanned pages and the price. JetOctopus is one of the most affordable crawlers.

Would you recommend JetOctopus to your friends?
We’re going to use it within our company from time to time. I would recommend the crawler to my friends if they were SEO optimizers.

Your suggestions for JetOctopus.
To refine the web version ASAP. There are a few things we were missing badly:
Thank you very much for such a detailed analysis. At the moment we are working on a redirects problem.