Sometimes you delete or change pages and, as a result, they become "orphaned": URLs that are no longer part of the website structure but are still visited by search bots. Crawl budget can be wasted on useless or outdated content instead of profitable pages. Here is a case where the bot wasted its resources on 3M useless pages. Case study of templatemonster.com
- Delivering website templates on the Net since 2002
- 1M pages
- 5M monthly visits
The main challenge:
To find and fix technical bugs before the website's migration, and to learn about existing problems in advance to avoid pitfalls on the new CMS.
What was done:
- We crawled the website with JetOctopus to find technical errors.
- We looked at the website through Google's 'eyes' with log analysis to see how the search bot scans Templatemonster.com.
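The log-analysis step above can be sketched in a few lines of Python. This is a minimal illustration, assuming access logs in the common combined format; the field layout, sample lines, and function names are all hypothetical, and real Googlebot verification should also include a reverse-DNS check, since the user-agent string can be spoofed.

```python
import re
from collections import Counter

# Matches the request, status, and user-agent fields of a combined-format
# log line (hypothetical layout; adjust the pattern to your server's logs).
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<url>\S+) HTTP/[\d.]+" (?P<status>\d{3}) .* "(?P<ua>[^"]*)"$'
)

def googlebot_hits(log_lines):
    """Yield (url, status) pairs for requests whose user-agent claims Googlebot."""
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and "Googlebot" in m.group("ua"):
            yield m.group("url"), int(m.group("status"))

# Example with two fabricated log lines: only the Googlebot request is kept.
lines = [
    '66.249.66.1 - - [10/Oct/2019:13:55:36 +0000] "GET /old-template HTTP/1.1" 500 0 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.1 - - [10/Oct/2019:13:55:37 +0000] "GET /home HTTP/1.1" 200 0 "-" '
    '"Mozilla/5.0 (Windows NT 10.0)"',
]
status_counts = Counter(status for _, status in googlebot_hits(lines))
server_errors = [url for url, status in googlebot_hits(lines) if status >= 500]
```

Grouping the bot's hits by status code is what surfaces the 5XX URLs that waste crawl budget.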
What problems were detected?
- The website has 1M pages, but there are 3M more pages that Google regularly visits, and we DON'T KNOW about these pages!
- There are 250K pages that aren't crawled by Google, and this list includes valuable pages too.
What we recommended:
- Review the log analysis report and check each URL from the 3M unknown webpages manually. Then link the valuable, commercial pages into the site structure.
- Delete useless orphaned webpages and pages with bugs. Pay special attention to webpages with a 5XX status code: Googlebot visits these pages frequently and wastes crawl budget on broken URLs.
- Analyze the 250K pages that aren't indexed by Google. Add links from indexable pages to the valuable ones and generate a new sitemap (which works like an invitation for Googlebot to recrawl the website).
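The set arithmetic behind these recommendations is straightforward once both URL lists exist. Below is a minimal sketch, assuming you already have one set of URLs from the structure crawl and one from the logs (the example URLs and variable names are invented); the sitemap builder emits the standard sitemaps.org XML format.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Hypothetical inputs: URLs found by crawling the site structure,
# and URLs that search bots actually requested (from the access logs).
structure_urls = {
    "https://example.com/",
    "https://example.com/templates/responsive",
}
logged_urls = {
    "https://example.com/",
    "https://example.com/templates/responsive",
    "https://example.com/old-promo",  # orphan: visited by bots, not in structure
}

orphans = logged_urls - structure_urls        # candidates to relink or delete
never_visited = structure_urls - logged_urls  # candidates for internal linking

def build_sitemap(urls):
    """Return sitemaps.org-format XML listing the given URLs."""
    urlset = Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for u in sorted(urls):
        SubElement(SubElement(urlset, "url"), "loc").text = u
    return tostring(urlset, encoding="unicode")

# After deciding which orphans are valuable, include them in the new sitemap.
sitemap_xml = build_sitemap(structure_urls | {"https://example.com/old-promo"})
```

Submitting the regenerated sitemap in Google Search Console then prompts Googlebot to recrawl the updated URL set.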