3 M pages of possible crawl budget waste. We had never seen a 3x mismatch before!
Case study: templatemonster.com
- Delivering website templates on the Net since 2002
- 1 M pages
- 5 M monthly visits
The main challenge:
To find and fix technical bugs before the website's migration, and to learn about existing problems in order to avoid pitfalls on the new CMS.
What was done:
- We crawled the website with JetOctopus to find technical errors.
- We looked at the website through Google's 'eyes' with log analysis to see how the search bot scans Templatemonster.com.
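At this scale, log analysis is automated rather than done by hand. As a minimal sketch of the idea (not JetOctopus's actual pipeline), here is how one might count Googlebot hits per URL from a standard Apache/Nginx "combined" access log; the log format is an assumption:

```python
import re
from collections import Counter

# Matches the Apache/Nginx "combined" log format:
# IP, identd, user, [timestamp], "request line", status, bytes, "referrer", "user agent".
# Assumption: the server writes logs in this format.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count Googlebot requests per URL, and separately count URLs that returned 5XX."""
    hits, errors = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue  # skip malformed lines and other user agents
        hits[m.group("url")] += 1
        if m.group("status").startswith("5"):
            errors[m.group("url")] += 1
    return hits, errors
```

In production you would also verify Googlebot by reverse DNS lookup, since the user-agent string is easily spoofed.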
What problems were detected?
- The website has 1 M pages, but Google regularly visits 3 M pages, and we DIDN'T KNOW these extra pages existed!
- There are 250K pages that aren't crawled by Google, and this list includes valuable pages too.
The recommendations we gave:
- Review the log analysis report and manually check each URL among the 3 M unknown webpages. Then link the valuable, commercial pages into the site structure.
- Delete useless orphaned webpages and pages with bugs. Pay special attention to webpages with a 5XX status code: Googlebot visits these pages frequently and wastes crawl budget on broken URLs.
- Analyze the 250K pages that aren't indexed by Google. Add links from indexable pages to the valuable pages and generate a new sitemap (which works like an invitation for Googlebot to recrawl the website).
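The core of these recommendations is a set comparison: cross-reference the URLs found by the crawler with the URLs Googlebot actually requests in the logs. A minimal sketch, assuming you already have both URL sets and the templatemonster.com base URL (the function names and sitemap helper are illustrative, not JetOctopus's API):

```python
from xml.sax.saxutils import escape

def classify(crawled_urls, logged_urls):
    """Split URLs into orphans (seen in the logs but absent from the crawl)
    and uncrawled pages (in the site structure but never visited by Googlebot)."""
    crawled, logged = set(crawled_urls), set(logged_urls)
    orphans = logged - crawled      # candidates to link into the structure or delete
    uncrawled = crawled - logged    # candidates for internal links and the sitemap
    return orphans, uncrawled

def make_sitemap(paths, base="https://www.templatemonster.com"):
    """Render a minimal XML sitemap for the pages Googlebot should recrawl."""
    entries = "\n".join(
        f"  <url><loc>{escape(base + p)}</loc></url>" for p in sorted(paths)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )
```

The valuable pages from the `uncrawled` set would then go into the new sitemap submitted to Google.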
ABOUT THE AUTHOR
Serge Bezborodov is the CTO of JetOctopus. He is a professional programmer with 9+ years of experience and has worked with aggregators (job vacancies, real estate, cars) for 5 years. He also has experience in designing DB architectures and in query optimization. With JetOctopus, Serge has crawled more than 160 M pages and 40 TB of data.