This case study looks at how Depositphotos, a commercial content platform with 150 million+ pages, faced an indexing problem: only 20% of its pages were indexed by Google. Read on to find out how JetOctopus helped its SEO team uncover and address the issue.
There are times when what seems like an optimal approach to a marketing challenge turns out to be anything but effective. This Depositphotos case study takes a look at a major indexing issue commonly faced by large websites. URL parameters on the website were wasting its crawl budget, which caused indexing issues for its fresh pages and quietly did major harm to the company’s organic search results.
Depositphotos is a commercial content platform that brings creators of high-quality licensed stock photos, graphics, vectors, videos, and music in front of potential buyers. It started as a simple idea — to create a platform where people can find licensed content for their marketing and design needs with ease. From the beginning, organic search was the main source of traffic and income.
I am Ihor Bankovskyi, Head of SEO at Depositphotos. I started at Depositphotos 5 years ago as an SEO specialist and, after spending 4 years at another company, came back as Head of SEO. On my return, I noticed that as the website had grown in size and traffic, it had suffered a few traffic drops due to Google updates and faced various issues, the biggest one being poor page indexing.
Depositphotos had several issues that hindered its organic traffic growth, and we prioritized our main goals to:
We started our technical audit by using JetOctopus’s indexation tool to crawl over 2 million URLs for each language version in 2 days, and we connected our log data and Google Search Console (GSC).
We found a major issue with the indexation of pages. After analyzing the past month’s log data and comparing it with the crawled pages, we found that only 20% of pages had been crawled by Googlebot in the last month. This is a big problem because, without regular recrawling, Google can’t know about the fresh content on the platform.
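The comparison above boils down to a set intersection between the URLs found by the crawler and the URLs Googlebot requested in the logs. Here is a minimal sketch of that idea, assuming both lists have already been exported as plain URL strings; the function name and the sample URLs are hypothetical and not part of JetOctopus.

```python
def crawl_coverage(crawled_urls, googlebot_log_urls):
    """Return the share of crawled URLs that Googlebot also requested."""
    crawled = set(crawled_urls)
    if not crawled:
        return 0.0
    # URLs that appear both in the site crawl and in the Googlebot log lines
    visited = crawled & set(googlebot_log_urls)
    return len(visited) / len(crawled)


# Hypothetical sample data: 5 crawled pages, only 1 of them seen in the logs
crawled = ["/a", "/b", "/c", "/d", "/e"]
logged = ["/a", "/a", "/x"]
coverage = crawl_coverage(crawled, logged)
print(f"Crawled by Googlebot: {coverage:.0%}")
```

URLs that appear only in the logs (like `/x` here) are ignored by this ratio, although they can be worth a separate look because they may point to orphan or parameter pages.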
Based on JetOctopus’s page refresh rate data for all pages in the logs over the last 30 days, it took Googlebot around 35-40 days to recrawl a page, so we decided to increase the crawl rate.
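We don't know exactly how JetOctopus calculates the refresh rate, but one simple way to approximate a recrawl interval from raw logs is to average the gaps between consecutive Googlebot visits to the same URL. The sketch below assumes the visit timestamps have already been parsed per URL; the function name and the dates are made up for illustration.

```python
from datetime import datetime
from statistics import mean


def average_recrawl_days(visits_by_url):
    """Average gap, in days, between consecutive bot visits to the same URL."""
    gaps = []
    for visits in visits_by_url.values():
        ordered = sorted(visits)
        # Pair each visit with the next one and measure the distance in days
        for earlier, later in zip(ordered, ordered[1:]):
            gaps.append((later - earlier).total_seconds() / 86400)
    return mean(gaps) if gaps else None


# Hypothetical visit history for two URLs
visits = {
    "/a": [datetime(2022, 1, 1), datetime(2022, 2, 5)],
    "/b": [datetime(2022, 1, 10), datetime(2022, 2, 19)],
}
print(average_recrawl_days(visits))
```

URLs with a single visit contribute no gap, so a short log window can understate the real interval for rarely crawled pages.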
First off, we segmented website pages and found the ones with the highest crawl rate index. With the help of the URL filtering option, we realized that the search results pages suffered the most.
From our experience, one of the best ways to increase the crawl rate for such pages is to increase the number of internal links pointing to them.
In the crawled pages data table, we added a column with the number of internal links for each page. Next, we exported all URLs that had from 1 to 5 internal links.
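If you work with an exported table rather than the tool's own filters, the same selection is a simple range filter on the internal link count. This is a rough sketch that assumes the export has been loaded into a dictionary of URL to inlink count; the function name and sample values are hypothetical.

```python
def weakly_linked(inlinks_by_url, low=1, high=5):
    """Return URLs whose number of internal links falls in [low, high]."""
    return [
        url
        for url, inlinks in inlinks_by_url.items()
        if low <= inlinks <= high
    ]


# Hypothetical export: URL -> number of internal links pointing to it
pages = {
    "/search/cat": 2,
    "/search/dog": 5,
    "/photo/123": 40,
    "/search/bird": 0,
}
print(weakly_linked(pages))
```

Pages with zero internal links are left out on purpose here, since they match the "1 to 5" export described above only if the lower bound is changed.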
With the help of our developers, we created a script that directed internal links from relevant pages to these pages.
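The actual script is specific to our codebase, but the general idea can be sketched as follows. This is an illustrative example only: it assumes that "relevant" means "same topic", and the function name, the topic labels, and the limit of links per target are all hypothetical.

```python
from collections import defaultdict


def plan_internal_links(donors, targets, links_per_target=3):
    """Pick donor pages that should link to each weakly linked target.

    donors and targets map URL -> topic; relevance here is just a topic match.
    """
    donors_by_topic = defaultdict(list)
    for url, topic in donors.items():
        donors_by_topic[topic].append(url)

    plan = {}
    for target, topic in targets.items():
        # Never let a page link to itself
        candidates = [d for d in donors_by_topic[topic] if d != target]
        plan[target] = candidates[:links_per_target]
    return plan


# Hypothetical pages
donors = {"/cats": "animals", "/dogs": "animals", "/cars": "transport"}
targets = {"/search/kitten": "animals", "/search/boat": "water"}
print(plan_internal_links(donors, targets))
```

A target with no matching donors gets an empty list, which makes it easy to spot pages that need a different relevance rule.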
As a result, the time Googlebot took to refresh our pages was cut in half. This also led to a significant increase in the number of pages in the Google index.
Another important, actionable finding came from our log analysis tool, which is a good-to-have for big websites: adding a noindex tag to optimize the crawl budget.
The most useful report from this tool is a simple data table with pages in logs. We check it regularly to find useful insights and potential problems.
When we checked it for the first time, we discovered that Google had crawled millions of pages with parameters that we did not know about. We sorted the data table and exported the top pages by bot visits.
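A quick way to see which parameters attract the most bot visits is to count query parameter names across the logged URLs. Below is a small sketch using only the Python standard library; the function name and the sample URLs are hypothetical, and it assumes the log lines have already been filtered down to Googlebot requests.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_hits(logged_urls):
    """Count how many logged requests carried each query parameter name."""
    counts = Counter()
    for url in logged_urls:
        query = urlsplit(url).query
        # Use a set so a parameter repeated within one URL counts once
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts.most_common()


# Hypothetical Googlebot requests taken from the logs
logs = [
    "/photos?color=red&page=2",
    "/photos?color=blue",
    "/photos",
    "/vectors?orientation=",
]
for name, hits in parameter_hits(logs):
    print(name, hits)
```

`keep_blank_values=True` matters here because parameters with empty values, such as `orientation=`, would otherwise be dropped from the count.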
After analysis, we realized that Google had found parameters from our internal filters and was wasting the crawl budget on such pages. While these pages had a canonical tag pointing to the page without parameters, it wasn’t enough.
So we added a noindex tag to them and listed all these parameters in the GSC URL Parameters tool.
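How the tag is added depends on the site's templates, so the following is only a sketch of the decision logic: if a URL carries any internal filter parameter, emit a robots meta tag with noindex. The parameter names in `FILTER_PARAMS` and the function name are hypothetical; only the meta tag itself is standard.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical list of internal filter parameters found in the logs
FILTER_PARAMS = frozenset({"color", "orientation"})


def robots_meta(url, filter_params=FILTER_PARAMS):
    """Return a noindex meta tag for URLs that carry filter parameters."""
    query = urlsplit(url).query
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    if names & filter_params:
        return '<meta name="robots" content="noindex">'
    # Clean URLs and unrelated parameters stay indexable
    return ""


print(robots_meta("/photos?color=red"))
print(repr(robots_meta("/photos?page=2")))
```

Keeping the parameter list in one place makes it easier to keep the noindex rule in sync with whatever parameters show up in the logs later.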
Using the JetOctopus toolkit, we saved a lot of time in identifying the most impactful issues and quickly resolving them. The results were near-instant — in just a few weeks, almost all pages with internal filter parameters were gone from the logs, and Google started to crawl the content pages more often.
Great software can go a long way in simplifying SEO problem-solving for large, content-heavy websites. With clear, actionable insights, the right technical SEO tool can save hours or even days of work and drive quicker results.
Here are a couple of key takeaways:
If you have any questions or thoughts on this case study, feel free to drop a comment below!