September 14, 2021
by Ihor Bankovskyi

Case Study: How JetOctopus Helped Depositphotos Index 150 Million+ Pages

This case study looks at how a commercial content platform, Depositphotos, faced an indexing problem across its 150 million+ pages, with only 20% of them indexed by Google. Read on to find out how JetOctopus helped their SEO team uncover and address this issue.

There are times when what seems like an optimal approach to a marketing challenge turns out to be anything but effective. This Depositphotos case study takes a look at a major indexing issue commonly faced by large websites. URL parameters on the website were essentially wasting its crawl budget, causing indexing issues for fresh pages and quietly doing major harm to the company's organic search results.

About the website

Depositphotos homepage

Depositphotos is a commercial content platform that puts creators of high-quality licensed stock photos, graphics, vectors, videos, and music in front of potential buyers. It started as a simple idea: to create a platform where people can easily find licensed content for their marketing and design needs. From the beginning, organic search was the main source of traffic and income.

I am Ihor Bankovskyi, Head of SEO at Depositphotos. I first joined the company five years ago as an SEO specialist. After spending four years at another company, I returned as Head of SEO and noticed that, as the website had grown in size and traffic, it had suffered a few traffic drops due to Google updates and faced various issues, the biggest one being poor page indexing.

To solve these issues, we set some goals

Depositphotos had several issues that hindered its organic traffic growth, and we prioritized our main goals to:

  1. Increase Crawl Rate: After a thorough technical SEO audit and analysis with JetOctopus, we found that Depositphotos had millions of pages across multiple language versions. We decided to crawl a select few main language versions and segments for each page.
  2. Increase Organic Traffic: We wanted to ensure the site ranks for the relevant search intent keywords and the content reaches the intended target audience.

Problem: Only 20% of 150 million+ pages crawled by Googlebot

We started our technical audit by using JetOctopus's indexation tool to crawl over 2 million URLs in 2 days for each language version, and by connecting our log data and Google Search Console (GSC).

We found a major issue with the indexation of pages. After analyzing the past month's log data and comparing it with the crawled pages, we found that only 20% of pages had been crawled by Googlebot in the last month. This is a big problem: without regular recrawling, Google can't learn about the fresh content on the platform.
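For readers who want to see what this kind of comparison looks like in practice, here is a minimal sketch in Python. The file names and column layout (a crawl export and a log export already filtered to Googlebot, each with a "url" column) are assumptions for the example, not JetOctopus's actual export format.

```python
import csv

# Hypothetical inputs:
# crawl_export.csv  : one crawled URL per row, column "url"
# googlebot_log.csv : log rows already filtered to Googlebot, column "url"

def load_urls(path, column="url"):
    with open(path, newline="", encoding="utf-8") as f:
        return {row[column].strip() for row in csv.DictReader(f)}

crawled = load_urls("crawl_export.csv")    # pages found by the site crawl
visited = load_urls("googlebot_log.csv")   # pages Googlebot hit in the last month

share = len(crawled & visited) / len(crawled) if crawled else 0.0
print(f"Googlebot visited {share:.0%} of {len(crawled)} crawled pages last month")
```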

Solution: 

Step 1: Increasing the crawl rate through internal linking

Based on JetOctopus's page refresh rate data for all pages seen in the logs over the last 30 days, it took Googlebot around 35-40 days to recrawl a page, so we decided to increase the crawl rate.

First off, we segmented the website's pages and identified the segments where the crawl rate suffered the most. With the help of the URL filtering option, we realized that the search results pages were hit hardest.

From our experience, one of the best ways to increase the crawl rate for such pages is to increase the number of internal links for them.

In the crawled pages data table, we added a column with the number of internal links for each page. Next, we exported all URLs with an internal link count from 1 to 5.

With the help of our developers, we created a script that directed internal links from relevant pages to these pages.
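To give a sense of what such a script can look like, here is a simplified sketch. The file name, column names, thresholds, and the relevance rule (matching pages by their first URL path segment) are illustrative assumptions, not the actual logic our developers used.

```python
import csv
from collections import defaultdict
from urllib.parse import urlparse

def first_segment(url):
    # Crude relevance key: the first path segment, e.g. "/stock-photos/cats" -> "stock-photos"
    path = urlparse(url).path.strip("/")
    return path.split("/")[0] if path else ""

targets = []                 # under-linked pages that need more internal links
donors = defaultdict(list)   # well-linked pages grouped by section

with open("crawl_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):          # expects columns "url" and "inlinks"
        inlinks = int(row["inlinks"])
        if 1 <= inlinks <= 5:
            targets.append(row["url"])
        elif inlinks >= 50:                # arbitrary threshold for "strong" donor pages
            donors[first_segment(row["url"])].append(row["url"])

# Suggest up to three donors from the same section for each under-linked page.
for url in targets:
    candidates = donors.get(first_segment(url), [])[:3]
    print(url, "<-", ", ".join(candidates))
```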

As a result, the time it took Googlebot to recrawl our pages dropped by half. This also led to a significant increase in the number of pages in the Google index.

Step 2: Adding noindex tag using JetOctopus Log Analysis

Another important, actionable finding came from our log analysis tool (a good-to-have for big websites): adding a noindex tag to optimize the crawl budget.

The most useful report from this tool is a simple data table with pages in logs. We check it regularly to find useful insights and potential problems.

When we checked it for the first time, we realized that Google was crawling millions of pages with parameters that we did not know about. We sorted the data table and exported the top pages by bot visits.
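If you want to replicate this kind of check outside JetOctopus, a rough sketch like the one below counts Googlebot hits per URL and per query parameter from a raw access log. The file name and combined log format are assumptions, and matching on the user agent string alone does not verify that a hit really came from Googlebot (that requires a reverse DNS check).

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Combined log format assumed: '"GET /path HTTP/1.1" 200 1234 "referer" "user-agent"'
line_re = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]+" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

hits = Counter()
params = Counter()

with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        m = line_re.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        path = m.group("path")
        hits[path] += 1
        for name in parse_qs(urlsplit(path).query):
            params[name] += 1   # which query parameters attract bot visits

print("Top crawled URLs:", hits.most_common(10))
print("Top query parameters:", params.most_common(10))
```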

After analyzing them, we understood that Google had found the parameters generated by our internal filters and was wasting crawl budget on such pages. While these pages had a canonical tag pointing to the version without parameters, that wasn't enough.

So we added a noindex tag to them and listed all of these parameters in the GSC URL Parameters tool.
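Conceptually, the rule we shipped boils down to something like the sketch below. The parameter names are placeholders; the real list came from the log report, and the actual implementation lives in our templates and response headers rather than a standalone function.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical internal-filter parameters; substitute the ones found in your logs.
FILTER_PARAMS = {"color", "orientation", "sort", "per_page"}

def robots_directive(url):
    """Return a robots directive for internal-filter URLs, or None to leave the page indexable."""
    query = parse_qs(urlsplit(url).query)
    if FILTER_PARAMS.intersection(query):
        # Emit as <meta name="robots" content="noindex, follow">
        # or as an X-Robots-Tag response header.
        return "noindex, follow"
    return None

assert robots_directive("https://example.com/stock-photos?color=red") == "noindex, follow"
assert robots_directive("https://example.com/stock-photos") is None
```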

Let’s look at the results 

Using the JetOctopus toolkit, we saved a lot of time in identifying the most impactful issues and quickly resolving them. The results were near-instant — in just a few weeks, almost all pages with internal filter parameters were gone from the logs, and Google started to crawl the content pages more often.

Final thoughts

Great software can go a long way in simplifying SEO problem-solving for large, content-heavy websites. With clear, actionable insights, the right technical SEO tool can help save hours or even days worth of time and drive quicker results.

Here are a couple of key takeaways:

  • Strategic internal linking is a powerful way to increase the crawl rate
  • Adding the noindex tag to filtered pages is a good way to optimize the crawl budget

If you have any questions or thoughts on this case study, feel free to drop a comment below!

You may be interested in reading other case studies:

How DOM.RIA Doubled Their Googlebot Visits Using JetOctopus

Importance of Ranking Signal Consolidation, Web Entity Hierarchy and Clean URL Structure for Better Communication with Search Engines

SEO case study: How TemplateMonster found 3 million orphaned pages that were regularly visited by Google search bot

About Ihor Bankovskyi
Ihor Bankovskyi is Head of SEO at Depositphotos with eight years of experience in online marketing and SEO. He is well versed in Tech SEO and team management and has a vast experience in SEO for big multilingual websites and building strong, efficient teams.
