October 13, 2025
by Serge Bezborodov

AI Bots: What They Crawl And Why

BrightonSEO is one of the largest search marketing conferences in the world, gathering thousands of SEO experts and digital marketers. Originally a local UK meetup, it has grown into a global event with editions in the US. At the 2025 San Diego conference, Serge Bezborodov, co-founder of JetOctopus, delivered one of the most talked-about sessions on AI bots.

The Evolution of Crawlers

In the early days of SEO, things were simple. There were two types of crawlers: search bots, which indexed the web and were generally considered “good,” and scrapers, which stole content and were labeled “bad.” That was the landscape most SEOs grew up with.

But the landscape has shifted. Today we face a third major category – AI bots. They are neither fully “bad” nor fully “good.” On one hand, they scrape and train on our content without permission. On the other hand, user-facing AI systems like ChatGPT, Claude and Perplexity can also send real traffic back to websites.

The Internet Is Full of Bots

Bots have always been a part of the internet ecosystem – AI bots are simply the newest wave.
By examining server logs, we can analyze how AI bots crawl websites, how often they visit, and exactly which pages they request.
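
As a quick illustration, here is a minimal log-parsing sketch in Python. The file name and the user-agent tokens are assumptions; adjust both to the bots you actually see in your own logs.

from collections import Counter

# Minimal sketch: count hits per AI bot in a raw access log.
# "access.log" and the tokens below are assumptions; match them
# against the user agents that appear in your own logs.
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "OAI-SearchBot",
                 "ChatGPT-User", "Perplexity-User"]

hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for token in AI_BOT_TOKENS:
            if token in line:
                hits[token] += 1
                break

for token, count in hits.most_common():
    print(f"{token}: {count}")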

The Three Types of AI Bots

AI bots are not all the same – they fall into three major categories:

Training Bots – crawlers like GPTBot or ClaudeBot that collect massive amounts of content to feed large language models. Their goal is simple: more data – more training.

Search Bots – such as OAI-SearchBot or Claude-SearchBot, designed to extend LLMs with search capabilities, helping them pull in fresh or specific information.

User Bots – perhaps the most important for SEO. Agents like ChatGPT-User or Perplexity-User fetch and verify URLs before presenting them to end users. These crawlers represent impressions and potential traffic opportunities.
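
A hedged sketch of how these three categories might be told apart in code; the substring tokens mirror the names above, but exact user-agent strings should be verified against each vendor's documentation.

# Sketch only: tokens mirror the bot names above; verify the exact
# user-agent strings against each vendor's documentation.
BOT_CATEGORIES = {
    "training": ("GPTBot", "ClaudeBot"),
    "search": ("OAI-SearchBot", "Claude-SearchBot"),
    "user": ("ChatGPT-User", "Perplexity-User"),
}

def classify_bot(user_agent: str) -> str:
    """Return 'training', 'search', 'user', or 'other' for a UA string."""
    for category, tokens in BOT_CATEGORIES.items():
        if any(token in user_agent for token in tokens):
            return category
    return "other"

print(classify_bot("Mozilla/5.0; compatible; GPTBot/1.0"))  # -> training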

Large Language Models: Content Hunger

The explosive growth of AI bots is tied to the rise of large language models. The critical factor is scale: the larger the dataset, the more powerful the model. This is why AI bots are constantly crawling – they need endless streams of content.

“More is better” has become the mantra of LLM training. The difference between a forgotten GPT-2 in 2020 and the massive global adoption of ChatGPT in late 2022 illustrates just how much data size matters.

The Rise of AI Bots in the Wild

Publicly available data confirms what SEOs see in their own logs: AI bot activity is surging.

  • Cloudflare Radar shows a steady and accelerating increase in AI-related crawling.
  • Historical log data going back to the beginning of 2025 reveals exponential growth – nearly doubling year over year.
  • The first noticeable spike from GPTBot hit in March 2023, marking the beginning of a new era in crawling.

While Googlebot still dominates the crawling landscape, AI bots are rapidly catching up. For SEOs, this shift means visibility in AI-driven environments is becoming as important as traditional search rankings.

How Do They Train the Models?

This question remains one of the industry’s biggest mysteries. Officially, model creators say they use “publicly available and licensed data”: Common Crawl archives, open datasets, and the like.

But logs tell only part of the story. Not every bot identifies itself, and they may use proxies or rotate IPs. The true scale of AI data collection is likely far larger than anyone can measure from the outside.
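
One practical consequence: a user-agent string alone proves nothing, since anyone can spoof it. A common countermeasure, long used to verify Googlebot, is forward-confirmed reverse DNS. The sketch below uses Googlebot because its rDNS suffixes are well documented; AI vendors differ (some publish IP ranges instead of rDNS), so treat the suffixes as assumptions to look up per vendor.

import socket

def verify_crawler_ip(ip: str, hostname_suffixes: tuple) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> back to IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return False
    if not hostname.endswith(hostname_suffixes):
        return False
    try:
        # The hostname must resolve back to the original IP.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False

# Classic Googlebot suffixes; AI vendors document their own
# verification methods, so swap in the appropriate suffixes or ranges.
print(verify_crawler_ip("66.249.66.1", (".googlebot.com", ".google.com")))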

What Do Bots Really Want?

At the end of the day, the goal of any crawler, whether Googlebot or an AI bot, is the same: to find and process the most valuable content. AI bots are not aliens. They are built on the same foundations as web crawlers have always been.

This means the classic SEO magic triangle still holds true:

  • Content – high-quality, unique and useful information
  • Links – both internal and external help bots navigate and assign importance
  • Technical health – server-side rendering, fast load times and crawlable structures

Do AI Bots Have a Crawl Budget?

Just like Googlebot, AI crawlers operate with limited resources. They can’t crawl every single page endlessly, so they have to prioritize. The big question is: do AI bots have something like Google’s well-known crawl budget?

The answer seems to be yes. Logs suggest that AI bots follow similar patterns to Googlebot:

  • They prefer useful, unique content over duplicated or low-value pages
  • They are influenced by internal linking and site structure, crawling deeper into well-connected pages
  • They respond to technical signals such as load speed

In other words, the same principles that have guided SEO for decades also apply here.
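
One way to sanity-check this against your own logs is to bucket bot requests by URL depth; well-linked, shallow pages should dominate. A rough sketch, assuming you have already extracted (bot, URL) pairs from your logs:

from collections import Counter
from urllib.parse import urlparse

def url_depth(url: str) -> int:
    """Folder depth of a URL: '/' is 0, '/blog/post' is 2."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])

# Hypothetical input: (bot, url) pairs already parsed from your logs.
requests = [
    ("GPTBot", "https://example.com/"),
    ("GPTBot", "https://example.com/blog/ai-bots"),
    ("ClaudeBot", "https://example.com/docs/guide/setup"),
]

depth_hits = Counter(url_depth(url) for _, url in requests)
for depth in sorted(depth_hits):
    print(f"depth {depth}: {depth_hits[depth]} hits")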

What Impacts AI Bots Most?

Analysis of logs shows that AI bots behave in ways that are strikingly similar to Googlebot. They crawl based on depth, prioritize well-linked pages and ignore orphaned or weak content.

The conclusion? The same factors that optimize for Googlebot also optimize for AI bots. The principles of building an efficient web crawler have not changed in decades.

AI User Bots: New Impressions for SEO

One of the most surprising findings is how AI user bots behave. Before showing a link to a user, systems like ChatGPT or Perplexity often verify that the page is valid and accessible – using their own user bots.

This creates a completely new type of impression. Every time an AI assistant checks your page before presenting it in an answer, that request can be tracked. In other words, AI user bot visits can be treated as impressions inside LLMs.

For SEOs, this is a breakthrough:

  • These impressions reveal how often your pages appear in AI-generated answers
  • They can be tracked directly via server logs, giving a free and reliable source of visibility data
  • They represent a brand-new traffic channel, LLM visibility, which is quickly becoming as important as traditional SERP rankings
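
As an illustration, here is a minimal impression counter over a combined-format access log; the log path, field layout and user-bot tokens are assumptions to adapt to your own setup.

from collections import Counter

USER_BOT_TOKENS = ("ChatGPT-User", "Perplexity-User")  # extend as needed

impressions = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if any(token in line for token in USER_BOT_TOKENS):
            try:
                # Combined log format: the request is the first quoted
                # field, e.g. "GET /page HTTP/1.1" -> take the path.
                path = line.split('"')[1].split()[1]
            except IndexError:
                continue
            impressions[path] += 1

print("Pages most often verified by AI user bots (a proxy for LLM impressions):")
for path, count in impressions.most_common(10):
    print(f"{count:>6}  {path}")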

What To Do with AI Bots

So what’s the practical takeaway? The advice may sound simple, even boring: do the same things you already do for Googlebot. AI bots are still crawlers, built on the same fundamental principles.

  • Server-Side Rendering – make sure bots can easily access your content without JavaScript issues
  • Fast Load Times – return HTML as quickly as possible
  • Clean Site Structure & Internal Linking – help crawlers discover and prioritize your best pages

The same best practices that have worked for Google over the past two decades also apply to AI bots.
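
A quick way to smoke-test the first two points is to fetch a page the way a bot would, with no JavaScript execution, and inspect the raw HTML. A sketch, where the URL, the user-agent string and the marker text are all placeholders:

import time
import urllib.request

# Placeholder values: point these at your own page and content.
URL = "https://example.com/"
UA = "Mozilla/5.0; compatible; GPTBot/1.0"  # simulated bot user agent

req = urllib.request.Request(URL, headers={"User-Agent": UA})
start = time.monotonic()
with urllib.request.urlopen(req, timeout=10) as resp:
    html = resp.read().decode("utf-8", errors="replace")
elapsed = time.monotonic() - start

print(f"status={resp.status} bytes={len(html)} time={elapsed:.2f}s")
# If your main content is missing from the raw HTML here, crawlers
# that skip JavaScript rendering will not see it either.
print("content visible without JS:", "Expected headline" in html)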

About Serge Bezborodov
Serge is the co-founder and CTO of JetOctopus, a tech SEO expert and log-file analysis enthusiast with over a decade of programming experience. A passionate advocate for data-driven SEO, he regularly shares insights from billions of crawled pages and analyzed log lines, helping SEOs turn complex data into actionable strategies.
You can find Serge on Twitter and LinkedIn.
