Why your crawl can’t be finished.

Sometimes crawl fails because the JetOctopus bot is blocked by the webserver.

If this problem occurs, you will the following information in your crawl project:

a lot of URLs returning a 401 or 403 status code;
the crawl slows down progressively, and appears to stop running completely;
crawler shows ‘crawl error’ status;
crawl shows that only 1 page was crawled.

Crawl my site for free

The reason for your crawl being blocked is most likely one of the following:

– JetOctopus uses different cloud providers which may be blocked by default as they are also often used by scrapers.

– There is an automated system on the server which detects and blocks suspicious activities.

– A manual block has been implemented by a server administrator, based on manual inspection of server activity, possibly triggered by a high load caused by the crawl, or a large number of crawl errors.

– The use of a Googlebot user agent has led to the failure of a reverse DNS lookup, appearing to be a scraper that is spoofing Googlebot.

We recommend trying the following solutions to blocked crawls:

– provided that you know the site, you can ask the server administrators to whitelist the IP that JetOctopus uses to crawl:

54.36.123.8
54.38.81.9
147.135.5.64
139.99.131.40
198.244.200.110

– some websites will block requests which come from a Googlebot user-agent but do not originate from a Google IP address. In this scenario, selecting a different user agent often makes the crawl succeed;

– some websites attempt to use Javascript to block crawlers that do not execute the page. This type of block can normally be circumvented by enabling our Javascript renderer.

If you have any problems with the crawl,
feel free to send a message to support@jetoctopus.com.

Sharing is Caring:

About Ann

Ann is the Content Strategist in JetOctopus. She learned the dynamics of SEO from scratch by understanding the inner workings of logs and crawl reports data. She cross-examines actual SEO cases by studying clients’ websites and loves being an active member of the technical team. Because of her dedication, she has become technically savvy and always bases her definition of quality content on real data and not on speculation. You can find Ann on Twitter and LinkedIn.

Why your crawl can’t be finished.

If this problem occurs, you will the following information in your crawl project:

The reason for your crawl being blocked is most likely one of the following:

We recommend trying the following solutions to blocked crawls:

About Ann

Search

Categories

Why your crawl can’t be finished.

If this problem occurs, you will the following information in your crawl project:

The reason for your crawl being blocked is most likely one of the following:

We recommend trying the following solutions to blocked crawls:

Related posts:

About Ann

Search

Categories