Sometimes a crawl fails because the JetOctopus bot is blocked by the web server. Common reasons include:
– JetOctopus crawls from several cloud providers, whose IP ranges some servers block by default because scrapers often use the same providers.
– An automated system on the server detects and blocks activity it considers suspicious.
– A server administrator has blocked the bot manually after inspecting server activity, possibly prompted by high load or a large number of errors caused by the crawl.
– The crawl used a Googlebot user agent, but a reverse DNS lookup showed that the requests did not come from Google, so the bot appeared to be a scraper spoofing Googlebot (see the sketch below).
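
Many servers verify a Googlebot claim with forward-confirmed reverse DNS: reverse-resolve the client IP, require a google.com or googlebot.com hostname, then forward-resolve that hostname and check that it maps back to the same IP. Below is a minimal Python sketch of such a check; the sample IPs in the usage lines are illustrative, and real results depend on live DNS.

    import socket

    def is_real_googlebot(ip: str) -> bool:
        """Forward-confirmed reverse DNS check, as many servers apply it."""
        try:
            host = socket.gethostbyaddr(ip)[0]  # reverse lookup: IP -> hostname
        except socket.herror:
            return False  # no PTR record at all
        if not host.endswith((".googlebot.com", ".google.com")):
            return False  # hostname does not belong to Google
        try:
            confirmed = socket.gethostbyname_ex(host)[2]  # forward lookup
        except socket.gaierror:
            return False
        return ip in confirmed  # hostname must map back to the same IP

    # A request that sends a Googlebot user agent but fails this check
    # looks like a spoofer and may be blocked.
    print(is_real_googlebot("66.249.66.1"))   # published Googlebot range: True
    print(is_real_googlebot("203.0.113.7"))   # documentation IP: False
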
Possible solutions:
– If you know the site's administrators, you can ask them to whitelist the IPs that JetOctopus uses to crawl (a sketch of such an allowlist check follows this list): 54.36.123.8, 54.38.81.9, 147.135.5.64, 139.99.131.40, 198.244.200.110.
– Some websites block requests that present a Googlebot user agent but do not originate from a Google IP address (the check sketched above). In this scenario, selecting a different user agent often makes the crawl succeed.
– Some websites use JavaScript to block crawlers that do not execute the page. This type of block can normally be circumvented by enabling our JavaScript renderer.
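
In practice the whitelist usually lives in the web server or firewall configuration, but if the administrators handle bot protection in application code, a minimal Python sketch of such an allowlist check might look like this (the function name and integration point are illustrative):

    from ipaddress import ip_address

    # The JetOctopus crawler IPs listed above.
    JETOCTOPUS_IPS = {
        ip_address("54.36.123.8"),
        ip_address("54.38.81.9"),
        ip_address("147.135.5.64"),
        ip_address("139.99.131.40"),
        ip_address("198.244.200.110"),
    }

    def should_bypass_bot_protection(client_ip: str) -> bool:
        """True if the request comes from a JetOctopus crawler IP and
        should skip rate limiting and bot blocking."""
        try:
            return ip_address(client_ip) in JETOCTOPUS_IPS
        except ValueError:  # malformed address string
            return False

    print(should_bypass_bot_protection("54.36.123.8"))    # True
    print(should_bypass_bot_protection("198.51.100.20"))  # False
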
If you have any problems with the crawl, feel free to send a message to support@jetoctopus.com.