Sometimes a crawl fails because the web server blocks the JetOctopus bot. Common reasons include:
– JetOctopus crawls from several cloud providers, and some servers block these providers by default because scrapers often use them too.
– An automated system on the server detects and blocks suspicious activity.
– A server administrator has blocked the crawler manually after inspecting server activity. This is often triggered by high load caused by the crawl or by a large number of crawl errors.
– The crawl used a Googlebot user agent, but a reverse DNS lookup of the crawler's IP address does not point to Google. As a result, the server treats the crawler as a scraper spoofing Googlebot.
To resolve the block:
– If you are in contact with the site's owners, ask the server administrators to whitelist the IP that JetOctopus uses to crawl:
– Some websites block requests that use a Googlebot user agent but do not come from a Google IP address. In this case, selecting a different user agent often lets the crawl succeed.
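One common way a server decides whether a "Googlebot" request is genuine is a forward-confirmed reverse DNS check. The server looks up the hostname for the request's IP address, confirms that the hostname belongs to Google, then resolves that hostname back and checks that it returns the same IP. The Python sketch below shows this logic from the server's side. It is not JetOctopus code and does not reproduce any particular server's rules. The function names `is_verified_googlebot` and `should_block` are hypothetical. The accepted domains (`googlebot.com`, `google.com`) are an assumption based on Google's published verification guidance, so check the current list before relying on it.

```python
import socket
from typing import Callable, List

# Assumption: domains Google has documented for Googlebot reverse DNS names.
# This tuple may be incomplete; check Google's current verification guide.
GOOGLE_CRAWLER_DOMAINS = ("googlebot.com", "google.com")

Resolver = Callable[[str], str]
ForwardResolver = Callable[[str], List[str]]


def dns_reverse(ip: str) -> str:
    """PTR lookup: IP address -> hostname (raises OSError if none)."""
    return socket.gethostbyaddr(ip)[0]


def dns_forward(hostname: str) -> List[str]:
    """A-record lookup: hostname -> IPv4 addresses (raises OSError on failure)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Resolver = dns_reverse,
    forward: ForwardResolver = dns_forward,
) -> bool:
    """Forward-confirmed reverse DNS: the IP must map to a Google hostname,
    and that hostname must resolve back to the same IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so the IP cannot be verified
    # Require a dot before the domain so "evilgooglebot.com" does not match.
    if not any(hostname.endswith("." + d) for d in GOOGLE_CRAWLER_DOMAINS):
        return False
    try:
        addresses = forward(hostname)
    except OSError:
        return False
    return ip in addresses


def should_block(
    user_agent: str,
    ip: str,
    reverse: Resolver = dns_reverse,
    forward: ForwardResolver = dns_forward,
) -> bool:
    """Hypothetical server rule: block requests that claim to be Googlebot
    but fail the reverse DNS verification."""
    claims_googlebot = "googlebot" in user_agent.lower()
    return claims_googlebot and not is_verified_googlebot(ip, reverse, forward)
```

When the user agent does not mention Googlebot, `should_block` returns `False` without any DNS lookup. This mirrors why switching to a different user agent often lets the crawl through. Note that `socket.gethostbyname_ex` returns only IPv4 addresses, so IPv6 traffic would need `socket.getaddrinfo` instead.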
If you have any problems with the crawl, feel free to send a message to email@example.com.