Why there are 5xx pages in the crawl results and why they cannot be reproduced by manual checking
5xx response codes indicate that the web server could not process a request from the client. During a crawl, the JetOctopus user agent is the client. A 5xx status code is not a sign that the URL is incorrect or broken. It only indicates that the web server could not respond when it received this request […]
Why are there missing pages in the crawl results?
While checking the crawl results, you may notice that some pages are missing. It is important to understand why JetOctopus did not find these pages, because search engines will have just as much difficulty finding them when crawling your site. Why JetOctopus didn’t find all the pages To understand why JetOctopus did not […]
How to schedule a crawl
You can run a crawl manually or set up a regular crawl. You can also schedule a one-time crawl to start at the time you need, for example, at night. With a regular crawl, you can track changes to your website after releases, the number of new pages created over time, and more. […]
How to find links to broken pages
Using JetOctopus, you can find links to broken pages easily. Start a new crawl or check for broken links in the results of the previous crawl. However, keep in mind that the number of broken links may have changed since the last crawl. Links to broken pages are the links that lead to non-200 pages. […]
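The definition above (a link to a broken page is a link that leads to a non-200 page) can be sketched in a few lines of Python. This is only an illustration, not JetOctopus functionality: the `Link` type, the `find_broken_links` helper, and the sample data are hypothetical, and the status codes are assumed to be already known from a crawl.

```python
from typing import NamedTuple


class Link(NamedTuple):
    source: str  # page that contains the link
    target: str  # URL the link points to
    status: int  # status code of the target (assumed known from a crawl)


def find_broken_links(links: list[Link]) -> list[Link]:
    """Return the links whose target responded with anything other than 200."""
    return [link for link in links if link.status != 200]


# Hypothetical sample data, for illustration only.
sample_links = [
    Link("https://example.com/", "https://example.com/about", 200),
    Link("https://example.com/", "https://example.com/old-page", 404),
    Link("https://example.com/blog", "https://example.com/moved", 301),
]

broken = find_broken_links(sample_links)
for link in broken:
    print(f"{link.source} -> {link.target} ({link.status})")
```

Under this definition a redirecting target (301) is reported alongside a missing one (404), since neither returns 200.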
How to find orphan pages with JetOctopus
Orphan pages are URLs that cannot be found on your website during a crawl. They are not included in the internal linking structure. As a consequence, users will not be able to find these pages. However, search engines can still reach orphan pages through external links, from the search index (for example, if the orphan […]
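One way to picture the idea is as a set difference: URLs known from some other source minus URLs reached by following internal links. The sketch below assumes both sets are already available. The `find_orphan_pages` helper and the sample URLs are hypothetical, and where the "known" URLs come from is left open here.

```python
def find_orphan_pages(known_urls: set[str], crawled_urls: set[str]) -> set[str]:
    """Return URLs that are known to exist but were not reached during the crawl."""
    return known_urls - crawled_urls


# Hypothetical sample data, for illustration only.
known_urls = {
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/old-landing",
}
crawled_urls = {
    "https://example.com/",
    "https://example.com/pricing",
}

orphans = find_orphan_pages(known_urls, crawled_urls)
print(sorted(orphans))
```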
The easy way to check for duplicate content: duplicated titles and meta descriptions
Unique content is an essential factor in the ranking of your website in search results. If there is a lot of duplicate content, search engines may incorrectly identify the most relevant page or exclude URLs with duplicate content from search results. Most importantly, duplicates waste your crawl budget. A lot […]
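Checking for duplicated titles amounts to grouping pages by title and keeping the groups with more than one URL. The sketch below assumes the titles have already been extracted. The `find_duplicate_titles` helper, the sample pages, and the normalization (trimming whitespace and lowercasing) are illustrative assumptions, not a description of how JetOctopus detects duplicates.

```python
from collections import defaultdict


def find_duplicate_titles(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each normalized title shared by several pages to the URLs that use it."""
    urls_by_title: dict[str, list[str]] = defaultdict(list)
    for url, title in pages.items():
        # Assumption: titles differing only in case or outer whitespace count as duplicates.
        urls_by_title[title.strip().lower()].append(url)
    return {title: urls for title, urls in urls_by_title.items() if len(urls) > 1}


# Hypothetical sample data (URL -> title), for illustration only.
pages = {
    "https://example.com/shoes": "Buy Shoes Online",
    "https://example.com/shoes?page=2": "Buy shoes online ",
    "https://example.com/about": "About Us",
}

duplicates = find_duplicate_titles(pages)
```

The same grouping applies to meta descriptions by passing a URL-to-description mapping instead.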
How to audit redirects with JetOctopus
Modern websites are very dynamic. They change frequently, URLs are updated, and websites migrate to new domains. In all these cases, redirects are needed so that users and search engines can find the right URLs. If redirects are missing or not working properly, you may lose conversions and the URLs will drop out of […]
How to check XML Sitemaps with JetOctopus
Sitemaps are one of the most important sources of new URLs for search engines. Search engines scan URLs from sitemaps regularly, and if the URLs meet their requirements, they will be indexed. However, if sitemaps contain broken links, 4xx, 3xx pages, etc., this will negatively affect your crawl budget. Therefore, it is wise […]
How to crawl websites using Cloudflare with the Googlebot user-agent
If you crawl your website with the regular JetOctopus user agent, the crawler will not be blocked by Cloudflare. However, if you want to crawl your site as Googlebot to check how Google sees it, Cloudflare will block the unverified Googlebot. As a reminder, you can crawl your website with the Googlebot Mobile or Googlebot Desktop […]