July 17, 2020
by Serge Bezborodov
3 ways of log file integration with JetOctopus

Why do SEOs need logs?

Log files contain 100% accurate information on how search engines crawl your website, which makes log file analysis an essential part of website SEO. It’s hard to find an SEO specialist who isn’t aware of that, so it is no wonder log file analysis has grown more popular over the past few years.

That doesn’t mean there are no questions about the whole procedure, though.
– What information exactly is needed from logs?
– Is it safe to integrate log files with SEO tools?
– How do I make sure the info won’t leak to my competitors, or be sold to them?

These questions are fair, and in this article we’re going to discuss them, as well as the ways to integrate log files with the JetOctopus tool.

What data do we take from logs?

A log file is usually a text file that records every request Googlebot makes to your site. Each entry includes the server IP, client IP, timestamp of the visit, requested URL, HTTP status code, User-Agent, method (GET/POST), etc.
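To make these fields concrete, here is a minimal Python sketch that pulls them out of a single log line. It assumes the widely used "combined" access-log layout, which may differ from your server's configured format (the combined format, for instance, carries no server IP). The sample line and the parse_line helper are illustrative, not part of JetOctopus.

```python
import re
from typing import Optional

# Assumption: the common "combined" access-log layout. Adjust the pattern
# if your server writes a custom format.
COMBINED = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])  # status code as a number
    return fields

# Illustrative sample line, not taken from a real log.
SAMPLE = (
    '66.249.66.1 - - [17/Jul/2020:10:15:32 +0000] '
    '"GET /category/shoes HTTP/1.1" 200 5124 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```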

What we need:

  1. IP address
    To verify the source of a request properly, you need to check the IP address it came from. User-Agent detection alone is not reliable, because fake bots can use Google’s User-Agent “names” to crawl your site. To verify Googlebot, JetOctopus runs a reverse DNS lookup.
  2. User-Agent
    Each search engine has a variety of bots with different User-Agent “names”. Google publishes a table of its crawlers’ User-Agents. JetOctopus uses the User-Agent string to identify each crawler.
  3. The page’s URL
  4. The page’s Status Code (HTTP response)
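The reverse DNS check mentioned above can be sketched in Python. This follows the generally documented procedure (reverse lookup, check that the hostname belongs to googlebot.com or google.com, then a forward lookup to confirm the IP). It is an illustration, not JetOctopus's actual implementation, and the function name and the injectable reverse/forward parameters are assumptions made so the logic can be tried without network access.

```python
import socket
from typing import Callable, List

# Hostname suffixes Google documents for Googlebot.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def _reverse(ip: str) -> str:
    """Reverse DNS: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]

def _forward(host: str) -> List[str]:
    """Forward DNS: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]

def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """True only if reverse and forward DNS both confirm a Google host."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the same IP address.
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

A request whose User-Agent says "Googlebot" but whose IP fails this check can be treated as a fake bot.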

Apart from those main parts, there are also secondary but still meaningful log components that provide clues about bots’ behavior:

  1. Load time
    How long the bot waited until the server returned the full HTML code. We’re not talking about the user’s load time but about the HTML generation process, which is crucial on large sites.
  2. Referrer
    If we have the referrer data, we can separate bots’ visits from users’ organic visits. You thus get unsampled data about pages with organic traffic (you will find these reports in SEO Active Pages).
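As a rough illustration of how the referrer helps, the sketch below labels a visit as a bot hit, an organic visit, or something else. The list of search engine hosts and the classify_visit helper are assumptions for the example. The matching is deliberately naive and is not how JetOctopus builds its reports.

```python
from urllib.parse import urlparse

# Assumption: a short, incomplete list of search engine host prefixes.
SEARCH_HOSTS = ("google.", "bing.", "duckduckgo.", "yandex.")

def classify_visit(user_agent: str, referrer: str) -> str:
    """Naively label one request as 'bot', 'organic', or 'other'."""
    if "bot" in user_agent.lower():
        return "bot"
    # A missing referrer is usually logged as "-", which has no hostname.
    host = urlparse(referrer).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    if host.startswith(SEARCH_HOSTS):
        return "organic"
    return "other"
```

For example, a browser request with the referrer https://www.google.com/ is labeled organic, while a request with no referrer is labeled other.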

JetOctopus types of logs integration

Live Stream NGINX integration

How it works: theory
Each time a bot makes a request, your server sends a small packet of data (200-300 bytes on average) to our servers in real time. We analyze this data immediately and store it, so it appears in the Raw logs report without delay.

How it works: practice
This technology is built on the UDP protocol. Its main advantage is that it doesn’t impact your server: UDP sends each packet without waiting for a reply, so even if all our servers break down, your site’s performance won’t be affected at all.
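The fire-and-forget nature of UDP can be illustrated with a few lines of Python. This is a conceptual sketch only: the real integration is done in the NGINX configuration, and the send_log_line helper, host, and port here are hypothetical.

```python
import socket

def send_log_line(line: str, host: str, port: int) -> bool:
    """Send one log line as a single UDP datagram and return immediately."""
    payload = line.encode("utf-8")
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # sendto hands the datagram to the OS and does not wait
            # for any acknowledgement from the receiver.
            sock.sendto(payload, (host, port))
        return True
    except OSError:
        # A local failure is swallowed so the web server is never blocked.
        return False
```

Because there is no handshake and no acknowledgement, the sender spends the same small amount of work whether or not anyone is listening on the other side.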

Security
The only way to intercept data sent via Live Stream is to gain direct access to the core routers between your server and ours. You don’t have to worry, because this is almost impossible in practice.

How to do it
The only thing you need to do is add two lines to your NGINX configuration. Please remember that your server doesn’t send us any sensitive data such as passwords, users’ credentials, etc.

Daily files integration

This method of log integration leads to a time gap in the data (approximately 1 hour to 1 day), so you won’t know what is happening on your site at a particular moment.
Nevertheless, it is one of the methods our customers use most frequently.

How it works
Your system administrator should set up an automatic export of the previous day’s data, so that it is available at the beginning of the current day. Every morning you download fresh log data, analyze it, and import it into our system.
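A daily export can be as simple as selecting yesterday's entries from the access log. The sketch below assumes timestamps in the common [17/Jul/2020:10:15:32 +0000] style. The helper names are illustrative, and how the resulting file is delivered depends on your own setup.

```python
from datetime import date, timedelta
from typing import Iterable, List, Optional

# Month names are spelled out so the result does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def previous_day(today: Optional[date] = None) -> date:
    """Return the day before `today` (defaults to the current date)."""
    return (today or date.today()) - timedelta(days=1)

def day_tag(day: date) -> str:
    """Format a date the way it appears in the log, e.g. 16/Jul/2020."""
    return f"{day.day:02d}/{MONTHS[day.month - 1]}/{day.year}"

def lines_for_day(lines: Iterable[str], day: date) -> List[str]:
    """Keep only the log lines whose timestamp falls on `day`."""
    tag = "[" + day_tag(day) + ":"
    return [line for line in lines if tag in line]
```

Run from a scheduled job early in the morning, lines_for_day(open_log, previous_day()) yields the previous day's entries, ready to be written to an export file.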

Security
This method of log integration is the most secure. Files are transferred over the HTTPS protocol. The data is accessible only from an allowed IP address and with a password.

Bulk dump integration

When to use
This method is suitable only for uploading large volumes of historical logs to the JetOctopus dashboard. The process usually takes a few days.

How to do it
Your system administrator should set up data extraction for a particular period and transfer data via FTP, S3, etc.

Security
This method of log integration is the most secure. Files are transferred over the HTTPS protocol. The data is accessible only from an allowed IP address and with a password.

Before we continue:
The methods above can be combined for your convenience. For instance, you can upload historical logs via Bulk Dump and then connect the Live Stream. If you ever decide to switch integration methods, it won’t cause data loss in the JetOctopus interface.


How we assure the security of your data

In our experience, users have different attitudes and fears about the security of their data. And we totally get it.

So before you use the tool, we want to highlight that JetOctopus complies with the GDPR and doesn’t use any sensitive data, such as POST requests containing users’ passwords, invoice data, etc. We don’t collect, analyze, or save this data.

We neither process nor save users’ IP addresses, even if you provide us with this data. An IP address is saved only if the User-Agent line contains the word “bot”.
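The rule above can be pictured as a small filter applied to each parsed log record. This is only a sketch of the stated policy: the record layout, the field names, and the case-insensitive match are assumptions, not a description of JetOctopus internals.

```python
def scrub_ip(record: dict) -> dict:
    """Return a copy of the record, keeping the IP only for bot traffic."""
    cleaned = dict(record)  # never modify the caller's record
    user_agent = cleaned.get("user_agent", "")
    # Assumption: the word "bot" is matched case-insensitively.
    if "bot" not in user_agent.lower():
        cleaned["client_ip"] = None
    return cleaned
```

A record with the User-Agent "Googlebot/2.1" keeps its IP address for verification, while a record from a regular browser has the address dropped.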

If you feel you need an additional legal basis for the data transfer, we can sign a mutual NDA (non-disclosure agreement). It is not obligatory but is done at your request, and you can suggest your own NDA document if needed.

Our Privacy Policy is published in full on our website. Please make sure to read it carefully. It aims to help you understand what data we collect, what we use it for, and how you can exercise your rights.

Try log integration with our free 7-day trial.
Follow JetOctopus on Twitter and on YouTube

And stay tuned!

Serge Bezborodov
About Serge Bezborodov
The co-founder and CTO of JetOctopus. He is a tech-SEO and log-files nerd with 10+ years of experience in programming. Serge is also a data-driven SEO evangelist. He regularly shares insights from a billion crawled pages and 14 billion log lines on DigitalOlympus, TechSEOboost, BrightonSEO conferences, and through the TTT community. You can find Serge on Twitter and LinkedIn.
