Analyzing nginx logs with goaccess

Analysis command:

goaccess -a -d -f /mnt/winshare/access-2023070112.log -p goaccess.conf  -o /mydata/nginx/html/2023070112_new.html

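Parameters used when analyzing logs

goaccess command-line options explained

-a	 Enable the list of user agents. Enabling it slows down parsing.
-c	 Show the log/date format configuration window when the program starts.
-d	 Enable IP resolution when outputting an HTML or JSON report.
-f	 Path to the input log file.
-p	 Path to a custom configuration file.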

Meaning of the panel statistics:

Official website description: goaccess.io/man#description

Interface test statistics address (my local machine)

Server Statistics

Detailed explanation of nginx log parameters

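Parameter   Description   Example
$remote_addr   Client address   172.17.0.1
$remote_user   Client user name   --
$time_local   Access time and time zone   [29/Dec/2022:10:17:14 +0000]
$request   Request line: method, URI and HTTP protocol   "GET /test/nginx/proxy HTTP/1.1"
$http_host   Requested host, i.e. the address (IP or domain name) you typed in the browser   10.1.7.33
$status   HTTP response status   200
$upstream_status   Status returned by the upstream   200
$body_bytes_sent   Size of the content sent to the client, in bytes   38
$http_referer   URL the request was referred from   -
$http_user_agent   Client browser and device information   "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
$http_cookie   User cookie information   "grafana_session=73d13d456cb4363f8a48f5501348669e"
$ssl_protocol   SSL protocol version   TLSv1
$ssl_cipher   Cipher used for the connection   RC4-SHA
$upstream_addr   Address of the backend upstream, i.e. the host that actually serves the request   "10.1.7.33:8102"
$request_time   Total time of the whole request, in seconds   0.012
$upstream_response_time   Time the upstream took to respond, in seconds   0.012

Let’s focus on request_time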

$request_time is an Nginx log variable that records the request processing time in seconds, with millisecond resolution. It measures the time from when Nginx reads the first bytes of the client's request until the last bytes of the response have been sent to the client. It therefore includes the time to receive the request data, the time the back-end program takes to respond, and the time to send the response data to the client, but not the time spent writing the log entry.

If you want to use the $request_time variable, you need to set the log_format directive in the Nginx configuration file to define the log format you want to record. For example, you can set it like this:

log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'"$request_time"';

With this format, the value of $request_time appears in every log line. For example:

192.168.1.100 - - [23/Sep/2023:10:15:32 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36" "-" "0.012"

Here, the last field "0.012" is the value of $request_time, which means that the request took 12 milliseconds to process.

The $request_time variable can help you analyze the performance and throughput of Nginx, as well as the responsiveness of the backend program. You can use its value to determine which requests are slow, whether timeouts or errors occur, and whether the program needs to be optimized or the configuration adjusted.
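As a rough illustration of using $request_time to spot slow requests, the sketch below parses lines written in the log_format shown above and keeps the ones over a threshold. The regular expression, the function names and the 0.5 second threshold are assumptions made for this example, not part of nginx or goaccess.

```python
import re

# Regex for the log_format shown above; the last quoted field is $request_time.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)" "(?P<request_time>[\d.]+)"'
)

SLOW_THRESHOLD_SECONDS = 0.5  # hypothetical threshold, tune it for your service


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["request_time"] = float(fields["request_time"])
    return fields


def slow_requests(lines, threshold=SLOW_THRESHOLD_SECONDS):
    """Yield parsed entries whose request_time exceeds the threshold."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["request_time"] > threshold:
            yield entry


# The example log line from the text above.
sample = (
    '192.168.1.100 - - [23/Sep/2023:10:15:32 +0800] "GET /index.html HTTP/1.1" '
    '200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36" "-" "0.012"'
)
entry = parse_line(sample)
print(entry["request"], entry["request_time"])  # GET /index.html HTTP/1.1 0.012
```

In practice you would pass an open log file to slow_requests() and print or count what it yields.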

Nginx: PV, UV, independent IP

Anyone who runs a website knows that you often need to check its access data, such as PV and UV. Of course, if the website sits behind a CDN, the local nginx log no longer sees all visitor traffic and these numbers lose most of their meaning. Below we compute access statistics from the nginx log of a website.

**UV (Unique Visitor):** Each independent computer on the Internet (identified by cookie) is regarded as one visitor. UV is the number of visitors to your website within one day (00:00-24:00). Visits with the same cookie within a day are counted only once.
**PV (Page View):** The number of page views or clicks. Every visit by a user to a page is recorded once, so when a user visits the same page multiple times, every visit adds to the total.
**Independent IP:** The same IP address is counted only once within 00:00-24:00. People who do website optimization care most about this number.
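To make the definitions above concrete, here is a minimal sketch that counts PV and independent IPs per day straight from access log lines. It assumes each line starts with the client address followed by the bracketed $time_local, as in the formats used in this post. UV as defined above needs a visitor cookie in the log, so it is left out. The function name and the second IP address in the sample are made up for the example.

```python
import re
from collections import defaultdict

# Matches the client address and the date part of $time_local at the start of a line.
PREFIX_RE = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):')


def pv_and_unique_ips(lines):
    """Return {day: (pv, independent_ip_count)} for the given log lines."""
    pv = defaultdict(int)
    ips = defaultdict(set)
    for line in lines:
        match = PREFIX_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        day = match.group("day")
        pv[day] += 1  # every request counts, including repeated visits
        ips[day].add(match.group("ip"))  # each address counts once per day
    return {day: (pv[day], len(ips[day])) for day in pv}


# Shortened sample lines; 192.168.56.2 is a made-up second client.
sample = [
    '192.168.56.1 - - [14/Sep/2023:03:02:53 +0000] "GET /cc/ HTTP/1.1" 200 15567',
    '192.168.56.1 - - [14/Sep/2023:03:02:56 +0000] "GET /cc/ HTTP/1.1" 200 15567',
    '192.168.56.2 - - [14/Sep/2023:03:02:57 +0000] "GET /cc/ HTTP/1.1" 200 15567',
]
print(pv_and_unique_ips(sample))  # {'14/Sep/2023': (3, 2)}
```

Note that this counts every request as a page view. To get closer to real PV you would filter out requests for images, CSS, JS and other static files first.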

Explanation of statistical indicator parameters in goaccess

TX. AMOUNT : The amount of data transferred (bandwidth) for the item in that row. goaccess computes it from the response size field in the log (%b in the goaccess log-format, $body_bytes_sent in nginx). Despite the abbreviation, it is not a transactions-per-second figure.

HITS : The number of requests counted for the item in that row. It is a count over the analyzed log, not a per-second rate. One page view may generate several hits, because a web page may pull in multiple images, CSS, JS and other files, so HITS is generally larger than the number of page views.

AVG.TS : Average Time Served, the average time taken to serve a request, in seconds. It reflects the average time from Nginx starting to receive the first byte of the client's request to sending the response data to the client. This time includes the time for receiving request data, the time for the back-end program to respond, and the time for sending response data to the client, but does not include the time for writing logs.

CUM.TS : Cumulative Time Served, the total time spent serving all requests for the item, in seconds. It is the sum of the service times of all those requests.

MAX.TS : Maximum Time Served, the longest time taken to serve a single request, in seconds. It is the service time of the slowest request.

To display these three columns, your log format must include a service time field, which goaccess parses with the %T (seconds, with millisecond resolution) or %D (microseconds) specifier. For example, if you use Apache's combined log format, you can enable these three columns by adding %D at the end of the format string.

Standard configuration

How do you make goaccess count the service time of requests?

Configure nginx log format as follows:

 log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    '$upstream_addr $request_time $upstream_response_time ';

Configure the goaccess log-format as follows:

time-format %H:%M:%S
date-format %d/%b/%Y
log-format %h - %^ [%d:%t %^] "%r" %s %b "%R" "%u" "%^" %^ %T %T

The resulting nginx log output looks like this:

192.168.56.1 - - [14/Sep/2023:03:02:53 +0000] "GET /cc/ HTTP/1.1" 200 15567 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.76" "-" 123.121.155.167:16002 0.433 0.432
192.168.56.1 - - [14/Sep/2023:03:02:56 +0000] "GET /cc/ HTTP/1.1" 200 15567 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.76" "-" 123.121.155.167:16002 0.330 0.330
192.168.56.1 - - [14/Sep/2023:03:02:57 +0000] "GET /cc/ HTTP/1.1" 200 15567 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.76" "-" 123.121.155.167:16002 0.367 0.367
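As a hand check of the time-served columns described earlier, the sketch below aggregates the $request_time values of the three sample lines. It assumes the log format above, where $request_time is the second-to-last field, and reproduces only the tail of each sample line. The function name is made up, and the arithmetic is a plain restatement of the column definitions, not goaccess's own code.

```python
def served_time_stats(lines):
    """Compute hits and AVG/CUM/MAX time served from log lines.

    Assumes $request_time is the second-to-last whitespace-separated field.
    Returns None when there are no lines to aggregate.
    """
    times = [float(line.split()[-2]) for line in lines if line.strip()]
    if not times:
        return None
    cumulative = sum(times)
    return {
        "hits": len(times),                 # number of requests
        "avg_ts": cumulative / len(times),  # average time served
        "cum_ts": cumulative,               # cumulative time served
        "max_ts": max(times),               # slowest single request
    }


# Tails of the three sample log lines above (upstream address, times).
sample_tails = [
    "123.121.155.167:16002 0.433 0.432",
    "123.121.155.167:16002 0.330 0.330",
    "123.121.155.167:16002 0.367 0.367",
]
stats = served_time_stats(sample_tails)
print({name: round(value, 3) for name, value in stats.items()})
```

For the /cc/ row in the goaccess report you would therefore expect 3 hits, a cumulative time of about 1.13 s, an average of about 0.377 s and a maximum of 0.433 s, up to rounding in the report.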

How far will goaccess be supported in the future? Below is the question I raised on the official project page and the answers I received.

github.com/allinurl/goaccess/issues/2556

About practical applications

Question: How can goaccess analyze nginx logs automatically? The nginx logs are rotated daily. I want to use python to analyze each day's log automatically and output a daily report. I also want to analyze a chosen period through the interface, for example combining the logs of several days and analyzing them together. A detailed design and development plan is given below.

**Answer:** goaccess is an open source real-time web log analysis tool whose statistics can be viewed in a terminal or a browser. goaccess supports multiple log formats, including nginx's log format, and you can also define a custom log format. goaccess can generate static HTML reports, real-time HTML reports, and reports in JSON and CSV formats.

To meet your needs, here is one possible design and development plan:

  • Step 1: Write a python script. You need a python script that automatically analyzes the daily nginx logs and outputs a daily report. You can refer to this article to learn how to call goaccess from python. Your script roughly needs to implement the following functions:

    • Define a function that returns the nginx log file name for a given date. For example, if your nginx log is rotated daily and named in the format access.log-20210901, you can build the file name from the date parameter.
    • Define a function that returns the report file name for a given date. For example, if you want to generate an HTML report named in the format report-20210901.html, you can also build the file name from the date parameter.
    • Define a function that calls the goaccess command, taking the log file name and report file name as parameters, to analyze the log and output the report. For example, you can use a statement similar to os.system('goaccess -f {} -o {} --log-format=COMBINED'.format(log_file, report_file)) to execute the goaccess command.
    • Define a main function that walks the specified date range, analyzes the log of each day, and generates a report for each day. For example, you could use a loop similar to for date in pd.date_range(start_date, end_date): to iterate through the date range and call the functions defined earlier to get the file names and run the analysis.
  • Step 2: Run the python script. Run the script on the server, or set it up as a scheduled task so that it runs automatically every day. This way, the daily nginx logs are analyzed and the daily reports are produced automatically.

  • Step 3: View the report. View the generated HTML report in a browser, or use another tool to view the reports in JSON or CSV format. If you want to analyze a chosen period through the interface, for example combining several days and analyzing them together, you can use the incremental log processing feature provided by goaccess. This feature lets you merge multiple log files into a database and generate a combined report from them.
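The steps above can be sketched as follows. This is a minimal outline, not a finished tool. LOG_DIR, REPORT_DIR and all function names are assumptions for the example. It uses subprocess.run instead of os.system and a small datetime loop instead of pd.date_range, to avoid shell quoting issues and the pandas dependency. For a multi-day report it passes several log files to one goaccess call, which is a simpler alternative to the incremental (--persist / --restore) processing mentioned above.

```python
import datetime
import subprocess

LOG_DIR = "/var/log/nginx"         # hypothetical location of the rotated logs
REPORT_DIR = "/mydata/nginx/html"  # hypothetical directory served by nginx


def log_file_for(day):
    """Log file name for a date, e.g. access.log-20210901."""
    return f"{LOG_DIR}/access.log-{day:%Y%m%d}"


def report_file_for(day):
    """Report file name for a date, e.g. report-20210901.html."""
    return f"{REPORT_DIR}/report-{day:%Y%m%d}.html"


def goaccess_command(log_files, report_file):
    """Build the goaccess argument list for one or more log files."""
    return ["goaccess", *log_files, "-o", report_file, "--log-format=COMBINED"]


def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += datetime.timedelta(days=1)


def run_daily_reports(start, end, runner=subprocess.run):
    """Generate one HTML report per day in the range."""
    for day in date_range(start, end):
        runner(goaccess_command([log_file_for(day)], report_file_for(day)), check=True)


def run_range_report(start, end, report_file, runner=subprocess.run):
    """Generate a single report covering every day in the range."""
    logs = [log_file_for(day) for day in date_range(start, end)]
    runner(goaccess_command(logs, report_file), check=True)


# Show the command for one day without executing goaccess.
first_day = datetime.date(2021, 9, 1)
print(goaccess_command([log_file_for(first_day)], report_file_for(first_day)))
```

To run it every day, a scheduled task could call run_daily_reports() with yesterday's date as both start and end. If your logs use the custom log_format from this post, replace --log-format=COMBINED with a -p option pointing at your goaccess configuration file, as in the command at the top.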

To be continued.


Origin blog.csdn.net/superzhang6666/article/details/132874953