Use GoAccess to analyze Nginx logs

Introduction to GoAccess

GoAccess is an open source (MIT license), real-time web log analyzer with an interactive viewer. It runs in a terminal on *nix systems, and its reports can also be viewed in a web browser.

It gives system administrators fast, useful HTTP statistics and can present them as a visual report served online. GoAccess parses the specified web log file and outputs the statistics to the terminal. Its panels are as follows:

  • General Statistics : This panel summarizes several main metrics: the number of valid and invalid requests, the time taken to analyze the data, unique visitors, requested files, static files (CSS, ICO, JPG, etc.), HTTP referrers, 404 errors, the size of the parsed log file, and the bandwidth consumed.
  • Unique visitors : This panel displays hits, unique visitors, cumulative bandwidth, and other metrics by date. HTTP requests with the same IP, the same date, and the same user agent are counted as one unique visitor. Web crawlers are included by default.
    The --date-spec=hr parameter switches the grouping from date to hour, so that dates are displayed like 05/Jun/2016:16. This is very helpful if you want to track daily traffic at the hourly level.
  • Requested files : This panel displays the most requested files on your server, along with hits, unique visitors, percentage, cumulative bandwidth, the protocol used, and the request method.
  • Requested static files : This panel lists the most frequently requested static files, such as JPG, CSS, SWF, JS, GIF, and PNG, with the same metrics as the previous panel. Additional static file extensions can be added in the configuration file.
  • 404 or file not found : The content is similar to the previous panels, but it covers all requests for pages that were not found, that is, those answered with the 404 status code.
  • Host : This panel displays detailed information about the visiting hosts themselves. It is a good way to find malicious crawlers and identify who is eating your bandwidth.
    Expanding the panel shows more information, such as the host's reverse DNS lookup result and its country and city. If the -a (--agent-list) parameter is enabled, selecting an IP address and pressing Enter displays its list of user agents.
  • Operating System : This panel displays the operating systems used by the visiting hosts. GoAccess tries to provide as much detail as it can for each operating system.
  • Browser : This panel displays the browsers used by the visiting hosts. GoAccess tries to provide as much detail as it can for each browser.
  • Time distribution : This panel reports by hour of day, so it displays 24 data points, one for each hour.
    The --hour-spec=min parameter switches the report to ten-minute intervals, with times displayed like 16:4. This is helpful for discovering the peak access periods of the server.
  • Virtual Host : This panel displays the virtual hosts parsed from the access log. It is only shown when %v is used in the log format.
  • Referrer URLs : If a host reached your site through another resource, for example by following a link or a redirect from another host, the URL it came from is displayed in this panel. The panel is off by default; it is controlled by the ignore-panel option in the configuration file.
  • Referring sites : This panel displays only the host part of the referrer instead of the full URL.
  • Keyphrases : This panel reports the keyphrases used on Google search, Google cache, and Google translate that led visitors to your server. Currently only Google search via HTTP is supported. The panel is off by default; it is controlled by the ignore-panel option in the configuration file.
  • Geographic location : Determines the geographic location from the IP address. The statistics are grouped by continent and country. This panel requires GeoIP support.
  • HTTP status code : The numeric status codes returned for HTTP requests.
  • Remote user (HTTP authentication) : The user ID of the person requesting the document, as determined by HTTP authentication. If the document is not password protected, this field is "-". This panel is not enabled unless %e is given in the log format.
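
To make the unique visitor rule from the list above concrete, here is a rough sketch in shell and awk that counts visitors the same way (same IP, same date, same user agent). The sample log lines and the count_unique_visitors function are made up for illustration, and this is only an approximation of the rule, not how GoAccess itself is implemented.

```shell
#!/bin/bash
# Made-up sample data in nginx "combined" format: two requests from one
# browser on the same day, plus one request from a different client.
sample_log='203.0.113.5 - - [05/Jun/2016:16:01:02 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"
203.0.113.5 - - [05/Jun/2016:16:05:40 +0800] "GET /a.css HTTP/1.1" 200 120 "-" "Mozilla/5.0"
203.0.113.9 - - [05/Jun/2016:17:10:00 +0800] "GET / HTTP/1.1" 200 612 "-" "curl/7.29.0"'

count_unique_visitors() {
    # Split each line on double quotes: field 1 holds the IP and timestamp,
    # field 6 holds the user agent.
    awk -F'"' '{
        split($1, head, " ")            # head[1] = IP, head[4] = [dd/Mon/yyyy:HH:MM:SS
        date = substr(head[4], 2, 11)   # keep only dd/Mon/yyyy
        key = head[1] "|" date "|" $6   # IP + date + user agent
        if (!(key in seen)) { seen[key] = 1; visitors[date]++ }
    } END { for (d in visitors) print d, visitors[d] }'
}

unique_report=$(printf '%s\n' "$sample_log" | count_unique_visitors)
echo "$unique_report"
```

The first two lines share IP, date, and user agent, so they collapse into one visitor, and the script reports two unique visitors for 05/Jun/2016.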

Using GoAccess

Install GoAccess

[root@VM_0_26_centos logs]# yum install goaccess
Loaded plugins: fastestmirror, langpacks
Repository epel is listed more than once in the configuration
epel                                                                | 4.7 kB  00:00:00     
extras                                                              | 2.9 kB  00:00:00     
nux-dextop                                                          | 2.9 kB  00:00:00     
os                                                                  | 3.6 kB  00:00:00     
rpmfusion-free-updates                                              | 3.7 kB  00:00:00     
rpmfusion-nonfree-updates                                           | 3.7 kB  00:00:00     
updates                                                             | 2.9 kB  00:00:00     
zabbix                                                              | 2.9 kB  00:00:00     
zabbix-non-supported                                                |  951 B  00:00:00     
(1/2): epel/7/x86_64/updateinfo                                     | 1.0 MB  00:00:00     
(2/2): epel/7/x86_64/primary_db                                     | 6.9 MB  00:00:02     
Loading mirror speeds from cached hostfile
 * nux-dextop: mirror.li.nux.ro
 * rpmfusion-free-updates: mirrors.ustc.edu.cn
 * rpmfusion-nonfree-updates: mirrors.ustc.edu.cn
Resolving Dependencies
--> Running transaction check
---> Package goaccess.x86_64 0:1.3-1.el7 will be installed
--> Processing Dependency: libtokyocabinet.so.9()(64bit) for package: goaccess-1.3-1.el7.x86_64
--> Running transaction check
---> Package tokyocabinet.x86_64 0:1.4.48-3.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

===========================================================================================
 Package                 Arch              Version                   Repository       Size
===========================================================================================
Installing:
 goaccess                x86_64            1.3-1.el7                 epel            240 k
Installing for dependencies:
 tokyocabinet            x86_64            1.4.48-3.el7              os              459 k

Transaction Summary
===========================================================================================
Install  1 Package (+1 Dependent package)

Total download size: 699 k
Installed size: 2.0 M
Is this ok [y/d/N]: y
Downloading packages:
(1/2): goaccess-1.3-1.el7.x86_64.rpm                                | 240 kB  00:00:00     
(2/2): tokyocabinet-1.4.48-3.el7.x86_64.rpm                         | 459 kB  00:00:00     
-------------------------------------------------------------------------------------------
Total                                                      1.3 MB/s | 699 kB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : tokyocabinet-1.4.48-3.el7.x86_64                                        1/2 
  Installing : goaccess-1.3-1.el7.x86_64                                               2/2 
  Verifying  : tokyocabinet-1.4.48-3.el7.x86_64                                        1/2 
  Verifying  : goaccess-1.3-1.el7.x86_64                                               2/2 

Installed:
  goaccess.x86_64 0:1.3-1.el7                                                              

Dependency Installed:
  tokyocabinet.x86_64 0:1.4.48-3.el7 
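
A quick way to confirm the install, assuming the goaccess binary ended up on the PATH. The check_goaccess function name is only for this example; -V is the version flag listed in the GoAccess help.

```shell
#!/bin/bash
check_goaccess() {
    # Print the first line of the version banner if goaccess is installed,
    # otherwise say so instead of failing.
    if command -v goaccess >/dev/null 2>&1; then
        goaccess -V | head -n 1
    else
        echo "goaccess not found in PATH"
    fi
}

install_status=$(check_goaccess)
echo "$install_status"
```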

View the usage help

[root@VM_0_26_centos logs]# goaccess --help

GoAccess - 1.3

Usage: goaccess [filename] [ options ... ] [-c][-M][-H][-S][-q][-d][...]
The following options can also be supplied to the command:

Log & Date Format Options

  --date-format=<dateformat>      - Specify log date format. e.g., %d/%b/%Y
  --log-format=<logformat>        - Specify log format. Inner quotes need to be
                                    escaped, or use single quotes.
  --time-format=<timeformat>      - Specify log time format. e.g., %H:%M:%S

User Interface Options

  -c --config-dialog              - Prompt log/date/time configuration window.
  -i --hl-header                  - Color highlight active panel.
  -m --with-mouse                 - Enable mouse support on main dashboard.
  --color=<fg:bg[attrs, PANEL]>   - Specify custom colors. See manpage for more
                                    details and options.
  --color-scheme=<1|2|3>          - Schemes: 1 => Grey, 2 => Green, 3 => Monokai.
  --html-custom-css=<path.css>    - Specify a custom CSS file in the HTML report.
  --html-custom-js=<path.js>      - Specify a custom JS file in the HTML report.
  --html-prefs=<json_obj>         - Set default HTML report preferences.
  --html-report-title=<title>     - Set HTML report page title and header.
  --json-pretty-print             - Format JSON output w/ tabs & newlines.
  --max-items                     - Maximum number of items to show per panel.
                                    See man page for limits.
  --no-color                      - Disable colored output.
  --no-column-names               - Don't write column names in term output.
  --no-csv-summary                - Disable summary metrics on the CSV output.
  --no-html-last-updated          - Hide HTML last updated field.
  --no-parsing-spinner            - Disable progress metrics and parsing spinner.
  --no-progress                   - Disable progress metrics.
  --no-tab-scroll                 - Disable scrolling through panels on TAB.

Server Options

  --addr=<addr>                   - Specify IP address to bind server to.
  --daemonize                     - Run as daemon (if --real-time-html enabled).
  --fifo-in=<path>                - Path to read named pipe (FIFO).
  --fifo-out=<path>               - Path to write named pipe (FIFO).
  --origin=<addr>                 - Ensure clients send the specified origin header
                                    upon the WebSocket handshake.
  --pid-file=<path>               - Write PID to a file when --daemonize is used.
  --port=<port>                   - Specify the port to use.
  --real-time-html                - Enable real-time HTML output.
  --ssl-cert=<cert.crt>           - Path to TLS/SSL certificate.
  --ssl-key=<priv.key>            - Path to TLS/SSL private key.
  --ws-url=<url>                  - URL to which the WebSocket server responds.

File Options

  -                               - The log file to parse is read from stdin.
  -f --log-file=<filename>        - Path to input log file.
  -S --log-size=<number>          - Specify the log size, useful when piping in logs.
  -l --debug-file=<filename>      - Send all debug messages to the specified
                                    file.
  -p --config-file=<filename>     - Custom configuration file.
  --invalid-requests=<filename>   - Log invalid requests to the specified file.
  --no-global-config              - Don't load global configuration file.

Parse Options

  -a --agent-list                 - Enable a list of user-agents by host.
  -b --browsers-file=<path>       - Use additional custom list of browsers.
  -d --with-output-resolver       - Enable IP resolver on HTML|JSON output.
  -e --exclude-ip=<IP>            - Exclude one or multiple IPv4/6. Allows IP
                                    ranges e.g. 192.168.0.1-192.168.0.10
  -H --http-protocol=<yes|no>     - Set/unset HTTP request protocol if found.
  -M --http-method=<yes|no>       - Set/unset HTTP request method if found.
  -o --output=file.html|json|csv  - Output either an HTML, JSON or a CSV file.
  -q --no-query-string            - Ignore request's query string. Removing the
                                    query string can greatly decrease memory
                                    consumption.
  -r --no-term-resolver           - Disable IP resolver on terminal output.
  --444-as-404                    - Treat non-standard status code 444 as 404.
  --4xx-to-unique-count           - Add 4xx client errors to the unique visitors
                                    count.
  --anonymize-ip                  - Anonymize IP addresses before outputting to report.
  --all-static-files              - Include static files with a query string.
  --crawlers-only                 - Parse and display only crawlers.
  --date-spec=<date|hr>           - Date specificity. Possible values: `date`
                                    (default), or `hr`.
  --double-decode                 - Decode double-encoded values.
  --enable-panel=<PANEL>          - Enable parsing/displaying the given panel.
  --hide-referer=<NEEDLE>         - Hide a referer but still count it. Wild cards
                                    are allowed. i.e., *.bing.com
  --hour-spec=<hr|min>            - Hour specificity. Possible values: `hr`
                                    (default), or `min` (tenth of a min).
  --ignore-crawlers               - Ignore crawlers.
  --ignore-panel=<PANEL>          - Ignore parsing/displaying the given panel.
  --ignore-referer=<NEEDLE>       - Ignore a referer from being counted. Wild cards
                                    are allowed. i.e., *.bing.com
  --ignore-statics=<req|panel>    - Ignore static requests.
                                    req => Ignore from valid requests.
                                    panel => Ignore from valid requests and panels.
  --ignore-status=<CODE>          - Ignore parsing the given status code.
  --num-tests=<number>            - Number of lines to test. >= 0 (10 default)
  --process-and-exit              - Parse log and exit without outputting data.
  --real-os                       - Display real OS names. e.g, Windows XP, Snow
                                    Leopard.
  --sort-panel=PANEL,METRIC,ORDER - Sort panel on initial load. For example:
                                    --sort-panel=VISITORS,BY_HITS,ASC. See
                                    manpage for a list of panels/fields.
  --static-file=<extension>       - Add static file extension. e.g.: .mp3.
                                    Extensions are case sensitive.

GeoIP Options

  -g --std-geoip                  - Standard GeoIP database for less memory
                                    consumption.
  --geoip-database=<path>         - Specify path to GeoIP database file. i.e.,
                                    GeoLiteCity.dat, GeoIPv6.dat ...

Other Options

  -h --help                       - This help.
  -V --version                    - Display version information and exit.
  -s --storage                    - Display current storage method. e.g., B+
                                    Tree, Hash.
  --dcf                           - Display the path of the default config
                                    file when `-p` is not used.

Examples can be found by running `man goaccess`.

For more details visit: http://goaccess.io
GoAccess Copyright (C) 2009-2017 by Gerardo Orellana
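
As a sketch of how several of these options combine, the snippet below builds an hourly report command from flags listed in the help output. It assumes the log, date, and time formats are already set in the GoAccess configuration file; the output name hourly-report.html is made up for this example, and the command only runs if goaccess and the log file are actually present.

```shell
#!/bin/bash
log_file="./nginx_access.log"   # log name used in the later steps of this article

report_args=(
    -f "$log_file"
    --date-spec=hr       # group the unique visitors panel by hour instead of by day
    --hour-spec=min      # ten-minute buckets in the hourly panel
    --ignore-crawlers    # leave crawlers out of the statistics
    -q                   # drop query strings to reduce memory consumption
    -o hourly-report.html
)

# Show the command that would be executed.
printf 'goaccess'; printf ' %q' "${report_args[@]}"; echo

# Run it only when both the tool and the log are available.
if command -v goaccess >/dev/null 2>&1 && [ -r "$log_file" ]; then
    goaccess "${report_args[@]}"
fi
```

Keeping the arguments in an array avoids quoting problems when a path contains spaces.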

Get the Nginx log format

The conversion script is available at https://github.com/stockrt/nginx2goaccess/blob/master/nginx2goaccess.sh. Its content is as follows:

[root@VM_0_26_centos logs]# cat nginx2goaccess.sh 
#!/bin/bash
#
# Convert from this:
#   http://nginx.org/en/docs/http/ngx_http_log_module.html
# To this:
#   https://goaccess.io/man
#
# Conversion table:
#   $time_local         %d:%t %^
#   $host               %v
#   $http_host          %v
#   $remote_addr        %h
#   $request_time       %T
#   $request_method     %m
#   $request_uri        %U
#   $server_protocol    %H
#   $request            %r
#   $status             %s
#   $body_bytes_sent    %b
#   $bytes_sent         %b
#   $http_referer       %R
#   $http_user_agent    %u
#
# Samples:
#
# log_format combined '$remote_addr - $remote_user [$time_local] '
# '"$request" $status $body_bytes_sent '
# '"$http_referer" "$http_user_agent"';
#   ./nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
#
# log_format compression '$remote_addr - $remote_user [$time_local] '
# '"$request" $status $bytes_sent '
# '"$http_referer" "$http_user_agent" "$gzip_ratio"';
#   ./nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent "$http_referer" "$http_user_agent" "$gzip_ratio"'
#
# log_format main
# '$remote_addr\t$time_local\t$host\t$request\t$http_referer\t$http_x_mobile_group\t'
# 'Local:\t$status\t$body_bytes_sent\t$request_time\t'
# 'Proxy:\t$upstream_cache_status\t$upstream_status\t$upstream_response_length\t$upstream_response_time\t'
# 'Agent:\t$http_user_agent\t'
# 'Fwd:\t$http_x_forwarded_for';
#   ./nginx2goaccess.sh '$remote_addr\t$time_local\t$host\t$request\t$http_referer\t$http_x_mobile_group\tLocal:\t$status\t$body_bytes_sent\t$request_time\tProxy:\t$upstream_cache_status\t$upstream_status\t$upstream_response_length\t$upstream_response_time\tAgent:\t$http_user_agent\tFwd:\t$http_x_forwarded_for'
#
# log_format main
# '${time_local}\t${remote_addr}\t${host}\t${request_method}\t${request_uri}\t${server_protocol}\t'
# '${http_referer}\t${http_x_mobile_group}\t'
# 'Local:\t${status}\t*${connection}\t${body_bytes_sent}\t${request_time}\t'
# 'Proxy:\t${upstream_status}\t${upstream_cache_status}\t'
# '${upstream_response_length}\t${upstream_response_time}\t${uri}${log_args}\t'
# 'Agent:\t${http_user_agent}\t'
# 'Fwd:\t${http_x_forwarded_for}';
#   ./nginx2goaccess.sh '${time_local}\t${remote_addr}\t${host}\t${request_method}\t${request_uri}\t${server_protocol}\t${http_referer}\t${http_x_mobile_group}\tLocal:\t${status}\t*${connection}\t${body_bytes_sent}\t${request_time}\tProxy:\t${upstream_status}\t${upstream_cache_status}\t${upstream_response_length}\t${upstream_response_time}\t${uri}${log_args}\tAgent:\t${http_user_agent}\tFwd:\t${http_x_forwarded_for}'
#
# Author: Rogério Carvalho Schneider <[email protected]>

# Params
log_format="$1"

# Usage
if [[ -z "$log_format" ]]; then
    echo "Usage: $0 '<log_format>'"
    exit 1
fi

# Variables map
conversion_table="time_local,%d:%t_%^
host,%v
http_host,%v
remote_addr,%h
request_time,%T
request_method,%m
request_uri,%U
server_protocol,%H
request,%r
status,%s
body_bytes_sent,%b
bytes_sent,%b
http_referer,%R
http_user_agent,%u"

# Conversion
for item in $conversion_table; do
    nginx_var=${item%%,*}
    goaccess_var=${item##*,}
    goaccess_var=${goaccess_var//_/ }
    log_format=${log_format//\$\{$nginx_var\}/$goaccess_var}
    log_format=${log_format//\$$nginx_var/$goaccess_var}
done
log_format=$(echo "$log_format" | sed 's/${[a-z_]*}/%^/g')
log_format=$(echo "$log_format" | sed 's/$[a-z_]*/%^/g')

# Config output
echo "
- Generated goaccess config:
time-format %T
date-format %d/%b/%Y
log_format $log_format
"

# EOF
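
The core of the script is its conversion loop, which relies on three bash parameter expansions. The following minimal sketch replays one iteration by hand for a single table entry; the fmt variable and its sample value are made up for illustration.

```shell
#!/bin/bash
# One entry from the conversion table: "<nginx variable>,<goaccess specifier>".
item="time_local,%d:%t_%^"

nginx_var=${item%%,*}               # remove the longest ",..." suffix  -> time_local
goaccess_var=${item##*,}            # remove the longest "...," prefix  -> %d:%t_%^
goaccess_var=${goaccess_var//_/ }   # underscores stand in for spaces   -> %d:%t %^

# Replace every literal "$time_local" in a sample format string.
fmt='[$time_local] $status'
fmt=${fmt//\$$nginx_var/$goaccess_var}
echo "$fmt"
```

The table writes the space as an underscore because the loop iterates with `for item in $conversion_table`, which splits on whitespace; a real space inside an entry would cut it in two. Variables that are not in the table, such as $status in this single step, are left alone here; in the full script they are either handled by a later table entry or turned into %^ by the final sed commands.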

The log_format in this server's nginx configuration file is shown below. The string passed to the conversion script must match the log_format actually in use on your server.

      log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $upstream_addr $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';
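
Because nginx lets the format be split over several quoted chunks, it has to be joined into one string before it is passed to the script. The sketch below does that with awk. The extract_log_format function is a helper written for this example, the sample config is a temporary file, and format strings containing escaped single quotes are not handled.

```shell
#!/bin/bash
# Sample config for the demo; on a real server point conf_file at nginx.conf.
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
http {
      log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $upstream_addr $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';
}
EOF

extract_log_format() {
    # $1 = nginx config file, $2 = name of the log_format to extract
    awk -v name="$2" -v q="'" '
        $1 == "log_format" && $2 == name { grab = 1 }
        grab {
            line = $0
            # Append the contents of every single-quoted chunk on this line.
            while (match(line, q "[^" q "]*" q)) {
                out = out substr(line, RSTART + 1, RLENGTH - 2)
                line = substr(line, RSTART + RLENGTH)
            }
            # The directive ends at the terminating semicolon.
            if ($0 ~ /;[ \t]*$/) { print out; exit }
        }
    ' "$1"
}

nginx_format=$(extract_log_format "$conf_file" main)
echo "$nginx_format"
rm -f "$conf_file"
```

The printed line is the single-string form of the format, ready to be passed in single quotes to nginx2goaccess.sh.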

Generate the GoAccess log format

[root@VM_0_26_centos logs]# sh nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $upstream_addr $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'

- Generated goaccess config:
time-format %T
date-format %d/%b/%Y
log_format %h - %^ [%d:%t %^] "%r" %s %^ %b "%R" "%u" "%^"
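
To see what each specifier in the generated format consumes, the sketch below matches a made-up log line against a bash regular expression with the same shape. This is only an analogy for reading the format, not how GoAccess parses logs; the sample line, including the upstream address, is invented.

```shell
#!/bin/bash
# Invented sample line in the nginx format used above.
line='203.0.113.5 - - [05/Jun/2016:16:01:02 +0800] "GET /index.html HTTP/1.1" 200 127.0.0.1:8080 612 "-" "Mozilla/5.0" "-"'

# Capturing groups mirror %h, %d, %t, %r, %s, %b, %R, %u;
# the non-capturing parts mirror the %^ (ignore) tokens.
re='^([^ ]+) - [^ ]+ \[([^:]+):([^ ]+) [^]]+\] "([^"]*)" ([0-9]+) [^ ]+ ([0-9]+) "([^"]*)" "([^"]*)" "[^"]*"$'

if [[ $line =~ $re ]]; then
    parsed_host=${BASH_REMATCH[1]}      # %h
    parsed_date=${BASH_REMATCH[2]}      # %d, described by date-format %d/%b/%Y
    parsed_time=${BASH_REMATCH[3]}      # %t, described by time-format %T
    parsed_request=${BASH_REMATCH[4]}   # %r
    parsed_status=${BASH_REMATCH[5]}    # %s
    parsed_bytes=${BASH_REMATCH[6]}     # %b
    echo "host=$parsed_host date=$parsed_date time=$parsed_time"
    echo "request=$parsed_request status=$parsed_status bytes=$parsed_bytes"
else
    echo "line does not match the format"
fi
```

Note how $remote_user, the time zone, $upstream_addr, and $http_x_forwarded_for are all skipped, just as their %^ tokens tell GoAccess to ignore them.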

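Set log format

Copy the generated settings into the GoAccess configuration file. GoAccess expects the key log-format (with a hyphen, matching the --log-format option), so rename the log_format key that the script prints.

[root@VM_0_26_centos logs]# cat /etc/goaccess/goaccess.conf
time-format %T
date-format %d/%b/%Y
log-format %h - %^ [%d:%t %^] "%r" %s %^ %b "%R" "%u" "%^"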
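
If you prefer a standalone configuration file over the global one, the same three settings can be written with a here-document. This sketch assumes the file name nginxlog.conf, which is the name passed to -p in the report step. Configuration keys follow the long option names from the help output, which is why the key is written as log-format.

```shell
#!/bin/bash
conf_path="./nginxlog.conf"   # assumed name, matching the -p argument of the report step

# The quoted 'EOF' keeps the shell from expanding anything inside the block.
cat > "$conf_path" <<'EOF'
time-format %T
date-format %d/%b/%Y
log-format %h - %^ [%d:%t %^] "%r" %s %^ %b "%R" "%u" "%^"
EOF

# Sanity check: exactly one log-format line should be present.
grep -c '^log-format ' "$conf_path"
```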

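Generate the analysis report

In the command below, -f is the input log file, -p is the configuration file that holds the format settings (here a local file named nginxlog.conf, presumably with the same three settings as above), and -o is the output file; the .html extension selects an HTML report.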

[root@VM_0_26_centos logs]# goaccess -f ./nginx_access.log -p ./nginxlog.conf -o day-report.html
[root@VM_0_26_centos logs]# ls
day-report.html      nginx_access.log             nginx2goaccess.sh           nginxlog.conf
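
For repeated use, the report command can be wrapped in a small function that checks its inputs first. This is a minimal sketch; the generate_report name and its return codes are choices made for this example, while the goaccess flags are the ones used above.

```shell
#!/bin/bash
generate_report() {
    local log_file=$1 conf_file=$2 out_file=$3

    # Fail early with a clear message instead of a goaccess parse error.
    [ -r "$log_file" ]  || { echo "cannot read log: $log_file" >&2; return 1; }
    [ -r "$conf_file" ] || { echo "cannot read config: $conf_file" >&2; return 1; }
    command -v goaccess >/dev/null 2>&1 || { echo "goaccess is not installed" >&2; return 2; }

    goaccess -f "$log_file" -p "$conf_file" -o "$out_file"
}

generate_report ./nginx_access.log ./nginxlog.conf day-report.html \
    || echo "report not generated"
```

A wrapper like this is convenient to call from cron, since a missing log or config produces a readable message rather than a silent failure.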

View the report

Open day-report.html in a browser to view the report.

Origin blog.51cto.com/jerrymin/2535193