Introduction to GoAccess
GoAccess is an open source (MIT licensed) real-time web log analyzer with an interactive viewer that runs in a terminal on *nix systems or in your web browser.
It gives system administrators fast and valuable HTTP statistics and can present them as a visual report served online. GoAccess parses the specified web log file and outputs the results to the terminal. Its panels are as follows:
- General Statistics : This panel shows the main indicators: the number of valid and invalid requests, the time taken to analyze the data, unique visitors, requested files, static files (CSS, ICO, JPG, etc.), referrer URLs, 404 errors, the size of the parsed log file, and the bandwidth consumed.
- Unique visitors : This panel shows hits, unique visitors, and cumulative bandwidth per date. HTTP requests with the same IP, the same date, and the same user agent are counted as one unique visitor. Web crawlers are included by default.
You can use the --date-spec=hr option to change the granularity from date to hour, for example: 05/Jun/2016:16. This is helpful if you want to track daily traffic at the hourly level.
- Requested files : This panel shows the most requested files on your server, including hits, unique visitors, percentage, cumulative bandwidth, the protocol used, and the request method.
- Requested static files : Lists the most frequently requested static files, such as JPG, CSS, SWF, JS, GIF, and PNG files, with the same metrics as the previous panel. Additional static file extensions can be added in the configuration file.
- 404 or file not found : Shows the same metrics as the previous panels, but only for requests that were not found, commonly known as the 404 status code.
- Host : This panel shows detailed information about the hosts themselves. It is a good way to spot aggressive crawlers and identify who is eating your bandwidth.
Expanding the panel shows more information, such as the host's reverse DNS lookup result, country, and city. If the -a (--agent-list) option is enabled, select an IP address and press Enter to display its list of user agents.
- Operating System : This panel shows the operating systems used by the hosts. GoAccess tries to provide as much detail as it can for each operating system.
- Browser : This panel shows the browsers used by the visiting hosts. GoAccess tries to provide as much detail as it can for each browser.
- Visits : This panel reports hits by hour of the day, so 24 data points are shown, one per hour.
The --hour-spec=min option can be set to report in ten-minute intervals, with the time shown in a format such as 16:4. This is helpful for finding the peak access periods of the server.
- Virtual Host : This panel shows the virtual hosts parsed from the access log. It is only displayed when %v is used in the log format.
- Origin URL : If a host reached your site through another resource, such as a link on another host or a redirect, the URL it came from is shown in this panel. This panel is off by default; see --ignore-panel in the configuration file to turn it on.
- Incoming site : This panel shows only the host part of the referrer instead of the full URL.
- Keywords : Reports the key phrases used on Google search, Google cache, and Google Translate that led to your site. Currently only Google search queries over HTTP are supported. This panel is off by default; see --ignore-panel in the configuration file to turn it on.
- Geographic location : Determines the geographic location from the IP address. The statistics are grouped by continent and country. This requires geolocation (GeoIP) support.
- HTTP status code : The status code of the HTTP request expressed as a number.
- Remote user (HTTP authentication) : The user ID of the person requesting the document, as determined by HTTP authentication. If the document is not password protected, this field is "-". This panel is only enabled when %e is given in the log format.
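The panel definitions above can be approximated with standard tools. The following is a rough sketch, not how GoAccess computes its panels: it assumes the common combined log format, the three sample log entries are invented, and the function names are hypothetical.

```shell
#!/bin/bash
# Rough approximation of three panels with awk; not GoAccess's implementation.
# Assumes the combined log format. The sample entries below are invented.

sample_log=$(mktemp)
trap 'rm -f "$sample_log"' EXIT
cat > "$sample_log" <<'EOF'
10.0.0.1 - - [05/Jun/2016:16:01:02 +0800] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
10.0.0.1 - - [05/Jun/2016:16:45:10 +0800] "GET /a.css HTTP/1.1" 200 128 "-" "Mozilla/5.0"
10.0.0.2 - - [05/Jun/2016:17:03:30 +0800] "GET / HTTP/1.1" 404 64 "-" "curl/7.29.0"
EOF

# Unique visitors per day: distinct (IP, date, user agent) combinations.
unique_visitors() {
  awk -F'"' '{
    split($1, head, " ")            # head[1] = IP, head[4] = "[05/Jun/2016:16:01:02"
    day = substr(head[4], 2, 11)    # "05/Jun/2016"
    key = head[1] SUBSEP day SUBSEP $6
    if (!(key in seen)) { seen[key] = 1; count[day]++ }
  } END { for (d in count) print d, count[d] }' "$1"
}

# Hits per hour of the day (what the hourly panel reports).
hits_per_hour() {
  awk '{ split($4, t, ":"); hits[t[2]]++ }
       END { for (h in hits) print h, hits[h] }' "$1" | sort
}

# Bytes sent per client IP, largest first (who is eating the bandwidth).
bandwidth_by_host() {
  awk -F'"' '{ split($1, head, " "); split($3, tail, " "); bytes[head[1]] += tail[2] }
       END { for (ip in bytes) print bytes[ip], ip }' "$1" | sort -rn
}

unique_visitors "$sample_log"     # prints: 05/Jun/2016 2
hits_per_hour "$sample_log"       # prints: 16 2, then 17 1
bandwidth_by_host "$sample_log"   # prints: 640 10.0.0.1, then 64 10.0.0.2
```

The first two requests share an IP, date, and user agent, so they count as one visitor even though they are two hits.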
Using GoAccess
Install GoAccess
[root@VM_0_26_centos logs]# yum install goaccess
Loaded plugins: fastestmirror, langpacks
Repository epel is listed more than once in the configuration
epel | 4.7 kB 00:00:00
extras | 2.9 kB 00:00:00
nux-dextop | 2.9 kB 00:00:00
os | 3.6 kB 00:00:00
rpmfusion-free-updates | 3.7 kB 00:00:00
rpmfusion-nonfree-updates | 3.7 kB 00:00:00
updates | 2.9 kB 00:00:00
zabbix | 2.9 kB 00:00:00
zabbix-non-supported | 951 B 00:00:00
(1/2): epel/7/x86_64/updateinfo | 1.0 MB 00:00:00
(2/2): epel/7/x86_64/primary_db | 6.9 MB 00:00:02
Loading mirror speeds from cached hostfile
* nux-dextop: mirror.li.nux.ro
* rpmfusion-free-updates: mirrors.ustc.edu.cn
* rpmfusion-nonfree-updates: mirrors.ustc.edu.cn
Resolving Dependencies
--> Running transaction check
---> Package goaccess.x86_64 0:1.3-1.el7 will be installed
--> Processing Dependency: libtokyocabinet.so.9()(64bit) for package: goaccess-1.3-1.el7.x86_64
--> Running transaction check
---> Package tokyocabinet.x86_64 0:1.4.48-3.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
===========================================================================================
Package Arch Version Repository Size
===========================================================================================
Installing:
goaccess x86_64 1.3-1.el7 epel 240 k
Installing for dependencies:
tokyocabinet x86_64 1.4.48-3.el7 os 459 k
Transaction Summary
===========================================================================================
Install 1 Package (+1 Dependent package)
Total download size: 699 k
Installed size: 2.0 M
Is this ok [y/d/N]: y
Downloading packages:
(1/2): goaccess-1.3-1.el7.x86_64.rpm | 240 kB 00:00:00
(2/2): tokyocabinet-1.4.48-3.el7.x86_64.rpm | 459 kB 00:00:00
-------------------------------------------------------------------------------------------
Total 1.3 MB/s | 699 kB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : tokyocabinet-1.4.48-3.el7.x86_64 1/2
Installing : goaccess-1.3-1.el7.x86_64 2/2
Verifying : tokyocabinet-1.4.48-3.el7.x86_64 1/2
Verifying : goaccess-1.3-1.el7.x86_64 2/2
Installed:
goaccess.x86_64 0:1.3-1.el7
Dependency Installed:
tokyocabinet.x86_64 0:1.4.48-3.el7
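After the installation it can be useful to confirm that the binary is reachable. A minimal sketch, assuming goaccess was installed into a directory on PATH; the function name check_goaccess is hypothetical:

```shell
#!/bin/bash
# Report whether the goaccess binary is on PATH and, if so, print its version line.
check_goaccess() {
  if command -v goaccess >/dev/null 2>&1; then
    goaccess --version | head -n 1
  else
    echo "goaccess not found in PATH"
    return 1
  fi
}

check_goaccess || echo "install it first, e.g. with: yum install goaccess"
```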
View the usage help
[root@VM_0_26_centos logs]# goaccess -help
GoAccess - 1.3
Usage: goaccess [filename] [ options ... ] [-c][-M][-H][-S][-q][-d][...]
The following options can also be supplied to the command:
Log & Date Format Options
--date-format=<dateformat> - Specify log date format. e.g., %d/%b/%Y
--log-format=<logformat> - Specify log format. Inner quotes need to be
escaped, or use single quotes.
--time-format=<timeformat> - Specify log time format. e.g., %H:%M:%S
User Interface Options
-c --config-dialog - Prompt log/date/time configuration window.
-i --hl-header - Color highlight active panel.
-m --with-mouse - Enable mouse support on main dashboard.
--color=<fg:bg[attrs, PANEL]> - Specify custom colors. See manpage for more
details and options.
--color-scheme=<1|2|3> - Schemes: 1 => Grey, 2 => Green, 3 => Monokai.
--html-custom-css=<path.css> - Specify a custom CSS file in the HTML report.
--html-custom-js=<path.js> - Specify a custom JS file in the HTML report.
--html-prefs=<json_obj> - Set default HTML report preferences.
--html-report-title=<title> - Set HTML report page title and header.
--json-pretty-print - Format JSON output w/ tabs & newlines.
--max-items - Maximum number of items to show per panel.
See man page for limits.
--no-color - Disable colored output.
--no-column-names - Don't write column names in term output.
--no-csv-summary - Disable summary metrics on the CSV output.
--no-html-last-updated - Hide HTML last updated field.
--no-parsing-spinner - Disable progress metrics and parsing spinner.
--no-progress - Disable progress metrics.
--no-tab-scroll - Disable scrolling through panels on TAB.
Server Options
--addr=<addr> - Specify IP address to bind server to.
--daemonize - Run as daemon (if --real-time-html enabled).
--fifo-in=<path> - Path to read named pipe (FIFO).
--fifo-out=<path> - Path to write named pipe (FIFO).
--origin=<addr> - Ensure clients send the specified origin header
upon the WebSocket handshake.
--pid-file=<path> - Write PID to a file when --daemonize is used.
--port=<port> - Specify the port to use.
--real-time-html - Enable real-time HTML output.
--ssl-cert=<cert.crt> - Path to TLS/SSL certificate.
--ssl-key=<priv.key> - Path to TLS/SSL private key.
--ws-url=<url> - URL to which the WebSocket server responds.
File Options
- - The log file to parse is read from stdin.
-f --log-file=<filename> - Path to input log file.
-S --log-size=<number> - Specify the log size, useful when piping in logs.
-l --debug-file=<filename> - Send all debug messages to the specified
file.
-p --config-file=<filename> - Custom configuration file.
--invalid-requests=<filename> - Log invalid requests to the specified file.
--no-global-config - Don't load global configuration file.
Parse Options
-a --agent-list - Enable a list of user-agents by host.
-b --browsers-file=<path> - Use additional custom list of browsers.
-d --with-output-resolver - Enable IP resolver on HTML|JSON output.
-e --exclude-ip=<IP> - Exclude one or multiple IPv4/6. Allows IP
ranges e.g. 192.168.0.1-192.168.0.10
-H --http-protocol=<yes|no> - Set/unset HTTP request protocol if found.
-M --http-method=<yes|no> - Set/unset HTTP request method if found.
-o --output=file.html|json|csv - Output either an HTML, JSON or a CSV file.
-q --no-query-string - Ignore request's query string. Removing the
query string can greatly decrease memory
consumption.
-r --no-term-resolver - Disable IP resolver on terminal output.
--444-as-404 - Treat non-standard status code 444 as 404.
--4xx-to-unique-count - Add 4xx client errors to the unique visitors
count.
--anonymize-ip - Anonymize IP addresses before outputting to report.
--all-static-files - Include static files with a query string.
--crawlers-only - Parse and display only crawlers.
--date-spec=<date|hr> - Date specificity. Possible values: `date`
(default), or `hr`.
--double-decode - Decode double-encoded values.
--enable-panel=<PANEL> - Enable parsing/displaying the given panel.
--hide-referer=<NEEDLE> - Hide a referer but still count it. Wild cards
are allowed. i.e., *.bing.com
--hour-spec=<hr|min> - Hour specificity. Possible values: `hr`
(default), or `min` (tenth of a min).
--ignore-crawlers - Ignore crawlers.
--ignore-panel=<PANEL> - Ignore parsing/displaying the given panel.
--ignore-referer=<NEEDLE> - Ignore a referer from being counted. Wild cards
are allowed. i.e., *.bing.com
--ignore-statics=<req|panel> - Ignore static requests.
req => Ignore from valid requests.
panel => Ignore from valid requests and panels.
--ignore-status=<CODE> - Ignore parsing the given status code.
--num-tests=<number> - Number of lines to test. >= 0 (10 default)
--process-and-exit - Parse log and exit without outputting data.
--real-os - Display real OS names. e.g, Windows XP, Snow
Leopard.
--sort-panel=PANEL,METRIC,ORDER - Sort panel on initial load. For example:
--sort-panel=VISITORS,BY_HITS,ASC. See
manpage for a list of panels/fields.
--static-file=<extension> - Add static file extension. e.g.: .mp3.
Extensions are case sensitive.
GeoIP Options
-g --std-geoip - Standard GeoIP database for less memory
consumption.
--geoip-database=<path> - Specify path to GeoIP database file. i.e.,
GeoLiteCity.dat, GeoIPv6.dat ...
Other Options
-h --help - This help.
-V --version - Display version information and exit.
-s --storage - Display current storage method. e.g., B+
Tree, Hash.
--dcf - Display the path of the default config
file when `-p` is not used.
Examples can be found by running `man goaccess`.
For more details visit: http://goaccess.io
GoAccess Copyright (C) 2009-2017 by Gerardo Orellana
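Several of the options listed above can be combined in one invocation. The sketch below builds an hourly report that ignores crawlers, using only flags that appear in the help output. The file paths are placeholders borrowed from later in this article, and the function name hourly_report is hypothetical. If goaccess or the input files are missing, it only prints the command it would run.

```shell
#!/bin/bash
# Sketch: HTML report with hour-level dates, ten-minute time buckets, no crawlers.
hourly_report() {
  log_file=$1; conf_file=$2; report=$3
  # Reuse the positional parameters to hold the goaccess arguments.
  set -- -f "$log_file" -p "$conf_file" --date-spec=hr --hour-spec=min --ignore-crawlers -o "$report"
  if command -v goaccess >/dev/null 2>&1 && [ -r "$log_file" ] && [ -r "$conf_file" ]; then
    goaccess "$@"
  else
    echo "would run: goaccess $*"
  fi
}

hourly_report ./nginx_access.log ./nginxlog.conf ./hourly-report.html
```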
Get the Nginx log format conversion script
The script is available at https://github.com/stockrt/nginx2goaccess/blob/master/nginx2goaccess.sh and its content is as follows:
[root@VM_0_26_centos logs]# cat nginx2goaccess.sh
#!/bin/bash
#
# Convert from this:
# http://nginx.org/en/docs/http/ngx_http_log_module.html
# To this:
# https://goaccess.io/man
#
# Conversion table:
# $time_local %d:%t %^
# $host %v
# $http_host %v
# $remote_addr %h
# $request_time %T
# $request_method %m
# $request_uri %U
# $server_protocol %H
# $request %r
# $status %s
# $body_bytes_sent %b
# $bytes_sent %b
# $http_referer %R
# $http_user_agent %u
#
# Samples:
#
# log_format combined '$remote_addr - $remote_user [$time_local] '
# '"$request" $status $body_bytes_sent '
# '"$http_referer" "$http_user_agent"';
# ./nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
#
# log_format compression '$remote_addr - $remote_user [$time_local] '
# '"$request" $status $bytes_sent '
# '"$http_referer" "$http_user_agent" "$gzip_ratio"';
# ./nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent "$http_referer" "$http_user_agent" "$gzip_ratio"'
#
# log_format main
# '$remote_addr\t$time_local\t$host\t$request\t$http_referer\t$http_x_mobile_group\t'
# 'Local:\t$status\t$body_bytes_sent\t$request_time\t'
# 'Proxy:\t$upstream_cache_status\t$upstream_status\t$upstream_response_length\t$upstream_response_time\t'
# 'Agent:\t$http_user_agent\t'
# 'Fwd:\t$http_x_forwarded_for';
# ./nginx2goaccess.sh '$remote_addr\t$time_local\t$host\t$request\t$http_referer\t$http_x_mobile_group\tLocal:\t$status\t$body_bytes_sent\t$request_time\tProxy:\t$upstream_cache_status\t$upstream_status\t$upstream_response_length\t$upstream_response_time\tAgent:\t$http_user_agent\tFwd:\t$http_x_forwarded_for'
#
# log_format main
# '${time_local}\t${remote_addr}\t${host}\t${request_method}\t${request_uri}\t${server_protocol}\t'
# '${http_referer}\t${http_x_mobile_group}\t'
# 'Local:\t${status}\t*${connection}\t${body_bytes_sent}\t${request_time}\t'
# 'Proxy:\t${upstream_status}\t${upstream_cache_status}\t'
# '${upstream_response_length}\t${upstream_response_time}\t${uri}${log_args}\t'
# 'Agent:\t${http_user_agent}\t'
# 'Fwd:\t${http_x_forwarded_for}';
# ./nginx2goaccess.sh '${time_local}\t${remote_addr}\t${host}\t${request_method}\t${request_uri}\t${server_protocol}\t${http_referer}\t${http_x_mobile_group}\tLocal:\t${status}\t*${connection}\t${body_bytes_sent}\t${request_time}\tProxy:\t${upstream_status}\t${upstream_cache_status}\t${upstream_response_length}\t${upstream_response_time}\t${uri}${log_args}\tAgent:\t${http_user_agent}\tFwd:\t${http_x_forwarded_for}'
#
# Author: Rogério Carvalho Schneider <[email protected]>
# Params
log_format="$1"
# Usage
if [[ -z "$log_format" ]]; then
    echo "Usage: $0 '<log_format>'"
    exit 1
fi
# Variables map
conversion_table="time_local,%d:%t_%^
host,%v
http_host,%v
remote_addr,%h
request_time,%T
request_method,%m
request_uri,%U
server_protocol,%H
request,%r
status,%s
body_bytes_sent,%b
bytes_sent,%b
http_referer,%R
http_user_agent,%u"
# Conversion
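for item in $conversion_table; do
    # Each table entry has the form "nginx_var,goaccess_specifier"
    nginx_var=${item%%,*}
    goaccess_var=${item##*,}
    # Underscores in the specifier stand for spaces (e.g. %d:%t_%^)
    goaccess_var=${goaccess_var//_/ }
    # Replace both the ${var} and the $var spelling
    log_format=${log_format//\$\{$nginx_var\}/$goaccess_var}
    log_format=${log_format//\$$nginx_var/$goaccess_var}
done
# Any nginx variable left over has no GoAccess equivalent, so skip it with %^
log_format=$(echo "$log_format" | sed 's/${[a-z_]*}/%^/g')
log_format=$(echo "$log_format" | sed 's/$[a-z_]*/%^/g')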
# Config output
echo "
- Generated goaccess config:
time-format %T
date-format %d/%b/%Y
log_format $log_format
"
# EOF
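The order of the conversion matters: known variables are replaced first, and whatever is left becomes %^. Longer names such as $request_time must also be handled before their prefix $request. A small sketch of that idea follows. It uses sed instead of the script's bash parameter expansion, covers only three variables, and the function name convert is hypothetical.

```shell
#!/bin/bash
# Simplified re-implementation of the conversion idea with sed (three variables only).
convert() {
  printf '%s\n' "$1" | sed \
    -e 's/\$request_time/%T/g' \
    -e 's/\$request/%r/g' \
    -e 's/\$remote_addr/%h/g' \
    -e 's/\$[a-z_]*/%^/g'
}

convert '$remote_addr "$request" $request_time $upstream_addr'
# prints: %h "%r" %T %^
```

If the $request rule ran before the $request_time rule, the latter would be turned into %r_time, which is why the script's conversion table lists request_time before request.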
Note that the conversion must match the log_format actually used in your nginx configuration file. In this article it is as follows:
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $upstream_addr $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
Generate the GoAccess log format
[root@VM_0_26_centos logs]# sh nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $upstream_addr $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'
- Generated goaccess config:
time-format %T
date-format %d/%b/%Y
log_format %h - %^ [%d:%t %^] "%r" %s %^ %b "%R" "%u" "%^"
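To see how the generated specifiers line up with a real entry, the sketch below splits one log line by hand and labels each piece with the specifier that would capture it. This is only an illustration, not GoAccess's parser. The sample line is invented to match the nginx log_format shown above, and the function name explain_line is hypothetical.

```shell
#!/bin/bash
# Label the fields of one log line with the GoAccess specifiers from the generated format:
#   %h - %^ [%d:%t %^] "%r" %s %^ %b "%R" "%u" "%^"
sample='203.0.113.7 - - [05/Jun/2016:16:01:02 +0800] "GET /index.html HTTP/1.1" 200 127.0.0.1:8080 612 "-" "Mozilla/5.0" "-"'

explain_line() {
  printf '%s\n' "$1" | awk -F'"' '{
    split($1, head, " ")             # IP, "-", remote user, "[date:time", "zone]"
    split($3, mid, " ")              # status, upstream address, bytes
    stamp = substr(head[4], 2)       # drop the leading "["
    print "%h=" head[1]
    print "%d=" substr(stamp, 1, 11)
    print "%t=" substr(stamp, 13)
    print "%r=" $2
    print "%s=" mid[1]
    print "%b=" mid[3]
    print "%R=" $4
    print "%u=" $6
  }'
}

explain_line "$sample"
```

The remote user, the time zone, $upstream_addr, and $http_x_forwarded_for are the fields skipped by %^ in the generated format, so they are not printed.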
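Set the log format
Copy the generated settings into the GoAccess configuration file. The script prints the key as log_format, but the GoAccess option is named log-format (see --log-format in the help output above), so write it with a hyphen.
[root@VM_0_26_centos logs]# cat /etc/goaccess/goaccess.conf
time-format %T
date-format %d/%b/%Y
log-format %h - %^ [%d:%t %^] "%r" %s %^ %b "%R" "%u" "%^"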
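The report command in this article reads its settings from ./nginxlog.conf via -p, a file whose contents are not shown. Here is a minimal sketch, assuming that file simply holds the three generated settings. The key is spelled log-format to match the --log-format option from the help output.

```shell
#!/bin/bash
# Write a standalone GoAccess config file (assumed content; adjust the format to your logs).
conf_file="./nginxlog.conf"

cat > "$conf_file" <<'EOF'
time-format %T
date-format %d/%b/%Y
log-format %h - %^ [%d:%t %^] "%r" %s %^ %b "%R" "%u" "%^"
EOF

echo "wrote $conf_file"
```

The quoted heredoc delimiter ('EOF') keeps the shell from interpreting the quotes and percent signs in the format string.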
Generate analysis report
[root@VM_0_26_centos logs]# goaccess -f ./nginx_access.log -p ./nginxlog.conf -o day-report.html
[root@VM_0_26_centos logs]# ls
day-report.html nginx_access.log nginx2goaccess.sh nginxlog.conf
View the report
Open day-report.html in a browser. The result looks as follows: