1. View the logstash pipeline configuration file logstash.conf
cd ~/elk/logstash/pipeline/
cat logstash.conf
The default input in the configuration file is beats.
Beats is a core component of the ELK Stack: a collective name for a family of lightweight data shippers. The Beats currently listed on the official website are:
Filebeat: a lightweight shipper for logs and other data
Metricbeat: a lightweight metrics collector
Packetbeat: a lightweight network data collector
Winlogbeat: a lightweight Windows event log collector
Auditbeat: a lightweight audit data collector
Heartbeat: a lightweight collector for uptime monitoring
The Beat used to build this log system is Filebeat; the configuration that connects Filebeat to Logstash was covered in "Building a simple ELK log system with Docker (4)".
The default output in the configuration file is stdout (standard output), i.e. the data sent by Beats is printed to the console by default.
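Putting those defaults together, the shipped logstash.conf is likely close to the following sketch (the beats port 5044 is taken from the configuration shown later in this article; the exact contents depend on the image version):

```conf
# Default-style pipeline: receive events from Beats, print them to stdout
input {
  beats {
    port => 5044
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
```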
2. Modify the default output of logstash
The logs collected by filebeat are sent to logstash; after logstash processes them, they must be sent to elasticsearch so that the logs are persisted to disk for later viewing and analysis.
vim ~/elk/logstash/pipeline/logstash.conf
Add the following inside the output section:
elasticsearch {
hosts => ["https://192.168.182.128:9200"]
index => "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"
cacert => "/usr/share/logstash/config/certs/http_ca.crt"
user => "elastic"
password => "+4TMJBgOpjdgH+1MJ0nC"
}
Notes on the settings above (each matches the corresponding configuration in logstash.yml):
hosts: the address of elasticsearch, using https
index: all data stored in elasticsearch needs an index ([fields][env] and [fields][application] are custom fields added in the filebeat configuration)
cacert: the elasticsearch certificate
user: the elasticsearch account
password: the password for the user above
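For reference, [fields][env] and [fields][application] come from the filebeat side. A minimal, assumed filebeat.yml fragment that sets such custom fields (the log path is hypothetical) could look like:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log   # hypothetical path
    fields:                  # custom fields; nested under "fields." by default
      env: dev
      application: test
```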
3. Restart logstash
docker restart logstash
4. Generate logs for filebeat to collect
An error log is generated here
View the logstash log
docker logs -f --tail 200 logstash
5. Log in to kibana to see if the index is generated
An index named dev-test-2023.04.18 is generated here;
this index is generated by the index configuration in logstash.conf
index => "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"
%{[fields][env]} and %{[fields][application]} take the fields.env and fields.application fields from the JSON data respectively; fields.env and fields.application are set in the filebeat configuration.
%{+YYYY.MM.dd} takes the current date; dynamically including the date in the index name means each day's logs correspond to a separate index in elasticsearch.
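As a rough illustration only (not Logstash's actual implementation), the sprintf-style index name expands like this in plain Ruby, using the dev/test values from this article:

```ruby
# Simulate Logstash's sprintf expansion of
# "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"
fields = { 'env' => 'dev', 'application' => 'test' }  # set in filebeat
date   = Time.now.strftime('%Y.%m.%d')                # %{+YYYY.MM.dd}
index  = "#{fields['env']}-#{fields['application']}-#{date}"
puts index  # e.g. dev-test-2023.04.18
```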
6. View logs in Kibana
The data view name uses dev-test-* to match all indices starting with dev-test-.
Each document here corresponds to one generated log.
Besides the log itself, a document may contain many fields we do not care about; on the left you can filter out the fields you want to display.
Find the message field and add it.
The content displayed now is the generated log.
The generated log is an error log with a lot of content; it is shown folded rather than in full. Set the log line width to adaptive to see the whole log.
7. Add an action setting to the output configuration in logstash.conf
If you configure action => "create", a data stream is generated. The backing indices under the data stream are the real indices, and the data stream works somewhat like an index alias. The main reason to use a data stream is to make it easy to set a lifecycle for the indices: logs are generated continuously, but disk resources are limited, so the indices need a lifecycle that automatically deletes old logs to make room for new ones. How to set the index lifecycle will be covered later.
Data stream names generally follow the format "logs-<application>-<environment>".
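Once the data stream exists, it can be inspected from Kibana Dev Tools with the get data stream API; the name below assumes the application test and environment dev used in this article:

```
GET _data_stream/logs-test-dev
```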
Modify the configuration file:
input {
beats {
port => 5044
}
}
output {
elasticsearch {
hosts => ["https://192.168.182.128:9200"]
#index => "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"
index => "logs-%{[fields][application]}-%{[fields][env]}"
action => "create"
cacert => "/usr/share/logstash/config/certs/http_ca.crt"
user => "elastic"
password => "+4TMJBgOpjdgH+1MJ0nC"
}
stdout {
codec => rubydebug
}
}
8. Restart logstash, generate new logs, and view logstash logs
docker restart logstash
docker logs -f logstash
The application's log output
The logstash log
The data stream appears in elasticsearch
9. Create a data view following the earlier steps
Viewing the data in Discover, the three generated logs reveal a problem: the default sort is in reverse order by the @timestamp field, but relative to the order in which the logs were generated, the first two appear in forward order while they and the third log are in reverse order, so the logs can look jumbled.
The reason is that @timestamp is the time when filebeat collected the log, and filebeat does not collect one log per log the application generates; it collects them batch by batch. The first two logs were collected in the same batch, and es cannot distinguish the order of logs within the same batch; the time inside the log line is the real generation time, while @timestamp is only the collection time.
So the problem to solve is extracting the real generation time from the log and writing it into the @timestamp field. To extract the time from the log, the log must be split into fields.
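The ordering problem can be sketched in a few lines of Ruby (toy data; the timestamps are made up): two logs collected in the same filebeat batch share a collection timestamp, so sorting by it cannot recover their real order, while the time inside the log line can:

```ruby
logs = [
  { 'message' => 'step 1', 'log_time' => '17:09:36.100', '@timestamp' => '17:09:40.000' },
  { 'message' => 'step 2', 'log_time' => '17:09:36.200', '@timestamp' => '17:09:40.000' }, # same batch
  { 'message' => 'step 3', 'log_time' => '17:09:37.300', '@timestamp' => '17:09:41.000' },
]

# 'step 1' and 'step 2' tie on @timestamp: their relative order is undefined
by_collection = logs.sort_by { |l| l['@timestamp'] }.map { |l| l['message'] }
# the real generation time orders all three logs unambiguously
by_log_time = logs.sort_by { |l| l['log_time'] }.map { |l| l['message'] }
```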
10. Log field splitting
Java logs generally have a fixed format; according to that format, each log can be split into a fixed set of fields, much like columns in a relational database. Take the following log as an example:
2023-04-21 17:09:36.394 ERROR [xcoa,,] 23072 --- [ main] com.zaxxer.hikari.pool.HikariPool : HikariPool-2 - Exception during pool initialization.
2023-04-21 17:09:36.394 => log generation time
ERROR => log level
[xcoa,,] => context and other information
23072 => process id
[main] => thread name
com.zaxxer.hikari.pool.HikariPool => class name
HikariPool-2 - Exception during pool initialization. => log content
For this log, logstash can match it with the following grok expression:
(?m)^%{TIMESTAMP_ISO8601:log-time}\s+%{LOGLEVEL:log-level}\s+\[%{DATA:application}\]\s+%{INT:pid}\s+---\s+\[%{DATA:thread}\]\s+%{DATA:class}\s+:\s+%{GREEDYDATA:log-message}
The split fields are as follows:
log-time: log generation time
log-level: log level
application: context and other information
pid: process id
thread: thread name
class: class name
log-message: log content
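A grok expression is ultimately a named-capture regular expression. As a simplified approximation (the built-in patterns TIMESTAMP_ISO8601, LOGLEVEL, DATA, INT, and GREEDYDATA are hand-expanded here, and underscores replace the hyphens in the field names because Ruby capture names cannot contain hyphens), the match can be reproduced in plain Ruby:

```ruby
# Simplified stand-in for the grok expression above
LOG_RE = /\A(?<log_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+(?<log_level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+\[(?<application>.*?)\]\s+(?<pid>\d+)\s+---\s+\[(?<thread>.*?)\]\s+(?<class>\S+)\s+:\s+(?<log_message>.*)\z/m

line = '2023-04-21 17:09:36.394 ERROR [xcoa,,] 23072 --- ' \
       '[ main] com.zaxxer.hikari.pool.HikariPool : ' \
       'HikariPool-2 - Exception during pool initialization.'

m = LOG_RE.match(line)
```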
The corresponding logstash configuration:
input {
beats {
port => 5044
}
}
filter{
grok{
match => {
"message" => "(?m)^%{TIMESTAMP_ISO8601:log-time}\s+%{LOGLEVEL:log-level}\s+\[%{DATA:application}\]\s+%{INT:pid}\s+---\s+\[%{DATA:thread}\]\s+%{DATA:class}\s+:\s+%{GREEDYDATA:log-message}"
}
}
}
output {
elasticsearch {
hosts => ["https://192.168.182.128:9200"]
#index => "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"
index => "logs-%{[fields][application]}-%{[fields][env]}"
action => "create"
cacert => "/usr/share/logstash/config/certs/http_ca.crt"
user => "elastic"
password => "+4TMJBgOpjdgH+1MJ0nC"
}
stdout {
codec => rubydebug
}
}
11. Log field split effect test
restart logstash
docker restart logstash
Generate logs, then view them in Kibana (the split fields already appear in the field list, and these fields can be used for filtering later).
12. Sort logs by generation time
The log generation time has been split out into the log-time field, so what remains is to overwrite @timestamp with the content of the log-time field.
Add the following to the filter section of logstash.conf:
date {
match => ["log-time", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601"] # convert the source field to a time type
locale => "en"
target => [ "@timestamp" ] # target field
timezone => "Asia/Shanghai"
}
This configuration converts log-time into a time type according to the given formats and then overwrites the @timestamp field with it.
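What the date filter does can be approximated in plain Ruby (a sketch only; the real filter also applies the configured timezone and falls back to the ISO8601 format):

```ruby
require 'time'

log_time = '2023-04-21 17:09:36.394'
# "yyyy-MM-dd HH:mm:ss.SSS" in the date filter corresponds to this strptime format
parsed = Time.strptime(log_time, '%Y-%m-%d %H:%M:%S.%L')
# parsed would then overwrite the event's @timestamp
```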
If you want to keep the original @timestamp, you can copy its value into a new field.
The following configuration saves @timestamp into a new field collection_time:
ruby {
code => "event.set('collection_time', event.get('@timestamp'))"
}
The full configuration is:
input {
beats {
port => 5044
}
}
filter{
grok{
match => {
"message" => "(?m)^%{TIMESTAMP_ISO8601:log-time}\s+%{LOGLEVEL:log-level}\s+\[%{DATA:application}\]\s+%{INT:pid}\s+---\s+\[%{DATA:thread}\]\s+%{DATA:class}\s+:\s+%{GREEDYDATA:log-message}"
}
}
ruby {
code => "event.set('collection_time', event.get('@timestamp'))"
}
date {
match => ["log-time", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601"]
locale => "en"
target => [ "@timestamp" ]
timezone => "Asia/Shanghai"
}
}
output {
elasticsearch {
hosts => ["https://192.168.182.128:9200"]
#index => "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"
index => "logs-%{[fields][application]}-%{[fields][env]}"
action => "create"
cacert => "/usr/share/logstash/config/certs/http_ca.crt"
user => "elastic"
password => "+4TMJBgOpjdgH+1MJ0nC"
}
stdout {
codec => rubydebug
}
}
13. Effect test
restart logstash
docker restart logstash
Generate logs
Check the newly generated logs in Kibana:
You can see that @timestamp on the newly generated logs is now consistent with the log generation time, so using @timestamp for sorting no longer causes out-of-order logs.