Docker builds a simple ELK log system 5 (the Logstash pipeline configuration file logstash.conf)

1. View the logstash pipeline configuration file logstash.conf

cd ~/elk/logstash/pipeline/
cat logstash.conf

insert image description here
The input in the default configuration file is beats;
beats here means Beats, a core component of the Elastic Stack;
Beats are lightweight data shippers, and the name is a collective term for a whole family of Beats. The Beats currently listed on the official website are:

Filebeat: a lightweight shipper for collecting logs and other file data

Metricbeat: a lightweight shipper for metrics

Packetbeat: a lightweight shipper for network data

Winlogbeat: a lightweight shipper for Windows event logs

Auditbeat: a lightweight shipper for audit data

Heartbeat: a lightweight shipper for uptime monitoring
The Beat used to build this log system is Filebeat; the configuration that connects Filebeat to Logstash was covered in "Docker builds a simple elk log system 4".
The output in the default configuration file is stdout (standard output), which means the data sent by Beats is simply printed to the console.
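For reference, the default pipeline shipped with the Logstash Docker image looks roughly like the sketch below (the exact default file in your image may differ slightly):

input {
  beats {
    port => 5044
  }
}

output {
  stdout {
    codec => rubydebug
  }
}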

2. Modify the default output of logstash

The logs collected by Filebeat are sent to Logstash; after Logstash processes them, they should be forwarded to Elasticsearch so that the logs are persisted to disk and can be viewed and analyzed later.

vim ~/elk/logstash/pipeline/logstash.conf

Add the following inside the output section:

  elasticsearch {
    hosts => ["https://192.168.182.128:9200"]
    index => "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"
    cacert => "/usr/share/logstash/config/certs/http_ca.crt"
    user => "elastic"
    password => "+4TMJBgOpjdgH+1MJ0nC"
  }

insert image description here
hosts: the address of Elasticsearch, accessed over https; the same as the value configured in logstash.yml
index: every document stored in Elasticsearch must be written to an index; [fields][env] and [fields][application] are custom fields added in the Filebeat configuration
cacert: the Elasticsearch CA certificate; the same as the one configured in logstash.yml
user: the Elasticsearch account; the same as the one configured in logstash.yml
password: the password for the user above; the same as the one configured in logstash.yml
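Before restarting Logstash, you can sanity-check these connection details by querying Elasticsearch directly from the host. This is only a sketch: the path to http_ca.crt on the host is an assumption and should point to wherever the certificate was copied in the earlier parts of this series.

# verify the address, CA certificate and credentials used in the elasticsearch output
curl --cacert ~/elk/logstash/config/certs/http_ca.crt \
     -u elastic:'+4TMJBgOpjdgH+1MJ0nC' \
     https://192.168.182.128:9200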

3. Restart logstash

docker restart logstash

insert image description here

4. Generate logs for filebeat to collect

An error log is generated here
insert image description here
View the logstash log

docker logs -f --tail 200  logstash

insert image description here

5. Log in to kibana to see if the index is generated

insert image description here
insert image description here
An index named dev-test-2023.04.18 is generated here;
this index is generated by the index configuration in logstash.conf

index => "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"

%{[fields][env]} and %{[fields][application]} take the fields.env and fields.application values from the JSON data; fields.env and fields.application are set in the Filebeat configuration.
insert image description here
%{+YYYY.MM.dd} takes the current date; putting the current date in the index name means that each day's logs get their own index in Elasticsearch.
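For reference, the relevant part of the Filebeat configuration from part 4 would look roughly like the sketch below. The log path and the Logstash address are assumptions; the fields values match the dev-test index seen above.

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/app/*.log          # assumed application log path
    fields:
      env: dev                      # becomes %{[fields][env]} in the index name
      application: test             # becomes %{[fields][application]}

output.logstash:
  hosts: ["192.168.182.128:5044"]   # assumed Logstash address; port matches the beats input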

6. View logs in Kibana

insert image description here
insert image description here

insert image description here
The data view name uses dev-test-* to match all indexes whose names start with dev-test-.
insert image description here
insert image description here
Each document here corresponds to one generated log entry.
Besides the log itself, a document may contain many fields we do not care about; in the panel on the left you can pick out the fields you want to display.
Find the message field and add it.
insert image description here
What is displayed now is the generated log.
insert image description here
The generated log is an error log with a lot of content, so it is folded rather than shown in full; set the line width to adapt to the content:
insert image description here
insert image description here
insert image description here

7. Add an action option to the output section of logstash.conf

If you configure action => "create", a data stream is generated; the backing indexes under the data stream are the real indexes, and the data stream behaves much like an index alias. The main reason for using a data stream is to make it easier to manage the index lifecycle: logs are generated continuously, but disk space is limited, so the indexes need a lifecycle policy that automatically deletes old logs and makes room for new ones. How to set the index lifecycle will be covered later.

insert image description here
The data stream naming format is generally "logs-<application name>-<environment>".
Modify the configuration file:

input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["https://192.168.182.128:9200"]
    #index => "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"
    index =>"logs-%{[fields][application]}-%{[fields][env]}"
    action => "create"
    cacert => "/usr/share/logstash/config/certs/http_ca.crt"
    user => "elastic"
    password => "+4TMJBgOpjdgH+1MJ0nC"
  }

  stdout {
    codec => rubydebug
  }
}

insert image description here

8. Restart logstash, generate new logs, and view logstash logs

docker restart logstash
docker logs -f logstash

Program log output
insert image description here
logstash log
insert image description here
The data stream appears in Elasticsearch
insert image description here
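Besides the Kibana UI, the data stream can also be checked directly against the Elasticsearch API. This is only a sketch; as above, the host path to http_ca.crt is an assumption.

# list all data streams whose names start with logs-
curl --cacert ~/elk/logstash/config/certs/http_ca.crt \
     -u elastic:'+4TMJBgOpjdgH+1MJ0nC' \
     "https://192.168.182.128:9200/_data_stream/logs-*?pretty"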

9. Create a data view according to the previous method

logs-test-dev
View it through Discover.
insert image description here
Looking at the three generated logs, there is a problem: by default the list is sorted in descending order by the @timestamp field, but judging by the order in which the logs were actually generated, the top two logs appear in ascending order relative to each other, while the top two and the third log appear in descending order.
insert image description here
So the logs can look out of order. The reason is that @timestamp is the time at which Filebeat collected the log, and Filebeat does not collect one entry per log line the application writes; it collects them batch by batch. The top two logs were collected in the same batch, and Elasticsearch cannot tell the order of logs within one batch. The timestamp inside the log line is the real time the log was generated, while @timestamp is only the collection time.
So the problem to solve is to extract the real generation time from the log and put it into the @timestamp field. To extract that time, the log line first has to be split into fields.

10. Log field splitting

Java logs generally have a fixed format, so each log line can be split into a fixed set of fields according to that format, much like columns in a relational database. Take the following log as an example:

2023-04-21 17:09:36.394 ERROR [xcoa,,] 23072 --- [           main] com.zaxxer.hikari.pool.HikariPool        : HikariPool-2 - Exception during pool initialization.

2023-04-21 17:09:36.394 => log generation time
ERROR => log level
[xcoa,,] => context and other information
23072 => process id
[main] => thread name
com.zaxxer.hikari.pool.HikariPool => class name
HikariPool-2 - Exception during pool initialization. => log content

For this log, Logstash can match it with the following grok expression:

(?m)^%{TIMESTAMP_ISO8601:log-time}\s+%{LOGLEVEL:log-level}\s+\[%{DATA:application}\]\s+%{INT:pid}\s+---\s+\[%{DATA:thread}\]\s+%{DATA:class}\s+:\s+%{GREEDYDATA:log-message}

The split fields are as follows:

log-time: log generation time
log-level: log level
application: context and other information
pid: process id
thread: thread name
class: class name
log-message: log content
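Applied to the sample log above, the grok filter would extract fields roughly like the following (an illustrative sketch of the rubydebug output, trimmed to the extracted fields; the padding spaces captured from the thread name are omitted for readability):

{
     "log-time" => "2023-04-21 17:09:36.394",
    "log-level" => "ERROR",
  "application" => "xcoa,,",
          "pid" => "23072",
       "thread" => "main",
        "class" => "com.zaxxer.hikari.pool.HikariPool",
  "log-message" => "HikariPool-2 - Exception during pool initialization."
}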

The corresponding Logstash configuration:

input {
  beats {
    port => 5044
  }
}

filter{
  grok{
    match => {
      "message" => "(?m)^%{TIMESTAMP_ISO8601:log-time}\s+%{LOGLEVEL:log-level}\s+\[%{DATA:application}\]\s+%{INT:pid}\s+---\s+\[%{DATA:thread}\]\s+%{DATA:class}\s+:\s+%{GREEDYDATA:log-message}"
    }
  }
}

output {
  elasticsearch {
    hosts => ["https://192.168.182.128:9200"]
    #index => "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"
    index =>"logs-%{[fields][application]}-%{[fields][env]}"
    action => "create"
    cacert => "/usr/share/logstash/config/certs/http_ca.crt"
    user => "elastic"
    password => "+4TMJBgOpjdgH+1MJ0nC"
  }

  stdout {
    codec => rubydebug
  }
}

insert image description here

11. Test the field splitting

restart logstash

docker restart logstash

Generate logs
insert image description here
View them in Kibana (the split fields already appear in the field list and can be used for filtering later):
insert image description here

12. Sort logs by generation time

The log generation time has now been split out into the log-time field.
insert image description here
So what remains is to overwrite @timestamp with the value of log-time. Add the following to the filter section of logstash.conf:

  date {
    match => ["log-time", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601"]  # parse the source field as a timestamp
    locale => "en"
    target => [ "@timestamp" ]                                   # target field to overwrite
    timezone => "Asia/Shanghai"
  }

This configuration parses log-time according to the given format and then overwrites the @timestamp field with the result.

If you want to keep the original @timestamp, you can copy its value to a new field first.
The following configuration copies @timestamp into a new field called collection_time:

  ruby {
    code => "event.set('collection_time', event.get('@timestamp'))"
  }

The full configuration is:
insert image description here

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => {
      "message" => "(?m)^%{TIMESTAMP_ISO8601:log-time}\s+%{LOGLEVEL:log-level}\s+\[%{DATA:application}\]\s+%{INT:pid}\s+---\s+\[%{DATA:thread}\]\s+%{DATA:class}\s+:\s+%{GREEDYDATA:log-message}"
    }
  }
  ruby {
    code => "event.set('collection_time', event.get('@timestamp'))"
  }
  date {
    match => ["log-time", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601"]
    locale => "en"
    target => [ "@timestamp" ]
    timezone => "Asia/Shanghai"
  }
}

output {
  elasticsearch {
    hosts => ["https://192.168.182.128:9200"]
    #index => "%{[fields][env]}-%{[fields][application]}-%{+YYYY.MM.dd}"
    index => "logs-%{[fields][application]}-%{[fields][env]}"
    action => "create"
    cacert => "/usr/share/logstash/config/certs/http_ca.crt"
    user => "elastic"
    password => "+4TMJBgOpjdgH+1MJ0nC"
  }

  stdout {
    codec => rubydebug
  }
}
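Before restarting, you can optionally syntax-check the pipeline from inside the container. This is only a sketch: the --path.data override just avoids clashing with the data directory of the already-running Logstash instance.

docker exec -it logstash /usr/share/logstash/bin/logstash --path.data /tmp/logstash-config-test \
  --config.test_and_exit -f /usr/share/logstash/pipeline/logstash.conf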

13. Test the result

restart logstash

docker restart logstash

insert image description here
Generate logs

insert image description here
Check the newly generated logs in kibana:
insert image description here
You can see that the @timestamp values of the newly generated logs are now consistent with the log generation time, so sorting by @timestamp no longer puts the logs out of order.

Origin: blog.csdn.net/weixin_44835704/article/details/130224040