This article builds on the previous two in this series; readers who need the background can refer to them: the guide, the ELK front end, and ELK's distributed deployment.
# With the front end and the message queue in place, we need to install the data collection tool Filebeat and the data filtering and transport tool Logstash. Generally, Filebeat is used to collect log files. I prepared a custom log file with the following content:
55.3.244.1 GET /index.html 15824 0.043
55.3.244.1 GET /index.html 15824 0.043
The file is placed at /tmp/test.log.
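To follow along, the sample file can be created like this (the two lines match the content shown above):

```shell
# create the sample log file used throughout this article
cat > /tmp/test.log <<'EOF'
55.3.244.1 GET /index.html 15824 0.043
55.3.244.1 GET /index.html 15824 0.043
EOF
# confirm the file has the expected number of lines
lines=$(wc -l < /tmp/test.log)
echo "$lines"
```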
cd /opt/elk/filebeat-6.2.2-linux-x86_64
#The filebeat configuration file is generally divided into two parts, input and output. The input specifies the source; the output can go directly to elasticsearch, to logstash, or to a message queue (redis/kafka), etc. This article uses kafka as the message queue.
vi filebeat.yml
filebeat.prospectors:
# log type
- type: log
  enabled: true
  # log paths; multiple entries can be listed and wildcards are supported
  paths:
    - /tmp/test.log
  # character set encoding
  encoding: utf-8
  # document type
  document_type: my-nginx-log
  # scan for new content every 10 seconds
  scan_frequency: 10s
  # read 16384 bytes at a time when harvesting a file
  harvester_buffer_size: 16384
  # the maximum number of bytes a single log message may have; all bytes after
  # max_bytes are discarded and not sent. This is especially useful for
  # multiline log messages, which can get large. Default is 10MB (10485760).
  max_bytes: 10485760
  # whether to start reading from the end of the file
  tail_files: true
  # add tags to the log to ease filtering in logstash
  tags: ["nginx-access"]
  # exclude files ending with .gz
  exclude_files: [".gz$"]
#output part of the configuration
#Configure the output to kafka. Whether you install from a tar package or an rpm, you will find the officially provided filebeat.full.yml or filebeat.reference.yml in the installation directory; it lists all of filebeat's input and output options for reference.
output.kafka:
  enabled: true
  # kafka servers; a cluster can be configured, for example:
  hosts: ["ip:9092", "ip2:9092", "ip3:9092"]
  # This is very important: filebeat acts as the producer writing data into
  # kafka, and logstash consumes it as a consumer. This topic is required in
  # logstash's input; otherwise logstash cannot get the data.
  topic: 'elk-%{[type]}'
  # the number of concurrent load-balanced kafka output workers
  worker: 2
  # the number of retries when sending to kafka fails
  max_retries: 3
  # the maximum number of events in a single kafka request, default 2048
  bulk_max_size: 2048
  # the time to wait for a response from the kafka brokers, default 30s
  timeout: 30s
  # the maximum time a kafka broker waits for a request, default 10s
  broker_timeout: 10s
  # the number of message buffers per kafka broker in the output pipeline, default 256
  channel_buffer_size: 256
  # keep-alive time of the network connection; default 0, i.e. keep-alive disabled
  keep_alive: 60
  # output compression codec: none, snappy, lz4 or gzip; default gzip.
  # (kafka-supported compression: the data is compressed before the producer
  # sends it, stays compressed on the broker, and is only decompressed by the
  # final consumer.)
  compression: gzip
  # the maximum permitted JSON message size, default 1000000; anything larger
  # is discarded. It should be less than the broker's message.max.bytes (the
  # maximum message size the broker accepts).
  max_message_bytes: 1000000
  # kafka acknowledgement level: 0 = do not wait for a response and send the
  # next message immediately; 1 = wait for the leader broker's local commit
  # (the leader has written, followers may not have); -1 = wait for the commit
  # of all replicas. Default is 1.
  required_acks: 0
  # configurable ClientID used for logging, debugging and auditing; default "beats"
  client_id: beats
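Before wiring up logstash, it is worth confirming that messages actually arrive in kafka. A quick check with the console consumer that ships with kafka can do this; note that the installation path /opt/elk/kafka, the broker address, and the expanded topic name below are placeholders you must adapt to your own environment:

```shell
# read a few messages from the topic filebeat writes to;
# /opt/elk/kafka and ip:9092 are assumed values, and "elk-log" stands in
# for whatever elk-%{[type]} expands to in your setup
/opt/elk/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server ip:9092 \
  --topic elk-log \
  --from-beginning \
  --max-messages 5
```

If filebeat is shipping correctly, each message printed is a JSON document containing the original log line in its message field.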
#Test the configuration file: /opt/elk/filebeat/filebeat test config -c /opt/elk/filebeat/filebeat.yml
#If there is no problem with the configuration file, "Config OK" is printed; if there is a problem, the output points out exactly where it is.
#Start filebeat
You can first run /opt/elk/filebeat-6.2.2-linux-x86_64/filebeat -e -c /opt/elk/filebeat-6.2.2-linux-x86_64/filebeat.yml in the foreground to check whether filebeat works; if it does, a lot of information is printed to the screen.
nohup /opt/elk/filebeat-6.2.2-linux-x86_64/filebeat -c /opt/elk/filebeat-6.2.2-linux-x86_64/filebeat.yml >> /dev/null 2>&1&
Logstash installation configuration:
input {
  # data source
  kafka {
    # the topic pattern corresponding to filebeat's output
    topics_pattern => "elk-.*"
    # kafka cluster
    bootstrap_servers => "IP1:9092,IP2:9092,IP3:9092"
    # the consumer group ID in kafka
    group_id => "logstash-g1"
  }
}
#Filter: if you want the logs output according to your needs, you need to filter them in logstash and assign the corresponding keys. The example below handles the nginx-style log using logstash's built-in grok patterns. %{IP:client} carries two pieces of information: the IP pattern matches the address, and client is the custom key under which it is displayed in kibana. http://grokdebug.herokuapp.com/ is an online tool for debugging logstash grok filters, where you can test your filtering rules.
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
    # since the fields have been extracted, the original message field, which
    # contains everything beats sent, is no longer needed
    remove_field => ["message"]
  }
}
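To see what the grok pattern extracts from one line of the sample log, here is a small self-contained sketch: the bash regex below is a simplified stand-in for the grok patterns %{IP}, %{WORD}, %{URIPATHPARAM} and %{NUMBER}, not grok itself:

```shell
line='55.3.244.1 GET /index.html 15824 0.043'
# simplified stand-ins for %{IP} %{WORD} %{URIPATHPARAM} %{NUMBER} %{NUMBER}
regex='^([0-9.]+) ([A-Za-z]+) ([^ ]+) ([0-9]+) ([0-9.]+)$'
if [[ $line =~ $regex ]]; then
  client=${BASH_REMATCH[1]}
  method=${BASH_REMATCH[2]}
  request=${BASH_REMATCH[3]}
  bytes=${BASH_REMATCH[4]}
  duration=${BASH_REMATCH[5]}
fi
echo "client=$client method=$method bytes=$bytes duration=$duration"
# → client=55.3.244.1 method=GET bytes=15824 duration=0.043
```

Each captured group becomes a named field, which is exactly what grok hands to elasticsearch for display in kibana.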
output {
  elasticsearch {
    hosts => ["IP:9200"]
    # This becomes the index matched by the index pattern in kibana. It is
    # advisable to name it after the service, because beats provides kibana
    # templates that can be imported, and the imported templates match indices
    # named after the corresponding service.
    index => "nginx-%{+YYYY.MM.dd}"
    document_type => "nginx"
    # send data to elasticsearch once every 20000 events
    flush_size => 20000
    # if 20000 events are not reached, flush every 10 seconds instead
    idle_flush_time => 10
  }
}
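Once logstash is running, you can verify that documents are reaching elasticsearch by querying the index directly; IP below is a placeholder for your elasticsearch host:

```shell
# count documents in the nginx indices; IP is a placeholder
curl -s "http://IP:9200/nginx-*/_count?pretty"
# fetch one sample document to inspect the fields extracted by grok
curl -s "http://IP:9200/nginx-*/_search?size=1&pretty"
```

If the count grows as new lines are appended to /tmp/test.log, the whole filebeat → kafka → logstash → elasticsearch pipeline is working.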
The flow chart of a log from input to output:
After being filtered by logstash, the logs are displayed in kibana as follows: