ELK data collection, transport and filtering: Filebeat + Logstash deployment

This article builds on the previous posts in this series; readers who need the background can refer to them first: the Guide, ELK front-end, and ELK's distributed sending.
After the front end and the message queue are in place, we need to install the data collection tool Filebeat and the data filtering and transport tool Logstash. Generally, Filebeat is used to collect log files. I created a custom log file with the following content:
55.3.244.1 GET /index.html 15824 0.043
55.3.244.1 GET /index.html 15824 0.043
The file is located at /tmp/test.log.
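If you want to follow along, the sample file can be created like this (a quick sketch; any editor works just as well):

cat > /tmp/test.log << 'EOF'
55.3.244.1 GET /index.html 15824 0.043
55.3.244.1 GET /index.html 15824 0.043
EOF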

cd /opt/elk/filebeat-6.2.2-linux-x86_64
# The filebeat configuration file is generally divided into two parts: input and output. The input specifies the log sources; the output can go directly to elasticsearch, to logstash, or to a message queue (redis/kafka). This article uses kafka as the message queue.
vi filebeat.yml
filebeat.prospectors:
# log type
- type: log
  enabled: true
  # log paths; multiple entries can be listed and wildcards are supported
  paths:
    - /tmp/test.log
  # character set encoding
  encoding: utf-8
  # document type
  document_type: my-nginx-log
  # scan for new files every ten seconds
  scan_frequency: 10s
  # number of bytes read from the file in one go
  harvester_buffer_size: 16384
  # The maximum number of bytes a single log message may have; everything beyond max_bytes is discarded and not sent.
  # This setting is especially useful for multiline log messages, which can get large. The default is 10MB (10485760),
  # i.e. the size limit of a single log event.
  max_bytes: 10485760
  # whether to start reading from the end of the file
  tail_files: true
  # add tags to the events to make filtering in logstash easier
  tags: ["nginx-access"]
  # exclude files ending with .gz
  exclude_files: [".gz$"]
# output part of the configuration
# Configure the output to kafka. Whether you install from a tar package or from rpm, you will find the officially
# provided filebeat.full.yml or filebeat.reference.yml in the installation directory; it lists all of filebeat's
# input and output options for reference.
output.kafka:
  enabled: true
  # kafka servers; a cluster can be configured, for example:
  hosts: ["ip:9092","ip2:9092","ip3:9092"]
  # This is very important: filebeat acts as the producer that writes data into kafka, and logstash consumes it as
  # a consumer. This topic must be referenced in the logstash input, otherwise logstash cannot get the data.
  topic: 'elk-%{[type]}'
  # number of concurrent load-balanced kafka output workers
  worker: 2
  # number of retries when sending to kafka fails
  max_retries: 3
  # maximum number of events in a single kafka request, default 2048
  bulk_max_size: 2048
  # time to wait for a response from the kafka brokers, default 30s
  timeout: 30s
  # maximum time the kafka broker waits for a request, default 10s
  broker_timeout: 10s
  # number of messages buffered in the output pipeline per kafka broker, default 256
  channel_buffer_size: 256
  # keep-alive time of the network connection; the default 0 means keep-alive is disabled
  keep_alive: 60
  # output compression codec: none, snappy, lz4 or gzip, default gzip (compression is supported by kafka; the data
  # is compressed before the producer sends it and stays compressed on the broker, only the final consumer
  # decompresses it)
  compression: gzip
  # maximum permitted JSON message size, default 1000000; larger messages are dropped. It should be smaller than
  # the broker's message.max.bytes (the largest message the broker will accept).
  max_message_bytes: 1000000
  # kafka acknowledgement level: 0 means do not wait for the response and keep sending the next message; 1 means
  # wait for the leader to commit (the leader broker has written the message but the followers have not yet); -1
  # means wait for all replicas to commit. The default is 1.
  required_acks: 0
  # configurable ClientID used for logging, debugging and auditing; the default is "beats"
  client_id: beats
# Test the configuration file: /opt/elk/filebeat/filebeat -c /opt/elk/filebeat/filebeat.yml test config
# If the configuration file is fine, "Config OK" is printed; otherwise the specific problem and its location are reported.
# Start filebeat
# You can first check whether filebeat works by running it in the foreground:
#   /opt/elk/filebeat-6.2.2-linux-x86_64/filebeat -e -c /opt/elk/filebeat-6.2.2-linux-x86_64/filebeat.yml
# If everything is normal, a lot of information is printed to the screen. Then start it in the background:
nohup /opt/elk/filebeat-6.2.2-linux-x86_64/filebeat -c /opt/elk/filebeat-6.2.2-linux-x86_64/filebeat.yml >> /dev/null 2>&1 &
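To confirm that filebeat is actually delivering events, you can check on the Kafka side. The Kafka path /opt/kafka and the resolved topic name elk-log below are assumptions (the actual topic name depends on how %{[type]} resolves), so list the topics first and adjust accordingly:

# Assumed Kafka installation path; adjust to your environment.
# List topics and look for one matching elk-*
/opt/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --list
# Consume a few of the messages filebeat has written
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic elk-log --from-beginning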

Logstash installation and configuration:
input {
  # data source
  kafka {
    # topic pattern matching the topics produced by the filebeat output above
    topics_pattern => "elk-.*"
    # kafka cluster configuration
    bootstrap_servers => "IP1:9092,IP2:9092,IP3:9092"
    # the consumer group ID in kafka
    group_id => "logstash-g1"
  }
}

# Filtering: if you want the logs to come out the way you need them, you have to filter them in logstash and assign
# the corresponding keys. The example below handles the nginx-style log shown earlier and uses logstash's built-in
# grok patterns. %{IP:client} carries two pieces of information: IP is the pattern that matches the address, and
# client is the custom key under which the value is shown in kibana. http://grokdebug.herokuapp.com/ is an online
# tool for debugging logstash grok filters; you can test your filtering rules there.

filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
    # Because we extract the fields we need, the original message field is no longer required; it contains the
    # whole line that beats sent.
    remove_field => ["message"]
  }
}
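For the sample line shown at the beginning (55.3.244.1 GET /index.html 15824 0.043), the grok pattern above yields an event whose extracted fields look roughly like this (a sketch in rubydebug style, assuming the kafka input delivers the raw log line in the message field; beats metadata and @timestamp are omitted, and grok captures values as strings unless you add a type suffix such as %{NUMBER:bytes:int}):

{
    "client"   => "55.3.244.1",
    "method"   => "GET",
    "request"  => "/index.html",
    "bytes"    => "15824",
    "duration" => "0.043",
    "tags"     => ["nginx-access"]
}

The "nginx-access" tag added in the filebeat input also arrives on the event, so when several log types share one topic you can branch in logstash with a conditional such as if "nginx-access" in [tags] { ... }.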
output {
  elasticsearch {
    hosts => ["IP:9200"]
    # This is the index used for the index pattern in kibana. It is recommended to name it after the service,
    # because beats ship kibana templates that can be imported, and the imported templates match indices named
    # after the corresponding service.
    index => "nginx-%{+YYYY.MM.dd}"
    document_type => "nginx"
    # send data to elasticsearch once every 20000 events
    flush_size => 20000
    # if 20000 events are not reached, send the data every 10 seconds
    idle_flush_time => 10
  }
}
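Similar to filebeat, the logstash pipeline can be validated before it is started. The installation path /opt/elk/logstash and the config file name logstash.conf below are assumptions; adjust them to your environment:

# Validate the pipeline configuration and exit
/opt/elk/logstash/bin/logstash -f /opt/elk/logstash/config/logstash.conf --config.test_and_exit
# Start logstash in the background
nohup /opt/elk/logstash/bin/logstash -f /opt/elk/logstash/config/logstash.conf >> /dev/null 2>&1 &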

The flow chart of the log from input to output:
[flow chart image]

After being filtered by logstash, the logs are displayed in kibana as follows:
[kibana screenshot]
