Distributed real-time log analysis solution: ELK deployment architecture

I. Overview

ELK is currently the most popular centralized logging solution. It is composed mainly of Beats, Logstash, Elasticsearch, and Kibana, which together provide a one-stop solution for real-time log collection, storage, and visualization. This article introduces the common ELK architectures and how to solve some related problems.

  1. Filebeat: a lightweight data collection engine that consumes very few server resources. It is a newer member of the ELK family and can replace Logstash as the log collector on the application servers; it also supports outputting the collected data to queues such as Kafka and Redis.
  2. Logstash: a data collection engine that is heavier than Filebeat, but it integrates a large number of plugins, supports a rich set of data sources, and can filter, analyze, and format the collected log data.
  3. Elasticsearch: a distributed search engine based on Apache Lucene that can be clustered. It provides centralized data storage and analysis, as well as powerful search and aggregation capabilities.
  4. Kibana: a data visualization platform through which the data in Elasticsearch can be viewed in real time; it also provides rich charting and statistics features.

II. Common ELK deployment architectures

2.1 Logstash as the log collector

This is a relatively primitive deployment architecture. A Logstash instance is deployed on each application server as the log collector; the data collected by Logstash is filtered, analyzed, and formatted, then sent to Elasticsearch for storage, and finally visualized with Kibana. The disadvantage of this architecture is that Logstash consumes considerable server resources, which increases the load on the application servers.
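A minimal sketch of such a per-server Logstash pipeline (the paths, hosts, and the grok pattern here are illustrative assumptions, not taken from the original article):

input {
  file {
    path => "/home/project/elk/logs/*.log"   # tail the application log files locally
    start_position => "beginning"
  }
}

filter {
  grok {
    match => [ "message", "%{LOGLEVEL:level}" ]   # example only: extract the log level into a field
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
  }
}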

2.2 Filebeat as the log collector

The only difference from the first architecture is that the collector on the application side is replaced by Filebeat. Filebeat is lightweight and uses few server resources, so it is well suited as the log collector on the application servers. Filebeat is generally used together with Logstash, and this is the most commonly used deployment architecture today.

2.3 Deployment architecture with a cache queue

This architecture introduces a Kafka message queue (or another message queue) on top of the second architecture: the data collected by Filebeat is sent to Kafka, and Logstash then reads the data from Kafka. This architecture is mainly intended for log collection under large data volumes; the cache queue is used to improve data safety and to balance the load pressure on Logstash and Elasticsearch.
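A rough sketch of the two extra pieces this architecture adds (the hosts, the topic name, and the exact option names depend on the Filebeat and Logstash versions in use, so treat this as an assumption):

# Filebeat side: send the collected events to a Kafka topic instead of Logstash
output:
   kafka:
      hosts: ["localhost:9092"]
      topic: "app-log"

# Logstash side: consume the same topic, then filter and forward to Elasticsearch as usual
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["app-log"]
  }
}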

2.4 Summary of the three architectures

The first deployment architecture is rarely used because of its resource consumption; the second is currently the most widely used. As for the third, in my opinion there is no need to introduce a message queue unless there are other requirements, because when the data volume is large, Filebeat sends data to Logstash or Elasticsearch using a pressure-sensitive protocol: if Logstash is busy processing data, it tells Filebeat to slow down its reads, and once the congestion is resolved Filebeat resumes its original speed and continues sending data.

III. Problems and Solutions

Question: How to merge multi-line log entries?

Logs in application systems are generally printed in a specific format, and the data belonging to a single log entry may span multiple lines. When collecting logs with ELK, the multiple lines that belong to the same log entry need to be merged.

Solution: use the multiline merge plugin in Filebeat or Logstash

When using the multiline plugin, note that different ELK deployment architectures use it differently. With the first architecture in this article, multiline needs to be configured in Logstash; with the second, multiline needs to be configured in Filebeat, and there is no need to configure it in Logstash.

1. How multiline is configured in Filebeat:

filebeat.prospectors:
    -
       paths:
          - /home/project/elk/logs/test.log   # log file(s) to collect
       input_type: log
       multiline:
            pattern: '^\['        # lines starting with "[" mark the beginning of a new entry
            negate: true
            match: after
output:
   logstash:
      hosts: ["localhost:5044"]   # forward the collected events to Logstash
  • pattern: the regular expression to match
  • negate: defaults to false, meaning lines that match the pattern are merged into the previous line; true means lines that do not match the pattern are merged into the previous line
  • match: after means append to the end of the previous line; before means prepend to the beginning of the previous line

Such as:

pattern: '\['
negate: true
match: after

This configuration means that lines which do not match the pattern are merged to the end of the previous line.
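For example, with the configuration above, a stack trace printed over several lines would be merged into the preceding "["-prefixed line as a single event (the log content below is purely illustrative):

[ERROR][20170811 10:07:31,359][AccountService:88] Failed to load account
java.lang.NullPointerException
    at com.example.AccountService.load(AccountService.java:88)
    at com.example.AccountController.get(AccountController.java:41)

Only the first line matches the pattern, so the three lines that follow are appended to it and the whole block is shipped as one event.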

2. How multiline is configured in Logstash:

input {
  beats {
    port => 5044                    # receive events from Filebeat
  }
}

filter {
  multiline {
    pattern => "%{LOGLEVEL}\s*\]"   # a line containing e.g. "DEBUG]" starts a new entry
    negate => true
    what => "previous"              # non-matching lines belong to the previous entry
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
  }
}

(1) Setting the what attribute in Logstash to previous is equivalent to after in Filebeat, and setting what to next is equivalent to before in Filebeat.
(2) LOGLEVEL in pattern => "%{LOGLEVEL}\s*\]" is one of the regular-expression patterns predefined by Logstash, and there are many more predefined patterns. For details, see: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns

Question: How to replace the time field shown in Kibana with the time contained in the log entry?

By default, the time field we see in Kibana is inconsistent with the time in the log entry, because the default value of the time field is the time when the log was collected. It therefore needs to be replaced with the time recorded in the log entry itself.

Solution: use the grok parsing plugin and the date formatting plugin

Configure the grok plugin and the date plugin in the filter section of the Logstash configuration file, for example:

input {
  beats {
    port => 5044
  }
}

filter {
  multiline {
    pattern => "%{LOGLEVEL}\s*\]\[%{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME}\]"
    negate => true
    what => "previous"
  }

  grok {
    match => [ "message" , "(?<customer_time>%{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME})" ]
  }

  date {
        match => ["customer_time", "yyyyMMdd HH:mm:ss,SSS"]   # parse the extracted time
        target => "@timestamp"                                # replace the default time field
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
  }
}

If the log format to be matched is, for example: "[DEBUG][20170811 10:07:31,359][DefaultBeanDefinitionDocumentReader:106] Loading bean definitions", the time field can be parsed out of the log in either of the following two ways:

① By referencing a pattern file. If the pattern file is named customer_patterns, its content is:
CUSTOMER_TIME %{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME}
Note: the format of each entry is: [custom pattern name] [regular expression]
It can then be referenced in Logstash like this:

filter {
  grok {
      patterns_dir => ["./customer-patterms/mypatterns"]        # path to the custom pattern file
      match => [ "message" , "%{CUSTOMER_TIME:customer_time}" ] # use the custom grok pattern
  }
}

② Inline in the configuration, using the form (?<custom pattern name>regular expression), for example:

filter {
  grok {
    match => [ "message" , "(?<customer_time>%{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME})" ]
  }
}
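With either approach, the sample log line above would roughly produce an event like the following (a simplified illustration; the exact @timestamp offset depends on the timezone handling of the date filter):

{
  "message":       "[DEBUG][20170811 10:07:31,359][DefaultBeanDefinitionDocumentReader:106] Loading bean definitions",
  "customer_time": "20170811 10:07:31,359",
  "@timestamp":    "2017-08-11T10:07:31.359Z"
}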

Question: How to view the data of different system modules selectively in Kibana?

Generally, the log data displayed in Kibana mixes data from different system modules, so how can we select or filter the data so that only the logs of a specified system module are shown?

Solution: add a field that identifies the system module, or build ES indexes per system module

1. Add a field that identifies the system module; in Kibana the data of different modules can then be filtered and queried by this field.
The second deployment architecture is used as the example here. The configuration in Filebeat is:

filebeat.prospectors:
    -
       paths:
          - /home/project/elk/logs/account.log
       input_type: log 
       multiline:
            pattern: '^\['
            negate: true
            match: after
       fields:               # add a custom log_from field
         log_from: account

    -
       paths:
          - /home/project/elk/logs/customer.log
       input_type: log 
       multiline:
            pattern: '^\['
            negate: true
            match: after
       fields:
         log_from: customer
output:
   logstash:
      hosts: ["localhost:5044"]

Logs from different system modules are identified by the added log_from field.
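Note (this depends on the Filebeat version in use and is stated here as an assumption): custom fields declared under fields are normally nested under a fields object in the resulting event, so in Kibana the filter is written against fields.log_from, e.g. fields.log_from: account. If the field should appear at the top level of the event instead, Filebeat offers the fields_under_root option:

       fields:
         log_from: account
       fields_under_root: true   # place log_from at the top level of the event instead of under "fields"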

2. Configure a corresponding ES index for each system module, and then create the matching index patterns in Kibana; the data of different system modules can then be selected from the index pattern drop-down box on the page.
The second deployment architecture is again used as the example; this takes two steps:
① The configuration content in Filebeat is:

filebeat.prospectors:
    -
       paths:
          - /home/project/elk/logs/account.log
       input_type: log 
       multiline:
            pattern: '^\['
            negate: true
            match: after
       document_type: account    # sets the "type" field of each event from this log

    -
       paths:
          - /home/project/elk/logs/customer.log
       input_type: log 
       multiline:
            pattern: '^\['
            negate: true
            match: after
       document_type: customer
output:
   logstash:
      hosts: ["localhost:5044"]

Different system modules are identified by document_type.

② Modify the output configuration in Logstash as follows:

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "%{type}"
  }
}

The index attribute is added to the output; %{type} means the ES index is built according to the different document_type values.
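In practice the index name often also carries a date suffix so that old indexes can be cleaned up per day; a possible variant of the output above (the naming scheme is just an example, not from the original article):

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "%{type}-%{+YYYY.MM.dd}"   # e.g. account-2017.08.11, one index per module per day
  }
}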

IV. Summary

This article mainly introduced the three deployment architectures of ELK for real-time log analysis and the problems each architecture can solve; the second of the three is currently the most popular and most commonly used deployment method. It then introduced some common problems encountered when using ELK for log analysis and their solutions. Finally, ELK can be used not only for centralized querying and management of distributed log data, but also for scenarios such as application and server resource monitoring. For more information, see the official website.

 

 
