Distributed real-time logging: which ELK deployment architecture should you use?

1. Overview

ELK has become the most popular centralized logging solution today. It is composed mainly of Beats, Logstash, Elasticsearch, and Kibana, which together provide a one-stop solution for real-time log collection, storage, and display. This article introduces the common ELK architectures and solutions to related problems.

  • Filebeat: a lightweight data collection engine that takes up very few server resources. It is a newer member of the ELK family and can replace Logstash as the log collection engine on the application server side; it supports outputting the collected data to queues such as Kafka and Redis.

  • Logstash: a data collection engine that is heavyweight compared to Filebeat, but it integrates a large number of plugins, supports a rich variety of data sources, and can filter, analyze, and format the collected log data.

  • Elasticsearch: a distributed search engine built on Apache Lucene. It can be deployed as a cluster and provides centralized data storage and analysis, as well as powerful search and aggregation functions.

  • Kibana: a data visualization platform. Through this web platform you can view the relevant data in Elasticsearch in real time, and it provides rich chart and statistics functions.

2. Common ELK Deployment Architectures

2.1 Logstash as a log collector

This is a relatively primitive deployment architecture: a Logstash instance is deployed on each application server as a log collector, the data collected by Logstash is filtered, analyzed, and formatted and then sent to Elasticsearch for storage, and finally Kibana is used for visualization. The shortcoming of this architecture is:

Logstash consumes considerable server resources, so it increases the load on the application servers.

[Figure: Logstash deployed on each application server as the log collector]

2.2 Filebeat as a log collector

The only difference from the first architecture is that the log collector on the application side is replaced by Filebeat. Filebeat is lightweight and occupies few server resources, so it is used as the log collector on the application servers. Filebeat is generally used together with Logstash, and this is currently the most commonly used deployment architecture.

[Figure: Filebeat as the log collector, forwarding to Logstash]
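As a minimal sketch (not the article's original configuration), the Filebeat side of this architecture might look as follows; the log path, the Logstash address, and the older filebeat.prospectors section name are assumptions (newer Filebeat versions use filebeat.inputs):

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.log           # hypothetical log path

output.logstash:
  hosts: ["logstash-host:5044"]    # hypothetical Logstash address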

2.3 Introducing a cache queue into the deployment architecture

This architecture introduces a Redis cache queue (or another message queue) on top of the second architecture: Filebeat sends the collected data to Redis, and Logstash then reads the data from Redis. This architecture mainly addresses log collection under large data volumes; the cache queue is used chiefly to improve data safety and to balance the load between Logstash and Elasticsearch.

[Figure: Filebeat → Redis cache queue → Logstash → Elasticsearch]
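As a sketch, the queue can be wired up like this, with Filebeat writing to a Redis list and Logstash reading from it; the hosts and the key name are assumptions:

# filebeat.yml (Filebeat side)
output.redis:
  hosts: ["redis-host:6379"]    # hypothetical Redis address
  key: "filebeat-logs"          # Redis list used as the queue

# logstash.conf (Logstash side)
input {
  redis {
    host => "redis-host"        # same hypothetical Redis address
    data_type => "list"
    key => "filebeat-logs"      # must match the key Filebeat writes to
  }
}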

2.4 Summary of the above three architectures

The first deployment architecture is seldom used because of its resource consumption; the second is currently the most widely used. As for the third, in my personal opinion there is no need to introduce a message queue unless you have other requirements, because when the data volume is large, Filebeat sends data to Logstash or Elasticsearch using a pressure-sensitive (backpressure) protocol. If Logstash is busy processing data, it tells Filebeat to slow down its reads; once the congestion is resolved, Filebeat resumes its original speed and continues sending data.

3. Problems and solutions

Question: How to merge multi-line log entries?

Logs in application systems are generally printed in a specific format, and a single log entry may span multiple lines. When using ELK to collect logs, therefore, the multiple lines belonging to the same log entry need to be merged.

Solution: use the multiline merge plugin in Filebeat or Logstash.

Note that different ELK deployment architectures use multiline differently. With the first deployment architecture in this article, multiline needs to be configured in Logstash; with the second, multiline needs to be configured in Filebeat, and no multiline configuration is needed in Logstash.

1. How to configure multiline in Filebeat:

[Figure: multiline configuration in Filebeat]

  • pattern: regular expression;

  • negate: defaults to false, meaning that lines matching the pattern are merged into the previous line; true means that lines not matching the pattern are merged into the previous line;

  • match: after means merge to the end of the previous line; before means merge to the beginning of the previous line.

For example:

pattern: '^\['
negate: true
match: after

This configuration means that lines that do not match the pattern (i.e., lines that do not begin with "[") are merged to the end of the previous line.
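Putting this together, a minimal Filebeat multiline configuration might look like the sketch below. The log path and the older filebeat.prospectors section name are assumptions (newer Filebeat versions use filebeat.inputs), and the pattern is written with the "[" escaped, since a bare "[" is not a valid regular expression:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.log    # hypothetical log path
  multiline:
    pattern: '^\['          # a new log entry starts with "["
    negate: true            # lines NOT matching the pattern...
    match: after            # ...are appended to the previous line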

2. How to configure multiline in Logstash

[Figure: multiline codec configuration in Logstash]
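For reference, a minimal sketch of a Logstash multiline configuration, using the multiline codec on a file input (the log path is an assumption), might be:

input {
  file {
    path => "/var/log/app/*.log"    # hypothetical log path
    codec => multiline {
      pattern => "%{LOGLEVEL}\s*]"  # a new entry starts with a log level such as [DEBUG]
      negate => true
      what => "previous"            # non-matching lines are merged into the previous entry
    }
  }
}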

(1) The what option in Logstash set to previous is equivalent to after in Filebeat, and what set to next is equivalent to before in Filebeat.

(2) LOGLEVEL in pattern => "%{LOGLEVEL}\s*]" is one of Logstash's predefined regular-expression patterns. There are many predefined patterns; for details see: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns

Question: How to replace the time field shown in Kibana with the time contained in the log message?

By default, the time field we see in Kibana is inconsistent with the time in the log message, because the default time field value is the time at which the log was collected, not the time in the log itself. The value of this field therefore needs to be replaced with the time from the log message.

Solution: use the grok parsing plugin together with the date time-formatting plugin.

Configure the grok plugin and the date plugin in the filter section of the Logstash configuration file, for example:

[Figure: grok and date plugin configuration in the Logstash filter]
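As a sketch, a filter combining the two plugins might look like this. The grok pattern and the customer_time field name are illustrative, and the date format string assumes the timestamp format of the example log discussed below:

filter {
  grok {
    # extract the log's own timestamp into a customer_time field
    match => { "message" => "\[%{LOGLEVEL:level}\]\[%{DATA:customer_time}\]" }
  }
  date {
    # overwrite the default @timestamp with the time parsed from the log line
    match => ["customer_time", "yyyyMMdd HH:mm:ss,SSS"]
  }
}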

Suppose the log format to be matched is: "[DEBUG][20170811 10:07:31,359][DefaultBeanDefinitionDocumentReader:106] Loading bean definitions". There are two ways to parse out the log's time field:

① Reference a pre-written pattern file. For example, if the pattern file is named customer_patterns, its content is:
CUSTOMER_TIME %{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME}

Note: the format of each entry is: [custom pattern name] [regular expression]

Logstash can then reference the file like this:

[Figure: Logstash configuration referencing the custom pattern file]
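A sketch of how Logstash might reference the pattern file through the grok plugin's patterns_dir option (the directory path is an assumption):

filter {
  grok {
    patterns_dir => ["./patterns"]    # hypothetical directory containing the customer_patterns file
    match => { "message" => "\[%{LOGLEVEL:level}\]\[%{CUSTOMER_TIME:customer_time}\]" }
  }
}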

② Define the pattern inline as a configuration item, using the syntax (?<custom pattern name>regular expression), for example:

[Figure: inline custom pattern configuration in Logstash]
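A sketch of the inline form for the same timestamp (the surrounding bracket pattern is illustrative):

filter {
  grok {
    match => { "message" => "\[%{LOGLEVEL:level}\]\[(?<customer_time>%{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME})\]" }
  }
}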

Question: How to view the data of different system modules selectively in Kibana?

The log data displayed in Kibana is generally a mixture of data from different system modules, so how can you select or filter to view only the log data of a specified system module?

Solution: add a field that identifies each system module, or build separate ES indexes for different system modules.

1. Add a field that identifies the system module; in Kibana you can then filter and query the data of different modules by this field.

Taking the second deployment architecture as an example, the Filebeat configuration is:

[Figure: Filebeat configuration adding a custom field]
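A sketch of such a Filebeat configuration; the module name, log path, and the older filebeat.prospectors section name are assumptions:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/order-service/*.log    # hypothetical log path
  fields:
    log_from: order-service           # custom field identifying the module
  fields_under_root: true             # optional: place log_from at the top level of the event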

The added log_from field identifies the logs of the different system modules.

2. Configure a corresponding ES index for each system module, then create the matching index patterns in Kibana; you can then select the data of different system modules through the index pattern drop-down box on the page.

Taking the second deployment architecture as an example again, this takes two steps:

① The configuration content in Filebeat is:

[Figure: Filebeat configuration with document_type]
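A sketch of such a configuration. Note that document_type was supported in Filebeat 5.x but removed in later versions, where custom fields are used instead; the module name and path are assumptions:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/order-service/*.log    # hypothetical log path
  document_type: order-service        # becomes the event's type field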

The document_type value identifies the different system modules.

② Modify the output configuration in Logstash:

Add an index option to the output; %{type} means that the ES index is built according to the value of document_type.
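A sketch of such an output section; the Elasticsearch address and the date suffix on the index name are assumptions:

output {
  elasticsearch {
    hosts => ["localhost:9200"]          # hypothetical Elasticsearch address
    index => "%{type}-%{+YYYY.MM.dd}"    # one index per document_type value, suffixed by date
  }
}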

4. Summary

This article mainly introduced three deployment architectures for real-time log analysis with ELK and the problems each architecture can solve; the second deployment method is currently the most popular and most commonly used. It also covered some common problems and solutions when using ELK for log analysis. Beyond centralized querying and management of distributed log data, ELK can also be used in scenarios such as application and server resource monitoring. For more information, see the official website.

Source: https://my.oschina.net/feinik/blog/1580625
