Introduction to ELK

1. Introduction to ELK

ELK is an acronym for three pieces of open source software: Elasticsearch, Logstash, and Kibana. A newer addition is Filebeat, a lightweight log collection and processing tool (agent). Filebeat uses few resources, which makes it well suited to collecting logs on all kinds of servers and forwarding them to Logstash; it is also the officially recommended collector.

Elasticsearch is an open source distributed search engine that collects, analyzes, and stores data. Its features include: distributed operation, zero configuration, automatic discovery, automatic index sharding, an index replica mechanism, a RESTful interface, multiple data sources, and automatic search load balancing. In the ELK stack it is mainly responsible for indexing and storing the logs so that business teams can retrieve and query them.
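Because Elasticsearch exposes a RESTful interface, stored logs can be queried directly over HTTP. A minimal search sketch in Kibana Dev Tools console style (the logstash-* index pattern is only an example):

GET /logstash-*/_search
{
  "query": {
    "match": { "message": "error" }
  }
}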

Logstash is mainly a tool for collecting, analyzing, and filtering logs, and supports a large number of data acquisition methods. The general working method is the c/s architecture. The client is installed on the host that needs to collect logs, and the server is responsible for filtering and modifying the received logs of each node and sending them to elasticsearch at the same time. It is a middleware for log collection, filtering, and forwarding. It is mainly responsible for the unified collection and filtering of various logs of various business lines, and then forwards them to Elasticsearch for further processing.

Kibana is also an open source, free tool. It provides a friendly web interface for log analysis on top of Logstash and Elasticsearch, and helps aggregate, analyze, and search important log data.

Filebeat is part of Beats. Beats currently includes four tools:

  • Packetbeat (collects network traffic data)
  • Topbeat (collects system-, process-, and file-system-level CPU and memory usage data)
  • Filebeat (collects file data)
  • Winlogbeat (collects Windows event log data)

2. ELK architecture diagram

1. Architecture diagram 1:
This is the simplest ELK architecture. Its advantage is that it is simple to build and easy to use; its disadvantages are that Logstash consumes a lot of resources, using considerable CPU and memory while running, and that there is no message queue to buffer the data, so there is a hidden risk of data loss.
In this architecture, Logstash is distributed across the various nodes to collect the relevant logs and data; after parsing and filtering, it sends them to Elasticsearch on a remote server for storage. Elasticsearch stores the data compressed in shards and provides a variety of APIs for users to query and operate on it. Users can also configure the Kibana web interface to query the logs conveniently and generate reports from the data.
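A minimal sketch of the Logstash pipeline this architecture implies, collecting a local log file and shipping it straight to Elasticsearch (the path, host, and index name are placeholders):

input {
  file {
    path => "/var/log/nginx/access.log"    # log collected directly on the node
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    hosts => ["http://es-host:9200"]       # remote Elasticsearch server
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}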

2. Architecture diagram 2:

This architecture introduces a message queue. The Logstash agent on each node first ships the data/logs to Kafka (or Redis), and the messages in the queue are then consumed by a central Logstash instance, which filters and analyzes them before passing the data on to Elasticsearch for storage. Finally, Kibana presents the logs and data to the user. Because Kafka (or Redis) is introduced, even if the remote Logstash server stops running due to a failure, the data is buffered in the queue first, avoiding data loss.
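A rough sketch of the two Logstash configurations this architecture uses, with Kafka as the buffer (the broker address and topic name are assumptions, and the option names follow recent versions of the Kafka plugins):

# shipper on each node: read local logs and push them into Kafka
input  { file { path => "/var/log/app/*.log" } }
output { kafka { bootstrap_servers => "kafka-host:9092" topic_id => "app-logs" } }

# central indexer: consume from Kafka, filter, and store in Elasticsearch
input  { kafka { bootstrap_servers => "kafka-host:9092" topics => ["app-logs"] } }
output { elasticsearch { hosts => ["http://es-host:9200"] } }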

3. Architecture diagram 3:
This architecture replaces Logstash on the collection side with Beats, which is more flexible, consumes fewer resources, and scales better. Logstash and Elasticsearch can also be deployed as clusters, supporting the monitoring and querying of operations log data for large cluster systems.
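In this setup each node runs Filebeat and ships its logs to Logstash, which listens for Beats connections. A minimal sketch of both sides (the hostnames are illustrative; 5044 is the conventional Beats port):

# filebeat.yml on each application node
filebeat.prospectors:
- input_type: log
  paths:
    - /apps/logs/*/info.log
output.logstash:
  hosts: ["logstash-host:5044"]

# on the central Logstash server: accept Beats connections and index into Elasticsearch
input  { beats { port => 5044 } }
output { elasticsearch { hosts => ["http://es-host:9200"] } }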

3. How Filebeat works

  • 1. Filebeat consists of two main components: prospectors and harvesters . These two components work together to send file changes to the specified output.


  • 2. Harvester : Responsible for reading the content of a single file. A Harvester is started for each file; it reads the file line by line and sends the content to the specified output.
    The Harvester is responsible for opening and closing the file, which means the file descriptor stays open while the Harvester is running. If a file is renamed or deleted while it is being collected, Filebeat continues to read it, so the disk space is not released until the Harvester is closed. By default, Filebeat keeps the file open until close_inactive is reached: if that option is enabled, Filebeat closes the handle of any file that has not been updated within the specified time, counted from the moment the Harvester read the last line. The close time does not depend on the file's modification time; the Harvester uses an internal timestamp to record when it last read the file. For example, with close_inactive set to 5m, the countdown starts after the Harvester reads the last line, and if the file does not change within 5 minutes the handle is closed (the default is 5m). Once the handle has been closed, a new Harvester is started the next time the file changes, but how quickly that change is noticed is governed by scan_frequency (default 10s), so if these parameters are not configured properly the logs may not be collected in near real time. Both options are sketched in the snippet below.
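A sketch of how these two options appear in filebeat.yml, reusing the prospector layout shown later in this section (the values are simply the defaults discussed above):

filebeat.prospectors:
- input_type: log
  paths:
    - /apps/logs/*/info.log
  close_inactive: 5m     # close the file handle after 5 minutes without new lines
  scan_frequency: 10s    # how often the prospector looks for new or changed files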

  • 3. Prospector : Responsible for managing the Harvesters and finding all the sources to read from.
    With the configuration below, the Prospector finds every info.log file under /apps/logs/* and starts a Harvester for each one. The Prospector checks each file to determine whether a Harvester is already running, whether one needs to be started, or whether the file can be ignored. If the Harvester has been closed, the Prospector re-checks the file only when its size changes. Only local files can be detected.

filebeat.prospectors:
- input_type: log            # read log files line by line
  paths:
    - /apps/logs/*/info.log  # a Harvester is started for each matching file
  • 4. How does Filebeat record the state of a file:
    Filebeat records the state of each file in a registry file (/var/lib/filebeat/registry by default). This state remembers the offset up to which the Harvester has read the file. If an output such as Elasticsearch cannot be reached, Filebeat keeps track of the last line sent and continues sending once the connection is available again. While Filebeat is running, the Prospector state is also held in memory; when Filebeat restarts, it rebuilds the state from the registry so it can resume where it left off. Each Prospector keeps a state entry for every file it finds, and Filebeat stores a unique identifier for each file to detect whether it has been collected before (an example registry entry is sketched below).
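For illustration only, an entry in the registry file of older Filebeat versions is a JSON record roughly of the following shape; the values here are invented and the exact layout varies between Filebeat versions:

[
  {
    "source": "/apps/logs/app1/info.log",
    "offset": 12345,
    "timestamp": "2021-03-09T10:15:00.000Z",
    "ttl": -1,
    "FileStateOS": { "inode": 257133, "device": 2049 }
  }
]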

  • 5. How does Filebeat ensure that events are output at least once:
    Filebeat can guarantee that every event is delivered to the configured output at least once, without data loss, because it persists the delivery state of each event in the registry file. If the output has not acknowledged an event, Filebeat keeps trying to send it until it receives a response. If Filebeat shuts down while events are in flight, it does not wait for the output to acknowledge all of them before closing; any events sent but not acknowledged before shutdown are sent again after Filebeat restarts. This guarantees at-least-once delivery, although duplicates are possible. How long Filebeat waits for outstanding acknowledgements before shutting down can be controlled with the shutdown_timeout option (disabled by default).
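A sketch of enabling that option in filebeat.yml (the 5s value is only an example):

# wait up to 5 seconds for outstanding events to be acknowledged before exiting
filebeat.shutdown_timeout: 5s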

4. How Logstash works

1. Logstash event processing has three stages: inputs → filters → outputs. Logstash is a tool for receiving, processing, and forwarding logs. It supports system logs, webserver logs, error logs, application logs and, in short, every type of log that can be emitted.
Inputs : Bring data into Logstash.

Some commonly used inputs are:

file : reads from a file on the file system, similar to the tail -f command

syslog : listens for system log messages on port 514 and parses them according to the RFC 3164 standard

redis : reads from a Redis service

beats : reads events sent by Filebeat
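A sketch of an input block that combines several of these plugins (the ports, host, and Redis key are illustrative):

input {
  file   { path => "/var/log/messages" }                                 # tail a local file
  syslog { port => 514 }                                                 # listen for syslog messages
  redis  { host => "redis-host" data_type => "list" key => "logstash" }  # pop events from a Redis list
  beats  { port => 5044 }                                                # receive events from Filebeat
}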

Filters : Intermediate processing stage that operates on the data.

Some commonly used filters are:

grok : parses arbitrary text data. Grok is the most important Logstash plugin; its main job is to turn unstructured text strings into specific structured data using regular expressions, and it ships with more than 120 built-in parsing patterns.

mutate : converts fields, for example deleting, replacing, modifying, or renaming them.

drop : drops certain events without processing them.

clone : copies an event; fields can be added or removed in the process.

geoip : adds geographic information (used for map and chart displays in Kibana)
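A sketch of a filter block that chains several of these plugins to parse a web-server access log (the field names come from the built-in COMBINEDAPACHELOG pattern; the rename is purely illustrative):

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }   # unstructured line -> structured fields
  }
  mutate {
    rename => { "clientip" => "client_ip" }            # rename a field
    remove_field => ["agent"]                          # drop a field that is not needed
  }
  geoip {
    source => "client_ip"                              # add geo fields for Kibana maps
  }
}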

Outputs : outputs are the final stage of the Logstash processing pipeline. An event can be sent to multiple outputs, but once all outputs have been processed the event has completed its life cycle.

Some common outputs are:

elasticsearch : stores the data efficiently and makes it easy to search.

file : saves event data to a file.

graphite : sends event data to Graphite, a popular open source component for storing metrics and displaying them graphically.
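A sketch of an output block that writes the same events both to Elasticsearch and to a local archive file (hosts, index name, and path are placeholders):

output {
  elasticsearch {
    hosts => ["http://es-host:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"                      # one index per day
  }
  file {
    path => "/var/log/logstash/archive-%{+YYYY-MM-dd}.log"  # local copy of every event
  }
}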

Codecs : codecs are stream filters that can be configured as part of an input or output. Codecs make it easy to separate and decode data that arrives already serialized in a particular format.

Some common codecs:

json : Use json format to encode/decode data.

multiline : merges multiple lines into a single event, for example Java exception and stack trace messages.
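A sketch of codecs attached to inputs: the multiline codec folds indented stack-trace lines into the preceding event, while the json codec decodes events that already arrive as JSON (the path and port are illustrative):

input {
  file {
    path  => "/apps/logs/app/error.log"
    codec => multiline {
      pattern => "^\s"       # lines starting with whitespace ...
      what    => "previous"  # ... are appended to the previous event
    }
  }
  beats {
    port  => 5044
    codec => "json"
  }
}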
