Introduction to Filebeat

1. Overview of Filebeat

Filebeat is a lightweight shipper for forwarding and centralizing log data. It monitors the log files or locations you specify, collects log events, and forwards them to Elasticsearch, Logstash, or Kafka for indexing.

1.1 Two main components of Filebeat

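Prospectors and harvesters. (Since Filebeat 6.3, prospectors are called inputs in the configuration, under filebeat.inputs.)

Prospector: the detector, which finds the files to read.

Harvester: the collector, which reads a single file.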

The prospector is responsible for managing harvesters and for finding all the sources to read. If the input type is log, the prospector finds all files matching the configured paths and starts a harvester for each file.

Prospector: responsible for managing harvesters and finding all sources to read. For example, if it is configured to watch /apps/logs/, the prospector finds all info.log files in that directory and starts a harvester for each one. For every file, it checks whether a harvester is already running, needs to be started, or whether the file can be ignored. Once a file's harvester has been closed, the prospector only picks the file up again if its size has changed. Only local files can be detected.
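The per-file checks described above can be sketched in Python. This is a simplified model under stated assumptions, not Filebeat's Go implementation. The `Prospector` and `FileState` classes, the `scan` and `close_harvester` methods, and the glob pattern are hypothetical. The sketch also keys state by path for brevity, whereas Filebeat identifies files more robustly. A scan starts a harvester for a file it has not seen before, leaves a running harvester alone, and restarts a closed harvester only when the file size has changed.

```python
import glob
import os
import tempfile
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileState:
    """Per-file state kept by the hypothetical prospector."""
    size: int
    harvester_running: bool = True


@dataclass
class Prospector:
    """Simplified model of a log prospector, not Filebeat's actual code."""
    pattern: str  # glob pattern, e.g. "/apps/logs/*.log" (illustrative)
    states: Dict[str, FileState] = field(default_factory=dict)

    def scan(self) -> List[str]:
        """Return the paths for which a harvester is (re)started on this scan."""
        started = []
        for path in sorted(glob.glob(self.pattern)):
            if not os.path.isfile(path):  # skip directories and non-regular files
                continue
            size = os.path.getsize(path)
            state = self.states.get(path)
            if state is None:
                # New file: start a harvester for it.
                self.states[path] = FileState(size=size)
                started.append(path)
            elif state.harvester_running:
                # Harvester already running: just remember the current size.
                state.size = size
            elif size != state.size:
                # Harvester closed, but the file size changed: restart it.
                state.size = size
                state.harvester_running = True
                started.append(path)
            # Harvester closed and size unchanged: the file is ignored.
        return started

    def close_harvester(self, path: str) -> None:
        """Mark a harvester as closed (for example after close_inactive)."""
        self.states[path].harvester_running = False


def demo() -> List[int]:
    """New file, then an idle closed harvester, then the file grows."""
    counts = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "info.log")
        with open(path, "w") as f:
            f.write("first line\n")
        prospector = Prospector(os.path.join(tmp, "*.log"))
        counts.append(len(prospector.scan()))  # new file
        counts.append(len(prospector.scan()))  # harvester already running
        prospector.close_harvester(path)
        counts.append(len(prospector.scan()))  # closed, size unchanged
        with open(path, "a") as f:
            f.write("second line\n")
        counts.append(len(prospector.scan()))  # closed, size changed
    return counts
```

Calling `demo()` returns `[1, 0, 0, 1]`. A harvester is started for the new file, nothing happens while it runs or after it closes on an unchanged file, and it is started again once the file grows.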

Harvester: responsible for reading the content of a single file. One harvester is started per file. It reads the file line by line and sends the content to the configured output. The harvester opens and closes the file, so the file descriptor stays open while the harvester is running. If a file is renamed or deleted during collection, Filebeat keeps reading it, which means the disk space is not freed until the harvester is closed.

By default, Filebeat keeps the file open until close_inactive is reached (default 5m). With this option, Filebeat closes a file handle that has not delivered new lines within the given period. The countdown starts when the harvester reads the last line and is based on the harvester's internal timestamp, not on the file's modification time. For example, with 5m the handle is closed if no new line is read within 5 minutes of the last one.

If a file changes after its handle has been closed, a new harvester is started, but only at the next scan. Scans run every scan_frequency (default 10s). If these parameters are not configured properly, logs may therefore not be delivered in a timely way.
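The close_inactive and scan_frequency behaviour described above can be modelled with an explicit clock. In the sketch below, CLOSE_INACTIVE and SCAN_FREQUENCY mirror the defaults quoted above. `HarvesterTimer` and `next_scan_after` are hypothetical names. The assumption that scans run at fixed multiples of scan_frequency starting at time zero is a simplification.

```python
import math
from dataclasses import dataclass

CLOSE_INACTIVE = 5 * 60  # seconds; the 5m default described above
SCAN_FREQUENCY = 10      # seconds; the 10s default described above


@dataclass
class HarvesterTimer:
    """Hypothetical model of the close_inactive countdown of one harvester."""
    last_read_at: float  # when the harvester last read a line (not the file mtime)
    is_open: bool = True

    def on_line_read(self, now: float) -> None:
        """Reading a new line restarts the countdown."""
        self.last_read_at = now

    def check(self, now: float) -> bool:
        """Close the handle once close_inactive has elapsed; return whether it is open."""
        if self.is_open and now - self.last_read_at >= CLOSE_INACTIVE:
            self.is_open = False
        return self.is_open


def next_scan_after(t: float) -> int:
    """Time of the first scan at or after t, assuming scans run every
    SCAN_FREQUENCY seconds starting at t = 0 (a simplification)."""
    return math.ceil(t / SCAN_FREQUENCY) * SCAN_FREQUENCY
```

Suppose the last line is read at t=120 s. The handle is still open at t=400 s and is closed at t=420 s. If the file then changes at t=421 s, this model picks the change up at the scan at t=430 s.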

How Filebeat records file status:

Filebeat stores the state of each file in a registry file (by default /var/lib/filebeat/registry). This state records the offset up to which the harvester has collected each file. If the output, such as Elasticsearch, cannot be reached, Filebeat keeps track of the last line sent and continues once the output is available again. While Filebeat is running, the prospector state is also kept in memory. When Filebeat restarts, it rebuilds its state from the registry and resumes where it left off. The prospector keeps a state for each file it finds, and Filebeat stores a unique identifier for each file to detect whether it was harvested before.
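Here is a minimal sketch of the registry idea, assuming a JSON file that maps a file identifier to a byte offset. On Linux, Filebeat identifies files by device and inode number, which is why a renamed file keeps its state. The JSON layout, the `Registry` class, and `read_new_lines` are illustrative assumptions and do not match Filebeat's on-disk registry format.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


class Registry:
    """Toy registry mapping "device:inode" to the byte offset already shipped."""

    def __init__(self, path: str):
        self.path = path
        self.offsets: Dict[str, int] = {}
        if os.path.exists(path):  # restart: rebuild state from disk
            with open(path) as f:
                self.offsets = json.load(f)

    @staticmethod
    def file_id(log_path: str) -> str:
        st = os.stat(log_path)
        return f"{st.st_dev}:{st.st_ino}"  # unchanged by a rename, unlike the path

    def save(self) -> None:
        # Write a temp file and rename it so a crash cannot leave a
        # half-written registry (a design choice of this sketch).
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.offsets, f)
        os.replace(tmp, self.path)


def read_new_lines(registry: Registry, log_path: str) -> List[str]:
    """Read complete lines after the stored offset, then record the new offset.
    Here the offset is saved right away; a real shipper would wait until the
    output has acknowledged the events."""
    fid = Registry.file_id(log_path)
    offset = registry.offsets.get(fid, 0)
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # keep a trailing partial line for later
    registry.offsets[fid] = offset + end
    registry.save()
    return data[:end].decode().splitlines()


def demo() -> Tuple[List[str], List[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")
        reg_path = os.path.join(tmp, "registry.json")
        with open(log, "wb") as f:
            f.write(b"one\ntwo\npart")
        first = read_new_lines(Registry(reg_path), log)
        # "Restart" with a fresh Registry after the file was rotated by renaming.
        rotated = os.path.join(tmp, "app.log.1")
        os.rename(log, rotated)
        with open(rotated, "ab") as f:
            f.write(b"ial\nthree\n")
        second = read_new_lines(Registry(reg_path), rotated)
    return first, second
```

`demo()` returns `(["one", "two"], ["partial", "three"])`. After the simulated restart and rename, reading resumes at the stored offset, so no line is sent twice. The incomplete line `part` is only shipped once it has been completed.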

Early Filebeat versions supported two types of prospectors: log and stdin. Later versions add many more input types, such as tcp, udp, and container.

A harvester reads the contents of a single file. If the file is deleted or renamed while it is being read, Filebeat continues to read it.

Summary:

1. Prospectors: components that detect new log files and growth in existing files, and start harvesters to read them.

2. Harvesters: components that read a single log file line by line, optionally filter lines, and send the resulting events to the spooler.

3. Spooler: collects the events read by harvesters, buffers them, and sends them to the output in batches.

4. Registry: records which files have been read and the offset reached in each, so that the next scan only reads new content.

5. Filebeat continuously monitors and collects log data by repeating the above steps.
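The five steps above can be condensed into a single loop. This is a deliberately simplified Python model. Files are in-memory lists of lines, the registry stores a line index rather than a byte offset, and `run_once`, `batch_size`, and the `output` callback are hypothetical names. The model still shows the ordering that matters: offsets are advanced only after a batch has been handed to the output.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int, str]  # (path, line index, message)


def run_once(files: Dict[str, List[str]],
             registry: Dict[str, int],
             output: Callable[[List[Event]], None],
             batch_size: int = 2) -> None:
    """One pass of detect -> harvest -> spool -> output -> registry update."""
    spool: List[Event] = []

    def flush() -> None:
        if not spool:
            return
        output(list(spool))  # hand the whole batch to the output
        # Advance offsets only after the batch has been handed over.
        for path, index, _ in spool:
            registry[path] = max(registry.get(path, 0), index + 1)
        spool.clear()

    for path in sorted(files):  # "prospector": find files
        lines = files[path]
        # "harvester": read only lines after the recorded offset
        for index in range(registry.get(path, 0), len(lines)):
            spool.append((path, index, lines[index]))
            if len(spool) >= batch_size:  # "spooler": batch events
                flush()
    flush()  # send whatever is left in a final, smaller batch
```

Running it twice over a file that grows between passes sends the first three lines as batches of 2 and 1. The second pass sends only the new line, because the registry already points past the old ones.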

For an installation unpacked to /usr/local/filebeat-7.8.0-linux-x86_64, the registry is located in /usr/local/filebeat-7.8.0-linux-x86_64/data/registry/filebeat.

2. What are the advantages and disadvantages of Filebeat and Logstash?

Filebeat and Logstash are both important components of the ELK stack, and each has its own main advantages and disadvantages:

2.1 Advantages and disadvantages of filebeat

Filebeat advantages:

1. Lightweight, low resource consumption, easy to deploy on each server.

2. Modular design with many supported inputs, modules, and outputs, so it is easy to extend.

3. It persists its state and can resume transmission after an interruption, avoiding re-sending data that was already delivered.

4. File collection does not depend on inotify (files are discovered by periodic scanning), so it works in a wide range of environments.

Filebeat disadvantages:

1. Relies on other components (such as Logstash) for complex data processing and analysis.

2. It does not support real-time data analysis; there is always some delay.

Harvesters and the spooler collect and send data in batches, so there is some delay and strictly real-time analysis is not possible.

There are two main reasons for the delay:

  1. Delay caused by buffering: events collected by the harvester are first buffered (in memory by default) and wait to be sent in batches by the spooler. If few events are buffered, or data arrives slowly, events may wait until a batch size or flush timeout is reached, which causes delay.

  2. Delay caused by network transmission: sending batches to the target data store takes time, and the delay grows when the network between the harvester's server and the data store is slow or unstable.
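The buffering delay in point 1 comes from a flush rule of the form "send when enough events are buffered, or when a timeout expires". The sketch below models that rule with an explicit clock. `BatchBuffer`, `min_events`, and `flush_timeout` are illustrative names. Filebeat's memory queue exposes comparable settings (queue.mem.flush.min_events and queue.mem.flush.timeout), but this is not its implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BatchBuffer:
    """Buffers events and releases them in batches (simplified model)."""
    min_events: int = 3          # flush as soon as this many events are buffered
    flush_timeout: float = 1.0   # ...or once the oldest event has waited this long
    events: List[Tuple[float, str]] = field(default_factory=list)

    def add(self, now: float, event: str) -> Optional[List[str]]:
        """Buffer an event and return a batch if a flush condition now holds."""
        self.events.append((now, event))
        return self.poll(now)

    def poll(self, now: float) -> Optional[List[str]]:
        """Return a batch if a flush condition holds, else None."""
        if not self.events:
            return None
        oldest = self.events[0][0]
        if len(self.events) >= self.min_events or now - oldest >= self.flush_timeout:
            batch = [e for _, e in self.events]
            self.events.clear()
            return batch
        return None
```

With a busy source, the size condition fires almost immediately. A single event arriving on a quiet source waits for the full timeout, which is exactly the latency described in point 1.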

Therefore, if real-time analysis is required, a real-time transport is needed, for example a message queue such as Kafka that decouples data collection from data analysis. The performance and stability of collection and transmission also need to be tuned so that data arrives both promptly and accurately.

3. The supported log formats are limited, and many formats require a custom parser.
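When no built-in module or processor fits, a custom parser has to split each line into fields. The sketch below parses a hypothetical format, `2023-01-01 10:15:02 ERROR [order-service] message`, with a regular expression. The format, the field names, and `parse_line` are assumptions for illustration. In a real deployment this role is typically played by Logstash grok or dissect filters, or by Filebeat's own processors.

```python
import re
from typing import Dict, Optional

# Hypothetical line format: "<date> <time> <LEVEL> [<service>] <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \[(?P<service>[^\]]+)\] (?P<message>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the parsed fields, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None  # the caller can keep the raw line, e.g. tag it as unparsed
    fields = match.groupdict()
    # Merge date and time into a single ISO-8601-style timestamp field.
    fields["timestamp"] = f"{fields.pop('date')}T{fields.pop('time')}"
    return fields
```

Returning `None` for non-matching lines, instead of raising an exception, lets the caller decide whether to drop the line or forward it unparsed.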

2.2 Advantages and disadvantages of logstash

Advantages of Logstash:

1. Powerful functionality, with rich support for data filtering, transformation, and output.

2. Supports real-time data processing and analysis.

3. It supports a wide range of log formats and data sources, and has strong community support.

4. Flexible configuration: a pipeline can combine multiple filters and outputs to implement complex processing logic.
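The pipeline model in point 4 (events pass through a chain of filters, then fan out to one or more outputs) can be mimicked in a few lines of Python to show why composition is flexible. This is a conceptual model only, not Logstash configuration syntax. The filter functions, the added `env` field, and the `run_pipeline` helper are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Event = Dict[str, object]
Filter = Callable[[Event], Optional[Event]]  # return None to drop the event


def drop_debug(event: Event) -> Optional[Event]:
    """Filter: discard DEBUG-level events."""
    return None if event.get("level") == "DEBUG" else event


def add_env(event: Event) -> Optional[Event]:
    """Filter: enrich the event with a hypothetical environment field."""
    return {**event, "env": "prod"}


def run_pipeline(events: Iterable[Event],
                 filters: List[Filter],
                 outputs: List[Callable[[Event], None]]) -> None:
    """Pass each event through every filter in order, then to every output."""
    for event in events:
        current: Optional[Event] = event
        for f in filters:
            current = f(current)
            if current is None:  # a filter dropped the event
                break
        if current is not None:
            for out in outputs:  # fan out to multiple outputs
                out(current)
```

Adding, removing, or reordering filters changes the processing logic without touching the inputs or outputs, which is the kind of flexibility point 4 refers to.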

Logstash disadvantages:

1. High resource consumption (it runs on the JVM), which makes it impractical to deploy on every server in a large fleet.

2. Configuration and management are complex, and pipelines are harder to debug and maintain.

3. If state is not persisted, an interrupted upload cannot be resumed and data may be sent again.

4. It usually relies on tools such as Filebeat for data collection on each host. Logstash does have a file input, but running a full Logstash instance on every host is heavyweight.

Summary: Although Filebeat and Logstash sit at different layers of the ELK stack, they work together to form a complete log collection and processing system. Filebeat focuses on efficient, reliable log collection, while Logstash focuses on powerful, flexible data processing. Filebeat's light footprint and Logstash's rich functionality therefore compensate for each other's weaknesses, and in practice the two are often deployed together to collect, filter, transform, enrich, and output log data. Understanding the strengths and weaknesses of both helps in using the ELK stack to build an efficient, flexible, and maintainable logging solution.


Reprinted from: blog.csdn.net/weixin_44815878/article/details/131974179