Fixing disk space that filebeat holds on Linux and does not release

One of our application servers runs Red Hat Linux. Monitoring raised an alarm that usage of the /opt/applog file system had exceeded its threshold. The file system has a capacity of 50G, yet the files visible in it total only 20G. What is using the remaining 30G?
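The gap described above can be measured by comparing what the file system reports as used with the sum of the files that are still reachable by name. The following is a minimal sketch under stated assumptions: the helper names visible_bytes and report are made up for illustration, and sizes are apparent sizes (st_size) rather than allocated blocks, so the numbers will only roughly match df and du.

```python
import os
import shutil


def visible_bytes(root):
    """Sum the apparent size of every file still reachable by name under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # the file vanished between listing and stat
    return total


def report(mount_point):
    """Return (used, visible, gap) in bytes for the given mount point."""
    used = shutil.disk_usage(mount_point).used  # file system view, similar to df
    visible = visible_bytes(mount_point)        # directory tree view, similar to du
    return used, visible, used - visible
```

Calling report("/opt/applog") on the server in question would be expected to show a large gap; a big gap on a dedicated file system suggests that space is held by files that no longer have a name.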

In Linux, everything is represented as a file. Behind the scenes, the system allocates a file descriptor for each file an application opens, and the descriptor is the handle through which the application and the operating system interact. Because these are files, they take up space. The lsof command lists the files that are currently open on the system.

> lsof

COMMAND      PID      USER   FD      TYPE    DEVICE  SIZE/OFF      NODE NAME

...

filebeat  111442   app  1r      REG     253,3 209715229   1040407 /opt/applog/E.20171016.info.012.log

filebeat  111442   app  2r      REG     253,3 209715254    385080 /opt/applog/E.20171015.info.001.log (deleted)

...

The fields of the header have the following meanings:

COMMAND: the name of the process
PID: the process identifier
USER: the process owner
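FD: the file descriptor by which the process identifies the open file, such as 1r (descriptor 1, open for reading); special values such as cwd and txt also appear here
TYPE: file type, such as DIR, REG, etc.
DEVICE: the device numbers (major,minor) of the disk that holds the file
SIZE/OFF: the size of the file, or the file offset, in bytes
NODE: inode number (the identifier of the file on the disk)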
NAME: the exact name of the open file

In some lines, NAME is marked (deleted):

/opt/applog/E.20171015.info.001.log (deleted)

This means that the file has been deleted, but the handle to the open file has not been closed. The COMMAND column shows filebeat and the USER column shows app: this is our log collection process, and the filebeat process was started by the app user.
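The same information that lsof shows can be read directly from /proc on Linux, where each entry in /proc/PID/fd is a symbolic link to the open file and the link target ends in " (deleted)" once the file has been unlinked. The sketch below is one way to list such handles; the function name deleted_open_files is an assumption made for illustration, and without root privileges it only sees processes owned by the current user.

```python
import os


def deleted_open_files(proc_root="/proc"):
    """Return (pid, fd, size_in_bytes, link_target) for deleted-but-open files."""
    results = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if not target.endswith(" (deleted)"):
                    continue
                # stat follows the link to the still-open inode
                size = os.stat(link).st_size
            except OSError:
                continue  # descriptor closed while we were looking
            results.append((int(pid), int(fd), size, target))
    return results
```

Summing the size column per PID gives a rough idea of how much space each process is still pinning.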

An aside: the log collection platform


The traditional open source log platform, ELK, consists of three open source tools: Elasticsearch, Logstash and Kibana.

  • Elasticsearch is an open source distributed search engine. Its features include distribution, zero configuration, automatic discovery, automatic index sharding, index replication, a RESTful interface, multiple data sources, and automatic search load balancing.

  • Logstash is an open source collection tool that can collect, filter, and store logs for later use.

  • Kibana is an open source graphical web tool that gives Logstash and Elasticsearch a web interface suited to log analysis, where important log data can be summarized, analyzed and searched.

[Figure: a common ELK deployment diagram]

So what is the filebeat mentioned above, and how does it relate to ELK?

Rao Chenlin (author of "ELKstack Authoritative Guide") gives a very incisive explanation on Zhihu.

Quoted from https://www.zhihu.com/question/54058964/answer/137882919

Because logstash runs on the JVM, its resource consumption is relatively high, so the author later wrote logstash-forwarder in golang, a lightweight tool with fewer features but lower resource consumption. But the author was only one person. After he joined the company behind http://elastic.co, which had also acquired another open source project, packetbeat, a project that used golang exclusively and had a whole team, the company simply merged the development of logstash-forwarder into that golang team. The new project is called filebeat.

In short, filebeat is the log collection agent process, responsible for collecting application log files.

As for my question above, there is some background to why so many files are marked (deleted) with their handles unreleased. Because disk space is very limited, we had temporarily added a scheduled task that runs every hour and deletes logs older than 12 hours. In other words, the scheduled task deletes some files that filebeat still has open at that moment. The files themselves are deleted, but because filebeat still holds their handles, the space is not released.
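The mechanism can be reproduced in a few lines: deleting a file only removes its name, and the data stays on disk until the last open handle is closed. The sketch below uses a temporary file rather than a real log, and the function name unlink_while_open is made up for illustration.

```python
import os
import tempfile


def unlink_while_open(payload=b"log line\n" * 1000):
    """Delete a file that is still open and show that its data survives."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                    # what the scheduled cleanup task does
        gone = not os.path.exists(path)    # the name no longer exists
        st = os.fstat(fd)                  # the open handle still reaches the inode
        os.lseek(fd, 0, os.SEEK_SET)
        readable = os.read(fd, len(payload)) == payload
        # st_nlink is normally 0 here on Linux: no names left, only our handle
        return gone, st.st_nlink, st.st_size, readable
    finally:
        os.close(fd)                       # only now can the space be reclaimed
```

In this picture, filebeat plays the role of the process holding fd, and the hourly cleanup task plays the role of os.unlink.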


Solution 1:


To release the occupied space quickly, the most direct method is to kill the filebeat process with kill -9; the space is released as soon as the process exits. But this is not a fundamental fix. Once filebeat is running again, the scheduled task will again delete files that filebeat has open, and the space will fill up again.


Solution 2:


The filebeat configuration file, filebeat.yml, has two relevant parameters:

close_older: 1h

Description: Close older closes the file handler for which were not modified for longer then close_older. Time strings like 2h (2 hours), 5m (5 minutes) can be used.

That is, if a file has not been updated within the given period, filebeat closes its handle on that file. The default is 1 hour.

force_close_files: false

Description: This option closes a file, as soon as the file name changes. This config option is recommended on windows only. Filebeat keeps the files it's reading open. This can cause issues when the file is removed, as the file will not be fully removed until also Filebeat closes the reading. Filebeat closes the file handler after ignore_older. During this time no new file with the same name can be created. Turning this feature on the other hand can lead to loss of data on rotate files. It can happen that after file rotation the beginning of the new file is skipped, as the reading starts at the end. We recommend to leave this option on false but lower the ignore_older value to release files faster.

That is, when the file name changes, which includes renaming and deletion, filebeat closes the file automatically.

We combine the two parameters. According to our application's requirements, the handle should be closed if a file has not been updated for 30 minutes, and also if the file is renamed or deleted:

close_older: 30m

force_close_files: true

This meets the basic requirements of collecting logs with filebeat while regularly deleting historical files.
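The two closing rules can be pictured with a small toy model. This is not filebeat's implementation (filebeat is written in golang); it is only a sketch of the behaviour described above, and the function should_close and its parameters are hypothetical names chosen to mirror the two options.

```python
import os
import time


def should_close(fd, path, close_older_s, force_close_files, now=None):
    """Toy model: decide whether a reader should give up its handle on a file."""
    now = time.time() if now is None else now
    opened = os.fstat(fd)
    # Rule 1 (close_older): not modified for longer than the configured period
    if now - opened.st_mtime > close_older_s:
        return True
    # Rule 2 (force_close_files): the name no longer refers to the file we hold
    if force_close_files:
        try:
            current = os.stat(path)
        except FileNotFoundError:
            return True  # the file was deleted
        if (current.st_dev, current.st_ino) != (opened.st_dev, opened.st_ino):
            return True  # the file was renamed or replaced
    return False
```

With close_older_s=1800 and force_close_files=True, a file removed by the cleanup task is released on the next check instead of waiting for the inactivity period to expire.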
