Downtime caused by too many open files

Part of the content is excerpted from https://blog.csdn.net/qq_18581221/article/details/80963741

1. Basic description, cause and solution of the problem

This error occurs when the number of files a process has open exceeds the maximum configured in Linux. Because everything in Linux is a file, that count covers not only regular files but also socket connections.
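To see how close a process is to that limit, compare its current descriptor count with its soft limit. From a shell, `ulimit -n` shows the soft limit of the current shell and `ls /proc/<pid>/fd | wc -l` counts a process's open descriptors. The following is a minimal Java sketch of the same check from inside a JVM, assuming a Linux host with /proc mounted; `FdUsage`, `openFdCount` and `softFdLimit` are hypothetical names, not part of any library.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

final class FdUsage {

    // Counts descriptors open in this JVM by listing /proc/self/fd (Linux only).
    // The directory handle used for the listing is itself included in the count.
    static int openFdCount() {
        String[] entries = new File("/proc/self/fd").list();
        if (entries == null) {
            throw new IllegalStateException("/proc/self/fd is not readable (Linux /proc assumed)");
        }
        return entries.length;
    }

    // Returns the soft "Max open files" limit from /proc/self/limits,
    // or -1 if it is reported as "unlimited".
    static long softFdLimit() {
        try {
            List<String> lines = Files.readAllLines(Paths.get("/proc/self/limits"));
            for (String line : lines) {
                if (line.startsWith("Max open files")) {
                    // Remaining columns: soft limit, hard limit, units.
                    String[] cols = line.substring("Max open files".length()).trim().split("\\s+");
                    return "unlimited".equals(cols[0]) ? -1 : Long.parseLong(cols[0]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        throw new IllegalStateException("\"Max open files\" not found in /proc/self/limits");
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFdCount() + ", soft limit: " + softFdLimit());
    }
}
```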

2. A case from practice

1. Problem analysis

The content above is excerpted from the blog post cited at the top. It describes the problem in general terms and gives some configuration-based solutions, so I will not repeat them here.

Below, I walk through a problem from a real scenario.
If a service genuinely needs that many resources, a "too many open files" error can be fixed by changing the configuration.
But we also have to consider a handle leak, as in the following case: a service went down with a "too many open files" error.
Restarting the service made the error go away. Colleagues had raised the handle limit (for example, to unlimited), yet the service hit the same problem again within a few days.
What does that mean?
It means that however high the limit is set, the service eventually uses up all of its handles as it runs, so the program must be leaking them: for example, a file stream or socket that is never closed, just as a database connection or Redis connection that is never closed causes a connection leak.
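A handle leak is easy to reproduce: every stream that is opened and never closed keeps one descriptor in use. The sketch below, assuming Linux /proc and using the hypothetical class `LeakDemo`, opens streams on a temporary file without closing them and measures how many descriptors were added; it closes them at the end only so that the demo itself does not leak.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class LeakDemo {

    // Descriptors currently open in this process (Linux /proc assumed).
    static int openFdCount() {
        String[] entries = new File("/proc/self/fd").list();
        if (entries == null) {
            throw new IllegalStateException("/proc/self/fd is not readable");
        }
        return entries.length;
    }

    // Opens n streams on one file without closing them, the way a leaking method would,
    // and returns how many descriptors that added. The streams are closed in finally
    // only so that the demo itself does not leak.
    static int leakAndRelease(int n) {
        List<FileInputStream> leaked = new ArrayList<>();
        try {
            File tmp = File.createTempFile("leak-demo", ".properties");
            tmp.deleteOnExit();
            int before = openFdCount();
            for (int i = 0; i < n; i++) {
                leaked.add(new FileInputStream(tmp)); // opened, never closed by the "buggy" code
            }
            return openFdCount() - before;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (FileInputStream in : leaked) {
                try {
                    in.close();
                } catch (IOException ignored) {
                    // best-effort cleanup
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("descriptors added by 50 unclosed streams: " + leakAndRelease(50));
    }
}
```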

How to troubleshoot:
Dump the handle usage at the moment of the failure with lsof > a, which writes the list of open handles on the Linux server to file a, and then inspect it. As shown in the figure, most of the handles point to the file mobile.properties, which indicates that this file is opened somewhere and its stream is never closed, leaking handles.
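With hundreds or thousands of lines in file a, it helps to count entries per path rather than reading the dump by eye. This is a rough Java sketch (hypothetical class `LsofTally`) that assumes the file path is the last whitespace-separated token starting with `/`, which holds for regular files whose names contain no spaces; socket and pipe lines are skipped. Note that lsof also lists memory-mapped files, the working directory and similar entries, so the counts are per lsof line and not strictly per descriptor. Running `lsof -p <pid>` limits the dump to a single process.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LsofTally {

    // Counts how many lines of an lsof dump refer to each path.
    // Heuristic: the path is the last whitespace-separated token that starts with "/".
    // Lines without such a token (header, pipes, TCP sockets) are skipped.
    static Map<String, Integer> countByPath(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            for (int i = tokens.length - 1; i >= 0; i--) {
                if (tokens[i].startsWith("/")) {
                    counts.merge(tokens[i], 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    // The k paths with the most entries, highest count first.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int k) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Reads the dump written by `lsof > a` (or the file given as the first argument).
        // ISO-8859-1 is used so that arbitrary bytes in file names cannot break decoding.
        List<String> lines = Files.readAllLines(Paths.get(args.length > 0 ? args[0] : "a"),
                StandardCharsets.ISO_8859_1);
        for (Map.Entry<String, Integer> e : top(countByPath(lines), 10)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```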


Our error message also happened to mention mobile.properties, but that was a coincidence. This file is probably read far more often than other files, which are read rarely or not at all, so the error surfaced there. In a system that opens and closes files all the time, once handles leak, errors can appear anywhere, and the error message may not point to the file that is actually leaking.

Examining the code at the point shown in the stack trace, we found that fis is never closed; the safest fix is to close it in a finally block.
The code also reassigns fis to open another file, and any stream that is not closed before fis is reassigned leaks as well.
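The original code is only available as a screenshot, so the following is a hypothetical reconstruction of the fix described above, not the author's actual code: `PropertiesLoader`, `load` and `loadBoth` are illustrative names. Each stream gets its own variable and is closed in `finally`, so the descriptor is released even if loading throws, and nothing is leaked by reassigning `fis` to a second file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesLoader {

    // Loads one properties file and closes its stream in finally, so the descriptor
    // is released whether load() succeeds or throws.
    static Properties load(String path) {
        Properties props = new Properties();
        InputStream fis = null;
        try {
            fis = new FileInputStream(path);
            props.load(fis);
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (fis != null) {
                try {
                    fis.close();
                } catch (IOException ignored) {
                    // nothing sensible to do if close itself fails
                }
            }
        }
    }

    // When two files are read, each goes through load() with its own stream and finally,
    // instead of reassigning one fis variable while the first stream is still open.
    static Properties loadBoth(String first, String second) {
        Properties merged = new Properties();
        merged.putAll(load(first));
        merged.putAll(load(second)); // keys from the second file win on conflict
        return merged;
    }
}
```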

2. Solution:

The fix is to repair the code above so that every stream it opens is closed.
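On Java 7 and later, try-with-resources expresses the same fix more compactly, and a simple loop can confirm that the repair holds. The sketch below (hypothetical class `LeakCheck`, Linux /proc assumed) loads a properties file many times and reports how many descriptors remain open compared with before.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class LeakCheck {

    // try-with-resources closes `in` automatically, whether load() returns or throws.
    static Properties load(Path path) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(path)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Descriptors currently open in this process (Linux /proc assumed).
    static int openFdCount() {
        String[] entries = new File("/proc/self/fd").list();
        if (entries == null) {
            throw new IllegalStateException("/proc/self/fd is not readable");
        }
        return entries.length;
    }

    // Loads the file `times` times and returns how many more descriptors are open
    // afterwards than before; a loader that closes its streams should give about 0.
    static int fdGrowth(Path path, int times) {
        load(path); // warm-up so one-time JVM initialization is not counted
        int before = openFdCount();
        for (int i = 0; i < times; i++) {
            load(path);
        }
        return openFdCount() - before;
    }
}
```

With a loader that forgets to close its stream, `fdGrowth` would typically report growth close to `times`, although a garbage collection may reclaim some unreachable streams first and hide part of the leak.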


Source: blog.csdn.net/wf_feng/article/details/121866380