Linux OOM-killer mechanism (out of memory)

When I got to the office this morning, I found that a game server process on one of our servers had died. My first thought was that the cloud provider's host machine had gone down and been restarted (I have run into that twice before).

So I immediately logged in to the server to check. I first looked at the process log to determine when the process was killed, then checked the kernel log /var/log/messages and found the following:
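A minimal sketch of that log check, for reference. The function name find_oom_events is made up for illustration, and the search patterns are an assumption based on common kernel messages ("Out of memory", "Killed process", "invoked oom-killer"); the exact wording differs between kernel versions, and the log path differs between distributions.

```shell
#!/bin/sh
# List OOM-killer related lines from a kernel log file.
# Assumption: the kernel wrote one of the phrases below; wording varies by version.
find_oom_events() {
    log_file="${1:-/var/log/messages}"
    grep -i -E 'out of memory|killed process|oom-killer' "$log_file"
}

# Usage (reading the log usually needs root):
#   find_oom_events /var/log/messages
# The kernel ring buffer can be searched the same way:
#   dmesg | grep -i -E 'out of memory|killed process|oom-killer'
```

Comparing the timestamp on the matching lines with the last entry in the process log is what ties the kill to the OOM-killer.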

[Screenshot: picture.png, kernel log output]

This confirmed that the system had run out of memory, which triggered the OOM-killer and killed the process.


Here is how the OOM-killer works.

The material below was collected from the following pages:

https://blog.csdn.net/lidan3959/article/details/17350711

https://blog.csdn.net/hunanchenxingyu/article/details/26271293


On Linux, programs are allowed to request more memory than the system actually has available. This feature is called overcommit.

It is an optimization: not every program uses the memory it requests right away, and by the time it does, the system may have reclaimed some resources.

Unfortunately, if the system has nothing left when you actually start using the overcommitted memory, the OOM killer steps in.

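To protect an important process from being killed by the OOM killer, we can run: echo -17 > /proc/<pid>/oom_adj. The value -17 disables OOM killing for that process. On kernel 2.6.36 and later, oom_adj is deprecated in favor of /proc/<pid>/oom_score_adj, where -1000 has the same effect.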
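As a rough sketch, the protection step can be wrapped in a small function that prefers the newer /proc/<pid>/oom_score_adj file (range -1000 to 1000, where -1000 exempts the process) and falls back to the legacy oom_adj file. The function name protect_from_oom and the PROC_ROOT override are made up for illustration; PROC_ROOT only exists so the function can be tried against a scratch directory instead of the real /proc.

```shell
#!/bin/sh
# Exempt one process from the OOM killer.
# PROC_ROOT is a hypothetical override for dry runs; it defaults to /proc.
protect_from_oom() {
    pid="$1"
    proc_dir="${PROC_ROOT:-/proc}/$pid"
    if [ -e "$proc_dir/oom_score_adj" ]; then
        # Newer interface: -1000 disables OOM killing for this process.
        printf '%s\n' -1000 > "$proc_dir/oom_score_adj"
    elif [ -e "$proc_dir/oom_adj" ]; then
        # Legacy interface: -17 disables OOM killing for this process.
        printf '%s\n' -17 > "$proc_dir/oom_adj"
    else
        echo "no OOM tunable found for pid $pid" >&2
        return 1
    fi
}

# Usage (as root, 1234 is a placeholder pid):
#   protect_from_oom 1234
```

The setting belongs to the running process and is lost when it restarts, so it has to be applied again after every restart, for example from the service's start script.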

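We can also disable the OOM killer for the entire system, in which case the kernel panics when memory runs out instead of killing a process:

sysctl -w vm.panic_on_oom=1 (the default is 0, which means the OOM killer is enabled)

sysctl -p (only needed when the setting has been added to /etc/sysctl.conf to make it persistent)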

The parameter /proc/sys/vm/overcommit_memory controls the kernel's policy for overcommitting memory:

When overcommit_memory=0 (the default), the kernel uses a heuristic: a modest amount of overcommit is allowed, but obviously excessive requests are refused

When overcommit_memory=1, overcommit is always allowed

When overcommit_memory=2, overcommit is prohibited: the total committed memory may not exceed swap plus a percentage of physical RAM set by vm.overcommit_ratio (50 by default)
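The three modes can be summarized in a small helper, together with the arithmetic the kernel applies in mode 2. This is only a sketch: the function names are made up for illustration, and the limit calculation assumes the simple case (swap plus RAM times vm.overcommit_ratio / 100), ignoring huge pages and vm.overcommit_kbytes.

```shell
#!/bin/sh
# Describe a vm.overcommit_memory value in words.
describe_overcommit() {
    case "$1" in
        0) echo "heuristic overcommit (default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting, no overcommit" ;;
        *) echo "unknown mode: $1" ;;
    esac
}

# Approximate commit limit in kB for mode 2:
#   swap + ram * overcommit_ratio / 100
# Assumption: huge pages and vm.overcommit_kbytes are not taken into account.
commit_limit_kb() {
    ram_kb="$1"
    swap_kb="$2"
    ratio="${3:-50}"
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# Usage:
#   describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)"
#   commit_limit_kb 8000000 2000000 50
# The kernel's own figures are in /proc/meminfo:
#   grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```

With 8,000,000 kB of RAM, 2,000,000 kB of swap and the default ratio of 50, the limit works out to 2,000,000 + 4,000,000 = 6,000,000 kB.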

The selection strategy on Linux has evolved over time. As users, we can influence the OOM killer's decisions by setting a few values. Every process on Linux has an OOM weight in /proc/<pid>/oom_adj, ranging from -17 to +15. The higher the value, the more likely the process is to be killed.

In the end, the OOM killer picks its victim by the value of /proc/<pid>/oom_score. On older kernels this value is computed from the process's memory consumption, CPU time (utime + stime), run time (uptime - start time) and oom_adj. The more memory consumed, the higher the score; the longer the process has been running, the lower the score. Kernels since 2.6.36 use a simpler heuristic based mainly on memory usage, adjusted by oom_score_adj.

In short, the general strategy is: lose as little work as possible, free as much memory as possible, avoid punishing innocent processes that merely use a lot of memory, and kill as few processes as possible.
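To see which processes the OOM killer would currently pick first, the scores can be read straight from /proc. The sketch below is one way to do it; the function name top_oom_scores and the PROC_ROOT override are made up for illustration, and it assumes /proc/<pid>/comm is available for the process name.

```shell
#!/bin/sh
# Print "score pid name" for the processes with the highest oom_score.
# PROC_ROOT is a hypothetical override for dry runs; it defaults to /proc.
top_oom_scores() {
    count="${1:-10}"
    root="${PROC_ROOT:-/proc}"
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # A process may exit while we scan; skip it if the read fails.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || name="?"
        echo "$score ${dir##*/} $name"
    done | sort -rn | head -n "$count"
}

# Usage: show the five most likely victims
#   top_oom_scores 5
```

A process that has been exempted from the OOM killer shows a score of 0 here.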


