A Linux process suddenly disappeared

Over the past few days I have been running experiments on a Linux server. The process uses a lot of memory and runs for a long time, and sometimes I find that it has suddenly disappeared.

When this happens, go through the system's kernel log. There is a kern.log file under /var/log. Search it for the process number that was killed (cat kern.log | grep 1026xxx), and you will find a record like the following:

Apr 28 14:52:20 xxxx-xx kernel: [1617652.028901] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1001.slice/session-980.scope ,task=python3.9,pid=1282264,uid=1003

In such a log record, **pid is the number of the process that was killed by the kernel's OOM killer, uid is the id of the user the process ran as, and there is some other information (such as task, the name of the command)**.
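To pull those fields out of a record without reading it by eye, something like the following sketch may help. The helper name oom_field is made up for illustration, and it assumes the comma-separated key=value layout of the record shown above; other kernel versions may format the line differently.

```shell
#!/bin/sh
# oom_field is a hypothetical helper, not a standard tool.
# $1 = field name (for example pid), $2 = one oom-kill log line.
oom_field() {
    # Split the record on commas, then print the value of the first
    # piece that starts with "<name>=" (leading spaces are ignored).
    printf '%s\n' "$2" | tr ',' '\n' | sed -n "s/^[[:space:]]*$1=//p" | sed -n '1p'
}

# The sample record from this post.
line='Apr 28 14:52:20 xxxx-xx kernel: [1617652.028901] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1001.slice/session-980.scope ,task=python3.9,pid=1282264,uid=1003'

echo "task=$(oom_field task "$line") pid=$(oom_field pid "$line") uid=$(oom_field uid "$line")"
```

For the sample record this prints task=python3.9 pid=1282264 uid=1003. Note that task_memcg is not confused with task, because the match requires the "=" directly after the field name.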

OOM stands for out of memory. In Linux, when the system is under memory pressure, the kernel chooses to protect some system processes and kills some other processes to release memory.
If you don't want your process to be killed by the kernel, you have to influence its OOM score. This value is calculated by the kernel and is used to decide whether your process is kept or killed.

Two files related to OOM are /proc/<pid>/oom_adj and /proc/<pid>/oom_score. The former is a weight from -16 to 15 with a default of 0; setting it to -17 means the process is never killed, and otherwise the larger the value, the more likely the process is to be killed. The latter is the score the kernel calculates, taking the former into account, and processes are selected to be killed based on this score.
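To look at both values for one process, a small sketch like this could be used. show_oom is a hypothetical helper name, and the optional second argument (a different root directory instead of /proc) is only there so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# show_oom is a hypothetical helper.
# $1 = process number, $2 = proc root (assumed default: /proc).
show_oom() {
    pid=$1
    root=${2:-/proc}
    for f in oom_adj oom_score; do
        if [ -r "$root/$pid/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$root/$pid/$f")"
        else
            # The process is gone, or this kernel does not provide the file.
            printf '%s=unavailable\n' "$f"
        fi
    done
}

show_oom "$$"   # inspect the current shell as a harmless example
```

Reading these files does not need root. Checking the values before and after a change is an easy way to confirm the change took effect.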

Which of these files exist may differ depending on the Linux distribution or kernel version on your server. On newer kernels oom_adj is deprecated in favor of /proc/<pid>/oom_score_adj, which ranges from -1000 to 1000.
For example, on my server /proc/ contains many folders named after process numbers. You need to enter the folder of the process you want to change; it contains the two files oom_adj and oom_score (/proc/1026445/oom_adj and /proc/1026445/oom_score), where 1026445 is the number of the process you don't want to be killed by the kernel.
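Since every process has its own folder under /proc/, it is possible to walk all of them and see which processes currently have the highest oom_score, that is, which ones the kernel would probably pick first. The sketch below is only an illustration: rank_oom is a made-up helper name, and the second argument exists only so it can be pointed at a test directory.

```shell
#!/bin/sh
# rank_oom is a hypothetical helper.
# $1 = how many entries to show (assumed default: 5)
# $2 = proc root (assumed default: /proc)
# Prints "score pid" lines, highest score first.
rank_oom() {
    n=${1:-5}
    root=${2:-/proc}
    for d in "$root"/[0-9]*; do
        [ -r "$d/oom_score" ] || continue
        # A process may exit between the listing and the read; skip it.
        score=$(cat "$d/oom_score" 2>/dev/null) || continue
        printf '%s %s\n' "$score" "${d##*/}"
    done | sort -rn | sed -n "1,${n}p"
}

rank_oom 5
```

If your own experiment sits at the top of this list, it is the likely candidate the next time memory runs out.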

Use echo "-17" > /proc/1026445/oom_adj to modify the value inside so that your process will not be killed by the kernel. Lowering the value normally requires root privileges.
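Because a value outside the allowed range is easy to type by mistake, a small wrapper that checks the value before writing may be safer than a bare echo. This is a sketch under assumptions: set_oom_adj is a hypothetical helper name, the -17 to 15 range is the one described above for oom_adj, and the third argument (proc root) exists only so the function can be tested without touching a real process.

```shell
#!/bin/sh
# set_oom_adj is a hypothetical helper.
# $1 = process number, $2 = new value, $3 = proc root (assumed default: /proc).
set_oom_adj() {
    pid=$1
    val=$2
    root=${3:-/proc}
    # Accept only plain integers, optionally with one leading minus sign.
    case $val in
        ''|-|*[!0-9-]*|?*-*) echo "not an integer: $val" >&2; return 1 ;;
    esac
    # -17 disables OOM killing for the process; 15 is the upper limit.
    if [ "$val" -lt -17 ] || [ "$val" -gt 15 ]; then
        echo "out of range (-17..15): $val" >&2
        return 1
    fi
    if [ ! -w "$root/$pid/oom_adj" ]; then
        echo "cannot write $root/$pid/oom_adj (wrong process number, or not root?)" >&2
        return 1
    fi
    printf '%s\n' "$val" > "$root/$pid/oom_adj"
}

# Example, run as root:
#   set_oom_adj 1026445 -17
```

With this wrapper, a mistyped value such as 17 is rejected before anything is written.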

Update: it broke. When I got up this morning, the server was unresponsive. It has to be shut down and restarted, and the teacher who manages the computer room hasn't come in yet. A warning to everyone: although this trick works, do not use it on many processes, because the kernel can then no longer free memory by killing them and the whole system can easily go down.

Origin blog.csdn.net/baidu_41810561/article/details/124537888