[Linux server] Memory and cache cleaning

Today I found that while running a deep learning program, the server would freeze and drop my connection outright, and it took a long time to reconnect. At first I suspected a GPU problem, but the same batch of data had run without issue last week. After discussing it with friends, I concluded the cause was CPU memory being exhausted. What follows is the complete repair process.

1. View memory usage

free -h    # view memory and cache usage
watch free -h    # watch memory and cache usage in real time

Even with no program running, memory usage was already about 71 GB.
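
For reference, free -h output looks roughly like this (the numbers below are illustrative, not from the actual server):

              total        used        free      shared  buff/cache   available
Mem:           125G         71G        3.0G        1.0G         51G         49G
Swap:          2.0G        512M        1.5G

The available column is the kernel's estimate of memory that can still be given to new processes; a large buff/cache value by itself is usually harmless, since the kernel reclaims cache on demand.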

2. Clean up the cache

At first I thought the culprit might be cache left behind by programs the system had run earlier, so I tried clearing the caches.
sudo -s    # switch to a root shell
sync    # flush buffered data to disk before dropping the caches
echo 3 > /proc/sys/vm/drop_caches

The value written to drop_caches can be 1, 2, or 3, each with a different meaning (0 is the file's default state and releases nothing):
1: release the page cache
2: release dentries and inodes (reclaimable slab objects)
3: release both the page cache and dentries/inodes
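
As an aside, the same cleanup can be done in one line without staying in a root shell; a minimal sketch using sysctl, which writes the same kernel parameter (assuming the standard procps sysctl is installed):

sync && sudo sysctl vm.drop_caches=3    # flush buffers, then drop all reclaimable caches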
After cleaning up, however, only about 3 GB of memory was free, so I turned to cleaning up unnecessary processes.

3. Kill useless processes

Use the top command to view the system status

Press M (capital M) to sort the processes by memory usage.
Looking down the %MEM column, I found many unexplained python processes running.

Next, use

ps auxw | head -1; ps auxw | sort -rn -k4 | head -20

to list the 20 processes with the highest memory usage (the first command prints the header row; sort -rn -k4 sorts numerically, in descending order, on column 4 of the ps output, which is %MEM).
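
Incidentally, the procps version of ps can sort by itself, so an equivalent sketch (assuming --sort is available, as it is on standard GNU/Linux systems) is:

ps aux --sort=-%mem | head -n 21    # header line plus the top 20 memory consumers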
Many vscode-server processes (VS Code's remote extension host) were holding large amounts of memory. The reason: after each VS Code remote connection to the server, the memory those processes occupied was never released.
Use kill -9 PID to close these processes (kill takes a process ID, the PID column shown by top and ps).
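
If many of them are stale vscode-server sessions, they can be cleaned up in one go; a sketch assuming the processes contain 'vscode-server' in their command lines (verify with pgrep before killing anything):

pgrep -af 'vscode-server'    # list matching processes and their full command lines first
pkill -9 -f 'vscode-server'    # then terminate everything that matches

Note that this also kills any live VS Code remote session, which will simply reconnect and restart its server the next time you connect.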

Checking the memory with free -h again showed about 90 GB available; the cleanup was a success.

Origin: blog.csdn.net/qq_45347185/article/details/115065215