Problem statement
If one day `free` shows that memory is nearly exhausted, but `top`/`ps` shows no user-space application occupying an unusually large amount of memory, then a kernel module may be leaking memory.
Problem confirmation
- Count the memory U occupied by all applications (in kB):
$ grep Pss /proc/[1-9]*/smaps | awk '{total+=$2}; END {print total}'
1721106
- View total system memory T, free memory F, shared memory S, and cache C:
$ free -h
total used free shared buff/cache available
Mem: 125G 95G 4.2G 4.0G 26G 25G
Swap: 9.4G 444M 8.9G
$ cat /proc/meminfo
MemTotal: 131748024 kB
MemFree: 4229544 kB
MemAvailable: 26634796 kB
Buffers: 141416 kB
Cached: 24657800 kB
SwapCached: 198316 kB
Active: 7972388 kB
Inactive: 19558436 kB
Active(anon): 4249920 kB
Inactive(anon): 2666784 kB
Active(file): 3722468 kB
Inactive(file): 16891652 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 9830396 kB
SwapFree: 9375476 kB
Dirty: 80 kB
Writeback: 0 kB
AnonPages: 2601440 kB
Mapped: 71828 kB
Shmem: 4185096 kB
Slab: 2607824 kB
SReclaimable: 2129004 kB
SUnreclaim: 478820 kB
KernelStack: 29616 kB
PageTables: 45636 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 75704408 kB
Committed_AS: 14023220 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 529824 kB
VmallocChunk: 34292084736 kB
HardwareCorrupted: 0 kB
AnonHugePages: 260096 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 125688 kB
DirectMap2M: 3973120 kB
DirectMap1G: 132120576 kB
- Under normal circumstances, the following should hold (K is the memory occupied by the kernel; swap is ignored here):
T = U + K + S + C + F
Rearranging gives the kernel's memory usage: K = T - U - S - C - F.
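As a sanity check, the sample numbers shown above can be plugged into the formula; this is a sketch using the values from the free/meminfo output earlier, taking C as Buffers + Cached:

```shell
#!/bin/sh
# Estimate kernel memory K = T - U - S - C - F, all in kB, using the
# sample values from the output above (U is the smaps Pss sum,
# S is Shmem, C is Buffers + Cached, F is MemFree).
T=131748024                  # MemTotal
U=1721106                    # sum of Pss over all processes
S=4185096                    # Shmem
C=$((141416 + 24657800))     # Buffers + Cached
F=4229544                    # MemFree
K=$((T - U - S - C - F))
echo "K = ${K} kB (about $((K / 1024 / 1024)) GB)"
```

On these sample numbers K comes out to roughly 92 GB, which on a 125 GB machine already points strongly at the kernel side.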
Principle analysis
Experience suggests that code which leaks enough memory to exhaust the system must be code that allocates and frees memory frequently. In the kernel, the paths that allocate and free memory most frequently include:
- Core kernel data structures such as task_struct and inode. This code has been tested extensively, so a problem here is unlikely.
- The kernel I/O subsystems and their drivers, such as the block layer's BIOs, the network stack's SKBs, and the storage and network device drivers.
The most likely place for a problem is therefore a storage or network device driver.
On-site analysis
The Linux kernel manages memory in layers, each solving a different problem. From bottom to top, the key layers are:
- Physical memory management describes the layout and attributes of memory, using the Node, Zone, and Page structures, so that memory can be managed in units of pages;
- The Buddy allocator solves external fragmentation; functions such as __get_free_pages allocate and free blocks of 2^N pages;
- The Slab allocator solves internal fragmentation; it hands out objects of a fixed, caller-specified size in batches (an object cache pool must be created first);
- On top of these, kmalloc and vmalloc allocate memory in units of bytes: kmalloc draws from pre-created fixed-size Slab caches, while vmalloc maps individual pages into a contiguous virtual range.
The first step is to determine which layer the memory is leaking from. (Note: many other mechanisms, such as huge pages, the page cache, and the block cache, also obtain their memory from these layers; they are not essential to this analysis and are ignored here.)
- View the Buddy allocator's free memory:
$ cat /proc/buddyinfo
Node 0, zone DMA 0 1 1 0 2 1 1 0 0 1 3
Node 0, zone DMA32 3222 6030 3094 3627 379 0 0 0 0 0 0
Node 0, zone Normal 13628 0 0 0 0 0 0 0 0 0 0
Node 1, zone Normal 73167 165265 104556 17921 2120 144 1 0 0 0 0
$ cat /proc/buddyinfo | awk '{sum=0;for(i=5;i<=NF;i++) sum+=$i*(2^(i-5))};{total+=sum/256};{print $1 " " $2 " " $3 " " $4 "\t : " sum/256 "M"} END {print "total\t\t\t : " total "M"}'
Node 0, zone DMA : 14.5234M
Node 0, zone DMA32 : 245.07M
Node 0, zone Normal : 53.2344M
Node 1, zone Normal : 3921.41M
total : 4234.24M
This shows how much free memory remains in each zone of the Buddy allocator; subtracting the total from MemTotal approximates how much memory Buddy has handed out.
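For reference, the arithmetic behind that awk one-liner: column n of each buddyinfo row is the count of free blocks of order n, and an order-n block spans 2^n pages of 4 kB. Recomputing the "Node 1, zone Normal" row from the sample output by hand:

```shell
#!/bin/sh
# Each buddyinfo column is a count of free blocks of order 0..10; an
# order-n block is 2^n pages of 4 kB. Recompute the "Node 1, zone Normal"
# row from the sample output above.
echo "73167 165265 104556 17921 2120 144 1 0 0 0 0" |
awk '{ for (i = 1; i <= NF; i++) sum += $i * 2^(i-1) * 4 }   # kB
     END { printf "free in zone: %.2f MB\n", sum / 1024 }'
# prints: free in zone: 3921.41 MB
```

This matches the 3921.41M reported for that zone by the one-liner above.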
- View Slab memory usage:
$ slabtop -o
Active / Total Objects (% used) : 3522231 / 6345435 (55.5%)
Active / Total Slabs (% used) : 148128 / 148128 (100.0%)
Active / Total Caches (% used) : 74 / 107 (69.2%)
Active / Total Size (% used) : 1297934.98K / 2593929.78K (50.0%)
Minimum / Average / Maximum Object : 0.01K / 0.41K / 15.88K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
1449510 666502 45% 1.06K 48317 30 1546144K xfs_inode
1229592 967866 78% 0.10K 31528 39 126112K buffer_head
1018560 375285 36% 0.06K 15915 64 63660K kmalloc-64
643216 322167 50% 0.57K 11486 56 367552K radix_tree_node
350826 147688 42% 0.38K 8353 42 133648K blkdev_requests
310421 131953 42% 0.15K 5857 53 46856K xfs_ili
273420 95765 35% 0.19K 6510 42 52080K dentry
174592 36069 20% 0.25K 2728 64 43648K kmalloc-256
155680 155680 100% 0.07K 2780 56 11120K Acpi-ParseExt
88704 34318 38% 0.50K 1386 64 44352K kmalloc-512
85176 52022 61% 0.19K 2028 42 16224K kmalloc-192
59580 59580 100% 0.11K 1655 36 6620K sysfs_dir_cache
43031 42594 98% 0.21K 1163 37 9304K vm_area_struct
35392 30850 87% 0.12K 553 64 4424K kmalloc-128
35070 20418 58% 0.09K 835 42 3340K kmalloc-96
34304 34304 100% 0.03K 268 128 1072K kmalloc-32
$ cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kvm_async_pf 0 0 136 60 2 : tunables 0 0 0 : slabdata 0 0 0
kvm_vcpu 0 0 16256 2 8 : tunables 0 0 0 : slabdata 0 0 0
kvm_mmu_page_header 0 0 168 48 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_dqtrx 0 0 528 62 8 : tunables 0 0 0 : slabdata 0 0 0
xfs_dquot 0 0 472 69 8 : tunables 0 0 0 : slabdata 0 0 0
xfs_icr 0 0 144 56 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_ili 131960 310421 152 53 2 : tunables 0 0 0 : slabdata 5857 5857 0
xfs_inode 666461 1449510 1088 30 8 : tunables 0 0 0 : slabdata 48317 48317 0
xfs_efd_item 8120 8280 400 40 4 : tunables 0 0 0 : slabdata 207 207 0
xfs_da_state 2176 2176 480 68 8 : tunables 0 0 0 : slabdata 32 32 0
xfs_btree_cur 1248 1248 208 39 2 : tunables 0 0 0 : slabdata 32 32 0
xfs_log_ticket 12981 13200 184 44 2 : tunables 0 0 0 : slabdata 300 300 0
nfsd4_openowners 0 0 440 37 4 : tunables 0 0 0 : slabdata 0 0 0
rpc_inode_cache 51 51 640 51 8 : tunables 0 0 0 : slabdata 1 1 0
ext4_groupinfo_4k 4440 4440 136 60 2 : tunables 0 0 0 : slabdata 74 74 0
ext4_inode_cache 4074 5921 1048 31 8 : tunables 0 0 0 : slabdata 191 191 0
ext4_xattr 276 276 88 46 1 : tunables 0 0 0 : slabdata 6 6 0
ext4_free_data 3264 3264 64 64 1 : tunables 0 0 0 : slabdata 51 51 0
ext4_allocation_context 2048 2048 128 64 2 : tunables 0 0 0 : slabdata 32 32 0
ext4_io_end 1785 1785 80 51 1 : tunables 0 0 0 : slabdata 35 35 0
ext4_extent_status 20706 20706 40 102 1 : tunables 0 0 0 : slabdata 203 203 0
jbd2_journal_handle 2720 2720 48 85 1 : tunables 0 0 0 : slabdata 32 32 0
jbd2_journal_head 4680 4680 112 36 1 : tunables 0 0 0 : slabdata 130 130 0
jbd2_revoke_table_s 256 256 16 256 1 : tunables 0 0 0 : slabdata 1 1 0
jbd2_revoke_record_s 4096 4096 32 128 1 : tunables 0 0 0 : slabdata 32 32 0
scsi_cmd_cache 7056 7272 448 36 4 : tunables 0 0 0 : slabdata 202 202 0
UDPLITEv6 0 0 1152 28 8 : tunables 0 0 0 : slabdata 0 0 0
UDPv6 728 728 1152 28 8 : tunables 0 0 0 : slabdata 26 26 0
tw_sock_TCPv6 0 0 256 64 4 : tunables 0 0 0 : slabdata 0 0 0
TCPv6 405 405 2112 15 8 : tunables 0 0 0 : slabdata 27 27 0
uhci_urb_priv 0 0 56 73 1 : tunables 0 0 0 : slabdata 0 0 0
cfq_queue 27790 27930 232 70 4 : tunables 0 0 0 : slabdata 399 399 0
bsg_cmd 0 0 312 52 4 : tunables 0 0 0 : slabdata 0 0 0
mqueue_inode_cache 36 36 896 36 8 : tunables 0 0 0 : slabdata 1 1 0
hugetlbfs_inode_cache 106 106 608 53 8 : tunables 0 0 0 : slabdata 2 2 0
configfs_dir_cache 0 0 88 46 1 : tunables 0 0 0 : slabdata 0 0 0
dquot 2048 2048 256 64 4 : tunables 0 0 0 : slabdata 32 32 0
kioctx 0 0 576 56 8 : tunables 0 0 0 : slabdata 0 0 0
userfaultfd_ctx_cache 0 0 128 64 2 : tunables 0 0 0 : slabdata 0 0 0
pid_namespace 0 0 2176 15 8 : tunables 0 0 0 : slabdata 0 0 0
user_namespace 0 0 280 58 4 : tunables 0 0 0 : slabdata 0 0 0
posix_timers_cache 0 0 248 66 4 : tunables 0 0 0 : slabdata 0 0 0
UDP-Lite 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
RAW 1530 1530 960 34 8 : tunables 0 0 0 : slabdata 45 45 0
UDP 1024 1024 1024 32 8 : tunables 0 0 0 : slabdata 32 32 0
tw_sock_TCP 10944 11328 256 64 4 : tunables 0 0 0 : slabdata 177 177 0
TCP 2886 3842 1920 17 8 : tunables 0 0 0 : slabdata 226 226 0
blkdev_queue 118 225 2088 15 8 : tunables 0 0 0 : slabdata 15 15 0
blkdev_requests 147485 350826 384 42 4 : tunables 0 0 0 : slabdata 8353 8353 0
blkdev_ioc 2262 2262 104 39 1 : tunables 0 0 0 : slabdata 58 58 0
fsnotify_event_holder 5440 5440 24 170 1 : tunables 0 0 0 : slabdata 32 32 0
fsnotify_event 15912 16252 120 68 2 : tunables 0 0 0 : slabdata 239 239 0
sock_inode_cache 12478 13260 640 51 8 : tunables 0 0 0 : slabdata 260 260 0
net_namespace 0 0 4608 7 8 : tunables 0 0 0 : slabdata 0 0 0
shmem_inode_cache 3264 3264 680 48 8 : tunables 0 0 0 : slabdata 68 68 0
Acpi-ParseExt 155680 155680 72 56 1 : tunables 0 0 0 : slabdata 2780 2780 0
Acpi-Namespace 16422 16422 40 102 1 : tunables 0 0 0 : slabdata 161 161 0
taskstats 1568 1568 328 49 4 : tunables 0 0 0 : slabdata 32 32 0
proc_inode_cache 12352 12544 656 49 8 : tunables 0 0 0 : slabdata 256 256 0
sigqueue 1632 1632 160 51 2 : tunables 0 0 0 : slabdata 32 32 0
bdev_cache 858 858 832 39 8 : tunables 0 0 0 : slabdata 22 22 0
sysfs_dir_cache 59580 59580 112 36 1 : tunables 0 0 0 : slabdata 1655 1655 0
inode_cache 15002 17050 592 55 8 : tunables 0 0 0 : slabdata 310 310 0
dentry 96235 273420 192 42 2 : tunables 0 0 0 : slabdata 6510 6510 0
iint_cache 0 0 80 51 1 : tunables 0 0 0 : slabdata 0 0 0
selinux_inode_security 22681 23205 80 51 1 : tunables 0 0 0 : slabdata 455 455 0
buffer_head 968560 1229592 104 39 1 : tunables 0 0 0 : slabdata 31528 31528 0
vm_area_struct 43185 43216 216 37 2 : tunables 0 0 0 : slabdata 1168 1168 0
mm_struct 860 860 1600 20 8 : tunables 0 0 0 : slabdata 43 43 0
files_cache 1887 1887 640 51 8 : tunables 0 0 0 : slabdata 37 37 0
signal_cache 3595 3724 1152 28 8 : tunables 0 0 0 : slabdata 133 133 0
sighand_cache 2373 2445 2112 15 8 : tunables 0 0 0 : slabdata 163 163 0
task_xstate 4920 5226 832 39 8 : tunables 0 0 0 : slabdata 134 134 0
task_struct 2303 2420 2944 11 8 : tunables 0 0 0 : slabdata 220 220 0
anon_vma 27367 27392 64 64 1 : tunables 0 0 0 : slabdata 428 428 0
shared_policy_node 5525 5525 48 85 1 : tunables 0 0 0 : slabdata 65 65 0
numa_policy 248 248 264 62 4 : tunables 0 0 0 : slabdata 4 4 0
radix_tree_node 321897 643216 584 56 8 : tunables 0 0 0 : slabdata 11486 11486 0
idr_layer_cache 953 975 2112 15 8 : tunables 0 0 0 : slabdata 65 65 0
dma-kmalloc-8192 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-4096 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-2048 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-1024 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-512 0 0 512 64 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-256 0 0 256 64 4 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-128 0 0 128 64 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-64 0 0 64 64 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-192 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-96 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-8192 314 340 8192 4 8 : tunables 0 0 0 : slabdata 85 85 0
kmalloc-4096 983 1024 4096 8 8 : tunables 0 0 0 : slabdata 128 128 0
kmalloc-2048 4865 4928 2048 16 8 : tunables 0 0 0 : slabdata 308 308 0
kmalloc-1024 10084 10464 1024 32 8 : tunables 0 0 0 : slabdata 327 327 0
kmalloc-512 34318 88704 512 64 8 : tunables 0 0 0 : slabdata 1386 1386 0
kmalloc-256 35482 174592 256 64 4 : tunables 0 0 0 : slabdata 2728 2728 0
kmalloc-192 52022 85176 192 42 2 : tunables 0 0 0 : slabdata 2028 2028 0
kmalloc-128 30732 35392 128 64 2 : tunables 0 0 0 : slabdata 553 553 0
kmalloc-96 20418 35070 96 42 1 : tunables 0 0 0 : slabdata 835 835 0
kmalloc-64 375761 1018560 64 64 1 : tunables 0 0 0 : slabdata 15915 15915 0
kmalloc-32 34304 34304 32 128 1 : tunables 0 0 0 : slabdata 268 268 0
kmalloc-16 18432 18432 16 256 1 : tunables 0 0 0 : slabdata 72 72 0
kmalloc-8 25088 25088 8 512 1 : tunables 0 0 0 : slabdata 49 49 0
kmem_cache_node 683 704 64 64 1 : tunables 0 0 0 : slabdata 11 11 0
kmem_cache 576 576 256 64 4 : tunables 0 0 0 : slabdata 9 9 0
From the above output we can determine which Slab caches occupy the most memory.
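The same ranking can be derived from /proc/slabinfo directly. The sketch below approximates each cache's footprint as num_objs * objsize (slabtop's CACHE SIZE column counts whole slab pages, so its numbers run slightly higher); it is demonstrated on two sample rows copied from the output above:

```shell
#!/bin/sh
# Rank slab caches by approximate footprint (num_objs * objsize).
# Demonstrated on two sample rows from the slabinfo output above;
# on a live system, pass /proc/slabinfo instead.
rank_slabs() {
    awk 'NR > 2 { printf "%d kB %s\n", $3 * $4 / 1024, $1 }' "$1" | sort -rn
}

cat > /tmp/slabinfo.sample <<'EOF'
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
xfs_inode 666461 1449510 1088 30 8
buffer_head 968560 1229592 104 39 1
EOF
rank_slabs /tmp/slabinfo.sample
```

For the sample rows this ranks xfs_inode first at about 1.5 GB, consistent with the slabtop output earlier.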
The on-site data shows that the Buddy allocator has handed out well over 100 GB of memory, while Slab accounts for only a few GB. The leak is therefore not in Slab or kmalloc; the memory is leaking directly from Buddy.
Memory allocated from Buddy in whole pages may be consumed by Slab, huge pages, the page cache, the block cache, drivers, or application page-fault mappings and mmap. Among these, the most likely culprit is still a driver.