Analysis ideas for Linux Kernel module memory leaks

Problem statement

    If one day you find through free that memory is almost exhausted, but top/ps do not show any user-space application occupying an unusually large amount of memory, then a kernel module may be leaking memory.
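
A quick sanity check that no user-space process is responsible is to sort processes by resident set size. This is only a rough check, since RSS counts shared pages multiple times; the Pss sum in the next section is more accurate:

$ ps aux --sort=-rss | head -n 10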

Problem confirmation

  • Count the memory U occupied by all applications (in kB):

$ grep Pss /proc/[1-9]*/smaps | awk '{total+=$2}; END {print total}'

1721106

  • View the total system memory T, free memory F, shared memory S, and cache C:

$ free -h

              total        used        free      shared  buff/cache   available

Mem:           125G         95G        4.2G        4.0G         26G         25G

Swap:          9.4G        444M        8.9G

$ cat /proc/meminfo

MemTotal:      131748024 kB

MemFree:         4229544 kB

MemAvailable:   26634796 kB

Buffers:          141416 kB

Cached:         24657800 kB

SwapCached:       198316 kB

Active:          7972388 kB

Inactive:       19558436 kB

Active(anon):    4249920 kB

Inactive(anon):  2666784 kB

Active(file):    3722468 kB

Inactive(file): 16891652 kB

Unevictable:           0 kB

Mlocked:               0 kB

SwapTotal:       9830396 kB

SwapFree:        9375476 kB

Dirty:                80 kB

Writeback:             0 kB

AnonPages:       2601440 kB

Mapped:            71828 kB

Shmem:           4185096 kB

Slab:            2607824 kB

SReclaimable:    2129004 kB

SUnreclaim:       478820 kB

KernelStack:       29616 kB

PageTables:        45636 kB

NFS_Unstable:          0 kB

Bounce:                0 kB

WritebackTmp:          0 kB

CommitLimit:    75704408 kB

Committed_AS:   14023220 kB

VmallocTotal:   34359738367 kB

VmallocUsed:      529824 kB

VmallocChunk:   34292084736 kB

HardwareCorrupted:     0 kB

AnonHugePages:    260096 kB

HugePages_Total:       0

HugePages_Free:        0

HugePages_Rsvd:        0

HugePages_Surp:        0

Hugepagesize:       2048 kB

DirectMap4k:      125688 kB

DirectMap2M:     3973120 kB

DirectMap1G:    132120576 kB

  • Under normal circumstances, the following relationship should hold (K is the memory occupied by the kernel; swap is ignored here):

T = U + K + S + C + F

From this, the memory used by the kernel can be estimated as K = T - U - S - C - F.
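
This calculation can be scripted directly from /proc. The following is a minimal sketch (all values in kB, with Buffers and Cached folded into C); treat the result as an approximation, since Pss, Shmem and the caches partially overlap:

$ U=$(grep Pss /proc/[1-9]*/smaps | awk '{total+=$2} END {print total}')
$ T=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
$ F=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
$ S=$(awk '/^Shmem:/ {print $2}' /proc/meminfo)
$ C=$(awk '/^Buffers:|^Cached:/ {sum += $2} END {print sum}' /proc/meminfo)
$ echo "K = $((T - U - S - C - F)) kB"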

Principle analysis

According to experience, code that leaks memory until it is exhausted is usually code that frequently allocates and frees memory.

Kernel code paths that frequently allocate and free memory include:

  • Kernel management data structures, such as task_struct and inode; this code has been tested extensively, so problems here are unlikely.
  • Kernel I/O subsystems and drivers, such as the BIO structures of the block layer, the SKBs of the network protocol stack, and storage or network device drivers.

The most likely place for a problem is therefore a storage or network device driver.

On-site analysis

The Linux kernel manages memory in layers, with each layer solving a different problem. From bottom to top, the key layers are as follows:

  1. Physical memory management describes the layout and attributes of memory, mainly through the Node, Zone and Page structures, so that memory can be managed in units of pages;
  2. Buddy memory management mainly solves external fragmentation; functions such as __get_free_pages allocate and free memory in blocks of 2^N pages;
  3. Slab memory management mainly solves internal fragmentation; it allocates objects of a caller-specified size in batches (an object cache pool must be created first);
  4. For kernel cache objects, Slab pre-allocates a number of fixed-size caches, and functions such as kmalloc and vmalloc allocate and free memory in units of bytes.

Next, we need to determine at which level the memory is being leaked. (Note: there are many related memory-management mechanisms, such as huge pages, the page cache and the block cache; they all obtain memory from these levels and are not critical to this analysis, so they are ignored here.)
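
The two bullets below examine the Buddy and Slab levels. The vmalloc footprint at the kmalloc/vmalloc level can also be checked with a one-liner such as the following (a sketch; reading /proc/vmallocinfo requires root, and its second field is the size of each mapped area in bytes):

$ sudo awk '{total += $2} END {print total/1024/1024 " MB"}' /proc/vmallocinfo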

  • View Buddy memory usage:

$ cat /proc/buddyinfo

Node 0, zone      DMA      0      1      1      0      2      1      1      0      0      1      3

Node 0, zone    DMA32   3222   6030   3094   3627    379      0      0      0      0      0      0

Node 0, zone   Normal  13628      0      0      0      0      0      0      0      0      0      0

Node 1, zone   Normal  73167 165265 104556  17921   2120    144      1      0      0      0      0

$ cat /proc/buddyinfo | awk '{sum=0;for(i=5;i<=NF;i++) sum+=$i*(2^(i-5))};{total+=sum/256};{print $1 " " $2 " " $3 " " $4 "\t : " sum/256 "M"} END {print "total\t\t\t : " total "M"}'

Node 0, zone DMA      : 14.5234M

Node 0, zone DMA32    : 245.07M

Node 0, zone Normal   : 53.2344M

Node 1, zone Normal   : 3921.41M

total                 : 4234.24M

From this we can see how much free memory remains in the Buddy system; the difference between total memory and this figure is what Buddy has handed out.
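
Subtracting this free total from MemTotal gives a rough figure for how much memory Buddy has handed out. A small sketch reusing the same summation (both values in MB):

$ FREE_MB=$(awk '{sum=0; for(i=5;i<=NF;i++) sum+=$i*(2^(i-5)); total+=sum/256} END {print total}' /proc/buddyinfo)
$ TOTAL_MB=$(awk '/^MemTotal:/ {print $2/1024}' /proc/meminfo)
$ awk -v t="$TOTAL_MB" -v f="$FREE_MB" 'BEGIN {print "allocated by Buddy: " t - f " MB"}'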

  • View Slab memory usage:

$ slabtop -o

 Active / Total Objects (% used)    : 3522231 / 6345435 (55.5%)

 Active / Total Slabs (% used)      : 148128 / 148128 (100.0%)

 Active / Total Caches (% used)     : 74 / 107 (69.2%)

 Active / Total Size (% used)       : 1297934.98K / 2593929.78K (50.0%)

 Minimum / Average / Maximum Object : 0.01K / 0.41K / 15.88K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   

1449510 666502  45%    1.06K  48317       30   1546144K xfs_inode              

1229592 967866  78%    0.10K  31528       39    126112K buffer_head            

1018560 375285  36%    0.06K  15915       64     63660K kmalloc-64             

643216 322167  50%    0.57K  11486       56    367552K radix_tree_node        

350826 147688  42%    0.38K   8353       42    133648K blkdev_requests        

310421 131953  42%    0.15K   5857       53     46856K xfs_ili

273420  95765  35%    0.19K   6510       42     52080K dentry                 

174592  36069  20%    0.25K   2728       64     43648K kmalloc-256

155680 155680 100%    0.07K   2780       56     11120K Acpi-ParseExt          

 88704  34318  38%    0.50K   1386       64     44352K kmalloc-512            

 85176  52022  61%    0.19K   2028       42     16224K kmalloc-192            

 59580  59580 100%    0.11K   1655       36      6620K sysfs_dir_cache        

 43031  42594  98%    0.21K   1163       37      9304K vm_area_struct         

 35392  30850  87%    0.12K    553       64      4424K kmalloc-128            

 35070  20418  58%    0.09K    835       42      3340K kmalloc-96

 34304  34304 100%    0.03K    268      128      1072K kmalloc-32

$ cat /proc/slabinfo

slabinfo - version: 2.1

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>

kvm_async_pf           0      0    136   60    2 : tunables    0    0    0 : slabdata      0      0      0

kvm_vcpu               0      0  16256    2    8 : tunables    0    0    0 : slabdata      0      0      0

kvm_mmu_page_header      0      0    168   48    2 : tunables    0    0    0 : slabdata      0      0      0

xfs_dqtrx              0      0    528   62    8 : tunables    0    0    0 : slabdata      0      0      0

xfs_dquot              0      0    472   69    8 : tunables    0    0    0 : slabdata      0      0      0

xfs_icr                0      0    144   56    2 : tunables    0    0    0 : slabdata      0      0      0

xfs_ili           131960 310421    152   53    2 : tunables    0    0    0 : slabdata   5857   5857      0

xfs_inode         666461 1449510   1088   30    8 : tunables    0    0    0 : slabdata  48317  48317      0

xfs_efd_item        8120   8280    400   40    4 : tunables    0    0    0 : slabdata    207    207      0

xfs_da_state        2176   2176    480   68    8 : tunables    0    0    0 : slabdata     32     32      0

xfs_btree_cur       1248   1248    208   39    2 : tunables    0    0    0 : slabdata     32     32      0

xfs_log_ticket     12981  13200    184   44    2 : tunables    0    0    0 : slabdata    300    300      0

nfsd4_openowners       0      0    440   37    4 : tunables    0    0    0 : slabdata      0      0      0

rpc_inode_cache       51     51    640   51    8 : tunables    0    0    0 : slabdata      1      1      0

ext4_groupinfo_4k   4440   4440    136   60    2 : tunables    0    0    0 : slabdata     74     74      0

ext4_inode_cache    4074   5921   1048   31    8 : tunables    0    0    0 : slabdata    191    191      0

ext4_xattr           276    276     88   46    1 : tunables    0    0    0 : slabdata      6      6      0

ext4_free_data      3264   3264     64   64    1 : tunables    0    0    0 : slabdata     51     51      0

ext4_allocation_context   2048   2048    128   64    2 : tunables    0    0    0 : slabdata     32     32      0

ext4_io_end         1785   1785     80   51    1 : tunables    0    0    0 : slabdata     35     35      0

ext4_extent_status  20706  20706     40  102    1 : tunables    0    0    0 : slabdata    203    203      0

jbd2_journal_handle   2720   2720     48   85    1 : tunables    0    0    0 : slabdata     32     32      0

jbd2_journal_head   4680   4680    112   36    1 : tunables    0    0    0 : slabdata    130    130      0

jbd2_revoke_table_s    256    256     16  256    1 : tunables    0    0    0 : slabdata      1      1      0

jbd2_revoke_record_s   4096   4096     32  128    1 : tunables    0    0    0 : slabdata     32     32      0

scsi_cmd_cache      7056   7272    448   36    4 : tunables    0    0    0 : slabdata    202    202      0

UDPLITEv6              0      0   1152   28    8 : tunables    0    0    0 : slabdata      0      0      0

UDPv6                728    728   1152   28    8 : tunables    0    0    0 : slabdata     26     26      0

tw_sock_TCPv6          0      0    256   64    4 : tunables    0    0    0 : slabdata      0      0      0

TCPv6                405    405   2112   15    8 : tunables    0    0    0 : slabdata     27     27      0

uhci_urb_priv          0      0     56   73    1 : tunables    0    0    0 : slabdata      0      0      0

cfq_queue          27790  27930    232   70    4 : tunables    0    0    0 : slabdata    399    399      0

bsg_cmd                0      0    312   52    4 : tunables    0    0    0 : slabdata      0      0      0

mqueue_inode_cache     36     36    896   36    8 : tunables    0    0    0 : slabdata      1      1      0

hugetlbfs_inode_cache    106    106    608   53    8 : tunables    0    0    0 : slabdata      2      2      0

configfs_dir_cache      0      0     88   46    1 : tunables    0    0    0 : slabdata      0      0      0

dquot               2048   2048    256   64    4 : tunables    0    0    0 : slabdata     32     32      0

kioctx                 0      0    576   56    8 : tunables    0    0    0 : slabdata      0      0      0

userfaultfd_ctx_cache      0      0    128   64    2 : tunables    0    0    0 : slabdata      0      0      0

pid_namespace          0      0   2176   15    8 : tunables    0    0    0 : slabdata      0      0      0

user_namespace         0      0    280   58    4 : tunables    0    0    0 : slabdata      0      0      0

posix_timers_cache      0      0    248   66    4 : tunables    0    0    0 : slabdata      0      0      0

UDP-Lite               0      0   1024   32    8 : tunables    0    0    0 : slabdata      0      0      0

RAW                 1530   1530    960   34    8 : tunables    0    0    0 : slabdata     45     45      0

UDP                 1024   1024   1024   32    8 : tunables    0    0    0 : slabdata     32     32      0

tw_sock_TCP        10944  11328    256   64    4 : tunables    0    0    0 : slabdata    177    177      0

TCP                 2886   3842   1920   17    8 : tunables    0    0    0 : slabdata    226    226      0

blkdev_queue         118    225   2088   15    8 : tunables    0    0    0 : slabdata     15     15      0

blkdev_requests   147485 350826    384   42    4 : tunables    0    0    0 : slabdata   8353   8353      0

blkdev_ioc          2262   2262    104   39    1 : tunables    0    0    0 : slabdata     58     58      0

fsnotify_event_holder   5440   5440     24  170    1 : tunables    0    0    0 : slabdata     32     32      0

fsnotify_event     15912  16252    120   68    2 : tunables    0    0    0 : slabdata    239    239      0

sock_inode_cache   12478  13260    640   51    8 : tunables    0    0    0 : slabdata    260    260      0

net_namespace          0      0   4608    7    8 : tunables    0    0    0 : slabdata      0      0      0

shmem_inode_cache   3264   3264    680   48    8 : tunables    0    0    0 : slabdata     68     68      0

Acpi-ParseExt     155680 155680     72   56    1 : tunables    0    0    0 : slabdata   2780   2780      0

Acpi-Namespace     16422  16422     40  102    1 : tunables    0    0    0 : slabdata    161    161      0

taskstats           1568   1568    328   49    4 : tunables    0    0    0 : slabdata     32     32      0

proc_inode_cache   12352  12544    656   49    8 : tunables    0    0    0 : slabdata    256    256      0

sigqueue            1632   1632    160   51    2 : tunables    0    0    0 : slabdata     32     32      0

bdev_cache           858    858    832   39    8 : tunables    0    0    0 : slabdata     22     22      0

sysfs_dir_cache    59580  59580    112   36    1 : tunables    0    0    0 : slabdata   1655   1655      0

inode_cache        15002  17050    592   55    8 : tunables    0    0    0 : slabdata    310    310      0

dentry             96235 273420    192   42    2 : tunables    0    0    0 : slabdata   6510   6510      0

iint_cache             0      0     80   51    1 : tunables    0    0    0 : slabdata      0      0      0

selinux_inode_security  22681  23205     80   51    1 : tunables    0    0    0 : slabdata    455    455      0

buffer_head       968560 1229592    104   39    1 : tunables    0    0    0 : slabdata  31528  31528      0

vm_area_struct     43185  43216    216   37    2 : tunables    0    0    0 : slabdata   1168   1168      0

mm_struct            860    860   1600   20    8 : tunables    0    0    0 : slabdata     43     43      0

files_cache         1887   1887    640   51    8 : tunables    0    0    0 : slabdata     37     37      0

signal_cache        3595   3724   1152   28    8 : tunables    0    0    0 : slabdata    133    133      0

sighand_cache       2373   2445   2112   15    8 : tunables    0    0    0 : slabdata    163    163      0

task_xstate         4920   5226    832   39    8 : tunables    0    0    0 : slabdata    134    134      0

task_struct         2303   2420   2944   11    8 : tunables    0    0    0 : slabdata    220    220      0

anon_vma           27367  27392     64   64    1 : tunables    0    0    0 : slabdata    428    428      0

shared_policy_node   5525   5525     48   85    1 : tunables    0    0    0 : slabdata     65     65      0

numa_policy          248    248    264   62    4 : tunables    0    0    0 : slabdata      4      4      0

radix_tree_node   321897 643216    584   56    8 : tunables    0    0    0 : slabdata  11486  11486      0

idr_layer_cache      953    975   2112   15    8 : tunables    0    0    0 : slabdata     65     65      0

dma-kmalloc-8192       0      0   8192    4    8 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-4096       0      0   4096    8    8 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-2048       0      0   2048   16    8 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-1024       0      0   1024   32    8 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-512        0      0    512   64    8 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-256        0      0    256   64    4 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-128        0      0    128   64    2 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-64         0      0     64   64    1 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-32         0      0     32  128    1 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-16         0      0     16  256    1 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-8          0      0      8  512    1 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-192        0      0    192   42    2 : tunables    0    0    0 : slabdata      0      0      0

dma-kmalloc-96         0      0     96   42    1 : tunables    0    0    0 : slabdata      0      0      0

kmalloc-8192         314    340   8192    4    8 : tunables    0    0    0 : slabdata     85     85      0

kmalloc-4096         983   1024   4096    8    8 : tunables    0    0    0 : slabdata    128    128      0

kmalloc-2048        4865   4928   2048   16    8 : tunables    0    0    0 : slabdata    308    308      0

kmalloc-1024       10084  10464   1024   32    8 : tunables    0    0    0 : slabdata    327    327      0

kmalloc-512        34318  88704    512   64    8 : tunables    0    0    0 : slabdata   1386   1386      0

kmalloc-256        35482 174592    256   64    4 : tunables    0    0    0 : slabdata   2728   2728      0

kmalloc-192        52022  85176    192   42    2 : tunables    0    0    0 : slabdata   2028   2028      0

kmalloc-128        30732  35392    128   64    2 : tunables    0    0    0 : slabdata    553    553      0

kmalloc-96         20418  35070     96   42    1 : tunables    0    0    0 : slabdata    835    835      0

kmalloc-64        375761 1018560     64   64    1 : tunables    0    0    0 : slabdata  15915  15915      0

kmalloc-32         34304  34304     32  128    1 : tunables    0    0    0 : slabdata    268    268      0

kmalloc-16         18432  18432     16  256    1 : tunables    0    0    0 : slabdata     72     72      0

kmalloc-8          25088  25088      8  512    1 : tunables    0    0    0 : slabdata     49     49      0

kmem_cache_node      683    704     64   64    1 : tunables    0    0    0 : slabdata     11     11      0

kmem_cache           576    576    256   64    4 : tunables    0    0    0 : slabdata      9      9      0

Through the above commands, we can determine which slab caches occupy the most memory.
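
The total slab footprint can also be cross-checked directly from /proc/slabinfo. A rough sketch (num_objs multiplied by objsize; this slightly underestimates because per-slab overhead is ignored, and the Slab line in /proc/meminfo remains the authoritative figure):

$ awk 'NR > 2 {total += $3 * $4} END {print total/1024/1024 " MB"}' /proc/slabinfo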

Analysis of the on-site data showed that Buddy had handed out more than 100 GB of memory, while Slab was only using a few GB. This means the memory is not being leaked through Slab or kmalloc, but through pages allocated directly from Buddy.

Memory allocated by Buddy may be used by Slab, huge pages, the page cache, the block cache, drivers, or application page-fault and mmap mappings, i.e. anything that requests memory in units of pages. Among these, the most likely source of the problem is still a driver.
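
To narrow the search down to a specific driver, one option is to sample page allocations with call graphs for a while and see which code paths request the most pages. A sketch, assuming perf is installed and the kmem:mm_page_alloc tracepoint is available on this kernel:

$ perf record -e kmem:mm_page_alloc -a -g -- sleep 10
$ perf report --stdio | head -n 60

Call chains that keep allocating without a matching free path, especially ones that pass through a storage or network driver, are the prime suspects.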


Origin: blog.csdn.net/huapeng_guo/article/details/131895216