Analysis of the Linux OOM Killer Triggered by Memory Exhaustion

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/zhuyunier/article/details/84974405

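After our program had been pulling a stream for a long time, the system ran out of memory and triggered the Linux OOM killer, which killed HPC03C, the process with the highest memory usage. The kernel printed the following: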

[16321.605050] HPC03C invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[16321.645746] CPU: 0 PID: 119 Comm: HPC03C Tainted: G           O 3.10.14 #2
[16321.667660] Stack : 00000006 8003b6c0 00000000 804b0000 00000000 00000000 00000000 00000000
	  00000000 00000000 80522e2a 0000003e 81e90f58 000013b9 00000000 00000000
	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
	  00000000 00000000 00000000 00000000 00000000 816a3b00 816a3b64 80451674
	  804aa1c7 8003cb24 00000000 80451674 00000000 00000077 81e90f58 816a3ad8
	  ...
[16321.858096] Call Trace:
[16321.868528] [<80020ca4>] show_stack+0x48/0x70
[16321.885701] [<803abb9c>] dump_stack+0x20/0x2c
[16321.907941] [<803aa580>] dump_header.isra.14+0x8c/0x1c4
[16321.919646] [<8009db74>] oom_kill_process+0xc4/0x44c
[16321.945295] [<8009e3ec>] out_of_memory+0x2c0/0x318
[16321.966288] [<800a1cf8>] __alloc_pages_nodemask+0x638/0x830
[16321.984326] [<8009cbd0>] filemap_fault+0x330/0x498
[16322.006701] [<800b8d40>] __do_fault+0xd0/0x4b0
[16322.028251] [<800bc6d4>] handle_pte_fault+0x334/0x84c
[16322.053548] [<800bcc8c>] handle_mm_fault+0xa0/0xe0
[16322.079625] [<80025958>] do_page_fault+0x148/0x420
[16322.084753] [<8001bec4>] resume_userspace_check+0x0/0x10
[16322.110213] 
[16322.119040] Mem-Info:
[16322.121490] Normal per-cpu:
[16322.130454] CPU    0: hi:    0, btch:   1 usd:   0
[16322.145833] active_anon:5107 inactive_anon:3 isolated_anon:0
[16322.145833]  active_file:172 inactive_file:159 isolated_file:0
[16322.145833]  unevictable:0 dirty:0 writeback:0 unstable:0
[16322.145833]  free:281 slab_reclaimable:171 slab_unreclaimable:750
[16322.145833]  mapped:217 shmem:5 pagetables:73 bounce:0
[16322.145833]  free_cma:0
[16322.258017] Normal free:904kB min:720kB low:900kB high:1080kB active_anon:20428kB inactive_anon:12kB active_file:680kB inactive_file:860kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:38912kB managed:32728kB mlocked:0kB dirty:0kB writeback:0kB mapped:848kB shmem:20kB slab_reclaimable:684kB slab_unreclaimable:3000kB kernel_stack:592kB pagetables:292kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[16322.390479] lowmem_reserve[]: 0 0
[16322.407231] Normal: 45*4kB (UM) 6*8kB (U) 46*16kB (UM) 1*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB 0*65536kB = 996kB
[16322.468905] 354 total pagecache pages
[16322.489525] 0 pages in swap cache
[16322.499744] Swap cache stats: add 0, delete 0, find 0/0
[16322.522507] Free swap  = 0kB
[16322.530809] Total swap = 0kB
[16322.537806] 9728 pages RAM
[16322.542105] 1485 pages reserved
[16322.553217] 261956 pages shared
[16322.566732] 7521 pages non-shared
[16322.579320] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[16322.606225] [   58]     0    58      463       35       4        0             0 sh
[16322.623754] [  112]     0   112   111326     5244      65        0             0 HPC03C
[16322.633654] Out of memory: Kill process 112 (HPC03C) score 612 or sacrifice child
[16322.655864] Killed process 112 (HPC03C) total-vm:445304kB, anon-rss:20244kB, file-rss:736kB
[16323.053600] codec_codec_ctl: set CODEC_TURN_OFF...
[16323.058611] codec_codec_ctl: set CODEC_TURN_OFF...
Killed

Analysis of the OOM report
1. Part 1

[16321.605050] HPC03C invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
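  • HPC03C: the task that was requesting a page when the oom-killer was invoked;
  • gfp_mask=0x201da: the GFP flags passed to the page allocator. With the flag values of a 3.10 kernel, this mask decodes to ___GFP_HIGHMEM | ___GFP_MOVABLE | ___GFP_WAIT | ___GFP_IO | ___GFP_FS | ___GFP_COLD | ___GFP_HARDWALL, that is, GFP_HIGHUSER_MOVABLE | __GFP_COLD, which is consistent with the page-cache fault (filemap_fault) in the call trace;
  • order=0: the order of the allocation, so only 2^0 = 1 page was requested;
  • oom_score_adj=0: the per-process adjustment applied to the OOM score. It ranges from -1000 (never kill) to 1000 (kill first); 0, the default, means no adjustment;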
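
The mask can be decoded bit by bit. The following is a minimal sketch, assuming the flag values from include/linux/gfp.h of a 3.10 kernel (the version shown in the log); other kernel versions assign different bits, and the names decode_gfp_mask and pages_for_order are only illustrative.

```python
# Flag bits as assumed from include/linux/gfp.h of a 3.10 kernel.
# Other kernel versions use different values, so treat this table as an assumption.
GFP_FLAGS_3_10 = {
    0x01: "___GFP_DMA",
    0x02: "___GFP_HIGHMEM",
    0x04: "___GFP_DMA32",
    0x08: "___GFP_MOVABLE",
    0x10: "___GFP_WAIT",
    0x20: "___GFP_HIGH",
    0x40: "___GFP_IO",
    0x80: "___GFP_FS",
    0x100: "___GFP_COLD",
    0x200: "___GFP_NOWARN",
    0x400: "___GFP_REPEAT",
    0x800: "___GFP_NOFAIL",
    0x1000: "___GFP_NORETRY",
    0x2000: "___GFP_MEMALLOC",
    0x4000: "___GFP_COMP",
    0x8000: "___GFP_ZERO",
    0x10000: "___GFP_NOMEMALLOC",
    0x20000: "___GFP_HARDWALL",
    0x40000: "___GFP_THISNODE",
    0x80000: "___GFP_RECLAIMABLE",
}


def decode_gfp_mask(mask):
    """Return the names of all known flag bits set in mask, lowest bit first."""
    names = []
    remaining = mask
    for bit, name in sorted(GFP_FLAGS_3_10.items()):
        if mask & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        # Bits that are not in the table are reported as a raw hex value.
        names.append(hex(remaining))
    return names


def pages_for_order(order):
    """An allocation of the given order asks for 2**order contiguous pages."""
    return 2 ** order


print(" | ".join(decode_gfp_mask(0x201DA)))
print("pages requested:", pages_for_order(0))
```

Under these assumptions the set bits are HIGHMEM, MOVABLE, WAIT, IO, FS, COLD and HARDWALL, and an order-0 request is a single page.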

2. Part 2

[16321.645746] CPU: 0 PID: 119 Comm: HPC03C Tainted: G           O 3.10.14 #2
[16321.667660] Stack : 00000006 8003b6c0 00000000 804b0000 00000000 00000000 00000000 00000000
	  00000000 00000000 80522e2a 0000003e 81e90f58 000013b9 00000000 00000000
	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
	  00000000 00000000 00000000 00000000 00000000 816a3b00 816a3b64 80451674
	  804aa1c7 8003cb24 00000000 80451674 00000000 00000077 81e90f58 816a3ad8
	  ...
[16321.858096] Call Trace:
[16321.868528] [<80020ca4>] show_stack+0x48/0x70
[16321.885701] [<803abb9c>] dump_stack+0x20/0x2c
[16321.907941] [<803aa580>] dump_header.isra.14+0x8c/0x1c4
[16321.919646] [<8009db74>] oom_kill_process+0xc4/0x44c
[16321.945295] [<8009e3ec>] out_of_memory+0x2c0/0x318
[16321.966288] [<800a1cf8>] __alloc_pages_nodemask+0x638/0x830
[16321.984326] [<8009cbd0>] filemap_fault+0x330/0x498
[16322.006701] [<800b8d40>] __do_fault+0xd0/0x4b0
[16322.028251] [<800bc6d4>] handle_pte_fault+0x334/0x84c
[16322.053548] [<800bcc8c>] handle_mm_fault+0xa0/0xe0
[16322.079625] [<80025958>] do_page_fault+0x148/0x420
[16322.084753] [<8001bec4>] resume_userspace_check+0x0/0x10
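  • This is the call stack of the allocation that triggered the OOM, printed by dump_header -> dump_stack. Read from the bottom up: a page fault (do_page_fault -> handle_mm_fault -> ... -> filemap_fault) called __alloc_pages_nodemask to allocate a page; the allocation could not be satisfied, so out_of_memory was entered, which in turn called oom_kill_process to kill a process;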

3. Part 3

[16322.119040] Mem-Info:
[16322.121490] Normal per-cpu:
[16322.130454] CPU    0: hi:    0, btch:   1 usd:   0

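Each memory zone defines a per-CPU page frame cache. Every per-CPU cache holds some pre-allocated page frames, which are used to satisfy single-page requests issued by the local CPU.

  • CPU 0: the cache belonging to CPU 0;
  • hi: 0 is the upper limit; once the cache holds more page frames than this, batch page frames are released back to the buddy system;
  • btch: 1 is the number of page frames added to or removed from the cache in one operation;
  • usd: 0 is the number of page frames currently in the cache;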

4. Part 4

[16322.145833] active_anon:5107 inactive_anon:3 isolated_anon:0
[16322.145833]  active_file:172 inactive_file:159 isolated_file:0
[16322.145833]  unevictable:0 dirty:0 writeback:0 unstable:0
[16322.145833]  free:281 slab_reclaimable:171 slab_unreclaimable:750
[16322.145833]  mapped:217 shmem:5 pagetables:73 bounce:0
[16322.145833]  free_cma:0
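  • active_anon: active anonymous pages; "active" means recently accessed, and "anonymous" means the mapping is not backed by any file or other data source;
  • inactive_anon: inactive anonymous pages;
  • active_file: active file-backed pages, whose mapping is associated with a file on disk;
  • inactive_file: inactive file-backed pages;
  • dirty: dirty pages, whose content no longer matches the original content on the block device;
  • writeback: pages currently being written back;
  • free: free pages (like every counter in this block, this is a number of pages, not kB);
  • slab_reclaimable: reclaimable pages in the slab cache;
  • slab_unreclaimable: unreclaimable pages in the slab cache;
  • mapped: file-backed pages that are currently mapped into process page tables (for example through mmap); this counter is separate from the active/inactive anon and file counters above;
  • shmem: pages used for shared memory;
  • pagetables: pages occupied by the page tables themselves;
  • free_cma: free pages belonging to the Contiguous Memory Allocator (CMA);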
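
Since these counters are page counts, it helps to convert them to kB before comparing them with the per-zone line. A minimal sketch, assuming 4 kB pages (which the order-0 "4kB" entry in the buddy list of this log suggests); the helper names are illustrative only.

```python
import re

PAGE_SIZE_KB = 4  # assumption: 4 kB pages

# First counter line of the Mem-Info block, copied from the log above.
SAMPLE = "active_anon:5107 inactive_anon:3 isolated_anon:0"


def parse_page_counters(text):
    """Return {name: pages} for every 'name:number' pair found in text."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", text)}


def pages_to_kb(pages, page_size_kb=PAGE_SIZE_KB):
    """Convert a page count to kB."""
    return pages * page_size_kb


counters = parse_page_counters(SAMPLE)
for name, pages in counters.items():
    print(f"{name}: {pages} pages = {pages_to_kb(pages)} kB")
```

With these inputs, active_anon comes out as 20428 kB and inactive_anon as 12 kB, the same values the Normal zone line reports. Other counters can differ slightly between the two blocks.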

5. Part 5

[16322.258017] Normal free:904kB min:720kB low:900kB high:1080kB active_anon:20428kB inactive_anon:12kB active_file:680kB inactive_file:860kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:38912kB managed:32728kB mlocked:0kB dirty:0kB writeback:0kB mapped:848kB shmem:20kB slab_reclaimable:684kB slab_unreclaimable:3000kB kernel_stack:592kB pagetables:292kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
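  • Normal free: free memory in the Normal zone;
  • min, low, high: the watermarks that drive page reclaim in the Normal zone;
  • lowmem_reserve: the number of pages in this zone that are held back from allocations which could also be satisfied from a higher zone (printed on the separate lowmem_reserve[] line of the log, here 0 0);
  • present: the physical memory size of the zone;
  • managed: the part of present that is managed by the buddy system, managed = present - reserved;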
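  • Several fields, such as kernel_stack, pagetables, free_cma, slab_reclaimable and slab_unreclaimable, are specific to the Normal zone, because the pages of the Normal zone are directly mapped and are the ones the kernel itself uses.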
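
To see where the zone stood relative to its watermarks, the kB fields can be parsed and compared. This is a rough sketch over an abbreviated copy of the zone line; the helper names are illustrative, and the kernel's real watermark check also takes the allocation order and lowmem_reserve into account, which this comparison ignores.

```python
import re

# Abbreviated copy of the Normal zone line from the log above.
ZONE_LINE = (
    "Normal free:904kB min:720kB low:900kB high:1080kB "
    "present:38912kB managed:32728kB"
)


def parse_zone_kb(line):
    """Return {field: kB} for every 'field:<number>kB' pair in the line."""
    return {name: int(value) for name, value in re.findall(r"([\w()]+):(\d+)kB", line)}


def watermark_state(zone):
    """Classify the free memory of a zone against its min/low/high watermarks."""
    free = zone["free"]
    if free < zone["min"]:
        return "below min"
    if free < zone["low"]:
        return "between min and low"
    if free < zone["high"]:
        return "between low and high"
    return "above high"


zone = parse_zone_kb(ZONE_LINE)
print("free:", zone["free"], "kB ->", watermark_state(zone))
```

In this snapshot free memory (904 kB) sits just above the low watermark (900 kB) and below the high one (1080 kB).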

6. Part 6

Normal: 45*4kB (UM) 6*8kB (U) 46*16kB (UM) 1*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB 0*65536kB = 996kB

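Buddy system information: the number of free blocks of each order. In this log the block sizes run from 4kB (order 0) to 65536kB (order 14). The letters in parentheses give the migrate types of the free blocks:

  • M: movable;
  • R: reserve;
  • C: CMA;
  • U: unmovable;
  • E: reclaimable;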
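
The total at the end of the buddy line is simply the sum of count * block size over all entries, which can be checked with a few lines of code. A minimal sketch with an illustrative helper name, using the line from the log above:

```python
import re

BUDDY_LINE = (
    "Normal: 45*4kB (UM) 6*8kB (U) 46*16kB (UM) 1*32kB (M) 0*64kB 0*128kB "
    "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB "
    "0*32768kB 0*65536kB = 996kB"
)


def parse_buddy_line(line):
    """Return ([(count, block_size_kb), ...], reported_total_kb)."""
    blocks = [(int(count), int(size)) for count, size in re.findall(r"(\d+)\*(\d+)kB", line)]
    reported = int(re.search(r"=\s*(\d+)kB", line).group(1))
    return blocks, reported


blocks, reported_kb = parse_buddy_line(BUDDY_LINE)
computed_kb = sum(count * size for count, size in blocks)
largest_free_kb = max((size for count, size in blocks if count > 0), default=0)

print("computed:", computed_kb, "kB, reported:", reported_kb, "kB")
print("largest free block:", largest_free_kb, "kB")
```

Here 45*4 + 6*8 + 46*16 + 1*32 gives the reported 996 kB, and the largest free block is only 32 kB, so the remaining free memory is both small and fragmented.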

7. Part 7

[16322.468905] 354 total pagecache pages
[16322.489525] 0 pages in swap cache
[16322.499744] Swap cache stats: add 0, delete 0, find 0/0
[16322.522507] Free swap  = 0kB
[16322.530809] Total swap = 0kB
[16322.537806] 9728 pages RAM
[16322.542105] 1485 pages reserved
[16322.553217] 261956 pages shared
[16322.566732] 7521 pages non-shared
  • Total swap is 0kB: the system has no swap space, so anonymous pages cannot be swapped out to free memory;

8. Part 8

[16322.579320] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[16322.606225] [   58]     0    58      463       35       4        0             0 sh
[16322.623754] [  112]     0   112   111326     5244      65        0             0 HPC03C
[16322.633654] Out of memory: Kill process 112 (HPC03C) score 612 or sacrifice child
[16322.655864] Killed process 112 (HPC03C) total-vm:445304kB, anon-rss:20244kB, file-rss:736kB
[16323.053600] codec_codec_ctl: set CODEC_TURN_OFF...
[16323.058611] codec_codec_ctl: set CODEC_TURN_OFF...
Killed
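  • The higher a process's oom_score, the more likely the OOM killer is to kill it; HPC03C scored 612;
  • total_vm: the total virtual memory size of the process. The table gives it in pages: 111326 pages * 4 kB = 445304 kB, the total-vm value in the "Killed process" line;
  • anon-rss: the resident anonymous memory of the process, that is, RAM not backed by a file (heap, stack and so on), here 20244 kB;
  • file-rss: the resident file-backed memory of the process (mapped files such as the executable and shared libraries), here 736 kB;
  • rss: the physical memory actually in use, including memory occupied by shared libraries, in pages: (5244*4)/1024 ≈ 20 MB;
    Reference: https://blog.csdn.net/kickxxx/article/details/50337647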
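
The per-process table can be parsed the same way. The sketch below assumes 4 kB pages and the column layout shown in the header line of this log (pid, uid, tgid, total_vm, rss, nr_ptes, swapents, oom_score_adj, name); the function name is illustrative.

```python
import re

PAGE_SIZE_KB = 4  # assumption: 4 kB pages

TASK_LINE = (
    "[16322.623754] [  112]     0   112   111326     5244      65"
    "        0             0 HPC03C"
)

FIELDS = ("pid", "uid", "tgid", "total_vm", "rss", "nr_ptes", "swapents", "oom_score_adj")

# "[ pid ]" followed by seven integer columns and the process name.
# The leading timestamp is skipped because it contains a dot.
ROW_RE = re.compile(r"\[\s*(\d+)\]" + r"\s+(-?\d+)" * 7 + r"\s+(\S+)")


def parse_task_row(line):
    """Return one row of the OOM task dump as a dict, or None if it does not match."""
    match = ROW_RE.search(line)
    if match is None:
        return None
    row = {name: int(value) for name, value in zip(FIELDS, match.groups()[:-1])}
    row["name"] = match.group(9)
    return row


row = parse_task_row(TASK_LINE)
print(row["name"], "total_vm:", row["total_vm"] * PAGE_SIZE_KB, "kB")
print(row["name"], "rss:", row["rss"] * PAGE_SIZE_KB, "kB")
```

For HPC03C, total_vm converts to 445304 kB, matching the total-vm figure in the "Killed process" line, and rss converts to 20976 kB, close to the sum of anon-rss and file-rss reported there.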
