Investigating a "page allocation stalls for" problem

I. Symptom analysis and an introduction to basic memory-management concepts

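Recently a Linux machine hung. It stopped responding and could not be reached over ssh, so the only way to recover it was to power-cycle it. After it came back up, I logged in and pulled the kernel log, shown below: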

Jul 12 18:48:06 kernel: [141294.374983] send process: page allocation stalls for 10108ms, order:2, mode:0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
Jul 12 18:48:06 kernel: [141294.374989] send process cpuset=/ mems_allowed=0
Jul 12 18:48:06 kernel: [141294.374994] CPU: 4 PID: 15603 Comm: send process Tainted: G           OE   4.12.9-041209-generic #201708242344
Jul 12 18:48:06 kernel: [141294.374995] Hardware name: OEM YY COME6801/EB5129, BIOS 4.6.5 03/03/2020
Jul 12 18:48:06 kernel: [141294.374996] Call Trace:
Jul 12 18:48:06 kernel: [141294.375001]  dump_stack+0x63/0x8d
Jul 12 18:48:06 kernel: [141294.375003]  warn_alloc+0x114/0x1c0
Jul 12 18:48:06 kernel: [141294.375005]  __alloc_pages_slowpath+0x8df/0xd80
Jul 12 18:48:06 kernel: [141294.375007]  ? dequeue_entity+0xed/0x4b0
Jul 12 18:48:06 kernel: [141294.375009]  ? swiotlb_full+0xb0/0xb0
Jul 12 18:48:06 kernel: [141294.375010]  __alloc_pages_nodemask+0x23f/0x260
Jul 12 18:48:06 kernel: [141294.375013]  alloc_pages_current+0x93/0x150
Jul 12 18:48:06 kernel: [141294.375017]  kmalloc_order+0x18/0x40
Jul 12 18:48:06 kernel: [141294.375018]  kmalloc_order_trace+0x24/0xb0
Jul 12 18:48:06 kernel: [141294.375021]  __kmalloc+0x1dd/0x1f0
Jul 12 18:48:06 kernel: [141294.375025]  proc_do_submiturb+0x4bf/0xd10
Jul 12 18:48:06 kernel: [141294.375026]  usbdev_do_ioctl+0xc47/0x1190
Jul 12 18:48:06 kernel: [141294.375028]  usbdev_ioctl+0xe/0x20
Jul 12 18:48:06 kernel: [141294.375031]  do_vfs_ioctl+0xa5/0x600
Jul 12 18:48:06 kernel: [141294.375034]  ? getnstimeofday64+0xe/0x20
Jul 12 18:48:06 kernel: [141294.375037]  ? __audit_syscall_entry+0xb1/0xf0
Jul 12 18:48:06 kernel: [141294.375040]  ? syscall_trace_enter+0x1d4/0x2c0
Jul 12 18:48:06 kernel: [141294.375042]  SyS_ioctl+0x79/0x90
Jul 12 18:48:06 kernel: [141294.375043]  do_syscall_64+0x5b/0xc0
Jul 12 18:48:06 kernel: [141294.375046]  entry_SYSCALL64_slow_path+0x25/0x25
Jul 12 18:48:06 kernel: [141294.375055] Mem-Info:
Jul 12 18:48:06 kernel: [141294.375058]  active_file:244309 inactive_file:250424 isolated_file:0
Jul 12 18:48:06 kernel: [141294.375058]  free:24274 free_pcp:1575 free_cma:0
Jul 12 18:48:06 kernel: [141294.375058]  mapped:39931 shmem:30621 pagetables:10987 bounce:0
Jul 12 18:48:06 kernel: [141294.375058]  slab_reclaimable:48655 slab_unreclaimable:32286
Jul 12 18:48:06 kernel: [141294.375058]  unevictable:913 dirty:587 writeback:6555 unstable:0
Jul 12 18:48:06 kernel: [141294.375058] active_anon:1124210 inactive_anon:207648 isolated_anon:382
Jul 12 18:48:06 kernel: [141294.375061] Node 0 active_anon:4496840kB inactive_anon:830592kB active_file:977236kB inactive_file:1001696kB unevictable:3652kB isolated(anon):1528kB isolated(file):0kB mapped:159724kB dirty:2348kB writeback:26220kB shmem:122484kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 8192kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jul 12 18:48:06 kernel: [141294.375061] Node 0 DMA free:15900kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jul 12 18:48:06 kernel: [141294.375064] lowmem_reserve[]: 0 3288 7745 7745 7745
Jul 12 18:48:06 kernel: [141294.375066] Node 0 DMA32 free:47900kB min:28632kB low:35788kB high:42944kB active_anon:1957932kB inactive_anon:34164kB active_file:569156kB inactive_file:578364kB unevictable:0kB writepending:17328kB present:3479824kB managed:3414256kB mlocked:0kB slab_reclaimable:102740kB slab_unreclaimable:34272kB kernel_stack:784kB pagetables:5176kB bounce:0kB free_pcp:3548kB local_pcp:632kB free_cma:0kB
Jul 12 18:48:06 kernel: [141294.375069] lowmem_reserve[]: 0 0 4457 4457 4457
Jul 12 18:48:06 kernel: [141294.375070] Node 0 Normal free:33296kB min:38812kB low:48512kB high:58212kB active_anon:2538740kB inactive_anon:795348kB active_file:409360kB inactive_file:426836kB unevictable:3652kB writepending:9900kB present:4716544kB managed:4567500kB mlocked:3652kB slab_reclaimable:91880kB slab_unreclaimable:94872kB kernel_stack:20624kB pagetables:38772kB bounce:0kB free_pcp:2748kB local_pcp:0kB free_cma:0kB
Jul 12 18:48:06 kernel: [141294.375073] lowmem_reserve[]: 0 0 0 0 0
Jul 12 18:48:06 kernel: [141294.375075] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
Jul 12 18:48:06 kernel: [141294.375082] Node 0 DMA32: 2070*4kB (UMEH) 3103*8kB (MEH) 914*16kB (UMEH) 7*32kB (UM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 47952kB
Jul 12 18:48:06 kernel: [141294.375088] Node 0 Normal: 2160*4kB (UMEH) 2182*8kB (UME) 499*16kB (UMEH) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 34080kB
Jul 12 18:48:06 kernel: [141294.375094] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jul 12 18:48:06 kernel: [141294.375095] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jul 12 18:48:06 kernel: [141294.375096] 4247 pages in swap cache
Jul 12 18:48:06 kernel: [141294.375096] 530017 total pagecache pages
Jul 12 18:48:06 kernel: [141294.375097] Free swap  = 947452kB
Jul 12 18:48:06 kernel: [141294.375097] Swap cache stats: add 88609, delete 84362, find 8/16
Jul 12 18:48:06 kernel: [141294.375098] 2053088 pages RAM
Jul 12 18:48:06 kernel: [141294.375098] Total swap = 1003004kB
Jul 12 18:48:06 kernel: [141294.375099] 0 pages cma reserved
Jul 12 18:48:06 kernel: [141294.375099] 0 pages HighMem/MovableOnly
Jul 12 18:48:06 kernel: [141294.375099] 53674 pages reserved
Jul 12 18:48:06 kernel: [141294.375100] 0 pages hwpoisoned

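With the kernel log in hand, the most conspicuous line is the kernel message in the log above, page allocation stalls for 10108ms, which points to memory. To analyze the log and confirm that memory is involved, we need to understand how the Linux kernel allocates and reclaims memory, as well as the logic of some memory-related kernel functions.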
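The first message already says what was requested. order:2 means 2^2 = 4 physically contiguous pages (16 KiB with 4 KiB pages), and mode is the GFP mask the caller passed, here __kmalloc() on the USB device ioctl path (usbdev_ioctl, proc_do_submiturb) seen in the call trace. The sketch below decodes the mask. It assumes the flag bit values from include/linux/gfp.h in the 4.12 kernel; they are renumbered in other versions, although the log itself confirms that GFP_KERNEL|__GFP_COMP is 0x14040c0 on this machine. The helper names are mine. The flag that matters most is __GFP_DIRECT_RECLAIM: it lets the caller sleep in direct reclaim and compaction, which is what makes a stall possible at all. As far as I can tell from the 4.12 source, __alloc_pages_slowpath() calls warn_alloc() with this "stalls for" message once an allocation has spent more than 10 seconds in the slow path, which fits the 10108ms in the log.

```python
# Assumption: flag bit values from include/linux/gfp.h in the 4.12 kernel.
# They are renumbered between kernel versions, so check your own tree.
GFP_BITS_4_12 = {
    "__GFP_IO": 0x40,                   # may start physical I/O
    "__GFP_FS": 0x80,                   # may call down into the filesystem
    "__GFP_COMP": 0x4000,               # build a compound page
    "__GFP_DIRECT_RECLAIM": 0x400000,   # caller may sleep in direct reclaim
    "__GFP_KSWAPD_RECLAIM": 0x1000000,  # may wake kswapd
}

# GFP_KERNEL = __GFP_RECLAIM | __GFP_IO | __GFP_FS, where
# __GFP_RECLAIM = __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM.
GFP_KERNEL = (GFP_BITS_4_12["__GFP_DIRECT_RECLAIM"]
              | GFP_BITS_4_12["__GFP_KSWAPD_RECLAIM"]
              | GFP_BITS_4_12["__GFP_IO"]
              | GFP_BITS_4_12["__GFP_FS"])

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64


def request_bytes(order: int) -> int:
    """An order-N request asks the buddy allocator for 2**N contiguous pages."""
    return (1 << order) * PAGE_SIZE


def decode_gfp(mask: int) -> list:
    """Names of the known flags set in mask; leftover bits are shown in hex."""
    names = [name for name, bit in GFP_BITS_4_12.items() if mask & bit]
    known = 0
    for bit in GFP_BITS_4_12.values():
        known |= bit
    if mask & ~known:
        names.append(hex(mask & ~known))
    return names


mask = 0x14040c0  # "mode:0x14040c0(GFP_KERNEL|__GFP_COMP)" from the log
print(request_bytes(2))   # 16384
print(decode_gfp(mask))   # ['__GFP_IO', '__GFP_FS', '__GFP_COMP', '__GFP_DIRECT_RECLAIM', '__GFP_KSWAPD_RECLAIM']
print(mask == GFP_KERNEL | GFP_BITS_4_12["__GFP_COMP"])  # True
```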

In this log I focus on four messages:

1. send process: page allocation stalls for 10108ms, order:2, mode:0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
2. Node 0 Normal free:33296kB min:38812kB low:48512kB high:58212kB active_anon:2538740kB inactive_anon:795348kB active_file:409360kB inactive_file:426836kB unevictable:3652kB writepending:9900kB present:4716544kB managed:4567500kB mlocked:3652kB slab_reclaimable:91880kB slab_unreclaimable:94872kB kernel_stack:20624kB pagetables:38772kB bounce:0kB free_pcp:2748kB local_pcp:0kB free_cma:0kB
3. lowmem_reserve[]: 0 0 0 0 0
4. 0 pages HighMem/MovableOnly
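Messages 2 and 3 are easier to read against the zone watermarks. Roughly: when a zone's free memory drops below low, kswapd is woken to reclaim in the background until the zone is back above high. When free memory is below min, an ordinary GFP_KERNEL request cannot be served from that zone, and if no eligible zone passes the check the caller falls back to direct reclaim and compaction itself. lowmem_reserve[] adds a margin that a lower zone holds back from requests that could also have used a higher zone. The Normal zone's own entries are all 0, while the DMA32 line reserves 4457 pages against Normal-zone requests. The sketch below is a simplified version of that check using the numbers from the log. It ignores the 2^order - 1 adjustment, the high-atomic reserve and the per-order free-list check that the real __zone_watermark_ok() also performs, and Zone and watermark_state() are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Per-zone numbers copied from a Mem-Info dump, all in kB."""
    name: str
    free_kb: int
    min_kb: int
    low_kb: int
    high_kb: int


def watermark_state(zone, reserve_kb=0):
    """Simplified watermark check (not the kernel's exact logic).

    reserve_kb is the lowmem_reserve[] entry, converted to kB, for the
    request's highest allowed zone. The real __zone_watermark_ok() also
    subtracts 2**order - 1 pages and the high-atomic reserve, and checks
    the per-order free lists; this sketch does not.
    """
    free = zone.free_kb
    if free > zone.high_kb + reserve_kb:
        return "above high"
    if free > zone.low_kb + reserve_kb:
        return "between low and high"
    if free > zone.min_kb + reserve_kb:
        return "between min and low"
    return "below min"


PAGE_KB = 4  # assumption: 4 KiB pages; lowmem_reserve[] is printed in pages

normal = Zone("Normal", free_kb=33296, min_kb=38812, low_kb=48512, high_kb=58212)
dma32 = Zone("DMA32", free_kb=47900, min_kb=28632, low_kb=35788, high_kb=42944)

# Normal's own lowmem_reserve[] entries are all 0 (message 3).
print(watermark_state(normal))         # below min
print(normal.min_kb - normal.free_kb)  # 5516 (kB short of min)
# DMA32 holds lowmem_reserve[2] = 4457 pages back from Normal-zone requests.
print(watermark_state(dma32))                             # above high
print(watermark_state(dma32, reserve_kb=4457 * PAGE_KB))  # between min and low
```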

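The message page allocation stalls for 10108ms immediately suggests a problem with memory allocation: an allocation request was blocked, and the kernel then printed the current Mem-Info. To read these messages, we first need to understand how Linux manages memory, along with some basic concepts.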
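One of those basic concepts is the buddy allocator. Each zone keeps its free memory in lists of blocks of 2^order pages, and an order-2 request needs a free block of order 2 or larger (a larger block is split). The "Node 0 DMA32: ..." and "Node 0 Normal: ..." lines in the Mem-Info dump print those lists as COUNT*SIZE, followed by letters for the migrate types present (U unmovable, M movable, E reclaimable, H high-atomic reserve). The sketch below, with hypothetical helper names and assuming 4 KiB pages, parses such a line, recomputes the total, and counts the blocks that could satisfy an order-2 request. It only reads the numbers; by itself it does not explain the stall.

```python
import re

# One "COUNT*SIZEkB" entry, optionally followed by migrate-type letters,
# e.g. "2160*4kB (UMEH)" or "0*32kB".
ENTRY = re.compile(r"(\d+)\*(\d+)kB(?: \(([A-Z]+)\))?")
PAGE_KB = 4  # assumption: 4 KiB pages, as on x86_64


def parse_buddy_line(line):
    """Return ([(order, count, size_kb, types), ...], reported_total_kb)."""
    body, _, total = line.rpartition("=")
    entries = []
    for m in ENTRY.finditer(body):
        count, size_kb = int(m.group(1)), int(m.group(2))
        order = (size_kb // PAGE_KB).bit_length() - 1  # size is 2**order pages
        entries.append((order, count, size_kb, m.group(3) or ""))
    reported = int(re.search(r"(\d+)kB", total).group(1))
    return entries, reported


def summarize(line, order):
    """Recomputed total, reported total, blocks >= order, highest non-empty order."""
    entries, reported = parse_buddy_line(line)
    total = sum(count * size_kb for _, count, size_kb, _ in entries)
    usable = sum(count for o, count, _, _ in entries if o >= order)
    highest = max((o for o, count, _, _ in entries if count), default=-1)
    return total, reported, usable, highest


NORMAL = ("Node 0 Normal: 2160*4kB (UMEH) 2182*8kB (UME) 499*16kB (UMEH) "
          "0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 34080kB")
DMA32 = ("Node 0 DMA32: 2070*4kB (UMEH) 3103*8kB (MEH) 914*16kB (UMEH) 7*32kB (UM) "
         "0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 47952kB")

print(summarize(NORMAL, order=2))  # (34080, 34080, 499, 2)
print(summarize(DMA32, order=2))   # (47952, 47952, 921, 3)
```

Both recomputed totals match the ones the kernel printed, and in both zones the free memory sits entirely in blocks of order 3 or smaller.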

Jul 12 19:51:12 kernel: [    0.000000] Linux version 4.12.9-041209-generic (kernel@tangerine) (gcc version 7.2.0 (Ubuntu 7.2.0-1ubuntu1) ) #201708242344 SMP Fri Aug 25 03:47:24 UTC 2017
Jul 12 19:51:12 kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.12.9-041209-generic root=/dev/mapper/gaussian--vg-root ro net.i


Reposted from blog.csdn.net/yangwenchao1983/article/details/131848246