Handling the error "kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [insmod:55902]"

I ran the modified tcrypt.c program described earlier (it only runs the digest algorithms md5 and sha1):
insmod tcrypt.ko sec=2 mode=400
It reported the error kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [insmod:55902]
and the kernel log also contained stack traces:

[106091.127829] 
testing speed of async md5 (md5-generic)
[106091.127831] test  0 (   16 byte blocks,   16 bytes per update,   1 updates): 
[106093.127071] 6929212 opers/sec, 110867392 bytes/sec
[106093.127072] test  1 (   64 byte blocks,   16 bytes per update,   4 updates): 3149079 opers/sec, 201541088 bytes/sec
[106095.126940] test  2 (   64 byte blocks,   64 bytes per update,   1 updates): 
[106097.126759] 4028096 opers/sec, 257798144 bytes/sec
[106097.126761] test  3 (  256 byte blocks,   16 bytes per update,  16 updates): 1154874 opers/sec, 295647872 bytes/sec
[106099.127126] test  4 (  256 byte blocks,   64 bytes per update,   4 updates): 1623307 opers/sec, 415566592 bytes/sec
[106101.127270] test  5 (  256 byte blocks,  256 bytes per update,   1 updates): 
[106103.126713] 1810068 opers/sec, 463377408 bytes/sec
[106103.126715] test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates): 311122 opers/sec, 318589440 bytes/sec
[106105.126920] test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates): 537416 opers/sec, 550314496 bytes/sec
[106107.127073] test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates): 
[106109.127042] 577204 opers/sec, 591056896 bytes/sec
[106109.127044] test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates): 164557 opers/sec, 337012736 bytes/sec
[106111.126772] test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates): 
[106112.131736] INFO: rcu_sched self-detected stall on CPU
[106112.131746] 	1-...: (20933 ticks this GP) idle=cd3/140000000000001/0 softirq=418992/418992 fqs=5201 
[106112.131747] 	 (t=21000 jiffies g=178681 c=178680 q=640)
[106112.131753] Task dump for CPU 1:
[106112.131753] insmod          R  running task        0 56468 128644 0x00000088
[106112.131756]  ffff88003c643e08 ffffffff8108a2e9 0000000000000001 0000000000000001
[106112.131757]  ffff88003c643e20 ffffffff8108ca29 ffffffff81e46000 ffff88003c643e50
[106112.131758]  ffffffff81136f41 ffff88003c6585c0 ffffffff81e45e00 0000000000000000
[106112.131759] Call Trace:
[106112.131761]  <IRQ> 
[106112.131765]  [<ffffffff8108a2e9>] sched_show_task+0xe9/0x150
[106112.131767]  [<ffffffff8108ca29>] dump_cpu_task+0x39/0x40
[106112.131769]  [<ffffffff81136f41>] rcu_dump_cpu_stacks+0x80/0xbc
[106112.131770]  [<ffffffff810b302b>] rcu_check_callbacks+0x70b/0x860
[106112.131772]  [<ffffffff810fa2c0>] ? __acct_update_integrals+0x30/0xb0
[106112.131773]  [<ffffffff810c7950>] ? tick_sched_do_timer+0x30/0x30
[106112.131774]  [<ffffffff810b829f>] update_process_times+0x2f/0x60
[106112.131775]  [<ffffffff810c7355>] tick_sched_handle.isra.13+0x25/0x60
[106112.131776]  [<ffffffff810c798d>] tick_sched_timer+0x3d/0x70
[106112.131777]  [<ffffffff810b8fb6>] __hrtimer_run_queues+0xe6/0x280
[106112.142164]  [<ffffffff810b9478>] hrtimer_interrupt+0xa8/0x1a0
[106112.142167] INFO: rcu_sched detected stalls on CPUs/tasks:
[106112.142174]  [<ffffffff8103e2b5>] local_apic_timer_interrupt+0x35/0x60
[106112.142213]  [<ffffffff81845add>] smp_apic_timer_interrupt+0x3d/0x50
[106112.142216]  [<ffffffff81844f0f>] apic_timer_interrupt+0x7f/0x90
[106112.142217]  <EOI> 
[106112.146252]  [<ffffffff8145ebe2>] ? md5_transform+0x6c2/0x7f0
[106112.146255]  [<ffffffff8141dd9e>] md5_update+0xde/0x130
[106112.146256]  [<ffffffff81419c08>] crypto_shash_update+0x38/0x100
[106112.146257]  [<ffffffff81419eac>] shash_ahash_update+0x2c/0x50
[106112.146258]  [<ffffffff81419ee2>] shash_async_update+0x12/0x20
[106112.146261]  [<ffffffffa006e38d>] test_ahash_speed_common.constprop.8+0x24d/0x820 [tcrypt]
[106112.146262]  [<ffffffffa007a000>] ? 0xffffffffa007a000
[106112.146263]  [<ffffffffa006ee68>] do_test+0x108/0x31a [tcrypt]
[106112.146264]  [<ffffffffa007a000>] ? 0xffffffffa007a000
[106112.146265]  [<ffffffffa007a049>] tcrypt_mod_init+0x49/0x95 [tcrypt]
[106112.146266]  [<ffffffffa007a000>] ? 0xffffffffa007a000
[106112.146267]  [<ffffffff8100043d>] do_one_initcall+0x3d/0x150
[106112.146269]  [<ffffffff8108411a>] ? __might_sleep+0x4a/0x90
[106112.146270]  [<ffffffff811370e4>] ? do_init_module+0x27/0x1d8
[106112.146287]  [<ffffffff811903c6>] ? kmem_cache_alloc_trace+0x46/0x170
[106112.146289]  [<ffffffff8113711d>] do_init_module+0x60/0x1d8
[106112.146291]  [<ffffffff810d2205>] load_module+0x1245/0x1940
[106112.146292]  [<ffffffff810cf470>] ? __symbol_put+0x40/0x40
[106112.146293]  [<ffffffff8119ba93>] ? vfs_read+0x113/0x130
[106112.146295]  [<ffffffff810d2b16>] SYSC_finit_module+0x96/0xd0
[106112.146296]  [<ffffffff810d2b6e>] SyS_finit_module+0xe/0x10
[106112.146297]  [<ffffffff810028dd>] do_syscall_64+0x4d/0xb0
[106112.146299]  [<ffffffff81843786>] entry_SYSCALL64_slow_path+0x25/0x25
[106112.146302] 	1-...: (20933 ticks this GP) idle=cd3/140000000000001/0 softirq=418992/418992 fqs=5201 
[106112.146304] 	(detected by 3, t=21002 jiffies, g=178681, c=178680, q=640)
[106112.146310] Task dump for CPU 1:
[106112.146311] insmod          R  running task        0 56468 128644 0x00000088
[106112.146313]  ffffffff8113711d 0000000000000001 ffffffffa006f5c0 ffffc9001166fe98
[106112.146315]  ffffc9001166fe78 ffffffff810d2205 ffffffffa006f5c0 ffffffff810cf470
[106112.146316]  0000000000000000 ffffffffa006f5d8 ffffffff8119ba93 ffff88000000001c
[106112.146317] Call Trace:
[106112.146320]  [<ffffffff8113711d>] ? do_init_module+0x60/0x1d8
[106112.146322]  [<ffffffff810d2205>] ? load_module+0x1245/0x1940
[106112.146323]  [<ffffffff810cf470>] ? __symbol_put+0x40/0x40
[106112.146324]  [<ffffffff8119ba93>] ? vfs_read+0x113/0x130
[106112.146326]  [<ffffffff810d2b16>] ? SYSC_finit_module+0x96/0xd0
[106112.146328]  [<ffffffff810d2b6e>] ? SyS_finit_module+0xe/0x10
[106112.146328]  [<ffffffff810028dd>] ? do_syscall_64+0x4d/0xb0
[106112.146330]  [<ffffffff81843786>] ? entry_SYSCALL64_slow_path+0x25/0x25
[106113.126964] 274310 opers/sec, 561787904 bytes/sec
[106113.126966] test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates): 299329 opers/sec, 613025792 bytes/sec
[106115.127105] test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates): 
[106117.127334] 301106 opers/sec, 616665088 bytes/sec
[106117.127336] test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates):  84647 opers/sec, 346716160 bytes/sec
[106119.127368] test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates): 147179 opers/sec, 602845184 bytes/sec
[106121.126869] test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates): 152129 opers/sec, 623122432 bytes/sec
[106123.127106] test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates): 
[106125.127288] 153462 opers/sec, 628580352 bytes/sec
[106125.127289] test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates):  42507 opers/sec, 348217344 bytes/sec
[106127.126819] test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates):  74270 opers/sec, 608423936 bytes/sec
[106129.127145] test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates):  77195 opers/sec, 632381440 bytes/sec
[106131.127150] test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates):  77426 opers/sec, 634273792 bytes/sec
[106133.127428] test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 
[106135.127394]  77476 opers/sec, 634683392 bytes/sec
[106135.127404] 
testing speed of async sha1 (sha1-avx2)
[106135.127428] test  0 (   16 byte blocks,   16 bytes per update,   1 updates): 
[106137.127053] 5125425 opers/sec,  82006808 bytes/sec
[106137.127055] test  1 (   64 byte blocks,   16 bytes per update,   4 updates): 2207846 opers/sec, 141302176 bytes/sec
[106139.127001] test  2 (   64 byte blocks,   64 bytes per update,   1 updates): 
[106141.126965] 3462426 opers/sec, 221595264 bytes/sec
[106141.126966] test  3 (  256 byte blocks,   16 bytes per update,  16 updates): 786876 opers/sec, 201440256 bytes/sec
[106143.127014] test  4 (  256 byte blocks,   64 bytes per update,   4 updates): 1094996 opers/sec, 280319104 bytes/sec
[106145.127384] test  5 (  256 byte blocks,  256 bytes per update,   1 updates): 
[106147.127465] 2027543 opers/sec, 519051136 bytes/sec
[106147.127467] test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates): 229002 opers/sec, 234498048 bytes/sec
[106149.127480] test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates): 586406 opers/sec, 600479744 bytes/sec
[106151.127461] test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates): 
[106153.127037] 792399 opers/sec, 811416576 bytes/sec
[106153.127038] test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates): 117870 opers/sec, 241397760 bytes/sec
[106155.127327] test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates): 309381 opers/sec, 633612288 bytes/sec
[106157.127566] test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates): 399969 opers/sec, 819136512 bytes/sec
[106159.127546] test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates): 
[106160.998382] BUG: workqueue lockup - pool
[106160.998386]  cpus=1 node=0 flags=0x0 nice=0 stuck for 69s!
[106160.998387] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 42s!
[106160.998402] Showing busy workqueues and worker pools:
[106160.998539] workqueue events_long: flags=0x0
[106160.998557]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[106160.998559]     pending: gc_worker
[106161.008435] workqueue events_power_efficient: flags=0x80
[106161.008453]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[106161.008456]     pending: neigh_periodic_work
[106161.126951] 428257 opers/sec, 877070336 bytes/sec
[106161.126952] test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates): 
[106161.859069] workqueue vmstat: flags=0xc
[106161.859091]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[106161.859094]     pending: vmstat_update
[106161.880889] workqueue xfs-log/sda3: flags=0x1c
[106161.880910]   pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256
[106161.880913]     pending: xfs_log_worker
[106161.893143] workqueue xfs-log/sda1: flags=0x1c
[106161.893168]   pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256
[106161.893171]     pending: xfs_log_worker
[106163.127411]  59184 opers/sec, 242417664 bytes/sec
[106163.127413] test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates): 161039 opers/sec, 659615744 bytes/sec
[106165.127604] test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates): 205362 opers/sec, 841164800 bytes/sec
[106167.127309] test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates): 
[106169.127599] 227169 opers/sec, 930484224 bytes/sec
[106169.127600] test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates):  29139 opers/sec, 238706688 bytes/sec
[106171.127605] test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates):  79406 opers/sec, 650498048 bytes/sec
[106173.127122] test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates): 103823 opers/sec, 850522112 bytes/sec
[106175.127301] test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates): 
[106175.136399] INFO: rcu_sched self-detected stall on CPU
[106175.136404] 	1-...: (83881 ticks this GP) idle=cd3/140000000000001/0 softirq=418992/418992 fqs=20924 
[106175.136405] 	 (t=84005 jiffies g=178681 c=178680 q=3014)
[106175.136411] Task dump for CPU 1:
[106175.136412] insmod          R  running task        0 56468 128644 0x00000088
[106175.136414]  ffff88003c643e08 ffffffff8108a2e9 0000000000000001 0000000000000001
[106175.136416]  ffff88003c643e20 ffffffff8108ca29 ffffffff81e46000 ffff88003c643e50
[106175.136417]  ffffffff81136f41 ffff88003c6585c0 ffffffff81e45e00 0000000000000000
[106175.136418] Call Trace:
[106175.136419]  <IRQ> 
[106175.136424]  [<ffffffff8108a2e9>] sched_show_task+0xe9/0x150
[106175.136425]  [<ffffffff8108ca29>] dump_cpu_task+0x39/0x40
[106175.136442]  [<ffffffff81136f41>] rcu_dump_cpu_stacks+0x80/0xbc
[106175.136444]  [<ffffffff810b302b>] rcu_check_callbacks+0x70b/0x860
[106175.136446]  [<ffffffff810fa2c0>] ? __acct_update_integrals+0x30/0xb0
[106175.136447]  [<ffffffff810c7950>] ? tick_sched_do_timer+0x30/0x30
[106175.136449]  [<ffffffff810b829f>] update_process_times+0x2f/0x60
[106175.136449]  [<ffffffff810c7355>] tick_sched_handle.isra.13+0x25/0x60
[106175.136450]  [<ffffffff810c798d>] tick_sched_timer+0x3d/0x70
[106175.136451]  [<ffffffff810b8fb6>] __hrtimer_run_queues+0xe6/0x280
[106175.136452]  [<ffffffff810b9478>] hrtimer_interrupt+0xa8/0x1a0
[106175.136454]  [<ffffffff8103e2b5>] local_apic_timer_interrupt+0x35/0x60
[106175.136456]  [<ffffffff81845add>] smp_apic_timer_interrupt+0x3d/0x50
[106175.136457]  [<ffffffff81844f0f>] apic_timer_interrupt+0x7f/0x90
[106175.136457]  <EOI> 
[106175.136461]  [<ffffffffa023f8d3>] ? _loop0+0x2eb/0xe56 [sha1_ssse3]
[106175.136464]  [<ffffffff810a19a6>] ? log_store+0x116/0x200
[106175.136465]  [<ffffffffa023ea70>] ? sha1_base_init+0x40/0x40 [sha1_ssse3]
[106175.136466]  [<ffffffffa023ea8a>] ? sha1_apply_transform_avx2+0x1a/0x30 [sha1_ssse3]
[106175.136467]  [<ffffffffa023ee53>] sha1_update+0xd3/0x130 [sha1_ssse3]
[106175.136468]  [<ffffffffa023ef05>] sha1_avx2_update+0x15/0x20 [sha1_ssse3]
[106175.136470]  [<ffffffff81419c08>] crypto_shash_update+0x38/0x100
[106175.136470]  [<ffffffff81419eac>] shash_ahash_update+0x2c/0x50
[106175.136471]  [<ffffffff81419ee2>] shash_async_update+0x12/0x20
[106175.136473]  [<ffffffffa006e38d>] test_ahash_speed_common.constprop.8+0x24d/0x820 [tcrypt]
[106175.136474]  [<ffffffffa007a000>] ? 0xffffffffa007a000
[106175.136475]  [<ffffffffa006ee8c>] do_test+0x12c/0x31a [tcrypt]
[106175.136476]  [<ffffffffa007a000>] ? 0xffffffffa007a000
[106175.136477]  [<ffffffffa007a049>] tcrypt_mod_init+0x49/0x95 [tcrypt]
[106175.136477]  [<ffffffffa007a000>] ? 0xffffffffa007a000
[106175.136479]  [<ffffffff8100043d>] do_one_initcall+0x3d/0x150
[106175.136480]  [<ffffffff8108411a>] ? __might_sleep+0x4a/0x90
[106175.136481]  [<ffffffff811370e4>] ? do_init_module+0x27/0x1d8
[106175.136483]  [<ffffffff811903c6>] ? kmem_cache_alloc_trace+0x46/0x170
[106175.136484]  [<ffffffff8113711d>] do_init_module+0x60/0x1d8
[106175.136486]  [<ffffffff810d2205>] load_module+0x1245/0x1940
[106175.136487]  [<ffffffff810cf470>] ? __symbol_put+0x40/0x40
[106175.136488]  [<ffffffff8119ba93>] ? vfs_read+0x113/0x130
[106175.136490]  [<ffffffff810d2b16>] SYSC_finit_module+0x96/0xd0
[106175.136491]  [<ffffffff810d2b6e>] SyS_finit_module+0xe/0x10
[106175.136492]  [<ffffffff810028dd>] do_syscall_64+0x4d/0xb0
[106175.136493]  [<ffffffff81843786>] entry_SYSCALL64_slow_path+0x25/0x25
[106177.127473] 112143 opers/sec, 918675456 bytes/sec
[106177.127474] test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 
[106179.127611] 112208 opers/sec, 919212032 bytes/sec
[106179.127632] 
testing speed of multibuffer sha1 (sha1-avx2)
[106179.127633] test  0 (   16 byte blocks,   16 bytes per update,   1 updates):   7572 cycles/operation,   59 cycles/byte
[106179.127636] test  2 (   64 byte blocks,   64 bytes per update,   1 updates):   9172 cycles/operation,   17 cycles/byte
[106179.127639] test  5 (  256 byte blocks,  256 bytes per update,   1 updates):  14792 cycles/operation,    7 cycles/byte
[106179.127678] test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):  37386 cycles/operation,    4 cycles/byte
[106179.127690] test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  63652 cycles/operation,    3 cycles/byte
[106179.127708] test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates): 121012 cycles/operation,    3 cycles/byte
[106179.127742] test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 237186 cycles/operation,    3 cycles/byte
[106179.127820] 
testing speed of multibuffer md5 (md5-generic)
[106179.127820] test  0 (   16 byte blocks,   16 bytes per update,   1 updates):   6570 cycles/operation,   51 cycles/byte
[106179.127823] test  2 (   64 byte blocks,   64 bytes per update,   1 updates):   8340 cycles/operation,   16 cycles/byte
[106179.127826] test  5 (  256 byte blocks,  256 bytes per update,   1 updates):  16230 cycles/operation,    7 cycles/byte
[106179.127831] test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):  48646 cycles/operation,    5 cycles/byte
[106179.127845] test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  91754 cycles/operation,    5 cycles/byte
[106179.127871] test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates): 178620 cycles/operation,    5 cycles/byte
[106179.127921] test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 352184 cycles/operation,    5 cycles/byte
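
The stall reports are easy to miss between the benchmark lines. As a minimal sketch (the function name `find_stalls` and the regular expressions are illustrative assumptions, not part of any kernel tool), the following Python pulls the stall-related lines out of dmesg text. The sample lines are copied from the log above with whitespace simplified.

```python
import re

# A few lines copied from the log above (whitespace simplified).
SAMPLE = """\
[106112.131736] INFO: rcu_sched self-detected stall on CPU
[106112.131747]   (t=21000 jiffies g=178681 c=178680 q=640)
[106160.998387] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 42s!
[106175.136399] INFO: rcu_sched self-detected stall on CPU
[106175.136405]   (t=84005 jiffies g=178681 c=178680 q=3014)
"""

LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")   # "[timestamp] message"
STUCK = re.compile(r"stuck for (\d+)s")            # lockup duration in seconds
JIFFIES = re.compile(r"t=(\d+) jiffies")           # RCU stall length in jiffies


def find_stalls(text):
    """Return (timestamp, kind, value) tuples for stall-related log lines."""
    events = []
    for raw in text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue  # continuation lines without a timestamp are skipped
        ts, msg = float(m.group(1)), m.group(2)
        stuck = STUCK.search(msg)
        jif = JIFFIES.search(msg)
        if stuck:
            events.append((ts, "stuck_seconds", int(stuck.group(1))))
        elif jif:
            events.append((ts, "rcu_stall_jiffies", int(jif.group(1))))
    return events


events = find_stalls(SAMPLE)
for event in events:
    print(event)
```

If HZ is 1000 on this machine (an assumption, although it agrees with the timestamps), 21000 jiffies is 21 seconds and 84005 jiffies is about 84 seconds, which is roughly the time elapsed since the first md5 test started at 106091.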

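Running a large amount of CPU-intensive work caused a CPU soft lockup.
A soft lockup means that a CPU has been running in kernel mode for too long without giving other tasks a chance to run. It does not hang the system completely, but some processes (or kernel threads) stay stuck in one state, usually in kernel space. In many cases the cause is a problem in how kernel locks are used. The detector reports a soft lockup after 2 * watchdog_thresh seconds; watchdog_thresh defaults to 10, so the default limit is 20 seconds, which fits the "stuck for 22s" in the message.
Solution (raise the watchdog threshold):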

echo 30 > /proc/sys/kernel/watchdog_thresh 

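This takes effect immediately, but only until the next reboot. The equivalent sysctl command is:

sysctl -w kernel.watchdog_thresh=30

To make the setting persistent, add it to /etc/sysctl.conf:

vi /etc/sysctl.conf
kernel.watchdog_thresh=30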
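
To see what the new value means in practice, here is a small Python sketch. It relies on the documented rule that the soft lockup limit is 2 * watchdog_thresh; the helper names `soft_lockup_threshold` and `read_watchdog_thresh` are made up for this example.

```python
from pathlib import Path


def soft_lockup_threshold(watchdog_thresh):
    """Seconds a CPU may stay stuck before a soft lockup is reported."""
    return 2 * watchdog_thresh


def read_watchdog_thresh(path="/proc/sys/kernel/watchdog_thresh"):
    """Return the current setting, or None if the file cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


current = read_watchdog_thresh()
if current is not None:
    print("current soft lockup limit:", soft_lockup_threshold(current), "s")

# Default value versus the value set above.
print(soft_lockup_threshold(10), soft_lockup_threshold(30))
```

With watchdog_thresh=30 the limit moves from 20 to 60 seconds, so a task that holds the CPU for longer than a minute would still be reported.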

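After the change I ran insmod tcrypt.ko sec=2 mode=400 again.
The following messages still appeared (they come from the workqueue watchdog, which is a separate detector from the soft lockup watchdog):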

kernel:BUG: workqueue lockup - pool
kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 42s!

I then traced the stack information.


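After I asked for help, my team lead pointed out that the kernel soft lockup was caused by issuing too many commands in a single I/O submission. That closed the issue.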
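
One way to read this conclusion is that too much work is pushed through in one go without the CPU ever being released. The rough estimate below is only a sketch under stated assumptions: 22 tests per algorithm (tests 0 to 21 in the log), two algorithms, sec=2 per test, and all of it running back to back in the insmod context, which is what the 84005-jiffy RCU stall suggests. The threshold values are the usual defaults and are assumptions about this machine; the function names are hypothetical.

```python
def busy_seconds(tests_per_alg, algorithms, sec_per_test):
    """Rough time the module init spends in back-to-back speed tests."""
    return tests_per_alg * algorithms * sec_per_test


def exceeded(busy, thresholds):
    """Return the names of the detectors whose limit the busy time exceeds."""
    return [name for name, limit in thresholds.items() if busy > limit]


# Values read off the log: tests 0..21 for md5 and sha1, insmod sec=2.
busy = busy_seconds(tests_per_alg=22, algorithms=2, sec_per_test=2)

# Limits in seconds; the defaults are assumptions about this machine.
thresholds = {
    "soft lockup, watchdog_thresh=10": 20,
    "soft lockup, watchdog_thresh=30": 60,
    "workqueue watchdog, default": 30,
}

print(busy, exceeded(busy, thresholds))
```

Under these assumptions the run keeps CPU 1 busy for about 88 seconds, longer than every limit listed, so raising watchdog_thresh to 30 alone would not be expected to silence all of the reports.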

Reposted from blog.csdn.net/qq_44710568/article/details/104843432