Linux Kernel Problem Analysis Methods

Why Kernel Problems Are Hard to Locate

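  1. The system crashes without leaving a crash scene, or the captured information is incomplete; sometimes information exists but is not enough for further analysis.
  2. Debugging is difficult; gdb can only serve as an auxiliary tool.
  3. The kernel is large and complex, with a huge code base, so it is hard to know where to start.
  4. The symptom of a problem and its cause are not necessarily directly related. A stack trace that lands in Linux kernel code is, in most cases, caused by a problem in a module we developed.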

Classification of Linux Kernel Problems

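  1. By source: stock kernel vs. in-house modules
  2. By severity: fatal vs. serious
  3. By difficulty of analysis: crash scene available vs. not available
  4. Common symptoms of kernel problems:
    kernel panic
    deadlock (busy-waiting and hangs)
    memory leak
    Warning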

Preparation

  1. To do a good job, first sharpen your tools.
  2. Confidence that the problem can be solved.

Linux Kernel Analysis Toolbox

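Debugging tools: gdb, ftrace, systemtap
Disassembly: objdump
Crash scene analysis: kdump & crash
Early-warning tools: the various debug facilities
Core element: the code is the single most essential element in analyzing any problem, bar none.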

Crash Scene Analysis

[Figure: image not available]

Preparation (Kernel Source)

[Figures: images not available]

Stack Trace Analysis (x86 Architecture)

[Figure: image not available]

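In-Depth Stack Analysis (x86 Architecture)

  1. Disassemble the function and find the line that corresponds to the faulting address. (Disassembly output from GDB or objdump)
  2. Analyze the faulting line, using the disassembly to identify the C code at fault. (GDB)
  3. Combine the disassembly with the stack contents to analyze the faulty memory data and find the data structure it belongs to. (GDB and kdump information)
  4. Work through the control flow to determine why the error occurred.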
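As a small aid for step 1, the sketch below splits one call-trace frame into its parts and derives the function's start address, which is where the matching objdump or GDB disassembly of that function begins. The sample frame is copied from the deadlock log in this article; `Frame` and `parse_frame` are hypothetical helper names, not part of any kernel tool, and the regular expression assumes only the `[<address>] symbol+0xoffset/0xsize [module]` layout seen in that log.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    """One decoded call-trace frame (hypothetical helper type)."""
    address: int            # address printed inside [<...>]
    symbol: str             # function name
    offset: int             # byte offset of the address inside the function
    size: int               # total size of the function in bytes
    module: Optional[str]   # module name from [...], None if no module is printed

    @property
    def function_start(self) -> int:
        # The frame address minus the offset is the first byte of the function.
        return self.address - self.offset


FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>[^\]]+)\])?"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Parse a single call-trace line; return None if it is not a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        address=int(m["addr"], 16),
        symbol=m["sym"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
        module=m["mod"],
    )


if __name__ == "__main__":
    frame = parse_frame("[<ffffffffa02f70a4>] ? test_func_level_2+0x34/0x70 [test]")
    print(frame.symbol, hex(frame.offset), hex(frame.function_start))
    # prints: test_func_level_2 0x34 0xffffffffa02f7070
```

The offset (`0x34` here) is the value to look for in the function's disassembly. On x86, a `?` before the symbol marks an address that was found on the stack but that the unwinder could not confirm as part of the call chain, so such frames should be read with some caution.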

For details, see the blog post "kdump定位" (locating problems with kdump).

Case Study: Data Type 1

Classic case: disk capacity retrieved incorrectly

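Case Study: Memory Corruption

Debugging tools: stack analysis and print statements.
This problem was located without any advanced tools; it was solved mainly by analyzing the control flow.
The lesson from this problem is that coding standards must be followed. Simple code is an important guarantee of maintainability.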

Case Study: Deadlock

Aug 31 18:40:06 localhost kernel: =============================================
Aug 31 18:40:06 localhost kernel: [ INFO: possible recursive locking detected ]
Aug 31 18:40:06 localhost kernel: 2.6.32-131.21.1.b2.00 #4
Aug 31 18:40:06 localhost kernel: ---------------------------------------------
Aug 31 18:40:06 localhost kernel: insmod/32470 is trying to acquire lock:
Aug 31 18:40:06 localhost kernel: (&t_lock.tlock){+.+...}, at: [<ffffffffa02f70a4>] test_func_level_2+0x34/0x70 [test]
Aug 31 18:40:06 localhost kernel: 
Aug 31 18:40:06 localhost kernel: but task is already holding lock:
Aug 31 18:40:06 localhost kernel: (&t_lock.tlock){+.+...}, at: [<ffffffffa02f7114>] test_func_level_1+0x34/0x70 [test]
Aug 31 18:40:06 localhost kernel: 
Aug 31 18:40:06 localhost kernel: other info that might help us debug this:
Aug 31 18:40:06 localhost kernel: 1 lock held by insmod/32470:
Aug 31 18:40:06 localhost kernel: #0:  (&t_lock.tlock){+.+...}, at: [<ffffffffa02f7114>] test_func_level_1+0x34/0x70 [test]
Aug 31 18:40:06 localhost kernel: 
Aug 31 18:40:06 localhost kernel: stack backtrace:
Aug 31 18:40:06 localhost kernel: Pid: 32470, comm: insmod Not tainted 2.6.32-131.21.1.b2.00 #4
Aug 31 18:40:06 localhost kernel: Call Trace:
Aug 31 18:40:06 localhost kernel: [<ffffffff810a53ed>] ? __lock_acquire+0x11bd/0x1560
Aug 31 18:40:06 localhost kernel: [<ffffffff8106503b>] ? vprintk+0x2cb/0x590
Aug 31 18:40:06 localhost kernel: [<ffffffff814f3699>] ? sub_preempt_count+0x9/0xa0
Aug 31 18:40:06 localhost kernel: [<ffffffff8106504a>] ? vprintk+0x2da/0x590
Aug 31 18:40:06 localhost kernel: [<ffffffff810a5838>] ? lock_acquire+0xa8/0x150
Aug 31 18:40:06 localhost kernel: [<ffffffffa02f70a4>] ? test_func_level_2+0x34/0x70 [test]
Aug 31 18:40:06 localhost kernel: [<ffffffff814f005b>] ? _spin_lock+0x3b/0x50
Aug 31 18:40:06 localhost kernel: [<ffffffffa02f70a4>] ? test_func_level_2+0x34/0x70 [test]
Aug 31 18:40:06 localhost kernel: [<ffffffffa02f70a4>] ? test_func_level_2+0x34/0x70 [test]
Aug 31 18:40:06 localhost kernel: [<ffffffffa02f711c>] ? test_func_level_1+0x3c/0x70 [test]
Aug 31 18:40:06 localhost kernel: [<ffffffffa02fa000>] ? init_module+0x0/0xd [test]
Aug 31 18:40:06 localhost kernel: [<ffffffffa02f718d>] ? test_lock_init+0x3d/0x44 [test]
Aug 31 18:40:06 localhost kernel: [<ffffffffa02fa009>] ? init_module+0x9/0xd [test]
Aug 31 18:40:06 localhost kernel: [<ffffffff8100204c>] ? do_one_initcall+0x3c/0x1d0
Aug 31 18:40:06 localhost kernel: [<ffffffff810b5123>] ? sys_init_module+0xe3/0x260
Aug 31 18:40:06 localhost kernel: [<ffffffff8100c1f2>] ? system_call_fastpath+0x16/0x1b
Aug 31 18:41:23 localhost kernel: BUG: soft lockup - CPU#3 stuck for 67s! [insmod:32470]
Aug 31 18:41:23 localhost kernel: Modules linked in: test(+)(U) nls_utf8 cifs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mod i7core_edac edac_core sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sd_mod crc_t10dif ahci [last unloaded: test]
Aug 31 18:41:23 localhost kernel: irq event stamp: 7163
Aug 31 18:41:23 localhost kernel: hardirqs last  enabled at (7163): [<ffffffff8100cc94>] restore_args+0x0/0x30
Aug 31 18:41:23 localhost kernel: hardirqs last disabled at (7161): [<ffffffff8106c7a1>] __do_softirq+0x131/0x230
Aug 31 18:41:23 localhost kernel: softirqs last  enabled at (7162): [<ffffffff8106c7d1>] __do_softirq+0x161/0x230
Aug 31 18:41:23 localhost kernel: softirqs last disabled at (7147): [<ffffffff8100d45c>] call_softirq+0x1c/0x30
Aug 31 18:41:23 localhost kernel: CPU 3:
Aug 31 18:41:23 localhost kernel: Modules linked in: test(+)(U) nls_utf8 cifs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mod i7core_edac edac_core sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sd_mod crc_t10dif ahci [last unloaded: test]
Aug 31 18:41:23 localhost kernel: Pid: 32470, comm: insmod Not tainted 2.6.32-131.21.1.b2.00 #4 To be filled by O.E.M.
Aug 31 18:41:23 localhost kernel: RIP: 0010:[<ffffffff814edc48>]  [<ffffffff814edc48>] preempt_schedule+0x8/0x60
Aug 31 18:41:23 localhost kernel: RSP: 0018:ffff8801ef0e1dc8  EFLAGS: 00000282
Aug 31 18:41:23 localhost kernel: RAX: ffff8801ef0e1fd8 RBX: ffff8801ef0e1dd8 RCX: 00000000d0a1e7be
Aug 31 18:41:23 localhost kernel: RDX: 000000000005a1d2 RSI: ffffffff817b358d RDI: 0000000000000001
Aug 31 18:41:23 localhost kernel: RBP: ffffffff8100cbce R08: 0000000000000002 R09: 0000000000000000
Aug 31 18:41:23 localhost kernel: R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
Aug 31 18:41:23 localhost kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Aug 31 18:41:23 localhost kernel: FS:  00007fbba827c700(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
Aug 31 18:41:23 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 31 18:41:23 localhost kernel: CR2: 00007fbba822500f CR3: 00000001f54df000 CR4: 00000000000006e0
Aug 31 18:41:23 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 18:41:23 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 31 18:41:23 localhost kernel: Call Trace:
Aug 31 18:41:23 localhost kernel: [<ffffffff814f3699>] ? sub_preempt_count+0x9/0xa0
Aug 31 18:41:23 localhost kernel: [<ffffffff812894bb>] ? delay_tsc+0xeb/0xf0
Aug 31 18:41:23 localhost kernel: [<ffffffff8128936f>] ? __delay+0xf/0x20
Aug 31 18:41:23 localhost kernel: [<ffffffff8129ab40>] ? _raw_spin_lock+0x110/0x180
Aug 31 18:41:23 localhost kernel: [<ffffffff814f0063>] ? _spin_lock+0x43/0x50
Aug 31 18:41:23 localhost kernel: [<ffffffffa02f70a4>] ? test_func_level_2+0x34/0x70 [test]
Aug 31 18:41:23 localhost kernel: [<ffffffffa02f70a4>] ? test_func_level_2+0x34/0x70 [test]
Aug 31 18:41:23 localhost kernel: [<ffffffffa02f711c>] ? test_func_level_1+0x3c/0x70 [test]
Aug 31 18:41:23 localhost kernel: [<ffffffffa02fa000>] ? init_module+0x0/0xd [test]
Aug 31 18:41:23 localhost kernel: [<ffffffffa02f718d>] ? test_lock_init+0x3d/0x44 [test]
Aug 31 18:41:23 localhost kernel: [<ffffffffa02fa009>] ? init_module+0x9/0xd [test]
Aug 31 18:41:23 localhost kernel: [<ffffffff8100204c>] ? do_one_initcall+0x3c/0x1d0
Aug 31 18:41:23 localhost kernel: [<ffffffff810b5123>] ? sys_init_module+0xe3/0x260
Aug 31 18:41:23 localhost kernel: [<ffffffff8100c1f2>] ? system_call_fastpath+0x16/0x1b
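
Reading the report above: the lock checker warns that `insmod` is trying to acquire `&t_lock.tlock` in `test_func_level_2` while already holding a lock of the same name, taken in `test_func_level_1`. The call trace shows `test_func_level_1` calling `test_func_level_2`, and the soft lockup reported about a minute later shows the CPU still spinning in `_spin_lock`, which is consistent with the task waiting on a lock it holds itself. The sketch below pulls the two lock records out of such a report and checks whether they name the same lock. `LockSite` and `find_recursive_lock` are hypothetical names, and the regular expression assumes only the line layout seen in this log.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class LockSite(NamedTuple):
    lock: str      # lock name as printed in the report, e.g. "&t_lock.tlock"
    function: str  # function in which the lock was (or is being) taken


# Matches e.g. "(&t_lock.tlock){+.+...}, at: [<ffffffffa02f70a4>] test_func_level_2+0x34/0x70"
LOCK_RE = re.compile(
    r"\((?P<lock>[^)]+)\)\{[^}]*\}, at: \[<[0-9a-f]+>\] (?P<func>[\w.]+)\+0x"
)

# Excerpt copied from the log above (syslog prefixes removed).
REPORT = """\
insmod/32470 is trying to acquire lock:
 (&t_lock.tlock){+.+...}, at: [<ffffffffa02f70a4>] test_func_level_2+0x34/0x70 [test]

but task is already holding lock:
 (&t_lock.tlock){+.+...}, at: [<ffffffffa02f7114>] test_func_level_1+0x34/0x70 [test]
"""


def _lock_after(lines: List[str], marker: str) -> Optional[LockSite]:
    """Return the first lock record found after a line containing marker."""
    seen_marker = False
    for line in lines:
        if marker in line:
            seen_marker = True
            continue
        if seen_marker:
            m = LOCK_RE.search(line)
            if m:
                return LockSite(m["lock"], m["func"])
    return None


def find_recursive_lock(report: str) -> Optional[Tuple[LockSite, LockSite]]:
    """Return (held, wanted) if the report shows the same lock name twice."""
    lines = report.splitlines()
    wanted = _lock_after(lines, "is trying to acquire lock:")
    held = _lock_after(lines, "but task is already holding lock:")
    if wanted and held and wanted.lock == held.lock:
        return held, wanted
    return None


if __name__ == "__main__":
    result = find_recursive_lock(REPORT)
    if result:
        held, wanted = result
        print(f"{held.function} holds {held.lock}; "
              f"{wanted.function} tries to take it again")
```

A match on the lock name alone is only a hint, since the checker reports by lock class; here the soft lockup in the same call path is what confirms that the task really is waiting on itself.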

Linux Kernel Problem Analysis Path

[Figure: image not available]


Reposted from blog.csdn.net/qq_44710568/article/details/105192518