Linux kernel debugging 4

The embedded Linux kernel can be debugged in the following ways.
 1 "Instrument" the target machine, for example by applying the KGDB patch, so that GDB on the host can communicate with KGDB on the target machine through a serial port or a network port.
 2 Use an emulator: the emulator connects directly to the JTAG/BDM port of the target machine, and GDB on the host controls the target machine by communicating with the emulator.
 3 Use software methods such as printk(), oops, strace, etc. to "observe" and debug on the target board.

The software methods in item 3 cannot view or modify data structures, set breakpoints, or single-step.


1. Kernel print information: printk()


 printk() is the most primitive debugging method, and also the most widely used.


 The following command raises the console log level so that printk messages of every priority are printed to the console:

 echo 8 > /proc/sys/kernel/printk
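To see why writing 8 lets every message through, here is a minimal Python sketch of the filtering rule: a message reaches the console only when its level is numerically lower than the console log level, which is the first of the four numbers in /proc/sys/kernel/printk. The sample string "7 4 1 7" and the function names are assumptions made for illustration, not values taken from the target board.

```python
# Field names of the four values in /proc/sys/kernel/printk, in file order.
FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk_levels(text):
    """Turn the contents of /proc/sys/kernel/printk into a dict."""
    values = [int(v) for v in text.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected four integers, got %r" % text)
    return dict(zip(FIELDS, values))


def reaches_console(message_level, console_loglevel):
    """A message is printed on the console when its level is lower
    (more urgent) than the console log level."""
    return message_level < console_loglevel


# Hypothetical file contents before and after "echo 8 > /proc/sys/kernel/printk".
before = parse_printk_levels("7 4 1 7")
after = dict(before, console_loglevel=8)

# KERN_DEBUG is level 7: hidden with console_loglevel 7, shown with 8.
print(reaches_console(7, before["console_loglevel"]))
print(reaches_console(7, after["console_loglevel"]))
```

On a real board the text would be read from /proc/sys/kernel/printk instead of a literal string.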


Driver debugging: printing to a proc virtual file

      

  The dmesg command prints the contents of the kernel log buffer.

 Users can also run "cat /proc/kmsg" directly to display kernel messages.
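Records read from /proc/kmsg start with the message level in angle brackets, for example "<4>". The small Python sketch below splits that prefix from the text, which is handy when filtering a saved log. The sample line and the function name are hypothetical, and newer kernels may add a timestamp after the prefix that this sketch leaves inside the text.

```python
import re

# "<N>" prefix at the start of a record, where N is the printk level.
LEVEL_PREFIX = re.compile(r"<(\d+)>(.*)")


def split_kmsg_record(line):
    """Return (level, text); level is None when the prefix is missing."""
    match = LEVEL_PREFIX.match(line)
    if match is None:
        return None, line
    return int(match.group(1)), match.group(2)


# Hypothetical record as it might appear in /proc/kmsg.
level, text = split_kmsg_record("<4>first_drv: open called")
print(level, text)
```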


2. Driver debugging: segmentation fault analysis based on oops information

Determining the location of the faulting code

2. Analyze the segfault information printed by the kernel
a. As a module:
(find the "incident scene" according to pc=0xbf000018)
1. Determine from the pc value whether the faulting instruction belongs to the kernel or to a loaded module
pc=0xbf000018: which address range does it belong to? Is it the kernel, or a driver loaded via insmod?
First check whether it is a kernel address: look at System.map (vi System.map) to find the address range of the kernel functions: c0004000~c03265a4
If CONFIG_FRAME_POINTER (frame pointer) is enabled in the kernel configuration, the kernel can print the backtrace information.

If the pc is outside the range in System.map, it belongs to a driver loaded by insmod.
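The range check above can be sketched in a few lines of Python. The sample text only mimics the System.map layout (address, type, name); the names first_symbol and last_symbol are placeholders, while the two boundary addresses and chrdev_open are the ones quoted in this article.

```python
# Stand-in for System.map; only the addresses come from the article,
# the first and last symbol names are placeholders.
SYSTEM_MAP_SAMPLE = """\
c0004000 A first_symbol
c008d73c T chrdev_open
c03265a4 B last_symbol
"""


def kernel_range(system_map_text):
    """Return the lowest and highest address listed in System.map."""
    addresses = [
        int(line.split()[0], 16)
        for line in system_map_text.splitlines()
        if line.strip()
    ]
    return min(addresses), max(addresses)


def classify(pc, low, high):
    """Inside the System.map range means kernel; anything else is treated
    as code loaded by insmod, following the rule in the text."""
    return "kernel" if low <= pc <= high else "module"


low, high = kernel_range(SYSTEM_MAP_SAMPLE)
print(classify(0xBF000018, low, high))  # the pc from the oops
print(classify(0xC008D888, low, high))  # the lr from the oops
```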


2. Assuming the error was introduced by a loaded driver, how do we determine which driver it is?
First look at the address ranges of the functions of the loaded drivers:
cat /proc/kallsyms (addresses of kernel functions and of the functions of loaded modules)
cat /proc/kallsyms >/kallsyms.txt
In this output, find the closest address that is <= 0xbf000018. For example:
bf000000 t first_drv_open [first_drv]
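Searching kallsyms.txt by eye gets tedious, so here is a hedged Python sketch of the same search: it picks the symbol with the largest address that is still <= pc and reports the offset into it. The first_drv_open and chrdev_open lines use addresses from this article; the first_drv_write line is a made-up neighbour added only to show that the search does not overshoot.

```python
# Sample in /proc/kallsyms format: address, type, name, optional [module].
KALLSYMS_SAMPLE = """\
c008d73c T chrdev_open
bf000000 t first_drv_open [first_drv]
bf00003c t first_drv_write [first_drv]
"""


def nearest_symbol(kallsyms_text, pc):
    """Return (name, module, offset) for the closest symbol at or below pc,
    or None when every symbol lies above pc."""
    best = None
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        address = int(fields[0], 16)
        if address <= pc and (best is None or address > best[0]):
            module = fields[3].strip("[]") if len(fields) > 3 else None
            best = (address, fields[2], module)
    if best is None:
        return None
    address, name, module = best
    return name, module, pc - address


name, module, offset = nearest_symbol(KALLSYMS_SAMPLE, 0xBF000018)
print("%s+0x%x [%s]" % (name, offset, module))
```

The same function resolves the lr value c008d888 to chrdev_open+0x14c, which agrees with the "LR is at" line of the oops.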


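3. Find first_drv.ko and disassemble it on the PC:
arm-linux-objdump -D first_drv.ko > first_drv.dis
Find first_drv_open in first_drv.dis. The faulting instruction is at offset 0x18 in that function, because pc - load address = 0xbf000018 - 0xbf000000 = 0x18.


    In first_drv.dis                 After insmod
00000000 <first_drv_open>:       bf000000 t first_drv_open [first_drv]
00000018                         pc = bf000018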
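The mapping in this table can be automated. The Python sketch below subtracts the load address from the pc and searches an objdump-style listing for that offset inside the named function. The listing is not the real first_drv.dis: it is an illustrative excerpt rebuilt from the five instruction words on the "Code:" line of the oops further down, and the function name find_instruction is an assumption.

```python
import re

# Illustrative excerpt in "objdump -D" layout (not the real first_drv.dis).
LISTING_SAMPLE = """\
00000000 <first_drv_open>:
   8:   e24cb004    sub fp, ip, #4
   c:   e59f1024    ldr r1, [pc, #36]
  10:   e3a00000    mov r0, #0
  14:   e5912000    ldr r2, [r1]
  18:   e5923000    ldr r3, [r2]
"""

HEADER = re.compile(r"^([0-9a-f]+) <(.+)>:$")
INSN = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]{8})\s+(.*)$")


def find_instruction(listing, symbol, offset):
    """Return (machine word, assembly text) at symbol+offset, or None."""
    target = None
    for line in listing.splitlines():
        header = HEADER.match(line.strip())
        if header:
            # Only search inside the requested function.
            is_wanted = header.group(2) == symbol
            target = int(header.group(1), 16) + offset if is_wanted else None
            continue
        insn = INSN.match(line)
        if target is not None and insn and int(insn.group(1), 16) == target:
            return insn.group(2), " ".join(insn.group(3).split())
    return None


offset = 0xBF000018 - 0xBF000000  # pc minus the address shown by kallsyms
print(find_instruction(LISTING_SAMPLE, "first_drv_open", offset))
```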

./firstdrvtest on
Unable to handle kernel paging request at virtual address 56000050
(The kernel faulted when it tried to access address 56000050.)


pgd = c3eb0000
[56000050] *pgd=00000000
Internal error: Oops: 5 [#1]
Modules linked in: first_drv
CPU: 0    Not tainted  (2.6.22.6 #1)
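PC is at first_drv_open+0x18 (offset of the instruction)/0x3c (total size of the function) [first_drv]
(PC is the address of the instruction that caused the error.)
(Most of the time the PC value is given only as an address, with no indication of which function it is in.)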


LR is at chrdev_open+0x14c/0x164
(The value of the LR register.)


pc = 0xbf000018


pc : [<bf000018>]    lr : [<c008d888>]    psr: a0000013
sp : c3c7be88  ip : c3c7be98  fp : c3c7be94
r10: 00000000  r9 : c3c7a000  r8 : c049abc0
r7 : 00000000  r6 : 00000000  r5 : c3e740c0  r4 : c06d41e0
r3 : bf000000  r2 : 56000050  r1 : bf000964  r0 : 00000000
(The values of the registers when the faulting instruction was executed.)


Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  Segment user
Control: c000717f  Table: 33eb0000  DAC: 00000015
Process firstdrvtest (pid: 777, stack limit = 0xc3c7a258)
(The name of the current process when the error occurred is firstdrvtest.)



Stack: (0xc3c7be88 to 0xc3c7c000)
be80:                   c3c7bebc c3c7be98 c008d888 bf000010 00000000 c049abc0 
bea0: c3e740c0 c008d73c c0474e20 c3e766a8 c3c7bee4 c3c7bec0 c0089e48 c008d74c 
bec0: c049abc0 c3c7bf04 00000003 ffffff9c c002c044 c3d10000 c3c7befc c3c7bee8 
bee0: c0089f64 c0089d58 00000000 00000002 c3c7bf68 c3c7bf00 c0089fb8 c0089f40 
bf00: c3c7bf04 c3e766a8 c0474e20 00000000 00000000 c3eb1000 00000101 00000001 
bf20: 00000000 c3c7a000 c04a7468 c04a7460 ffffffe8 c3d10000 c3c7bf68 c3c7bf48 
bf40: c008a16c c009fc70 00000003 00000000 c049abc0 00000002 bec1fee0 c3c7bf94 
bf60: c3c7bf6c c008a2f4 c0089f88 00008520 bec1fed4 0000860c 00008670 00000005 
bf80: c002c044 4013365c c3c7bfa4 c3c7bf98 c008a3a8 c008a2b0 00000000 c3c7bfa8 
bfa0: c002bea0 c008a394 bec1fed4 0000860c 00008720 00000002 bec1fee0 00000001 
bfc0: bec1fed4 0000860c 00008670 00000002 00008520 00000000 4013365c bec1fea8 
bfe0: 00000000 bec1fe84 0000266c 400c98e0 60000010 00008720 00000000 00000000 


Backtrace:
[<bf000000>] (first_drv_open+0x0/0x3c [first_drv]) from [<c008d888>] (chrdev_open+0x14c/0x164)
[<c008d73c>] (chrdev_open+0x0/0x164) from [<c0089e48>] (__dentry_open+0x100/0x1e8)
 r8:c3e766a8 r7:c0474e20 r6:c008d73c r5:c3e740c0 r4:c049abc0
[<c0089d48>] (__dentry_open+0x0/0x1e8) from [<c0089f64>] (nameidata_to_filp+0x34/0x48)
[<c0089f30>] (nameidata_to_filp+0x0/0x48) from [<c0089fb8>] (do_filp_open+0x40/0x48)
 r4:00000002
[<c0089f78>] (do_filp_open+0x0/0x48) from [<c008a2f4>] (do_sys_open+0x54/0xe4)
 r5:bec1fee0 r4:00000002
[<c008a2a0>] (do_sys_open+0x0/0xe4) from [<c008a3a8>] (sys_open+0x24/0x28)
[<c008a384>] (sys_open+0x0/0x28) from [<c002bea0>] (ret_fast_syscall+0x0/0x2c)
Code: e24cb004 e59f1024 e3a00000 e5912000 (e5923000) 

Segmentation fault
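The "Code:" line of the oops is worth a closer look: the word in parentheses is the instruction at the pc, and the words before it are the preceding instructions. The Python sketch below assigns an address to each word and decodes the faulting one, assuming it is an ARM load/store with a 12-bit immediate offset (the condition, P and W bits are ignored, so this is not a general disassembler). The register value r2 = 56000050 is copied from the register dump above.

```python
CODE_LINE = "Code: e24cb004 e59f1024 e3a00000 e5912000 (e5923000)"
PC = 0xBF000018
REGISTERS = {2: 0x56000050}  # r2 from the register dump in the oops


def code_words(code_line, pc):
    """Return (address, word, is_faulting) for every word on the Code: line."""
    tokens = code_line.split()[1:]
    fault_index = next(i for i, t in enumerate(tokens) if t.startswith("("))
    return [
        (pc + 4 * (i - fault_index), int(t.strip("()"), 16), i == fault_index)
        for i, t in enumerate(tokens)
    ]


def decode_ldr_str_imm(word):
    """Decode an ARM LDR/STR with immediate offset, or return None."""
    if (word >> 25) & 0b111 != 0b010:
        return None
    offset = word & 0xFFF
    if not (word >> 23) & 1:  # U bit clear: the offset is subtracted
        offset = -offset
    return {
        "op": "ldr" if (word >> 20) & 1 else "str",
        "rd": (word >> 12) & 0xF,
        "rn": (word >> 16) & 0xF,
        "offset": offset,
    }


address, word, _ = code_words(CODE_LINE, PC)[-1]
insn = decode_ldr_str_imm(word)
accessed = REGISTERS[insn["rn"]] + insn["offset"]
print("%08x: %s r%d, [r%d, #%d] accesses %08x"
      % (address, insn["op"], insn["rd"], insn["rn"], insn["offset"], accessed))
```

The computed address equals the 56000050 reported in the first line of the oops, so the fault comes from dereferencing the value held in r2.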


3. Driver debugging: segmentation fault analysis, determining the function call chain from the stack information
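As a worked example, the call chain can be recovered from the "Stack:" dump of the oops in section 2 without relying on the "Backtrace:" lines. The Python sketch below assumes the kernel was built with CONFIG_FRAME_POINTER and uses the classic ARM frame layout, in which fp points at the saved pc, fp-4 holds the saved lr (the return address) and fp-12 holds the caller's fp. The layout is an assumption about this 2.6.22 build; the dump and fp = c3c7be94 are copied from the oops.

```python
STACK_START = 0xC3C7BE88  # from "Stack: (0xc3c7be88 to 0xc3c7c000)"
FP = 0xC3C7BE94           # fp from the register dump
STACK_DUMP = """\
be80:                   c3c7bebc c3c7be98 c008d888 bf000010 00000000 c049abc0
bea0: c3e740c0 c008d73c c0474e20 c3e766a8 c3c7bee4 c3c7bec0 c0089e48 c008d74c
bec0: c049abc0 c3c7bf04 00000003 ffffff9c c002c044 c3d10000 c3c7befc c3c7bee8
bee0: c0089f64 c0089d58 00000000 00000002 c3c7bf68 c3c7bf00 c0089fb8 c0089f40
bf00: c3c7bf04 c3e766a8 c0474e20 00000000 00000000 c3eb1000 00000101 00000001
bf20: 00000000 c3c7a000 c04a7468 c04a7460 ffffffe8 c3d10000 c3c7bf68 c3c7bf48
bf40: c008a16c c009fc70 00000003 00000000 c049abc0 00000002 bec1fee0 c3c7bf94
bf60: c3c7bf6c c008a2f4 c0089f88 00008520 bec1fed4 0000860c 00008670 00000005
bf80: c002c044 4013365c c3c7bfa4 c3c7bf98 c008a3a8 c008a2b0 00000000 c3c7bfa8
bfa0: c002bea0 c008a394 bec1fed4 0000860c 00008720 00000002 bec1fee0 00000001
bfc0: bec1fed4 0000860c 00008670 00000002 00008520 00000000 4013365c bec1fea8
bfe0: 00000000 bec1fe84 0000266c 400c98e0 60000010 00008720 00000000 00000000
"""


def parse_stack(dump, stack_start):
    """Map each stack address to its 32-bit word. Rows hold 8 words; a short
    row (the first one) is right-aligned, so its missing words come first."""
    high = stack_start & ~0xFFFF
    memory = {}
    for line in dump.splitlines():
        label, _, rest = line.partition(":")
        words = rest.split()
        if not words:
            continue
        address = (high | int(label, 16)) + 4 * (8 - len(words))
        for word in words:
            memory[address] = int(word, 16)
            address += 4
    return memory


def return_addresses(memory, fp):
    """Follow the frame pointer chain and collect the saved lr of each frame."""
    chain = []
    while fp in memory and (fp - 12) in memory:
        chain.append(memory[fp - 4])
        caller_fp = memory[fp - 12]
        if caller_fp <= fp:  # end of chain, or a corrupted frame
            break
        fp = caller_fp
    return chain


for lr in return_addresses(parse_stack(STACK_DUMP, STACK_START), FP):
    print("%08x" % lr)
```

The seven addresses it prints are the same ones that follow "from" in the Backtrace lines of the oops, from chrdev_open+0x14c down to ret_fast_syscall.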






4. Driver debugging: a home-made tool, the register editor







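5. Driver debugging: locating a system hang by modifying the system clock interrupt

          Idea: print the PC value from the clock interrupt handler, then determine whether that PC belongs to the kernel or to a module loaded by insmod. Use System.map (for the kernel) or kallsyms (for modules) to find the function that contains it, then disassemble the corresponding file to find the exact location of the problem.

       The kernel clock runs all the time, so information can be printed from the kernel's clock interrupt.

      (1) Search the kernel source for Timer_Tick to find the interrupt handler.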

         

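Add code to the kernel's timer interrupt handler to print information: if the same process has been running for 10 s in a row, print its process ID and name. This shows which process the problem occurs in.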

 



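Interrupt handling saves the context of the interrupted code, so printing the saved pc value shows where the problem occurs.

In the interrupt exception handling path, find asm_do_IRQ.

From asm_do_IRQ, find the structure pt_regs that holds the saved context, and find the PC value in it.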



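Modify asm_do_IRQ, the common entry function for all interrupts: when the interrupt number is 30 (the system timer interrupt) and the same process has been running for 10 s, print the PC value, the process name, and the process ID.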
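The real change is a few lines of C inside asm_do_IRQ, but the counting logic is easy to get wrong, so here is a user-space Python model of it (not kernel code). It counts consecutive timer interrupts that land in the same process and reports once the count reaches 10 seconds' worth of ticks. The class name, the hz default of 100, and the sample pid, name and pc are assumptions for illustration; only the interrupt number 30 and the 10 s window come from the text.

```python
class HangDetector:
    """Model of the check added to the common interrupt entry function."""

    def __init__(self, hz=100, seconds=10, timer_irq=30):
        self.threshold = hz * seconds  # timer ticks in the time window
        self.timer_irq = timer_irq
        self.last_pid = None
        self.count = 0

    def on_irq(self, irq, pid, comm, pc):
        """Call once per interrupt; returns a report string or None."""
        if irq != self.timer_irq:
            return None  # only the system timer interrupt is counted
        if pid == self.last_pid:
            self.count += 1
        else:
            # A different process was interrupted: start counting again.
            self.last_pid = pid
            self.count = 0
        if self.count == self.threshold:
            self.count = 0
            return "pid = %d, task name = %s, pc = %08x" % (pid, comm, pc)
        return None


# Simulate a process stuck in a loop: every timer tick interrupts it.
detector = HangDetector(hz=100, seconds=10)
for _ in range(1001):
    report = detector.on_irq(30, 777, "firstdrvtest", 0xBF000018)
    if report:
        print(report)
```

In the kernel, pid and comm would come from the current task and the pc from the saved pt_regs; the printed pc is then resolved with System.map or kallsyms as in section 2.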






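Work backwards from the PC (is it in the kernel, or was it introduced by insmod?) to find where the problem is. The same kernel build must be used for the analysis.

Export the symbol table and open it (it contains the function addresses), then find the closest address: that is the function where the problem occurred. Disassemble that function to find the instruction near which the problem occurred.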


 




Origin blog.csdn.net/qq_26690505/article/details/79287084