Foreword

For how to operate gdb when debugging programs on Linux, refer to my article "gdb debugging operation"; it is not repeated here.

This article introduces the underlying principle of GDB local debugging: what mechanism GDB uses to control the execution of the program being debugged.

The summary at the end of the article recaps the underlying principle of breakpoint debugging; you can jump straight to it to get the overall framework first.

Local debugging

Local debugging: the debugger and the debugged program run on the same computer.

Remote debugging

Remote debugging: the debugger runs on one computer, and the debugged program runs on another computer.

A visual debugger is not the point here. It is just a shell that wraps GDB, and through it we interact with the gdb debugger.

We can type debugging commands by hand in a plain terminal window, or we can use an integrated development environment (IDE) with an embedded debugger.

GDB debugging commands

Only a few commands are mentioned here; they can be introduced in detail later.

Each debugging command has many options. Breakpoints alone cover setting, deleting, conditional breakpoints, temporarily disabling and re-enabling, and so on. The focus of this article is the underlying debugging mechanism of gdb, so the usage of these commands is not listed here; there are plenty of resources on the Internet.

The relationship between GDB and the program being debugged

For the convenience of description, first write the simplest C program:

#include <stdio.h>

int main(int argc, char *argv[])
{
    
    
    int a = 1;
    int b = 2;
    int c = a + b;
    printf("c = %d \n", c);
    return 0;
}

Compile command:

$ gcc -g test.c -o test   # remember the -g option, which builds the executable with debug information

To debug the executable test, enter the command:

$ gdb ./test

The output is as follows:

On the last line, the cursor blinks at the (gdb) prompt: gdb is waiting for us to issue debugging commands.

Behind the simple gdb ./test typed in the terminal window, a lot happens inside the operating system:

Two actions performed by GDB when debugging a program

The system first starts the gdb process. To start the program being debugged (when the run command is issued), gdb calls the system function fork() to create a child process. The child process then does two things:

It calls ptrace(PTRACE_TRACEME, [other parameters]) so that its parent process, gdb, traces it.
It then loads the executable test through exec, so the test program starts executing inside this traced child process.

ptrace system call

After all this groundwork, it is finally time for the protagonist to appear: the system call ptrace (its parameters are explained later). It is ptrace that gives gdb its powerful debugging ability. The function prototype is:

#include <sys/ptrace.h>
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);

ptrace is a system call provided by the Linux kernel for process tracing. Through it, one process (gdb) can read and write the code, data, stack and registers of another process (test). Moreover, the gdb process takes over the signals of the test process: signals sent to the test process are reported to gdb first (the exception is SIGKILL, which cannot be intercepted). In this way the execution of test is under gdb's control, which is what lets the commands we type into gdb control the program being debugged.

Note that ptrace(PTRACE_TRACEME) is called by the child process so that its parent, gdb, traces it; only then does exec replace the child's image with the program to be debugged.

In other words, without gdb the operating system and the target process interact directly.

When the program is debugged with gdb, signals sent by the operating system to the target process are intercepted by gdb. Depending on the signal, gdb decides whether to pass the intercepted signal on to the target program when it resumes it; the target program then acts on whatever signal gdb delivers.

How GDB implements breakpoints (the underlying principle of debugging)

I was once asked in an interview why a debugged program stops at a breakpoint. I used breakpoints all the time without knowing what happens underneath, and the question bothered me for a long time. Now, knowing the relationship between the gdb process and the debugged process, we can analyze how breakpoints work underneath.

Using the following code example and gdb's break (b) command, let's study the underlying debugging principle:

#include <stdio.h>

int main(int argc, char *argv[])
{
    
    
    int a = 1;
    int b = 2;
    int c = a + b;
    printf("c = %d \n", c);
    return 0;
}

Generate and view the assembly code with: gcc -S test.c; cat test.s


As mentioned above, gdb forks a child process, which first calls ptrace and then calls exec to run the test program, so that the debugging environment is ready.

Enter the breakpoint command "break 5" in the debug window. gdb then does two things:

It stores line 10 of the assembly code, which corresponds to line 5 of the source code, in the breakpoint linked list.
It inserts the interrupt instruction INT3 at line 10 of the assembly code; that is, line 10 of the assembly code is replaced by INT3.

The opcode of the INT 3 instruction is 0xcc. It is a software interrupt instruction (interrupt vector 3) intended specifically for debugging, also known as the breakpoint instruction. After the CPU executes it, the OS sends the SIGTRAP signal (signal number 5 on Linux) to the debugged program. When gdb catches this signal it knows a breakpoint has been reached; the debugged program is suspended in the tracing-stop state (TASK_TRACED) and waits for further commands from gdb. The details follow.

Next, enter the "run" command in the debug window (run until a breakpoint is hit, then pause). When the PC reaches line 10 of the assembly code, the CPU finds an INT3 (trap) instruction there, so the operating system sends a SIGTRAP signal to the test process.

At this moment, the INT3 placed on line 10 has already been executed, and the PC has advanced past it, toward line 11.

As mentioned above, any signal sent by the operating system to test is taken over by gdb, so gdb receives the SIGTRAP signal first. gdb sees that the program has just executed line 10 of the assembly code, searches the breakpoint linked list, and finds the code of line 10 stored there, which means a breakpoint is set on line 10. The debugged test process therefore stays suspended (in the TASK_TRACED tracing-stop state). gdb then performs two more operations:

It replaces the "INT3" on line 10 of the assembly code with the original code saved in the breakpoint list.
It moves the PC back by one step (one byte, the length of INT3), so that it points at the code just restored on line 10.

gdb then goes back to waiting for the user's debugging commands. At this point the test process is suspended.

The next instruction to be executed is now line 10 of the assembly code, which is line 5 of the source program. From our point of view as the person debugging, the program is paused at the breakpoint on line 5 of the source. We can now enter other debugging commands, for example to view variable values (info...), view stack information (bt...), or modify the values of local variables. It is precisely the restoring of the replaced code and the rolling back of the PC after the trap that guarantee the line we interrupted can still be executed normally.

Further analysis: debugging with the next command

Take the same source code and assembly code as an example, and assume the program is stopped at line 6 of the source code, that is, line 11 of the assembly code.

Enter the single-step command next in the debug window. Our goal is to execute one line of code: execute line 6 of the source code, then stop at line 7.

When gdb receives the next command, it works out that line 7 of the source code (where execution should stop after line 6) corresponds to line 14 of the assembly code. So gdb lets the program run until line 13 of the assembly code has finished, that is, until the PC points to line 14, and stops there. It then goes back to waiting for the user's next debugging command.

Through the two commands break and next, the relationship between the gdb process and the debugged program, and the part played by the operating system, we now understand how gdb handles debugging commands. gdb has many more commands, of course, including more complex ones for obtaining stack information or modifying variable values. Interested readers can dig deeper.

How to debug a running process?

Everything above is about debugging a program that is not yet running. How do we debug a service that is already running?

To debug an already running process B, gdb itself must call ptrace(PTRACE_ATTACH, [other parameters]). This is the difference: when the program is started under gdb, the child process is the one that calls ptrace.
The gdb process then attaches (binds) to the running process B and becomes its tracer; for tracing purposes B behaves as if it were gdb's child process. From B's point of view, the effect is the same as if it had performed PTRACE_TRACEME itself.

As part of the attach, a SIGSTOP signal is sent to process B (note: SIGSTOP, not SIGTRAP). On receiving it, process B suspends execution and enters the TASK_TRACED state, which indicates that it is ready to be debugged, and waits for the next operation of the debugger gdb.

Summary

Suppose we want to stop at a breakpoint on line 5. The following happens:

Start gdb with the program to be debugged. gdb calls fork to create a child process; the child calls ptrace so that its parent, gdb, traces it, and then calls exec to load the program test. This puts test under gdb's control.

Then enter the b 5 command: 1. the assembly code for line 5 is recorded in the breakpoint linked list (used to recognize the breakpoint and to restore the code); 2. that assembly instruction is replaced with the INT3 instruction (a software interrupt, the breakpoint trap).

Then enter the r command: when the debugged program reaches the INT3, the CPU traps into the OS, which sends the SIGTRAP signal, and the debugger gdb receives it first. gdb searches the breakpoint list, finds that a breakpoint is set at this location, and keeps the program suspended. It then 1. moves the PC back by one (PC--), and 2. restores the original assembly code of line 5. After that it waits for the next gdb command (for example: view variable values, view stack information, modify local variables, and so on).

The underlying principle of Linux gdb debugging