In-depth understanding of debuginfo

@Chinainvent


1. Questions about debuginfo
Programmers know that to step through a program with gdb, it must be compiled with gcc's -g option. For system libraries or the Linux kernel, debugging with gdb or probing with systemtap requires installing the corresponding debuginfo package.


For example, glibc and its debuginfo package are:

[yunkai@fedora t]$ rpm -qa | grep glibc

glibc-2.18-12.fc20.x86_64

glibc-debuginfo-2.18-12.fc20.x86_64

...


This raises the following questions:

What information is contained in glibc-debuginfo?

How is glibc-debuginfo created?

How does gdb or systemtap associate glibc with glibc-debuginfo?

This article will use some examples to answer these questions.


2. What information is included in debuginfo?
Let's take a look at what is included in glibc-debuginfo:

[yunkai@fedora t]$ rpm -ql glibc-debuginfo-2.18-12.fc20.x86_64

/usr/lib/debug

/usr/lib/debug/.build-id

/usr/lib/debug/.build-id/00

/usr/lib/debug/.build-id/00/a32f1b9405f5fcd41a7618f3c2c895ee4aab09

/usr/lib/debug/.build-id/00/a32f1b9405f5fcd41a7618f3c2c895ee4aab09.debug

/usr/lib/debug/lib64/libthread_db.so.1.debug

/usr/lib/debug/lib64/libutil-2.18.so.debug

/usr/lib/debug/lib64/libc-2.18.so.debug

/usr/src/debug/glibc-2.18/wcsmbs/wcwidth.h

/usr/src/debug/glibc-2.18/wcsmbs/wmemchr.c

/usr/src/debug/glibc-2.18/wcsmbs/wmemcmp.c

...


The listing shows that glibc-debuginfo contains roughly three kinds of files:

Files of the form /usr/lib/debug/.build-id/nn/nnn...nnn.debug, whose names are derived from a hash key.

Other *.debug files under /usr/lib/debug/, named after the library file with a .debug suffix.

The glibc source code, under /usr/src/debug/.


To debug with gdb, a mapping between machine code and source code must be established. This requires three pieces of information:

Machine code: executable files and shared libraries, for example /lib64/libc-2.18.so.

Source code: the *.c and *.h source files contained in glibc-debuginfo.

Mapping: the relationship between the two, stored in the *.debug files.


3. How is debuginfo created?
When a program is compiled with gcc's -g option, the mapping between machine code and source code is embedded in the executable or shared library by default. For example, the following a.out executable already contains the mapping:

[yunkai@fedora t]$ nl main.c 

     1  #include <stdio.h>

       

     2  int main()

     3  {

     4    printf("hello, world!\n");

     5    return 0;

     6  }

[yunkai@fedora t]$ gcc -g main.c 

[yunkai@fedora t]$ ls -l

total 16

-rwxrwxr-x 1 yunkai yunkai 9502 Apr 9 14:55 a.out

-rw-rw-r-- 1 yunkai yunkai 76 Apr 9 14:49 main.c


Embedding debugging information in executables and shared libraries causes an obvious problem: the files become much larger. Ordinary users who never debug gain nothing from this.


For example, if the Linux kernel carried its debuginfo, its size would grow by hundreds of megabytes for no benefit to most users. If every library in a Linux system carried its own debuginfo, a clean installation would waste several gigabytes, or even more than ten gigabytes, of disk space. A network installation would also waste every user's bandwidth and slow the installation down considerably. To solve this problem, the debuginfo of programs and libraries on Linux is extracted when the RPM is built and shipped as a separate debuginfo package.


So how can the debuginfo be split out of a program? The --only-keep-debug option of the objcopy command does this. The following command reads the debug information from a.out and writes it to a.out.debug:

[yunkai@fedora t]$ objcopy --only-keep-debug ./a.out a.out.debug

[yunkai@fedora t]$ ls -l

total 24

-rwxrwxr-x 1 yunkai yunkai 9502 Apr 9 14:55 a.out

-rwxrwxr-x 1 yunkai yunkai 6022 Apr 9 15:22 a.out.debug

-rw-rw-r-- 1 yunkai yunkai 76 Apr 9 14:49 main.c

Now that the debugging information has been saved in a.out.debug, the --strip-debug option of objcopy can slim down a.out (strip --strip-debug ./a.out has the same effect):

[yunkai@fedora t]$ objcopy --strip-debug ./a.out

[yunkai@fedora t]$ ls -l

total 24

-rwxrwxr-x 1 yunkai yunkai 8388 Apr 9 15:27 a.out

-rwxrwxr-x 1 yunkai yunkai 6022 Apr 9 15:22 a.out.debug

-rw-rw-r-- 1 yunkai yunkai 76 Apr 9 14:49 main.c


Once the debugging information has been stripped from a.out, gdb reports that no debugging symbols were found:

[yunkai@fedora t]$ gdb ./a.out

GNU gdb (GDB) Fedora 7.6.50.20130731-19.fc20

...

Reading symbols from /home/yunkai/t/a.out...(no debugging symbols found)...done.

(gdb) 


gdb cannot find the debug information. We therefore need to leave a hint in a.out so that gdb can locate the corresponding debug file, a.out.debug.


On Linux, executables and libraries are usually in ELF (Executable and Linkable Format) format, which is organized into sections. The hint for locating the debugging information is stored in a section reserved for this purpose, named .gnu_debuglink. The --add-gnu-debuglink option of objcopy saves the debug file's name (a.out.debug) into the .gnu_debuglink section of a.out, after which gdb can debug normally:

[yunkai@fedora t]$ objcopy --add-gnu-debuglink=a.out.debug ./a.out

[yunkai@fedora t]$ objdump -s -j .gnu_debuglink ./a.out


./a.out:     file format elf64-x86-64


Contents of section .gnu_debuglink:

 0000 612e6f75 742e6465 62756700 3fe5803b  a.out.debug.?..;

[yunkai@fedora t]$ gdb a.out

...

Reading symbols from /home/yunkai/t/a.out...Reading symbols from /home/yunkai/t/a.out.debug...done.


The objcopy command above writes the file name a.out.debug and a CRC checksum of that file into the .gnu_debuglink section (the objdump command above prints the section's contents), but it does not record the directory in which a.out.debug is located.
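As a rough illustration of that layout, the sketch below decodes the 16 bytes printed by objdump. It assumes a little-endian target such as x86-64, and the helper names are made up for this example. The file_crc32 helper further assumes that the stored checksum is the standard CRC-32 that zlib computes, which is worth verifying against the gdb documentation before relying on it.

```python
import struct
import zlib


def parse_gnu_debuglink(section):
    """Split raw .gnu_debuglink bytes into (file name, CRC)."""
    # The file name is NUL-terminated and padded to a 4-byte boundary.
    end = section.index(b"\0")
    name = section[:end].decode("ascii")
    crc_offset = (end + 1 + 3) & ~3
    # Assumption: little-endian target, so the CRC is read as "<I".
    (crc,) = struct.unpack_from("<I", section, crc_offset)
    return name, crc


def file_crc32(path):
    """CRC-32 of a whole file, for comparison with the stored value.

    Assumption: the debuglink checksum is the standard CRC-32.
    """
    with open(path, "rb") as f:
        return zlib.crc32(f.read()) & 0xFFFFFFFF


# The 16 bytes shown in the objdump output above.
raw = bytes.fromhex("612e6f75742e6465627567003fe5803b")
name, crc = parse_gnu_debuglink(raw)
print(name, hex(crc))
```

Only the base name and the checksum are present in the section, which is why gdb still needs a search rule to find the directory.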


So what rules does gdb follow to find the a.out.debug file? Before answering this question, let's look at another section, named .note.gnu.build-id:

[yunkai@fedora t]$ readelf -t ./a.out | grep build-id

  [ 3] .note.gnu.build-id

[yunkai@fedora t]$ readelf -n ./a.out

...

Notes at offset 0x00000274 with length 0x00000024:

  Owner                 Data size       Description

  GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)

    Build ID: 888010ffb999590e7158422ea813169be34085a1


[yunkai@fedora t]$ readelf -n ./a.out.debug 

...

Notes at offset 0x00000274 with length 0x00000024:

  Owner                 Data size       Description

  GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)

    Build ID: 888010ffb999590e7158422ea813169be34085a1


This section is already present in a.out, so it is also copied to a.out.debug. It stores a Build ID, which is calculated automatically from the contents of a.out; each executable or library has its own unique Build ID.
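The note that readelf -n prints has a simple binary layout: three 32-bit fields (name size, descriptor size, type), then the owner name "GNU", then the 20-byte ID. The following sketch rebuilds the 0x24-byte note from the values shown above and decodes it again. It assumes a little-endian ELF file with a single note in the section, and the function name is hypothetical.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type that readelf prints as NT_GNU_BUILD_ID


def parse_build_id_note(note):
    """Return the Build ID as a hex string from raw note bytes."""
    # Assumption: little-endian ELF, one note in the section.
    namesz, descsz, note_type = struct.unpack_from("<III", note, 0)
    name_start = 12
    name = note[name_start:name_start + namesz]
    # The name field is padded to a 4-byte boundary before the descriptor.
    desc_start = name_start + ((namesz + 3) & ~3)
    desc = note[desc_start:desc_start + descsz]
    if name != b"GNU\0" or note_type != NT_GNU_BUILD_ID:
        raise ValueError("not a GNU build-id note")
    return desc.hex()


# Rebuild the note from the readelf output: owner "GNU", data size 0x14.
build_id = "888010ffb999590e7158422ea813169be34085a1"
note = (struct.pack("<III", 4, 20, NT_GNU_BUILD_ID)
        + b"GNU\0" + bytes.fromhex(build_id))
print(hex(len(note)), parse_build_id_note(note))
```

The total of 12 + 4 + 20 bytes matches the note length 0x24 reported by readelf.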


Section 2 showed files of the form .build-id/nn/nnnn...nnnn.debug. The directory name nn is the first two hex digits of the Build ID, and nnnn...nnnn is the rest of the Build ID. Such a file is simply the debug file stored under a different name.
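The naming rule can be written down in a few lines. This is a minimal sketch, assuming the /usr/lib/debug directory seen in Section 2 as the root; the function name is invented for illustration.

```python
import posixpath


def build_id_debug_path(build_id, debug_dir="/usr/lib/debug"):
    """Map a Build ID hex string to its .build-id debug file path."""
    build_id = build_id.lower()
    # The first two hex digits name the subdirectory, the rest the file.
    return posixpath.join(
        debug_dir, ".build-id", build_id[:2], build_id[2:] + ".debug"
    )


print(build_id_debug_path("888010ffb999590e7158422ea813169be34085a1"))
```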


gdb looks for the debug file of a.out in the following order:

<global debug directory>/.build-id/nn/nnnn...nnnn.debug

<the path of a.out>/a.out.debug

<the path of a.out>/.debug/a.out.debug

<global debug directory>/<the path of a.out>/a.out.debug


<global debug directory> defaults to /usr/lib/debug/. This value can be viewed or changed with the show/set debug-file-directory commands in gdb:

[yunkai@fedora t]$ gdb ./a.out

...

(gdb) show debug-file-directory

The directory where separate debug symbols are searched for is "/usr/lib/debug".
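The search order can be sketched as a function that lists the candidate locations for a given binary. This is only an illustration of the order described in this article, not gdb's actual implementation; the function and parameter names are hypothetical, and the binary path is assumed to be absolute.

```python
import posixpath


def candidate_debug_paths(binary, build_id, debuglink,
                          debug_dir="/usr/lib/debug"):
    """List debug-file locations in the order described above."""
    # Assumption: `binary` is an absolute path such as /home/yunkai/t/a.out.
    bin_dir = posixpath.dirname(binary)
    return [
        # 1. Build ID lookup under the global debug directory.
        posixpath.join(debug_dir, ".build-id",
                       build_id[:2], build_id[2:] + ".debug"),
        # 2. Next to the binary, using the name from .gnu_debuglink.
        posixpath.join(bin_dir, debuglink),
        # 3. In a .debug subdirectory next to the binary.
        posixpath.join(bin_dir, ".debug", debuglink),
        # 4. Under the global debug directory, mirroring the binary's path.
        posixpath.join(debug_dir, bin_dir.lstrip("/"), debuglink),
    ]


for path in candidate_debug_paths(
        "/home/yunkai/t/a.out",
        "888010ffb999590e7158422ea813169be34085a1",
        "a.out.debug"):
    print(path)
```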


 


Since the Build ID of a.out is 888010ffb999590e7158422ea813169be34085a1, you can copy the a.out.debug file to /usr/lib/debug/.build-id/88/8010ffb999590e7158422ea813169be34085a1.debug:

[yunkai@fedora t]$ sudo cp a.out.debug \

/usr/lib/debug/.build-id/88/8010ffb999590e7158422ea813169be34085a1.debug

[yunkai@fedora t]$ gdb ./a.out

...

Reading symbols from /home/yunkai/t/a.out...Reading symbols from /usr/lib/debug/.build-id/88/8010ffb999590e7158422ea813169be34085a1.debug...done.

done.


This shows that gdb looks first for the debug information under /usr/lib/debug/.build-id/.
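For repeated use, the three objcopy steps from this section can be collected into a small script. The sketch below only wraps the commands already shown; the function names are made up, and the debug file is assumed to be named after the binary with a .debug suffix, as in the a.out example.

```python
import subprocess


def split_debuginfo_commands(binary, debug_file=None):
    """Return the three objcopy invocations from this section as argv lists."""
    # Assumption: the debug file is "<binary>.debug" unless given explicitly.
    debug_file = debug_file or binary + ".debug"
    return [
        # 1. Copy only the debugging sections into the separate file.
        ["objcopy", "--only-keep-debug", binary, debug_file],
        # 2. Remove the debugging sections from the original binary.
        ["objcopy", "--strip-debug", binary],
        # 3. Record the debug file's name and CRC in .gnu_debuglink.
        ["objcopy", "--add-gnu-debuglink=" + debug_file, binary],
    ]


def split_debuginfo(binary):
    """Run the three steps in order, stopping at the first failure."""
    for argv in split_debuginfo_commands(binary):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for argv in split_debuginfo_commands("a.out"):
        print(" ".join(argv))
```

The order matters: the debug file must exist before --add-gnu-debuglink runs, because objcopy computes the checksum from that file.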


4. What's in a.out.debug?
The gcc used here (4.8.2) saves debugging information in the DWARF 4 format by default; newer releases may default to a later DWARF version. The DWARF contents can be viewed with readelf -w:

[yunkai@fedora t]$ readelf -w ./a.out.debug

...

Contents of the .debug_info section:


  Compilation Unit @ offset 0x0:

   Length:        0x8d (32-bit)

   Version:       4

   Abbrev Offset: 0x0

   Pointer Size:  8

 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)

    <c>   DW_AT_producer    : (indirect string, offset: 0x6a): GNU C 4.8.2 20131212 (Red Hat 4.8.2-7) -mtune=generic -march=x86-64 -g

    <10>   DW_AT_language    : 1        (ANSI C)

    <11>   DW_AT_name        : (indirect string, offset: 0x2f): main.c

    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x5b): /home/yunkai/t

...
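The four header fields at the top of the compilation unit correspond to a fixed 11-byte layout in DWARF versions 2 to 4. The sketch below packs the values from the readelf output and decodes them again. It assumes little-endian data and the 32-bit DWARF format; DWARF 5 orders the header differently, so this does not apply there. The function name is hypothetical.

```python
import struct

# unit_length (4), version (2), debug_abbrev_offset (4), address_size (1)
CU_HEADER = "<IHIB"


def parse_cu_header(data):
    """Decode a 32-bit DWARF 2-4 compilation unit header."""
    # Assumption: little-endian, 32-bit DWARF format, version <= 4.
    length, version, abbrev_offset, pointer_size = struct.unpack_from(
        CU_HEADER, data, 0
    )
    return {
        "length": length,
        "version": version,
        "abbrev_offset": abbrev_offset,
        "pointer_size": pointer_size,
        # The first DIE starts right after the header.
        "first_die_offset": struct.calcsize(CU_HEADER),
    }


# Header bytes matching the readelf output above.
header = struct.pack(CU_HEADER, 0x8D, 4, 0x0, 8)
print(parse_cu_header(header))
```

The header size of 11 bytes (0xb) agrees with the offset <b> at which readelf shows the first DIE.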


DWARF organizes its data as a tree of DIEs (Debugging Information Entries). It was designed with support for many languages in mind. Although it usually accompanies ELF files, it does not actually depend on ELF.


Because its design is flexible, it is not limited to C/C++ and can also express debugging information for many other languages, such as Java and Python.


DWARF data usually includes the line number table that maps machine code to source code, macro information, inline function information, call frame information, and so on.


Ordinary users usually do not need to know many details of DWARF. If you are curious, reference 5 in Section 6 is recommended reading.


5. Generate Marker probes in the code
With gcc's -g option, debuginfo is generated automatically for every function, and systemtap can use it to place probes. This approach is called debuginfo-based instrumentation. Its limitation is that context information can only be collected at the moment a function is entered and the moment it returns.


To solve this problem, another approach was proposed: compiled-in instrumentation, which lets programmers insert a probe at a chosen line of code and collect context information when that line is executed. This type of probe is called a Marker probe.


To write a Marker probe, you need to include the header file in the code:

#include <sys/sdt.h>


Then on the target line, insert one of the following Marker macros:

DTRACE_PROBE(provider, name)

DTRACE_PROBE4(provider, name, arg1, arg2, arg3, arg4)


After writing the Marker probe and successfully compiling it, you can use the following systemtap command to check whether the Marker probe takes effect:

stap -L 'process("/path/to/a.out").mark("*")'


For more detailed instructions, see reference 6 in Section 6. It is worth mentioning that a Marker probe is very lightweight and has almost no impact on the program's performance, because it only generates a nop instruction in the code. The probe's context information is stored in dedicated ELF sections (the .note.stapsdt note section, with .stapsdt.base as a helper section) rather than in executed code, so the cost is only a slight increase in file size.

6. References
[1] http://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html

[2] http://sourceware.org/binutils/docs-2.17/binutils/objcopy.html

[3] https://blogs.oracle.com/dbx/entry/gnu_debuglink_or_debugging_system

[4] https://blogs.oracle.com/dbx/entry/creating_separate_debug_info

[5] http://dwarfstd.org/doc/DWARF4.pdf

[6] https://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps
————————————————
Copyright Notice: This article is the original article of CSDN blogger "Chinainvent", following CC 4.0 BY-SA Copyright agreement, please attach the original source link and this statement for reprinting.
Original link: https://blog.csdn.net/chinainvent/article/details/24129311

Origin blog.csdn.net/ayang1986/article/details/121522938