9.6. Sections and the Section Header Table

The section header table contains information about every part of an ELF file except the ELF header, the program header table, and the section header table itself. The section header table is a list (or array) of section header structures, each defining a different section in the ELF file.

The following is the structure of a section header table entry for a 32-bit ELF file. Refer to the /usr/include/elf.h file for the 64-bit version.

typedef struct {

  Elf32_Word    sh_name;

  Elf32_Word    sh_type;

  Elf32_Word    sh_flags;

  Elf32_Addr    sh_addr;

  Elf32_Off     sh_offset;

  Elf32_Word    sh_size;

  Elf32_Word    sh_link;

  Elf32_Word    sh_info;

  Elf32_Word    sh_addralign;

  Elf32_Word    sh_entsize;

} Elf32_Shdr;

 

sh_name

The numeric offset into the string table for the section name.

sh_type

The type of section.

sh_flags

A bit mask of miscellaneous attributes.

sh_addr

The memory address at which this section should reside in a process. The value will be zero if the section will not appear in a process memory image.

sh_offset

Contains the file offset of the actual section data. If the section type is SHT_NOBITS, the section occupies no space in the file; sh_offset is still set and gives the conceptual position the section would have in the file.

sh_size

The size of the actual section data.

sh_link

The meaning of this field depends on the section type.

sh_info

Contains extra information that depends on the section type.

sh_addralign

The alignment requirements for the section.

sh_entsize

The size of each fixed-size element for sections that use them; a symbol table is one example. The value is zero for sections that do not use fixed-size elements.
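
As a minimal sketch of the layout described above (assuming a 32-bit little-endian file; the names SHDR_FORMAT, Elf32Shdr, and field_offsets are illustrative, not part of any library), the ten fields can be modeled with Python's struct module. Every field is a 4-byte word, which is where the 40-byte entry size comes from.

```python
import struct
from collections import namedtuple

# Ten little-endian 32-bit words, in the order declared in Elf32_Shdr.
SHDR_FORMAT = struct.Struct("<10I")

# Illustrative container; the field names mirror the C structure.
Elf32Shdr = namedtuple(
    "Elf32Shdr",
    "sh_name sh_type sh_flags sh_addr sh_offset "
    "sh_size sh_link sh_info sh_addralign sh_entsize",
)

def field_offsets():
    """Return {field name: byte offset inside one section header}."""
    return {name: 4 * i for i, name in enumerate(Elf32Shdr._fields)}

print(SHDR_FORMAT.size)                 # size of one table entry in bytes
print(field_offsets()["sh_offset"])     # where sh_offset starts in an entry
```

Knowing the per-field offsets makes it possible to pick single values out of a raw hex dump without decoding the whole entry.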

 

The ELF header contains the file offset of the section header table (e_shoff), the size of an entry in the table (e_shentsize), and the number of entries in the table (e_shnum). This is everything needed to find and sift through the contents of the section header table:

penguin> readelf -h foo |egrep section

  Start of section headers:           9292 (bytes into file)

  Size of section headers:            40 (bytes)

  Number of section headers:          35
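
The same three values can be read directly from the first 52 bytes of a file. The following is a sketch under the assumption of a 32-bit little-endian ELF header; section_table_info is a hypothetical helper name, and the demo header is synthetic (only its section-table fields match the readelf output for foo).

```python
import struct

# Elf32_Ehdr layout: 16-byte e_ident followed by 13 fixed-width fields (52 bytes).
EHDR = struct.Struct("<16sHHIIIIIHHHHHH")

def section_table_info(header):
    """Return e_shoff, e_shentsize, e_shnum and e_shstrndx as a dict."""
    fields = EHDR.unpack_from(header)
    if fields[0][:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    return {
        "e_shoff": fields[6],
        "e_shentsize": fields[11],
        "e_shnum": fields[12],
        "e_shstrndx": fields[13],
    }

# Synthetic header for illustration only: the section-table values match the
# readelf output for foo, every other field is a placeholder.
ident = b"\x7fELF\x01\x01\x01" + bytes(9)
demo = EHDR.pack(ident, 2, 3, 1, 0, 52, 9292, 0, 52, 32, 0, 40, 35, 32)
print(section_table_info(demo))
```

On a real file, the input would be the first 52 bytes read in binary mode, for example open("foo", "rb").read(52).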

 

Let’s take a look at the details behind the scenes again in the same way we did for the ELF header. This is useful to understand how the section header table and ELF string tables work.

According to the ELF header for the executable foo, the section header table offset is 9292 bytes, the size of a section header is 40 bytes, and there are 35 section headers. A hex dump of the file at the offset for the section header table shows (note that the “*” in the output denotes an identical row to the previous):

penguin> hexdump -C -s 9292 -n 160 foo

0000244c   00  00 00  00  00 00  00  00   00 00 00 00 00 00 00 00 |................|

*

0000246c   00  00 00  00  00 00  00  00   1b 00 00 00 01 00 00 00 |................|

0000247c   02  00 00  00  14 81  04  08   14 01 00 00 13 00 00 00 |................|

 

The first section header (at file offset 0x244c for foo) always has a NULL type and can be ignored. Because the 32-bit section header structure is 40 (0x28) bytes, the next section header starts 40 bytes after the first, at 0x2474:

penguin> hexdump -C -s 0x2474 -n 40 foo

00002474   1b  00  00 00  01 00  00  00   02 00 00 00 14 81 04 08 |................|

00002484   14  01  00 00  13 00  00  00   00 00 00 00 00 00 00 00 |................|

00002494  01 00 00 00 00 00 00 00                      |........|

0000249c

 

We can get the values of the section header structure by mapping it onto the raw data at 0x2474:

sh_name:  0x1b    (section name is at offset 0x1b in string table)

sh_type:  0x1     (SHT_PROGBITS)

sh_flags: 0x2     (SHF_ALLOC)

sh_addr:  0x08048114 (virtual address)

sh_offset:        0x114   (file offset)

sh_size:  0x13    (total size in bytes)

sh_link:  0x0

sh_info:  0x0

sh_addralign:     0x1    (aligned on a 1-byte boundary, that is, no alignment constraint)

sh_entsize:       0x0    (does not use fixed size elements)
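
The mapping above can be reproduced mechanically. This sketch copies the 40 bytes from the hexdump at 0x2474 and unpacks them as ten little-endian words; the variable names are illustrative.

```python
import struct

# The 40 bytes shown by "hexdump -C -s 0x2474 -n 40 foo".
raw = bytes.fromhex(
    "1b000000 01000000 02000000 14810408"
    "14010000 13000000 00000000 00000000"
    "01000000 00000000"
)

names = (
    "sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
    "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize",
)

# Little-endian: the bytes 14 81 04 08 become the value 0x08048114.
header = dict(zip(names, struct.unpack("<10I", raw)))

for name in names:
    print(f"{name:13s} {header[name]:#x}")
```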

 

The first field is the offset into the string table for the section name. To get the section name, we first have to find the string table and then look at offset 0x1b in it. There should be a NULL terminated string at offset 0x1b in the string table that is the name of this first section. According to the ELF header, the section header table index for the section header string table is 32:

penguin> readelf -h foo | tail -2

  Number of section headers:         35

  Section header string table index: 32

 

To find the string table, we need to use the size of an element in the section header table (40 bytes), multiply it by 32 (the section header table index of the string table), and add the result to 0x244c, which is the file offset of the start of the section header table.

32 x 40 = 1280 = 0x500

0x500 + 0x244C = 0x294C
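
The arithmetic above generalizes to any entry in the table. A small sketch (section_header_offset is a hypothetical helper name):

```python
def section_header_offset(e_shoff, e_shentsize, index):
    """File offset of section header number `index`."""
    return e_shoff + index * e_shentsize

# Values for foo taken from the readelf -h output.
print(hex(section_header_offset(0x244c, 40, 1)))    # first useful header
print(hex(section_header_offset(0x244c, 40, 32)))   # string table header
```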

 

This offset in the file is only the entry in the section header table for the string table and is not the string table itself.

penguin> hexdump -C -s 0x294C -n 40 foo

0000294c   11  00  00 00  03 00  00  00   00 00 00 00 00 00 00 00 |................|

0000295c   13  23  00 00  39 01  00  00   00 00 00 00 00 00 00 00 |.#..9...........|

0000296c  01 00 00 00 00 00 00 00                      |........|

00002974

 

Mapping the section header structure onto this raw data (sh_offset is at offset 16, and sh_size is at offset 20 in the section header structure) shows us that the file offset of the string table section is 0x2313 (as shown at offset 0x295C), and the size is 0x139 (directly after the sh_offset field). Remember that this platform is little endian, so the bytes 13 23 00 00 in the hexdump represent the value 0x00002313. A hex dump of this file offset shows:

penguin> hexdump -C -s 0x2313 foo |head

00002313   00  2e  73 79  6d 74  61 62  00  2e  73  74  72  74  61  62 |..symtab..strtab|

00002323   00  2e  73 68  73 74  72 74  61  62  00  2e  69  6e  74  65 |..shstrtab..inte|

00002333  72 70 00 2e 6e 6f 74 65 2e 41 42 49 2d 74 61 67 |rp..note.ABI-tag|

00002343    00  2e  68  61  73  68  00 2e   64  79  6e  73 79 6d 00  2e |..hash..dynsym..|

 

Now we need to add the offset (0x1b) of the section name in the string table to the offset of the string table itself (0x2313) to get 0x232e. The section name at this offset can be found with yet another hexdump:

penguin> hexdump -C -s 0x232e foo | head -2

0000232e   2e  69  6e 74  65 72  70  00   2e 6e 6f 74 65 2e 41 42 |.interp..note.AB|

0000233e  49 2d  74  61 67 00 2e 68  61 73 68 00 2e 64 79 6e  |I-tag..hash..dyn|

 

This offset contains the string .interp, which is the name of the first useful section. Whew! This is a lot of work, but again, the point is to show that there is no magic in an ELF file format.
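
The last steps of the walk can be condensed into a short sketch. It uses only bytes copied from the hexdumps above (the string table's section header at 0x294c and the first 64 bytes of the table at 0x2313); read_name is a hypothetical helper, not a library function.

```python
import struct

# Section header of the string table, copied from the hexdump at 0x294c.
shstrtab_hdr = bytes.fromhex(
    "11000000 03000000 00000000 00000000"
    "13230000 39010000 00000000 00000000"
    "01000000 00000000"
)
# sh_offset sits 16 bytes into the header; sh_size follows at byte 20.
sh_offset, sh_size = struct.unpack_from("<II", shstrtab_hdr, 16)

# First 64 bytes of the string table, copied from the hexdump at 0x2313.
table = b"\0.symtab\0.strtab\0.shstrtab\0.interp\0.note.ABI-tag\0.hash\0.dynsym\0."

def read_name(strtab, index):
    """Return the NULL terminated string that starts at `index`."""
    end = strtab.index(b"\0", index)
    return strtab[index:end].decode("ascii")

name_file_offset = sh_offset + 0x1b   # sh_name of the first useful section
print(hex(sh_offset), hex(sh_size), hex(name_file_offset))
print(read_name(table, 0x1b))
```

The string table's own sh_name is 0x11, and looking that index up in the same bytes yields .shstrtab, which is a handy consistency check.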

After all of this work, we know that the first useful section header (the first actual section header is for a NULL section) describes .interp, the section that holds the path name of the program interpreter. Other sections can include global variables, the machine instructions, and many other types of data. The contents of a section depend entirely on its type and purpose.

Of course, there is a much easier way to view the section headers using readelf (don’t discount the importance of understanding how this really works, though).

penguin> readelf -S foo | head

There are 35 section headers, starting at offset 0x244c:

 

Section Headers:

  [Nr] Name          Type   Addr     Off    Size   ES Flg Lk Inf Al

  [ 0]               NULL   00000000 000000 000000 00  0  0   0

  [ 1] .interp       PROGBITS 08048114 000114 000013 00 A  0   0  1

  [ 2] .note.ABI-tag NOTE   08048128 000128 000020 00   A  0   0  4

  [ 3] .hash         HASH   08048148 000148 000094 04   A  4   0  4

  [ 4] .dynsym       DYNSYM 080481dc 0001dc 000120 10   A  1  4

  [ 5] .dynstr       STRTAB 080482fc 0002fc 00011a 00   A  0  0  1

<...>

 

The ELF specification contains a full list of section types. Only the most common and important ones are covered in detail next. We’ll start with two common section formats, the symbol table and the string table, because several of the sections described later use one of these two formats.

9.6.1. String Table Format

A string table holds the strings that the structures in an ELF file refer to, such as section names and symbol names. The format is very simple: it is a range of bytes that contains a list of NULL terminated strings, one after the other. A string is identified by its byte offset from the beginning of the string table (not from the beginning of the file), as the sh_name lookup at offset 0x1b showed earlier.

There can be a number of string tables in an ELF file, including one for the dynamic symbol table, one for the main symbol table, and one for the section header names. String tables all have the same simple format:

<string1>\0<string2>\0<string3>\0...<stringN>\0\0

 

An index into a string table will point to the start of a string.
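
To make the indexing concrete, here is a small sketch that lists every string in a table together with the index that refers to it. The helper name iter_strings is hypothetical, and the sample bytes are the start of the section header string table from the earlier hexdump, which begins with a null byte so that index 0 yields an empty string.

```python
def iter_strings(table):
    """Yield (index, text) for each string in an ELF string table."""
    index = 0
    while index < len(table):
        end = table.find(b"\0", index)
        if end == -1:          # unterminated tail: stop rather than guess
            break
        yield index, table[index:end].decode("ascii")
        index = end + 1

sample = b"\0.symtab\0.strtab\0.shstrtab\0"
for index, text in iter_strings(sample):
    print(f"{index:#04x} {text!r}")
```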

9.6.2. Symbol Table Format

A symbol table is an array of ELF symbol structures that describe a function, variable, or other type of symbol. As discussed at the beginning of this chapter, a symbol table is like a phone book for functions and variables in an ELF file.

There are actually two symbol tables for an ELF file. One is called the dynamic symbol table and is used at run time to find the various symbols in the ELF object. The other is the main symbol table and contains all of the symbols for an ELF object, including static symbol information that is not used at run time. The main symbol table is used at link time to find all of the unresolved symbols.

Each element of the array has the following structure (from /usr/include/elf.h):

typedef struct

{

  Elf32_Word    st_name;     /* Symbol name (string tbl index) */

  Elf32_Addr    st_value;    /* Symbol value */

  Elf32_Word    st_size;     /* Symbol size */

  unsigned char st_info;     /* Symbol type and binding */

  unsigned char st_other;    /* Symbol visibility */

  Elf32_Section st_shndx;    /* Section index */

} Elf32_Sym;

 

The st_name field is the string table index for the name of the symbol (note that the dynamic and main symbol tables each have their own string table). The st_value field is either the address of the symbol once it is loaded or the offset within the section that contains the symbol. Executables have values that are actual addresses, whereas shared libraries have values that are relative to wherever the library is eventually loaded. The difference arises because shared libraries do not have specific load addresses for their memory segments, while traditional (non-position-independent) executables on 32-bit x86 Linux are linked to load at address 0x08048000. The st_size field is the size of the item described by the symbol entry, which could be a variable, a function, or something else. The st_info field describes the type of binding (local, global, and so on) and the type of symbol (variable, function, and so on). The st_other field contains information about the visibility of a symbol. The st_shndx field identifies the section that contains the item described by the symbol entry.

The meaning of the st_value field depends on the ELF type. For a relocatable object, the value is the offset within the section specified by the section index, st_shndx. For shared libraries and executables, the value in the symbol table structure is a virtual address. This additional complexity is to ensure efficient access by the tools and code that use these values.
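
As a sketch of how one 16-byte Elf32_Sym entry could be decoded (assuming little-endian data; decode_symbol and the two lookup tables are illustrative names), note that st_info packs two values: the binding in the upper four bits and the type in the lower four. The demo entry is synthetic; its st_name of 1 is a placeholder.

```python
import struct

# Elf32_Sym: st_name, st_value, st_size (words), st_info, st_other (bytes),
# st_shndx (half word). 16 bytes in total.
SYM = struct.Struct("<IIIBBH")

BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}

def decode_symbol(raw):
    """Split one raw symbol table entry into named fields."""
    st_name, st_value, st_size, st_info, st_other, st_shndx = SYM.unpack(raw)
    return {
        "st_name": st_name,
        "st_value": st_value,
        "st_size": st_size,
        "bind": BINDINGS.get(st_info >> 4, "OTHER"),
        "type": TYPES.get(st_info & 0xF, "OTHER"),
        "visibility": st_other & 0x3,
        "st_shndx": st_shndx,
    }

# Synthetic entry: a global 40-byte object at offset 0x20 of section 3.
demo = SYM.pack(1, 0x20, 0x28, (1 << 4) | 1, 0, 3)
print(decode_symbol(demo))
```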

Given what we’ve covered about symbols, let’s see if we can find the global variable “list” in the object file foo.o. The variable “list” in foo.C is defined as follows:

int list[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } ;

 

For the object file foo.o, the value in the symbol table for the global variable “list” is 0x20, which is the offset within the .data section for this global variable as shown here:

penguin> nm -v -f s foo.o | egrep list

list             |00000020|   D |          OBJECT|00000028|   |.data

 

We can confirm this by adding the file offset of the .data section to the value of the symbol “list.” We’ll need to use readelf to get the offset of the .data section:

penguin> readelf -S foo.o | egrep "\.data "

[ 3] .data            PROGBITS       00000000 000140 000048 00 WA 0  0 32

 

Given the file offset of the .data section shown in the output above, the global variable “list” should be at file offset 0x160 (0x140 + 0x20). We can use hexdump to confirm that the values for “list” are indeed at this offset in the file:

penguin> hexdump -C -s 0x160 foo.o | head -4

00000160  00 00 00 00 01 00 00 00  02 00 00 00 03 00 00 00 |................|

00000170  04 00 00 00 05 00 00 00  06 00 00 00 07 00 00 00 |................|

00000180  08 00 00 00 09 00 00 00  00 00 00 00 00 00 00 00 |................|

00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
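
The check above can be sketched in a few lines. For a relocatable object, the file offset of a symbol's data is the section's file offset plus st_value; the 40 bytes below are copied from the hexdump and decoded as ten little-endian 32-bit integers (symbol_file_offset is a hypothetical helper name).

```python
import struct

def symbol_file_offset(section_file_offset, st_value):
    """File offset of a symbol in a relocatable object (.o) file."""
    return section_file_offset + st_value

# .data starts at file offset 0x140 in foo.o; "list" has the value 0x20.
offset = symbol_file_offset(0x140, 0x20)

# The 0x28 bytes that hexdump shows starting at that offset.
raw = bytes.fromhex(
    "00000000 01000000 02000000 03000000"
    "04000000 05000000 06000000 07000000"
    "08000000 09000000"
)
values = list(struct.unpack("<10i", raw))

print(hex(offset), values)
```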

 

For a shared library, the meaning of the “value” field in a symbol entry is a virtual address. In the case of libfoo.so, the value (virtual address) of the “list” variable is 0x1b40:

penguin> nm -v -f s libfoo.so | egrep list

list  00001b40|   D  |            OBJECT|00000028|   |.data

 

This isn’t the real virtual address when the shared library is loaded into memory because shared libraries can be loaded anywhere. For shared library files, the virtual address of the text segment starts at 0x0, and the virtual address of the data segment (which contains the .data section) is set to some offset from the beginning of the text segment. Looking at the .data section for libfoo.so reveals that its file offset is 0xb00 and the virtual address is 0x1b00:

penguin> readelf -S libfoo.so | egrep "\.data "

[14] .data  PROGBITS  00001b00 000b00 000070 00  WA  0   0 32

 

Thus we can subtract 0x1000 (0x1b00 - 0xb00) from any value listed by nm for libfoo.so to get the file offset of a symbol. This means that the file offset of “list” in libfoo.so is 0xb40, as confirmed here:

penguin> hexdump -C -s 0xb40 libfoo.so | head -4

00000b40  00 00 00 00 01 00 00 00  02 00 00 00 03 00 00 00 |................|

00000b50  04 00 00 00 05 00 00 00  06 00 00 00 07 00 00 00 |................|

00000b60  08 00 00 00 09 00 00 00  80 0a 00 00 00 00 00 00 |................|

00000b70  18 00 00 00 00 00 00 00  01 7a 50 52 00 01 7c 08 |.........zPR..|.|
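
For a shared library, the conversion goes through the section that contains the address. A minimal sketch, with a hypothetical helper name and the .data values for libfoo.so taken from the readelf output:

```python
def vaddr_to_file_offset(vaddr, sec_addr, sec_offset, sec_size):
    """Translate a link-time virtual address into a file offset.

    sec_addr, sec_offset and sec_size are the Addr, Off and Size columns
    of the section that contains the address.
    """
    if not sec_addr <= vaddr < sec_addr + sec_size:
        raise ValueError("address is outside the given section")
    return vaddr - sec_addr + sec_offset

# "list" has the value 0x1b40; .data is at address 0x1b00, file offset 0xb00.
print(hex(vaddr_to_file_offset(0x1b40, 0x1b00, 0xb00, 0x70)))
```

The constant 0x1000 from the text is simply sec_addr minus sec_offset for this particular section; other sections in the same file can have a different delta, which is why the sketch takes the section values as parameters.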

 

The binding of a symbol has an interesting feature worth mentioning. As mentioned earlier in this chapter, symbols can be global or local depending on their scope. Symbols can also be “weak,” which is similar to global—although a symbol with a global binding will be chosen over a weak symbol of the same name. This can actually be very useful for problem determination efforts because some system functions are declared “weak,” meaning that you can override them if needed. This is covered in more detail with an example in the section titled “Use of Weak Symbols for Problem Determination” later in this chapter.

9.6.3. Section Names and Types

The casual term “section type” has two different meanings in normal technical conversation. One refers to the section type as defined in the sh_type field of the section header structure. The other refers to the combination of name and type of a section. For example, a section can have a (sh_type) of PROGBITS, but that does not describe what is in the section. On the other hand, someone might ask “what type of section,” and the response is usually the section name, such as “.rodata” or “.text.”

The sections included in the shared library libfoo.so are listed here:

penguin> readelf -S libfoo.so

There are 30 section headers, starting at offset 0x1090:

Section Headers:

  [Nr] Name          Type   Addr     Off    Size   ES Flg Lk Inf Al

  [ 0]               NULL  00000000 000000 000000 00      0   0  0

  [ 1] .hash         HASH  000000b4 0000b4 000158 04   A  2   0  4

  [ 2] .dynsym       DYNSYM 0000020c 00020c 0002f0 10  A  3  1b  4

  [ 3] .dynstr       STRTAB 000004fc 0004fc 000133 00  A  0   0  1

  [ 4] .gnu.version  VERSYM 00000630 000630 00005e 02  A  2   0  2

  [ 5] .gnu.version_r VERNEED 00000690 000690 000050 00 A  3  2  4

  [ 6] .rel.dyn      REL   000006e0 0006e0 000060 08   A  2   0  4

  [ 7] .rel.plt      REL   00000740 000740 000028 08   A  2   9  4

  [ 8] .init         PROGBITS 00000768 000768 000018 00 AX 0  0  4

  [ 9] .plt          PROGBITS 00000780 000780 000060 04  AX 0 0  4

  [10] .text         PROGBITS 000007e0 0007e0 000280 00  AX 0 0 16

  [11] .fini         PROGBITS 00000a60 000a60 00001c 00  AX 0 0  4

  [12] .rodata       PROGBITS 00000a80 000a80 00004c 00  A  0 0 32

  [13] .eh_frame_hdr PROGBITS 00000acc 000acc 00002c 00 A  0  0  4

  [14] .data         PROGBITS 00001b00 000b00 000070 00 WA 0  0 32

  [15] .eh_frame     PROGBITS 00001b70 000b70 0000d8 00 WA 0  0  4

  [16] .dynamic      DYNAMIC 00001c48 000c48 0000d8 08  WA 3  0  4

  [17] .ctors        PROGBITS 00001d20 000d20 00000c 00 WA 0  0  4

  [18] .dtors        PROGBITS 00001d2c 000d2c 000008 00 WA 0  0  4

  [19] .jcr          PROGBITS 00001d34 000d34 000004 00 WA 0  0  4

  [20] .got          PROGBITS 00001d38 000d38 00003c 04 WA 0  0  4

  [21] .bss          NOBITS 00001d74 000d74 000010 00   WA 0  0  4

  [22] .comment      PROGBITS 00000000 000d74 000050 00   0   0  1

  [23] .debug_aranges PROGBITS 00000000 000dc8 000058 00  0   0  8

  [24] .debug_info   PROGBITS 00000000 000e20 000098 00   0   0  1

  [25] .debug_abbrev PROGBITS 00000000 000eb8 00001c 00   0   0  1

  [26] .debug_line   PROGBITS 00000000 000ed4 0000bf 00   0   0  1

  [27] .shstrtab     STRTAB 00000000 000f93 0000fb 00     0   0  1

  [28] .symtab       SYMTAB 00000000 001540 0004d0 10    29  39  4

  [29] .strtab       STRTAB 00000000 001a10 000275 00     0   0  1

Key to Flags:

 W (write), A (alloc), X (execute), M (merge), S (strings)

 I (info), L (link order), G (group), x (unknown)

 O (extra OS processing required) o (OS specific), p (processor specific)

 

The first section is always NULL; the other 29 sections each have their own purpose. The most interesting ones are described in more detail below.

Before listing the details of each section, please note that we will be discussing the source file foo.C as listed later in this chapter under “Source Files.” It includes a wide range of data types that will help to clarify the various sections.

Note: By convention, the standard section names in ELF start with a “.” prefix.

9.6.3.1. .bss

There is some debate about what the bss acronym actually stands for. The most likely origin is an assembler pseudo-operation on the IBM 704 that was later carried into the Fortran Assembly Program. The acronym most likely stands for “Block Started by Symbol” and was adopted to describe the uninitialized data for an ELF object. The acronym bss is pretty much meaningless, so consider it a term, not a useful acronym.

The .bss section is used for global and file local variables that are not initialized with a specific value. It is zeroed out as the process starts up, which sets the initial value of any variables in it to zero. For example, the global variable noValueGlobInt in foo.C is stored in the bss section because it has no initial value.

penguin> readelf -S libfoo.so | egrep \.bss

[21] .bss         NOBITS      00001d74 000d74 000010 00  WA  0   0  4

penguin> nm libfoo.so |egrep noValueGlobInt

00001d80 B noValueGlobInt

 

According to nm, the value of the noValueGlobInt variable is 0x1d80. This value is right inside the .bss section as expected. Also notice that the section type (sh_type) is NOBITS, which indicates that this section takes up no space in the ELF file. We can confirm this by looking at the loadable program headers for this library:

penguin> readelf -l libfoo.so

 

Elf file type is DYN (Shared object file)

Entry point 0x7e0

There are 4 program headers, starting at offset 52

Program Headers:

  Type         Offset  VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align

  LOAD       0x000000 0x00000000 0x00000000 0x00af8 0x00af8 R E 0x1000

  LOAD       0x000b00 0x00001b00 0x00001b00 0x00274 0x00284 RW  0x1000

<...>

 

The virtual address of the second load segment (the “data” segment) is 0x1b00, and the value listed by nm for noValueGlobInt is 0x1d80. Therefore, noValueGlobInt is 0x280 bytes (0x1d80 - 0x1b00) into the second load segment. The value of 0x280 is larger than FileSiz (0x274) but smaller than MemSiz (0x284). The FileSiz field is the size of the load segment in the file. The MemSiz field is the size of the load segment once it is loaded into memory. That confirms that the variable really does not take any space in the file, but it will (obviously) require space once loaded into memory.
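
The FileSiz and MemSiz comparison can be written as a short check. This is only a sketch; backed_by_file is a hypothetical name, and the segment values are the ones from the second LOAD line of the readelf -l output.

```python
def backed_by_file(vaddr, p_vaddr, p_filesz, p_memsz):
    """True if the address has bytes in the file, False if it only exists
    in memory (the zero-filled tail between FileSiz and MemSiz)."""
    delta = vaddr - p_vaddr
    if not 0 <= delta < p_memsz:
        raise ValueError("address is outside the segment")
    return delta < p_filesz

# Second LOAD segment of libfoo.so: VirtAddr 0x1b00, FileSiz 0x274, MemSiz 0x284.
print(backed_by_file(0x1d80, 0x1b00, 0x274, 0x284))   # noValueGlobInt (.bss)
print(backed_by_file(0x1b40, 0x1b00, 0x274, 0x284))   # list (.data)
```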

9.6.3.2. .data

This section contains initialized global and static, writable variables. Looking at the details for this section brings to light a few interesting things:

penguin> readelf -S libfoo.so | egrep '\.data'

[14] .data     PROGBITS       00001b00 000b00 000070 00 WA 0  0 32

 

First, note that the .data section has the type PROGBITS, which means that it does occupy space in the file. This is different from the .bss section, which has type NOBITS. Another thing worth noting is that the memory attributes, WA, are the same as those for the .bss section. “WA” means that the section is writable (W) and occupies memory while the program runs (A, for alloc).
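
The flag letters that readelf prints come from the sh_flags bit mask. A minimal sketch covering only the three most common bits (the constant values are the standard SHF_WRITE, SHF_ALLOC, and SHF_EXECINSTR definitions; flag_letters is a hypothetical helper):

```python
SHF_WRITE = 0x1       # W: section is writable at run time
SHF_ALLOC = 0x2       # A: section occupies memory in the process
SHF_EXECINSTR = 0x4   # X: section contains machine instructions

def flag_letters(sh_flags):
    """Render the W, A and X bits the way the readelf Flg column does."""
    letters = ""
    for bit, letter in ((SHF_WRITE, "W"), (SHF_ALLOC, "A"), (SHF_EXECINSTR, "X")):
        if sh_flags & bit:
            letters += letter
    return letters

print(flag_letters(SHF_WRITE | SHF_ALLOC))       # .data and .bss
print(flag_letters(SHF_ALLOC | SHF_EXECINSTR))   # .text
```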

The source file foo.C contains two variables, globInt and staticInt, that can illustrate how the .data section is used:

int globInt = 5 ;

static int staticInt = 5 ;

 

The variable globInt is a global, writable (non-constant) variable, and staticInt is a static writable variable. Both should be in the .data section, but let’s confirm that here:

penguin> nm -v -f s libfoo.so | egrep globInt

globInt           |00001b20|  D |          OBJECT|00000004|   |.data

penguin> nm -v -f s libfoo.so | egrep staticInt

staticInt         |00001b24|  d |          OBJECT|00000004|   |.data

 

From the section information before, the .data section had a value (“address”) of 0x1b00 and a size of 0x70. The two variables have values of 0x1b20 and 0x1b24 respectively, both of which are located in the range of the .data section and even have the .data section listed in the output.

9.6.3.3. .dynamic

This section stores information about dynamic linking. This includes information about which libraries are required for a program or executable, where to look for these libraries (rpath), and the important sections of the ELF object needed to run the program. The DYNAMIC segment contains the .dynamic section (and contains only this one section). The readelf tool can be used to display the contents of the dynamic section/segment:

penguin> readelf -d foo

 

Dynamic segment at offset 0x830 contains 25 entries:

  Tag        Type               Name/Value

 0x00000001 (NEEDED)            Shared library: [libfoo.so]

 0x00000001 (NEEDED)            Shared library: [libstdc++.so.5]

 0x00000001 (NEEDED)            Shared library: [libm.so.6]

 0x00000001 (NEEDED)            Shared library: [libgcc_s.so.1]

 0x00000001 (NEEDED)            Shared library: [libc.so.6]

 0x0000000f (RPATH)             Library rpath: [.]

 0x0000000c (INIT)              0x80484c8

 0x0000000d (FINI)              0x80486f0

 0x00000004 (HASH)              0x8048148

 0x00000005 (STRTAB)            0x8048310

 0x00000006 (SYMTAB)            0x80481e0

 0x0000000a (STRSZ)             282 (bytes)

 0x0000000b (SYMENT)            16 (bytes)

 0x00000015 (DEBUG)             0x0

 0x00000003 (PLTGOT)            0x8049938

 0x00000002 (PLTRELSZ)          48 (bytes)

 0x00000014 (PLTREL)            REL

 0x00000017 (JMPREL)            0x8048498

 0x00000011 (REL)               0x8048490

 0x00000012 (RELSZ)             8 (bytes)

 0x00000013 (RELENT)            8 (bytes)

 0x6ffffffe (VERNEED)           0x8048450

 0x6fffffff (VERNEEDNUM)        2

 0x6ffffff0 (VERSYM)            0x804842a

 0x00000000 (NULL)              0x0

 

The key sections listed in the dynamic section are explained in this chapter. The entries that have a type of NEEDED are for the libraries required by the executable foo. The RPATH defines a search path for the libraries. The rest of the information in the dynamic section is a convenient way to locate the information needed to run this executable, including the address of other important sections such as .init, .fini, and others.
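
Each line of the readelf -d output corresponds to one 8-byte entry: a signed 32-bit tag followed by a 32-bit value or address. The sketch below assumes the 32-bit little-endian layout and covers only a handful of tags; iter_dynamic and TAG_NAMES are illustrative names, and the demo bytes are packed from three entries of the output above rather than read from foo.

```python
import struct

# Elf32_Dyn: d_tag (signed word) followed by d_val or d_ptr (word).
DYN = struct.Struct("<iI")

# A few of the standard tag numbers that appear in the readelf output.
TAG_NAMES = {
    0: "NULL", 1: "NEEDED", 4: "HASH", 5: "STRTAB", 6: "SYMTAB",
    10: "STRSZ", 11: "SYMENT", 12: "INIT", 13: "FINI", 15: "RPATH",
}

def iter_dynamic(raw):
    """Yield (tag name, value) pairs up to and including the NULL entry."""
    for d_tag, d_val in DYN.iter_unpack(raw):
        yield TAG_NAMES.get(d_tag, hex(d_tag)), d_val
        if d_tag == 0:
            break

entries = [(12, 0x80484C8), (13, 0x80486F0), (5, 0x8048310), (0, 0)]
demo = b"".join(DYN.pack(tag, val) for tag, val in entries)
for name, value in iter_dynamic(demo):
    print(f"{name:8s} {value:#x}")
```

For NEEDED and RPATH entries the value is an index into the dynamic string table (located through STRTAB), which is how readelf is able to print library names instead of numbers.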

This section is mainly used by the program interpreter, covered in more detail later in this chapter.

9.6.3.4. .dynsym (symbol table)

The dynamic symbol table is the smaller of the two symbol tables. It contains only the global symbols, which are the ones required for program execution. The dynamic symbol table is required and cannot be stripped from an ELF object. The dynamic symbol table does not contain any static symbols because static symbol information is not needed at run time.

Static symbols are local to a file, and once a shared library or executable is linked, references to static symbols are resolved directly through offsets that are known at link time. Dynamic symbols might be satisfied outside of the shared library or executable, and thus finding these symbols must be done at run time.

Consider the two variables again: staticInt and globInt, defined as:

int globInt = 5 ;

static int staticInt = 5 ;

 

The globInt variable should be in the dynamic symbol table, whereas the staticInt variable should only be in the main symbol table:

penguin> nm  libfoo.so | egrep staticInt

00001b24 d staticInt

penguin> nm -D libfoo.so |egrep staticInt

penguin> nm -D libfoo.so | egrep globInt

00001b20 D globInt

 

As expected, the static variable is not found in the dynamic symbol table. The main symbol table, which contains all symbols, is described under the section heading .symtab.

9.6.3.5. .dynstr (string table)

This string table contains only the strings that are required for dynamic linking. The majority of the content will be the symbol names from the dynamic symbol table. The format is the standard ELF string table format.

9.6.3.6. .fini

The .fini section contains the machine instructions for the function _fini. Notice that the address listed by nm for _fini and the address of the .fini section match exactly:

penguin> nm foo | egrep _fini

080486e0 T _fini

penguin> readelf -S foo | egrep fini

  [13] .fini  PROGBITS       080486e0 0006e0 00001c 00  AX  0  0  4

 

The _fini function at 0x080486e0 calls the static (global) destructors for an executable or shared library. For an executable, _fini (and thus the destructors) is called when the program terminates. For shared libraries, the _fini function is called when a library is unloaded from memory. The _fini function has a counterpart function called _init that calls the global constructors.

0x080486e0 中的 _fini 函数为可执行文件或共享库调用静态 (全局) 析构函数。对于可执行文件, 当程序终止时, 将调用 _fini (从而调用析构函数)。对于共享库, 当从内存中卸载库时, 将调用 _fini 函数。_fini 函数有一个称为 _init 的对应函数, 它调用全局构造函数。

The .init section is covered in more detail later in this chapter.

9.6.3.7. .got (Global Offset Table)

The Global Offset Table, known as GOT, is required for position-independent code. Position-independent code is compiled in such a way that it can be loaded and run from any address. This isn’t as easy as it sounds. Code that needs to call a function has no idea where in the address space this function will be. The GOT solves this problem by providing a level of indirection between the code in an executable or shared library and the required functions and variables that may be in other shared libraries. Let’s look at the GOT in more detail:

全局偏移表(称为GOT)是与位置无关的代码所必需的。与位置无关的代码以这样的方式编译,即可以从任何地址加载和运行它。这并不像听起来那么容易。需要调用函数的代码不知道这个函数在地址空间中的位置。GOT通过在可执行文件或共享库中的代码与可能在其他共享库中的所需函数和变量之间提供间接级别来解决此问题。让我们更详细地看一下GOT:

penguin> readelf -S libfoo.so | egrep .got

[20] .got      PROGBITS       00001d38 000d38 00003c 04  WA  0  0  4

 

From the output, we know that the .got is of type PROGBITS, meaning that it does consume space in the ELF file itself (unlike the .bss section). The output also indicates that the file offset of the .got is 0xd38 and that it has a size of 0x3c, or 60 decimal. Let’s look at the raw contents:


penguin> hexdump -s 0xd38 -C -n 60 libfoo.so

00000d38  48 1c 00 00 00 00 00 00  00 00 00 00 96 07 00 00  |H...............|
00000d48  a6 07 00 00 b6 07 00 00  c6 07 00 00 d6 07 00 00  |................|
00000d58  00 00 00 00 00 1b 00 00  00 00 00 00 00 00 00 00  |................|
00000d68  00 00 00 00 00 00 00 00  00 00 00 00              |............|

 

The GOT is an array of values that is stored in private memory. According to readelf, the size of this global offset table is 0x3c (60) bytes, so it holds 15 four-byte values:

Value number   GOT Address (File Offset)   Value
0              0x1d38 (0xd38)              0x1c48
1              0x1d3c (0xd3c)              0x0
2              0x1d40 (0xd40)              0x0
3              0x1d44 (0xd44)              0x796
4              0x1d48 (0xd48)              0x7a6
5              0x1d4c (0xd4c)              0x7b6
6              0x1d50 (0xd50)              0x7c6
7              0x1d54 (0xd54)              0x7d6
8              0x1d58 (0xd58)              0x0
9              0x1d5c (0xd5c)              0x1b00
10             0x1d60 (0xd60)              0x0
11             0x1d64 (0xd64)              0x0
12             0x1d68 (0xd68)              0x0
13             0x1d6c (0xd6c)              0x0
14             0x1d70 (0xd70)              0x0
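The values in the table can be reproduced from the hexdump by reading the section as consecutive 32-bit little-endian words (i386 stores the least significant byte first). The following is a minimal sketch of that decoding; the helper names read_le32 and got_entry are made up for this illustration and are not part of any ELF library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 32-bit little-endian word, as stored in an i386 ELF file. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Return GOT entry number `index` from the raw section bytes.
 * `size` is the section size in bytes (sh_size); each entry is 4 bytes. */
uint32_t got_entry(const unsigned char *got, size_t size, size_t index)
{
    assert((index + 1) * 4 <= size);  /* entry must lie inside the section */
    return read_le32(got + index * 4);
}
```

For example, the first four bytes of the dump (48 1c 00 00) decode to 0x1c48, and the four bytes at file offset 0xd44 (96 07 00 00) decode to 0x796.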

 

The first value (0x1c48) is always the virtual address of a special global variable called _DYNAMIC. This is the address of the dynamic section (covered previously).

第一个值 (0x1c48) 始终是称为 _DYNAMIC 的特殊全局变量的虚拟地址。这是动态节的地址 (如前所述)。

penguin> nm libfoo.so | egrep 1c48

00001c48 A _DYNAMIC

 

The next two values (1 and 2) are 0x0. Values 3 through 7 range from 0x796 to 0x7d6 and all point to addresses in the Procedure Linkage Table or PLT:

penguin> readelf -S foo

There are 30 section headers, starting at offset 0x1090:

 

Section Headers:

  [Nr] Name       Type      Addr     Off    Size   ES Flg Lk Inf Al

<...>

  [ 9] .plt       PROGBITS  00000780 000780 000060 04 AX  0  0   4

<...>

 

From the output, the PLT starts at 0x780 and has a size of 0x60, so it covers addresses 0x780 through 0x7df, which includes all five of these values. More information on the PLT and how it relates to the GOT follows under the heading “.plt.”

So what are the rest of the entries in the GOT for? The answer requires some knowledge of relocation. Without going into too much detail here, some of the GOT entries are used for global variables. The machine instructions in the shared library will reference the GOT entries for the global variables, and the relocation entries ensure that the GOT entries point to the correct address at run time. The relocation entries are included here for the curious reader, although relocation will be covered in much more detail later in this chapter.

那么GOT中的其他条目是什么? 答案需要一些重定位的知识。 这里没有详细介绍,一些GOT条目用于全局变量。 共享库中的机器指令将引用全局变量的GOT条目,重定位条目确保GOT条目在运行时指向正确的地址。尽管本章后面将详细介绍重定位,在这里为好奇的读者简单介绍重定位条目。


penguin> readelf -r libfoo.so

 

Relocation section '.rel.dyn' at offset 0x6e0 contains 12 entries:

 Offset     Info    Type            Sym.Value  Sym. Name

00001b00  00000008 R_386_RELATIVE

00001b04  00000008 R_386_RELATIVE

00001b68  00000008 R_386_RELATIVE

00001d24  00000008 R_386_RELATIVE

00001d5c  00000008 R_386_RELATIVE

00001b6c  00002101 R_386_32          00000000   __gxx_personality_v0

00001d58  00001e06 R_386_GLOB_DAT    00001d80   noValueGlobInt

00001d60  00002006 R_386_GLOB_DAT    00001b20   globInt

00001d64  00002606 R_386_GLOB_DAT    00000000   __cxa_finalize

00001d68  00002c06 R_386_GLOB_DAT    00001d7c   myObj2

00001d6c  00002d06 R_386_GLOB_DAT    00000000   _Jv_RegisterClasses

00001d70  00002e06 R_386_GLOB_DAT    00000000   __gmon_start__

 

Relocation section '.rel.plt' at offset 0x740 contains 5 entries:

 Offset     Info    Type            Sym.Value  Sym. Name

00001d44  00001d07 R_386_JUMP_SLOT   000009d6   _Z3fooi

00001d48  00002407 R_386_JUMP_SLOT   00000000   printf

00001d4c  00002607 R_386_JUMP_SLOT   00000000   __cxa_finalize

00001d50  00002b07 R_386_JUMP_SLOT   000009c8   _ZN7myClassC1Ev

00001d54  00002d07 R_386_JUMP_SLOT   00000000   _Jv_RegisterClasses

 

Notice that the offset of the relocation entry for globInt is 0x1d60, which is one of the slots in the GOT.
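The Info column in the readelf output packs two things into one 32-bit value: the relocation type in the low 8 bits and a symbol table index in the upper 24 bits. The sketch below decodes it. The function and constant names are invented for this example (the system header elf.h provides the equivalent ELF32_R_SYM and ELF32_R_TYPE macros); the numeric type values are the i386 ones that appear in the listing.

```c
#include <stdint.h>

/* i386 relocation types that appear in the readelf listing. */
enum {
    RELOC_386_32        = 1,
    RELOC_386_GLOB_DAT  = 6,
    RELOC_386_JUMP_SLOT = 7,
    RELOC_386_RELATIVE  = 8
};

/* Upper 24 bits of r_info: index into the dynamic symbol table. */
uint32_t reloc_sym_index(uint32_t r_info)
{
    return r_info >> 8;
}

/* Lower 8 bits of r_info: relocation type. */
uint32_t reloc_type(uint32_t r_info)
{
    return r_info & 0xff;
}
```

Applied to the globInt entry, an Info value of 0x00002006 decodes to symbol index 0x20 and type 6 (R_386_GLOB_DAT). The R_386_RELATIVE entries have an Info value of 0x00000008, that is, type 8 with no symbol.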

Consider the following code from foo.C:

static int bar( int c )

{

   int d = 0;

 

   d = foo( c ) + globInt ;

   d += staticInt ;

   d += constInt ;

 

   return d ;

}

 

It references the global integer globInt. Let’s see how it uses the GOT to find this variable by looking at the assembly language for this function:

它引用了全局整数globInt。 让我们看一下在此函数的汇编语言中,如何使用GOT来查找此变量


penguin> objdump -d libfoo.so

<...>

000008c4 <_Z3bari>:

 8c4:   55                      push    %ebp

 8c5:   89 e5                   mov     %esp,%ebp

 8c7:   53                      push    %ebx

 8c8:   83 ec 04                sub     $0x4,%esp

 8cb:   e8 00 00 00 00          call    8d0 <_Z3bari+0xc>

 8d0:   5b                      pop     %ebx

 8d1:   81 c3 68 14 00 00       add     $0x1468,%ebx

 8d7:   c7 45 f8 00 00 00 00    movl    $0x0,0xfffffff8(%ebp)

 8de:   83 ec 0c                sub     $0xc,%esp

 8e1:   ff 75 08                pushl   0x8(%ebp)

 8e4:   e8 a7 fe ff ff          call    790 <_init+0x28>

 8e9:   83 c4 10                add     $0x10,%esp

 8ec:   89 c2                   mov     %eax,%edx

 8ee:   8b 83 28 00 00 00       mov     0x28(%ebx),%eax

 8f4:   8b 08                   mov     (%eax),%ecx

 8f6:   8d 04 11                lea     (%ecx,%edx,1),%eax

 8f9:   89 45 f8                mov     %eax,0xfffffff8(%ebp)

 8fc:   8b 93 ec fd ff ff       mov     0xfffffdec(%ebx),%edx

 902:   8d 45 f8                lea     0xfffffff8(%ebp),%eax

 905:   01 10                   add     %edx,(%eax)

 907:   8d 45 f8                lea     0xfffffff8(%ebp),%eax

 90a:   83 00 05                addl    $0x5,(%eax)

 90d:   8b 45 f8                mov     0xfffffff8(%ebp),%eax

 910:   8b 5d fc                mov     0xfffffffc(%ebp),%ebx

 913:   c9                      leave

 914:   c3                      ret

 915:   90                      nop

<...>

Note: In the assembly language here, the symbol name _Z3bari is the function bar. The name is a “mangled” C++ function name. The objdump tool accepts a -C switch to demangle the name, or alternatively you can use the command echo _Z3bari | c++filt.

Note: In the assembly language just listed, the hex numbers to the left of the output are file offsets and not real memory addresses. However, the methods used by ELF to achieve position independence would work even if these were real addresses. When the library is loaded into memory, the real memory addresses are used instead.

 

The instruction at 0x8cb makes a call to 0x8d0. This seems a bit strange because that is the next instruction to be run anyway, but it does have a purpose. The call pushes its return address (0x8d0) onto the stack, and the pop instruction at 0x8d0 retrieves it into register EBX, so EBX now holds the address of the current instruction. The instruction at 0x8d1 then adds a hard-coded value of 0x1468 to this.

0x8d0+0x1468 = 0x1d38.

 

This is where the GOT is located. For quick reference, here is the information for the GOT section again:

这是GOT所在的位置。 为了快速参考,这里再次显示GOT节的信息:

penguin> readelf -S libfoo.so | egrep .got

[20] .got      PROGBITS       00001d38 000d38 00003c 04  WA  0  0  4

 

Later in the assembly language, the mov instruction at 0x8ee reads the GOT entry located 0x28 bytes past the address in EBX. This is the slot for globInt:

EBX( 0x1d38) + 0x28 = 0x1d60

 

From the relocation information just listed, this matches the offset of the relocation entry for globInt:

00001d60   00002006 R_386_GLOB_DAT     00001b20    globInt

 

The mov at 0x8ee loads the contents of that GOT slot, which at run time is the actual address of globInt, into EAX. The next instruction at 0x8f4 dereferences that address to read the value of globInt.
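The two address calculations in this walkthrough are plain additions, which is why they keep working wherever the library is loaded. The following sketch restates them in C; the function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* "call next; pop %ebx" leaves the address of the instruction after the
 * call in EBX (0x8d0 in the listing). The following add applies a
 * constant fixed at link time to reach the start of the GOT. */
uint32_t got_base_from_pc(uint32_t pc_after_call, uint32_t link_time_delta)
{
    return pc_after_call + link_time_delta;
}

/* Address of the GOT slot read by "mov disp(%ebx),%eax". */
uint32_t got_slot_address(uint32_t got_base, uint32_t displacement)
{
    return got_base + displacement;
}
```

With the values from the listing, got_base_from_pc(0x8d0, 0x1468) gives 0x1d38 and got_slot_address(0x1d38, 0x28) gives 0x1d60. If the library were loaded at some base address, both the instruction address and the GOT address would shift by that same base, so the constant 0x1468 would still be correct.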

Note: This is a very good example of how the text segment relies on the data segment being at a specific offset. In this case, it expects the GOT to be 0x1468 bytes away from a particular instruction. If the data segment (which contains the GOT) was ever loaded at a different distance from the text segment, the hard-coded reference to the GOT would be inaccurate.

9.6.3.8. .hash

This is the symbol hash table. Because humans need to use symbol names (that is, function and variable names), ELF must provide a quick way to find the symbol that corresponds to a given name.

We know that the symbol table in an ELF file is simply an array. Without a hash table of some sort, finding a symbol in this array would require a linear search of the array until the symbol is found or until the end of the symbol table is reached. This might not be too bad for a one-time search, but the typical ELF file contains many symbols (possibly many thousands). A linear search for each of these either during run time or for the linker (ld) would not be practical.

我们知道ELF文件中的符号表只是一个数组。如果没有某种哈希表,在此数组中查找符号将需要对数组进行线性搜索,直到找到符号或到达符号表的末尾。 对于一次性搜索,这可能不是太糟糕,但典型的ELF文件包含许多符号(可能有数千个)。 在运行时或链接器(ld)期间对这些中的每一个进行线性搜索是不实际的。

The hash mechanism in ELF is illustrated in Figure 9.3.

Figure 9.3. ELF hash algorithm.

 

In the diagram, the function name printf is run through the hash function to produce a numeric value. This value modulo the hash bucket table size provides an index into the hash bucket table. The hash bucket slot contains an index into the symbol table as well as the hash chain table. At this point, the symbol name in the symbol table (pointed to by the hash bucket slot) is checked against printf (“C1” in the diagram). If the symbol name at this entry in the symbol table doesn’t match, the index into the chain table is used.

Each chain table entry contains an index into the symbol table as well as the index of the next element in the chain if there are any. The symbol table entry from the first chain table entry is compared against printf (“C2” in the diagram). According to the diagram, there was no match, so the next element in the chain is used. This happens again until the function printf is found in the symbol table after four comparisons (the symbol is found in “C4” according to the diagram).

每个链表条目包含符号表的索引以及链中下一个元素的索引(如果有)。 将第一个链表条目中的符号表条目与printf(图中的“C2”)进行比较。 根据该图,没有匹配,因此使用链中的下一个元素。 再次发生这种情况,直到在四次比较后在符号表中找到函数printf(根据图表符号在“C4”中找到)。

This is more complex but much more efficient than using a linear search in the symbol table, especially for large symbol tables.

这比在符号表中使用线性搜索更复杂, 但效率更高, 尤其是对于大型符号表。

For the interested reader, here is the hash algorithm as specified in the ELF standard:

对于感兴趣的读者, 下面是 ELF 标准中指定的哈希算法:

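unsigned long ElfHash(const unsigned char *name)
{
  unsigned long h = 0, g;

  while (*name)
    {
      /* Shift in the next character of the name. */
      h = (h << 4) + *name++;
      /* If any of the top four bits are set, fold them back in... */
      if ((g = h & 0xF0000000) != 0)
        h ^= g >> 24;
      /* ...and clear them before the next round. */
      h &= ~g;
    }
  return h;
}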
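To make the bucket and chain walk described above concrete, here is a small sketch of a lookup. It is not the dynamic linker's code: struct hash_table, hash_table_build, and hash_table_lookup are names invented for this example, the symbol table is reduced to an array of names, and the hash is written with a 32-bit type (elf_hash32) so the result does not depend on the width of unsigned long. As in ELF, index 0 is reserved and doubles as the end-of-chain marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The ELF hash, computed in 32 bits. */
uint32_t elf_hash32(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 0, g;
    while (*p) {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;
        if (g != 0)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Hypothetical in-memory form of a hash section: bucket[] and chain[]
 * hold symbol indices, and index 0 means "end of chain". */
struct hash_table {
    size_t nbucket;
    size_t nchain;      /* equals the number of symbols */
    unsigned *bucket;
    unsigned *chain;
};

/* Fill bucket[] and chain[] for names[0..nchain-1]. names[0] is the
 * reserved null symbol and is never entered. */
void hash_table_build(struct hash_table *t, const char *const *names)
{
    size_t i;
    assert(t->nbucket > 0);
    for (i = 0; i < t->nbucket; i++) t->bucket[i] = 0;
    for (i = 0; i < t->nchain; i++) t->chain[i] = 0;
    for (i = 1; i < t->nchain; i++) {
        size_t b = elf_hash32(names[i]) % t->nbucket;
        t->chain[i] = t->bucket[b];   /* link to the previous head */
        t->bucket[b] = (unsigned)i;   /* new head of this bucket */
    }
}

/* Return the symbol index of `name`, or 0 if it is not present. */
unsigned hash_table_lookup(const struct hash_table *t,
                           const char *const *names, const char *name)
{
    unsigned i;
    assert(t->nbucket > 0);
    i = t->bucket[elf_hash32(name) % t->nbucket];
    while (i != 0) {
        if (strcmp(names[i], name) == 0)
            return i;                 /* names match: symbol found */
        i = t->chain[i];              /* otherwise try the next in the chain */
    }
    return 0;
}
```

Each pass through the while loop corresponds to one comparison (“C1”, “C2”, and so on) in the diagram. Only symbols that hash to the same bucket are ever compared, which is where the savings over a linear search come from.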



9.6.3.9. .init

This section contains executable instructions required for initialization of an ELF object. It is almost identical to the .fini section except that the information is for initialization, not finalization. The section .init contains a function called _init in the same way that the .fini section contains a function called _fini. The _init function is responsible for initializing global variables (including objects) for an ELF library or executable. Notice in the output that the address of the _init function matches the address of the .init section:

本节包含初始化ELF对象所需的可执行指令。它几乎与.fini部分相同,只是信息用于初始化,而不是最后确定。.init部分包含一个名为_init的函数,其方式与.fini部分包含一个名为_fini的函数的方式相同。 _init函数负责初始化ELF库或可执行文件的全局变量(包括对象)。 请注意,在输出中_init函数的地址与.init节的地址匹配:

penguin> readelf -S foo |egrep "\.init"

  [10] .init   PROGBITS        080484b4 0004b4 000018 00  AX  0  0  4

penguin> nm foo |egrep 080484b4

080484b4 T _init

 

Let’s dig a bit deeper to see how the .init section works (which is very similar to how the .fini section works). We’ll start by looking at a global C++ object defined in main.C, which eventually is compiled into the executable called foo:

让我们深入了解.init节是如何工作的(这与.fini节的工作方式非常相似)。 我们将首先查看main.C中定义的全局C ++对象,该对象最终被编译为名为foo的可执行文件:

myClass myObj3 ;

The class "myClass" is defined as follows:

class myClass

{

   public:

 

   int myVar ;

 

   myClass() {

      myVar = 5 ;

   }

 

};

 

Notice that it includes a constructor. This is what should eventually be called by _init().

Disassembling the .init section using objdump shows us the following:

请注意,它包含一个构造函数。这是_init()最终应该调用的内容。使用objdump反汇编.init节向我们展示了以下内容:

penguin> objdump -d foo | head -15

 

foo:     file format elf32-i386

 

Disassembly of section .init:

 

080484b4 <_init>:

 80484b4:       55                   push    %ebp

 80484b5:       89 e5                mov     %esp,%ebp

 80484b7:       83 ec 08             sub     $0x8,%esp

 80484ba:      e8 a5 00 00 00     call  8048564 <call_gmon_start>

 80484bf:       90                   nop

 80484c0:       e8 0b 01 00 00       call  80485d0 <frame_dummy>

 80484c5:    e8 e6 01 00 00   call  80486b0 <__do_global_ctors_aux>

 80484ca:       c9                   leave

 80484cb:       c3                   ret

<...>

 

In the assembly listing, there is a call to __do_global_ctors_aux. Let’s disassemble that function next (some output was excluded for simplicity):


penguin> objdump -d foo

<...>

080486b0 <__do_global_ctors_aux>:

 80486b0:       55                      push   %ebp

 80486b1:       89 e5                   mov    %esp,%ebp

 80486b3:       53                      push   %ebx

 80486b4:       52                      push   %edx

 80486b5:       a1 04 99 04 08          mov    0x8049904,%eax

 80486ba:       83 f8 ff                cmp    $0xffffffff,%eax

 80486bd:       bb 04 99 04 08          mov    $0x8049904,%ebx

 80486c2:  74 18           je    80486dc <__do_global_ctors_aux+0x2c>

 80486c4:       8d b6 00 00 00 00       lea    0x0(%esi),%esi

 80486ca:       8d bf 00 00 00 00       lea    0x0(%edi),%edi

 80486d0:       83 eb 04                sub    $0x4,%ebx

 80486d3:       ff d0                   call   *%eax

 80486d5:       8b 03                   mov    (%ebx),%eax

 80486d7:       83 f8 ff                cmp    $0xffffffff,%eax

 80486da: 75 f4             jne   80486d0 <__do_global_ctors_aux+0x20>

 80486dc:       58                      pop    %eax

 80486dd:       5b                      pop    %ebx

 80486de:       5d                      pop    %ebp

 80486df:       c3                      ret

<...>

 

The assembly listing shows the instruction call *%eax at address 0x80486d3. This treats the value in the EAX register as the address of a function and calls that function. From the code above this call, we can see EAX being loaded by mov 0x8049904,%eax at address 0x80486b5, which reads the value stored at address 0x8049904. Let’s take a look at what is located at that address:

penguin> nm -v -f s foo

<...>

__CTOR_LIST__     |08049900| d |       OBJECT|     |   |.ctors

__CTOR_END__      |08049908| d |       OBJECT|     |   |.ctors

<...>

 

According to nm, there are two variables, __CTOR_LIST__ and __CTOR_END__, whose addresses (0x8049900 and 0x8049908) surround the address 0x8049904. The value stored at 0x8049904 is what gets loaded into EAX and eventually called.

We need to find the value stored at address 0x8049904 to see what address is eventually called. It is kind of a pain to do this, but we first need to find the difference between the virtual address and the file offset of the segment that contains this address. We know it will be in a LOAD segment because it contains information needed at run time:

我们需要找到存储在地址0x8049904的值,以查看最终调用的地址。 这样做有点痛苦,但我们首先需要找到虚拟地址和包含该地址的段的文件偏移量之间的差异。 我们知道它将在LOAD段中,因为它包含运行时所需的信息:

penguin> readelf -l foo | egrep LOAD

  LOAD      0x000000 0x08048000 0x08048000 0x0076c 0x0076c R E 0x1000

  LOAD      0x00076c 0x0804976c 0x0804976c 0x001d4 0x001dc RW 0x1000

 

According to the output, the address 0x8049904 is in the second load segment (the data segment), and the difference between virtual address of the data segment and the file offsets in the data segment is 0x08049000 (0x0804976c -0x76c). This is the address that we need to subtract from 0x8049904 to get the file offset of 0x904. Let’s see the contents at that offset:

根据输出, 地址0x8049904 位于第二个加载段 (数据段) 中, 数据段的虚拟地址与数据段中的文件偏移量之间的差异是 0x08049000 (0x0804976c - 0x76c)。这是我们需要从0x8049904 中减去以获得文件偏移量的地址0x904。让我们看看这个偏移量的内容:

penguin> hexdump -C -s 0x904 -n 4 foo

00000904  7e 86 04 08                                     |~...|

00000908

 

According to the output, we have a value of 0x0804867e at address 0x8049904 (file offset 0x904). One more step is needed before we know what this address is for:

根据输出, 我们在地址 0x8049904 (文件偏移 0x904) 上有一个0x0804867e 的值。在我们知道这个地址的用途之前, 还需要一个步骤:

penguin> nm foo | egrep 0804867e

0804867e t _GLOBAL__I_myObj3

 

This is the global constructor of the myObj3 object, which is called, as expected, by functions under _init.
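The translation from virtual address to file offset used in the steps above can be written as a short function. This is only a sketch: struct load_segment and vaddr_to_file_offset are names made up for the example, and the struct keeps just the three program header fields that the calculation needs (named after the corresponding Elf32_Phdr fields).

```c
#include <stddef.h>
#include <stdint.h>

/* The program header fields that matter for this calculation. */
struct load_segment {
    uint32_t p_offset;  /* file offset of the segment */
    uint32_t p_vaddr;   /* virtual address of the segment */
    uint32_t p_filesz;  /* bytes of the segment present in the file */
};

/* Translate a virtual address into a file offset. Returns 1 and stores
 * the offset in *out on success, or 0 if the address is not backed by
 * the file image of any of the given segments. */
int vaddr_to_file_offset(const struct load_segment *segs, size_t nsegs,
                         uint32_t vaddr, uint32_t *out)
{
    size_t i;
    for (i = 0; i < nsegs; i++) {
        if (vaddr >= segs[i].p_vaddr &&
            vaddr - segs[i].p_vaddr < segs[i].p_filesz) {
            *out = vaddr - segs[i].p_vaddr + segs[i].p_offset;
            return 1;
        }
    }
    return 0;
}
```

Given the two LOAD segments printed by readelf -l foo, the address 0x8049904 falls in the second segment and maps to file offset 0x904, the same result as the subtraction done by hand.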

The __CTOR_LIST__ variable marks the start of the list of global constructors for an executable or shared library. We can find all of the global constructors by listing the addresses stored between __CTOR_LIST__ and __CTOR_END__. This can be useful if you need to know what will be run before the main() function of an executable or when a library is first loaded.

The .fini section has a very similar convention but uses the __DTOR_LIST__ and __DTOR_END__ variables to mark the addresses of the global destructors.
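The loop in __do_global_ctors_aux can be paraphrased in C. The sketch below follows the disassembly shown earlier: it starts at the last entry before __CTOR_END__, and keeps calling entries and stepping backwards until it reaches the marker stored at __CTOR_LIST__. It is an approximation for illustration only. The real list uses the value 0xffffffff as its marker; here a dedicated function, ctor_list_marker, stands in for it so the example stays portable, and the two demo constructors are hypothetical.

```c
#include <assert.h>

typedef void (*ctor_fn)(void);

/* Stand-in for the 0xffffffff marker stored at __CTOR_LIST__. */
void ctor_list_marker(void) {}

/* Walk the list backwards from the last real entry, as the disassembly
 * does: load an entry, stop at the marker, otherwise step down one slot
 * and call it. Returns the number of constructors called. */
int run_global_ctors(ctor_fn *last)
{
    int called = 0;
    ctor_fn *p = last;
    ctor_fn fn = *p;
    while (fn != ctor_list_marker) {
        p--;
        fn();
        called++;
        fn = *p;
    }
    return called;
}

/* Two hypothetical constructors that record the order they ran in. */
char ctor_order[2];
int ctor_count;
void demo_ctor_a(void) { assert(ctor_count < 2); ctor_order[ctor_count++] = 'a'; }
void demo_ctor_b(void) { assert(ctor_count < 2); ctor_order[ctor_count++] = 'b'; }
```

For a list laid out as { marker, demo_ctor_a, demo_ctor_b, 0 }, calling run_global_ctors with a pointer to the demo_ctor_b slot runs demo_ctor_b first and demo_ctor_a second. In foo the list has a single entry, the one at 0x8049904.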

9.6.3.10. .interp

This section contains the path name of the program interpreter. The program interpreter is used for executables and is responsible for getting a process up and running with all of its required libraries, and so on. A quick way to get the program interpreter is to use the readelf command as follows:

本节包含程序解释器的路径名。程序解释器用于可执行文件, 并负责使用所有必需的库 (等等) 运行进程。获取程序解释器的快速方法是使用 readelf 命令, 如下所示:

penguin> readelf -l foo

<...>

  INTERP        0x000114 0x08048114 0x08048114 0x00013 0x00013 R 0x1

      [Requesting program interpreter: /lib/ld-linux.so.2]

<...>

 

The program interpreter is described later in the chapter under the section heading, “Program Interpreter.”

程序解释器将在本章后面的章节标题 "程序翻译" 中描述。

9.6.3.11. .plt (Procedure Linkage Table)

The procedure linkage table is required by every shared library or executable that is dependent on shared libraries to satisfy an unresolved symbol. The PLT is also used to support “lazy binding,” which means not resolving the address of a function until it is called for the first time.

每个依赖于共享库的共享库或可执行文件都需要过程链接表来提供未解析的符号。PLT还用于支持“延迟绑定”,这意味着在第一次调用函数之前不会解析函数的地址。

The procedure linkage table contains a list of instructions that help functions find other functions in the address space. First, let’s use the readelf tool to find the PLT for the executable named foo. It is always in the section named .plt.

过程链接表包含帮助函数在地址空间中查找其他函数的指令列表。 首先,让我们使用readelf工具找到名为foo的可执行文件的PLT。 它始终位于名为.plt的部分中。

penguin> readelf -S foo | egrep ' \.plt'

  [11] .plt    PROGBITS        080484cc 0004cc 000070 04 AX 0  0 4

 

The best way to look at the PLT is through the debugger. It starts at address 0x80484cc and continues for 0x70 bytes:


(gdb) disass 0x080484cc

Dump of assembler code for function _init:

0x80484b4 <_init>:      push   %ebp

0x80484b5 <_init+1>:    mov    %esp,%ebp

0x80484b7 <_init+3>:    sub    $0x8,%esp

0x80484ba <_init+6>:    call   0x8048564 <call_gmon_start>

0x80484bf <_init+11>:   nop

0x80484c0 <_init+12>:   call   0x80485d0 <frame_dummy>

0x80484c5 <_init+17>:   call   0x80486b0 <__do_global_ctors_aux>

0x80484ca <_init+22>:   leave

0x80484cb <_init+23>:   ret

0x80484cc <_init+24>:   pushl  0x804991c

0x80484d2 <_init+30>:   jmp    *0x8049920

0x80484d8 <_init+36>:   add    %al,(%eax)

0x80484da <_init+38>:   add    %al,(%eax)

0x80484dc <sleep>:      jmp    *0x8049924

0x80484e2 <sleep+6>:    push   $0x0

0x80484e7 <sleep+11>:   jmp    0x80484cc <_init+24>

0x80484ec <uname>:      jmp    *0x8049928

0x80484f2 <uname+6>:    push   $0x8

0x80484f7 <uname+11>:   jmp    0x80484cc <_init+24>

0x80484fc <__gxx_personality_v0>:       jmp    *0x804992c

0x8048502 <__gxx_personality_v0+6>:     push   $0x10

0x8048507 <__gxx_personality_v0+11>:    jmp    0x80484cc <_init+24>

0x804850c <__libc_start_main>:  jmp    *0x8049930

0x8048512 <__libc_start_main+6>:        push   $0x18

0x8048517 <__libc_start_main+11>:       jmp    0x80484cc <_init+24>

0x804851c <_Z3bazi>:    jmp    *0x8049934

0x8048522 <_Z3bazi+6>:  push   $0x20

0x8048527 <_Z3bazi+11>: jmp    0x80484cc <_init+24>

0x804852c <printf>:     jmp    *0x8049938

0x8048532 <printf+6>:   push   $0x28

0x8048537 <printf+11>:  jmp    0x80484cc <_init+24>

End of assembler dump.

 

Disassemble the PLT? Yes, the PLT is executable, but it has very specific executable parts (one for each function in the PLT). We could have used objdump to disassemble the PLT, although objdump would not give any hints as to which parts of the PLT relate to which functions.

反汇编PLT? 是的,PLT是可执行的,但它有非常具体的可执行部分(PLT中的每个功能都有一个)。 我们可以使用objdump来反汇编PLT,但objdump不会给出任何关于PLT的哪些部分与哪些函数相关的提示。

The location of the PLT is relative to functions that require other functions located somewhere in the address space. The relative offset from a function to the PLT is known at link time, so it is possible for the linker to specify a hard coded offset from an instruction address to the PLT. Because the location of the required/defined functions is not known at compile time or link time, the code instead makes a call directly to the appropriate slot in the PLT.

PLT的位置与需要位于地址空间某处的其他功能的功能有关。 从函数到PLT的相对偏移在链接时是已知的,因此链接器可以指定从指令地址到PLT的硬编码偏移。由于在编译时或链接时不知道所需/已定义函数的位置,因此代码直接调用PLT中的相应插槽。

From the file main.C, the function main makes a call to sleep. The assembly language for this call from within gdb looks like this (some output skipped for simplicity):

从main.c文件中,函数main调用sleep。 来自gdb内的此调用的汇编语言如下所示(为简单起见,跳过了一些输出):

(gdb) disass main

Dump of assembler code for function main:

0x80485fc <main>:       push   %ebp

0x80485fd <main+1>:     mov    %esp,%ebp

0x80485ff <main+3>:     sub    $0x198,%esp

<...>

0x8048641 <main+69>:    push   $0x3f2

0x8048646 <main+74>:    call   0x80484dc <sleep>

<...>

 

Note that the call to the sleep function is to the PLT slot for the sleep function (compare the address used in the call instruction to the assembly listing for the PLT above). Going back to the PLT slot for “sleep”:

注意,sleep函数的调用是针对sleep函数的PLT槽(将call指令中使用的地址与上面PLT的汇编程序进行比较)。 让我们再回到PLT插槽“sleep”:

0x80484dc <sleep>:      jmp    *0x8049924

0x80484e2 <sleep+6>:    push   $0x0

0x80484e7 <sleep+11>:   jmp    0x80484cc <_init+24>

 

The first instruction is a jump to address 0x8049924. This is right inside the Global Offset Table or GOT. To confirm, let’s get the address of the GOT:

第一个指令是跳转到地址0x8049924。在全局偏移表或GOT中,这是正确的。要确认这一点, 让我们得到的GOT的地址:

penguin> readelf -S foo | egrep ' \.got'

[22] .got    PROGBITS        08049918 000918 000028 04 WA 0  0 4

 

Okay, so what’s in the GOT that might be of interest to the PLT? That depends on when you look. Before the program is run, the GOT looks like the following:

好吧,那么PLT可能会对GOT的什么内容感兴趣? 这取决于你什么时候看。在程序运行之前,GOT如下所示:

(gdb) x/40 0x08049918

0x8049918 <_JCR_LIST_+4>: 0x08049810 0x00000000 0x00000000 0x080484e2

0x8049928 <_JCR_LIST_+20>: 0x080484f2 0x08048502 0x08048512 0x08048522

0x8049938 <_JCR_LIST_+36>: 0x08048532 0x00000000 0x00000000 0x00000000

0x8049948:      Cannot access memory at address 0x8049948

 

The GOT slot for the sleep function (at 0x8049924) has a value of 0x080484e2. This is the address of the second instruction in the PLT slot for the sleep function.

Sleep函数的GOT插槽(位于0x8049924)的值为0x080484e2。 这是sleep函数在PLT槽中的第二条指令的地址。

0x80484e2 <sleep+6>:     push   $0x0

 

The instruction pushes a value of 0x0 onto the stack. The next instruction jumps to 0x80484cc:

0x80484e7 <sleep+11>:   jmp     0x80484cc <_init+24>

 

It is worth noting that each of the PLT slots has a different value at offset 0x6:

值得注意的是,每个PLT插槽在偏移0x6处具有不同的值:

0x80484e2 <sleep+6>:    push   $0x0

0x80484f2 <uname+6>:    push   $0x8

0x8048502 <__gxx_personality_v0+6>:    push   $0x10

0x8048512 <__libc_start_main+6>:       push   $0x18

0x8048522 <_Z3bazi+6>:  push   $0x20

0x8048532 <printf+6>:   push   $0x28

 

For example, the uname slot pushes a value of 0x8 onto the stack. This is a special marker used to find the PLT slot, used by the dynamic linking code. Dynamic linking is explained in more detail shortly.

例如,uname slot将值0x8压入堆栈。 这是一个特殊的标记,用于查找动态链接代码使用的PLT插槽。 稍后将更详细地解释动态链接。
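The addresses and pushed values in the PLT listing follow a regular pattern, which the sketch below spells out. The function names are invented for this illustration. The constants are the usual i386 ones and agree with the listing: 16 bytes per PLT slot, three reserved GOT entries of 4 bytes each, and a pushed value that advances by 8 per function, which matches the size of an Elf32_Rel entry in .rel.plt.

```c
#include <stdint.h>

enum {
    PLT_ENTRY_SIZE = 16, /* bytes per PLT slot on i386 */
    GOT_RESERVED   = 3,  /* GOT entries 0..2 are reserved */
    GOT_ENTRY_SIZE = 4,
    REL_ENTRY_SIZE = 8   /* size of one Elf32_Rel entry */
};

/* Address of the PLT slot for function number n (0 = first function).
 * The first 16 bytes of the PLT are the shared stub, hence n + 1. */
uint32_t plt_slot_address(uint32_t plt_base, uint32_t n)
{
    return plt_base + PLT_ENTRY_SIZE * (n + 1);
}

/* Address of the GOT entry that the slot's first jmp goes through. */
uint32_t plt_got_entry_address(uint32_t got_base, uint32_t n)
{
    return got_base + GOT_ENTRY_SIZE * (GOT_RESERVED + n);
}

/* Value pushed by the slot's second instruction. */
uint32_t plt_pushed_value(uint32_t n)
{
    return REL_ENTRY_SIZE * n;
}
```

For sleep (n = 0) this gives a slot at 0x80484dc, a GOT entry at 0x8049924, and a pushed value of 0x0. For printf (n = 5) it gives 0x804852c, 0x8049938, and 0x28, as in the gdb output.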

Let’s get back to the address of 0x80484cc. This is the address of the beginning of the PLT and contains the following instructions (ignore the offset of _init):

让我们回到0x80484cc的地址。 这是PLT开头的地址,包含以下指令(忽略_init的偏移量):

0x80484cc <_init+24>:   pushl  0x804991c

0x80484d2 <_init+30>:   jmp    *0x8049920

0x80484d8 <_init+36>:   add    %al,(%eax)

0x80484da <_init+38>:   add    %al,(%eax)

 

The first instruction pushes a value onto the stack, while the second instruction jumps to the address stored in 0x8049920. Let’s see what value is at that address:

第一个指令将值推送到堆栈上, 而第二个指令跳转到存储在0x8049920 中的地址。让我们看看这个地址有什么值:

(gdb) break main

Breakpoint 1 at 0x8048605

(gdb) run

Starting program: /home/wilding/src/Linuxbook/ELF/foo

 

Breakpoint 1, 0x08048605 in main ()

 

(gdb) x 0x8049920

0x8049920 <__JCR_LIST__+12>:     0x40009c90

 

Okay, let’s see what function is at 0x40009c90:

(gdb) disass 0x40009c90 0x40009c94

Dump of assembler code from 0x40009c90 to 0x40009c94:

0x40009c90 <_dl_runtime_resolve>:       push   %eax

0x40009c91 <_dl_runtime_resolve+1>:     push   %ecx

0x40009c92 <_dl_runtime_resolve+2>:     push   %edx

0x40009c93 <_dl_runtime_resolve+3>:     mov    0x10(%esp,1),%edx

End of assembler dump.

 

So after all of this we know that the first call to “sleep” (or any function) will eventually call _dl_runtime_resolve. This is a special function that works to resolve the address of the function. The details of how this works are a bit beyond the scope of this chapter (and fairly lengthy to explain), but suffice it to say that this finds the address of the function whose slot was just executed. It then updates the GOT with the actual address of the function in the address space so that the second call to the function (that is, “sleep”) will go directly to the address of the actual function itself.

所以我们知道第一次调用“sleep”(或其它函数)后最终会调用_dl_runtime_resolve。 这是一个特殊函数,用于解析函数的地址。这个函数工作原理的细节有点超出了本章的范围(并且解释起来相当冗长),但是可以说它找到了刚刚执行的函数的地址。 然后它使用地址空间中函数的实际地址更新GOT,准备第二次调用函数sleep。这次将直接转到实际函数本身的地址。

Let’s see what the GOT looks like after the function sleep is called:

让我们看看调用函数sleep后GOT的样子:

(gdb) cont

Continuing.

This is a printf format string in baz

This is a printf format string in main

 

Program received signal SIGINT, Interrupt.

0x401a2d01 in nanosleep () from /lib/libc.so.6

(gdb) x/40 0x08049918

0x8049918 <_JCR_LIST_+4>: 0x08049810 0x40012fd0 0x40009c90 0x401a2ab0

0x8049928 <_JCR_LIST_+20>: 0x401a2750 0x08048502 0x4011d400 0x40014916

0x8049938 <_JCR_LIST_+36>: 0x40154c90 0x00000000 0x00000000 0x00000005

0x8049948: 0x00000000 0x00000019 0x7273752f 0x62696c2f

...

 

After the program is run and the sleep function is called, the slot for the sleep function in the GOT is 0x401a2ab0. Let’s confirm that this is the address of the actual sleep function.

(gdb) disass 0x401a2ab0

Dump of assembler code for function sleep:

0x401a2ab0 <sleep>:     push   %ebp

0x401a2ab1 <sleep+1>:   mov    %esp,%ebp

0x401a2ab3 <sleep+3>:   push   %edi

0x401a2ab4 <sleep+4>:   push   %esi

...

 

So the next time the function sleep is called, the GOT slot points directly to the actual sleep function, avoiding the need to resolve the function a second time (that is, find its address in memory).

因此, 下一次调用函数sleep时, GOT的插槽直接指向实际的sleep函数, 避免了第二次解析 (即在内存中查找其地址) 的函数。
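The effect of lazy binding can be imitated in ordinary C with a function pointer that plays the role of the GOT slot. This is only a model of the behavior, not how the dynamic linker is implemented: got_sleep_slot, plt_sleep, sleep_first_call, and real_sleep are all hypothetical names, and real_sleep does not actually sleep.

```c
#include <assert.h>

typedef unsigned (*sleep_fn)(unsigned);

int resolve_calls;               /* how many times the resolver path ran */

/* Stand-in for the real function living in a shared library. */
unsigned real_sleep(unsigned seconds)
{
    return seconds;              /* pretend we slept; report the request */
}

unsigned sleep_first_call(unsigned seconds);

/* The "GOT slot": initially it routes back into the resolver path. */
sleep_fn got_sleep_slot = sleep_first_call;

/* Stand-in for the push/jmp tail of the PLT slot plus the resolver:
 * find the real function, patch the GOT slot, then complete the call. */
unsigned sleep_first_call(unsigned seconds)
{
    resolve_calls++;
    got_sleep_slot = real_sleep;
    return got_sleep_slot(seconds);
}

/* Stand-in for the first instruction of the PLT slot: jmp *GOT[n]. */
unsigned plt_sleep(unsigned seconds)
{
    assert(got_sleep_slot != 0);  /* the slot always holds some target */
    return got_sleep_slot(seconds);
}
```

The first call to plt_sleep goes through sleep_first_call and patches the slot. Every later call goes straight to real_sleep, so resolve_calls stays at 1 no matter how many times plt_sleep is called.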

9.6.3.12. .rodata

This section contains read-only constant values, string literals, and other constant data such as the variable constInt from foo.C defined as:

本节包含只读常量值、字符串文本和其他常量数据, 例如 foo.c中constInt定义为:

const int constInt = 5 ;

 

This variable should be located in the .rodata section because it is read-only (that is, constant). Let’s confirm by finding the location of the .rodata section and of this read-only (constant) variable.

此变量应位于.rodata节中,因为它是只读的(即常量)。 让我们通过查找.rodata节和这个只读(常量)变量的位置来确认。

penguin> readelf -S libfoo.so |egrep rodata

  [12] .rodata   PROGBITS    00000a80 000a80 00004c 00  A 0   0 32

penguin> nm libfoo.so |egrep constInt

00000ac8 r constInt

 

As expected, constInt is contained within the .rodata section: its address, 0xac8, falls between the start of the section (0xa80) and its end (0xa80 + 0x4c = 0xacc). Let’s see what else is in the .rodata section using the hexdump utility:


penguin> hexdump -C -s 0xa80 -n 76 libfoo.so

00000a80  54 68 69 73 20 69 73 20  61 20 63 6f 6e 73 74 61 |This is a consta|

00000a90  6e 74 20 73 74 72 69 6e  67 21 00 00 00 00 00 00 |nt string!......|

00000aa0  54 68 69 73 20 69 73 20  61 20 70 72 69 6e 74 66 |This is a printf|

00000ab0  20 66 6f 72 6d 61 74 20  73 74 72 69 6e 67 20 69 | format string i|

00000ac0  6e 20 62 61 7a 0a 00 00  05 00 00 00             |n baz.......|

00000acc

 

This output shows that the .rodata section also contains constant strings, including those used in printf statements. Notice that such strings are not stored in any of the ELF string tables. The last four bytes, 05 00 00 00 at offset 0xac8, are constInt itself, stored in little-endian byte order.
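The two checks made by hand in this section (is an address inside a section, and what value do four little-endian bytes encode) can be sketched in C. The helper names addr_in_section and read_le32 are hypothetical and introduced only for illustration; the numbers come from the readelf, nm, and hexdump output above.

```c
#include <stdint.h>

/* Return 1 if addr lies inside [sh_addr, sh_addr + sh_size), else 0. */
int addr_in_section(uint32_t addr, uint32_t sh_addr, uint32_t sh_size)
{
    return addr >= sh_addr && addr - sh_addr < sh_size;
}

/* Assemble a 32-bit value from four bytes stored least significant first. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With the values above, addr_in_section(0xac8, 0xa80, 0x4c) is true, and read_le32 applied to the bytes 05 00 00 00 yields 5, the value assigned to constInt.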

9.6.3.13. .shstrtab

This section is the string table that contains the names of the various sections:


penguin> hexdump -C libfoo.so -s0xffb -n 251

00000ffb  00 2e 73 79 6d 74 61 62  00 2e 73 74 72 74 61 62 |..symtab..strtab|

0000100b  00 2e 73 68 73 74 72 74  61 62 00 2e 68 61 73 68 |..shstrtab..hash|

0000101b  00 2e 64 79 6e 73 79 6d  00 2e 64 79 6e 73 74 72 |..dynsym..dynstr|

0000102b  00 2e 67 6e 75 2e 76 65  72 73 69 6f 6e 00 2e 67 |..gnu.version..g|

0000103b  6e 75 2e 76 65 72 73 69  6f 6e 5f 72 00 2e 72 65 |nu.version_r..re|

0000104b  6c 2e 64 79 6e 00 2e 72  65 6c 2e 70 6c 74 00 2e |l.dyn..rel.plt..|

0000105b  69 6e 69 74 00 2e 74 65  78 74 00 2e 66 69 6e 69 |init..text..fini|

0000106b  00 2e 72 6f 64 61 74 61  00 2e 65 68 5f 66 72 61 |..rodata..eh_fra|

0000107b  6d 65 5f 68 64 72 00 2e  64 61 74 61 00 2e 65 68 |me_hdr..data..eh|

0000108b  5f 66 72 61 6d 65 00 2e  64 79 6e 61 6d 69 63 00 |_frame..dynamic.|

0000109b  2e 63 74 6f 72 73 00 2e  64 74 6f 72 73 00 2e 6a |.ctors..dtors..j|

000010ab  63 72 00 2e 67 6f 74 00  2e 62 73 73 00 2e 63 6f |cr..got..bss..co|

000010bb  6d 6d 65 6e 74 00 2e 64  65 62 75 67 5f 61 72 61 |mment..debug_ara|

000010cb  6e 67 65 73 00 2e 64 65  62 75 67 5f 69 6e 66 6f |nges..debug_info|

000010db  00 2e 64 65 62 75 67 5f  61 62 62 72 65 76 00 2e |..debug_abbrev..|

000010eb  64 65 62 75 67 5f 6c 69  6e 65 00 |debug_line.|

000010f6
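A section header's sh_name field is a byte offset into this table, so turning it into a name is a bounds-checked pointer addition. The sketch below assumes the table has already been read into memory; section_name is a hypothetical helper, and shstrtab_head reproduces only the first few names from the dump above, not the whole 251-byte table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First bytes of the .shstrtab dump above (the real table is longer). */
static const char shstrtab_head[] = "\0.symtab\0.strtab\0.shstrtab\0.hash";

/*
 * Return the NUL-terminated name stored at offset sh_name, or NULL if the
 * offset is out of range or the string is not terminated inside the table.
 */
const char *section_name(const char *tab, size_t tab_size, uint32_t sh_name)
{
    if (sh_name >= tab_size)
        return NULL;
    if (memchr(tab + sh_name, '\0', tab_size - sh_name) == NULL)
        return NULL;
    return tab + sh_name;
}
```

In this excerpt, offset 1 yields ".symtab", offset 9 yields ".strtab", and offset 17 yields ".shstrtab", which matches the byte positions in the hexdump (relative to the start of the table at 0xffb).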

9.6.3.14. .strtab (string table)

This string table stores symbol names for the main symbol table. It uses the typical string table format described above under “String Table Format.” See the .dynsym and .dynstr sections for the dynamic symbol table and its string table.

9.6.3.15. .symtab (symbol table)

This is the full (main) symbol table, which also includes all static functions and variables. It is used during the linking phase when an executable or shared library is being built, and it is not used at run time. In fact, little or none of this symbol table may be loaded into the address space at run time.

In the executable foo, the .symtab section is at file offset 0x29c4 and is 0x510 bytes in size, as shown here:

penguin> readelf -S foo | egrep symtab

  [33] .symtab  SYMTAB        00000000 0029c4 000510 10    34 3a 4

 

Using the readelf command, we can see that this section is not contained in any of the ELF segments:

penguin> readelf -l foo | head -16

 

Elf file type is EXEC (Executable file)

Entry point 0x8048540

There are 7 program headers, starting at offset 52

 

Program Headers:

  Type     Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align

  PHDR     0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4

  INTERP   0x000114 0x08048114 0x08048114 0x00013 0x00013 R  0x1

      [Requesting program interpreter: /lib/ld-linux.so.2]

  LOAD     0x000000 0x08048000 0x08048000 0x0076c 0x0076c R E 0x1000

  LOAD     0x00076c 0x0804976c 0x0804976c 0x001d4 0x001dc RW  0x1000

  DYNAMIC  0x000810 0x08049810 0x08049810 0x000f0 0x000f0 RW  0x4

  NOTE     0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4

  GNU_EH_FRAME  0x000748 0x08048748 0x08048748 0x00024 0x00024 R 0x4
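The comparison between the section's file range and the segments' file ranges can be written out as a small C check. This is a sketch under stated assumptions: struct load_seg is a hypothetical stand-in that keeps only the p_offset and p_filesz fields of Elf32_Phdr, range_in_segments is a made-up helper name, and the table holds just the two LOAD entries copied from the listing above.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the two Elf32_Phdr fields used here. */
struct load_seg {
    uint32_t offset;   /* p_offset: where the segment starts in the file */
    uint32_t filesz;   /* p_filesz: how many file bytes it covers */
};

/* LOAD segments of foo, copied from the readelf -l output above. */
static const struct load_seg foo_loads[] = {
    { 0x000000, 0x0076c },
    { 0x00076c, 0x001d4 },
};

/* Return 1 if the file range [off, off + size) lies inside any segment. */
int range_in_segments(uint32_t off, uint32_t size,
                      const struct load_seg *segs, size_t nsegs)
{
    for (size_t i = 0; i < nsegs; i++) {
        uint32_t start = segs[i].offset;
        uint32_t end = start + segs[i].filesz;
        if (off >= start && off <= end && size <= end - off)
            return 1;
    }
    return 0;
}
```

For .symtab (offset 0x29c4, size 0x510) the function returns 0, because the last loaded file byte is at 0x76c + 0x1d4 = 0x940. For comparison, the INTERP range (offset 0x114, size 0x13) falls inside the first LOAD segment and returns 1.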

 

Because this symbol table is not loaded at run time (the two LOAD segments end at file offset 0x76c + 0x1d4 = 0x940, well before the .symtab offset of 0x29c4), a stack traceback function cannot find and display the names of static functions using only what is mapped into the program’s address space.

Stack traceback functions are often called from within a signal handler to dump the stack trace to a file when a trap occurs. Some of the functions on the stack may be static, and the program’s address space does not contain any symbol table that can be used to map the addresses of those functions to function names.

9.6.3.16. .text

This section contains the executable code of an ELF file. All of the compiled functions that an ELF file contains will be in this section. This is the most important part of an ELF file. The rest of the sections and all of the complexity of ELF serve to allow the executable instructions in this section to run. The file foo.C defines three functions:

penguin> nm libfoo.so |egrep " T "

00000916 T _Z3bazi

00000a60 T _fini

00000768 T _init

 

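_Z3bazi lies within the range of the .text section (0x7e0 up to, but not including, 0x7e0 + 0x280 = 0xa60). _init and _fini fall just outside that range because they live in the .init and .fini sections: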

penguin> readelf -S libfoo.so | egrep text

[10] .text    PROGBITS     000007e0 0007e0 000280 00  AX  0  0 16

 

Because the .text section contains executable instructions, the best way to see its contents is to disassemble the ELF file using the objdump tool and look for the .text section in the output:


penguin> objdump -d libfoo.so

<...>

Disassembly of section .text:

<...>

000009d6 <_Z3fooi>:

9d6:    55                     push   %ebp

9d7:    89 e5                  mov    %esp,%ebp

9d9:    53                     push   %ebx

9da:    83 ec 04               sub    $0x4,%esp

9dd:    e8 00 00 00 00         call   9e2 <_Z3fooi+0xc>

9e2:    5b                     pop    %ebx

9e3:    81 c3 56 13 00 00      add    $0x1356,%ebx

9e9:    c7 45 f8 00 00 00 00   movl   $0x0,0xfffffff8(%ebp)

9f0:    8b 45 08               mov    0x8(%ebp),%eax

9f3:    89 45 f8               mov    %eax,0xfffffff8(%ebp)

9f6:    83 7d f8 63            cmpl   $0x63,0xfffffff8(%ebp)

9fa:    7e 02                  jle    9fe <_Z3fooi+0x28>

9fc:    eb 07                  jmp    a05 <_Z3fooi+0x2f>

9fe:    8d 45 f8               lea    0xfffffff8(%ebp),%eax

a01:    ff 00                  incl   (%eax)

a03:    eb f1                  jmp    9f6 <_Z3fooi+0x20>

a05:    8b 93 20 00 00 00      mov    0x20(%ebx),%edx

a0b:    8b 45 08               mov    0x8(%ebp),%eax

a0e:    89 02                  mov    %eax,(%edx)

a10:    8b 45 08               mov    0x8(%ebp),%eax

a13:    03 45 f8               add    0xfffffff8(%ebp),%eax

a16:    83 c4 04               add    $0x4,%esp

a19:    5b                     pop    %ebx

a1a:    5d                     pop    %ebp

a1b:    c3                     ret

 

All of the functions from the source files are listed, along with a few additional functions that support the inner workings of ELF.

9.6.3.17. .rel

This is a relocation section. The name of a relocation section is .rel followed by the name of the section it operates on. For example, .rel.text is a relocation table that works with the .text section. Relocation is a critical part of ELF because it allows shared libraries to be loaded anywhere in the address space.

Relocation is the process of changing an address in a loaded ELF section to the current address of the corresponding function or variable, as illustrated in Figure 9.4.

Figure 9.4. Relocations.

 

Figure 9.4 shows the sections that are needed to explain relocation. The text segment contains the procedure linkage table (PLT), the relocation sections, and the executable code. The data segment contains the global and static variables as well as the global offset table (GOT).

A function reference first goes to the appropriate slot in the PLT. The PLT slot then jumps to the address stored in the corresponding slot in the GOT. The address in the GOT may point back to a function in the text segment, or it may point to a function in another shared library. In the diagram, this reference goes to another shared library.

A variable reference goes directly to the GOT and then to the address of the variable. The reference could go to another shared library (or executable) or, as it does in the diagram, to the data segment of the same shared library.

The relocation in this case can be as simple as changing the addresses stored in the GOT.

The libfoo.so shared library actually has two sections whose names are prefixed by .rel:

penguin> readelf -S libfoo.so | egrep rel

  [ 6] .rel.dyn    REL       000006e0 0006e0 000060 08  A 2  0 4

  [ 7] .rel.plt    REL       00000740 000740 000028 08  A 2  9 4

 

These are both relocation sections that contain entries of the form:

typedef struct

{

  Elf32_Addr     r_offset;   /* Address */

  Elf32_Word     r_info;     /* Relocation type and symbol index */

} Elf32_Rel;

Note: Other platforms may use .rela sections and the corresponding Elf32_Rela structure (see /usr/include/elf.h for this structure).

 

The r_offset field is the target address or offset that the relocation changes. For object files, it is an offset within the affected section. For shared libraries and executables, it is the virtual address of the location to be patched (for example, a GOT slot). The r_info field contains the relocation type and the symbol index. We can see the relocation information using readelf:


penguin> readelf -r libfoo.so

 

Relocation section '.rel.dyn' at offset 0x6e0 contains 12 entries:

 Offset     Info    Type            Sym.Value  Sym. Name

00001b00  00000008 R_386_RELATIVE

00001b04  00000008 R_386_RELATIVE

00001b68  00000008 R_386_RELATIVE

00001d24  00000008 R_386_RELATIVE

00001d5c  00000008 R_386_RELATIVE

00001b6c  00002101 R_386_32      00000000   __gxx_personality_v0

00001d58  00001e06 R_386_GLOB_DAT   00001d80   noValueGlobInt

00001d60  00002006 R_386_GLOB_DAT   00001b20   globInt

00001d64  00002606 R_386_GLOB_DAT   00000000   __cxa_finalize

00001d68  00002c06 R_386_GLOB_DAT   00001d7c   myObj2

00001d6c  00002d06 R_386_GLOB_DAT   00000000   _Jv_RegisterClasses

00001d70  00002e06 R_386_GLOB_DAT   00000000   __gmon_start__

 

Relocation section '.rel.plt' at offset 0x740 contains 5 entries:

 Offset     Info    Type            Sym.Value Sym. Name

00001d44  00001d07 R_386_JUMP_SLOT   000009d6 _Z3fooi

00001d48  00002407 R_386_JUMP_SLOT   00000000 printf

00001d4c  00002607 R_386_JUMP_SLOT   00000000 __cxa_finalize

00001d50  00002b07 R_386_JUMP_SLOT   000009c8 _ZN7myClassC1Ev

00001d54  00002d07 R_386_JUMP_SLOT   00000000 _Jv_RegisterClasses
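The Info column in this listing is the raw r_info value, which packs the symbol index into the upper 24 bits and the relocation type into the low 8 bits. The sketch below performs the same arithmetic as the ELF32_R_SYM and ELF32_R_TYPE macros in elf.h; the function names rel_sym, rel_type, and rel_type_name are hypothetical, and the name table covers only the i386 types that appear in the listing (spelled the way readelf prints them).

```c
#include <stdint.h>

/* Same arithmetic as the ELF32_R_SYM and ELF32_R_TYPE macros in elf.h. */
uint32_t rel_sym(uint32_t r_info)  { return r_info >> 8; }
uint32_t rel_type(uint32_t r_info) { return r_info & 0xff; }

/* Names for the i386 relocation types that appear in the listing above. */
const char *rel_type_name(uint32_t type)
{
    switch (type) {
    case 1:  return "R_386_32";
    case 6:  return "R_386_GLOB_DAT";
    case 7:  return "R_386_JUMP_SLOT";
    case 8:  return "R_386_RELATIVE";
    default: return "(other)";
    }
}
```

For the globInt entry, r_info is 0x00002006, so the symbol index is 0x20 and the type is 6 (R_386_GLOB_DAT). The R_386_RELATIVE entries have r_info 0x00000008: type 8 with symbol index 0, which is why readelf shows no symbol name for them.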

 

Notice that the variable relocations are in one section (.rel.dyn), and the function relocations used by the PLT are in another (.rel.plt).

Static variables and functions do not need to be relocated because they are always referenced using a relative offset from within the shared library. For executables, they are referenced using absolute addresses.

The file foo.C defines two objects, one static and one global, as follows:

static myClass myObj ;

myClass myObj2 ;

 

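Only the global object, myObj2, appears in the relocation output above; the static myObj does not. The same pattern shows up when searching for the variables globInt and staticInt, where only globInt has an entry: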

penguin> readelf -r libfoo.so |egrep "globInt|staticInt"

00001d60  00002006 R_386_GLOB_DAT    00001b20   globInt

 

The type is R_386_GLOB_DAT and the symbol value is 0x1b20, as recorded in the symbol table. The relocation offset for globInt is 0x1d60, which is the address of the globInt slot in the GOT. More information on relocations follows in the next section of this chapter.
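To make the globInt entry concrete, here is a deliberately simplified model of what applying this R_386_GLOB_DAT relocation amounts to: the symbol's run-time address is written into the GOT slot named by r_offset. The sketch assumes the symbol is defined in the library itself, so its address is the load base plus the symbol value; a real dynamic linker searches all loaded objects for the definition. The helper names apply_glob_dat and read_slot are hypothetical, as is any load base passed in.

```c
#include <stdint.h>
#include <string.h>

/*
 * Simplified model: `image` is the library's memory image, `base` is the
 * (hypothetical) address it was loaded at, and the symbol is assumed to be
 * defined in this same library.
 */
void apply_glob_dat(uint8_t *image, uint32_t base,
                    uint32_t r_offset, uint32_t sym_value)
{
    uint32_t resolved = base + sym_value;   /* run-time symbol address */
    memcpy(image + r_offset, &resolved, sizeof resolved);  /* patch slot */
}

/* Read back a 32-bit GOT slot from the image. */
uint32_t read_slot(const uint8_t *image, uint32_t r_offset)
{
    uint32_t value;
    memcpy(&value, image + r_offset, sizeof value);
    return value;
}
```

With a made-up load base of 0x40000000, applying the entry (r_offset 0x1d60, symbol value 0x1b20) leaves 0x40001b20 in the GOT slot, which is the address later variable references would pick up.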

 


Reposted from blog.csdn.net/mounter625/article/details/102754058