9.3. ELF Header

The first part of any ELF file (including object files like foo.o) is the ELF header. There are several ways to look at the header. First, we’ll use a program that dumps a file's raw data in hexadecimal and ASCII (a text representation) to see whether there is anything we can recognize.

penguin> hexdump -C foo.o | head
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  01 00 03 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  58 03 00 00 00 00 00 00  34 00 00 00 00 00 28 00  |X.......4.....(.|
00000030  12 00 0f 00 55 89 e5 83  ec 08 c7 45 fc 00 00 00  |....U......E....|
00000040  00 83 ec 0c ff 75 08 e8  fc ff ff ff 83 c4 10 03  |.....u..........|
00000050  05 00 00 00 00 89 45 fc  8b 15 04 00 00 00 8d 45  |......E........E|
00000060  fc 01 10 8d 45 fc 83 00  05 8b 45 fc c9 c3 55 89  |....E.....E...U.|
00000070  e5 83 ec 08 83 ec 0c ff  75 08 e8 b5 ff ff ff 83  |........u.......|
00000080  c4 10 83 ec 0c 68 20 00  00 00 e8 fc ff ff ff 83  |.....h .........|
00000090  c4 10 b8 00 00 00 00 c9  c3 90 55 89 e5 83 ec 08  |..........U.....|

Note: hexdump is used in this chapter to show a raw hex dump of ELF files. The od tool can also be used.
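
To see what hexdump -C is doing, here is a minimal C sketch that formats one 16-byte row the same way: an 8-digit hex offset, the bytes in hex with an extra gap after the eighth, and an ASCII column in which non-printable bytes appear as dots. The function name hexdump_line and the 80-byte buffer size are choices made for this illustration, not part of any library.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

#define HEXDUMP_LINE_MAX 80  /* a full row is 78 characters plus the NUL */

/* Format up to 16 bytes found at file offset `off` in the style of hexdump -C. */
void hexdump_line(char out[HEXDUMP_LINE_MAX], unsigned long off,
                  const unsigned char *p, size_t n)
{
    char *w = out;

    if (n > 16)
        n = 16;
    w += sprintf(w, "%08lx  ", off);          /* offset column */
    for (size_t i = 0; i < 16; i++) {
        if (i < n)
            w += sprintf(w, "%02x ", p[i]);   /* one byte in hex */
        else
            w += sprintf(w, "   ");           /* pad a short last row */
        if (i == 7)
            *w++ = ' ';                       /* gap between the two 8-byte halves */
    }
    w += sprintf(w, " |");
    for (size_t i = 0; i < n; i++)            /* printable ASCII, otherwise '.' */
        *w++ = (p[i] >= 0x20 && p[i] < 0x7f) ? (char)p[i] : '.';
    strcpy(w, "|");
}
```

Calling it on each 16-byte chunk of foo.o should give rows in the format shown above. Note that the byte 0x20 at offset 0x86 is a space character, so it appears as a literal space (not a dot) in the ASCII column.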

At first glance, the only recognizable thing is the “ELF” text at the beginning of the file in the ASCII part of the output. This visually confirms that it is an ELF file, but to understand the rest, we need to look at the structure of the ELF header.

The structure of the ELF header is defined in the ELF specification (and in various papers describing it) as well as in the /usr/include/elf.h file on Linux. The structure listed here is for 32-bit ELF files (refer to elf.h to see the 64-bit version, Elf64_Ehdr):

#define EI_NIDENT      16

/* Elf32_Half is 16 bits wide; Elf32_Word, Elf32_Addr, and Elf32_Off are 32 bits. */
typedef struct {
        unsigned char  e_ident[EI_NIDENT]; /* ident bytes */
        Elf32_Half     e_type;             /* file type */
        Elf32_Half     e_machine;          /* target machine */
        Elf32_Word     e_version;          /* file version */
        Elf32_Addr     e_entry;            /* start address */
        Elf32_Off      e_phoff;            /* phdr file offset */
        Elf32_Off      e_shoff;            /* shdr file offset */
        Elf32_Word     e_flags;            /* file flags */
        Elf32_Half     e_ehsize;           /* sizeof ehdr */
        Elf32_Half     e_phentsize;        /* sizeof phdr */
        Elf32_Half     e_phnum;            /* number phdrs */
        Elf32_Half     e_shentsize;        /* sizeof shdr */
        Elf32_Half     e_shnum;            /* number shdrs */
        Elf32_Half     e_shstrndx;         /* shdr string index */
} Elf32_Ehdr;

If we map this structure onto the raw output from the hex dump, we see that the first 16 bytes are the e_ident field, and that its first four bytes include the text “ELF”. In fact, every ELF file begins with the four bytes 0x7f, E, L, and F, which identify the file type as ELF. This is called a magic number. Magic numbers are used in many file formats, and the command file foo.o (referenced earlier in the chapter) uses this magic number to identify the object file as an ELF file (see the /etc/magic file or the magic(5) man page for more information on magic numbers).
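
The magic-number check that file performs can be sketched in a few lines of C. This sketch assumes the ELFMAG and SELFMAG macros from Linux's <elf.h> (the string "\177ELF" and its length, 4); has_elf_magic is a hypothetical helper name.

```c
#include <elf.h>     /* ELFMAG, SELFMAG */
#include <stddef.h>
#include <string.h>  /* memcmp */

/* Return 1 if buf starts with the ELF magic number 0x7f 'E' 'L' 'F', else 0. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    return len >= SELFMAG && memcmp(buf, ELFMAG, SELFMAG) == 0;
}
```

Passing in the first bytes of foo.o (7f 45 4c 46 ...) returns 1; a shell script starting with "#!" returns 0.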

Here are the fields of the e_ident array in the ELF header, listed by index (one byte per field):

0. 0x7f (EI_MAG0)
1. E    (EI_MAG1)
2. L    (EI_MAG2)
3. F    (EI_MAG3)
4. EI_CLASS       : ELF class (32-bit or 64-bit)
5. EI_DATA        : Data encoding: big or little endian
6. EI_VERSION     : Must be EV_CURRENT (value of 1)
7. EI_OSABI       : Application binary interface (ABI) type
8. EI_ABIVERSION  : ABI version
9. EI_PAD         : Start of padding bytes (continues until end of array)

From /usr/include/elf.h, we have:

#define ELFCLASSNONE   0           /* EI_CLASS */
#define ELFCLASS32     1
#define ELFCLASS64     2
#define ELFCLASSNUM    3

#define ELFDATANONE    0           /* e_ident[EI_DATA] */
#define ELFDATA2LSB    1
#define ELFDATA2MSB    2

#define EV_NONE        0           /* e_version, EI_VERSION */
#define EV_CURRENT     1

Let’s use the data from a hex dump to map these three values from the ELF file foo.o:

penguin> hexdump -C foo.o | head -1
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|

According to the hex dump, the “class” field byte at offset 4 (starting from 0x0) is 1, the data encoding field (byte at offset 5) is 1, and the version (byte at offset 6) is 1.

Thus, the class is 32-bit (ELFCLASS32), the data encoding is LSB (least significant byte first, ELFDATA2LSB), or “little endian,” and the version is EV_CURRENT (which it must be).
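
The same interpretation of the three identification bytes can be expressed in C. This is a minimal sketch assuming the EI_* index macros and the ELFCLASS*, ELFDATA*, and EV_CURRENT constants from Linux's <elf.h>; the function names are hypothetical.

```c
#include <elf.h>

/* Word size implied by e_ident[EI_CLASS]: 32, 64, or 0 if invalid. */
int elf_class_bits(unsigned char ei_class)
{
    switch (ei_class) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return 0;   /* ELFCLASSNONE or unknown */
    }
}

/* 1 for ELFDATA2LSB (least significant byte first), 0 for ELFDATA2MSB,
 * -1 for ELFDATANONE or an unknown value. */
int elf_is_little_endian(unsigned char ei_data)
{
    switch (ei_data) {
    case ELFDATA2LSB: return 1;
    case ELFDATA2MSB: return 0;
    default:          return -1;
    }
}

/* Check the class, data, and version bytes discussed above. */
int elf_ident_ok(const unsigned char *ident)
{
    return elf_class_bits(ident[EI_CLASS]) != 0 &&
           elf_is_little_endian(ident[EI_DATA]) != -1 &&
           ident[EI_VERSION] == EV_CURRENT;
}
```

For foo.o, bytes 4 through 6 are 01 01 01, so the sketch reports a 32-bit, little-endian file with a valid version.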

We can also map the next couple of fields in the ELF header structure using the raw output. The next two fields, e_type and e_machine, are each 2 bytes long and start 16 (EI_NIDENT) and 18 (EI_NIDENT + 2) bytes from the beginning of the file, that is, at offsets 0x10 and 0x12 (the start of the second line of the hex dump):

penguin> hexdump -C foo.o |head -2
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  01 00 03 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|

From the output, the e_type field (bytes 01 00) is 0x1 (elf.h: ET_REL), which is used for relocatable files. The e_machine field (bytes 03 00) is 0x3 (elf.h: EM_386), which identifies files built for the 32-bit x86 (Intel 80386) architecture.
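
Rather than counting bytes by hand, a C program can let the compiler compute field positions with offsetof. This sketch assumes Linux's <elf.h> for Elf32_Ehdr, ET_REL, and EM_386; the helper names are hypothetical, and the 16-bit reads assume a little-endian file such as foo.o.

```c
#include <elf.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Read a 16-bit little-endian field: the low byte comes first in the file. */
static uint16_t get_le16(const unsigned char *hdr, size_t off)
{
    return (uint16_t)(hdr[off] | (hdr[off + 1] << 8));
}

/* e_type and e_machine follow the 16-byte e_ident array, so offsetof
 * yields 16 (0x10) and 18 (0x12), the positions read by hand above. */
uint16_t ehdr32_type(const unsigned char *hdr)
{
    return get_le16(hdr, offsetof(Elf32_Ehdr, e_type));
}

uint16_t ehdr32_machine(const unsigned char *hdr)
{
    return get_le16(hdr, offsetof(Elf32_Ehdr, e_machine));
}
```

Applied to the first 20 bytes of foo.o, these return ET_REL (1) and EM_386 (3).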

Note: Because this platform is little endian, the byte order must be reversed to translate a value into big endian order, which is the order in which humans normally write numbers (most significant digit first). For example, in little endian the two bytes 01 00 represent the 16-bit value 0x0001, or 1 (little endian is covered in more detail in the GDB chapter of this book).
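
The byte reversal described in the note can be written as a short loop. decode_le is a hypothetical helper; the examples decode the e_type bytes 01 00, the e_shoff bytes 58 03 00 00 found at offset 0x20 of foo.o (856, the "Start of section headers" value that readelf reports for this file), and the address 0x08048540 as it would be stored in a little-endian file.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble an n-byte little-endian integer (n <= 8). The byte at the lowest
 * address is least significant, so the loop starts from the last byte and
 * shifts each earlier byte in below it. */
uint64_t decode_le(const unsigned char *p, size_t n)
{
    uint64_t v = 0;

    for (size_t i = n; i-- > 0; )
        v = (v << 8) | p[i];
    return v;
}
```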

Mapping the ELF structure onto the raw hex and ASCII output certainly works, and it shows that there is no real magic or mystery behind ELF files, but it is tedious. Fortunately, Linux provides a much easier way to display the ELF header:

penguin> readelf -h foo.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          856 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         18
  Section header string table index: 15

The last 10 values in the output (Entry point address through Section header string table index) correspond directly to the last 10 fields of the ELF header structure (e_entry through e_shstrndx), without the work of finding and formatting the information by hand.
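
The readelf output can be approximated with a small C program. The sketch below reads the header with a single fread into the Elf32_Ehdr structure from <elf.h> and checks the size fields against the structure sizes (52, 32, and 40 bytes, matching what readelf reported). read_ehdr32 and print_ehdr32 are hypothetical names, and only a few of readelf's lines are reproduced.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read the 32-bit ELF header at offset 0 of fp and sanity-check it.
 * Assumes the file's byte order matches the host's (true for the
 * little-endian x86 files here); a portable tool would byte-swap each
 * field when e_ident[EI_DATA] differs. Returns 0 on success, -1 otherwise. */
int read_ehdr32(FILE *fp, Elf32_Ehdr *eh)
{
    if (fseek(fp, 0, SEEK_SET) != 0 || fread(eh, sizeof *eh, 1, fp) != 1)
        return -1;                                /* short file */
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS32)
        return -1;                                /* not an ELF32 file */
    /* Entry sizes must match the structures they describe. Files without
     * program headers (foo.o) or section headers (core files) may leave
     * the corresponding size at 0. */
    if (eh->e_ehsize != sizeof(Elf32_Ehdr) ||
        (eh->e_phnum && eh->e_phentsize != sizeof(Elf32_Phdr)) ||
        (eh->e_shnum && eh->e_shentsize != sizeof(Elf32_Shdr)))
        return -1;
    return 0;
}

/* Print a few of the lines that readelf -h shows. */
void print_ehdr32(const Elf32_Ehdr *eh)
{
    printf("  Entry point address:               0x%x\n", (unsigned)eh->e_entry);
    printf("  Start of section headers:          %u (bytes into file)\n", (unsigned)eh->e_shoff);
    printf("  Size of this header:               %u (bytes)\n", (unsigned)eh->e_ehsize);
    printf("  Number of section headers:         %u\n", (unsigned)eh->e_shnum);
}
```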

Next, let’s look at how the ELF header differs between ELF file types: object files (which we just looked at), shared libraries, executables, and core files.

Here is the ELF header for an executable:

penguin> readelf -h foo
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048540
  Start of program headers:          52 (bytes into file)
  Start of section headers:          9292 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         35
  Section header string table index: 32

Besides the obvious difference that e_type is EXEC rather than REL (as it was for the object file), the executable also defines the e_entry and e_phoff fields, which were zero in the object file. This is the information needed to load the executable into memory and start it running. Object files lack this information, which is one of the reasons they cannot be run directly.

The e_entry (entry point) field contains the virtual address of the starting function for an ELF file. This field is usually only used for executable files. For executable files on Linux, it contains the address of the _start() function, which runs before main() and ensures the proper startup of the executable. Eventually, _start() calls main() to hand control over to the user-written code. Using the nm utility, we can display the symbol table and confirm that _start is at 0x08048540. When the executable first starts up, this is the first function called in the executable itself (for a dynamically linked program such as foo, the dynamic linker /lib/ld-linux.so.2 runs first and then jumps to this address).

penguin> ls -l foo
-rwxr-xr-x    1 wilding  build    12609 Jan  9 11:30 foo
penguin> nm foo | egrep ' _start$'
08048540 T _start

Note: _start is a special function that initializes a new running process. It is run before main().

One thing worth noting is that the value in the first column of the nm output is larger than the file itself. The foo executable is only 12609 (0x3141) bytes, yet nm reports _start() at 0x08048540. The reason is that ELF lets a file specify a load address for each segment: the address at which the segment should be placed in memory. On Linux (x86 architecture), the segment that contains the machine instructions of a 32-bit executable is loaded at 0x08048000. This address is platform-specific and is defined as part of the ABI (application binary interface). It is added to each symbol's offset to produce the value displayed by nm. For more information on the load address, refer to the heading “Segments and the Program Header Table” later in the chapter.
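
The relationship between the address nm prints and the position of _start in the file can be made concrete. Assuming the program header values that readelf -l reports for foo later in this chapter (the first LOAD segment has file offset 0x0, virtual address 0x08048000, and file size 0x76c), a virtual address is converted into a file offset by subtracting the segment's p_vaddr and adding its p_offset. vaddr_to_offset is a hypothetical helper; Elf32_Phdr and PT_LOAD come from <elf.h>.

```c
#include <elf.h>
#include <stddef.h>

/* Translate a virtual address into a file offset using the PT_LOAD segment
 * whose file image covers it: offset = vaddr - p_vaddr + p_offset.
 * Returns 0 and stores the offset on success, or -1 if no segment maps the
 * address from the file (for example, an address inside .bss). */
int vaddr_to_offset(const Elf32_Phdr *ph, size_t phnum,
                    Elf32_Addr vaddr, Elf32_Off *off)
{
    for (size_t i = 0; i < phnum; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        if (vaddr >= ph[i].p_vaddr && vaddr - ph[i].p_vaddr < ph[i].p_filesz) {
            *off = vaddr - ph[i].p_vaddr + ph[i].p_offset;
            return 0;
        }
    }
    return -1;
}
```

With those values, _start at 0x08048540 lands at file offset 0x540 (1344), well inside the 12609-byte file. An address in .bss, which occupies memory (MemSiz) but no file space (FileSiz), has no file offset, so the sketch reports failure for it.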

The e_phoff (“start of program headers”) field contains the file offset for the program header table. The program header table is required for executables and shared libraries and defines the various segments in an ELF file. A segment is a contiguous part or range of an ELF object and has specific memory attributes such as read, write, and execute. A segment is meant to be loaded into memory with the corresponding memory attributes. The e_phentsize (“size of program headers”) field defines the size of an entry in the program header table. The e_phnum (“number of program headers”) field defines the number of entries in the program header table. All entries in the program header table have the same fixed size.
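
The three program header fields are enough to locate every entry, because all entries have the same size. Using the values readelf -h reported for foo (e_phoff = 52, e_phentsize = 32, e_phnum = 7), this sketch computes where each entry begins and where the table ends; the helper names are hypothetical and the structure comes from <elf.h>.

```c
#include <elf.h>

/* File offset of program header entry i: the table starts at e_phoff and
 * entries are e_phentsize bytes apart. Offset 0 always holds the ELF
 * header, so 0 is used here to mean "no such entry". */
Elf32_Off phdr_entry_offset(const Elf32_Ehdr *eh, unsigned i)
{
    if (i >= eh->e_phnum)
        return 0;
    return eh->e_phoff + (Elf32_Off)i * eh->e_phentsize;
}

/* Offset one past the last byte of the program header table. */
Elf32_Off phdr_table_end(const Elf32_Ehdr *eh)
{
    return eh->e_phoff + (Elf32_Off)eh->e_phnum * eh->e_phentsize;
}
```

For foo the table ends at 276 (0x114), which is where the INTERP segment starts in the readelf -l output, and its size, 224 (0xe0) bytes, matches the FileSiz of the PHDR segment.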

Note: The only part of an ELF file that has a fixed location is the ELF header, which always starts at offset 0. All other parts of an ELF file are located by offset, starting with the offsets listed in the ELF header.
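
To illustrate that everything is reached through offsets stored in the header, this sketch finds a section's name using only the ELF header as a starting point: e_shoff locates the section header table, e_shstrndx selects the header of the section-name string table, and that header's sh_offset locates the names. section_name is a hypothetical helper; it assumes the whole file is already in memory in host byte order, uses the structures from <elf.h>, and trusts that each name is NUL-terminated. For foo.o, readelf reported e_shoff = 856 and e_shstrndx = 15, so the string table's section header sits at 856 + 15 * 40 = 1456 bytes into the file.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the name of section idx, or NULL if any offset is out of range. */
const char *section_name(const unsigned char *file, size_t size, unsigned idx)
{
    Elf32_Ehdr eh;
    Elf32_Shdr strhdr, sh;
    size_t tbl;

    if (size < sizeof eh)
        return NULL;
    memcpy(&eh, file, sizeof eh);                 /* the header: fixed at offset 0 */
    if (idx >= eh.e_shnum || eh.e_shstrndx >= eh.e_shnum ||
        eh.e_shentsize != sizeof sh)
        return NULL;

    tbl = eh.e_shoff;                             /* section header table */
    if (tbl > size || (size_t)eh.e_shnum * sizeof sh > size - tbl)
        return NULL;
    memcpy(&strhdr, file + tbl + (size_t)eh.e_shstrndx * sizeof sh, sizeof sh);
    memcpy(&sh, file + tbl + (size_t)idx * sizeof sh, sizeof sh);

    /* sh_name is a byte index into the string table's contents. */
    if (strhdr.sh_offset > size || strhdr.sh_size > size - strhdr.sh_offset ||
        sh.sh_name >= strhdr.sh_size)
        return NULL;
    return (const char *)file + strhdr.sh_offset + sh.sh_name;
}
```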

The ELF header for a shared library is similar to that of an executable, and in fact, the two file types are almost identical. A core file, on the other hand, has some significant differences. A core file is the memory image of a once-running process. Because there is no need to execute it, there is no need for sections of the core file to contain machine instructions. There is a need, however, to load parts of a core file into memory (for example, when using a debugger), and thus there are some program headers (segments).
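
The Type: line that readelf prints is a lookup on e_type. This sketch uses the ET_* constants from <elf.h>; the description strings are modeled on readelf's output (only the REL and EXEC forms appear in this chapter), and elf_type_name is a hypothetical name. Shared libraries normally carry ET_DYN, while core files carry ET_CORE.

```c
#include <elf.h>

/* Describe e_type in the style of readelf's "Type:" line. */
const char *elf_type_name(Elf32_Half type)
{
    switch (type) {
    case ET_REL:  return "REL (Relocatable file)";
    case ET_EXEC: return "EXEC (Executable file)";
    case ET_DYN:  return "DYN (Shared object file)";
    case ET_CORE: return "CORE (Core file)";
    default:      return "unknown";
    }
}
```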

penguin> ls -l core
-rw-------    1 wilding build    184320 Oct 14 16:36 core
penguin> file core
core: ELF 32-bit LSB core file of 'excp' (signal 6), Intel 80386, version 1 (SYSV), from 'excp'
penguin> readelf -h core |tail
  Entry point address:               0x0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         17
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

Notice that there is no entry point and no section headers. Sections and segments are two different types of ELF file parts and really deserve a good explanation.

9.4. Overview of Segments and Sections

An ELF file can be interpreted in two ways: as a set of segments or as a set of sections. Sections are smaller pieces of an ELF file that contain very specific information, such as the machine instructions or the symbol table. Segments are larger groupings of one or more sections, all of which have the same memory attributes.
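
A segment's memory attributes are stored as bits in the p_flags field of its program header. This sketch renders them the way the Flg column of readelf -l does (R, W, and E, or a space for each missing permission); PF_R, PF_W, and PF_X come from <elf.h>, and seg_flags_str is a hypothetical helper.

```c
#include <elf.h>

/* Render p_flags as three characters plus a NUL, e.g. "R E" or "RW ". */
void seg_flags_str(Elf32_Word flags, char out[4])
{
    out[0] = (flags & PF_R) ? 'R' : ' ';   /* readable */
    out[1] = (flags & PF_W) ? 'W' : ' ';   /* writable */
    out[2] = (flags & PF_X) ? 'E' : ' ';   /* executable */
    out[3] = '\0';
}
```

A text segment (read and execute) prints as "R E" and a data segment (read and write) as "RW ", as in the program headers for foo.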

Using the analogy of a car, the “sections” of the car would be its concrete parts, such as the seats, the glove compartment, the gas pedal, the steering wheel, the rear window, and the dashboard controls. Regardless of how these are grouped, they exist and can be separated from the car if needed. Segments, on the other hand, are not as concrete or real but rather are more like a grouping of sections. For example, we could have front and back segments. The front segment would contain the steering wheel, the front seats, and so on. The back segment would contain the rear window, the back seat, etc. We could also split the car into left and right segments. Or we could create overlapping segments such as a front segment and a left segment. In fact, one segment could completely contain another segment. Regardless of how we group the “sections” of the car into segments, the sections remain the same. The location of the sections in the car is important, however; the car wouldn’t be very practical with the steering wheel in the back seat!

The following command shows how the sections of the executable foo are grouped into segments:

penguin> readelf -l foo

Elf file type is EXEC (Executable file)
Entry point 0x8048540
There are 7 program headers, starting at offset 52

Program Headers:
  Type         Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR         0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
  INTERP       0x000114 0x08048114 0x08048114 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD         0x000000 0x08048000 0x08048000 0x0076c 0x0076c R E 0x1000
  LOAD         0x00076c 0x0804976c 0x0804976c 0x001d4 0x001dc RW  0x1000
  DYNAMIC      0x000810 0x08049810 0x08049810 0x000f0 0x000f0 RW  0x4
  NOTE         0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4
  GNU_EH_FRAME 0x000748 0x08048748 0x08048748 0x00024 0x00024 R   0x4

Section to Segment mapping:
 Segment Sections...
  00
  01     .interp
  02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr
  03     .data .eh_frame .dynamic .ctors .dtors .jcr .got .bss
  04     .dynamic
  05     .note.ABI-tag
  06     .eh_frame_hdr

The second part of the output shows which sections are contained in which segments. Notice that the “.interp” section is contained in both segment 01 (INTERP) and segment 02 (the first LOAD segment). The first part of the output, “Program Headers,” will be explained in more detail next.
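
The containment rule behind the “Section to Segment mapping” can be approximated by comparing file ranges. This is a simplified sketch, not readelf's exact algorithm: it ignores virtual addresses and does not handle SHT_NOBITS sections such as .bss, which occupy memory but no file space. The segment values in the check come from the program headers above; the .interp section's offset (0x114) and size (0x13) are assumed to match the INTERP segment, since the section headers themselves are not shown here. section_in_segment is a hypothetical helper using the structures from <elf.h>.

```c
#include <elf.h>

/* 1 if the section's bytes lie within the segment's file image, 0 if not,
 * -1 for SHT_NOBITS sections (such as .bss), which this sketch does not handle. */
int section_in_segment(const Elf32_Shdr *sh, const Elf32_Phdr *ph)
{
    if (sh->sh_type == SHT_NOBITS)
        return -1;
    return sh->sh_offset >= ph->p_offset &&
           sh->sh_offset + sh->sh_size <= ph->p_offset + ph->p_filesz;
}
```

Under these assumptions, .interp falls inside both the INTERP segment and the first LOAD segment but not inside the PHDR segment, which agrees with rows 01 and 02 of the mapping.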

Reposted from blog.csdn.net/mounter625/article/details/102753923