A detailed explanation of the ELF file format used by executable programs on Linux (and other Unix-like systems)

A Linux program consists of program code and initial data. How are this binary code and initial data stored in the executable file? That is the problem the ELF (Executable and Linkable Format) file format is designed to solve.

In memory, a running Linux program can be roughly divided into a code segment, a data segment, BSS, a heap, and a stack, as shown in the following figure:

The contents of BSS, the heap, and the stack are produced while the program runs, whereas the contents of the code and data segments are produced when the program is compiled. The code and data segment contents must therefore be written into the ELF file, so that when the program is loaded they can be read back from the file to initialize the code and data segments in memory. For BSS, the heap, the stack, and similar regions, the ELF file only needs to record how that memory is laid out, such as its starting address and size. This is only a rough division of memory; the actual layout is more complicated. Smaller regions such as the global offset table (GOT) and the procedure linkage table (PLT) are left aside for now and will be covered later.

Besides the contents of the code and data segments and the layout information for each memory region, an ELF file may also need to hold program version information, debugging information, the dynamic linking symbol table, dynamic linking strings, and so on. All of this data is stored in separate Sections, and the ELF file also contains a Section index table that points to each Section so it can be located.

ELF also serves as the packaging format for executable programs. To simplify loading, Sections that behave similarly during loading are placed in one contiguous region of memory and grouped into a Segment, so that the whole Segment can be loaded in one go. Likewise, the ELF file stores a Segment index table that points to each Segment so it can be located.

To sum up, the overall structure of an ELF file is as follows:

1. ELF header: describes basic information such as the file type, target architecture, and ELF version, along with the starting position and size of the Program header table and the Section header table within the file;

2. Program header table: this is the Segment index table mentioned earlier;

3. Section header table: this is the Section index table mentioned above;

4. .text, .rodata, .data, and so on: these are the Sections mentioned above. The starting position and size of each Section are recorded in the Section header table; one or more contiguous Sections are grouped into a Segment, and the starting position and size of each Segment are recorded in the Program header table;

Now that we understand the overall structure of an ELF file, let's look at how the data structure of each part is designed.

ELF file header (ELF header)

The ELF header is located at the beginning of the file, and its main purpose is to locate the other parts of the file. It mainly contains the following fields:

  • ELF identification: a byte array that identifies the file as an ELF file and gives information about general file characteristics;
  • File type: whether the file is a relocatable file, an executable file, a shared object, a core file, and so on;
  • Target architecture (machine);
  • ELF file format version;
  • Program entry address;
  • File offset of the program header table;
  • File offset of the section header table;
  • Size of the ELF header;
  • Size of one entry in the program header table;
  • Other fields...
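As an illustration of the first field, here is a minimal C sketch (not from the original article) that checks the identification bytes. It assumes glibc's <elf.h>, which defines ELFMAG, SELFMAG, EI_CLASS, ELFCLASS64, EI_DATA, ELFDATA2LSB, EI_VERSION and EV_CURRENT. The function name elf_ident_ok is hypothetical, and rejecting 32-bit or big-endian files is a simplifying assumption.

```c
#include <elf.h>     /* glibc ELF definitions: ELFMAG, EI_CLASS, ... */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf starts with the identification
 * bytes of a 64-bit, little-endian ELF file, 0 otherwise.
 * Assumption: only ELF64 little-endian files are accepted. */
int elf_ident_ok(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT)
        return 0;                          /* too short to hold e_ident */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;                          /* missing "\177ELF" magic */
    if (buf[EI_CLASS] != ELFCLASS64)
        return 0;                          /* not a 64-bit file */
    if (buf[EI_DATA] != ELFDATA2LSB)
        return 0;                          /* not little-endian */
    return buf[EI_VERSION] == EV_CURRENT;  /* identification version 1 */
}
```

On an x86-64 system, calling elf_ident_ok on the first 16 bytes of /bin/mkdir would be expected to return 1.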

The structure representing the ELF64 header can be found in the kernel source code as elf64_hdr:

typedef struct elf64_hdr {
    unsigned char    e_ident[EI_NIDENT];
    Elf64_Half e_type;
    Elf64_Half e_machine;
    Elf64_Word e_version;
    Elf64_Addr e_entry;
    Elf64_Off e_phoff;
    Elf64_Off e_shoff;
    Elf64_Word e_flags;
    Elf64_Half e_ehsize;
    Elf64_Half e_phentsize;
    Elf64_Half e_phnum;
    Elf64_Half e_shentsize;
    Elf64_Half e_shnum;
    Elf64_Half e_shstrndx;
} Elf64_Ehdr;

This structure is defined in elf.h.
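Because the header has a fixed layout, a reader can copy the first 64 bytes of the file straight into the structure. The sketch below does this with memcpy (avoiding unaligned access) and sanity-checks the entry sizes. It assumes glibc's <elf.h>, whose Elf64_Ehdr has the same fields as the kernel's elf64_hdr, a file image already read into memory, and a file byte order matching the host's; read_ehdr is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the ELF64 header out of an in-memory file
 * image and checks a few fields. Returns 0 on success, -1 if the image is
 * too short, lacks the ELF magic, or reports unexpected entry sizes. */
int read_ehdr(const unsigned char *buf, size_t len, Elf64_Ehdr *out)
{
    if (len < sizeof(Elf64_Ehdr))
        return -1;
    memcpy(out, buf, sizeof *out);
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (out->e_ehsize != sizeof(Elf64_Ehdr))
        return -1;                         /* 64 bytes for ELF64 */
    if (out->e_phnum != 0 && out->e_phentsize != sizeof(Elf64_Phdr))
        return -1;                         /* 56 bytes per program header */
    if (out->e_shnum != 0 && out->e_shentsize != sizeof(Elf64_Shdr))
        return -1;                         /* 64 bytes per section header */
    return 0;
}
```

After a successful call, e_phoff/e_phnum and e_shoff/e_shnum give the location and length of the two index tables.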

Section index

All of the file's data is stored in sections, and sections are identified by their index in the section header table. Each section header table entry contains the following fields:

  • section name;
  • section type;
  • section flags (attributes);
  • memory address;
  • offset in the file;
  • section size;
  • link to another section;
  • extra information;
  • address alignment;
  • entry size, for sections that hold a table of fixed-size entries;

The corresponding structure in the Linux kernel, elf64_shdr, is as follows:

typedef struct elf64_shdr {
    Elf64_Word sh_name;
    Elf64_Word sh_type;
    Elf64_Xword sh_flags;
    Elf64_Addr sh_addr;
    Elf64_Off sh_offset;
    Elf64_Xword sh_size;
    Elf64_Word sh_link;
    Elf64_Word sh_info;
    Elf64_Xword sh_addralign;
    Elf64_Xword sh_entsize;
} Elf64_Shdr;

This structure is defined in elf.h.
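Note that sh_name is not a string but an offset into a string-table section, whose own index is given by e_shstrndx in the ELF header. A hedged sketch of the name lookup follows; it assumes glibc's <elf.h>, an in-memory file image, and e_shentsize equal to sizeof(Elf64_Shdr). section_name is a hypothetical helper, and the SHN_XINDEX escape used for very large section counts is ignored.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the name of section `idx`, or NULL if any
 * offset or index is out of range. The returned pointer points into buf. */
const char *section_name(const unsigned char *buf, size_t len, size_t idx)
{
    Elf64_Ehdr eh;
    Elf64_Shdr sh, strtab;

    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, buf, sizeof eh);
    if (idx >= eh.e_shnum || eh.e_shstrndx >= eh.e_shnum)
        return NULL;
    /* The whole section header table must fit inside the image. */
    if (eh.e_shoff > len || (size_t)eh.e_shnum * sizeof sh > len - eh.e_shoff)
        return NULL;

    memcpy(&sh, buf + eh.e_shoff + idx * sizeof sh, sizeof sh);
    memcpy(&strtab, buf + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof sh,
           sizeof strtab);

    /* sh_name is an offset into the section-name string table. */
    if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset
        || sh.sh_name >= strtab.sh_size)
        return NULL;
    const char *name = (const char *)buf + strtab.sh_offset + sh.sh_name;
    /* Require a NUL terminator inside the string table. */
    if (memchr(name, '\0', strtab.sh_size - sh.sh_name) == NULL)
        return NULL;
    return name;
}
```

Looping idx from 0 to e_shnum - 1 and printing section_name for each gives the Name column that readelf -S shows.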

Segment index

The sections of an executable file or shared library that are needed at run time are grouped into segments. The program header table is an array of structures, each describing one segment. The structure looks like this:

typedef struct elf64_phdr {
    Elf64_Word p_type;
    Elf64_Word p_flags;
    Elf64_Off p_offset;
    Elf64_Addr p_vaddr;
    Elf64_Addr p_paddr;
    Elf64_Xword p_filesz;
    Elf64_Xword p_memsz;
    Elf64_Xword p_align;
} Elf64_Phdr;

This structure is defined in elf.h.
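A program loader mainly cares about the PT_LOAD entries in this table. The following sketch walks the table using e_phoff and e_phnum from the ELF header and counts them. It assumes glibc's <elf.h>, an in-memory file image, and 56-byte entries (e_phentsize equal to sizeof(Elf64_Phdr)); count_load_segments is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts PT_LOAD entries in the program header table.
 * Returns -1 if the table does not fit inside the image. */
int count_load_segments(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (eh.e_phoff > len
        || (size_t)eh.e_phnum * sizeof(Elf64_Phdr) > len - eh.e_phoff)
        return -1;

    int loads = 0;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        /* Entry i starts at e_phoff + i * 56 bytes. */
        memcpy(&ph, buf + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_LOAD)
            loads++;
    }
    return loads;
}
```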

To sum up, the process of parsing an ELF file is roughly as follows:

1. Read the ELF header at the beginning of the file to obtain the program information and the starting positions and sizes of the Section index table and the Segment index table;

2. For an executable ELF file, read the Segment index table and traverse the elf64_phdr entries one by one to obtain each Segment's information: its starting position and size in the ELF file, and its corresponding memory location and memory size. Then load the data from the ELF file into the specified memory according to the Segment index to complete program loading;

3. For other types of ELF files, read the Section index table and traverse the elf64_shdr entries one by one to obtain each Section's information: its starting position and size in the ELF file, and its corresponding memory location. Then, as needed, read the Section contents according to the Section index and process them accordingly;
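Step 2 can be illustrated with a simplified model of loading one PT_LOAD segment: copy p_filesz bytes starting at p_offset in the file, then zero the remaining p_memsz - p_filesz bytes, which is where BSS comes from. This is only a sketch of the byte layout, not how the kernel does it: a real loader maps pages with mmap at p_vaddr and applies protections from p_flags. load_segment and its parameters are hypothetical, and the `mem` buffer merely stands in for the memory at p_vaddr.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies one PT_LOAD segment from an in-memory file
 * image into `mem`, zero-filling the part that has no file backing.
 * Returns 0 on success, -1 on inconsistent sizes. */
int load_segment(const unsigned char *file, size_t file_len,
                 const Elf64_Phdr *ph, unsigned char *mem, size_t mem_len)
{
    if (ph->p_type != PT_LOAD || ph->p_filesz > ph->p_memsz)
        return -1;
    if (ph->p_memsz > mem_len)
        return -1;                    /* destination too small */
    if (ph->p_offset > file_len || ph->p_filesz > file_len - ph->p_offset)
        return -1;                    /* segment data runs past end of file */

    /* File-backed part: .text, .data, and so on. */
    memcpy(mem, file + ph->p_offset, ph->p_filesz);
    /* Memory-only part: e.g. .bss, which must start out zeroed. */
    memset(mem + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
    return 0;
}
```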

We can use the readelf tool to view the structure of an ELF file. Take the familiar mkdir program as an example, as shown below:

Using the -h option prints the ELF header. You can see that the file is an ELF64 executable and that the program entry address is 0x3700.

The Program header table starts 64 bytes into the file, each Segment index entry is 56 bytes long, and there are 13 Segments in total.

The Section header table starts 66112 bytes into the file, each Section index entry is 64 bytes long, and there are 31 entries in total.
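These header values are enough to work out where each index table ends: start offset plus entry count times entry size. The helper below is a hypothetical illustration of that arithmetic, not part of readelf.

```c
#include <stdint.h>

/* Hypothetical helper: exclusive end offset of a table that starts at
 * `off` and holds `num` entries of `entsize` bytes, e.g. from
 * e_phoff/e_phnum/e_phentsize or e_shoff/e_shnum/e_shentsize. */
uint64_t table_end(uint64_t off, uint64_t num, uint64_t entsize)
{
    return off + num * entsize;
}
```

With the mkdir values, table_end(64, 13, 56) gives 792, so the program header table occupies bytes 64 to 791, and table_end(66112, 31, 64) gives 68096. GNU toolchains usually place the section header table at the end of the file, so this suggests the file is about 68096 bytes long; that is an inference, not something readelf -h reports.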

Using the -S option prints the entire Section header table. The column headings in that output mean the following:

Name: the name of the Section;

Type: Section type;

Address: The memory starting address after the Section is loaded into memory;

Offset: The offset of this Section's data relative to the beginning of the file in the ELF file;

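Size: The size of this Section in bytes (for a SHT_NOBITS Section such as .bss, this is its size in memory; it occupies no space in the file);

EntSize: For a Section that holds a table of fixed-size entries (such as a symbol table), the size of each entry; 0 otherwise;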

Flags:*****

Link:*****

Info:*****

Align: Data alignment size.
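The Flags column packs bits of sh_flags into letters, and readelf prints a key to these letters below the -S listing. The hypothetical function below decodes a subset of those bits using glibc's <elf.h> constants; the letters follow readelf's key for these particular bits, while all other bits (I, L, G, T, and so on) are deliberately ignored in this sketch.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: writes letters for a subset of sh_flags bits into
 * `out`, which must hold at least 6 bytes.
 * W = SHF_WRITE, A = SHF_ALLOC, X = SHF_EXECINSTR,
 * M = SHF_MERGE, S = SHF_STRINGS. Other bits are not decoded. */
void section_flags_str(Elf64_Xword flags, char *out)
{
    size_t n = 0;

    if (flags & SHF_WRITE)     out[n++] = 'W';  /* writable at run time */
    if (flags & SHF_ALLOC)     out[n++] = 'A';  /* occupies memory */
    if (flags & SHF_EXECINSTR) out[n++] = 'X';  /* contains code */
    if (flags & SHF_MERGE)     out[n++] = 'M';  /* mergeable data */
    if (flags & SHF_STRINGS)   out[n++] = 'S';  /* NUL-terminated strings */
    out[n] = '\0';
}
```

With this decoding, a code Section such as .text would show "AX" and a writable data Section such as .data would show "WA".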

Using the -l option prints the entire Program header table. The column headings in the upper part of the output mean the following:

Type: the Segment type;

Offset: offset position in the ELF file;

VirtAddr: The starting virtual memory address of the Segment after it is loaded into memory;

PhysAddr: The starting physical memory address of the Segment after it is loaded into memory;

FileSiz: The size of the Segment in the ELF file;

MemSiz: The size of the Segment after it is loaded into memory;

Flags:*****;

Align: data alignment size;
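Similarly, p_flags holds the Segment's access permissions. The hypothetical helper below renders them in the three-character R/W/E style readelf uses for this column, with a space for each absent permission; it assumes glibc's <elf.h> constants PF_R, PF_W and PF_X.

```c
#include <elf.h>

/* Hypothetical helper: renders p_flags as "RWE"-style text.
 * PF_R -> 'R', PF_W -> 'W', PF_X -> 'E'; absent bits become spaces.
 * `out` must hold at least 4 bytes. */
void segment_flags_str(Elf64_Word flags, char out[4])
{
    out[0] = (flags & PF_R) ? 'R' : ' ';
    out[1] = (flags & PF_W) ? 'W' : ' ';
    out[2] = (flags & PF_X) ? 'E' : ' ';
    out[3] = '\0';
}
```

For example, a code Segment with PF_R | PF_X would be rendered as "R E".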

The lower part of the output (Section to Segment mapping) shows which Sections each Segment contains.
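One way to compute such a mapping is to check whether each allocated Section's address range lies inside a Segment's memory range. The sketch below is a deliberately simplified version of that check (hypothetical name section_in_segment, glibc's <elf.h>); readelf's actual rule also considers file offsets, TLS, and empty Sections, so results can differ in edge cases.

```c
#include <elf.h>

/* Hypothetical, simplified check: returns 1 if an allocated Section's
 * [sh_addr, sh_addr + sh_size) range lies within the Segment's
 * [p_vaddr, p_vaddr + p_memsz) range, 0 otherwise. */
int section_in_segment(const Elf64_Shdr *sh, const Elf64_Phdr *ph)
{
    if (!(sh->sh_flags & SHF_ALLOC))
        return 0;  /* not loaded, so not part of any memory image */
    return sh->sh_addr >= ph->p_vaddr
        && sh->sh_addr + sh->sh_size <= ph->p_vaddr + ph->p_memsz;
}
```

Because the check uses p_memsz rather than p_filesz, a .bss Section that lies past the file-backed bytes of the data Segment is still reported as part of that Segment.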

For the process by which Linux loads ELF files, please refer to this article: ELF file loading process - Zhihu


Source: blog.csdn.net/ctbinzi/article/details/130281756