In-depth understanding of the ELF file format (1)

Foreword

ELF stands for Executable and Linking Format, a common binary file format on Linux. In Android NDK development, almost everything involves ELF, for example:

  • The .o object files compiled from C/C++ sources are ELF files;
  • Dynamic libraries (.so) and executables are also ELF files;
  • Techniques such as string erasure, packing, and repair of dynamic libraries all depend on ELF.

While learning the ELF format, I wrote an Android project that parses ELF files in both C and Java to deepen my understanding. Interested readers can check the link directly.

ELF file format

As mentioned earlier, ELF stands for Executable and Linking Format. The words Executable and Linking in the name indicate that ELF has two important features.

  • Executable: ELF files take part in program execution, including running binary programs and loading dynamic libraries (.so files).

  • Linking: linkable. ELF files take part in the compilation and linking process.

[Figure: ELF file views (Linking View and Execution View)]

These two features correspond to two views from which the ELF format can be analyzed: the Linking View and the Execution View. Personally, I feel that the two views simply describe how an ELF file is laid out. The Program Header Table is not a separate structure with a header of its own: it is just the array of program header entries (program_table_element) whose offset, entry size, and count are recorded in the ELF header. In that sense the name is a conceptual one. The same is true of the Section Header Table.

ELF Header

ELF supports both 32-bit and 64-bit architectures; the 64-bit variant uses wider field types than the 32-bit one. The following analysis is based on the 32-bit ELF specification.

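#define EI_NIDENT 16
typedef struct elf32_hdr {
  unsigned char e_ident[EI_NIDENT]; /* magic number and file identification */
  Elf32_Half e_type;                /* object file type */
  Elf32_Half e_machine;             /* target CPU architecture */
  Elf32_Word e_version;             /* ELF version */
  Elf32_Addr e_entry;               /* virtual address of the entry point */
  Elf32_Off e_phoff;                /* file offset of the program header table */
  Elf32_Off e_shoff;                /* file offset of the section header table */
  Elf32_Word e_flags;               /* processor-specific flags */
  Elf32_Half e_ehsize;              /* size of this ELF header */
  Elf32_Half e_phentsize;           /* size of one program header entry */
  Elf32_Half e_phnum;               /* number of program header entries */
  Elf32_Half e_shentsize;           /* size of one section header entry */
  Elf32_Half e_shnum;               /* number of section header entries */
  Elf32_Half e_shstrndx;            /* index of the section name string table */
} Elf32_Ehdr;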

The fields in the snippet above are explained as follows:

e_ident[EI_NIDENT] e_ident is a 16-byte array:

  • e_ident[0-3] holds the magic number, the bytes 0x7f, 'E', 'L', and 'F'. It is generally used to check whether a file is an ELF file.
  • e_ident[EI_CLASS=4] indicates whether the ELF file is a 32-bit file (value 1) or a 64-bit file (value 2).
  • e_ident[EI_DATA=5] indicates whether the byte order of the data in the ELF file is little-endian (value 1) or big-endian (value 2).
  • e_ident[EI_VERSION=6] indicates the version of the ELF file; the value is normally 1.
  • e_ident[EI_OSABI=7] and e_ident[EI_ABIVERSION=8] identify the target OS ABI and its version. The remaining bytes, e_ident[9-15], are padding and are set to zero.
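The checks described in the list above can be sketched in a few lines of C. This is only an illustration, assuming the first 16 bytes of the file have already been read into a buffer; the `ident_info` type and the `inspect_ident` function are hypothetical names made up for this example, not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Indices into e_ident, as defined by the ELF specification. */
enum { EI_CLASS = 4, EI_DATA = 5, EI_VERSION = 6, EI_NIDENT = 16 };

/* Hypothetical result type used only by this sketch. */
typedef struct {
    int is_elf;        /* 1 if the magic number matches, otherwise 0 */
    int bits;          /* 32, 64, or 0 if EI_CLASS is not recognized */
    int little_endian; /* 1 little-endian, 0 big-endian, -1 not recognized */
    int version;       /* value of e_ident[EI_VERSION] */
} ident_info;

/* Inspect the identification bytes at the start of a file image. */
static ident_info inspect_ident(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
    ident_info info = {0, 0, -1, 0};

    /* Too short, or the magic number does not match: not an ELF file. */
    if (buf == NULL || len < (size_t)EI_NIDENT || memcmp(buf, magic, 4) != 0)
        return info;

    info.is_elf = 1;
    info.bits = buf[EI_CLASS] == 1 ? 32 : buf[EI_CLASS] == 2 ? 64 : 0;
    info.little_endian = buf[EI_DATA] == 1 ? 1 : buf[EI_DATA] == 2 ? 0 : -1;
    info.version = buf[EI_VERSION];
    return info;
}
```

A caller would typically read the first 16 bytes of a file, pass them to `inspect_ident`, and only continue parsing when `is_elf` is 1 and the class and byte order are ones it supports.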

e_type This field is 2 bytes long and indicates the type of the ELF file (for example relocatable file, executable, or shared object).

e_machine This field is 2 bytes long and indicates the CPU architecture the ELF file targets.

e_version This field takes the same value as e_ident[EI_VERSION=6].

e_entry This field holds the virtual address of the program entry point. When the ELF file is an executable, the operating system jumps to e_entry after loading the file and starts executing the program there.

e_phoff ph is short for program header. e_phoff records the file offset of the first entry of the program header table. Note that the entries of the table are stored contiguously.

e_shoff sh is short for section header. Similar to e_phoff: if the ELF file contains sections, this field records the file offset of the first entry of the section header table.

e_flags indicates processor-specific flags.

e_ehsize eh is short for ELF header; this field gives the size of the ELF header.

e_phentsize indicates the size of one program header entry.

e_phnum indicates the number of program header entries.

e_shentsize indicates the size of one section header entry.

e_shnum indicates the number of section header entries.

e_shstrndx is the index, within the section header table, of the section that holds the section name string table. Each section's sh_name is used as an offset into that string table. If the file has no section name string table, this value is set to SHN_UNDEF.
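To make the role of e_phoff, e_phentsize, and e_phnum concrete, here is a minimal sketch that decodes the table-related fields from a raw 32-bit, little-endian ELF header and computes where a given program header entry sits in the file. The byte offsets (28, 32, 42 to 50) follow from the field order and sizes of Elf32_Ehdr shown above; the `table_layout` type and the function names are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers byte by byte, so the host's byte order
   and alignment rules do not matter. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical summary of where the two header tables live. */
typedef struct {
    uint32_t phoff, shoff;
    uint16_t phentsize, phnum, shentsize, shnum, shstrndx;
} table_layout;

/* Decode the table-related fields of a 32-bit little-endian ELF header.
   Returns 0 on success, -1 if the buffer is shorter than the 52-byte header. */
static int read_table_layout(const unsigned char *buf, size_t len,
                             table_layout *out)
{
    if (buf == NULL || out == NULL || len < 52)
        return -1;
    out->phoff     = le32(buf + 28); /* e_phoff     */
    out->shoff     = le32(buf + 32); /* e_shoff     */
    out->phentsize = le16(buf + 42); /* e_phentsize */
    out->phnum     = le16(buf + 44); /* e_phnum     */
    out->shentsize = le16(buf + 46); /* e_shentsize */
    out->shnum     = le16(buf + 48); /* e_shnum     */
    out->shstrndx  = le16(buf + 50); /* e_shstrndx  */
    return 0;
}

/* File offset of the i-th program header entry: entries are contiguous,
   so it is simply e_phoff + i * e_phentsize. */
static uint64_t phdr_offset(const table_layout *t, uint16_t i)
{
    return (uint64_t)t->phoff + (uint64_t)i * t->phentsize;
}
```

Reading the fields one byte at a time is a deliberate choice here: casting the buffer to a struct pointer would tie the sketch to the host's byte order and alignment.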

Program Header

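typedef struct elf32_phdr {
  Elf32_Word p_type;   /* segment type */
  Elf32_Off p_offset;  /* offset of the segment in the file */
  Elf32_Addr p_vaddr;  /* virtual address of the segment in memory */
  Elf32_Addr p_paddr;  /* physical address (usually unused) */
  Elf32_Word p_filesz; /* size of the segment in the file */
  Elf32_Word p_memsz;  /* size of the segment in memory */
  Elf32_Word p_flags;  /* segment flags */
  Elf32_Word p_align;  /* required alignment */
} Elf32_Phdr;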

In the Execution View, an ELF file must contain program header entries. The fields of a program header are as follows:

p_type The type of the segment (program_table_element).
p_offset The offset of the segment from the beginning of the file.
p_vaddr The virtual address at which the segment is to be loaded. For a dynamic library, this address is relative to the base address chosen at load time.
p_paddr The physical address of the segment. For executables and dynamic libraries, this value has no meaning.
p_filesz The size the segment occupies in the file. Its value can be 0, because a segment is composed of sections and some sections take up no space in the file.
p_memsz The size the segment occupies in memory. Its value can be 0.
p_flags The segment flags.
p_align The alignment the segment must satisfy once it is loaded into memory.
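One way to see how p_offset, p_vaddr, p_filesz, and p_memsz relate is to translate a virtual address back to a file offset. The sketch below assumes the program header entries have already been decoded into a plain struct; the `phdr32` type and both function names are hypothetical, and the value 1 for PT_LOAD (a loadable segment) comes from the ELF specification.

```c
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1u /* segment type of a loadable segment */

/* Decoded program header entry, with the same field names as Elf32_Phdr. */
typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} phdr32;

/* Translate a virtual address into a file offset using the loadable segments.
   Returns 0 and stores the offset in *off on success. Returns -1 if no
   loadable segment contains the address, or if the address lies in the part
   of a segment that has no bytes in the file (beyond p_filesz). */
static int vaddr_to_offset(const phdr32 *ph, size_t n, uint32_t vaddr,
                           uint32_t *off)
{
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD || vaddr < ph[i].p_vaddr)
            continue;
        uint32_t delta = vaddr - ph[i].p_vaddr;
        if (delta < ph[i].p_filesz) {
            *off = ph[i].p_offset + delta;
            return 0;
        }
    }
    return -1;
}

/* Bytes that exist only in memory: when p_memsz is larger than p_filesz,
   the extra bytes are not stored in the file. */
static uint32_t memory_only_size(const phdr32 *ph)
{
    return ph->p_memsz > ph->p_filesz ? ph->p_memsz - ph->p_filesz : 0;
}
```

For a segment with p_vaddr 0xA000, p_offset 0x1000, and p_filesz 0x100, the address 0xA020 maps to file offset 0x1020, while 0xA200 has no file offset because it falls past p_filesz.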

Section Header

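typedef struct elf32_shdr {
  Elf32_Word sh_name;      /* offset of the section name in shstrtab */
  Elf32_Word sh_type;      /* section type */
  Elf32_Word sh_flags;     /* section attributes */
  Elf32_Addr sh_addr;      /* address in memory, if the section is loaded */
  Elf32_Off sh_offset;     /* offset of the section in the file */
  Elf32_Word sh_size;      /* size of the section */
  Elf32_Word sh_link;      /* index of a related section, depends on sh_type */
  Elf32_Word sh_info;      /* extra information, depends on sh_type */
  Elf32_Word sh_addralign; /* required address alignment */
  Elf32_Word sh_entsize;   /* entry size, if the section holds a table */
} Elf32_Shdr;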

The code snippet above shows the data structure of a section header entry.
sh_name This field specifies the name of the section. ELF has a section (the Section Header String Table section, abbreviated as shstrtab) that stores section names. sh_name is an offset into shstrtab (it can be thought of as a subscript into shstrtab) at which the string holding the section name starts.
sh_type The type of the section. Different types of section store different content.
sh_flags The attributes of the section.
sh_addr If the section is loaded into memory (executable or dynamic library), sh_addr is the address at which it should be loaded.
sh_offset The offset of the section from the beginning of the file.
sh_size The size of the section itself.
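As an illustration of how sh_name and shstrtab work together, the following sketch looks up a section by name. It assumes the section headers and the contents of shstrtab have already been read into memory; the `shdr32_lite` type (only the fields used here) and the `find_section` function are hypothetical names for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced section header holding only the fields this sketch needs. */
typedef struct {
    uint32_t sh_name;   /* offset of the name in shstrtab */
    uint32_t sh_type;
    uint32_t sh_offset;
    uint32_t sh_size;
} shdr32_lite;

/* Return the index of the section called `name`, or -1 if none matches.
   `strtab` points to the contents of shstrtab, `strtab_size` is its size. */
static int find_section(const shdr32_lite *sh, size_t n,
                        const char *strtab, size_t strtab_size,
                        const char *name)
{
    size_t want = strlen(name);

    for (size_t i = 0; i < n; i++) {
        uint32_t off = sh[i].sh_name;
        if (off >= strtab_size)
            continue; /* sh_name points outside the table: skip it */
        /* The name plus its terminating \0 must fit inside the table.
           Comparing want + 1 bytes also checks the terminator, so ".tex"
           does not match ".text". */
        if (want < strtab_size - off &&
            memcmp(strtab + off, name, want + 1) == 0)
            return (int)i;
    }
    return -1;
}
```

With a table such as `"\0.text\0.data\0.shstrtab"`, a section whose sh_name is 7 is named `.data`, so `find_section(..., ".data")` returns that section's index.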

String table

As mentioned above, ELF has a section (the Section Header String Table section, abbreviated as shstrtab) that stores section names. This section is a string table. In object files, the strings in a string table are usually symbol names or section names. When another part of the object file needs to reference a string, it only has to give the offset of that string within the string table.

The first string in the string table (index 0) is always the empty string, so it can be used to represent an empty name or no name. Consequently, the first byte of the string table is \0. Since every string is null-terminated, the last byte of the string table must also be \0.

A string table can also be empty, containing no strings at all, in which case the sh_size in its section header must be zero.

[Figure: string table]

As can be seen from the figure above, an index into the string table can reference a complete string, that is, everything up to the terminating \0, or only the tail of a string by pointing into its middle.
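The indexing rule can be sketched with a tiny helper. This is a minimal example under the assumption that the whole string table is already in memory; `strtab_get` is a hypothetical name, and the sample table in the usage note is made up for illustration.

```c
#include <stddef.h>

/* Return the string that starts at byte `index` of the string table,
   or NULL if the index is out of range or the table does not end in \0
   (in which case a string could run past the end of the table). */
static const char *strtab_get(const char *tab, size_t size, size_t index)
{
    if (tab == NULL || size == 0 || index >= size || tab[size - 1] != '\0')
        return NULL;
    return tab + index;
}
```

Given the 15-byte table `"\0Variable\0name"` (including the final \0), index 0 yields the empty string, index 1 yields `Variable`, index 5 yields `able` (the tail of the same string), and index 10 yields `name`. Because a reference is just a starting offset, two names can share storage when one is a suffix of the other.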

Summary

This article walks through the ELF format specification without going into every detail. The main purpose is to give a rough picture of the ELF file layout first, so that the individual topics can be filled in and refined later. The main reference is elf - format of Executable and Linking Format (ELF) files. Readers who want to go deeper can consult it.

Origin blog.csdn.net/HongHua_bai/article/details/122288032