Notes on "Programmer's Self-Cultivation: Linking, Loading, and Libraries": executable file loading and processes

An executable file can be executed by the CPU only after it has been loaded into memory.

1. Process virtual address space

What is the difference between a program and a process? A program (or, in the narrow sense, an executable file) is a static concept: it is a file containing a collection of precompiled instructions and data. A process is a dynamic concept: it is a program in the course of running. This is also part of why dynamic libraries are often called runtimes (Runtime).

Once a program is running, it has its own independent virtual address space (Virtual Address Space). The size of this virtual address space is determined by the hardware platform, specifically by the number of address bits of the CPU. The hardware sets the theoretical upper limit of the address space. For example, a 32-bit platform gives a virtual address space from 0 to 2^32 - 1, that is 0x00000000 to 0xFFFFFFFF, the 4 GB of virtual space we usually speak of. A 64-bit platform has 64-bit addressing capability, so its virtual address space reaches 2^64 bytes, that is 0x0000000000000000 to 0xFFFFFFFFFFFFFFFF, 17179869184 GB in total.

From a program's point of view, the size of a C pointer tells us the size of the virtual address space. In general, a C pointer has as many bits as the virtual address space: 32 bits (4 bytes) on a 32-bit platform, and 64 bits (8 bytes) on a 64-bit platform. In some special cases this rule does not hold.

The discussion below focuses on the 32-bit address space; the 64-bit case is similar.

So can our program freely use the whole 4 GB of virtual space on a 32-bit platform? No. A program runs under the supervision of the operating system, which, in order to monitor the running program among other purposes, keeps control of the process's virtual space. A process may use only the addresses the operating system has allocated to it. If it accesses an address it has not been granted, the operating system catches the access, treats it as an illegal operation, and forcibly terminates the process. The Windows message "The process needs to close due to an illegal operation" and the Linux message "Segmentation fault" are very often caused by a process accessing an unauthorized address.

PAE (Physical Address Extension): in hardware terms, the original 32 address lines can address at most 4 GB of physical memory. After extending the address lines to 36, Intel changed the page mapping scheme so that the new mapping can reach more physical memory. Intel calls this address extension mode PAE. Ordinary applications normally do not notice the extended physical address space, because it is mainly the operating system's business; the application still sees only a 32-bit virtual address space. How, then, can an application use the memory beyond the conventional range? A common approach is for the operating system to provide a windowed mapping that maps the extra memory into the process address space piece by piece. On Windows this way of accessing memory is called AWE (Address Windowing Extensions); Linux and other UNIX-like systems use the mmap() system call.

2. Loading methods

The instructions and data a program needs must be in memory for the program to run properly. The simplest approach is to load all of the program's instructions and data into memory so that the program runs smoothly; this is the simplest static loading method. In many cases, however, the program needs more memory than the physical memory available. When memory is insufficient, the fundamental solution is to add memory. But programs exhibit locality when they run, so we can keep the most frequently used parts of a program resident in memory and leave the less frequently used data on disk. This is the basic principle of dynamic loading.

Overlay loading (Overlay) and page mapping (Paging) are two typical dynamic loading methods. They follow essentially the same idea: both exploit the locality of programs. The idea of dynamic loading is that whichever module the program is using is loaded into memory, and modules that are not in use are not loaded and stay on disk.

Overlay loading: it was widely used before virtual memory was invented and is now almost obsolete. Overlay loading pushes the job of making the most of limited memory onto the programmer, who must manually divide the program into several blocks when writing it and then write a small piece of auxiliary code to manage when each module should reside in memory and when it should be replaced. This auxiliary code is the so-called overlay manager (Overlay Manager). Overlay loading is a typical trade of time for space.

Page mapping: it is part of the virtual memory mechanism and was born together with virtual memory. Like overlay loading, page mapping does not load all of a program's data and instructions into memory at once. Instead it divides the data and instructions in memory and on disk into units called pages (Page), and all subsequent loading and operations work in units of pages. The page size is set by the hardware, for example 4096 bytes, 8192 bytes, 2 MB, or 4 MB.

3. Loading an executable file from the operating system's point of view

Process creation: from the operating system's point of view, the most important feature of a process is that it has its own independent virtual address space, which distinguishes it from other processes. In many cases, executing a program goes together with creating a new process: create a process, then load the corresponding executable file and execute it. With virtual memory, this requires three things at the start:

(1) Create an independent virtual address space. A virtual space is realized by a set of page mapping functions that map each page of the virtual space to the physical space. Creating a virtual space therefore does not create any space; it creates the data structures the mapping functions need. On i386 Linux, creating a virtual address space really only allocates a page directory (Page Directory). The page mappings need not even be set up at this point; they are filled in later, when the program takes page faults.

(2) Read the executable file header and establish the mapping between the virtual space and the executable file. The page mapping functions of the previous step map the virtual space to physical memory; this step maps the virtual space to the executable file. When the program takes a page fault, the operating system allocates a physical page from physical memory, reads the missing page from disk into it, and then sets up the mapping between the faulting virtual page and the physical page, so that the program can continue running. When the operating system catches the page fault, it must know where in the executable file the page the program currently needs is located. That is the mapping between the virtual space and the executable file. In a sense this is the most important step of the whole loading process, and it is "loading" in the traditional sense.

Because an executable file is in effect mapped into the virtual space when it is loaded, an executable file is often also called an image file (Image).

On Linux, a segment in a process's virtual space is called a virtual memory area (VMA, Virtual Memory Area); on Windows it is called a virtual section (Virtual Section). They are the same concept.

(3) Set the CPU instruction register to the entry point of the executable file and start running. The operating system hands control to the process by setting the CPU's instruction register, and the process starts executing. The entry point of the executable file is the entry address stored in the ELF file header.

Page fault (Page Fault): as the process executes, page faults keep occurring, and the operating system allocates the corresponding physical pages to the process to satisfy its needs.

4. Layout of the process virtual memory space

ELF linking view and execution view: in a normal process, the executable file contains more than a code section; there are also a data section, BSS, and so on, so more than one section is mapped into the process's virtual space. As the number of sections grows, space is wasted. An ELF file is mapped in units of the system page size, so the mapped length of each section must be an integral multiple of the page size; if it is not, the leftover part still occupies a whole page. An ELF file often has a dozen or more sections, so the waste of memory is easy to imagine.

The permissions of ELF sections usually come in only a few combinations, basically three: (1) readable and executable sections, represented by the code section; (2) readable and writable sections, represented by the data section and the BSS section; (3) read-only sections, represented by the read-only data section. Sections with the same permissions are merged and mapped together as one segment.

ELF executable files introduce a concept called "Segment". A "Segment" contains one or more "Sections" with similar attributes. From the linking point of view, an ELF file is stored by "Section"; from the loading point of view, an ELF file can be divided by "Segment". The "Segment" is in effect a re-division of the ELF sections from the loading point of view. When linking object files into an executable, the linker tries to place sections with the same permission attributes in the same space. For example, readable and executable sections are put together, typically the code sections; readable and writable sections are put together, typically the data sections. In ELF, such a group of adjacent sections with similar attributes is called a "Segment", and the system maps the executable file by "Segment" rather than by "Section".

The following is a small example program (SectionMapping.c):

#include <unistd.h>	/* declares sleep() */

int main()
{
	while (1) {
		sleep(1000);
	}

	return 0;
}

Compile it with static linking into the executable SectionMapping.elf by running:

gcc -static SectionMapping.c -o SectionMapping.elf

readelf shows that the executable SectionMapping.elf has 31 sections (Section) in total, as shown below:

The structure describing the attributes of "Sections" is called the section header table, and the structure describing "Segments" is called the program header (Program Header). It describes how the operating system should map the ELF file into the process's virtual space. The result is shown below:

As you can see, this executable has six Segments. From the loading point of view we currently care only about the two Segments of type "LOAD", because only they need to be mapped; the others, such as "NOTE", "TLS", and "GNU_STACK", play a supporting role during loading. All "Sections" with the same attributes are grouped into one "Segment" and mapped to the same VMA. In short, "Segment" and "Section" divide the same ELF file from different points of view. In ELF these are called views (View): seen by "Section", an ELF file is in its linking view (Linking View); seen by "Segment", it is in its execution view (Execution View). When we talk about ELF loading, the word "segment" refers specifically to "Segment"; in other contexts it refers to "Section".

An ELF executable has a dedicated data structure called the program header table (Program Header Table) that stores the "Segment" information. ELF object files do not need to be loaded, so they have no program header table; ELF executables and shared libraries do. Like the section header table, the program header table is an array of structures. The structure, Elf32_Phdr or Elf64_Phdr (declared in /usr/include/elf.h), is as follows:

/* Program segment header.  */
typedef struct
{
  Elf32_Word    p_type;                 /* Segment type */
  Elf32_Off     p_offset;               /* Segment file offset */
  Elf32_Addr    p_vaddr;                /* Segment virtual address */
  Elf32_Addr    p_paddr;                /* Segment physical address */
  Elf32_Word    p_filesz;               /* Segment size in file */
  Elf32_Word    p_memsz;                /* Segment size in memory */
  Elf32_Word    p_flags;                /* Segment flags */
  Elf32_Word    p_align;                /* Segment alignment */
} Elf32_Phdr;

typedef struct
{
  Elf64_Word    p_type;                 /* Segment type */
  Elf64_Word    p_flags;                /* Segment flags */
  Elf64_Off     p_offset;               /* Segment file offset */
  Elf64_Addr    p_vaddr;                /* Segment virtual address */
  Elf64_Addr    p_paddr;                /* Segment physical address */
  Elf64_Xword   p_filesz;               /* Segment size in file */
  Elf64_Xword   p_memsz;                /* Segment size in memory */
  Elf64_Xword   p_align;                /* Segment alignment */
} Elf64_Phdr;

The members of the Elf32_Phdr or Elf64_Phdr structure correspond one to one with the columns of the program header table printed by "readelf -l". The basic meaning of each member is shown in the following table:

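For a "Segment" of type "LOAD", p_memsz must not be smaller than p_filesz; otherwise the file makes no sense. If p_memsz is larger than p_filesz, the "Segment" occupies more space in memory than it does in the file, and the extra part is filled entirely with zeros. The benefit is that when building an ELF executable there is no need for a separate BSS "Segment": the p_memsz of the data "Segment" can simply be enlarged, and the extra part is the BSS. The only difference between the data section and BSS is that the data section is initialized from the file, while the BSS is initialized to all zeros. This is why the earlier example shows only two "LOAD" segments rather than three: the BSS has been merged into the data segment.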

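Heap and stack: in the operating system, VMAs are used not only to map the "Segments" of the executable file; the operating system also uses VMAs to manage the process's address space as a whole. A running process also needs a stack (Stack), a heap (Heap), and other spaces, and these too appear in the process's virtual space as VMAs. In most cases the stack and the heap of a process each have a corresponding VMA.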

On Linux, the layout of a process's virtual space can be inspected under "/proc", as shown in the figure below:

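In the output above, the first column is the address range of the VMA. The second column is its permissions: "r" means readable, "w" writable, "x" executable, "p" private (COW, Copy on Write), and "s" shared. The third column is the offset of the VMA's Segment within the image file. The fourth column gives the major and minor device numbers of the device holding the image file. The fifth column is the inode number of the image file. The last column is the path of the image file.

The process has 8 VMAs, and only the first two map the two Segments of the executable file. For the other six, the major and minor device numbers and the inode number are all 0, which means they are not mapped to a file; such a VMA is called an anonymous virtual memory area (Anonymous Virtual Memory Area). Two of these areas are the heap (Heap) and the stack (Stack), with sizes (0x0122d000-0x0120a000)/1024 = 140 KB and (0x7ffc01c44000-0x7ffc01c23000)/1024 = 132 KB respectively. These two VMAs exist in almost every process. malloc(), the memory allocation function most commonly used in C programs, allocates from the heap, which is managed by the system library. Every thread has its own stack; in a single-threaded program the stack VMA belongs entirely to that thread. There is also a very special VMA called "vdso". It is a module provided by the kernel (on the 32-bit Linux systems the book describes, its address lies in the kernel part of the address space), and the process can communicate with the kernel by accessing this VMA.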

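The concept of the process virtual address space: the operating system manages a process's virtual space by dividing it into VMAs. The basic rule is that regions with the same permission attributes and the same image file are mapped as one VMA. A process can basically be divided into the following kinds of VMA:

(1). Code VMA: read-only and executable; has an image file.

(2). Data VMA: readable, writable, and executable; has an image file.

(3). Heap VMA: readable, writable, and executable; no image file, anonymous, can grow upward.

(4). Stack VMA: readable and writable, not executable; no image file, anonymous, can grow downward.

When we talk about the "Segments" of the process virtual space, we basically mean the kinds of VMA above.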

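Maximum heap allocation: on 32-bit Linux, 3 GB of the virtual address space belongs to the process itself (2 GB by default on Windows). Programs usually request address space with malloc(). The maximum amount malloc() can allocate is affected by the operating system version, the size of the program itself, the number and size of the dynamic/shared libraries used, the size of the program's stack, and so on. The maximum may even differ from run to run, because some operating systems use a technique called address space layout randomization (mainly for security, to protect programs from malicious attacks), which reduces the heap space available to the process.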

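Segment address alignment: an executable file is ultimately loaded and run by the operating system, and loading is generally done through the page mapping mechanism of virtual memory. In mapping, the page is the smallest unit. On Intel 80x86 processors the default page size is 4096 bytes. That is, to map a piece of physical memory into the process's virtual address space, the length of the region must be an integral multiple of 4096, and its start address in both physical memory and the virtual address space must be an integral multiple of 4096. Because of these constraints on length and start address, an executable file should arrange its space and addresses as economically as it can. In an ELF file, for every loadable "Segment", the remainder of p_vaddr divided by the alignment equals the remainder of p_offset divided by the alignment.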

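Process stack initialization: when a process starts, it needs to know something about the environment it runs in, most basically the system environment variables and the process's arguments. A common approach is for the operating system to store this information on the stack in the process's virtual space (the stack VMA) before the process starts.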

5. A brief introduction to how the Linux kernel loads ELF files

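When we type a command in bash on Linux to run an ELF program, at the user level the bash process first calls the fork() system call to create a new process. The new process then calls the execve() system call to execute the specified ELF file, while the original bash process waits for the new process to finish and then goes back to waiting for user input. execve() is declared in /usr/include/unistd.h. Glibc wraps the execve() system call and provides five exec-family APIs in different forms: execl(), execlp(), execle(), execv(), and execvp(). They differ only in the form of their arguments, and all of them end up calling the execve() system call.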

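After entering the execve() system call, the Linux kernel starts the real loading work. In the kernel, the entry point for execve() is sys_execve(), which checks and copies some arguments and then calls do_execve(). do_execve() first looks up the file to be executed. If the file is found, it reads the first 128 bytes in order to determine the file format. The first few bytes of every executable file format are distinctive, especially the first 4 bytes, often called the magic number (Magic Number); checking the magic number determines the format and type of the file. For example, the first 4 bytes of an ELF executable are 0x7F, 'E', 'L', 'F', and the first 4 bytes of a Java class file are 0xCA, 0xFE, 0xBA, 0xBE ("cafebabe" in hexadecimal). If the file is a shell script or a script in an interpreted language such as perl or python, its first line is usually "#!/bin/sh", "#!/usr/bin/perl", or "#!/usr/bin/python". In that case the first two bytes, '#' and '!', form the magic number; once the system sees these two bytes, it parses the string that follows to determine the path of the interpreter.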

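After do_execve() has read the 128-byte file header, it calls search_binary_handler() to search for and match a suitable loading routine for the executable. Every executable format supported by Linux has a corresponding loading routine. search_binary_handler() determines the file format from the magic number in the header and calls the corresponding routine. For example, the loading routine for ELF executables is load_elf_binary(), the one for a.out executables is load_aout_binary(), and the one for executable scripts is load_script().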

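The main steps of load_elf_binary() are:

(1). Check the validity of the ELF executable, for example the magic number and the number of segments (Segment) in the program header table.

(2). Look for the ".interp" section used for dynamic linking and set the dynamic linker path.

(3). Map the ELF file according to the program header table, for example the code, data, and read-only data.

(4). Initialize the ELF process environment; for example, at process startup the EDX register should hold the address of DT_FINI.

(5). Change the return address of the system call to the entry point of the ELF executable. The entry point depends on how the program was linked: for a statically linked ELF executable, it is the address given by e_entry in the ELF file header; for a dynamically linked ELF executable, it is the dynamic linker.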

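When load_elf_binary() finishes and returns to do_execve(), which returns to sys_execve(), step 5 above has already changed the return address of the system call to the entry address of the loaded ELF program. So when the sys_execve() system call returns from kernel mode to user mode, the EIP register jumps directly to the entry address of the ELF program, the new program starts executing, and loading of the ELF executable is complete.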

6. Loading Windows PE files

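Loading a PE file differs somewhat from loading ELF. In a PE file, the start address of every section is a multiple of the page size; if the length of a section is not an integral multiple of the page size, it is rounded up to one when mapped. We can simply take it that in a 32-bit PE file the start address and length of every section are integral multiples of 4096 bytes. Because of this, mapping a PE file is much simpler than mapping ELF, since it does not have to deal with the section address alignment issues found in ELF, although it does waste some disk and memory space. A PE executable generally has few sections, unlike ELF, which often has more than a dozen "Sections" and has to merge them with the "Segment" concept for loading. When producing a PE executable, the linker usually merges sections as far as possible, so there are generally only a few: a code section, a data section, a read-only data section, BSS, and so on.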

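A very common term in PE is RVA (Relative Virtual Address), which plays a role similar to an offset within the file. It is an offset relative to the load base address of the PE file. For example, if a PE file is loaded at virtual address (VA) 0x00400000, then RVA 0x1000 refers to address 0x00401000. Every PE file has a target address (Target Address) at load time, which is the so-called base address (Base Address). Because PE files are designed to be loadable at any address, the base address is not fixed and may change from one load to the next. If the addresses in a PE file were absolute, they would all have to change whenever the base address changed. With RVAs, which are relative to the base address, the RVAs in the PE file stay the same no matter how the base address changes.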

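The process of loading a PE executable:

(1). Read the first page of the file, which contains the DOS header, the PE file header, and the section table.

(2). Check whether the target address is available in the process address space; if not, choose another load address. This is rarely an issue for an executable, because it is usually the first module loaded into the process, so the target address is unlikely to be occupied. It mainly concerns the loading of DLL files.

(3). Using the information in the section table, map every section of the PE file to its place in the address space.

(4). If the load address is not the target address, perform rebasing (Rebasing).

(5). Load all DLL files the PE file requires.

(6). Resolve all imported symbols in the PE file.

(7). Create the initial stack and heap according to the parameters specified in the PE header.

(8). Create the main thread and start the process.

In a PE file, the main loading-related information is contained in the PE optional header (PE Optional Header) and the section table.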

GitHub: https://github.com/fengbingchun/Messy_Test

Origin blog.csdn.net/fengbingchun/article/details/100803751