Eighth reading notes

  This section describes what the compiler does. One stage is parsing, which takes the tokens, applies operator precedence, and builds a syntax tree. The tree is similar to the binary expression tree we used in pair programming, which shows how widely the tree data structure is applied. Semantic analysis then annotates and refines the tree, but the compiler can only perform static semantic analysis; dynamic semantic errors do not appear until run time. So when a program fails while running, we can first ask which failures are dynamic semantic errors, such as division by zero or an out-of-bounds pointer.

  The compiler is divided into a front end and a back end. The front end generates machine-independent intermediate code, and the back end converts that intermediate code into target machine code. A cross-platform compiler can pair the same front end with several back ends, one per platform, which saves work.

  After the source code is compiled into object code, one problem remains: some addresses are not yet determined. The absolute addresses of global variables and functions defined in other modules can only be fixed at final link time, so the compiler compiles each source file into an unlinked object file, and the linker then links these object files into an executable file.

  In fact, the concept of linking has been around at least since the first high-level languages. The reason is this: if we fix the physical address of every subroutine up front and use those addresses for jumps within the program, then changing a single line of code shifts the addresses after it, and every affected jump target has to be recomputed. This recomputation is called address relocation. This trouble explains why, in the principles of microcomputers course, we used labels to stand for addresses.

  However, software today contains a very large amount of code, often divided into thousands of modules by function or property. We want to test and reuse these modules individually, so they are written and compiled separately. Combining them requires communication between modules, which takes the form of symbol references between modules, so linking is essentially a splicing process.

  Linking handles the parts of modules that refer to one another. Put simply, the linker's job is address adjustment at a higher level.

  When the compiler compiles a module, it leaves the addresses of functions and variables from other modules unresolved; filling in those addresses is left to the linker. This is the basic work of static linking, and static linking is what we usually use for the programs we compile.

  To sum up, an object file already has the form of an executable file; only its addresses have not been adjusted yet.

  Therefore, object files are stored in the same file format as executable files. In fact, dynamic link libraries and static link libraries are stored in this format as well. A static link library is slightly different: it is essentially a package of many object files.

  Object files are divided into several sections. First comes the file header, which records information about the file, including file attributes (whether it is an executable, statically linked, or dynamically linked), the entry address, and the target operating system. The code section holds the machine code compiled from executable statements, while initialized global variables and local static variables are placed in the data section. This is very close to the assembly language we learned, which shows that assembly sits closer to the hardware and reflects the difference between high-level and low-level programming languages.

  There is also a section called .bss, which holds uninitialized global variables and uninitialized local static variables.

  When I was learning assembly, I wondered why the data segment and the code segment are kept separate. I found the answer here.

  First, after the program is loaded, the data and the code are mapped to two separate virtual memory regions. Data can be read and written by the process, while code is read-only, so the two regions can be given different permissions, which prevents the code from being modified.

  Next, another important reason is that, given the read-only nature of the code, it can be shared. In fact, anything that is read-only can be shared, such as icons and so on. The advantage of sharing is that if the system is running many copies of a program at the same time, only one code segment is reserved in memory, which is shared between processes. Of course, the data segment of each program is private to the process.

  Sharing instructions plays an extremely important role in modern operating systems, especially in systems with dynamic linking, where it saves a lot of memory.

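  In linking, splicing object files together really comes down to one object file referring to addresses in another, that is, to the addresses of functions and variables. In linking, functions and variables are collectively called symbols, and the function name or variable name is the symbol name. A key part of linking is therefore symbol management. Every object file has a symbol table that records all the symbols used in that file. Each symbol has a corresponding value called the symbol value, which for variables and functions is their address. The symbol table contains: global symbols defined in this object file, global symbols referenced in this object file, section names (whose value is the start address of the section), local symbols (which the linker ignores), and line number information. During linking, only the first two kinds matter.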

  The ELF symbol table is an array of structures, and each structure records the information about one symbol.

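  st_info: symbol type and binding information. The symbol type is one of: unknown type / data object such as a variable or array / function or other executable code / section / file name. The binding is one of: local symbol / global symbol / weak reference.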

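  st_shndx: the section the symbol belongs to. If the symbol is defined in this object file, this is the index of its section in the section table. If it is not, this is a special value: an absolute value (for example a file name symbol) / a COMMON symbol / undefined.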

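  st_value: the symbol value. For a symbol defined in this object file that is not of COMMON type, it is the offset within its section. For a COMMON block, it is the alignment. In an executable file, it is the symbol's virtual address, which is relevant to the dynamic linker.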

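  When ld is used as the linker to produce an executable file, it defines a number of special symbols for us. These symbols are not defined in our own program, but we can declare and reference them directly; they are called special symbols.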

  These symbols are defined in the linker script of ld.

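  As the number of source files grows, duplicate symbol names become a serious problem. The earliest fix was to alter the names during compilation, which is where the idea of symbol decoration comes from. C++ also supports function overloading, where several functions share the same name, so C++ introduced the mechanisms of name decoration and name mangling.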

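  Function signature: a function signature carries the information that identifies a function, including the function name, parameter types, enclosing class, and namespace. Its purpose is to tell different functions apart. When the compiler compiles C++ source code into an object file, it uses the function signature to decorate the name and form the symbol name.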

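  Because C++ and C are compiled differently, if we include a C header in C++ code and call the functions it declares, we run into the problem that no matching symbol can be found after compilation. The following code helps: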

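#include <stddef.h>   /* size_t */

#ifdef __cplusplus    /* defined only when compiling as C++ */
extern "C" {          /* use C linkage: no C++ name decoration */
#endif

void *memset(void *, int, size_t);

#ifdef __cplusplus
}
#endif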

  This trick is used throughout the system header files.

 

 

 

