[Computer Systems: A Programmer's Perspective] Chapter 7: Linking

It took me a long time to reread this chapter. According to the aside in the book, linking sits at the intersection of compilers, computer architecture, and operating systems, and it requires an understanding of code generation, machine-language programming, program instantiation, and virtual memory. The style is very different from the previous chapter: that one kept me calculating and optimizing, while this one is mostly memorization. On this reread I felt that some of the tools were worth writing down, so these notes focus on the actual tools and commands and do not go into the principles of compilation.

 

  1. Static linking process
    • ASCII source file *.c ---- (preprocessor cpp) ---->
      ASCII intermediate file *.i ---- (compiler cc1) ---->
      ASCII assembly-language file *.s ---- (assembler as) ---->
      Relocatable object file *.o ---- (linker ld, which links multiple .o files) ---->
      Executable object file p
  2. Static linker ld
    • Symbol resolution: object files define and reference symbols. The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.
    • Relocation: the compiler and assembler generate code and data sections that start at address zero. The linker relocates these sections by associating each symbol definition with a memory location and then modifying all references to those symbols so that they point to that location.
  3. Three types of object files: relocatable object file (.o); executable object file; shared object file (for example, a dynamic link library)
  4. [Relocatable object file (*.o)]
    • The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file.
    • The following sections are present only if the compiler driver is invoked with the -g option:
      • .debug (debugging symbol table, with entries for local variables and type definitions defined in the program, global variables defined and referenced in the program, etc.)
      • .line (a mapping between line numbers in the original C source program and machine instructions in the .text section)
    • Symbol table (.symtab): contains information about the symbols defined and referenced by this relocatable object file m (module). In the context of the linker, there are three kinds of symbols:
      • Global symbols defined by m that can be referenced by other modules. They correspond to non-static C functions and to global variables defined without the C static attribute (this sentence is hard to parse in translation; I only understood it after reading the English edition).
      • Global symbols referenced by m but defined by some other module. These are called externals and correspond to C functions and variables defined in other modules.
      • Local symbols defined and referenced exclusively by module m (local symbols are not the same thing as local variables). Some correspond to C functions and global variables defined with the static attribute; they are visible anywhere within m but cannot be referenced by other modules. The sections of the object file and the name of the source file corresponding to m also get local symbols.
      • Small question: why is there no mention of local variables?
        • My answer: the linker does not care about local variables. While the program runs, non-static local variables live on the stack, and objects created with new live on the heap. Uninitialized global variables go in the .bss section, initialized ones in the .data section, and string constants in static storage. Anything in static storage exists for the whole run of the program and is never released, and there is exactly one copy of it rather than several copies of the same variable or constant.
      • Where are static global variables, non-static global variables, static local variables, and non-static local variables placed?
        • Static variables and global variables both use static storage, which is split into one region for initialized data (.data) and one for uninitialized data (.bss).
        • Non-static local variables are on the stack. Objects created with new are on the heap, while objects defined directly (without new) are on the stack.
        • Non-static global -> static global: changes the scope (the name is no longer visible to other modules); non-static local -> static local: changes where the variable is stored.
        • Bonus: Java is a "reference-based" language in which an object cannot be constructed without new, so the member variables of an object are on the heap.
      • Use the GNU readelf tool to display the symbol table entries of a .o file

    • Symbol resolution
      • The linker associates each reference with exactly one symbol definition from the symbol tables of its input relocatable object files. References to local symbols (static or not) are simple, because the compiler allows only one definition of each local symbol per module; references to global symbols are harder. After resolution the linker knows the exact sizes of the code and data sections in its input object modules.
      • (Strong and weak symbols are both kinds of global symbol.) Strong symbols: functions and initialized global variables. Weak symbols: uninitialized global variables.

        • Three rules: (1) multiple strong symbols with the same name are not allowed; (2) given one strong symbol and multiple weak symbols, choose the strong one; (3) given multiple weak symbols, choose any one of them.
        • Why it matters: this problem has to be considered whenever multiple files are compiled together. Note that strong and weak apply only to definitions of global symbols. Recalling the point above, that the static attribute can be used to hide variable and function names inside a module, helps in understanding this.

      • Passing the --warn-common flag to the linker (ld), for example through GCC, makes it print a warning when it resolves multiply defined global symbols.
      • Static library * .a
        • When the linker builds an executable, it copies only the object modules in the static library that the application actually references. Without static libraries, the standard functions could be provided in one of these ways:
          • Have the compiler recognize calls to the standard functions and generate the corresponding code directly
            • Pascal takes this approach, but C has too many standard functions for it to be practical
            • Convenient for application programmers, but a burden on the programmers who write compilers
            • The compiler becomes more complex, and every change to a function requires a new compiler version
          • Put all of the standard C functions in a single relocatable object module
            • Decouples the implementation of the compiler from the implementation of the standard functions
            • Still reasonably convenient for programmers
            • But every executable file contains a complete copy of the whole collection, which wastes disk space (on a typical system, libc.a is about 8 MB and libm.a is about 1 MB)
            • Any change to a single function, however small, requires recompiling the entire source file for the module
          • Create a separate relocatable object file for each standard function
            • Requires application programmers to link the needed modules explicitly, which is error-prone and time-consuming
      • [Archive] A collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. On Unix, archives are created with the ar tool:
      • unix> gcc -c addvec.c multvec.c # -c: compile only, do not link
        unix> ar rcs libvector.a addvec.o multvec.o

        unix> gcc -O2 -c main2.c
        unix> gcc -static -o p2 main2.o ./libvector.a # -static: build a fully linked executable
      • Use static libraries to resolve references
        • The linker maintains a set E of relocatable object files that will be merged into the executable, a set U of unresolved symbols (referenced but not yet defined), and a set D of symbols defined in previous input files.
        • For each input file f on the command line, the linker determines which case applies:
          • f is an object file: f is added to E, and U and D are updated to reflect the symbol definitions and references in f.
          • f is an archive: the linker tries to match the unresolved symbols in U against the symbols defined by the members of the archive. If some member m defines a symbol that resolves a reference in U, m is added to E, and U and D are updated.
          • After all input files are scanned, if U is not empty, the linker reports an "undefined symbol" error (to be verified); if a global symbol is defined more than once in D, the strong and weak symbol rules decide, and more than one strong definition produces a "multiple definition" link error (to be verified).
        • Because of this, the order of files on the command line matters a great deal: a library must come after the files that reference its symbols, so dependencies are satisfied from left to right. If two static libraries depend on each other, such as libx.a and liby.a, you need to write "libx.a liby.a libx.a" on the command line, that is, repeat libx.a.
    • Relocation: merge the input modules and assign a run-time address to each symbol
      • Relocating sections and symbol definitions: merge all sections of the same type. For example, the .data sections of the input modules are merged into one section that becomes the .data section of the output executable object file.
      • Relocating symbol references within sections: modify every symbol reference in the code and data sections so that it points to the correct run-time address. This step relies on "relocation entries".
  5. [Executable object file (p)]

    • Compare its format with that of a relocatable object file
    • The loader (operating system code that is resident in memory) runs the executable object file. On 32-bit Linux the code segment always starts at address 0x08048000, and the data segment starts at the next 4 KB-aligned address (4 KB being the page size).

  6. [Shared object file, dynamically linked shared library (Unix, *.so; Microsoft, *.dll)]

    • Drawbacks of static libraries: they need to be maintained and updated periodically, and application programmers must explicitly relink their programs against the updated library; in addition, almost every C program uses standard I/O, and that code is copied into the text segment of every running process, which wastes memory.

    • Shared libraries

      • They are shared in two different ways:
        • In any given file system, there is exactly one .so file for a particular library. All executables that reference the library share the code and data in this .so file.
        • A single copy of the .text section of a shared library in memory can be shared by different running processes.

      • -fPIC directs the compiler to generate position-independent code. -shared directs the linker to create a shared object file.
      • unix> gcc -o p2 main2.c ./libvector.so
        • When the executable is created, part of the linking is done statically; the rest of the linking is completed dynamically when the program is loaded.
        • At creation time, none of the code or data sections of the .so are actually copied into the executable p2; instead, the linker copies some relocation and symbol table information that allows references to the code and data in the .so to be resolved at run time.

    • Load and link shared libraries from applications
      • The material so far covers the case where the dynamic linker loads and links shared libraries when the application is loaded, just before it executes.
      • Now consider the run-time case: an application can also ask the dynamic linker to load and link a shared library while it is running.
      • Real-world examples
        • Distributing software: Microsoft Windows application developers use shared libraries to distribute software updates.
        • Building high-performance Web servers: a Web server that generates dynamic content can load the appropriate function from a shared library at run time and call it directly.
  7. [PIC] Not covered in detail here
  8. [GNU binutils]
    • I looked up GNU and found the name interesting: it is a recursive acronym for "GNU's Not Unix".

Origin www.cnblogs.com/zhouys96/p/12725942.html