C++ compilation and linking process explained

Linking in detail
Many people who write C/C++ programs (C++ is assumed from here on) are at a loss when they hit an "unresolved external" or "duplicated external symbol" link error, because such a message cannot be traced to a specific line of source. Some also do not understand why certain parts of the language are (or are not) designed the way they are. After reading this article you may have some answers.
    First, let's look at how we write a program. If you use an IDE (Visual Studio, Eclipse, Dev-C++, etc.), you may never notice how a program is actually organized (which is why many people object to beginners using an IDE). In an IDE, all you do is create a set of .cpp and .h files in a project, write the code, and click "Compile" in the menu; everything just works. But programmers did not always work this way. They first opened an editor and wrote the code like any ordinary text file, then typed the following on the command line:
    cc 1.cpp -o 1.o
    cc 2.cpp -o 2.o
    cc 3.cpp -o 3.o
Here cc stands for some C/C++ compiler. It is followed by the cpp file to compile, and -o names the output file (please forgive me for not using any popular compiler as an example). The current directory will then contain:
    1.o 2.o 3.o
Finally, the programmer must type
    link 1.o 2.o 3.o -o a.out
to generate the final executable file a.out. Today's IDEs actually follow the same steps; everything is just automated.
    Let us analyze the process above and see what we can find.
    First, compilation is performed on each cpp file individually. During each compilation, unless the file #includes other cpp files (an extremely bad practice in C++), the compiler knows only about the cpp file currently being compiled and is completely unaware that any other cpp file exists.
    Second, the .o file produced from each cpp file is then read by a linker, which ultimately generates the executable file.
    With this understanding in place, let us look at how C/C++ programs are organized.
    
    First you need to know a few concepts:
    Compilation: the process in which the compiler translates source code, which exists as text, into an object file in machine-language form.
    Compilation unit: in C++, each cpp file is one compilation unit. As the compilation process above shows, compilation units know nothing about one another.
    Object file: the file produced by compilation. It contains, in machine-code form, all the code and data of the compilation unit, plus some other information.
    
    Now let's look at compilation in detail. We skip syntax analysis and go straight to generating the object file. Suppose we have a file 1.cpp:
    int n = 1;

    void f()
    {
        ++n;
    }

    The compiled object file 1.o will have a region (call it the binary section) that contains the data and functions above, namely n and f, laid out by file offset roughly like this:
    offset  content  length
    0x000   n        4
    0x004   f        ??
    Note: this is only a guess and does not represent the actual layout of an object file. The items in an object file are not necessarily contiguous, not necessarily in this order, and of course do not necessarily start at 0x000.
    Now let's look at the contents of function f starting at 0x004 (a guess, assuming the x86 platform):
    0x004  inc DWORD PTR [0x000]
    0x00?  ret
    Note that ++n has been translated to inc DWORD PTR [0x000], that is, add 1 to the DWORD (4 bytes) at address 0x000 of this unit.
    
    Now suppose there is another file, 2.cpp:
    extern int n;
    void g()
    {
        ++n;
    }
    Then the binary section of its object file 2.o should be:
    offset  content  length
    0x000   g        ??
    Why is there no space for n here (that is, no definition of n)? Because n is declared extern, which says that n is defined in another compilation unit. Remember that at compile time the compiler cannot know anything about other compilation units, so it does not know where n actually is, and it has no way to fill in the ??? in g's binary code inc DWORD PTR [???]. What to do? This job can only be left for the linker. To let the linker know which addresses have not been filled in, the object file also carries an "unresolved symbol table". Likewise, the object file that provides the definition of n (that is, 1.o) carries an "export symbol table" to tell the linker which addresses it can provide.
    Let's sort out our thoughts. We now know that each object file, besides its own data and binary code, must provide at least two tables: an unresolved symbol table and an export symbol table, which tell the linker what the file needs and what it can provide, respectively. The next question is how to match entries between the two tables. This brings in a new concept: the symbol. In C/C++, every variable and function has its own symbol. For example, the symbol of variable n is "n". Function symbols are more complicated: the function name has to be combined with its parameters, calling convention, and so on, to produce a unique string. The symbol of f might be "_f" (this varies by compiler).
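    To see why a function symbol needs more than the bare name, here is a minimal sketch of a made-up naming scheme. The mangle helper and its "_name@type" format are hypothetical and exist only for illustration; real compilers each use their own encoding.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mangling scheme, for illustration only: prefix the name with
// '_' and append each parameter type after an '@'. Real compilers differ.
std::string mangle(const std::string& name,
                   const std::vector<std::string>& param_types) {
    std::string symbol = "_" + name;
    for (const std::string& type : param_types) {
        symbol += "@" + type;
    }
    return symbol;
}

void example() {
    // A function without parameters gets the short symbol used in this article.
    assert(mangle("f", {}) == "_f");
    // Two overloads of f must not end up with the same symbol.
    assert(mangle("f", {"int"}) != mangle("f", {"double"}));
}
```

    Because the parameter types are part of the string, overloaded functions with the same name still get distinct entries in the symbol tables.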
    So the export symbol table of 1.o is
    symbol  address
    n       0x000
    _f      0x004
    and its unresolved symbol table is empty.
    The export symbol table of 2.o is
    symbol  address
    _g      0x000
    and its unresolved symbol table is
    symbol  address
    n       0x001
    Here 0x001 is the address at which the ??? of inc DWORD PTR [???], which starts at 0x000, is stored (we assume that the second through fifth bytes of the inc machine code hold the absolute address of the operand; consult the manual if you need the exact encoding). This table tells the linker that an address is stored at position 0x001 of this compilation unit, that its value is unknown, and that it goes by the symbol n.
    At link time, the linker finds the unresolved symbol n in 2.o, searches all compilation units, and finds the exported symbol n in 1.o. It then fills n's address, 0x000, into position 0x001 of 2.o.
    "Wait a minute," you may object. If we do that, g becomes inc DWORD PTR [0x000], which, as explained earlier, adds 1 to the 4 bytes at address 0x000 of this unit, not to the corresponding location in 1.o. Exactly: every compilation unit's addresses start from 0, so addresses would collide when the units are finally stitched together. The linker therefore adjusts each unit's addresses as it stitches them together. In this example, suppose address 0x00000000 of 2.o is placed at 0x00001000 in the executable, and address 0x00000000 of 1.o is placed at 0x00002000. Then, as far as the linker is concerned, the export symbol table of 1.o is really
    symbol  address
    n       0x000 + 0x2000
    _f      0x004 + 0x2000
    and its unresolved symbol table is empty.
    The export symbol table of 2.o is
    symbol  address
    _g      0x000 + 0x1000
    and its unresolved symbol table is
    symbol  address
    n       0x001 + 0x1000
so the final code of g becomes inc DWORD PTR [0x000 + 0x2000].
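    The matching of the two tables can be sketched in a few lines of C++. This is a toy model under the article's assumptions, not how any real linker is written; the ObjectFile struct and the resolve function are names invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of one object file (all names here are made up for the sketch).
struct ObjectFile {
    std::uint32_t base;                            // start address in the executable
    std::map<std::string, std::uint32_t> exports;  // symbol -> offset inside this file
    // (symbol, offset of the hole that must receive the symbol's address)
    std::vector<std::pair<std::string, std::uint32_t>> unresolved;
};

// Returns: final address of each hole -> final address to write into it.
// Throws std::out_of_range if a symbol is not exported by any object file.
std::map<std::uint32_t, std::uint32_t> resolve(const std::vector<ObjectFile>& objects) {
    std::map<std::string, std::uint32_t> defined;
    for (const ObjectFile& object : objects) {
        for (const auto& entry : object.exports) {
            defined[entry.first] = object.base + entry.second;
        }
    }
    std::map<std::uint32_t, std::uint32_t> patches;
    for (const ObjectFile& object : objects) {
        for (const auto& entry : object.unresolved) {
            patches[object.base + entry.second] = defined.at(entry.first);
        }
    }
    return patches;
}

void example() {
    ObjectFile one{0x2000, {{"n", 0x000}, {"_f", 0x004}}, {}};
    ObjectFile two{0x1000, {{"_g", 0x000}}, {{"n", 0x001}}};
    std::map<std::uint32_t, std::uint32_t> patches = resolve({one, two});
    // The hole at 0x001 + 0x1000 receives n's final address 0x000 + 0x2000.
    assert(patches.at(0x1001) == 0x2000);
}
```

    With the two tables from the article as input, the only patch produced is the one that turns g into inc DWORD PTR [0x2000].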
    One loophole remains. Since the final address of n is 0x2000, the earlier code of f, inc DWORD PTR [0x000], is now wrong. So the object file must provide one more table, called the address redirection table (usually known as a relocation table).
    For 1.o, the redirection table is
    address
    0x005
    This table needs no symbols. When the linker processes it, it sees that position 0x005 holds an address that needs redirecting, and simply adds 0x2000 to the four bytes starting at 0x005.
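    Applying one entry of this table amounts to a read, an add, and a write on four bytes. The sketch below assumes a 32-bit little-endian address, as on x86; the helper names read_u32, write_u32 and redirect are invented for illustration, and the 10-byte image is a made-up stand-in for 1.o.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read the 32-bit little-endian value stored at `offset` in `image`.
std::uint32_t read_u32(const std::vector<std::uint8_t>& image, std::size_t offset) {
    std::uint32_t value = 0;
    for (int i = 3; i >= 0; --i) {
        value = (value << 8) | image.at(offset + i);
    }
    return value;
}

// Store `value` as 32-bit little-endian at `offset` in `image`.
void write_u32(std::vector<std::uint8_t>& image, std::size_t offset, std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        image.at(offset + i) = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Apply one redirection entry: the four bytes at `offset` hold an address
// relative to this object file, so add the file's base address to them.
void redirect(std::vector<std::uint8_t>& image, std::size_t offset, std::uint32_t base) {
    write_u32(image, offset, read_u32(image, offset) + base);
}

void example() {
    // Made-up image of 1.o: n (4 bytes), then f as 1 opcode byte,
    // 4 address bytes (currently 0x000) and 1 byte for ret.
    std::vector<std::uint8_t> image(10, 0);
    redirect(image, 0x005, 0x2000);  // the single entry in 1.o's table
    assert(read_u32(image, 0x005) == 0x2000);
}
```

    After the call, the operand of f's inc instruction holds 0x2000, the final address of n.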
    To sum up: when the compiler compiles a cpp file into an object file, besides writing the data and code contained in the cpp file, it must also provide at least three tables: the unresolved symbol table, the export symbol table, and the address redirection table.
    The unresolved symbol table lists every symbol that is referenced in this compilation unit but not defined in it, together with the addresses at which it appears.
    The export symbol table lists the symbols that this compilation unit defines and is willing to make available to other compilation units, together with their addresses.
    The address redirection table records every place in this compilation unit that refers to one of the unit's own addresses.
    When linking, the linker first decides where each object file will sit in the final executable. It then walks the address redirection table of every object file and redirects each recorded address (that is, adds the actual start address of that compilation unit in the executable). Next it walks the unresolved symbol table of every object file, looks up each symbol in all the export symbol tables, and writes the symbol's actual address into the location recorded in the unresolved symbol table (again adding the actual start address, in the executable, of the compilation unit that defines the symbol). Finally it writes the contents of all object files to their respective positions, does some other housekeeping, and the executable is ready.
    The executable produced by the final link 1.o 2.o ... looks roughly like this:
    0x00000000 ???? (some other information)
    ....
    0x00001000 inc DWORD PTR [0x00002000] // start of 2.o; g is defined here
    0x00001005 ret // assuming inc is 5 bytes; g ends here
    ....
    0x00002000 0x00000001 // start of 1.o; n is defined here (initialized to 1)
    0x00002004 inc DWORD PTR [0x00002000] // start of f
    0x00002009 ret // assuming inc is 5 bytes; f ends here
    ...
    ...
    Real linking is more complex, because the data and code in a real object file are divided into several sections and redirection has to be done section by section, but the principle is the same.


    
    Now we can look at some classic link errors:
    unresolved external link ...
    This one is clear: the linker found an unresolved symbol but no matching entry in any export symbol table.
    The fix, of course, is to provide a definition of the symbol in some compilation unit (note that the symbol may be a variable or a function). Also check whether a file that should have been linked was left out.
    duplicated external symbols ...
    Here the export symbol tables contain duplicate entries, so the linker cannot decide which one to use. The cause may be a name used twice, or something else.
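    Both errors fall out of a simple count of how many export tables define each symbol. The following is a minimal sketch of that check; SymbolTables, LinkErrors and check are hypothetical names, and a real linker reports far more detail.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Only the symbol names matter for this check (made-up types for the sketch).
struct SymbolTables {
    std::vector<std::string> exports;
    std::vector<std::string> unresolved;
};

struct LinkErrors {
    std::set<std::string> unresolved;  // needed, but exported by nobody
    std::set<std::string> duplicated;  // exported by more than one object file
};

LinkErrors check(const std::vector<SymbolTables>& objects) {
    // Count how many object files export each symbol.
    std::map<std::string, int> definition_count;
    for (const SymbolTables& object : objects) {
        for (const std::string& symbol : object.exports) {
            ++definition_count[symbol];
        }
    }
    LinkErrors errors;
    for (const auto& entry : definition_count) {
        if (entry.second > 1) {
            errors.duplicated.insert(entry.first);
        }
    }
    for (const SymbolTables& object : objects) {
        for (const std::string& symbol : object.unresolved) {
            if (definition_count.count(symbol) == 0) {
                errors.unresolved.insert(symbol);
            }
        }
    }
    return errors;
}

void example() {
    SymbolTables a{{"n", "_f"}, {}};
    SymbolTables b{{"n", "_g"}, {"m"}};  // defines n again, needs an undefined m
    LinkErrors errors = check({a, b});
    assert(errors.duplicated.count("n") == 1);
    assert(errors.unresolved.count("m") == 1);
}
```

    In the example, n is exported twice (a duplicated external symbol) and m is requested but never exported (an unresolved external).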


    Now let's look at some of the features C/C++ provides for this:
    extern: tells the compiler that the symbol is defined in another compilation unit, so the symbol goes into the unresolved symbol table. (External linkage)
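    The difference between an extern declaration and a definition can be shown in a single file. This is only a sketch: in the article's example the declaration and g live in 2.cpp while the definition and f live in 1.cpp; they are placed together here so the snippet compiles on its own, and example is a name added for illustration.

```cpp
#include <cassert>

// Declaration only: says that n exists somewhere; no storage is reserved here.
extern int n;

// g can be compiled with nothing more than the declaration above.
void g() { ++n; }

// Definition: this is what actually reserves storage and exports the symbol.
int n = 1;

void f() { ++n; }

void example() {
    f();
    g();
    assert(n == 3);  // both functions incremented the same variable
}
```

    Both functions end up modifying the same n, which is what the linker guarantees when the two halves sit in separate compilation units.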
    
    static: when this keyword precedes the declaration of a global function or variable, the compilation unit does not export the symbol of that function or variable, so it cannot be used from other compilation units. (Internal linkage.) For a static local variable, the variable is stored like a global variable, but its symbol is still not exported.
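    A short sketch of both uses of static follows. The names helper, next_ticket and example are invented for illustration.

```cpp
#include <cassert>

// Internal linkage: the symbol of helper is not exported, so another .cpp
// file could define its own helper without any conflict at link time.
static int helper(int x) { return x * 2; }

int next_ticket() {
    // Stored like a global (it lives for the whole program and is initialized
    // once), but visible only inside this function.
    static int last = 0;
    return ++last;
}

void example() {
    assert(helper(21) == 42);
    assert(next_ticket() == 1);
    assert(next_ticket() == 2);  // the value survived between calls
}
```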
    
    Default linkage: functions and variables default to external linkage; const variables default to internal linkage. (Linkage can be changed by adding extern or static.)

    Pros and cons of external linkage: a symbol with external linkage can be used throughout the whole program (because its symbol is exported). But this also requires that no other compilation unit export the same symbol (otherwise you get duplicated external symbols).

    Pros and cons of internal linkage: a symbol with internal linkage cannot be used from another compilation unit. But different compilation units may have internal-linkage symbols with the same name.

    Why header files should generally contain only declarations, not definitions: a header file may be included by several compilation units. If the header contains a definition, every compilation unit that includes it defines the same symbol, and if that symbol has external linkage the result is duplicated external symbols. So if you do want to put a definition in a header file, you must make sure the symbol has internal linkage only.

    Why constants default to internal linkage while variables do not:
        This is so that constants such as const int n = 0 can be defined in header files. Because a constant is read-only, it does not matter if every compilation unit has its own definition. If a variable defined in a header had internal linkage, every compilation unit that includes the header would get its own copy; modifying the variable in one compilation unit would not affect the same-named variable in the others, which would produce unexpected results.

    Why functions default to external linkage:
        Although functions are read-only, unlike variables they are very likely to change while code is being written. If functions had internal linkage by default, people would tend to define functions in header files, and then every compilation unit that includes the header would have to be recompiled whenever a function was modified. In addition, static local variables defined inside such functions would end up defined in the header too, so each compilation unit would get its own copy.

    Why static class member variables cannot be initialized in place. In-place initialization means something like this:
        class A
        {
            static char MSG[] = "AHA";
        };
The reason this is not allowed is that class declarations usually live in header files; allowing it would in effect be the same as defining a non-const variable in a header file.
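    The usual way around this is to declare the member inside the class and define it exactly once outside. A minimal sketch follows; the split into a.h and a.cpp is an assumption shown only in comments, the member is made public so it can be read from outside, and example is a name added for illustration.

```cpp
#include <cassert>
#include <cstring>

// a.h (hypothetical header): the class only declares the static member.
class A
{
public:
    static char MSG[];
};

// a.cpp (hypothetical source file): one compilation unit defines and
// initializes it, so only one object file exports the symbol.
char A::MSG[] = "AHA";

void example() {
    assert(std::strcmp(A::MSG, "AHA") == 0);
}
```

    Every compilation unit that includes the header sees only the declaration, so no matter how many of them there are, the symbol is exported once.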

    What happens if a const object is defined in a header file in C++:
        Generally nothing. Just as with a const int defined in a header, every compilation unit that includes the header gets its own definition of the object. But because the object is const, this has little effect. However, two situations can break this:
        1. If code takes the address of the const object and relies on that address being unique, different compilation units may get different addresses. (But this is rarely done.)
        2. If the object has a mutable member and one compilation unit modifies it, the change likewise does not affect the other compilation units.
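    The second case is easy to miss, so here is a small sketch of a const object whose state still changes. The Config type, the config object and example are hypothetical names; imagine the first two living in a header.

```cpp
#include <cassert>

struct Config {
    int limit;
    mutable int reads;  // may be modified even through a const object
    int get() const {
        ++reads;
        return limit;
    }
};

// A const object at namespace scope has internal linkage, so every
// compilation unit that includes this definition gets its own copy,
// including its own separate `reads` counter.
const Config config = {10, 0};

void example() {
    assert(config.get() == 10);
    assert(config.reads == 1);  // the const object was modified
}
```

    If two .cpp files included such a header and each called config.get(), each would see only its own count.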

    Why static const class members cannot be initialized in place:
        Because that would be equivalent to defining a const object in a header file. As an exception, members of integer types such as int and char can be initialized in place, because such values can be optimized directly into immediates, much like macros.
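    The integer exception looks like this in practice. This is a minimal sketch with invented names (Buffer, size, data, example); the constant is used only as a compile-time value here.

```cpp
#include <cassert>

struct Buffer {
    static const int size = 16;  // integer type: in-place initializer is allowed
    char data[size];             // usable wherever a compile-time constant is needed
};

void example() {
    // The static member takes no space inside each Buffer object.
    assert(sizeof(Buffer) == 16);
}
```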

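    Inline functions:
        Inline functions in C++ behave much like macros in this respect: the same definition may appear in every compilation unit that uses it without causing duplicated external symbols, so linkage is not a problem.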

    Why a publicly used inline function must be defined in a header file:
        Because compilation units know nothing about one another at compile time. If an inline function is defined in a .cpp file, then when another compilation unit that uses the function is compiled, the compiler cannot find the function's definition and so cannot expand it. So if an inline function is defined in a .cpp file, only that .cpp file can use it.

    What happens if inlining is refused for an inline function in a header file:
        If the compiler declines to inline a function defined in a header, it automatically emits a definition of the function in every compilation unit that includes the header, and does not export it as an ordinary symbol, so the copies do not collide.

    If inlining is refused and the inline function defines a static local variable, where does that variable end up:
        Early compilers defined one copy in each compilation unit, which produced wrong results. Newer compilers have solved this problem, by means unknown to the author.

    Why almost nobody implemented the export keyword:
        export requires the compiler to look across compilation units to find the function definition, which makes it very hard for compilers to implement.

Reproduced from: https://www.cnblogs.com/hongfenglee/archive/2012/02/18/2356808.html


Origin blog.csdn.net/weixin_34221276/article/details/94102410