On C++ compilation: how the C++ compiler and linker work

Original: https://blog.csdn.net/zyh821351004/article/details/46425823

 

First:     
The first step is preprocessing. Roughly speaking, this step does only one thing, "macro expansion": it expands the directives that begin with #.

      For example, #define MAX 1000 establishes an equivalence between MAX and 1000, so that MAX can be replaced by 1000 before the compilation stage.

      #ifdef / #ifndef selectively pick out the code in a file that meets a condition and hand it to the next stage, compilation. The directive that looks most complicated is #include, but it is actually very simple: the #include *** statement is replaced, in place, by the entire contents of the corresponding file.
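
As a minimal sketch of the directives just described, the fragment below uses #define and #ifdef in a single file. Only MAX comes from the text above; USE_DOUBLE, value_type, limit and check_preprocessing are names made up for this illustration.

```cpp
#include <cassert>

#define MAX 1000            // every later MAX is replaced by 1000 before compilation

// USE_DOUBLE is a hypothetical switch; it is deliberately left undefined here.
#ifdef USE_DOUBLE
typedef double value_type;  // discarded by the preprocessor
#else
typedef int value_type;     // only this branch reaches the compiler
#endif

// After preprocessing this line reads: value_type limit() { return 1000; }
value_type limit() { return MAX; }

// Call from main() to confirm what the compiler actually saw.
void check_preprocessing() {
    assert(limit() == 1000);
    assert(sizeof(value_type) == sizeof(int));
}
```

With g++ or clang++, the -E option prints the preprocessed text, which makes the replacement visible.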

      Next comes compilation. This step is very important: compilation works on one file at a time, and each file is compiled into one object file. (A short aside on which files get compiled: the build decides whether to compile a file by its extension, so ".h" header files are ignored while every ".cpp" source file is compiled. I experimented with renaming a .h file to .cpp and changing the corresponding #include to ***.cpp. This makes the compiler compile many header files unnecessarily, but since header files usually hold only declarations rather than definitions, the size of the final linked executable does not change.)

      The compiler clearly works on one file at a time. This matters: the compiler is responsible only for what is in this unit and ignores everything outside it. In this step we can call a function without providing its definition, but we must have its declaration. (This is in fact the essence of #include: it simply gives you the declarations in advance so that you can use them. How the function is actually implemented is settled at link time, when the function's entry address is looked up. There are two ways to provide a declaration: #include the file that contains it, or write void max(int, int); yourself before the call.) What remains for the compilation stage is work such as checking that the syntax is correct. To sum up, compilation can be roughly divided into two steps:

        Step one: check whether every function or variable that is used has a declaration;

        Step two: check whether the statements conform to C++ syntax.

The final step is linking. It joins all the compiled units into a whole, and can be compared to "wiring": if file A uses a function from file B, linking is what establishes that connection. I think the most important job of linking is to check the global space for definitions that are missing or duplicated. This also explains why we generally do not put definitions in header files: a header is likely to be included in multiple source files, each source file is compiled on its own, and at link time multiple definitions are found in the global space.

The C++ standard divides the compiler's processing into nine phases of translation (the C standard has eight, since it has no template phase):

1. Character Mapping

    The characters of the physical source file are mapped to the source character set. This includes replacing trigraphs and replacing control characters (the carriage return and line feed at the end of a line). Many non-US keyboards lack some characters of the basic source character set, so a source file may use trigraphs, which begin with ??, in place of those characters. On a system with a US keyboard, however, some compilers do not recognize and replace trigraphs by default, and a compile option such as -trigraphs has to be added. In a C++ program, any character not in the basic source character set is replaced by its universal character name.

2. Line Splicing

    A line that ends with a backslash (\) is merged with the line that follows it.

3. Tokenization

    Each comment is replaced by a single space character. C++ digraph operators are recognized as tokens (so that readable programs can be written on keyboards that lack some ASCII characters, C++ defines a set of digraphs and a set of new reserved words). The source code is parsed into preprocessing tokens.

4. Preprocessing

    Preprocessing directives are executed and macros are expanded. Files brought in by #include directives go through steps 1-4 recursively. These four phases are collectively called the preprocessing stage.

5. Character-set Mapping

    Members of the source character set and escape sequences are converted to the equivalent members of the execution character set. For example, '\a' is converted to a single byte with the value 7 in an ASCII environment.

6. String Concatenation

    Adjacent string literals are concatenated. For example, "hahaha" "huohuohuo" becomes "hahahahuohuohuo".

7. Translation

    Syntactic and semantic analysis is performed, and the code is translated into object code.

8. Template Processing

    Templates are instantiated.

9. Linkage

    External references are resolved, and the program image is made ready for execution.
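
Several of these phases leave traces that can be observed from inside a program. The following is a small sketch, not taken from the original article; SUM_OF_THREE, answer, kAlert, kLaugh and check_phases are illustrative names.

```cpp
#include <cassert>
#include <cstring>

// Phase 2 (line splicing): the backslash and the line break after it are
// removed, so the macro definition is a single logical line.
#define SUM_OF_THREE(a, b, c) \
    ((a) + (b) + (c))

// Phase 3 (tokenization): the comment below is replaced by one space,
// so "int" and "answer" remain two separate tokens.
int/* comment */answer = 42;

// Phase 5 (character-set mapping): the escape sequence \a becomes a single
// member of the execution character set.
const char kAlert = '\a';

// Phase 6 (string concatenation): adjacent literals become one literal.
const char* const kLaugh = "hahaha" "huohuohuo";

// Call from main() to check the result of each phase.
void check_phases() {
    assert(SUM_OF_THREE(1, 2, 3) == 6);
    assert(answer == 42);
    assert(kAlert == 7);   // assumes an ASCII-compatible execution character set
    assert(std::strcmp(kLaugh, "hahahahuohuohuo") == 0);
}
```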

Second:

I. The C++ compilation model
Typically, a C++ program contains only two kinds of files: .cpp files and .h files. The .cpp files are called C++ source files and hold C++ source code; the .h files are called C++ header files and also hold C++ source code.
The C++ language supports "separate compilation". That is, the contents of a program can be divided into parts and placed in different .cpp files. What is in each .cpp file is relatively independent: at compile time it does not need to communicate with the other files; it only needs to be compiled into an object file and then linked with the other object files. For example, a.cpp defines a global function "void a() {}", and b.cpp needs to call it. Even so, a.cpp and b.cpp do not need to know of each other's existence. They can be compiled separately, and once the resulting object files are linked, the whole program can run.
How is this achieved? From the point of view of writing the program, it is very simple. In b.cpp, before calling "void a()", first declare the function with "void a();". This works because, when compiling b.cpp, the compiler generates a symbol table, and symbols such as "void a()" whose definition it cannot see are recorded in that table. At link time, the linker looks for the definition of the symbol in the other object files. Once it is found, the program can be generated successfully.
Note the two concepts mentioned here: "definition" and "declaration". Simply put, a "definition" describes a symbol completely: whether it is a variable or a function, what type it returns, what parameters it takes, and so on. A "declaration" only states that the symbol exists. It tells the compiler: this symbol is defined in another file; I will use it here first, and at link time you can look elsewhere to find out what it really is. A definition must define the symbol (variable or function) in full according to C++ syntax, whereas a declaration only needs the symbol's prototype. Note that a symbol may be declared many times in a program but must be defined exactly once. After all, if a symbol had two different definitions, which one should the compiler obey?
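
The difference can be sketched in a few lines. In the scenario from the text the definitions would live in a.cpp and the caller in b.cpp; here everything is placed in one file only so that the sketch builds on its own. The names call_count, call_a_twice and check_declarations are illustrative and not part of the original example.

```cpp
#include <cassert>

// Declarations: they may be repeated as often as you like.
extern int call_count;   // "an int named call_count exists somewhere"
extern int call_count;
void a();                // "a function a() exists somewhere"
void a();

// This function compiles with only the declarations above in view.
int call_a_twice() {
    a();
    a();
    return call_count;
}

// Definitions: exactly one of each in the whole program.
int call_count = 0;
void a() { ++call_count; }

// Call from main() to see that the declared symbols resolved to the definitions.
void check_declarations() {
    assert(call_a_twice() == 2);
}
```

If the two definitions at the bottom are removed, the file still compiles, and the error only appears at link time as an unresolved symbol.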
This mechanism brings C++ programmers many benefits, and it also leads to a particular way of writing programs. Suppose there is a very commonly used function "void f() {}" that is called in many .cpp files of the program. We only need to define it in one file and declare it in the others. One function is easy to handle; declaring it takes a single line. But what if there are many functions, say hundreds of mathematical functions? Can every programmer be relied on to write down the exact form of all of them?

II. What is a header file
Obviously, the answer is no. But there is a very simple way to spare programmers the trouble of remembering so many function prototypes: write the declarations of all those hundreds of functions in one file in advance, and whenever a programmer needs them, copy the whole lot into his source code.
This approach certainly works, but it is still troublesome and clumsy. This is where header files come in. A header file holds the same kind of content as a .cpp file: C++ source code. But a header file is not compiled on its own. We put all the function declarations in a header file, and when a .cpp source file needs them, they are pulled into that .cpp file with the preprocessor directive "#include", so that their content is merged into the .cpp file. When the .cpp file is compiled, the included .h files take effect.
As an example, suppose there are only two mathematical functions, f1 and f2. We put their definitions in math.cpp:
/* math.cpp */
double f1()
{
    // do something here...
    return 0.0;
}
double f2(double a)
{
    // do something here...
    return a * a;
}
/* end of math.cpp */
and put the declarations of these functions in a header file, math.h:
/* math.h */
double f1();
double f2(double);
/* end of math.h */
In another file, main.cpp, I want to call these two functions, so I only need to include the header file:
/* main.cpp */
#include "math.h"
int main()
{
    int number1 = f1();
    int number2 = f2(number1);
}
/* end of main.cpp */
In this way we have a complete program. Note that .h files do not need to be listed after the compiler command, but they must be somewhere the compiler can find them (for example, in the same directory as main.cpp). main.cpp and math.cpp can be compiled separately into main.o and math.o, and after these two object files are linked, the program can run.

III. #include
#include is a preprocessor directive inherited from the C language. It takes effect before the compiler proper runs, at preprocessing time. Its job is to insert the complete contents of the file named after it, word for word, into the current file. It is worth mentioning that it has no other function or side effect: wherever it appears, it is replaced by the contents of the named file. Simple text replacement and nothing else. So the first line of main.cpp (#include "math.h") is replaced by the contents of math.h before compilation. That is, by the time compilation begins, the content of main.cpp has become:
/* ~main.cpp */
double f1();
double f2(double);
int main()
{
    int number1 = f1();
    int number2 = f2(number1);
}
/* end of ~main.cpp */
No more and no less, just right. It is equally clear that if many other .cpp files besides main.cpp also use f1 and f2, each of them only needs to write #include "math.h" before using the two functions.

IV. What should go in a header file
From the discussion above, we see that the role of a header file is to be included in .cpp files. Header files do not take part in compilation themselves, but their content is in fact compiled in multiple .cpp files. From the "defined exactly once" rule it follows easily that a header file should contain only declarations of variables and functions, not their definitions. The content of a header file is actually brought into several different .cpp files, all of which are compiled. Declarations are fine, but a definition would amount to the same symbol (variable or function) being defined in several files, and even if those definitions are identical, this is not allowed.
So remember: a .h header file may contain declarations of variables or functions, but not definitions. That is, only statements of the form extern int a; and void f(); belong in a header file. These are declarations. If you write int a; or void f() {} instead, then as soon as the header is included by two or more .cpp files, the build reports an error. (extern has been discussed before; the difference between definition and declaration will not be discussed again here.)
However, there are three exceptions to this rule.
First, the definition of a const object may be written in a header file. A global const object that is not declared extern is valid only in the current file by default. If such an object is written in a header file, then even when the header is included by several .cpp files, the object is valid only in each file that includes it and is invisible to the other files, so it does not cause multiple definitions. At the same time, because all these .cpp files get the object from the same header, the value of the const object is guaranteed to be the same in all of them, which serves both purposes at once. Similarly, the definition of a static object may be placed in a header file.
Second, the definition of an inline function may be written in a header file. An inline function requires the compiler to expand it in place, according to its definition, wherever it is encountered, rather than declaring it first and linking later as with an ordinary function (a call that has been expanded inline needs no linking), so the compiler needs to see the complete definition of the inline function at compile time. If an inline function could be defined only once like an ordinary function, this would be awkward. Within one file it is fine: I can write the definition of the inline function at the top so that it is visible wherever it is used later. But what if I also use the function in other files? There is almost no good solution, so C++ stipulates that an inline function may be defined multiple times in a program, as long as it is defined only once in any one .cpp file and its definition is identical in all .cpp files. Obviously, then, putting the definition of an inline function in a header file is a wise choice.

Third, the definition of a class may be written in a header file. When a program creates an object of a class, the compiler can know how to lay out the object only if the class definition is fully visible, so the requirement for class definitions is essentially the same as for inline functions. Putting the class definition in a header file, and including that header in every .cpp file that uses the class, is therefore good practice. It is worth mentioning here that a class definition contains data members and member functions. Data members are not defined (allocated space) until a specific object is created, but member functions need to be defined from the start; this is what we usually call the implementation of the class. The usual approach is to put the class definition in a header file and the member function implementations in a .cpp file. This works and is a good approach. There is another way, however: write the member function implementations directly inside the class definition. In C++, a member function defined inside the class body is treated by the compiler as inline. So writing the member function definitions inside the class definition, and putting it all in a header file, is legal. Note that it is illegal to write a member function definition in the header file but outside the class definition, because in that case the member function is not inline (unless it is explicitly declared inline). Once the header is included by two or more .cpp files, the member function is defined more than once.
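
To make the three exceptions concrete, here is a sketch of what such a header might contain. The file name circle.h and every identifier in it (kPi, square, Circle, check_circle) are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

// ---- circle.h (hypothetical header) ----
const double kPi = 3.14159265358979;   // 1. const object: internal linkage by default

inline double square(double x) {       // 2. inline function definition
    return x * x;
}

class Circle {                         // 3. class definition
public:
    explicit Circle(double r) : radius_(r) {}
    double radius() const { return radius_; }  // defined in the class body: implicitly inline
    double area() const;                       // declared here, defined below
private:
    double radius_;
};

// Defined outside the class body, so it needs an explicit "inline"
// to be safe in a header that several .cpp files include.
inline double Circle::area() const {
    return kPi * square(radius_);
}
// ---- end of circle.h ----

// Code that would normally sit in a .cpp file including circle.h.
void check_circle() {
    Circle c(2.0);
    assert(c.radius() == 2.0);
    assert(c.area() > 12.56 && c.area() < 12.57);
}
```

Dropping the inline keyword from Circle::area would turn it into an ordinary external definition, which is the illegal case described above.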

V. Protecting header files
Consider: if a header file contains only declarations, there is no problem with the same .cpp file including it many times, because declarations may appear any number of times. However, the three exceptions discussed above are also very common uses of header files. Once a header file contains any of those three exceptions, including it more than once in one .cpp file causes real trouble, because the syntax elements in these exceptions "may be defined in multiple source files" but "may appear only once in any one source file". Imagine that a.h contains the definition of class A and b.h contains the definition of class B, and since the definition of class B depends on class A, b.h also #includes a.h. Now there is a source file that uses both class A and class B, so the programmer includes both a.h and b.h in it. Here is the problem: the definition of class A appears twice in this source file! The whole program then fails to compile. You might think this is the programmer's mistake, since he should know that b.h includes a.h, but in fact he should not have to know.
Using "#define" together with conditional compilation solves this problem. In a header file, define a name with #define, and use the conditional compilation directives #ifndef ... #endif so that the compiler decides, according to whether the name is already defined, whether to go on compiling the rest of the header. The method is simple, but you must remember to write it in every header file you write.
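
The sketch below plays out the a.h / b.h scenario by hand: the contents of the headers are written out in the order the preprocessor would see them, with a.h appearing twice. The guard names A_H and B_H, the member functions and check_guards are assumptions made for the example.

```cpp
#include <cassert>

// ---- a.h, included directly by the source file ----
#ifndef A_H
#define A_H
class A {
public:
    int value() const { return 21; }
};
#endif
// ---- b.h, which itself includes a.h ----
#ifndef B_H
#define B_H
//   ---- a.h again, pulled in by b.h ----
#ifndef A_H          // A_H is already defined, so everything up to #endif is skipped
#define A_H
class A {
public:
    int value() const { return 21; }
};
#endif
//   ---- end of the second copy of a.h ----
class B {
public:
    int twice() const { return 2 * a_.value(); }
private:
    A a_;
};
#endif
// ---- the source file's own code ----
void check_guards() {
    B b;
    assert(b.twice() == 42);
}
```

Without the #ifndef A_H lines, the second copy of class A would be a redefinition and the file would not compile.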

 

[Repost] How the C++ compiler and linker work

This is not a discussion of the "compiler theory" taught in university courses. I am only writing down my own understanding of, and views on, how the C++ compiler and linker work. At my level I am not yet able to explain compiler theory (it is very complicated, and I barely understood it at university).

 

A few concepts to understand first:

    1. Compilation: the process in which the compiler translates a source file, which holds source code in text form, into an object file in machine-language form. During this process the compiler performs a series of syntax checks. If compilation succeeds, the .cpp file is converted into the corresponding .obj file.

    2. Compilation unit: according to the C++ standard, each .cpp file is a compilation unit. Compilation units are independent of, and unknown to, one another.

    3. Object file: the file generated by compilation. It contains, as machine code, all the code and data of the compilation unit, plus some other information such as the unresolved symbol table, the export symbol table, and the address redirection table. An object file is in binary form.

 

    According to the C++ standard, a compilation unit (translation unit) is a .cpp file together with all the .h files it includes. The code in the .h files is expanded into the .cpp file that includes them, and the compiler then compiles the .cpp file into a .obj file. On Windows the .obj file uses the COFF format, on which PE (Portable Executable, the Windows executable format) is based. It contains binary code, but it cannot necessarily be executed, since nothing guarantees that it contains a main function. After the compiler has compiled all the .cpp files of a project separately, the linker links them into an .exe or .dll file.

 

Let us analyze how the compiler works:

We skip syntax analysis and go straight to generating the object file. Suppose we have a file A.cpp, defined as follows:

    int n = 1;
    void FunA()
    {
        ++n;
    }

 

    The object file A.obj compiled from it has a region (or section) containing the data and function above, namely n and FunA. Given as file offsets, it might look like this:

    Offset    Content    Length
    0x0000    n          4
    0x0004    FunA       ??

    Note: this is only an illustration and may differ from the actual layout of an object file. ?? means the length is unknown. The data in an object file is not necessarily contiguous and does not necessarily start at 0x0000.

    The content of the FunA function might look like this:

    0x0004    inc DWORD PTR[0x0000]
    0x00??    ret

    Here ++n has been translated into inc DWORD PTR[0x0000], that is, add 1 to the DWORD (4 bytes) at address 0x0000 of this unit.

 

    There is another file, B.cpp, defined as follows:

    extern int n;
    void FunB()
    {
        ++n;
    }

    The binary corresponding to B.obj would be:

    Offset    Content    Length
    0x0000    FunB       ??

    Why is there no space for n here? Because n is declared extern. The extern keyword tells the compiler that n is already defined in another compilation unit and must not be defined in this one. Since compilation units are unrelated to each other, the compiler does not know where n is, so it cannot generate the address of n inside FunB. The function FunB therefore looks like this:

    0x0000    inc DWORD PTR[????]
    0x00??    ret

    What should be done, then? This work can only be done by the linker.

    To let the linker know which places need an address filled in (the ???? above), the object file needs a table that tells it: the "unresolved symbol table". Similarly, the object file that provides n needs an "export symbol table" to tell the linker which addresses it can provide.

    So far we know that an object file must provide not only binary data but also at least two tables, the unresolved symbol table and the export symbol table, to tell the linker what it needs and what it can provide. How is the correspondence between these two tables established? Here a new concept comes in: the symbol. In C/C++, every variable and function has its own symbol. For example, the symbol of the variable n is n. The symbols of functions are more complex; let us assume that the symbol of FunA is _FunA (this depends on the compiler).

    So,

    A.obj export symbol table

    Symbol    Address
    n         0x0000
    _FunA     0x0004

    The unresolved symbol table is empty (because A.cpp does not reference anything in other compilation units).

    B.obj export symbol table

    Symbol    Address
    _FunB     0x0000

    Unresolved symbol table

    Symbol    Address
    n         0x0001

    This table tells the linker that at position 0x0001 of this compilation unit there is an address whose value is unknown but whose symbol is n.

    When linking, the linker finds the unresolved symbols in B.obj and looks through the export symbol tables of all compilation units for a symbol whose name matches. If it finds one, it fills that symbol's address into the position recorded for the unresolved symbol in B.obj. If none is found, a link error is reported. In this example, the symbol n is found in A.obj, and the address of n is written at 0x0001 in B.obj.

 

    There is a problem here, however. Done this way, the content of FunB in B.obj becomes inc DWORD PTR[0x0000] (because the address of n in A.obj is 0x0000). Since the addresses of every compilation unit start from 0x0000, the addresses of multiple object files would collide after linking. The linker therefore adjusts the addresses of each object file at link time. In this example, if 0x0000 of B.obj is placed at 0x00001000 in the executable and 0x0000 of A.obj at 0x00002000, the linker adds 0x00002000 to every address in the export symbol table of A.obj and 0x00001000 to all symbol addresses of B.obj. This guarantees that addresses do not collide.

    Since 0x00002000 is added to the address of n, the inc DWORD PTR[0x0000] in FunA is now wrong. So the object file must provide one more table, the address redirection table (more commonly called a relocation table).

 

    In conclusion:

    An object file provides at least three tables: the unresolved symbol table, the export symbol table, and the address redirection table.

    Unresolved symbol table: lists the symbols that this unit references but does not define, together with the addresses where they appear.

    Export symbol table: lists the symbols that this compilation unit defines and can offer to other compilation units, together with their addresses in this unit.

    Address redirection table: records all references this compilation unit makes to its own addresses.

 

    Work order of the linker:

    When linking, the linker first determines the position of each object file in the final executable. It then visits the address redirection table of every object file and redirects the addresses recorded there (adding an offset, namely the start address of that compilation unit in the executable). Next it traverses the unresolved symbol tables of all object files, looks up the matching symbols in all the export symbol tables, and fills the resolved addresses into the positions recorded in the unresolved symbol tables. Finally it writes the content of every object file to its position, does some other work, and an executable file is produced.

    Note: real linkers are more complicated. Object files generally separate data and code into sections, and redirection is done section by section, but the principle is the same.
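
    The work order can be sketched as a toy model. This is not how a real linker or object format is implemented: ObjectFile, link_objects, link_demo and check_link are invented for the illustration, each symbol may have only one unresolved slot per object, and addresses count words rather than bytes to keep the model small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy object file; every field name here is illustrative.
struct ObjectFile {
    std::vector<std::uint32_t> words;                 // code and data
    std::map<std::string, std::uint32_t> exports;     // symbol -> offset in this unit
    std::map<std::string, std::uint32_t> unresolved;  // symbol -> offset of the slot to fill
    std::vector<std::uint32_t> relocations;           // slots holding this unit's own addresses
};

// Lays the objects out one after another from `base` and returns the image.
std::vector<std::uint32_t> link_objects(std::vector<ObjectFile> objs, std::uint32_t base) {
    // 1. Decide where each object file starts.
    std::vector<std::uint32_t> start;
    std::uint32_t next = base;
    for (std::size_t i = 0; i < objs.size(); ++i) {
        start.push_back(next);
        next += static_cast<std::uint32_t>(objs[i].words.size());
    }
    // 2. Redirect internal addresses and collect every exported symbol.
    std::map<std::string, std::uint32_t> global;
    for (std::size_t i = 0; i < objs.size(); ++i) {
        for (std::uint32_t slot : objs[i].relocations) objs[i].words[slot] += start[i];
        for (const auto& e : objs[i].exports) {
            if (!global.insert(std::make_pair(e.first, start[i] + e.second)).second)
                throw std::runtime_error("duplicated external symbol: " + e.first);
        }
    }
    // 3. Fill in every unresolved symbol, then 4. write out all contents.
    std::vector<std::uint32_t> image;
    for (std::size_t i = 0; i < objs.size(); ++i) {
        for (const auto& u : objs[i].unresolved) {
            std::map<std::string, std::uint32_t>::const_iterator it = global.find(u.first);
            if (it == global.end())
                throw std::runtime_error("unresolved external symbol: " + u.first);
            objs[i].words[u.second] = it->second;
        }
        image.insert(image.end(), objs[i].words.begin(), objs[i].words.end());
    }
    return image;
}

// A.obj and B.obj from the text, reduced to the address slots that matter.
std::vector<std::uint32_t> link_demo() {
    ObjectFile a;                    // int n = 1; void FunA() { ++n; }
    a.words = {1, 0};                // word 0: n, word 1: FunA's operand (address of n)
    a.exports["n"] = 0;
    a.exports["_FunA"] = 1;
    a.relocations = {1};
    ObjectFile b;                    // extern int n; void FunB() { ++n; }
    b.words = {0};                   // word 0: FunB's operand, still unknown
    b.exports["_FunB"] = 0;
    b.unresolved["n"] = 0;
    return link_objects({b, a}, 0x1000);   // B starts at 0x1000, A at 0x1001
}

void check_link() {
    std::vector<std::uint32_t> image = link_demo();
    assert(image.size() == 3);
    assert(image[0] == 0x1001);  // FunB now refers to the final address of n
    assert(image[1] == 1);       // n itself
    assert(image[2] == 0x1001);  // FunA's own reference was redirected
}
```

    The two exceptions thrown by this model correspond to the two classic link errors: a symbol nobody exports, and a symbol exported twice.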

    Once you understand how the compiler and linker work, some link errors become easy to solve.

 

    Now let us look again at some features that C/C++ provide:

    extern: tells the compiler that the variable or function is defined in another compilation unit; the symbol is put into the unresolved symbol table (external linkage).

    static: when this keyword precedes the declaration of a global function or variable, it indicates that the compilation unit does not export the symbol of that function or variable, so the symbol cannot be used in other compilation units (internal linkage). If it is a static local variable, the variable is stored in the same way as a global variable, but its symbol is still not exported.

    Default linkage: functions and variables have external linkage by default; const variables have internal linkage by default.

    Pros and cons of external linkage: a symbol with external linkage can be used throughout the program, but other compilation units must not export the same symbol (otherwise a duplicated external symbols error is reported).

    Pros and cons of internal linkage: a symbol with internal linkage cannot be used in other compilation units, but different compilation units may have symbols of the same name.
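
    The keywords above can be seen side by side in one small compilation unit. This is only a sketch; g_total, s_calls, kStep, kShared, bump, next_id and check_linkage are made-up names.

```cpp
#include <cassert>

int g_total = 0;                 // external linkage: the symbol is exported
static int s_calls = 0;          // internal linkage: not exported
const int kStep = 5;             // const at namespace scope: internal linkage by default
extern const int kShared = 7;    // extern gives a const object external linkage

static void bump() {             // internal linkage: other units may have their own bump()
    ++s_calls;
    g_total += kStep;
}

int next_id() {
    static int id = 0;           // static local: stored like a global, symbol not exported
    bump();
    return ++id;
}

// Call once from main() to check the behaviour described in the comments.
void check_linkage() {
    assert(next_id() == 1);
    assert(next_id() == 2);      // id kept its value between calls
    assert(g_total == 2 * kStep);
    assert(s_calls == 2);
    assert(kShared == 7);
}
```

    Another compilation unit could declare extern int g_total; or extern const int kShared; and link against this one, but it could not reach s_calls, kStep or bump.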

 

    Why header files should generally contain only declarations, not definitions: a header file can be included by multiple compilation units. If the header contains a definition, every compilation unit that includes it will define the same symbol, and if the symbol has external linkage, this causes a duplicated external symbols link error.

    Why commonly used inline functions should be defined in header files: compilation units do not know about one another at compile time. If an inline function is defined in a .cpp file, then when another compilation unit that uses the function is compiled, the function definition cannot be found, so the function cannot be expanded. So if an inline function is defined in a .cpp file, only that .cpp file can use it.

 

Origin www.cnblogs.com/qiang-upc/p/11409760.html