Program Environment and Preprocessing: What does the compiler do? How does the program work?

Table of contents

The execution environment of the program:

Translation environment:

Step by step:

Compile and link:

What is a symbol table?

Link:

Detailed preprocessing

Predefined symbols

#define: defining identifiers

#define: defining macros

#define substitution rules

# and ##

The role of ##

Macro arguments with side effects

Macros vs. Functions

Advantages:

Disadvantages:

Naming convention

#undef

Command-line definitions

Conditional compilation

File inclusion

In detail:

Nested file inclusion


The execution environment of the program:

In any implementation of ANSI C, there are two distinct environments.
The first is the translation environment, where source code is converted into executable machine instructions.
The second is the execution environment, which is used to actually execute the code.

A compiler is essentially a translator. The high-level language we write only becomes something the machine can understand after the compiler has processed it, and only then can the machine compute the answer we want. Even for a "Hello World" that we could write with our eyes closed, the compiler has to do a lot of work. What follows is a brief description of how the compiler "communicates" with the machine. The full process has many more details; if I get the chance, I will try to write up a more complete account. For now, a rough idea of what the compiler does is enough.

Recommended reference book: "Programmer's Self-cultivation"

Translation environment:

The process by which a program becomes an .exe executable file is roughly as shown in the figure below.

Each source file is turned into an object file by the compilation process. The linker then combines the object files with functions from libraries and functions implemented by the programmer to generate an executable file.

Step by step:

Each .c source file passes through the compiler on its own and produces an object file with the suffix .obj. The object files and the link libraries then pass through the linker to produce an executable program.

The link libraries provide the library functions declared in the header files we include, as opposed to the functions we implement ourselves. That is the general overview; more details follow.

Compile and link:

 The compilation process can be divided into the following 3 steps.

1. Precompilation (preprocessing): observing on Linux, we find that the preprocessing step generates a file with the suffix .i, and that the entire contents of the stdio.h header file have been included in this .i file.

After adding a #define to the source, we find that in the .i file the uses of the defined symbol have been replaced, the definition itself has been deleted, and the comments have been deleted as well.

To sum up, preprocessing consists almost entirely of text operations.

2. Compilation

In this step, each preprocessed .i file is turned into a .s file that contains nothing but assembly language. In short, the C code is converted into assembly code through four actions: lexical analysis, syntax analysis, semantic analysis, and symbol summarization.

3. Assembly

The assembly step generates a .o file. In this step the assembly code is converted into binary instructions, and a symbol table is formed.

 What is a symbol table?

Each symbol, such as the main and Add functions, is assigned an address, and together these entries form a table. The table is used during linking.

When we use a function, its entry in the symbol table stands for the function's address. The linking stage looks up the function's symbol, the symbol carries the function's address, and so the connection between functions in different files is established. Because linking essentially throws everything into one pot, it is not a problem if a function's declaration and definition are in different files: once everything has been merged, the function's symbol can be found directly.

Link:

1. Merging segment tables

The generated .o files contain various segments: the code segment that stores machine instructions, the data segment that stores variables, the BSS segment that stores uninitialized variables, and others that we won't go into here.

2. Merging symbol tables and relocation

Simply put, this step selects the valid address for each symbol. For example, suppose there are two source files: one defines the Add function, and the other declares Add and defines main. When the symbol tables are generated, each function is given a symbol, which is essentially an address.

Although Add is only declared in the file that calls it, the process described earlier means that during compilation each source file is handled on its own: only the symbols, expressions, and so on inside that file are sorted out and optimized. The parts are then combined in the linking phase. At that point the linker finds the symbol Add and uses its address, which completes the connection between the source files.

#pragma once, used in a header file, prevents the header from being included more than once.

Detailed preprocessing

There is still a lot we can do in the preprocessing step mentioned above, such as defining macros or identifiers. Next we look in detail at what gets replaced during preprocessing, macros in particular.

Predefined symbols

__FILE__ // the source file being compiled
__LINE__ // the current line number in the file
__DATE__ // the date the file was compiled
__TIME__ // the time the file was compiled
__STDC__ // 1 if the compiler conforms to ANSI C, otherwise undefined

These symbols are built into the C language. Let's try printing them and take a look.

 With these symbols we can easily get information about the current file.

#define: defining identifiers

Syntax:
#define name stuff

 #define can take care of some tedious work for us, for example:

#define MAX 1000
#define reg register         // create a short name for the keyword register
#define do_forever for(;;)   // replace an implementation with a more descriptive symbol
#define CASE break;case      // automatically write the break when writing a case statement

// If the stuff being defined is too long, it can be split across several lines; every line except the last ends with a backslash (the line-continuation character).
#define DEBUG_PRINT printf("file:%s\tline:%d\t \
date:%s\ttime:%s\n" ,\
__FILE__,__LINE__ , \
__DATE__,__TIME__ )

Note that it is best not to put a semicolon at the end of a #define, because it can cause problems like the following.

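#define MAX 1000;   // with a trailing semicolon (problematic)
#define MAX 1000    // without a semicolon (recommended)

if(condition)
    max = MAX;
else
    max = 0;

With the first definition, the assignment expands to max = 1000;; and the extra empty statement separates the if from the else, which causes a syntax error.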

#define: defining macros

 The #define mechanism includes a provision that allows parameters to be substituted into the text. This kind of definition is usually called a macro or a defined macro.

 A macro definition is really a text replacement: during preprocessing, every use of a defined macro is replaced by its text, and the replacement text can contain parameters or a whole expression.

Here is how a macro is declared:

#define name( parament-list ) stuff

The parament-list is a comma-separated list of symbols that may appear in stuff. Note: the left parenthesis of the parameter list must be immediately adjacent to name. If there is any whitespace between the two, the parameter list is interpreted as part of stuff.

 For example, if we want to create a macro that directly replaces the addition function, we can create it as follows:

#define ADD(x,y) ((x) + (y))

 We find that it does behave like an addition function, and even the way arguments are passed looks very similar. This raises some questions: why do we need so many parentheses? And can macros completely replace functions in this way?

Let's discuss the first question first, why so many parentheses are added, and then compare it with functions to discuss the second question.

   Let's look at an example to understand its rules:

#define SQUARE( x ) x * x

   This macro takes one argument x .

   If, after the declaration above, we place

SQUARE( 5 );

   in a program, the preprocessor replaces it with the following expression:

5 * 5

   There is a problem with this macro:
   observe the following code snippet:

int a = 5;
printf("%d\n" ,SQUARE( a + 1) );

   At first glance we might expect this to be replaced by 6 * 6 and print 36, but it actually prints 11.

   Why?

  When the text is replaced, the parameter x is replaced with a + 1, so the statement actually becomes:
   printf ("%d\n", a + 1 * a + 1);

   The replacement is purely textual: a + 1 is not evaluated before it is substituted. To get the arithmetic we expect, the macro should be modified like this:

  

 #define SQUARE(x) (x) * (x)

   In this way, the expression will be replaced like this:

printf ("%d\n",(a + 1) * (a + 1) );

   But parenthesizing only the parameters still does not prevent every problem. Take this macro, for example:

#define DOUBLE(x) (x) + (x)

and the following use of it:

int a = 5;
printf("%d\n" ,10 * DOUBLE(a));

  After replacement, the statement becomes:

printf ("%d\n",10 * (a) + (a));

Because multiplication binds more tightly than addition, this prints 55 instead of the expected 100.

The fix is to wrap the whole expression in parentheses as well:

#define DOUBLE( x) ( ( x ) + ( x ) )

Therefore, so that the order of evaluation is not disturbed by neighboring operators, a macro that evaluates a numeric expression should parenthesize both each parameter and the whole expression.
 

#define substitution rules

Expanding #define symbols and macros in a program involves several steps.
1. When a macro is invoked, its arguments are first checked to see whether they contain any symbols defined by #define. If so, those are replaced first.
2. The replacement text is then inserted into the program in place of the original text. For macros, parameter names are replaced by their values.
3. Finally, the result is scanned again to see whether it contains any symbols defined by #define. If so, the process above is repeated.
Note:
1. Other symbols defined by #define may appear in macro parameters and in #define definitions. Macros, however, cannot be recursive.
2. When the preprocessor searches for symbols defined by #define, the contents of string constants are not searched.

To sum up: a macro does not replace matching text inside a string constant, a macro cannot be recursive, and after a replacement the result is scanned again for further symbols to replace. This also partly answers the earlier question of whether macros can replace functions: clearly, a macro cannot do what a recursive function does.

Examples where the contents of string constants are not searched:

# and ##

 Since text inside a string is not replaced, how can we use a macro to put a parameter into a string?

First, look at this code:

char* p = "hello ""bit\n";
printf("hello"" bit\n");
printf("%s", p);

Running it shows that adjacent string literals are simply joined together: the "" boundary between them disappears.

 We can use this in a macro. The # operator converts a macro argument into the corresponding string literal, without performing any replacement on it.

int i = 10;
#define PRINT(FORMAT, VALUE)\
printf("the value of " #VALUE " is " FORMAT "\n", VALUE);
...
PRINT("%d", i+3); // what does this print?

The role of ##

## combines the symbols on either side of it into a single symbol.
It allows a macro definition to create identifiers from separate pieces of text.

#define ADD_TO_SUM(num, value) \
sum##num += value;
...
ADD_TO_SUM(5, 10); // effect: adds 10 to sum5

Such a concatenation must produce a valid identifier; otherwise the result is undefined.

Macro arguments with side effects

 Because a macro works by substitution, an argument may appear more than once in the expanded text. We should therefore avoid operators or expressions that change a value, such as:

#define ADD x++;
#define DOUBLE x*=2;

Such expressions have side effects, because they change the original value of x.

Macros vs. Functions

 Let's return to the question raised earlier: apart from recursion, what is the difference between a macro and a function? Are macros better than functions in some respects?

We generally use macros only for simple expressions, because of the following advantages and disadvantages:

Advantages:

1. A macro is a text replacement done during preprocessing, so unlike a function it does not create a stack frame, and there is no call-and-return overhead. For small operations, the code needed to call a function and return from it can be larger and slower than the work itself, so a macro wins on both speed and size.

2. More importantly, the parameters of a function must be declared with specific types, so a function can only be used on expressions of the appropriate type. A macro that compares two values, by contrast, works for integers, long integers, floating-point values, and any other type that can be compared with >. Macros are type independent, which lets them adapt to more situations involving variables of different types.
 

Disadvantages:

1. Each use of a macro inserts a copy of its text into the program, so if the macro is relatively long, the program after replacement becomes correspondingly longer.

2. Macros cannot be debugged. By the time the program runs, the replacement has already happened. With a function we can step into it in the debugger and watch what it does at run time; a macro cannot be observed at all, because its text has been substituted directly into the expression.

3. Operator precedence. As mentioned above, controlling precedence in macro expressions is troublesome and takes a lot of parentheses.

4. Macros are type independent, so they are not rigorous enough.

Macros can sometimes do things that functions cannot. For example, a macro parameter can be a type, but a function parameter cannot.

#define MALLOC(num, type)\
(type *)malloc((num) * sizeof(type))
...
// usage
MALLOC(10, int); // a type is passed as an argument
// after the preprocessor's replacement:
(int *)malloc((10) * sizeof(int));

Substituting the type directly makes it easy to allocate memory dynamically, which is another big advantage of macros.

To sum up: for small jobs such as a simple addition, macros can beat functions on speed and size, but once they get complicated they become troublesome, hard to debug and modify, and prone to type errors. On the other hand, being able to substitute types makes them flexible.

Therefore, we prefer to use macros for small tasks and leave the big ones to functions.

Attribute: code length
  #define macro: Macro code is inserted into the program each time it is used. Except for very small macros, the length of the program can grow substantially.
  Function: Function code appears in only one place; every time the function is used, the same code in that place is called.

Attribute: execution speed
  #define macro: Faster.
  Function: There is additional overhead for function calls and returns, so it is relatively slow.

Attribute: operator precedence
  #define macro: Macro parameters are evaluated in the context of all surrounding expressions. Unless parentheses are added, the precedence of adjacent operators may produce unpredictable results, so it is recommended to use plenty of parentheses when writing macros.
  Function: A function parameter is evaluated only once, when the function is called, and its resulting value is passed to the function. The result of evaluating the expression is more predictable.

Attribute: parameters with side effects
  #define macro: Parameters may be substituted in multiple places within the macro body, so evaluating a parameter with side effects may produce unpredictable results.
  Function: Function parameters are evaluated only once when passed, and the result is easier to control.

Attribute: parameter type
  #define macro: The parameters of a macro have nothing to do with type; as long as the operations on the parameters are legal, the macro can be used with any parameter type.
  Function: The parameters of a function are tied to their types. If the types of the parameters differ, different functions are required, even if the tasks they perform are the same.

Attribute: debugging
  #define macro: Macros are inconvenient to debug.
  Function: Functions can be debugged statement by statement.

Attribute: recursion
  #define macro: Macros cannot be recursive.
  Function: Functions can be recursive.

Naming convention

In general, function-like macros and functions are used with very similar syntax, so the language itself cannot help us tell the two apart.
A common convention is therefore:
write macro names in all capitals;
do not write function names in all capitals.

#undef

 This directive removes a macro definition; it works like erasing the definition.

#undef NAME
// If an existing name needs to be redefined, its old definition must first be removed.

Command-line definitions

 Many C compilers support this feature. Simply put, we do not need to define the symbol with #define in the source file; we can give it a value directly on the command line when the compiler is started.

#include <stdio.h>
int main()
{
    int array [ARRAY_SIZE];
    int i = 0;

    for(i = 0; i< ARRAY_SIZE; i ++)
    {
        array[i] = i;
    }

    for(i = 0; i< ARRAY_SIZE; i ++)
    {
        printf("%d " ,array[i]);
    }
    printf("\n" );

    return 0;
}

In the Linux environment, we can give the symbol ARRAY_SIZE a value on the compiler command line, then compile and run the program to see the output.

Conditional compilation

Sometimes we run into an annoying situation. For example, we want to see the state of each iteration while an array is being filled. Stepping through in a debugger is tedious, so printf is more convenient, but the printf has to be deleted when we are done, and if a problem shows up later it has to be written again. Deleting it is a pity, and keeping it gets in the way. Conditional compilation lets us compile a piece of code selectively instead.

#include <stdio.h>
#define __DEBUG__
int main()
{
	int i = 0;
	int arr[10] = { 0 };
	for (i = 0; i < 10; i++)
	{
		arr[i] = i; 
	#ifdef __DEBUG__
	printf("%d\n", arr[i]); // check whether the array was assigned correctly
	#endif 
	}
	return 0;
}

 Here __DEBUG__ is defined, so the printf takes effect.

What happens if we comment out the definition of __DEBUG__?

 Nothing is printed, and VS even grays out the printf. In this way we have achieved selective compilation.

There are some other fancy conditional compilation directives, listed below:

Common conditional compilation directives:

1.
#if constant-expression
//...
#endif

The logic is the same as that of an if statement: if the constant expression is true, the code between #if and #endif is compiled; if it is false, that code is left out.

2. Conditional compilation with multiple branches
#if constant-expression
//...
#elif constant-expression
//...
#else
//...
#endif

 Since there is an #if, there are naturally #elif and #else as well, with the same logic as else if and else, so I won't demonstrate them at length here.

3. Testing whether a symbol is defined
#if defined(symbol)
#ifdef symbol
#if !defined(symbol)
#ifndef symbol

The #ifdef form is the one used in the example above.

Conditional compilation directives can also be nested.

4. Nested directives
#if defined(OS_UNIX) // if OS_UNIX is defined, the following directives are processed
	#ifdef OPTION1
		unix_version_option1();
	#endif
	#ifdef OPTION2
		unix_version_option2();
	#endif
#elif defined(OS_MSDOS)
	#ifdef OPTION2
		msdos_version_option2();
	#endif
#endif

File inclusion

 The effect of #include is easy to understand. During preprocessing the directive itself is removed and replaced with the contents of the named file. If a file is included 10 times, its contents are inserted and compiled 10 times.

In detail:

Including local files: #include "name". Unlike library headers, local files are included with double quotes instead of <>.

Search strategy: the search starts in the directory where the source file is located. If the header file is not found there, the compiler looks in the standard locations, just as it does for library headers.
If it is still not found, a compilation error is reported.

The path to the standard header files for the Linux environment:

/usr/include

The path to the standard header file for the VS environment:

C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include
// this is the default path for VS2013

Including library files:

#include <filename.h>

The header file is looked up directly in the standard location. If it is not found, a compilation error is reported.
Does that mean the "" form can also be used for library files?
The answer is yes, it can.
But the search is less efficient that way, and it also becomes harder to tell whether a file is a library file or a local file.

Nested file inclusion

 In a relatively large project there are many source files and header files, and the include relationships can become complicated, as in the figure below. We may then face a problem: one file gets included multiple times.

 To use the functions or variables in comm.h, test1.c and test.h each need to include comm.h.

test.c needs what is declared in test1.h, so it includes test1.h, but some of the declarations inside test1.h may in turn need comm.h, so comm.h ends up being included repeatedly.

How do we solve this problem?
Answer: conditional compilation.
 

#ifndef __TEST_H__
#define __TEST_H__
// contents of the header file
#endif //__TEST_H__

or:

#pragma once

Either approach prevents a header file from being included more than once.
So far, the overview is over, I hope it will be of some help to you! Thanks for reading!

 

Origin blog.csdn.net/m0_53607711/article/details/126957124