What does a C program go through, from compiler to execution?

Compile and run the program

A program exists in two environments: the translation environment, in which the source code is converted into executable machine instructions, and the execution environment, in which the code actually runs. These correspond to the two stages a program goes through: the compilation stage and the execution stage.
Each source file that makes up a program is converted into object code by the compilation process.
The linker then bundles the object files together to form a single, complete executable program.
The linker also pulls in any functions from the standard C library that the program uses, and it can search the programmer's own libraries and link the functions it needs into the program.

Program compilation

Compiling a program consists of three steps: preprocessing, compilation, and assembly.

  1. Preprocessing: gcc -E test.c -o test.i stops after preprocessing is complete and writes the result to the test.i file.
    Preprocessing performs four tasks:
  • Header file expansion: each #include directive is replaced by the contents of the named file.
  • Macro substitution: every name defined with #define is replaced in the code by its definition.
  • Comment removal: the comments are stripped from the code.
  • Conditional compilation: only the parts of the code selected by the conditions in the source are kept for compilation.
  2. Compilation: gcc -S test.c stops after compilation is complete and saves the result in test.s.
    This step translates the C code into assembly language.
  3. Assembly: gcc -c test.c stops after assembly is complete and saves the result in test.o.
    This step translates the assembly language into binary machine code that the computer can recognize.


At this point the test.o file still cannot be run directly. It must first be linked to produce a single executable program.

Running the program

  1. The program must be loaded into memory. In an environment with an operating system, this is generally done by the operating system. In a freestanding environment, the loading of the program must be arranged manually, or it may be done by placing the executable code in read-only memory.
  2. Execution of the program begins, and the main function is then called.
  3. The program code starts executing. At this point the program uses a runtime stack to store the local variables and return addresses of its functions. The program can also use static memory; variables stored in static memory retain their values throughout the execution of the program.
  4. The program terminates. It may terminate normally by returning from the main function, or it may terminate unexpectedly.

What is linking

Linking lets the program run correctly by recording where the library functions it needs and the data it refers to are located, so that every call and reference in the program can be resolved.

There are two types of linking: static linking and dynamic linking.

Dynamic linking
Dynamic linking associates the program with the libraries it uses, so that when a library function is called, the program jumps to the code in the associated library, which is loaded at run time.

Static linking
Static linking copies the library code the program needs into the executable, so it is available whenever the program runs.

The advantage of static linking is that it is fast and needs no library lookup at run time. The advantage of dynamic linking is that the executable is smaller, and a library loaded in memory can be shared by several programs instead of being duplicated in each one.

Linking on Linux: gcc test.o -o mytest (by default, gcc produces a dynamically linked executable)
gcc test.o -o mytest -static (produces a statically linked executable)
The resulting mytest file can be run directly.

Pay attention when linking multiple files: all object files needed by the program must be linked in the same command, for example: gcc main.o sum.o -o main (main is then the executable program)

Preprocessing in detail

Predefined symbols

These predefined symbols are built into the language.

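__FILE__      //the source file being compiled
__LINE__     //the current line number in the file
__DATE__    //the date the file was compiled
__TIME__    //the time the file was compiled
__STDC__    //1 if the compiler conforms to ANSI C, otherwise undefined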

Macro

Syntax:

 #define name stuff

After preprocessing, every occurrence of name is replaced by stuff.

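For example:

#define MAX 1000 //use MAX in place of a frequently used constant
#define reg register          //create a shorter name for the keyword register
#define do_forever for(;;)     //use a more descriptive symbol for an idiom (do_forever stands for an endless for loop)
#define CASE break;case        //automatically write the break when writing a case label
// If stuff is too long, it can be split across several lines; every line except the last ends with a backslash (the line-continuation character).
#define DEBUG_PRINT printf("file:%s\tline:%d\t" \
                           "date:%s\ttime:%s\n" ,\
                           __FILE__,__LINE__ ,       \
                           __DATE__,__TIME__ )//a macro that defines a whole statement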

Do not end a #define with ";". #define substitutes the entire replacement text, semicolon included, wherever the name appears, so the extra semicolon can easily end up in the middle of a statement and cause a syntax error.

Macro function usage

The #define mechanism includes a provision that allows parameters to be substituted into the text. A definition of this kind is usually called a macro.

Declaration syntax:

#define name( parameter-list ) stuff

Note: The left parenthesis of the parameter list must immediately follow name. If there is any whitespace between the two, the parameter list is interpreted as part of stuff.

E.g:

#define SQUARE( x ) x * x

SQUARE( 5 );

When SQUARE is used in the program like this, the argument 5 is passed in, and the preprocessor replaces the call with x * x, that is, 5 * 5.

There is a problem with this macro.
Case 1:

int a = 5;
printf("%d\n" ,SQUARE( a + 1) );

At first glance, you might think this code prints 36. In fact, it prints 11.

When the text is substituted, the parameter x is replaced with a + 1, so the statement actually becomes:
printf ("%d\n",a + 1 * a + 1 );

The expression resulting from the substitution is not evaluated in the order expected.

Adding two pairs of parentheses to the macro definition easily solves this problem:

#define SQUARE(x) (x) * (x)

Case 2:

#define DOUBLE(x) (x) + (x)

int a = 5; 
printf("%d\n" ,10 * DOUBLE(a));

You might expect this to print 100, but it actually prints 55. After substitution, the statement becomes:

printf ("%d\n",10 * (a) + (a));

The solution is to add a pair of parentheses around the entire expression in the macro definition.

#define DOUBLE( x) ( ( x ) + ( x ) )

Macro definitions that evaluate numeric expressions should be parenthesized in this way, to avoid unexpected interactions between the operators in the arguments and the adjacent operators where the macro is used.

#define substitution rules
Expanding #define symbols and macros in a program involves several steps.

  1. When a macro is invoked, its arguments are first checked for symbols defined by #define. If any are found, they are replaced first.
  2. The replacement text is then inserted into the program in place of the original text. For macros, the parameter names are replaced by their values.
  3. Finally, the result is scanned again to see whether it contains any symbols defined by #define. If so, the process is repeated.

Note:

  1. Symbols defined by other #define directives may appear in macro arguments and in #define definitions. Macros, however, cannot be recursive.
  2. When the preprocessor searches for symbols defined by #define, the contents of string constants are not searched.

The # operator

First, we need to know a feature of the C language.

printf("hello" " bit\n");

This statement prints hello bit, which shows that adjacent string literals are automatically concatenated.

The # operator turns a macro argument into the corresponding string literal.
For example:

int i = 10; 
#define PRINT(FORMAT, VALUE)\
 printf("the value of " #VALUE " is " FORMAT "\n", VALUE)
... 
PRINT("%d", i+3);

The final output should be:

the value of i+3 is 13

The ## operator

## can combine the symbols on both sides of it into one symbol. It allows macro definitions to create identifiers from separate text fragments.

For example:

#define ADD_TO_SUM(num, value) \
 sum##num += value
... 
ADD_TO_SUM(5, 10);

The effect: num is replaced with 5 and pasted onto sum to form the identifier sum5, so the statement in the program becomes sum5 += 10; and adds 10 to sum5.

Macro arguments with side effects

When a macro parameter appears more than once in the macro definition and the argument has side effects, using the macro can be dangerous and lead to unpredictable results. A side effect is a lasting change that occurs when an expression is evaluated. For example:

x+1;//Without side effects
x++;//With side effects

Case:

#define MAX(a, b) ( (a) > (b) ? (a) : (b) ) 
... 
x = 5; 
y = 8; 
z = MAX(x++, y++); 
printf("x=%d y=%d z=%d\n", x, y, z);

The result of the preprocessor processing is:

z = ( (x++) > (y++) ? (x++) : (y++));

So the output result is:

x=6 y=10 z=9

This is not the result we expected: a function call would give x=6 y=9 z=8.

Comparison of macros and functions

Both macros and functions can be applied to perform simple operations. For example, find the larger of two numbers.

#define MAX(a, b) ((a)>(b)?(a):(b))

Then why not use functions to accomplish this task? There are two reasons:

  1. The code needed to call a function and return from it may take more time than the small computation itself. So for tiny operations like this, macros beat functions in both program size and speed.
  2. More importantly, the parameters of a function must be declared with specific types, so a function can only be used on expressions of the appropriate type. This macro, by contrast, works on integers, long integers, floating-point values, and any other type that can be compared with >. Macros are type independent.

Macros also have disadvantages:

  1. Every time a macro is used, a copy of its definition is inserted into the program. Unless the macro is fairly short, this can greatly increase the length of the program.
  2. Macros cannot be stepped through in a debugger.
  3. Because macros are type independent, they are not type checked and so are less rigorous.
  4. Macros can cause operator-precedence problems, which makes programs error-prone.

Macros can sometimes do things that functions cannot. For example, a macro parameter can be a type, but a function parameter cannot.

#define MALLOC(num, type) (type *)malloc((num) * sizeof(type)) 
... 
//usage
MALLOC(10, int);//a type passed as an argument
//after preprocessor substitution:
(int *)malloc((10) * sizeof(int));

The main differences between macros and functions:

Attribute | #define macro | Function
Code length | The macro code is inserted into the program every time it is used. Except for very small macros, the program grows significantly. | The function code appears in only one place; every use of the function calls that same code.
Execution speed | Faster | Slower, because of the extra overhead of the function call and return.
Operator precedence | Macro arguments are evaluated in the context of the surrounding expression. Unless parentheses are added, the precedence of adjacent operators may have unpredictable consequences, so macros should be written with plenty of parentheses. | Function arguments are evaluated once when the function is called, and the resulting values are passed to the function, so the result of an expression is easier to predict.
Arguments with side effects | An argument may be substituted at several places in the macro body, so evaluating an argument with side effects may produce unexpected results. | Function arguments are evaluated only once when they are passed, so the result is easier to control.
Parameter types | Macro parameters have no type. As long as the operations on the arguments are legal, the macro can be used with any argument type. | Function parameters are typed. If the argument types differ, different functions are needed, even if they perform the same task.
Debugging | Macros are hard to debug. | Functions can be debugged statement by statement.
Recursion | Macros cannot be recursive. | Functions can be recursive.

Naming convention

Generally speaking, functions and macros are used with very similar syntax, so the language itself cannot help us tell them apart. A common convention is therefore to write macro names in all capitals and not to write function names in all capitals.

#undef

This directive removes a macro definition.

#undef NAME 
//If an existing name needs to be redefined, its old definition must first be removed.

Command line definition

Many C compilers allow symbols to be defined on the command line when starting the compilation. This is useful when we want to compile different versions of a program from the same source file. (Suppose a program declares an array of a certain length: on a machine with limited memory we need a small array, while on a machine with more memory we can use a larger one.)
For example:

#include <stdio.h>

#ifndef M
#define M 20   //default value, used when M is not defined on the command line
#endif

int main(void)
{
    int i = 0;
    for (; i < M; i++) {
        printf("%d\n", i);
    }
    return 0;
}

As shown above, changing the value of the macro M by hand would mean editing the source file before every recompilation. Compilers such as gcc instead let us define the macro directly on the command line.

gcc test.c -D M=10 

Conditional compilation

Conditional compilation directives make it easy to compile or skip a statement (or a group of statements) when building a program.
For example, debugging code is a pity to delete but gets in the way if it stays, so we can compile it selectively.

E.g:

#include <stdio.h>

int main(void)
{
#ifdef WORLD
  printf("hello world!\n");
#elif defined(C)
  printf("hello C\n");
#else
  printf("hello bit!\n");
#endif
  return 0;
}

In the code above, each condition tests whether a particular macro has been defined; only the branch whose macro is defined is compiled into the program.

Combined with the command-line definitions described earlier, this lets us compile, run, or test selected parts of a program without editing the source text.

Common conditional compilation directives:

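1. 
#if constant-expression
 //... 
#endif 
//The constant expression is evaluated by the preprocessor.
For example:
#define __DEBUG__ 1 
#if __DEBUG__ 
 //.. 
#endif 

2. Conditional compilation with multiple branches
#if constant-expression
 //... 
#elif constant-expression
 //... 
#else 
 //... 
#endif 

3. Testing whether a symbol is defined
#if defined(symbol) 
#ifdef symbol 
#if !defined(symbol) 
#ifndef symbol 

4. Nested directives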
#if defined(OS_UNIX) 
 #ifdef OPTION1 
 unix_version_option1(); 
 #endif 
 #ifdef OPTION2 
 unix_version_option2(); 
 #endif 
#elif defined(OS_MSDOS) 
 #ifdef OPTION2 
 msdos_version_option2(); 
 #endif 
#endif

File inclusion

We already know that the #include directive causes another file to be compiled, just as if its contents appeared in place of the #include directive.
The replacement is simple: the preprocessor deletes the directive and substitutes the contents of the included file.

Consequently, if a file is included 10 times, it is actually compiled 10 times.

There are two main ways to include a header file:

  1. Local file inclusion
#include "filename" 

Search strategy: the compiler first searches the directory containing the source file. If the header file is not found there, it searches the standard locations, just as it does for library headers. If the file still cannot be found, a compilation error is reported.

  2. Library file inclusion

#include <filename.h> 

The compiler searches for the header file directly in the standard locations. If it is not found, a compilation error is reported.
Standard header path in a Linux environment: /usr/include
Standard header path in a VS environment: C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include

Library headers can also be included with "". However, when there is no need to search the current directory, using <> makes the lookup more efficient.

Nested file inclusion

Files can include one another, which creates nesting. When the nesting becomes deep enough, the following situation can arise:
comm.h and comm.c form a common module. test1.h and test1.c use the common module. test2.h and test2.c use the common module. test.h and test.c use the test1 and test2 modules. Two copies of comm.h then end up in the final program, so the file's contents are duplicated.

To avoid this, the convention is to wrap the contents of each header file in conditional compilation directives, so that a header included more than once is expanded only once.

#ifndef __TEST_H__ 
#define __TEST_H__ 
//contents of the header file
#endif //__TEST_H__

or:

#pragma once


Origin blog.csdn.net/qq_40893595/article/details/105575607