[C language] Programming environment and preprocessing

Programming environment

Preface

When we write C code, we usually write .c source files, but what the computer actually executes at the lowest level is binary instructions (machine instructions). So how does the computer run the code we write? If you have written some code, you will know that a .c file is first turned into an executable file such as an .exe, and that file is what actually runs. The instructions stored in the .exe file are binary instructions. How our code gets converted from a .c file into binary instructions the computer can execute is what this article briefly describes.

Program compilation environment and execution environment

Any C program involves two environments: the compilation environment and the execution environment.

A .c file usually goes through two processes to become an .exe file: compiling and linking. Both take place in the compilation environment; the execution environment is where the resulting executable (.exe) actually runs.

A Deep Dive into Compilation and Linking

Compile and link

We said above that a .c file goes through two processes to become an .exe file: compiling and linking. Let's take a closer look at both.

A project generally contains multiple source files (.c) and header files (.h). The compilation process first converts each .c file separately into an object file (a .obj file on VS). The tool used in the compilation process is called the compiler. The object files are then linked and merged into a single .exe file. The tool used in the linking process is called the linker.

The compilation process itself can be subdivided into three stages: precompilation (also called preprocessing), compilation, and assembly.

Precompilation (preprocessing)

This stage mainly performs three tasks:

  1. Inclusion of header files
  2. Execution of preprocessing directives
  3. Removal of comments

We will explain the preprocessing process in more depth in the second half of the article.

Compilation

The compilation stage mainly converts C code into assembly code. Its operations include lexical analysis, syntax analysis, semantic analysis, and symbol summarization.

Translating a passage from Chinese into English also requires lexical, grammatical, and semantic analysis; the compilation stage does much the same kind of work.

So what is symbol summarization? It collects global identifiers such as function names. At this point each source file is still processed separately, so these identifiers have to be recorded in order for the files to recognize one another's symbols at link time. Summarizing the symbols is only part of the work; the rest is done during assembly.

Assembly

Assembly is the last stage of compilation; it produces the object file (a binary file). It converts the assembly code generated by the compilation stage into binary instructions and builds a symbol table.

As mentioned above, symbols are summarized during compilation, and assembly builds the symbol table on that basis: compilation collects the data and assembly organizes it, which makes comparing and merging easier at link time. While the symbol table is built, each identifier is associated with an address so that the symbol can be located and accessed correctly when the program runs.

Link

Linking merges the multiple object files and produces the executable program. It mainly performs two operations:

  1. Merging segment tables
  2. Merging symbol tables and relocation

Merging the symbol tables means combining the symbol tables of the individual files produced earlier; relocation means replacing the recorded addresses with valid ones.

It's worth mentioning that during linking the linker looks at each undefined symbol and tries to find the corresponding definition in the symbol table. If no definition is found, the linker reports an undefined symbol error. In other words, a call to a function that was never defined is only detected at link time.

Finally, the linker combines the object files with the link libraries (.lib files, which the library functions depend on) to produce the executable program.

Execution environment

Program running process

In the running environment, a program may run through the following processes:

  1. The program must be loaded into memory. In an environment with an operating system, this is usually done by the operating system. In a stand-alone (freestanding) environment, loading must be arranged manually or may be done by placing the executable code in read-only memory.

  2. Program execution begins, and the main function is called.

  3. The program code starts executing. The program uses a runtime stack to store each function's local variables and return address. It can also use static memory; variables stored in static memory retain their values throughout the execution of the program.

  4. The program terminates. It may terminate normally by returning from main, or it may terminate unexpectedly.

Detailed explanation of preprocessing

Preprocessing is the first step in the C language compilation process. It generates a preprocessed code by performing operations such as macro expansion, conditional compilation, and header file inclusion on the source code.

Then we will start to explain in detail some knowledge about preprocessing.

Predefined symbols

Predefined symbols are symbols predefined by the compiler and are used to express certain special meanings. Common predefined symbols are as follows:

  1. __FILE__: the name of the current source file.
  2. __LINE__: the current line number in the source file.
  3. __DATE__: the compilation date, in the format "Mmm dd yyyy", such as "Jan 01 2022" (some compilers pad a single-digit day with a space instead of a zero).
  4. __TIME__: the compilation time, in the format "hh:mm:ss", such as "10:30:15".

These predefined symbols can be used directly because they are built into the compiler.

#define

Defining parameterless macros with #define

A macro definition is a preprocessing directive that gives a name to a piece of text such as a constant or an expression. Wherever the name is used in the program, the preprocessor replaces it with its definition.

#define MAX 100
int arr[MAX];
// which is equivalent to
int arr[100];

Macros are not limited to constants; they can also stand for keywords, code fragments, and so on.

// define REG as another name for register
#define REG register
// define DO_FOREVER as an infinite loop
#define DO_FOREVER for(;;)
// if the stuff being defined is too long, it can be split across several lines; every line except the last ends with a backslash (line continuation)
#define DEBUG_PRINT printf("file:%s\tline:%d\t" \
                           "date:%s\ttime:%s\n", \
                           __FILE__, __LINE__, \
                           __DATE__, __TIME__)

When defining a macro with #define, do not add a ; at the end: the semicolon becomes part of the replacement text and is pasted in at every place the macro is used, which can break the surrounding code.

Let’s look at a question next

#define INT_PTR int*
typedef int* int_ptr;
INT_PTR a,b;
int_ptr c,d;

Which of a, b, c, and d is not a pointer variable?

Since #define makes INT_PTR stand for the text int*, the declaration of a and b actually becomes

int* a,b;

Note that several pointer variables cannot be declared this way: the * applies only to the first variable, so a is a pointer while b is a plain int.

With typedef, on the other hand, int_ptr is a real type name for int*, so declaring c and d creates two variables of type int_ptr, that is, two int* pointers.

Defining macros with parameters using #define

#define also supports macros that take parameters, which makes them look and behave somewhat like functions

Declaration format of a macro with parameters

#define name( parameter-list ) stuff

Here parameter-list is a comma-separated list of names, similar to function parameters, that may appear in stuff. The opening parenthesis must come immediately after name; if there is whitespace between them, the parameter list is treated as part of stuff.

Macro parameters are not restricted by type. As long as the expanded code is valid, arguments of any type can be passed in directly.

#include <stdio.h>
#define CMP(x,y) (x > y)?x:y

int main() {
	int a = 1;
	int b = 3;
	float c = 1.1f;
	float d = 3.1f;
	printf("%d\n", CMP(a, b)); // the same macro works with int arguments
	printf("%f\n", CMP(c, d)); // and with float arguments
	return 0;
}

Let’s talk about a few points you should pay attention to when writing macros:

#include <stdio.h>
#define SQUARE( x ) x * x
// a macro that computes a square
int main() {
	printf("%d", SQUARE(5));
	return 0;
}

As said above, a macro defined by #define is replaced as text during preprocessing rather than being called and evaluated like a function, so writing macros in the following way causes problems

#include <stdio.h>
#define ADD(x,y) x + y
int main() {
	printf("%d", 2 * ADD(5,6));
	return 0;
}

The intent of the code above is to add 5 and 6 first and then multiply by 2, which would give 22. But because #define is a plain text replacement, after preprocessing the code actually becomes

int main() {
	printf("%d", 2 * 5 + 6);
	return 0;
}

So 2*5 is calculated first and 6 is added afterwards, giving 16.

To prevent this error, wrap the whole replacement text in parentheses so that the macro is always evaluated as a unit, as follows

#define ADD(x,y) (x + y)

Now look at another piece of code

#include <stdio.h>
#define SQUARE(x) (x * x)
int main() {
	printf("%d", SQUARE(5 + 1));
	return 0;
}

We expected 36, but the actual output is 11.

Why? Didn't we already add parentheses?

Let's write out the result of the replacement and take a look.

int main() {
	printf("%d", (5 + 1 * 5 + 1));
	return 0;
}

Here 1*5 is calculated first, and then 5 and 1 are added to it, giving 11.

Therefore, parentheses around the whole macro are not enough; every place where a parameter is used inside the macro should also be parenthesized, as follows

#define SQUARE(x) ((x) * (x))

#define replacement rules

Expanding #define symbols and macros in a program involves several steps

  1. When a macro is invoked, its arguments are first checked for symbols defined by #define. If there are any, they are replaced first

  2. The replacement text is then inserted into the program in place of the original text. For macros, parameter names are replaced by their values

  3. Finally, the resulting text is scanned again to see if it contains any symbols defined by #define. If so, the process is repeated

Note:

  1. Symbols defined by other #define directives can appear in macro arguments and in #define definitions. Macros cannot be recursive, however
  2. When the preprocessor searches for symbols defined by #define, the contents of string constants are not searched.

Using # and ##

First look at a piece of code

#include <stdio.h>
int main() {
	int a = 1;
	printf("The value of a is %d\n", a);
	int b = 5;
	printf("The value of b is %d\n", b);
	float c = 1.11f;
	printf("The value of c is %f\n", c);
	return 0;
}

The code above is tedious to write: for every printf we have to change both the variable name inside the string and the argument. Is there a way to replace statements like these with a single reusable one?

A function will not work, because it cannot know the name of the argument passed to it.

A macro, however, can use the # operator to do this

#parameter
// the above is equivalent to
"parameter"

That is, # turns the macro argument into a string literal. Let's test it with a modified version of the code that uses this feature.

#include <stdio.h>
#define PRINT(FORMAT,VALUE) printf("The value of " #VALUE " is " FORMAT "\n",VALUE)

int main() {
	int a = 1;
	PRINT("%d", a);
	int b = 5;
	PRINT("%d", b);
	float c = 1.11f;
	PRINT("%f", c);
	return 0;
}

Taking the first statement as an example, the code after replacement becomes

	int a = 1;
	printf("The value of " "a" " is " "%d" "\n", a);

This relies on a property of C string literals rather than of printf: adjacent string literals are concatenated by the compiler into a single string, so printf receives the one format string "The value of a is %d\n".


The ## operator has very few application scenarios, so we will go straight to what it does

#define ZERO(NAME,NUM) NAME##NUM = 0
int main() {
	int num1 = 101;
	ZERO(num, 1);
	return 0;
}

After replacement, it is equivalent to

int main() {
	int num1 = 101;
	num1 = 0;
	return 0;
}

## joins the tokens on its left and right into a single identifier; you must make sure that the resulting identifier actually exists.

Macro parameters with side effects

What is a parameter with side effects? Here is an example

a++ // has a side effect: it changes the value of the variable
a+1 // has no side effect: the value of the variable is unchanged

When arguments are passed to a function, they are evaluated once before the call, so an argument with side effects behaves predictably. Macros are different: they substitute the argument text directly, so an argument with side effects may be evaluated several times, producing unexpected results.

Look at the code below

#include <stdio.h>
#define MAX(a, b) ( (a) > (b) ? (a) : (b) )
int max(int a, int b) {
	return a > b ? a : b;
}
int main() {
	int x = 5;
	int y = 8;
	int z = max(x++, y++);
	printf("x=%d y=%d z=%d\n", x, y, z);
	x = 5;
	y = 8;
	z = MAX(x++, y++);
	printf("x=%d y=%d z=%d\n", x, y, z);
	return 0;
}
// output
// x=6 y=9 z=8
// x=6 y=10 z=9

Most people instinctively expect the macro to give the same result as the function, but since a macro is a text replacement, the code that actually runs is

z = ( (x++) > (y++) ? (x++) : (y++) );

The comparison x++ > y++ increments both x and y. Since 5 > 8 is false, y++ is evaluated a second time; that second y++ yields 9, which becomes z.

So x++ happened once and x=6, while y++ happened twice and y=10.

Comparison between macros and functions

| Attribute | Macro defined with #define | Function |
| --- | --- | --- |
| Code length | The macro text is inserted at every place it is used, so unless the macro is short, code length grows considerably | The function code exists in one place; every call uses the same code |
| Execution speed | No call and return overhead, so slightly faster | Call and return overhead, so relatively slower |
| Operator precedence | Since the macro is a direct replacement, parentheses must be added to protect it from the precedence of surrounding operators | Arguments are evaluated once and their values are passed in, so the result is easy to predict |
| Parameters with side effects | Since the macro is a direct replacement, an argument with side effects may be evaluated several times, multiplying the side effects | Arguments are evaluated only once when they are passed, which is easy to control |
| Parameter type | Macro parameters are not typed; any type can be used as long as the expanded code is valid | Function parameters are strictly typed |
| Debugging | Macros cannot be stepped through in a debugger | Functions can be debugged step by step |
| Recursion | A macro can use other macros, but it cannot be recursive | Functions can be recursive |

In code with a complex calculation process, functions are generally the better choice because they are easier to control and can be debugged. And when the code does a large amount of computation, the speed advantage of a macro becomes negligible.

Naming convention

As a general rule, macro names should be in all capital letters, and function names should not be in all capital letters.

#undef

#undef removes an existing macro definition.

#undef NAME
// if an existing NAME is to be redefined, its previous definition must be removed first

Conditional compilation

Conditional compilation, as the name suggests, is to determine whether to compile based on conditions.

So how exactly should it be done? Let’s first take a look at the common conditional compilation instructions.

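#if constant-expression
// if the constant expression is true (nonzero), the code in between is compiled; otherwise it is ignored
#endif
// for example
#define __DEBUG__ 1
#if __DEBUG__
//..
#endif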

Like the normal if selection structure, conditional compilation also supports multiple branches

#if constant-expression
//...
#elif constant-expression
//...
#else
//...
#endif

Code can also be compiled selectively depending on whether a macro is defined. This practice is very common in header files.

#if defined(symbol)
// compiled only if symbol is defined
#endif
#ifdef symbol
// equivalent to the form above
#endif

#if !defined(symbol)
// compiled only if symbol is not defined
#endif
#ifndef symbol
// equivalent to the form above
#endif

File inclusion

Including header files

There are two ways to include header files, as follows:

#include "filename"
#include <filename>

The first form is usually used for header files in our own project, and the second for header files of the standard library.

The two forms use different search strategies. The first takes two steps:

  1. Search the project directory (where the source file is located)
  2. If the file is not found there, search the standard library paths

If the file is found in neither place, an error is reported.

The second form performs only step 2. So can the first form also be used to include standard library headers?

It can, but it is not recommended: the search is less efficient, and it becomes hard to tell whether a library file or a local file is being included.

Preventing duplicate inclusion

Sometimes header files are included by header files, and this nested inclusion may cause the contents of some header files to be included repeatedly.

Generally, in order to prevent repeated inclusion, header files will have a conditional compilation structure similar to this

#ifndef __TEST_H__
#define __TEST_H__
// contents of the header file
#endif

or

#pragma once

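Either approach prevents a header file from being included more than once. Note that #pragma once is not part of the C standard, although most compilers support it.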


Origin blog.csdn.net/qq_42150700/article/details/130035423