Detailed Explanation of Preprocessing in C

1. Program translation environment and execution environment

In any implementation of ANSI C, there are two different environments.

The first is the translation environment, in which source code is converted into executable machine instructions.
The second is the execution environment, in which the code actually runs.

Translation environment

- Each source file that makes up a program is converted into object code separately by the compilation process.
- The object files are then bundled together by the linker to form a single, complete executable program.
- The linker also brings in any functions from the standard C library that the program uses, and it can search the programmer's own libraries and link the functions they need into the program.

Code example:
add.c

int Add(int x,int y) {
	return x + y;
}

test.c

#include<stdio.h>

extern int Add(int,int); // only declared here; the definition is in add.c

int main() {
	int a = 10;
	int b = 20;
	int c = Add(a, b);
	printf("%d\n", c);
	return 0;
}

After building, you can find the two corresponding object files, one for add.c and one for test.c, in the project's output directory.
[Figure: diagram of the compilation process]
The linking step is also where the toolchain checks whether a called function is actually defined.
test.c only declares Add (through the extern declaration), while add.c defines it. When the object files of add.c and test.c are linked, the linker resolves the reference to Add in test.c to the definition in add.c, so the call uses the address of that definition. If no object file or library defined Add, the linker would report an undefined-symbol error. This is how linking detects functions that are called but never defined.

Here’s a question to consolidate your knowledge:

Question:
A C program composed of multiple source files will generate a final executable program after editing, preprocessing, compilation, linking and other stages. In which of the following stages can a called function be found to be undefined? ( )
A. Preprocessing
B. Compilation
C. Linking
D. Execution

From the discussion above, the answer is C: a missing function definition is only detected when the object files are linked.

Execution environment

Program execution process:

- The program must be loaded into memory. In an environment with an operating system, this is usually done by the operating system. In a freestanding (stand-alone) environment, loading must be arranged manually, or it may be done by placing the executable code into read-only memory.
- Execution begins, and the main function is called.
- The program code runs. The program uses a runtime stack to store functions' local variables and return addresses. It can also use static memory; variables stored in static memory keep their values throughout the execution of the program.
- The program terminates, normally when main returns, or possibly unexpectedly.
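To make the difference between the runtime stack and static memory concrete, here is a small sketch. Both function names are made up for illustration; call each one several times from main and compare what they return.

```c
/* Hypothetical helper functions for illustration. */

/* `calls` lives in static memory: it is initialized once and keeps its
   value from one call to the next. */
int count_with_static(void)
{
    static int calls = 0;
    calls++;
    return calls;
}

/* `calls` lives on the runtime stack: it is created afresh (set to 0)
   on every call and disappears when the function returns. */
int count_with_local(void)
{
    int calls = 0;
    calls++;
    return calls;
}
```

Calling count_with_static three times returns 1, 2 and 3, while count_with_local returns 1 every time.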

2. Detailed explanation of preprocessing

Predefined symbols

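__FILE__      // the source file being compiled
__LINE__      // the current line number in the file
__DATE__      // the date the file was compiled
__TIME__      // the time the file was compiled
__STDC__      // 1 if the compiler conforms to ANSI C, otherwise undefined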

These predefined symbols are built into the language;
for example:

#include<stdio.h>

int main()
{
	printf("file:%s line:%d\n", __FILE__, __LINE__);
	return 0;
}

This code prints the name of the source file and the line number of the printf call.
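Predefined symbols are often wrapped in a macro so that the reported location is the place where the macro is used. The sketch below assumes a hypothetical helper named report_location; it is not a standard library function.

```c
#include <stdio.h>

/* Hypothetical helper: prints the location it is given and returns the
   line number so the caller can check it. */
int report_location(const char *file, int line)
{
    printf("file:%s line:%d\n", file, line);
    return line;
}

/* Because REPORT() is a macro, __FILE__ and __LINE__ expand where
   REPORT() is written, not inside report_location. */
#define REPORT() report_location(__FILE__, __LINE__)
```

Writing REPORT() anywhere in main prints the file name and the line number of that REPORT() call.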

#define

#define defines identifiers

Syntax:
#define name stuff

Every occurrence of name is replaced by stuff, which can be a number, an expression, a keyword, or any other sequence of tokens.

Here is an example:

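#include<stdio.h>

// CASE expands to "break;case", so each CASE also ends the previous branch
#define CASE break;case

int main() {
	int x = 2;
	switch (x) {
	case 1:
		printf("one\n");
	CASE 2:
		printf("two\n");
	CASE 3:
		printf("three\n");
	CASE 4:
		printf("four\n");
	}
	return 0;
}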

After preprocessing, each CASE n: becomes break;case n:, so every branch ends with a break and none falls through into the next. The program prints two.
A common point of confusion for beginners:
should a #define end with a semicolon?
For example:

#define MAX 1000;
#define MAX 1000

It should not, and the following code shows why:

#include<stdio.h>

#define MAX 1000;

int main() {
	int max = 0;
	max = 1000;
	// MAX expands to "1000;", so this line becomes: if (max == 1000;) {
	if (max == MAX) {
		printf("1");
	}
	else {
		printf("2");
	}
	return 0;
}

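The code fails to compile.
Why? MAX is replaced by 1000; including the semicolon, so the condition becomes if (max == 1000;), which is a syntax error. That is why a #define should not end with a semicolon: write #define MAX 1000.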
#define can also define macros, which take parameters. The syntax is:

#define name( parameter-list ) stuff

where parameter-list is a comma-separated list of symbols that may appear in stuff.
Note: the left parenthesis of the parameter list must immediately follow name.
If there is any whitespace between them, the parameter list will be interpreted as part of stuff.

For example:

#define SQUARE( x ) x * x

If we pass in a variable whose value is 5:

#include<stdio.h>

#define SQUARE( x ) x * x

int main() {
	int a = 5;
	int ret = SQUARE( a );
	printf("%d\n", ret);
	return 0;
}

The result is 25, as expected.
But what if we pass in an expression?
For example:

#include<stdio.h>

#define SQUARE( x ) x * x

int main() {
	int ret = SQUARE( 5+1 );
	printf("%d\n", ret);
	return 0;
}

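Will the result be 36?
No, the program prints 11.
Why? The preprocessor substitutes the argument text literally, without adding parentheses, so SQUARE( 5+1 ) becomes 5+1 * 5+1. Since * has higher precedence than +, this is evaluated as 5 + (1*5) + 1 = 11.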
The macro should be changed to:

#define SQUARE( x ) (x) * (x)

Now the expansion is (5+1) * (5+1), and the result is 36.
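The difference between the two versions can be checked directly. This sketch renames them SQUARE_NAIVE and SQUARE_PAREN (hypothetical names) so that both can live in one file; call the two functions from main to see 11 and 36.

```c
/* Hypothetical names: the two versions of SQUARE are renamed so that
   they can live in the same file. */
#define SQUARE_NAIVE(x) x * x      /* SQUARE_NAIVE(5+1) -> 5+1 * 5+1     */
#define SQUARE_PAREN(x) (x) * (x)  /* SQUARE_PAREN(5+1) -> (5+1) * (5+1) */

int naive_square_of_six(void)
{
    return SQUARE_NAIVE(5 + 1);    /* 5 + 1*5 + 1 = 11 */
}

int paren_square_of_six(void)
{
    return SQUARE_PAREN(5 + 1);    /* 6 * 6 = 36 */
}
```

With a plain variable or constant argument, such as SQUARE_NAIVE(5), both versions agree; the naive one only breaks when the argument contains an operator.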
Here is another macro:

#define DOUBLE(x) (x) + (x)

Code example:

#include<stdio.h>

#define DOUBLE(x) (x) + (x)

int main() {
	int a = 5;
	printf("%d\n", 10 * DOUBLE(a));
	return 0;
}

What is the result?
The program prints 55. Why not 100? Again, the problem is operator precedence:
after substitution, the statement that is actually compiled is:

printf("%d\n", 10 * (a) + (a));

With a equal to 5, that is 10*5+5, which is 55.
To fix it, put parentheses around the whole body as well:

#define DOUBLE( x) ( ( x ) + ( x ) )

Now the result is 100.

Therefore, macros that evaluate numeric expressions should be fully parenthesized like this, around each parameter and around the whole body, to avoid unexpected interactions with operators inside the arguments or with neighbouring operators where the macro is used.
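Parenthesizing only the parameters, as in SQUARE( x ) (x) * (x), is still not enough when the macro sits next to another operator. A minimal sketch using division, with hypothetical macro names:

```c
/* Hypothetical names for illustration. */
#define SQUARE_PARAMS_ONLY(x) (x) * (x)    /* parameters parenthesized  */
#define SQUARE_FULL(x)        ((x) * (x))  /* parameters and whole body */

/* Expands to 100 / (2) * (2). Division and multiplication have equal
   precedence and group left to right, so this is (100 / 2) * 2 = 100,
   not 25. */
int divide_params_only(void)
{
    return 100 / SQUARE_PARAMS_ONLY(2);
}

/* Expands to 100 / ((2) * (2)) = 100 / 4 = 25, as intended. */
int divide_full(void)
{
    return 100 / SQUARE_FULL(2);
}
```

Only the fully parenthesized version behaves like a function call in every context.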

Let's test this with two questions:

The following macro definition is provided:
#define N 4
#define Y(n) ((N+2)*n) /* This kind of definition is strictly prohibited by coding standards */

Then execute the statement:
z = 2 * (N + Y(5+1)); What is the value of z? ( )
A. Error
B. 60
C. 48
D. 70

Analysis: D
Y(5+1) expands to ((N+2)*5+1), that is ((4+2)*5+1) = 31, so z = 2*(4+31) = 70.

The result of executing the following code is: ( )

#define A 2+2
#define B 3+3
#define C A*B
int main()
{
	printf("%d\n", C);
	return 0;
}

A. 24
B. 11
C. 10
D. 23

Analysis: B
C expands to 2+2*3+3, and since * binds more tightly than +, that is 2+6+3 = 11.
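If you are unsure about questions like these, you can let the compiler settle them. The sketch below wraps each quiz expression in a function (the function names are made up) and then removes the single-letter macros with #undef so they cannot clash with later code.

```c
/* Question 1: the macros from the quiz. */
#define N 4
#define Y(n) ((N+2)*n)

int question1(void)
{
    int z = 2 * (N + Y(5+1));  /* 2 * (4 + ((4+2)*5+1)) = 2 * 35 = 70 */
    return z;
}

/* Question 2: the macros from the quiz. */
#define A 2+2
#define B 3+3
#define C A*B

int question2(void)
{
    return C;                  /* 2+2*3+3 = 11 */
}

/* Remove the single-letter macros so they cannot clash with later code. */
#undef N
#undef Y
#undef A
#undef B
#undef C
```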

#define replacement rules

When #define symbols and macros are expanded in a program, several steps are involved:

- When a macro is called, its arguments are first checked for any symbols defined by #define. If there are any, they are replaced first.
- The replacement text is then inserted into the program in place of the original text. For macros, the parameter names are replaced by the argument values.
- Finally, the result is scanned again for symbols defined by #define. If any are found, the process above is repeated.

Of particular note:

- Symbols defined by other #define directives can appear in macro arguments and in #define definitions. However, a macro cannot be expanded recursively.
- When the preprocessor searches for symbols defined by #define, it does not search inside string literals.
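Both notes can be seen in a few lines. This is a sketch with made-up names (SIZE, DOUBLE_SIZE, TWICE, limit); the last part shows that a self-referential macro is expanded only once instead of recursing forever.

```c
#define SIZE 10
#define DOUBLE_SIZE (SIZE * 2)  /* a #define symbol inside another definition */
#define TWICE(x) ((x) * 2)

/* An argument can itself be a #define symbol: TWICE(SIZE) ends up as ((10) * 2). */
static const int twice_size = TWICE(SIZE);

/* SIZE inside a string literal is NOT replaced: the array holds the
   characters S, I, Z, E and the terminating '\0'. */
static const char text[] = "SIZE";

/* A self-referential macro is expanded only once. Inside its own
   expansion, `limit` is left alone and names the variable, so the
   macro `limit` means (5 + 1). Hypothetical names for illustration. */
static const int limit = 5;
#define limit (limit + 1)
int limit_plus_one(void) { return limit; }
#undef limit
```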

# and ##

How can a macro parameter be inserted into a string?

First, let's look at this code:

char* p = "hello ""bit\n";
printf("hello"" bit\n");
printf("%s", p);

Does this print hello bit?
Yes, both times.
Adjacent string literals are automatically concatenated into a single string.

This makes another useful macro technique possible. Code example:

#include<stdio.h>
#define PRINT(n, format) printf("the value of "#n" is " format "\n", n)
// # turns the macro argument that follows it into a string literal

int main()
{
	int a = 20;
	//printf("the value of a is %d\n", a);
	PRINT(a, "%d");

	int b = 15;
	//printf("the value of b is %d\n", b);
	PRINT(b, "%d");

	float f = 4.5f;
	//printf("the value of f is %f\n", f);
	PRINT(f, "%f");
	return 0;
}

The output is:

the value of a is 20
the value of b is 15
the value of f is 4.500000
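# stringizes whatever text is passed as the argument, not only a variable name. Writing into a buffer instead of printing makes this easy to inspect; FORMAT_INT below is a hypothetical variant of PRINT built on the standard snprintf.

```c
#include <stdio.h>

/* Hypothetical variant of PRINT that writes into a buffer instead of
   printing. #n turns the argument text into a string literal, which is
   then concatenated with the neighbouring literals. snprintf returns
   the number of characters written (excluding the '\0'). */
#define FORMAT_INT(buf, size, n) \
    snprintf((buf), (size), "the value of " #n " is %d", (n))
```

With int a = 20, FORMAT_INT(buf, sizeof buf, a * 2) stores "the value of a * 2 is 40": the whole expression text is stringized.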

The ## operator

## combines the tokens on either side of it into a single token. This allows macro definitions to create identifiers from separate fragments of text.

For example:
1.

#include<stdio.h>
#define CAT(x,y) x##y

int main()
{
	int Year = 0;
	Year = CAT(20, 24); // 20 and 24 are pasted into the single token 2024
	printf("%d\n", Year);
	return 0;
}

Output:

2024
2.

#include<stdio.h>
#define CAT(x,y) x##y

int main()
{
	int NowYear = 2023;
	printf("%d\n", CAT(Now,Year));
	// equivalent to
	// printf("%d\n", NowYear);
	return 0;
}

Output:

2023
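A typical use of ## is generating several similar identifiers from one macro. In the sketch below, DEFINE_GETTER and the generated get_width and get_height functions are hypothetical names for illustration.

```c
/* Hypothetical example: ## pastes get_ and the name together, so one
   macro can generate a family of similarly named functions. */
#define DEFINE_GETTER(name, value) \
    int get_##name(void) { return (value); }

DEFINE_GETTER(width, 640)    /* defines int get_width(void)  */
DEFINE_GETTER(height, 480)   /* defines int get_height(void) */
```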

Macro parameters with side effects

When a macro parameter appears more than once in the macro's definition and the argument has side effects, using the macro can be dangerous and lead to unpredictable results. A side effect is a lasting effect produced when an expression is evaluated.

For example:

x+1; // no side effect: x itself is unchanged
x++; // side effect: x is incremented

The problems caused by arguments with side effects can be demonstrated with the MAX macro:

#include<stdio.h>
#define MAX(a, b) ( (a) > (b) ? (a) : (b) )
int main() {
	int x = 5;
	int y = 8;
	int z = MAX(x++, y++);
	printf("x=%d y=%d z=%d\n", x, y, z);
	return 0;
}

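What is the output?

x=6 y=10 z=9

Analysis: the macro call expands to

int z = ( (x++) > (y++) ? (x++) : (y++) );

The comparison 5 > 8 is false, and evaluating it increments x to 6 and y to 9. Because the condition is false, only the third operand, y++, is evaluated: it yields 9 and then increments y to 10. So z is 9, x is 6, and y is 10.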
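For comparison, an ordinary function evaluates each argument exactly once. This sketch puts the macro next to a hypothetical max_int function; calling both with x++ and y++ from the same starting values shows the difference.

```c
#define MAX(a, b) ( (a) > (b) ? (a) : (b) )

/* A function receives already-evaluated values: each argument
   expression, including its side effect, is evaluated exactly once
   before the call. */
int max_int(int a, int b)
{
    return a > b ? a : b;
}
```

Starting from x = 5 and y = 8, MAX(x++, y++) leaves x=6 y=10 z=9, while max_int(x++, y++) leaves x=6 y=9 z=8.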

#undef

This directive removes a macro definition.

#undef NAME
// If an existing name needs to be redefined, its old definition must be removed first.

Code example:

#include<stdio.h>
#define MAX(x, y) ((x)>(y)?(x):(y))
int main()
{
	int c = MAX(3, 5);
	printf("%d\n", c);
#undef MAX
	c = MAX(5, -5); // error: MAX is no longer defined as a macro
	printf("%d\n", c);
	return 0;
}

Result: the program does not build.
Because #undef removes the definition of the macro MAX, the second MAX(5, -5) is no longer expanded. It is treated as a call to a function named MAX, which does not exist, so the build reports an error.
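The comment above says an existing name must be removed before it is redefined. A minimal sketch with a made-up LIMIT macro:

```c
#define LIMIT 10
static const int first_limit = LIMIT;   /* 10 */

/* Remove the old definition before giving LIMIT a new one. Redefining
   a macro with a different body without #undef is not allowed by the
   C standard (compilers at least warn about it). */
#undef LIMIT
#define LIMIT 20
static const int second_limit = LIMIT;  /* 20 */
```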

3. Comparison between macros and functions

- Macros are usually used for simple operations, such as finding the larger of two numbers.

For example:

#define MAX(a, b) ((a)>(b)?(a):(b)) 

So why not use functions to accomplish this task?
There are two reasons:

- The code needed to call a function and return from it may take more time than the small computation itself. For tasks like this, macros are therefore better than functions in both program size and speed.
- More importantly, function parameters must be declared with specific types, so a function can only be used with expressions of matching types. This macro, on the other hand, works for integers, long integers, floating-point values, and any other types that can be compared with >. Macros are type-independent.

Of course, macros also have disadvantages compared to functions:

- Each time a macro is used, a copy of the macro's code is inserted into the program. Unless the macro is quite short, this can significantly increase the size of the program.
- Macros cannot be debugged.
- Because macros are type-independent, their arguments are not type-checked, which makes them less rigorous.
- Macros can cause operator-precedence problems, making programs error-prone.

Macros can sometimes do things that functions cannot.
For example, a macro argument can be a type, which is impossible for a function argument:

#define MALLOC(num, type)\
 (type *)malloc(num * sizeof(type))

MALLOC(10, int); // the type is passed as an argument

After preprocessor replacement:

(int *)malloc(10 * sizeof(int));
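Here is a slightly fuller sketch of the same idea. The extra parentheses around num and around the whole body are additions here (they are not in the macro above), following the parenthesization rule from earlier; make_sequence is a hypothetical helper.

```c
#include <stdlib.h>

/* The type is passed as a macro argument, something a function cannot
   accept. num and the whole body are parenthesized so that an argument
   such as n + 1 is multiplied correctly. */
#define MALLOC(num, type) ((type *)malloc((num) * sizeof(type)))

/* Hypothetical helper: allocate n ints and fill them with 0..n-1.
   Returns NULL if the allocation fails; the caller must free the result. */
int *make_sequence(int n)
{
    int *p = MALLOC(n, int);
    if (p == NULL)
        return NULL;
    for (int i = 0; i < n; i++)
        p[i] = i;
    return p;
}
```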

[Figure: comparison chart of macros and functions]

inline - inline function

Macros and functions each have their own strengths. Inline functions can offer the benefits of both.

Concept: a function declared with the inline keyword is called an inline function. The compiler may expand its body directly where it is called, avoiding the overhead of a function call (such as pushing the return address onto the stack), which can make the program run faster. inline is available in C99 and later as well as in C++, and it is only a request: the compiler may choose not to expand a call.

Code example:

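// static inline: in C99 a plain inline definition, with no external
// definition elsewhere, can fail to link if a call is not expanded
static inline int add(int a, int b)
{
	return a + b;
}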

Since inline functions are supplementary material, I won't go into detail here.
When I have time, I will write up inline functions in detail for your reference.
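As a small illustration of why inline functions are attractive, the sketch below compares the unparenthesized DOUBLE macro from earlier with an inline function. double_inline is a hypothetical name; because it is a real function, its argument is evaluated first and the precedence problem cannot occur.

```c
/* The unparenthesized macro from earlier in the article. */
#define DOUBLE(x) (x) + (x)

/* static inline avoids the C99 linking pitfall mentioned above. */
static inline int double_inline(int x)
{
    return x + x;
}
```

With a equal to 5, 10 * DOUBLE(a) is 55, while 10 * double_inline(a) is 100.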

Naming convention

Generally speaking, the syntax for using functions and macros is very similar, so the language itself cannot help us tell them apart.
Our usual habit is therefore:

Use all capital letters for macro names, and do not use all capitals for function names.

For example:

#define MAX(a, b) ((a)>(b)?(a):(b))
int Max(int a, int b);

4. Command line definition

Many C compilers let you define symbols on the command line that starts the compilation.
This is useful, for example, when we want to compile different versions of a program from the same source file. (Suppose a program declares an array of a certain length. If the machine's memory is limited, we need a very small array; if another machine has plenty of memory, we can make the array larger.)
Code example:

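#include <stdio.h>

// ARRAY_SIZE is not defined in this file; it is supplied on the
// command line with -D when compiling
int main()
{
	int array[ARRAY_SIZE];
	int i = 0;
	for (i = 0; i < ARRAY_SIZE; i++)
	{
		array[i] = i;
	}
	for (i = 0; i < ARRAY_SIZE; i++)
	{
		printf("%d ", array[i]);
	}
	printf("\n");
	return 0;
}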

Compile command:

gcc -D ARRAY_SIZE=10 programe.c

With -D, we can set ARRAY_SIZE to whatever size we need at compile time, without editing the source file.
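A common companion to -D (not part of the original example) is a default value, so the file still compiles when no -D option is given. A sketch; array_size is a hypothetical helper:

```c
/* Use 10 unless ARRAY_SIZE was already defined, e.g. with -D on the
   command line. */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE 10
#endif

int array_size(void)
{
    return ARRAY_SIZE;
}
```

Compiled without -D, array_size() returns 10; with gcc -D ARRAY_SIZE=100, the #define inside #ifndef is skipped and it returns 100.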

5. Conditional compilation

When compiling a program, it is often convenient to compile or discard a statement (or a group of statements), and conditional compilation directives let us do exactly that.
For example:

Debugging code is a pity to delete but gets in the way if kept, so we can compile it selectively.

Code example:

#include <stdio.h>
#define __DEBUG__
int main()
{
	int i = 0;
	int arr[10] = {0};
	for (i = 0; i < 10; i++)
	{
		arr[i] = i;
#ifdef __DEBUG__
		printf("%d\n", arr[i]); // to check that the array was filled correctly
#endif //__DEBUG__
	}
	return 0;
}

Output: the numbers 0 to 9, one per line. If the #define __DEBUG__ line is removed, the printf call is not compiled and nothing is printed.

Common conditional compilation directives:

1. Single-branch conditional compilation

#if constant-expression
 //...
#endif
// The constant expression is evaluated by the preprocessor.
For example:
#define __DEBUG__ 1
#if __DEBUG__
 //..
#endif

If the constant expression after #if is true (nonzero), the code up to #endif is compiled; if it is false, that code is removed and never compiled. This is the key difference from an ordinary if statement: when the condition of an if statement is false, its body is still compiled, it just is not executed.
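The difference from an ordinary if can be seen in a sketch like this one (excluded_code_demo is a made-up name): the text inside #if 0 is removed before compilation, so it does not even have to be valid C, while the body of if (0) must still compile.

```c
int excluded_code_demo(void)
{
#if 0
    this line is not valid C at all
#endif
    if (0) {
        /* Never executed, but still compiled, so it must be valid C. */
        return -1;
    }
    return 1;
}
```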

2. Conditional compilation of multiple branches

#if constant-expression
 //...
#elif constant-expression
 //...
#else
 //...
#endif
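A sketch of the multi-branch form; DEBUG_LEVEL is a hypothetical configuration macro, and only one of the three return statements survives preprocessing.

```c
/* Hypothetical configuration macro: try changing it to 1 or 0 and
   recompiling. */
#define DEBUG_LEVEL 2

int selected_branch(void)
{
#if DEBUG_LEVEL >= 2
    return 2;   /* the only branch compiled while DEBUG_LEVEL is 2 */
#elif DEBUG_LEVEL == 1
    return 1;
#else
    return 0;
#endif
}
```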

3. Checking whether a symbol is defined

#if defined(symbol)
#ifdef symbol
// the two forms above are equivalent

#if !defined(symbol)
#ifndef symbol
// the two forms above are equivalent

Code example:

#include<stdio.h>
#define WIN 0

int main()
{
#if defined(WIN) // checks whether WIN is defined; its value does not matter
	printf("windows");
#endif
	return 0;
}

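Output:

windows

The message is printed even though WIN is defined as 0, because defined(WIN) only checks whether the name is defined, not its value.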
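The distinction between testing whether a symbol is defined and testing its value is easy to trip over. A minimal sketch reusing WIN from the example above; the two function names are made up:

```c
#define WIN 0

int defined_check(void)
{
#if defined(WIN)
    return 1;   /* taken: WIN is defined, even though its value is 0 */
#else
    return 0;
#endif
}

int value_check(void)
{
#if WIN
    return 1;
#else
    return 0;   /* taken: WIN expands to 0, which is false */
#endif
}

#undef WIN
```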

4. Nested directives

#if defined(OS_UNIX) 
    #ifdef OPTION1 
         unix_version_option1(); 
    #endif 
    #ifdef OPTION2 
         unix_version_option2(); 
    #endif 
#elif defined(OS_MSDOS) 
      #ifdef OPTION2 
          msdos_version_option2(); 
      #endif 
#endif 
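The nested example above calls functions that are not defined anywhere. Here is a self-contained sketch of the same structure, where OS_UNIX, OPTION1 and build_variant are hypothetical names and each branch simply returns a label:

```c
/* Hypothetical configuration: pretend we are building the Unix version
   with OPTION1 enabled. */
#define OS_UNIX
#define OPTION1

const char *build_variant(void)
{
#if defined(OS_UNIX)
    #ifdef OPTION1
        return "unix option1";
    #else
        return "unix default";
    #endif
#elif defined(OS_MSDOS)
    return "msdos";
#else
    return "unknown";
#endif
}
```

Removing the #define OPTION1 line makes the function return "unix default" instead.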

6. File inclusion

We already know that the #include directive causes another file to be compiled, just as if its contents appeared at the location of the #include directive.
The replacement is simple: the preprocessor removes the directive and replaces it with the contents of the included file. If a file is included 10 times, its contents are actually compiled 10 times.

How header files are included

1. Including local files:

#include "filename"

Search strategy: the compiler first searches the directory where the source file is located. If the header is not found there, it searches the standard locations, just as it does for library headers. If it still cannot be found, a compilation error is reported.
The standard header path on Linux:

/usr/include

The standard header path for VS:

C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include

Adjust this according to your own installation path.

2. Including library files:

#include <filename.h>

The compiler searches the standard path directly. If the header cannot be found there, a compilation error is reported.
Does this mean that library headers can also be included with ""? Yes, they can.
However, the search is less efficient, and it also becomes harder to tell whether a header is a library header or a local file.

Nested file inclusion

Nested file inclusion is very common. For example, consider the following module structure:
comm.h and comm.c are common modules.
test1.h and test1.c use common modules.
test2.h and test2.c use common modules.
test.h and test.c use the test1 module and test2 module.
As a result, two copies of comm.h end up in test.c after preprocessing, duplicating the file's contents.

How can this be solved?
With conditional compilation!
At the beginning of each header file, write:

#ifndef __TEST_H__
#define __TEST_H__
// contents of the header file
#endif //__TEST_H__

or

#pragma once

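Either approach prevents a header file from being included more than once. Note that #pragma once is not part of the C standard, although major compilers such as GCC, Clang and MSVC support it.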
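To see an include guard at work without creating files, this sketch pastes the same hypothetical comm.h contents twice, which is effectively what the preprocessor produces when test1.h and test2.h both include comm.h. The guard name COMM_H is an assumption; it avoids the leading double underscore used in __TEST_H__, since such names are reserved for the implementation in C.

```c
/* ---- first copy of comm.h ---- */
#ifndef COMM_H
#define COMM_H
struct point { int x; int y; };
#endif

/* ---- second copy of comm.h: skipped, COMM_H is already defined ---- */
#ifndef COMM_H
#define COMM_H
struct point { int x; int y; };  /* would be a redefinition error without the guard */
#endif
```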

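Here are two short-answer questions to consolidate your knowledge:

1. What is the purpose of #ifndef/#define/#endif in a header file?

Analysis: it prevents the header's contents from being included more than once in the same source file, which would otherwise cause duplicate definitions.

2. What is the difference between #include <filename.h> and #include "filename.h"?

Analysis: they use different search strategies. "filename.h" is searched for first in the directory of the source file and then in the standard locations, while <filename.h> is searched for only in the standard locations, so their search efficiency also differs.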

Summary

This is another great knowledge point!
If there is anything else you want to know, you can read "Compilation Principles" and "Programmer's Self-cultivation". I hope you can master this knowledge well!


Origin blog.csdn.net/mdjsmg/article/details/131816323