Detailed explanation of the compilation process of C language

  • What happens when we compile a C program?
  • What are the components of the compilation process, and in what order do they run?
  • What is compilation?

    The compilation process of C language converts high-level code that humans can read into machine code that the computer can execute. It is essentially a translation process.

Figure: Source code and executable machine code

Program compilation is the process of converting source code into executable code, which usually includes the following main steps:

  1. Preprocessing: In this stage, the preprocessor processes the source file and executes the preprocessing directives it contains. Common directives include #include (for including header files), #define (for defining macros), and #ifdef and #ifndef (for conditional compilation). The output of the preprocessor is a preprocessed source file.

  2. Compilation: The compiler receives the preprocessed source code and converts it into assembly code. Assembly code is low-level code tied to a specific processor architecture. The compiler performs syntax analysis, semantic analysis, and code generation. If it finds syntax errors or other compilation errors at this stage, compilation stops and error messages are printed.

  3. Assembly: The assembler receives the assembly code generated by the compiler and converts it into relocatable object code. It encodes each instruction in binary form and builds a symbol table; references to symbols defined elsewhere (such as library functions) are left as relocation entries for the linker to fill in. The result is an object file, not yet an executable.

  4. Linking: Each source file is compiled into its own object file. The linker takes these object files, together with the required library files, and combines them into an executable file. It resolves symbol references between modules (for example, a call in one object file to a function defined in another object file or in a library) and produces the final executable. Libraries can be linked statically (their code is copied into the executable at link time) or dynamically (shared libraries are loaded at run time).

  5. Optimization (optional): Compilers can run optimization passes to improve program performance or reduce the size of the executable. In GCC these passes run during the compilation stage and are enabled with flags such as -O1, -O2, -O3 and -Os. Common optimizations include constant folding, loop unrolling, function inlining, and dead code elimination.

  6. Executable generation: The final executable file contains machine code in a binary format. The operating system can load this file into memory and run it directly.

In short, program compilation consists of preprocessing, compilation, assembly, linking, and optional optimization steps, and produces an executable file. Different programming languages and compilers may carry out this process slightly differently, but this basic flow applies to most compiled languages.

Detailed analysis

The compilation process of C language mainly includes four steps:

  1. Preprocessing
  2. Compilation
  3. Assembly
  4. Linking

The picture below shows the complete process of C program compilation.

Next, let's look at what happens at each stage of the compilation process.

1. Preprocessing
The first step in the compilation process is preprocessing, which is performed by the preprocessor. Its output is a temporary file with the suffix .i. The preprocessor mainly performs the following tasks:

  • Comment removal
  • Macro expansion
  • File inclusion

The preprocessor removes all comments, replacing each one with a single space, because comments are not part of the program code and have no effect on how the program runs.

Macros are names defined with the #define directive that stand for constant values, expressions, or other sequences of tokens. Wherever a macro is used, the preprocessor replaces it with its definition; this is called macro expansion. The replacement is a purely textual (token-level) substitution on the C source: the result written to the intermediate .i file is still C code, not assembly instructions.

File inclusion in C means adding another file containing pre-written code (usually a header file) to our C program during preprocessing. It is done with the #include directive. The preprocessor replaces the #include <filename> directive with the entire contents of that file, and the result becomes part of the new intermediate file.

2. Compilation
The compilation phase converts the temporary .i file into an assembly file (.s) containing assembly-level (low-level) instructions. Assembly code expresses machine instructions with human-readable mnemonics and is specific to the target processor (it is also written by hand in microcontroller programming). While translating, the compiler parses the entire translation unit and reports any syntax errors or warnings in the terminal; when optimization is enabled, it also optimizes the code at this stage. The figure below shows an example of how the compilation phase works.

3. Assembly
The assembler converts assembly-level code (.s files) into machine code in binary form. An assembler is a program that translates each assembly instruction into its binary encoding for the target machine type; the result is called object code. The generated file has the same base name as the assembly file and is called an object file. It uses the extension .obj with Microsoft's toolchain on DOS/Windows and .o on UNIX-like systems (GCC, including MinGW on Windows, also uses .o). The image below shows an example of how the assembly phase works: the assembly file hello.s is converted to the object file hello.o, with the same name but a different extension.

4. Linking
Linking combines our object files with library files to produce an executable. Library files are precompiled files that contain function definitions in machine code; they typically have the extension .lib on Windows and .a (static) or .so (shared) on UNIX-like systems. An object file can contain references to symbols it does not define, such as a call to printf, which stdio.h only declares. You can think of it as a book with some words you don't know: you use a dictionary to look up their meaning. Likewise, the linker looks up these unresolved symbols in the other object files and in the library files and fills in their addresses. The linking process produces an executable file, which has the extension .exe on DOS/Windows; on UNIX-like systems executables need no extension, and GCC names its output a.out unless -o is given. The image below shows an example of how the linking stage works: an object file containing machine-level code is passed through the linker, which links it with the library files to produce an executable file.

Example

Next, let's go through every step of the C compilation process with an example. The first step is to write a simple C program and save it as hello.c.

// Simple Hello World program in C
#include <stdio.h>
int main()
{
    // printf() is an output function that prints
    // the passed string to the output console
    printf("Hello World!");
    
    return 0;
}

Then we execute the compilation command to compile hello.c:

gcc -save-temps hello.c -o compilation

The -save-temps option keeps all intermediate files generated during compilation, so a total of four files are produced:

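  • hello.i: the preprocessed file generated by the preprocessor
  • hello.s: the assembly file generated by the compiler
  • hello.o: the object file generated by the assembler
  • compilation.exe: the executable named by the -o option (on Linux the file is simply called compilation)

These are the names produced by the MinGW GCC 6.3.0 used in this example; other GCC versions may name the intermediate files differently.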

First, the program is preprocessed. Comments are removed, and since the program contains no macros, no macro expansion takes place. Because the program includes the stdio.h header file, preprocessing adds the declarations of standard input/output functions such as printf() and scanf() to our C program.

Open the hello.i file generated during the preprocessing stage and you will see code similar to the following.

# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "hello.c"

# 1 "C:/Program Files (x86)/CodeBlocks/MinGW/include/stdio.h" 1 3
# 293 "C:/Program Files (x86)/CodeBlocks/MinGW/include/stdio.h" 3
 int __attribute__((__cdecl__)) __attribute__ ((__nothrow__)) fprintf (FILE*, const char*, ...);
 int __attribute__((__cdecl__)) __attribute__ ((__nothrow__)) printf (const char*, ...);
 int __attribute__((__cdecl__)) __attribute__ ((__nothrow__)) sprintf (char*, const char*, ...);

 int __attribute__((__cdecl__)) __attribute__ ((__nothrow__)) scanf (const char*, ...);
 int __attribute__((__cdecl__)) __attribute__ ((__nothrow__)) sscanf (const char*, const char*, ...);


...
...
...

 int __attribute__((__cdecl__)) __attribute__ ((__nothrow__)) putw (int, FILE*);
# 3 "hello.c" 2

int main()
{
    printf("Hello World!");
    return 0;
}

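As you can see, after preprocessing all comments are gone, and #include <stdio.h> has been replaced by the contents of the header file. The lines beginning with # are linemarkers: they record which file and line each part of the code came from, so that the compiler can report errors against the original source.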

The next step is the compilation stage. The compiler takes hello.i and converts it into the assembly file hello.s. During this step:

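  • The compiler checks the code for syntax and semantic errors
  • It translates the source code into assembly code, usually through internal intermediate representations
  • It optimizes the code, if optimization is enabled (for example with -O2)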

The hello.s file generated by compilation looks like this (32-bit x86 output from MinGW GCC 6.3.0, as the .ident line shows):

.file	"hello.c"
	.def	___main;	.scl	2;	.type	32;	.endef
	.section .rdata,"dr"
LC0:
	.ascii "Hello World!\0"
	.text
	.globl	_main
	.def	_main;	.scl	2;	.type	32;	.endef
_main:
LFB12:
	.cfi_startproc
	pushl	%ebp
	.cfi_def_cfa_offset 8
	.cfi_offset 5, -8
	movl	%esp, %ebp
	.cfi_def_cfa_register 5
	andl	$-16, %esp
	subl	$16, %esp
	call	___main
	movl	$LC0, (%esp)
	call	_printf
	movl	$0, %eax
	leave
	.cfi_restore 5
	.cfi_def_cfa 4, 4
	ret
	.cfi_endproc
LFE12:
	.ident	"GCC: (MinGW.org GCC-6.3.0-1) 6.3.0"
	.def	_printf;	.scl	2;	.type	32;	.endef

Next, the assembler converts hello.s into binary machine code and writes the object file hello.o. GCC uses the .o extension on both Linux and Windows (MinGW); Microsoft's compiler toolchain uses .obj instead.

Finally, the linker combines hello.o with the C runtime startup code and the C standard library, which provides the definition of printf, and writes the executable named by -o: compilation.exe on Windows and compilation on Linux.

  • Running the executable prints Hello World! on the screen.

Program flow chart
Let us take a look at the program flow chart during the C language compilation process:

In conclusion

  • Compilation in C is the process of converting human-readable code (a C program) into machine-executable code (binary code).
  • The compilation process of C language includes four steps: preprocessing, compilation, assembly and linking.
  • The preprocessor performs comment removal, macro expansion, and file inclusion. These operations are carried out in the first step of the compilation process.
  • The compiler converts the preprocessed (.i) file into an assembly (.s) file and can optimize the code to improve the program's performance.
  • The assembler converts assembly files into object files containing machine code.
  • The linker combines object files with library files to produce the executable file. This is the final step of the compilation process.

Origin blog.csdn.net/m1234567q/article/details/133034539