What is a compiler?

A compiler is a program that converts human-readable source code into computer-executable machine code. To do this successfully, the source code must conform to the grammatical rules (the syntax) of the programming language it is written in. A compiler is just a program and cannot fix your code for you. If you make a mistake, you have to correct it yourself, otherwise the code won't compile.

What happens when the code is compiled?

The complexity of a compiler depends on the syntax of the language and the level of abstraction the language provides. For example, a C compiler is much simpler than a compiler for C++ or C#.

Lexical analysis

When compiling, the compiler first reads a stream of characters from a source code file and generates a stream of lexical tokens. For example, take this C++ code:

int C= (A*B)+10;

It can be broken down into the following tokens:

  • Type "int"
  • Variable "C"
  • Equals sign
  • Left parenthesis
  • Variable "A"
  • Multiplication sign
  • Variable "B"
  • Right parenthesis
  • Plus sign
  • Literal "10"
  • Semicolon

Syntax analysis

The output of the lexer goes to the parser, the part of the compiler that uses the grammar rules of the language to determine whether the input is valid. If the variables A and B have not been declared earlier, or are not in scope, the compiler might report:

  • 'A': Undeclared identifier.

If they are declared but not initialized, the compiler issues a warning:

  • Uninitialized local variable 'A' used.

Never ignore compiler warnings. They point to problems that can break your code in strange and unexpected ways. Always fix them.

One pass or two?

Some programming languages are designed so that the compiler can read the source code only once and generate the machine code. Pascal is one such language. Many compilers require at least two passes. Sometimes this is because of forward declarations of functions or classes.

In C++, a class can be declared first and defined later. The compiler cannot work out how much memory an object of the class needs until it has compiled the body of the class, so it has to go over the code again before it can generate correct machine code.

Generating machine code

Assuming the compiler successfully completes the lexical and syntactic analysis, the final stage is to generate machine code. It's a complicated process, especially with modern CPUs.

The compiled executable should run as fast as possible. Its speed can vary widely depending on the quality of the generated code and the amount of optimization requested.

Most compilers let you specify the amount of optimization: typically none for quick debug builds and full optimization for released code.

Code generation is challenging

Compiler writers face challenges when writing a code generator. Many processors speed up execution by using:

  • Instruction pipelining
  • Internal caches

If all the instructions in a code loop can be held in the CPU cache, the loop can run much faster than if the CPU had to fetch instructions from main RAM. The CPU cache is a block of memory built into the CPU chip that can be accessed much faster than data in main RAM.

Caches and queues

Most CPUs have a prefetch queue, into which the CPU reads instructions before executing them. If a conditional branch is taken, the CPU must reload the queue. Code should be generated to minimize this.

Many CPUs have separate units for:

  • Integer arithmetic (whole numbers)
  • Floating-point arithmetic (fractional numbers)

These operations can often be run in parallel to increase speed.

Compilers typically generate machine code into object files, which are then linked together by a linker program.

Origin blog.csdn.net/xipengbozai/article/details/131352346