Source: Bole Online, Author: Chaobs
C is a very low-level language, and in many respects it is close to assembly language. The book "Intel 32-bit Assembly Language Programming" even shows how to translate simple C code into assembly by hand. For system software such as compilers, C is a natural choice. Even high-level languages like Python still rely on C underneath (Python is the example here because Intel hackers are trying to make Python run without an operating system; in effect, this eliminates the one-time C code on the BIOS). Nowadays, anyone with some programming ability who has studied compiler principles can implement a simple C-like compiler.
But here is the question, and you may never have thought about it: everyone writes compilers in C or in a language built on C, so how was the world's first C compiler written? This is not a "chicken and egg" question...
The prototype of the first C compiler may have been written in B, or in a mix of B and PDP assembly language.
The early C compilers took a clever shortcut: first write a compiler for a subset of C in assembly language, then use that subset to build up, step by step and recursively, a compiler for the complete C language. The process in detail is as follows:
C language
CN language
……
C0 language
Assembly language
Machine language
First, a concept: "self-compilation" (Self-Compile). Some strongly typed programming languages with a clear bootstrapping property can express themselves using a limited subset of the language, through a finite number of recursive steps. (Here "strongly typed" means that every variable in a program must be declared before it can be used, as in C. Some scripting languages, by contrast, have no notion of declared types at all.) Such languages include C, Pascal, and Ada. As for why they can be self-compiled, see "Compilation Principles" from Tsinghua University Press, which implements a compiler for a subset of Pascal.
In short, computer scientists have shown that a complete C compiler can in theory be built with the CVM method. So how is the language actually simplified in practice?
Does this diagram look familiar? That's right, it is the kind of picture you see in discussions of virtual machines, except that here it is a CVM (C Language Virtual Machine). Each language can be compiled independently on its own virtual layer, and, except for the C language, the output of each layer serves as the input of the next layer (the output of the last layer is the application). It works like a snowball: pack a small handful of snow together by hand (assembly language), roll it along little by little, and it grows into a big snowball. This is probably what is meant by "0 begets 1, 1 begets C, and C begets everything", right?
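As a rough illustration of the layering described above, the chain of compilers can be enumerated as follows. This is only a sketch: the names `stage_name` and `print_bootstrap_chain` and the parameter `n` (the index of the last subset, CN) are made up for this example, and nothing is actually compiled.

```c
#include <stdio.h>

/* Hypothetical helper: writes the name of layer k into buf.
 * k < 0 is assembly, 0..n are the subsets C0..Cn, n + 1 is full C. */
static void stage_name(int k, int n, char *buf, size_t size)
{
    if (k < 0)
        snprintf(buf, size, "assembly");
    else if (k <= n)
        snprintf(buf, size, "C%d", k);
    else
        snprintf(buf, size, "C");
}

/* Prints one line per compiler in the chain and returns how many
 * compilers have to be written when the last subset is Cn. */
int print_bootstrap_chain(int n)
{
    int built = 0;
    for (int k = 0; k <= n + 1; k++) {
        char target[16];
        char host[16];
        stage_name(k, n, target, sizeof target);
        stage_name(k - 1, n, host, sizeof host);   /* the layer below */
        printf("%s compiler is written in %s\n", target, host);
        built++;
    }
    return built;
}
```

Calling `print_bootstrap_chain(3)`, which matches the C0 to C3 subsets used in this article, lists five compilers, from C0 (written in assembly) up to full C (written in C3).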
auto enum restrict unsigned
break extern return void
case float short volatile
char for signed while
const goto sizeof _Bool
continue if static _Complex
default inline struct _Imaginary
do int switch
double long typedef
else register union
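// 37 in total (the full C99 keyword set)
Removing auto, restrict, extern, volatile, const, sizeof, static, inline, register, and typedef from this list leaves the C3 language, whose keywords are as follows: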
enum unsigned
break return void
case float short
char for signed while
goto _Bool
continue if _Complex
default struct _Imaginary
do int switch
double long
else union
// 27 in total
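The step from the 37-keyword list to the 27-keyword list can be checked mechanically. The sketch below copies both lists from the tables above and computes the difference; the identifiers `C99`, `C3`, `is_c3_keyword`, and `dropped_for_c3` are names chosen for this example only.

```c
#include <string.h>

#define COUNT(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* The 37-keyword list, copied from the first table. */
static const char *const C99[] = {
    "auto", "enum", "restrict", "unsigned", "break", "extern", "return", "void",
    "case", "float", "short", "volatile", "char", "for", "signed", "while",
    "const", "goto", "sizeof", "_Bool", "continue", "if", "static", "_Complex",
    "default", "inline", "struct", "_Imaginary", "do", "int", "switch",
    "double", "long", "typedef", "else", "register", "union"
};

/* The 27-keyword list, copied from the second table. */
static const char *const C3[] = {
    "enum", "unsigned", "break", "return", "void", "case", "float", "short",
    "char", "for", "signed", "while", "goto", "_Bool", "continue", "if",
    "_Complex", "default", "struct", "_Imaginary", "do", "int", "switch",
    "double", "long", "else", "union"
};

/* Returns 1 if word appears in set[0..n-1], otherwise 0. */
static int in_set(const char *const *set, int n, const char *word)
{
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], word) == 0)
            return 1;
    return 0;
}

/* Returns 1 if word is one of the keywords kept in the 27-keyword list. */
int is_c3_keyword(const char *word)
{
    return in_set(C3, COUNT(C3), word);
}

/* Stores the keywords of the first list that are missing from the second
 * into out (room for COUNT(C99) entries) and returns how many there are. */
int dropped_for_c3(const char *out[])
{
    int n = 0;
    for (int i = 0; i < COUNT(C99); i++)
        if (!in_set(C3, COUNT(C3), C99[i]))
            out[n++] = C99[i];
    return n;
}
```

The same comparison could be repeated for each later reduction by adding the corresponding keyword table.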
On further thought, C3 still has many types and type modifiers that need not all be supported at once. For example, of the three integer types, implementing int alone is enough. So we remove more keywords: unsigned, float, short, char (a char is treated as an int), signed, _Bool, _Complex, _Imaginary, and long. This gives the C2 language, whose keywords are as follows:
enum
break return void
case
for while
goto
continue if
default struct
do int switch
double
else union
// 18 in total
Thinking further, even C2, with only 18 keywords, still has many advanced features, such as compound data structures built on the basic data types. Also, our keyword table contains no operators: overly flexible forms of expression in C, such as the compound assignment operators, the -> operator, ++, and --, can also be dropped entirely at this stage. The keywords that can be removed are enum, struct, and union, which gives the keywords of the C1 language:
break return void
case
for while
goto
continue if
default
do int switch
double
else
// 15 in total
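To see why operators such as `->`, `++`, and the compound assignments can be dropped without losing expressive power, compare the two functions below. This is a small sketch with made-up names (`struct point`, `sum_x_sugared`, `sum_x_plain`); the second function spells out each shorthand using only plain assignment, `+`, `*`, and `.`. It still uses a struct and a pointer, so it illustrates the operator rewriting only, not the full C1 subset.

```c
struct point {
    int x;
    int y;
};

/* Uses the shorthand forms: +=, -> and ++. */
int sum_x_sugared(struct point *pts, int count)
{
    int total = 0;
    int i = 0;
    while (i < count) {
        struct point *p = &pts[i];
        total += p->x;
        i++;
    }
    return total;
}

/* Same computation with every shorthand written out in full. */
int sum_x_plain(struct point *pts, int count)
{
    int total = 0;
    int i = 0;
    while (i < count) {
        struct point *p = &pts[i];
        total = total + (*p).x;   /* a += b  becomes  a = a + b */
                                  /* p->x    becomes  (*p).x    */
        i = i + 1;                /* i++     becomes  i = i + 1 */
    }
    return total;
}
```

Both functions return the same value for the same input, which is the sense in which the shorthand operators are optional.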
This is close to ideal, but the last step is naturally a bigger one. Arrays and pointers have to go at this point. C1 also still has a lot of redundancy: there are several ways to express loops and branches, and each group can be reduced to a single form. For loops, there are the while loop, the do...while loop, and the for loop; keeping only while is enough. For branches, there are four forms: if...{}, if...{}...else, if...{}...else if..., and switch; all of them can be expressed with two or more if...{} statements, so if...{} alone is enough. But on reflection, branches and loops are just conditional jumps, and a function call is just a stack push plus a jump, so only goto (an unrestricted goto) is needed. So we boldly remove all the structured keywords, and even functions, and obtain the following C0 keywords:
break void
goto
int
double
// 5 in total
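The reduction of loops and branches to jumps can be sketched in ordinary C. The example below is an illustration under stated assumptions, not the historical compiler's method: the function names are invented, the conditional jump is written as `if (cond) goto label;` because standard C offers no other way to spell it (the C0 keyword list itself has no `if`), and the function wrapper with `return` is kept only so that the two versions can be compiled and compared.

```c
/* Structured version: a while loop containing an if/else. */
int sum_even_structured(int n)
{
    int sum = 0;
    int i = 0;
    while (i <= n) {
        if (i % 2 == 0) {
            sum = sum + i;
        } else {
            /* odd values contribute nothing */
        }
        i = i + 1;
    }
    return sum;
}

/* Lowered version: the same logic using only labels and jumps. */
int sum_even_lowered(int n)
{
    int sum = 0;
    int i = 0;
loop_test:
    if (i > n) goto loop_end;        /* negated while condition */
    if (i % 2 != 0) goto skip_add;   /* negated if condition */
    sum = sum + i;
skip_add:
    i = i + 1;
    goto loop_test;                  /* back edge of the loop */
loop_end:
    return sum;
}
```

Each structured construct maps to a test that jumps past its body, plus an unconditional jump back to the test for loops, which is the shape that is easy to translate by hand into assembly.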