JVM basics (1): Know the virtual machine

Preface

Hello, and welcome to my article.

The JVM is a hurdle every Java programmer has to clear, because so much depends on it. Java's low-level knowledge is, in the end, JVM knowledge. Many readers see the JVM and think: this is low-level material, let's learn the application layer first. Or they memorize a few questions just before an interview. In my opinion, understanding the JVM is the only way to understand the entire Java ecosystem. The development of programming languages, from strings of 0s and 1s in machine code to today's high-level Java language, condenses the wisdom of countless pioneers. Understanding the JVM is not only useful for interviews; it is also a way to appreciate that wisdom, and it helps us write more robust Java programs.

Those of us who are not JVM researchers may not need to go very deep; understanding its conceptual model is enough. For an Android engineer, the JVM is still required learning. This JVM basics series covers the key fundamentals you need to know about the JVM without digging into its internals. For an Android engineer, that is enough. Readers who are interested in the JVM can go on to study it in depth afterwards, and I hope this series can serve as a guide.

In this first article, let's talk about the JVM. Why is the JVM a big step forward for programming languages? What role does the JVM play when a Java program runs? JVMs are written in languages such as C and C++, so is C a cross-platform language? If C is cross-platform, why develop Java at all? Isn't C faster? And if Java is so popular, why hasn't C declined? With these questions in mind, this article walks through a brief history of programming languages and shows how machine code evolved into Java step by step. With that background, the JVM is much easier to understand.

The birth of the C language

We know that computer hardware only understands machine code: strings of 0s and 1s. In the early days of computing, programmers wrote programs by writing those strings directly. Punched paper tape is one of the forms this took:

Of course, the computers of that era were not like today's and could only perform very simple operations. Suppose we need to calculate 4+8. The instructions might be 0100,1000,1101, where the first binary string represents the number 4, the second represents the number 8, and the third represents the addition instruction (these instructions are made up, just to give an example). Writing programs this way is inefficient and tedious, the meaning of the code is unclear, and it does not scale to increasingly complex programs. So the pioneers had an idea: give each instruction a name. For example, write 1101 as add, and let a translator program (an assembler) map add back to 1101. The next time you need 4+8, you write 4,8,add, and it is converted to 0100,1000,1101 according to that mapping, which is far more convenient. This is the origin of assembly language.
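
The name-to-code mapping described above can be sketched as a toy assembler. This is only an illustration in Java, not a real assembler: the class name TinyAssembler and the 4-bit operand width are assumptions made for the example, and only the add -> 1101 mapping comes from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy assembler for the made-up instruction set used in the text.
class TinyAssembler {
    // Hypothetical mnemonic table: "add" -> 1101 is the example from the article.
    private static final Map<String, String> OPCODES = Map.of("add", "1101");

    // Translates a comma-separated program such as "4,8,add" into 4-bit binary words.
    static String assemble(String source) {
        List<String> words = new ArrayList<>();
        for (String token : source.split(",")) {
            String t = token.trim();
            String opcode = OPCODES.get(t);
            if (opcode != null) {
                words.add(opcode); // a named instruction: look up its bit pattern
            } else {
                int value = Integer.parseInt(t); // otherwise treat it as a number
                if (value < 0 || value > 15) {
                    throw new IllegalArgumentException("operand does not fit in 4 bits: " + t);
                }
                String bits = Integer.toBinaryString(value);
                words.add("0".repeat(4 - bits.length()) + bits); // left-pad to 4 bits
            }
        }
        return String.join(",", words);
    }

    public static void main(String[] args) {
        System.out.println(assemble("4,8,add")); // prints 0100,1000,1101
    }
}
```

The programmer writes the readable form, and the translation to bit patterns is purely mechanical, which is why it can be handed to a program.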

As described above, not only addition but also subtraction, assignment, loops, branches and so on can be written as named instructions and finally translated into machine code. Compared with writing machine code directly, writing in assembly language greatly improves development efficiency, suits more complex programs, and makes the meaning of the code clearer.

Different machines have different instructions. For example, the addition instruction of machine A might be 0010, while on machine B it is 0101. Assembly language corresponds one-to-one with the CPU's machine code, so each kind of machine has its own assembly language. As computers developed and business scenarios grew more complex, assembly language began to fall short. A language more advanced than assembly was needed to improve programmers' efficiency, and that is when the C language was born. (There was a B language in between, along with other details; here we only follow the general evolution of languages.)
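
To make the per-machine difference concrete, here is a small sketch in Java. The machines "A" and "B" and their bit patterns are the imaginary ones from the paragraph above; the class and method names are assumptions made for the example.

```java
import java.util.Map;

// The same mnemonic encodes to different bits on different (imaginary) machines.
class PerMachineOpcodes {
    // One opcode table per machine, using the values from the text.
    private static final Map<String, Map<String, String>> MACHINES = Map.of(
            "A", Map.of("add", "0010"),
            "B", Map.of("add", "0101"));

    static String encode(String machine, String mnemonic) {
        Map<String, String> table = MACHINES.get(machine);
        if (table == null || !table.containsKey(mnemonic)) {
            throw new IllegalArgumentException("unknown machine or mnemonic");
        }
        return table.get(mnemonic);
    }

    public static void main(String[] args) {
        System.out.println(encode("A", "add")); // prints 0010
        System.out.println(encode("B", "add")); // prints 0101
    }
}
```

Because each machine needs its own table, each machine also needs its own assembler.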

Just as assembly language is an abstraction over machine code, the C language is a higher level of abstraction over assembly language. A C compiler translates C into assembly language. Efficiency improves once again, and more complex business logic becomes feasible. As shown below:

As mentioned earlier, assembly language differs between platforms, so each platform needs its own C compiler to translate C into that platform's assembly language.

Therefore, to run a C program on a machine, a C compiler first translates the C code into that machine's assembly language, and then the machine's assembler translates the assembly into machine code. Only then can the C program run.

Did the chicken or the egg come first?

We know that C compilers are mostly written in C, so what was the world's first C compiler written in? Which came first, the chicken or the egg?

In fact, the C language and its compiler did not start out as complete as they are today. The earliest C compiler was written in assembly language and implemented only the most basic features of C, such as basic data types, basic operations, and basic control flow. That kept the compiler's job small, so it could be implemented in assembly relatively quickly. Then, building on the subset of C already implemented, C itself was used to develop further C features and keep enriching the compiler. This gives successive versions C0, C1, C2, and so on, each iteration making the language more capable. Progress in C fed its own development, a virtuous circle.

Just as assembly led to C, more advanced languages such as Java can be built on top of C to meet our needs. As the saying might go: 0 produces 1, 1 produces C, and C produces everything. Why do computer science students generally learn C first? Because C is the ancestor and foundation of many of today's high-level programming languages. Interested readers can find a more in-depth analysis in the article "Since the C compiler is written in C language, how did the first C compiler come from?".

Can C language be cross-platform?

Suppose we write a C program on the Windows operating system, compile it, click run, and it runs. We then copy the code to a Linux system, compile it there, and it runs as well. Does this prove that C is compatible across platforms, that is, that C is cross-platform?

The answer is no. First, we have to be clear about what cross-platform means: the language does not depend on the operating system or the hardware environment, and an application built under one operating system can still run under another. The reason C can be compiled and run on different platforms is the compiler. We must write a C compiler for each platform, and each compiler translates the C code into that platform's machine code before the program can run. The compiled program is not platform independent, so C is not a cross-platform language.

So if we simply write a C compiler for every platform, can the same C program run on all of them? Again the answer is no. As we saw, C is tied to the assembly language of a specific platform, and assembly language is tied to the machine instructions of a specific machine, so C itself behaves differently on different platforms. For example, the integer type int may be 16 bits wide on one machine and 32 bits wide on another. Interested readers can read the article "C why not cross-platform" to learn more.
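
For contrast, Java fixes the width of its primitive types in the language itself, so the int example above cannot happen there. The short sketch below prints those widths using standard library constants; the class name FixedWidthTypes and its helper method are assumptions made for the example.

```java
// Primitive widths in Java are defined by the language, not by the host machine.
class FixedWidthTypes {
    // Number of bits in a Java int, taken from the standard library constant.
    static int intBits() {
        return Integer.SIZE;
    }

    public static void main(String[] args) {
        System.out.println("int bits:  " + intBits());      // 32 on every platform
        System.out.println("long bits: " + Long.SIZE);      // 64 on every platform
        System.out.println("char bits: " + Character.SIZE); // 16 on every platform
        // Integer overflow also wraps around the same way everywhere.
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true
    }
}
```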

Cross-platform protagonist: JVM

As mentioned earlier, C cannot be cross-platform: every C program must be compiled into the machine code of the target machine before it can run. To achieve cross-platform execution, we need a program A that can translate code into the machine code of the specific platform dynamically, at run time, so that the same program runs directly on different platforms without first being compiled for each one. This program A is a virtual machine. The term here does not mean the Java virtual machine specifically but virtual machines in general; the JVM is just one kind. What a virtual machine has to accomplish is this: the program does not need to be compiled for the platform, yet it can be run directly. The virtual machine treats the program we write as its own instructions and executes them one by one, without converting everything into the platform's machine code ahead of time.
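
The idea of a program that treats another program as its own instructions can be sketched as a tiny interpreter. This is a deliberately simplified illustration in Java, not how the JVM is implemented: the instruction names push, add and mul and the class name TinyVm are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy stack-based virtual machine that executes one instruction at a time.
class TinyVm {
    // Hypothetical instruction set: "push N", "add", "mul", separated by whitespace.
    static int run(String program) {
        Deque<Integer> stack = new ArrayDeque<>();
        String[] tokens = program.trim().split("\\s+");
        for (int pc = 0; pc < tokens.length; pc++) {
            switch (tokens[pc]) {
                case "push":
                    // The operand follows the instruction; consume it as well.
                    stack.push(Integer.parseInt(tokens[++pc]));
                    break;
                case "add":
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "mul":
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unknown instruction: " + tokens[pc]);
            }
        }
        return stack.pop(); // the result is whatever is left on top of the stack
    }

    public static void main(String[] args) {
        System.out.println(run("push 4 push 8 add")); // prints 12
    }
}
```

The program text "push 4 push 8 add" never becomes machine code for the host. Only TinyVm itself has to exist for each platform, which is the trade that makes the program portable.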

JVM stands for Java Virtual Machine, but it does not actually understand the Java language; the JVM only recognizes class files. The Java programs we write must be compiled before they can run on the virtual machine. The result of compiling a Java file is bytecode, stored in a class file. The class file is the only code the virtual machine recognizes, just as a physical machine recognizes only machine code, so the class file can be called the "machine code" of the virtual machine. As shown below:
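
One way to see that a class file is a binary format rather than Java source is to look at its first four bytes, which the Java Virtual Machine Specification fixes to the magic number 0xCAFEBABE. The sketch below reads them from the class file of java.lang.Object. It assumes the runtime exposes that class file as a resource through the system class loader; the class name ClassFileHeader is made up for the example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Reads the magic number at the start of a class file.
class ClassFileHeader {
    // Returns the first four bytes of java/lang/Object.class as a big-endian int.
    static int readMagic() {
        try (InputStream in = ClassLoader.getSystemResourceAsStream("java/lang/Object.class")) {
            if (in == null) {
                throw new IllegalStateException("class file not available on this runtime");
            }
            return new DataInputStream(in).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("magic = 0x%08X%n", readMagic());
    }
}
```

Every valid class file starts with the same four bytes, regardless of which platform or which source language produced it.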

The JVM is essentially a program written in a language such as C or C++. It does a job similar to that of a C compiler: turning a higher-level language into machine code for the current platform. The difference is that a C compiler must fully compile the C program before it runs, whereas the JVM runs the class file directly and translates it into machine code as it executes. Each platform has its own virtual machine, for the same reason that each platform has its own C compiler. A platform that wants to run Java programs must first set up a Java environment, that is, install the JRE (Java Runtime Environment), which can be thought of as the JVM program installed on that platform. When we run a class file on a platform, the virtual machine starts first, loads the class file, and then runs it. So once our Java program is compiled into a uniform class file, any platform with a virtual machine installed can run it directly, and that is how cross-platform execution is achieved.

So far the virtual machine really does look like a "machine", except that its machine code is not strings of 0s and 1s but class files. The JVM has its own memory layout, with a method stack, a heap, a constant pool, and so on. At run time, the virtual machine still translates the class file into machine code that runs on the real machine. The virtual machine shields us from the real machine: what we see is a virtual one. During development we no longer need to care about the specific machine and only need to focus on the JVM.

At this point the reader may be curious: why doesn't the virtual machine run Java source directly instead of requiring Java to be compiled into bytecode (class files) first? Isn't that an unnecessary step? It actually serves another design goal of the JVM: language independence. As mentioned above, the JVM makes it possible to execute class files, that is, bytecode, directly on all major platforms, which gives platform independence. And any high-level language that can be compiled into class files can run on the JVM, which gives the JVM language independence. Groovy and Kotlin, for example, compile to class files and therefore run on the JVM; such languages are collectively called JVM languages. As shown below:

Therefore, to be more precise, it should not be called a Java virtual machine, but a class virtual machine.

Why hasn't the C language declined?

Today, with cross-platform languages such as Java in full swing, why hasn't a compiled language like C declined, instead staying near the top of programming language rankings year after year? The answer involves two important costs that a virtual machine has to pay: speed and environment.

First, speed. A virtual machine executes intermediate code directly, which looks great, but in practice it has to translate the intermediate code into machine code before anything actually runs. The JVM's intermediate code is the class file, so running a program includes the work of interpreting it, whereas a native executable is already machine code and runs directly. The speed difference between the two can be large. In my view, the virtual machine is an important reason why iOS often feels smoother and faster than Android. An iOS program is compiled directly into a native executable, which is fast, but the price is that it cannot run on other platforms. Then again, do iOS programs need to run on other systems? I actually think Google's choice of a JVM language for Android development was not the best one. Cross-platform capability is of little practical use to the Android programs we write: they call Android system APIs and interact with the system, so they can only ever run on Android. Choosing a JVM language for Android brought a cross-platform feature that hardly matters, at a significant performance cost.

The second factor is the environment. A JVM language needs a JVM to run, so a platform must have a JRE installed before it can execute programs. Some small devices, such as watches, have very little memory and very limited CPU power, and the memory and performance overhead of a JVM is more than they can afford. There, a compiled language like C is the best choice.

There is one more cost to add: JVM code cannot directly manipulate memory. The more advanced and abstract a language is, the less it has to do with the machine; the closer it gets to people, the farther it gets from hardware. Languages such as C/C++ have a very powerful capability: operating on memory directly. They are, after all, only a few steps removed from machine code, so they are well suited to embedded programs and to programs that need to manipulate machine memory, such as the JVM itself. A high-level language like Java cannot do this on its own. There are JVMs written in Java, but as with the "chicken or egg" question discussed earlier, they still need help from a language like C.

In short, cross-platform languages and directly compiled languages suit different scenarios. If anything ever displaces C, it will not be a cross-platform language but another, better compiled language, and for now none seems likely to appear in the near future.

Virtual machine family

The content above is about virtual machines in the broad sense. This part is devoted to the Java virtual machine.

As mentioned earlier, there is more than one kind of virtual machine, and in fact there is more than one Java virtual machine. The current mainstream one, used by Sun/Oracle JDK and OpenJDK, is HotSpot. HotSpot was originally developed by a small company that Sun later acquired, and Sun continued its development. After Oracle acquired Sun, it integrated the best features of BEA's JRockit virtual machine into HotSpot. Because Sun/Oracle JDK dominates Java applications, HotSpot is also the most widely used virtual machine. Others include the JRockit virtual machine just mentioned and the IBM J9 virtual machine. The former was acquired by Oracle and folded into HotSpot; the latter is still very active, though its user base is small compared with HotSpot's.
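
To check which virtual machine a given program is actually running on, the standard system properties can be read at run time. The sketch below uses the java.vm.name, java.vm.vendor, os.name and os.arch properties; the class name WhichVm and the output format are assumptions made for the example, and the printed values depend on the installation.

```java
// Prints which Java virtual machine and platform the program is running on.
class WhichVm {
    static String describe() {
        return System.getProperty("java.vm.name")
                + " (" + System.getProperty("java.vm.vendor") + ") on "
                + System.getProperty("os.name") + "/" + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        // The same class file reports different values on different machines.
        System.out.println(describe());
    }
}
```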

Android runs Java programs, so naturally it has a virtual machine too. However, the Android virtual machine cannot be called a JVM, because it does not conform to the Java virtual machine specification; it was redesigned around the characteristics of mobile devices. It is based on a register architecture instead of a stack architecture, and it runs neither class files nor jar files but dex files, which can be converted from the former two. The first Android virtual machine was Dalvik. To improve performance, it used a JIT (just-in-time) compiler to compile bytecode. A JIT compiles hot code, that is, code that runs frequently, into machine code during use, so that the next time execution reaches that code it runs faster. In other words, the longer the app runs, the smoother it gets. Starting with Android 5.0, Dalvik was replaced by the ART virtual machine. ART's signature feature is AOT (ahead-of-time) compilation: it can compile all of an app's code into machine code at install time, which improves overall performance considerably. The price is a longer installation and extra storage for the machine code. Then, in Android 7.0, a JIT was added to ART as well, so that only a small part needs to be compiled ahead of time at install, and code that has not yet been compiled is handled by the JIT when it is reached at run time.
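
The "hot code" idea behind a JIT can be sketched with a simple invocation counter. This is a toy model in Java under stated assumptions, not how Dalvik, ART or HotSpot decide what to compile: the class name HotCodeCounter, the threshold value and the "interpreted"/"compiled" labels are all made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of hot-code detection: count calls, switch paths past a threshold.
class HotCodeCounter {
    private final int threshold;
    private final Map<String, Integer> calls = new HashMap<>();

    HotCodeCounter(int threshold) {
        this.threshold = threshold;
    }

    // Records one invocation and reports which path would execute it.
    String invoke(String method) {
        int count = calls.merge(method, 1, Integer::sum);
        // Once a method has been called more than `threshold` times it counts as hot.
        return count > threshold ? "compiled" : "interpreted";
    }

    public static void main(String[] args) {
        HotCodeCounter jit = new HotCodeCounter(3);
        for (int i = 1; i <= 5; i++) {
            // Calls 1 to 3 report "interpreted", calls 4 and 5 report "compiled".
            System.out.println("call " + i + ": " + jit.invoke("render"));
        }
    }
}
```

Rarely used code never pays the compilation cost, while frequently used code gets faster over time, which matches the "longer it runs, smoother it gets" behavior described above.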

Summary

This article covered the development of programming languages and the key to cross-platform execution, the virtual machine, and then discussed the characteristics of cross-platform and compiled languages and the virtual machine family.

Virtual machines are the key to cross-platform languages, which makes them knowledge we cannot ignore when learning such a language. Before studying virtual machines, it helps to understand the background they came from and the problems they solve. We can use Java today to build all kinds of complex applications because our predecessors did the heavy work of developing the virtual machine for us, and it is still being improved. It is the crystallization of decades of work by a great many people, and it deserves to be studied with respect.

That's the whole article. Original writing takes effort, so if you found it helpful, please like, bookmark, and share it.
My ability is limited; if you have any thoughts or corrections, feel free to leave them in the comments.
If you would like to repost this article, please send me a private message.

Also welcome to my personal blog: Portal


Origin blog.csdn.net/weixin_43766753/article/details/109199647