Introduction to JVM

JVM is the abbreviation of Java Virtual Machine.
A virtual machine is a complete computer system, with full hardware functionality, that is simulated in software and runs in a completely isolated environment.

Common virtual machines: JVM, VMware, VirtualBox.
The difference between the JVM and the other two virtual machines:

VMware and VirtualBox simulate the instruction set of a physical CPU in software, and a physical CPU has many registers;
The JVM simulates the Java bytecode instruction set in software. Of the registers, the JVM basically keeps only the PC register and trims away the rest.
The JVM is a customized computer that does not exist in reality.

In day-to-day development, Java programmers rarely touch the internals of the JVM directly. If you want to understand it more deeply, you can read this book; it is packed with solid material.

1. JVM memory area division


JVM memory is requested from the operating system and divided into different areas, each of which serves a different purpose.

What does thread-private mean?
Since JVM multi-threading is achieved by switching between threads in turn and allocating processor execution time to them, at any given moment a processor (for a multi-core processor, one core) executes the instructions of only one thread. Therefore, to resume at the correct execution position after a thread switch, each thread needs its own program counter; the counters of different threads do not affect each other and are stored independently. We call this kind of memory area "thread-private" memory.

1.1 Program Counter (Thread Private)

The role of the program counter: it records the position the current thread has executed to.

It is the smallest area in memory, and it holds the address of the next instruction to be executed...

The instructions are the bytecode. For a program to run, the JVM has to load the bytecode into memory, and the program counter records which instruction is currently being executed.
The CPU executes concurrently: it does not serve only one process, it has to serve all of them. Precisely because the operating system schedules execution in units of threads, each thread has to record its own execution position, which is the program counter, one per thread.

1.2 Java virtual machine stack (thread private)

Describes local variables and method call information. Each time a method is called, a "push" operation happens, and each time a method finishes executing, a "pop" operation happens.
The stack space is relatively small. Its size can be configured in the JVM, but it is generally only a few MB or a few tens of MB, so the stack can easily fill up (normally the code we write is fine, but we have to watch out for recursion: if the recursion is not set up properly, there will be a stack overflow: StackOverflowError).
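The push/pop behavior and the overflow described above can be observed with a small experiment. This is only a sketch: the class name StackDepthDemo and its methods are made up for illustration, and the depth it reports depends on the JVM, the platform, and the configured thread stack size (the -Xss option).

```java
public class StackDepthDemo {
    private static int depth = 0;

    // Recurses without a base case; every call pushes one more stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the thread's stack overflowed.
    public static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack is full. The frames are popped while the error
            // propagates back up to this catch block.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames pushed before overflow: " + measureDepth());
    }
}
```

Running it with a smaller stack, for example `java -Xss256k StackDepthDemo`, should report a smaller number.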

The role of the Java virtual machine stack: the life cycle of the Java virtual machine stack is the same as that of its thread. The Java virtual machine stack describes the memory model of Java method execution: each time a method is executed, a stack frame (Stack Frame) is created to store the local variable table, operand stack, dynamic linking, method exit and other information. Of the heap memory and stack memory we often talk about, the stack memory refers to the virtual machine stack.
A stack frame in the Java virtual machine stack includes the following four parts:
1. Local variable table: stores the basic data types (the 8 primitive types) and object references known at compile time. The space the local variable table needs is determined during compilation. When a method is entered, the amount of local variable space it needs in the frame is already fully determined, and the size of the local variable table does not change during execution. Simply put, it stores method parameters and local variables.
2. Operand stack: each method has its own first-in, last-out operand stack.
3. Dynamic linking: a reference to the method in the runtime constant pool.
4. Method return address: the PC register value at which the caller resumes after the method returns.

1.3 Native method stack (thread private)

The native method stack is similar to the virtual machine stack, except that the Java virtual machine stack serves the Java methods executed by the JVM, while the native method stack serves native methods.

1.4 Heap (thread sharing)

The role of the heap: all objects created in the program are stored on the heap.

A process has only one heap, shared by all of its threads, and it is also the largest area in memory. The objects we create with new are on the heap, and so, naturally, are the member variables of those objects.

Note: the statement "variables of built-in types are on the stack, variables of reference types are on the heap" is wrong.
It should be: local variables are on the stack, while member variables and new-ed objects are on the heap.
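Memory locations cannot be printed from Java directly, but the practical consequence of the rule above can be shown. In this sketch (the names StackVsHeapDemo, Counter and bump are invented for illustration), a local primitive is copied into the callee's own stack frame, while the object behind a reference is the single copy on the heap.

```java
public class StackVsHeapDemo {
    static class Counter {
        int value; // member variable: stored inside the Counter object on the heap
    }

    static void bump(int local, Counter shared) {
        local++;        // changes only this frame's copy on the stack
        shared.value++; // changes the one object that lives on the heap
    }

    // Returns {local, c.value} as seen by the caller after bump() has run.
    public static int[] run() {
        int local = 0;             // local variable of a primitive type: on the stack
        Counter c = new Counter(); // the reference c is a local on the stack; the object is on the heap
        bump(local, c);
        return new int[] { local, c.value };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("local = " + result[0] + ", c.value = " + result[1]);
    }
}
```

The caller's local stays 0 while c.value becomes 1, because both frames reach the same heap object through their references.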

1.5 Method area (thread sharing)

The role of the method area: it stores data that the virtual machine has loaded, such as class information, constants, static variables, and code compiled by the just-in-time (JIT) compiler.

The method area holds "class objects". What is a "class object"? The .java code we write is compiled into .class files (binary bytecode), and the .class file is loaded into memory, where the JVM builds it into a class object (this loading process is called "class loading"). The "class object" describes what the class looks like: the name of the class, what members and methods it has, the name and type of each member (public/private...), the name and type of each method (public/private...), the instructions contained in each method...
The "class object" also holds something very important: static members (static).

Members modified by static become "class attributes", while ordinary members are called "instance attributes".
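A minimal sketch of the difference between a class attribute and an instance attribute; the class Student and its fields are hypothetical and exist only for this illustration.

```java
public class StaticMemberDemo {
    static class Student {
        static int created = 0; // class attribute: a single copy that belongs to the class
        String name;            // instance attribute: one copy per object

        Student(String name) {
            this.name = name;
            created++; // every instance updates the same shared counter
        }
    }

    // Creates two students and returns how much the shared counter grew.
    public static int createTwo() {
        int before = Student.created;
        new Student("a");
        new Student("b");
        return Student.created - before;
    }

    public static void main(String[] args) {
        System.out.println("Counter grew by " + createTwo());
    }
}
```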

2. JVM class loading mechanism

Class loading is an important core function in the design of a runtime environment. What does class loading do? It loads .class files into memory and builds them into class objects.

2.1 Class loading process

Class life cycle: Loading, Verification, Preparation, Resolution, Initialization, Using, Unloading.
The first 5 steps generally happen in this order and make up the class loading process. The middle 3 steps (Verification, Preparation, Resolution) all belong to linking, so class loading is divided into three steps: Loading, Linking, Initialization (try to use the English terms when answering others).

2.1.1 Loading

The "Loading" stage is one stage of the whole "Class Loading" process. It is not the same thing as Class Loading: one is Loading and the other is Class Loading, so do not confuse the two.

In the Loading stage, the Java virtual machine needs to do the following three things:
1) Get the binary byte stream defining this class by its fully qualified name.
2) Convert the static storage structure represented by this byte stream into the runtime data structure of the method area.
3) Generate a java.lang.Class object representing this class in memory as the access entry for the various data of this class in the method area.

In summary: first find the corresponding .class file, then open and read the .class file (as a byte stream), and initially generate a class object.

A key question in Loading: what exactly is in a .class file?
For details, please refer to the official documentation: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html

According to the format described there, the information that is read and parsed is used to initially fill in the class object.
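To make the file format a little more concrete, the sketch below reads the first three fields of a class file: the magic number (u4), minor_version (u2) and major_version (u2), as laid out in the specification linked above. The class name ClassFileHeaderDemo and the method readHeader are invented for this example, and it assumes the class file can be found as a resource on the system class loader's search path.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileHeaderDemo {
    // Returns {magic, minor_version, major_version} read from the start of a .class file.
    public static int[] readHeader(String resourcePath) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resourcePath);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();           // u4, always 0xCAFEBABE
            int minor = data.readUnsignedShort(); // u2
            int major = data.readUnsignedShort(); // u2
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java/lang/String.class");
        System.out.println("magic = 0x" + Integer.toHexString(header[0]).toUpperCase()
                + ", version = " + header[2] + "." + header[1]);
    }
}
```

A real loader goes on to parse the constant pool, fields, methods and attributes in the same way; this sketch stops after the version numbers.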

2.1.2 Linking

Linking generally means establishing connections between multiple entities.

1: Verification
The main job is to verify that the content that was read exactly matches the format laid down in the specification. If the data does not conform to the specification, class loading fails and an exception is thrown.

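2: Preparation
The preparation stage formally allocates memory for the variables defined in the class (that is, static variables, the variables modified by static) and sets the initial value of each class variable, which is the zero value of its type.
For example: for a static int variable declared with the value 123, the value after the preparation stage is 0; 123 is only assigned later, in the initialization stage.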
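The zero value set up in the preparation stage can be observed indirectly. The following is a sketch with invented names (PreparationDemo, seenBeforeInit, peek): static initializers run in textual order during initialization, so a field initialized earlier in the source can read a later field before its own initializer has run.

```java
public class PreparationDemo {
    // Runs before the initializer of `value` below, so peek() still sees the
    // zero value that `value` received when memory was prepared for it.
    static int seenBeforeInit = peek();
    static int value = 123;

    private static int peek() {
        // Reading through a method avoids the "illegal forward reference" compile error.
        return value;
    }

    public static void main(String[] args) {
        System.out.println("seenBeforeInit = " + seenBeforeInit + ", value = " + value);
    }
}
```

Note that this applies to an ordinary static field; a static final field with a compile-time constant value is handled differently and would not show the zero value here.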

3: Resolution
The resolution stage is the process in which the Java virtual machine replaces the symbolic references in the constant pool with direct references, that is, the process of initializing constants.
In the .class file, the constants are stored together in one place and each constant has a number. Initially the structures in the .class file record only that number, so the corresponding content has to be looked up by number and filled into the class object.

2.1.3 Initializing

In the initialization phase, the Java virtual machine really starts to execute the Java program code written in the class and hands control over to the application. The initialization phase is the process of executing the class constructor method <clinit>(), which is where the class object is really initialized, especially its static members.

Typical interview question: when is the loading of a class triggered, and in what order does a given piece of code print?

As long as the class is used, it must be loaded first (instantiating it, calling its methods, calling its static methods, being inherited from... all count as being used).

General principles:
1: Static code blocks are executed during class loading. To create an instance, class loading has to happen first;
2: A static code block is executed only once, during class loading;
3: Construction code blocks and constructors are executed on every instantiation, and the construction code block runs before the constructor;
4: The parent class is executed first, and the subclass after it;
5: Our program starts from the main method. If main is a method of the TestDemo class, then to execute main, TestDemo has to be loaded first. If TestDemo inherits B, B has to be loaded before TestDemo, and if B inherits A, A is loaded before B.
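The principles above can be checked with a sketch like the following. It is not the original example from the interview question; InitOrderDemo, A, B and the logged strings are made up for illustration. Each block appends a line to a shared log, and two instances of B are created.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class A {
        static { LOG.add("A static block"); }
        { LOG.add("A construction block"); }
        A() { LOG.add("A constructor"); }
    }

    static class B extends A {
        static { LOG.add("B static block"); }
        { LOG.add("B construction block"); }
        B() { LOG.add("B constructor"); }
    }

    // Creates two B instances and returns a copy of everything logged so far.
    public static List<String> run() {
        new B();
        new B();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

On the first run the log starts with the two static blocks (parent first), followed by A's construction block and constructor and then B's construction block and constructor. The second `new B()` repeats only the four instance-level lines, because the static blocks have already run.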

2.2 Parent delegation model

This is not very useful in our daily work, but it is often asked about in interviews...
It is one link in class loading, located in the Loading stage (the early part). The parent delegation model is really about how the class loaders in the JVM find the .class file.

Class loader: the JVM provides a special kind of object, called a class loader, which is responsible for class loading. Naturally, finding the files is also the class loader's job...
.class files may be placed in many locations: some in the JDK directory, some in the project directory, some in other specific locations, and so on. Therefore the JVM provides several class loaders, and each class loader is responsible for one slice...

There are three default class loaders:
1: BootstrapClassLoader: responsible for loading the classes in the standard library (String, ArrayList, Random, Scanner...)
2: ExtensionClassLoader: responsible for loading the classes extended by the JDK (rarely used now)
3: ApplicationClassLoader: responsible for loading the classes in the current project directory

In addition, programmers can write custom class loaders to load classes from other directories. For example, Tomcat uses a custom class loader specifically to load the .class files in webapps…

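The parent delegation model describes this search process, that is, how the class loaders above work together...
When a class loader receives a request to load a class, it does not search by itself first; it hands the request to its parent loader, and so on up to BootstrapClassLoader. Only when the parent cannot find the class in its own search range does the child loader try to load it itself, so the search effectively goes BootstrapClassLoader first, then ExtensionClassLoader, then ApplicationClassLoader.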

This set of search rules is called the "parent delegation model" (the name is a literal translation; "parent" can mean either father or mother, so going by the rules it would not be wrong to call it a "single-parent delegation model"; of course, the name is not up to us).

Why is the JVM designed this way?
The reason is that even if the fully qualified name of a class written by the programmer duplicates that of a class in the standard library, the class in the standard library can still be loaded without trouble!!
For example, if we define a class such as java.lang.String ourselves, what the program loads is still the class in the standard library, so there are no conflicts and security is guaranteed.
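The loader hierarchy can be inspected from ordinary code. The sketch below (LoaderChainDemo and chainOf are invented names) walks from the loader that defined a class up through its parents. It assumes the common behavior, as on HotSpot, that getClassLoader() and getParent() return null to represent the bootstrap class loader; the concrete loader class names in between differ across JDK versions.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChainDemo {
    // Lists the class loaders from the one that defined `type` up to the bootstrap loader.
    public static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = type.getClassLoader(); cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap"); // represented by null in the API
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("String:          " + chainOf(String.class));
        System.out.println("LoaderChainDemo: " + chainOf(LoaderChainDemo.class));
    }
}
```

For String the chain consists of the bootstrap loader alone, which matches the point above: java.lang.String always comes from the standard library.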

Does a custom class loader also have to abide by the parent delegation model?
It can comply or not, depending on the requirements.
For example, when Tomcat loads the classes in webapps, it does not comply (because complying would be pointless).

3. JVM garbage collection mechanism (GC)

3.1 What is garbage collection

Garbage collection (Garbage Collection, GC): when we write code, we often request memory. Creating variables, new-ing objects, loading classes... all of these request memory, and all of it comes from the operating system. Since the memory was requested, it definitely has to be returned when it is no longer in use.

Generally speaking, the time to request memory is clear (when you need to save some data, you request memory), but the time to release it is not so clear: we do not always know whether the memory is still needed.

For example: suppose you get home in the afternoon, toss your clothes aside and forget about them. Your mother sees them, tidies them up and puts them in the closet. If you do not wear them the next day, fine; but if you need that outfit to go out and you look for it where you left it, eh? It's gone. Awkward... (this is memory released too early)
Is it OK to release later, then? Not really. It is like holding a seat in the library: you claim the seat early in the morning and then stay away all day. You are not using the seat and no one else can use it either (this is memory released too late)
What we want is to release memory neither too early nor too late.

3.2 Why does the garbage collection mechanism appear?

Take the C language: "I don't handle memory release; you programmers do it yourselves. It's not my money being docked anyway..." As a result, C programmers run into a common headache => the "memory leak" (memory is requested and then never released) => the available memory keeps shrinking until none is left!! So "memory leaks" are a headache for C/C++ programmers: some leak fast, some leak slowly, the moment they surface is unpredictable, and once they appear they are hard to track down. C++ later introduced smart pointers (roughly speaking, they just lean on C++'s RAII mechanism and are not really that "smart"...), which reduce the risk of "memory leaks" to some extent... but next to Java's mechanisms they are still a little brother (just kidding).

So most of the mainstream programming languages today, such as Java, Go, PHP..., have adopted one solution: the garbage collection mechanism!!
Roughly, there is a runtime environment (such as the JVM, the Python interpreter, the Go runtime...) that uses a fairly complex strategy to decide whether memory can be reclaimed and then performs the reclamation... In essence, garbage collection relies on the runtime environment doing a lot of extra work to release memory automatically, which greatly reduces the programmer's mental burden.

However, garbage collection also has disadvantages:
1: It has extra overhead (it consumes more resources)
2: It may affect the smooth running of the program (garbage collection often introduces the STW (Stop The World, as if time stood still) problem)

If garbage collection is so good, why has C++ not introduced GC?
In fact, some experts have proposed it, but it was never adopted, because the C++ language has two red lines, which are its core principles:
1: Compatibility: it is compatible with the C language and, as far as possible, with all kinds of hardware and operating systems
2: The most extreme performance...
In scenarios with extremely high compatibility/performance requirements, such as artificial intelligence, game engines, high-performance servers, operating system kernels..., C/C++ is still required.

3.3 What to Recycle in Garbage Collection

What is reclaimed is memory, but memory includes the program counter, the stack, the heap and the method area; some of these need GC and some do not:

Program counter: fixed size, no release is involved, so there is no need for GC
Stack: when a method finishes executing, the corresponding stack frame is released automatically, so there is no need for GC
Heap: needs GC. A large amount of the memory used by code is on the heap
Method area: holds class objects, which are created by class loading. Memory only needs to be released on "class unloading", and unloading is a very low-frequency operation (garbage collection is rarely involved)

Here we will discuss garbage collection on the heap.
The memory on the heap can be understood as three factions: the active faction, the passive faction, and the swing faction in the middle.

Active faction: memory that is in use; it will not be released.
Passive faction: memory that is no longer in use; it must be released.
Swing faction: an object of which part is still in use and part is not. In this case it is not released; it is released only once the whole object is no longer used.

There is no "half object" in GC, mainly to make garbage collection more convenient and simpler. Remember: the basic unit of garbage collection is the "object", not the byte.

3.4 How to implement garbage collection

Garbage collection is divided into two major stages. Stage one: find/identify the garbage. Stage two: release the garbage.

Just like cleaning a room: first put all the rubbish into the trash can, then carry it out of the room...

3.4.1 How to find/identify garbage

There are two mainstream approaches at present:
1: Based on reference counting (not the approach Java takes; other languages, such as Python, take it)
2: Based on reachability analysis (this is the approach Java takes)
Pay attention when someone asks you:
1: How is garbage identified in a garbage collection mechanism?
2: How is garbage identified in Java's garbage collection mechanism?
These two questions are a trap: for Java the answer is reachability analysis, while the general question also covers reference counting.

① Based on reference counting

For each object, an extra small piece of memory is introduced to record how many references point to the object.

For example: Test t = new Test(); t is a reference to this object, so the Test object has one reference and its reference count is 1.

If you then write Test t2 = t, both t and t2 point to this object, and the reference count becomes 2.

When the reference count drops to 0, the object is no longer in use; it is considered garbage and its memory is released.

Disadvantages of reference counting:
1: Space utilization is relatively low!! Every new object has to be paired with a counter (assume the counter takes 4 bytes). If the object itself is large (hundreds of bytes), 4 more bytes are nothing, but if the object itself is small (say, only 4 bytes), 4 more bytes mean that half of the space is wasted.
2: There is the circular reference problem: if two objects reference each other, their counts never drop to 0 even after nothing else refers to them, so they can never be released.
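The circular reference problem can be sketched with a toy model. Java itself does not use reference counting, so the counter here is maintained by hand; RefCountCycleDemo, Obj and setField are hypothetical names used only for this illustration.

```java
public class RefCountCycleDemo {
    // Toy object with a manually maintained reference counter.
    static class Obj {
        int refCount;
        Obj field;
    }

    // Points holder.field at target and adjusts the counters accordingly.
    static void setField(Obj holder, Obj target) {
        if (holder.field != null) {
            holder.field.refCount--;
        }
        holder.field = target;
        if (target != null) {
            target.refCount++;
        }
    }

    // Returns the two counters after the only outside references are dropped.
    public static int[] run() {
        Obj a = new Obj();
        a.refCount = 1; // referenced by the local variable a
        Obj b = new Obj();
        b.refCount = 1; // referenced by the local variable b

        setField(a, b); // a.field -> b, b's count becomes 2
        setField(b, a); // b.field -> a, a's count becomes 2

        // Simulate both local variables going out of scope.
        a.refCount--;
        b.refCount--;

        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("a = " + counts[0] + ", b = " + counts[1]);
    }
}
```

Both counters stay at 1 although nothing outside the pair refers to either object, so a pure reference-counting collector would never release them.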

② Based on reachability analysis

Extra threads periodically scan the objects in the whole memory space. There are some starting positions (called GCRoots), and from them every object that can be reached is marked, much like a depth-first traversal. A marked object is reachable; an object that is not marked is unreachable, that is, garbage...

GCRoots: local variables on the stack, objects pointed to by references in the constant pool, objects pointed to by static members in the method area...

Advantages: it overcomes the two shortcomings of reference counting: low space utilization and the circular reference problem
Disadvantages: high system overhead. If there are a lot of objects in memory, a single traversal can be slow and costs time and system resources

In short, the core of finding garbage is confirming whether an object will still be used in the future. And what counts as not used? If no reference points to it, it will not be used.
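The marking step can be sketched as a depth-first traversal over a toy object graph. This is an illustration under simplifying assumptions, not how the JVM implements it: ReachabilityDemo, Obj and mark are invented names, and the roots are passed in as a plain list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ReachabilityDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Mark phase: depth-first traversal from the roots; returns the names of all marked objects.
    public static Set<String> mark(List<Obj> roots) {
        Set<Obj> marked = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj current = stack.pop();
            if (marked.add(current)) { // first visit: mark it and follow its references
                for (Obj ref : current.refs) {
                    stack.push(ref);
                }
            }
        }
        Set<String> names = new TreeSet<>();
        for (Obj o : marked) {
            names.add(o.name);
        }
        return names;
    }

    public static Set<String> demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b); // root -> a -> b
        c.refs.add(d); // c and d only reference each other;
        d.refs.add(c); // no root reaches them
        return mark(Arrays.asList(a));
    }

    public static void main(String[] args) {
        System.out.println("Reachable: " + demo());
    }
}
```

Only a and b end up marked. The pair c and d is unmarked and therefore garbage, which is exactly the cycle that reference counting could not handle.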

3.4.2 Garbage Collection Algorithm

① Mark-Sweep Algorithm

Marking is the process of reachability analysis, and sweeping means releasing the memory directly.
If the memory is released directly at this point, it does go back to the system, but the released memory is scattered rather than contiguous, and the problem this brings is "memory fragmentation".

There may be a lot of free memory overall. Suppose the total free memory is 1G: a request for 500M can still fail, because every request has to be satisfied with one contiguous block, and the 1G of free memory here may be nothing but "memory fragments" that merely add up to 1G.
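The gap between total free memory and the largest contiguous block can be made concrete with a toy heap of equal-sized slots. All names here (FragmentationDemo, afterSweep and so on) are hypothetical; a slot is true when it holds a live object and false when it has been swept.

```java
public class FragmentationDemo {
    // Total number of free slots, wherever they are.
    public static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean slotUsed : used) {
            if (!slotUsed) {
                free++;
            }
        }
        return free;
    }

    // Longest run of adjacent free slots, i.e. the largest request that can succeed.
    public static int largestContiguousFree(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean slotUsed : used) {
            run = slotUsed ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // A heap in which every second object turned out to be garbage and was swept.
    public static boolean[] afterSweep() {
        return new boolean[] { true, false, true, false, true, false, true, false };
    }

    public static void main(String[] args) {
        boolean[] heap = afterSweep();
        System.out.println("free slots: " + totalFree(heap)
                + ", largest contiguous block: " + largestContiguousFree(heap));
    }
}
```

Half of this toy heap is free, yet a request for two adjacent slots would fail.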

② Copy algorithm

To solve the "memory fragmentation" problem, the copying algorithm was introduced. Simply put, it is "use half, leave half idle".

The non-garbage is copied directly to the other half, and the original half is released as a whole!!

Advantages: it solves the "memory fragmentation" problem
Disadvantages: 1. Low memory space utilization 2. If many objects have to be kept and few are released, the copying overhead is very large.
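A toy version of the copying step, under heavy simplification: each half of the heap is an array, liveness is given as a boolean array instead of being computed by reachability analysis, and the names CopyingDemo and collect are made up for the example.

```java
import java.util.Arrays;

public class CopyingDemo {
    // Copies the live objects from from-space to the start of to-space,
    // then releases from-space as a whole. Returns the filled to-space.
    public static String[] collect(String[] fromSpace, boolean[] live) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0;
        for (int i = 0; i < fromSpace.length; i++) {
            if (live[i]) {
                toSpace[next++] = fromSpace[i]; // survivors end up packed together
            }
        }
        Arrays.fill(fromSpace, null); // the whole original half is now free
        return toSpace;
    }

    public static void main(String[] args) {
        String[] from = { "a", "b", "c", "d" };
        boolean[] live = { true, false, true, false };
        System.out.println(Arrays.toString(collect(from, live)));
    }
}
```

The survivors sit next to each other in to-space, so the free space behind them is one contiguous block, at the price of keeping half of the memory idle.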

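③ Mark-compact algorithm

After marking, the surviving objects are moved toward one end of the memory so that they sit next to each other, and everything beyond them is released.

Advantages: high space utilization
Disadvantages: the high cost of copying/moving elements is still not solved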
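Compaction can be sketched in the same toy style as a single array whose live entries slide toward index 0. CompactDemo and compact are invented names, and liveness is again supplied by the caller rather than computed.

```java
import java.util.Arrays;

public class CompactDemo {
    // Slides the live objects toward the start of the same array and clears the rest.
    // Returns the number of live objects, which is also the index where free space begins.
    public static int compact(String[] heap, boolean[] live) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) {
                heap[next] = heap[i]; // move the survivor down (a no-op when next == i)
                next++;
            }
        }
        for (int i = next; i < heap.length; i++) {
            heap[i] = null; // everything after the survivors is free
        }
        return next;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e" };
        boolean[] live = { false, true, false, true, true };
        int liveCount = compact(heap, live);
        System.out.println(liveCount + " live: " + Arrays.toString(heap));
    }
}
```

Unlike the copying approach, no second half is needed, but every survivor that is not already in place has to be moved.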

Although each of the schemes above has flaws, the actual implementation in the JVM uses a combination of them.

④ Generational recycling algorithm

Objects are classified (by their "age"): an object that has survived one round of GC scanning is considered "one year older", and different strategies are used for objects of different ages...

Note: you may see a claim on the Internet that 98% of new objects cannot survive one round of GC and 2% of new objects enter the survivor area. This number is actually unreliable. If someone asks, it is best not to quote it; just say that most objects do not survive one round of GC.
