JVM - class loading and garbage collection

Table of contents

Foreword

Introduction to JVM

JVM memory area division

JVM class loading mechanism

1. Loading

Parent delegation model

2. Verification

What verification checks

3. Preparation

4. Resolution

5. Initialization

What triggers class loading

JVM garbage collection (GC)

Phase one: finding out what is garbage

1. Reference counting

2. Reachability analysis (the approach Java uses)

Phase two: releasing garbage objects

Three Typical Strategies

JVM implementation ideas


Foreword

There is a lot to learn about the JVM, but most of it is fairly standard, textbook material. Understanding it thoroughly means reading a good deal of JVM source code, which is written in C++. For deeper study, the book "In-depth Understanding of Java Virtual Machine" is a good resource.

This article focuses on common JVM interview questions.

Introduction to JVM

JVM is short for Java Virtual Machine.
A virtual machine is a complete computer system emulated in software. It provides the functions of a full set of hardware and runs in a completely isolated environment.
Common virtual machines include the JVM, VMware, and VirtualBox.
The JVM differs from the other two as follows:

  1. VMware and VirtualBox emulate the instruction set of a physical CPU in software, so they also have to provide the many registers a physical system has;
  2. The JVM emulates the instruction set of Java bytecode in software. It essentially keeps only the PC (program counter) register and drops the others.

In other words, the JVM is a purpose-built computer that does not exist as physical hardware.

JVM memory area division

The JVM is itself a Java process. When it starts, it requests a large block of memory from the operating system for Java code to use.

The JVM then divides this memory into several areas, each with its own purpose.

The core areas are the stack, the heap, and the metadata area (method area).

  • The virtual machine stack is used when executing Java code. It mainly stores local variables and keeps track of the calls between methods.
  • The native method stack is used when executing native methods (methods implemented in native code rather than in Java).
  • The heap stores objects created with new, together with their member variables.
  • The metadata area (method area) stores class information, such as the bytecode instructions of each method.
  • The program counter holds a memory address: the address of the next bytecode instruction to execute. It records how far the current program has progressed.

Note that a JVM has only one heap and one metadata area, and all threads share them.

By contrast, the stacks (native method stack and virtual machine stack) and the program counter exist once per thread: every thread has its own copy.

Java threads correspond one-to-one to operating system threads: every thread created in Java code has a matching thread in the operating system. (This holds for ordinary platform threads. The virtual threads introduced in Java 21 are instead scheduled onto a small pool of OS threads.)
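The following is a rough illustration of the shared/per-thread split. The class and method names are made up for this sketch. Two threads each count with their own local variable, which lives in that thread's own stack. Both threads also update the same AtomicInteger object, which lives on the shared heap.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedHeapDemo {
    // Returns {shared total, thread 1's local count, thread 2's local count}.
    static int[] runTwoThreads() {
        // One object on the heap; both threads hold a reference to it.
        AtomicInteger shared = new AtomicInteger();
        // Also on the heap; each thread writes only its own slot when it finishes.
        int[] localResults = new int[2];

        Thread[] threads = new Thread[2];
        for (int t = 0; t < 2; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                // 'local' lives in this thread's own stack frame; the other thread cannot see it.
                int local = 0;
                for (int i = 0; i < 1000; i++) {
                    local++;
                    shared.incrementAndGet(); // both threads update the same heap object
                }
                localResults[id] = local;
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) {
                th.join(); // join also makes the threads' writes visible to this thread
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] {shared.get(), localResults[0], localResults[1]};
    }

    public static void main(String[] args) {
        int[] r = runTwoThreads();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // 2000 1000 1000
    }
}
```

Each thread's local counter ends at 1000 on its own. The shared object reaches 2000 because both threads see the same heap instance.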

A typical interview question here asks which JVM area a given variable or object lives in.

For example, consider the following code:

void func() {
    Test t1 = new Test(); // t1 (a local variable) is on the stack; the Test object is on the heap
}

In the above code, we instantiate a Test object in a method.

The bytecode instructions of func are stored in the metadata area.

The variable t1 is defined inside the method, so it is a local variable, and local variables are stored on the stack.

The object created by new Test() lives on the heap; t1 only holds a reference to it.

For interview questions about JVM memory areas like this one, all we need to know is what each area of the JVM stores.

JVM class loading mechanism

A class goes through the following life cycle: loading, verification, preparation, resolution, initialization, using, and unloading.

The first five steps make up class loading and generally begin in this fixed order. Resolution is the exception: it may be deferred until after initialization. This article focuses on these five steps.

Concretely, class loading reads a .class file (a compiled class file) into memory and produces the corresponding Class object.

A program can run only once its instructions and data are in memory, and for Java classes this is exactly what class loading does.

The five steps of class loading are as follows:

1. Loading

Loading itself is simple: find the .class file and read its contents.

While the .class file is being located, however, a very important mechanism comes into play: the parent delegation model.

Parent delegation model

In the JVM, classes are loaded by a dedicated set of components called class loaders.

The JVM has three built-in class loaders:

  • Bootstrap ClassLoader loads the classes of the Java standard library.
  • Extension ClassLoader loads extension classes from Sun/Oracle that are not part of the standard library. (Since Java 9 it has been replaced by the Platform ClassLoader.)
  • Application ClassLoader loads the classes written in the project and classes from third-party libraries.
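To see these loaders on a real JVM, the sketch below prints the loader of a standard-library class and walks the parent chain of the loader that loaded its own class. The class and helper names are illustrative. The bootstrap loader is implemented inside the JVM and is reported as null. On Java 9 and later the middle loader prints as the platform class loader rather than the extension class loader, so the exact output depends on the JDK version.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderDemo {
    // Collects the loader of 'c' and all of its parents, stopping at the bootstrap loader (null).
    static List<ClassLoader> loaderChain(Class<?> c) {
        List<ClassLoader> chain = new ArrayList<>();
        for (ClassLoader l = c.getClassLoader(); l != null; l = l.getParent()) {
            chain.add(l);
        }
        return chain;
    }

    public static void main(String[] args) {
        // Standard library classes come from the bootstrap loader, reported as null.
        System.out.println("String loaded by: " + String.class.getClassLoader());
        // Our own class comes from the application loader; its parents follow.
        for (ClassLoader l : loaderChain(LoaderDemo.class)) {
            System.out.println(l);
        }
        System.out.println("(then the bootstrap loader, shown as null)");
    }
}
```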

Loading a specific class works like this:

First, the fully qualified name of the class is given as a string, for example "java.lang.String".

When a class loader receives a request to load a class, it does not try to load the class itself first. Instead, it delegates the request to its parent class loader. Every loader in the hierarchy behaves the same way, so every load request eventually reaches the top-level Bootstrap ClassLoader. The child loader tries to load the class itself only when the parent loader reports that it cannot complete the request (the class is not within its search scope).
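The delegation rule can be modeled with a toy loader hierarchy. This is a simplified simulation, not the real java.lang.ClassLoader code. ToyLoader, buildChain, and the class names in each search scope (such as ext.SomeExtension and com.example.Main) are hypothetical. The model shows one consequence of delegating upward first: even if the application's own scope contains a class named java.lang.String, the bootstrap loader answers the request, so project code cannot replace a core class.

```java
import java.util.Set;

public class DelegationModel {
    // Hypothetical, simplified class loader: a name, an optional parent, and the classes it can find.
    static final class ToyLoader {
        final String name;
        final ToyLoader parent;
        final Set<String> searchScope;

        ToyLoader(String name, ToyLoader parent, Set<String> searchScope) {
            this.name = name;
            this.parent = parent;
            this.searchScope = searchScope;
        }

        // Returns the name of the loader that ends up loading the class, or null if none can.
        String load(String className) {
            // 1. Delegate to the parent first.
            if (parent != null) {
                String result = parent.load(className);
                if (result != null) {
                    return result;
                }
            }
            // 2. Only if the parent chain could not load it, search our own scope.
            return searchScope.contains(className) ? name : null;
        }
    }

    // Builds Bootstrap <- Extension <- Application and returns the Application loader.
    static ToyLoader buildChain() {
        ToyLoader bootstrap = new ToyLoader("Bootstrap", null, Set.of("java.lang.String"));
        ToyLoader extension = new ToyLoader("Extension", bootstrap, Set.of("ext.SomeExtension"));
        // The project (hypothetically) also ships its own java.lang.String.
        return new ToyLoader("Application", extension, Set.of("com.example.Main", "java.lang.String"));
    }

    public static void main(String[] args) {
        ToyLoader app = buildChain();
        System.out.println(app.load("java.lang.String"));  // Bootstrap
        System.out.println(app.load("ext.SomeExtension")); // Extension
        System.out.println(app.load("com.example.Main"));  // Application
        System.out.println(app.load("com.example.Nope"));  // null (a real JVM throws ClassNotFoundException)
    }
}
```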

For details, please refer to the following figure:

2. Verification

Because a .class file has a strictly defined binary format, the main purpose of this stage is to ensure that the byte stream in the class file meets all the constraints of the "Java Virtual Machine Specification".

What verification checks

File format verification

Bytecode verification

Symbolic reference verification...
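File format verification begins with the very first bytes of the file: every valid class file starts with the magic number 0xCAFEBABE. The sketch below performs only that one check. The real verifier does far more, and the class and method names here are illustrative. A JVM rejects a file with the wrong magic number by throwing a ClassFormatError.

```java
import java.io.IOException;
import java.io.InputStream;

public class MagicCheck {
    static final int CLASS_FILE_MAGIC = 0xCAFEBABE;

    // Checks only the first 4 bytes (big-endian) against the class-file magic number.
    static boolean hasValidMagic(byte[] classBytes) {
        if (classBytes.length < 4) {
            return false;
        }
        int magic = ((classBytes[0] & 0xFF) << 24)
                | ((classBytes[1] & 0xFF) << 16)
                | ((classBytes[2] & 0xFF) << 8)
                | (classBytes[3] & 0xFF);
        return magic == CLASS_FILE_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        // Read this program's own compiled class file from the class path.
        try (InputStream in = MagicCheck.class.getResourceAsStream("MagicCheck.class")) {
            if (in == null) {
                System.out.println("class file not found on the class path");
                return;
            }
            System.out.println(hasValidMagic(in.readAllBytes())); // true
        }
    }
}
```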

3. Preparation

The preparation stage formally allocates memory for the class variables defined in the class (static variables, that is, variables declared with static) and sets them to their initial default values.

For example, consider the following code:

public static int value = 123;

During the preparation stage, value is 0, not 123. The assignment value = 123 runs later, in the initialization stage.
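The default value is normally invisible. It can be observed, though, when a static initializer reads a field before that field's own assignment has run. In this sketch (names are illustrative), peek() runs during initialization and still sees the 0 written during preparation.

```java
public class PreparationDemo {
    static class Prep {
        // Static initializers run top to bottom, so peek() executes before 'value = 123' has run.
        static int seenDuringInit = peek();
        static int value = 123;

        static int peek() {
            return value; // still the default 0 assigned during preparation
        }
    }

    public static void main(String[] args) {
        System.out.println(Prep.seenDuringInit); // 0
        System.out.println(Prep.value);          // 123
    }
}
```

One caveat: if value were declared static final int value = 123, it would be a compile-time constant. Its value would be assigned during preparation (and javac would also inline it), so peek() would return 123.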
 

4. Resolution

Resolution is the process in which the Java virtual machine replaces the symbolic references in the constant pool with direct references.

  • Symbolic references: the .class file already contains the constants it refers to (for example, string constants and the names of classes, fields, and methods). However, it only knows their positions relative to one another inside the file, not their actual locations in memory.
  • Direct references: once the targets are actually loaded into memory, they occupy concrete addresses. A reference that points to such an address is a direct reference. For a string constant, this is an ordinary Java reference to the String object.

5. Initialization

In the initialization stage, the JVM actually starts executing the Java code written in the class, handing control over to the application. Initialization executes the class initializer (the <clinit> method), which consists of the static variable assignments and static blocks in the order they appear in the source. It is not the instance constructor. If the class has a parent class, the parent class is initialized first and then the subclass.
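A sketch of this ordering follows (names are illustrative). The log shows both static initializers, parent first, running before any constructor. It also shows that the static initializers run only once, while the constructors run for every new object. This is the difference between class initialization and object construction.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrderDemo {
    // Records the order in which initializers and constructors run.
    static final List<String> log = new ArrayList<>();

    static class Parent {
        static {
            log.add("Parent static init"); // part of Parent's class initializer (<clinit>)
        }

        Parent() {
            log.add("Parent constructor");
        }
    }

    static class Child extends Parent {
        static {
            log.add("Child static init"); // runs only after Parent has been initialized
        }

        Child() {
            log.add("Child constructor"); // the implicit super() call has already run Parent()
        }
    }

    public static void main(String[] args) {
        new Child(); // first use of Child: initialize Parent, then Child, then construct
        new Child(); // classes are initialized only once; only the constructors run again
        System.out.println(log);
        // [Parent static init, Child static init, Parent constructor, Child constructor,
        //  Parent constructor, Child constructor]
    }
}
```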

What triggers class loading

Note: the JVM does not load every class as soon as it starts. The JVM as a whole follows a lazy loading strategy: a class is not loaded until it is needed.

A class is loaded in the following three situations:

  1. An instance of the class is created
  2. A static method or static field of the class is used
  3. A subclass is used, which triggers loading of its parent class
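Loading itself is hard to observe from Java code, but a static block makes initialization easy to see. In the JVM specification, the conditions listed above are strictly the triggers for initialization, which is the step this sketch makes visible. The sketch (names are illustrative) shows three cases. Creating an array of a class does not run its static initializer. Calling Class.forName with initialize set to false does not run it either. Reading a non-constant static field does run it.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyInitDemo {
    static final List<String> log = new ArrayList<>();

    static class Lazy {
        static int counter = 42; // not a compile-time constant, so reading it is an active use

        static {
            log.add("Lazy initialized");
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Creating an array of Lazy does not initialize Lazy.
        Lazy[] array = new Lazy[3];
        System.out.println(array.length + " " + log); // 3 []

        // Loading through reflection with initialize = false does not run the static block either.
        Class.forName(LazyInitDemo.class.getName() + "$Lazy", false, LazyInitDemo.class.getClassLoader());
        System.out.println(log); // []

        // Reading the static field triggers initialization now.
        System.out.println(Lazy.counter + " " + log); // 42 [Lazy initialized]
    }
}
```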

JVM garbage collection (GC)

Garbage collection in Java is a mechanism that releases memory automatically.

Interview question: why is a garbage collection mechanism needed?

While a program runs, it keeps requesting memory from the operating system, and that memory is finite. If memory is allocated continuously and never reclaimed, it eventually runs out. It is like a household that keeps producing garbage and never takes it out.

We described several JVM memory areas above. Which of them does garbage collection reclaim?

Each thread has its own stack and program counter, and they are destroyed together with the thread.

The class information stored in the metadata area is rarely unloaded.

So what GC reclaims is heap space, which, as mentioned above, mainly holds objects created with new.

GC reclaims memory in units of whole objects.

GC works in two main phases:

Phase one: finding out what is garbage

Java uses references to decide whether an object is garbage: if no reference leads to an object any more, the object is considered garbage.

1. Reference counting

Reference counting reserves a little extra space in each object to store an integer: the number of references pointing to it. Java does not actually use this scheme (Python and PHP do).

Test t1 = new Test();

Here one reference points to the object, so its counter is 1.

If the code becomes:

Test t1 = new Test();
Test t2 = t1;

Now two references point to the same object, so its counter is 2. In general, the counter increases whenever a new reference points to the object and decreases whenever such a reference is destroyed.

When the counter drops to 0, no reference points to the object any more, so it is garbage.

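The drawbacks, however, are obvious:

  1. It wastes memory: every object needs extra space for its counter.
  2. It cannot handle circular references: two objects that point to each other keep each other's counters above 0 even when nothing else can reach them.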
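To make the circular-reference problem concrete, here is a toy simulation. RcObject and its manually maintained counter are hypothetical, since Java objects carry no such counter. Two objects point at each other, and then the two local variables that referenced them go away. Both counters stay at 1, so a pure reference-counting collector would never free them.

```java
public class RefCountDemo {
    // Hypothetical object that carries its own reference counter.
    static final class RcObject {
        final String name;
        int refCount = 0;
        RcObject field; // one outgoing reference to another object

        RcObject(String name) {
            this.name = name;
        }

        // Re-point this object's field, updating the counters of the old and new targets.
        void setField(RcObject target) {
            if (field != null) {
                field.refCount--;
            }
            field = target;
            if (target != null) {
                target.refCount++;
            }
        }
    }

    static RcObject[] buildCycleAndDropLocals() {
        RcObject a = new RcObject("a");
        a.refCount++; // local variable a points to object a: count 1
        RcObject b = new RcObject("b");
        b.refCount++; // local variable b points to object b: count 1
        a.setField(b); // a -> b: b's count becomes 2
        b.setField(a); // b -> a: a's count becomes 2
        // The locals go out of scope, so each object loses one reference.
        a.refCount--;
        b.refCount--;
        // Returned only so the counts can be inspected; conceptually nothing reaches them now.
        return new RcObject[] {a, b};
    }

    public static void main(String[] args) {
        RcObject[] objs = buildCycleAndDropLocals();
        // Neither count is 0, so a pure reference-counting collector would never free them.
        System.out.println(objs[0].refCount + " " + objs[1].refCount); // 1 1
    }
}
```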

2. Reachability analysis (the approach Java uses)

Reachability analysis treats the reference relationships between objects as a tree. Starting from certain special starting points, it traverses those references. Every object that can be reached is reachable and therefore not garbage; every object that cannot be reached is treated as garbage.

In such a tree, every node can be reached by following references from the root.

The key point of reachability analysis is that the traversal needs starting points (commonly called GC Roots).

Starting points include:

  1. Local variables on the stack (every local variable in every thread's stack is a starting point)
  2. Objects referenced from the constant pool
  3. Objects referenced by static members in the method area

In short, reachability analysis starts from all of the starting points and follows the references stored in each object to the objects they point to. It keeps going until every reachable object has been visited, marking each one as "reachable" along the way.

Reachability analysis overcomes both shortcomings of reference counting: it needs no per-object reference counter, and it still detects unreachable cycles as garbage.
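The marking traversal is essentially a graph search starting from the roots. In the sketch below, the object names and the Map-based graph are illustrative stand-ins for real heap objects. It marks everything reachable from the root A. X and Y reference each other, which would defeat reference counting, but no root reaches them, so they are classified as garbage.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReachabilityDemo {
    // Marks every object reachable from the roots by following references (iterative DFS).
    static Set<String> markReachable(Map<String, List<String>> graph, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (!marked.add(obj)) {
                continue; // already marked; this is also what stops cycles from looping forever
            }
            for (String ref : graph.getOrDefault(obj, List.of())) {
                work.push(ref);
            }
        }
        return marked;
    }

    // object name -> names of the objects it references
    static Map<String, List<String>> sampleGraph() {
        return Map.of(
                "A", List.of("B"),
                "B", List.of("C"),
                "C", List.of(),
                "X", List.of("Y"),
                "Y", List.of("X")); // X and Y reference each other, but no root reaches them
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = sampleGraph();
        Set<String> live = markReachable(graph, List.of("A")); // "A" plays the role of a GC root
        Set<String> garbage = new HashSet<>(graph.keySet());
        garbage.removeAll(live);
        System.out.println("live: " + live);       // A, B, C (HashSet order may vary)
        System.out.println("garbage: " + garbage); // X, Y
    }
}
```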

It does, however, have problems of its own:

  • It takes more time. Even after an object becomes garbage, it is not found immediately, because the scan itself takes time.
  • The traversal follows references step by step. If the program changes the reference relationships between objects while the traversal is in progress, the result can be wrong (for example, a live object could be treated as garbage).

Therefore, to carry out the traversal correctly, the other application threads have to be paused. This is called STW (stop the world).

Java has been developed for many years, though, and its garbage collectors have been optimized continuously, so the STW problem is now handled much better.

Phase two: releasing garbage objects

Three Typical Strategies

1: Mark-sweep

Suppose memory has been allocated as in the following figure, and the marked blocks are garbage objects that need to be cleared.

This strategy simply releases the memory of the garbage objects where they are.

This simple, crude approach, however, produces memory fragmentation.

Memory fragmentation: an allocation needs one contiguous block of space, but after mark-sweep the free space is scattered across many separate small blocks. The total free space may exceed 1 GB, yet a request for 500 MB can still fail because no single free block is that large.
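A toy model of the sweep step makes the fragmentation visible. The heap is an array of cells, and each letter is one object spanning the cells that carry it. This layout and the method names are illustrative, not how a real heap is organized. After B and D are swept, 4 cells are free in total, but the largest contiguous free block is only 2 cells.

```java
import java.util.Set;

public class MarkSweepDemo {
    // Sweep: free every cell owned by a garbage object; live objects stay exactly where they are.
    static void sweep(String[] heap, Set<String> garbage) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && garbage.contains(heap[i])) {
                heap[i] = null;
            }
        }
    }

    static int totalFree(String[] heap) {
        int free = 0;
        for (String cell : heap) {
            if (cell == null) {
                free++;
            }
        }
        return free;
    }

    // Length of the longest run of consecutive free cells: the largest request that can succeed.
    static int largestFreeBlock(String[] heap) {
        int best = 0;
        int run = 0;
        for (String cell : heap) {
            run = (cell == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "A", "B", "B", "C", "C", "D", "D", "E", "E"};
        sweep(heap, Set.of("B", "D")); // B and D were marked as garbage
        System.out.println(totalFree(heap));        // 4
        System.out.println(largestFreeBlock(heap)); // 2, so a 3-cell object still cannot be allocated
    }
}
```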

2: Copying algorithm

This approach divides the space into two halves and uses only one half at a time.

The copying algorithm copies every object that is not garbage to the other half, then releases the entire original half at once.

For example, if objects 2 and 4 are garbage, objects 1 and 3 are copied to the other half, and then the whole original half is released.

The copying algorithm solves the memory fragmentation problem, but it has drawbacks of its own:

  • Low memory utilization: only half of the space can be used at any time.
  • If most objects survive and there is little garbage, copying them all is expensive.
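Here is a toy version of the copying step, using the article's example of objects 1 to 4 with 2 and 4 as garbage. The two halves are modeled as two arrays of equal size, a simplification with illustrative names. The model also makes the memory cost visible: the to-space must be as large as the from-space, even though at most the live objects end up in it.

```java
import java.util.Arrays;
import java.util.Set;

public class CopyingDemo {
    // Copy live objects into a fresh to-space, packed at the start, then clear the whole from-space.
    static String[] copyLive(String[] from, Set<String> garbage) {
        String[] to = new String[from.length]; // the other half: same size as the one in use
        int next = 0;
        for (String cell : from) {
            if (cell != null && !garbage.contains(cell)) {
                to[next++] = cell; // survivors end up contiguous, so no fragmentation
            }
        }
        Arrays.fill(from, null); // release the entire old half at once
        return to;
    }

    public static void main(String[] args) {
        String[] from = {"1", "2", "3", "4"};
        String[] to = copyLive(from, Set.of("2", "4"));
        System.out.println(Arrays.toString(to));   // [1, 3, null, null]
        System.out.println(Arrays.toString(from)); // [null, null, null, null]
    }
}
```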

3: Mark-compact

This works much like deleting an element from the middle of a sequential list (an array), where the later elements shift forward: the surviving objects are moved forward to close the gaps.

This solves memory fragmentation, but moving the objects makes the overall cost relatively high.
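The following is a minimal sketch of the compaction step, using the same toy heap of labeled cells as a plain array (names are illustrative). Live cells slide toward the start, so all free space ends up as one contiguous block. Real collectors must also update every reference that pointed to a moved object, which this sketch omits.

```java
import java.util.Arrays;
import java.util.Set;

public class MarkCompactDemo {
    // Slide every live cell toward the start of the same array, then clear the tail.
    static int compact(String[] heap, Set<String> garbage) {
        int dest = 0;
        for (int src = 0; src < heap.length; src++) {
            String cell = heap[src];
            if (cell != null && !garbage.contains(cell)) {
                heap[dest++] = cell; // each move is a copy, which is where the cost comes from
            }
        }
        Arrays.fill(heap, dest, heap.length, null);
        return dest; // index of the first free cell; everything after it is one free block
    }

    public static void main(String[] args) {
        String[] heap = {"A", "A", "B", "B", "C", "C", "D", "D", "E", "E"};
        int freeStart = compact(heap, Set.of("B", "D"));
        System.out.println(Arrays.toString(heap)); // [A, A, C, C, E, E, null, null, null, null]
        System.out.println(freeStart);             // 6
    }
}
```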

JVM implementation ideas

In practice, the JVM combines the ideas above.

Generational collection

Details:

  • Each object is given an age that describes how long it has existed. A newly created object has age 0.
  • Each time a scan (reachability analysis) runs and the object is not found to be garbage, its age increases by 1.
  • Age therefore indicates how long an object has stayed alive.

Rule of thumb: the older an object is, the longer it is likely to keep living.

Different collection strategies are therefore used for objects of different ages.

The JVM divides the heap into areas by age (Eden, the survivor area, and the old generation) and applies a different strategy to each.

1: Newly created objects are placed in the Eden area.

When GC scans the Eden area, most objects are eliminated in the very first round.

2: Objects in the Eden area that survive their first GC are copied to the survivor area using the copying algorithm.

The survivor area is divided into two halves of equal size, and only one half is used at a time.

When GC scans the survivor area, garbage objects are eliminated, and objects that are still alive are copied to the other half using the copying algorithm.

3: Once an object has survived several GCs in the survivor area and is old enough, it is copied to the old generation.

4: Objects in the old generation are old and therefore unlikely to become garbage, so GC scans the old generation much less often.

Special case: a very large object goes directly into the old generation, because copying a large object is expensive (and there are few such objects).
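The following is a highly simplified model of the flow above. A single list stands in for Eden plus the active survivor half, and a predicate decides which objects are still reachable. PROMOTION_AGE is a made-up threshold for this example. In HotSpot the corresponding setting is -XX:MaxTenuringThreshold, and real collectors also track object sizes, survivor capacity, and much more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class GenerationalDemo {
    // Hypothetical promotion threshold chosen for the example.
    static final int PROMOTION_AGE = 3;

    static final class Obj {
        final String name;
        int age = 0; // a newly created object has age 0

        Obj(String name) {
            this.name = name;
        }
    }

    // Stands in for Eden plus the active survivor half.
    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    void allocate(String name) {
        young.add(new Obj(name)); // new objects start in the young generation (Eden)
    }

    void allocateLarge(String name) {
        old.add(new Obj(name)); // very large objects skip the young generation entirely
    }

    // One young-generation GC: drop dead objects, age survivors, promote old-enough ones.
    void minorGc(Predicate<String> isLive) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!isLive.test(o.name)) {
                continue; // garbage: simply not copied, so its space is reclaimed with the area
            }
            o.age++;
            if (o.age >= PROMOTION_AGE) {
                old.add(o); // old enough: copied to the old generation
            } else {
                survivors.add(o); // copied to the other survivor half
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo();
        heap.allocate("temp");
        heap.allocate("cache");
        for (int gc = 1; gc <= PROMOTION_AGE; gc++) {
            heap.minorGc(name -> name.equals("cache")); // only "cache" stays reachable
        }
        System.out.println(heap.young.size());                                // 0
        System.out.println(heap.old.get(0).name + " " + heap.old.get(0).age); // cache 3
    }
}
```

The short-lived "temp" object disappears in the first round, as the empirical rule predicts. The long-lived "cache" object is copied between survivor halves until its age reaches the threshold, and then it moves to the old generation.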


Origin blog.csdn.net/qq_63525426/article/details/131725086