Understand the JVM in One Article: "A Perfect Illustration of the JVM"

foreword

Why jump to the JVM in the middle of working on Android?
Because the JVM is the runtime platform for Java and Kotlin, the languages used for Android development, so of course you need to know it.
But the purpose of this article is not simply to learn the JVM. It is:
1. To pave the way for the upcoming Binder articles (the most important reason)
2. To help you review JVM-related questions (a side benefit)

So this article studies the JVM from the perspective of Android.
PS: It will not go very deep, but it covers the content you will need.
The article works through the JVM as a series of questions.

FAQ

These questions have been roughly ordered, so they read best in sequence.

1. Parent delegation mechanism

The parent delegation mechanism can be summed up in four steps:

  • First, check whether the Class has already been loaded.
  • If not, the loader does not search for it itself. It delegates to its parent loader, and the request is passed up recursively until it reaches the topmost Bootstrap ClassLoader.
  • If the Bootstrap ClassLoader finds the Class, it returns it directly.
  • If not, the search moves back down one loader at a time. Only when no parent has found the Class does the original loader search for it itself.

The role of each loader in parent delegation

  • Bootstrap ClassLoader: loads the core class library of the JDK, such as Object, String, and the other classes in the java.lang package.
  • Extension ClassLoader: loads the JDK's extension class library, that is, the jar files in the lib/ext directory of the JRE.
  • Application ClassLoader: loads the application's classes, including developer-defined classes and the third-party libraries they depend on.
  • Custom ClassLoader: a class loader written by developers that implements its own loading logic as needed, such as loading classes from the network or from a database.

The role of the parent delegation mechanism

  • Avoiding repeated loading: if a Class has already been loaded once, it does not need to be loaded again and is read directly from the cache.
  • Security: without parent delegation, you could write your own String class to replace the system String class, which would be a security risk. With parent delegation, the core classes of the system are loaded when the Java virtual machine starts, so a custom class cannot replace them.
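
The four steps above can be sketched as a small custom loader. This is only an illustration of the delegation order, not the JDK's actual source: `DelegatingLoader` and `loadOrNull` are hypothetical names, and the sketch assumes a non-null parent is always passed in.

```java
// A minimal sketch of the delegation order; DelegatingLoader is a hypothetical name.
class DelegatingLoader extends ClassLoader {
    DelegatingLoader(ClassLoader parent) {
        super(parent); // assumption: parent is never null in this sketch
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Step 1: has this loader already loaded the class?
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Steps 2-3: delegate upward; each parent applies its own delegation.
                    c = getParent().loadClass(name);
                } catch (ClassNotFoundException e) {
                    // Step 4: no parent found it, so this loader searches by itself.
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Hypothetical helper: loads a class through the sketch loader, or returns null. */
    static Class<?> loadOrNull(String name) {
        try {
            return new DelegatingLoader(ClassLoader.getSystemClassLoader()).loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Class<?> stringClass = loadOrNull("java.lang.String");
        // The custom loader did not define String; the request was answered higher up.
        System.out.println(stringClass == String.class);
        // Java reports the Bootstrap ClassLoader as null.
        System.out.println(String.class.getClassLoader());
    }
}
```

Because the request for java.lang.String is answered by a parent, the custom loader never gets the chance to supply its own version of a core class, which is the security property described above.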

2. Class loading process


load

  • Loading: Locate the bytecode file by the fully qualified name of the class and load it. The bytecode usually comes from the local file system, the network, and so on, and the lookup goes through the parent delegation mechanism.

Link

The three phases of verification, preparation, and resolution are collectively called linking.

  • Verification: Check that the bytecode complies with the JVM specification. This includes file format verification, metadata verification, bytecode verification, and symbolic reference verification.
  • Preparation: Allocate memory for the static variables of the class and set their default initial values. The memory is allocated in this phase; the explicit assignments happen later, in the initialization phase.
  • Resolution: Convert the symbolic references in the class into direct references. A symbolic reference is a symbolic name recorded at compile time, while a direct reference is a pointer or offset that points directly to a method or variable in memory.

initialization

  • Initialization: Initialize the static variables and static code blocks of the class, that is, perform the variable assignments and execute the static blocks. Initialization is the last phase of class loading, and it is where the class's initialization method is executed.

Use and unloading

  • Use: Use the initialized class, for example by calling its static methods or instantiating objects of the class.
  • Unloading: When a class is no longer used, the JVM unloads it to release the related memory and resources.
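
The difference between preparation (default values) and initialization (explicit assignments) can be observed directly. The sketch below is an illustration only; `ClassInitDemo` and its members are hypothetical names. It relies on the fact that static initializers run in textual order during initialization.

```java
// Illustrative only: ClassInitDemo and its members are hypothetical names.
class ClassInitDemo {
    // Static initializers run in textual order. When this line runs,
    // `value` has not been assigned yet and still holds its default of 0,
    // which was set in the preparation phase.
    static int seenEarly = readValue();

    // The explicit assignment happens in the initialization phase.
    static int value = 42;

    static int readValue() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println("seenEarly = " + seenEarly); // read before the assignment ran
        System.out.println("value     = " + value);     // read after initialization
    }
}
```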

3. JVM memory structure diagram

The method area (metaspace) and the heap are shared by all threads. The stack, the native method stack, and the program counter are private to each thread.
[Figure: JVM memory structure diagram, with thread-shared and thread-private areas color-coded]

method area

The method area mainly holds static information such as class metadata, the constant pool, method information, and class variables. In the following code, the HelloWorld class is class metadata, and hello and main are method information; all of it is stored in the method area.

public class HelloWorld {
    public static void main(String[] args) {
    }

    public void hello(String who) {
    }
}

Up to JDK1.7 the method area was implemented as the permanent generation; from JDK1.8 on it is implemented as the metaspace. To relieve the management pressure on the method area, the string constant pool was handed over to the heap (this move dates from JDK1.7).
You can think of it this way: the method area is an interface. Up to 1.7 the permanent generation is its implementation; from 1.8 on the metaspace is its implementation.

heap

The heap mainly stores object instances, and it is the main area reclaimed by GC (collection algorithms and the composition of objects are covered in later questions).
Put simply, whenever you see an object created with the new keyword, its data is placed on the heap. For example:

HelloWorld helloWorld = new HelloWorld();

Here helloWorld is a reference to the HelloWorld object. It points to the new HelloWorld instance, and the reference itself is placed on the stack.
The object created by new HelloWorld() is placed on the heap. (See the composition of the object below to understand its size.)
Once no reference to the object remains, the object in the heap can be reclaimed.
The heap is divided into two areas, the new generation and the old generation, with a default size ratio of 1:2.

  • New generation: the new generation is divided into three parts, eden, from, and to, with a default ratio of 8:1:1.
    • 1. Each new object is stored in eden first. When eden is full, a GC is triggered to reclaim the area (using the copy algorithm).
    • 2. Objects that survive are put into from or to. One of from and to is always empty, which makes it easy to copy and tidy the survivors. On every GC, the survivors from eden and from the occupied survivor space are moved together into the empty one.
    • 3. In this way only one survivor space (1 part in 10 of the new generation) is left unused. This avoids the fragmentation of the mark-sweep method and avoids wasting half of the space as the plain copy method does.
    • 4. Every time an object moves between from and to, its age increases by 1 (the age is stored in the object header). When the age reaches 15, the object enters the old generation.
  • Old generation: when the old generation is full, a Full GC is triggered. If the Full GC cannot free enough space, the program fails with an error like java.lang.OutOfMemoryError: Java heap space. The heap size can be configured with JVM parameters; for example, -Xmx32m sets the maximum heap memory to 32M.

Specifically, the default maximum heap size in JDK1.8 is 1/4 of physical memory, and the ratio of the new generation to the old generation is 1:2. In other words, the new generation occupies 1/3 of the heap memory and the old generation occupies 2/3.
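
The ratios above turn into concrete sizes with simple arithmetic. The sketch below is just that arithmetic; `HeapLayout` is a hypothetical helper, not a JVM API, and a real JVM rounds region sizes to its own alignment, so actual numbers may differ slightly.

```java
// Arithmetic sketch only; HeapLayout is a hypothetical helper, not a JVM API.
class HeapLayout {
    final long young;
    final long old;
    final long eden;
    final long from;
    final long to;

    HeapLayout(long heapBytes) {
        // new generation : old generation = 1 : 2
        young = heapBytes / 3;
        old = heapBytes - young;
        // eden : from : to = 8 : 1 : 1
        from = young / 10;
        to = young / 10;
        eden = young - from - to;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        HeapLayout layout = new HeapLayout(300 * mb);
        System.out.println("young=" + layout.young / mb + "M old=" + layout.old / mb + "M");
        System.out.println("eden=" + layout.eden / mb + "M from=" + layout.from / mb
                + "M to=" + layout.to / mb + "M");
        // The heap limit the running JVM actually uses (controlled by -Xmx):
        System.out.println("max heap=" + Runtime.getRuntime().maxMemory() / mb + "M");
    }
}
```

For a 300M heap this gives 100M for the new generation (80M eden, 10M from, 10M to) and 200M for the old generation.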

the stack

  • The stack is small compared with the heap, and it is private to each thread. It is made up of stack frames and is first in, last out.
  • Each stack frame corresponds to one method call and holds the local variables, the method parameters, the method exit, a reference to the constant pool, and the exception table.
  • The exception table tells the JVM which line of code to jump to when an error is thrown during execution.
  • A thread can have many stack frames. They are pushed one on top of another, first in, last out.
  • In the code below, method A calls method B, method B calls method C, and method C calls method D.
  • On the main thread the frames are pushed in the order A, B, C, D and popped in the order D -> C -> B -> A, after which the program ends.
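
public class HelloWorld {
    public static void main(String[] args) {
        HelloWorld helloWorld = new HelloWorld();
        helloWorld.A(); // frames are pushed on top of main: A, then B, C, D
    }

    public void A() {
        B();
    }

    public void B() {
        C();
    }

    public void C() {
        D();
    }

    public void D() {
        // deepest frame; it is the first one to be popped
    }
}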


  • java.lang.StackOverflowError: stack overflow. The call chain on the stack is too deep. Following the example above: if frames keep being pushed and are never popped, the stack eventually runs out of space.
  • The most common cause by far is unbounded recursion, where a method keeps calling itself (directly or through other methods) and frames are pushed without end.
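
A small experiment makes the overflow visible. The sketch below is an illustration only (`StackDepthDemo` is a hypothetical name): it recurses without an exit condition and counts how many frames fit before the error. The count depends on the JVM and on the thread stack size, which can be changed with -Xss, so no particular number should be expected.

```java
// Illustrative only: StackDepthDemo is a hypothetical name.
class StackDepthDemo {
    private static int depth;

    private static void recurse() {
        depth++;   // one more frame on this thread's stack
        recurse(); // no exit condition: frames are pushed and never popped
    }

    /** Returns how many frames were pushed before the stack overflowed. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time we get here the stack has unwound and the frames are popped.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measureDepth());
    }
}
```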

native method stack

It has the same structure as the stack but is a separate area. The only difference is that it serves native methods.

program counter

When a program runs multiple threads, the CPU allocates time slices and the threads take turns executing.
This raises a problem. Suppose thread 1 runs and then enters the waiting state, and thread 0 starts to run. When thread 0 waits in turn and thread 1 wants to resume, where did thread 1 stop last time? Which line of code was it on, and had that line finished executing?
That information is stored in the program counter so the thread can resume from the right place next time. This is why each thread has its own program counter.

4. The composition of the object

Let's create an object and explain directly from the code.

public class data {
    public int a;
    public int b;
    public byte c;
}

So how many bytes does this object occupy in memory?
The answer is 24 bytes on a 32-bit system, and 32 bytes on a 64-bit system, where the Mark Word takes 8 more bytes and the header grows to 16 bytes (assuming uncompressed pointers). Let's take this answer apart to fully understand the composition of an object. (Android devices today are all 64-bit systems.)

object header

  • Mark Word: (32-bit system)
    • Hash code: 25 bits (supports fast lookup and comparison in data structures such as hash tables; the hash code is computed by a hash algorithm)
    • GC generational age: 4 bits (supports generational garbage collection; this is the age that must reach 15 before the object enters the old generation, as mentioned above)
    • Lock state flag: 2 bits (records the lock state of the object: lock-free, biased lock, lightweight lock, heavyweight lock, and so on. By checking this flag the virtual machine implements concurrency control and thread synchronization on the object)
    • Biased lock flag: 1 bit (identifies whether the lock is a biased lock)
  • Type pointer (Klass Pointer): 32 bits (points to the class metadata of the object)
  • Array length: 0 bits (only array objects store this part; it is normally 32 bits, but this object is not an array, so it is 0)

In total the object header above is 8 bytes. (PS: for reference only, because a virtual machine may merge or optimize these bits differently.)

instance data

This part is simple: an int is 4 bytes and a byte is 1 byte, so 4*2+1 = 9 bytes.

alignment padding

Padding bytes are used for memory alignment, so that objects are stored on alignment boundaries. The boundary is generally 8 bytes. Header plus instance data is 17 bytes, and the next multiple of 8 is 24, so this object needs 7 bytes of padding.

To sum up, it is 24 bytes in total: 8 bytes of object header + 9 bytes of instance data + 7 bytes of alignment padding = 24 bytes.
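
The same calculation as a small function, as a sketch only: `ObjectSizeSketch` is a hypothetical helper that mirrors the arithmetic in this section, and it assumes the header sizes used above (8 bytes on 32-bit, 16 bytes on 64-bit). A real virtual machine may lay objects out differently.

```java
// Arithmetic sketch only; ObjectSizeSketch is a hypothetical helper.
class ObjectSizeSketch {
    static final int ALIGNMENT = 8;

    /** Rounds header + instance data up to the next multiple of 8 bytes. */
    static int objectSize(int headerBytes, int instanceDataBytes) {
        int raw = headerBytes + instanceDataBytes;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        // int a, int b, byte c
        int instanceData = 2 * Integer.BYTES + Byte.BYTES;
        System.out.println(objectSize(8, instanceData));  // 8-byte header (32-bit case)
        System.out.println(objectSize(16, instanceData)); // 16-byte header (64-bit case)
    }
}
```

With an 8-byte header the raw size is 17 bytes and is padded to 24; with a 16-byte header the raw size is 25 bytes and is padded to 32.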

5. How the JVM determines whether an object can be garbage collected

There are two main algorithms for making this judgment.

Reference counting (early strategy)

Every object in the Java heap (the object itself, not a reference to it) has a reference counter. When an object is created and first assigned to a variable, the count is set to 1. Each time it is referenced somewhere else, the count is incremented by 1. When a reference expires, that is, when it goes out of scope or is set to a new value, the count is decremented by 1. Any object with a reference count of 0 can be garbage collected, and when an object is collected, the count of every object it references is decremented by 1.

Advantage:

  • A reference counting collector is simple to implement and judges quickly. Its work is interleaved with the running program, which suits real-time environments where the program must not be interrupted for long.

Shortcoming:

  • Circular references cannot be detected. If a parent object references a child object and the child in turn references the parent, their reference counts can never reach 0.
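
The circular reference problem is easy to reproduce in a toy model. The sketch below is not how any JVM is implemented; `RefCounted` and its methods are hypothetical names that model the counting rules described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of reference counting; RefCounted is a hypothetical class.
class RefCounted {
    int count; // number of references currently pointing at this object
    final List<RefCounted> fields = new ArrayList<>();

    void retain() {
        count++;
    }

    void release() {
        count--;
        if (count == 0) {
            // "Collected": every object it references loses one count as well.
            for (RefCounted target : fields) {
                target.release();
            }
            fields.clear();
        }
    }

    void pointTo(RefCounted other) {
        fields.add(other);
        other.retain();
    }

    /** Builds a parent/child cycle, drops the outside references, returns both counts. */
    static int[] cycleDemo() {
        RefCounted parent = new RefCounted();
        RefCounted child = new RefCounted();
        parent.retain();       // outside reference to parent
        child.retain();        // outside reference to child
        parent.pointTo(child); // child.count is now 2
        child.pointTo(parent); // parent.count is now 2
        parent.release();      // the outside references go away...
        child.release();
        return new int[] { parent.count, child.count }; // ...but neither count reaches 0
    }

    public static void main(String[] args) {
        int[] counts = cycleDemo();
        System.out.println("parent=" + counts[0] + " child=" + counts[1]);
    }
}
```

Both counts stay at 1 even though nothing outside the pair can reach either object, so a pure reference counting collector would never reclaim them.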

Reachability analysis algorithm (mainstream solution)

The reachability analysis algorithm, also called the root search algorithm, starts from a set of objects called GC Roots and searches downward from them. The path the search travels is called a reference chain. When no reference chain connects an object to any GC Root, the object is unreachable, which means it is no longer usable.
In the figure for this section, Object5, Object6, and Object7 reference one another, but none of them is reachable from the GC Roots, so they are all judged to be reclaimable.
[Figure: reachability analysis from GC Roots; Object5, Object6, and Object7 are unreachable]
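
The search itself is a plain graph traversal. The sketch below is a toy model with hypothetical names (`ReachabilitySketch`, `sampleGraph`); the edges among Object1 to Object4 are assumed for illustration, and only the isolated cycle of Object5, Object6, and Object7 comes from the description above.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of reachability analysis; all names here are hypothetical.
class ReachabilitySketch {
    /** Returns every object reachable from the roots by following references. */
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (seen.add(obj)) {
                pending.addAll(refs.getOrDefault(obj, Collections.<String>emptyList()));
            }
        }
        return seen;
    }

    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        // Assumed edges: what the GC Root can reach.
        refs.put("root", Arrays.asList("Object1", "Object2"));
        refs.put("Object1", Arrays.asList("Object3"));
        refs.put("Object2", Arrays.asList("Object4"));
        // These three reference one another, but nothing reachable points at them.
        refs.put("Object5", Arrays.asList("Object6"));
        refs.put("Object6", Arrays.asList("Object7"));
        refs.put("Object7", Arrays.asList("Object5"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleGraph(), Arrays.asList("root"));
        System.out.println("Object4 reachable: " + live.contains("Object4"));
        System.out.println("Object5 reachable: " + live.contains("Object5"));
    }
}
```

Unlike reference counting, the cycle does not matter here: an object survives only if some path from a root leads to it.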

6. What are the GC Roots in Android

1. Static variables (Static Roots): the static variables of a class are stored in the method area. Once a class is loaded, the objects referenced by its static variables are regarded as GC Roots.
2. JNI references (JNI Roots): object references created through JNI (Java Native Interface).
3. System classes (System Class Roots): classes loaded by the system, including those for the basic data types (such as int and double) and other important system classes, are regarded as GC Roots.
4. ThreadLocal: the ThreadLocal variables of each thread act as GC Roots (in practice because they are declared static).
5. Finalizer references: objects waiting in the finalizer queue to have finalize() executed act as GC Roots. The garbage collector reclaims them after finalize() has run.

7. Classification of object references

Strong Reference

Strong references are the ones found everywhere in code, such as Object obj = new Object(). As long as the strong reference is still there, the garbage collector will never reclaim the referenced object.

Soft Reference

Soft references describe objects that are useful but not required. They are created with the SoftReference class. Before the system throws an out-of-memory error, these objects are included in the scope of a second collection. Only if that collection still does not free enough memory is the out-of-memory error thrown.

Weak Reference

Weak references also describe non-essential objects, but they are weaker than soft references. An object associated only with weak references survives only until the next garbage collection. The JDK provides the WeakReference class to implement weak references. When the collector runs, objects associated only with weak references are reclaimed regardless of whether the current memory is sufficient. (Weak references are often used to solve memory leak problems.)

Phantom Reference

Phantom references, also known as ghost references, are the weakest type of reference relationship. The JDK provides the PhantomReference class to implement them. The only purpose of setting a phantom reference on an object is to receive a system notification when the object is reclaimed by the garbage collector.
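
The three reference classes live in java.lang.ref. The sketch below (`ReferenceKinds` and `observe` are hypothetical names) only shows the parts that behave deterministically: what get() returns while a strong reference still exists. When exactly a soft or weak reference is cleared depends on the collector, so that is not demonstrated here.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Illustrative only: ReferenceKinds and observe are hypothetical names.
class ReferenceKinds {
    /** Returns { soft sees object, weak sees object, phantom hides object }. */
    static boolean[] observe() {
        Object strong = new Object(); // strong reference keeps the object alive
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        // While the strong reference exists, soft and weak references still return the object.
        boolean softSees = soft.get() == strong;
        boolean weakSees = weak.get() == strong;
        // A phantom reference never returns the object; it only reports through the queue.
        boolean phantomHides = phantom.get() == null;
        return new boolean[] { softSees, weakSees, phantomHides };
    }

    public static void main(String[] args) {
        boolean[] result = observe();
        System.out.println("soft.get() returns the object:  " + result[0]);
        System.out.println("weak.get() returns the object:  " + result[1]);
        System.out.println("phantom.get() returns null:     " + result[2]);
    }
}
```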

8. finalize() and the second mark

For the garbage collector to actually reclaim an object, the object must go through at least two marking passes (so it gets one chance of probation).

  • First mark: reachability analysis checks whether the object is reachable from GC Roots. Objects found to be unreachable are marked for the first time and then go through a screening.
  • Second mark: the screening condition is whether it is necessary to execute the finalize() method for this object. If the object does not re-establish a link to the reference chain inside finalize(), it is marked for the second time.

An object that is marked the second time is really reclaimed. If the object re-establishes a link to the reference chain in finalize(), it escapes this collection and continues to survive.

9. JVM recycling algorithm

mark-sweep algorithm

The mark-sweep algorithm scans from the root set (GC Roots) and marks the surviving objects. After marking, it scans the whole space and reclaims the unmarked objects.
Mark-sweep does not move objects; it only processes the dead ones, so it is very efficient when many objects survive. However, because dead objects are reclaimed in place, it causes memory fragmentation.

  • Advantages:
    Simple to implement; no need to move objects.
  • Disadvantages:
    Marking and sweeping become inefficient when many objects have to be reclaimed, and the process leaves a large number of discontinuous memory fragments, which increases the frequency of garbage collection.

copy algorithm

The copy algorithm was proposed to overcome the overhead of handles and to solve the problem of memory fragmentation.
Simply put, the memory area is divided into two equal blocks. Only one half is used at a time, and new objects are placed in that half. When it is used up, a GC is performed: the reachable objects are copied to the other half, and the used half is cleaned up in one go.

  • Advantages:
    Memory is allocated sequentially, the implementation is simple, it runs efficiently, and memory fragmentation is not a concern.
  • Disadvantages:
    The usable memory is reduced to half of the original, and objects are copied frequently when their survival rate is high.

Mark-Compact Algorithm

The mark-compact algorithm marks objects in the same way as mark-sweep. However, instead of cleaning up the reclaimable objects directly, it moves all surviving objects toward one end of the space and then cleans up the memory beyond the end boundary.

  • Advantages:
    Solves the memory fragmentation problem of the mark-sweep algorithm.
  • Disadvantages:
    Objects still have to be moved, which reduces efficiency to some extent.
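
The difference between sweeping and compacting shows up clearly on a toy heap. In the sketch below the heap is modelled as an array of slots and the result of the mark phase as a set of live names; `SweepVsCompact` and its methods are hypothetical and greatly simplified compared with a real collector.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model: the heap is an array of slots, null means free. Names are hypothetical.
class SweepVsCompact {
    /** Mark-sweep: clear dead slots in place; live objects do not move. */
    static String[] sweep(String[] heap, Set<String> live) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !live.contains(result[i])) {
                result[i] = null; // freed slot, possibly between two live objects
            }
        }
        return result;
    }

    /** Mark-compact: slide live objects toward one end, then free everything past them. */
    static String[] compact(String[] heap, Set<String> live) {
        String[] result = heap.clone();
        int next = 0; // next slot at the compacted end
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && live.contains(result[i])) {
                result[next++] = result[i];
            }
        }
        Arrays.fill(result, next, result.length, null); // memory beyond the boundary
        return result;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        Set<String> live = new HashSet<>(Arrays.asList("A", "C", "F")); // result of marking
        System.out.println(Arrays.toString(sweep(heap, live)));
        System.out.println(Arrays.toString(compact(heap, live)));
    }
}
```

After sweeping, the free slots are scattered between A, C, and F. After compacting, the three survivors sit together and the free space is one contiguous block, at the cost of moving C and F.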

Generational Collection Algorithm

The generational collection algorithm is the one used by most JVM garbage collectors today. Its core idea is to divide memory into several areas according to the life cycle of objects. Generally, the heap is divided into the old generation (Tenured Generation) and the new generation (Young Generation). Outside the heap there is one more generation, the permanent generation (Permanent Generation).
In the old generation only a small number of objects need to be reclaimed on each collection, while in the new generation a large number of objects need to be reclaimed on each collection, so each generation can use the collection algorithm that suits its characteristics best.
This is the same layout already described under heap in the JVM memory structure section above. In short: the heap is divided into the new generation and the old generation at a default ratio of 1:2. The new generation is divided into eden, from, and to at a default ratio of 8:1:1 and is collected with the copy algorithm. Objects whose age reaches 15 enter the old generation, and when the old generation is full a Full GC is triggered.

10. When is GC triggered

Because objects are handled by generation, the area and timing of garbage collection differ as well. There are two types of GC: Scavenge GC and Full GC.

Scavenge GC

  • Normally, a Scavenge GC is triggered when a new object is created and eden cannot supply the space for it.
    This kind of GC works on the eden area of the young generation and does not affect the old generation. Because most objects start out in eden and eden is not made very large, GC in eden runs frequently.
    A fast and efficient algorithm is therefore needed here, so that eden is freed up as soon as possible.

Full GC

  • The old generation (Tenured) is full
  • System.gc() is called explicitly

Full GC is slower than Scavenge GC, so the number of Full GCs should be kept as low as possible. A large part of JVM tuning is the tuning of Full GC.

11. What is STW (the cause of memory jitter)

STW (Stop-The-World): the process of suspending all application threads so that garbage collection can run.
Frequent STW pauses are the usual reason memory jitter hurts an application.
Memory jitter: the garbage collector performs memory recovery frequently within a short period but cannot free enough memory, so the application's performance drops sharply and memory is used inefficiently.

12. Three-color marking

Three-color marking is an algorithm the JVM uses for garbage collection. It divides all objects into three colors: black, gray, and white.

  • Black: the object has been marked, and all the objects it references have been marked too.
  • Gray: the object has been marked, but its references have not yet been traversed, so further marking is needed.
  • White: the object has not been marked.

During marking, the root objects are first marked gray. The collector then takes each gray object, marks the white objects it references gray, and turns the object itself black once all its references have been traversed. This repeats until no gray objects remain.
Finally, the sweep phase reclaims the white objects and resets the black objects to white, ready for a new collection cycle.
By splitting marking into multiple steps, three-color marking lets much of the work proceed without stopping the application for the whole collection, which improves performance and responsiveness.
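
The marking loop described above can be sketched on a toy object graph. This is a single-threaded illustration with hypothetical names (`TriColorMarking`, `sampleGraph`) and an assumed graph; a real collector runs marking alongside the application and needs extra machinery that is left out here.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, single-threaded model of three-color marking; all names are hypothetical.
class TriColorMarking {
    enum Color { WHITE, GRAY, BLACK }

    static Map<String, Color> mark(Map<String, List<String>> refs, List<String> roots) {
        Map<String, Color> color = new HashMap<>();
        for (String obj : refs.keySet()) {
            color.put(obj, Color.WHITE); // everything starts unmarked
        }
        Deque<String> gray = new ArrayDeque<>();
        for (String root : roots) {
            color.put(root, Color.GRAY); // roots are marked gray first
            gray.add(root);
        }
        while (!gray.isEmpty()) {
            String obj = gray.poll();
            for (String child : refs.getOrDefault(obj, Collections.<String>emptyList())) {
                if (color.getOrDefault(child, Color.WHITE) == Color.WHITE) {
                    color.put(child, Color.GRAY); // discovered, references not yet scanned
                    gray.add(child);
                }
            }
            color.put(obj, Color.BLACK); // all of its references have been traversed
        }
        return color; // whatever is still WHITE is garbage
    }

    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("A", "B"));
        refs.put("A", Arrays.asList("C"));
        refs.put("B", Collections.<String>emptyList());
        refs.put("C", Collections.<String>emptyList());
        refs.put("D", Arrays.asList("E")); // D and E only reference each other
        refs.put("E", Arrays.asList("D"));
        return refs;
    }

    public static void main(String[] args) {
        Map<String, Color> result = mark(sampleGraph(), Arrays.asList("root"));
        System.out.println("C is " + result.get("C"));
        System.out.println("D is " + result.get("D"));
    }
}
```

When the loop ends there are no gray objects left: everything reachable from the root is black, and D and E remain white and would be reclaimed by the sweep phase.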

13. What is JIT?

JIT stands for Just-In-Time compilation.
At runtime it dynamically compiles hot code (the most frequently executed code) so that it can be executed more efficiently.
This reduces the efficiency problems caused by interpreting and executing the code line by line.

Summary

That wraps up the JVM. Now go put it to work!

Origin blog.csdn.net/weixin_45112340/article/details/131712165