1. Introduction to JVM

Most of the content of this article is based on Zhou Zhiming's "In-depth Understanding of the Java Virtual Machine". For more detail, please read the original book.

1. Runtime data area


program counter

Records the address of the bytecode instruction that the current thread is executing. The value is undefined while a native method is executing.

Java virtual machine stack

Each Java method creates a stack frame when it executes. The frame stores the local variable table, the operand stack, a reference to the runtime constant pool and other information. Each method call, from invocation to completion, corresponds to one stack frame being pushed onto and popped off the Java virtual machine stack.


The stack size of each thread can be set with the -Xss virtual machine option. The default was 256K in JDK 1.4 and has been 1M since JDK 1.5, although the exact default depends on the platform:

java -Xss2M HackTheJava

This area may throw the following errors:

  • If a thread requests a stack depth greater than the maximum allowed, a StackOverflowError is thrown;
  • If the stack can be expanded dynamically but not enough memory can be obtained for the expansion, an OutOfMemoryError is thrown.
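The first case can be observed with a small sketch. The class name StackDepthDemo is only illustrative. It recurses without a base case and catches the resulting StackOverflowError so that it can report how deep it got.

Running it as `java StackDepthDemo` and again as `java -Xss2M StackDepthDemo` should usually report a larger depth for the larger stack. The exact numbers depend on the JVM version, the JIT compiler and the platform. Catching StackOverflowError is done here purely for demonstration.

```java
public class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes one more stack frame; with no base case the stack eventually overflows.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many nested calls succeeded before StackOverflowError was thrown.
    public static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // All recursive frames have been popped by the time control reaches this handler.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Reached depth: " + measureDepth());
    }
}
```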

native method stack

The native method stack is similar to the Java virtual machine stack. The difference is that it serves native methods instead of Java methods.

Native methods are generally written in other languages (C, C++, assembly, etc.) and compiled for the specific hardware and operating system, so they require special treatment.


heap

Almost all objects are allocated here. It is the main area for garbage collection, hence the name "GC heap".

Most modern garbage collectors use generational collection. The main idea is to apply different collection algorithms to objects with different lifetimes. The heap can be divided into two parts:

  • Young Generation
  • Old Generation

The heap does not need to be contiguous in memory and can grow dynamically. If it cannot grow any further, an OutOfMemoryError is thrown.

You can set a program's heap size with two virtual machine options, -Xms and -Xmx. The first sets the initial size and the second sets the maximum size.

java -Xms1M -Xmx2M HackTheJava
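To check what these options did, a program can query its own heap through java.lang.Runtime, as in the sketch below. HeapSizeDemo is a hypothetical name.

Running it with, for example, `java -Xms64M -Xmx256M HeapSizeDemo` should show a maximum close to 256 MB. The committed size should start near 64 MB and grow as needed. The reported maximum can be slightly below the -Xmx value, depending on the collector.

```java
public class HeapSizeDemo {
    // Upper bound on the heap, roughly what -Xmx requested (may be reported as slightly less).
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently committed by the JVM; starts near -Xms and can grow toward -Xmx.
    public static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Part of the committed heap not currently occupied by objects.
    public static long freeHeapBytes() {
        return Runtime.getRuntime().freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        System.out.println("max heap:       " + maxHeapBytes() / mb + " MB");
        System.out.println("committed heap: " + committedHeapBytes() / mb + " MB");
        System.out.println("free in heap:   " + freeHeapBytes() / mb + " MB");
    }
}
```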

method area

It is used to store data such as loaded class information, constants, static variables, and code compiled by the just-in-time compiler.

Like the heap, it does not require contiguous memory and can grow dynamically. If it cannot grow any further, an OutOfMemoryError is thrown.

Garbage collection in this area mainly targets reclaiming the constant pool and unloading classes. This is generally hard to do effectively.

Before JDK 1.8, HotSpot implemented the method area as the permanent generation so that it could be garbage-collected in the same way as the heap. However, the permanent generation is hard to size correctly. Its size depends on many factors and changes after every Full GC, so OutOfMemoryError was frequently thrown.

To make the method area easier to manage, JDK 1.8 removed the permanent generation and moved the method area into the metaspace. The metaspace resides in native memory rather than in memory managed by the virtual machine.

The method area is a concept defined by the JVM specification, and the permanent generation and the metaspace are two implementations of it. Since JDK 1.8, the data that used to live in the permanent generation has been split between the heap and the metaspace. The metaspace stores class metadata, while static variables, the string constant pool and similar data are placed in the heap.
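You can see which memory pools a running JVM actually has, including the metaspace, through the standard java.lang.management API. The class and method names below are hypothetical.

On HotSpot JDK 8 and later, the output typically includes a non-heap pool named "Metaspace" alongside the heap pools. Pool names vary by JVM vendor, version and collector, so treat the printed names as illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

public class MemoryPoolsDemo {
    // Describes every memory pool the running JVM exposes as "<type>: <name>".
    public static List<String> describePools() {
        List<String> result = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            result.add(pool.getType() + ": " + pool.getName());
        }
        return result;
    }

    // True if a pool with exactly this name exists, e.g. "Metaspace" on HotSpot JDK 8 and later.
    public static boolean hasPool(String name) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
        System.out.println("Metaspace pool present: " + hasPool("Metaspace"));
    }
}
```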

runtime constant pool

The runtime constant pool is part of the method area.

After a class is loaded, the constant pool from its Class file is placed in this area. The constant pool holds the literals and symbolic references generated by the compiler.

Besides constants generated at compile time, constants can also be added dynamically at run time, for example through String.intern().
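The effect of String.intern() can be shown with reference comparison. The sketch uses `==` deliberately to compare object identity, not string content. InternDemo is a hypothetical class name.

```java
public class InternDemo {
    // A string literal is a compile-time constant, so every occurrence shares one pooled instance.
    static final String LITERAL = "jvm";

    // new String(...) always creates a separate object on the heap.
    public static boolean sameAsNewString() {
        return LITERAL == new String("jvm");
    }

    // intern() returns the pooled instance equal to this string (adding it to the pool if absent).
    public static boolean sameAsInterned() {
        return LITERAL == new String("jvm").intern();
    }

    public static void main(String[] args) {
        System.out.println(sameAsNewString()); // false
        System.out.println(sameAsInterned());  // true
    }
}
```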

direct memory

JDK 1.4 introduced NIO. NIO can use native libraries to allocate off-heap memory directly, and then operate on that memory through a DirectByteBuffer object in the Java heap that references it. This can significantly improve performance in some scenarios, because it avoids copying data back and forth between the Java heap and native memory.
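Application code does not create DirectByteBuffer directly. The public entry point is ByteBuffer.allocateDirect, which in OpenJDK returns an instance of that internal class.

The sketch contrasts it with an ordinary heap buffer. DirectBufferDemo and allocateOffHeap are hypothetical names.

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    // Allocates off-heap (native) memory; the returned object is a small heap-side handle to it.
    public static ByteBuffer allocateOffHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer direct = allocateOffHeap(1024);
        ByteBuffer onHeap = ByteBuffer.allocate(1024);   // backed by an ordinary byte[] in the heap

        System.out.println(direct.isDirect());  // true
        System.out.println(onHeap.isDirect());  // false

        direct.putInt(0, 42);                   // absolute write into the native memory block
        System.out.println(direct.getInt(0));   // 42
    }
}
```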

2. Garbage collection

Garbage collection mainly targets the heap and the method area. The program counter, the virtual machine stack and the native method stack are private to each thread and exist only for the thread's lifetime. They disappear when the thread ends, so these three areas need no garbage collection.

Determine whether an object can be recycled

1. Reference counting algorithm

Each object is given a reference counter. The counter is incremented by 1 when a new reference to the object is added and decremented by 1 when a reference becomes invalid. An object whose count is 0 can be reclaimed.

When two objects reference each other in a cycle, their counters never drop to 0, so they can never be reclaimed. Because of such circular references, the Java virtual machine does not use reference counting.

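public class Test {

    public Object instance = null;

    public static void main(String[] args) {
        Test a = new Test();
        Test b = new Test();
        a.instance = b;
        b.instance = a;
        a = null;
        b = null;
        // Both Test objects are now unreachable, but each is still referenced by the other,
        // so a reference-counting collector would keep both counts at 1.
        System.gc();
    }
}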

In the code above, the two Test objects referenced by a and b hold references to each other. After a and b are set to null, the program can no longer reach either object. Under reference counting, however, neither could be reclaimed, because each still holds a reference to the other.
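The leak can be replayed with a toy reference-counting model. The class, Obj, addRef and release are all hypothetical and exist only for illustration; the JVM does not manage memory this way.

The survivorsAfter method mirrors the Test example step by step. Passing false drops the b.instance = a line, which shows that the same objects are freed when there is no cycle.

```java
import java.util.ArrayList;
import java.util.List;

// A toy reference-counting heap, for illustration only; the JVM does not manage memory this way.
public class RefCountDemo {
    static final class Obj {
        int refCount = 0;
        Obj field; // one outgoing reference, like Test.instance
    }

    private static final List<Obj> heap = new ArrayList<>();

    private static Obj allocate() {
        Obj o = new Obj();
        heap.add(o);
        return o;
    }

    private static void addRef(Obj o) {
        o.refCount++;
    }

    // Dropping a reference decrements the count; at zero the object is freed and its own field released.
    private static void release(Obj o) {
        if (o == null) {
            return;
        }
        if (--o.refCount == 0) {
            heap.remove(o);
            Obj child = o.field;
            o.field = null;
            release(child);
        }
    }

    // Replays the Test example and returns how many objects are still "allocated" afterwards.
    public static int survivorsAfter(boolean makeCycle) {
        heap.clear();
        Obj a = allocate(); addRef(a);   // Test a = new Test();
        Obj b = allocate(); addRef(b);   // Test b = new Test();
        a.field = b; addRef(b);          // a.instance = b;
        if (makeCycle) {
            b.field = a; addRef(a);      // b.instance = a;
        }
        release(a);                      // a = null;
        release(b);                      // b = null;
        return heap.size();
    }

    public static void main(String[] args) {
        System.out.println(survivorsAfter(true));   // 2: the cycle keeps both counts at 1
        System.out.println(survivorsAfter(false));  // 0: without the cycle, both are freed
    }
}
```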

2. Reachability analysis algorithm

The collector searches the object graph starting from the GC Roots. Every object reachable from a root is alive, and unreachable objects can be reclaimed.

The Java virtual machine uses this algorithm to determine whether an object can be recycled. GC Roots generally include the following:

  • Objects referenced in the local variable table in the virtual machine stack
  • Objects referenced in JNI in the native method stack
  • Objects referenced by class static properties in the method area
  • Objects referenced by constants in the method area
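At its core, reachability analysis is a graph traversal that starts from the roots. The sketch below models the heap as a hypothetical map from each object's name to the names it references, and marks everything reachable from a set of roots.

The cycle from the Test example ("a" and "b") is correctly left unmarked. The sketch uses List.of, so it needs Java 9 or later.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A toy reachability analysis over an object graph given as "object -> objects it references".
public class ReachabilityDemo {
    // Marks everything reachable from the roots; whatever is not returned is garbage.
    public static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) {                    // first visit: follow its outgoing references
                for (String target : refs.getOrDefault(obj, Collections.emptyList())) {
                    pending.push(target);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("localVar", List.of("x"));   // a GC Root, e.g. a local variable in a stack frame
        refs.put("x", List.of("y"));
        refs.put("a", List.of("b"));          // a and b reference each other, as in the Test example,
        refs.put("b", List.of("a"));          // but nothing reachable points to them
        System.out.println(reachable(refs, List.of("localVar"))); // localVar, x, y (order may vary)
    }
}
```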

3. Reclaiming the method area

The method area mainly stores permanent-generation data, and much less of it can be reclaimed than in the young generation. Collecting the method area is therefore not very cost-effective.

It mainly involves reclaiming the constant pool and unloading classes.

In scenarios that make heavy use of reflection and dynamic proxies, the virtual machine must be able to unload classes to avoid running out of memory.

A class can be unloaded only when all three of the following conditions are met, and even then it will not necessarily be unloaded:

  • All instances of the class have been reclaimed, so no instance of it remains in the heap.
  • The ClassLoader that loaded the class has been reclaimed.
  • The java.lang.Class object of the class is not referenced anywhere, so the class's methods cannot be accessed through reflection.

4. finalize()

finalize() is similar to a C++ destructor and is meant for closing external resources. However, try-finally and similar constructs do this job better. finalize() is expensive to run, highly unpredictable, and gives no guarantee about the order in which it is called on different objects. It is best not to use it, and it has been deprecated since Java 9.

When an object becomes eligible for collection and its finalize() method needs to run, the method may make the object referenced again, for example by assigning this to a static field. In that way the object rescues itself.

This self-rescue works only once. If an object has already rescued itself through finalize(), the method is not called again when the object later becomes eligible for collection.
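The try-finally approach recommended above is most conveniently written as try-with-resources (Java 7 and later), which the compiler expands into try-finally. The sketch uses a hypothetical Resource class that only records whether it was closed. It shows that cleanup runs deterministically, both when the block exits normally and when it throws.

```java
// Deterministic cleanup with try-with-resources instead of relying on finalize().
public class ResourceDemo {
    // A hypothetical resource that records whether it has been released.
    static final class Resource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true; // release files, sockets, native handles, etc. here
        }
    }

    // close() runs as soon as the try block exits normally.
    public static boolean closedAfterNormalExit() {
        Resource kept = null;
        try (Resource r = new Resource()) {
            kept = r;
        }
        return kept.closed;
    }

    // close() also runs when the block exits with an exception.
    public static boolean closedAfterException() {
        Resource[] kept = new Resource[1];
        try (Resource r = new Resource()) {
            kept[0] = r;
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException expected) {
            // by the time this handler runs, r.close() has already been called
        }
        return kept[0].closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfterNormalExit()); // true
        System.out.println(closedAfterException());  // true
    }
}
```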

reference type

Reference counting counts the references to an object, and reachability analysis checks whether the object can be reached. Either way, deciding whether an object can be reclaimed comes down to references.

Java provides four reference types of varying strengths.

1. Strong reference

An object that is reachable through a strong reference will not be reclaimed.

Creating an object with new and assigning it to a variable creates a strong reference.

Object obj = new Object();

2. Soft references

An object that is reachable only through soft references is reclaimed only when memory is insufficient.

Use the SoftReference class to create soft references.

Object obj = new Object();
SoftReference<Object> sf = new SoftReference<Object>(obj);
obj = null;  // leave the object reachable only through the soft reference

3. Weak references

An object that is reachable only through weak references is always reclaimed at the next collection, so it survives only until then.

Use the WeakReference class to create weak references.

Object obj = new Object();
WeakReference<Object> wf = new WeakReference<Object>(obj);
obj = null;

4. Phantom references

Phantom references are also known as ghost references or virtual references. Whether an object has a phantom reference does not affect its lifetime, and the object cannot be obtained through a phantom reference.

The only purpose of giving an object a phantom reference is to receive a notification when the object is garbage collected.

Use PhantomReference to create phantom references.

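ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
Object obj = new Object();
PhantomReference<Object> pf = new PhantomReference<Object>(obj, queue); // notified through queue; a null queue would make it useless
obj = null;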
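The reference types can be compared in one sketch; ReferenceTypesDemo and its method names are hypothetical. The two helper methods check behavior that is fully deterministic:

  • Soft and weak references return their referent while it is still strongly held.
  • A phantom reference's get() always returns null.

The main method then shows the queue notification. Because System.gc() is only a hint, that last line may print either message.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceTypesDemo {
    // While a strong reference exists, soft and weak references still return the referent.
    public static boolean visibleWhileStronglyHeld() {
        Object obj = new Object();
        SoftReference<Object> soft = new SoftReference<>(obj);
        WeakReference<Object> weak = new WeakReference<>(obj);
        return soft.get() == obj && weak.get() == obj;
    }

    // A phantom reference never hands out its referent, even while the referent is alive.
    public static boolean phantomGetIsNull() {
        Object obj = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(obj, new ReferenceQueue<>());
        return phantom.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(visibleWhileStronglyHeld()); // true
        System.out.println(phantomGetIsNull());         // true

        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object obj = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(obj, queue);
        obj = null;                                     // now only phantom-reachable
        System.gc();                                    // a hint only; the JVM may ignore it
        Reference<?> notified = queue.remove(1000);     // wait up to one second for the notification
        System.out.println(notified == phantom ? "collected: notification received" : "no notification yet");
    }
}
```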

garbage collection algorithm

1. Mark-Sweep


In the marking phase, the collector checks whether each object is live. If it is, the collector sets a mark in the object header.

In the sweep phase, dead objects are reclaimed and the marks are cleared. The collector also checks whether each reclaimed block is contiguous with the previous free block and, if so, merges the two.

Reclaimed objects become blocks that are linked into a singly linked list called the "free list". Later allocations only need to traverse this list to find a block.

To allocate, the program searches the free list for a block at least as large as the new object (size).

  • If the block it finds is exactly size, it returns the block directly.
  • If the block is larger, it splits it into a size part and a (block - size) part. It returns the size part and puts the (block - size) part back on the free list.

Disadvantages:

  • Both marking and sweeping are inefficient;
  • It leaves many non-contiguous memory fragments, which can make it impossible to allocate large objects.
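The allocation side of mark-sweep can be sketched as a toy first-fit free list. The class and its methods are hypothetical, and sizes are abstract units.

  • allocate() performs the exact-fit and split cases described above.
  • free() plays the role of sweeping one unmarked object. It merges the block with adjacent free blocks on both sides, a slight generalization of the "previous free block" rule in the text.

The usage in main also reproduces the fragmentation problem: 70 units are free, yet a 50-unit request fails until the hole between them is freed.

```java
import java.util.ArrayList;
import java.util.List;

// A toy first-fit free-list allocator (hypothetical) showing the split and merge steps described above.
public class FreeListDemo {
    // A free block covering addresses [start, start + size).
    static final class Block {
        final int start;
        final int size;

        Block(int start, int size) {
            this.start = start;
            this.size = size;
        }
    }

    private final List<Block> freeList = new ArrayList<>(); // kept sorted by start address

    public FreeListDemo(int heapSize) {
        freeList.add(new Block(0, heapSize));
    }

    // Returns the start address of the allocated chunk, or -1 if no single free block is large enough.
    public int allocate(int size) {
        for (int i = 0; i < freeList.size(); i++) {
            Block b = freeList.get(i);
            if (b.size == size) {                   // exact fit: hand out the whole block
                freeList.remove(i);
                return b.start;
            }
            if (b.size > size) {                    // larger: split, keep (block - size) on the list
                freeList.set(i, new Block(b.start + size, b.size - size));
                return b.start;
            }
        }
        return -1;
    }

    // Sweep step for one dead object: return its space, merging with adjacent free blocks.
    public void free(int start, int size) {
        int i = 0;
        while (i < freeList.size() && freeList.get(i).start < start) {
            i++;
        }
        Block merged = new Block(start, size);
        if (i < freeList.size() && start + size == freeList.get(i).start) {  // merge with the next block
            merged = new Block(start, size + freeList.get(i).size);
            freeList.remove(i);
        }
        if (i > 0) {
            Block prev = freeList.get(i - 1);
            if (prev.start + prev.size == merged.start) {                     // merge with the previous block
                freeList.set(i - 1, new Block(prev.start, prev.size + merged.size));
                return;
            }
        }
        freeList.add(i, merged);
    }

    public int freeBlockCount() {
        return freeList.size();
    }

    public static void main(String[] args) {
        FreeListDemo heap = new FreeListDemo(100);
        int a = heap.allocate(30);                 // 0
        int b = heap.allocate(30);                 // 30
        int c = heap.allocate(30);                 // 60
        heap.free(a, 30);                          // a and c die
        heap.free(c, 30);
        System.out.println(heap.freeBlockCount()); // 2: [0,30) and [60,100)
        System.out.println(heap.allocate(50));     // -1: 70 units free, but not contiguous
        heap.free(b, 30);                          // b dies: all free space merges into one block
        System.out.println(heap.freeBlockCount()); // 1
        System.out.println(heap.allocate(50));     // 0
    }
}
```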

2. Mark-Compact


All live objects are moved toward one end of the space, and then all memory beyond that boundary is cleared directly.

Advantages:

  • No memory fragmentation.

Disadvantages:

  • Many objects must be moved, so it is relatively slow.
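The compaction step can be sketched over an array standing in for the heap. This is a hypothetical toy: a null slot is free space, and live[] holds the result of the marking phase. Live objects slide toward index 0 in their original order, and everything past the new boundary is cleared in one pass.

```java
import java.util.Arrays;

// A toy sliding compaction over an array "heap" (hypothetical: null = free slot, live[] = result of marking).
public class MarkCompactDemo {
    // Slides every live object toward index 0, preserving order, then clears everything past them.
    // Returns the index of the first free slot, i.e. the new allocation boundary.
    public static int compact(String[] heap, boolean[] live) {
        int dest = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live[i]) {
                heap[dest++] = heap[i]; // a real collector must also update every reference to the moved object
            }
        }
        Arrays.fill(heap, dest, heap.length, null); // clear the memory beyond the boundary in one step
        return dest;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", null, "D", "E"};
        boolean[] live = {true, false, true, false, false, true};
        int top = compact(heap, live);
        System.out.println(Arrays.toString(heap) + " boundary=" + top); // [A, C, E, null, null, null] boundary=3
    }
}
```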

3. Copying


Memory is divided into two halves of equal size, and only one half is used at a time. When that half is full, the live objects are copied to the other half, and the used half is then cleared all at once.

The main downside is that only half of the memory is used.

Today's commercial virtual machines use this algorithm to collect the young generation. Instead of two equal halves, however, they use one larger Eden space and two smaller Survivor spaces. Allocation uses Eden and one of the Survivor spaces. During a collection, all live objects in Eden and that Survivor space are copied into the other Survivor space, and then Eden and the used Survivor space are cleared.

In HotSpot, the default size ratio of Eden to each Survivor space is 8:1 (-XX:SurvivorRatio=8), so 90% of the young generation is usable at any time. If more than 10% of the objects survive a collection, a single Survivor space is not enough. The collector then relies on the old generation as an allocation guarantee and borrows old-generation space to store the objects that do not fit.
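One such collection can be modeled with a toy sketch. The class and its sizes are hypothetical; the young generation here is 100 units split 8:1:1, so Eden is 80 and each Survivor space is 10.

Live objects from Eden and the "from" Survivor are copied into the "to" Survivor. Anything that does not fit is placed in the old generation, which models the allocation guarantee. Real collectors decide placement differently, for example by object age, so treat this only as an illustration of the copy-and-overflow idea. List.of requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

// A toy model of one young-generation copying collection (hypothetical; sizes in arbitrary units).
public class YoungGenCopyDemo {
    static final class Obj {
        final String name;
        final int size;
        final boolean live;

        Obj(String name, int size, boolean live) {
            this.name = name;
            this.size = size;
            this.live = live;
        }
    }

    // Copies live objects from Eden and the "from" Survivor into a fresh "to" Survivor.
    // Objects that do not fit are placed in oldGen, modelling the allocation guarantee.
    public static List<Obj> collect(List<Obj> eden, List<Obj> from, int survivorCapacity, List<Obj> oldGen) {
        List<Obj> to = new ArrayList<>();
        int used = 0;
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) {
                continue;                          // dead objects are simply never copied
            }
            if (used + o.size <= survivorCapacity) {
                to.add(o);
                used += o.size;
            } else {
                oldGen.add(o);                     // Survivor overflow: borrow old-generation space
            }
        }
        eden.clear();                              // Eden and the used Survivor are wiped in one step
        from.clear();
        return to;
    }

    public static void main(String[] args) {
        // Young generation of 100 units split 8:1:1 -> Eden 80, each Survivor 10 (90% usable).
        List<Obj> eden = new ArrayList<>(List.of(
                new Obj("a", 20, true), new Obj("b", 30, false),
                new Obj("c", 25, false), new Obj("d", 5, true)));
        List<Obj> from = new ArrayList<>(List.of(new Obj("e", 4, true)));
        List<Obj> oldGen = new ArrayList<>();

        List<Obj> to = collect(eden, from, 10, oldGen);
        to.forEach(o -> System.out.print(o.name + " "));     // d e
        System.out.println();
        oldGen.forEach(o -> System.out.print(o.name + " ")); // a
        System.out.println();
    }
}
```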

4. Generational collection

Today's commercial virtual machines use generational collection. Memory is divided into several regions according to object lifetime, and each region is collected with a suitable algorithm.

Generally, the heap is divided into the young generation and the old generation.

  • Young generation: copying algorithm
  • Old generation: mark-sweep or mark-compact algorithm

garbage collector


The figure above shows the 7 garbage collectors in the HotSpot virtual machine. A line connecting two collectors means they can be used together.

  • Single-threaded and multi-threaded: single-threaded means that the garbage collector uses only one thread, while multi-threaded uses multiple threads;
  • Serial vs. concurrent: serial means the garbage collector and the user program run alternately, so the user program must be paused while garbage is collected. Concurrent means the garbage collector runs at the same time as the user program. Except for CMS and G1, all of these collectors run serially.
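To check which collectors a JVM is actually using, you can list its GarbageCollectorMXBeans from java.lang.management. The class name GcInfoDemo is hypothetical.

Starting it with a selection flag such as -XX:+UseSerialGC, -XX:+UseParallelGC or -XX:+UseG1GC changes the names reported. On HotSpot these are typically:

  • "Copy" and "MarkSweepCompact" for Serial;
  • "PS Scavenge" and "PS MarkSweep" for Parallel;
  • "G1 Young Generation" and "G1 Old Generation" for G1.

Exact names can vary between JDK versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcInfoDemo {
    // Names of the collectors the running JVM reports (one bean per collector, e.g. young and old).
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```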

1. Serial collector


As the name suggests, it executes serially.

It is a single-threaded collector and only uses one thread for garbage collection.

Its advantages are simplicity and efficiency. In a single-CPU environment it has no thread-interaction overhead, so it achieves the highest single-threaded collection efficiency.

It is the default young-generation collector in Client mode, where memory is generally not very large. Collecting a young generation of one or two hundred megabytes can usually be kept to a pause of a little over a hundred milliseconds. That is acceptable as long as it does not happen too often.

2. ParNew collector


It is a multi-threaded version of the Serial collector.

It is the preferred young-generation collector for many virtual machines running in Server mode. Apart from performance, the main reason is that it is the only collector besides Serial that can work with the CMS collector.

3. Parallel Scavenge Collector

Like ParNew, it is a multithreaded collector.

The other collectors aim to minimize the pause time of user threads during garbage collection. Its goal is instead a controllable throughput, which is why it is called a "throughput-first" collector. Throughput here is the ratio of the CPU time spent running the user program to the total CPU time.
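As a worked example of this definition (the numbers are purely illustrative), suppose a program runs user code for 99 minutes and spends 1 minute in garbage collection. Its throughput is:

$$
\text{Throughput} = \frac{t_{\text{user}}}{t_{\text{user}} + t_{\text{GC}}} = \frac{99}{99 + 1} = 99\%
$$

Parallel Scavenge lets you state this goal directly with two HotSpot options:

  • -XX:GCTimeRatio=N asks that garbage collection take at most 1/(1 + N) of the total time. N = 99 corresponds to the 1% above.
  • -XX:MaxGCPauseMillis sets a target for the maximum pause.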

Short pause times suit programs that interact with users, because good responsiveness improves the user experience. High throughput uses CPU time efficiently and finishes the program's computation as quickly as possible. That suits background tasks that need little interaction.

Shorter pauses are obtained by sacrificing throughput and young-generation space. The young generation becomes smaller, so garbage collection happens more often and throughput drops.

The -XX:+UseAdaptiveSizePolicy switch turns on the GC adaptive size policy (GC Ergonomics). With it enabled, you do not need to manually specify the young generation size (-Xmn), the ratio of Eden to Survivor spaces, or the age at which objects are promoted to the old generation. The virtual machine collects performance monitoring data based on how the system is currently running, and adjusts these parameters dynamically to provide the most suitable pause time or the maximum throughput.

4. Serial Old Collector


It is the old-generation version of the Serial collector and is also used by the virtual machine in Client mode. In Server mode, it has two main uses:

  • Works with the Parallel Scavenge collector in JDK 1.5 and earlier (before Parallel Old).
  • As a backup plan for the CMS collector, it is used when Concurrent Mode Failure occurs in concurrent collection.

5. Parallel Old Collector


It is the old-generation version of the Parallel Scavenge collector.

When throughput matters and CPU resources are limited, the combination of Parallel Scavenge and Parallel Old is worth considering first.

6. CMS Collector


CMS (Concurrent Mark Sweep): "Mark Sweep" refers to the mark-sweep algorithm.

Its work is divided into the following four phases:

  • Initial marking: marks only the objects directly referenced by GC Roots. It is very fast but requires a pause.
  • Concurrent marking: traces the object graph from the GC Roots (GC Roots Tracing). It is the longest phase of the collection but requires no pause.
  • Remarking: corrects the marks of objects that changed because the user program kept running during concurrent marking. It requires a pause.
  • Concurrent sweeping: requires no pause.

The longest phases of the whole process, concurrent marking and concurrent sweeping, run alongside the user threads, so the user program does not need to pause during them.

It has the following disadvantages:

  • Low throughput: the short pauses are bought at the expense of throughput, so CPU utilization is not high.
  • It cannot handle floating garbage, which may lead to a Concurrent Mode Failure. Floating garbage is garbage produced by the still-running user threads during the concurrent sweep phase, and it can only be reclaimed in the next GC. Because of it, part of the memory must be kept in reserve. CMS therefore cannot wait until the old generation is nearly full before collecting, as other collectors do. If the reserved memory is not enough to hold the floating garbage, a Concurrent Mode Failure occurs and the virtual machine temporarily uses Serial Old instead of CMS.
  • Space fragmentation caused by the mark-sweep algorithm: the old generation often has free space left but no contiguous block large enough for the current object, so a Full GC has to be triggered early.

7. G1 Collector

G1 (Garbage-First) is a garbage collector designed for server-side applications, and it performs well on machines with multiple CPUs and large memory. The HotSpot development team intended it to eventually replace the CMS collector, and it became the default collector in JDK 9.

In the other collectors, the heap is split into a young generation and an old generation, and each collection covers the entire young generation or the entire old generation. G1 can collect the young and old generations together.


G1 divides the heap into multiple independent Regions of equal size, so the young generation and the old generation are no longer physically separated.


Regions split the original single memory space into many small spaces, each of which can be garbage-collected separately. This partitioning brings a lot of flexibility and makes a predictable pause-time model possible. G1 works as follows:

  • For each Region, it records how long collecting it takes and how much space that frees. Both values are estimated from past collections.
  • It maintains a priority list of Regions.
  • On each collection, it reclaims the most valuable Regions first within the allowed collection time.
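The "most valuable Regions first within the allowed time" idea can be sketched as a greedy selection. This is a hypothetical toy, and real G1 heuristics are considerably more involved. Each Region's value is taken as the space freed per millisecond of pause, and Regions are chosen in descending value order as long as they fit within the pause budget.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A toy "garbage-first" region picker (hypothetical; real G1 heuristics are considerably more involved).
public class GarbageFirstDemo {
    static final class Region {
        final int id;
        final double reclaimableMb; // space a collection is expected to free (from past experience)
        final double costMs;        // time a collection is expected to take (from past experience)

        Region(int id, double reclaimableMb, double costMs) {
            this.id = id;
            this.reclaimableMb = reclaimableMb;
            this.costMs = costMs;
        }

        double value() {
            return reclaimableMb / costMs; // space freed per millisecond of pause
        }
    }

    // Takes regions in descending value order, skipping any that would exceed the pause budget.
    public static List<Integer> chooseCollectionSet(List<Region> regions, double pauseBudgetMs) {
        List<Region> byValue = new ArrayList<>(regions);
        byValue.sort(Comparator.comparingDouble(Region::value).reversed());
        List<Integer> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : byValue) {
            if (spent + r.costMs <= pauseBudgetMs) {
                chosen.add(r.id);
                spent += r.costMs;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region(1, 30, 10),   // value 3.0
                new Region(2, 5, 10),    // value 0.5
                new Region(3, 20, 5),    // value 4.0
                new Region(4, 25, 20));  // value 1.25
        System.out.println(chooseCollectionSet(regions, 20)); // [3, 1]
    }
}
```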

Each Region has a Remembered Set, which records the Regions that contain objects referencing objects in this Region. With Remembered Sets, reachability analysis can avoid scanning the entire heap.


Not counting the work of maintaining Remembered Sets, a G1 collection can be roughly divided into the following steps:

  • Initial marking
  • Concurrent marking
  • Final marking: corrects the marks that changed because the user program kept running during concurrent marking. During concurrent marking, the virtual machine records those object changes in the thread's Remembered Set Logs. In the final marking phase, the Remembered Set Logs are merged into the Remembered Set. This phase requires pausing the user threads, but the GC threads can run in parallel.
  • Live data counting and evacuation: the Regions are first sorted by reclamation value and cost, and a collection plan is drawn up based on the pause time the user expects. This phase could in principle run concurrently with the user program. However, only some Regions are collected and the time is under the user's control, so pausing the user threads greatly improves collection efficiency.

It has the following characteristics:

  • Space consolidation: overall, the collector is based on the mark-compact algorithm, and locally (between two Regions) it is based on the copying algorithm. As a result, it produces no memory fragmentation during operation.
  • Predictable Pause: It allows users to specify that within a time segment of M milliseconds, the time spent on GC should not exceed N milliseconds.

3. Memory allocation and reclamation strategies

Minor GC and Full GC

  • Minor GC: collects the young generation. Young-generation objects are short-lived, so Minor GC runs frequently and is generally fast.

  • Full GC: collects both the old and young generations. Old-generation objects live a long time, so Full GC runs rarely and is much slower than Minor GC.

memory allocation strategy

1. Objects are first allocated in Eden

In most cases, objects are allocated in Eden in the young generation. When Eden does not have enough space, a Minor GC is triggered.

2. Large objects directly enter the old generation

Large objects are objects that need a large amount of contiguous memory. The most typical examples are long strings and large arrays.

Large objects often cause garbage collection to be triggered early, just to obtain enough contiguous space for them.

-XX:PretenureSizeThreshold: objects larger than this value are allocated directly in the old generation. This avoids large amounts of memory copying between Eden and the Survivor spaces.

3. Long-lived objects enter the old generation

Each object has an age counter. An object born in Eden that survives a Minor GC is moved to a Survivor space, and its age is increased by 1. Each further Minor GC it survives adds 1 more. Once its age reaches a certain threshold, it is moved to the old generation.

-XX:MaxTenuringThreshold is used to define the age threshold.

4. Dynamic object age determination

The virtual machine does not always require an object's age to reach MaxTenuringThreshold before promoting it to the old generation. Suppose the total size of all objects of the same age in a Survivor space is greater than half of that Survivor space. Then objects of that age or older can enter the old generation directly, without waiting for the age required by MaxTenuringThreshold.
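The rule as stated above can be sketched as a small hypothetical helper, with sizes in KB.

HotSpot's actual implementation is somewhat different. It sums object sizes cumulatively from the youngest age upward and compares the total against a target fraction of the Survivor space (-XX:TargetSurvivorRatio, default 50). This sketch follows the simpler per-age description in the text.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the dynamic age rule as stated above (hypothetical helper; sizes in KB).
public class DynamicAgeDemo {
    // Returns the age at or above which objects are promoted after this Minor GC.
    public static int promotionAge(Map<Integer, Integer> sizeByAge, int survivorCapacity, int maxTenuringThreshold) {
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(sizeByAge).entrySet()) { // ascending age
            if (e.getValue() > survivorCapacity / 2) {     // one age group fills more than half of Survivor
                return Math.min(e.getKey(), maxTenuringThreshold);
            }
        }
        return maxTenuringThreshold;                        // otherwise use the static threshold
    }

    public static void main(String[] args) {
        System.out.println(promotionAge(Map.of(1, 100, 2, 600, 3, 50), 1024, 15)); // 2: 600 KB > 512 KB
        System.out.println(promotionAge(Map.of(1, 100, 2, 200), 1024, 15));        // 15
    }
}
```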

5. Space Allocation Guarantee

Before a Minor GC, the virtual machine first checks whether the largest contiguous free space in the old generation is greater than the total size of all objects in the young generation. If it is, the Minor GC is guaranteed to be safe.

If not, the virtual machine checks whether the value of HandlePromotionFailure allows the guarantee to fail.

  • If it does, the virtual machine then checks whether the largest contiguous free space in the old generation is greater than the average size of the objects promoted to the old generation in previous collections. If it is, the virtual machine attempts a Minor GC.
  • If that space is smaller, or if HandlePromotionFailure does not allow the risk, a Full GC is performed instead.
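The decision procedure described above can be sketched as a single function. The class, enum and parameter names are hypothetical, and sizes are in MB. The last branch follows the book's account: a Full GC when the risk is not allowed or is too large. Actual behavior differs between JDK versions.

```java
// Sketch of the pre-Minor-GC check described above (hypothetical helper; sizes in MB).
public class PromotionGuaranteeDemo {
    enum Decision { MINOR_GC_SAFE, MINOR_GC_RISKY, FULL_GC }

    public static Decision decide(long oldGenMaxContiguousFree, long youngGenObjectsTotal,
                                  boolean handlePromotionFailure, long avgPromotedSize) {
        if (oldGenMaxContiguousFree > youngGenObjectsTotal) {
            return Decision.MINOR_GC_SAFE;   // even if every young object survives, it fits
        }
        if (handlePromotionFailure && oldGenMaxContiguousFree > avgPromotedSize) {
            return Decision.MINOR_GC_RISKY;  // likely fine; if the guarantee then fails, a Full GC follows
        }
        return Decision.FULL_GC;             // risk not allowed or too large: do a Full GC instead
    }

    public static void main(String[] args) {
        System.out.println(decide(100, 50, false, 10)); // MINOR_GC_SAFE
        System.out.println(decide(30, 50, true, 20));   // MINOR_GC_RISKY
        System.out.println(decide(30, 50, false, 20));  // FULL_GC
        System.out.println(decide(10, 50, true, 20));   // FULL_GC
    }
}
```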

Trigger conditions for Full GC

The trigger for a Minor GC is simple: it runs when Eden is full. Full GC is more complicated and can be triggered by the following conditions:

1. Call System.gc()

Calling System.gc() only suggests that the virtual machine perform a Full GC, and the virtual machine is not required to do so. This approach is discouraged; let the virtual machine manage memory instead.

2. Insufficient space in the old generation

Insufficient old-generation space is commonly caused by the situations described above. Large objects are allocated directly in the old generation, and long-lived objects are promoted into it.

To avoid Full GCs caused by these situations, avoid creating excessively large objects and arrays. You can also use the -Xmn option to enlarge the young generation, so that as many objects as possible are reclaimed there instead of entering the old generation. Another option is -XX:MaxTenuringThreshold, which raises the age at which objects enter the old generation so that they stay in the young generation longer.

3. Space allocation guarantee failed

Minor GC uses the copying algorithm and relies on old-generation space as a guarantee. If the guarantee fails, a Full GC is performed. See "5. Space Allocation Guarantee" above for details.

4. Insufficient permanent generation space (JDK 1.7 and earlier)

In JDK 1.7 and earlier, the HotSpot virtual machine implements the method area as the permanent generation, which stores class information, constants, static variables and other data.

When the system loads many classes, reflects on many classes and calls many methods, the permanent generation may fill up. If CMS GC is not configured, a Full GC is then performed. If the space still cannot be reclaimed after the Full GC, the virtual machine throws java.lang.OutOfMemoryError.

To avoid Full GCs caused by this, you can enlarge the permanent generation or switch to CMS GC.

5. Concurrent Mode Failure

Suppose that, while CMS GC is running, objects need to be moved into the old generation but it does not have enough space. This may happen because too much floating garbage accumulated during the collection, causing a temporary shortage. In that case a Concurrent Mode Failure is reported and a Full GC is triggered.

4. Class loading mechanism

Classes are loaded dynamically the first time they are used at run time, rather than all at once, because loading every class up front would take up a lot of memory.

class life cycle


It includes the following 7 stages:

  • Loading
  • Verification
  • Preparation
  • Resolution
  • Initialization
  • Using
  • Unloading

class loading process

It consists of five stages: loading, verification, preparation, resolution and initialization.

1. Loading

Loading is only one stage of the class loading process; be careful not to confuse the two.

The loading process does three things:

  • Get the binary byte stream defining the class by its fully qualified name.
  • Convert the static storage structure represented by the byte stream into the runtime storage structure of the method area.
  • Generate a Class object representing this class in the memory as the access entry for various data of this class in the method area.
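Loading a class, which ends with its Class object being created, is separate from initializing it. Class.forName(name, initialize, loader) makes this visible: with initialize set to false, you get the Class object but the static initializer does not run.

The names LoadingVsInitDemo and Lazy are hypothetical.

```java
// Shows that loading a class (obtaining its Class object) is separate from initializing it.
public class LoadingVsInitDemo {
    static boolean initialized = false;

    static class Lazy {
        static {
            initialized = true; // runs only when Lazy is initialized, not when it is merely loaded
        }
    }

    private static Class<?> forName(boolean initialize) {
        try {
            return Class.forName(Lazy.class.getName(), initialize, LoadingVsInitDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Loads Lazy and returns its Class object without running its static initializer.
    public static Class<?> loadOnly() {
        return forName(false);
    }

    // Loads Lazy if needed and initializes it.
    public static Class<?> loadAndInitialize() {
        return forName(true);
    }

    public static void main(String[] args) {
        Class<?> c = loadOnly();
        System.out.println(c.getName() + " initialized? " + initialized); // ... initialized? false
        loadAndInitialize();
        System.out.println("after forName(..., true): " + initialized);   // true
    }
}
```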

The binary byte stream can be obtained in the following ways:

  • Read from a ZIP package, which is the basis of the JAR, EAR and WAR formats.
  • Obtained from the network; the most typical application is the Applet.
  • Computed at run time: for example, dynamic proxies in java.lang.reflect.Proxy use ProxyGenerator.generateProxyClass to generate the binary byte stream of the proxy class.
  • Generated from other files: for example, a JSP file is compiled into a corresponding Class.
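The run-time generation case is easy to observe through the public Proxy.newProxyInstance API, which generates the proxy class bytes internally. The class has no .class file on disk. Greeter and ProxyDemo are hypothetical names, and the generated class name varies by JDK version.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    public interface Greeter {
        String greet(String name);
    }

    // Builds a proxy whose class is generated at run time rather than loaded from a .class file.
    public static Greeter createGreeter() {
        InvocationHandler handler = (proxy, method, args) -> "Hello, " + args[0];
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        Greeter g = createGreeter();
        System.out.println(g.greet("JVM"));                    // Hello, JVM
        System.out.println(g.getClass().getName());            // a generated name such as ...$Proxy0
        System.out.println(Proxy.isProxyClass(g.getClass()));  // true
    }
}
```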

2. Verification

Ensures that the information in the byte stream of the Class file meets the requirements of the current virtual machine and does not endanger the security of the virtual machine itself.

3. Preparation

Class variables are variables declared static. In the preparation stage, memory is allocated for class variables and they are set to their initial values. This memory is in the method area.

Instance variables are not allocated at this stage; they are allocated in the heap together with the object when it is instantiated. Note that instantiation is not part of class loading. Class loading happens before any instantiation and is performed only once, whereas instantiation can happen many times.

The initial value is generally zero. For example, the class variable value below is set to 0 in this stage, not 123. The assignment of 123 happens later, during initialization.

public static int value = 123;

If the class variable is a constant, meaning a static final field initialized with a compile-time constant, it is set to the value defined by the expression instead of 0. For example, the constant value below is set to 123, not 0.

public static final int value = 123;
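The difference can be observed by reading the fields from a static block placed before their declarations. Java forbids reading a variable by its simple name before its declaration, but a class-qualified name such as PrepareDemo.value is allowed. The block therefore sees the value left by preparation.

PrepareDemo, observedValue, observedConstant and CONSTANT are hypothetical names for this sketch.

```java
// Observes a class variable between the preparation and initialization stages.
public class PrepareDemo {
    static int observedValue;       // what the static block below saw for value
    static int observedConstant;    // what the static block below saw for CONSTANT

    static {
        // Qualified names are exempt from the "illegal forward reference" rule, so this
        // reads value before its initializer has run, i.e. the zero set during preparation.
        observedValue = PrepareDemo.value;
        observedConstant = PrepareDemo.CONSTANT;
    }

    public static int value = 123;              // 0 after preparation, 123 after <clinit>()
    public static final int CONSTANT = 123;     // compile-time constant: already 123

    public static void main(String[] args) {
        System.out.println(observedValue);      // 0
        System.out.println(value);              // 123
        System.out.println(observedConstant);   // 123
    }
}
```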

4. Resolution

The process of replacing symbolic references in the constant pool with direct references.

In some cases, resolution can begin after the initialization phase; this is to support Java's dynamic binding.

5. Initialization

Initialization is the stage in which the Java program code defined in the class actually starts to execute: the virtual machine runs the class constructor method <clinit>(). In the preparation stage, class variables were already set once to the initial values required by the system. In the initialization stage, class variables and other resources are initialized according to what the programmer wrote in the code.

<clinit>() is generated by the compiler, which automatically collects the assignments to all class variables and the statements in all static blocks, in the order they appear in the source file. Note in particular that a static block can only read class variables defined before it. Class variables defined after it can be assigned in the block but not read. For example:

public class Test {
    static {
        i = 0;                // assigning to the variable compiles fine
        System.out.print(i);  // the compiler reports "illegal forward reference" here
    }
    static int i = 1;
}

Because the parent class's <clinit>() method runs first, the static blocks defined in the parent class execute before those of the subclass. For example:

public class Test {
    static class Parent {
        public static int A = 1;
        static {
            A = 2;
        }
    }

    static class Sub extends Parent {
        public static int B = A;
    }

    public static void main(String[] args) {
        System.out.println(Sub.B);  // 2
    }
}

Interfaces cannot contain static blocks, but they can still contain initializers that assign values to variables, so interfaces generate a <clinit>() method just as classes do. The difference is that an interface's <clinit>() does not need to run its parent interface's <clinit>() first; a parent interface is initialized only when a variable defined in it is used. Likewise, initializing a class that implements an interface does not run the interface's <clinit>() method. Since Java 8 there is one exception: a superinterface that declares default methods is initialized along with the class.

The virtual machine ensures that a class's <clinit>() method is properly locked and synchronized in a multi-threaded environment. If multiple threads initialize a class at the same time, only one thread executes the class's <clinit>() method. The other threads block until it has finished. If a class's <clinit>() performs a time-consuming operation, it may block many threads, and this kind of blocking is often very hard to spot in practice.
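This guarantee can be checked with a sketch in which several threads race to touch the same class. The static initializer of the hypothetical Slow class sleeps to widen the race window and counts how often it runs. Every reader waits for it to finish, and it runs exactly once.

ClinitOnceDemo, Slow and runThreads are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Several threads race to initialize the same class; <clinit>() still runs exactly once.
public class ClinitOnceDemo {
    static final AtomicInteger initCount = new AtomicInteger();

    static class Slow {
        static final int VALUE;

        static {
            initCount.incrementAndGet();            // count executions of <clinit>()
            try {
                Thread.sleep(200);                  // a slow initializer: other threads block meanwhile
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            VALUE = 42;
        }
    }

    // Starts n threads that all read Slow.VALUE, waits for them, and returns the <clinit>() count.
    public static int runThreads(int n) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                if (Slow.VALUE != 42) {             // never true: readers wait for initialization to finish
                    throw new IllegalStateException("saw a partially initialized class");
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return initCount.get();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4)); // 1
    }
}
```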

Class initialization timing

1. Active references

The virtual machine specification does not mandate when a class must be loaded, but it strictly specifies five, and only five, situations in which a class must be initialized immediately (loading, verification, and preparation naturally have to happen before that):

  • When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is executed and the class has not yet been initialized, its initialization must be triggered first. The most common Java code that produces these 4 instructions: instantiating an object with the new keyword; reading or setting a static field of a class (except for final static fields whose values were placed into the constant pool at compile time); and calling a static method of a class.

  • When using the methods of the java.lang.reflect package to make reflective calls to a class, if the class has not been initialized, its initialization needs to be triggered first.

  • When initializing a class whose parent class has not been initialized yet, the parent class's initialization must be triggered first.

  • When the virtual machine starts, the user needs to specify a main class to be executed (the class containing the main() method), and the virtual machine initializes the main class first;

  • When using the dynamic language support introduced in JDK 1.7, if a java.lang.invoke.MethodHandle instance finally resolves to a method handle of kind REF_getStatic, REF_putStatic, or REF_invokeStatic, and the class corresponding to that handle has not been initialized, its initialization must be triggered first.
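The sketch below exercises several of these active-reference cases. Every class name in it is hypothetical. Each static block logs its own name, so the log shows which classes were initialized and in what order. The reflective case uses Constructor.newInstance() from java.lang.reflect. The main class is itself initialized before main() runs, which is why LOG is ready to use.

```java
public class ActiveReferenceDemo {
    static final StringBuilder LOG = new StringBuilder();

    static class ByNew { static { LOG.append("ByNew "); } }

    static class ByStaticField {
        static int value = 1;
        static { LOG.append("ByStaticField "); }
    }

    static class ByStaticMethod {
        static { LOG.append("ByStaticMethod "); }
        static void run() { }
    }

    static class ByReflection { static { LOG.append("ByReflection "); } }

    static class Parent { static { LOG.append("Parent "); } }

    static class Child extends Parent { static { LOG.append("Child "); } }

    // Hits each trigger once and returns the initialization log
    static String triggerAll() {
        new ByNew();                              // new
        int unused = ByStaticField.value;         // getstatic
        ByStaticMethod.run();                     // invokestatic
        try {
            // A class literal alone does not initialize; the reflective newInstance() does
            ByReflection.class.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        new Child();                              // the parent class is initialized first
        return LOG.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(triggerAll()); // ByNew ByStaticField ByStaticMethod ByReflection Parent Child
    }
}
```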

2. Passive references

The behavior in the five scenarios above is called an active reference to a class. All other ways of referencing a class do not trigger its initialization and are called passive references. Common examples of passive references include:

  • Referencing a static field of a parent class through a subclass does not initialize the subclass; only the class that declares the field is initialized.
System.out.println(SubClass.value);  // the value field is defined in SuperClass
  • Referencing a class through an array definition does not trigger the class's initialization. Instead, it initializes an array class, which is generated automatically by the virtual machine, inherits directly from Object, and carries the array's properties and methods.
SuperClass[] sca = new SuperClass[10];
  • Constants are copied into the constant pool of the calling class at compile time, so the calling class does not actually reference the class that defines the constant, and that class is not initialized.
System.out.println(ConstClass.HELLOWORLD);
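The three passive references can be checked with a sketch along the same lines. It reuses the names SuperClass, SubClass, ConstClass, value, and HELLOWORLD from the snippets above, but their bodies here are assumptions, and PassiveReferenceDemo is a hypothetical wrapper.

```java
public class PassiveReferenceDemo {
    static final StringBuilder LOG = new StringBuilder();

    static class SuperClass {
        static { LOG.append("SuperClass "); }
        static int value = 123;
    }

    static class SubClass extends SuperClass {
        static { LOG.append("SubClass "); }
    }

    static class ConstClass {
        static { LOG.append("ConstClass "); }
        static final String HELLOWORLD = "hello world"; // compile-time constant
    }

    public static void main(String[] args) {
        SuperClass[] sca = new SuperClass[10];     // array creation: SuperClass is not initialized
        System.out.println(sca.length);            // 10
        System.out.println(ConstClass.HELLOWORLD); // constant is inlined: ConstClass is not initialized
        System.out.println("[" + LOG + "]");       // [] : nothing initialized so far
        System.out.println(SubClass.value);        // 123, and only SuperClass gets initialized
        System.out.println(LOG.toString().trim()); // SuperClass
    }
}
```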

Classes and class loaders

For two classes to be equal, they must be the same class (for example, come from the same class file) and be loaded by the same class loader, because each class loader has its own independent class namespace.

Equality here covers the equals(), isAssignableFrom(), and isInstance() methods of the classes' Class objects returning true, as well as the instanceof keyword reporting that an object belongs to the class.
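One way to see separate namespaces is to load the same class file twice through two different loaders. In this sketch (all names hypothetical), the anonymous loader reads ClassIdentityDemo.class from the class path and defines it itself, while every other class is delegated as usual. It assumes the class sits in the default package and was compiled to .class files on the class path (for example with javac and java -cp). In other setups getResourceAsStream() may return null, and the demo would then fall back to the normal loader.

```java
import java.io.IOException;
import java.io.InputStream;

public class ClassIdentityDemo {

    // Loads a second copy of ClassIdentityDemo through a throwaway class loader
    static Object loadFreshInstance() throws Exception {
        ClassLoader myLoader = new ClassLoader() {
            @Override
            public Class<?> loadClass(String name) throws ClassNotFoundException {
                String fileName = name.substring(name.lastIndexOf('.') + 1) + ".class";
                try (InputStream is = getClass().getResourceAsStream(fileName)) {
                    if (is == null) {
                        return super.loadClass(name); // e.g. java.lang.Object: delegate as usual
                    }
                    byte[] b = is.readAllBytes();
                    return defineClass(name, b, 0, b.length);
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
        };
        return myLoader.loadClass("ClassIdentityDemo").getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Object obj = loadFreshInstance();
        System.out.println(obj.getClass().getName());         // ClassIdentityDemo
        System.out.println(obj instanceof ClassIdentityDemo); // false: different defining loaders
    }
}
```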

Class loader classification

From the perspective of the Java virtual machine, there are only two different class loaders:

  • Bootstrap ClassLoader, implemented in C++, is part of the virtual machine itself;

  • The loaders of all other classes are implemented in Java, independent of the virtual machine, and inherit from the abstract class java.lang.ClassLoader.

From a Java developer's point of view, class loaders can be more finely divided:

  • Bootstrap ClassLoader: this loader loads the class libraries stored in the <JRE_HOME>\lib directory, or in the path specified by the -Xbootclasspath parameter, that the virtual machine recognizes (recognition is by file name only, such as rt.jar; libraries whose names are not recognized are not loaded even if placed in the lib directory) into virtual machine memory. The bootstrap class loader cannot be referenced directly by Java programs. When writing a custom class loader that needs to delegate a loading request to the bootstrap class loader, simply use null in its place.

  • Extension ClassLoader: implemented by ExtClassLoader (sun.misc.Launcher$ExtClassLoader), it loads all class libraries in <JAVA_HOME>/lib/ext, or in the path specified by the java.ext.dirs system property, into memory. Developers can use the extension class loader directly.

  • Application ClassLoader: implemented by AppClassLoader (sun.misc.Launcher$AppClassLoader). Because it is the value returned by ClassLoader.getSystemClassLoader(), it is also commonly called the system class loader. It loads the class libraries on the user's class path (ClassPath). Developers can use it directly, and if an application does not define its own class loader, this is usually the program's default class loader.
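To inspect the hierarchy on a running JVM, the sketch below walks getParent() from a class's loader. The names loaderChain and LoaderHierarchyDemo are hypothetical. The exact loader class names depend on the JDK: on JDK 8 the chain runs from sun.misc.Launcher$AppClassLoader to sun.misc.Launcher$ExtClassLoader, while on JDK 9 and later the extension class loader was replaced by a platform class loader, so the middle entry differs.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderHierarchyDemo {

    // Walks from a class's defining loader up through getParent();
    // the bootstrap loader is represented by null and ends the chain
    static List<String> loaderChain(Class<?> c) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader l = c.getClassLoader(); l != null; l = l.getParent()) {
            chain.add(l.getClass().getName());
        }
        chain.add("bootstrap (null)");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(loaderChain(LoaderHierarchyDemo.class)); // application, extension/platform, bootstrap
        System.out.println(loaderChain(String.class));              // [bootstrap (null)]
    }
}
```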

Parent delegation model

Applications load classes through the cooperation of these three class loaders, and you can add your own class loaders as well.

The figure below shows the hierarchy of class loaders, known as the parent delegation model. The model requires every class loader except the top-level bootstrap class loader to have its own parent class loader. This parent-child relationship is usually implemented through composition rather than inheritance.


1. Working process

A class loader first forwards each class loading request to its parent class loader, and tries to load the class itself only if the parent cannot.

2. Benefits

Java classes, together with their class loaders, form a prioritized hierarchy, which keeps basic classes unique across the program.

For example, java.lang.Object is stored in rt.jar. If you write your own java.lang.Object and put it on the ClassPath, the program still compiles. Because of the parent delegation model, however, the Object in rt.jar, which is loaded by the bootstrap class loader, takes precedence over the one on the ClassPath, which would be loaded by the application class loader. As a result, every Object in the program is the one from rt.jar.
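A quick check of this claim, using hypothetical names: freshly created custom loaders that override nothing are asked for java.lang.Object. Because each request is delegated upward, every loader ends up with the single Object class defined by the bootstrap class loader.

```java
public class DelegatedObjectDemo {

    // Creates a brand-new custom loader (no overridden methods) and asks it for a class
    static Class<?> loadViaFreshLoader(String name) throws ClassNotFoundException {
        ClassLoader custom = new ClassLoader(DelegatedObjectDemo.class.getClassLoader()) { };
        return custom.loadClass(name);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> a = loadViaFreshLoader("java.lang.Object");
        Class<?> b = loadViaFreshLoader("java.lang.Object");
        // Both requests are delegated up to the bootstrap loader, so they yield the same Class
        System.out.println(a == b);             // true
        System.out.println(a == Object.class);  // true
        System.out.println(a.getClassLoader()); // null: loaded by the bootstrap loader
    }
}
```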

3. Implementation

The following is a code snippet from the abstract class java.lang.ClassLoader. Its loadClass() method works as follows: it first checks whether the class has already been loaded. If not, it asks the parent class loader to load it, or the bootstrap class loader when there is no parent. If the parent fails, a ClassNotFoundException is thrown, and the loader then tries to load the class itself by calling findClass().

public abstract class ClassLoader {
    // The parent class loader for delegation
    private final ClassLoader parent;

    public Class<?> loadClass(String name) throws ClassNotFoundException {
        return loadClass(name, false);
    }

    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // First, check if the class has already been loaded
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    if (parent != null) {
                        c = parent.loadClass(name, false);
                    } else {
                        c = findBootstrapClassOrNull(name);
                    }
                } catch (ClassNotFoundException e) {
                    // ClassNotFoundException thrown if class not found
                    // from the non-null parent class loader
                }

                if (c == null) {
                    // If still not found, then invoke findClass in order
                    // to find the class.
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    protected Class<?> findClass(String name) throws ClassNotFoundException {
        throw new ClassNotFoundException(name);
    }
}
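The effect of this loadClass() logic can be observed with a loader that overrides only findClass(). RecordingLoader, findClassCallsFor(), and the class name com.example.DoesNotExist are hypothetical. The loader defines nothing itself and merely records which names fall through to findClass() after the parent chain has failed.

```java
import java.util.ArrayList;
import java.util.List;

public class DelegationDemo {

    // Hypothetical loader that defines nothing; it only records the names
    // that reach findClass() after every parent has given up
    static class RecordingLoader extends ClassLoader {
        final List<String> findClassCalls = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            findClassCalls.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    // Asks a fresh RecordingLoader for each name and reports which ones reached findClass()
    static List<String> findClassCallsFor(String... names) {
        RecordingLoader loader = new RecordingLoader(ClassLoader.getSystemClassLoader());
        for (String name : names) {
            try {
                loader.loadClass(name);
            } catch (ClassNotFoundException e) {
                // expected for names that no loader in the chain can find
            }
        }
        return loader.findClassCalls;
    }

    public static void main(String[] args) {
        // java.lang.String is found by the parent chain, so findClass() never sees it;
        // com.example.DoesNotExist is a made-up name that every parent fails on
        System.out.println(findClassCallsFor("java.lang.String", "com.example.DoesNotExist"));
        // prints [com.example.DoesNotExist]
    }
}
```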

Custom class loader implementation

FileSystemClassLoader in the following code is a custom class loader that extends java.lang.ClassLoader and loads classes from the file system. It first locates the class's bytecode file (.class file) on the file system from the class's fully qualified name, then reads the file's contents, and finally converts the bytecode into an instance of java.lang.Class through the defineClass() method.

The loadClass() method of java.lang.ClassLoader implements the parent delegation logic, so custom class loaders generally do not override it; they override the findClass() method instead.

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FileSystemClassLoader extends ClassLoader {

    // Root directory under which .class files are laid out by package
    private final String rootDir;

    public FileSystemClassLoader(String rootDir) {
        this.rootDir = rootDir;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] classData = getClassData(name);
        if (classData == null) {
            throw new ClassNotFoundException(name);
        }
        // Turn the raw bytecode into a java.lang.Class instance
        return defineClass(name, classData, 0, classData.length);
    }

    // Reads the whole .class file; returns null if it cannot be read
    private byte[] getClassData(String className) {
        String path = classNameToPath(className);
        // try-with-resources closes the stream even if reading fails
        try (InputStream ins = new FileInputStream(path)) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int bytesNumRead;
            while ((bytesNumRead = ins.read(buffer)) != -1) {
                baos.write(buffer, 0, bytesNumRead);
            }
            return baos.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    // e.g. com.example.Foo -> <rootDir>/com/example/Foo.class
    private String classNameToPath(String className) {
        return rootDir + File.separatorChar
                + className.replace('.', File.separatorChar) + ".class";
    }
}

Origin blog.csdn.net/yang134679/article/details/131798449