Java interview must-test points--Lecture 03: Understanding JVM in simple terms

This lecture covers how the JVM works. The JVM is the foundation on which Java programs run, so JVM questions are almost certain to come up in interviews. The lecture first summarizes the JVM topics that interviewers test, then explains three of them in detail: the JVM memory model, Java's class loading mechanism, and the commonly used GC algorithms. It closes with a summary of what interviewers look for, the points that earn extra credit, and real interview questions on this material.

Summary of JVM knowledge points

First, let’s take a look at the summary of JVM knowledge points.

As shown in the figure above, JVM knowledge falls into six broad areas, of which the memory model, the class loading mechanism, and garbage collection (GC) are the most important. Performance tuning is about practical application and tests hands-on ability. Compiler optimization and execution modes are more theoretical; there the goal is mainly to know the concepts.

The knowledge points you need to know about each part are as follows.

  • Memory model: the role of the program counter, method area, heap, stack, and native method stack, and what data each one stores.

  • Class loading: the parent delegation mechanism, and which kinds of classes each of the common class loaders loads.

  • GC: the idea behind generational collection and the observations it rests on, plus the approach each garbage collection algorithm takes and the scenarios it suits.

  • Performance tuning: what the commonly used JVM tuning parameters do, the basis for tuning them, which kinds of problems the common JVM analysis tools can diagnose, and how to use those tools.

  • Execution mode: the advantages and disadvantages of interpreted, compiled, and mixed modes, and the tiered compilation available since Java 7. You need to know JIT (just-in-time) compilation and OSR (on-stack replacement), and which scenarios the C1 and C2 compilers target: C2 targets Server mode and optimizes more aggressively. As for newer technology, you can read up on Graal, a JIT compiler written in Java that became available as an experimental option in Java 10.

  • Compilation optimization: the compilation process of the front-end compiler javac, the abstract syntax tree (AST), and compile-time versus run-time optimization. Common optimization techniques include common subexpression elimination, method inlining, escape analysis, stack allocation, and synchronization elimination. Understanding these is what makes it possible to write compiler-friendly code.

     The JVM material is relatively concentrated, but it demands a deep understanding of each topic.
    
Detailed explanation of JVM memory model

The JVM memory model mainly refers to the runtime data areas, which consist of five parts, as shown in the figure below.

  • The stack, also called the method stack, is private to each thread. Each time a thread invokes a method, it creates a stack frame that stores the local variable table, the operand stack, dynamic linking information, the method return address, and so on. The frame is pushed when the method is called and popped when the method returns.

  • The native method stack is similar to the stack and also holds per-method state for a thread. The difference is that the stack serves Java methods, while the native method stack serves native methods.

  • The program counter holds the position of the bytecode instruction that the current thread is executing. Each thread has its own counter. The program counter only serves Java methods; while a native method is executing, its value is undefined.

     The stack, the native method stack, and the program counter are all private to each thread.
    
  • The heap is the largest memory area managed by the JVM. It is shared by all threads and stores object instances; almost all object instances are allocated here. When the heap has no free space left, an OutOfMemoryError (OOM) is thrown. The JVM manages the heap in generations according to object lifetimes, and the garbage collector reclaims the objects in it.

  • The method area is also shared by all threads and is also called the non-heap area. It stores data for the classes the virtual machine has loaded: class information, constants, static variables, and code produced by the just-in-time compiler. The permanent generation in JDK 1.7 and the Metaspace in JDK 1.8 are both implementations of the method area.

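     When answering this question in an interview, cover two key points: what each part does, and which parts are shared between threads and which are private to a thread.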
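
The split between the thread-private stack and the shared heap can be observed with a small experiment. The following is a minimal sketch, not a benchmark: the class name StackDepthDemo and its methods are made up for illustration, and the depth reached depends on the JVM and its stack size setting.

```java
class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes one more stack frame onto the calling thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the current thread's stack is exhausted; returns the depth reached. */
    static int measureMaxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Only this thread's stack overflowed; the heap is unaffected.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measureMaxDepth());
        // The array object lives on the shared heap; only the reference
        // `data` is stored in main's stack frame.
        int[] data = new int[1024];
        System.out.println("heap array length: " + data.length);
    }
}
```

Exhausting the stack raises StackOverflowError, while exhausting the heap raises OutOfMemoryError, which is a quick way to tell in practice which area ran out.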
    
Detailed explanation of JMM memory visibility

JMM stands for the Java Memory Model, which is different from the JVM memory model described above. Its main goal is to define the rules for accessing variables in a program. As shown in the figure below, all shared variables are stored in main memory. Each thread has its own working memory, which holds copies of the main-memory variables the thread uses. A thread must perform all reads and writes of a variable in its own working memory and cannot read or write main memory directly.

Consider two threads exchanging data: thread A assigns a value to a shared variable and thread B then reads it. A's modification happens in A's own working memory, where B cannot see it. Only after A's working memory is written back to main memory, and B reloads the variable from main memory into its own working memory, can B operate on the new value. Instruction reordering can also disrupt this write-then-read sequence. The JMM therefore has to provide guarantees of atomicity, visibility, and orderliness.

Detailed explanation of JMM guarantee

As shown in the figure below, let's see how JMM ensures atomicity, visibility, and orderliness.

Atomicity

The JMM guarantees that reads and writes of the primitive types other than long and double are atomic. The synchronized keyword also provides atomicity; for a synchronized block, this is implemented by the two bytecode instructions monitorenter and monitorexit.
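
As an illustration of the atomicity that synchronized provides, here is a minimal sketch of a counter incremented by several threads. The class name SynchronizedCounterDemo and the thread counts are assumptions chosen for the example.

```java
class SynchronizedCounterDemo {
    private final Object lock = new Object();
    private int count = 0;

    void increment() {
        // A synchronized block compiles to monitorenter ... monitorexit,
        // so the read-modify-write of count runs as one atomic unit.
        synchronized (lock) {
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    /** Starts `threads` workers that each call increment() `perThread` times. */
    static int run(int threads, int perThread) {
        SynchronizedCounterDemo counter = new SynchronizedCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000));
    }
}
```

With the lock in place the result always equals threads × perThread. Removing the synchronized block makes count++ a plain read-modify-write, and updates can then be lost.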

Visibility

The JMM provides visibility in two ways: through synchronized and through volatile. A write to a volatile variable is flushed to main memory immediately, and a read of a volatile variable is reloaded from main memory, so different threads always see the latest value of the variable.
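
The classic use of volatile for visibility is a stop flag. The sketch below is illustrative only; the class name VolatileFlagDemo and the timeout parameter are assumptions made for the example.

```java
class VolatileFlagDemo {
    // Without volatile, the worker is not guaranteed to ever observe the update.
    private static volatile boolean stop = false;

    /** Returns true if the worker saw the flag change and exited within the timeout. */
    static boolean run(long timeoutMillis) {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) {
                Thread.onSpinWait(); // busy-wait until the write becomes visible
            }
        });
        worker.setDaemon(true);
        worker.start();

        stop = true; // volatile write: visible to the worker's next volatile read
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + run(5_000));
    }
}
```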

Orderliness

Orderliness is guaranteed mainly by volatile and by a set of happens-before rules. Besides visibility, volatile also prevents instruction reordering around the variable, which keeps reads and writes of that variable in order.

The happens-before principle includes a series of rules such as:

  • The program order rule: within a single thread, execution must behave as if the statements ran in program order;

  • The lock rule: an unlock of a lock happens before every later lock of that same lock;

  • Transitivity of happens-before, plus the rules for thread start, interruption, and termination, among others.
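
The thread start and termination rules can be seen in a few lines. This is a minimal sketch with a made-up class name, HappensBeforeJoinDemo; the field is deliberately neither volatile nor guarded by a lock.

```java
class HappensBeforeJoinDemo {
    private static int plainField = 0; // deliberately not volatile

    static int run() {
        plainField = 0;
        Thread writer = new Thread(() -> plainField = 42);
        writer.start();        // start() happens-before every action in writer
        try {
            writer.join();     // every action in writer happens-before join() returning
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return plainField;     // therefore the write of 42 is visible here
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because join() establishes a happens-before edge, the caller is guaranteed to read the value written by the other thread even without volatile or synchronized.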

Detailed explanation of class loading mechanism

Class loading means reading the bytecode in a compiled .class file into memory, placing it in the method area, and creating the corresponding Class object. It is divided into loading, linking, and initialization, and linking itself has three steps: verification, preparation, and resolution (parsing). As shown below.

  1. Loading reads the class file into memory: the JVM locates the bytecode file by the class's fully qualified name and creates a Class object from it.

  2. Verification checks the content of the class file. Its purpose is to ensure that the class file meets the requirements of the current virtual machine and will not compromise the virtual machine's own security. It consists of four checks: file format verification, metadata verification, bytecode verification, and symbolic reference verification.

  3. Preparation allocates memory. Memory is allocated for class variables, that is, fields declared static, and they are set to initial values. Note that the initial value here is the zero value (0 or null), not the value written in the code; the value in the code is assigned during the initialization phase. Static fields that are declared final with a constant value are not covered by this rule, because the constant is already fixed at compile time.

  4. Resolution (parsing) mainly resolves fields, interfaces, and methods. It is the process of replacing symbolic references in the constant pool with direct references. A direct reference is a pointer, a relative offset, or something similar that points directly at the target.

  5. Initialization executes static blocks and assigns static variables. It is the final stage of class loading. If the parent class of the class being loaded has not been initialized yet, the parent class is initialized first.

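    A class is initialized only when it is actively used. The triggers for initialization include creating an instance of the class, accessing a static method or static variable of the class, loading the class reflectively with Class.forName(), and the initialization of one of its subclasses.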
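
The initialization triggers, and the special case of compile-time constants, can be checked with a small sketch. All names here (InitTriggerDemo, Parent, Child, LOG) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class InitTriggerDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent initialized"); }
    }

    static class Child extends Parent {
        static final int CONSTANT = 7; // compile-time constant, inlined by javac
        static int counter = 5;        // assigned during initialization
        static { LOG.add("Child initialized"); }
    }

    static List<String> run() {
        // Reading a compile-time constant is not an active use of Child.
        LOG.add("constant = " + Child.CONSTANT);
        // Reading a non-constant static field is the first active use:
        // Parent is initialized first, then Child.
        LOG.add("counter = " + Child.counter);
        return LOG;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

On a fresh JVM the log reads, in order: "constant = 7", "Parent initialized", "Child initialized", "counter = 5". The constant is read without initializing either class, and the parent is initialized before the child.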
    

As shown in the figure above, the two light green parts represent the life cycle of a class: from loading, through the creation and use of its instances, to the point where the Class object is no longer used and can be unloaded and reclaimed by GC. Note that classes loaded by the three class loaders built into the Java virtual machine are never unloaded during the lifetime of the virtual machine. Only classes loaded by user-defined class loaders can be unloaded.

Detailed explanation of class loader

As shown in the figure above, Java ships with three class loaders: the Bootstrap (startup) class loader, the extension class loader, and the application class loader (also called the system class loader). The orange text on the right of the figure shows the directory each loader loads from. The Bootstrap class loader loads classes from the lib directory under the Java home, the extension class loader loads classes from the ext directory, and the application class loader loads classes from the locations given by the classpath. Beyond these, you can define custom class loaders.

Java's class loading uses the parent delegation model: when a class loader is asked to load a class, it first delegates the request to its parent class loader. If the parent has a parent of its own, the request keeps being delegated upward until it reaches the top-level Bootstrap class loader, as shown by the blue upward arrows in the figure above. If a parent loader can load the class, it returns successfully. If it cannot, the child loader then tries to load the class itself, as shown by the orange downward arrows in the figure.

The benefit of the parent delegation model is that it avoids loading the same class more than once, and it prevents Java's core API from being tampered with.
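
The delegation chain can be inspected at run time by following getParent(). The following is a minimal sketch; the class name LoaderChainDemo is an assumption, and the concrete loader class names printed vary by JDK version (for example, JDK 9 and later report a platform class loader where older versions had the extension class loader).

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChainDemo {
    /** Walks from the loader of `type` up through getParent() to the Bootstrap loader. */
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap"); // the Bootstrap loader is represented as null in the Java API
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(loaderChain(LoaderChainDemo.class)); // application class, several loaders
        System.out.println(loaderChain(String.class));          // core class, Bootstrap loader only
    }
}
```

Core classes such as String are loaded by the Bootstrap loader no matter which loader receives the request first, which is why application code cannot replace them with its own versions.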

Detailed explanation of generational recycling

Java's heap is managed in generations. Why? Mainly to make garbage collection easier, based on two observations. First, most objects stop being used very soon after they are created. Second, some objects do not become garbage immediately, but they do not stay alive for very long either.

The virtual machine divides its memory into the young generation, the old generation, and the permanent generation, as shown in the figure below.

  • The young generation mainly holds newly created objects. It is divided into an Eden area and two Survivor areas. Most objects are created in Eden. When Eden fills up, a collection runs and the surviving objects are kept in the two Survivor areas, which are used alternately. Objects that survive a certain number of collections are promoted to the old generation.

  • The old generation is used to store objects that have been promoted from the young generation and have a longer survival time.

  • The permanent generation mainly stores class information and similar data. Here the term refers to a category of objects, not specifically to the PermGen of JDK 1.7 or the Metaspace of JDK 1.8 and later.

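     Based on the characteristics of the young and old generations, the JVM provides different garbage collection algorithms. By type, they can be divided into reference counting, copying, and mark-and-sweep.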
    
  • Reference counting decides whether an object is still in use by counting the references to it. Its drawback is that it cannot handle circular references.

  • The copying algorithm needs two memory spaces of equal size, from and to. Objects are allocated only in the from space. During collection, the surviving objects are copied to the to space and the from space is emptied. The two spaces then swap roles: the from space becomes the to space, and the to space becomes the from space. The drawback is lower memory utilization.

  • The mark-and-sweep algorithm has two phases: marking objects, and sweeping away the objects that are no longer in use. Its drawback is that it produces memory fragmentation.

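     The young-generation collectors provided by the JVM (Serial, ParNew, and Parallel Scavenge) all use the copying algorithm, while CMS, G1, and ZGC all belong to the mark-and-sweep family.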
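
To make the difference between reference counting and marking concrete, here is a toy mark-and-sweep over a hand-built object graph. It is a sketch of the idea only, not how HotSpot implements it; the Node class and all method names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class MarkSweepDemo {
    /** A toy heap object that only knows which other objects it references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Mark phase: collect everything reachable from the roots. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {       // first visit: follow its references
                pending.addAll(node.refs);
            }
        }
        return marked;
    }

    /** Sweep phase: remove every unmarked object from the heap; returns the names removed. */
    static List<String> sweep(List<Node> heap, Set<Node> marked) {
        List<String> swept = new ArrayList<>();
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            Node node = it.next();
            if (!marked.contains(node)) {
                swept.add(node.name);
                it.remove();
            }
        }
        return swept;
    }

    static List<String> run() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b); // root -> a -> b
        c.refs.add(d); // c and d reference each other (a cycle),
        d.refs.add(c); // but nothing reachable from the root points at them
        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));
        return sweep(heap, mark(List.of(a)));
    }

    public static void main(String[] args) {
        System.out.println("swept: " + run());
    }
}
```

The cycle between c and d is reclaimed because neither is reachable from a root. Under pure reference counting both would keep a count of 1 and never be freed.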
    
Detailed explanation of CMS algorithm

Building on generational collection theory, the following sections describe several typical garbage collectors in detail, starting with CMS. CMS can be called the most mainstream garbage collector before JDK 1.7. It uses the mark-and-sweep algorithm, and its advantages are concurrent collection and short pauses.

The CMS algorithm is shown in the figure below.

  1. The first phase is initial marking. This phase is stop-the-world (STW) and marks only the objects directly reachable from the root set;

  2. The second phase is concurrent marking, in which the GC threads run concurrently with the application threads and mark the reachable objects;

  3. The third phase is remarking, which is the second STW phase. Its pause is much shorter than the concurrent marking phase but slightly longer than initial marking. It mainly rescans and re-marks objects;

  4. The fourth phase is concurrent sweeping, which cleans up garbage concurrently with the application;

  5. The last phase is concurrent reset, which resets the related data structures for the next GC.

Detailed explanation of G1 algorithm

G1 became the JVM's default garbage collector in JDK 9. Its defining characteristic is that it reduces pauses while keeping collection efficiency high.

The G1 algorithm removes the physical division of the heap into a young generation and an old generation, but it is still a generational collector. It divides the heap into a number of areas called Regions, shown as the small squares in the figure below. Some Regions serve the young generation, some serve the old generation, and a special kind of Region stores humongous objects.

Like CMS, G1 traverses all objects and marks object references. After clearing dead objects, it copies and moves the contents of Regions to consolidate fragmented space.

The G1 recycling process is as follows.

  1. G1's young-generation collection uses a copying algorithm and collects in parallel. The collection is stop-the-world (STW).

  2. When G1 collects the old generation, it collects the young generation as well. This collection has four main phases:

     • Initial marking, which marks the root objects. This phase is STW;

     • Concurrent marking, which runs concurrently with the user threads;

     • Final marking, which completes the tri-color marking cycle;

     • Copy/cleanup, which gives priority to the Regions with the most reclaimable space, that is, garbage first. This is where the name G1 comes from.

G1 cleans incrementally: each collection cleans only some of the Regions rather than all of them, which keeps each GC pause from becoming too long.

To summarize, G1's generations are logical rather than physical. You need to know its collection process and which phases pause the application. You should also know that G1 lets you set the Region size through a JVM parameter, in the range of 1 MB to 32 MB, and lets you set a target for the maximum GC pause time. Interested readers can also look briefly into the tri-color marking algorithm used by CMS and G1.
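
On HotSpot, the two settings mentioned above correspond to the flags -XX:G1HeapRegionSize and -XX:MaxGCPauseMillis, used together with -XX:+UseG1GC (for example, java -XX:+UseG1GC -XX:G1HeapRegionSize=4m -XX:MaxGCPauseMillis=200 ..., where the values are purely illustrative). To check which collectors a running JVM actually uses, the standard management API can be queried. The class name ActiveCollectorsDemo below is made up for this sketch.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectorsDemo {
    /** Returns the names of the garbage collectors registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated time since JVM start, per collector.
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + ", timeMs=" + bean.getCollectionTime());
        }
    }
}
```

Running the same program with different GC flags is a quick way to confirm that a flag took effect before looking at pause times.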

Detailed explanation of ZGC
ZGC Features

ZGC is an efficient garbage collector introduced in JDK 11 as an experimental feature. It is designed for large heaps and can support TB-level heaps. ZGC is very efficient and can keep collection pause times below 10 ms.

How does ZGC achieve such a fast response? This is due to the following characteristics of ZGC.

  1. ZGC uses colored pointers. On a 64-bit platform a pointer has 64 bits available. ZGC limits itself to a heap of at most 4 TB, so only 42 bits are needed for addressing and the remaining 22 bits can store extra information. Colored pointers use those extra bits to record an object's marking state in the pointer itself.

  2. The second feature is read barriers. ZGC uses read barriers to handle the case where GC threads and application threads modify object state concurrently, rather than crudely locking everything with an STW pause. A read barrier can at most slow down the handling of a single object.

  3. Thanks to read barriers, garbage collection does not need STW most of the time, so ZGC does most of its work concurrently. This is its third feature.

  4. The fourth feature is that it is Region-based, like G1. However, although the heap is divided into Regions, it is not divided into generations. Unlike G1, ZGC Regions are not fixed in size: the size is decided dynamically, and Regions can be created and destroyed dynamically. This allows better allocation management for large objects.

  5. The fifth feature is compaction. CMS reclaims objects in place, which leads to memory fragmentation. Like G1, ZGC moves and merges the objects in a Region after collection, which solves the fragmentation problem.
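
The arithmetic behind colored pointers can be checked directly: 42 address bits cover exactly 4 TB, and metadata can ride in the bits above them. The sketch below only illustrates the bit manipulation. The position of MARKED_BIT is a hypothetical choice for this example and is not ZGC's actual pointer layout.

```java
class ColoredPointerSketch {
    static final int ADDRESS_BITS = 42;                        // from the text: 42 bits address 4 TB
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1; // the low 42 bits
    // Hypothetical metadata bit placed just above the address bits.
    static final long MARKED_BIT = 1L << ADDRESS_BITS;

    /** Number of bytes addressable with ADDRESS_BITS bits. */
    static long addressableBytes() {
        return 1L << ADDRESS_BITS;
    }

    /** Colors a pointer by setting the metadata bit; the address bits are untouched. */
    static long color(long address) {
        return address | MARKED_BIT;
    }

    static boolean isMarked(long pointer) {
        return (pointer & MARKED_BIT) != 0;
    }

    /** Strips the metadata bits and returns the plain address. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    public static void main(String[] args) {
        long tb = 1L << 40;
        System.out.println("addressable TB: " + addressableBytes() / tb);
        long colored = color(0x1234L);
        System.out.println("marked=" + isMarked(colored) + ", address=0x" + Long.toHexString(address(colored)));
    }
}
```

Because the marking state travels with the pointer, a read barrier can inspect it at the moment a reference is loaded, without touching the object itself.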

    Although ZGC runs concurrently most of the time, it still has brief pauses. Let's look at the ZGC collection process.
    
ZGC recycling process

The figure below shows a ZGC collection, read from top to bottom. In the initial state, the heap is divided into many Regions of different sizes, shown as the green squares in the figure.

When starting to recycle, ZGC will first perform a short STW to mark the roots. This step is very short because the total number of roots is usually small.

Concurrent marking then begins. As shown in the figure above, marking is done by coloring object pointers, and read barriers resolve concurrent access to individual objects. There is in fact a very short STW pause at the end of this phase to handle some edge cases. Because the phase runs concurrently most of the time, that pause is not shown explicitly in the figure.

The next step is the cleanup phase, which will recycle objects marked as no longer in use. As shown in the figure above, the orange objects that are no longer in use are recycled.

The last stage is relocation. Relocation is to move the surviving objects after GC to release large blocks of memory space and solve the fragmentation problem.

Relocation begins with a short STW pause that relocates the root objects in the relocation set. The length of the pause depends on the number of roots and on the ratio of the relocation set to the total live set of objects.

Finally, there is concurrent relocation. This process is also performed concurrently with the application thread through the read barrier.

Inspection points and bonus points
Inspection points

A summary of the JVM-related interview inspection points is as follows:

  1. A thorough understanding of the JVM memory model and the Java memory model;

  2. An understanding of the class loading process and the parent delegation mechanism;

  3. An understanding of memory visibility and of how the Java memory model guarantees atomicity, visibility, and orderliness;

  4. An understanding of the characteristics, execution process, and suitable scenarios of the commonly used GC algorithms. For example, G1 suits scenarios with a maximum-latency requirement, and ZGC suits large-memory services on 64-bit systems;

  5. An understanding of the commonly used JVM parameters, the effect of adjusting each one, and the scenarios each applies to, such as the degree of garbage collection concurrency and the biased locking settings.

Bonus points

If you want to make a better impression on your interviewer, pay attention to these bonus points.

  1. A deep understanding of compiler optimization shows the interviewer that you pursue technical depth. For example, you know how to make sensible use of stack allocation to reduce GC pressure, or how to write code that is suitable for inlining.

  2. Experience with, or ideas for, troubleshooting real production problems is a plus. Interviewers like candidates with strong hands-on skills, for example, someone who has resolved frequent Full GC problems in production or tracked down a memory leak.

  3. JVM optimization practice, or optimization ideas for specific scenarios, can have an outsized effect. Examples include how to adjust GC parameters to minimize GC pause time in high-concurrency, low-latency scenarios, or how to maximize the throughput of queue processors;

  4. Some awareness of the latest JVM technology trends also leaves a strong impression, for example, understanding how ZGC achieves its efficiency or knowing the characteristics of GraalVM.

Summary of real questions

Here is a summary of real JVM interview questions. The first group of questions is shown below; you can practice them after class.

The problem-solving ideas are as follows.

  • Question 1: the Java memory model was covered earlier. When you get this question in an interview, remember to confirm with the interviewer whether they mean the JVM memory model or Java's memory access model, so that you do not answer the wrong question.

  • Question 2: review the scenarios that trigger a Full GC, such as the old generation lacking space when objects are promoted from the young generation, or the permanent generation running out of space.

  • Questions 3 to 6 have been explained previously, so they will not be repeated.

The real questions for the second part are as follows.

The problem-solving ideas are as follows.

  • Question 7: for volatile, focus on two points: it forces reads and writes to synchronize with main memory, and it prevents instruction reordering.

  • Questions 8 and 9 have been discussed before.

  • Question 10: introduce the four reference types (strong, soft, weak, and phantom) and how the GC treats each of them.

  • Question 11: get to know what the common Java tools do, such as Flight Recorder in JMC, the heap analysis tool MAT, the thread analysis tool jstack, and jmap for obtaining heap information.
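
For the reference types in question 10, the java.lang.ref classes can be tried out directly. The following is a minimal sketch with a made-up class name, ReferenceTypesDemo. It only shows behavior that is guaranteed while a strong reference still exists, because the moment at which the GC clears soft and weak references is not deterministic.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceTypesDemo {
    /** Returns {softSeesObject, weakSeesObject, phantomGetIsNull}. */
    static boolean[] run() {
        Object strong = new Object(); // strong: never collected while reachable

        // Soft: cleared only when memory runs low, which suits caches.
        SoftReference<Object> soft = new SoftReference<>(strong);
        // Weak: eligible for clearing at the next GC once no strong reference remains.
        WeakReference<Object> weak = new WeakReference<>(strong);
        // Phantom: never hands out the object; enqueued for cleanup after collection.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        boolean softSeesObject = soft.get() == strong;
        boolean weakSeesObject = weak.get() == strong;
        boolean phantomGetIsNull = phantom.get() == null; // get() always returns null
        return new boolean[] {softSeesObject, weakSeesObject, phantomGetIsNull};
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("soft sees object: " + result[0]);
        System.out.println("weak sees object: " + result[1]);
        System.out.println("phantom get() is null: " + result[2]);
    }
}
```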

That concludes this lesson. The next lesson covers another very important Java topic: multi-threading.

Origin blog.csdn.net/g_z_q_/article/details/129758562