Simple analysis of Safepoint in JVM



1

A first look at safepoints: safepoints in GC

My first contact with the concept of safepoints in the JVM came from the garbage collector chapter of the book "In-depth Understanding of Java Virtual Machine". I suspect most people first learn about safepoints the same way, so let us start by reviewing what that chapter says about them.

The book introduces safepoints in the garbage collection algorithms part of the garbage collector chapter. To enumerate GC Roots quickly and accurately without wasting a large amount of storage on an OopMap for every instruction, the JVM generates OopMaps only at "specific locations", and these locations are called safepoints. The book then gives the criterion for choosing safepoint locations: whether the program could keep running for a long time at that point. Safepoints are therefore generated at method calls, loop jumps, exception jumps, and similar places.

The book also describes two ways the JVM can make user threads stop at the nearest safepoint during GC: preemptive interruption and active interruption. Preemptive interruption needs no cooperation from the thread's code. When a GC occurs, the system first interrupts all user threads; any thread found not to be at a safepoint is resumed and interrupted again a little later, until it reaches one. With active interruption, the GC does not operate on threads directly. It simply sets a flag, and each thread polls this flag as it runs; once a thread finds the flag set, it suspends itself at the nearest safepoint. Almost all virtual machine implementations today use active interruption to pause threads in response to GC events.
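The active interruption scheme described above can be imitated in plain Java. The following is only a toy sketch and not how HotSpot implements it: the class ToySafepoint and its methods poll, begin, end and demo are names invented here for illustration. A volatile flag plays the role of the interrupt flag, each worker polls it once per loop iteration, and the requesting thread waits until every worker has parked.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of active interruption; every name in this class is hypothetical.
class ToySafepoint {
    private volatile boolean requested = false; // stand-in for the JVM's interrupt flag
    private int parked = 0;                     // number of parked workers, guarded by "this"

    // Workers call this at their "safepoints", here once per loop iteration.
    void poll() throws InterruptedException {
        if (!requested) return;          // fast path: a single volatile read
        synchronized (this) {
            parked++;
            notifyAll();                 // let the requester re-check the parked count
            while (requested) wait();    // stay suspended until the operation is over
            parked--;
        }
    }

    // Requester: raise the flag, then wait until all workers have parked.
    synchronized void begin(int workers) throws InterruptedException {
        requested = true;
        while (parked < workers) wait();
    }

    // Requester: lower the flag and wake the parked workers.
    synchronized void end() {
        requested = false;
        notifyAll();
    }

    // Returns true if no worker made progress while the "world" was stopped.
    static boolean demo() {
        ToySafepoint sp = new ToySafepoint();
        AtomicLong work = new AtomicLong();
        AtomicBoolean running = new AtomicBoolean(true);
        Runnable worker = () -> {
            try {
                while (running.get()) {
                    work.incrementAndGet(); // "user code"
                    sp.poll();              // safepoint poll
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread t1 = new Thread(worker, "toy-1");
        Thread t2 = new Thread(worker, "toy-2");
        t1.start();
        t2.start();
        try {
            sp.begin(2);                 // returns once both workers are parked
            long before = work.get();
            Thread.sleep(50);
            long after = work.get();     // unchanged if both workers are really parked
            sp.end();
            running.set(false);
            t1.join();
            t2.join();
            return before == after;
        } catch (InterruptedException e) {
            running.set(false);
            sp.end();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("no progress while stopped: " + demo());
    }
}
```

Note that begin cannot return until the slowest worker reaches its next poll. A worker that never polls would keep the requester waiting indefinitely, which is the same shape of problem the rest of this article investigates.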

To summarize what this first encounter with safepoints teaches:

  • The JVM GC needs user threads to stop at a safepoint (Stop The World, STW).

  • The JVM places safepoints at method calls, loop jumps, exception jumps, and similar locations.

  • The JVM reaches a global STW through active interruption: it sets a flag, each thread polls the flag while running, and a thread that finds the flag set suspends itself at the nearest safepoint.

That is essentially everything the book "In-depth Understanding of Java Virtual Machine" says about JVM safepoints.

Later, after running into some production problems and some interesting safepoint examples on the Internet, I realized that safepoints are not that simple. GC is not the only user of safepoints; carelessly written code, however simple, can cause baffling problems because of them; and their implementation inside the JVM and the JIT's optimizations around them are often confusing. Starting from the points summarized above, this article uses a simple piece of sample code and a few "why" questions to build a more complete understanding of safepoints.

2

In-depth analysis of Safepoint through a sample code

2.1 Sample code

The sample code can be copied and run locally as is. All sample code in this article was run on JDK 1.8.

import java.util.concurrent.atomic.AtomicInteger;

public class SafePointTest {

    public static AtomicInteger counter = new AtomicInteger(0);

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();
        // Each child thread increments the shared counter 100 million times.
        Runnable runnable = () -> {
            System.out.println(interval(startTime) + "ms: child thread " + Thread.currentThread().getName() + " starts running");
            for (int i = 0; i < 100000000; i++) {
                counter.getAndAdd(1);
            }
            System.out.println(interval(startTime) + "ms: child thread " + Thread.currentThread().getName() + " finishes running, counter=" + counter);
        };
        Thread t1 = new Thread(runnable, "zz-t1");
        Thread t2 = new Thread(runnable, "zz-t2");
        t1.start();
        t2.start();
        System.out.println(interval(startTime) + "ms: main thread starts sleep.");
        Thread.sleep(1000L);
        System.out.println(interval(startTime) + "ms: main thread ends sleep.");
        System.out.println(interval(startTime) + "ms: main thread ends, counter:" + counter);
    }

    // Milliseconds elapsed since startTime.
    private static long interval(long startTime) {
        return System.currentTimeMillis() - startTime;
    }
}

In the sample code, the main thread starts two child threads and then sleeps for 1s. Timestamps are printed to observe how the main thread and the child threads execute.

In theory, the main thread and the two child threads run concurrently and independently, with no explicit dependency between them, so the child threads should not affect the main thread: it should finish right after its 1s sleep. The actual result, however, is not what we expect.

[Animation: execution result of the sample code, duration 00:15]

Judging from the result, the main thread goes to sleep after starting the two threads. The code asks for a 1s sleep, yet the main thread does not wake up until more than 3s later. What made it oversleep? The main thread's sleep ends at the same moment the child threads finish, so we have reason to suspect that the main thread failed to wake up on time because it was blocked by the two child threads.

2.2 The conclusion first

Some operations of the VM Thread require STW. Before the main thread's sleep ended, the JVM started a global safepoint; the main thread entered it and then had to wait for all other threads to enter it as well. The main thread was therefore blocked by the threads that took a long time to reach the safepoint.

2.3 Verifying the conclusion

Add the JVM option -XX:+PrintSafepointStatistics, which prints the safepoint log, and run the sample code again. The result is shown in the screenshot below:

The safepoint log shows that the JVM wants to execute the operation "no vm operation", which requires all threads to enter the safepoint. There are 12 threads in total, two of which are still running. The JVM has to wait for these two threads to enter the safepoint, and that wait takes 2251ms.

After adding the options -XX:+SafepointTimeout and -XX:SafepointTimeoutDelay=2000 and running the code again, you can see exactly which two threads the JVM is waiting for.

Sure enough, the two threads that have not reached the safepoint are zz-t1 and zz-t2, the threads defined in the sample code.

2.4 Why

At this point we have a conclusion about the result and have verified it, so the cause is basically known. On closer thought, though, what we learned in our first encounter with safepoints cannot fully explain it. To really understand what is going on, let us ask a few "why" questions.

(1) Why is a safepoint entered

In other words, what triggers entry into a safepoint?

From what we learned earlier, two conditions must hold for a thread to enter a safepoint:

  • The active interrupt flag has been set by some JVM operation

  • The running code contains a safepoint

The first trigger that comes to mind is a GC making the JVM set the active interrupt flag. But running the sample code with -XX:+PrintGC prints no GC log, so GC can be ruled out.

Since it is not GC, we go back to the safepoint log for clues and find the vmop (virtual machine operation type) "no vm operation". Some experts on the Internet have worked out what "no vm operation" means by analyzing the JVM source code. Rather than walk through the source in detail, here is their conclusion:

While the JVM is running normally, if a safepoint interval is configured, it checks at that interval whether there is code cache to clean up, and if so it enters a safepoint. This trigger is not a VM operation, so _vmop_type is set to -1 and the log prints "no vm operation", which is the safepoint log entry we saw.

When there is no VM operation to run, the JVM still enters a safepoint as long as the following three conditions are met:

1. The VMThread is running normally

2. A safepoint interval is configured

3. SafepointALot is true, or cleanup is needed

Use the command java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal 2>&1 | grep Safepoint to view the JVM's default safepoint-related parameters:

GuaranteedSafepointInterval defaults to 1 second (1000ms), so the JVM tries to enter a safepoint every 1s.

So let us change the value of GuaranteedSafepointInterval and see whether that avoids entering the safepoint.

GuaranteedSafepointInterval is a JVM diagnostic parameter, so changing it requires -XX:+UnlockDiagnosticVMOptions as well.

Changing this parameter in production is not recommended.

  • Turn off timed safepoint entry

Turn off timed safepoint entry with -XX:GuaranteedSafepointInterval=0 and see how the code runs:

The result shows that with timed safepoint entry turned off, the main thread sleeps for 1s and finishes normally, without being blocked by the other threads. In the safepoint log, the two threads waiting to enter the safepoint are gone.

  • Increase the interval of timed safepoint entry

The printed results show that the child threads run for a little over 3s. If the interval is raised to 5s, so that the JVM first tries to enter a safepoint only after the child threads have finished, can we avoid waiting for the child threads?

Set -XX:GuaranteedSafepointInterval=5000 to change the safepoint interval and run again:

The result shows that increasing the safepoint interval has the same effect as turning timed entry off: the main thread no longer waits for the child threads to enter a safepoint.

(2) Where does the main thread enter the safepoint

With the default JVM parameters, the main thread sleeps for more than 3s. In fact, the main thread enters the safepoint inside the Thread.sleep() method. Here is a brief look at the source code of the JVM's safepoint implementation:

Safepoint implementation source code: safepoint.cpp

Reading the source is hard work, so let us read the comments instead; fortunately, they contain the answer. The comment in the screenshot above says that when a safepoint is requested, a Java thread may be in one of five states, each handled differently. Suppose some operation on the VM thread requires all threads to enter a safepoint. If another thread is currently:

  • Running bytecode: the interpreter checks whether the thread is marked as poll armed; if so, SafepointSynchronize::block(JavaThread *thread) is called to block it.

  • Running native code: the VM thread skips this thread but marks it as poll armed. When the native code returns, the thread checks the mark and, if it still needs to stop at the safepoint, blocks right away.

  • Running JIT-compiled code: because compiled machine code is running, the thread checks directly whether its thread-local polling page is dirty, and blocks if it is. Checking only a thread-local polling page became possible with JEP 312: Thread-Local Handshakes, introduced in Java 10.

  • In the BLOCK state: the thread may not leave the BLOCK state until the operation that requires all threads to be at the safepoint has completed.

  • In a state transition or running in the VM: the VM thread keeps polling the thread's state until the thread is blocked (the thread is bound to move into one of the four states above, each of which ends in a block).

Now look at the declaration of Thread.sleep: it is a native method, which matches the part of the safepoint.cpp comment highlighted by the red box in the screenshot above.

The magic of Thread.sleep(0) in RocketMQ

This is a piece of code from RocketMQ. The earliest version, written in 2016, calls Thread.sleep(0) once every 1000 iterations of a for loop. It looks like useless code, but the author's real purpose is to place a safepoint there, so that a long-running for loop does not cause a long STW for the whole system. According to the code's change history, someone rewrote it in September 2022: the loop variable was changed to type long, and the Thread.sleep(0) call inside the loop was commented out. Why the code can be written this way, and why Thread.sleep(0) was there in the first place, we set aside for now.
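The pattern described above can be sketched as follows. This is not the RocketMQ source, only a hypothetical loop of the same shape: the class SleepZeroLoop, the method touchPages and the constant PAGE_SIZE are assumptions made for illustration. The loop writes one byte per page and calls Thread.sleep(0) once every 1000 iterations.

```java
// Hypothetical illustration of the "Thread.sleep(0) every 1000 iterations" pattern.
class SleepZeroLoop {
    static final int PAGE_SIZE = 4096; // assumed page size, for illustration only

    // Writes one byte per page and returns the number of pages touched.
    static int touchPages(byte[] buffer) {
        int touched = 0;
        for (int i = 0, j = 0; i < buffer.length; i += PAGE_SIZE, j++) {
            buffer[i] = 0;
            touched++;
            if (j % 1000 == 0) {
                try {
                    // Native call inside an int-indexed loop: gives the thread
                    // a place where it can stop for a pending safepoint.
                    Thread.sleep(0);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        System.out.println(touchPages(new byte[64 * 1024 * 1024]) + " pages touched");
    }
}
```

Calling sleep only every 1000th iteration keeps the cost of the native call small while still bounding how long the loop can run without offering a stopping point.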

(3) Why the child threads cannot enter the safepoint

We now know why the main thread enters the safepoint and where it does so. But according to what we learned, the JVM places safepoints at loop jumps and method calls. So why do the child threads not enter the safepoint?

Counted loops and uncounted loops

To avoid the overhead of too many safepoints, the JVM applies an optimization to loops. It assumes that a loop with a small number of iterations will not run for long, so by default it places no safepoint in a loop whose index is an int or a narrower type. Such a loop is called a counted loop. Conversely, a loop whose index is a long or a wider type is called an uncounted loop, and a safepoint is placed in it.

In the sample code, the loop index of the child threads is an int, so the loop is a counted loop and the JVM places no safepoint at the loop jump.

Change the loop index to long and the loop becomes an uncounted loop. A safepoint is then placed at the loop jump, so the child threads can no longer stay out of the safepoint for a long time and block the main thread.

The execution result shows that once the loop index is a long, the main thread wakes up right after its 1s sleep without waiting for the child threads to finish.
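For reference, the change just described amounts to something like the sketch below. It assumes the same JDK 1.8 environment as the rest of the article; the class LongIndexLoop, the method countWithLongIndex and the thread name zz-long are invented for this illustration and are not part of the original sample.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical variant of the sample's child-thread loop with a long index.
class LongIndexLoop {

    // Same loop body as the sample, but the index is a long instead of an int.
    static int countWithLongIndex(long iterations) {
        AtomicInteger counter = new AtomicInteger(0);
        for (long i = 0; i < iterations; i++) { // long index: an uncounted loop
            counter.getAndAdd(1);
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        Thread t = new Thread(() -> countWithLongIndex(100_000_000L), "zz-long");
        t.start();
        Thread.sleep(1000L);
        System.out.println("main woke up after " + (System.currentTimeMillis() - start) + "ms");
        t.join();
    }
}
```

Only the type of the loop variable differs from the original sample; the work done per iteration is the same.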

This also explains the change to the RocketMQ code mentioned above: changing the loop index to long achieves the same goal of placing a safepoint as the Thread.sleep(0) inside the loop, and can replace it.

You can also use -XX:+UseCountedLoopSafepoints to turn off the JVM's optimization of omitting safepoints in counted loops. As the following result shows, with -XX:+UseCountedLoopSafepoints added the program behaves as expected.

One more question

Look closely at the sample code: the loop body of the child threads calls the getAndAdd method of AtomicInteger. Digging into the JDK implementation of getAndAdd shows that it ends up calling sun.misc.Unsafe#getIntVolatile, which, like Thread.sleep, is a native method. So why does the thread not enter the safepoint there, as it does in Thread.sleep?

The unwelcome answer is that the call has been optimized, and the optimizer is the JIT. To verify this, turn off the JIT with -Djava.compiler=NONE and look at the result.

With JIT optimization turned off, the main thread does wake up right after its 1s sleep, but the child threads run far longer than with the JIT on. So the JIT brings real performance gains, and occasionally some strange behavior along with them.
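To make the getAndAdd discussion more concrete: as far as I know, in JDK 8 the underlying Unsafe helper is a retry loop of a volatile read followed by a compare-and-swap, and the JIT can compile such calls inline instead of performing a real native call. The sketch below imitates that shape with the public AtomicInteger API. The class CasLoopSketch and the method getAndAddByCas are hypothetical names, and this is not the JDK's own code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical re-creation of the read-then-CAS retry loop behind getAndAdd.
class CasLoopSketch {

    // Adds delta atomically and returns the value seen before the addition.
    static int getAndAddByCas(AtomicInteger value, int delta) {
        int current;
        do {
            current = value.get();                                // volatile read
        } while (!value.compareAndSet(current, current + delta)); // retry if another thread won
        return current;
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(0);
        for (int i = 0; i < 5; i++) {
            getAndAddByCas(counter, 1);
        }
        System.out.println("counter=" + counter.get());
    }
}
```

Once compiled this way, the loop body contains no call that transitions through native code, which fits the observation that the child threads offer no stopping point until the loop ends.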

3

A more comprehensive definition of safepoints

Going beyond the GC-centric view from our first encounter, here is a more comprehensive definition:

A safepoint can be understood as a special position in code execution at which a thread can be suspended. At a safepoint, a thread stores information about its running state that is unavailable at other positions, so that other threads can read it. This includes any information about the thread's context, such as internal pointers to objects or non-objects. The usual way to understand a safepoint is this: only when a thread has reached a safepoint is all of its state well defined, and only then can we tell which memory the thread is using and which it is not. Likewise, only when a thread is at a safepoint will it perceive changes made to the JVM's stack information, such as the reclamation of unused memory, before it continues running. Each thread effectively holds a snapshot of its own memory usage; if other threads change memory in the meantime, the thread does not know, and it perceives the change only when it reaches a safepoint.

4

When is a safepoint entered

Threads enter a safepoint when the VM Thread needs to perform a VM operation. There are many types of VM operations; see VM_OP_ENUM in the source file vmOperations.hpp. The following situations commonly lead to a safepoint:

(1) GC: garbage collection needs object usage information from every thread, reclaims objects, and frees heap or direct memory, so it has to enter a safepoint and stop the world.

(2) Timed safepoint entry: every -XX:GuaranteedSafepointInterval, all threads enter a safepoint and resume as soon as all of them have arrived. This timer exists mainly to run tasks that need a stop-the-world pause but not an immediate one. Set -XX:GuaranteedSafepointInterval=0 to turn it off.

(3) Commands such as jstack, jmap, and jstat: these commands need to collect stack information, so all threads must enter a safepoint and pause.

(4) Biased lock revocation: most locks are uncontended (a synchronized block is rarely contended by several threads at once), so biasing can improve performance. When the thread that previously acquired the lock acquires it again without contention, it only checks whether the lock is biased toward itself; if so, it enters the synchronized block directly without acquiring the lock again. Under high concurrency, however, the bias often fails and has to be revoked. Revocation requires a stop-the-world pause, because the JVM needs the lock state and running state of every thread using the lock.

(5) Agent loading and class redefinition through Java Instrumentation: redefining a class means modifying the information about that class on the stacks, so a stop-the-world pause is required.

(6) Java code cache operations: when JIT compilation, optimization, or deoptimization requires OSR, bailout, or code cache cleanup, the JVM has to read and change the methods that threads are executing, so a stop-the-world pause is required.
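Item (3) in the list above also has an in-process counterpart: a thread dump can be requested from Java code through the standard ThreadMXBean API. My understanding, which should be treated as an assumption rather than something the article confirms, is that HotSpot collects such a dump at a safepoint much as it does for jstack. The class InProcessThreadDump and the method dumpThreadNames are names invented for this sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that takes a thread dump from inside the running JVM.
class InProcessThreadDump {

    // Returns the names of all live threads found in a full thread dump.
    static List<String> dumpThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // false, false: skip locked-monitor and locked-synchronizer details
        ThreadInfo[] infos = threads.dumpAllThreads(false, false);
        List<String> names = new ArrayList<>();
        for (ThreadInfo info : infos) {
            if (info != null) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : dumpThreadNames()) {
            System.out.println(name);
        }
    }
}
```

If the assumption holds, calling a helper like this on a hot path or on a short timer would add safepoint pauses of its own, so it is best reserved for diagnostics.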

5

Avoid Safepoint Side Effects

To some extent, safepoints exist in order to stop all user threads (Stop The World). STW is bad news for an application, and the JVM tries hard to avoid STW and shorten STW time, whether in GC or in other VM operations.

The main side effect of safepoints is that they can make STW pauses too long, and this should be avoided as far as possible.

For the first thread that enters a safepoint, STW begins the moment it enters. If some thread fails to reach the safepoint for a long time, the whole process of entering the safepoint stays in a waiting state, and the STW time becomes too long. So avoid code that runs for a long time without reaching a safepoint.

A counted loop whose body runs for a long time, and JIT optimizations that remove safepoint opportunities, are the most common reasons a thread cannot enter a safepoint. When writing a large loop, you can declare the loop index as long.

In highly concurrent applications, biased locking brings no performance gain; instead, revoking biased locks sends threads into safepoints unnecessarily. It is therefore recommended to turn it off with -XX:-UseBiasedLocking.

Commands such as jstack, jmap, and jstat also cause safepoint entry. Production environments should therefore turn off thread dump switches, so that a long dump does not cause a long STW pause for the application.



Origin blog.csdn.net/zyqytsoft/article/details/131075899