Tencent Kona JDK11 pauseless memory management: ZGC production practice


Source | InfoQ
Author | Wang Chao
Planning | Tian Xiaoxu
Photos | Pexels

The Tencent Big Data JVM team develops Tencent Kona JDK11, its in-house distribution based on OpenJDK11. The ZGC feature in Kona JDK11 has now matured after a period of incubation and performs better than the version shipped in OpenJDK, making it easy to build strongly real-time online services in Java with ms-level response times while improving R&D and operations efficiency. It is already deployed in multiple business scenarios within Tencent, where it has improved business latency SLAs by 2-3 orders of magnitude.

With the official release of Tencent Kona JDK 11.0.10-GA on April 30, 2021, the production-ready ZGC is also officially open-sourced.

Background

After more than 20 years of development, the Java ecosystem has grown enormously, with applications ranging from embedded devices to large data centers and covering many kinds of business. Different businesses care about different things. Some offline applications focus on the throughput of the whole system and pay little attention to the pause time of a single process; others place strict requirements on GC pause time, such as the following:

  1. Interactive online services. The UI thread that interacts with the user must refresh the screen at a fixed frequency. For example, an ordinary 60Hz display must refresh 60 times per second while playing an animation to keep it smooth, so each refresh must complete within about 15ms. If a GC pause suspends the UI thread during this window, the screen tears or stutters and the user experience suffers.
  2. Ad bidding. In bidding scenarios such as Real Time Bidding, when an ad slot requests an ad, advertisers bid against each other based on the value of the current user and must close the deal within a specified time (usually 100ms to 200ms); otherwise an ad impression opportunity is lost. Controlling GC pauses is therefore very important.
  3. Quantitative trading. When a trading opportunity arises, trading institutions need to complete the transaction as quickly as possible, so the real-time requirements are even stricter. A delay caused by a GC pause may mean a missed opportunity or even a loss.

To meet these different needs, Java's GC algorithms keep evolving. Choosing the GC algorithm that best fits a given application helps the business reach its goals more efficiently. For latency-sensitive applications, however, GC pauses have become a stubborn problem that holds back wider adoption of Java, and a more suitable GC algorithm is needed.

In recent years servers have become more and more powerful, and the heap memory available to applications keeps growing. Heaps of 10GB to 100GB are common, and some machines reach the TB level. For these large-heap applications, the pause time of traditional GCs such as CMS and G1 grows with the heap: when the heap size multiplies, the pause time multiplies as well. When a Full GC is triggered, the pause can last minutes. When a business application has to meet a demanding service level agreement (SLA), for example 99.99% of responses within 100ms, CMS and G1 cannot meet the requirement.

To meet the demand for ultra-low pauses and high SLAs, and to cope with large and very large heaps, ZGC (A Scalable Low-Latency Garbage Collector) arrived with the release of JDK 11 in 2018. As a downstream branch of the OpenJDK Hotspot VM, Tencent Kona JDK from the Tencent Big Data JVM team is likewise committed to providing a production-ready ZGC on the LTS version JDK11 to serve the company's internal customers.

GC pauses

1.1 The origin of the pause

On the Hotspot virtual machine, GC algorithms are based on Mark-Sweep or Mark-Compact and are collectively known as Tracing GC. This kind of algorithm first finds all live objects by marking, and then either clears the dead objects or copies all live objects to another area in order to reclaim memory. Every Tracing GC therefore needs the following three steps; the overall process is shown in the animation below:

[Figure: animation of the Tracing GC process]

  • Find the set of GC Roots: this is the starting point of a Tracing GC algorithm. GC Roots are mainly pointers to heap objects held in key runtime data structures, such as heap object pointers on thread stacks.
  • Marking: traverse the entire object graph starting from the GC Roots to find all live objects. The objects left unmarked are dead.
  • Cleanup: reclaim dead objects and release the memory they occupy. The memory of dead objects can be released in place, or all live objects can be moved into a contiguous area and the original space released.
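The three steps above can be sketched on a toy object graph. This is only an illustration under simplifying assumptions: the `Obj` class, the explicit `heap` list, and the single-threaded traversal are hypothetical stand-ins, not how Hotspot represents objects or roots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TracingGcSketch {
    /** A toy heap object (hypothetical): a name plus outgoing references. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Marking: traverse the object graph from the roots and collect every reachable object. */
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Obj o = worklist.pop();
            if (marked.add(o)) {          // first visit: follow its references
                worklist.addAll(o.refs);
            }
        }
        return marked;
    }

    /** Cleanup: drop every unmarked object from the heap and return the names reclaimed. */
    static List<String> sweep(List<Obj> heap, Set<Obj> marked) {
        List<String> reclaimed = new ArrayList<>();
        heap.removeIf(o -> {
            boolean dead = !marked.contains(o);
            if (dead) {
                reclaimed.add(o.name);
            }
            return dead;
        });
        return reclaimed;
    }

    /** Builds a four-object heap in which only a and b are reachable, then collects it. */
    static List<String> demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);                    // unreachable cycle: tracing still reclaims it
        List<Obj> heap = new ArrayList<>(List.of(a, b, c, d));
        List<Obj> roots = List.of(a);     // find the GC Roots (here: one "stack" reference)
        return sweep(heap, mark(roots));
    }

    public static void main(String[] args) {
        System.out.println("reclaimed: " + demo());
    }
}
```

If another thread added or removed references while `mark` was running, the marked set could miss a live object, which is the consistency problem that motivates pausing threads or adding barriers.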

All three steps need a consistent view of the runtime. The GC Roots must be complete and must not be modified arbitrarily while they are scanned. During marking, every live object has to be scanned, and other threads must not change the object graph at will. If the cleanup step moves objects, every location that points to a moved object must be updated: if object A points to B, the pointer in A must be updated after B moves. Synchronization is therefore required in all three steps, and the simplest form is to suspend all Java threads, that is, Stop-The-World (STW). During STW, GC threads can safely access runtime data and the object graph and update object pointers. Reducing STW time means moving tasks from the various GC stages out of STW and running them concurrently with Java threads, which requires changes to algorithms and data structures so that GC threads and Java threads keep a consistent view of the current GC state.

Different GC implementations do different amounts of work in the STW stage, so their STW durations differ. When all work is done during STW, the GC never competes with application threads for CPU, and overall application throughput is relatively high; Parallel GC is an example. When less work is done during STW, the corresponding work has to be added to the concurrent phase, where GC tasks run alongside business threads and compete with them for CPU. Such GCs divide the work differently to balance throughput against pauses: CMS and G1 trade a small throughput loss for shorter pauses and better responsiveness, while ZGC and ShenandoahGC aim for minimal pauses and do everything possible to reduce STW work, reaching ms-level pause times.

1.2 Industry Solutions

To compete with Apple's iOS, one major problem the Google-led Android system had to solve was display stutter. Through continuous evolution of its GC algorithm, Android implemented a Concurrent Copying GC based on a Baker-style read barrier that keeps pause times within a few ms, inside the 15ms refresh constraint. This made up for Java's shortcomings on embedded devices and allowed the loosely controlled Java ecosystem to be as smooth as the tightly controlled Apple ecosystem.

Before ZGC and ShenandoahGC appeared, it was difficult to keep Hotspot JVM GC pauses consistently below 100 milliseconds, which greatly hindered the use of OpenJDK in latency-sensitive scenarios such as finance. Azul's Zing virtual machine, a downstream branch of OpenJDK, achieved almost "pauseless" low latency with its closed-source C4 GC, but cloud systems and ordinary desktop applications had no low-latency GC available to them. To meet business needs, teams generally chose among three options: rewrite important modules in a native language such as C++ (Tencent's advertising system, for example, is implemented in C++); purchase a VM such as Zing to obtain a low-latency GC; or stay on CMS and G1, with each team finding its own way to tune parameters from experience until basic business needs were met.

Because of Hotspot's pauses, these alternatives brought their own problems: the high development threshold of C++, licensing costs, the expertise needed for parameter tuning, and a good deal of management overhead. Pauses became a nightmare for Java developers and an obstacle to using Java for low-latency business; this was the pain of pauses on the Hotspot JVM, and the pain of its developers. To solve the problem, ZGC adopts a GC algorithm similar to that of Azul's Zing VM. It entered open source incubation in JDK11 and became an official, production-ready feature in JDK15 once its functionality was complete, ensuring that Java pause times do not grow as heap size and business scale increase.

JDK11, released in the second half of 2018, is the latest Long-Term Support (LTS) version; the next LTS version, JDK17, will be released in the second half of 2021. JDK12 to JDK16 are intermediate development versions and do not receive the continuous updates and fixes that JDK11 and JDK17 do. ZGC is an experimental feature in OpenJDK11 and cannot meet commercial requirements. To serve the business ahead of time, the Tencent JVM team has continuously updated and fixed Tencent Kona JDK11 and completed the ZGC feature set. After long-term verification, ZGC on Tencent Kona JDK11 has reached production quality, allowing pause-sensitive applications to achieve ultra-low GC latency on the LTS version JDK11.

Introduction to ZGC

ZGC was introduced into the Hotspot runtime by JEP 333 (https://openjdk.java.net/jeps/333). Its goal is to eliminate the latency problems caused by GC pauses. The overall design goals are:

  • The total pause time of each GC stays below 10ms
  • Application throughput drops by no more than 15% relative to G1
  • Large and very large heaps (8MB~16TB) are supported, and pause time does not increase with heap size

[Figure: SPECjbb2015 scores of ZGC, Parallel GC, and G1GC]

The design goals show that ZGC is aimed at managing today's and tomorrow's large heaps, seeking the greatest pause advantage for the smallest performance loss. According to test data published by Oracle (see [1]) and shown in the figure above, ZGC's throughput (max-JOPS) on SPECjbb2015 is almost the same as that of Parallel GC and G1GC, while its critical-JOPS score, which reflects the impact of pauses, is more than 20% higher. In terms of pause time, ZGC stays under 10ms while Parallel GC and G1GC exceed 100ms, as shown in the figure below. ZGC is therefore especially suitable for latency-sensitive tasks with large heaps.

[Figure: pause times of ZGC, Parallel GC, and G1GC]

2.1 ZGC Algorithm Implementation

To reduce pauses, the work performed during STW must be reduced. ZGC makes progress mainly in the following three areas:

  • GC Roots scanning. Every part of the Roots scan that can leave STW is moved to the concurrent phase. Scanning Roots concurrently requires changing the Roots data structures so that GC threads and Java threads can operate on them at the same time.
  • Handling of runtime data structures. The runtime maintains many tables that record metadata (classes, methods, JIT code, etc.), and Java has special weak reference types, namely java.lang.ref.Reference and its subclasses, all of which need additional processing.
  • Concurrent object relocation. To let objects move while Java threads are running, a read barrier is added to guarantee that every read of an object field is correct.

ZGC greatly changes the processing logic of the GC algorithm, but its overall shape is the same as that of its predecessors: Mark & Compact. ZGC implements low-latency GC through the following six stages, as shown in the figure below:
[Figure: the six stages of a ZGC cycle]

  • The first stage is Pause Mark Start: it performs lightweight tasks such as setting global state and initializing global data structures, and signals that the following concurrent stage is Concurrent Mark.
  • The second stage is Concurrent Mark & Remap: the GC Roots work that used to take the most time has been made concurrent, so Roots are marked concurrently, and the object graph is then marked concurrently starting from the GC Roots. The pointer updates (Remap) left over from the previous GC cycle are piggybacked onto this stage, which saves one traversal of the object graph.
  • The third stage is Pause Mark End: it synchronizes with Concurrent Mark, ends the concurrent marking stage, and sets some global variables.
  • The fourth stage is Concurrent Prepare: it processes weak references such as java.lang.ref.Reference and selects the ZGC Regions to be compacted.
  • The fifth stage is Pause Relocate Start: similar to the third stage, it performs global synchronization, sets global variables, and signals the start of the Relocate stage.
  • The sixth stage is Concurrent Relocate: it moves objects concurrently.
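As a compact summary of the list above, the six stages can be written down as an enum that records which of them stop the world. The enum and its names are purely illustrative and are not taken from the Hotspot sources.

```java
import java.util.Arrays;

final class ZgcPhases {
    /** The six stages of one ZGC cycle, in order; the flag marks the STW ones. */
    enum Phase {
        PAUSE_MARK_START(true),
        CONCURRENT_MARK_AND_REMAP(false),
        PAUSE_MARK_END(true),
        CONCURRENT_PREPARE(false),
        PAUSE_RELOCATE_START(true),
        CONCURRENT_RELOCATE(false);

        final boolean stopTheWorld;

        Phase(boolean stopTheWorld) {
            this.stopTheWorld = stopTheWorld;
        }
    }

    /** Number of stages during which Java threads are suspended. */
    static long countStwPhases() {
        return Arrays.stream(Phase.values()).filter(p -> p.stopTheWorld).count();
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            System.out.println(p + (p.stopTheWorld ? "  (STW)" : "  (concurrent)"));
        }
        System.out.println("STW stages: " + countStwPhases());
    }
}
```

The pauses alternate with the concurrent stages: each pause only flips global state so that the next concurrent stage can begin.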

Like other GCs, ZGC still needs STW for global synchronization, in three stages, but the tasks in each STW stage are small and well defined, and the time needed to complete them is tied to CPU processing speed, so ms-level pauses are achievable. Compared with G1GC, the difficulty of ZGC lies in making GC Roots processing and object relocation concurrent.

[Figure: ZGC colored pointer layout]

To make object relocation concurrent, ZGC uses Colored Pointers to implement a lightweight read barrier, as shown in the figure above. On a 64-bit system, 4 of the high-order bits of each pointer record its processing state. Two Mark bits indicate whether the object pointer has been marked; having two lets consecutive GC cycles use different Mark bits. The Remapped bit indicates whether the pointer has already been adjusted to the object's address after relocation. The Finalizable bit serves finalizable objects: it indicates that the pointer has been marked only through a finalizer, and it is used mainly in the Mark stage and the weak reference processing stage. With Colored Pointers, only one color (Marked or Remapped) is the correct color for the runtime in any given GC stage, so a pointer can be tested for a bad color as shown in the figure below; on x86 the check is a single test instruction followed by a jne jump.
[Figure: load barrier bad-color check]
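The bad-color test can be sketched with plain bit arithmetic. This is a minimal model, assuming the four color bits sit directly above a 42-bit object offset (the text only says they come from the high-order bits); the method names are hypothetical, and the real slow path marks or relocates the object rather than simply recoloring the pointer.

```java
final class ColoredPointerSketch {
    // Assumption: color bits start at bit 42, directly above the object offset.
    static final int METADATA_SHIFT = 42;
    static final long MARKED0     = 1L << METADATA_SHIFT;
    static final long MARKED1     = 1L << (METADATA_SHIFT + 1);
    static final long REMAPPED    = 1L << (METADATA_SHIFT + 2);
    static final long FINALIZABLE = 1L << (METADATA_SHIFT + 3);
    static final long METADATA_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    /** The bad mask is every color bit except the one that is currently good. */
    static long badMask(long goodColor) {
        return METADATA_MASK & ~goodColor;
    }

    /** Fast-path check: one AND plus a compare, like the test/jne pair on x86. */
    static boolean needsSlowPath(long pointer, long badMask) {
        return (pointer & badMask) != 0;
    }

    /** Stand-in for the slow path's final step: store the pointer back with the good color. */
    static long recolor(long pointer, long goodColor) {
        return (pointer & ~METADATA_MASK) | goodColor;
    }

    public static void main(String[] args) {
        long bad = badMask(MARKED0);              // a marking cycle that uses Marked0
        long stale = REMAPPED | 0x12345678L;      // pointer still colored by the previous cycle
        System.out.println("stale pointer needs slow path: " + needsSlowPath(stale, bad));
        long healed = recolor(stale, MARKED0);
        System.out.println("healed pointer needs slow path: " + needsSlowPath(healed, bad));
    }
}
```

A null pointer has no color bits set and therefore always passes the fast path, which keeps null checks out of the barrier.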

Because of Colored Pointers, the high bits of an object pointer differ from one phase to another. In the figure below, the object pointer is 0x0000000012345678. While the program runs, a Java thread may see it in any of three states: Remapped, Mark1, or Mark0. To make these different states (pointers with different values) refer to the same object, ZGC relies on the operating system's virtual-to-physical address translation so that the virtual addresses of all three states map to the same physical address. ZGC's Java heap therefore occupies three ranges of virtual address space. ZGC backs the heap with a memory file that holds the actual physical memory and maps this file at the virtual addresses of the Remapped, Mark0, and Mark1 views. Although ZGC's Java heap appears to occupy three virtual address ranges, there is only one copy of physical memory behind them. This is why the RSS reported by top or ps on Linux triples for a Java process with ZGC enabled: the RSS observed after enabling ZGC is not the actual physical memory consumption.

[Figure: the Remapped, Mark0, and Mark1 views mapped to the same physical memory]
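To make the three views concrete, the sketch below computes the virtual address of the example offset 0x12345678 in each view and shows that stripping the color bits gives back the same heap offset. It assumes a 42-bit offset with the three view bits directly above it; the class and method names are hypothetical.

```java
final class MultiMappingSketch {
    // Assumption: 42 offset bits, with the Marked0/Marked1/Remapped bits directly above them.
    static final int OFFSET_BITS = 42;
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;

    /** The three heap views, in the order of their color bits. */
    enum View { MARKED0, MARKED1, REMAPPED }

    /** Virtual address at which a heap offset appears in the given view. */
    static long viewAddress(long heapOffset, View view) {
        return (1L << (OFFSET_BITS + view.ordinal())) | (heapOffset & OFFSET_MASK);
    }

    /** All views of an object share the same offset into the backing memory file. */
    static long heapOffset(long address) {
        return address & OFFSET_MASK;
    }

    public static void main(String[] args) {
        long offset = 0x12345678L;
        for (View v : View.values()) {
            long addr = viewAddress(offset, v);
            System.out.println(String.format("%-8s 0x%016x -> offset 0x%x", v, addr, heapOffset(addr)));
        }
    }
}
```

Three different virtual addresses resolve to one offset in the memory file, which is why tools that count mapped pages per virtual range can report the heap three times.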

2.2 Overhead of ZGC Algorithm

ZGC affects business threads mainly in the following five ways:

  • Read barrier overhead. In a Java program, object pointers are read far more often than they are written, so read barriers are inserted at many more points than write barriers would be, and ZGC's read barrier has a noticeable negative impact on program performance.
  • Entry barrier overhead for JIT-compiled methods. If JIT-compiled code refers to a dead Java object, the method should be discarded, so compiled code needs an entry barrier to verify that the code and the metadata it contains are still valid. ZGC generates an nmethod entry barrier for every JIT-compiled method, which costs a small amount of performance.
  • Stack frame barrier overhead. To scan Java stack frames concurrently and reduce the impact of stack Roots scanning on STW time, Hotspot currently uses a stack watermark for concurrent stack scanning. To limit the stack scanning work done by business threads, Hotspot processes one frame at a time: when a return would go past the current stack watermark, the thread enters the stack watermark barrier and fixes up the Java object pointers in the caller's frame. See https://openjdk.java.net/jeps/376
  • Overhead of the locks introduced by other concurrency changes to the runtime.
  • CPU contention. Most GC work in ZGC runs in the concurrent phase, where GC threads and Java business threads compete for CPU, which slows down the business threads.

To reduce the impact of STW pauses, ZGC pushes concurrency as far as possible, trading a slight performance loss for the smallest possible pauses. The latest ZGC implementation has brought pauses down to the ms level, below the background noise of the Linux kernel itself: scheduling and system call overhead can also cause delays on the order of 10ms. ZGC thus overturns the old impression that Java cannot serve real-time business.

ZGC usage and parameter tuning

3.1 Typical application scenarios of ZGC

One man's honey is another man's arsenic: every GC algorithm has its own strengths and weaknesses. ZGC's biggest advantage is that pause time is guaranteed to stay below 10ms, but the price of this high-SLA pause time is some performance loss and extra memory consumption. As described earlier, many GC tasks have been made concurrent to reduce STW work, and the cost of that concurrency is spread across many runtime details. Thanks to continuous investment by the whole OpenJDK community, the performance degradation of ZGC in unfavorable scenarios is now kept small. Performance depends on configuration: with a large heap and plenty of memory, ZGC can beat G1 by about 5% to 20% on various benchmarks, while with a small heap it is about 10% behind G1. Developers need to judge according to their usage scenario. ZGC currently supports neither compressed pointers nor generational collection, so its memory footprint is slightly larger than G1's; the difference is more visible with small heaps and less so with large heaps. ZGC is therefore strongly recommended for the following two types of applications:

  • Very large heap applications. With a very large heap (above 100 GB), a Full GC under CMS or G1 causes a minute-level pause that may interrupt the service. ZGC is strongly recommended.
  • Applications with high SLA requirements. For real-time and soft real-time applications that place a P999 limit on response time, low-pause ZGC is recommended regardless of heap size.

3.2 ZGC parameter settings

ZGC's appeal lies not only in its ultra-low STW pauses but also in how few parameters it needs: the defaults suit most production scenarios. In extreme cases individual ZGC parameters may still need adjusting; they fall roughly into three categories:

Heap size: -Xmx. ZGC's extremely low latency can satisfy demanding SLAs, but as with concurrent GCs in every programming language, latency is traded against memory space. When the allocation rate exceeds the reclamation rate and the heap runs short, an Allocation Stall is triggered, which slows down the current user thread. When an Allocation Stall appears in the GC log, it usually means that the heap is too small or that there are too few concurrent GC threads.

GC trigger timing: ZAllocationSpikeTolerance, ZCollectionInterval. ZAllocationSpikeTolerance is a factor used when estimating the heap allocation rate. For a given amount of free heap, the larger ZAllocationSpikeTolerance is, the sooner the estimated out-of-memory point arrives, and the earlier ZGC triggers a GC. ZCollectionInterval specifies a fixed interval, in seconds, at which a GC is triggered.
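The effect of ZAllocationSpikeTolerance can be illustrated with a simplified trigger rule. This is a sketch of the idea described above, not ZGC's actual heuristic: the method, its parameters, and the use of an expected GC duration as the threshold are assumptions made for the example.

```java
final class AllocationRateRule {
    /**
     * Simplified rule (assumption): start a GC when the heap is predicted to fill
     * up before a GC cycle of the expected duration could finish.
     *
     * @param freeBytes           heap bytes still available
     * @param avgAllocBytesPerSec observed average allocation rate
     * @param spikeTolerance      value of ZAllocationSpikeTolerance
     * @param gcDurationSec       expected duration of one GC cycle
     */
    static boolean shouldStartGc(double freeBytes, double avgAllocBytesPerSec,
                                 double spikeTolerance, double gcDurationSec) {
        // A larger tolerance inflates the assumed allocation rate ...
        double maxAllocRate = avgAllocBytesPerSec * spikeTolerance;
        // ... which shortens the predicted time until the heap is exhausted.
        double timeUntilOom = freeBytes / (maxAllocRate + 1.0);
        return timeUntilOom <= gcDurationSec;
    }

    public static void main(String[] args) {
        double gb = 1024.0 * 1024.0 * 1024.0;
        // 8 GB free, 1 GB/s allocation, GC cycles expected to take 3 s.
        System.out.println("tolerance 2: " + shouldStartGc(8 * gb, 1 * gb, 2.0, 3.0));
        System.out.println("tolerance 4: " + shouldStartGc(8 * gb, 1 * gb, 4.0, 3.0));
    }
}
```

With a tolerance of 2 the predicted time to exhaustion is about 4 seconds and no GC is started yet; with a tolerance of 4 it drops to about 2 seconds and the GC starts immediately.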

GC threads: ParallelGCThreads, ConcGCThreads. ParallelGCThreads sets the number of GC threads used for STW tasks and defaults to 60% of the number of CPUs; ConcGCThreads is the number of GC threads in the concurrent phase and defaults to 12.5% of the number of CPUs. Adding GC threads lets GC finish sooner and shortens each stage, but also increases CPU contention, so adjust the values according to production conditions.
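As a rough guide to what these percentages mean on a given machine, the sketch below applies them to a CPU count. Rounding to the nearest integer with a minimum of one thread is an assumption of this example; the JVM may round differently or apply additional limits.

```java
final class GcThreadDefaults {
    static final double PARALLEL_SHARE_PERCENT = 60.0;    // STW worker share, from the text
    static final double CONCURRENT_SHARE_PERCENT = 12.5;  // concurrent worker share, from the text

    /** Assumption: round to the nearest integer and never go below one thread. */
    static int threadsFor(int cpus, double sharePercent) {
        return Math.max(1, (int) Math.round(cpus * sharePercent / 100.0));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs: " + cpus);
        System.out.println("estimated ParallelGCThreads: " + threadsFor(cpus, PARALLEL_SHARE_PERCENT));
        System.out.println("estimated ConcGCThreads: " + threadsFor(cpus, CONCURRENT_SHARE_PERCENT));
    }
}
```

On a 16-CPU machine this gives roughly 10 STW threads and 2 concurrent threads, which shows how little CPU the concurrent phase takes by default.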

As shown above, ZGC needs very little tuning. Setting -Xmx is usually enough to meet business needs, which greatly reduces the burden on Java developers. The current options for enabling ZGC on Tencent Kona JDK11 are: "-XX:+UnlockExperimentalVMOptions -XX:+UseZGC".
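After adding these options, it can be useful to confirm from inside the application that ZGC is really in use. The sketch below lists the collector names exposed through the standard java.lang.management API; treating a name that starts with "ZGC" as the indicator is an assumption of this example, since the exact bean names vary between JDK versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

final class GcDetector {
    /** Names of the collectors the running JVM exposes over JMX. */
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    /** Assumption: ZGC registers its beans under names beginning with "ZGC". */
    static boolean looksLikeZgc(List<String> names) {
        return names.stream().anyMatch(n -> n.startsWith("ZGC"));
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("collectors: " + names);
        System.out.println("ZGC in use: " + looksLikeZgc(names));
    }
}
```

Running the class with and without the ZGC options shows the difference in the reported collector names.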

ZGC Production Notes

4.1 Abnormal phenomenon of RSS memory

As described in the ZGC principles above, ZGC uses multi-mapping so that three ranges of virtual memory point to the same physical memory. Linux's accounting of per-process RSS does not fully handle this kind of multi-mapping, so the memory reported for a Java process with ZGC enabled differs depending on whether Linux is using huge pages or small pages. When the kernel uses small pages, the triple-mapped physical memory is counted three times by the RSS accounting, so the RSS of a Java process using ZGC usually appears inflated about threefold while the real usage is only one third of the reported figure, which can confuse operations staff and other businesses. When the kernel uses huge pages, the triple-mapped physical memory is charged to the hugetlbfs inode rather than to the Java process.

4.2 Shared Memory Tuning

ZGC creates a memory file in shared memory to hold the actual physical memory, so when the Java heap is larger than /dev/shm, the size of /dev/shm must be increased. The usual procedure is as follows (here /dev/shm is set to 64G). First, edit the shm entry in fstab and set size as required:

vi /etc/fstab
tmpfs /dev/shm tmpfs defaults,size=65536M 0 0

Then unmount and remount shm:

umount /dev/shm
mount /dev/shm

4.3 Adjusting the mmap mapping limit

ZGC sets up its heap differently from traditional GCs and needs more memory mappings: each ZPage is mapped three times, so the Java heap alone uses (Xmx / zpage_size) * 3 mappings, where zpage_size defaults to 2M.

To leave room for the mappings made by native modules such as JNI, the mapping limit should be set to at least (Xmx / zpage_size) * 3 * 1.2.

The system limit on the number of memory mappings is set in /proc/sys/vm/max_map_count and is usually 65530. When configuring a large heap for the JVM, raise this value above (Xmx / zpage_size) * 3 * 1.2.
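The formula can be turned into a small calculation. The sketch below uses the 2M ZPage size and the 1.2 safety factor from the text; the class and method names are hypothetical, and rounding the result up is a choice made for this example.

```java
final class MapCountSketch {
    static final long ZPAGE_SIZE = 2L * 1024 * 1024;   // default ZPage size from the text (2M)
    static final double HEADROOM = 1.2;                // safety factor for JNI and other native mappings

    /** Mappings used by the Java heap alone: each ZPage is mapped three times. */
    static long heapMappings(long xmxBytes) {
        return (xmxBytes / ZPAGE_SIZE) * 3;
    }

    /** Suggested lower bound for /proc/sys/vm/max_map_count, rounded up. */
    static long recommendedMaxMapCount(long xmxBytes) {
        return (long) Math.ceil(heapMappings(xmxBytes) * HEADROOM);
    }

    public static void main(String[] args) {
        long xmx = 64L * 1024 * 1024 * 1024;           // example: -Xmx64g
        System.out.println("heap mappings: " + heapMappings(xmx));
        System.out.println("recommended max_map_count: " + recommendedMaxMapCount(xmx));
    }
}
```

For a 64G heap this gives 98304 heap mappings and a recommended limit of 117965, already well above the usual system default, so the limit has to be raised before such a heap can be used.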

ZGC's large-scale production practice in Tencent

At present, ZGC on Tencent Kona JDK11 has been running stably for a long time in Tencent's advertising big data, Tencent Cloud VPC, WAF, and other business scenarios, helping those businesses achieve excellent performance.

5.1 Supporting massive ad data queries

Hermes is a real-time big data analysis system developed by Tencent. It offers real-time ingestion and storage of massive data, low-latency query analysis, and multi-dimensional analysis across thousands of dimensions, and it handles log ingestion and query analysis with a daily increment in the trillions. In the real-time OLAP analysis for the advertising business, 99% of SQL queries on Hermes are required to finish end to end within 3s, but with G1 GC in its default configuration only 98.1% did. After switching to Kona 11 ZGC, the share of SQL queries meeting the end-to-end latency target rose to 99.5%, and the latency caused by GC within a single SQL query did not exceed 20ms.

5.2 Very large heap support

The Tencent VPC team provides network control services for Tencent Cloud. The service stores information such as the network configuration used for communication between cloud resources and offers querying, modification, and delivery of that configuration. The business needs to store as much configuration as possible on a machine with 512G of memory (supporting at most 800M monitored items) while keeping read and write latency under 1s in stress scenarios. With G1GC, delays of more than 10s occurred frequently. Working with the Tencent Big Data JVM team, the business switched to ZGC and resolved the problems it hit along the way, such as Mark Stack Overflow, slow Safepoint entry, and ZGC Mark suspended animation. As a result, in stress scenarios the business stores 12.5% more than the expected capacity while read and write latency stays under 50ms.

5.3 Helping improve SLA

The Tencent WAF team uses Java to iterate on and ship product features quickly. Its bypass security service is an HTTP service built on the Netty framework with strict latency requirements: the SLA target is an end-to-end request latency under 80ms for 99.99% of requests. GC STW pauses hurt this service, so the stop-the-world time had to be reduced further. Before ZGC, the WAF team used G1GC and spent a lot of time early on tuning G1 GC options and even modifying code. Because of G1GC's inherent limitations, request latency still jittered and the SLA target could not be met. With the cooperation of the Tencent Big Data JVM team, the service switched to ZGC, and its P9999 request latency is now under 80ms, giving users a faster and more stable service.

Community feedback

While helping businesses switch to ZGC, the Tencent Big Data JVM team actively reports the problems it finds and the corresponding fixes back to the community, striving to be a good citizen of the OpenJDK community.

6.1 Problems with the combined use of ZGC and VectorAPI

In one advertising business, the Vector API was adopted to speed up machine learning and ZGC was enabled to meet the service SLA. The business then produced unexpected results at runtime, and similar errors had been reported in the community. Analysis of the JIT-generated assembly showed that a load barrier was missing. In C2's Vector optimization phase, Vector nodes are unboxed, which creates new load nodes. This phase did not take the GC barrier into account: ZGC needs a load barrier for each such load, but the newly created load nodes carried no barrier information, so no barrier instructions were emitted for them. The fix adds GC barrier handling for load operations created in the Vector optimization phase, so that the load nodes carry GC barrier information and the correct barrier code is generated for whichever GC is selected. After being contributed to the community, the fix was merged into JDK16 with P2 priority.

6.2 ZGC Mark Stack Overflow problem

During a gray release of ZGC in a Tencent Cloud business, the JVM process crashed. The logs showed that the Mark Stack used in the ZGC marking phase had exceeded its preset 8G limit, whereas under normal conditions Mark Stack usage does not exceed 32M. With the full cooperation of the business, a reproducible scenario was obtained for in-depth analysis, which revealed two defects in how ZGC used the Mark Stack in this scenario. First, many Mark Stacks were pushed to the global queue before they were full, fragmenting the memory within individual stacks. Second, many objects were pushed onto the Mark Stack multiple times, so the stack held a high proportion of duplicate entries and wasted space. The Tencent Big Data JVM team made a quick fix, verified that it resolved the Mark Stack Overflow, and reported both the problem and the fix to the OpenJDK community. The community produced a more elegant fix based on the submitted patch and committed it with the Tencent Big Data JVM team as co-author. The fixes for both problems have now been merged into JDK17.


6.3 ZGC Mark suspended animation

Analyzing ZGC performance in a business required turning on the GC debug log option. Soon after startup, the process appeared to hang. Analysis showed that most GC worker threads were in the "Concurrent Mark Try Terminate" stage, waiting for the log file read-write lock in the "Concurrent Mark Idle" stage, while one other GC worker thread was writing to the log. Because the GC worker threads were all competing for the log file lock, they ended up in a livelock: every worker cycled endlessly through "wait for lock -> take lock -> release lock". This suspended animation is caused by ZGC's Concurrent Mark termination mechanism, in which every GC worker thread waits 1ms for state synchronization and then prints a log message when the wait ends. That printing needs the log file lock mentioned above, which produces the livelock. The Tencent Big Data JVM team quickly fixed the problem and submitted the fix to the community; the contribution has now been merged into JDK17.

Tencent Kona JDK open source

The latest version of Tencent Kona JDK from Tencent Big Data JVM team has been officially released. You can use Tencent Kona JDK 11.0.10-GA to enjoy the benefits of ZGC.

Tencent Kona JDK 8.0.5-GA is synchronized with OpenJDK 8u282ga

https://github.com/Tencent/TencentKona-8

Tencent Kona JDK 11.0.10-GA is synchronized with OpenJDK 11.0.10-ga

https://github.com/Tencent/TencentKona-11