First experience with JDK 17 in a marketing system: putting sub-millisecond-pause ZGC into production | JD Cloud technical team

Foreword

Since its release in 2014, JDK 8 has remained a very popular JDK version, because it brought major improvements to the core data structures, JVM performance, and the development experience, all of which developers welcomed. But nine years have passed since JDK 8 was released. What has the JDK improved in those nine years? Are there major new features worth trying? Can they solve some of the problems we are struggling with today? With these questions in mind, we investigated and experimented with newer JDK versions.

New features at a glance

JDK releases now arrive at an ever faster pace. Each time a new version comes out we sigh: we are still on JDK 8, and now there is JDK 9, 10, 11... 21? Then we look at the new features. Some are very attractive, yet after thinking it over we give up on upgrading. Apart from the fact that many new features change little for us, the main reason is the modularization (JEP 200) introduced in JDK 9, which makes upgrading very difficult.

The idea of modularization is to divide the JDK into a set of modules that can be combined into various configurations at compile time, build time, and run time. The main goals are to make the implementation easier to scale down to small devices, improve security and maintainability, and improve application performance. The price is high, though: the most visible impact is that some JDK-internal classes can no longer be accessed.

Beyond that, not much blocks an upgrade, and later versions offer some very attractive features:

  • G1 (JEP 248, JEP 307, JEP 344, JEP 345, JEP 346): a high-performance garbage collector that supports a pause-time target and NUMA-aware memory allocation
  • ZGC (JEP 333, JEP 376, JEP 377): a NUMA-aware garbage collector whose pause times should not exceed 1 ms
  • Concurrency API updates (JEP 266): a publish-subscribe framework with interfaces for Reactive Streams, plus further improvements to CompletableFuture
  • Collection factory methods (JEP 269): quick creation of collections with initial elements, similar to Guava
  • A new HTTP client (JEP 321): a modern API built into the JDK that supports asynchronous requests, WebSocket, and reactive streams
  • Helpful NullPointerExceptions (JEP 358): the exception message states exactly which variable or method call was null. Previously only the line number was given, so when one line chained several method calls, finding the null reference depended entirely on the programmer's analysis of the context
  • Pattern matching for instanceof (JEP 394): no explicit cast is needed after the type check
  • Records (JEP 395): a standard value-aggregate class that helps programmers focus on modeling immutable data and write data-driven code
  • Switch expressions (JEP 361): improved syntax for switch, which used to be verbose and error-prone
  • Text blocks (JEP 378): multi-line ("two-dimensional") string literals, instead of concatenating lines with + as before
  • Sealed classes (JEP 409): syntax that restricts extension. A superclass should be widely accessible (because it represents an important abstraction for users) but not widely extensible (because its subclasses should be limited to those known to the author)
  • Plus underlying data structure optimizations and JVM performance improvements not listed here
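
To make several of the language features listed above concrete, here is a minimal sketch that needs JDK 17 to compile. The `Discount` types, amounts, and labels are hypothetical names invented for illustration; they are not taken from our application.

```java
import java.util.List;

class Jdk17Features {
    // JEP 409: only the two listed types may implement Discount.
    sealed interface Discount permits FixedOff, PercentOff {}

    // JEP 395: records are concise, immutable data carriers.
    record FixedOff(long cents) implements Discount {}
    record PercentOff(int percent) implements Discount {}

    // JEP 394: instanceof binds a typed variable, so no explicit cast is needed.
    static long apply(Discount d, long priceCents) {
        if (d instanceof FixedOff f) {
            return Math.max(0, priceCents - f.cents());
        }
        if (d instanceof PercentOff p) {
            return priceCents * (100 - p.percent()) / 100;
        }
        throw new IllegalArgumentException("unknown discount: " + d);
    }

    // JEP 361: switch as an expression, with arrow cases and no fall-through.
    static String label(int percent) {
        return switch (percent / 10) {
            case 0 -> "small";
            case 1, 2 -> "medium";
            default -> "large";
        };
    }

    // JEP 378: a text block instead of strings joined with +.
    static final String TEMPLATE = """
            {
              "type": "%s",
              "priceCents": %d
            }
            """;

    public static void main(String[] args) {
        // JEP 269: collection factory method.
        List<Discount> discounts = List.of(new FixedOff(500), new PercentOff(20));
        for (Discount d : discounts) {
            System.out.print(TEMPLATE.formatted(d.getClass().getSimpleName(), apply(d, 10_000)));
        }
        System.out.println(label(20));
    }
}
```

The final `throw` in `apply` is still required on JDK 17, because exhaustive pattern matching over sealed types in `switch` is only a preview feature in that release.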

These advantages address some of the problems we currently face, so we decided to upgrade the JDK.

Upgrade

Upgrade Application Assessment

First, we had to decide which applications to upgrade. We screened applications against the following criteria:

  1. First and most important, upgrading the system must solve existing problems and bottlenecks
  2. Second, the system has complete mechanisms for fast regression and verification, such as thorough unit tests, automated test coverage, and convenient production stress testing. An upgrade of the underlying platform must be fully verified
  3. Third, technical debt must be low, so that the upgrade does not run into debt that has to be paid off along the way, making it harder
  4. Fourth, the person responsible for the upgrade knows the system very well. Beyond the core business logic, they know which middleware and dependencies are used, which middleware features are used, and whether upgraded middleware introduces many incompatible changes that affect the existing system

In the end, we chose to upgrade an application that displays coupon-free payment marketing on the checkout page and the cashier. The application has the following characteristics:

  • It is on a core link, so interface response time requirements are very strict, and GC is one of the bottlenecks behind latency jitter
  • The business is iterating rapidly. As the cost-reduction and efficiency strategy is implemented, marketing strategies become more fine-grained and the types, number, and scope of campaigns keep growing, which puts more pressure on system performance
  • Daily traffic is substantial, with bursts on the hour, and the application also has to carry big-promotion traffic
  • The core link is covered by unit tests, the test environment supports automated regression, and pre-release and production support routine stress testing and replay of production traffic
  • It is a non-web application that uses only the basic features of each middleware, so few incompatibilities arise during the upgrade
  • It has been maintained for 3 years and refactored several times, so it has few legacy issues and almost no technical debt

Given these characteristics, the application is a good candidate for a JDK 17 upgrade. It was based on JDK 8 and Spring Boot 2.0.8. Besides common external components, it also uses the following internal company middleware: UMP, SGM, DUCC, CDS, JMQ, JSF, R2M.

Upgrade effect

Here are the stress-test results after the upgrade.

Pure computation code is no longer affected by GC

[Figure: system monitoring]

Before the upgrade

[Figure: G1 performance]

After the upgrade

[Figure: ZGC performance]

Version       Throughput   Average GC time   Maximum GC time
JDK 8 G1      99.966%      35.7 ms           120 ms
JDK 17 ZGC    99.999%      0.0254 ms         0.106 ms

After the upgrade, throughput is essentially unaffected (it even rose slightly, from 99.966% to 99.999%), the average GC time fell by a factor of about 1,405, and the maximum GC time fell by a factor of about 1,132.
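
The improvement factors follow directly from the measured values in the table. A small sketch of the arithmetic, where the class and method names are only illustrative:

```java
class GcImprovement {
    // How many times smaller the "after" value is compared with the "before" value.
    static double factor(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // Values in milliseconds, copied from the table above.
        double avgFactor = factor(35.7, 0.0254);
        double maxFactor = factor(120.0, 0.106);
        // Throughput change in percentage points.
        double throughputGain = 99.999 - 99.966;

        System.out.printf("average GC time: %.1fx lower%n", avgFactor);
        System.out.printf("maximum GC time: %.1fx lower%n", maxFactor);
        System.out.printf("throughput: +%.3f percentage points%n", throughputGain);
    }
}
```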

Upgrade steps

Upgrade the JDK version used for compilation

First, change the JDK version specified in Maven. You can move to JDK 11 first, and update the Maven build plugins at the same time:

<java.version>11</java.version>
<maven-compiler-plugin.version>3.8.1</maven-compiler-plugin.version>  
<maven-source-plugin.version>3.2.1</maven-source-plugin.version>  
<maven-javadoc-plugin.version>3.3.2</maven-javadoc-plugin.version>  
<maven-surefire-plugin.version>2.22.2</maven-surefire-plugin.version>

<plugin>  
    <groupId>org.apache.maven.plugins</groupId>  
    <artifactId>maven-compiler-plugin</artifactId>  
    <version>${maven-compiler-plugin.version}</version>  
    <configuration>        
        <release>${java.version}</release>  
        <encoding>${project.build.sourceEncoding}</encoding>  
    </configuration>
</plugin>
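
One way to confirm that the `<release>` setting took effect is to read the version number from the header of a compiled class file. The sketch below relies on the standard class file layout (a 4-byte magic number, then the minor and major versions as unsigned 16-bit values) and on the fact that major version 52 corresponds to Java 8, 55 to Java 11, and 61 to Java 17. The class name `ClassFileVersion` is only illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

class ClassFileVersion {
    // Returns the class file major version from the first 8 bytes of a .class file.
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                  // minor version, ignored here
        return buf.getShort() & 0xFFFF;  // major version as an unsigned value
    }

    // Major versions map to Java releases with a fixed offset (52 -> 8, 55 -> 11, 61 -> 17).
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // Inspect this class's own compiled bytes as an example.
        String resource = "/" + ClassFileVersion.class.getName().replace('.', '/') + ".class";
        try (InputStream in = ClassFileVersion.class.getResourceAsStream(resource)) {
            if (in == null) {
                System.out.println("class file not found on the class path: " + resource);
                return;
            }
            int major = majorVersion(in.readAllBytes());
            System.out.println("major version " + major + " = Java " + javaRelease(major));
        }
    }
}
```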



Introduce missing dependencies

Now you can compile locally. Some simple problems show up at this point, such as packages and classes that cannot be found. The reason is that JDK 11 removed the Java EE and CORBA modules (JEP 320), so the corresponding dependencies have to be added manually:

<!-- JAVAX -->  
<dependency>  
    <groupId>javax.annotation</groupId>  
    <artifactId>javax.annotation-api</artifactId>  
    <version>1.3.1</version>  
</dependency>  
<dependency>  
    <groupId>javax.xml.bind</groupId>  
    <artifactId>jaxb-api</artifactId>  
    <version>2.3.0</version>  
</dependency>  
<dependency>  
    <groupId>com.sun.xml.bind</groupId>  
    <artifactId>jaxb-impl</artifactId>  
    <version>2.3.0</version>  
</dependency>  
<dependency>  
    <groupId>com.sun.xml.bind</groupId>  
    <artifactId>jaxb-core</artifactId>  
    <version>2.3.0</version>  
</dependency>  
<dependency>  
    <groupId>javax.activation</groupId>  
    <artifactId>activation</artifactId>  
    <version>1.0.2</version>  
</dependency>
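
A quick runtime check can show whether the removed APIs are back on the class path once these dependencies are added. This is a minimal sketch: the helper name `isPresent` is hypothetical, and the three class names are well-known entry points of the annotation, JAXB, and activation APIs.

```java
import java.util.List;

class MissingApiCheck {
    // True if the class can be located by this class's loader, without initializing it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, MissingApiCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> removedFromJdk11 = List.of(
                "javax.annotation.PostConstruct",
                "javax.xml.bind.JAXBContext",
                "javax.activation.DataHandler");
        for (String name : removedFromJdk11) {
            System.out.println(name + " -> " + (isPresent(name) ? "available" : "MISSING"));
        }
    }
}
```

Running it on JDK 11 or later without the dependencies above should report all three as missing.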



Upgrade external middleware

Once the missing-class compilation errors are fixed, it is time to upgrade the external middleware. For our application that means upgrading Spring Boot. The line that supports JDK 17 is Spring Framework 5.3, which corresponds to Spring Boot 2.5.

We suggest going on to Spring Boot 2.7. Almost nothing needs to change between 2.5 and 2.7, and the dependency versions managed by the newer Spring Boot support JDK 17 better.

We recommend upgrading one minor version at a time, for example from 2.0 to 2.1. After each step, look carefully at how the dependency versions changed and keep track of the state of every upgraded dependency. A Spring Boot upgrade is in effect a major upgrade of every open-source component whose version it manages: interfaces are deprecated and there are many breaking changes, which need to be identified one by one.

Taking the upgrade to Spring Boot 2.1 as an example, our steps were:

  1. First, read which configuration changes in Spring Boot 2.1 are relevant to us

    1. Overriding a bean with the same name is disabled by default; to allow it, set spring.main.allow-bean-definition-overriding to true

  2. Then read which of the dependencies we use are upgraded by Spring Boot 2.1

    1. Spring upgraded to 5.1

      1. First, read which configuration changes in Spring 5.1 are relevant to us

        1. No impact
      2. Then read which of the dependencies we use are upgraded by Spring 5.1

        1. ASM 7.0

          1. Likewise, read the upgrade impact (for a dependency this deep in the tree, you do not need to care unless our own code uses it directly)
        2. CGLIB 3.2

          1. Likewise, read the upgrade impact (for a dependency this deep in the tree, you do not need to care unless our own code uses it directly)
      3. Finally, read which configuration options and dependencies relevant to us are deprecated in Spring 5.1

        1. No impact
    2. Lombok upgraded to 1.18

      1. Read the impact of the changes: from 1.18, Lombok no longer generates a private no-argument constructor by default. It can be re-enabled by setting lombok.noArgsConstructor.extraPrivate=true in the lombok.config configuration file
    3. Hibernate upgraded to 5.3

      1. Read the impact of the changes: no impact on our project
    4. JUnit upgraded to 5.2

      1. Read the impact of the changes: the Surefire plugin must be upgraded to 2.21.0 or later
  3. Finally, read which configuration options and dependencies relevant to us are deprecated in Spring Boot 2.1

At this point the upgrade to Spring Boot 2.1 is complete. Next, analyze the new dependency tree and compare it with the one from before the upgrade, to check that the scope of the dependency changes is known and under control. Then move on to Spring Boot 2.2.
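
The dependency comparison can be automated. Below is a minimal sketch that assumes each dependency has already been reduced to a `groupId:artifactId:version` string (for example by post-processing the output of the Maven dependency plugin). The coordinates in `main` are made-up placeholders, not our real dependency tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class DependencyDiff {
    // Parses "groupId:artifactId:version" strings into a sorted map of "groupId:artifactId" -> version.
    static Map<String, String> parse(List<String> coordinates) {
        Map<String, String> versions = new TreeMap<>();
        for (String coordinate : coordinates) {
            String[] parts = coordinate.trim().split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected groupId:artifactId:version, got " + coordinate);
            }
            versions.put(parts[0] + ":" + parts[1], parts[2]);
        }
        return versions;
    }

    // Lists added, removed, and changed dependencies, ordered by artifact key.
    static List<String> diff(List<String> before, List<String> after) {
        Map<String, String> oldVersions = parse(before);
        Map<String, String> newVersions = parse(after);
        Set<String> keys = new TreeSet<>(oldVersions.keySet());
        keys.addAll(newVersions.keySet());

        List<String> changes = new ArrayList<>();
        for (String key : keys) {
            String oldVersion = oldVersions.get(key);
            String newVersion = newVersions.get(key);
            if (oldVersion == null) {
                changes.add("ADDED   " + key + " " + newVersion);
            } else if (newVersion == null) {
                changes.add("REMOVED " + key + " " + oldVersion);
            } else if (!oldVersion.equals(newVersion)) {
                changes.add("CHANGED " + key + " " + oldVersion + " -> " + newVersion);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Placeholder coordinates for illustration only.
        List<String> before = List.of("com.example:lib-a:1.0", "com.example:lib-b:2.0");
        List<String> after = List.of("com.example:lib-a:1.1", "com.example:lib-c:3.0");
        diff(before, after).forEach(System.out::println);
    }
}
```

Every line in the result should be explainable from the release notes read in the previous steps; an unexplained change is a signal to stop and investigate before moving to the next version.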

The upgrade items below are the ones we had to pay attention to; they are for reference only:

  • You can upgrade to JDK 11 first and verify as you go. But do not use ZGC on JDK 11: the ratio of ZGC's reserved heap to usable heap is too large, which sometimes leads to OOM

  • If the code defines the same bean more than once, Spring Boot 2.0 silently overrides it at startup. To keep allowing overrides on later versions, set spring.main.allow-bean-definition-overriding to true

  • Spring Boot 2.2 moves the default unit-test framework to JUnit 5. Migrating JUnit 4 tests is recommended and requires few changes

  • Spring Boot 2.4 no longer supports JUnit 4 tests out of the box; the Vintage engine can be added manually if needed

  • Spring Boot 2.4 changes how configuration files are processed; read the release notes carefully

  • Spring Boot 2.6 forbids circular bean dependencies by default; they can be re-enabled by setting spring.main.allow-circular-references to true

  • Spring Boot 2.7 changes how auto-configurations are registered: the entries in spring.factories need to be moved to the file META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports

  • Consider using spring-boot-properties-migrator, which identifies deprecated properties

  • Spring Framework 5.2 requires Jackson 2.9.7+; read the release notes

  • Spring Framework 5.2 reworked its annotation retrieval algorithm: every custom annotation must be annotated with @Retention(RetentionPolicy.RUNTIME) so that Spring can find it

  • Spring Framework 5.3 changes many things, none of which affected our application; check the release notes

  • ASM is used only by our unit-test mocking, so it needs no special attention beyond keeping the JUnit upgrade compatible

  • The CGLIB major version upgrade mainly adds support for newer bytecode versions; just check the change log

  • Lombok can ship breaking changes even in minor versions. Read the change log of every version carefully. We recommend using Lombok sparingly

  • Hibernate has few breaking changes; just keep an eye on the change log

  • For JUnit, the main concern is major version changes such as 4 to 5. Minor versions have no particularly large breaking changes, and since JUnit is only a test dependency, you can upgrade it with confidence

  • Jackson 2.11 changes the default format for java.util.Date and java.util.Calendar; check the release notes for compatibility

  • Pay attention to upgrades of dependencies related to bytecode enhancement

  • Pay attention to upgrades of local cache libraries

  • Pay attention to the Netty upgrade and its release notes
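
The annotation retention item in the list above is easy to reproduce with plain reflection, without Spring. In this sketch the annotation and class names are hypothetical. An annotation without an explicit `@Retention` defaults to `RetentionPolicy.CLASS`, which is stored in the class file but invisible to reflection at run time.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class RetentionDemo {
    // Visible to reflection, and therefore to frameworks that scan annotations at run time.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface RuntimeMarker {}

    // No @Retention: defaults to CLASS retention, which reflection cannot see.
    @Target(ElementType.TYPE)
    @interface ClassMarker {}

    @RuntimeMarker
    @ClassMarker
    static class Handler {}

    // True if the annotation can be found on Handler through reflection.
    static boolean visibleAtRuntime(Class<? extends Annotation> annotationType) {
        return Handler.class.isAnnotationPresent(annotationType);
    }

    public static void main(String[] args) {
        System.out.println("RuntimeMarker visible: " + visibleAtRuntime(RuntimeMarker.class));
        System.out.println("ClassMarker visible:   " + visibleAtRuntime(ClassMarker.class));
    }
}
```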

Upgrade internal middleware

Upgrading internal middleware is relatively simple and mainly concerns the JMQ and JSF versions. Netty and Javassist, which JSF depends on, need to be upgraded; older Netty versions cause memory leaks.

Dependency versions we use

For reference, these are the dependency versions we use after the upgrade:

<properties>  
    <!-- Base component versions: start -->
    <java.version>17</java.version>  
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>  
    <maven-compiler-plugin.version>3.11.0</maven-compiler-plugin.version>  
    <maven-surefire-plugin.version>2.22.2</maven-surefire-plugin.version>  
    <jacoco-maven-plugin-version>0.8.10</jacoco-maven-plugin-version>  
    <maven-assembly-plugin-version>2.4.1</maven-assembly-plugin-version>  
    <maven-dependency-plugin-version>3.1.0</maven-dependency-plugin-version>  
    <profiles.dir>src/main/profiles</profiles.dir>  
    <springboot-version>2.7.13</springboot-version>  
    <log4j2.version>2.18.0-jdsec.rc2</log4j2.version>  
    <hibernate-validator.version>5.2.4.Final</hibernate-validator.version>  
    <collections-version>3.2.2</collections-version>  
    <collections4.version>4.4</collections4.version>  
    <netty.old.version>3.9.0.Final</netty.old.version>  
    <netty.version>4.1.36.Final</netty.version>  
    <javassist-version>3.29.2-GA</javassist-version>  
    <guava.version>23.0</guava.version>  
    <mysql-connector-java.version>5.1.29</mysql-connector-java.version>  
    <jmh-version>1.36</jmh-version>  
    <caffeine-version>3.1.6</caffeine-version>  
    <fastjson-version>1.2.83-jdsec.rc1</fastjson-version>  
    <fastjson2-version>2.0.35</fastjson2-version>  
    <roaringBitmap.version>0.9.44</roaringBitmap.version>  
    <disruptor.version>3.4.4</disruptor.version>  
    <jaxb-impl.version>2.3.8</jaxb-impl.version>  
    <jaxb-core.version>2.3.0.1</jaxb-core.version>  
    <activation.version>1.1.1</activation.version>  
    <!-- Base component versions: end -->

    <!-- JD middleware versions: start -->
    <ump-version>20221231.1</ump-version>  
    <ducc.version>1.0.20</ducc.version>  
    <jdcds-driver-alg-version>2.21.1</jdcds-driver-alg-version>  
    <jdcds-driver-version>3.8.3</jdcds-driver-version>  
    <jmq.version>2.3.3-RC2</jmq.version>  
    <jsf.version>1.7.6-HOTFIX-T2</jsf.version>  
    <r2m.version>3.3.4</r2m.version>  
    <!-- JD middleware versions: end -->
</properties>



JVM startup parameter upgrade

The remote debugging parameters have changed. Since JDK 9 the JDWP agent listens only on localhost unless the address is given as *:port:

JAVA_DEBUG_OPTS=" -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:8000 "



The GC logging parameters have also changed (unified logging via -Xlog). We enabled GC logging in the pre-release environment for observation:

JAVA_GC_LOG_OPTS=" -Xlog:gc*:file=/export/logs/gc.log:time,tid,tags:filecount=10:filesize=10m "



The JVM parameters we use with ZGC:

JAVA_MEM_OPTS=" -server -Xmx12g -Xms12g -XX:MaxMetaspaceSize=256m -XX:MetaspaceSize=256m -XX:MaxDirectMemorySize=2048m -XX:+UseZGC -XX:ZAllocationSpikeTolerance=3 -XX:ParallelGCThreads=8 -XX:CICompilerCount=3 -XX:-RestrictContended -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/export/logs "
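
To confirm from inside the process that these flags were picked up, the standard `java.lang.management` API can list the JVM input arguments and the active collectors. This is a sketch only: matching collector names by the prefix "ZGC" is an assumption about how HotSpot names its ZGC beans, not something the API guarantees.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

class GcInfo {
    // Names of the garbage collector beans registered in the running JVM.
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    // Assumption: HotSpot's ZGC beans have names that start with "ZGC".
    static boolean looksLikeZgc(List<String> names) {
        return names.stream().anyMatch(name -> name.startsWith("ZGC"));
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time (ms) as reported by each collector bean.
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("ZGC active: " + looksLikeZgc(collectorNames()));
    }
}
```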



Some dependencies need reflective access to JDK-internal packages, which must be opened with --add-opens. This applies to UMP, JSF, Wormhole, MyBatis, DUCC, R2M, and SGM:

if [[ "$JAVA_VERSION" -ge 11 ]]; then
  SGM_OPTS="${SGM_OPTS} --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens java.management/java.lang.management=ALL-UNNAMED "
  UMP_OPT=" --add-opens java.base/sun.net.util=ALL-UNNAMED "
  JSF_OPTS=" --add-opens java.base/sun.util.calendar=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.math=ALL-UNNAMED"
  WORMHOLE_OPT=" --add-opens java.base/sun.security.action=ALL-UNNAMED "
  MB_OPTS=" --add-opens java.base/java.lang=ALL-UNNAMED "
  DUC_OPT=" --add-opens java.base/java.net=ALL-UNNAMED "
  R2M_OPT=" --add-opens java.base/java.time=ALL-UNNAMED "
fi
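
Whether a given `--add-opens` option is in effect can be checked at run time with the `java.lang.Module` API. The sketch below asks whether the package of a target class is open to the module of the calling class; code on the class path lives in the unnamed module, which is what `ALL-UNNAMED` refers to. The class and method names are illustrative.

```java
import java.util.List;

class AddOpensCheck {
    // True if the package of `target` is open for deep reflection to the module of `caller`.
    static boolean isOpenTo(Class<?> target, Class<?> caller) {
        return target.getModule().isOpen(target.getPackageName(), caller.getModule());
    }

    public static void main(String[] args) {
        // One representative class per package opened in the startup script above.
        List<Class<?>> targets = List.of(
                String.class,                // java.lang
                java.util.List.class,        // java.util
                java.math.BigInteger.class,  // java.math
                java.net.URL.class,          // java.net
                java.time.Instant.class);    // java.time
        for (Class<?> target : targets) {
            System.out.println(target.getPackageName() + " open to this code: "
                    + isOpenTo(target, AddOpensCheck.class));
        }
    }
}
```

Started without any `--add-opens` options on JDK 17, each line is expected to report false; with the options from the script above, the corresponding packages should report true.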



After startup, the complete startup parameters are as follows:

-javaagent:/export/package/sgm-probe-java/sgm-probe-5.9.5-product/sgm-agent-5.9.5.jar -Dsgm.server.address=http://sgm.jdfin.local -Dsgm.app.name=market-reduction-center -Dsgm.agent.sink.http.connection.requestTimeout=2000 -Dsgm.agent.sink.http.connection.connectTimeout=2000 -Dsgm.agent.sink.http.minAlive=1 -Dsgm.agent.virgo.address=10.24.216.198:8999,10.223.182.52:8999,10.25.217.95:8999 -Dsgm.agent.zone=m6 -Dsgm.agent.group=m6-discount -Dsgm.agent.tenant=jdjr -Dsgm.deployment.platform=jdt-jdos --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED --add-opens=java.management/sun.management=ALL-UNNAMED --add-opens=java.management/java.lang.management=ALL-UNNAMED -DJDOS_DATACENTER=JXQ -Ddeploy.app.name=jdos_kj_market-reduction-center -Ddeploy.app.id=30005051 -Ddeploy.instance.id=0 -Ddeploy.instance.name=server -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Djava.util.Arrays.useLegacyMergeSort=true -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j2.AsyncQueueFullPolicy=Discard -Xmx12g -Xms12g -XX:MaxMetaspaceSize=256m -XX:MetaspaceSize=256m -XX:MaxDirectMemorySize=2048m -XX:+UseZGC -XX:ZAllocationSpikeTolerance=3 -XX:ParallelGCThreads=8 -XX:CICompilerCount=3 -XX:-RestrictContended -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/export/logs --add-opens=java.base/sun.net.util=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.math=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED -Dloader.path=/export/package/jdos_kj_market-reduction-center/conf



System verification

Once the system starts successfully, functional verification can begin. The verification points and methods are:

  • First, run the unit tests for a quick regression, to catch business exceptions caused by changes in JDK APIs and middleware APIs

  • Deploy to the test environment and verify that every middleware works, such as DUCC switch delivery, MQ sending and receiving, and JSF interface calls. Every middleware used by the system must be verified one by one

  • Then verify the core business. Use the QA team's test automation, supplemented by manual scenarios, for a quick regression of the core business. During this step, developers need to watch every abnormal log produced while the system is called, including warnings, and identify the cause of each one

  • After that, deploy to the joint-debugging environment and use the requests made by external colleagues during joint debugging for further verification

  • After thorough observation in the test environment, deploy to the pre-release environment, use requests from external colleagues for further verification, and run routine stress tests to verify the optimization effect and find bottlenecks

  • After a long period of pre-release verification without problems, deploy one production instance and verify further by replaying production traffic

  • Once replayed traffic is handled normally, start taking production traffic and observe for several weeks, depending on interface volume

  • Gradually shift traffic until the new version serves all of it

GC tuning

Introduction to ZGC

[Figures: GC positioning; ZGC goals; ZGC collection process]

As the figures show, ZGC is positioned as a garbage collector with a maximum pause time below 1 ms that can handle heaps from 8 MB to 16 TB and is easy to tune. ZGC has only three STW phases. The detailed process is covered by many articles online, so we do not go into it here.

Optimization direction

Before the upgrade, our application ran G1 with GC pauses of about 30 ms, occurring at intervals of less than a minute. During big promotions GC ran more often and paused longer, causing large fluctuations in interface performance. As the business grew, we added many local caches to optimize the system, which increased the number of live objects. ZGC pause times do not grow with the size of the heap, the live set, or the root set, and its extremely low GC time is exactly what we need, so we decided to use ZGC.

As a modern collector, ZGC needs little tuning: the default configuration handles 99.9% of scenarios. However, our application takes heavy traffic, and we observed that when traffic surges suddenly, GC starts too late. Handling burst traffic is therefore our one ZGC tuning goal, and we left the other settings unchanged.

Optimization measures

One way to tune ZGC is to give it a large enough heap. In general, the more memory ZGC has the better, but there is no need to waste it. We can observe the GC logs under stress testing to find an appropriate value. We only need to guarantee that:

  1. The heap can hold the live data generated by the application

  2. The heap has enough headroom to serve new allocations while a GC cycle is running

We therefore upgraded the machines to an 8C 16G configuration, observed the GC logs, adjusted the memory settings to fit the application, and finally settled on -Xmx12g -Xms12g -XX:MaxMetaspaceSize=256m -XX:MetaspaceSize=256m -XX:MaxDirectMemorySize=2048m to get the most out of ZGC.

The remaining tuning measures depend on the situation: you can adjust when GC is triggered, or trigger GC at a fixed time interval instead.

We made GC trigger slightly earlier by setting -XX:ZAllocationSpikeTolerance=3 (the default is 2) to cope with burst traffic.
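
Why a larger spike tolerance makes GC start earlier can be shown with a simplified model of an allocation-rate trigger. This is an assumption-laden sketch, not ZGC's actual implementation (which also accounts for allocation-rate variance and sampling intervals): the collector assumes allocation may spike to `average rate × tolerance`, estimates how long the free heap would last at that rate, and starts a cycle once that time no longer exceeds the expected GC duration. All numbers below are illustrative.

```java
class ZgcTriggerSketch {
    // Simplified model: start a GC cycle when the predicted time until the heap
    // is exhausted is no longer greater than the expected duration of a GC cycle.
    static boolean shouldStartGc(double freeBytes,
                                 double avgAllocBytesPerSec,
                                 double spikeTolerance,
                                 double gcDurationSec) {
        // Pessimistic allocation rate: the average multiplied by the spike tolerance.
        double assumedRate = avgAllocBytesPerSec * spikeTolerance;
        // The +1.0 only avoids division by zero when nothing is being allocated.
        double timeUntilExhaustedSec = freeBytes / (assumedRate + 1.0);
        return timeUntilExhaustedSec <= gcDurationSec;
    }

    public static void main(String[] args) {
        double free = 4e9;        // 4 GB of free heap (illustrative)
        double rate = 1e9;        // 1 GB/s average allocation rate (illustrative)
        double gcDuration = 1.5;  // expected GC cycle duration in seconds (illustrative)

        System.out.println("tolerance 2 -> start GC: " + shouldStartGc(free, rate, 2.0, gcDuration));
        System.out.println("tolerance 3 -> start GC: " + shouldStartGc(free, rate, 3.0, gcDuration));
    }
}
```

With the same free heap and allocation rate, the model starts a cycle at tolerance 3 while it would still wait at tolerance 2, which is the earlier-trigger behavior we wanted for burst traffic.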

We also set -XX:CICompilerCount and -XX:ParallelGCThreads: the former speeds up JIT compilation and the latter raises the number of threads used in the collector's parallel phases. Both were increased slightly according to our actual workload, trading a little CPU usage for efficiency.

Large pages can also be enabled to improve performance further. We did not take this step, because we currently deploy in Docker containers co-located on physical machines, and enabling large pages requires kernel changes that would affect the other containers on the host.

Summary

At this point the tuning is complete. The application has now been running in production for more than a month, with three routine stress tests every week, and everything is normal.

We share this upgrade experience in the hope that it will be useful to you.

Author: JD Technology Zhang Tianci

Source: JD Cloud Developer Community

Origin blog.csdn.net/JDDTechTalk/article/details/132273246