A Strong Combination with Several-Fold Performance Gains! Alibaba Dragonwell11 + VectorAPI Ushers Java into a New Era of High Performance

01  Background

Alibaba Dragonwell, a downstream distribution of OpenJDK, is Alibaba's OpenJDK implementation optimized for online e-commerce, finance and logistics applications running on more than 100,000 servers. Alibaba works closely with the OpenJDK community to contribute as many of Dragonwell's custom features upstream as possible. Dragonwell is the default JDK of OpenAnolis, and many of Alibaba's Java applications are gradually migrating to it.

The Intel Java team has long been committed to OpenJDK development and has contributed a large number of optimizations that let OpenJDK make better use of the powerful features of modern CPUs. VectorAPI is a powerful tool for high-performance computing in Java in specific domains. With VectorAPI, Java developers can precisely control and use the SIMD (Single Instruction Multiple Data) units commonly found in modern CPUs, and certain applications can gain several-fold performance improvements.

02  Current status of Java high-performance computing

On the x86 platform, SIMD units have evolved from the MMX (MultiMedia eXtension) era to the current AVX-512 (Advanced Vector Extensions) / AMX (Advanced Matrix Extensions) era, and SIMD units are also common on non-x86 platforms. SIMD has played a vital role in multimedia processing, gaming, big data processing and, most recently, AI. Developers can explicitly write SIMD code (known as vectorized code) in C/C++ (using intrinsics), assembly language and other languages, or they can rely on the automatic vectorization performed by the compiler/interpreter. The JVM (Java Virtual Machine) can also vectorize code automatically.
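To illustrate what automatic vectorization depends on, here is a hedged sketch with two loop shapes; the class and method names are hypothetical. HotSpot's C2 JIT compiler can typically turn the first loop, whose iterations are independent, into SIMD instructions. The second loop carries a running sum from one iteration to the next, which generally prevents straightforward vectorization. Whether either loop is actually vectorized depends on the JVM version, flags and hardware, and cannot be seen from the Java source.

```java
// Hypothetical example class: two loop shapes with different auto-vectorization prospects.
class AutoVecShapes {
    // Independent element-wise work: c[i] depends only on a[i] and b[i],
    // the kind of loop a JIT auto-vectorizer can usually map onto SIMD lanes.
    static void scaleAdd(float[] a, float[] b, float[] c, float k) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] * k + b[i];
        }
    }

    // Loop-carried dependency: out[i] needs the running sum from the previous
    // iteration, so iterations cannot simply be executed side by side in SIMD lanes.
    static void prefixSum(float[] in, float[] out) {
        float running = 0f;
        for (int i = 0; i < in.length; i++) {
            running += in[i];
            out[i] = running;
        }
    }
}
```

Both methods compute ordinary scalar results; the difference only shows up in the machine code the JIT chooses to generate.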

Relying on the automatic vectorization of the compiler/interpreter places relatively little burden on developers, but because the outcome depends entirely on the compiler/interpreter implementation, it often fails to achieve the intended result. Developers can do very little about this, so the SIMD hardware is often underutilized. To program SIMD hardware directly, older versions of Java could only call libraries implemented in C/C++ or assembly language through JNI (Java Native Interface). JNI adds non-negligible performance overhead, and mixed-language programming also makes the system more complex to manage and maintain. Java VectorAPI (Vector API) gives Java developers the ability to program SIMD hardware directly. The ways to use SIMD hardware are summarized below:

| Language | Automatic vectorization | Calling native libraries | Programming SIMD units directly |
|---|---|---|---|
| C/C++, assembly and other native languages | Yes | Yes | Yes (intrinsics/assembly) |
| Java | Yes | Yes (JNI) | Yes (VectorAPI) |

03  Introduction to VectorAPI

VectorAPI (Incubator; JEP 338, where JEP stands for JDK Enhancement Proposal) was originally created in April 2018 and was integrated into OpenJDK 16 as an incubator module in October 2020. As OpenJDK has released new versions, VectorAPI has been updated along with them:

  • OpenJDK 17 -> JEP 414, Second Incubator
  • OpenJDK 18 -> JEP 417, Third Incubator
  • OpenJDK 19 -> JEP 426, Fourth Incubator
  • OpenJDK 20 -> JEP 438, Fifth Incubator

Each VectorAPI iteration has brought performance improvements, new features and bug fixes. VectorAPI code is written in pure Java. Let's look at a simple example:

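// Traditional approach: add two arrays one element at a time
void add(float[] a, float[] b, float[] c) {
    for (int i = 0; i < c.length; i++) {
        c[i] = a[i] + b[i];
    }
}

// Adding two arrays with VectorAPI (JEP 338 API; compile and run with --add-modules jdk.incubator.vector)
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class AddClass {
    // Preferred float species for this CPU, e.g. up to 16 float lanes with AVX-512
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Vector routine for add
    void add(float[] a, float[] b, float[] c) {
        int i = 0;
        // Largest multiple of the lane count that does not exceed c.length
        int upperBound = SPECIES.loopBound(c.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector av = FloatVector.fromArray(SPECIES, a, i);
            FloatVector bv = FloatVector.fromArray(SPECIES, b, i);
            av.add(bv).intoArray(c, i);
        }
        // Clean-up loop for the remaining tail elements
        for (; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }
}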
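How a vectorized loop such as `add` divides its work between full-width vector iterations and the scalar clean-up loop can be made concrete with a small sketch. It assumes the JEP 338 API in the `jdk.incubator.vector` module (compile and run with `--add-modules jdk.incubator.vector`); `LoopSplit` and its methods are hypothetical helpers, not part of VectorAPI.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical helper: splits n elements into vector iterations and a scalar tail.
class LoopSplit {
    // Number of full-width iterations the vector loop performs for n elements.
    static int vectorIterations(VectorSpecies<Float> species, int n) {
        // loopBound(n) rounds n down to a multiple of the lane count.
        return species.loopBound(n) / species.length();
    }

    // Number of leftover elements handled by the scalar clean-up loop.
    static int tailElements(VectorSpecies<Float> species, int n) {
        return n - species.loopBound(n);
    }
}
```

For n = 100 and `FloatVector.SPECIES_512` (512 / 32 = 16 float lanes), `vectorIterations` returns 6 and `tailElements` returns 4. `FloatVector.SPECIES_PREFERRED` reports whatever width the running CPU and JVM settings support, so its split varies by machine.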

With VectorAPI, on a hardware platform that supports AVX-512, a single vector addition processes 512/32 = 16 single-precision floating-point numbers, while the traditional loop processes only one floating-point number per addition. Here are some examples of VectorAPI in action:

  • BLAS (Basic Linear Algebra Subprograms), which are widely used in high-performance computing, AI and multimedia, can be accelerated by 2.2x to 4.5x.

  • Sepia filtering in image processing is up to 6x faster.

  • Database applications have seen improvements of more than 2x.
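BLAS routines are built largely from small kernels such as the dot product. As a hedged illustration (not the code behind the results above), a level-1 `sdot`-style kernel can be sketched with the JEP 338 API: `fma` accumulates per-lane partial sums, `reduceLanes` collapses them into one scalar, and a scalar loop finishes the tail. `VectorDot` is a hypothetical name, and the module must be enabled with `--add-modules jdk.incubator.vector`.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical BLAS-style kernel: returns the sum of x[i] * y[i].
class VectorDot {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dot(float[] x, float[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        // One partial sum per lane, carried across the vector loop.
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int upperBound = SPECIES.loopBound(x.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector xv = FloatVector.fromArray(SPECIES, x, i);
            FloatVector yv = FloatVector.fromArray(SPECIES, y, i);
            // acc = xv * yv + acc, lane by lane (fused multiply-add)
            acc = xv.fma(yv, acc);
        }
        // Add the lanes together, then handle the remaining elements with scalar code.
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }
}
```

Because the vector version adds terms in a different order than a sequential loop, its floating-point result can differ from a scalar sum in the last bits; callers that need bit-exact reproducibility should keep that in mind.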

Because VectorAPI is a relatively new module, more projects that use it are still under development.

04  Current status of Java versions in the industry

In the Apache Parquet-mr project, using VectorAPI requires JDK 17. This is partly because JDK 17 is a Java LTS (Long-Term Support) release, but more importantly because the previous LTS release, JDK 11, does not support VectorAPI. However, JDK 11 is more widely used in the industry than JDK 17, and upgrading the Java version in a mature production environment is costly, which creates an obvious and significant obstacle to adopting VectorAPI in the industry. The industry has made many efforts and attempts to address this. For example, Alibaba added VectorAPI support to AJDK, the JDK it uses internally, but that implementation differs from the OpenJDK community's, which makes subsequent upgrades and maintenance difficult. So Alibaba and Intel started a project to port VectorAPI (JEP 338) to Dragonwell11. This lets users take advantage of VectorAPI's powerful capabilities while protecting their existing investment and avoiding the risks of a JDK upgrade.

05  Difficulties in porting

  • Huge amount of code involved

The community implementation of JEP 338 modifies 336 files and about 290,000 lines of code. Moreover, these changes are based on JDK 16; relative to Dragonwell, which is based on JDK 11, the differences are even larger.

  • Staying in sync with upstream OpenJDK

Dragonwell11 must continue to track revisions and enhancements in upstream OpenJDK 11 easily, and the ported VectorAPI must also be able to follow the subsequent evolution of VectorAPI in upstream OpenJDK. Both requirements make the porting work considerably more challenging.

  • Performance should be close to upstream OpenJDK

Some performance-related changes were made in OpenJDK versions after 11. How can they be brought into Dragonwell11?

  • Strong requirements for stability and reliability

Dragonwell supports a large number of existing businesses, so stability and reliability are the top priorities.

06  Solution

To handle the massive amount of code, Alibaba and Intel invested significant resources, including key VectorAPI contributors from the upstream OpenJDK community. The two sides worked closely together to discuss solutions and review the code. During porting, the original structure of Dragonwell11 was preserved as much as possible, and the upstream implementation was used only for the parts involving VectorAPI. For changes made upstream after JDK 11, their implementation was analyzed, and wherever existing Dragonwell11 components could be reused, no additional changes were introduced. This minimizes the impact on Dragonwell11. In addition, OpenJDK's built-in test suite was run in full against the ported Dragonwell11 to ensure the quality of the port.

07  Dragonwell11 + VectorAPI

VectorAPI has now been merged into the master branch of Dragonwell11 and is fully compatible with the first VectorAPI incubator (JEP 338). The features of JEP 414, JEP 417, JEP 426 and JEP 438 will be ported to Dragonwell11 in the future.
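Because VectorAPI ships as an incubator module, an application that must also run on JDKs without it may want to check at startup whether the module is available before taking a vectorized code path. The sketch below uses only standard `java.lang` APIs. It assumes the Dragonwell11 port keeps the upstream module name `jdk.incubator.vector` (not confirmed here), and `VectorApiProbe` is a hypothetical class. Incubator modules are not resolved by default for class-path applications, so the check reports the module only when the JVM is started with `--add-modules jdk.incubator.vector`.

```java
// Hypothetical startup probe for the incubating Vector API module.
class VectorApiProbe {
    // Assumed module name, matching upstream OpenJDK.
    static final String MODULE_NAME = "jdk.incubator.vector";

    // True if the module was resolved into the boot layer of this JVM.
    static boolean vectorApiAvailable() {
        return ModuleLayer.boot().findModule(MODULE_NAME).isPresent();
    }

    // Feature release number of the running JDK, e.g. 11 for a JDK 11 build.
    static int featureRelease() {
        return Runtime.version().feature();
    }

    // Human-readable summary suitable for a startup log line.
    static String describe() {
        return "JDK " + featureRelease() + ": " + MODULE_NAME
                + (vectorApiAvailable() ? " is resolved" : " is not resolved");
    }
}
```

A caller would keep its VectorAPI code in a separate class and load it only when `vectorApiAvailable()` returns true, falling back to a scalar loop otherwise, so that classes referencing `jdk.incubator.vector` are never loaded on a JVM without the module.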

Related Links:

Alibaba Dragonwell: https://github.com/alibaba/dragonwell11

OpenJDK: https://github.com/openjdk/jdk

Authors: Zhu Wenjie, Jin Zhonghui

This article is the original content of Alibaba Cloud and may not be reproduced without permission.

Origin blog.csdn.net/yunqiinsight/article/details/130366886