Fury: Ant Group's open-source, high-performance multi-language serialization framework, up to 170 times faster than JDK serialization

Source|Public account InfoQ 

Author|Yang Chaokun (Mubai)

Editor|Deng Yanqin  

Fury is a multi-language serialization framework based on JIT dynamic compilation and zero-copy. It supports languages such as Java/Python/Golang/JavaScript/C++ and provides fully automatic multi-language/cross-language object serialization, with up to 170 times the performance of JDK serialization.

The main code repository is on GitHub: https://github.com/alipay/fury

Background

Serialization is a fundamental component of system communication and is widely used in distributed systems such as big data, AI frameworks, and cloud native. Whenever an object needs to be transferred across processes, languages, or nodes, or needs to be persisted, read or written as state, or replicated, it has to be serialized, and the performance and ease of use of serialization affect both runtime efficiency and development efficiency.

Static serialization frameworks such as protobuf/flatbuffer/thrift do not support object references or polymorphism and require code to be generated in advance, so their generated classes cannot be used directly as domain objects in cross-language application development. Dynamic serialization frameworks such as JDK serialization/Kryo/Fst/Hessian/Pickle are easy to use and dynamic, but they do not support cross-language serialization and have significant performance shortcomings, so they cannot meet the requirements of high-throughput, low-latency, and large-scale data transmission scenarios.

Therefore, we developed a new multi-language serialization framework, Fury, and have officially open sourced it on GitHub. Through a set of highly optimized serialization primitives, combined with JIT dynamic compilation, zero-copy, and other techniques, it meets the requirements of performance, functionality, and ease of use at the same time, automatically serializes arbitrary objects across languages, and delivers very high performance.

Introduction to Fury

Fury is a multi-language serialization framework based on JIT dynamic compilation and zero-copy, providing ultimate performance and ease of use:

  • Supports the mainstream programming languages Java/Python/C++/Golang/JavaScript, and can easily be extended to other languages;

  • Unified multilingual serialization core capabilities:

    • Highly optimized serialization primitives;

    • Zero-copy serialization, including an out-of-band serialization protocol and reading and writing of off-heap memory;

    • JIT dynamic compilation: serialization code is generated automatically at runtime, asynchronously and on multiple threads. This increases method inlining and code caching, eliminates dead code, and reduces virtual method calls, conditional branches, hash lookups, metadata writes, and memory reads and writes, providing up to 170 times the performance of other serialization frameworks;

  • Multi-protocol support, combining the flexibility and ease of use of dynamic serialization with the cross-language capabilities of static serialization:

    • Java serialization:

      • Seamlessly replaces JDK/Kryo/Hessian without any code changes while providing up to 170 times the performance, which can greatly improve the efficiency of RPC calls, data transmission, and object persistence in high-performance scenarios;

      • 100% compatible with JDK serialization, with native support for the JDK custom serialization methods writeObject/readObject/writeReplace/readResolve/readObjectNoData;

    • Cross-language object graph serialization:

      • Automatic multi-language/cross-language serialization of arbitrary objects, without creating IDL files, manually compiling schemas to generate code, or converting objects to intermediate formats;

      • Automatic multi-language/cross-language serialization of shared references and circular references, with no need to worry about data duplication or recursion errors;

      • Supports object type polymorphism: objects of multiple subtypes can be serialized at the same time;

    • Row storage serialization:

      • Provides a cache-friendly, randomly accessible binary row storage format that supports skipping serialization and partial serialization, suitable for high-performance computing and large-scale data transmission scenarios;

      • Supports automatic conversion to and from the Arrow columnar format;

Serialization Core Capabilities

Although different scenarios place different requirements on serialization, the underlying operations are similar. Fury therefore defines and implements a set of basic serialization capabilities. On top of them, different multi-language serialization protocols can be built quickly, and high performance can be achieved through compilation acceleration and other optimizations. In addition, a performance optimization made to these basic capabilities for one protocol benefits all the other protocols as well.

Serialization primitives

Common operations involved in serialization mainly include:

  • Bitmap bit manipulation

  • Integer encoding and decoding

  • Integer compression

  • String creation and copy optimization

  • String encoding: ASCII/UTF8/UTF16

  • Memory copy optimization

  • Array copy and compression optimization

  • Metadata encoding, compression, and caching

For each of these operations, Fury has been heavily optimized in every language, combining SIMD instructions with advanced language features to push performance as far as possible, so that every protocol built on top can benefit.
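As a small illustration of the "integer compression" item in the list above, the following Python sketch shows variable-length (varint) encoding combined with ZigZag mapping, a technique commonly used to store small integers in fewer bytes. It is a generic example, not code taken from Fury, and Fury's actual integer encoding may differ. All function names are hypothetical.

```python
from typing import Tuple

MASK_64 = 0xFFFFFFFFFFFFFFFF


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 63)) & MASK_64


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def write_varint(buf: bytearray, value: int) -> None:
    """Append an unsigned integer using 7 data bits per byte."""
    while value >= 0x80:
        buf.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    buf.append(value)


def read_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos and return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


buf = bytearray()
write_varint(buf, zigzag_encode(-3))  # a small negative number fits in one byte
value, _ = read_varint(bytes(buf))
original = zigzag_decode(value)
```

With a fixed-width encoding every 64-bit integer costs eight bytes, so this kind of compression pays off when most values are small.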

Zero-copy serialization

In large-scale data transmission scenarios, an object graph often contains multiple binary buffers. A serialization framework normally writes this data into an intermediate buffer during serialization, which introduces several time-consuming memory copies. Fury draws on the zero-copy designs of pickle5, Ray, and Arrow and implements an out-of-band serialization protocol that directly captures all binary buffers in an object graph and avoids intermediate copies of them, reducing the memory copy overhead for these buffers during serialization to zero.

The figure below shows the approximate zero-copy serialization process when reference support is turned off in Fury.

34f84e1fe1855ae5a1c18ad297ee1611.png

Fury currently has built-in zero-copy support for the following types:

  • Java: all primitive arrays, ByteBuffer, ArrowRecordBatch, VectorSchemaRoot

  • Python: all arrays of the array module, numpy arrays, pyarrow.Table, pyarrow.RecordBatch

  • Golang: byte slices

Users can also extend new zero-copy types based on Fury's interface.
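The out-of-band idea can be seen in miniature with pickle protocol 5 from the Python standard library, one of the designs this section cites as inspiration. The sketch below uses only `pickle` and does not use Fury's API. The object and variable names are illustrative assumptions.

```python
import pickle

# Stands in for a large binary buffer inside an object graph.
big = bytearray(b"\x01" * 1_000_000)
obj = {"id": 7, "blob": pickle.PickleBuffer(big)}

# In-band: the buffer contents are copied into the pickle stream.
in_band = pickle.dumps(obj, protocol=5)

# Out-of-band: the pickler hands each buffer to the callback instead of
# copying it, so the stream only describes the object structure.
buffers = []
out_of_band = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)

# The receiver supplies the buffers separately, in the same order.
restored = pickle.loads(out_of_band, buffers=buffers)
restored_view = memoryview(restored["blob"])

print(len(in_band), len(out_of_band), len(buffers))
```

In a real transport the collected buffers would be sent alongside the small structural stream, for example through shared memory or scatter/gather I/O, so the large payload is never copied into an intermediate buffer.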

JIT dynamic compilation acceleration

Custom type objects to be serialized usually carry a lot of type information. Fury uses this type information to generate efficient serialization code directly at runtime, moving a large number of runtime operations into the dynamic compilation phase. This increases method inlining and code caching and reduces virtual method calls, conditional branches, hash lookups, metadata writes, and memory reads and writes, which ultimately speeds up serialization greatly.
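The general idea of generating type-specific serialization code at runtime can be sketched in a few lines of Python. This is a simplified illustration and not Fury's code generator: the `Point` class, the `generate_serializer` helper, and the byte layout are all assumptions made for the example.

```python
import struct
from dataclasses import dataclass, fields


@dataclass
class Point:  # hypothetical user type
    x: int
    y: int
    label: str


def generate_serializer(cls):
    """Build and compile a type-specific `serialize(obj) -> bytes` function.

    Field names and types are resolved once, here, so the generated code
    contains no per-call reflection or type dispatch.
    """
    lines = ["def serialize(obj):", "    out = bytearray()"]
    for f in fields(cls):
        if f.type is int:
            lines.append(f"    out += pack_i64(obj.{f.name})")
        elif f.type is str:
            lines.append(f"    data = obj.{f.name}.encode('utf-8')")
            lines.append("    out += pack_u32(len(data))")
            lines.append("    out += data")
        else:
            raise TypeError(f"unsupported field type: {f.type!r}")
    lines.append("    return bytes(out)")
    source = "\n".join(lines)
    namespace = {
        "pack_i64": struct.Struct("<q").pack,  # 8-byte little-endian integer
        "pack_u32": struct.Struct("<I").pack,  # 4-byte length prefix
    }
    code = compile(source, f"<generated {cls.__name__} serializer>", "exec")
    exec(code, namespace)
    return namespace["serialize"], source


serialize, source = generate_serializer(Point)
print(source)  # the straight-line code specialized for Point
data = serialize(Point(1, 2, "ab"))
```

The generated function is straight-line code for one class, which is what lets a runtime inline and optimize it far better than a generic loop over fields.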

For Java, Fury implements a runtime code generation framework and defines an expression IR of operators for serialization logic. At runtime it performs type inference based on the generic information of the object type, builds an expression tree that describes the serialization logic, and generates efficient Java code from that tree. The code is compiled to bytecode at runtime with Janino and loaded into the user's ClassLoader or a ClassLoader created by Fury, and finally the JVM JIT compiles it into efficient assembly code.

Since the JVM JIT skips compiling and inlining large methods, Fury also implements an optimizer that recursively splits large methods into small ones, ensuring that all code generated by Fury can be compiled and inlined and getting the most performance out of the JVM.

Fury also supports asynchronous multi-threaded dynamic compilation: the code generation tasks of different serializers are submitted to a thread pool, and serialization runs in interpreted mode until compilation finishes. This avoids latency spikes during serialization and removes the need to warm up serialization for every type in advance.
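The fallback pattern described here, using a generic path until a specialized serializer has been compiled in the background, might look roughly like the following Python sketch. It is an illustration under assumptions, not Fury's implementation: `LazySerializer`, `User`, and the JSON output format are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=2)  # shared compilation thread pool


class User:  # hypothetical user type
    def __init__(self, name, age):
        self.name = name
        self.age = age


def interpreted_serialize(obj):
    """Generic slow path: inspects the object on every call."""
    return json.dumps({k: getattr(obj, k) for k in sorted(vars(obj))})


def compile_serializer(cls, field_names):
    """Generate a function specialized for one class (runs in the pool)."""
    body = ", ".join(f"{n!r}: obj.{n}" for n in sorted(field_names))
    source = f"def fast(obj):\n    return dumps({{{body}}})"
    namespace = {"dumps": json.dumps}
    exec(compile(source, f"<generated {cls.__name__}>", "exec"), namespace)
    return namespace["fast"]


class LazySerializer:
    def __init__(self, cls, field_names):
        # Compilation starts immediately but never blocks the caller.
        self._future = _pool.submit(compile_serializer, cls, field_names)

    def serialize(self, obj):
        if self._future.done():
            return self._future.result()(obj)  # compiled fast path
        return interpreted_serialize(obj)      # fallback until ready


ser = LazySerializer(User, ["name", "age"])
first = ser.serialize(User("ann", 30))  # may take either path
```

Both paths must produce identical bytes, since a caller cannot know which one served a given request.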

Python and JavaScript scenarios also adopt a similar code generation method, which has a low development threshold and makes it easier to troubleshoot problems.

Serialization needs to manipulate the objects of each programming language closely, these languages do not expose a low-level API for their memory model, and calling through native methods carries a large overhead. We therefore cannot build a unified serializer JIT framework on LLVM. Instead, a dedicated code generation framework and serializer construction logic has to be implemented within each language, using that language's own features.

Static code generation

JIT compilation can greatly improve serialization efficiency, and can regenerate better serialization code at runtime based on the statistical distribution of the data. However, languages such as C++/Rust do not support reflection, have no virtual machine, and do not provide a low-level memory model API, so serialization code cannot be generated through JIT dynamic compilation for them.

For these languages, Fury is implementing an AOT static code generation framework that generates serialization code ahead of time, at compile time, from the schema of the object, and then uses the generated code for automatic serialization. For Rust, macros will also be used in the future to generate code at compile time for better ease of use.

Cache optimization

When a custom type is serialized, its fields are reordered so that fields of the same interface type are serialized consecutively, which increases the probability of cache hits and makes better use of the CPU instruction cache. Primitive fields are written in descending order of their size in bytes, so that if the start address is aligned, subsequent reads and writes also fall on aligned memory addresses, which the CPU handles more efficiently.
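The alignment effect of writing primitive fields in descending size order can be checked with a short Python sketch. The field list is a made-up example and the helper names are hypothetical. Sizes come from the standard `struct` module.

```python
import struct

# (field name, struct format code) for a hypothetical class
FIELDS = [("flag", "?"), ("id", "q"), ("age", "h"), ("score", "d"), ("count", "i")]


def size_of(code):
    """Size in bytes of one primitive, without native padding."""
    return struct.calcsize("<" + code)


def order_fields(field_list):
    """Sort by size descending, then by name for a deterministic layout."""
    return sorted(field_list, key=lambda f: (-size_of(f[1]), f[0]))


def offsets(ordered):
    """Return {name: byte offset} when fields are packed back to back."""
    result, pos = {}, 0
    for name, code in ordered:
        result[name] = pos
        pos += size_of(code)
    return result


declared = offsets(FIELDS)                 # "id" lands at offset 1: misaligned
sorted_layout = offsets(order_fields(FIELDS))
aligned = all(sorted_layout[name] % size_of(code) == 0 for name, code in FIELDS)
```

Because the sizes are powers of two, sorting them in descending order guarantees that every field starts at a multiple of its own size whenever the first field does.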

Multi-protocol design and implementation

Based on the multilingual serialization core capabilities provided by Fury, we built three serialization protocols on top of it, which are suitable for different scenarios:

  • Java serialization: suitable for pure Java serialization scenarios, providing a performance improvement of up to 100 times;

  • Cross-language object graph serialization: suitable for application-oriented multi-language programming and high-performance cross-language serialization;

  • Row storage serialization: suitable for distributed computing engines such as Spark/Flink/Doris/Velox, sample stream processing frameworks, feature storage, and similar systems;

In the future, we will also add new protocols for some core scenarios, and users can also build their own protocols based on Fury's serialization capabilities.

Java serialization

Because Java is so widely used in big data, cloud native, microservices, and enterprise applications, optimizing Java serialization performance can greatly reduce system latency, increase throughput, and reduce server costs.

Therefore, Fury has made a lot of extreme performance optimizations for Java serialization, and our implementation has the following capabilities:

  • Extreme performance: by using the type and generic information of Java objects, combined with JIT compilation and low-level Unsafe operations, Fury is up to 170 times faster than JDK serialization and up to 50-100 times faster than Kryo/Hessian.

  • 100% JDK serialization API compatibility: supports the semantics of all JDK custom serialization methods writeObject/readObject/writeReplace/readResolve/readObjectNoData, so that replacing JDK serialization is correct in any scenario. Existing Java serialization frameworks such as Kryo/Hessian have certain correctness problems in these scenarios.

  • Type compatibility: deserialization still works correctly when the class schema differs between the deserializing side and the serializing side, which supports independent upgrade and deployment of applications and independent addition and removal of fields. Metadata is heavily compressed and shared, so the type-compatible mode has almost no performance loss compared with the strictly type-consistent mode.

  • Metadata sharing: metadata (class names, field names, final field type information, etc.) is shared across multiple serializations within a context such as a TCP connection. It is sent to the peer the first time a type is serialized in that context, and the peer can rebuild the same deserializer from this type information. Subsequent serializations avoid transmitting the metadata, which reduces network traffic and automatically provides forward and backward type compatibility.

  • Zero-copy support: supports out-of-band zero-copy and reading and writing of off-heap memory.
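The metadata sharing item in the list above can be illustrated with a toy protocol in Python: class and field names are sent only with the first message of a connection, and later messages carry a small numeric id. This is a sketch of the idea only. It uses JSON for readability, and `MetaContext`, `write_message`, and `read_message` are hypothetical names, not Fury APIs.

```python
import json


class MetaContext:
    """Per-connection state; each peer keeps one instance (hypothetical)."""

    def __init__(self):
        self.ids = {}    # writer side: class name -> type id
        self.defs = []   # reader side: type id -> (class name, field names)


class Point:  # hypothetical domain class
    def __init__(self, x, y):
        self.x = x
        self.y = y


def write_message(ctx, obj):
    """Send full metadata the first time a class is seen, then only its id."""
    cls_name = type(obj).__name__
    field_names = sorted(vars(obj))
    values = [getattr(obj, n) for n in field_names]
    if cls_name in ctx.ids:
        message = ["ref", ctx.ids[cls_name], values]
    else:
        ctx.ids[cls_name] = len(ctx.ids)
        message = ["def", cls_name, field_names, values]
    return json.dumps(message).encode("utf-8")


def read_message(ctx, data):
    """Rebuild (class name, field dict), remembering metadata for later."""
    message = json.loads(data.decode("utf-8"))
    if message[0] == "def":
        _, cls_name, field_names, values = message
        ctx.defs.append((cls_name, field_names))
    else:
        _, type_id, values = message
        cls_name, field_names = ctx.defs[type_id]
    return cls_name, dict(zip(field_names, values))


writer, reader = MetaContext(), MetaContext()
m1 = write_message(writer, Point(1, 2))  # carries class and field names
m2 = write_message(writer, Point(3, 4))  # carries only the type id
```

This sketch relies on messages arriving in order within one context, which is why the text ties the shared state to something like a single TCP connection.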

Cross-language object graph serialization

Cross-language object graph serialization is mainly used in scenarios that have higher requirements for dynamism and ease of use. Although frameworks such as Protobuf/Flatbuffer provide multi-language serialization capabilities, there are still some shortcomings:

  • IDL must be written in advance and code must be statically compiled and generated, which is not dynamic or flexible enough;

  • The generated classes do not follow object-oriented design, behavior cannot be added to them, and they cannot be used directly as domain objects in multi-language application development.

  • Subclass serialization is not supported. A central feature of object-oriented programming is calling subclass methods through interfaces, and this pattern is not well supported. Although Flatbuffer provides Union and Protobuf provides OneOf/Any, these features require the object type to be determined during serialization and deserialization, which does not fit object-oriented design.

  • Circular and shared references are not supported. Users have to define a separate set of IDL for the domain objects, implement reference resolution themselves, and then write code in every language to convert between domain objects and protocol objects. The more deeply nested the objects are, the more code has to be written.

Combining the above points, Fury implements a set of cross-language object graph serialization protocols:

  • Automatic multi-language/cross-language serialization of arbitrary objects: define the corresponding class on the serializing side and on the deserializing side, and objects in one language are automatically serialized into objects in another language, without creating IDL files, compiling schemas to generate code, or hand-writing conversion code;

  • Automatic multi-language/cross-language serialization of shared references and circular references;

  • Support for object type polymorphism, in line with the object-oriented programming paradigm: objects of multiple subtypes can be deserialized automatically at the same time, without manual handling by the user;

  • Out-of-band zero-copy is also supported on this protocol;

Example of automatic cross-language serialization:

aa909b5cd6e9ba81717ecaa05b5376d3.png
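How shared and circular references can survive a round trip is worth a small sketch of its own. The Python below tracks object identity while encoding and emits a back-reference when an object is seen again. It illustrates the general reference-tracking technique rather than Fury's wire format, and `encode`/`decode` are hypothetical helpers limited to lists, dicts, and plain values.

```python
def encode(obj, seen=None):
    """Encode nested lists/dicts, replacing repeated objects with ("ref", index)."""
    if seen is None:
        seen = {}
    if isinstance(obj, (list, dict)):
        if id(obj) in seen:
            return ("ref", seen[id(obj)])
        seen[id(obj)] = len(seen)  # index assigned on first visit
        if isinstance(obj, list):
            return ("list", [encode(v, seen) for v in obj])
        return ("dict", [(k, encode(v, seen)) for k, v in obj.items()])
    return ("val", obj)


def decode(node, objs=None):
    """Rebuild the graph; containers are registered before their children."""
    if objs is None:
        objs = []
    tag, body = node
    if tag == "val":
        return body
    if tag == "ref":
        return objs[body]
    if tag == "list":
        result = []
        objs.append(result)  # register first so cycles can point back here
        for child in body:
            result.append(decode(child, objs))
        return result
    result = {}
    objs.append(result)
    for key, child in body:
        result[key] = decode(child, objs)
    return result


shared = [1, 2]
root = {"a": shared, "b": shared}
root["self"] = root  # circular reference
copy = decode(encode(root))
```

Without the identity table, `shared` would be written twice and the self-reference would recurse forever, which are the two failure modes the protocol is designed to avoid.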

Row storage serialization

In high-performance computing and large-scale data transmission scenarios, data serialization and transmission are often the performance bottleneck of the whole system. If the user only needs to read part of the data, or to filter on a single field of an object, deserializing the entire object adds unnecessary overhead. Fury therefore also provides a binary data structure that can be read and written directly on the binary data, avoiding serialization altogether.

Apache Arrow is a mature columnar storage format that supports reading and writing directly on binary data. However, columnar storage cannot meet the needs of every scenario. Data in online request paths and streaming computing scenarios is naturally row-structured, and even columnar computing engines use a row storage structure for data updates and Hash/Join/Aggregation operations.

However, there is no unified standard implementation for row storage. Computing engines such as Spark/Flink/Doris/Velox each define their own row storage format. These formats do not support cross-language use, can only be used inside their own engine, and cannot be used by other frameworks. Although Flatbuffer supports on-demand deserialization, it requires statically compiling a schema IDL and managing offsets, which cannot meet the dynamic and ease-of-use requirements of complex scenarios.

Therefore, Fury initially drew on the Spark Tungsten and Apache Arrow formats and implemented a randomly accessible binary row storage structure. Java/Python/C++ versions are currently implemented, which read and write directly on binary data and avoid all serialization overhead.

The following figure is the binary format of Fury Row Format:

37775813a201994e5b08a939b4aa1f60.png

This format is densely stored, data-aligned, and cache-friendly, which makes reads and writes faster. Because deserialization is avoided, Java GC pressure is reduced, and so is Python overhead. Taking advantage of Python's dynamic nature, Fury's data structures implement special methods such as __getattr__/__getitem__/slice, so they behave consistently with Python dataclass/list/object and the difference is transparent to users.
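To make random access on a binary row concrete, here is a minimal Python sketch of a row layout loosely modeled on the Spark Tungsten style mentioned above: a null bitmap, one fixed 8-byte slot per field, and a variable-length region addressed by (offset, size) pairs. The exact layout is an assumption for illustration and is not Fury's actual row format. All function names are hypothetical.

```python
import struct


def bitmap_size(num_fields):
    """Null bitmap size in bytes, rounded up to a multiple of 8."""
    return ((num_fields + 63) // 64) * 8


def write_row(values):
    """Encode a list of int / str / None values into one binary row."""
    n = len(values)
    fixed = bytearray(bitmap_size(n) + 8 * n)  # bitmap + one slot per field
    var = bytearray()                          # variable-length region
    for i, v in enumerate(values):
        slot = bitmap_size(n) + 8 * i
        if v is None:
            fixed[i // 8] |= 1 << (i % 8)      # mark field i as null
        elif isinstance(v, int):
            struct.pack_into("<q", fixed, slot, v)
        else:
            data = v.encode("utf-8")
            offset = len(fixed) + len(var)
            struct.pack_into("<II", fixed, slot, offset, len(data))
            var += data + b"\x00" * (-len(data) % 8)  # keep 8-byte alignment
    return bytes(fixed + var)


def is_null(row, i):
    return bool(row[i // 8] & (1 << (i % 8)))


def get_int(row, num_fields, i):
    """Read field i directly from its fixed slot."""
    return struct.unpack_from("<q", row, bitmap_size(num_fields) + 8 * i)[0]


def get_str(row, num_fields, i):
    """Follow the (offset, size) pair in slot i into the variable region."""
    offset, size = struct.unpack_from("<II", row, bitmap_size(num_fields) + 8 * i)
    return row[offset:offset + size].decode("utf-8")


row = write_row([42, "hello", None, -1])
last = get_int(row, 4, 3)  # reads one field without touching the others
```

Reading a field costs one offset computation and one fixed-size read, regardless of how many other fields the row holds, which is what makes filtering on a single field cheap.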

Performance comparison

Here are some Java serialization performance figures. Charts whose titles contain "compatible" show performance with forward and backward type compatibility enabled, and charts whose titles do not contain "compatible" show performance without it. For fairness, Fury's zero-copy feature was turned off in all tests.

For more benchmark data, please refer to Fury Github official documentation: https://github.com/alipay/fury/tree/main/docs/benchmarks

4f8972baa689327b1d3437ddcba92066.jpeg

c57cf60924ca9b33929061aef4a2a02f.jpeg

7aae6d8493fc68e8b8bae342d4add4a2.jpeg

Future plans

  • Metadata compression and automatic sharing

  • Cross-language serialization supports forward and backward compatibility

  • Static code generation framework for generating C++/Golang/Rust code ahead of time

  • C++/Rust support for cross-language object graph serialization

  • Golang/Rust/JavaScript support for row storage

  • Compatibility with the Protobuf ecosystem, with automatic generation of Fury serialization code from Proto IDL

  • New protocol implementations: AI feature storage, knowledge graph serialization

  • Continuously improve our basic serialization primitives to provide higher performance implementations

  • Standardized protocol, providing binary compatibility

  • Documentation and usability improvements

Join us

We are committed to making Fury an open and neutral community project that pursues excellence and innovation. Future development and discussion will take place in the community in an open and transparent manner. Any form of participation is welcome, including but not limited to questions, code contributions, and technical discussions. We look forward to receiving your ideas and feedback, and to building the project together into the most advanced serialization framework.

The main code repository is on GitHub: https://github.com/alipay/fury

Official website: https://furyio.org

All kinds of Issues, PRs, and Discussions are welcome.

You are also welcome to directly join the official communication group below and communicate with us.

WeChat public account: Fury, a high-performance serialization framework

DingTalk group: group number 36170003000

About the Author:

Yang Chaokun is a technical expert at Ant Group and the author of the Fury framework. He joined Ant Group in 2018 and has worked on distributed computing frameworks including stream computing frameworks, online learning frameworks, scientific computing frameworks, and Ray.

Origin blog.csdn.net/bjweimengshu/article/details/131798959