[Serialization comparison] Protobuf vs. FlatBuffers

Foreword

  Protocol Buffers (Protobuf) is a language-neutral, platform-neutral, extensible structured-data serialization component open-sourced by Google in 2008, suitable for communication protocols, data storage, and other scenarios. Since its release it has stood the test of time and seen wide industry adoption thanks to its ease of use, high compression rate, runtime efficiency, and development efficiency.

  So can we just drop it into any project? I would argue not. Protobuf is indeed excellent and scores near the top on most benchmarks, but it is not first in every dimension. If your project has particularly strict requirements on a specific performance metric, it is worth selecting the serialization component that best fits your actual needs.

  What I want to introduce today is another data serialization component, also developed and open-sourced by Google: FlatBuffers.

Introduction to FlatBuffers

  According to the official website, FlatBuffers is an efficient, cross-platform serialization component that supports many programming languages and was developed specifically for game development and other performance-critical applications. How does it differ from Protobuf, and why is it better suited to games? Google answers this question very directly: Protobuf is indeed similar; the primary difference is that FlatBuffers does not need a parsing/unpacking step before the data can be accessed.

  In networked game scenarios, players are very sensitive to latency (especially in FPS and MOBA games), so shaving serialization and deserialization time off each message reduces the perceived delay of the player's actions and improves the game experience.

  According to the description on the official website, we have the following general comparison between Protobuf and FlatBuffers:

  • Language support: Protobuf supports C/C++, C#, Go, Java, Python, Ruby, Objective-C, and Dart; FlatBuffers supports C/C++, C#, Go, Java, JavaScript, TypeScript, Lua, PHP, Python, Rust, and Lobster.
  • Version: Protobuf 2.x/3.x (mutually incompatible); FlatBuffers 1.x.
  • Protocol file: Protobuf uses .proto and requires the protocol version to be declared in the file; FlatBuffers uses .fbs.
  • Code generation tool: both provide one; Protobuf generates a lot of code, FlatBuffers less.
  • Protocol field types: Protobuf offers bool, bytes, int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64, float, double, string; FlatBuffers offers bool, int8, uint8, int16, uint16, int32, uint32, int64, uint64, float, double, string, vector.

  To compare the performance of Protobuf and FlatBuffers more concretely, I ran the following performance tests; the full code can be found in the Git repository.

Test comparison

  First, a set of test data needs to be selected. To simulate business data as realistically as possible, I chose three payload sizes: small, medium, and large.

Test Data

  • Small data: a single int32, simulating a simple client request/server response; some real-world messages really are this small. Raw data size: 4 bytes.
  • Medium data: the most common field types, int32, int64, float, and string, with 10 values of each and every string 10 bytes long, so the raw data size is 260 bytes (10 * (4 + 8 + 4 + 10)). These are the most common business data types, so a large share of real-world messages fall into this medium range.
  • Large data: the same field types, with 10,000 values of each, probing the rare but real case of oversized messages. Raw data size: about 253 KB (10000 * (4 + 8 + 4 + 10) / 1024).
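  The raw sizes above follow directly from the primitive widths (int32 = 4 bytes, int64 = 8, float = 4, plus 10 bytes per string); a quick arithmetic sketch (class and field names are mine, not from the test code):

```java
// Raw payload sizes used by the test data, derived from primitive field widths.
public class RawSizes {
    static final int PER_RECORD = 4 + 8 + 4 + 10; // int32 + int64 + float + 10-byte string

    public static final int SMALL = 4;                            // a single int32
    public static final int MEDIUM = 10 * PER_RECORD;             // 260 bytes
    public static final int LARGE_KB = 10000 * PER_RECORD / 1024; // 253 KB (integer division)
}
```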

Test environment

Test language: Java 1.8

Operating system: CentOS Linux release 6.5 (Final)

JVM: Java HotSpot(TM) 64-Bit Server VM 1.8.0_112

Protobuf: 3.11.4

FlatBuffers: 1.11.0

  To compare the cost of adopting each component, I list their development and usage steps side by side.

ProtoBuf usage steps

  1. Create a protocol file and name it Pb.proto

    // Protocol version 3.x
    syntax = "proto3";
    // Java package for the generated message classes
    option java_package = "com.digisky.protocol.msg.pb";


    // Message
    message Msg {
        // int32 data
        int32 intData = 1;
        // data messages
        repeated DataMsg datas = 2;
    }

    message DataMsg {
        // int32 data
        int32 intData = 1;
        // int64 data
        int64 longData = 2;
        // float data
        float floatData = 3;
        // string data
        string stringData = 4;
    }
    

      For the convenience of testing, a general message Msg is defined. It contains an int32 field for the "small" test and a repeated DataMsg field for the "medium" and "large" tests. Note that with the 3.x version of Protobuf you must add the syntax = "proto3" version marker; 3.x also removed the required and optional field labels of 2.x, so every field is optional by default, which improves message compatibility.

  2. Use the compiler "protoc" to compile the protocol file and generate the protocol code. (The compiler can be downloaded from the Github repository.)

    Compile command: protoc --java_out=../../src/main/java/ *.proto

    Here *.proto means: compile every proto protocol file in the current directory.

  3. Looking at the generated Java code, we can see that the class com.digisky.protocol.msg.pb.PbTest has been generated.

      This Protobuf protocol file, with only 2 messages, generates about 2,000 lines of code. In my experience, a protocol file covering one module's messages needs only a dozen or so message types to produce tens of thousands of lines of Java, so the code files generated by Protobuf are quite large.

  4. Add message serialization code

    protected byte[] serialize(TestData data) {
        Builder builder = Msg.newBuilder();
        // fill intData
        builder.setIntData(data.getData());
        // fill datas
        for (DataObject dataObject : data.getDataArray()) {
            DataMsg.Builder dataBuilder = DataMsg.newBuilder();
            dataBuilder.setIntData(dataObject.getIntData());
            dataBuilder.setLongData(dataObject.getLongData());
            dataBuilder.setFloatData(dataObject.getFloatData());
            dataBuilder.setStringData(dataObject.getStringData());
            builder.addDatas(dataBuilder);
        }
        Msg msg = builder.build();
        return msg.toByteArray();
    }
    

      First, obtain a Msg builder from the static factory and fill intData. Because datas is a composite, repeated field, we loop: create a DataMsg builder, fill it, and add it to the Msg builder. The Msg object is then built via the build method and converted to a byte array via toByteArray, and the resulting byte[] can be sent over the network or stored.

  5. Add deserialization code

    protected void deserialize(byte[] serializedData) {
        try {
            Msg.newBuilder().mergeFrom(serializedData);
        } catch (InvalidProtocolBufferException e) {
            e.printStackTrace();
        }
    }
    

      Deserialization is very simple. After receiving the byte[] (e.g. from the network), you would normally first map the message number in the packet header to the corresponding message class; assume it maps to Msg here. You then obtain a populated builder via the newBuilder and mergeFrom methods and read the fields from the builder object (the generated parseFrom method is the usual one-step alternative).

      That completes the Protobuf workflow. Once Protobuf is integrated into a framework, adding a new message is simple at every step, with no redundant work.

FlatBuffers usage steps

  1. Create a protocol file and name it Fb.fbs

    // Java package for the generated message classes
    namespace com.digisky.protocol.msg.fb;

    // Message
    table Msg {
        // int32 data
        intData:int;
        // data messages
        datas:[DataMsg];
    }

    table DataMsg {
        // int32 data
        intData:int;
        // int64 data
        longData:int64;
        // float data
        floatData:float;
        // string data
        stringData:string;
    }
    

      FlatBuffers messages are all defined as table types. Unlike Protobuf, the field name comes first and the type second, and explicit field numbering is omitted by default (it can also be added).
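      Although field order is implicit by default, FlatBuffers schemas also accept an explicit id attribute per field (once used, every field in the table must carry one). A hypothetical variant of DataMsg with explicit ids, useful when evolving a schema without reordering fields:

```
table DataMsg {
    intData:int (id: 0);
    longData:int64 (id: 1);
    floatData:float (id: 2);
    stringData:string (id: 3);
}
```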

  2. Compilation closely mirrors Protobuf: use the compiler "flatc" to compile the protocol file and generate the protocol code. (The compiler can be downloaded from the Github repository.)

    Compile command: flatc --java -o ../../src/main/java/ Fb.fbs

      Note that flatc does not support wildcard protocol-file arguments, so you cannot pass *.fbs the way you can with protoc.

  3. Checking the generated Java code, two classes appear under the com.digisky.protocol.msg.fb package: Msg and DataMsg.

      So the FlatBuffers generation rule is one class per table; it is advisable to give each module's messages their own package. After formatting, each class is only about 100 lines, far smaller than Protobuf's output.

  4. Add serialization message code

    protected byte[] serialize(TestData data) {
        FlatBufferBuilder builder = new FlatBufferBuilder();
        DataObject[] dataObjects = data.getDataArray();
        int[] dataOffsets = new int[dataObjects.length];
        // build the data offsets
        for (int i = 0; i < dataObjects.length; i++) {
            DataObject dataObject = dataObjects[i];
            // fill the string, obtaining stringDataOffset
            int stringDataOffset = builder.createString(dataObject.getStringData());
            // fill the remaining DataMsg fields, obtaining oneDataOffset
            int oneDataOffset = DataMsg.createDataMsg(builder, dataObject.getIntData(), dataObject.getLongData(),
                    dataObject.getFloatData(), stringDataOffset);
            dataOffsets[i] = oneDataOffset;
        }
        int dataOffset = Msg.createDatasVector(builder, dataOffsets);
        Msg.startMsg(builder);
        // fill intData
        Msg.addIntData(builder, data.getData());
        // fill datas
        Msg.addDatas(builder, dataOffset);
        int end = Msg.endMsg(builder);
        builder.finish(end);
        return builder.sizedByteArray();
    }
    

      First, a FlatBufferBuilder must be created; all subsequent data is written through it. Since Msg references the DataMsg type, each DataMsg must be created first to obtain its offset, and that offset is then filled into Msg. In other words, for non-scalar data such as nested types and arrays, FlatBuffers works by computing an offset first and then storing the offset, a process slightly more involved than Protobuf's.

  5. Add deserialization code

    protected void deserialize(byte[] serializedData) {
        ByteBuffer buffer = ByteBuffer.wrap(serializedData);
        Msg msg = Msg.getRootAsMsg(buffer);
    }
    

      Deserialization is straightforward: wrap the byte[] in an NIO ByteBuffer, then obtain the Msg object via Msg's getRootAsMsg, after which field values can be read.

Performance Testing

  As the steps above show, Protobuf and FlatBuffers are used in essentially the same way. The main difference: Protobuf generates a lot of code but has simple message construction; FlatBuffers, conversely, generates little code but has slightly more involved message construction. So how do they perform across small, medium, and large payloads? I compared them along four dimensions: data size, serialization time, deserialization time, and memory usage.

  • Data size

                 Small (4 bytes)   Medium (260 bytes)   Large (253 KB)
    Protobuf     2 bytes           201 bytes            229,735 bytes
    FlatBuffers  28 bytes          496 bytes            440,056 bytes

  When the payload is small, Protobuf's output is many times smaller than FlatBuffers'; as the payload grows, Protobuf settles at roughly half the size of FlatBuffers. So on data size Protobuf wins, by about a factor of two, though different data volumes and field types will shift this number; this is only a rough conclusion, and the source-code analysis section below examines it in more detail.

  • Serialization time

                 Small (4 bytes)   Medium (260 bytes)   Large (253 KB)
    Protobuf     1,466 ns          10,757 ns            2,000,497 ns
    FlatBuffers  1,922 ns          9,754 ns             2,760,061 ns

  As the numbers show, serialization times are broadly comparable, so Protobuf and FlatBuffers can be considered roughly equal on serialization.

  • Deserialization time

                 Small (4 bytes)   Medium (260 bytes)   Large (253 KB)
    Protobuf     2,040 ns          5,393 ns             1,101,464 ns
    FlatBuffers  847 ns            312 ns               286 ns

  With small payloads, FlatBuffers' deserialization time is already less than half of Protobuf's, and at medium size the lead is decisive. And with a large payload?

  Protobuf is simply crushed. This confirms the FlatBuffers trait introduced at the start of this article: the data is accessible without a parsing/unpacking step, so deserialization time is nearly zero regardless of payload size. The conclusion: FlatBuffers' deserialization performance is about as fast as it is possible to be, so it is very hard for any framework or component to beat it here.

  • memory usage

  Because the tests run on Java, exact memory usage cannot be measured directly, but garbage-collection activity reflects memory pressure across the whole test. I set the maximum heap (-Xmx) to 16 MB. First, the GC behavior of Protobuf.

  For the test data, Protobuf serialization plus deserialization took 1 minute 06 seconds in total, with average CPU usage of 42% and garbage collection accounting for 12.2% of the time. The GC activity is shown in the figure.

  FlatBuffers finished in only 32 seconds; its deserialization advantage saves a lot of wall time, in line with expectations. CPU usage was only 27%, garbage collection only 2.9%, and the GC count was at most half of Protobuf's. The FlatBuffers run is therefore lighter and consumes fewer resources.

Source code analysis

  So why can Protobuf make the data so small? Serialized output normally adds some protocol overhead, yet Protobuf's output is actually smaller than the raw data; how does it compress? And how does FlatBuffers push deserialization time toward zero? Let's find the answers one by one in the source code. (Drowsiness warning: if the source code doesn't interest you, jump straight to the conclusion at the end of the article.)

Protobuf data compression

  Protobuf's size and encoding logic lives mainly in the com.google.protobuf.CodedOutputStream class, which has a computeXXXSize variant per field type. Take int32 as an example:

/**
* Compute the number of bytes that would be needed to encode an {@code int32} field, including
* tag.
*/
public static int computeInt32Size(final int fieldNumber, final int value) {
    return computeTagSize(fieldNumber) + computeInt32SizeNoTag(value);
}

  So the space occupied by an int32 field is the tag size (derived from the field number) plus the field-value size. First, the tag size:

/** Compute the number of bytes that would be needed to encode a tag. */
public static int computeTagSize(final int fieldNumber) {
    return computeUInt32SizeNoTag(WireFormat.makeTag(fieldNumber, 0));
}

  Regardless of the field type, the tag size is computed the same way. First:

/** Makes a tag value given a field number and wire type. */
static int makeTag(final int fieldNumber, final int wireType) {
    return (fieldNumber << TAG_TYPE_BITS) | wireType;
}

  Here TAG_TYPE_BITS is 3 and wireType is 0, so for field number 1 the return value is 8. Why shift left by 3 bits? Because 3 bits must be reserved for wireType, Protobuf's wire-encoding type, which defaults to 0; the full list is defined in com.google.protobuf.WireFormat:

  public static final int WIRETYPE_VARINT = 0;
  public static final int WIRETYPE_FIXED64 = 1;
  public static final int WIRETYPE_LENGTH_DELIMITED = 2;
  public static final int WIRETYPE_START_GROUP = 3;
  public static final int WIRETYPE_END_GROUP = 4;
  public static final int WIRETYPE_FIXED32 = 5;

  So int32 uses the Varint encoding, introduced in more detail below. First, computeUInt32SizeNoTag:

/** Compute the number of bytes that would be needed to encode a {@code uint32} field. */
public static int computeUInt32SizeNoTag(final int value) {
    if ((value & (~0 << 7)) == 0) {// all bits above the low 7 are 0, e.g. 00000000 00000000 00000000 01111111
      return 1;
    }
    if ((value & (~0 << 14)) == 0) {// all bits above the low 14 are 0, e.g. 00000000 00000000 00111111 11111111
      return 2;
    }
    if ((value & (~0 << 21)) == 0) {// all bits above the low 21 are 0, e.g. 00000000 00011111 11111111 11111111
      return 3;
    }
    if ((value & (~0 << 28)) == 0) {// all bits above the low 28 are 0, e.g. 00001111 11111111 11111111 11111111
      return 4;
    }
    return 5;
}

  As the comments note, this method derives the byte count from the magnitude of value: 0-127 takes 1 byte, 128-16383 takes 2 bytes, and so on up to 5 bytes. The tag value is the field number shifted left by 3 bits, so for the tag to stay below 128 (one byte) the field number must be at most 15. Hence, to keep every tag to a single byte, it is best to define no more than 15 fields per message.
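  The tag arithmetic is easy to reproduce in isolation (a sketch mirroring the bit-twiddling above, not the Protobuf API itself):

```java
public class TagSize {
    static final int TAG_TYPE_BITS = 3;

    // same shape as WireFormat.makeTag: field number shifted left, wire type in the low 3 bits
    static int makeTag(int fieldNumber, int wireType) {
        return (fieldNumber << TAG_TYPE_BITS) | wireType;
    }

    // same bucketing as computeUInt32SizeNoTag: one byte per 7 bits of payload
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }
}
```

  Field number 15 is the last one whose tag still fits in a single byte; field number 16 pushes the tag to two bytes.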

  After the tag size is figured out, let's continue to look at the size of the field value computeInt32SizeNoTag:

/**
* Compute the number of bytes that would be needed to encode an {@code int32} field, including
* tag.
*/
public static int computeInt32SizeNoTag(final int value) {
    if (value >= 0) {
      return computeUInt32SizeNoTag(value);
    } else {
      // Must sign-extend.
      return MAX_VARINT_SIZE;
    }
}

  First it checks whether the int value is non-negative; if so, computeUInt32SizeNoTag computes the occupied space, otherwise the field always takes MAX_VARINT_SIZE bytes, i.e. 10 bytes. Having just seen computeUInt32SizeNoTag, you may wonder why uint32 is bucketed at 127, 16383, and so on. writeInt32NoTag provides the context:

@Override
public final void writeInt32NoTag(int value) throws IOException {
      if (value >= 0) {
        writeUInt32NoTag(value);
      } else {
        // Must sign-extend.
        writeUInt64NoTag(value);
      }
}

  When writing, the sign is checked first: a negative value is sign-extended and handled as a uint64, which is why its size is fixed at 10 bytes. A non-negative value goes through writeUInt32NoTag:

@Override
public final void writeUInt32NoTag(int value) throws IOException {
      if (HAS_UNSAFE_ARRAY_OPERATIONS
          && !Android.isOnAndroidDevice()
          && spaceLeft() >= MAX_VARINT32_SIZE) {
        if ((value & ~0x7F) == 0) {
          UnsafeUtil.putByte(buffer, position++, (byte) value);
          return;
        }
        UnsafeUtil.putByte(buffer, position++, (byte) (value | 0x80));
        value >>>= 7;
        if ((value & ~0x7F) == 0) {
          UnsafeUtil.putByte(buffer, position++, (byte) value);
          return;
        }
        UnsafeUtil.putByte(buffer, position++, (byte) (value | 0x80));
        value >>>= 7;
        if ((value & ~0x7F) == 0) {
          UnsafeUtil.putByte(buffer, position++, (byte) value);
          return;
        }
        UnsafeUtil.putByte(buffer, position++, (byte) (value | 0x80));
        value >>>= 7;
        if ((value & ~0x7F) == 0) {
          UnsafeUtil.putByte(buffer, position++, (byte) value);
          return;
        }
        UnsafeUtil.putByte(buffer, position++, (byte) (value | 0x80));
        value >>>= 7;
        UnsafeUtil.putByte(buffer, position++, (byte) value);
      } else {
        try {
          while (true) {
            if ((value & ~0x7F) == 0) {
              buffer[position++] = (byte) value;
              return;
            } else {
              buffer[position++] = (byte) ((value & 0x7F) | 0x80);
              value >>>= 7;
            }
          }
        } catch (IndexOutOfBoundsException e) {
          throw new OutOfSpaceException(
              String.format("Pos: %d, limit: %d, len: %d", position, limit, 1), e);
        }
      }
}

  This method is more involved. The outer if-else picks a path: if the JVM supports Unsafe and at least 5 unallocated bytes remain, the if branch runs; otherwise the else branch does. The two branches implement the same logic, writing bytes into an array, but the Unsafe path is faster. If you are curious, read up on Unsafe separately.

  So how are the bytes actually laid out? This is Protobuf's Base 128 Varints scheme. In short, the 8 bits of each byte are split by purpose: the highest bit, the "most significant bit" (MSB), is a continuation flag indicating whether more bytes follow, and the remaining 7 low bits carry data. Each byte therefore stores only 7 bits of payload (values 0-127), which matches the 7-bit buckets used above when computing sizes.

  The code above therefore first checks whether the value fits in 7 bits (all higher bits zero); if so, the data is a single byte and is written directly. Otherwise, the low 7 bits are written with the MSB set to 1 to signal a continuation, the value is shifted right by 7, and the process repeats, 7 bits per byte, until the value is exhausted.
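  That loop boils down to a compact varint encoder. A standalone sketch of the same logic (not Protobuf's actual class):

```java
import java.io.ByteArrayOutputStream;

public class Varint {
    // Base 128 Varint: 7 payload bits per byte, MSB set while more bytes follow.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits plus continuation flag
            value >>>= 7; // unsigned shift: also terminates for negative inputs
        }
        out.write((int) value); // final byte, MSB clear
        return out.toByteArray();
    }
}
```

  For example, 1 encodes to a single byte, 300 to two bytes (0xAC 0x02), and -1 sign-extended to 64 bits needs the full 10 bytes, matching MAX_VARINT_SIZE above.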

  At this point Protobuf's compression principle is clear: an int32's 4 bytes shrink to as little as 1 byte depending on the value, plus the protocol's own overhead (the tag), for a best case of 2 bytes total. For int32 data the highest achievable compression is therefore 50%.

  One more detail deserves attention. As noted above, a negative int32, no matter how small in magnitude, is sign-extended and handled as a uint64, which wastes space (for example, the int32 value "-1", once sign-extended, is written as 2^64 - 1 = 18446744073709551615, a 10-byte varint). So what if your business data contains negative numbers? Protobuf provides the sint types, specially optimized for them; the source of writeSInt32 shows how:

/** Write a {@code sint32} field, including tag, to the stream. */
public final void writeSInt32(final int fieldNumber, final int value) throws IOException {
    writeUInt32(fieldNumber, encodeZigZag32(value));
}

/**
* Encode a ZigZag-encoded 32-bit value. ZigZag encodes signed integers into values that can be
* efficiently encoded with varint. (Otherwise, negative values must be sign-extended to 64 bits
* to be varint encoded, thus always taking 10 bytes on the wire.)
*
* @param n A signed 32-bit integer.
* @return An unsigned 32-bit integer, stored in a signed int because Java has no explicit
*     unsigned support.
*/
public static int encodeZigZag32(final int n) {
    // Note:  the right-shift must be arithmetic
    return (n << 1) ^ (n >> 31);
}

  Protobuf handles negative numbers with ZigZag encoding: 1 encodes to 2, -1 encodes to 1, and every value becomes non-negative after encoding, so a number with a small absolute value never balloons into a huge one and Base 128 Varint compression still applies. Fixed-width types such as float, double, fixed32/64, and sfixed32/64 are written as-is, with no data compression.
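  ZigZag is two lines in each direction; a sketch of the transform (the decode side is the standard inverse, not shown in the Protobuf excerpt above):

```java
public class ZigZag {
    // small absolute values map to small unsigned values: 0->0, -1->1, 1->2, -2->3, ...
    static int encode(int n) {
        return (n << 1) ^ (n >> 31); // arithmetic right shift replicates the sign bit
    }

    static int decode(int n) {
        return (n >>> 1) ^ -(n & 1); // logical shift, then undo the sign fold
    }
}
```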

  To sum up: for data dominated by small integer values, Protobuf compresses very well, with a compression rate of up to 50%.

Deserialization of FlatBuffers

  To understand deserialization you must first understand serialization, since one is the reverse of the other. From the usage steps we saw that each message is a table and tables can nest; to assign a table-typed field you only supply the table's offset. Computing offsets is therefore the key to reading the FlatBuffers source. Start with the method that produces the DataMsg offset, in the com.digisky.protocol.msg.fb.DataMsg class (generated by the flatc compiler):

public static int createDataMsg(FlatBufferBuilder builder,
                                int intData,
                                long longData,
                                float floatData,
                                int stringDataOffset) {
    builder.startObject(4);
    DataMsg.addLongData(builder, longData);
    DataMsg.addStringData(builder, stringDataOffset);
    DataMsg.addFloatData(builder, floatData);
    DataMsg.addIntData(builder, intData);
    return DataMsg.endDataMsg(builder);
}

  This method first calls the builder's startObject method to begin the object; the argument "4" is the number of fields in DataMsg. Its implementation:

public void startObject(int numfields) {
    notNested();
    // allocate an array to hold the field offset values
    if (vtable == null || vtable.length < numfields) vtable = new int[numfields];
    vtable_in_use = numfields;
    // initialize the array
    Arrays.fill(vtable, 0, vtable_in_use, 0);
    nested = true;
    // record this object's starting offset
    object_start = offset();
}

  This method does no real work, just some initialization. Next, the addXXX methods that write each field; take addIntData as an example:

public static void addIntData(FlatBufferBuilder builder, int intData) {
    builder.addInt(0, intData, 0);
}

  It delegates straight to the builder's addInt method. The first argument 0 is the field's slot (0 meaning the first field), and the third argument 0 is the field's default value. Following it in:

/**
 * Add an `int` to a table at `o` into its vtable, with value `x` and default `d`.
 *
 * @param o The index into the vtable.
 * @param x An `int` to put into the buffer, depending on how defaults are handled. If
 *          `force_defaults` is `false`, compare `x` against the default value `d`. If `x` contains the
 *          default value, it can be skipped.
 * @param d An `int` default value to compare against when `force_defaults` is `false`.
 */
public void addInt(int o, int x, int d) {
    if (force_defaults || x != d) {
        addInt(x);
        slot(o);
    }
}

  First it checks whether defaults are forcibly written. If not forced and the field equals its default value, the method returns without writing anything; this is FlatBuffers' only form of data compression: a field holding its default value is simply not serialized. Otherwise, the addInt and slot methods are called. First, addInt:

/**
 * Add an `int` to the buffer, properly aligned, and grows the buffer (if necessary).
 *
 * @param x An `int` to put into the buffer.
 */
public void addInt(int x) {
    prep(Constants.SIZEOF_INT, 0);
    putInt(x);
}

  The addInt method consists of prep and putInt:

/**
 * Prepare to write an element of `size` after `additional_bytes`
 * have been written, e.g. if you write a string, you need to align such
 * the int length field is aligned to {@link com.google.flatbuffers.Constants#SIZEOF_INT}, and
 * the string data follows it directly.  If all you need to do is alignment, `additional_bytes`
 * will be 0.
 *
 * @param size This is the of the new element to write.
 * @param additional_bytes The padding size.
 */
public void prep(int size, int additional_bytes) {
    // Track the biggest thing we've ever aligned to.
    if (size > minalign) minalign = size;
    // Find the amount of alignment needed such that `size` is properly
    // aligned after `additional_bytes`
    int align_size = ((~(bb.capacity() - space + additional_bytes)) + 1) & (size - 1);
    // Reallocate the buffer if needed.
    while (space < align_size + size + additional_bytes) {
        int old_buf_size = bb.capacity();
        ByteBuffer old = bb;
        bb = growByteBuffer(old, bb_factory);
        if (old != bb) {
            bb_factory.releaseByteBuffer(old);
        }
        space += bb.capacity() - old_buf_size;
    }
    pad(align_size);
}
    
/**
 * Add an `int` to the buffer, backwards from the current location. Doesn't align nor
 * check for space.
 *
 * @param x An `int` to put into the buffer.
 */
public void putInt(int x) {
    bb.putInt(space -= Constants.SIZEOF_INT, x);
}

  prep prepares the write: it ensures the buffer has room, handles data alignment, and so on; putInt then actually writes the value into the buffer. Note that FlatBuffers writes the raw value directly, with no compression. Next, the slot method:


/**
 * Set the current vtable at `voffset` to the current location in the buffer.
 *
 * @param voffset The index into the vtable to store the offset relative to the end of the
 * buffer.
 */
public void slot(int voffset) {
    vtable[voffset] = offset();
}

  After a field's data is written, its offset is recorded in the vtable slot corresponding to the field's order. Once every field is written, computing the final offset yields the offset of the whole message, and serialization is complete.
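  One detail worth isolating from prep above is its align_size expression, a standard power-of-two padding trick. A sketch (folding additional_bytes into the written count; in the real builder the count is bb.capacity() - space, because FlatBuffers writes backwards from the end of the buffer):

```java
public class Align {
    // Padding bytes needed so the next `size`-byte element starts at a
    // multiple of `size`; `size` must be a power of two, as all FlatBuffers
    // scalar sizes are.
    static int padding(int bytesWritten, int size) {
        return ((~bytesWritten) + 1) & (size - 1);
    }
}
```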

  Next, deserialization. Recall how FlatBuffers data is deserialized:

ByteBuffer buffer = ByteBuffer.wrap(serializedData);
Msg msg = Msg.getRootAsMsg(buffer);

  First the byte array is wrapped into an NIO ByteBuffer, then getRootAsMsg is called to obtain the Msg object. How is the Msg object constructed? Is any field parsing performed during construction?

public static Msg getRootAsMsg(ByteBuffer _bb) {
    return getRootAsMsg(_bb, new Msg());
}

public static Msg getRootAsMsg(ByteBuffer _bb, Msg obj) {
    _bb.order(ByteOrder.LITTLE_ENDIAN);
    return (obj.__assign(_bb.getInt(_bb.position()) + _bb.position(), _bb));
}

public Msg __assign(int _i, ByteBuffer _bb) {
    __init(_i, _bb);
    return this;
}

public void __init(int _i, ByteBuffer _bb) {
    bb_pos = _i;
    bb = _bb;
    vtable_start = bb_pos - bb.getInt(bb_pos);
    vtable_size = bb.getShort(vtable_start);
}

  From the methods above we can see that the ByteBuffer is first set to little-endian mode, and then the __assign method is called to associate the ByteBuffer with the Msg object and initialize it. The whole process is just a few simple assignments, with no expensive operations such as memory copies or data decoding. So how is a message field parsed? Let's take intData as an example:
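The root-offset arithmetic in getRootAsMsg can be mimicked with plain `java.nio` (a toy layout for illustration only, not real FlatBuffers output): the first int of the buffer stores, relative to its own position, where the root table begins.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class RootOffsetSketch {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        // Pretend the root table lives at position 12 and begins with
        // the field value 99 (a hypothetical layout for illustration).
        bb.putInt(12, 99);
        // Store at position 0 the distance from position 0 to the root.
        bb.putInt(0, 12);

        // What getRootAsMsg does: read the relative offset and jump --
        // no copying, no decoding, just offset arithmetic.
        int rootPos = bb.getInt(bb.position()) + bb.position(); // 12 + 0
        System.out.println(bb.getInt(rootPos)); // 99
    }
}
```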

public int intData() {
    int o = __offset(4);
    return o != 0 ? bb.getInt(o + bb_pos) : 0;
}

/**
 * Look up a field in the vtable.
 *
 * @param vtable_offset An `int` offset to the vtable in the Table's ByteBuffer.
 * @return Returns an offset into the object, or `0` if the field is not present.
 */
protected int __offset(int vtable_offset) {
    return vtable_offset < vtable_size ? bb.getShort(vtable_start + vtable_offset) : 0;
}

  To read a field, its offset is first looked up in the vtable via the field's vtable position (4 here, the slot of the first field), and that offset is then used to read the value directly from the ByteBuffer. Very concise and efficient.

  To sum up, FlatBuffers computes each field's offset within the data body during serialization and stores those offsets in the data body itself. During deserialization, it first reads a field's offset and then reads the data at that offset. Because deserialization involves no expensive operations such as memory copies or data decoding, it is extremely fast.
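The whole read path can be condensed into a toy end-to-end sketch. The layout below is a simplified, hypothetical one that only imitates the vtable idea (a vtable of 16-bit offsets, and a table whose first int points back to its vtable); the real wire format has more to it:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class VtableLookupSketch {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);

        // --- "serialization": lay out a tiny table by hand ---
        int tablePos = 16;               // table starts here
        bb.putInt(tablePos + 4, 12345);  // intData stored 4 bytes into the table
        // vtable at position 4: [vtable_size][...][slot 4 -> field offset 4]
        int vtablePos = 4;
        bb.putShort(vtablePos, (short) 6);     // vtable spans 6 bytes
        bb.putShort(vtablePos + 4, (short) 4); // slot 4: field at table + 4
        // the table's first int points back to its vtable
        bb.putInt(tablePos, tablePos - vtablePos);

        // --- "deserialization": pure offset arithmetic, no copying ---
        int vtableStart = tablePos - bb.getInt(tablePos); // as in __init
        int vtableSize = bb.getShort(vtableStart);
        int slot = 4;                                     // as in __offset(4)
        int o = slot < vtableSize ? bb.getShort(vtableStart + slot) : 0;
        System.out.println(o != 0 ? bb.getInt(tablePos + o) : 0); // 12345
    }
}
```

The read side mirrors `__init`, `__offset`, and `intData` above: three lookups, zero allocation, zero decoding.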

Summary

  1. FlatBuffers data can be read directly after serialization, with no conversion/unpacking step. Deserialization time is extremely short, well under 1ms, and can essentially be ignored. The data is basically uncompressed, and because offset tables are stored alongside it, the payload is somewhat larger than the raw data. The generated code is smaller and lighter to run, with lower CPU and memory usage.
  2. Protobuf compresses integer data with the "Base 128 Varints" encoding, achieving a compression ratio of up to 50%. Its serialization and deserialization are comparatively heavy: more generated code, higher CPU usage, and higher memory usage.
  3. Protobuf 3.x removes the required and optional field labels, meaning every non-repeated field is optional, which improves Protobuf's message compatibility.
  4. Protobuf usage tips:
    • For a higher compression ratio, keep the number of fields per message at 15 or fewer, so that each field tag fits in a single byte.
    • Prefer not to use int32 and int64 directly: use the uint types for values that are always non-negative, and the sint types for values that may be negative.
    • If the business can bound the range of int32/int64 values, try to keep them within 0-127 so each value encodes in one varint byte.
    • The techniques above improve Protobuf's data compression.
  5. FlatBuffers usage tips:
    • The uint types exist only to extend the value range of int (for compatibility with languages such as C/C++ and C# that have unsigned integer types). In languages without unsigned types, such as Java, values are widened to long on assignment and retrieval, so avoid uint unless there is a real need.
    • If the business can influence how often bool, int8, int16, int32, and int64 values are zero versus non-zero, prefer zero (the default) as the common case, so that FlatBuffers gains some compression ability by omitting default-valued fields.
  6. If the project has strict requirements on data-processing latency (e.g. FPS, MOBA, or action RPG games), consider using FlatBuffers together with transport-layer protocols such as UDP/KCP; this can reduce latency more effectively than the traditional TCP+Protobuf combination.
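Tip 4 above hinges on how Base 128 Varints behave: each byte carries 7 payload bits plus a continuation bit, so values in 0-127 fit in one byte, while the sint types first apply ZigZag encoding so that small negative numbers also stay small. A minimal sketch of both (my own illustration, not the official protobuf runtime):

```java
import java.io.ByteArrayOutputStream;

public class VarintSketch {
    // Base 128 Varint: 7 bits per byte, high bit set means "more bytes follow".
    static byte[] encodeVarint(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
        return out.toByteArray();
    }

    // ZigZag maps signed to unsigned: 0,-1,1,-2,2 -> 0,1,2,3,4,
    // so small negative values also encode in few bytes.
    static long zigzag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    public static void main(String[] args) {
        System.out.println(encodeVarint(127).length);        // 1 byte: fits in 7 bits
        System.out.println(encodeVarint(300).length);        // 2 bytes
        System.out.println(encodeVarint(-1).length);         // 10 bytes as a plain varint!
        System.out.println(encodeVarint(zigzag(-1)).length); // 1 byte with ZigZag
    }
}
```

The last two lines show why the tips recommend sint for negative values: -1 stored as a plain varint costs 10 bytes, but after ZigZag it costs one.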

Origin: blog.csdn.net/qq_26914291/article/details/127513797