Analysis of protobuf features

This article analyzes the underlying implementation of several major protobuf features; it does not cover the basic syntax in detail.

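Varint variable-length compression: An integer is encoded in as few bytes as its magnitude requires, so small numbers take less space on the wire. Each byte carries 7 bits of the value, least significant group first, and the highest bit of each byte indicates whether another byte follows.

Storage model for numbers less than or equal to 127: the value fits in a single byte whose highest bit is 0.
Storage model for numbers greater than or equal to 128: the value is split into 7-bit groups, and every byte except the last has its highest bit set to 1.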
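
The Varint rule described above can be sketched in a few lines of C++. This is a minimal illustration, not protobuf's own implementation; the function names encode_varint and decode_varint are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode an unsigned value as a base-128 Varint: 7 payload bits per byte,
// least significant group first, high bit set on every byte except the last.
// (Hypothetical helper, not part of the protobuf API.)
std::vector<std::uint8_t> encode_varint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}

// Decode a Varint from the start of `bytes`.
// Returns false if the input is truncated or longer than 10 bytes.
bool decode_varint(const std::vector<std::uint8_t>& bytes, std::uint64_t& value) {
    value = 0;
    int shift = 0;
    for (std::size_t i = 0; i < bytes.size() && shift < 64; ++i, shift += 7) {
        value |= static_cast<std::uint64_t>(bytes[i] & 0x7F) << shift;
        if ((bytes[i] & 0x80) == 0) {
            return true;  // high bit clear: this was the last byte
        }
    }
    return false;
}
```

With this sketch, 127 encodes to the single byte 0x7F, while 128 needs the two bytes 0x80 0x01.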

Key field name compression: In a typical RPC protocol design, both the field name (key) and the field value (value) are sent to the other party. pb avoids sending the name: the first byte (or bytes) of each field describes the field type and the field's number in the message, which is enough to encode and decode the data. The algorithm is as follows:
pb assigns each field a wire type numbered 0~5. During serialization, the low three bits of the key store the wire type and the remaining bits store the field number (tag). The key itself is written as a Varint. The field type plus the field number thus does the job of the field name in the traditional scheme, and the space that field names would occupy is saved.
Note: A field number within [1,15] occupies one byte when encoded. A field number within [16,2047] occupies 2 bytes.

Encoded layout for field numbers within [1,15]:
Note: Take an int32 field named id with field number 10. The key is computed as "tag (field number) << 3 | wire_type (field type)". int32 uses wire type 0, so the key is (10 << 3) | 0 = 80 = 0x50, and the field number and type together fit in one byte.

Encoded layout for field numbers within [16,2047]:
Note: Take an int32 field named id with field number 16. The formula "tag (field number) << 3 | wire_type (field type)" gives (16 << 3) | 0 = 128, which no longer fits in the 7 value bits of a single byte, so the key is written as the two-byte Varint 0x80 0x01.
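
The key computation can be checked with a short C++ sketch. The enum and the function name encode_key are assumptions made for this illustration; only the formula (field number << 3 | wire type, written as a Varint) comes from the encoding itself.

```cpp
#include <cstdint>
#include <vector>

// Wire types 0~5 (3 and 4 are the deprecated group markers).
// The enumerator names are chosen for this sketch only.
enum WireType : std::uint32_t {
    kVarint = 0,
    kFixed64 = 1,
    kLengthDelimited = 2,
    kStartGroup = 3,
    kEndGroup = 4,
    kFixed32 = 5
};

// Build the key for a field: (field_number << 3) | wire_type, then Varint-encode it.
// (Hypothetical helper, not part of the protobuf API.)
std::vector<std::uint8_t> encode_key(std::uint32_t field_number, WireType wire_type) {
    std::uint64_t key = (static_cast<std::uint64_t>(field_number) << 3) | wire_type;
    std::vector<std::uint8_t> out;
    while (key >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((key & 0x7F) | 0x80));
        key >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(key));
    return out;
}
```

Field number 15 is the largest that still yields a one-byte key (15 << 3 = 120 < 128), and 2047 is the largest that yields a two-byte key (2047 << 3 = 16376 < 16384), which is where the [1,15] and [16,2047] ranges come from.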

Packed field compression:
Normally a repeated field (array) is laid out as tag-value tag-value tag-value...
Problem: storing the array as multiple T-V pairs (without packed=true) is redundant, because the same tag is stored once per element.
Solution: with packed=true, the tag is written once, followed by the total byte length of the data and then the values back to back (tag-length-value-value-value...).
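
To make the difference concrete, the following C++ sketch encodes the same repeated field both ways. It is a simplified illustration under the assumption that the elements are unsigned Varint values; the names Bytes, append_varint, encode_unpacked and encode_packed are invented for this example.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append `v` to `out` as a base-128 Varint. (Hypothetical helper.)
void append_varint(Bytes& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Without packed=true: one key (wire type 0) in front of every element.
Bytes encode_unpacked(std::uint32_t field_number, const std::vector<std::uint32_t>& values) {
    Bytes out;
    for (std::uint32_t v : values) {
        append_varint(out, (static_cast<std::uint64_t>(field_number) << 3) | 0);
        append_varint(out, v);
    }
    return out;
}

// With packed=true: one key (wire type 2), the payload length in bytes,
// then all elements back to back. An empty field produces no output.
Bytes encode_packed(std::uint32_t field_number, const std::vector<std::uint32_t>& values) {
    Bytes out;
    if (values.empty()) {
        return out;
    }
    Bytes payload;
    for (std::uint32_t v : values) {
        append_varint(payload, v);
    }
    append_varint(out, (static_cast<std::uint64_t>(field_number) << 3) | 2);
    append_varint(out, payload.size());
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

For field number 4 holding the values 3, 270 and 86942, the unpacked form takes 9 bytes and the packed form takes 8. The saving grows with the number of elements, since the unpacked form repeats the key for each one.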

Negative number compression:
A shortcoming of Varint encoding.
Background: Computers store negative numbers in two's complement, so the highest bit (the sign bit) of a negative number is 1. Read as an unsigned value, a negative number therefore looks like a very large integer.
Problem: Encoding a negative number directly as a Varint always takes the maximum number of bytes: at least 5 for a 32-bit value, and in practice 10, because protobuf sign-extends a negative int32 to 64 bits before encoding.
Solution: Protocol Buffer defines the sint32 / sint64 types for values that are often negative. They first apply ZigZag encoding (which maps signed numbers to unsigned numbers so that values of small magnitude stay small) and then Varint encoding, which reduces the number of bytes after encoding.
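
The ZigZag mapping for 32-bit values can be sketched in C++ as follows. The function names are made up for this illustration, and varint_length is only a helper for counting how many bytes a value would occupy as a Varint.

```cpp
#include <cstddef>
#include <cstdint>

// ZigZag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
// so that numbers of small magnitude become small unsigned numbers.
// (Hypothetical helper names, not the protobuf API.)
std::uint32_t zigzag_encode32(std::int32_t n) {
    std::uint32_t u = static_cast<std::uint32_t>(n);
    // Shift the magnitude left by one; flip all bits if the input was negative.
    return (u << 1) ^ (n < 0 ? 0xFFFFFFFFu : 0u);
}

std::int32_t zigzag_decode32(std::uint32_t z) {
    // The lowest bit holds the sign; 0u - 1u yields an all-ones mask.
    return static_cast<std::int32_t>((z >> 1) ^ (0u - (z & 1u)));
}

// Number of bytes `v` occupies when written as a Varint.
std::size_t varint_length(std::uint64_t v) {
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}
```

With these helpers, -1 sign-extended to 64 bits has a Varint length of 10 bytes, whereas zigzag_encode32(-1) is 1 and fits in a single byte.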
Together, these compression methods greatly reduce the size of encoded messages and ease the pressure on network bandwidth.

Extensibility and compatibility:
Compatibility between old and new versions of a protocol is handled well by the field numbers (tags): fields are identified by number rather than by name or position, so a parser can skip a field number it does not recognize. I won't go into more detail here.

Detailed reference: https://cloud.tencent.com/developer/article/1394349

Usage:

Installation: download the source code from https://github.com/protocolbuffers/protobuf, then build and install it:
tar zxvf protobuf-cpp-3.8.0.tar.gz
cd protobuf-cpp-3.8.0/
./configure
make
sudo make install
Display version information: protoc --version

Generate .cc and .h files from the proto file (for example, assuming the file is named xx.proto and the output goes to the current directory):
protoc --cpp_out=. xx.proto

Compile your code together with the generated .h and .cc:
g++ -o mode_name code_test.cpp xx.cc -lprotobuf -lpthread

Basic syntax reference (translation): https://blog.csdn.net/u011518120/article/details/54604615

Origin blog.csdn.net/wangrenhaioylj/article/details/109178376