Protobuf format analysis

Protobuf is a high-performance serialization framework from Google. Its advantages are that serialized data is compact and that it supports many programming languages (C/C++, Java, PHP, Python and other mainstream languages). Its main disadvantage is that the serialized binary is not human-readable.

1. Installation

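Download the protobuf source code and compile it. The build provides the protoc compiler and the runtime libraries that the generated code links against.
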
2. Development process

2.1 Prepare the helloworld.proto file

syntax = "proto2";

package com;

message helloworld{
    required int32 id = 1;
    required string str = 2;
    optional int32 age = 3;
}

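Field rules:

required: the field must be set. A message missing a required field is uninitialized: the Java builder's build() throws, and parsing such data fails.

optional: the field may be left unset. This is the most commonly used rule, because it makes it easier to upgrade the system smoothly later.

repeated: the field may appear any number of times (including zero), equivalent to passing an array.
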
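Field numbers (1, 2, 3): the number assigned to each field. Numbers must be unique within a message, and it is the number, not the field name, that is written into the serialized data.

package: the package name. It maps to a namespace in C++ and, unless java_package is set, to the package name in Java.
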
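Common scalar types and their C/C++ counterparts:

bool: bool

int32 / uint32: int32_t / uint32_t

int64 / uint64: int64_t / uint64_t

float / double: float / double

string: std::string; must contain UTF-8 encoded or 7-bit ASCII text, so Chinese and other multi-byte characters are fine as long as they are UTF-8

bytes: std::string; an arbitrary byte sequence, for binary data or text in other encodings

enum: enumeration, generated as a C++ enum
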
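2.2 Generate code for each language

protoc -I=. --cpp_out=. helloworld.proto
protoc -I=. --java_out=./java helloworld.proto

The first command writes helloworld.pb.h and helloworld.pb.cc to the current directory. The second writes com/Helloworld.java under ./java, which must already exist. The generated Java outer class is named Helloworld after the file name, which is why the Java code below refers to Helloworld.helloworld.
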
2.3 Network test

The biggest advantage of protobuf is that it is cross-language. In this test, a Java client serializes a message and sends it over UDP, and a C++ UDP server receives and parses it.

C++ UDP Server:  

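#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

#include "helloworld.pb.h"   // generated by protoc --cpp_out

#define PORT_SERV 8888   // assumed port; must match the port used by the Java client
#define BUFF_LEN  1024   // assumed buffer size; larger than any expected message

using com::helloworld;   // "package com;" becomes namespace com

/**
 * C++ Udp server
 */
void udpServer()
{
        int s;
        struct sockaddr_in addr_serv;
        struct sockaddr_in client;

        s = socket(AF_INET, SOCK_DGRAM, 0);
        if (s < 0) {
                perror("socket");
                return;
        }

        memset(&addr_serv, 0, sizeof(addr_serv));
        addr_serv.sin_family = AF_INET;
        addr_serv.sin_addr.s_addr = htonl(INADDR_ANY);
        addr_serv.sin_port = htons(PORT_SERV);

        if (bind(s, (struct sockaddr*)&addr_serv, sizeof(addr_serv)) < 0) {
                perror("bind");
                close(s);
                return;
        }

        ssize_t n;
        char buff[BUFF_LEN];
        socklen_t len;

        while (1)
        {
                len = sizeof(client);
                n = recvfrom(s, buff, BUFF_LEN, 0, (struct sockaddr*)&client, &len);
                if (n < 0) {
                        perror("recvfrom");
                        continue;
                }

                // Deserialize only the n bytes actually received, not the whole buffer
                helloworld rmsg;
                if (!rmsg.ParseFromArray(buff, static_cast<int>(n))) {
                        fprintf(stderr, "failed to parse %zd-byte packet\n", n);
                        continue;
                }
                printf("Recv: %s\n", rmsg.DebugString().c_str());
        }
}
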
Java UDP Client:

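import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

import com.Helloworld;
import com.google.protobuf.InvalidProtocolBufferException;

public class PbClient {	// hypothetical class name
	// Assumed values: they must match the server's address and PORT_SERV
	private static final String host = "127.0.0.1";
	private static final int port = 8888;

	/**
	 * UDP send pb packets
	 */
	public static void sendPbPacket() {
		// Builder
		Helloworld.helloworld.Builder builder = Helloworld.helloworld.newBuilder();
		builder.setId(505100).setStr("hello world");
		builder.setAge(18);

		// Build the message; build() throws if a required field (id, str) is unset
		Helloworld.helloworld hw = builder.build();
		System.out.println(hw.toString());

		// Serialization
		byte[] buf = hw.toByteArray();

		// Local check: deserialize the bytes again and print the result
		try {
			Helloworld.helloworld hw1 = Helloworld.helloworld.parseFrom(buf);
			System.out.println(hw1.toString());
		} catch (InvalidProtocolBufferException e) {
			e.printStackTrace();
		}

		// UDP send; try-with-resources closes the socket even if sending fails
		try (DatagramSocket client = new DatagramSocket()) {
			InetAddress addr = InetAddress.getByName(host);
			DatagramPacket sendPacket = new DatagramPacket(buf, buf.length, addr, port);
			client.send(sendPacket);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}
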
Server prints:

Recv: id: 505100
str: "hello world"
age: 18

The message sent by the Java client is decoded correctly by the C++ server, which shows that protobuf serializes data reliably across languages.

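3. Protobuf format analysis

Protobuf encoding is similar to TLV (tag, length, value) encoding: a message is a sequence of fields, each written as a tag followed by its value. The tag is computed as (field_number<<3)|wire_type, where field_number is the number assigned in the .proto file and wire_type describes how the value is encoded. Only length-delimited values (wire type 2, such as string and bytes) carry an explicit length; varints are self-terminating and fixed-size values have a known width, so they are written as tag and value only.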
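
As a sketch of the tag formula, the C++ below computes a tag and splits it back into its two parts. The names WireType, makeTag, tagFieldNumber and tagWireType are illustrative and not part of the protobuf library; only the wire-type values themselves come from the protobuf encoding.

```cpp
#include <cstdint>

// Protobuf wire types (3 and 4, start/end group, are deprecated and omitted here)
enum WireType : uint32_t {
    kVarint = 0,           // int32, int64, uint32, uint64, sint32, sint64, bool, enum
    kFixed64 = 1,          // fixed64, sfixed64, double
    kLengthDelimited = 2,  // string, bytes, embedded messages, packed repeated fields
    kFixed32 = 5,          // fixed32, sfixed32, float
};

// tag = (field_number << 3) | wire_type
constexpr uint32_t makeTag(uint32_t field_number, WireType wire_type) {
    return (field_number << 3) | static_cast<uint32_t>(wire_type);
}

// The low 3 bits hold the wire type; the remaining bits hold the field number
constexpr uint32_t tagFieldNumber(uint32_t tag) { return tag >> 3; }
constexpr WireType tagWireType(uint32_t tag) { return static_cast<WireType>(tag & 0x7); }
```

With this, makeTag(1, kVarint) is 0x08, makeTag(2, kLengthDelimited) is 0x12 and makeTag(3, kVarint) is 0x18. Because the tag is itself written as a varint, field numbers 1 to 15 always give a one-byte tag, which is why each tag in the capture below occupies a single byte.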

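Figure: TLV structure of a protobuf field

Wire types:

Type  Meaning           Used for
0     Varint            int32, int64, uint32, uint64, sint32, sint64, bool, enum
1     64-bit            fixed64, sfixed64, double
2     Length-delimited  string, bytes, embedded messages, packed repeated fields
3     Start group       groups (deprecated)
4     End group         groups (deprecated)
5     32-bit            fixed32, sfixed32, float
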
Capturing the communication above with Wireshark gives the following packet:

Figure: Wireshark capture of the protobuf UDP packet

The data segment (the UDP payload) is 19 bytes in total:

08 8c ea 1e 12 0b 68 65 6c 6c 6f 20 77 6f 72 6c 64 18 12
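
To see how these 19 bytes split into fields, here is a minimal decoder sketch in C++. It handles only wire types 0 and 2, the only ones this message uses, and Field, readVarint and decodeFields are hypothetical helper names, not protobuf APIs. readVarint follows the varint conversion explained below for the id field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// One decoded field (only wire types 0 and 2 are supported in this sketch)
struct Field {
    uint32_t number;
    uint32_t wire_type;
    uint64_t varint;     // set when wire_type == 0
    std::string bytes;   // set when wire_type == 2
};

// Read one base-128 varint starting at pos and advance pos past it
static uint64_t readVarint(const std::vector<uint8_t>& buf, size_t& pos) {
    uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= buf.size()) throw std::runtime_error("truncated varint");
        uint8_t b = buf[pos++];
        value |= static_cast<uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return value;   // high bit clear: last byte
    }
    throw std::runtime_error("varint longer than 10 bytes");
}

// Walk the buffer as a sequence of (tag, value) pairs
std::vector<Field> decodeFields(const std::vector<uint8_t>& buf) {
    std::vector<Field> fields;
    size_t pos = 0;
    while (pos < buf.size()) {
        uint64_t tag = readVarint(buf, pos);
        Field f{static_cast<uint32_t>(tag >> 3), static_cast<uint32_t>(tag & 0x7), 0, ""};
        if (f.wire_type == 0) {
            f.varint = readVarint(buf, pos);
        } else if (f.wire_type == 2) {
            uint64_t len = readVarint(buf, pos);          // explicit length prefix
            if (len > buf.size() - pos) throw std::runtime_error("truncated field");
            f.bytes.assign(reinterpret_cast<const char*>(buf.data() + pos), len);
            pos += len;
        } else {
            throw std::runtime_error("wire type not handled in this sketch");
        }
        fields.push_back(f);
    }
    assert(pos == buf.size());   // every byte was consumed
    return fields;
}
```

Run on the captured payload, decodeFields should return three fields: number 1 with varint 505100, number 2 with the bytes "hello world", and number 3 with varint 18.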

 

1. int32 id = 505100

08: 0x08 = (1<<3)|0, i.e. field number 1 (id) with wire type 0 (varint), the wire type used for int32 in the table above

8c ea 1e: the three-byte varint encoding of 505100

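Why is 505100 encoded as 0x8c 0xea 0x1e? A varint stores 7 bits per byte, least significant group first, and uses the high bit of each byte as a continuation flag. The conversion:

Decimal: 505100

Binary: 1111011010100001100

Split into groups of 7 bits, starting from the low end: 0011110 1101010 0001100

Reverse the group order (lowest group first) and prefix each group with a continuation bit, 1 if another byte follows and 0 for the last byte: 1000 1100 1110 1010 0001 1110

Hex: 0x8c 0xea 0x1e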
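
The conversion steps above can be written as a pair of small C++ functions. This is a sketch with hypothetical names (encodeVarint, decodeVarint); the protobuf runtimes have their own implementations, but the byte layout is the same.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode an unsigned value as a base-128 varint: 7 bits per byte,
// least significant group first, high bit set on every byte except the last.
std::vector<uint8_t> encodeVarint(uint64_t value) {
    std::vector<uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));   // last byte: continuation bit clear
    return out;
}

// Decode by taking the low 7 bits of each byte, shifting by 7 more per byte,
// and stopping at the first byte whose high bit is clear.
uint64_t decodeVarint(const std::vector<uint8_t>& bytes) {
    uint64_t value = 0;
    int shift = 0;
    for (uint8_t b : bytes) {
        assert(shift < 64 && "varint longer than 10 bytes");
        value |= static_cast<uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) break;
        shift += 7;
    }
    return value;
}
```

encodeVarint(505100) returns the bytes 8c ea 1e, and encodeVarint(18) returns the single byte 12. decodeVarint stops at the first byte with a clear high bit; it does not detect truncated input.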

 

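2. string str = "hello world"

12: 0x12 = (2<<3)|2, i.e. field number 2 (str) with wire type 2 (length-delimited)

0b: the length of the value, 11 bytes

68 65 6c 6c 6f 20 77 6f 72 6c 64: the bytes of "hello world"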
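
A length-delimited field is the tag, then the length as a varint, then the raw bytes. A minimal C++ sketch, using the hypothetical helper names appendVarint and encodeStringField:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append value as a base-128 varint (7 bits per byte, low group first)
static void appendVarint(std::vector<uint8_t>& out, uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
}

// Encode a string or bytes field: tag with wire type 2, length as a varint, raw bytes
std::vector<uint8_t> encodeStringField(uint32_t field_number, const std::string& value) {
    assert(field_number > 0);   // 0 is not a valid field number
    std::vector<uint8_t> out;
    appendVarint(out, (static_cast<uint64_t>(field_number) << 3) | 2);
    appendVarint(out, value.size());   // length in bytes, not characters
    out.insert(out.end(), value.begin(), value.end());
    return out;
}
```

encodeStringField(2, "hello world") produces 12 0b followed by the 11 bytes of the text. Because the length counts bytes, a UTF-8 string containing Chinese characters has a length larger than its character count.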

 

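3. int32 age = 18

18: 0x18 = (3<<3)|0, i.e. field number 3 (age) with wire type 0 (varint)

12: the varint 0x12, which is decimal 18; values below 128 fit in a single byte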
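
Putting the three fields together, the whole message can be encoded by hand and compared with the Wireshark capture. This C++ sketch hard-codes the field numbers seen in the capture (1 for id, 2 for str, 3 for age); encodeHelloworld, appendTag and appendInt32 are hypothetical names, and writing the fields in increasing field-number order matches what the Java client produced in the capture.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append value as a base-128 varint (7 bits per byte, low group first)
static void appendVarint(std::vector<uint8_t>& out, uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
}

// Append a tag: (field_number << 3) | wire_type, itself encoded as a varint
static void appendTag(std::vector<uint8_t>& out, uint32_t field_number, uint32_t wire_type) {
    assert(field_number > 0 && wire_type <= 5);
    appendVarint(out, (static_cast<uint64_t>(field_number) << 3) | wire_type);
}

// int32 values are sign-extended to 64 bits before varint encoding
static void appendInt32(std::vector<uint8_t>& out, int32_t v) {
    appendVarint(out, static_cast<uint64_t>(static_cast<int64_t>(v)));
}

// Encode the message with id = 1, str = 2, age = 3
std::vector<uint8_t> encodeHelloworld(int32_t id, const std::string& str, int32_t age) {
    std::vector<uint8_t> out;
    appendTag(out, 1, 0);                            // id: varint
    appendInt32(out, id);
    appendTag(out, 2, 2);                            // str: length-delimited
    appendVarint(out, str.size());
    out.insert(out.end(), str.begin(), str.end());
    appendTag(out, 3, 0);                            // age: varint
    appendInt32(out, age);
    return out;
}
```

encodeHelloworld(505100, "hello world", 18) returns exactly the 19 bytes from the capture. Since age is optional, a message that leaves it unset would simply end after the string bytes.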

Origin http://43.154.161.224:23101/article/api/json?id=326440242&siteId=291194637