Detailed explanation and application of protobuf reflection (pb/json mutual conversion)

For basic protobuf usage, its encoding rules, and applications of those rules, please refer to the following articles.

Common usage of Python to operate protobuf

Installation and use of protobuf in linux environment

Detailed explanation of Protobuf encoding rules

Protobuf coding principle and its application in schema format conversion

In addition, Chen Shuo's article is very worthy of reference.

This article mainly introduces the reflection mechanism of protobuf, several classes involved in the pb reflection mechanism, the implementation steps of pb reflection, and the application of reflection in pb↔json conversion.

First, a basic question: what is reflection, and what can it be used for?

1. What is reflection? Given a pb object, how can we automatically iterate over all of its fields? In other words, is there a general way to iterate over the fields of any pb object without caring about its concrete type? To deepen the understanding, here is a passage from Chen Shuo's article: "The problem to be solved here is: after receiving protobuf data, how to automatically create a specific Protobuf Message object and then deserialize it. 'Automatically' means that when a new protobuf message type is added to the program, this part of the code does not need to be modified, and there is no need to register the message type yourself."

To put it plainly: the task is to automatically create a Message object from its type name, and then deserialize the binary stream into it to restore the original data.

For example, given the following instance of the pb message type Person, can we automatically convert the object into the JSON string {"name":"yingshin","age":21}? Many readers will say this is easy: parse the Person, take out each field, and build the JSON. But note the word "automatically". If we later set another field in the pb, such as email via person.set_email(...), does that field appear in the JSON automatically, or do you have to modify the pb2json code by hand?

// Given the following .proto file
syntax = "proto2";
package tencent;

message Person
{
    required string name = 1;
    required uint32 age  = 2;
    optional uint64 email = 3;
}

// C++ code that fills in a Person instance
Person person;
person.set_name("yingshin");
person.set_age(21);

The answer is protobuf's reflection facility. In fact, protobuf's own generic parsing paths are built on reflection, for example for messages compiled with optimize_for = CODE_SIZE.

2. Use scenarios. The conversion between pb and JSON has already been mentioned; many pb2json libraries are built on protobuf reflection. Converting pb to XML, writing pb directly into a database (for example, writing different fields to different HBase columns), and pb-based automated testing tools are all applications of the same mechanism. In short, reflection is only a mechanism; which scenarios you apply it to is up to you.

3. The key points of reflection implementation.

The key to implementing reflection is access to the system's meta information.

Meta information is the system's self-describing information, that is, information that describes the system itself. For example: which classes does the system have? Which fields and methods does each class have? What is the type of each field, and what are the parameters and return value of each method?

Java can provide reflection because the compiler writes the program's meta information into the .class file, and the JVM loads the .class file into the method area of its memory model at run time. From then on, the running program can query meta information about itself. Besides Java, languages such as JS, Python, Go, and PHP also implement reflection at the language level.

So where is the meta information needed for protobuf reflection? It is in the .proto file. When the user defines data structures in a .proto file, they are also supplying protobuf with the metadata for that data: which fields the data consists of, what type each field has, how fields are nested within one another, and so on.

There are two main uses of reflection:

(1) Create an object by the name of the proto object (json→pb)

#include <cassert>
#include <iostream>
#include <string>
#include <typeinfo>
#include "person.pb.h"

using namespace std;

/*
    Descriptor and FieldDescriptor are declared in descriptor.h;
    Message and Reflection are declared in message.h.
    All four classes live in namespace google::protobuf.
*/
using namespace google::protobuf;

typedef tencent::Person T;

Message* createMessage(const std::string& typeName);

int main()
{
    // Get the type name of the message through Descriptor::full_name()
    std::string type_name = T::descriptor()->full_name();
    cout << "type_name:" << type_name << endl;

    // Create the corresponding message object new_person from the type name
    Message* new_person = createMessage(type_name);
    assert(new_person != NULL);  // aborts with a message on stderr if the pointer is null
    assert(typeid(*new_person) == typeid(tencent::Person::default_instance()));
    cout << "new_person:" << new_person->DebugString() << endl;

    // Next, look up the meta information (Descriptor*) by type_name
    // with DescriptorPool::FindMessageTypeByName
    const Descriptor* descriptor = google::protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName(type_name);
    cout << "FindMessageTypeByName() = " << descriptor << endl;
    cout << "T::descriptor()         = " << T::descriptor() << endl;
    cout << endl;

    // Then use MessageFactory::generated_factory() to get the MessageFactory
    // and ask it for the prototype (default instance) of this descriptor
    const Message* prototype = MessageFactory::generated_factory()->GetPrototype(descriptor);
    cout << "GetPrototype()        = " << prototype << endl;
    cout << "T::default_instance() = " << &T::default_instance() << endl;
    cout << endl;

    // Now create a fresh instance from the prototype.
    // dynamic_cast safely converts a base-class pointer or reference into a
    // derived-class pointer or reference, so derived-class members can be called.
    T* new_obj = dynamic_cast<T*>(prototype->New());
    assert(new_obj != NULL);
    cout << "prototype->New() = " << new_obj << endl;
    cout << endl;

    /*-------- How to use the reflection interface --------*/
    // Get the reflection interface of this message
    const Reflection* reflecter = new_obj->GetReflection();

    // Look up a field by name
    const FieldDescriptor* field = descriptor->FindFieldByName("name");

    // Set the value of this field
    std::string str1 = "shuozhuo";
    reflecter->SetString(new_obj, field, str1);

    // Read the value of this field back
    std::cout << "\"name\" field is:" << reflecter->GetString(*new_obj, field) << std::endl;

    // Objects returned by New() are owned by the caller
    delete new_obj;
    delete new_person;
    return 0;
}


/*
    Creates a concrete protobuf message object from its type name.
    Returns NULL if the type name is unknown; the caller owns the result.
*/
Message* createMessage(const std::string& typeName)
{
  Message* message = NULL;
  const Descriptor* descriptor = DescriptorPool::generated_pool()->FindMessageTypeByName(typeName);
  if (descriptor)
  {
    const Message* prototype = MessageFactory::generated_factory()->GetPrototype(descriptor);
    if (prototype)
    {
      message = prototype->New();
    }
  }
  return message;
}

1) Obtain the meta information. The Descriptor (an instance of the Descriptor class) for a type is obtained through DescriptorPool's FindMessageTypeByName.

DescriptorPool is a pool of meta information. It exposes query interfaces such as FindMessageTypeByName so that callers can look up the meta information they need.

DescriptorDatabase can look up the content of the .proto file with a given name, either from hard-coded data or from disk, and returns the requested meta information after parsing it. DescriptorPool acts as a cache of the Descriptors already built (backed by a map). A query goes to the cache first; on a miss it falls through to the DescriptorDatabase, which may have to read the file from disk and parse it into a Descriptor, and that takes time. In short, DescriptorPool and DescriptorDatabase speed up reflection through caching, but this is only an engineering optimization. What matters more is where the Descriptor comes from.

It is easy to understand that a DescriptorDatabase can read .proto content from disk and parse it into a Descriptor, but most of the time we do not use it this way, and no .proto file is read during reflection. So where does the .proto content come from? When protoc generates xxx.pb.cc and xxx.pb.h, the output contains not only the accessors for reading and writing data but also the content of the .proto file itself. Open any xxx.pb.cc and you will find a large hard-coded byte array.

This array stores the .proto content. It is not the original text, but a serialized form of it (a FileDescriptorProto serialized with SerializeToString), hard-coded into xxx.pb.cc.

The hard-coded meta information is loaded and parsed by the DescriptorDatabase lazily (only when it is first requested) and then cached in the DescriptorPool.

2) The remaining statements are self-explanatory and are not discussed in detail.

(2) Initialize and obtain the value of member variables through Message (pb→json)

See the example at the end of this article.

Reflection-related interfaces (introduction to related classes):

First look at the UML diagram

Note: The type name mentioned below refers to the complete structure name, such as "tencent.Person" above. 

1. MessageLite class

The interface class for all messages. As the name suggests, it is the "lite" message; the ordinary Message class is also a subclass of it.

MessageLite is a "lightweight" message (only encoding and serialization, no reflection or descriptors). When you are sure a lightweight message is enough, add option optimize_for = LITE_RUNTIME; to the .proto file so that the protocol compiler generates classes deriving from MessageLite, which saves runtime resources.

Note: "lite" means light or lightweight.

2. Message class

Interface class that adds descriptors and reflection on top of the MessageLite class.

3. MessageFactory class

Interface class. MessageFactory::generated_factory() returns a MessageFactory object that can create every protobuf Message type linked into the program at compile time. Its GetPrototype method returns the default instance of a specific Message type. Underneath, it wraps the GeneratedMessageFactory class.
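The prototype idea behind GetPrototype can be modelled without protobuf at all. The following is a toy model using only the standard library, with made-up names (ToyMessage, ToyPerson, ToyFactory); it is not protobuf's implementation. A registry maps a type name to a default instance, and calling New() on that instance yields a fresh object of the same concrete type.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for Message: every type can create an empty instance of itself.
struct ToyMessage {
  virtual ~ToyMessage() = default;
  virtual std::unique_ptr<ToyMessage> New() const = 0;
  virtual std::string TypeName() const = 0;
};

struct ToyPerson : ToyMessage {
  std::unique_ptr<ToyMessage> New() const override {
    return std::make_unique<ToyPerson>();
  }
  std::string TypeName() const override { return "toy.Person"; }
};

// Toy stand-in for the generated factory: type name -> prototype instance.
class ToyFactory {
 public:
  void Register(std::unique_ptr<ToyMessage> prototype) {
    std::string name = prototype->TypeName();
    prototypes_[name] = std::move(prototype);
  }

  // Returns nullptr for unknown names, like createMessage in the listing above.
  std::unique_ptr<ToyMessage> Create(const std::string& name) const {
    auto it = prototypes_.find(name);
    if (it == prototypes_.end()) return nullptr;
    return it->second->New();
  }

 private:
  std::map<std::string, std::unique_ptr<ToyMessage>> prototypes_;
};
```

In real protobuf the registration step is done by the generated xxx.pb.cc code, which is why no manual registration is needed when a new message type is added.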

4. DescriptorPool class

Use DescriptorPool::generated_pool() to find a DescriptorPool object that contains all the protobuf Message types that were linked when the program was compiled. Then through the FindMessageTypeByName method provided by it, the Descriptor can be found according to the type name.

5. GeneratedMessageFactory class

Inherited from MessageFactory, singleton mode.

6. Descriptor class

Describes the meta information of a message type (note: not an individual message object). The constructor is private, so a Descriptor must be constructed through DescriptorPool (a friend class).

The const members are as follows:

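const FileDescriptor* file_: describes the .proto file in which the message is defined
const Descriptor* containing_type_: if this message is nested inside another message in the proto definition, this is the Descriptor* of the enclosing message; otherwise it is NULL
const MessageOptions* options_: defined in descriptor.proto; judging from the comments, it is used for extensions compatible with MessageSet from the old proto1, so the extension-related parts can be ignored for now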

The non-const members are as follows:

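int field_count_: number of fields contained in this message
FieldDescriptor* fields_: all fields, stored as a contiguous array
int nested_type_count_: number of nested types
Descriptor* nested_types_: messages nested inside this message
int enum_type_count_: number of enums defined inside this message
EnumDescriptor* enum_types_: start address of the contiguous array of enum types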

7. FileDescriptor class

Describes an entire .proto file, including:

1. The .proto files it depends on:
int dependency_count_;
const FileDescriptor** dependencies_;

2. The messages defined in this .proto file:
int message_type_count_;
Descriptor* message_types_;

3. The tables of all symbols (the various descriptors) contained in this .proto file:
const FileDescriptorTables* tables_;

8. FieldDescriptor class

Describes a single field. The constructor is private, so it must also be constructed by DescriptorPool (a friend class). A FieldDescriptor is obtained from the Descriptor of the message that contains the field, for example through Descriptor::FindFieldByName().

9. EnumDescriptor class

Describes the enum type defined in the .proto file.

Application example: pb↔json mutual conversion

The file is named pb2json.cpp, and the code is on the author's personal cloud server under /mystudy/protobuf2json.

The example program can be obtained from here.

The execution effect is as follows:

[root@VM_50_94_centos /mystudy/protobuf2/pb2json]# ./pb2json 
-------------------------------pbbuf二进制流反序列化---------------------------
body1 is:riXXXXXt {
  exxxs {
    text {
      str: "\346\226\260\345\271\264\345\260\206\350\207\263"
    }
  }
  exxxs {
    face {
      index: 69
      old: "\024h"
    }
  }
  exxxs {
    text {
      str: "\357\274\214\347\272\242\345\214\205\345\257\271\350\201\224\345\244\247\347\244\274\345\214\205\347\201\253\347\203\255\345\256\232\345\210\266\344\270\255"
    }
  }
  exxxs {
    face {
      index: 144
      old: "\024\321"
    }
  }
  exxxs {
    text {
      str: "\357\274\201\357\274\201\357\274\201"
    }
  }
  elems {
    crm_elem {
      crm_buf: "\010\010\202\001.\010"
    }
  }
}

-------------------------------将pb对象 → json-------------------------------
my_json{
   "rixxxxxt" : {
      "elxxxs" : [
         {
            "texxt" : {
               "str" : "新年将至"
            }
         },
         {
            "face" : {
               "index" : 69,
               "old" : "\u0014h"
            }
         },
         {
            "text" : {
               "str" : ",红包对联大礼包火热定制中"
            }
         },
         {
            "face" : {
               "index" : 144,
               "old" : "\u0014�"
            }
         },
         {
            "text" : {
               "str" : "!!!"
            }
         },
         {
            "crm_elem" : {
               "crm_buf" : "\b\b�\u0001.\b"
            }
         }
      ]
   }
}

----------------通过Descriptor类的full_name函数获取相应结构的type_name-----------
type_name:tencent.im.msg.MsgBody
---------------------根据type_name创建一个default instance---------------------
------------------------根据json填充这个default instance------------------------
my_msgbody is:rxxxxt {
  exxxxs {
    text {
      str: "\346\226\260\345\271\264\345\260\206\350\207\263"
    }
  }
  exxxxs {
    face {
      index: 69
      old: "\024h"
    }
  }
  exxxxs {
    txxxt {
      stxxxr: "\357\274\214\347\272\242\345\214\205\345\257\271\350\201\224\345\244\247\347\244\274\345\214\205\347\201\253\347\203\255\345\256\232\345\210\266\344\270\255"
    }
  }
  elems {
    face {
      index: 144
      old: "\024\321"
    }
  }
  exxxs {
    text {
      str: "\357\274\201\357\274\201\357\274\201"
    }
  }
  exxxxs {
    crxxxxm {
      cxxxxxuf: "\010\010\202\001.\010"
    }
  }
}

See: GitHub - HaustWang/pb2json: interconversion between protobuf message and json, using C++11 features

In-depth ProtoBuf - Reflection Principle Analysis - Jianshu


Origin blog.csdn.net/mijichui2153/article/details/111665192#comments_26561672