C++ Protobuf learning and use

Introduction

Protocol buffers are a language-neutral, platform-neutral, extensible mechanism for serializing structured data. They can be used for communication protocols, data storage, and more. The data exchanged during communication is packaged in message structures defined with Protobuf and then encoded into a binary stream for transmission or storage.

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data. They are comparable to XML, but smaller (3 to 10 times), faster (20 to 100 times), and simpler.

To use Protobuf, you write a file in its IDL (interface description language) and define your data structures in it. Only data structures defined this way can be serialized and deserialized. Serialization converts objects into binary data; deserialization converts binary data back into objects.

Tutorial


This tutorial provides a basic C++ programmer's introduction to using Protocol Buffers. By creating a simple example application, it shows you how to:

  • Define the message format in the .proto file.
  • Use the protocol buffer compiler.
  • Write and read messages using the C++ Protocol Buffers API.

This is not a comprehensive guide to using Protocol Buffers in C++. For more detailed reference information, see Protocol Buffer Language Guide (proto2), Protocol Buffer Language Guide (proto3), C++ API Reference, C++ Generated Code Guide, and Encoding Reference.


The example we'll use is a very simple "address book" application that can read and write people's contact details to and from a file. Each person in the address book has a name, an ID, an email address, and a contact phone number.

How do you serialize and retrieve structured data like this? There are a few ways to solve this problem:

  • The raw in-memory data structures can be sent/saved in binary form. Over time, this is a fragile approach, since the receiving/reading code must be compiled with exactly the same memory layout, endianness, etc. Also, as files accumulate data in the raw format and copies of software that are wired for that format are spread around, it becomes very hard to extend the format.
  • You can invent an ad-hoc way to encode the data items into a single string, such as encoding 4 ints as "12:3:-23:67". This is a simple and flexible approach, although it does require writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. This works best for encoding very simple data.
  • Serialize the data to XML. This approach can be very attractive since XML is (sort of) human readable and there are binding libraries for lots of languages. This can be a good choice if you want to share data with other applications/projects. However, XML is notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications. Also, navigating an XML DOM tree is considerably more complicated than navigating simple fields in a class would normally be.
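For comparison, the ad-hoc option from the second bullet above might look like the following stdlib-only sketch. The names EncodeInts and DecodeInts are hypothetical; the point is that even this tiny format needs its own hand-written parser and error handling, which is the kind of one-off code Protobuf generates for you.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical ad-hoc encoder: joins ints with ':' (e.g. "12:3:-23:67").
std::string EncodeInts(const std::vector<int>& values) {
  std::string out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out += ':';
    out += std::to_string(values[i]);
  }
  return out;
}

// Hypothetical matching parser; returns std::nullopt if an item is not a valid int.
std::optional<std::vector<int>> DecodeInts(const std::string& text) {
  std::vector<int> values;
  std::istringstream in(text);
  std::string item;
  while (std::getline(in, item, ':')) {
    try {
      std::size_t used = 0;
      int value = std::stoi(item, &used);
      if (used != item.size()) return std::nullopt;  // trailing junk, e.g. "3x"
      values.push_back(value);
    } catch (const std::logic_error&) {  // stoi throws invalid_argument / out_of_range
      return std::nullopt;
    }
  }
  return values;
}
```

Note that nothing in this format records what each number means; adding a fifth value later would silently break older readers.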

Instead of these options, you can use protocol buffers. Protocol buffers are the flexible, efficient, automated solution to exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data in an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports extending the format over time in such a way that the code can still read data encoded with the old format.

The example code is included in the source code package, under the "examples" directory.

Defining Your Protocol Format

To create your address book application, you'll need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. Here is the .proto file that defines your messages, addressbook.proto:

syntax = "proto2";

package tutorial;

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    optional string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}


As you can see, its syntax is similar to C++ or Java. Let's go through each part to see what it does.


Apart from the syntax line, the .proto file starts with a package declaration, which helps to prevent naming conflicts between different projects. In C++, your generated classes will be placed in a namespace matching the package name.

Next, you have your message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types as field types: in the above example, the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages; as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values. Here you want to specify that a phone number can be one of the following phone types: MOBILE, HOME, or WORK.

The " = 1", " = 2" markers on each element identify the unique field number that the field uses in the binary encoding. Field numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those numbers for commonly used or repeated elements, leaving field numbers 16 and higher for less commonly used optional elements. Each element in a repeated field requires re-encoding the field number, so repeated fields are particularly good candidates for this optimization.
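The one-byte saving for field numbers 1 to 15 comes from how each field's key is encoded. As the Encoding Reference describes it, every field starts with a key equal to (field_number << 3) | wire_type, written as a varint with 7 payload bits per byte, so the key fits in a single byte only while the field number is at most 15. The helper names below (VarintSize, KeySize) are hypothetical; this is a stdlib-only sketch, not part of libprotobuf.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes needed for a base-128 varint (7 payload bits per byte).
int VarintSize(uint64_t value) {
  int size = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++size;
  }
  return size;
}

// Wire types from the encoding spec (deprecated group types 3 and 4 omitted).
enum WireType : uint32_t {
  kVarint = 0,           // int32, int64, bool, enum, ...
  kFixed64 = 1,          // fixed64, double, ...
  kLengthDelimited = 2,  // string, bytes, embedded messages, packed repeated
  kFixed32 = 5,          // fixed32, float, ...
};

// Every encoded field starts with a key: (field_number << 3) | wire_type,
// itself written as a varint. Returns how many bytes that key occupies.
int KeySize(uint32_t field_number, WireType type) {
  return VarintSize((static_cast<uint64_t>(field_number) << 3) | type);
}
```

For example, KeySize(15, kLengthDelimited) is 1 while KeySize(16, kVarint) is 2, so moving a non-packed repeated field such as phones from number 16 down to a number below 16 saves one byte per element.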

Each field must be annotated with one of the following modifiers:

  • optional: the field may or may not be set. If an optional field value isn't set, a default value is used. For simple types, you can specify your own default value, as we've done for the phone number type in the example. Otherwise, a system default is used: zero for numeric types, the empty string for strings, false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of an optional (or required) field that has not been explicitly set always returns that field's default value.
  • repeated: the field may be repeated any number of times (including zero). The order of the repeated values will be preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.
  • required: a value for the field must be provided, otherwise the message will be considered "uninitialized". If libprotobuf is compiled in debug mode, serializing an uninitialized message will cause an assertion failure. In optimized builds, the check is skipped and the message will be written anyway. However, parsing an uninitialized message will always fail (by returning false from the parse method). Other than this, a required field behaves exactly like an optional field.
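The three modifiers can be modeled with ordinary C++ types to build intuition: an unset optional field reads as its default, a repeated field behaves like an ordered dynamic array, and a message only counts as "initialized" once every required field is set. This is a hypothetical hand-written sketch (the Contact message, ContactModel, and type_or_default are made up for illustration); the real generated classes track presence differently and expose the accessors described later.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical hand-written model (NOT protoc output) of a proto2 message like:
//   message Contact {
//     required string name = 1;
//     optional PhoneType type = 2 [default = HOME];
//     repeated string emails = 3;
//   }
enum PhoneType { MOBILE = 0, HOME = 1, WORK = 2 };

struct ContactModel {
  std::optional<std::string> name;  // required: must be set to be "initialized"
  std::optional<PhoneType> type;    // optional: may be unset
  std::vector<std::string> emails;  // repeated: zero or more values, order kept

  // Reading an unset optional field yields its default value (here HOME).
  PhoneType type_or_default() const { return type.value_or(HOME); }

  // Analogue of IsInitialized(): every required field has a value.
  bool IsInitialized() const { return name.has_value(); }
};
```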

Important: Required Is Forever

You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field: old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Within Google, required fields are strongly disfavored; most messages defined in proto2 syntax use optional and repeated only. (Proto3 does not support required fields at all.)

You'll find a complete guide to writing .proto files, including all the possible field types, in the Protocol Buffer Language Guide. Don't go looking for facilities similar to class inheritance, though: protocol buffers don't do that.

Compiling Your Protocol Buffers

Now that you have a .proto, the next thing you need to do is generate the classes you'll need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, you need to run the protocol buffer compiler protoc on your .proto:

  1. If you haven't installed the compiler, download the package and follow the instructions in the README.

  2. Now run the compiler, specifying the source directory (where your application's source code lives; the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as $SRC_DIR), and the path to your .proto. In this case, you would invoke:

protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/addressbook.proto

Because you want C++ classes, you use the --cpp_out option; similar options are provided for other supported languages.

This will generate the following files in the specified target directory:

    • addressbook.pb.h, the header which declares your generated classes.
    • addressbook.pb.cc, which contains the implementation of your classes.

The Protocol Buffer API

Let's take a look at the generated code to see what classes and functions the compiler created for you. If you look at addressbook.pb.h, you can see that there is a class for each message specified in addressbook.proto. Looking closely at the Person class, you can see that the compiler generated accessors for each field. For example, for the name, id, email, and phones fields, the following methods can be used:

  // name
  inline bool has_name() const;
  inline void clear_name();
  inline const ::std::string& name() const;
  inline void set_name(const ::std::string& value);
  inline void set_name(const char* value);
  inline ::std::string* mutable_name();

  // id
  inline bool has_id() const;
  inline void clear_id();
  inline int32_t id() const;
  inline void set_id(int32_t value);

  // email
  inline bool has_email() const;
  inline void clear_email();
  inline const ::std::string& email() const;
  inline void set_email(const ::std::string& value);
  inline void set_email(const char* value);
  inline ::std::string* mutable_email();

  // phones
  inline int phones_size() const;
  inline void clear_phones();
  inline const ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >& phones() const;
  inline ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >* mutable_phones();
  inline const ::tutorial::Person_PhoneNumber& phones(int index) const;
  inline ::tutorial::Person_PhoneNumber* mutable_phones(int index);
  inline ::tutorial::Person_PhoneNumber* add_phones();

As you can see, the getters have exactly the same name as the field in lowercase, and the setter methods begin with set_. There are also has_ methods for each singular (required or optional) field, which return true if that field has been set. Finally, each field has a clear_ method that un-sets the field back to its empty state.

While the numeric id field just has the basic accessor set described above, the name and email fields have a couple of extra methods because they're strings: a mutable_ getter that lets you get a direct pointer to the string, and an extra setter that takes a const char*. Note that you can call mutable_email() even if email is not already set; it will automatically be initialized to an empty string. If you had a singular message field in this example, it would also have a mutable_ method but not a set_ method.
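To see how has_, set_, clear_, and mutable_ fit together, here is a hypothetical hand-written class that imitates the pattern for one string field and one int32 field. It is not the code protoc generates (that code is implemented differently); it only mirrors the observable behavior described above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical hand-written imitation of the accessor pattern for
//   optional string name = 1;  optional int32 id = 2;
// Not protoc output: the real generated class is implemented differently.
class PersonLike {
 public:
  // string field: basic accessors plus a mutable_ getter
  bool has_name() const { return has_name_; }
  void clear_name() {
    name_.clear();
    has_name_ = false;
  }
  const std::string& name() const { return name_; }
  void set_name(const std::string& value) {
    name_ = value;
    has_name_ = true;
  }
  std::string* mutable_name() {
    has_name_ = true;  // asking for a mutable pointer marks the field as set
    return &name_;
  }

  // numeric field: only the basic accessors
  bool has_id() const { return has_id_; }
  void clear_id() {
    id_ = 0;
    has_id_ = false;
  }
  int32_t id() const { return id_; }
  void set_id(int32_t value) {
    id_ = value;
    has_id_ = true;
  }

 private:
  std::string name_;  // unset value reads as the default: empty string
  int32_t id_ = 0;    // unset value reads as the default: zero
  bool has_name_ = false;
  bool has_id_ = false;
};
```

The mutable_ getter is what lets the example program later in this tutorial read a line straight into the field with getline(cin, *person->mutable_name()).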

There are also some special methods for repeated fields. If you look at the methods for the repeated phones field, you'll see that you can:

  • check the repeated field's _size (in other words, how many phone numbers are associated with this Person).
  • get a specified phone number using its index.
  • update an existing phone number at the specified index.
  • add another phone number to the message, which you can then edit (repeated scalar types have an add_ that just lets you pass in the new value).
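The repeated-field methods behave much like operations on a dynamic array. The hypothetical sketch below backs a phones field with std::vector to show the _size, indexed get, mutable_ update, and add_ operations; the class names are made up, and the real generated code uses RepeatedPtrField (as the header excerpt above shows) rather than std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the PhoneNumber message.
struct PhoneNumberLike {
  std::string number;
};

// Hypothetical imitation of the repeated-field accessors for "phones".
class PhonesOwner {
 public:
  int phones_size() const { return static_cast<int>(phones_.size()); }
  // .at() throws std::out_of_range on a bad index (an extra check in this sketch).
  const PhoneNumberLike& phones(int index) const {
    return phones_.at(static_cast<std::size_t>(index));
  }
  PhoneNumberLike* mutable_phones(int index) {
    return &phones_.at(static_cast<std::size_t>(index));
  }
  PhoneNumberLike* add_phones() {  // append an empty element, return it for editing
    phones_.emplace_back();
    return &phones_.back();
  }
  void clear_phones() { phones_.clear(); }

 private:
  std::vector<PhoneNumberLike> phones_;
};
```

One simplification to keep in mind: std::vector may reallocate when it grows, so in this sketch a pointer returned by add_phones() should not be kept across later add_phones() calls.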

For more information on exactly what members the protocol compiler generates for any particular field definition, see the C++ Generated Code Reference.

Enums and nested classes

The generated code includes a PhoneType enum that corresponds to your .proto enum. You can refer to this type as Person::PhoneType and its values as Person::MOBILE, Person::HOME, and Person::WORK (the implementation details are a little more complicated, but you don't need to understand them to use the enum).

The compiler has also generated a nested class for you called Person::PhoneNumber. If you look at the code, you can see that the "real" class is actually called Person_PhoneNumber, but a typedef defined inside Person allows you to treat it as if it were a nested class. The only case where this makes a difference is if you want to forward-declare the class in another file: you cannot forward-declare nested types in C++, but you can forward-declare Person_PhoneNumber.
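The typedef trick and the forward-declaration point can be reproduced in a few lines of plain C++. In this hypothetical miniature, the flat class Person_PhoneNumber is forward-declared where only its name is needed, and Person::PhoneNumber is simply an alias for it; NumberLength is a made-up helper, and the classes are not the generated ones.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// A header that only needs the type's name can forward-declare the flat class.
// "class tutorial::Person::PhoneNumber;" would not compile here: a nested
// type cannot be declared outside its enclosing class's definition.
namespace tutorial {
class Person_PhoneNumber;
}
std::size_t NumberLength(const tutorial::Person_PhoneNumber& phone);

// Hypothetical miniature of the generated naming scheme.
namespace tutorial {
class Person_PhoneNumber {
 public:
  std::string number;
};

class Person {
 public:
  typedef Person_PhoneNumber PhoneNumber;  // makes Person::PhoneNumber work
};
}  // namespace tutorial

std::size_t NumberLength(const tutorial::Person_PhoneNumber& phone) {
  return phone.number.size();
}

// Both spellings name the same class.
static_assert(std::is_same<tutorial::Person::PhoneNumber,
                           tutorial::Person_PhoneNumber>::value,
              "Person::PhoneNumber is an alias");
```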

Standard Message Methods

Each message class also contains a number of other methods that allow you to inspect or manipulate the entire message, including:

  • bool IsInitialized() const;: checks whether all the required fields have been set.
  • string DebugString() const;: returns a human-readable representation of the message, particularly useful for debugging.
  • void CopyFrom(const Person& from);: overwrites the message with the given message's values.
  • void Clear();: clears all the elements back to the empty state.

These methods, and the I/O methods described in the following section, implement the Message interface shared by all C++ protocol buffer classes. For more information, see the complete API documentation for Message.

Parsing and Serialization

Finally, each protocol buffer class has methods for writing and reading messages of your chosen type using the protocol buffer binary format. These include:

  • bool SerializeToString(string* output) const;: serializes the message and stores the bytes in the given string. Note that the bytes are binary, not text; we only use the string class as a convenient container.
  • bool ParseFromString(const string& data);: parses a message from the given string.
  • bool SerializeToOstream(ostream* output) const;: writes the message to the given C++ ostream.
  • bool ParseFromIstream(istream* input);: parses a message from the given C++ istream.

These are just a couple of the options provided for parsing and serialization. Again, see the Message API reference for a complete list.
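To make "the bytes are binary, not text" concrete, the sketch below hand-encodes a tiny hypothetical message, message Mini { optional string name = 1; optional int32 id = 2; }, into a std::string following the wire format in the Encoding Reference: a varint key of (field_number << 3) | wire_type, then either a varint value or a length-prefixed byte string. SerializeMini and ParseMini are made-up names; this is not how libprotobuf is implemented, and unlike a real parser it rejects unknown fields instead of skipping or preserving them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append a base-128 varint: 7 bits per byte, low bits first; the high bit of
// each byte says whether more bytes follow.
void AppendVarint(std::string* out, uint64_t value) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Read a varint at *pos, advancing *pos; false if the data ends too early.
bool ReadVarint(const std::string& in, std::size_t* pos, uint64_t* value) {
  *value = 0;
  for (int shift = 0; shift < 64 && *pos < in.size(); shift += 7) {
    uint8_t byte = static_cast<uint8_t>(in[(*pos)++]);
    *value |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

// Keys for the hypothetical Mini message.
constexpr uint64_t kNameKey = (1 << 3) | 2;  // field 1, wire type 2 (length-delimited)
constexpr uint64_t kIdKey = (2 << 3) | 0;    // field 2, wire type 0 (varint)

std::string SerializeMini(const std::string& name, int32_t id) {
  std::string out;
  AppendVarint(&out, kNameKey);
  AppendVarint(&out, name.size());  // length prefix, then the raw bytes
  out += name;
  AppendVarint(&out, kIdKey);
  // Negative int32 values are sign-extended to 64 bits, so they take 10 bytes.
  AppendVarint(&out, static_cast<uint64_t>(static_cast<int64_t>(id)));
  return out;
}

bool ParseMini(const std::string& data, std::string* name, int32_t* id) {
  std::size_t pos = 0;
  while (pos < data.size()) {
    uint64_t key = 0;
    if (!ReadVarint(data, &pos, &key)) return false;
    if (key == kNameKey) {
      uint64_t len = 0;
      if (!ReadVarint(data, &pos, &len) || len > data.size() - pos) return false;
      name->assign(data, pos, static_cast<std::size_t>(len));
      pos += static_cast<std::size_t>(len);
    } else if (key == kIdKey) {
      uint64_t raw = 0;
      if (!ReadVarint(data, &pos, &raw)) return false;
      *id = static_cast<int32_t>(raw);  // keep the low 32 bits
    } else {
      return false;  // sketch only: a real parser would skip or keep unknown fields
    }
  }
  return true;
}
```

With name "Al" and id 150 this encoder produces the seven bytes 0A 02 41 6C 10 96 01, which is why such output belongs in a std::string used as a byte container rather than being treated as text.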

Writing a Message

Now let's try using your protocol buffer classes. The first thing you want your address book application to be able to do is write personal details to your address book file. To do this, you need to create and populate instances of your protocol buffer classes and then write them to an output stream.

Here is a program that reads an AddressBook from a file, adds one new Person to it based on user input, and writes the new AddressBook back out to the same file. The lines that use the protocol buffer classes are the calls to set_id(), mutable_name(), set_email(), add_phones(), set_number(), set_type(), add_people(), ParseFromIstream(), and SerializeToOstream().

#include <iostream>
#include <fstream>
#include <string>
#include "addressbook.pb.h"
using namespace std;

// This function fills in a Person message based on user input.
void PromptForAddress(tutorial::Person* person) {
  cout << "Enter person ID number: ";
  int id;
  cin >> id;
  person->set_id(id);
  cin.ignore(256, '\n');

  cout << "Enter name: ";
  getline(cin, *person->mutable_name());

  cout << "Enter email address (blank for none): ";
  string email;
  getline(cin, email);
  if (!email.empty()) {
    person->set_email(email);
  }

  while (true) {
    cout << "Enter a phone number (or leave blank to finish): ";
    string number;
    getline(cin, number);
    if (number.empty()) {
      break;
    }

    tutorial::Person::PhoneNumber* phone_number = person->add_phones();
    phone_number->set_number(number);

    cout << "Is this a mobile, home, or work phone? ";
    string type;
    getline(cin, type);
    if (type == "mobile") {
      phone_number->set_type(tutorial::Person::MOBILE);
    } else if (type == "home") {
      phone_number->set_type(tutorial::Person::HOME);
    } else if (type == "work") {
      phone_number->set_type(tutorial::Person::WORK);
    } else {
      cout << "Unknown phone type.  Using default." << endl;
    }
  }
}

// Main function:  Reads the entire address book from a file,
//   adds one person based on user input, then writes it back out to the same
//   file.
int main(int argc, char* argv[]) {
  // Verify that the version of the library that we linked against is
  // compatible with the version of the headers we compiled against.
  GOOGLE_PROTOBUF_VERIFY_VERSION;

  if (argc != 2) {
    cerr << "Usage:  " << argv[0] << " ADDRESS_BOOK_FILE" << endl;
    return -1;
  }

  tutorial::AddressBook address_book;

  {
    // Read the existing address book.
    fstream input(argv[1], ios::in | ios::binary);
    if (!input) {
      cout << argv[1] << ": File not found.  Creating a new file." << endl;
    } else if (!address_book.ParseFromIstream(&input)) {
      cerr << "Failed to parse address book." << endl;
      return -1;
    }
  }

  // Add an address.
  PromptForAddress(address_book.add_people());

  {
    // Write the new address book back to disk.
    fstream output(argv[1], ios::out | ios::trunc | ios::binary);
    if (!address_book.SerializeToOstream(&output)) {
      cerr << "Failed to write address book." << endl;
      return -1;
    }
  }

  // Optional:  Delete all global objects allocated by libprotobuf.
  google::protobuf::ShutdownProtobufLibrary();

  return 0;
}

Note the GOOGLE_PROTOBUF_VERIFY_VERSION macro. It is good practice, though not strictly necessary, to execute this macro before using the C++ Protocol Buffer library. It verifies that you have not accidentally linked against a version of the library that is incompatible with the version of the headers you compiled with. If a version mismatch is detected, the program will abort. Note that every .pb.cc file automatically invokes this macro on startup.

Also notice the call to ShutdownProtobufLibrary() at the end of the program. All this does is delete any global objects that were allocated by the Protocol Buffer library. This is unnecessary for most programs, since the process is just going to exit anyway and the OS will take care of reclaiming all of its memory. However, if you use a memory leak checker that requires every last object to be freed, or if you are writing a library that may be loaded and unloaded multiple times by a single process, then you may want to force Protocol Buffers to clean up everything.

Define the message type

First let's look at a very simple example. Suppose you want to define a search request message format, where each search request has a query string, the particular page of results you are interested in, and a number of results per page. Here's the .proto file you use to define the message type:

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

  • The first line of the file specifies that you're using proto3 syntax: if you don't do this, the protocol buffer compiler will assume you are using proto2. This must be the first non-empty, non-comment line of the file.
  • The SearchRequest message definition specifies three fields (name/value pairs), one for each piece of data that you want to include in this type of message. Each field has a name and a type.
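One practical consequence of proto3 syntax for the SearchRequest above: a scalar field declared without a label, like these three, is not written to the wire at all when it holds its default value (the empty string or zero). The hypothetical stdlib-only encoder below illustrates this with the same key layout the binary format uses; SearchRequestData, AppendInt32Field, and EncodeSearchRequest are made-up names, not generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical plain struct holding the three SearchRequest fields.
struct SearchRequestData {
  std::string query;             // string query = 1;
  int32_t page_number = 0;       // int32 page_number = 2;
  int32_t results_per_page = 0;  // int32 results_per_page = 3;
};

// Base-128 varint: 7 bits per byte, low bits first, high bit = "more bytes".
void AppendVarint(std::string* out, uint64_t value) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Append an int32 field (wire type 0) unless it holds the default value 0.
void AppendInt32Field(std::string* out, uint32_t field_number, int32_t value) {
  if (value == 0) return;  // proto3 implicit presence: defaults are not written
  AppendVarint(out, (static_cast<uint64_t>(field_number) << 3) | 0);
  AppendVarint(out, static_cast<uint64_t>(static_cast<int64_t>(value)));
}

std::string EncodeSearchRequest(const SearchRequestData& req) {
  std::string out;
  if (!req.query.empty()) {  // the empty string is the default, so skip it
    AppendVarint(&out, (1 << 3) | 2);  // field 1, length-delimited
    AppendVarint(&out, req.query.size());
    out += req.query;
  }
  AppendInt32Field(&out, 2, req.page_number);
  AppendInt32Field(&out, 3, req.results_per_page);
  return out;
}
```

Encoding a request with query "pb" and results_per_page 10 yields six bytes (0A 02 70 62 18 0A); page_number is absent because it is still 0. For such fields, a receiver therefore cannot distinguish an explicit 0 from a field that was never set.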

Specify field type

In the earlier example, all the fields are scalar types: two integers (page_number and results_per_page) and a string (query). You can also specify enumerations and composite types, such as other message types, for your fields.
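As a rough picture of how these field kinds look on the C++ side, here is a hypothetical plain struct extending SearchRequest with an enum field and a message-typed field. The C++ types shown for string and int32 (std::string and int32_t) follow the C++ column of the scalar value types table in the Language Guide, and, as proto3 requires, the enum's first value is zero so it can serve as the default. The Corpus, Result, and top_result names are made up, and this is not generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical enum type: proto3 requires the first value to be zero.
enum Corpus { CORPUS_UNSPECIFIED = 0, CORPUS_WEB = 1, CORPUS_NEWS = 2 };

// Hypothetical message type used as a composite field.
struct Result {
  std::string url;
  std::string title;
};

// Plain-struct picture of a message such as:
//   message SearchRequest {
//     string query = 1; int32 page_number = 2; int32 results_per_page = 3;
//     Corpus corpus = 4; Result top_result = 5;
//   }
struct SearchRequestView {
  std::string query;                   // string -> std::string
  int32_t page_number = 0;             // int32  -> int32_t
  int32_t results_per_page = 0;
  Corpus corpus = CORPUS_UNSPECIFIED;  // enum field defaults to its zero value
  Result top_result;                   // message-typed (composite) field
};

static_assert(std::is_same<decltype(SearchRequestView::page_number), int32_t>::value,
              "int32 fields map to int32_t in C++");
```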


Origin blog.csdn.net/qq_44632658/article/details/130978314