ProtoBuf Official Documentation (Part 1) - Developer Guide


I translate the better foreign-language articles and materials I come across while looking things up, first as a technical reference for later, and second as English practice.

This article is translated from the Developer Guide section of the official Protocol Buffers documentation.

This is a free translation that conveys the meaning rather than a word-for-word rendering.
The translated content follows.

Developer's Guide

Welcome to the developer documentation for protocol buffers, a language-neutral, platform-neutral, extensible way of serializing structured data for use in communication protocols, data storage, and more.

This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications. This overview introduces protocol buffers and tells you what you need to do to get started; you can then go on to follow the tutorial for your language or study the protocol buffer encoding rules in more depth. API reference documentation is provided for all three languages, along with language and style guides for writing .proto files.

What are protocol buffers?

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data - think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then use specially generated source code to write and read that structured data to and from a variety of data streams, in a variety of languages. You can even update your data structure without breaking deployed programs that were compiled against the old format.

How does it work?

You specify how you want the information you are serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here is a very basic example of a .proto file that defines a message containing information about a person:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

As you can see, the message format is simple - each message type has one or more uniquely numbered fields, and each field has a name and a value type. Value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even (as in the example above) other protocol buffer message types, which lets you structure your data hierarchically. You can specify optional fields, required fields, and repeated fields. You can find more information about writing .proto files in the Protocol Buffer Language Guide.

Translator's Note:
proto3 drops required fields, and optional fields do not need to be marked explicitly (fields are optional by default).

Once you have defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (such as name() and set_name()) as well as methods to serialize the whole structure to raw bytes and to parse it back from raw bytes. For instance, if your chosen language is C++, running the compiler on the example above generates a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person messages. You might then write some code like this:

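Person person;
person.set_name("John Doe");
person.set_id(1234);  // id is a required field, so it must be set
person.set_email("[email protected]");
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);  // write the binary encoding to the file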

Later on, you can read your message back in:

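fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);  // parse the binary encoding from the file
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;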

You can add new fields to your message formats without breaking backward compatibility; old binaries simply ignore the new field when parsing. So if you have a communication protocol that uses protocol buffers as its data format, you can extend the protocol without worrying about breaking existing code.
You will find a complete reference for using generated protocol buffer code in the API Reference section, and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.
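
To make the "old binaries ignore new fields" behaviour concrete, here is a minimal hand-written sketch of a parser that only knows the name (field 1) and email (field 3) fields of Person and skips everything else. It is not the code that the protocol buffer compiler generates; OldPerson, ReadVarint, and ParseOldPerson are hypothetical names made up for this illustration, and it only handles the varint, 64-bit, length-delimited, and 32-bit wire types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for what an "old" program knows about Person.
struct OldPerson {
  std::string name;   // field 1
  std::string email;  // field 3
};

// Reads a base-128 varint starting at *pos and advances *pos past it.
// Returns false if the input ends before the varint does.
bool ReadVarint(const std::string& in, std::size_t* pos, std::uint64_t* value) {
  *value = 0;
  for (int shift = 0; shift < 64 && *pos < in.size(); shift += 7) {
    const std::uint8_t byte = static_cast<std::uint8_t>(in[(*pos)++]);
    *value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;  // high bit clear: last byte
  }
  return false;
}

// Parses the fields this program knows and skips any field it does not.
bool ParseOldPerson(const std::string& in, OldPerson* person) {
  assert(person != nullptr);
  std::size_t pos = 0;
  while (pos < in.size()) {
    std::uint64_t tag = 0;
    if (!ReadVarint(in, &pos, &tag)) return false;
    const std::uint64_t field_number = tag >> 3;  // upper bits of the tag
    const int wire_type = static_cast<int>(tag & 7);  // lower three bits
    if (wire_type == 0) {  // varint: read the value and discard it
      std::uint64_t ignored = 0;
      if (!ReadVarint(in, &pos, &ignored)) return false;
    } else if (wire_type == 1) {  // fixed 64-bit value: skip 8 bytes
      if (in.size() - pos < 8) return false;
      pos += 8;
    } else if (wire_type == 2) {  // length-delimited: strings, bytes, messages
      std::uint64_t length = 0;
      if (!ReadVarint(in, &pos, &length)) return false;
      if (length > in.size() - pos) return false;
      const std::string payload = in.substr(pos, static_cast<std::size_t>(length));
      pos += static_cast<std::size_t>(length);
      if (field_number == 1) person->name = payload;
      else if (field_number == 3) person->email = payload;
      // Any other field number is unknown here, so its payload is dropped.
    } else if (wire_type == 5) {  // fixed 32-bit value: skip 4 bytes
      if (in.size() - pos < 4) return false;
      pos += 4;
    } else {
      return false;  // wire types this sketch does not handle
    }
  }
  return true;
}
```

Because every field on the wire carries its own number and wire type, the parser can tell how many bytes to skip even for a field it has never heard of, which is what lets a newer sender add fields without breaking this older reader.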

Why not use XML?

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

For example, suppose you want to model a person with a name and an e-mail address. In XML, you need:

  <person>
    <name>John Doe</name>
    <email>[email protected]</email>
  </person>

The corresponding protocol buffer message (in protocol buffer text format) is:

# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
  name: "John Doe"
  email: "[email protected]"
}

When this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes if you remove the whitespace, and would take around 5,000-10,000 nanoseconds to parse.
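
The 28-byte figure can be checked by hand. The following is a minimal sketch, not the generated serialization code: it encodes only the name (field 1) and email (field 3) strings, each as a one-byte tag, a one-byte length, and the raw characters. AppendVarint, AppendStringField, and EncodePerson are hypothetical helper names, and the arithmetic assumes a 16-character address such as jdoe@example.com, because the address shown in the examples here is obscured.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends value as a base-128 varint: 7 payload bits per byte, lowest
// group first, with the high bit set on every byte except the last.
void AppendVarint(std::uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Appends one length-delimited (wire type 2) string field.
void AppendStringField(std::uint32_t field_number, const std::string& text,
                       std::string* out) {
  assert(field_number > 0);  // field numbers start at 1
  // Tag = field number shifted left by 3, OR-ed with the wire type.
  AppendVarint((static_cast<std::uint64_t>(field_number) << 3) | 2, out);
  AppendVarint(text.size(), out);  // payload length in bytes
  out->append(text);               // payload itself
}

// Hand-encodes a Person with only name (field 1) and email (field 3) set,
// mirroring the text-format example, which also leaves id unset.
std::string EncodePerson(const std::string& name, const std::string& email) {
  std::string bytes;
  AppendStringField(1, name, &bytes);
  AppendStringField(3, email, &bytes);
  return bytes;
}
```

Under that assumption, EncodePerson("John Doe", "jdoe@example.com") produces 2 + 8 bytes for the name and 2 + 16 bytes for the address, 28 bytes in total. The field names never appear on the wire, only their numbers, which is a large part of why the binary form is so much smaller than the XML.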
Manipulating a protocol buffer is also much easier than manipulating XML:

cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

With XML, you would have to do something like this instead:

cout << "Name: "
     << person.getElementsByTagName("name")->item(0)->innerText()
     << endl;
cout << "E-mail: "
     << person.getElementsByTagName("email")->item(0)->innerText()
     << endl;

However, protocol buffers are not always a better solution than XML. For instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), because you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also, to some extent, self-describing, whereas a protocol buffer is only meaningful if you have the message definition (the .proto file).

Introducing proto3

Our most recent version 3 release introduces a new language version - Protocol Buffers language version 3 (also known as proto3) - and adds some new features to the existing language version (also known as proto2). Proto3 simplifies the protocol buffer language, both for ease of use and to make it available in a wider range of programming languages: this release lets you generate protocol buffer code in Java, C++, Python, Java Lite, Ruby, JavaScript, Objective-C, and C#. In addition, you can generate proto3 code for Go using the latest Go protoc plugin, available from the golang/protobuf GitHub repository. More languages are in the pipeline.

Note that the APIs of the two language versions are not fully compatible. To avoid inconveniencing existing users, we will continue to support the previous language version in new protocol buffers releases.

You can see the major differences from the current default version in the release notes, and learn about proto3 syntax in the Proto3 Language Guide. Full documentation for proto3 is coming soon!

(If the names proto2 and proto3 seem a little confusing, it is because when we originally open-sourced protocol buffers it was actually Google's second version of the language, also known as proto2. This is also why our open source version numbers started from v2.0.0.)

A little bit of history

Protocol buffers were initially developed at Google to deal with an index server request/response protocol. Before protocol buffers, there was a format for requests and responses that relied on hand-written marshalling/unmarshalling and supported a number of versions of the protocol. This resulted in some very ugly code, for example:

 if (version == 3) {
   ...
 } else if (version > 4) {
   if (version == 5) { ... }
   ...
 }

Explicitly formatted protocols also complicated the rollout of new protocol versions, because developers had to make sure that every server between the originator of a request and the server actually handling it understood the new protocol before they could switch over to it.

Protocol buffers were designed to solve many of these problems:

  • New fields could be introduced easily, and intermediate servers that did not need to inspect the data could simply parse it and pass it along without knowing about all of the fields.
  • Formats were more self-describing and could be handled from a variety of languages (C++, Java, etc.).

However, users still needed to hand-write their own parsing code.

As the system evolved, it acquired a number of other features and uses:

  • Automatically generated serialization and deserialization code removed the need for hand parsing.
  • In addition to being used for short-lived RPC (Remote Procedure Call) requests, protocol buffers started to be used as a handy self-describing format for storing data persistently (for example, in Bigtable).
  • Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.

Protocol buffers are now Google's common language for data - at the time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They are used both in RPC systems and for persistent storage of data in a variety of storage systems.
