protobuf idl

Protobuf interface definitions are stored in .proto files. The data types available in a .proto file fall into two categories: composite types and scalar types. Composite types are enumerations and messages; scalar types include integers, floating-point numbers, strings, and so on.

Define a message type (message)

The most commonly used data format is the message. Suppose you want to define a "search request" message format in which each request contains a query string, the page of results you are interested in, and the number of results per page. The message type can be defined in a .proto file as follows:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}

The SearchRequest message format has three fields, and the data carried in the message corresponds to each field. Each of these fields has a name and a type.

· Specify the field type

In the above example, all fields are scalar types: two integer types (page_number and result_per_page), and one string type (query). Of course, you can also specify other composite types for fields, including enumerations or other message types.

· Assign identification numbers

As the message definition above shows, each field has a unique numeric identifier. These identifiers are used to identify fields in the binary format of the message and must not be changed once the message type is in use. Note: identification numbers in the range [1, 15] take one byte to encode, including the number itself and the field's type. Identification numbers in the range [16, 2047] take two bytes. Identification numbers in [1, 15] should therefore be reserved for message elements that occur very frequently. IMPORTANT: leave some of these numbers free for frequently occurring elements that may be added in the future.

The smallest identification number is 1 and the largest is 2^29 - 1, or 536,870,911. The numbers 19000 through 19999 cannot be used because they are reserved for the Protocol Buffers implementation. If you use one of these reserved numbers in a .proto file, the protocol buffer compiler will report an error.
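To see why identification numbers 1 through 15 are cheaper, the following Python sketch computes the size of a field's key on the wire. It assumes the standard protobuf encoding, in which the key is the varint of (field_number << 3) | wire_type. The helper names varint_size and tag_size are illustrative only and are not part of any protobuf API.

```python
def varint_size(value):
    """Number of bytes needed to store a non-negative integer as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def tag_size(field_number, wire_type=0):
    """Size in bytes of a field key: the varint of (field_number << 3) | wire_type."""
    if not 1 <= field_number <= (1 << 29) - 1:
        raise ValueError("field number out of range")
    return varint_size((field_number << 3) | wire_type)


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
```

The three low bits of the key carry the wire type, which leaves only four bits of a single byte for the field number. That is where the boundary at 15 comes from.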

· Specify field rules

Each message field must be given one of the following modifiers:

o required: a well-formed message must have exactly one of this field, so the value must always be set.

o optional: a well-formed message can have zero or one of this field (but not more than one).

o repeated: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values is preserved. This is comparable to a List in Java.

For historical reasons, repeated fields of scalar numeric types are not encoded as efficiently as they could be. New code should use the special option [packed=true] to get a more efficient encoding. For example:

repeated int32 samples = 4 [packed=true];
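The difference between the two encodings can be sketched in Python. The code below is a hand-written illustration of the wire format for non-negative values only, not the protobuf library itself; the function names are made up for this example. Without packing, every element carries its own key; with packing, the field is written once as a single length-delimited record.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_unpacked(field_number, values):
    """One key (wire type 0) per element, each followed by its varint value."""
    key = encode_varint((field_number << 3) | 0)
    return b"".join(key + encode_varint(v) for v in values)


def encode_packed(field_number, values):
    """A single key (wire type 2), a byte length, then all varints back to back."""
    payload = b"".join(encode_varint(v) for v in values)
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


samples = [3, 270, 86942]
print(len(encode_unpacked(4, samples)), encode_unpacked(4, samples).hex())
print(len(encode_packed(4, samples)), encode_packed(4, samples).hex())
```

The saving grows with the number of elements, because the packed form pays for the key and length once instead of once per element.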

Required is forever: take special care when marking a field as required. If at some point you want to stop writing or sending a required field, changing it to optional is problematic: readers built against the old definition will consider messages without the field incomplete and may reject or drop them unintentionally. Consider writing application-specific validation routines for your messages instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. This view is not universal, however.

· Add more message types

Multiple message types can be defined in a single .proto file. This is useful when defining several related messages. For example, to define the reply message format, SearchResponse, that corresponds to SearchRequest, you can add it to the same .proto file:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}

message SearchResponse {
 …
}

· Add comments

To add comments to .proto files, use the C/C++/Java-style double-slash (//) syntax, for example:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;  // Page number to return
  optional int32 result_per_page = 3;  // Number of results to return per page
}

· What is generated from the .proto file?

When you run the protocol buffer compiler on a .proto file, it generates code in your chosen language for working with the message types described in the file, including getting and setting field values, serializing messages to an output stream, and parsing messages from an input stream.

For C++, the compiler generates a .h file and a .cc file for each .proto file, with a class for each message type described in the file.
For Java, the compiler generates a .java file with a class for each message type, as well as special Builder classes for creating message class instances.
For Python, it is a bit different: the Python compiler generates a module with a static descriptor of each message type in the .proto file, which is then used with a metaclass at runtime to create the necessary Python data access classes.

Scalar value types

A scalar message field can have one of the following types. The table shows the type specified in the .proto file and the corresponding type in the automatically generated class:

.proto type	Java type	C++ type	Notes
double	double	double
float	float	float
int32	int	int32	Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead.
int64	long	int64	Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead.
uint32	int[1]	uint32	Uses variable-length encoding.
uint64	long[1]	uint64	Uses variable-length encoding.
sint32	int	int32	Uses variable-length encoding. Signed integer value. Encodes negative numbers more efficiently than a regular int32.
sint64	long	int64	Uses variable-length encoding. Signed integer value. Encodes negative numbers more efficiently than a regular int64.
fixed32	int[1]	uint32	Always 4 bytes. More efficient than uint32 if values are often greater than 2^28.
fixed64	long[1]	uint64	Always 8 bytes. More efficient than uint64 if values are often greater than 2^56.
sfixed32	int	int32	Always 4 bytes.
sfixed64	long	int64	Always 8 bytes.
bool	boolean	bool
string	String	string	A string must always contain UTF-8 encoded or 7-bit ASCII text.
bytes	ByteString	string	May contain any arbitrary sequence of bytes.

[1] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit simply stored in the sign bit.
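The notes on int32, sint32 and fixed32 can be checked with a small Python sketch of the wire encodings. It is a hand-written illustration that assumes the standard varint and ZigZag schemes, not code taken from the protobuf library, and all function names are invented for this example.

```python
import struct


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value):
    """int32: a negative value is sign-extended to 64 bits before varint encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value):
    """sint32: ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... before varint encoding."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return encode_varint(zigzag)


def encode_fixed32(value):
    """fixed32: always four little-endian bytes."""
    return struct.pack("<I", value)


print(len(encode_int32(-1)), len(encode_sint32(-1)))
print(len(encode_varint(1 << 28)), len(encode_fixed32(1 << 28)))
```

A negative int32 always costs ten bytes, while small negative sint32 values cost one. From 2^28 upward a varint needs five bytes, which is why fixed32 becomes the smaller choice for large values.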

Optional fields and default values

As mentioned above, an element in the message description can be marked as "optional". A well-formed message can contain zero or one optional element. When parsing a message, if it does not contain an optional element value, the corresponding field in the parsed object is set to the default value. Default values can be specified in the message description file. For example, to specify a default value of 10 for the result_per_page field of a SearchRequest message, define the message format as follows:

optional int32 result_per_page = 3 [default = 10];

If no default value is specified for an optional element, the type-specific default value is used: for string, the default value is the empty string. For bool, the default value is false. For numeric types, the default value is 0. For enumerations, the default value is the first value in the enumeration type definition.
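The default-value rules above can be summarized in a short Python sketch. It is an illustrative model only: the function default_for, the _UNSET sentinel and the type tables are assumptions made for this example and do not correspond to any protobuf API.

```python
_UNSET = object()  # sentinel meaning "no [default = ...] option was given"

NUMERIC_TYPES = {
    "double", "float", "int32", "int64", "uint32", "uint64",
    "sint32", "sint64", "fixed32", "fixed64", "sfixed32", "sfixed64",
}
OTHER_DEFAULTS = {"string": "", "bool": False}


def default_for(field_type, explicit=_UNSET, enum_values=None):
    """Return the value a parser would report for an absent optional field."""
    if explicit is not _UNSET:
        return explicit            # [default = ...] in the .proto file wins
    if enum_values is not None:
        return enum_values[0]      # first value listed in the enum definition
    if field_type in NUMERIC_TYPES:
        return 0
    return OTHER_DEFAULTS[field_type]


print(default_for("int32", explicit=10))
print(default_for("int32"))
print(default_for("Corpus", enum_values=["UNIVERSAL", "WEB", "IMAGES"]))
```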

Enumerations

When defining a message type, you may want one of its fields to take only one of a predefined list of values. For example, suppose you want to add a corpus field to each SearchRequest, where the corpus can be UNIVERSAL, WEB, IMAGES, LOCAL, NEWS, PRODUCTS or VIDEO. You can do this by adding an enumeration (enum) to the message definition. A field with an enum type can only take one of the specified set of constants as its value (if you try to provide a different value, the parser treats it like an unknown field). In the following example, an enum called Corpus containing all the possible values, and a field of type Corpus, are added to the message:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3 [default = 10];
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  optional Corpus corpus = 4 [default = UNIVERSAL];
}

Enum constants must be in the range of a 32-bit integer. Because enum values use varint encoding on the wire, negative values are inefficient and therefore not recommended. Enums can be defined inside a message definition, as in the example above, or outside it; an enum defined outside can be reused in any message definition in the .proto file. You can also use an enum type declared in one message as the type of a field in a different message, using the syntax MessageType.EnumType.

When you run the protocol buffer compiler on a .proto file that uses an enum, the generated code has a corresponding enum for Java or C++, or a special EnumDescriptor class for Python that is used to create a set of symbolic constants with integer values in the runtime-generated class.
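The idea of symbolic constants with integer values can be illustrated with Python's standard enum module. This is a hand-written stand-in for the Corpus enum, not the code that protoc generates. The helper corpus_or_none is hypothetical and only loosely mirrors the rule that a number outside the declared set is not accepted as a Corpus value.

```python
from enum import IntEnum


class Corpus(IntEnum):
    """Hand-written mirror of the Corpus enum from the .proto example."""
    UNIVERSAL = 0
    WEB = 1
    IMAGES = 2
    LOCAL = 3
    NEWS = 4
    PRODUCTS = 5
    VIDEO = 6


def corpus_or_none(number):
    """Return the matching Corpus constant, or None for an undeclared number."""
    try:
        return Corpus(number)
    except ValueError:
        return None


print(Corpus.WEB, int(Corpus.WEB))
print(corpus_or_none(4))
print(corpus_or_none(42))
```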

Use other message types

You can use other message types as field types. For example, assuming that each SearchResponse message contains a Result message, you can define a Result message type in the same .proto file, and then specify a Result type field in the SearchResponse message, such as:

message SearchResponse {
  repeated Result result = 1;
}

message Result {
  required string url = 1;
  optional string title = 2;
  repeated string snippets = 3;
}

· Import definitions

In the above example, the Result message type and SearchResponse are defined in the same file. What if the message type you want to use is already defined in another .proto file?

You can use them by importing definitions in other .proto files. To import definitions from other .proto files, you need to add an import declaration to your file, like:

import "myproject/other_protos.proto";

The protocol compiler searches for imported files in a set of directories specified on its command line with the -I/--proto_path flag. If no flag is given, it looks in the directory in which the compiler was invoked.

Nested types

You can define and use message types in other message types. In the following example, the Result message is defined in the SearchResponse message, such as:

message SearchResponse {
  message Result {
    required string url = 1;
    optional string title = 2;
    repeated string snippets = 3;
  }
  repeated Result result = 1;
}

If you want to reuse this message type outside of its parent message type, you need to use it as Parent.Type like:

message SomeOtherMessage {
  optional SearchResponse.Result result = 1;
}

Of course, you can also nest messages to any level, such as:

message Outer {       // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      required int64 ival = 1;
      optional bool  booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      required int32 ival = 1;
      optional bool  booly = 2;
    }
  }
}

Update a message type

If an existing message type no longer meets your needs (for example, you would like the message to have an extra field) but you still want to use code created with the old format, don't worry! It is very simple to update message types without breaking existing code. Just keep the following rules in mind.

· Do not change the numeric identifiers of any existing fields.
· Any new fields you add must be optional or repeated. This means that messages serialized by code using the old message format can be parsed by the new generated code, since they will not be missing any required elements. Set sensible default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by new code can be parsed by old code: old binaries simply ignore the new fields when parsing. The unknown fields are not discarded, however; if the message is later serialized, the unknown fields are serialized along with it, so if the message is passed on to new code, the new fields are still available. Note: preservation of unknown fields is currently not available for Python.
· Non-required fields can be removed, as long as the identification number is not used again in the updated message type. It may be better to rename the field instead, for example by adding the prefix "OBSOLETE_", so that future users of the .proto file cannot accidentally reuse the number.
· A non-required field can be converted to an extension and vice versa, as long as the type and identification number stay the same.
· int32, uint32, int64, uint64, and bool are all compatible: a field can be changed from one of these types to another without breaking forward or backward compatibility. If a number parsed from the wire does not fit in the corresponding type, the effect is the same as casting the number to that type in C++ (for example, a 64-bit number read as an int32 is truncated to 32 bits).
· sint32 and sint64 are compatible with each other, but they are not compatible with the other integer types.
· string and bytes are compatible as long as the bytes are valid UTF-8.
· Embedded messages are compatible with bytes if the bytes contain an encoded version of the message.
· fixed32 is compatible with sfixed32, and fixed64 with sfixed64.
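The rule that old code ignores but retains unknown fields can be sketched with a minimal wire-format reader in Python. This is an illustration under stated assumptions, not the protobuf runtime: it handles only wire types 0, 1, 2 and 5, the function names are invented, and field 5 stands for a hypothetical string field added by a newer version of SearchRequest.

```python
def decode_varint(data, pos):
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def split_fields(data):
    """Yield (field_number, raw_bytes_including_key) for each top-level field."""
    pos = 0
    while pos < len(data):
        start = pos
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:        # varint
            _, pos = decode_varint(data, pos)
        elif wire_type == 1:      # 64-bit
            pos += 8
        elif wire_type == 2:      # length-delimited
            length, pos = decode_varint(data, pos)
            pos += length
        elif wire_type == 5:      # 32-bit
            pos += 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        yield field_number, data[start:pos]


def read_with_old_schema(data, known_numbers):
    """Separate fields the old code understands from retained unknown bytes."""
    known, unknown = {}, b""
    for field_number, raw in split_fields(data):
        if field_number in known_numbers:
            known[field_number] = raw
        else:
            unknown += raw
    return known, unknown


# query = "hi" (field 1), page_number = 1 (field 2), plus the newer field 5 = "x".
new_message = bytes.fromhex("0a026869" "1001" "2a0178")
known, unknown = read_with_old_schema(new_message, known_numbers={1, 2, 3})
reserialized = b"".join(known[n] for n in sorted(known)) + unknown
print(sorted(known), unknown.hex(), reserialized == new_message)
```

Because the key of every field carries its wire type, the old reader can skip a field it has never heard of without knowing its declared type, and can write the same bytes back out unchanged.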

Extensions

Extensions let you declare that a range of field identification numbers in a message is available for third-party extensions. Other people can then declare new fields for your message type in their own .proto files without having to edit the original file. For example:

message Foo {
  // …
  extensions 100 to 199;
}

This example shows that in message Foo, field identification numbers in the range [100,199] are reserved for extension. Now, others can add new fields to Foo in their own .proto files, but add field IDs within the specified range - for example:

extend Foo {
  optional int32 bar = 126;
}

This means that message Foo now has an optional int32 field called bar.

When a user's Foo message is encoded, the wire format is exactly the same as if the user had defined the new field inside Foo.

However, the way you access extension fields in application code is slightly different from accessing regular fields: the generated data access code has special accessors for working with extensions. For example, here is how you set the value of bar in C++:


Foo foo;
foo.SetExtension(bar, 15);

Similarly, the Foo class also defines the template functions HasExtension(), ClearExtension(), GetExtension(), MutableExtension(), and AddExtension(). The semantics of these functions are consistent with the corresponding ordinary field access functions. For more information on using extensions, please refer to the code generation guide for that language. Note: Extensions can be of any field type, including message types.

Nested extensions

An extension can be declared in the scope of another type, like:

message Baz {
  extend Foo {
    optional int32 bar = 126;
  }
  …
}

In this example, the C++ code to access this extension is as follows:

Foo foo;
foo.SetExtension(Baz::bar, 15);

A common design pattern is to define extensions inside the scope of the extension's field type. For example, here is an extension to Foo of type Baz, where the extension is defined as part of Baz:

message Baz {
  extend Foo {
    optional Baz foo_ext = 127;
  }
  …
}

However, there is no mandatory requirement for extensions to a message type to be defined in that message. It is also possible to do this:

message Baz {
  …
}

extend Foo {
  optional Baz foo_baz_ext = 127;
}

In fact, this syntax may be preferable because it avoids confusion: the nested syntax is often mistaken for subclassing by users who are not yet familiar with extensions.

· Choose extension numbers

It is very important to make sure that two users do not add extensions to the same message type using the same identification number, otherwise data inconsistency can result. You may want to define an extension numbering convention for your project to prevent this.

If your numbering convention needs a large number of identification numbers, you can specify that the extension range goes up to the maximum possible number with the max keyword, where max is 2^29 - 1, or 536,870,911:

message Foo {
  extensions 1000 to max;
}

As with identification numbers in general, your numbering convention should avoid the numbers 19000 through 19999, which are reserved for the Protocol Buffers implementation.
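A numbering convention like this can be checked mechanically. The Python sketch below validates a proposed extension number against a declared extension range and the reserved block. The function name and its parameters are assumptions made for illustration; protoc performs its own checks and this is not a substitute for them.

```python
MAX_FIELD_NUMBER = (1 << 29) - 1          # 536,870,911, the value of "max"
RESERVED_NUMBERS = range(19000, 20000)    # reserved for the implementation


def is_valid_extension_number(number, range_start, range_end=MAX_FIELD_NUMBER):
    """True if number lies in the declared extension range and is not reserved."""
    if not 1 <= number <= MAX_FIELD_NUMBER:
        return False
    if number in RESERVED_NUMBERS:
        return False
    return range_start <= number <= range_end


print(is_valid_extension_number(126, 100, 199))    # inside "extensions 100 to 199"
print(is_valid_extension_number(200, 100, 199))    # outside the declared range
print(is_valid_extension_number(19500, 1000))      # inside "1000 to max" but reserved
```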

Packages

You can add an optional package specifier to a .proto file to prevent name clashes between message types. For example:

package foo.bar;

message Open { ... }

In other message format definitions, the package name + message name can be used to define the type of the field, such as:

message Foo {
  ...
  required foo.bar.Open open = 1;
  ...
}

How a package specifier affects the generated code depends on the chosen language. For C++, the generated classes are wrapped in a C++ namespace; Open in the example above would be in the namespace foo::bar. For Java, the package is used as the Java package, unless an explicit java_package option is provided in the .proto file. For Python, the package specifier is ignored, since Python modules are organized according to their location in the file system.

· Packages and name resolution

Type name resolution in the protocol buffer language works like C++: the innermost scope is searched first, then the next innermost, and so on, with each package considered to be "inner" to its parent package. A leading "." (for example, .foo.bar.Baz) means to start from the outermost scope instead. The protocol buffer compiler resolves all type names by parsing the imported .proto files. The code generator for each language knows how to refer to each type in that language, even if it has different scoping rules.
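The innermost-first lookup can be sketched in a few lines of Python. This is a simplified model, not the compiler's actual algorithm: it tries the whole name at each enclosing scope and ignores finer points of the real resolver. The function resolve and the set of defined names are assumptions made for this example.

```python
def resolve(name, scope, defined):
    """Resolve a type name used inside scope against a set of fully qualified names.

    A leading "." starts at the outermost scope; otherwise scopes are tried
    from the innermost outward. Returns the full name, or None if not found.
    """
    if name.startswith("."):
        candidate = name[1:]
        return candidate if candidate in defined else None
    parts = scope.split(".") if scope else []
    while True:
        candidate = ".".join(parts + [name])
        if candidate in defined:
            return candidate
        if not parts:
            return None
        parts.pop()  # step out to the enclosing scope


defined = {"foo.Open", "foo.bar.Open", "foo.bar.Outer", "foo.bar.Outer.Inner"}
print(resolve("Open", "foo.bar.Outer", defined))
print(resolve(".foo.Open", "foo.bar.Outer", defined))
print(resolve("Inner", "foo.bar.Outer", defined))
```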

Define services (Service)

If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file, and the protocol buffer compiler will generate service interface code and stubs in your chosen language. For example, to define an RPC service with a method that takes a SearchRequest and returns a SearchResponse, you can write the following in your .proto file:

service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

The protocol compiler will generate an abstract interface SearchService and a corresponding stub implementation. The stub directs all calls to RpcChannel, which is an abstract interface that must be implemented in the RPC system. For example, RpcChannel can be implemented to serialize messages and send them over HTTP to a server. In other words, the generated stub provides a type-safe interface for making protocolbuffer-based RPC calls, rather than restricting you to a specific RPC implementation. The code in C++ looks like this:

namespace protobuf = google::protobuf;

protobuf::RpcChannel* channel;
protobuf::RpcController* controller;
SearchService* service;
SearchRequest request;
SearchResponse response;

// Callback invoked when the RPC completes; defined below.
void Done();

void DoSearch() {
  // You provide classes MyRpcChannel and MyRpcController, which implement
  // the abstract interfaces protobuf::RpcChannel and protobuf::RpcController.
  channel = new MyRpcChannel("somehost.example.com:1234");
  controller = new MyRpcController;

  // The protocol compiler generates the SearchService class based on the
  // definition given above.
  service = new SearchService::Stub(channel);

  // Set up the request.
  request.set_query("protocol buffers");

  // Execute the RPC. The generated stub takes pointers to the request and
  // response messages.
  service->Search(controller, &request, &response, protobuf::NewCallback(&Done));
}

void Done() {
  delete service;
  delete channel;
  delete controller;
}

All service classes must implement the Service interface, which provides a way to call a specific method without knowing the method name and its input and output types at compile time. On the server side, it can be used to implement an RPC Server through service registration.

namespace protobuf = google::protobuf;

class ExampleSearchService : public SearchService {
 public:
  void Search(protobuf::RpcController* controller,
              const SearchRequest* request,
              SearchResponse* response,
              protobuf::Closure* done) {
    if (request->query() == "google") {
      response->add_result()->set_url("http://www.google.com");
    } else if (request->query() == "protocol buffers") {
      response->add_result()->set_url("http://protobuf.googlecode.com");
    }
    done->Run();
  }
};

int main() {
  // You provide class MyRpcServer.  It does not have to implement any
  // particular interface; this is just an example.
  MyRpcServer server;

  protobuf::Service* service = new ExampleSearchService;
  server.ExportOnPort(1234, service);
  server.Run();

  delete service;
  return 0;
}
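The division of labour between stub, channel and service can be modelled in plain Python. Everything in this sketch is hypothetical: the class names Channel, LoopbackChannel and SearchServiceStub are invented, plain dictionaries stand in for the generated message classes, and the "transport" simply calls the service in the same process. It only shows the shape of the design, not the API that protoc generates.

```python
from abc import ABC, abstractmethod


class Channel(ABC):
    """Hypothetical transport interface that an RPC system would implement."""

    @abstractmethod
    def call_method(self, method_name, request):
        """Deliver request to the named method and return its response."""


class ExampleSearchService:
    """Server-side implementation, looked up by method name at run time."""

    def Search(self, request):
        results = []
        if request["query"] == "google":
            results.append({"url": "http://www.google.com"})
        return {"result": results}


class LoopbackChannel(Channel):
    """Toy channel that dispatches to a registered service without a network."""

    def __init__(self, service):
        self._service = service

    def call_method(self, method_name, request):
        handler = getattr(self._service, method_name)  # dispatch by name
        return handler(request)


class SearchServiceStub:
    """Client-side stub: a typed facade that forwards every call to the channel."""

    def __init__(self, channel):
        self._channel = channel

    def search(self, request):
        return self._channel.call_method("Search", request)


stub = SearchServiceStub(LoopbackChannel(ExampleSearchService()))
print(stub.search({"query": "google"}))
```

Swapping LoopbackChannel for a channel that serializes the request and sends it over a socket would not change the stub or the service, which is the point of keeping the channel abstract.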

Generate access classes

You can generate Java, Python, and C++ code through the defined .proto file. You need to run the protocol buffer compiler protoc based on the .proto file. The command to run looks like this:

protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR path/to/file.proto

· IMPORT_PATH specifies a directory in which to look for .proto files when resolving import directives. If omitted, the current directory is used. Multiple import directories can be specified by passing the --proto_path option multiple times; they are searched in order. -I=IMPORT_PATH is the short form of --proto_path.

· Of course you can also provide one or more output paths:

o --cpp_out Generate C++ code in the target directory DST_DIR, see more at http://code.google.com/intl/en-US/apis/protocolbuffers/docs/reference/cpp-generated.html.

o --java_out Generate Java code in the target directory DST_DIR, see more at http://code.google.com/intl/en-US/apis/protocolbuffers/docs/reference/java-generated.html.

o --python_out Generate Python code in the target directory DST_DIR, see more at http://code.google.com/intl/en-US/apis/protocolbuffers/docs/reference/python-generated.html.

As an extra convenience, if DST_DIR ends in .zip or .jar, the compiler writes the output to a single ZIP-format archive with the given name. .jar outputs are also given a manifest file, as required by the Java JAR specification. Note: if the output archive already exists, it will be overwritten; the compiler is not smart enough to add files to an existing archive.

· You must provide one or more .proto files as input. Multiple .proto files can be specified at once. Although the files are named relative to the current directory, each file must reside in one of the IMPORT_PATHs so that the compiler can determine its canonical name.
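When the compiler is driven from a build script, the command line described above can be assembled programmatically. The Python helper below is a sketch: build_protoc_command is a made-up name, the paths in the usage lines are placeholders, and only the flags already shown in this section are used.

```python
def build_protoc_command(proto_files, import_paths=(), cpp_out=None,
                         java_out=None, python_out=None):
    """Return the protoc argument list for the given inputs and output directories."""
    if not proto_files:
        raise ValueError("at least one .proto file is required")
    command = ["protoc"]
    # Import directories are searched in the order they are given.
    command += ["--proto_path=" + path for path in import_paths]
    outputs = (("--cpp_out", cpp_out),
               ("--java_out", java_out),
               ("--python_out", python_out))
    for flag, dst_dir in outputs:
        if dst_dir is not None:
            command.append(flag + "=" + dst_dir)
    command += list(proto_files)
    return command


command = build_protoc_command(["protos/search.proto"],
                               import_paths=["protos"],
                               python_out="gen")
print(" ".join(command))
# To run it (requires protoc on PATH): subprocess.run(command, check=True)
```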
