Introduction to Google Protocol Buffers

Developer Guide

Welcome to the developer documentation for protocol buffers –
a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications.
This overview introduces protocol buffers and tells you what you need to do to get started
– you can then go on to follow the tutorials or delve deeper into protocol buffer encoding.
API reference documentation is also provided for all three languages, as well as language and style guides for writing .proto files.

Translator's note: .proto files are presumably language-independent definition files, similar to CORBA IDL files.

What are protocol buffers?

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data –
think XML, but smaller, faster, and simpler. You define how you want your data to be structured once,
then you can use special generated source code to easily write and read your structured data to and from a variety of data streams
and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

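Translator's notes: the "automated mechanism" probably refers to the generated encoding and decoding code, and the "variety of languages" means the three listed earlier (Java, C++, and Python).
Updating a data structure without breaking deployed programs reads like a form of hot deployment.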

How do they work?

You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files.
Each protocol buffer message is a small logical record of information, containing a series of name-value pairs.
Here's a very basic example of a .proto file that defines a message containing information about a person:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

As you can see, the message format is simple – each message type has one or more uniquely numbered fields,
and each field has a name and a value type, where value types can be numbers (integer or floating-point), booleans, strings, raw bytes,
or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically.
You can specify optional fields, required fields, and repeated fields.
You can find more information about writing .proto files in the Protocol Buffer Language Guide.
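
To make the three field rules concrete, here is a hand-written C++ analogy of the Person message above. It only sketches the shape of the data and is not what the protocol buffer compiler generates; the has_email and has_type flags, the plain structs, and the CheckDefaults helper are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Mirrors the PhoneType enum from the .proto example.
enum class PhoneType { MOBILE = 0, HOME = 1, WORK = 2 };

struct PhoneNumber {
  std::string number;                // required string number = 1
  bool has_type = false;             // optional: records whether type was set
  PhoneType type = PhoneType::HOME;  // [default = HOME] applies when unset
};

struct Person {
  std::string name;                  // required string name = 1
  std::int32_t id = 0;               // required int32 id = 2
  bool has_email = false;            // optional string email = 3
  std::string email;
  std::vector<PhoneNumber> phone;    // repeated PhoneNumber phone = 4
};

// Small self-check showing the state of freshly created values.
void CheckDefaults() {
  Person person;
  assert(!person.has_email);      // optional field starts out unset
  assert(person.phone.empty());   // repeated field starts out empty
  PhoneNumber number;
  assert(number.type == PhoneType::HOME);  // declared default
}
```

An optional field carries a "was it set" bit next to its value, while a repeated field behaves like a growable list, which is why it maps naturally onto std::vector in this sketch.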

Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes
(translator's note: much like compiling a CORBA IDL file).
These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes.
So, for instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person.
You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages.
You might then write some code like this:

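// Assumes the header generated from the .proto file is included,
// along with <fstream> and `using namespace std;`.
Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
// Write the message to disk in the binary wire format.
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);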

Then, later on, you could read your message back in:

fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

You can add new fields to your message formats without breaking backwards-compatibility;
old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format,
you can extend your protocol without having to worry about breaking existing code.
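
A minimal sketch of why old binaries can ignore new fields: on the wire, every field starts with a tag that carries its field number and wire type, and the wire type alone tells a parser how many bytes to skip. The parser below is hand-written for illustration and is not the generated code; OldPerson, ReadVarint, and ParseOldPerson are hypothetical names, and the group wire types are deliberately left unsupported.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Reads a base-128 varint starting at `pos` and advances `pos` past it.
// Returns false if the input ends before the varint does.
bool ReadVarint(const std::string& in, std::size_t& pos, std::uint64_t& value) {
  value = 0;
  for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
    const unsigned char byte = static_cast<unsigned char>(in[pos++]);
    value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;  // high bit clear: last byte
  }
  return false;
}

// What an "old" binary knows about: name = 1, id = 2, email = 3.
struct OldPerson {
  std::string name;
  std::int32_t id = 0;
  std::string email;
};

// Parses `in`, keeping the known fields and skipping everything else.
bool ParseOldPerson(const std::string& in, OldPerson& person) {
  std::size_t pos = 0;
  while (pos < in.size()) {
    std::uint64_t tag = 0;
    if (!ReadVarint(in, pos, tag)) return false;
    const std::uint64_t field_number = tag >> 3;
    const unsigned wire_type = static_cast<unsigned>(tag & 7);
    std::uint64_t n = 0;
    switch (wire_type) {
      case 0:  // varint: read it, keep it only if the field is known
        if (!ReadVarint(in, pos, n)) return false;
        if (field_number == 2) person.id = static_cast<std::int32_t>(n);
        break;
      case 1:  // fixed 64-bit: nothing known here, skip 8 bytes
        if (in.size() - pos < 8) return false;
        pos += 8;
        break;
      case 2: {  // length-delimited: length prefix, then that many bytes
        if (!ReadVarint(in, pos, n) || n > in.size() - pos) return false;
        const std::string payload = in.substr(pos, static_cast<std::size_t>(n));
        pos += static_cast<std::size_t>(n);
        if (field_number == 1) person.name = payload;
        else if (field_number == 3) person.email = payload;
        break;  // unknown field numbers fall through untouched
      }
      case 5:  // fixed 32-bit: skip 4 bytes
        if (in.size() - pos < 4) return false;
        pos += 4;
        break;
      default:  // group wire types (3 and 4) are not handled in this sketch
        return false;
    }
    assert(pos <= in.size());
  }
  return true;
}
```

Fed a message that also contains fields 5 and 6, which this old reader has never heard of, ParseOldPerson still recovers name, id, and email and simply steps over the rest.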

You'll find a complete reference for using generated protocol buffer code in the API Reference section,
and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.

Why not just use XML?

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

    * are simpler
    * are 3 to 10 times smaller
    * are 20 to 100 times faster
    * are less ambiguous
    * generate data access classes that are easier to use programmatically

Translator's note: generated access classes are not unique to protocol buffers; ASN.1 and CORBA have similar tools.

For example, let's say you want to model a person with a name and an email. In XML, you need to do:

<person>
    <name>John Doe</name>
    <email>jdoe@example.com</email>
</person>

while the corresponding protocol buffer message (in protocol buffer text format) is:

# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
  name: "John Doe"
  email: "jdoe@example.com"
}

When this message is encoded to the protocol buffer binary format (the text format above is just a convenient
human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse.
The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.
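
The 28-byte figure can be reproduced by hand. The sketch below assumes the e-mail value is the 16-character address jdoe@example.com and that only name (field 1) and email (field 3) are set, as in the text-format example; id is left out. It encodes each string as a tag byte, a length, and the raw bytes. The helper names are hypothetical, and this is not the code the protocol buffer compiler generates.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends `value` as a base-128 varint: 7 bits per byte, low group first,
// with the high bit set on every byte except the last.
void AppendVarint(std::string& out, std::uint64_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));
}

// Appends one length-delimited field (wire type 2): tag, byte length, bytes.
void AppendStringField(std::string& out, std::uint32_t field_number,
                       const std::string& text) {
  assert(field_number > 0);  // field numbers start at 1
  AppendVarint(out, (static_cast<std::uint64_t>(field_number) << 3) | 2);
  AppendVarint(out, text.size());
  out += text;
}

// Hand-rolled stand-in for serializing a Person with only name and email set.
std::string EncodePersonSketch(const std::string& name,
                               const std::string& email) {
  std::string out;
  AppendStringField(out, 1, name);   // name = 1
  AppendStringField(out, 3, email);  // email = 3
  return out;
}
```

With these inputs, "John Doe" costs 2 + 8 bytes and the address costs 2 + 16 bytes, which gives 28. The same data as XML with all whitespace removed is 69 characters, most of it element names repeated in opening and closing tags.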

Also, manipulating a protocol buffer is much easier:

cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

Whereas with XML you would have to do something like:

cout << "Name: "
       << person.getElementsByTagName("name")->item(0)->innerText()
       << endl;
cout << "E-mail: "
       << person.getElementsByTagName("email")->item(0)->innerText()
       << endl;

However, protocol buffers are not always a better solution than XML –
for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML),
since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers,
at least in their native format, are not. XML is also – to some extent –
self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

Translator's note: the meaning of "self-describing" here is unclear to me.

In addition, protocol buffers are already in wide use inside Google.
