Protocol Buffer Basics

Contents

Overview

How do protocol buffers work?

Defining the Protocol Format

Compiling Protocol Buffers

Protocol Buffers API

Standard Message Methods

Parsing and Serialization

Writing A Message

Reading A Message

Extending A Protocol Buffer


Overview

From the official site:

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

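In other words, protocol buffers are a mechanism developed by Google for serializing structured data. The mechanism is independent of programming language and platform, and it is extensible. It is similar to XML, but smaller, faster, and simpler. You define once how your data is structured, then use generated source code to write and read that structured data.

Why not use XML?

  • Protocol buffers are simpler.
  • Data serialized as XML is 3 to 10 times larger than the same data serialized with protocol buffers.
  • Protocol buffers parse 20 to 100 times faster than XML.
  • They are less ambiguous.
  • They generate data access classes, which makes development easier.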

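How do protocol buffers work?

By defining protocol buffer message types in a .proto file, you specify how your data is to be serialized. Each protocol buffer message is a small logical record of information containing a series of name-value pairs.

Defining the Protocol Format

The Python tutorial uses an address book as its example. Each person in the address book has a name, an ID, an email address, and a contact phone number. In this example we define addressbook.proto.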

syntax = "proto2";

package tutorial;                // The package prevents naming conflicts.

message Person {                 // Defines a message.
  required string name = 1;      // The number 1 is the field's unique tag.
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];    // The default value is HOME.
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}

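A message is an aggregate of typed fields. A field can use a simple data type such as bool, int32, float, double, or string, or it can use another message type as its type. You can even define a message type nested inside another message.

The unique tag identifies the field in the binary encoding. Tags 1-15 take one byte fewer to encode than tags 16 and above, so it is worth assigning tags 1-15 to frequently used or repeated fields.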
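The one-byte difference follows from how the wire format writes a key before each field: the tag is shifted left by three bits, combined with a wire type, and stored as a base-128 varint. The sketch below is a hand-written illustration of that rule, not code from the protobuf library; the helper names `encode_varint` and `field_key` are made up for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """The key written before each field: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


VARINT = 0  # wire type used by int32, bool, enum and similar scalar types

# Print the key size in bytes for a few tags.
for tag in (1, 15, 16, 2047, 2048):
    print(tag, len(field_key(tag, VARINT)))
```

Tags 1 and 15 produce a one-byte key, 16 and 2047 a two-byte key, and 2048 a three-byte key, because only four bits of the first byte are left for the tag once the wire type and the continuation bit are accounted for.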

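Every field must be annotated with one of three modifiers: required, optional, or repeated.

  • required: a value for the field must be provided; otherwise the message is considered uninitialized. Serializing an uninitialized message raises an exception.
  • optional: the field may or may not be set. If an optional field is not set, a default value is used. You can specify your own default; otherwise the system default applies: zero for numeric types, the empty string for strings, and false for bools.
  • repeated: the field may be repeated any number of times, including zero. The order of the repeated values is preserved in the protocol buffer.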

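Compiling Protocol Buffers

To be done.

For now, see the official tutorial. The compiler is invoked as protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/addressbook.proto, which generates addressbook_pb2.py in the destination directory.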

Protocol Buffers API

Unlike the C++ and Java compilers, the Python protocol buffer compiler does not generate data access code directly. Instead, it generates special descriptors for all your messages, enums, and fields, plus an empty class for each message type.

class Person(message.Message):
  __metaclass__ = reflection.GeneratedProtocolMessageType

  class PhoneNumber(message.Message):
    __metaclass__ = reflection.GeneratedProtocolMessageType
    DESCRIPTOR = _PERSON_PHONENUMBER
  DESCRIPTOR = _PERSON

class AddressBook(message.Message):
  __metaclass__ = reflection.GeneratedProtocolMessageType
  DESCRIPTOR = _ADDRESSBOOK

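The most important line in each class is __metaclass__ = reflection.GeneratedProtocolMessageType. You can think of a metaclass as a template for creating classes. At load time, the GeneratedProtocolMessageType metaclass uses the descriptors to create all the Python methods you need to work with each message type and adds them to the relevant classes.

The following code uses the Person message: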

import addressbook_pb2
person = addressbook_pb2.Person()
person.id = 1234
person.name = "John Doe"
person.email = "jdoe@example.com"
phone = person.phones.add()
phone.number = "555-4321"
phone.type = addressbook_pb2.Person.HOME

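Note that these assignments do not just add arbitrary new fields to a generic Python object. Assigning to a field that is not defined in the .proto file raises AttributeError, and assigning a value of the wrong type to a field raises TypeError.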
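To make the metaclass idea and the two errors above more concrete, here is a toy analogue in plain Python 3. It is only a sketch of the general technique and not how the protobuf library is implemented: `ToyMessageType` is a hypothetical name, and its `DESCRIPTOR` is assumed to be a simple dict of field names to Python types rather than a real descriptor object.

```python
class ToyMessageType(type):
    """Toy metaclass: builds one checked property per field in DESCRIPTOR."""

    def __new__(mcls, name, bases, namespace):
        fields = namespace.get("DESCRIPTOR", {})
        # Only declared fields get storage, so assigning to any other
        # attribute name raises AttributeError.
        namespace["__slots__"] = tuple("_" + f for f in fields)
        for field_name, field_type in fields.items():
            namespace[field_name] = mcls._make_property(field_name, field_type)
        return super().__new__(mcls, name, bases, namespace)

    @staticmethod
    def _make_property(field_name, field_type):
        slot = "_" + field_name

        def getter(self):
            # Unset fields fall back to the type's zero value (0, "").
            return getattr(self, slot, field_type())

        def setter(self, value):
            if not isinstance(value, field_type):
                raise TypeError(f"{field_name} expects {field_type.__name__}")
            setattr(self, slot, value)

        return property(getter, setter)


class Person(metaclass=ToyMessageType):
    # Hypothetical stand-in for the generated descriptor.
    DESCRIPTOR = {"name": str, "id": int}


person = Person()
person.id = 1234
print(person.id, repr(person.name))

for field, value in (("no_such_field", 1), ("id", "1234")):
    try:
        setattr(person, field, value)
    except (AttributeError, TypeError) as exc:
        print(type(exc).__name__)
```

The class body of `Person` contains nothing but a descriptor, yet instances end up with typed, validated fields. That mirrors the division of labour described above: the generated classes are empty, and the metaclass fills them in at load time.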

The metaclass expands enums into a set of symbolic constants with integer values. For example, the constant addressbook_pb2.Person.WORK has the value 2.

Standard Message Methods

Each message class also has a number of standard message methods that let you check or manipulate the entire message. The main ones are:

  • IsInitialized(): checks whether all required fields have been set.
  • __str__(): returns a human-readable representation of the message, which is useful for debugging.
  • CopyFrom(other_msg): overwrites the message with the given message's values.
  • Clear(): clears all the elements back to the empty state.

Parsing and Serialization

Every protocol buffer class has methods for writing and reading messages in the binary format.

  • SerializeToString(): serializes the message and returns it as a byte string (bytes in Python 3, str in Python 2). Note that the bytes are binary, not text; the string type is only used as a convenient container.
  • ParseFromString(data): parses a message from the given byte string.
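To see why the serialized result is binary rather than text, the sketch below encodes a small Person by hand, following the wire-format rules for varint and length-delimited fields. It is an illustration only and does not use the protobuf library; the helper names are made up for this example, and the integer helper handles only non-negative values.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2 (length-delimited): key, byte length, UTF-8 payload."""
    payload = text.encode("utf-8")
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0 (varint); this sketch only handles non-negative values."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


# Person with name = "John Doe" (tag 1) and id = 1234 (tag 2).
data = encode_string_field(1, "John Doe") + encode_int_field(2, 1234)
print(data)

try:
    data.decode("utf-8")
    is_text = True
except UnicodeDecodeError:
    is_text = False
print("decodes as UTF-8 text:", is_text)
```

The name is readable inside the output, but the keys, lengths, and the varint-encoded ID are raw bytes, and the whole buffer does not decode as UTF-8. This is why the programs in the following sections open their files in "rb" and "wb" mode.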

Writing A Message

The following program reads an AddressBook from a file, adds one new Person based on user input, and writes the new AddressBook back to the file.

#!/usr/bin/env python3

import addressbook_pb2
import sys

# This function fills in a Person message based on user input.
def PromptForAddress(person):
  person.id = int(input("Enter person ID number: "))
  person.name = input("Enter name: ")

  email = input("Enter email address (blank for none): ")
  if email != "":
    person.email = email

  while True:
    number = input("Enter a phone number (or leave blank to finish): ")
    if number == "":
      break

    phone_number = person.phones.add()
    phone_number.number = number

    # Named phone_type to avoid shadowing the built-in type().
    phone_type = input("Is this a mobile, home, or work phone? ")
    if phone_type == "mobile":
      phone_number.type = addressbook_pb2.Person.MOBILE
    elif phone_type == "home":
      phone_number.type = addressbook_pb2.Person.HOME
    elif phone_type == "work":
      phone_number.type = addressbook_pb2.Person.WORK
    else:
      print("Unknown phone type; leaving as default value.")

# Main procedure:  Reads the entire address book from a file,
#   adds one person based on user input, then writes it back out to the same
#   file.
if len(sys.argv) != 2:
  print("Usage:", sys.argv[0], "ADDRESS_BOOK_FILE")
  sys.exit(-1)

address_book = addressbook_pb2.AddressBook()

# Read the existing address book. The file holds binary data, so open it
# in "rb" mode.
try:
  with open(sys.argv[1], "rb") as f:
    address_book.ParseFromString(f.read())
except IOError:
  print(sys.argv[1] + ": Could not open file.  Creating a new one.")

# Add an address.
PromptForAddress(address_book.people.add())

# Write the new address book back to disk.
with open(sys.argv[1], "wb") as f:
  f.write(address_book.SerializeToString())

Reading A Message

The following program reads the message from a file and prints all the information in it.

#!/usr/bin/env python3

import addressbook_pb2
import sys

# Iterates through all people in the AddressBook and prints info about them.
def ListPeople(address_book):
  for person in address_book.people:
    print("Person ID:", person.id)
    print("  Name:", person.name)
    # email is optional, so check that it was set before printing it.
    if person.HasField('email'):
      print("  E-mail address:", person.email)

    for phone_number in person.phones:
      if phone_number.type == addressbook_pb2.Person.MOBILE:
        print("  Mobile phone #: ", end="")
      elif phone_number.type == addressbook_pb2.Person.HOME:
        print("  Home phone #: ", end="")
      elif phone_number.type == addressbook_pb2.Person.WORK:
        print("  Work phone #: ", end="")
      print(phone_number.number)

# Main procedure:  Reads the entire address book from a file and prints all
#   the information inside.
if len(sys.argv) != 2:
  print("Usage:", sys.argv[0], "ADDRESS_BOOK_FILE")
  sys.exit(-1)

address_book = addressbook_pb2.AddressBook()

# Read the existing address book.
with open(sys.argv[1], "rb") as f:
  address_book.ParseFromString(f.read())

ListPeople(address_book)

Extending A Protocol Buffer

To be done

References:

https://developers.google.com/protocol-buffers/

https://tensorflow-notes.readthedocs.io/zh_CN/latest/protocol-buffer.html

https://blog.csdn.net/u011518120/article/details/54604615

https://colobu.com/2015/01/07/Protobuf-language-guide/

Reposted from blog.csdn.net/ghalcyon/article/details/82053673