Thrift RPC in Practice (3): Thrift Serialization Demystified

This article explains Thrift's serialization mechanism: how does Thrift work when it is used as a data exchange format?

1. Setting up the scenario

1). First, define a simple struct in Thrift.

      
      
namespace java com.yangyang.thrift.api
struct Pair {
    1: required string key
    2: required string value
}

You can probably guess what the required modifier means. But what exactly do the numeric identifiers "1" and "2" mean, and what role do they play in the serialization mechanism?
Compile the definition:
thrift -gen java
2). Write the test code.

      
      
      
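import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TIOStreamTransport;

import com.yangyang.thrift.api.Pair;

// The class name is arbitrary; the original snippet omitted the enclosing class.
public class PairFileTest {
    private static String datafile = "1.dat";

    // *) Serialize the object to a file
    public static void writeData() throws IOException, TException {
        Pair pair = new Pair();
        pair.setKey("key1").setValue("value1");
        FileOutputStream fos = new FileOutputStream(new File(datafile));
        pair.write(new TBinaryProtocol(new TIOStreamTransport(fos)));
        fos.close();
    }

    // *) Restore the object from the file
    public static void readData() throws TException, IOException {
        FileInputStream fis = new FileInputStream(new File(datafile));
        Pair pair = new Pair();
        pair.read(new TBinaryProtocol(new TIOStreamTransport(fis)));
        System.out.println("key => " + pair.getKey());
        System.out.println("value => " + pair.getValue());
        fis.close();
    }

    public static void main(String[] args) throws Exception {
        // Run writeData() once to create 1.dat, then comment it out again.
        //writeData();
        readData();
    }
}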

Call writeData() to write pair{key => key1, value => value1} to the file 1.dat.
Then call readData(); the console shows:
key => key1
value => value1
3). Now redefine the Pair struct, swapping the numeric identifiers.

      
      
namespace java com.yangyang.thrift.api
struct Pair {
    2: required string key
    1: required string value
}

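Note: 2 now maps to key, and 1 maps to value.
Recompile with thrift -gen java.
4). Then read the data back.
Call readData() to restore the Pair object from the file 1.dat. Be careful not to call writeData() again at this point.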
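Result:
key => value1
value => key1
Is this the opposite of what you expected? Evidently the field names play no part here, while the id identifiers play a crucial role in Thrift serialization and deserialization.
With these questions in mind, let's look at the serialization mechanism in more detail.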
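The swap can be reproduced without the Thrift library. The following is a minimal sketch, assuming the TBinaryProtocol layout for a struct of string fields: a 1-byte type, a 2-byte field id, a 4-byte length, the UTF-8 bytes, and a single stop byte at the end. The class FieldIdSwapDemo and its methods are hypothetical stand-ins for the generated code, not part of Thrift.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the generated Pair code; it does not use the Thrift library.
class FieldIdSwapDemo {
    static final byte STOP = 0;     // same value as TType.STOP
    static final byte STRING = 11;  // same value as TType.STRING

    // Writes one string field as: type (1 byte), id (2 bytes), length (4 bytes), UTF-8 bytes.
    static void writeStringField(DataOutputStream out, int id, String value) throws IOException {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        out.writeByte(STRING);
        out.writeShort(id);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    // Serializes a Pair with the ORIGINAL schema: 1 = key, 2 = value.
    static byte[] encode(String key, String value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeStringField(out, 1, key);
        writeStringField(out, 2, value);
        out.writeByte(STOP);
        return buf.toByteArray();
    }

    // Deserializes with a caller-supplied id -> name mapping; names never appear on the wire.
    static Map<String, String> decode(byte[] data, Map<Integer, String> schema) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Map<String, String> fields = new LinkedHashMap<>();
        while (true) {
            byte type = in.readByte();
            if (type == STOP) {
                break;
            }
            int id = in.readShort();
            byte[] utf8 = new byte[in.readInt()];  // this sketch only handles STRING fields
            in.readFully(utf8);
            String name = schema.get(id);
            if (name != null) {
                fields.put(name, new String(utf8, StandardCharsets.UTF_8));
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode("key1", "value1");
        // Reader built from the original IDL: prints {key=key1, value=value1}
        System.out.println(decode(data, Map.of(1, "key", 2, "value")));
        // Reader built from the swapped IDL: prints {value=key1, key=value1}
        System.out.println(decode(data, Map.of(2, "key", 1, "value")));
    }
}
```

The same bytes produce different objects depending only on which id the reader associates with which field, which matches the behaviour observed with 1.dat.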

2. Thrift data format description

From the official documentation: http://thrift.apache.org/static/files/thrift-20070401.pdf

      
      
Versioning in Thrift is implemented via field identifiers. The field header for every member of a struct in Thrift is encoded with a unique field identifier. The combination of this field identifier and its type specifier is used to uniquely identify the field. The Thrift definition language supports automatic assignment of field identifiers, but it is good programming practice to always explicitly specify field identifiers.

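In other words: Thrift implements versioning (backward compatibility) through field identifiers (numeric id + field type). You can think of serialization as changing how each field is stored, from field_name:field_value to id+type:field_value. This also explains the result of the scenario above.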
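To make the id+type:field_value layout concrete, here is a small sketch that prints the bytes one would expect TBinaryProtocol to write for the original Pair (1 = key, 2 = value). It assumes the binary protocol's layout of a 1-byte type, a big-endian 2-byte id, and a 4-byte length before the string bytes; the class PairWireDump is hypothetical and does not call Thrift.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that hand-builds field bytes; not part of the Thrift library.
class PairWireDump {
    static final int STRING = 11;  // same value as TType.STRING

    // Builds one field: type (1 byte) + id (2 bytes) + length (4 bytes) + UTF-8 payload.
    static byte[] field(int type, int id, String value) throws IOException {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);  // DataOutputStream is big-endian
        out.writeByte(type);
        out.writeShort(id);
        out.writeInt(utf8.length);
        out.write(utf8);
        return buf.toByteArray();
    }

    // Formats bytes as space-separated two-digit hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // prints: key   : 0b 00 01 00 00 00 04 6b 65 79 31
        System.out.println("key   : " + hex(field(STRING, 1, "key1")));
        // prints: value : 0b 00 02 00 00 00 06 76 61 6c 75 65 31
        System.out.println("value : " + hex(field(STRING, 2, "value1")));
        // prints: stop  : 00
        System.out.println("stop  : 00");
    }
}
```

In each field, the first byte is the type (0b = string) and the next two bytes are the id. The names key and value do not appear anywhere in the output.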
Here is the generated read code for the Pair struct defined earlier (the swapped version), with comments:

      
      
      
public void read(org.apache.thrift.protocol.TProtocol iprot, Pair struct) throws org.apache.thrift.TException {
  org.apache.thrift.protocol.TField schemeField;
  // Read the struct-begin marker
  iprot.readStructBegin();
  while (true)
  {
    // Read the field-begin marker
    schemeField = iprot.readFieldBegin();
    if (schemeField.type == org.apache.thrift.protocol.TType.STOP) {
      break;
    }
    // The field header carries id + type; the switch dispatches on the id, then checks the type before assigning the value
    switch (schemeField.id) {
      case 2: // KEY
        if (schemeField.type == org.apache.thrift.protocol.TType.STRING) {
          struct.key = iprot.readString();
          struct.setKeyIsSet(true);
        } else {
          org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
        }
        break;
      case 1: // VALUE
        if (schemeField.type == org.apache.thrift.protocol.TType.STRING) {
          struct.value = iprot.readString();
          struct.setValueIsSet(true);
        } else {
          org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
        }
        break;
      default:
        org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
    }
    // Read the field-end marker
    iprot.readFieldEnd();
  }
  // Read the struct-end marker
  iprot.readStructEnd();
  // check for required fields of primitive type, which can't be checked in the validate method
  struct.validate();
}

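  This deserialization function also gives a first picture of how Thrift lays out a serialized object. Taken apart piece by piece, it comes down to an orderly sequence of calls to readStructBegin, readFieldBegin, a typed read (readString, readI32, readI64), readFieldEnd, and readStructEnd.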

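3. Categories of data exchange formats

Current data exchange formats fall roughly into the following categories:
1). Self-describing
  The serialized data carries the complete structure, including field names and values. XML, JSON, Java Serializable, and Baidu's mcpack/compack all belong here. Reordering the fields does not affect serialization or deserialization.
2). Semi-self-describing
  The serialized data drops some information, such as field names, and instead uses an index (usually id + type) to match values to fields. Google Protobuf is a representative example, and Thrift also belongs to this category.
3). Non-self-describing
  Baidu's infpack is said to be implemented this way. It discards much of the descriptive information and gets the best performance and compression ratio, but backward compatibility takes extra work from developers. I don't know the details.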
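As a rough illustration of the first two categories, the sketch below compares the encoded size of the Pair from section 1 as hand-built JSON text (field names included) and as id + type framed fields. The framing cost of 1 byte type + 2 bytes id + 4 bytes length per string field, plus one stop byte, is an assumption based on Thrift's binary protocol. The class WireSizeComparison is hypothetical, and a single tiny struct is not a benchmark.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical size comparison for one small struct; not a benchmark.
class WireSizeComparison {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Self-describing: field names travel with every value (hand-built JSON, no escaping).
    static int selfDescribingSize(String key, String value) {
        String json = "{\"key\":\"" + key + "\",\"value\":\"" + value + "\"}";
        return utf8Length(json);
    }

    // Semi-self-describing: each string field costs 1 byte type + 2 bytes id + 4 bytes length
    // plus its payload, and the struct ends with a single stop byte.
    static int idTypeSize(String key, String value) {
        int header = 1 + 2 + 4;
        return header + utf8Length(key) + header + utf8Length(value) + 1;
    }

    public static void main(String[] args) {
        // prints: self-describing: 31 bytes
        System.out.println("self-describing: " + selfDescribingSize("key1", "value1") + " bytes");
        // prints: id + type: 25 bytes
        System.out.println("id + type: " + idTypeSize("key1", "value1") + " bytes");
    }
}
```

The gap grows with longer field names, because the id + type header has a fixed size regardless of what the field is called.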

Comparison of Thrift with common data exchange formats
| Interchange format | Type | Advantages | Disadvantages |
| - | - | - | - |
| XML | text | easy to read | bloated; no support for binary data types |
| JSON | text | easy to read | discards type information (for "score": 100, it is ambiguous whether score should be parsed as int or double); no support for binary data types |
| Java Serializable | binary | simple to use | bloated; limited to the Java world |
| Thrift | binary | efficient | not human-readable; backward compatibility requires following certain conventions |
| Google Protobuf | binary | efficient | not human-readable; backward compatibility requires following certain conventions |

4. Backward compatibility in practice

  The official Thrift documentation also covers this: when adding a new field, give it a new, incremented id and declare it optional.
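The reason this convention works can be sketched without Thrift: a reader generated from the old IDL skips any field whose id it does not recognise, as the default branch of the generated read method above does with TProtocolUtil.skip. In the sketch below, the class SkipUnknownFieldDemo and the extra field 3 (an i32 called ttl) are hypothetical, and the byte layout is assumed to follow the binary protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical old-reader / new-writer pair; does not use the Thrift library.
class SkipUnknownFieldDemo {
    static final byte STOP = 0, I32 = 8, STRING = 11;  // same values as the TType constants

    static void writeString(DataOutputStream out, int id, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeByte(STRING);
        out.writeShort(id);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    // New writer: the schema gained a hypothetical field "3: optional i32 ttl".
    static byte[] encodeV2(String key, String value, int ttl) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeString(out, 1, key);
        writeString(out, 2, value);
        out.writeByte(I32);
        out.writeShort(3);
        out.writeInt(ttl);
        out.writeByte(STOP);
        return buf.toByteArray();
    }

    static String readString(DataInputStream in) throws IOException {
        byte[] utf8 = new byte[in.readInt()];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Consumes a value of the given type without interpreting it.
    static void skip(DataInputStream in, byte type) throws IOException {
        switch (type) {
            case I32: in.readInt(); break;
            case STRING: readString(in); break;
            default: throw new IOException("type not handled in this sketch: " + type);
        }
    }

    // Old reader: only knows ids 1 and 2, skips everything else. Returns {key, value}.
    static String[] decodeV1(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String key = null;
        String value = null;
        while (true) {
            byte type = in.readByte();
            if (type == STOP) {
                break;
            }
            short id = in.readShort();
            if (id == 1 && type == STRING) {
                key = readString(in);
            } else if (id == 2 && type == STRING) {
                value = readString(in);
            } else {
                skip(in, type);
            }
        }
        return new String[] {key, value};
    }

    public static void main(String[] args) throws IOException {
        String[] pair = decodeV1(encodeV2("key1", "value1", 60));
        System.out.println("key => " + pair[0]);    // prints: key => key1
        System.out.println("value => " + pair[1]);  // prints: value => value1
    }
}
```

Because the type byte tells the old reader how many bytes to skip, the new field does not disturb fields 1 and 2. Reusing or renumbering an existing id, as in section 1, is what breaks compatibility.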


Origin www.cnblogs.com/lijianming180/p/12037891.html