Serialization of the TLV

A communication protocol can be understood as the set of rules and conventions that two nodes agree on in order to exchange information: for example, the byte order, the type of each field, and which compression or encryption algorithm to use. Common protocols include TCP, UDP, HTTP, SIP and so on. A protocol consists of a procedure specification and an encoding specification. The procedure covers things such as the call signaling flow; the encoding specification defines how all signaling and data are packed and unpacked.

The encoding specification is what we usually call encoding and decoding, or serialization. It is used not only in communication but also in storage: when we want to save in-memory objects to disk, the object data has to be serialized.

This article works through one example step by step, repeatedly raising a problem and then solving it, to show how a protocol gradually evolves and improves, and ends with a summary. After reading it, you should find it easier to design or choose an encoding protocol for your own work.

1. Compact mode

In this example, A and B communicate to get or set a user's basic information. A developer's first step is usually to define a protocol structure:

struct userbase {
    unsigned short cmd;     // 1 - get, 2 - set; a short leaves room for more commands (one can dream)
    unsigned char gender;   // 1 - man, 2 - woman, 3 - ??
    char name[8];           // could also be a string or a len + value pair; a simple fixed length keeps the explanation short
};

With this layout, A hardly needs to encode anything: it copies the data straight from memory, converts cmd to network byte order, and sends it to B. B can parse it the same way, and everything is harmonious and happy.

The encoded result can be drawn as follows (one cell per byte):

[Figure: Compact mode structure]

This encoding is called compact mode: apart from the data itself it carries no redundant information, so it can be regarded as raw data. It was very common in the DOS era, when memory and network bandwidth were counted in kilobytes and CPUs had not yet reached 1 GHz. Adding extra information would not only strain the CPU; memory and bandwidth could not afford it either.
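As a rough sketch of compact mode, the C functions below write and read the three fields one by one. The function names and the USERBASE_WIRE_SIZE constant are assumptions made for illustration; they are not part of A and B's protocol. Writing field by field, rather than copying the whole struct, avoids sending the padding byte that most compilers add to this struct and makes the byte order of cmd explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { USERBASE_WIRE_SIZE = 2 + 1 + 8 };  /* cmd + gender + name */

struct userbase {
    unsigned short cmd;
    unsigned char gender;
    char name[8];
};

/* Writes the 11-byte compact form of *u into out; returns the byte count. */
size_t userbase_encode(const struct userbase *u, uint8_t *out, size_t cap)
{
    assert(u != NULL && out != NULL && cap >= USERBASE_WIRE_SIZE);
    out[0] = (uint8_t)(u->cmd >> 8);    /* big-endian = network byte order */
    out[1] = (uint8_t)(u->cmd & 0xFF);
    out[2] = u->gender;
    memcpy(out + 3, u->name, sizeof u->name);
    return USERBASE_WIRE_SIZE;
}

/* Reads the compact form back; returns 0 on success, -1 on a short buffer. */
int userbase_decode(const uint8_t *in, size_t len, struct userbase *u)
{
    if (len < USERBASE_WIRE_SIZE) return -1;
    u->cmd = (unsigned short)((in[0] << 8) | in[1]);
    u->gender = in[2];
    memcpy(u->name, in + 3, sizeof u->name);
    return 0;
}
```

Note that the decoder has no way to tell which layout it is looking at; it simply trusts that both sides compiled the same struct.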

2. Scalability

One day, A adds a birthday field to the basic information and then tells B:

struct userbase {
    unsigned short cmd;
    unsigned char gender;
    unsigned int birthday;
    char name[8];
};

Now B is in trouble: on receiving a packet from A, it cannot tell whether the third field is the name field of the old protocol or the birthday field of the new one. From this, A and B finally learn an important property of a protocol: compatibility and extensibility.

So A and B decide to abandon the old protocol and start over with a new one that every future version can remain compatible with. The method is simple: add a version field.

struct userbase {
    unsigned short version;
    unsigned short cmd;
    unsigned char gender;
    unsigned int birthday;
    char name[8];
};

With that, A and B breathe a sigh of relief: the protocol can now be extended easily, and adding a field is convenient. Even today, many people probably still use this approach.
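One possible shape for a version-aware decoder is sketched below. The version numbers 1 and 2, the two layouts, and the helper names rd16 and rd32 are hypothetical; the text only says that a version field is added. All multi-byte fields are assumed to be big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct userbase {
    unsigned short version;
    unsigned short cmd;
    unsigned char gender;
    unsigned int birthday;
    char name[8];
};

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 on a short buffer or an unknown version. */
int userbase_decode(const uint8_t *in, size_t len, struct userbase *u)
{
    assert(in != NULL && u != NULL);
    if (len < 2) return -1;
    memset(u, 0, sizeof *u);
    u->version = rd16(in);
    switch (u->version) {
    case 1:  /* assumed layout: version, cmd, gender, name */
        if (len < 2 + 2 + 1 + 8) return -1;
        u->cmd = rd16(in + 2);
        u->gender = in[4];
        memcpy(u->name, in + 5, 8);
        return 0;
    case 2:  /* assumed layout: version, cmd, gender, birthday, name */
        if (len < 2 + 2 + 1 + 4 + 8) return -1;
        u->cmd = rd16(in + 2);
        u->gender = in[4];
        u->birthday = rd32(in + 5);
        memcpy(u->name, in + 9, 8);
        return 0;
    default:
        return -1;
    }
}
```

Every new field means a new version number and another case in the switch, with field offsets that have to be recomputed by hand each time.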

3. Better scalability

After a long time, A and B discover a new problem: every added field requires a new version number. That is not the main issue, though. The main issue is that the code becomes troublesome to maintain: each version needs its own case branch, and in the end the code is full of case branches, looks ugly, and is costly to maintain.

A and B think it over carefully and conclude that a single version number for the whole protocol is not fine-grained enough, so they attach one extra piece of information to every field: a tag. This costs more memory and bandwidth, but times have changed, and that redundancy is now tolerable in exchange for ease of use.

struct userbase {
    unsigned short version;
    unsigned short cmd;
    unsigned char gender;
    unsigned int birthday;
    char name[8];
};

[Figure: Extensible structure 1]

After designing this protocol, A and B are very proud of it: fields can be added or removed freely, and it can be extended at will.
Reality is always cruel, and a new requirement soon arrives: 8 bytes is not enough for name, which may be up to 100 bytes long. A and B are worried. They cannot pack 100 bytes every time for every person named "steven"; they may not be short of money, but that would be too wasteful.

So A and B search for references and find the ASN.1 encoding standard, which turns out to be a good thing. ASN.1 is an ISO / ITU-T standard. One of its encodings, BER (Basic Encoding Rules), is simple and easy to use. It encodes data as triplets, abbreviated as TLV encoding.

After encoding, each field is laid out in memory as:

Tag | Length | Value

The value of a field can itself be a structure, so fields can be nested.

[Figure: TLV structure]

After A and B pack the protocol with TLV, the data is laid out in memory roughly as follows:

[Figure: Extensible structure 2]

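TLV is highly extensible and simple to learn. It also has drawbacks: it adds two pieces of redundant information, tag and len, so if the protocol consists mostly of basic data types such as int, short, and byte, it can waste several times the storage space. In addition, both sides need a description document in advance to know what each Value means; that is, TLV is neither structured nor self-describing.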
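The tag/length/value idea can be sketched in a few lines of C. This is a deliberately simplified layout with a 1-byte tag and a 1-byte length, not a conforming BER encoder, and the tag numbers and function names are hypothetical. The point it tries to show is that a reader can step over any field it does not recognize by using the length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag numbers, chosen only for this sketch. */
enum { TAG_CMD = 1, TAG_GENDER = 2, TAG_BIRTHDAY = 3, TAG_NAME = 4 };

/* Appends one tag/length/value triple at out[pos]. Returns the new position,
   or 0 if the triple does not fit (a successful call never returns 0). */
size_t tlv_put(uint8_t *out, size_t cap, size_t pos,
               uint8_t tag, const void *val, uint8_t len)
{
    assert(out != NULL && (val != NULL || len == 0));
    if (pos > cap || cap - pos < 2u + len) return 0;
    out[pos] = tag;
    out[pos + 1] = len;
    if (len) memcpy(out + pos + 2, val, len);
    return pos + 2 + len;
}

/* Finds the first field with the given tag, stepping over every other field
   by its length. Returns a pointer to the value and stores its length in
   *len, or returns NULL if the tag is absent or the buffer is malformed. */
const uint8_t *tlv_find(const uint8_t *in, size_t n, uint8_t tag, uint8_t *len)
{
    size_t pos = 0;
    while (n - pos >= 2) {
        uint8_t t = in[pos], l = in[pos + 1];
        if (n - pos - 2 < l) return NULL;   /* truncated value */
        if (t == tag) { *len = l; return in + pos + 2; }
        pos += 2u + l;                      /* not the wanted tag: skip it */
    }
    return NULL;
}
```

With this layout "steven" costs 8 bytes on the wire instead of a fixed 100, while a 1-byte gender costs 3, which is the overhead the paragraph above describes.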

4. Self-description

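After adopting TLV, A and B seem to have solved every problem, yet it still does not feel perfect. They decide to add self-description, so that a packet capture reveals the type of each field without consulting the protocol document. The improved format is TT[L]V (tag, type, length, value). When type is a fixed-length basic data type such as int, short, long, or byte, the length is already known, so L is omitted.

So they define some type values as follows: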

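Type     Type value   Description
bool     1            Boolean value
int8     2            Signed single character (8 bits)
uint8    3            Unsigned single character (8 bits)
int16    4            16-bit signed integer
uint16   5            16-bit unsigned integer
int32    6            32-bit signed integer
uint32   7            32-bit unsigned integer
...
string   12           String or binary sequence
struct   13           User-defined structure, can be nested
list     14           Ordered list
map      15           Unordered collection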

After TTLV serialization, the memory layout is as follows:

[Figure: TTLV serialized structure]

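After the change, A and B find that it really does bring many benefits: not only can fields be added and removed at will, but data types can be changed too. For example, cmd can be changed to int cmd with seamless compatibility. That is really powerful.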
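A minimal sketch of a TT[L]V reader is given below, assuming a 1-byte tag, a 1-byte type, and a 1-byte length that is present only for variable-length types. The numeric codes for the fixed-width types follow the table above; TYPE_STRING = 12 and all function and struct names are assumptions of this sketch. Only the fixed-width integers and strings are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    TYPE_BOOL = 1, TYPE_INT8 = 2, TYPE_UINT8 = 3, TYPE_INT16 = 4,
    TYPE_UINT16 = 5, TYPE_INT32 = 6, TYPE_UINT32 = 7,
    TYPE_STRING = 12  /* assumption of this sketch */
};

struct ttlv_field {
    uint8_t tag;
    uint8_t type;
    size_t len;            /* value length in bytes */
    const uint8_t *value;  /* points into the input buffer */
};

/* Width of a fixed-length type, or 0 if the type carries an explicit L. */
static size_t fixed_width(uint8_t type)
{
    switch (type) {
    case TYPE_BOOL: case TYPE_INT8: case TYPE_UINT8: return 1;
    case TYPE_INT16: case TYPE_UINT16:               return 2;
    case TYPE_INT32: case TYPE_UINT32:               return 4;
    default:                                         return 0;
    }
}

/* Parses one field at in[0..n). Returns bytes consumed, or 0 on error. */
size_t ttlv_next(const uint8_t *in, size_t n, struct ttlv_field *f)
{
    assert(in != NULL && f != NULL);
    if (n < 2) return 0;
    size_t head = 2;
    f->tag = in[0];
    f->type = in[1];
    f->len = fixed_width(f->type);
    if (f->len == 0) {                 /* variable-length type: read L */
        if (f->type != TYPE_STRING || n < 3) return 0;
        f->len = in[2];
        head = 3;
    }
    if (n - head < f->len) return 0;   /* truncated value */
    f->value = in + head;
    return head + f->len;
}

/* Reads any fixed-width integer field (big-endian) as a 64-bit value, so the
   receiver keeps working when the sender widens cmd from int16 to int32. */
int ttlv_as_int(const struct ttlv_field *f, int64_t *out)
{
    size_t w = fixed_width(f->type);
    if (w == 0) return -1;
    uint64_t v = 0;
    for (size_t i = 0; i < w; i++) v = (v << 8) | f->value[i];
    int is_signed = f->type == TYPE_INT8 || f->type == TYPE_INT16 ||
                    f->type == TYPE_INT32;
    if (is_signed && (f->value[0] & 0x80))
        v |= ~(uint64_t)0 << (8 * w);  /* sign-extend */
    *out = (int64_t)v;
    return 0;
}
```

Because the type travels with the value, the receiver never hard-codes the width of cmd; it reads whatever width the sender declared.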

5. Cross-language support

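One day a new colleague, C, arrives. He is writing a new service that needs to communicate with A, but C uses Java or PHP, which have no unsigned types, so large unsigned values are read back as negative numbers and parsing fails. To solve this, A re-plans the protocol types: language-specific features are stripped out, a common subset is defined, and the usable types are strictly constrained. The constraints bring generality, simplicity, and cross-language support, and everyone agrees, so a type specification is born.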

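Type     Type value   Description
bool     1            Boolean value
int8     2            Signed single character (8 bits)
int16    3            16-bit signed integer
int32    4            32-bit signed integer
...
string   12           String or binary sequence
struct   13           User-defined structure, can be nested
list     14           Ordered list
map      15           Unordered collection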
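The signedness problem behind this restricted type list can be reproduced in a few lines of C. The function names below are made up for the illustration: the first shows what a receiver that only has signed 32-bit integers sees when it is handed the bytes of a uint32, and the second shows the usual workaround of widening to 64 bits and masking.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns what a reader limited to signed 32-bit integers sees when it is
   handed the four bytes of a uint32 value. */
int32_t read_as_signed32(uint32_t sent)
{
    int32_t seen;
    assert(sizeof seen == sizeof sent);
    memcpy(&seen, &sent, sizeof seen);  /* same bits, signed interpretation */
    return seen;
}

/* A receiver without unsigned types can still recover the value by widening
   to a 64-bit signed integer and masking off the sign extension. */
int64_t recover_unsigned32(int32_t seen)
{
    return (int64_t)seen & 0xFFFFFFFFLL;
}
```

A sender value of 3000000000 arrives as -1294967296 on the signed side. Restricting the protocol to signed types means no receiver has to remember to apply the mask.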

6. Code automation: IDL

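But A and B run into a new annoyance: every new protocol means writing and debugging the encoder and decoder from scratch. TLV is simple, but writing codecs is tedious manual labor with no technical depth. An obvious problem is that, with so much copy/paste, novices and veterans alike make mistakes very easily, and once a mistake is made, tracking it down is very time-consuming. So A thinks of using a tool to generate the code automatically.

IDL (Interface Description Language) is a description language and also an intermediate language. One mission of an IDL is to standardize and constrain: as mentioned earlier, it standardizes the usable types and provides cross-language support. A tool parses the IDL file and generates code for each language.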

Gencpp.exe sample.idl    outputs sample.cpp sample.h
Genphp.exe sample.idl    outputs sample.php
Genjava.exe sample.idl   outputs sample.java

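7. Summary

Does this look familiar by now? Yes: followed to the end, the protocol is much the same as Facebook's Thrift and Google's Protocol Buffers, as well as the JCE protocol used by the company's wireless division. At first glance the IDL files of these protocols look almost identical, with only minor differences.

These protocols add some features in the details: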

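  1. Compression. This does not mean general-purpose compression such as gzip, but compression of integers. For an int, the value is very often less than 127 (0 is especially common), so there is no need to occupy 4 bytes. These protocols refine the encoding so that an int uses only 1, 2, 3, or 4 bytes as needed; in essence it is still a TTLV protocol.

  2. The require/optional feature. It serves two purposes. The first is compression: a protocol may have many fields, some of which may or may not be present. Packing a default value for every unassigned field is wasteful, so an optional field that has not been assigned is simply not packed. The second is constraint: it specifies which fields must be present, which strengthens validation.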
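The integer compression in item 1 can be illustrated with a base-128 variable-length integer ("varint"), the scheme Protocol Buffers uses for its integer fields. This is one possible realization, not necessarily what every protocol named above does, and the function names are made up for the sketch. Each byte carries 7 bits of the value plus a continuation bit, so a 32-bit value takes 1 to 5 bytes here rather than the 1 to 4 mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes v as a base-128 varint; out must have room for 5 bytes.
   Returns the number of bytes written (1 to 5). */
size_t varint_encode(uint32_t v, uint8_t *out)
{
    assert(out != NULL);
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);  /* low 7 bits + continuation bit */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;               /* last byte: continuation bit clear */
    return n;
}

/* Decodes a varint from in[0..len). Returns bytes consumed, or 0 if the
   input is truncated or longer than 5 bytes. */
size_t varint_decode(const uint8_t *in, size_t len, uint32_t *v)
{
    assert(in != NULL && v != NULL);
    uint32_t result = 0;
    for (size_t i = 0; i < len && i < 5; i++) {
        result |= (uint32_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) { *v = result; return i + 1; }
    }
    return 0;
}
```

Values from 0 to 127 fit in a single byte, which covers the common cases the list describes; only values of 2^28 and above need the fifth byte.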

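Serialization is the foundation of communication protocols: signaling channels, data channels, and RPC all need it. Considering extensibility and cross-language support early in protocol design saves a lot of trouble later.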

P.S.

This article covers serialization for binary communication protocols and does not discuss text protocols. In a sense, text protocols are inherently compatible and extensible, and do not face as many of the problems that binary protocols must consider. Text protocols are easier to debug (packet captures show readable characters, telnet can be used for debugging, and packets can be written by hand without special tools), and being simple to learn is their strongest advantage. The advantages of binary protocols are performance and security, but they are harder to debug.

Each has its strengths and weaknesses; choose according to your needs.

References:

This article is reposted from https://www.jianshu.com/p/fb183509f14d

https://www.jianshu.com/p/73c9ed3a4877
https://www.jianshu.com/p/72108f0aefca



Origin www.cnblogs.com/binarylei/p/10991550.html