Analysis of new features of HTTP/2

http://io.upyun.com/2015/05/13/http2/

HTTP/2 is derived from SPDY/2

The SPDY family of protocols was developed by Google and made public in 2009, with the design goal of cutting page load times by 50%. Many well-known Internet companies have adopted SPDY in their websites or apps (the latest version is SPDY/3.1) because the performance gain is clear. Mainstream browsers (Chrome, Firefox, Opera) already support SPDY, and it has become a de facto industry standard. The HTTP Working Group eventually decided to develop HTTP/2 on the basis of SPDY/2.

However, there are still differences between HTTP/2 and SPDY, mainly in the following two points:

HTTP/2 supports clear-text HTTP transport, while SPDY enforces the use of HTTPS
HTTP/2 compresses message headers with HPACK instead of the DEFLATE used by SPDY

Advantages of HTTP/2

Compared with HTTP/1.x, HTTP/2 has made great changes and optimizations in the underlying transmission:

HTTP/2 transmits data in binary format rather than the text format of HTTP/1.x. The binary format brings more advantages and possibilities in the analysis and optimization of the protocol.
HTTP/2 uses HPACK to compress message headers, which saves the network traffic they occupy. In HTTP/1.x, every request carries a lot of redundant header information, which wastes bandwidth; header compression addresses this problem well.
Multiplexing: put simply, all requests are carried concurrently over a single TCP connection. HTTP/1.x can also use one connection for multiple requests, but the requests are ordered: a later request has to wait until the response to the earlier one has come back. This easily leaves later requests blocked, whereas HTTP/2 achieves truly concurrent requests. Streams also support priority and flow control.
Server Push: the server can get resources to the client sooner. For example, the server can actively push JS and CSS files without waiting for the client to parse the HTML and then request them. By the time the client needs them, they are already on the client side.

HTTP/2 is mainly a complete rebuild of the underlying transport mechanism of HTTP/1.x, and it is basically compatible with HTTP/1.x semantics. Content-Type is still Content-Type; it is just no longer transmitted as text. So how are these new features of HTTP/2 implemented?

The cornerstone of HTTP/2 - Frame

A Frame is the basic unit of the HTTP/2 binary format; it can be thought of as playing the role that a packet plays in TCP. HTTP/2 can offer so many new features precisely because the underlying data format changed. The basic format of a Frame is as follows (the numbers in the diagram are field widths in bits; the layout is taken from http2-draft-17):

+-----------------------------------------------+
|                 Length (24)                   |
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+-------------------------------+
|R|                 Stream Identifier (31)                      |
+=+=============================================================+
|                   Frame Payload (0...)                      ...
+---------------------------------------------------------------+

Length: the length of the Frame Payload. The Frame Header is not counted; it is always 9 bytes (Length + Type + Flags + R + Stream Identifier = 72 bits).

Type: identifies what the Frame Payload carries, for example HTTP headers or an HTTP body; HTTP/2 also defines several other Frame Types. A value of 0 means the DATA type (that is, the body data of an HTTP/1.x message).

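Flags: 8 bits, each acting as a flag. Each Frame Type defines its own set of Frame Flags. For example, when the last DATA Frame is sent, the lowest bit of Flags is set to 1 ( flags |= 0x01 ), which means END_STREAM and indicates that this Frame is the last one of the stream.

R: a reserved bit.

Stream Identifier: the stream ID. Once the client and server have established a TCP connection, stream ID 0 is used first, for initialization work. After that, the client and server send requests/responses on streams numbered from 1; client-initiated streams use odd IDs and server-initiated streams use even IDs.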
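The header layout described above can be sketched in Python. This is a minimal illustration, not a full framing layer: the function names pack_frame_header and unpack_frame_header are made up for this example, and it assumes only the 9-byte layout and the END_STREAM flag quoted in the text.

```python
import struct

FRAME_HEADER_LEN = 9       # 24 + 8 + 8 + 1 + 31 bits
TYPE_DATA = 0x0            # DATA frame type
FLAG_END_STREAM = 0x1      # lowest bit of Flags on a DATA frame


def pack_frame_header(length, frame_type, flags, stream_id):
    """Serialize the 9-byte frame header in network byte order."""
    if not 0 <= length < (1 << 24):
        raise ValueError("length must fit in 24 bits")
    # The 24-bit length is split into a high byte and a low 16-bit word.
    # Masking the stream ID keeps the reserved bit R at 0.
    return struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                       frame_type, flags, stream_id & 0x7FFFFFFF)


def unpack_frame_header(data):
    """Return (length, type, flags, stream_id) from the first 9 bytes."""
    hi, lo, frame_type, flags, raw_id = struct.unpack(
        ">BHBBI", data[:FRAME_HEADER_LEN])
    return (hi << 16) | lo, frame_type, flags, raw_id & 0x7FFFFFFF


# Header of the last DATA frame on stream 1, carrying a 5-byte payload.
flags = 0
flags |= FLAG_END_STREAM
header = pack_frame_header(5, TYPE_DATA, flags, 1)
print(header.hex())                 # 000005000100000001
print(unpack_frame_header(header))  # (5, 0, 1, 1)
```

Note that the flag is set with a bitwise OR so that any other flag bits already present are preserved.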

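A Frame consists of two parts, the Frame Header and the Frame Payload. Whether the data was originally an HTTP header or an HTTP body, in HTTP/2 it is stored in a Frame Payload to form individual Frames, which are then sent as the request/response. The Type in the Frame Header tells which kind of Frame it is. So the semantics have not changed much; what changed is that the data format became binary Frames.

[Figure: conversion between an HTTP/1.x message and HTTP/2 Frames]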

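HPACK, designed specifically for HTTP/2 header compression

Suppose we agreed to represent a common request such as GET /index.html by 1 and POST /index.html by 2. Wouldn't that save a lot of bytes?

HPACK, tailor-made for HTTP/2, extends this idea. It uses an index table to define commonly used HTTP headers: common headers are stored in the table, and a request only needs to send their positions in the table. For example, :method=GET is represented by index 2 and :path=/index.html by index 5 (for the complete list, see the HPACK Static Table). It is enough to send the server one Frame whose Payload holds 0x8285 and whose Type is set to HEADERS; this marks the Frame as an HTTP header, and the request it carries is: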

GET /index.html

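Why 0x8285 rather than 0x0205? Because a high bit of 1 marks the byte as a fully indexed value (both key and value are in the index). In the same way, the flag bits at the high end of the byte distinguish whether it is a fully indexed value, one where only the key is indexed, or one where neither key nor value is indexed. Because the size of the index table is limited, it holds only some common HTTP headers; in addition, each request can dynamically add new HTTP headers to the table as a cache, in a section that follows the static entries. This dynamic section is called the Dynamic Table. The Static Table and the Dynamic Table together form the index table: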

<----------  Index Address Space ---------->
<-- Static  Table -->  <-- Dynamic Table -->
+---+-----------+---+  +---+-----------+---+
| 1 |    ...    | s |  |s+1|    ...    |s+k|
+---+-----------+---+  +---+-----------+---+
                       ^                   |
                       |                   V
                 Insertion Point      Dropping Point
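The index address space in the diagram can be sketched as a small lookup structure. This is only an illustration under stated assumptions: the class IndexTable and the function decode_indexed are hypothetical names, the static table is reduced to the three entries quoted in this article, size-based eviction at the dropping point is not modeled, and only single-byte indices (below 127) are handled.

```python
STATIC_SIZE = 61  # number of entries in the HPACK static table

# Partial static table: only the entries mentioned in this article.
STATIC_SUBSET = {
    2: (":method", "GET"),
    5: (":path", "/index.html"),
    58: ("user-agent", ""),
}


class IndexTable:
    """Static entries at 1..s, dynamic entries at s+1..s+k."""

    def __init__(self):
        self.dynamic = []  # newest entry first, so it gets index s+1

    def insert(self, name, value):
        # New entries enter at the insertion point (index s+1).
        self.dynamic.insert(0, (name, value))

    def lookup(self, index):
        if index < 1:
            raise IndexError("index 0 is not a valid table index")
        if index <= STATIC_SIZE:
            # Raises KeyError for static entries omitted from this sketch.
            return STATIC_SUBSET[index]
        return self.dynamic[index - STATIC_SIZE - 1]


def decode_indexed(data, table):
    """Decode bytes that are all fully indexed fields (high bit set)."""
    fields = []
    for byte in data:
        if not byte & 0x80:
            raise ValueError("not a fully indexed header field")
        fields.append(table.lookup(byte & 0x7F))
    return fields


print(decode_indexed(bytes.fromhex("8285"), IndexTable()))
# [(':method', 'GET'), (':path', '/index.html')]
```

Keeping the newest dynamic entry at the front of the list mirrors the diagram, where insertion happens at index s+1 and the oldest entries fall off the far end.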

HPACK does not only reduce data volume by indexing key-value pairs; it also Huffman-encodes strings to shrink them.

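Take the common User-Agent as an example. Its index in the static table is 58; its value is not in the table because the value varies. In the first request, its key is represented by 58, indicating that this is a User-Agent, and its value is Huffman-encoded (if the encoded string would be longer, Huffman coding is not used). On receiving the request, the server adds this User-Agent to the Dynamic Table as a cached entry and assigns it a new index. On the client's next request, assuming the User-Agent from the previous request sits at index 62 in the table, it only needs to send 0xBE (again with the high bit set to 1) to stand for: User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36 .

[Figure: HPACK encoding of a header on the first request and on later requests]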

In the end, identical headers only need to be sent as index values, and new headers are added to the Dynamic Table.
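The User-Agent flow above can be sketched from the sender's side. This is a simplified illustration, not a conforming HPACK encoder: the class name UserAgentEncoder is hypothetical, Huffman coding is skipped (the value is sent as raw bytes), table eviction is ignored, and only values shorter than 127 bytes and indices below 127 are handled so that every length and index fits in one byte.

```python
STATIC_SIZE = 61          # number of entries in the HPACK static table
USER_AGENT_INDEX = 58     # static-table index of the user-agent name


class UserAgentEncoder:
    def __init__(self):
        self.dynamic = []  # newest first; dynamic[0] has index 62

    def encode(self, value):
        entry = ("user-agent", value)
        if entry in self.dynamic:
            # Key and value are both cached: send only the index,
            # with the high bit set to mark a fully indexed field.
            index = STATIC_SIZE + 1 + self.dynamic.index(entry)
            if index >= 127:
                raise ValueError("multi-byte indices are not handled here")
            return bytes([0x80 | index])
        raw = value.encode("ascii")
        if len(raw) >= 127:
            raise ValueError("multi-byte lengths are not handled here")
        # Only the key is indexed: the 01 prefix asks the peer to add the
        # pair to its Dynamic Table; the value follows with a length byte.
        self.dynamic.insert(0, entry)
        return bytes([0x40 | USER_AGENT_INDEX, len(raw)]) + raw


ua = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36")
encoder = UserAgentEncoder()
first = encoder.encode(ua)    # literal value, added to the table
second = encoder.encode(ua)   # now a single index byte
print(len(first), second.hex())
```

The second call returns the single byte 0xBE, which is 0x80 combined with index 62, matching the value given in the text.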

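Multiplexing

Every Frame Header carries a Stream ID, and this is what implements the feature. Each request/response uses a different Stream ID, much as IP:PORT identifies where a packet is headed. With the Stream ID as a label, all requests and responses can run at the same time over a single TCP connection.

[Figure: comparison of the concurrency models of HTTP and SPDY; the HTTP/2 model is similar to SPDY's]

When streams run concurrently, stream priority and dependencies come into play. Streams with higher priority are sent first. Image requests have lower priority than CSS and SCRIPT requests, a design that ensures the important resources finish loading first.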
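As a rough illustration of how Stream IDs let interleaved frames share one connection, the receiver can reassemble each stream independently. The function name demultiplex and the (stream_id, payload, end_stream) tuple shape are assumptions made for this sketch; priority, dependencies and flow control are not modeled.

```python
from collections import defaultdict


def demultiplex(frames):
    """Reassemble per-stream bodies from interleaved DATA frames.

    frames is an iterable of (stream_id, payload, end_stream) tuples
    in the order they arrived on the connection.
    """
    buffers = defaultdict(bytearray)
    completed = {}
    for stream_id, payload, end_stream in frames:
        buffers[stream_id] += payload
        if end_stream:
            # END_STREAM closes the stream; hand over the full body.
            completed[stream_id] = bytes(buffers.pop(stream_id))
    return completed


# Two responses interleaved on the same connection.
arrived = [
    (1, b"<html>", False),
    (3, b"body{", False),
    (1, b"</html>", True),
    (3, b"}", True),
]
print(demultiplex(arrived))
# {1: b'<html></html>', 3: b'body{}'}
```

Neither stream has to wait for the other to finish, which is the difference from the ordered request handling of HTTP/1.x.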

Server Push

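When the server wants to actively push a resource, it sends a Frame whose Frame Type is PUSH_PROMISE, carrying the Stream ID of the new stream that the push will create. This tells the client: next I am going to send you something on this ID, so get ready to receive it. When the client parses the Frame and finds that it is of type PUSH_PROMISE, it prepares to receive the stream the server is about to push.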
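A client-side sketch of reading the promised Stream ID out of a PUSH_PROMISE payload might look like the following. It assumes the payload layout from the HTTP/2 specification (an optional pad-length byte when the PADDED flag is set, then a reserved bit plus a 31-bit promised stream ID, then a header block fragment and any padding); the function name parse_push_promise and the sample header bytes are made up for the example.

```python
import struct

TYPE_PUSH_PROMISE = 0x5   # frame type of PUSH_PROMISE
FLAG_PADDED = 0x8         # payload starts with a pad-length byte


def parse_push_promise(flags, payload):
    """Return (promised_stream_id, header_block_fragment)."""
    offset = 0
    pad_length = 0
    if flags & FLAG_PADDED:
        pad_length = payload[0]
        offset = 1
    # Reserved bit + 31-bit promised stream ID, network byte order.
    (raw_id,) = struct.unpack_from(">I", payload, offset)
    promised_id = raw_id & 0x7FFFFFFF
    end = len(payload) - pad_length
    if end < offset + 4:
        raise ValueError("padding is longer than the payload allows")
    return promised_id, payload[offset + 4:end]


# The server promises stream 2; the two header bytes are illustrative.
payload = struct.pack(">I", 2) + bytes([0x82, 0x85])
print(parse_push_promise(0, payload))
```

After this frame, the client can expect the pushed response to arrive on the promised stream.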

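Conclusion

This article simplifies many specific details of the HTTP/2 protocol and only describes the basic process by which its main features are implemented.

If you want to implement a server that supports HTTP/2, you can head to the official HTTP/2 site to learn more. It also provides a list of projects that already implement HTTP/2: https://github.com/http2/http2-spec/wiki/Implementations .

In addition, for how HTTP/2 performs, see the example given by the official group: https://http2.akamai.com/demo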