HTTP/3 is here: the application of the QUIC protocol at OPPO

PART 00  Guide
In recent years, the QUIC protocol has set off a wave of interest in the field of network communication, and names such as QUIC and HTTP/3 can be seen everywhere on technical websites. So what exactly is QUIC? What are the relationship and the differences between HTTP/3 and QUIC? This article starts from the principles behind the QUIC protocol's notable features, and then introduces the implementation and application of OPPO's self-developed QUIC protocol (the OQUIC protocol).
PART 01  QUIC Background


With the development of the Internet, and especially the rise of the mobile Internet, networks have become more and more congested while users' expectations for low latency keep rising. At the same time, large resources such as images and videos are everywhere, and audio and short video are increasingly popular. People demand better real-time transmission: pages should open instantly, and short videos should play smoothly without stalling. For decades, practitioners in the field of network optimization have devoted themselves to accelerating data transmission. Then, in 2013, Google dropped a bombshell on the industry: it proposed replacing the traditional and still dominant TCP protocol with a new protocol called "QUIC".

1.1 Why does TCP "not work" anymore?

1.1.1 Connection establishment is slow
Establishing a TCP connection takes 1 RTT (Round Trip Time), and establishing an MPTCP connection is even slower (3 RTTs in total). HTTP/2 and earlier versions also require a TLS handshake after the TCP handshake, which adds another 1~2 RTTs. For users whose client is far from the server, this connection-establishment process is maddeningly slow.


1.1.2 Head-of-line blocking
Because TCP provides reliable, in-order delivery, data must arrive in sequence, and TCP uses sequence numbers to guarantee that order. On a complex network path, however, a packet sent earlier does not necessarily arrive first. If a packet is lost, TCP requires the receive window to stop and wait for the retransmission to arrive before it can slide forward and accept packets with larger sequence numbers. This stop-and-wait behavior is called head-of-line blocking. HTTP/2 uses the concept of streams to solve head-of-line blocking at the HTTP layer, but it cannot solve head-of-line blocking at the TCP layer.
1.1.3 Protocol ossification
The TCP protocol is now roughly fifty years old and is implemented in the kernel. Every update to TCP requires an operating-system upgrade, so even when TCP gains better features, they are hard to roll out quickly.
1.1.4 Middlebox ossification
Because TCP has been in use for so long, intermediate devices such as firewalls and NAT gateways have ossified around it. Any major change to TCP would be rejected by these middleboxes first; the consequence is that a user who updates the TCP stack on their own machine may simply be unable to access the Internet.

1.2  How does the QUIC protocol solve the above problems?

1.2.1 Fast connection establishment
QUIC is implemented on top of UDP. UDP is an unreliable, connectionless transport, and QUIC provides reliable delivery at the application layer. QUIC's most prominent feature is 0-RTT: the entire connection-establishment process requires no extra network round trip. The implementation of 0-RTT is described in detail in Section 2.1.
1.2.2 User-space protocol
QUIC is a transport-layer protocol implemented in user space, so a protocol upgrade only requires the two endpoints to agree; it does not involve operating-system updates and is more transparent to intermediate devices.
1.2.3 No head-of-line blocking
QUIC is implemented on top of UDP rather than TCP, so it naturally avoids TCP's head-of-line blocking problem.

1.3 What is HTTP/3?

Many people are confused by these names: why do some people say QUIC and others say HTTP/3, and what are iQUIC and gQUIC?
As early as 2012, when Google proposed the QUIC protocol, it covered two layers: it included not only transport-layer functionality but also application-layer functionality. The QUIC protocol of that era was also called "HTTP/2 over QUIC". Later, the IETF decided the protocol was good enough to be worth standardizing. With the IETF's efforts, Google's original QUIC was split into two layers: the transport layer kept the name QUIC and is responsible only for transport functions, while the application layer is still the HTTP protocol, bumped to a new version number and called HTTP/3. While standardizing QUIC's transport layer, the IETF also optimized it; the optimized protocol is called iQUIC (i for IETF), and to distinguish it, Google's original pre-standardization version is called gQUIC (g for Google). Since gQUIC is used less and less (even Google has moved to iQUIC), the QUIC protocol implemented by OPPO is iQUIC.

PART 02  What are the features of QUIC?


2.1 0-RTT

The 0-RTT handshake of gQUIC differs from that of iQUIC: gQUIC is built around the SCFG (server config), while iQUIC is built around the PSK (pre-shared key). This article explains only the iQUIC process.
Our QUIC handshake follows the standard TLS 1.3 process and uses the BoringSSL library (at the time of writing, OpenSSL's TLS 1.3 only works over TCP and does not support QUIC; only BoringSSL supports QUIC).
First of all, a client-server connection cannot always be established with 0-RTT. In the following two cases, a full RTT is required:
1) The client and server have never established a connection before; the QUIC handshake then needs a complete RTT.
2) The PSK saved by the client has expired; the QUIC handshake then needs a complete RTT.
Let's first look at the full 1-RTT QUIC handshake:


The client sends a Client Hello, which:
1) Chooses an elliptic curve, which fixes the curve's base point G.
2) Generates a random number as the client's elliptic-curve private key (Ra) and keeps it locally.
3) Computes the client's elliptic-curve public key from the base point G and the private key: Pa(x, y) = Ra * G(x, y), and generates a Client Random.
4) Carries the cipher suites supported by the client, the elliptic curve in use, the PSK modes, and other information.

The server returns a Server Hello, which:
1) Generates a random number as the server's elliptic-curve private key (Rb) and keeps it locally.
2) Computes the server's elliptic-curve public key from the base point G and the private key: Pb(x, y) = Rb * G(x, y).
3) Generates another random number (the Server Random) used in the final session-key calculation.

At this point the client holds its own private key and the server's public key, and the server holds its own private key and the client's public key.
The client computes Sa(x, y) = Ra * Pb(x, y); the server computes Sb(x, y) = Rb * Pa(x, y).
By the elliptic-curve math, Sa = Sb = S, and the x coordinate of S is taken as the pre-master secret.
The final session key is derived from three inputs: the Client Random, the Server Random, and the pre-master secret.
At this point the connection between client and server is established and the client can send HTTP requests.
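
The exchange above is standard ECDHE. Below is a minimal, self-contained sketch of the key agreement using Go's crypto/ecdh package (Go 1.20+); variable names such as clientPriv and serverPriv are purely illustrative and are not taken from any particular QUIC or TLS implementation.

```go
package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

func main() {
	curve := ecdh.P256() // the agreed curve; its parameters fix the base point G

	// Client: private key Ra, public key Pa = Ra * G.
	clientPriv, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	// Server: private key Rb, public key Pb = Rb * G.
	serverPriv, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	// Each side combines its own private key with the peer's public key.
	sa, _ := clientPriv.ECDH(serverPriv.PublicKey()) // Sa = Ra * Pb
	sb, _ := serverPriv.ECDH(clientPriv.PublicKey()) // Sb = Rb * Pa

	// Sa == Sb; this shared secret plays the role of the pre-master secret,
	// which is then mixed with the Client Random and Server Random to
	// derive the session keys.
	fmt.Println("shared secrets match:", bytes.Equal(sa, sb))
}
```
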
After the 1-RTT connection is established, the server also sends a New Session Ticket message. This message is very important: it is the basis of 0-RTT.


After the 1-RTT handshake, the client and server have changed from "strangers" into "friends", and the next time the client accesses the service, QUIC's 0-RTT kicks in.


The client and server share the same PSK (obtained from the New Session Ticket in the first handshake). The client carries application data ("early data") in the very first message it sends, and encrypts this early data with the session key recovered from the PSK.
As the figure shows, 0-RTT still involves Client Hello and Server Hello messages. 0-RTT does not mean there is no handshake; rather, the HTTP request data is carried along with the handshake.


Only when the following three conditions are all met is 0-RTT session resumption enabled; otherwise it falls back to 1-RTT session resumption:
1) After the first full handshake, the server sends a New Session Ticket, and the max_early_data_size extension in the Session Ticket indicates that it is willing to accept early data.
2) During PSK session resumption, the Client Hello carries the early_data extension, indicating that the client wants to enable 0-RTT mode.
3) The server carries the early_data extension in the Encrypted Extensions message, indicating that it agrees to read the early data.


2.2 Connection Migration

2.2.1 Function Introduction of Connection Migration
Connection migration is one of the most complex functions in the QUIC protocol.
A TCP connection is uniquely identified by a 4-tuple. If any element of the 4-tuple (source IP, source port, destination IP, destination port) changes, the current connection breaks and a new one must be established. Connection migration means that when any element of the 4-tuple changes, the connection is not interrupted and data can continue to be transmitted.
In daily life our phones constantly switch between Wi-Fi and cellular data: when we leave home we switch to cellular, when we get home we switch back to Wi-Fi, and when we enter the office, a restaurant, or a coffee shop we switch to Wi-Fi again and back to cellular when we leave. Each switch changes the source IP and port, each switch breaks the current connection, and each switch brings a brief window with no network: videos freeze and web pages fail to load. QUIC's connection-migration feature solves this problem nicely. QUIC identifies a connection by its connection ID, so when the source address changes, QUIC can still keep the connection alive and continue sending and receiving data normally.


From the figure above we can see that when connection migration happens, the connection ID (conn_id) does not change. Although the source IP changes when the network path switches (Wi-Fi -> cellular), the QUIC server uses the connection ID to determine that it is still the same QUIC connection.
It is worth noting that connection migration carries a certain risk of attack. To prevent third-party attacks, the protocol requires path validation before any further data is sent, to confirm that the peer is genuine. This is done with the PC (Path Challenge) and PR (Path Response) frames.





2.2.2 Difficulties in implementing connection migration
Let's first look at the most common QUIC deployment architecture:


Client data passes through a layer-4 load balancer, then through the layer-7 gateway cluster, and finally reaches the backend business servers. We deploy the QUIC client on the client side and the QUIC server inside the layer-7 gateway containers.
Problem one
The client is prone to a "dead wait" situation.


Solution: the client can obtain the state of the network interfaces (this requires the user to authorize the app), and it can also send a probe packet from the new address (IP2) to notify the server to perform connection migration.
Problem two
The layer-4 load balancer (DPVS) needs to send packets belonging to the same connection to the same layer-7 gateway container.
When connection migration happens, we must keep the connection alive and also keep it correct. That is, if client A was communicating with server B before the migration, then after the migration client A's packets must still be delivered to server B and must not be sent to any other server (which would break the connection).
Solution: traditionally, a layer-4 load balancer chooses the backend layer-7 gateway by consistent-hashing the 4-tuple (source IP, source port, destination IP, destination port). After connection migration the 4-tuple changes, so packets would be hashed to a different layer-7 gateway. To solve this, the layer-4 load balancer must stop hashing the 4-tuple and instead pick the upstream layer-7 gateway based on the QUIC connection ID.
We use a clever trick here: when the server generates its SCID, it encodes its own IP and port as part of the SCID.
The layer-4 load balancer then parses the DCID in the QUIC packets sent by the client (the server's SCID appears as the DCID in packets sent by the client), which guarantees that packets arriving after connection migration are still forwarded to the same server; a small sketch of this encoding follows below.
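
Below is a minimal sketch of that trick. The article does not describe the exact SCID layout used in production, so the format here (a 4-byte IPv4 address plus a 2-byte port at the front, random bytes after) is purely illustrative.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"net"
)

// makeSCID packs the gateway's IPv4 address and port into the first six
// bytes of the connection ID and fills the rest with random bytes.
func makeSCID(ip net.IP, port uint16, totalLen int) ([]byte, error) {
	scid := make([]byte, totalLen) // totalLen must be at least 6
	copy(scid[0:4], ip.To4())
	binary.BigEndian.PutUint16(scid[4:6], port)
	if _, err := rand.Read(scid[6:]); err != nil {
		return nil, err
	}
	return scid, nil
}

// parseSCID is what the layer-4 load balancer does with the DCID of a client
// packet: it recovers the upstream gateway's address, so packets keep going
// to the same gateway even after the client's 4-tuple changes.
func parseSCID(scid []byte) (net.IP, uint16) {
	ip := net.IPv4(scid[0], scid[1], scid[2], scid[3])
	port := binary.BigEndian.Uint16(scid[4:6])
	return ip, port
}

func main() {
	scid, _ := makeSCID(net.ParseIP("10.0.1.23"), 8443, 16) // hypothetical gateway address
	ip, port := parseSCID(scid)
	fmt.Printf("route this connection to %s:%d\n", ip, port)
}
```
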
Problem three
When the layer-7 gateway runs on a multi-core machine, how does the kernel deliver the packets of one connection to the same process?
Solution: traditionally, the kernel hashes the 4-tuple to find the socket fd, and each fd corresponds to a unique application-layer process. We change this step with eBPF so that the hash is computed over the connection ID instead, as illustrated by the sketch below.
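
The sketch below only illustrates the hashing idea in user space (in the real deployment the selection is done inside the kernel by an eBPF program, as described above): because the hash input is the connection ID rather than the 4-tuple, the chosen worker stays the same even after the client's address changes.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// workerFor maps a QUIC connection ID to one of numWorkers processes.
// The connection ID survives a source-address change, so the mapping
// survives connection migration as well.
func workerFor(connID []byte, numWorkers uint32) uint32 {
	h := fnv.New32a()
	h.Write(connID)
	return h.Sum32() % numWorkers
}

func main() {
	cid := []byte{0x3a, 0x7f, 0x01, 0x9c, 0x55, 0xe2, 0x10, 0x44} // example connection ID
	fmt.Println("deliver this connection's packets to worker", workerFor(cid, 8))
}
```
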


2.3 Better congestion-control algorithms

The advantages of QUIC's congestion control show up in two main aspects.
First: flexibility
QUIC can configure a different congestion-control algorithm for each stream of a connection. Every congestion-control algorithm has scenarios it suits best; in other words, different business scenarios and different network environments are better served by different algorithms. With a traditional TCP connection, the congestion-control algorithm is configured in the kernel.
To list the congestion-control algorithms supported by the current Linux kernel:

sysctl net.ipv4.tcp_available_congestion_control
To show the congestion-control algorithm the system is currently using:

sysctl net.ipv4.tcp_congestion_control
To change the congestion-control algorithm used by the system (taking the BBR algorithm as an example):

echo "net.ipv4.tcp_congestion_control=bbr" >> /etc/sysctl.conf
As you can see, once a congestion-control algorithm is configured, every connection of every service on that server is stuck with it.
QUIC is different: because it is implemented at the application layer, we can change the congestion-control algorithm at any time, and we can even use different algorithms for different streams within the same connection.
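
The sketch below shows why this is easy in user space: the congestion controller is just an object chosen when the connection (or stream) is created, with no kernel involvement. The interface and type names are illustrative and not taken from any specific QUIC library, and the two algorithms are trivial stubs rather than real CUBIC/BBR implementations.

```go
package main

import "fmt"

// CongestionController is a pluggable algorithm selected per connection or per stream.
type CongestionController interface {
	Name() string
	// CanSend reports whether more data may be sent given the bytes in flight.
	CanSend(bytesInFlight, congestionWindow int) bool
}

type cubic struct{}

func (cubic) Name() string                    { return "cubic" }
func (cubic) CanSend(inFlight, cwnd int) bool { return inFlight < cwnd }

type bbr struct{}

func (bbr) Name() string                    { return "bbr" }
func (bbr) CanSend(inFlight, cwnd int) bool { return inFlight < cwnd }

// pickController chooses an algorithm for a new connection based on the
// business scenario or the measured network conditions.
func pickController(scenario string) CongestionController {
	if scenario == "long-fat-link" {
		return bbr{}
	}
	return cubic{}
}

func main() {
	cc := pickController("long-fat-link")
	fmt.Println("this connection uses", cc.Name())
}
```
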
Second: higher precision
To guarantee reliability, TCP uses byte-based sequence numbers and ACKs to confirm that data arrives in order.
QUIC is also a reliable protocol. It replaces TCP's sequence number with a Packet Number, and Packet Numbers increase strictly monotonically: even if Packet N is lost, the retransmission no longer carries Packet Number N but some value larger than N. TCP, by contrast, retransmits a packet with the same sequence number as the original, and it is exactly this property that creates TCP's retransmission ambiguity problem.


As the left-hand flow in the figure above shows, TCP's retransmission ambiguity makes the measured RTT either too large or too small. RTT is an important input to the congestion-control algorithm, so with an inaccurate RTT the algorithm cannot be precise.
Because QUIC's Packet Numbers increase strictly, there is no retransmission ambiguity, and congestion control can be more precise.
In addition, in ordinary TCP, three duplicate ACKs trigger fast retransmission, and a retransmission timeout fires if no ACK arrives for too long. QUIC can use NACK (Negative Acknowledgement) information to tell the sender directly which packets were lost, without waiting for a timeout. TCP does have a SACK option with similar functionality; the difference is that in QUIC every retransmitted packet carries a new packet number.
However, strictly increasing Packet Numbers alone cannot guarantee the ordering and reliability of the data itself. QUIC therefore introduces the concept of a Stream Offset: a stream can be carried across multiple packets, whose Packet Numbers increase strictly and have no ordering dependency on each other, but when a packet's payload is stream data, it is the stream's offset that guarantees the ordering of the application data.

2.4 Two-level flow control

Flow control means that the receiver needs to limit the sender's sending rate so that the sender does not transmit faster than the receiver can absorb. TCP's flow control is the classic "sliding window" algorithm, but because of TCP's head-of-line blocking, once an ACK packet is lost the window of the whole connection cannot slide forward, a "zero window" soon appears, and no more data can be sent.


QUIC uses two-level flow control: both the connection and each individual stream are flow-controlled. Two-level flow control is not exclusive to QUIC; HTTP/2 also provides both stream-level and connection-level flow control.
Stream-level flow control means that the receiving end of a particular QUIC stream tells the other end how much data it may receive on that stream; it applies to a specific stream, not to the whole connection. Essentially, the receiver tells the peer the maximum offset up to which stream data may be sent. For example, suppose the receiver of stream N announces a limit at offset 200 bytes while the sender has already sent 150 bytes; the sender may then send at most 50 more bytes. Once the receiver has processed those 150 bytes, it sends another WINDOW_UPDATE raising the limit to offset 400, and since 150 bytes have already been sent, the sender may now send another 250 bytes.
Stream-level flow control does limit traffic, but it is not sufficient on its own: a sender could open many streams on the same connection and drive each of them to its maximum, which amounts to an attack. Connection-level flow control is therefore also needed.
Connection-level flow control works the same way as stream-level flow control, except that the consumed bytes and the maximum receive offset are accumulated across all streams. A minimal sketch of the accounting follows below.
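
Below is a minimal sketch of the per-stream accounting from the example above (limit at offset 200, later raised to 400). The field and method names are illustrative rather than taken from any QUIC implementation; connection-level control keeps the same kind of counters, summed across all streams.

```go
package main

import "fmt"

// streamFlowCtrl is the sender-side view of one stream's flow control.
type streamFlowCtrl struct {
	maxOffset  uint64 // highest offset the receiver currently allows
	sentOffset uint64 // bytes already sent on this stream
}

// available returns how many more bytes may be sent right now.
func (f *streamFlowCtrl) available() uint64 {
	if f.sentOffset >= f.maxOffset {
		return 0 // blocked: wait for the receiver to raise the limit
	}
	return f.maxOffset - f.sentOffset
}

func main() {
	s := streamFlowCtrl{maxOffset: 200, sentOffset: 150}
	fmt.Println("can send:", s.available(), "bytes") // 50

	// The receiver processes the data and raises the limit to offset 400.
	s.maxOffset = 400
	fmt.Println("can send:", s.available(), "bytes") // 250
}
```
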


2.5 Stream multiplexing

TCP's in-order delivery brings the head-of-line blocking problem: on a single connection, once a packet is lost, the packets behind it must wait for the lost packet to be retransmitted before complete, ordered data can be handed to the application layer. The impact is severe when there are many concurrent requests. Suppose we send several requests on one connection and one packet of Request 1 is lost: until it is successfully retransmitted, none of the subsequent data on that connection can be delivered to the application layer, even if all of Request 2's data has arrived in order. In other words, the requests interfere with one another.
QUIC, being based on UDP, does not have this problem: if a packet of Request 1 is lost, only Request 1 is affected, and the other concurrent requests are not.



2.6 QUIC/TCP racing

According to industry statistics, carriers in about 7% of regions worldwide rate-limit or block UDP, and beyond carriers, many enterprises and public venues also restrict or even disable UDP traffic. This is fatal for scenarios that rely on UDP to carry the QUIC protocol.
To deal with this, OPPO's QUIC implementation races TCP and QUIC, attempting to establish both connections at the same time. Besides racing at connection setup, it also monitors and compares the transmission latency of QUIC and TCP in real time; if a link is rate-limiting UDP, traffic can be switched dynamically from QUIC to TCP. A minimal sketch of the racing idea follows below.
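
Below is a minimal sketch of the racing idea: both handshakes are started concurrently and the first one to succeed wins. The dialQUIC function is a placeholder assumption (a real client would run the QUIC handshake there and would also keep comparing latency after setup).

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

type result struct {
	proto string
	err   error
}

// dialTCP attempts a plain TCP connection.
func dialTCP(addr string, ch chan<- result) {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err == nil {
		conn.Close() // a real client would keep and use this connection
	}
	ch <- result{"tcp", err}
}

// dialQUIC is a placeholder for the QUIC/UDP handshake attempt.
func dialQUIC(addr string, ch chan<- result) {
	ch <- result{"quic", errors.New("QUIC handshake stubbed out")}
}

func main() {
	const addr = "example.com:443" // hypothetical endpoint
	ch := make(chan result, 2)
	go dialQUIC(addr, ch)
	go dialTCP(addr, ch)

	// Use the first attempt that succeeds; report failure if both fail.
	for i := 0; i < 2; i++ {
		if r := <-ch; r.err == nil {
			fmt.Println("using", r.proto)
			return
		}
	}
	fmt.Println("both QUIC and TCP attempts failed")
}
```
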



2.7 QUIC PING frames

To probe the "liveness" of a QUIC connection in real time and avoid request failures caused by using a "dead" connection, OPPO's QUIC implements its own PING-frame mechanism. Unlike HTTP/2's PING request / PING response mechanism, the receiver of a QUIC PING frame only needs to acknowledge (ACK) the packet that contains the frame.
Once the connection is established, PING frames are sent at intervals of 5 s, 10 s, and 15 s. If three consecutive PING frames receive no ACK, the client proactively closes the connection and sends a CC (Connection Close) frame to the server, which releases the connection's resources when it receives the frame. A minimal sketch of this keep-alive loop follows below.
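
Below is a minimal sketch of that keep-alive loop under one possible reading of the 5 s / 10 s / 15 s schedule (the interval grows after each missed PING and resets after a successful one). The conn type and its methods are placeholders, not OPPO's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// conn stands in for a QUIC connection. sendPing sends a PING frame and
// reports whether the packet carrying it was ACKed in time (stubbed here).
type conn struct{}

func (c *conn) sendPing() bool { return false }

func (c *conn) close() {
	fmt.Println("send CONNECTION_CLOSE (CC) frame and release local resources")
}

// keepAlive sends PING frames and closes the connection after three
// consecutive PINGs go unacknowledged.
func keepAlive(c *conn) {
	intervals := []time.Duration{5 * time.Second, 10 * time.Second, 15 * time.Second}
	misses := 0
	for {
		time.Sleep(intervals[misses]) // 5s normally, 10s / 15s after misses
		if c.sendPing() {
			misses = 0
			continue
		}
		misses++
		if misses == len(intervals) {
			c.close() // three PINGs in a row without an ACK
			return
		}
	}
}

func main() {
	keepAlive(&conn{})
}
```
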
PART 03  Applications of QUIC at OPPO


In weak-network lab tests, QUIC with 0-RTT enabled reduces latency by 20% compared with HTTP and by more than 50% compared with HTTPS. QUIC is now live in several services, mainly the overseas app store and the Xiaobu Assistant (小布助手).


After a large-scale gray release in the overseas app store, API success rates improved by 3%~13% and the instant-open rate improved by 2%~19%.


PART 04  Afterword


Research on OPPO's QUIC protocol started in 2020, and after two years of continuous iteration and optimization its performance is now roughly on par with the QUIC implementations of major vendors such as Google, Huawei, and Tencent. Its stability has been rigorously verified through two years of gray release in services such as the overseas app store, and several services have now decided to adopt the QUIC protocol for all of their traffic.
Our QUIC team continues to work on network transmission optimization, and we hope to keep contributing to more OPPO services.

About the author

Longyan LI      Senior Engineer at OPPO

After joining OPPO in 2020, he built OPPO's QUIC protocol from zero to one and landed it in multiple services with good results. He previously worked at Huawei and has many years of experience developing network protocol stacks.

About AndesBrain

AndesBrain (安第斯智能云)
OPPO AndesBrain is an intelligent cloud for all kinds of devices, serving individuals, families, and developers, and dedicated to "making devices more intelligent". As one of OPPO's three core technologies, AndesBrain provides device-cloud collaborative data storage and intelligent computing services and is the "data and intelligence brain" of a world where everything is interconnected.

This article is shared from the WeChat public account AndesBrain (OPPO_tech).

