STGW: The Road to Large-Scale Operation of QUIC, the Next-Generation Internet Standard Transport Protocol

Author: wentaomao, backend development engineer, Tencent TEG

Preface

As the next-generation Internet standard transport protocol, QUIC can significantly speed up business access, raise the success rate of requests on weak networks, and smooth the user experience in network-change scenarios.

As a company-level layer-7 access gateway and the underlying support framework of Tencent Cloud CLB (load balancer), STGW serves trillions of requests every day for the company's internal businesses and Tencent Cloud's external customers, and therefore has very strict requirements on transmission efficiency and operational reliability.

This article introduces some of our experience and development work in operating QUIC at scale on STGW.

Introduction to QUIC

The birth and development of QUIC

Before the birth of QUIC, the HTTP protocol went through several important upgrades:

HTTP1.0 -> HTTP1.1: Added long-lived (keep-alive) connections, which greatly improves performance in persistent-connection scenarios.

HTTP -> HTTPS: Added security features, at some cost to request performance.

HTTP1.1 -> HTTP2: Its main features are multiplexing and header compression, which improve the concurrency of a single connection.

These important changes all revolved around security and performance and played a major role in the application and development of the HTTP protocol. However, none of them escaped the limitations of kernel TCP, which became a bottleneck for further protocol evolution.

After leading the industry from HTTP1.1 to HTTP2 (the standardized version of Google's SPDY protocol), Google took the lead once again: in 2012 it proposed the experimental QUIC protocol, reconstructing TLS and TCP over UDP for the first time. QUIC is not applied only to HTTP; beyond HTTP, it is designed as a general transport-layer protocol. On the security side, Google designed the QUIC crypto protocol as the handshake protocol to solve QUIC's security problems. Roughly speaking, QUIC crypto handshake + QUIC transport layer + HTTP2 is what we usually call GQUIC (referring here to the web part). The GQUIC protocol version has kept evolving, and starting from Q46, GQUIC has been converging toward IETF QUIC and HTTP3.

In 2015, the QUIC network draft was formally submitted to the Internet Engineering Task Force (IETF), which meant a new QUIC protocol standard would be born. While the standard QUIC protocol was being drafted, the standard HTTP protocol running on top of QUIC was also drafted, as HTTP3. For QUIC's standard handshake protocol, the IETF adopted TLS1.3. TLS1.3 + QUIC + HTTP3 is what we usually call IETF QUIC (referring here to the web part). As of this writing, the QUIC standard draft has been updated to version 34 and no formal RFC has been published, but QUIC has entered IETF last call, and the standard QUIC/HTTP3 protocols are expected to be finalized soon.

Key features of QUIC

There are many articles on how QUIC works, so here we only list its important features. These features are the key to QUIC's wide applicability; different businesses can also use them to make optimizations tailored to their own characteristics. They are likewise the entry points from which we provide QUIC services.

  1. Low connection-establishment delay: QUIC is based on UDP and needs no TCP handshake. In the best case, QUIC can start data transmission with 0RTT on a fresh connection, while TCP-based HTTPS still needs 1RTT to start data transmission even in the best case with TLS1.3 early data. For the full TLS1.2 handshake that is still common online today, 3RTT is needed before data transmission can begin. For RTT-sensitive services, QUIC effectively reduces connection-establishment delay.

  2. Customizable congestion control: QUIC's transmission control no longer relies on the kernel's congestion control algorithm but is implemented at the application layer, which means we can implement and configure different congestion control algorithms and parameters for different business scenarios. The BBR congestion control algorithm proposed by Google takes a completely different approach from CUBIC: on weak networks and under a certain amount of packet loss, BBR is less sensitive to loss than CUBIC and performs better. With QUIC we can freely choose the congestion control algorithm and parameters per business, and even different connections of the same business can use different congestion control algorithms.

  3. No head-of-line blocking: Although HTTP2 implements multiplexing, it runs on byte-stream-oriented TCP, so once a packet is lost, all multiplexed request streams are affected. QUIC is based on UDP and solves the head-of-line blocking problem by design. In addition, the IETF designed QPACK encoding to replace HPACK, which also alleviates head-of-line blocking in HTTP request headers to a certain extent. The absence of head-of-line blocking makes QUIC much stronger than TCP on weak networks and under a certain amount of packet loss.

  4. Connection migration: When the user's address changes, for example when switching from Wi-Fi to 4G, a TCP-based HTTP connection cannot be kept alive. QUIC uniquely identifies a connection by its connection ID, so when the source address changes, QUIC can still keep the connection alive and send and receive data normally.
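To make the first feature concrete, here is a back-of-envelope comparison of connection-setup delay. The 100 ms round-trip time is an assumed mobile-network value for illustration, not a measurement from this article:

```python
def setup_delay_ms(rtt_ms, handshake_rtts):
    """Time spent on handshakes before the first byte of request data."""
    return rtt_ms * handshake_rtts

RTT = 100.0  # assumed round-trip time on a mobile network, in ms

# TCP+TLS1.2 full handshake: 1 RTT (TCP) + 2 RTT (TLS) = 3 RTT
# TCP+TLS1.3 early data:     1 RTT (TCP) at best
# QUIC with cached state:    0 RTT
for name, rtts in [("HTTPS, TLS1.2 full handshake", 3),
                   ("HTTPS, TLS1.3 early data", 1),
                   ("QUIC 0RTT", 0)]:
    print(f"{name}: {setup_delay_ms(RTT, rtts):.0f} ms before data flows")
```

On a 100 ms link, the gap between a full TLS1.2 handshake and QUIC 0RTT is thus 300 ms of pure waiting before the request is even sent.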

Choice of QUIC protocol stack

As for protocol implementations, the STGW and CDN business teams have implemented a complete HTTP2 protocol on LEGO (a high-performance forwarding framework developed by STGW and CDN). STGW was also the first in the industry to implement an asynchronous TLS proxy-computation solution, so we have extensive engineering and optimization experience with HTTP1.1/2 and TLS. A self-developed QUIC protocol stack is under way as planned, but is not yet mature.

This article therefore gives a brief introduction to our deep customization and optimization of an open-source QUIC protocol stack.

As for QUIC protocol stack implementations, there are currently many of them, but few offer complete protocol support. NGINX officially ships an experimental version, but many problems in that implementation remain unsolved; it only supports the latest IETF draft, and does not even implement a complete congestion control algorithm. Cloudflare's QUIC implementation is based on Rust, and public data suggests its performance is not strong.

Many other implementations such as MSQUIC, NGHTTP3, etc. only support IETF QUIC, not GQUIC.

Google is the pioneer of the QUIC protocol. Its Chromium-based QUIC protocol stack is the earliest, the most complete, and the most standards-compliant implementation.

Whichever QUIC protocol stack we pick, integrating it requires a deep understanding of QUIC's basic features and concepts: connections, streams, QUIC connection IDs, QUIC timers, the unified scheduler, and so on. These concepts are closely tied to the content of the QUIC protocol.

The following takes Chromium QUIC as an example of integrating the QUIC protocol stack with the high-performance forwarding frameworks NGINX and LEGO:

CHROMIUM QUIC accesses high-performance framework NGINX/LEGO

STGW's work

As a company-level layer-7 access gateway and the underlying support framework of Tencent Cloud CLB (load balancer), STGW serves trillions of requests for the company's internal businesses and Tencent Cloud's external customers, and has very strict requirements on request-processing performance, transmission efficiency, and operational reliability.

To this end, we have done a lot of optimization and deep customization of the QUIC protocol stack to meet large-scale operations and business needs. The main tasks are:

  1. Single machine and transmission performance optimization

    1. QUIC protocol single-machine performance/cost optimization: QUIC moves the protocol stack to the application layer, and judging from current public implementations, its performance is much worse than TCP's. Optimizing QUIC performance is an important part of promoting the protocol at scale.

    2. Integration with high-performance forwarding frameworks: current open-source QUIC protocol stacks only provide single-core support and only ship simple demos. For large-scale application, we need to connect the QUIC protocol stack to the high-performance network forwarding frameworks we use, such as NGINX and LEGO.

    3. Transmission performance and customization of congestion control: Different services can be allowed to choose different congestion control algorithms according to service characteristics.

    4. Achieving a high 0RTT ratio safely, to reduce service connection-establishment delay.

  2. Customization and enhancement of features

    1. Taking QUIC connection migration from theory to practice: the connection ID is a defining feature of the QUIC protocol, but in real deployments connection migration is not easy. Every path a QUIC packet can take must be considered; even on the same machine, the packet must be forwarded to the correct core.

    2. QUIC private-protocol support: QUIC is not only for HTTP. As a general transport-layer protocol, besides GQUIC and IETF HTTP3, QUIC carrying private protocols also needs to be offered to users.

    3. QUIC customized SDK: In addition to the high-performance QUIC server, to use QUIC, you need the support of the client SDK. For this we have also developed the QUIC SDK and customized it for different scenarios.

    4. Meeting various customized business needs: for example, some businesses need QUIC plaintext transmission, and some need QUIC on the origin-pull (back-to-origin) side.

  3. High availability operation

    1. Daily changes and smooth upgrades: When frequent configuration changes and module upgrades occur, we need to ensure that the QUIC connection is not damaged.

    2. Packet-capture analysis tools: making analysis and troubleshooting easier.

    3. Statistical monitoring: QUIC's key statistical indicators need to be visualized.

We have carried out QUIC related work around these issues, and strive to achieve the best of QUIC features, QUIC operations, QUIC performance, and QUIC customization requirements.

QUIC processing performance optimization

The QUIC protocol is based on UDP and moves TCP's features from the kernel to the application layer. Judging from the various current QUIC implementations, performance is much worse than TCP's. TCP's long and broad deployment has earned it many optimizations from the protocol stack down to the NIC; by comparison, UDP has received far fewer. Beyond the kernel and hardware, QUIC performance also depends on the implementation, and different implementations can vary greatly.

We conducted a detailed analysis of the performance of QUIC using various tools such as flame graphs, and found some key points that affect the performance of QUIC:

  1. The overhead of cryptographic algorithms: For small packets, RSA accounts for a high proportion of calculations, and for large packets, symmetric encryption and decryption will also account for about 15%.

  2. The overhead of UDP packet sending and receiving: especially for large file downloads, sendmsg accounts for a high proportion, reaching 35%-40% or more.

  3. Protocol stack overhead: mainly implementation-dependent, such as ACK processing, MTU probing and packet sizing, memory management and copying, and so on.

We made optimizations targeting each of these key points.

QUIC's RSA hardware OFFLOAD

In the small-file request scenario, RSA computation in a QUIC request, just as in HTTPS, still consumes the most CPU. RSA can be offloaded to hardware for HTTPS requests, and it can likewise be offloaded during the QUIC handshake.

A very important reason to offload RSA to hardware is that CPUs perform poorly at RSA, while dedicated crypto accelerator cards are very strong at it. Generally speaking, a single RSA accelerator card costs about 5%-7% of a server but delivers roughly 2-3 times a server's RSA signing performance, so inserting two cards into one machine brings about a 5x improvement in RSA performance.

The way to offload QUIC's RSA computation to hardware differs across QUIC protocol stacks. Here we introduce a general scheme: RSA HOOK + Async Job. Its advantage is that it is minimally invasive; there is no need to modify much of the QUIC protocol stack or OpenSSL code.

OpenSSL supports Async Jobs since version 1.1.0. The Async Job mechanism uses a coroutine-like context switch to implement asynchronous tasks, and provides a notification mechanism to report the results of an asynchronous task.

The two important functions in Async Job are:

  async_fibre_makecontext
  async_fibre_swapcontext

They use setjmp/longjmp and makecontext/setcontext calls to save and switch the current execution context, providing state retention and the ability to jump between code paths.

We use an RSA callback to intercept the RSA operation in the handshake, and inside the RSA HOOK function we request the RSA operation from an accelerator card, locally or remotely. Meanwhile, the Async Job mechanism turns this synchronous path asynchronous, achieving asynchronous offload of the RSA operation.
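The pause/resume flow of the Async Job mechanism can be sketched, purely as an analogy, with a Python generator standing in for the C fibre: the job yields where ASYNC_pause_job() would suspend it while the accelerator card works, and is resumed with the result. All names here (`submit_to_accelerator` and so on) are illustrative, not part of OpenSSL:

```python
def submit_to_accelerator(message: str) -> int:
    """Stand-in for handing an RSA operation to an accelerator card."""
    return sum(message.encode()) & 0xFFFF  # fake request id

def rsa_offload_job(message: str):
    """Sketch of an async job: pause after submitting, resume with result."""
    request_id = submit_to_accelerator(message)
    signature = yield request_id        # ~ ASYNC_pause_job(): give up control
    return f"signed:{signature}"        # job finishes once resumed

# Event-loop side: start the job, let the "card" work, then resume it.
job = rsa_offload_job("client-hello")
req = next(job)                         # runs until the job pauses
result_from_card = f"sig-{req}"         # pretend the card completed req
try:
    job.send(result_from_card)          # ~ calling ASYNC_start_job() again
except StopIteration as done:
    print(done.value)                   # the finished job's return value
```

The crucial property, in OpenSSL as in this sketch, is that the worker thread is free to process other connections between the pause and the resume.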

Using RSA HOOK + Async Job to offload RSA under QUIC TLS1.3

After offloading RSA to hardware, QUIC performance improves greatly for small-packet, short-connection workloads.

Take Chromium QUIC as an example: in the 1RTT scenario, with RSA offload QUIC reaches 256% of its original performance; in the 0RTT scenario, 205%. The improvement is even more pronounced in QUIC protocol stacks with lower baseline overhead.

GSO optimization of QUIC packet sending

In large file downloads, QUIC's packet-sending logic accounts for a large share of CPU, usually 35%-40% or more. Optimizing the sending path can therefore improve large-file transfer performance.

Flame graph of a QUIC large-file request

GSO (Generic Segmentation Offload) has supported UDP since kernel 4.18. The idea is to let a large buffer pass through the IP layer and link layer as a single unit, and only segment it right before it leaves the protocol stack and enters the NIC driver, for TCP and UDP alike. Each segment carries its own TCP/UDP header, so losing one segment does not require resending the whole buffer, and the per-packet CPU cost along the path is reduced.

If the NIC hardware supports UDP segmentation, the segmentation work can be pushed down to the NIC (GSO hardware offload), saving even more CPU.

Schematic diagram of GSO principle

QUIC implementations usually segment data at the application layer before sending, precisely so that IP-layer MTU fragmentation never happens. For large transfers, QUIC caps each packet at about 1400 bytes and sends it out via sendmsg, which performs very poorly when sending large files. If instead we pass a large, unsegmented buffer to sendmsg and let kernel GSO segment it late, the CPU cost along the path is reduced.
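The syscall arithmetic behind this is easy to sketch; the 10 MB transfer size and the 20-segment batch are assumed values for illustration:

```python
import math

MTU_PAYLOAD = 1400   # QUIC keeps each packet around this size

def sendmsg_calls(total_bytes: int, segments_per_send: int) -> int:
    """Number of sendmsg() syscalls needed for a transfer, given how many
    MTU-sized segments each call carries (1 = no GSO batching)."""
    packets = math.ceil(total_bytes / MTU_PAYLOAD)
    return math.ceil(packets / segments_per_send)

transfer = 10 * 1024 * 1024   # a 10 MB download
print("plain sendmsg:    ", sendmsg_calls(transfer, 1), "syscalls")
print("GSO, 20 segs/send:", sendmsg_calls(transfer, 20), "syscalls")
```

On Linux 4.18+, the batching itself is requested per send with the UDP_SEGMENT socket option (or the equivalent cmsg), which tells the kernel the segment size to cut the large buffer into.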

Throughput when using GSO to send packets of different sizes

As the table shows, batching 20 consecutive packets into a single GSO sendmsg improves performance by 2-3 times compared with sending 1400-byte packets one at a time.

In a real QUIC workload, packet sending is not the whole of QUIC's logic, and not every send has 20 consecutive packets available. In our QUIC stress tests on large files with GSO, throughput increased by about 15%-20% at the same CPU usage.

Optimization of QUIC protocol stack

The performance of the QUIC protocol stack is related to the implementation of the QUIC protocol stack. For some common protocol stack implementations, the optimization space mainly includes:

  1. Some implementations such as Chromium perform an extra RSA computation on 0RTT and 1RTT requests; this redundant computation can be removed. After optimization, 0RTT and 1RTT requests perform 0 and 1 RSA computations respectively.

  2. The server receives and processes a large number of ACKs during large file downloads. ACKs need not be received and processed one at a time; all ACKs in a round can be parsed first and then processed together.

  3. Keep each sent packet as close to the MTU as possible; the QUIC protocol itself provides MTU probing for this.

  4. Minimize the memory copy of the protocol stack.

The following figure shows the performance comparison before and after optimization of the protocol stack in a small file 0RTT request scenario:

Before optimization
After optimization

Summary of QUIC performance optimization

At present we have seamlessly connected the QUIC protocol stack to the high-performance forwarding modules NGINX and LEGO. For small-packet requests, QUIC can basically reach more than 90% of HTTPS performance, and with an accelerator card for RSA offload, QUIC even outperforms native HTTPS. For large-packet requests, optimized QUIC reaches about 70% of HTTPS in CPU terms, but on most machine models large-file requests hit the NIC bottleneck first anyway. Overall, QUIC performance is no longer a major obstacle to large-scale deployment, though room for optimization remains and we continue to work on it.

0RTT optimization of QUIC

The figure below compares an HTTPS request with a QUIC request. An HTTPS request with a full handshake has gone through 3 RTTs by the time the HTTP request is actually sent, while a QUIC request can reach the point of sending the HTTP request with 0RTT of setup cost.

Why can QUIC achieve 0RTT? The answer differs between the GQUIC crypto handshake and IETF QUIC's TLS1.3. Let's take the GQUIC handshake as an example, as shown below:

In its first QUIC packet, the client sends a client hello without a server config. The server replies with a REJ packet containing the server config and other information. The client then resends the client hello carrying the server config, the server returns a server hello, and at that point the handshake is established.

The QUIC crypto handshake performs key exchange using a Diffie-Hellman (DH) style asymmetric algorithm. The server config contains the server's algorithm choices, public key, and other information; the client combines the server's public key with its own key pair to compute the negotiated symmetric key for the connection.

Therefore, a client with no saved server config needs 1RTT to complete its first QUIC request. In subsequent requests, the client can directly present the previously obtained server config and complete a 0RTT request.
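The key agreement can be sketched with a toy Diffie-Hellman exchange. The tiny modulus here is for readability only; GQUIC actually uses Curve25519:

```python
# Toy DH exchange; real GQUIC uses Curve25519, not small integers.
P, G = 23, 5                            # public modulus and generator

server_priv = 6
server_pub = pow(G, server_priv, P)     # published inside the server config

client_priv = 15
client_pub = pow(G, client_priv, P)     # sent by the client in its hello

# Each side combines its own private key with the peer's public key:
key_on_client = pow(server_pub, client_priv, P)
key_on_server = pow(client_pub, server_priv, P)
assert key_on_client == key_on_server   # same symmetric key, never transmitted
print("negotiated key material:", key_on_client)
```

Because the server config (and thus server_pub) is cacheable, a returning client can run its half of this computation immediately and encrypt 0RTT data without waiting for the server.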

The key question, then, is how to raise the 0RTT ratio. A typical scenario: the server config a user obtains in its first 1RTT request should remain usable in subsequent requests, no matter which layer-7 STGW server those requests are routed to. We have tried several solutions for this, mainly:

1) At layer 4, use session persistence to forward the same IP to the same layer-7 STGW server as much as possible. The drawbacks: (1) the user's IP may change, and (2) layer-4 IP-based session persistence conflicts with connection-ID-based session persistence, so while the 0RTT ratio improves, the connection migration feature may become unavailable.

2) Similar to the distributed session cache for HTTPS, share the server config within a cluster through a remote module. This requires introducing a new module and adds some latency.

3) Similar to session tickets, support distributed, stateless server config generation. In practice, multiple sets of SCFG can be generated from the date and other parameters, which further improves availability and security.
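A minimal sketch of option 3), with hypothetical names: derive the server config identifier deterministically from a cluster-wide secret plus the date, so any STGW server can validate a client's cached config without shared session state:

```python
import hashlib
import hmac

MASTER_SECRET = b"cluster-wide-secret"   # assumed: distributed to all servers

def scfg_id(date: str, params: str = "v1") -> bytes:
    """Deterministic server-config ID from date + parameters. Any server
    holding MASTER_SECRET regenerates the same ID, so the client's cached
    server config is honored regardless of which server it reaches."""
    msg = f"{date}/{params}".encode()
    return hmac.new(MASTER_SECRET, msg, hashlib.sha256).digest()[:16]

# Accepting the current and previous day's SCFG eases rotation:
valid = {scfg_id("2021-04-01"), scfg_id("2021-03-31")}
print(scfg_id("2021-04-01") in valid)
```

Rotating by date bounds the lifetime of any one SCFG, which is where the availability and security improvement mentioned above comes from.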

We have put a lot of work into 0RTT optimization; for some non-sensitive data transmission we can achieve a 100% 0RTT ratio.

QUIC connection migration implementation

QUIC connection migration is a very important feature of the QUIC protocol. QUIC uses the connection ID to uniquely identify a connection in order to cope with sudden changes in the user's network; the typical case is switching between 4G and Wi-Fi. After the switch, the user's address changes and the original client fd is no longer usable. At that point, the QUIC SDK only needs to recreate a new fd on the client and continue sending on the previous connection, and packets carrying the same connection ID go out as before.

A user's QUIC packets may traverse many hops before reaching the actual business server. Consider the typical path of a Tencent business using the gateway:

After crossing the public network, a QUIC request first reaches the layer-4 TGW cluster, which forwards it to a server in the layer-7 STGW cluster. On that STGW server, the packet reaches a specific worker process for handling. After terminating the QUIC protocol, the worker forwards the request over TCP or UDP to an RS of the specific business. If the user's source address suddenly changes mid-connection, how do we keep responding correctly and keep this QUIC connection alive? Another scenario: the source address has not changed, but the layer-7 STGW server needs a configuration change or upgrade; can the QUIC connection survive?

Layer 4 session persistence based on QUIC connection ID

When the user's network address changes, the source address changes but the QUIC connection ID can stay the same. After crossing the intermediate network, the packet first reaches the TGW cluster, so to keep the QUIC connection alive across address changes, the TGW cluster must forward it correctly.

The TGW cluster's session persistence for QUIC must account for the differences between GQUIC and IETF QUIC. For GQUIC (below Q043) the implementation is relatively simple, because the connection ID is generated by the client and stays unchanged for the whole connection. For IETF QUIC, the connection ID is negotiated between client and server, and different cases such as long-header and short-header packets must be handled. The figures below show the IETF connection ID negotiation process and the parsing of the different GQUIC and IETF QUIC packet types.

Negotiation process of connection ID under IETF
GQUIC package
IETF different types of packages

At present the TGW cluster supports QUIC session persistence. The basic approach is to synchronize QUIC connection information among the TGW servers in a cluster, distinguish the different QUIC protocol variants, and forward packets carrying the same QUIC connection ID to the same layer-7 STGW server.

Layer-7 single-machine multi-core connection migration

When a packet reaches the STGW server, because the server forwards across multiple cores, the QUIC packet must be delivered to the same process (or thread) for handling. Layer-7 network frameworks today generally use a multi-core + REUSEPORT model for high-performance forwarding: for the QUIC service, the worker processes all listen on the same UDP port via REUSEPORT. The Linux kernel hashes on the 4-tuple by default, so natively, packets whose source address has changed are not guaranteed to reach the same process.

Kernel 4.19 introduced the eBPF hook BPF_PROG_TYPE_SK_REUSEPORT, which lets a program decide which socket's receive queue a packet is delivered to. This makes it possible to use eBPF to handle QUIC connection migration caused by source-address changes: at the eBPF REUSEPORT hook, parse the QUIC packet and its connection ID, and steer the packet to the corresponding worker based on that connection ID.
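Conceptually, the eBPF program does the following (shown in Python for readability; the real program is C compiled to eBPF, and the 8-byte server CID length is an assumption of this sketch):

```python
SERVER_CID_LEN = 8   # assumed fixed length of server-issued connection IDs

def dest_cid(packet: bytes) -> bytes:
    """Extract the Destination Connection ID from an IETF QUIC packet."""
    if packet[0] & 0x80:                    # long header
        dcid_len = packet[5]                # 1 length byte after 4-byte version
        return packet[6:6 + dcid_len]
    return packet[1:1 + SERVER_CID_LEN]     # short header: CID after 1st byte

def pick_worker(packet: bytes, n_workers: int) -> int:
    """Steer packets with the same connection ID to the same worker."""
    return sum(dest_cid(packet)) % n_workers   # toy hash for illustration

# Fabricated packets: a long header with a 4-byte DCID, and a short header.
long_hdr = bytes([0xC3, 0, 0, 0, 1, 4]) + b"ABCD" + bytes([0])
short_hdr = bytes([0x41]) + b"ABCDEFGH" + b"...payload..."
print(pick_worker(long_hdr, 8), pick_worker(short_hdr, 8))
```

Because the worker index depends only on the DCID, a packet from a brand-new source address still lands on the worker that owns the connection.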

Configuration loading and hot upgrade connection retention

Another typical scenario for keeping QUIC connections alive is configuration loading and hot upgrades. When the STGW server process reloads its configuration or upgrades a module, native NGINX can keep TCP connections uninterrupted, but for UDP-based QUIC, without extra work we cannot keep packets flowing normally across configuration changes and module upgrades.

Take NGINX configuration changes and hot upgrades as an example. On a reload or hot upgrade, a new set of workers is spawned while the old workers enter the shutting-down state, still holding the old connections' state. Old and new workers share the same set of fds. If an old worker stops listening on an fd, its old connections time out; if it keeps listening, old and new workers both read from the same fd, so any packet, whether for an old or a new connection, may land on any worker, affecting both.

As a platform, STGW handles a large number of configuration changes every day; some clusters change configuration every few seconds. Although we implement dynamic configuration loading and most scenarios do not require a reload, a few still do, and hot upgrades are also fairly common. If a reload or module upgrade caused existing QUIC connections to time out or break, the business impact would be significant.

NGINX configuration changes and worker packages received during hot upgrades

So, how to solve this problem?

The kernel eBPF scheme handles the 4G/Wi-Fi switching scenario well, but it is hard to apply to STGW configuration changes and module upgrades.

For this, STGW uses a shared-memory-based QUIC connection migration solution: shared memory manages the connection information of all processes, and each worker has a packet queue through which other processes forward packets belonging to migrated connections.
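A much-simplified simulation of that idea (the real implementation keeps the ownership table and the queues in shared memory, in C; the names here are illustrative):

```python
from collections import defaultdict, deque

conn_owner = {}                        # connection ID -> owning worker
worker_queues = defaultdict(deque)     # per-worker hand-off packet queues

def on_packet(receiving_worker: int, cid: bytes, packet: bytes) -> str:
    owner = conn_owner.setdefault(cid, receiving_worker)
    if owner == receiving_worker:
        return f"worker {owner} handles {packet!r}"
    # Delivered to the wrong worker (after a reload/upgrade or an address
    # change): forward via the owner's queue instead of breaking the connection.
    worker_queues[owner].append(packet)
    return f"queued for worker {owner}"

print(on_packet(2, b"cid-1", b"initial"))        # worker 2 becomes the owner
print(on_packet(5, b"cid-1", b"post-migration")) # worker 5 hands it back to 2
```

Because ownership lives outside any single process, the mapping survives a reload: new workers consult the same table the old workers populated.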

STGW now fully supports QUIC connection migration when switching between 4G and Wi-Fi, and for large-scale online operation, continuous configuration changes and module upgrades no longer affect QUIC connection persistence.

STGW connection migration and connection retention scheme based on shared memory

Application scenarios for connection migration

Any scenario where reconnection is expensive is a candidate for QUIC connection migration.

Games, video, and business signaling channels are typical examples. When a user switches from Wi-Fi to 4G, the original TCP scheme re-establishes the connection after the network switch, and the business usually performs some initialization after reconnecting, costing several or even a dozen RTTs; the user sees the application freeze or spin. With QUIC connection migration, the connection is correctly migrated and kept alive across the Wi-Fi/4G switch, no new connection is needed, and the business feels much smoother during network switches.

Flexible congestion algorithm and TCP redefinition

The goals of TCP congestion control can be summarized simply: make full use of network bandwidth, reduce network delay, and optimize the user experience. For now, though, achieving these goals involves inevitable trade-offs. Linux congestion control has gone through many iterations, with CUBIC as the mainstream algorithm; since kernel 4.9, Linux has also shipped Google's BBR as an alternative.

Compared with earlier congestion control algorithms, BBR is a very significant change. BBR continuously estimates the bottleneck bandwidth and the minimum RTT to determine the pacing rate and the congestion window (cwnd). BBR mainly focuses on:

1) Fully utilize bandwidth on network links with a certain packet loss rate.

2) Reduce the buffer occupancy rate on the network link, thereby reducing delay.

BBR completely abandons packet loss as a direct feedback signal for congestion control, which makes it insensitive to loss. In our tests on simulated networks with a given loss probability, BBR outperforms CUBIC for QUIC large-file downloads.
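A worked example of the two quantities BBR estimates and what it derives from them (the bandwidth, RTT, and the steady-state cwnd gain of 2 are illustrative values):

```python
# Illustrative numbers, not measurements.
btl_bw = 12.5e6    # estimated bottleneck bandwidth: 12.5 MB/s (~100 Mbit/s)
min_rtt = 0.040    # estimated minimum round-trip time: 40 ms

bdp = btl_bw * min_rtt        # bandwidth-delay product: bytes to keep in flight
pacing_rate = 1.0 * btl_bw    # steady-state pacing gain of 1.0
cwnd = 2 * bdp                # BBR caps inflight at cwnd_gain * BDP

print(f"BDP = {bdp/1e3:.0f} kB, cwnd = {cwnd/1e3:.0f} kB, "
      f"pacing = {pacing_rate/1e6:.1f} MB/s")
```

Because none of these inputs is a loss signal, a stray lost packet does not collapse the sending rate the way it does for loss-based algorithms such as CUBIC.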

QUIC puts congestion control in the application layer, which lets us flexibly choose among congestion control algorithms. We currently support CUBIC, BBR, and other common algorithms on QUIC, with per-service configuration: different VIPs can use different congestion control algorithms depending on the business, and for the same business we can even select the algorithm dynamically per user based on RTT. We also work closely with the CDN congestion-control team to optimize the algorithms' business experience in different scenarios.

Service configurable congestion control algorithm

Beyond congestion control, QUIC allows transport-layer customization for specific application scenarios. Bringing TCP's features into the application layer gives us many more options at the transport layer. For example, borrowing a common practice from the audio/video field, redundant data can be sent alongside the original data; under a certain amount of packet loss, the QUIC transport layer can then recover the data automatically without waiting for retransmission, reducing the audio/video stall rate. This is hard to do on top of TCP. If a business needs to redefine some TCP functions or behaviors to improve its experience, QUIC offers plenty of room.
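A minimal sketch of the redundancy idea using a single XOR parity packet per group. GQUIC once carried a similar XOR-based FEC; this sketch is illustrative, not STGW's actual scheme, and assumes equal-length packets:

```python
def xor_parity(packets):
    """Byte-wise XOR of a group of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, byte in enumerate(pkt):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(received, parity):
    """XOR of parity and the surviving packets reproduces the lost one."""
    return xor_parity(list(received) + [parity])

group = [b"pkt-0000", b"pkt-0001", b"pkt-0002"]
parity = xor_parity(group)              # sent as one extra redundant packet

# pkt-0001 is lost in transit; the receiver rebuilds it, no retransmission:
rebuilt = recover_missing([group[0], group[2]], parity)
print(rebuilt == group[1])
```

One parity packet per group recovers any single loss in that group, trading a little extra bandwidth for avoiding a full retransmission round trip.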

Support QUIC private protocol

STGW serves as a 7-layer gateway and provides general WEB protocol offloading and forwarding. Therefore, WEB protocols that support QUIC such as GQUIC and HTTP/3 are our basic capabilities.

However, as mentioned earlier, QUIC as a general transport-layer protocol is not limited to the web; any proprietary protocol can be moved onto QUIC. After the QUIC handshake, the client can send GQUIC, HTTP/3, or other web requests as its business requires, or send any private protocol of its own.

STGW has deeply reworked NGINX's STREAM module so that any private protocol can run over QUIC, which greatly broadens QUIC's application scenarios.
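What "any private protocol over QUIC" means in practice: a QUIC stream already provides ordered, reliable bytes, so a private protocol only needs to define its own framing on top. The sketch below uses a simple 4-byte length prefix; this is an illustrative assumption, not STGW's actual wire format.

```python
import struct


def encode_frame(payload):
    """Prefix the payload with its big-endian length so the peer can split frames."""
    return struct.pack("!I", len(payload)) + payload


def decode_frames(buffer):
    """Split a stream buffer back into complete payloads."""
    frames, offset = [], 0
    while offset + 4 <= len(buffer):
        (length,) = struct.unpack_from("!I", buffer, offset)
        if offset + 4 + length > len(buffer):
            break  # incomplete frame: wait for more stream data
        frames.append(buffer[offset + 4 : offset + 4 + length])
        offset += 4 + length
    return frames


# Two private-protocol messages concatenated, as they would arrive
# on a QUIC stream; the receiver recovers the message boundaries:
stream = encode_frame(b"login") + encode_frame(b"heartbeat")
print(decode_frames(stream))  # [b'login', b'heartbeat']
```

Everything below the framing layer (loss recovery, encryption, connection migration) is inherited from QUIC for free, which is the appeal over building the same protocol on raw UDP.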

So don't assume QUIC is unusable without HTTP. If, after understanding QUIC's features, you believe they can optimize your business experience, give QUIC a try.

QUIC customized SDK

Because QUIC has not yet been finalized as a standard, the barrier to adopting it remains relatively high.

On the client side, since Google open-sourced its browser as the Chromium project, Chromium's network stack, Cronet, has become the industry's main reference for QUIC clients. However, Cronet is not well suited for direct use in our mobile clients: its API surface is limited, its code is complex, and it is hard to adapt to individual needs. Moreover, QUIC is still evolving rapidly; both server and client must iterate quickly and track the protocol closely.

Given these pain points and QUIC's growing popularity, we provide the TQUIC SDK, which is more customizable than Google's QUIC stack. Compared with Cronet, TQUIC SDK is lighter, simpler to use, and supports private protocols, connection migration, and more. It is already in use by multiple businesses within the company.

Summary

This article has introduced the optimizations and results of STGW's large-scale application of the QUIC protocol. Currently:

  1. We deeply integrated the QUIC protocol stack with our high-performance network framework, supporting most QUIC functions and features, including QUIC web protocols, QUIC private protocols, and out-of-band congestion control configuration, meeting the needs of large-scale QUIC deployment and operation.

  2. We extensively optimized the QUIC protocol stack for 0-RTT, 1-RTT, small-packet, high-bandwidth, and other scenarios, resolving several bottlenecks where QUIC consumed excessive CPU. Performance on small-packet requests now reaches roughly 90% of HTTPS.

  3. For RTT-sensitive short-connection services, we greatly increased the 0-RTT ratio; in some scenarios 100% 0-RTT is achieved.

  4. A more comprehensive connection migration scheme solves migration problems across layer-4, layer-7, multi-cluster, multi-machine, multi-process, process-restart, reload, and module-upgrade scenarios.

  5. We provide a customized QUIC SDK for clients, supporting QUIC's various customizable features.

Many QUIC features remain to be fully explored. QUIC itself, built on UDP, avoids transport-layer head-of-line blocking, and HTTP/3's QPACK encoding mitigates head-of-line blocking in HTTP headers. These features can improve performance and reduce stall rates in weak-network environments, especially in multiplexing scenarios. We are accumulating more real-world data together with businesses and hope to optimize further in this area.

QUIC and the related HTTP/3 protocol are about to be finalized as standards, and we continue to track the protocol's evolution closely.

STGW will continue to provide unified QUIC access and optimization for in-house businesses and Tencent Cloud CLB customers, helping them further improve user experience.



Origin blog.csdn.net/Tencent_TEG/article/details/113532786