HTTP: HTTP/1.0 / HTTP/1.1 (pipelining) / HTTP/2 (multiplexing) / HTTP/3 (QUIC)

Foreword

Compared with HTTP/1.x, HTTP/2 greatly improves web page performance: simply upgrading to the protocol removes much of the optimization work that used to be necessary. Of course, compatibility concerns and the question of how to degrade gracefully are among the reasons it is still not universally deployed.

Although HTTP/2 improves the performance of web pages, it is not perfect. HTTP/3 was introduced to solve some of the problems remaining in HTTP/2.

1. HTTP protocol

HTTP (HyperText Transfer Protocol) is the most widely used network protocol on the Internet; all WWW documents must comply with this standard. HTTP/1.0 arrived together with the birth of computer networks and browsers. HTTP sits in the application layer and runs on top of TCP, so the bottlenecks of HTTP, and the techniques for optimizing it, stem from characteristics of TCP itself: the 3-way handshake to establish a connection, the 4-way handshake to close it, and the RTT delay that each new connection incurs.

2. Defects of HTTP/1.x

Connections cannot be reused: every request then has to go through a three-way handshake and slow start. The three-way handshake hurts most in high-latency scenarios, while slow start hurts most for large numbers of small-file requests (the request finishes before the congestion window reaches its maximum).

  • When HTTP/1.0 transmits data, the connection must be re-established for every request, which adds latency.
    HTTP/1.1 lets some connections be reused via keep-alive, but multiple connections are still needed in cases such as domain sharding, which consumes resources and puts performance pressure on the server.

  • Head-of-line blocking (HOLB): bandwidth cannot be fully utilized, and subsequent requests are blocked. HOLB means that a series of packets is held up because the first packet is blocked; when a page requests many resources and the maximum number of concurrent requests is reached, the remaining resources must wait for in-flight requests to complete before they can be issued.

  • HTTP 1.0: the next request can only be sent after the previous one returns; request-response pairs occur strictly in order. Obviously, if one request takes a long time to return, all subsequent requests are blocked.

  • HTTP 1.1: tries to solve this with pipelining, i.e. the browser can send multiple requests at once (same domain name, same TCP connection). But pipelining requires responses to come back in order, so if an earlier request is slow (say, processing a large image), later requests that the server has already finished must still wait their turn before being returned. Pipelining therefore only partially solves HOLB.
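The ordering constraint above can be illustrated with a toy simulation (hypothetical timings, not real pipelining code): even when later responses finish quickly on the server, they cannot be delivered before an earlier slow one.

```python
# Toy model of HTTP/1.1 pipelining: responses finish at different times on
# the server, but must be delivered to the client in request order.
def delivery_times(finish_times):
    delivered, ready = [], 0
    for t in finish_times:
        ready = max(ready, t)  # cannot deliver before all earlier responses
        delivered.append(ready)
    return delivered

# request 1 is slow (10 time units), requests 2 and 3 finish at times 2 and 3,
# yet every response is delayed until time 10 by head-of-line blocking
print(delivery_times([10, 2, 3]))  # [10, 10, 10]
```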

[Figure: waterfall of page requests; the ones circled in red are stalled after the per-domain connection limit is reached]

As shown in the figure above, the requests circled in red were stalled for a while because the number of connections to that domain had reached the browser's limit.

  • High protocol overhead: HTTP/1.x headers are large and sent as text, which raises transmission cost, and the headers change little from request to request; on mobile in particular this needlessly inflates user traffic.
  • No built-in security: HTTP/1.x transmits everything in plain text, and neither the client nor the server can verify the identity of the other party, so data security cannot be guaranteed.

3. SPDY protocol

Because of HTTP/1.x's problems, we introduced sprite sheets, inlined small images, sharded across multiple domains, and so on to improve performance. But these optimizations work around the protocol rather than fixing it. In 2009, Google released its self-developed SPDY protocol, aimed squarely at the inefficiency of HTTP/1.1; SPDY was the first serious attempt to transform the HTTP protocol itself. Its practice of reducing latency, compressing headers, and so on proved the value of these optimizations and ultimately led to the birth of HTTP/2.

After SPDY proved feasible in the Chrome browser, it was taken as the basis of HTTP/2, and its main features were inherited by HTTP/2.

4. Introduction to HTTP/2

In 2015, HTTP/2 was released. HTTP/2 is a replacement for the current HTTP protocol (HTTP/1.x), but not a rewrite: HTTP methods, status codes, and semantics are the same as in HTTP/1.x. HTTP/2 is based on SPDY/3 and focuses on performance; one of its biggest goals is to use only one connection between the user and the website.

HTTP/2 consists of two specifications (Specification):

Hypertext Transfer Protocol version 2 - RFC7540
HPACK - Header Compression for HTTP/2 - RFC7541

5. New features of HTTP/2

1. Binary transfer

HTTP/2 transmits data in a binary format instead of the text format of HTTP 1.x, and the binary protocol is more efficient to parse. HTTP/1 request and response messages are composed of a start line, a header and an entity body (optional), and each part is separated by a text line break. HTTP/2 splits request and response data into smaller frames, and they are encoded in binary.

Next we introduce several important concepts:

  • Stream: a virtual bidirectional channel within a connection; each stream has a unique integer identifier (1, 2, ... N).
  • Message: a logical HTTP message, such as a request or a response, consisting of one or more frames.
  • Frame: the smallest unit of HTTP/2 communication. Each frame contains a frame header which, at minimum, identifies the stream the frame belongs to, and carries a specific type of data, such as HTTP headers or a payload.

[Figure: streams, messages, and frames within a single HTTP/2 connection]

In HTTP/2, all communication under the same domain name is completed on a single connection, which can carry any number of bidirectional data streams. Each stream of data is sent as a message, which in turn consists of one or more frames. Multiple frames can be sent out of order, and can be reassembled according to the stream identifier in the frame header.
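As a concrete illustration, the fixed 9-octet HTTP/2 frame header (RFC 7540 §4.1) can be decoded in a few lines of Python. This is only a sketch of the header layout, not a full frame parser:

```python
def parse_frame_header(data: bytes):
    """Decode a 9-octet HTTP/2 frame header (RFC 7540, section 4.1)."""
    length = int.from_bytes(data[0:3], "big")         # 24-bit payload length
    frame_type = data[3]                              # e.g. 0x0 DATA, 0x1 HEADERS
    flags = data[4]                                   # e.g. 0x1 END_STREAM
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF  # drop reserved bit
    return length, frame_type, flags, stream_id

# A DATA frame (type 0x0) with END_STREAM set, an 8-byte payload, on stream 3:
header = b"\x00\x00\x08\x00\x01\x00\x00\x00\x03"
print(parse_frame_header(header))  # (8, 0, 1, 3)
```

The stream identifier in the last four octets is exactly what lets the receiver reassemble interleaved frames into the right streams.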

2. Multiplexing

Multiplexing technology was introduced in HTTP/2. Multiplexing solves the problem that browsers limit the number of requests under the same domain name, and it also makes it easier to achieve full-speed transmission. After all, opening a new TCP connection requires slowly increasing the transmission speed.

In HTTP/2, thanks to binary framing, multiplexing no longer depends on opening multiple TCP connections. In HTTP/2:

All communication under the same domain name is done over a single connection, which can carry any number of bidirectional data streams.
Each data stream is sent as a message; a message consists of one or more frames, and frames may be sent out of order, because they can be reassembled using the stream identifier in the frame header.

This feature greatly improves performance:

  • The same domain name only needs to occupy one TCP connection, and use one connection to send multiple requests and responses in parallel, eliminating the delay and memory consumption caused by multiple TCP connections.
  • Multiple requests are sent in parallel and interleaved without affecting each other.
  • Send multiple responses interleaved in parallel without interfering with each other.
  • In HTTP/2, each stream can be assigned a priority: in RFC 7540 a stream declares a dependency on another stream plus a weight between 1 and 256 (early drafts used a single 31-bit priority value, with 0 highest). With this information the client and server can apply different strategies to different streams and send streams, messages, and frames in the optimal order.

[Figure: multiple requests and responses multiplexed over one TCP connection]

As shown in the figure above, multiplexing technology can transmit all request data through only one TCP connection.

3. Header compression

In HTTP/1, we transmit the header in the form of text. If the header carries a cookie, hundreds to thousands of bytes may need to be repeatedly transmitted each time.

In order to reduce resource consumption and improve performance, HTTP/2 adopts a compression strategy for these headers:

  • HTTP/2 maintains "header tables" on both the client and the server to track and store previously sent header key-value pairs; identical data is no longer retransmitted with every request and response;
  • The header table persists for the lifetime of the HTTP/2 connection and is updated incrementally by both client and server;
  • Each new header key-value pair is either appended to the end of the table or replaces a previous value in the table.

For example, in the two requests in the figure below, the first request sends all header fields, while the second only needs to send the fields that differ; this removes redundant data and reduces overhead.
[Figure: second request sends only the header fields that differ from the first]
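The idea of the shared header table can be sketched in a few lines of Python. This is a simplified illustration only; real HPACK (RFC 7541) also has a static table, size-bounded eviction, and Huffman coding of literals:

```python
class HeaderTable:
    """Toy indexed header table, conceptually shared by sender and receiver."""

    def __init__(self):
        self.entries = []   # dynamic table of (name, value) pairs
        self.index = {}     # (name, value) -> table position

    def encode(self, headers):
        out = []
        for pair in headers:
            if pair in self.index:
                out.append(("indexed", self.index[pair]))  # a small integer only
            else:
                out.append(("literal", pair))              # full name and value
                self.index[pair] = len(self.entries)
                self.entries.append(pair)
        return out

t = HeaderTable()
first = t.encode([(":method", "GET"), (":path", "/index.html"), ("cookie", "abc")])
second = t.encode([(":method", "GET"), (":path", "/style.css"), ("cookie", "abc")])
# first request: all literals; second request: only the changed :path is literal
print([kind for kind, _ in second])  # ['indexed', 'literal', 'indexed']
```

Repeated fields like a large cookie shrink to a single index on every request after the first, which is exactly the saving the figure illustrates.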

4. Server Push

Server Push means that the server can push the content required by the client in advance, also called "cache push".

It is easy to predict that the client will request certain resources; server push lets the server send those resources ahead of time, reducing latency. (Where the browser supports it, prefetch can achieve something similar.)
For example, the server can proactively push JS and CSS files to the client instead of waiting for the client to discover them while parsing the HTML.

The server may push proactively, but the client retains the right to refuse: if a pushed resource is already in the browser cache, the browser can reject it by sending a RST_STREAM frame. Push also obeys the same-origin policy; the server cannot push arbitrary third-party resources to the client.
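On servers that support HTTP/2 push, this can be configured declaratively. The snippet below is a hypothetical nginx example (the file names are made up, and the `http2_push` directive is only available in certain nginx versions):

```nginx
server {
    listen 443 ssl http2;

    location = /index.html {
        # when index.html is requested, proactively push its assets
        http2_push /style.css;
        http2_push /app.js;
    }
}
```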

6. New features of HTTP/3

1. Introduction to HTTP/3

Although HTTP/2 solves many problems of previous versions, it still has a huge problem, which is mainly caused by the TCP protocol supported by the underlying layer.

As mentioned above, HTTP/2 uses multiplexing. Generally speaking, only one TCP connection needs to be used under the same domain name. But when packet loss occurs in this connection, it will cause the performance of HTTP/2 to be worse than HTTP/1.

Because in the case of packet loss, the entire TCP will start to wait for retransmission, which will cause all subsequent data to be blocked. But for HTTP/1.1, multiple TCP connections can be opened, and this situation will only affect one of the connections, and the remaining TCP connections can still transmit data normally.

Some people might then consider modifying the TCP protocol itself, but that is by now practically impossible: TCP has existed for so long, and is baked into so many devices, with the protocol implemented by the operating system, that updating it is simply not realistic.

For this reason, Google set out to build the UDP-based QUIC protocol, and HTTP/3 uses it. HTTP/3 was previously named HTTP-over-QUIC; from that name alone you can see that HTTP/3's biggest change is the adoption of QUIC.

Although QUIC is based on UDP, it has added many new functions on the original basis. Next, we will focus on introducing several new functions of QUIC.

2. New features of QUIC

0-RTT (in the TLS encryption scenario)
Using a technique similar to TCP Fast Open, QUIC caches the context of the current session; the next time the session is resumed, the client only needs to present the cached context to the server for validation and can then transmit data immediately. 0-RTT connection establishment is arguably QUIC's biggest performance advantage over HTTP/2. So what is a 0-RTT connection?


0-RTT applies to HTTPS (encrypted) scenarios. It does not simply count the round trips of the TCP three-way handshake; it means that the combined TCP+TLS handshake takes fewer round trips than under HTTP/2.


There are two aspects here:
the transport layer can establish a connection in 0 RTT;
the encryption layer can establish an encrypted connection in 0 RTT.
[Figure: handshake round trips of HTTPS versus QUIC]

The left side of the figure above shows a full HTTPS handshake, which needs 3 RTTs; even with session resumption, at least 2 RTTs are required.

And QUIC? Because it is built on UDP and implements a 0-RTT secure handshake, in most cases data transfer can begin after 0 RTTs. And on top of forward secrecy, QUIC's 0-RTT success rate is much higher than the hit rate of TLS session resumption.

For details, refer to
QUIC's congestion control and 0-RTT connection establishment
How does QUIC achieve 0RTT

Multiplexing

Although HTTP/2 supports multiplexing, TCP itself has no such concept. QUIC implements multiplexing natively: each individual data stream is delivered in order without affecting other streams, which solves the TCP problem described above.

Like HTTP/2, QUIC can create multiple streams on the same connection to send multiple HTTP requests. But because QUIC runs over UDP, the streams on a connection have no dependency on each other. If, as in the figure below, stream2 loses a UDP packet, stream3 and stream4 are unaffected; there is no TCP-level head-of-line blocking. stream2's packet must be retransmitted, but stream3's and stream4's packets can reach the user without waiting.

[Figure: a lost packet on stream2 does not block stream3 or stream4]

In addition, QUIC performs better than TCP on mobile. TCP identifies a connection by IP address and port, which is fragile in an ever-changing mobile network environment; QUIC instead identifies a connection by a connection ID, so however the network changes, as long as the ID stays the same the connection can resume quickly.

Encrypted and authenticated messages

TCP headers are neither encrypted nor authenticated, so in transit they are easily tampered with, injected into, or eavesdropped on by intermediate network devices, for example by modifying sequence numbers or the sliding window. Such behavior may be performance optimization, or it may be an active attack.

But QUIC's packets are, so to speak, armed to the teeth: except for a few packets such as PUBLIC_RESET and CHLO, all packet headers are authenticated and all packet bodies are encrypted.

In this way, as long as there is any modification to the QUIC message, the receiving end can find it in time, effectively reducing the security risk.

[Figure: QUIC Stream Frame; the header (red) is authenticated, the body (green) is encrypted]

As shown in the figure above, the red part is the packet header of the Stream Frame, which has authentication. The green part is the content of the message, all of which are encrypted.

Forward Error Correction Mechanism

QUIC has a distinctive feature called forward error correction (FEC). Besides its own content, each packet carries some data from other packets, so a small amount of loss can be repaired directly from the redundant data in other packets, without retransmission. FEC sacrifices some of each packet's payload capacity, but it avoids the retransmissions that loss would otherwise trigger, and retransmission costs far more time (detecting the loss, requesting retransmission, waiting for the new packet, and so on).

Say I want to send three packets. The protocol computes the XOR of the three and sends it as a separate parity packet, four packets in total. If one of the non-parity packets is lost, its content can be recomputed from the other three. Of course, this only works when a single packet is lost; if several are lost, error correction cannot help and retransmission is the only option.
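The XOR scheme described above is easy to sketch in Python (a simplified model assuming all packets have equal length):

```python
def xor_parity(packets):
    """Byte-wise XOR of equal-length packets; doubles as the recovery step."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

p1, p2, p3 = b"GET ", b"/a.j", b"s HT"      # three data packets
fec = xor_parity([p1, p2, p3])              # fourth packet: the XOR parity

# suppose p2 is lost in transit: XOR the survivors with the parity packet
recovered = xor_parity([p1, p3, fec])
print(recovered == p2)  # True
```

Because XOR is its own inverse, the same function both builds the parity packet and recovers a single lost packet; losing two or more packets leaves the system underdetermined, matching the limitation noted above.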

7. Summary

HTTP/1.x has multiple defects: connections cannot be reused, head-of-line blocking, high protocol overhead, and no built-in security.
HTTP/2 greatly improves performance through multiplexing, binary framing, header compression, and other techniques, but TCP-level head-of-line blocking remains a problem.
QUIC, the transport underpinning HTTP/3, is implemented on top of UDP and takes the best ideas of TCP to build a fast and reliable protocol.



Origin blog.csdn.net/weixin_44477424/article/details/132029535