[Network Communication and Information Security] In-Depth Analysis: How Many HTTP Requests Can One TCP Connection Send, and Related Issues

Introduction

  • There is a classic interview question: what happens between entering a URL in the browser and the page being displayed?
  • Most people who have prepared for interviews can answer it. But what if the interviewer keeps going: if the received HTML contains dozens of image tags, how are those images downloaded? In what order, over how many connections, and using which protocol?
  • To understand this, we need to answer the following five questions:
    • ① After a modern browser establishes a TCP connection with a server, does it disconnect once an HTTP request completes? Under what circumstances is the connection closed?
    • ② How many HTTP requests can one TCP connection carry?
    • ③ Can HTTP requests be sent together over one TCP connection (for example, sending three requests at once and receiving three responses at once)?
    • ④ Why is the SSL connection sometimes not re-established when the page is refreshed?
    • ⑤ Does the browser limit the number of TCP connections it establishes to the same host?

Tackling them one by one

1. After a modern browser establishes a TCP connection with a server, does it disconnect once an HTTP request completes? Under what circumstances is the connection closed?
  • In HTTP 1.0, the server closes the TCP connection after sending an HTTP response. Every request therefore has to set up and tear down its own TCP connection, which is very costly.
  • Although it was not part of the standard, some servers support the Connection: keep-alive header. It asks that the TCP connection used by an HTTP request stay open after that request completes.
  • The advantage is that the connection can be reused. Later HTTP requests do not need to establish a new TCP connection, and as long as the connection stays open, the SSL handshake overhead is avoided as well. The following two pictures show timing statistics for two visits to https://www.github.com made a short time apart.
  • On the first visit there is initial-connection and SSL overhead, as shown below:

[Figure: timing of the first visit, including initial connection and SSL time]

  • On the second visit, the initial connection and SSL overhead are gone, which shows that the same TCP connection was reused:

[Figure: timing of the second visit, with no initial connection or SSL time]

  • Persistent connections: because keeping the TCP connection open has so many benefits, HTTP 1.1 wrote the Connection header into the standard and made persistent connections the default. Unless Connection: close is declared, the browser and server keep the TCP connection open for a period of time instead of closing it when a request ends.
  • So the answer to the first question is that the TCP connection is not closed by default. Only when Connection: close is declared in the request header is the connection closed after the request completes. The server can also send Connection: close in its response, and an idle connection is eventually closed after a timeout.
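The following minimal sketch shows the difference using only Python's standard library (`http.server` and `http.client`), assuming everything runs on the local machine. The handler class `PortEchoHandler`, the function `client_ports`, and the `/img0.png`-style paths are hypothetical names chosen for illustration. The server echoes the client's source port, which identifies the TCP connection that carried each request. With the default HTTP 1.1 behaviour, every request should report the same port. When the client sends Connection: close, each request should arrive on a fresh connection.

```python
import http.client
import http.server
import threading

class PortEchoHandler(http.server.BaseHTTPRequestHandler):
    # With HTTP/1.1 the server keeps the connection open unless told otherwise
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # The client's source port tells us which TCP connection carried the request
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        if self.close_connection:
            # Echo "Connection: close" back so the client knows to reconnect
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging

def client_ports(n_requests, close_each=False):
    """Send n GET requests through one HTTPConnection object and return the
    client port the server saw for each request."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(*server.server_address)
        headers = {"Connection": "close"} if close_each else {}
        ports = []
        for i in range(n_requests):
            conn.request("GET", f"/img{i}.png", headers=headers)
            resp = conn.getresponse()
            # The body must be read fully before the connection can be reused
            ports.append(int(resp.read().decode()))
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print("keep-alive:       ", client_ports(3))
    print("Connection: close:", client_ports(3, close_each=True))
```

`http.client` reconnects automatically when a response carries Connection: close, so the loop looks the same in both modes. Only the port numbers reveal whether a new TCP connection was opened.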
2. How many HTTP requests can one TCP connection carry?

Once the first question is understood, this one is already answered: if the connection is kept alive, one TCP connection can carry multiple HTTP requests.

3. Can HTTP requests be sent together over one TCP connection (for example, sending three requests at once and receiving three responses at once)?
  • HTTP 1.1 has a limitation: a single TCP connection can only process one request at a time. In other words, within the same TCP connection, the lifetimes of any two HTTP requests, from start to finish, cannot overlap.
  • The HTTP 1.1 specification defines pipelining to try to solve this problem, but browsers turn the feature off by default.
  • Let's look at what pipelining is. RFC 2616 (section 8.1.2.2) states:
A client that supports persistent connections MAY "pipeline" its requests (i.e., send multiple requests without waiting for each response). A server MUST send its responses to those requests in the same order that the requests were received.
  • As for why the standard is written this way, one likely reason is that HTTP 1.1 is a text protocol. A response carries nothing that identifies which request it belongs to, so responses must come back in request order. For example, suppose the two requests GET /query?q=A and GET /query?q=B are sent to the server and it returns two results. The browser cannot tell from the responses alone which one answers which request.
  • The idea behind pipelining sounds good, but in practice it causes many problems:
    • Some proxy servers cannot handle HTTP pipelining correctly.
    • A correct pipelining implementation is complicated.
    • Head-of-line blocking: suppose the client sends several requests back to back over an established TCP connection. Per the standard, the server must return the results in the order the requests were received. If the server spends a long time on the first request, every later response has to wait until the first one is finished.
  • Therefore, modern browsers do not enable HTTP Pipelining by default.
  • HTTP2, however, provides multiplexing, which allows multiple HTTP requests to be completed concurrently over one TCP connection. How multiplexing is implemented is a separate question; here we only look at the effect of using HTTP2.
  • In the figure below, green is the waiting time from sending a request until its response starts to arrive, and blue is the download time of the response. All of the requests complete in parallel over the same connection:

[Figure: HTTP2 requests completing in parallel over the same connection]

  • So this question also has an answer. HTTP 1.1 has pipelining for sending multiple requests at once, but browsers disable it by default, so it can be considered unusable in practice. In HTTP2, multiplexing allows multiple HTTP requests to proceed in parallel over the same TCP connection.
  • So in the HTTP 1.1 era, how did browsers improve page-loading efficiency? Mainly in two ways:
    • Maintain the established TCP connection with the server, and process multiple requests sequentially on the same connection.
    • Establish multiple TCP connections with the server.
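The pipelining rule quoted from RFC 2616 and the head-of-line blocking it causes can be reproduced locally. The following minimal sketch writes two HTTP 1.1 requests back to back over one raw socket to Python's standard `http.server`, without waiting for the first response. The `/slow` and `/fast` paths, the 0.3 s delay, and the names `SlowFirstHandler` and `pipeline` are hypothetical choices for the demo. The response parser is deliberately simplified: it only understands responses with a Content-Length header.

```python
import http.server
import socket
import threading
import time

class SlowFirstHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        if self.path == "/slow":
            time.sleep(0.3)  # simulate a request that is expensive to process
        body = self.path.encode()  # echo the path so responses can be matched
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging

def pipeline(paths):
    """Pipeline GET requests for `paths` over one TCP connection and return
    (body, seconds_until_response_read) for each response, in arrival order."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SlowFirstHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with socket.create_connection(server.server_address) as sock:
            # Write every request back to back without waiting for any response
            raw = "".join(f"GET {p} HTTP/1.1\r\nHost: localhost\r\n\r\n" for p in paths)
            start = time.monotonic()
            sock.sendall(raw.encode())
            results = []
            with sock.makefile("rb") as f:
                for _ in paths:
                    f.readline()  # status line, e.g. "HTTP/1.1 200 OK"
                    length = 0
                    # Read headers up to the blank line, keeping Content-Length
                    while (line := f.readline()) not in (b"\r\n", b""):
                        name, _, value = line.decode().partition(":")
                        if name.strip().lower() == "content-length":
                            length = int(value)
                    body = f.read(length).decode()
                    results.append((body, time.monotonic() - start))
            return results
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    for body, t in pipeline(["/slow", "/fast"]):
        print(f"{body} arrived after {t:.2f}s")
```

The responses come back in request order. The `/fast` response cannot arrive before the slow one finishes, so on a typical run it shows up after roughly 0.3 s as well. This test server also processes the requests one at a time. A server that worked on pipelined requests in parallel would still have to hold the fast response until the slow one had been sent.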
4. Why is the SSL connection sometimes not re-established when the page is refreshed?

The answer already came up in the discussion of the first question. The browser and the server sometimes keep the TCP connection open for a while. In that case the TCP connection does not need to be re-established, and the SSL session already running on it is simply reused.

5. Does the browser limit the number of TCP connections to the same host?
  • Suppose we are still in the HTTP 1.1 era, with no multiplexing. What should the browser do when it gets a web page with dozens of images? It cannot open just one TCP connection and download them one after another, because users would have to wait far too long. But if it opened a separate TCP connection for every image, the client or the server might not be able to handle the load. With 1000 images, it cannot open 1000 TCP connections: even if your computer allowed it, the NAT device in between might not.
  • So the answer is yes. Chrome allows at most six TCP connections to the same host, and other browsers differ somewhat. See: Network Issues Guide
  • Back to the original question: if the received HTML contains dozens of image tags, how are the images downloaded, in what order, over how many connections, and with which protocol?
  • If all the images are served over HTTPS from the same domain name, the browser negotiates with the server during the TLS handshake whether HTTP2 can be used. If it can, the browser uses multiplexing on that connection. This does not guarantee that every resource under the domain is fetched over a single TCP connection, but multiplexing will very likely be used.
  • What if HTTP2 cannot be used, or HTTPS is not available? In practice, browsers implement HTTP2 only over HTTPS, so without HTTPS only HTTP 1.1 can be used.
  • In that case the browser establishes multiple TCP connections to the host, up to a maximum set by the browser. Whenever one of these connections is idle, the browser uses it to send a new request. What if every connection is busy? Then the remaining requests have to wait.
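The queueing behaviour just described, with a fixed pool of per-host connections where new requests wait for an idle one, can be illustrated with a toy simulation. This is not how any browser is implemented. The function `schedule` is hypothetical, and the model assumes that every image takes the same fixed time, that connection setup is free, and that connections do not share bandwidth. The limit of six only mirrors the Chrome figure quoted above.

```python
import heapq

def schedule(durations_ms, max_conns):
    """Toy model: requests are issued in order to one host over at most
    `max_conns` HTTP 1.1 connections, each handling one request at a time.
    Returns the time (ms) at which each request finishes."""
    free_at = [0] * max_conns  # time at which each connection becomes idle
    heapq.heapify(free_at)
    finished = []
    for d in durations_ms:
        # Take the connection that becomes idle soonest; the request waits until then
        start = heapq.heappop(free_at)
        end = start + d
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished

if __name__ == "__main__":
    images = [100] * 30  # 30 images, 100 ms each (made-up numbers)
    for conns in (1, 6, 30):
        print(f"{conns:>2} connection(s): page done at {max(schedule(images, conns))} ms")
```

Under these assumptions the script reports 3000 ms, 500 ms and 100 ms. Six connections already cut the load time by a factor of six, and the seventh image has to wait for the first connection to free up. Opening one connection per image helps further only in this idealised model, which ignores the client, server and NAT costs described above.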


Origin blog.csdn.net/Forever_wj/article/details/108985763