HTTP long and short connections

 

1. What is a long connection

HTTP/1.1 keeps connections open by default (an HTTP persistent connection, called a long connection here). After a data transfer completes, the TCP connection stays open (no RST packet, no four-way FIN teardown) and waits to carry further requests for the same domain name. The opposite is a short connection, which is closed after a single request and response.
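The reuse of one TCP connection can be observed locally. The following is a minimal sketch using only Python's standard library; the `Handler` class and the `fetch_twice` helper are names invented for this illustration. It starts a throwaway HTTP/1.1 server on 127.0.0.1, sends two GET requests through one `http.client.HTTPConnection`, and records the client's local port after each request. The same port both times suggests that both requests travelled over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 lets http.server keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # A persistent connection needs a body length so the client
        # knows where this response ends.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_twice():
    """Send two GETs over one connection; return the local port used each time."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(
        "127.0.0.1", server.server_address[1], timeout=5
    )
    ports = []
    try:
        for path in ("/a", "/b"):
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # drain the body before reusing the connection
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports
```

Calling `fetch_twice()` returns a list of two port numbers. If the server instead answered with `Connection: close`, the client would have to open a new connection for the second request.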

The Connection: Keep-Alive header is an experimental extension introduced by HTTP/1.0 browsers and servers. RFC 2616, which defines HTTP/1.1, does not specify it as an HTTP/1.1 feature, because persistence is already the default and the header is not needed. In practice, however, browsers still send it. If an HTTP/1.1 request should not use a long connection, add Connection: close to the request header.

"HTTP: The Definitive Guide" notes that some old HTTP/1.0 proxies do not understand Keep-Alive, which breaks long connections. Consider client -> proxy -> server: the client sends Keep-Alive, the proxy does not understand it and forwards the message unchanged, the server answers with Keep-Alive, and the proxy forwards that response to the client. Both the client -> proxy connection and the proxy -> server connection therefore stay open. When the client sends its second request, however, the proxy does not expect another request on that connection and ignores it, so the long connection fails. The book's remedy: when the HTTP version is 1.0, Keep-Alive is ignored, and the client knows it should not use a long connection.

In practice there is seldom a need to worry about this. The proxy is often under our own control, for example Nginx, which has its own long-connection handling, so the server needs no workaround. A common setup is HTTP/1.1 with long connections between the client and the Nginx proxy, and HTTP/1.0 with short connections between Nginx and the backend server.
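The defaults described above can be condensed into a small decision function. This is a sketch of the rule only, not the code of any particular server; the function name `is_persistent` is made up for the example.

```python
from typing import Optional


def is_persistent(http_version: str, connection_header: Optional[str]) -> bool:
    """Decide whether a message asks for a long (persistent) connection.

    HTTP/1.1: persistent unless "Connection: close" is present.
    HTTP/1.0: persistent only if "Connection: keep-alive" is present.
    """
    # The Connection header is a comma-separated, case-insensitive token list.
    tokens = set()
    if connection_header:
        tokens = {t.strip().lower() for t in connection_header.split(",")}

    if "close" in tokens:
        return False
    if http_version == "HTTP/1.1":
        return True
    return "keep-alive" in tokens
```

For example, `is_persistent("HTTP/1.1", None)` is `True`, while `is_persistent("HTTP/1.0", None)` is `False`.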

In actual use, a Keep-Alive value in the HTTP header does not guarantee that a long connection is used. Both the client and the server may ignore it, that is, not follow the standard. An HTTP client I wrote myself for multi-threaded file downloads is an example: concurrent or consecutive GET requests each go over their own TCP connection, each connection carries exactly one GET, and TCP closes with the four-way teardown as soon as the GET completes, which makes the code easier to write. Although the HTTP header contains Connection: Keep-Alive, this cannot be called a long connection. Normally both browsers and web servers do implement the standard, because they serve many files, often small ones, and keeping connections open avoids the overhead of repeatedly opening TCP connections.

The uploads and downloads I previously implemented with libcurl used short connections. A packet capture shows: 1. each TCP connection carries only one POST; 2. after the data transfer, the four-way teardown packets appear. As long as curl_easy_cleanup is not called, the curl handle may remain valid and reusable. I say "may" because a connection has two ends: if the server closes it, the client cannot keep a long connection alive on its own.

 

With the Windows WinHTTP library, after a POST or GET the TCP connection is not closed immediately even though I close the handle; it lingers for a while. The lower layer of WinHTTP supports the functionality that Keep-Alive requires: even without a Keep-Alive header, WinHTTP may reuse the TCP connection, whereas other network libraries such as libcurl do not behave this way. I have previously observed that WinHTTP does not close TCP connections promptly.

2. Expiration time of a long connection

A client cannot hold a long connection open indefinitely; there is a timeout. The server sometimes tells the client what the timeout is, for example:

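A response header of Keep-Alive: timeout=20 means that this TCP connection can be kept for 20 seconds. There may also be max=XXX, meaning the long connection is closed after serving at most XXX requests. If the server does not tell the client the timeout, that is fine too: the server may initiate the four-way teardown itself, and the client then knows that the TCP connection is no longer valid. TCP also has keepalive probes to detect whether a connection is still alive. There are many ways to avoid wasting resources.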
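To make the header format concrete, here is a small sketch that extracts the `timeout` and `max` parameters from a Keep-Alive header value. The helper name `parse_keep_alive` is hypothetical, and the parser only handles the simple `name=number` form shown above.

```python
def parse_keep_alive(value: str) -> dict:
    """Parse a value such as "timeout=20, max=100" into {"timeout": 20, "max": 100}."""
    params = {}
    for part in value.split(","):
        name, sep, raw = part.strip().partition("=")
        raw = raw.strip()
        # Keep only well-formed "name=<digits>" pairs; skip anything else.
        if sep and raw.isdigit():
            params[name.strip().lower()] = int(raw)
    return params
```

A client could use the parsed `timeout` to decide when to stop reusing an idle connection, and `max` to know how many more requests the connection will accept.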

3. Recognizing the end of a transfer on a long connection

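With a long connection, how do the client and server know that the current transfer has finished? There are two cases: 1. check whether the amount of data received has reached the size given by Content-Length; 2. dynamically generated content has no Content-Length and is sent with chunked transfer encoding, so the end has to be determined from the chunked encoding itself: chunked data ends with an empty (zero-length) chunk, which marks the end of the transfer. See the referenced article for a more detailed introduction.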
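Both cases can be sketched in a few lines. The function below is an illustration under simplifying assumptions, not a complete HTTP parser: `read_body` is a made-up name, `headers` is assumed to be a dict with lowercase keys, and `stream` is any binary file-like object positioned just after the header block.

```python
import io


def read_body(stream, headers: dict) -> bytes:
    """Read exactly one message body, leaving the stream at the next message."""
    if headers.get("transfer-encoding", "").lower() == "chunked":
        body = b""
        while True:
            # Each chunk starts with its size in hexadecimal, optionally
            # followed by ";extension", then CRLF.
            size_line = stream.readline()
            size = int(size_line.split(b";")[0].strip(), 16)
            if size == 0:
                # Zero-length chunk: skip optional trailers up to the blank line.
                while stream.readline() not in (b"\r\n", b"\n", b""):
                    pass
                return body
            body += stream.read(size)
            stream.readline()  # consume the CRLF that follows the chunk data
    # Otherwise rely on Content-Length (treated as 0 if absent in this sketch).
    return stream.read(int(headers.get("content-length", 0)))
```

Because the function stops exactly at the end of the body, whatever follows in the stream belongs to the next response on the same connection.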

4. Limits on the number of concurrent connections

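Web development requires attention to the number of concurrent browser connections. The RFC says a client should hold at most two connections to a server, but whether servers and individual clients follow this is up to them. Some servers allow only one TCP connection at a time, which defeats a client's multi-threaded download (opening several TCP connections to the server to pull data in parallel); other servers impose no limit. Browsers are better behaved: an analysis on Zhihu shows that they limit the number of concurrent TCP connections opened to the same domain name for downloading resources. The concurrency limit is also related to long connections: when a page is opened, the downloads of many resources may be placed on just a few TCP connections, which is TCP connection reuse (long connections). With few concurrent connections, downloading all resources on a page takes longer (the user feels the page is slow to open); with many, the server may see higher peaks in resource consumption. Browsers limit concurrent connections only per domain name, which means web developers can place resources under different domain names, and on different machines as well, to work around the limit.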

5. Easily confused concepts: TCP keepalive and HTTP Keep-Alive

TCP keepalive checks whether the current TCP connection is still alive; HTTP Keep-Alive asks for a TCP connection to live longer. They are concepts at different layers.

How TCP keepalive behaves:

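When a connection has carried no data for "a period of time", one side sends a probe (a keepalive packet). If the peer replies, the connection is still valid and monitoring continues.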

This "period of time" is configurable.

WinHTTP setting (note that this option applies to WebSocket connections):

WINHTTP_OPTION_WEB_SOCKET_KEEPALIVE_INTERVAL

Sets the interval, in milliseconds, to send a keep-alive packet over the connection. The default interval is 30000 (30 seconds). The minimum interval is 15000 (15 seconds). Using WinHttpSetOption to set a value lower than 15000 will return with ERROR_INVALID_PARAMETER.

libcurl settings:

http://curl.haxx.se/libcurl/c/curl_easy_setopt.html

CURLOPT_TCP_KEEPALIVE

Pass a long. If set to 1, TCP keepalive probes will be sent. The delay and frequency of these probes can be controlled by the CURLOPT_TCP_KEEPIDLE and CURLOPT_TCP_KEEPINTVL options, provided the operating system supports them. Set to 0 (default behavior) to disable keepalive probes (Added in 7.25.0).

CURLOPT_TCP_KEEPIDLE

Pass a long. Sets the delay, in seconds, that the operating system will wait while the connection is idle before sending keepalive probes. Not all operating systems support this option. (Added in 7.25.0)

CURLOPT_TCP_KEEPINTVL

Pass a long. Sets the interval, in seconds, that the operating system will wait between sending keepalive probes. Not all operating systems support this option. (Added in 7.25.0)

CURLOPT_TCP_KEEPIDLE is how long the connection must be idle before a keepalive probe is sent; CURLOPT_TCP_KEEPINTVL is the interval between successive probes.
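The same two knobs exist at the socket level. The sketch below uses Python's `socket` module rather than libcurl; `enable_tcp_keepalive` is a hypothetical helper, and the default values of 60 and 10 seconds are arbitrary. `TCP_KEEPIDLE` and `TCP_KEEPINTVL` are not available on every operating system, so the code sets them only where the `socket` module exposes them.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle: int = 60, interval: int = 10) -> None:
    """Turn on TCP keepalive probes for an existing socket."""
    # Counterpart of CURLOPT_TCP_KEEPALIVE = 1.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Counterpart of CURLOPT_TCP_KEEPIDLE: idle seconds before the first probe.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Counterpart of CURLOPT_TCP_KEEPINTVL: seconds between probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
```

Typical use would be to call `enable_tcp_keepalive(sock)` right after creating the socket and before or after `connect`.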

A packet capture taken while opening a web page shows a keepalive probe being sent and the connection being closed:

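In that capture, after about 44 seconds the client sends a keepalive probe, the server replies promptly, and the TCP connection is kept. Once the connection has been idle for 60 seconds, the server sends a FIN packet and closes the connection.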

6. HTTP pipelining

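One benefit of HTTP long connections (HTTP persistent connections) is that HTTP pipelining becomes possible. Pipelining means that within one TCP connection several HTTP requests can be outstanding at once: the next request is sent before the response to the previous one has completed. According to Wikipedia, the technique is not widely used. It requires support from both client and server. Some browsers fully support it, while server-side support only requires returning responses in the order the requests arrived (that is, requests and responses are handled FIFO). Wikipedia also points out that a server which correctly handles requests from a pipelining client counts as supporting HTTP pipelining.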

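Because the server must return responses in the same order as the client's requests (FIFO), head-of-line blocking occurs easily: a slow response to the first request delays every request behind it. For this reason pipelining does not improve performance much (Wikipedia notes that HTTP 2.0 addresses this problem). In addition, pipelined requests must use idempotent HTTP methods, because the client cannot tell how far processing has progressed, and a retry could have unpredictable results. POST is not idempotent: the same message may behave differently on the server the first and second time it is posted.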
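Head-of-line blocking can be illustrated with a toy model. The sketch assumes, purely for illustration, that the server starts working on all pipelined requests at time 0, that each takes a fixed time to process, and that sending a response takes no time; `pipelined_delivery_times` is a made-up name.

```python
def pipelined_delivery_times(processing_times):
    """Return when each response can be delivered under FIFO ordering.

    A response cannot leave before its own processing finishes, and it
    cannot overtake the response to any earlier request.
    """
    delivered = []
    previous = 0
    for finish in processing_times:
        previous = max(previous, finish)
        delivered.append(previous)
    return delivered
```

With processing times `[5, 1, 1]`, the result is `[5, 5, 5]`: the two fast responses are ready after 1 time unit but have to wait behind the slow first one. With `[1, 1, 5]` the result is `[1, 1, 5]`, and nothing is held up.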

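The Wikipedia article on HTTP persistent connections relates HTTP/1.1 pipelining to the RFC guideline of at most two connections per user: if pipelining is implemented well, additional connections do not improve performance. I agree: once concurrency is achieved within a single connection, multiple connections are unnecessary, unless the bottleneck is a per-connection resource limit that forces opening more connections to compete for resources.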

Browsers currently pay little attention to this technique, since the performance gain is limited.

7. Learning materials

 

http://www.techug.com/http-persistent-connection
