A Complete Guide to the TCP Short-Connection TIME_WAIT Problem

A TCP connection is the most basic concept in network programming. Depending on the usage scenario, we generally distinguish between "long connections" (persistent connections) and "short connections".
The pros and cons of each are not covered here; interested readers can consult Google. This article focuses on how to solve the TIME_WAIT problem of TCP short connections.


The biggest advantage of short connections is convenience, especially for scripting languages: since a script's process exits after execution, short connections are the natural choice.
Their biggest disadvantage, however, is that they consume a lot of system resources, such as local ports and socket handles.
The reason is simple: the TCP protocol layer has no concept of long or short connections, so every connection, long or short, goes through the same lifecycle of connection establishment -> data transfer -> connection close.


When a TCP client closes a connection normally, the socket enters the TIME_WAIT state, which typically lasts 1 to 4 minutes. For scenarios with a modest number of connections, 1 to 4 minutes is not long and does not affect the system. However,
if a large number of short connections are created in a short period (for example, within 1 second), the client machine's local ports and socket handles can be exhausted, and the system can no longer initiate new connections!


For example, suppose 1,000 short connections are established per second (very common in web scenarios, such as querying memcached on every request), and the TIME_WAIT duration is 1 minute. Then 60,000 short connections are created per minute,
and because TIME_WAIT lasts 1 minute, all of them sit in the TIME_WAIT state without being released. Linux's default local port range is net.ipv4.ip_local_port_range = 32768 61000,
which is fewer than 30,000 ports, so in this situation new connections fail because no local port is available.
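
You can observe this directly on the client machine with a couple of standard commands (output of course varies by system):

# current ephemeral port range
cat /proc/sys/net/ipv4/ip_local_port_range

# number of sockets currently in TIME_WAIT
netstat -ant | grep -c TIME_WAIT
# or, with the newer ss tool (subtract 1 for its header line)
ss -tan state time-wait | wc -l

If the TIME_WAIT count approaches the size of the port range, new outgoing connections start failing (typically with EADDRNOTAVAIL).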


This problem can be solved in the following ways (example sysctl settings for the kernel-level options are sketched after the list):
1) Switch to long connections. The cost is high: too many long connections cause server-side performance problems, and scripting languages such as PHP need extra software (such as a proxy) to maintain long connections;
2) Enlarge net.ipv4.ip_local_port_range to increase the available port range. This only alleviates the problem; it does not solve it fundamentally;
3) Set the SO_LINGER socket option in the client program;
4) Enable the tcp_tw_recycle and tcp_timestamps options on the client machine;
5) Enable the tcp_tw_reuse and tcp_timestamps options on the client machine;
6) Set tcp_max_tw_buckets to a small value on the client machine.
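
For options 2), 4), 5) and 6), the knobs are sysctl values. A sketch of what they look like (the values here are illustrative, not recommendations; apply with sysctl -w or persist them in /etc/sysctl.conf):

# 2) enlarge the local port range
net.ipv4.ip_local_port_range = 1024 65535

# 4) fast recycling of TIME_WAIT sockets (requires timestamps; known to
#    break clients behind NAT, and removed entirely in Linux 4.12+)
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_timestamps = 1

# 5) reuse TIME_WAIT sockets for new outgoing connections (also requires timestamps)
net.ipv4.tcp_tw_reuse = 1

# 6) cap the total number of TIME_WAIT sockets; sockets beyond the cap
#    are destroyed immediately instead of waiting out the timer
net.ipv4.tcp_max_tw_buckets = 5000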


In the process of solving the short-connection problem of PHP connecting to memcached, we mainly verified methods 3), 4), 5) and 6), using basic functional verification and code inspection; no performance stress testing was done.
So in real deployments you should watch the service closely, and if you see packet loss, dropped connections, or failures to connect, check whether these options are the cause.


Although information about these methods can be found via Google, most of it is generic, copied from one source to another, and of limited reference value.
While locating and handling these problems we ran into some puzzles and difficulties, and spent some time resolving them. The following is a summary of that experience.

SO_LINGER is a socket option set via the setsockopt API. It is simple to use, but its mechanism is relatively complex and its literal meaning is hard to grasp.
The clearest explanation is in "UNIX Network Programming, Volume 1" (Section 7.5); a brief excerpt follows:
The value of SO_LINGER is represented by the following data structure:

struct linger {
    int l_onoff;  /* 0 = off, nonzero = on */
    int l_linger; /* linger time (POSIX: in seconds) */
};

 

Its values and their handling are as follows:
1. If l_onoff is 0, the option is off and the value of l_linger is ignored. This is the kernel default: close returns to the caller immediately, and any unsent data is transmitted if possible;
2. If l_onoff is nonzero and l_linger is 0, TCP aborts the connection when the socket is closed: it discards any data remaining in the socket send buffer and sends an RST to the peer
   instead of the usual four-packet termination sequence, which avoids the TIME_WAIT state;
3. If l_onoff is nonzero and l_linger is nonzero, the kernel lingers for a period of time (determined by l_linger) when the socket is closed.
   If data remains in the socket send buffer, the process is put to sleep until either (a) all data has been sent and acknowledged by the peer, after which the normal termination sequence proceeds (with the descriptor reference count at 0),
   or (b) the linger time expires. In this case it is very important for the application to check the return value of close: if the timer expires before the data is sent and acknowledged, close returns EWOULDBLOCK and any data in the socket send buffer is lost.
   A successful return of close only tells us that the data (and FIN) we sent has been acknowledged by the peer's TCP; it does not tell us whether the peer application has read the data. If the socket is non-blocking, close does not wait at all.
   
The first case is no different from not setting the option at all. The second case can be used to avoid the TIME_WAIT state; however, when testing on Linux we did not observe an RST being sent; the normal four-step shutdown was performed instead.
Our preliminary inference is that the RST is sent only when data is discarded; if no data is discarded, the normal shutdown sequence is followed.
Looking at the Linux source code, there is indeed such a comment and code:

=====linux-2.6.37 net/ipv4/tcp.c:1915=====
/* As outlined in RFC 2525, section 2.17, we send a RST here because
 * data was lost. To witness the awful effects of the old behavior of
 * always doing a FIN, run an older 2.1.x kernel or 2.0.x, start a bulk
 * GET in an FTP client, suspend the process, wait for the client to
 * advertise a zero window, then kill -9 the FTP client, wheee...
 * Note: timeout is always zero in such a case.
 */
if (data_was_unread) {
    /* Unread data was tossed, zap the connection. */
    NET_INC_STATS_USER(sock_net(sk), LINUX_MIB_TCPABORTONCLOSE);
    tcp_set_state(sk, TCP_CLOSE);
    tcp_send_active_reset(sk, sk->sk_allocation);
}

In addition, this option is in principle somewhat dangerous, since it may cause data loss; use it with care. We did not observe such a phenomenon while testing libmemcached,
which is probably related to libmemcached's communication protocol; it may also be that our load was not high enough to trigger it.


The third case is a compromise between the first two, and it has no effect when the socket is non-blocking.
For dealing with the large numbers of TIME_WAIT connections caused by short connections, I personally think the second behavior is the best choice, and it is what libmemcached adopts.
In our tests, after this option is enabled, the number of TIME_WAIT connections drops to 0, regardless of the network setup (for example, whether a virtual machine is involved).
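
For reference, here is a minimal C sketch of enabling that second behavior in a client (error handling trimmed for brevity; the memcached address and port are just an illustration):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    /* l_onoff = 1, l_linger = 0: per UNP, close() aborts the connection
     * (RST instead of the normal FIN sequence), so this side does not
     * enter TIME_WAIT. */
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0) {
        perror("setsockopt(SO_LINGER)");
        close(fd);
        return 1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(11211);            /* e.g. a local memcached */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        perror("connect");

    /* ... send request / read reply here ... */

    close(fd); /* aborting close: no TIME_WAIT on this side */
    return 0;
}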
