Linux notes - TCP states and solutions for TIME_WAIT/CLOSE_WAIT (reposted)

1. TCP handshake and teardown overview
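A rough plain-text sketch of the standard handshake and teardown, useful for following the state list in the next section (client initiates the connection and also closes it first):

Three-way handshake (connection setup):
  client -- SYN -->      server      client: SYN_SENT    server: SYN_RCVD
  client <-- SYN+ACK --  server
  client -- ACK -->      server      both sides: ESTABLISHED

Four-way teardown (client closes first):
  client -- FIN -->      server      client: FIN_WAIT_1  server: CLOSE_WAIT
  client <-- ACK --      server      client: FIN_WAIT_2
  client <-- FIN --      server      server: LAST_ACK
  client -- ACK -->      server      client: TIME_WAIT (waits 2*MSL, then CLOSED)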

 

 

2. TCP states

- CLOSED: The initial state, indicating that the TCP connection is "closed" or "not yet open".

- LISTEN: Indicates that a server-side socket is in the listening state and can accept connections from clients.

- SYN_RCVD: Indicates that the server has received the client's SYN requesting a connection. Normally this is a brief intermediate state of the server-side socket during the three-way handshake: the final ACK of the handshake has not yet arrived. When the server receives that ACK from the client, the connection moves to ESTABLISHED.

- SYN_SENT: The client-side counterpart of SYN_RCVD. When the client socket calls connect(), it sends a SYN and immediately enters SYN_SENT, waiting for the server's reply (the second message of the three-way handshake). SYN_SENT therefore simply means the client has sent its SYN.

- ESTABLISHED: Indicates that the TCP connection has been successfully established.

- FIN_WAIT_1: FIN_WAIT_1 and FIN_WAIT_2 both mean the socket is waiting for the peer's FIN. The difference: when a socket in the ESTABLISHED state actively closes the connection and sends a FIN, it enters FIN_WAIT_1; once the peer acknowledges that FIN with an ACK, it moves to FIN_WAIT_2. Since under normal conditions the peer ACKs the FIN almost immediately, FIN_WAIT_1 is rarely observed, whereas FIN_WAIT_2 can often be seen with netstat.

- FIN_WAIT_2: As explained above, a socket in FIN_WAIT_2 represents a half-closed connection: this side has closed and its FIN has been acknowledged, but the peer's FIN has not arrived. If the peer never closes its side (never completes the four-way teardown), the socket can linger in FIN_WAIT_2 for a long time. On Linux, once the local application has fully closed the socket, the kernel reclaims it after net.ipv4.tcp_fin_timeout seconds; even so, large numbers of lingering FIN_WAIT_2 sockets tie up kernel resources.

- TIME_WAIT: Indicates that the peer's FIN has been received and an ACK has been sent back. A connection in TIME_WAIT waits for 2*MSL before returning to CLOSED (MSL, Maximum Segment Lifetime, is the longest time a TCP segment may survive on the network; each TCP implementation chooses its own value: RFC 793 specifies 2 minutes, the traditional BSD implementation uses 30 seconds, and Linux hard-codes the TIME_WAIT interval to 60 seconds in the kernel; note that /proc/sys/net/ipv4/tcp_fin_timeout, despite its name, controls the FIN_WAIT_2 timeout rather than TIME_WAIT). If a socket in FIN_WAIT_1 receives a segment carrying both the FIN and ACK flags, it enters TIME_WAIT directly without passing through FIN_WAIT_2 (this is the case where the four-way teardown collapses into three segments).

- CLOSING: Rare in practice. Normally, after sending a FIN you first receive the peer's ACK (or receive it together with the peer's FIN) and only then the peer's FIN. CLOSING means that after sending a FIN, this side received the peer's FIN before receiving the ACK for its own. This happens when both sides call close() at nearly the same time, so both FINs cross on the wire; both parties are closing the connection simultaneously.

- CLOSE_WAIT: Waiting to close. When the peer close()s its socket and sends a FIN, your system replies with an ACK and the connection enters CLOSE_WAIT. What happens next is up to you: if there is no more data to send, you can close() the socket, which sends your FIN and finishes closing your half of the connection; if there is still data, the program decides whether to send it first or discard it. In short, a connection in CLOSE_WAIT is waiting for you to close it.

- LAST_ACK: The passively closing side enters LAST_ACK after sending its own FIN, while it waits for the peer's ACK. Once that ACK arrives, the socket returns to CLOSED.
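These states can be observed directly on a live system. A small sketch using iproute2's ss (state names are written in ss's lower-case form; netstat works too, as shown in the next section):

# list sockets currently stuck in CLOSE-WAIT
ss -tan state close-wait

# show TIME-WAIT sockets together with their remaining timer
ss -tano state time-wait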

3. A large number of TIME_WAIT and CLOSE_WAIT analysis on the server

#View TCP status: netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'   

TIME_WAIT 814
CLOSE_WAIT 1
FIN_WAIT1 1
ESTABLISHED 634
SYN_RECV 2
LAST_ACK 1

The commonly seen states are: ESTABLISHED means the connection is actively communicating, TIME_WAIT means this side closed the connection first (active close), CLOSE_WAIT means the peer closed first and this side has not yet closed (passive close), and LISTEN means the socket is waiting to accept client connections.
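On newer systems where netstat is no longer installed, ss produces the same kind of count; note that ss prints the state in the first column rather than the last, hence $1 instead of $NF:

ss -tan | awk 'NR>1 {++S[$1]} END {for (a in S) print a, S[a]}'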

#FAQ analysis

1. The server holds a large number of connections in the TIME_WAIT state

2. The server holds a large number of connections in the CLOSE_WAIT state

Linux limits the number of file handles each user can allocate, so connections stuck in TIME_WAIT or CLOSE_WAIT keep their handles permanently occupied ("hogging the seat without doing the work"). Once the handle limit is reached, new requests can no longer be processed, a flood of Too Many Open Files exceptions follows, and Tomcat falls over.
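To gauge how close a particular process (for example the Tomcat JVM) is to its handle limit, compare its open descriptors with the limit it runs under; <pid> below is only a placeholder for the real process id:

# number of descriptors the process currently holds open
ls /proc/<pid>/fd | wc -l

# the per-process open-file limit it was started with
grep "open files" /proc/<pid>/limits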

#############################################################################

Troubleshooting server file handle exhaustion (Too Many Open Files) in a Tomcat environment

Why does file handle exhaustion occur?

Mainly because Linux limits the number of file handles at two levels: a system-wide total limit and a per-user limit. By default each user may open 1024 file handles. In most cases 1024 is enough, but on busy systems, especially those doing heavy network communication and file I/O, 1024 handles are exhausted quickly, so the first step is to raise this value. The procedure is as follows:

1. ulimit -a shows the current user's resource limits (ulimit -n shows just the open-file limit)

2. User-level handle limit modification 

Modify /etc/security/limits.conf and add the following lines:

# use a concrete username, or * to apply the limit to all users
*        soft    nofile    65535
*        hard    nofile    65535

There are two kinds of limits: exceeding the soft limit only produces a warning, while the hard limit is an absolute ceiling beyond which allocation is rejected or fails with an error.

After the modification you may need to log in again (restart the shell) for it to take effect.
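A quick check that the new per-user limit has taken effect after logging in again (the username below is only a placeholder):

ulimit -n                        # should now print 65535
su - someuser -c 'ulimit -n'     # check on behalf of a specific (placeholder) user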

3. System-level handle limit modification

sysctl -w fs.file-max=65536  

or echo  "65536" > /proc/sys/fs/file-max  

Both achieve the same thing: the former sets the kernel parameter through sysctl, while the latter writes directly to the corresponding file exposed by the virtual file system (procfs, a pseudo file system).

The new limit can be viewed with the following command  

sysctl -a | grep fs.file-max  

or cat /proc/sys/fs/file-max  

4. Make the kernel parameter change persistent

Append the setting to /etc/sysctl.conf and reload it:

echo "fs.file-max=65536" >> /etc/sysctl.conf  

sysctl -p  

View the system-wide total limit: cat /proc/sys/fs/file-max

View the number of file handles currently in use system-wide: cat /proc/sys/fs/file-nr

See which handles a process has open: lsof -p pid

Count how many handles a process has open: lsof -p pid | wc -l

See which processes have a given directory or file open (all processes holding it are listed): lsof path/filename
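To find which processes hold the most file descriptors, one rough approach is to count the entries under each process's /proc/<pid>/fd directory (run as root so all processes are visible; counts may shift while the loop runs):

# print "fd-count pid" for every process, largest first
for p in /proc/[0-9]*; do
  echo "$(ls "$p/fd" 2>/dev/null | wc -l) ${p##*/}"
done | sort -rn | head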

How much should this value be set to?

Ordering of the open file descriptor limits:
per-user soft limit <= per-user hard limit <= kernel limit (fs.file-max) <= the limit imposed by the kernel data structures that implement file descriptors

In practice there is no single correct value; setting it excessively high wastes resources and can hurt system performance, so it should be tuned to the specific application.

Solution to the problem:

The first step, of course, is to raise the Linux handle limits to an appropriate value.

The second step is to tune the application itself. Typical cases:

1. Database connection pooling. A connection pool must be used; without one the database will be overwhelmed long before the handles run out.

2. When fetching remote resources with HttpClient, use its connection pool wherever possible so that the number of connections stays under control.

3. Connection pool settings such as connection timeout, read timeout, pool size and wait time must be set to sensible values, otherwise the pool cannot deliver its benefits.

###########################################################################################################

The fix for the TIME_WAIT side is straightforward: let the server recycle and reuse TIME_WAIT resources more quickly.

Modifications to the /etc/sysctl.conf file:

 

#For a new outgoing connection, how many times the kernel retransmits the SYN before giving up; must not exceed 255. The default is 5, which corresponds to roughly 180 seconds in total  

net.ipv4.tcp_syn_retries=2  

#net.ipv4.tcp_synack_retries=2  

#How often TCP sends keepalive probes on an idle connection when keepalive is enabled. The default is 2 hours (7200 s); here it is lowered to 1200 s (20 minutes)  

net.ipv4.tcp_keepalive_time=1200  

#How many times to probe an orphaned connection (one already closed by the local application) before the kernel gives up on it  
net.ipv4.tcp_orphan_retries=3  

#If the connection is closed by the local end first, this parameter determines how long the socket stays in FIN-WAIT-2  

net.ipv4.tcp_fin_timeout=30    

# Length of the SYN backlog queue. The default is 1024; raising it (here to 4096) lets more half-open connections wait to be accepted  

net.ipv4.tcp_max_syn_backlog = 4096  

# Enable SYN cookies: when the SYN backlog overflows, fall back to cookies, which helps mitigate small-scale SYN flood attacks. The default is 0 (disabled)  

net.ipv4.tcp_syncookies = 1  

# Enable reuse: allow sockets in TIME-WAIT to be reused for new outgoing connections (requires TCP timestamps). The default is 0 (disabled)  

net.ipv4.tcp_tw_reuse = 1  

#Enable fast recycling of TIME-WAIT sockets. The default is 0 (disabled). Caution: tcp_tw_recycle is known to break clients behind NAT and was removed from the kernel in Linux 4.12; avoid it on modern systems  

net.ipv4.tcp_tw_recycle = 1  

##Reduce the number of keepalive probes sent before the connection is declared dead   

net.ipv4.tcp_keepalive_probes=5   

##Increase the network device receive queue (backlog)   

net.core.netdev_max_backlog=3000   

After the modification, execute /sbin/sysctl -p to make the parameters take effect.

The parameters that matter most here are net.ipv4.tcp_tw_reuse, net.ipv4.tcp_tw_recycle, net.ipv4.tcp_fin_timeout and the net.ipv4.tcp_keepalive_* family.
net.ipv4.tcp_tw_reuse and net.ipv4.tcp_tw_recycle are both enabled in order to recycle resources held in the TIME_WAIT state (see the caution on tcp_tw_recycle above).
net.ipv4.tcp_fin_timeout shortens the time the server spends going from FIN-WAIT-2 to CLOSED when the peer misbehaves.
The net.ipv4.tcp_keepalive_* parameters configure how the server probes whether idle connections are still alive.
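A quick sanity check after reloading the configuration (the parameter names are the ones set above):

sysctl net.ipv4.tcp_tw_reuse net.ipv4.tcp_fin_timeout net.ipv4.tcp_keepalive_time

# and watch whether the TIME_WAIT count trends downwards
netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'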

The solution to a large number of CLOSE_WAIT connections can be summed up in one sentence: check your code. The problem lies in the server program, which is not closing its sockets after the peer has closed its end.
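To see which process is actually holding the CLOSE_WAIT sockets (and therefore which code to inspect), one way is to ask ss or netstat for the owning process (root privileges are needed to see processes owned by other users):

ss -tanp state close-wait

# or, with the older toolset:
netstat -antp | grep CLOSE_WAIT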

 
