Illustrated TCP: high-frequency interview questions on the three-way handshake and four-way teardown (2023 edition)

Hello everyone! I recently reorganized this set of interview questions on the TCP three-way handshake and four-way teardown (2023 edition).

-----

No matter how many times TCP tortures me, I still treat TCP like my first love.

It's a huge, huge, huge outline. Let's get started!



Basic understanding of TCP

#What does the TCP header format look like?

Let's first take a look at the format of the TCP header. The fields most relevant to this article are described below; the other fields will not be covered in detail.

[Figure: TCP header format]

Sequence number : a random value generated when the connection is established, used as the initial value; it is carried to the receiving host in the SYN packet. Each time data is sent, the sequence number is incremented by the number of data bytes sent. It is used to solve the problem of out-of-order network packets.

Acknowledgment number : the sequence number of the next byte the sender "expects" to receive. After receiving this acknowledgment, the sender can consider all data before this sequence number to have been received normally. It is used to solve the problem of packet loss.

Control bits:

  • ACK: when set to 1, the "acknowledgment number" field becomes valid. TCP stipulates that this bit must be set to 1 in every packet except the initial SYN packet that establishes the connection.

  • RST: when set to 1, it indicates that an exception has occurred in the TCP connection and the connection must be forcibly torn down.

  • SYN: when set to 1, it indicates a request to establish a connection, with the initial sequence number placed in the "sequence number" field.

  • FIN: when set to 1, it means no more data will be sent and the sender wishes to close the connection. When communication ends and the connection is to be closed, the hosts on both sides exchange TCP segments with the FIN bit set to 1.
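As a small illustration (a sketch, not tied to any real capture), the four control bits above all live in the flags byte of the TCP header, at well-known bit positions:

```python
# Bit positions of the TCP control flags in the header's flags byte.
FIN = 0x01
SYN = 0x02
RST = 0x04
ACK = 0x10

def decode_flags(flags_byte: int) -> list[str]:
    """Return the names of the control bits set in a TCP flags byte."""
    names = []
    for bit, name in ((SYN, "SYN"), (ACK, "ACK"), (FIN, "FIN"), (RST, "RST")):
        if flags_byte & bit:
            names.append(name)
    return names

# The second handshake packet carries SYN + ACK, i.e. flags byte 0x12:
print(decode_flags(0x12))  # ['SYN', 'ACK']
```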

#Why do you need the TCP protocol? At which layer does TCP work?

The IP layer is "unreliable": it guarantees neither the delivery of network packets, nor their in-order delivery, nor the integrity of their data.

[Figure: The relationship between the OSI reference model and TCP/IP]

If the reliability of network packets needs to be guaranteed, the TCP protocol at the upper (transport) layer must take responsibility for it.

Because TCP is a reliable data-transmission service working at the transport layer, it can ensure that the packets received by the receiving end are damage-free, gap-free, non-redundant and in order.

#What is TCP?

TCP is a connection-oriented, reliable, byte stream-based transport layer communication protocol.


  • Connection-oriented : the connection must be "one-to-one". Unlike the UDP protocol, one host cannot send messages to multiple hosts at the same time; one-to-many is not possible;

  • Reliable : no matter what changes occur in the network links, TCP can guarantee that a segment reaches the receiving end;

  • Byte stream : when user messages are transmitted over TCP, a message may be split into multiple TCP segments by the operating system, so the receiving program cannot read a valid user message unless it knows the "message boundaries". Moreover, TCP segments are "ordered": if an earlier TCP segment has not yet arrived, later segments that arrive first cannot be handed to the application layer; "duplicate" TCP segments are automatically discarded.
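Because TCP delivers a byte stream with no message boundaries, applications have to add their own framing. A common scheme (sketched here for illustration; it is not part of TCP itself) is a 4-byte big-endian length prefix before each message:

```python
import struct

def frame(msg: bytes) -> bytes:
    """Prefix a message with its 4-byte big-endian length."""
    return struct.pack("!I", len(msg)) + msg

def unframe(stream: bytes) -> list[bytes]:
    """Split a received byte stream back into whole messages."""
    msgs, off = [], 0
    while off + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, off)
        off += 4
        msgs.append(stream[off:off + length])
        off += length
    return msgs

# Two sends may arrive as a single recv(); framing recovers the boundaries.
stream = frame(b"hello") + frame(b"world")
print(unframe(stream))  # [b'hello', b'world']
```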

#What is a TCP connection?

Let's take a look at how RFC 793 defines "connection":

Connections: The reliability and flow control mechanisms described above require that TCPs initialize and maintain certain status information for each data stream. The combination of this information, including sockets, sequence numbers, and window sizes, is called a connection.

Simply put, a connection is the state information maintained to ensure reliability and flow control. The combination of this information, including the Socket, sequence number and window size, is called a connection.


So we can see that establishing a TCP connection requires the client and server to reach consensus on three pieces of information:

  • Socket : consists of an IP address and a port number

  • Sequence number : used to solve out-of-order problems, among others

  • Window size : used for flow control

#How to uniquely determine a TCP connection?

A TCP connection is uniquely determined by a four-tuple, which consists of the following:

  • Source address

  • Source port

  • Destination address

  • Destination port

[Figure: TCP four-tuple]

The source address and destination address fields (32 bits) are in the IP header and are used to send messages to the other host through the IP protocol.

The source port and destination port fields (16 bits) are in the TCP header, and their function is to tell the TCP protocol which process the message should be sent to.

#For a server listening on one IP and one port, what is the maximum number of TCP connections?

The server usually listens on a certain local port, waiting for the connection request from the client.

Therefore, the client IP and port are variable, and the theoretical value calculation formula is as follows:

Maximum number of TCP connections = number of client IPs × number of client ports

For IPv4, there are at most 2^32 client IPs and at most 2^16 client ports, so the theoretical maximum number of TCP connections to a single server is about 2^32 × 2^16 = 2^48.
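The arithmetic is straightforward: every distinct (client IP, client port) pair is a distinct connection to the same server IP:port.

```python
# Theoretical upper bound on TCP connections to one server IP:port.
client_ips = 2 ** 32      # IPv4 address space
client_ports = 2 ** 16    # 16-bit port field
max_connections = client_ips * client_ports
print(max_connections == 2 ** 48)  # True
```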

Of course, the maximum number of concurrent TCP connections on the server is far from reaching the theoretical upper limit and will be affected by the following factors:

  • File descriptor limit : each TCP connection is a file, so if file descriptors run out, a "Too many open files" error occurs. Linux limits the number of open file descriptors at three levels:

    • System level : the maximum number that can be opened system-wide; check with  cat /proc/sys/fs/file-max ;

    • User level : the maximum number a single user can open; check with  cat /etc/security/limits.conf ;

    • Process level : the maximum number a single process can open; check with  cat /proc/sys/fs/nr_open ;

  • Memory limit : Each TCP connection occupies a certain amount of memory. The memory of the operating system is limited. If the memory resources are full, OOM will occur.

#What is the difference between UDP and TCP? What are their application scenarios?

UDP does not provide complex control mechanisms and uses IP to provide "connectionless" communication services.

The UDP protocol is really simple: the header is only 8 bytes (64 bits). The UDP header format is as follows:

[Figure: UDP header format]

  • Destination and source ports: mainly tell the UDP protocol which process the message should be sent to.

  • Packet length: This field stores the sum of the length of the UDP header and the length of the data.

  • Checksum: The checksum is designed to provide reliable UDP headers and data to prevent receipt of UDP packets damaged during network transmission.
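The fixed UDP header really is just those four 16-bit fields. A minimal sketch (purely illustrative; the checksum is left at zero rather than computed) packing one:

```python
import struct

def udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Pack the four 16-bit UDP header fields: source port, destination
    port, packet length (header + data), and checksum (0 here)."""
    length = 8 + len(payload)  # 8-byte header plus data
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

hdr = udp_header(12345, 53, b"query")
print(len(hdr))  # 8 -- the whole UDP header is only 8 bytes
```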

The difference between TCP and UDP:

1. Connection

  • TCP is a connection-oriented transport layer protocol. A connection must be established before data can be transmitted.

  • UDP does not require a connection and transmits data immediately.

2. Service objects

  • TCP is a one-to-one two-point service, that is, a connection has only two endpoints.

  • UDP supports one-to-one, one-to-many, many-to-many interactive communication

3. Reliability

  • TCP delivers data reliably, and the data can arrive in order without errors, loss, or duplication.

  • UDP is a best effort delivery and does not guarantee reliable delivery of data. But we can implement a reliable transmission protocol based on the UDP transmission protocol, such as the QUIC protocol.

4. Congestion control, flow control

  • TCP has congestion control and flow control mechanisms to ensure the security of data transmission.

  • UDP does not. Even if the network is very congested, it will not affect the sending rate of UDP.

5. Initial overhead

  • The TCP header is long and carries some overhead: it is 20 bytes when the "options" field is not used, and longer when options are used.

  • The UDP header is only 8 bytes and is fixed, so the overhead is small.

6. Transmission method

  • TCP is streaming, without boundaries, but guaranteed to be sequential and reliable.

  • UDP is sent packet by packet and has boundaries, but packet loss and disorder may occur.

7. Fragmentation

  • If the TCP data size is larger than the MSS, it is segmented at the transport layer, and after the target host receives it, the TCP segments are also reassembled at the transport layer. If a segment is lost along the way, only the lost segment needs to be retransmitted.

  • If the UDP data size is larger than the MTU size, it will be fragmented at the IP layer. After the target host receives it, it assembles the data at the IP layer and then transmits it to the transport layer.

TCP and UDP application scenarios:

Since TCP is connection-oriented and can ensure reliable delivery of data, it is often used for:

  • FTP file transfer;

  • HTTP / HTTPS;

Because UDP is connectionless-oriented, it can send data at any time, and the processing of UDP itself is simple and efficient, so it is often used for:

  • Communication where the total packet size is small, such as  DNS and SNMP ;

  • Video, audio and other multimedia communications;

  • broadcast communications;

#Why doesn't the UDP header have a "header length" field, while the TCP header does?

The reason is that TCP has a variable-length "options" field, while the UDP header length never changes, so no extra field is needed to record the UDP header length.

#Why does the UDP header have a "packet length" field, while the TCP header does not?

Let’s first talk about how TCP calculates the payload data length:

TCP payload length = IP total length - IP header length - TCP header length

Among these, the IP total length and the IP header length are known from the IP header, and the TCP header length is known from the TCP header, so the length of the TCP payload can be computed.
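The formula above can be sketched directly. Note that both header lengths are carried in the headers as counts of 32-bit words (the IP IHL field and the TCP data-offset field), so they are multiplied by 4:

```python
def tcp_payload_len(ip_total_len: int, ip_ihl_words: int, tcp_data_offset_words: int) -> int:
    """TCP data length = IP total length - IP header length - TCP header length.
    The header lengths arrive as 32-bit word counts, hence the * 4."""
    return ip_total_len - ip_ihl_words * 4 - tcp_data_offset_words * 4

# e.g. a 1500-byte IP packet with 20-byte IP and 20-byte TCP headers:
print(tcp_payload_len(1500, 5, 5))  # 1460
```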

At this point you might ask: "UDP also sits on top of the IP layer, so can't UDP's data length be computed with the same formula? Why does it need a 'packet length' field at all?"

When you put it that way, UDP's "packet length" field does seem redundant.

I checked a lot of material, and there are two relatively plausible explanations:

  • The first explanation: for the convenience of network-device hardware design and processing, header lengths should be an integer multiple of 4 bytes. Without the "packet length" field, the UDP header would be 6 bytes, which is not a multiple of 4, so the field may have been added to pad the UDP header to a multiple of 4 bytes.

  • The second explanation: today's UDP is carried over IP, but that may not always have been the case. It may once have relied on other network-layer protocols that did not provide the message length or header length themselves, so the UDP header needed its own length field for the calculation.

#Can TCP and UDP use the same port?

Answer: Yes .

In the data link layer, hosts on the LAN are found through MAC addresses. In the Internet layer, IP addresses are used to find interconnected hosts or routers in the network. In the transport layer, addressing by port is required to identify different applications communicating simultaneously on the same computer.

Therefore, the function of the "port number" of the transport layer is to distinguish the data packets of different applications on the same host.

The transport layer has two transport protocols, TCP and UDP, which are two completely independent software modules in the kernel.

When a host receives a packet, it can tell from the "protocol number" field in the IP header whether the packet is TCP or UDP, and thus decide which module (TCP/UDP) to hand it to. The TCP/UDP module then uses the "port number" to decide which application to deliver the packet to.


Therefore, the port numbers of TCP and UDP are independent of each other. For example, TCP can use port 80 and UDP can also use port 80; there is no conflict between the two.
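The independence of the two port spaces is easy to demonstrate with a short sketch (Python here purely for illustration, assuming a loopback interface is available): binding a TCP socket and a UDP socket to the same local port number succeeds without conflict.

```python
import socket

# TCP and UDP port spaces are independent: a TCP socket and a UDP socket
# can both bind the same port number on the same host.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.bind(("127.0.0.1", 0))        # let the kernel pick a free port
port = tcp.getsockname()[1]

udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.bind(("127.0.0.1", port))     # same number, different protocol: no conflict

same_port = tcp.getsockname()[1] == udp.getsockname()[1]
print(same_port)  # True

tcp.close()
udp.close()
```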

There are still many knowledge points that can be discussed about ports. For example, it can also involve the following issues:

  • Can multiple TCP service processes bind to the same port at the same time?

  • Why does the "Address in use" error message appear when restarting the TCP service process? How to avoid it?

  • Can the client's port be reused?

  • If the client TCP connection has too many TIME_WAIT states, will it cause the port resources to be exhausted and new connections to be unable to be established?

#TCP connection establishment

What is the #TCP three-way handshake process?

TCP is a connection-oriented protocol, so a connection must be established before using TCP, and the connection is established through a three-way handshake . The three-way handshake process is as follows:

[Figure: TCP three-way handshake]

  • Initially, both the client and the server are in the  CLOSE state. The server first actively listens on a certain port and enters the  LISTEN state.

[Figure: The first packet: SYN]

  • The client randomly initializes its sequence number ( client_isn ), places it in the "sequence number" field of the TCP header, and sets the  SYN flag to 1 to mark the segment as a SYN packet. It then sends this first SYN segment to the server to initiate a connection. The segment carries no application-layer data, and the client then enters the  SYN-SENT state.

[Figure: The second packet: SYN + ACK]

  • After receiving the client's  SYN segment, the server first randomly initializes its own sequence number ( server_isn ) and fills it into the "sequence number" field of the TCP header, then fills  client_isn + 1 into the "acknowledgment number" field. Next it sets both the  SYN and  ACK flags to 1 and sends the segment to the client. This segment carries no application-layer data either, and the server then enters the  SYN-RCVD state.

[Figure: The third packet: ACK]

  • After receiving the server's SYN + ACK segment, the client must respond with a final  ACK segment: the ACK flag in the TCP header is set to 1, the "acknowledgment number" field is filled with  server_isn + 1 , and the segment is sent to the server. This segment may already carry client-to-server data, and the client then enters the  ESTABLISHED state.

  • After the server receives the client's ACK segment, it also enters the  ESTABLISHED state.

From the above process, we can find that the third handshake can carry data, but the first two handshakes cannot carry data . This is also a frequently asked question in interviews.

Once the three-way handshake is complete and both parties are in the  ESTABLISHED state, the connection is established and the client and server can send data to each other.
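In application code the whole handshake is performed by the kernel: `listen()`/`accept()` on the server side, `connect()` on the client side. A minimal loopback sketch (illustrative, not part of the original text): once `connect()` returns, both ends are in ESTABLISHED and either side may send data.

```python
import socket
import threading

# The kernel performs the three-way handshake inside connect()/accept().
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # server enters LISTEN on a kernel-chosen port
server.listen(1)
addr = server.getsockname()

def serve():
    conn, _ = server.accept()   # returns once the handshake completes
    conn.sendall(b"hello")
    conn.close()

t = threading.Thread(target=serve)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(addr)            # SYN -> SYN+ACK -> ACK

data = b""
while len(data) < 5:            # read until the whole 5-byte message arrives
    chunk = client.recv(5 - len(data))
    if not chunk:
        break
    data += chunk
print(data)  # b'hello'

client.close()
t.join()
server.close()
```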

#How to check TCP states on a Linux system?

On Linux, check TCP connection states with the  netstat -napt command.

[Figure: Viewing TCP connection states]

#Why is it a three-way handshake? Not twice or four times?

I believe everyone’s more common answer is: “Because the three-way handshake can ensure that both parties have the ability to receive and send.”

This answer is yes, but it is one-sided and does not tell the main reason.

Earlier we saw what a TCP connection is:

  • some state information maintained to ensure reliability and flow control; the combination of this information, including the  Socket , sequence number and window size, is called a connection.

Therefore, the key question is why the three-way handshake can initialize the Socket, sequence number and window size and thereby establish a TCP connection.

Next, analyze the reasons for the three-way handshake from three aspects:

  • Only the three-way handshake can prevent repeated initialization of historical connections (the main reason)

  • A three-way handshake is required to synchronize the initial sequence numbers of both parties.

  • A three-way handshake can avoid wasting resources

Reason 1: Avoid historical connections

Let’s take a look at the top reasons for using a three-way handshake for TCP connections as stated by RFC 793 :

The principle reason for the three-way handshake is to prevent old duplicate connection initiations from causing confusion.

Simply put, the primary reason for the three-way handshake is to prevent confusion caused by old duplicate connection initializations.

Consider this scenario: the client first sends a SYN (seq = 90) segment and then crashes; that SYN is also blocked in the network and never reaches the server. After the client restarts, it establishes a new connection to the server and sends a SYN (seq = 100) segment ( note! This is not a retransmission; a retransmitted SYN keeps the same sequence number ).

Let's see how the three-way handshake blocks the historical connection:

[Figure: The three-way handshake avoids historical connections]

Suppose the client sends multiple SYN segments (all with the same four-tuple) to establish a connection, and the network is congested:

  • An "old SYN segment" arrives at the server before the "latest SYN", so the server returns a  SYN + ACK segment to the client with acknowledgment number 91 (90 + 1).

  • After receiving it, the client finds that the acknowledgment number it expects is 100 + 1, not 90 + 1, so it replies with an RST segment.

  • After receiving the RST segment, the server releases the connection.

  • When the latest SYN later reaches the server, the client and the server can complete the three-way handshake normally.

The "old SYN segment" above is called a historical connection. The primary reason TCP uses a three-way handshake to establish connections is to prevent a "historical connection" from initializing a connection.

TIP

Many people ask: what happens if the server receives the "new SYN segment" before the RST segment, that is, the order in which the server receives the client's segments is "old SYN segment" -> "new SYN segment"?

When the server receives the first SYN segment, i.e. the "old SYN segment", it replies with a SYN + ACK segment whose acknowledgment number is 91 (90 + 1).

When it then receives the "new SYN segment", it replies with a Challenge ACK segment. This ACK does not acknowledge the "new SYN segment"; it carries the previous acknowledgment number, 91 (90 + 1). So when the client receives this ACK segment, it finds that the acknowledgment number it expects should be 101, not 91, and replies with an RST segment.

If connections were established with a two-way handshake, historical connections could not be blocked. Why is a two-way handshake unable to block historical connections?

Let me state the conclusion first: with only two handshakes, the server has no intermediate state in which the client can block a historical connection, so the server may establish a historical connection and waste resources.

Think about it: with two handshakes, the server enters the ESTABLISHED state as soon as it receives a SYN segment, meaning it can immediately send data to the other party, while the client has not yet entered ESTABLISHED. Suppose this SYN belongs to a historical connection. The client can judge that the connection is historical and reply with an RST segment to tear it down; but the server entered ESTABLISHED at the first handshake and may already be sending data, unaware that the connection is historical, and will only disconnect after receiving the RST segment.

[Figure: Two handshakes cannot prevent historical connections]

As you can see, if a TCP connection were established with two handshakes, the server could not block a historical connection before sending data to the client; it would establish the historical connection and send data in vain, wasting server resources.

Therefore, to solve this phenomenon, it is best to block historical connections before the server sends data, that is, before establishing a connection, so as not to waste resources. To implement this function, a three-way handshake is required .

Therefore, the main reason why TCP uses a three-way handshake to establish a connection is to prevent "historical connections" from initializing the connection.

TIP

Someone asked: the client can send data right after the third handshake (the ACK segment), but the passive side is still in the SYN_RECEIVED state at that point. If the ACK is lost, is the data sent by the client wasted?

No. Even if the server is still in the SYN_RECEIVED state when it receives the data sent by the client, it can still establish the connection and receive the data packet normally. This is because the data segment carries the ACK flag and an acknowledgment number, and that acknowledgment number acknowledges receipt of the second handshake.


Therefore, when the server receives this data message, it can establish a connection normally, and then it can receive the data packet normally.

Reason 2: Synchronize the initial sequence numbers of both parties

Both parties communicating in the TCP protocol must maintain a "sequence number". The sequence number is a key factor for reliable transmission. Its function is:

  • The receiver can remove duplicate data;

  • The receiver can receive the data packets in order according to their sequence numbers;

  • It can identify which of the sent data packets have been received by the other party (known through the sequence number in the ACK message);

As you can see, the sequence number plays a critical role in a TCP connection. So when the client sends a  SYN segment carrying its "initial sequence number", the server must reply with an  ACK acknowledgment segment, indicating that the client's SYN segment was received successfully. Likewise, when the server sends its "initial sequence number" to the client, it also needs the client's acknowledgment. Only in this way can the initial sequence numbers of both parties be reliably synchronized.

[Figure: Four-way handshake vs. three-way handshake]

A four-way handshake can also reliably synchronize both parties' initial sequence numbers, but since the second and third steps can be merged into one , it becomes a "three-way handshake".

A two-way handshake only guarantees that one party's initial sequence number is successfully received by the other; there is no way to guarantee that both parties' initial sequence numbers are acknowledged.

Reason 3: Avoid wasting resources

If there were only a "two-way handshake": when the client's  SYN segment is blocked in the network and the client receives no  ACK , it retransmits the  SYN . Because there is no third handshake, the server cannot know whether the client received its  ACK reply, so every time it receives a  SYN it can only actively establish a connection first. What happens then?

If the client's  SYN segment is blocked in the network and the  SYN is sent multiple times, the server will establish multiple redundant, invalid connections after receiving them, causing unnecessary waste of resources.

[Figure: Two handshakes cause wasted resources]

That is, with a two-way handshake, delayed segments would cause the server to repeatedly accept useless  SYN connection requests and repeatedly allocate resources.

TIP

Many people ask: couldn't a two-way handshake also discard historical SYN segments based on context information?

The two-way handshake discussed here is based on the assumption that "since there is no third handshake, the server does not know whether the client received its  ACK confirmation, so every time it receives a  SYN it can only actively establish a connection first".

Of course, if you implement it the way a three-way handshake works, it is also possible to discard historical SYN segments based on context. There is no concrete specification of a two-way handshake, so any assumption could be made.

Summary

When establishing a TCP connection, the three-way handshake prevents historical connections from being established, reduces unnecessary resource overhead on both sides, and helps both parties synchronize their initial sequence numbers . Sequence numbers ensure that data packets are not duplicated or discarded and are transmitted in order.

Reasons for not using "two-way handshake" and "four-way handshake":

  • "Two-way handshake": it cannot prevent the establishment of historical connections, which wastes resources on both sides, and it cannot reliably synchronize both parties' sequence numbers;

  • "Four-way handshake": The three-way handshake is the theoretical minimum to establish a reliable connection, so there is no need to use more communication times.

#Why is the initial sequence number required to be different every time a TCP connection is established?

There are two main reasons:

  • To prevent historical packets from being received by the next connection with the same four-tuple (the main reason);

  • For security, to prevent forged TCP segments with the same sequence number from being accepted by the other party;

Next, let’s talk about the first point in detail.

Assume that each time a connection is established, the initialization sequence numbers of the client and the server start from 0:


The process is as follows:

  • The client and the server establish a TCP connection. A data packet sent by the client is blocked by the network and is retransmitted after a timeout. Meanwhile the server loses power and restarts, so the connection previously established with the client is gone, and the server replies with an RST segment when it receives the client's data packet.

  • Immediately afterwards, the client establishes a new connection to the server with the same four-tuple as before;

  • After the new connection is established, the data packet from the old connection that was blocked in the network arrives at the server. Its sequence number happens to fall within the server's receive window, so the server accepts it as valid, corrupting the data of the new connection.

It can be seen that if the initialization sequence numbers of the client and the server are the same every time a connection is established, it is easy for historical messages to be received by the next connection with the same four-tuple .

If the initialization sequence numbers of the client and server are "different" every time a connection is established, there is a high probability that the sequence number of a historical segment falls "outside" the other party's receive window, which to a large extent avoids historical segments.


Conversely, if the initialization sequence numbers of the client and server were "the same" every time a connection is established, there would be a high probability that the sequence number of a historical segment just "happens" to fall within the other party's receive window, causing the historical segment to be successfully received by the new connection.

Therefore, a different initialization sequence number each time can largely prevent historical segments from being received by the next connection with the same four-tuple. Note "largely": it is not completely avoided (because sequence numbers wrap around, the timestamp mechanism is also needed to judge historical segments).

#How is the initial sequence number ISN randomly generated?

The starting  ISN is clock-based, incremented by 1 every 4 microseconds, and wraps around every 4.55 hours.

RFC793 mentions the random generation algorithm of initialization sequence number ISN: ISN = M + F (localhost, localport, remotehost, remoteport).

  • M is a timer that increments by 1 every 4 microseconds.

  • F is a hash algorithm that generates a random value based on the source IP, destination IP, source port and destination port. To ensure the hash cannot easily be computed by outsiders, MD5 is a reasonable choice.

As you can see, the random value increments with a clock timer, so it is essentially impossible for two connections to end up with the same initialization sequence number.
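The ISN = M + F(localhost, localport, remotehost, remoteport) scheme can be sketched as follows. This is only an illustration of the idea, not a kernel implementation; the secret key, the tuple encoding and the choice of MD5 digest bytes are all assumptions of this sketch:

```python
import hashlib
import time

SECRET = b"per-boot-random-secret"  # assumed: a key kept private by the host

def isn(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    """Sketch of RFC 793-style ISN generation: ISN = M + F(four-tuple)."""
    m = int(time.time() * 1_000_000) // 4  # timer ticking every 4 microseconds
    material = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode() + SECRET
    f = int.from_bytes(hashlib.md5(material).digest()[:4], "big")
    return (m + f) % 2 ** 32               # sequence numbers are 32-bit

a = isn("10.0.0.1", 5000, "10.0.0.2", 80)
b = isn("10.0.0.1", 5001, "10.0.0.2", 80)  # different four-tuple, different hash offset
print(hex(a), hex(b))
```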

#Since the IP layer can fragment packets, why does the TCP layer still need MSS?

Let’s first get to know MTU and MSS

[Figure: MTU and MSS]

  • MTU: the maximum length of a network packet, generally 1500 bytes on Ethernet;

  • MSS: The maximum length of TCP data that can be accommodated in a network packet after removing the IP and TCP headers;

If the entire packet (header + data) in TCP is handed over to the IP layer for fragmentation, what will happen?

When the IP layer has data to send (TCP header + TCP data) that exceeds the  MTU , it fragments the data into several pieces, ensuring each fragment is smaller than the MTU. After an IP datagram is fragmented, it is reassembled by the IP layer of the target host and then handed to the upper TCP transport layer.

This seems orderly, but there is a hidden danger: if one IP fragment is lost, all fragments of the entire IP packet must be retransmitted .

Because the IP layer itself does not have a timeout retransmission mechanism, it is the TCP of the transport layer that is responsible for timeout and retransmission.

When an IP fragment is lost, the receiver's IP layer cannot assemble a complete TCP segment (header + data) and so cannot deliver it to the TCP layer; the receiver therefore sends no ACK. Because the sender never receives the ACK confirmation, it eventually triggers a timeout retransmission and resends the entire TCP segment (header + data).

Therefore, it can be known that fragmented transmission by the IP layer is very inefficient.

Therefore, to achieve the best transmission performance, the TCP protocol usually negotiates both parties' MSS values when establishing the connection . When the TCP layer finds that the data exceeds the MSS, it segments the data first, so the resulting IP packet is never larger than the MTU and IP fragmentation is naturally unnecessary.

[Figure: MSS negotiated during the handshake phase]

After segmentation at the TCP layer, if a TCP segment is lost, retransmission is done in units of MSS instead of retransmitting all the fragments, which greatly improves retransmission efficiency.
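The MSS arithmetic for the common Ethernet case can be sketched briefly (assuming 20-byte IP and TCP headers with no options):

```python
import math

MTU = 1500            # typical Ethernet MTU
MSS = MTU - 20 - 20   # minus IP header and TCP header: 1460 bytes of payload

def segments_needed(data_len: int) -> int:
    """How many MSS-sized TCP segments a message is split into."""
    return math.ceil(data_len / MSS)

print(MSS, segments_needed(4000))  # 1460 3
```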

#What happens if the first handshake is lost?

When the client wants to establish a TCP connection with the server, the first thing it sends is a SYN segment, after which it enters the  SYN_SENT state.

After that, if the client fails to receive the SYN-ACK segment from the server (the second handshake), the "timeout retransmission" mechanism is triggered and the SYN segment is retransmitted, with the same sequence number each time .

Different versions of the operating system may have different timeouts, some are 1 second, and some are 3 seconds. This timeout is hard-coded into the kernel. If you want to change it, you need to recompile the kernel, which is troublesome.

If the client does not receive the server's SYN-ACK within 1 second, it resends the SYN. How many times will it retry?

In Linux, the maximum number of client SYN retransmissions is controlled by the  tcp_syn_retries kernel parameter. It can be customized; the default is typically 5:

# cat /proc/sys/net/ipv4/tcp_syn_retries
5

Usually, the first timeout retransmission happens after 1 second, the second after 2 more seconds, the third after 4 seconds, the fourth after 8 seconds, and the fifth after 16 seconds. That's right, each timeout is twice as long as the previous one.

After the fifth timeout retransmission, the client waits another 32 seconds. If the server still does not respond, the client stops sending SYN packets and gives up on the TCP connection.

Therefore, the total time-consuming is 1+2+4+8+16+32=63 seconds, about 1 minute.

For example, assuming that the tcp_syn_retries parameter value is 3, then when the client's SYN packets are constantly lost in the network, the following process will occur:


Specific process:

  • After the client has retransmitted the SYN message 3 times, tcp_syn_retries being 3 means the maximum retransmission count has been reached, so it waits one more interval (twice the previous timeout). If it still does not receive the second handshake (SYN-ACK message) from the server, the client disconnects.

What happens if the second wave is lost?

When the server receives the client's first wave, it will send back an ACK confirmation message. At this time, the server's connection enters the  CLOSE_WAIT state.

As mentioned earlier, ACK messages are never retransmitted. So if the server's second wave is lost, the client triggers the timeout retransmission mechanism and resends its FIN message until it either receives the server's second wave or reaches the maximum retransmission count.

For example, assuming that the parameter value of tcp_orphan_retries is 2, when the second wave is always lost, the process that occurs is as follows:


Specific process:

  • After the client has retransmitted the FIN message 2 times, tcp_orphan_retries being 2 means the maximum retransmission count has been reached, so it waits one more interval (twice the previous timeout). If it still does not receive the second wave (ACK message) from the server, the client disconnects.

One more note: after the client receives the second wave, that is, the ACK sent by the server, it enters the FIN_WAIT2 state. In this state it waits for the server's third wave, the server's FIN message.

For a connection closed with the close function, data can no longer be sent or received, so the FIN_WAIT2 state must not last too long; the tcp_fin_timeout parameter controls how long a connection may stay in this state, and the default is 60 seconds.

This means that for a connection closed by calling close, if the FIN message is not received after 60 seconds, the connection of the client (active closing party) will be closed directly, as shown below:


However, note that if the active closing party uses the shutdown function and closes only the sending direction, not the receiving direction, it can still receive data.

In that case, if the active closing party never receives the third wave, its connection stays in the FIN_WAIT2 state indefinitely (tcp_fin_timeout does not apply to connections closed with shutdown). As shown below:


The third wave is lost, what happens?

When the server (passive closing party) receives the FIN message from the client (active closing party), the kernel automatically replies with an ACK, and the connection enters the CLOSE_WAIT state. As the name suggests, it is waiting for the application process to call the close function to close the connection.

At this time, the kernel does not have the right to close the connection on behalf of the process. The process must actively call the close function to trigger the server to send a FIN message.

When the server is in the CLOSE_WAIT state and the close function is called, the kernel will send a FIN message and the connection will enter the LAST_ACK state, waiting for the client to return ACK to confirm that the connection is closed.

If the ACK is not received for a long time, the server resends the FIN message; the number of retransmissions is again controlled by the tcp_orphan_retries parameter, just as when the client resends a FIN.

For example, assuming  tcp_orphan_retries = 3, when the third wave is always lost, the process that occurs is as follows:


Specific process:

  • After the server has retransmitted the third-wave (FIN) message 3 times, tcp_orphan_retries being 3 means the maximum retransmission count has been reached, so it waits one more interval (twice the previous timeout). If it still does not receive the fourth wave (ACK message) from the client, the server disconnects.

  • Because the client closed the connection with the close function, there is a time limit on its FIN_WAIT_2 state. If the client does not receive the server's third wave (FIN message) within tcp_fin_timeout, the client disconnects.

The fourth wave is lost, what will happen?

When the client receives the server's third-wave FIN message, it responds with an ACK message, the fourth wave, and the client's connection enters the TIME_WAIT state.

In the Linux system, the TIME_WAIT state will last for 2MSL before entering the closed state.

Until the server (the passive closing party) receives that ACK message, it remains in the LAST_ACK state.

If the fourth-wave ACK message does not reach the server, the server resends the FIN message; the number of retransmissions is still controlled by the tcp_orphan_retries parameter introduced earlier.

For example, assuming tcp_orphan_retries is 2, when the fourth waving is always lost, the process that occurs is as follows:


Specific process:

  • After the server has retransmitted the third-wave (FIN) message 2 times, tcp_orphan_retries being 2 means the maximum retransmission count has been reached, so it waits one more interval (twice the previous timeout). If it still does not receive the fourth wave (ACK message) from the client, the server disconnects.

  • After receiving the third wave, the client enters the TIME_WAIT state and starts a timer of 2MSL. If it receives the third wave (FIN message) again while waiting, it resets the timer; once 2MSL elapses, the client disconnects.

Why is the waiting time of TIME_WAIT 2MSL?

MSL is the Maximum Segment Lifetime, the longest time any message may exist on the network; after that time the message is discarded. TCP messages are carried by the IP protocol, and the IP header contains a TTL field, the maximum number of routers an IP datagram may pass through. Each router that processes the datagram decrements the value by 1; when it reaches 0 the datagram is discarded and an ICMP message notifies the source host.

The difference between MSL and TTL: MSL is measured in time, while TTL is measured in routing hops. Therefore MSL should be greater than or equal to the time it takes for TTL to reach 0, to ensure that the packet has naturally expired.

The TTL value is generally 64. Linux sets the MSL to 30 seconds, which means that Linux believes that the time for a data packet to pass through 64 routers will not exceed 30 seconds. If it exceeds, the packet will be considered to have disappeared in the network .

A more reasonable explanation for TIME_WAIT waiting 2 times the MSL: packets from the sender may still exist in the network, and when the receiver processes them it may send responses back to the other party, so you have to wait for a full round trip, twice the MSL.

For example, if the passive closing party does not receive the final ACK of the disconnection, it times out and resends the FIN; when the other party receives that FIN, it resends the ACK to the passive closing party. One round trip is exactly 2 MSL.

It can be seen that the 2MSL duration effectively tolerates at least one packet loss. For example, if the ACK is lost within the first MSL, the FIN resent by the passive party arrives within the second MSL, and the connection in the TIME_WAIT state can still handle it.

Why not 4 or 8 MSL? You can imagine a bad network with a packet loss rate of 1%. The probability of two consecutive packet losses is only 1 in 10,000. This probability is too small. Ignoring it is more cost-effective than solving it.

The 2MSL time is counted from the moment the client sends the ACK after receiving the FIN. If, during TIME_WAIT, the client's ACK fails to reach the server and the client receives a FIN resent by the server, the 2MSL timer restarts.

In Linux, 2MSL defaults to 60 seconds, so one MSL is 30 seconds. A Linux connection stays in TIME_WAIT for a fixed 60 seconds.

In the Linux kernel code this constant is named TCP_TIMEWAIT_LEN:

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT 
                                    state, about 60 seconds  */

If you want to modify the length of TIME_WAIT, you can only modify the value of TCP_TIMEWAIT_LEN in the Linux kernel code and recompile the Linux kernel.

Why is the TIME_WAIT state needed?

Only the party that actively initiates closing the connection will have  TIME-WAIT status.

The TIME-WAIT state is needed mainly for two reasons:

  • Prevent data in historical connections from being incorrectly received by subsequent connections with the same four-tuple;

  • Ensure that the party that "passively closes the connection" can be closed correctly;

Reason 1: To prevent data in historical connections from being incorrectly received by subsequent connections with the same four-tuple.

In order to better understand this reason, let's first understand the sequence number (SEQ) and the initial sequence number (ISN).

  • The sequence number is a TCP header field that identifies a byte of the data stream flowing from the TCP sender to the TCP receiver. Because TCP is a reliable protocol oriented to byte streams, it numbers every byte in each transmission direction so that delivery can be acknowledged, lost data can be retransmitted, and the receiver sees no reordering. The sequence number is a 32-bit unsigned integer, so it wraps back to 0 after reaching 2^32 (4G).

  • Initial sequence number . When TCP establishes a connection, the client and server will each generate an initial sequence number. It is a random number generated based on the clock to ensure that each connection has a different initial sequence number. The initialization sequence number can be regarded as a 32-bit counter. The value of the counter increases by 1 every 4 microseconds, and a cycle takes 4.55 hours .

Here is a packet capture I took. The Seq in the figure below is the sequence number, and the red boxes mark the initial sequence numbers generated by the client and the server respectively.

[Figure: TCP packet capture diagram]

As we saw above, the sequence number and the initial sequence number do not increase without bound; they wrap around to the initial value, which means new and old data cannot be distinguished by sequence number alone.

Assuming that TIME-WAIT has no waiting time or the time is too short, what will happen after the delayed data packet arrives?

[Figure: TIME-WAIT time is too short, data packets from old connections are received]

As shown above:

  • A message with SEQ = 301, sent by the server before closing the connection, was delayed by the network.

  • Then the server opens a new connection with the same four-tuple. The delayed SEQ = 301 packet later arrives at the client, and its sequence number happens to fall within the client's receive window, so the client accepts it as normal data. But this message is left over from the previous connection, which causes serious problems such as data corruption.

To prevent data from a historical connection from being wrongly received by a later connection with the same four-tuple, TCP designed the TIME_WAIT state, which lasts 2MSL. That is long enough for packets in both directions to be discarded, so all packets of the original connection disappear from the network naturally, and any packet that appears afterwards must belong to the newly established connection.

Reason 2: Ensure that the party that "passively closes the connection" can be closed correctly

In RFC 793, it is pointed out that another important role of TIME-WAIT is:

TIME-WAIT - represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.

In other words, the role of TIME-WAIT is to wait for enough time to ensure that the last ACK can be received by the passive closing party, thereby helping it to close normally.

If the last ACK message (the fourth wave) of the client (active closing party) is lost in the network, then according to the TCP reliability principle, the server (passive closing party) will resend the FIN message.

Assume the client had no TIME_WAIT state and entered the CLOSE state directly after sending the last ACK. If that ACK is lost, the server retransmits its FIN message, but the client is already closed, so upon receiving the retransmitted FIN it replies with an RST message.

[Figure: TIME-WAIT time is too short to ensure that the connection is properly closed]

The server receives this RST and interprets it as an error (Connection reset by peer), which is not a graceful termination for a reliable protocol.

To prevent this situation, the client must wait long enough to ensure the server can receive the ACK. If the server does not receive it, TCP's retransmission mechanism triggers and the server resends a FIN; one round trip takes exactly two MSLs.

[Figure: TIME-WAIT time is normal to ensure that the connection is closed normally]

When the client receives the FIN message retransmitted by the server, the waiting time in the TIME_WAIT state will be reset back to 2MSL.

What are the dangers of too many TIME_WAIT connections?

There are two main hazards of excessive TIME-WAIT states:

  • The first is to occupy system resources, such as file descriptors, memory resources, CPU resources, thread resources, etc.;

  • The second is to occupy port resources. Ports are also limited; the range that can normally be opened is 32768~61000, and it can be adjusted through the net.ipv4.ip_local_port_range parameter.

Too much TIME_WAIT on the client and server will have different impacts.

If the client (the party that actively closes connections) has too many TIME_WAIT states and they occupy all port resources, it can no longer initiate connections to the server with the same "destination IP + destination PORT"; however, the used ports can still be used to initiate connections to other servers. For details, read my article: Can the client port be reused?

Therefore, if the client keeps connecting to a server with the same "destination IP + destination PORT", once too many of its connections sit in the TIME_WAIT state it is limited by port resources; with all ports occupied, it can no longer establish new connections to that server.

However, even in this scenario, as long as the connection targets a different server the port can be reused, so the client can still initiate connections to other servers. This is because the kernel identifies a connection by its four-tuple (source IP, source port, destination IP, destination port), so identical client ports do not cause connection conflicts.

If the server (the party that actively closes connections) has too many TIME_WAIT states, port resources are not the limit, because the server listens on only one port, and since a four-tuple uniquely identifies a TCP connection the server can in theory hold very many connections. However, too many TCP connections do occupy system resources such as file descriptors, memory, CPU, and threads.

How to optimize TIME_WAIT?

Here are several ways to optimize TIME_WAIT; each has advantages and disadvantages:

  • Open net.ipv4.tcp_tw_reuse and net.ipv4.tcp_timestamps options;

  • net.ipv4.tcp_max_tw_buckets

  • Use SO_LINGER in the program to force the connection closed with an RST.

Method 1: net.ipv4.tcp_tw_reuse and tcp_timestamps

After the following Linux kernel parameters are enabled, the socket in TIME_WAIT can be reused for new connections .

One thing to note: tcp_tw_reuse only works for the client (the connection initiator). With this option enabled, when connect() is called the kernel may pick a socket that has been in the TIME_WAIT state for more than 1 second and reuse it for the new connection.

net.ipv4.tcp_tw_reuse = 1

There is also a prerequisite for this option: TCP timestamp support must be turned on, that is

net.ipv4.tcp_timestamps = 1 (the default is already 1)

The timestamp field lives in the "options" of the TCP header and occupies 8 bytes in total: the first 4-byte field records the time the packet was sent, and the second 4-byte field echoes the most recent timestamp received from the peer.

With timestamps introduced, the 2MSL problem mentioned earlier no longer exists, because duplicate packets are discarded naturally once their timestamps are stale.

Method 2: net.ipv4.tcp_max_tw_buckets

This value defaults to 18000. Once the number of TIME_WAIT connections in the system exceeds it, the system resets subsequent connections that would enter TIME_WAIT. This method is rather violent.

Method 3: Use SO_LINGER in the program

We can control what close does when it closes the connection by setting a socket option.

/* Requires <sys/socket.h> for setsockopt, SOL_SOCKET and SO_LINGER. */
struct linger so_linger;
so_linger.l_onoff  = 1;  /* enable lingering */
so_linger.l_linger = 0;  /* timeout of 0: close() aborts with an RST */
setsockopt(s, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));

If l_onoff is non-zero and l_linger is 0, then calling close sends an RST to the peer immediately; the TCP connection skips the four-way wave, and therefore the TIME_WAIT state, and closes directly.

This offers a way to bypass the TIME_WAIT state, but it is a very dangerous behavior and not worth promoting.

The methods introduced above all try to sidestep the TIME_WAIT state, which is actually not a good idea. Although the TIME_WAIT state lasts a while and seems unfriendly, it was designed to avoid messy problems.

The book "UNIX Network Programming" says: TIME_WAIT is our friend, it helps us. Don't try to avoid this state, but figure it out .

If the server wants to avoid too many connections in the TIME_WAIT state, it should never actively disconnect, let the client disconnect, and let the clients distributed everywhere bear the TIME_WAIT .

#What are the reasons why a large number of TIME_WAIT states appear on the server?

First of all, you must know that the TIME_WAIT state appears only after the connection is actively closed. Therefore, if the server has a large number of TCP connections in the TIME_WAIT state, it means that the server has actively disconnected many TCP connections.

The question is, under what circumstances will the server actively disconnect?

  • The first scenario: HTTP does not use long connections

  • Second scenario: HTTP long connection timeout

  • Third scenario: The number of HTTP persistent connection requests reaches the upper limit

Next, introduce them respectively.

The first scenario: HTTP does not use long connections

Let’s first take a look at how the HTTP long connection (Keep-Alive) mechanism is turned on.

It is turned off by default in HTTP/1.0. If the browser wants to turn on Keep-Alive, it must add it to the request header:

Connection: Keep-Alive

Then when the server receives the request and responds, it is also added to the response header:

Connection: Keep-Alive

This way, the TCP connection is not torn down but kept open. When the client sends another request, it uses the same TCP connection. This continues until either the client or the server proposes to disconnect.

Starting from HTTP/1.1, Keep-Alive is turned on by default . Now most browsers use HTTP/1.1 by default, so Keep-Alive is turned on by default. Once the client and server reach an agreement, the long connection is established.

If you want to turn off HTTP Keep-Alive, you add Connection: close to the header of the HTTP request or response. In other words, as long as the HTTP header of either the client or the server carries Connection: close, the HTTP long connection mechanism cannot be used.
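For instance, a request/response pair that disables reuse might look like this (a hypothetical exchange; the host and body size are illustrative):

```
GET /index.html HTTP/1.1
Host: example.com
Connection: close

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1024
Connection: close
```

After the server finishes sending this response, the TCP connection is closed instead of being kept for the next request.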

After the HTTP long connection mechanism is turned off, every request must go through the full process: establish TCP -> request resource -> respond with resource -> release the connection. This is the HTTP short connection, as shown below:

[Figure: HTTP short connection]

As noted earlier, if either party's HTTP header carries Connection: close, the HTTP long connection mechanism cannot be used, so the connection is closed after one HTTP request/response completes.

The question arises, does the client or the server actively close the connection at this time?

In the RFC document, it is not clear who closes the connection. Both parties of the request and response can actively close the TCP connection.

However, according to the implementation of most web services, no matter which party disables HTTP Keep-Alive, the server will actively close the connection , and then a connection in the TIME_WAIT state will appear on the server.

The client has disabled HTTP Keep-Alive and the server has enabled HTTP Keep-Alive. Who is the one who actively closes it?

When the client disables HTTP Keep-Alive, the header of the HTTP request will have  Connection:close information. At this time, the server will actively close the connection after sending the HTTP response.

Why is it designed like this? HTTP is a request-response model, and the initiator is always the client; the original intent of HTTP Keep-Alive is to reuse the connection for the client's subsequent requests. If a request carries Connection: close in its header, the connection will not be reused after this exchange, so it is reasonable for the server to close the connection at the "end" of the HTTP request-response cycle.

HTTP Keep-Alive is enabled on the client and HTTP Keep-Alive is disabled on the server. Who is the one who actively closes it?

When the client enables HTTP Keep-Alive and the server disables HTTP Keep-Alive, the server will also actively close the connection after sending the HTTP response.

Why is it designed this way? When the server actively closes the connection, a single close() call releases it and the kernel TCP stack handles the rest: one syscall for the whole process. If instead the client were required to close, then after writing the last response the server would have to wait on the socket with select/epoll until it becomes readable, then call read() to learn that the connection has closed: two syscalls, an extra wake-up of the user-mode program, and the socket is held for longer.

Therefore, when a server shows a large number of TIME_WAIT connections, check whether both the client and the server have enabled HTTP Keep-Alive. If either party has it disabled, the server actively closes the connection after handling each HTTP request, and a large number of TIME_WAIT connections appear on the server.

For this scenario, the solution is also very simple, let both the client and the server turn on the HTTP Keep-Alive mechanism.

Second scenario: HTTP long connection timeout

The characteristic of HTTP long connection is that as long as either end does not explicitly propose to disconnect, the TCP connection status will be maintained.

HTTP long connections can receive and send multiple HTTP requests/responses on the same TCP connection, avoiding the overhead of connection establishment and release.


Some students may ask: with HTTP long connections, if the client completes one HTTP request and never initiates a new one, isn't keeping that TCP connection occupied a waste of resources?

That's right, so in order to avoid resource waste, web service software generally provides a parameter to specify the timeout period of HTTP long connections, such as the keepalive_timeout parameter provided by nginx.

Assuming the HTTP long connection timeout is set to 60 seconds, nginx starts a "timer". If the client initiates no new request within 60 seconds of completing its last HTTP request, the timer fires, nginx's callback closes the connection, and a TIME_WAIT connection appears on the server.

[Figure: HTTP long connection timeout]

So when a server shows a large number of TIME_WAIT connections, and many clients establish TCP connections but then send no data for a long time, the most likely cause is that the server actively closed those connections on HTTP long connection timeout, producing the TIME_WAIT connections.

You can check for network problems, such as whether the data sent by the client has not been received by the server due to network problems, causing the HTTP long connection to time out.

Third scenario: The number of HTTP persistent connection requests reaches the upper limit

The web server usually has a parameter to define the maximum number of requests that can be processed on a long HTTP connection. When the maximum limit is exceeded, the connection will be actively closed.

For example, nginx's keepalive_requests parameter means that after a long HTTP connection is established, nginx will set a counter for this connection to record the number of client requests that have been received and processed on this long HTTP connection. If the maximum value of this parameter setting is reached, nginx will actively close the long connection , and then a connection in the TIME_WAIT state will appear on the server.

The default value of the keepalive_requests parameter is 100, which means each HTTP long connection may serve at most 100 requests. Most people ignore this parameter, because when the QPS (requests per second) is not very high the default of 100 is enough.

However, in scenarios with relatively high QPS, say more than 10,000 or even 30,000 to 50,000, a keepalive_requests value of 100 makes nginx close connections very frequently, and a large number of TIME_WAIT states appear on the server.

For this scenario, the solution is also very simple, just increase the keepalive_requests parameter of nginx.
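For reference, both nginx directives discussed in this section go in the http (or server) block; the values below are illustrative, not recommendations:

```
http {
    keepalive_timeout  60s;      # scenario 2: close an idle long connection after 60s
    keepalive_requests 10000;    # scenario 3: raise the per-connection request cap for high QPS
}
```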

What are the reasons for a large number of CLOSE_WAIT states on the server?

The CLOSE_WAIT state exists only on the "passive closing party". If the passive closing party never calls the close function, it never sends its FIN message, and the connection can never move from the CLOSE_WAIT state to the LAST_ACK state.

Therefore, when a large number of connections in the CLOSE_WAIT state appear on the server, it means that the server program did not call the close function to close the connection .

So what circumstances would cause the server-side program to fail to call the close function to close the connection? At this time, you usually need to troubleshoot the code.

Let's first analyze the process of an ordinary TCP server:

  1. Create a server socket, bind the binding port, and listen the listening port

  2. Register the server socket to epoll

  3. epoll_wait waits for a connection to arrive; when it does, call accept to obtain the connected socket

  4. Register the connected socket to epoll

  5. epoll_wait waits for an event to occur

  6. When the other party's connection is closed, we call close

The possible reasons why the server does not call the close function are as follows.

The first reason: step 2 was not done. The server socket was never registered with epoll, so when a new connection arrives the server cannot perceive the event, never obtains the connected socket, and naturally never gets the chance to call close on it.

However, the probability of this is small; such an obvious logic bug would be discovered early, at the code review stage.

The second reason: step 3 was not done. When new connections arrived, accept was never called to obtain the connected sockets, so when a large number of clients actively disconnect, the server has no chance to call close on those sockets, and a large number of CLOSE_WAIT connections appear on the server.

This may happen because, before reaching the accept call, the server code gets stuck in some logic or throws an exception early.

The third reason: step 4 was not done. After the connected socket was obtained via accept, it was never registered with epoll, so when the FIN message later arrives the server cannot perceive the event and never knows it is time to call close.

This may happen because, before the server registers the connected socket with epoll, the code gets stuck in some logic or throws an exception early. I have read a practical write-up on solving this problem; if you are interested, see: Analysis of a large number of CLOSE_WAIT connections caused by unrobust Netty code.

The fourth reason: step 6 was not done. When the client's close was detected, the server never executed close. This may be because the code simply missed that handling, or because the code got stuck in some logic before reaching close, for example in a deadlock.

It can be found that when a large number of connections in the CLOSE_WAIT state appear on the server, it is usually a code problem. At this time, we need to investigate and locate the specific code step by step. The main direction of analysis is why the server did not call close .

What if the connection has been established but the client suddenly fails?

Client failure here means the client host crashes or loses power. When this happens, if the server never sends any data to the client, it will never become aware of the failure: the server-side TCP connection will stay in the ESTABLISHED state forever, occupying system resources.

In order to avoid this situation, TCP has a keep-alive mechanism . The principle of this mechanism is as follows:

Define a time period. If there is no connection-related activity during this period, the TCP keep-alive mechanism kicks in: at each subsequent interval, a probe segment carrying very little data is sent. If several consecutive probes get no response, the TCP connection is considered dead and the kernel reports the error to the upper-layer application.

There are corresponding parameters in the Linux kernel to set the keep-alive time, the number of keep-alive detections, and the time interval of keep-alive detections. The following are the default values:

net.ipv4.tcp_keepalive_time=7200
net.ipv4.tcp_keepalive_intvl=75  
net.ipv4.tcp_keepalive_probes=9
  • tcp_keepalive_time=7200: Indicates that the keep-alive time is 7200 seconds (2 hours), that is, if there is no connection-related activity within 2 hours, the keep-alive mechanism will be activated.

  • tcp_keepalive_intvl=75: Indicates that the interval between each detection is 75 seconds;

  • tcp_keepalive_probes=9: Indicates that if 9 consecutive probes get no response, the peer is considered unreachable and the connection is torn down.

In other words, in a Linux system, it takes at least 2 hours, 11 minutes and 15 seconds to find a "dead" connection.
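The arithmetic behind that figure, using the default values above:

```python
# Reproducing the "2 hours, 11 minutes and 15 seconds" figure from the
# default Linux keep-alive sysctls.
tcp_keepalive_time = 7200     # idle seconds before the first probe
tcp_keepalive_intvl = 75      # seconds between probes
tcp_keepalive_probes = 9      # unanswered probes before declaring the connection dead

total = tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes
hours, rest = divmod(total, 3600)
minutes, seconds = divmod(rest, 60)
print(total, hours, minutes, seconds)   # 7875 2 11 15
```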


Note that if an application wants to use the TCP keep-alive mechanism, it must enable the SO_KEEPALIVE option through the socket interface; without it, TCP keep-alive is not used for that connection.
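A minimal sketch of enabling it from an application, using Python's socket module. SO_KEEPALIVE is portable; the per-socket TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT options are Linux-specific overrides of the sysctl defaults shown above.

```python
# Enabling TCP keep-alive on a single socket, overriding the sysctl defaults
# per-socket where the platform supports it.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
if hasattr(socket, "TCP_KEEPIDLE"):                             # Linux-specific
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle time before probing
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # interval between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # probes before giving up
assert s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```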

If TCP keepalive is enabled, you need to consider the following situations:

  • First, the peer program is working normally. The peer responds to the keep-alive probe, the keep-alive timer is reset, and the mechanism waits for the next keep-alive period to arrive.

  • Second, the peer host crashed and has restarted. The peer can respond to the probe, but since it has no record of the connection it replies with an RST, and the connection is quickly discovered to have been reset.

  • Third, the peer host is down (note: down, not a process crash. When a process crashes, the operating system sends a FIN while reclaiming its resources; a host going down cannot be sensed that way, which is exactly why the keep-alive mechanism is needed), or the peer's packets are unreachable for some other reason. The probes get no response, and once the configured number of probes has been sent with no answer, TCP reports the connection as dead.

The detection time of the TCP keep-alive mechanism is rather long, so we can also implement a heartbeat mechanism ourselves at the application layer.

For example, web server software generally provides a  keepalive_timeout parameter to bound the lifetime of idle HTTP persistent connections. If the timeout is set to 60 seconds, the web server starts a timer after each completed HTTP request; if the client initiates no new request within 60 seconds, the timer fires, a callback function is triggered, and the connection is released.
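The same idea can be sketched at the application layer with a per-read idle timer standing in for the web server's keepalive_timeout; the function name and the echo handling are illustrative.

```python
# Application-layer idle timeout: if the peer sends nothing within `timeout`
# seconds, give up and release the connection, like a web server's
# keepalive_timeout. Illustrative sketch.
import socket

def serve_with_idle_timeout(conn, timeout=60.0):
    conn.settimeout(timeout)            # arms a per-read idle timer
    try:
        while True:
            data = conn.recv(4096)
            if not data:                # peer closed normally
                return "peer-closed"
            conn.sendall(data)          # stand-in for real request handling
    except socket.timeout:
        return "idle-timeout"           # timer fired: treat the connection as dead
    finally:
        conn.close()
```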


Heartbeat mechanism of web service

What happens if the connection has been established but the server process crashes?

TCP connection state is maintained by the kernel, so when the server process crashes, the kernel reclaims all of the process's TCP connection resources and sends the first FIN of the wave itself. The rest of the four-way handshake is also completed in the kernel without the process's participation, so even though the server process has exited, the four-way close with the client still completes.

I did an experiment myself, using kill -9 to simulate a process crash, and found that after the process was killed the server still sent a FIN and completed the four-way close with the client.


Socket programming

How to program Socket for TCP?


Client and server work based on TCP protocol

  • The server and the client each call  socket to initialize a socket and obtain a file descriptor;

  • The server calls  bind to bind the socket to a specific IP address and port;

  • The server calls  listen to start listening;

  • The server calls  accept and waits for a client to connect;

  • The client calls  connect to initiate a connection request to the server's address and port;

  • The server's  accept returns the  socket used later for data transfer;

  • The client calls  write to send data; the server calls  read to receive it;

  • When the client disconnects, it calls  close; the server's next  read then returns  EOF. After finishing its processing, the server also calls  close, indicating that the connection is closed.

Note that when the server's  accept succeeds, it returns a new, already-connected socket, which is used later for data transfer.

So the listening socket and the socket actually used to transfer data are "two" different sockets: one is called the listening socket , the other the connected socket .

After a successful connection is established, both parties begin to read and write data through the read and write functions, just like writing to a file stream.
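The whole sequence can be exercised on the loopback interface; this Python sketch mirrors the C calls one-to-one (socket, bind, listen, accept / connect, read and write, close). Binding to port 0 asks the kernel for a free port.

```python
# The full loopback round trip: socket -> bind -> listen -> accept on the
# server side; socket -> connect -> write/read -> close on the client side.
import socket
import threading

result = []
ready = threading.Event()
addr = {}

def run_server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket()
    srv.bind(("127.0.0.1", 0))                                # bind()
    srv.listen(5)                                             # listen()
    addr["port"] = srv.getsockname()[1]
    ready.set()
    conn, _ = srv.accept()                    # accept() -> connected socket
    data = conn.recv(1024)                    # read()
    conn.sendall(data.upper())                # write()
    assert conn.recv(1024) == b""             # EOF: the client called close()
    conn.close()                              # close() the connected socket
    srv.close()                               # close() the listening socket
    result.append(data)

t = threading.Thread(target=run_server)
t.start()
ready.wait()
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)       # socket()
cli.connect(("127.0.0.1", addr["port"]))                      # connect()
cli.sendall(b"hello")                                         # write()
reply = cli.recv(1024)                                        # read()
cli.close()                                                   # close() -> server reads EOF
t.join()
print(reply)   # b'HELLO'
```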

What does the backlog parameter of listen mean?

Two queues are maintained in the Linux kernel:

  • Semi-connection queue (SYN queue): a SYN has been received and the connection is in the SYN_RCVD state;

  • Full connection queue (accept queue): the TCP three-way handshake has completed and the connection is in the ESTABLISHED state;


SYN queue and accept queue

int listen (int socketfd, int backlog)
  • The first parameter, socketfd, is the file descriptor of the listening socket;

  • The second parameter, backlog, has changed meaning across kernel versions.

In early Linux kernels, backlog was the size of the SYN queue, i.e. the queue of incomplete connections.

Since Linux kernel 2.2, backlog instead specifies the size of the accept queue, i.e. the queue of fully established connections, so today backlog is usually taken to mean the accept queue length.

But the upper limit is the kernel parameter somaxconn, i.e. accept queue length = min(backlog, somaxconn).
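A toy illustration of this clamping rule. The somaxconn default of 4096 used here is a sample value (older kernels defaulted to 128); the live value comes from /proc/sys/net/core/somaxconn.

```python
# The effective accept-queue length since Linux 2.2: min(backlog, somaxconn).
# The 4096 here is a sample value; read the real one from
# /proc/sys/net/core/somaxconn.
def effective_backlog(backlog, somaxconn=4096):
    return min(backlog, somaxconn)

print(effective_backlog(128))            # 128: below the cap
print(effective_backlog(65535, 4096))    # 4096: clamped to somaxconn
```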

At which step of the three-way handshake does accept occur?

Let's first take a look at what happens when the client connects to the server.


socket three-way handshake

  • The client's protocol stack sends a SYN packet to the server carrying its initial sequence number client_isn, and the client enters the SYN_SENT state;

  • On receiving the packet, the server's protocol stack replies to the client with an ACK whose acknowledgment number is client_isn + 1, confirming the client's SYN. In the same segment the server sends its own SYN carrying its initial sequence number server_isn, and the server enters the SYN_RCVD state;

  • When the client's protocol stack receives this ACK, the application's  connect call returns, meaning the client-to-server direction of the connection is established, and the client's state becomes ESTABLISHED. At the same time, the client's protocol stack acknowledges the server's SYN with acknowledgment number server_isn + 1;

  • When that ACK reaches the server, the server's TCP connection enters the ESTABLISHED state and the protocol stack lets the blocking  accept call return. The server-to-client direction is now also established, and at this point the connection is successfully set up in both directions.

From the process above we can see that connect returns on the client after the second handshake, while accept returns on the server only after the three-way handshake has completed.

When the client calls close, what is the process for disconnecting the connection?

Let's see if the client actively calls  close, what will happen?


The client calls the close process

  • When the client calls  close, indicating that it has no more data to send, it sends a FIN to the server and enters the FIN_WAIT_1 state;

  • When the server receives the FIN, its TCP stack inserts an end-of-file marker ( EOF) into the receive buffer for that FIN, and the application perceives it through a later  read call. The  EOF is queued behind any received data still waiting to be read, so the application must handle this condition: EOF means no more data will ever arrive on the connection. The server is now in the CLOSE_WAIT state;

  • Then, once the server has processed the remaining data, it naturally reads the  EOF, so it also calls  close on its own socket, which makes the server send a FIN and enter the LAST_ACK state;

  • The client receives the server's FIN and replies with an ACK; at this point the client enters the TIME_WAIT state;

  • After receiving the ACK, the server enters the final CLOSE state;

  • After  2MSL has elapsed, the client also enters the CLOSE state;

Can a TCP connection be established without accept?

Answer: Yes .

The accept system call plays no part in the TCP three-way handshake; it merely takes an established connection out of the TCP full connection queue. The user layer obtains the connected socket through the accept system call and can then read from and write to it.
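This is easy to demonstrate: connect returns, and data can even be sent, before the server ever calls accept, because the kernel has already completed the handshake and parked the connection in the accept queue.

```python
# connect() succeeds before the server ever calls accept(): the kernel
# finishes the three-way handshake and queues the established connection.
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(5)
port = srv.getsockname()[1]

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))   # returns: the connection is ESTABLISHED
cli.sendall(b"early data")         # buffered by the server-side kernel

conn, _ = srv.accept()             # only now is the socket taken off the queue
data = conn.recv(10)
print(data)                        # b'early data' was already waiting
conn.close(); cli.close(); srv.close()
```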


Semi-connection queue and full connection queue

Can a TCP connection be established without listening?

Answer: Yes .

The client can connect to itself to form a connection ( TCP self-connection ), or two clients can send connection requests to each other at the same time ( TCP simultaneous open ). Both cases have one thing in common: no server is involved, that is, a TCP connection can be established without listen.
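A self-connection can be demonstrated in a few lines on Linux. This relies on Linux's simultaneous-open handling and may not work on every operating system.

```python
# TCP self-connection on Linux: bind a socket, then connect() it to its own
# address and port. The kernel treats this as a simultaneous open, so the
# connection is established with no listen() anywhere.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
port = s.getsockname()[1]
s.connect(("127.0.0.1", port))     # succeeds: both ends are this one socket
s.sendall(b"loop")
echoed = s.recv(4)                 # we read back our own bytes
s.close()
print(echoed)
```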


Origin blog.csdn.net/liuxing__jacker/article/details/132026056