Talking about TCP mechanisms

TCP channel establishment process
First, let's take a look at the surface-level process of TCP connection establishment, starting from a few pictures: a three-way handshake analysis diagram, the TCP protocol header, and the TCP state transition diagram.
There are two queues related to the three-way handshake: the SYN queue and the accept queue. The kernel maintains the data in both queues; calling accept() takes an established connection out of the accept queue.
[Figures: three-way handshake, TCP protocol header, TCP state transition diagram]
In fact, four fields in the TCP header are involved in the three-way handshake: Sequence Number, Acknowledgment Number, the SYN flag, and the ACK flag. The SYN and ACK flags indicate, respectively, that the 32-bit Sequence Number and Acknowledgment Number fields carry valid values.

The first handshake: host A sends a segment with SYN=1 and a randomly generated Sequence Number (say 1234567) to the server; from SYN=1, host B knows that A is requesting to establish a connection.

The second handshake: after receiving the request, host B must confirm the connection; it replies with SYN=1, ACK=1, Acknowledgment Number = (host A's seq + 1), and its own randomly generated Sequence Number (say 7654321).

The third handshake: after receiving the reply, host A checks that the Acknowledgment Number is correct (its own seq + 1) and that the ACK flag is 1. If so, host A sends Acknowledgment Number = (host B's seq + 1) with ACK=1; host B verifies these values on receipt, and the connection is established.

The pictures above should explain the details of the entire TCP connection establishment process.
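To tie this back to the SYN queue and accept queue mentioned at the beginning, here is a minimal server-side sketch in C (error handling trimmed; port 8080 and backlog 128 are arbitrary example values): the backlog argument to listen() bounds the accept queue, and accept() takes one fully established connection off that queue.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);              /* arbitrary example port */

    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));

    /* backlog caps the accept queue: connections that have completed the
     * three-way handshake wait here until the application calls accept() */
    listen(listen_fd, 128);

    for (;;) {
        /* accept() takes one established connection off the accept queue */
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0)
            continue;
        /* ... read/write on conn_fd ... */
        close(conn_fd);
    }
}
```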

How to close the TCP channel
First of all, let's take a look at two pictures:
[Figures: TCP four-way close sequence and state transitions]
The two pictures above should be easy to follow; they already explain the process of tearing down a TCP connection, which is what we usually call the four-way wave. There is not much more to say about the basics, so let's focus on the issue most often discussed around the four-step teardown process: the case of too many connections in TIME_WAIT:
This is because, although both parties have agreed to close the connection and all four teardown packets have been coordinated and sent, it would seem reasonable to go straight back to the CLOSED state (just as SYN_SENT goes straight to ESTABLISHED). However, because we must assume the network is unreliable, you cannot guarantee that the last ACK you send will actually reach the other side; the peer socket sitting in LAST_ACK may time out waiting for that ACK and retransmit its FIN. So the purpose of the TIME_WAIT state is to be able to retransmit the possibly-lost ACK and guarantee that the close completes. If the closing side skipped TIME_WAIT and the retransmitted FIN arrived at an already-closed socket, the peer would get an RST and treat it as an error. TIME_WAIT also prevents stale duplicate packets from the current connection from interfering with the next connection that reuses the same address and port.
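As a side note, when a server that restarts often cannot bind its port because old connections are still lingering in TIME_WAIT, the usual code-level remedy is SO_REUSEADDR on the listening socket. A minimal sketch using the standard sockets API; whether to also touch kernel settings such as net.ipv4.tcp_tw_reuse is a system-specific decision and is not shown here.

```c
#include <sys/socket.h>

/* Allow bind() to succeed even if the local address/port is still
 * occupied by sockets lingering in TIME_WAIT from a previous run. */
static int allow_addr_reuse(int listen_fd) {
    int on = 1;
    return setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}
```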

Too many connections in CLOSE_WAIT (server side) or FIN_WAIT_1 (client side):
Under normal operation CLOSE_WAIT should be a very short-lived state: after receiving the client's FIN and replying with an ACK, the server should go on to send its own FIN to tell the client to close the connection, and then move to LAST_ACK. Too many sockets stuck in CLOSE_WAIT therefore means the transition to LAST_ACK never happens, i.e. the server never sends its FIN; the transition only occurs once the FIN is sent, so the problem comes down to whether the FIN is sent. Sending the FIN ultimately means calling close() on the socket, so the real issue is that close() is never executed (this is easy to verify: just omit close() in a test program and watch what happens). Typical causes:
A program bug: the socket is not closed promptly after the FIN is received. This could be a Netty bug or a business-layer bug.
The socket is not closed in time: for example, the I/O thread is unexpectedly blocked, or the proportion of user-defined tasks run on the I/O thread is too high, so I/O operations are not processed promptly and the connection is not released in time.

Solutions:
First, at the code level: call close() when you are finished with the socket. Second, control the read path: when read returns 0 (end of stream, the peer has closed its side), close immediately. Third, if read returns -1, check the error code; there are three recoverable cases: EINTR (interrupted, just retry the read), EWOULDBLOCK (the fd is non-blocking and the operation would block), and EAGAIN (no data for now, read again later; on Linux this is the same value as EWOULDBLOCK). If the error is none of these, close immediately.
You can also enable TCP keepalive and tune how long a connection may stay idle, how often it is probed, and how many probes are sent before the kernel forcibly drops it.
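A hedged sketch of those rules in C: a read handler that closes the socket on end-of-stream or on a fatal error, plus a helper that turns on keepalive probing. The TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT options are Linux-specific, and the 60 s / 10 s / 3 probes values are made-up examples.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read once; return 0 to keep the connection, -1 after closing it. */
int handle_readable(int fd) {
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0) {
        /* ... hand buf[0..n) to the application ... */
        return 0;
    }
    if (n == 0) {            /* peer sent FIN: close promptly, avoid a CLOSE_WAIT pile-up */
        close(fd);
        return -1;
    }
    if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;            /* transient: retry on the next readiness event */
    close(fd);               /* any other error: release the fd right away */
    return -1;
}

/* Example keepalive settings: probe after 60s idle, every 10s, 3 times (Linux). */
int enable_keepalive(int fd) {
    int on = 1, idle = 60, intvl = 10, cnt = 3;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) return -1;
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));
    return 0;
}
```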

Situations that trigger RST
The key test code is as follows:
[Figure: test code]

  1. The peer's port is not open and the client connects to it anyway (assuming the port had been open before and the program crashed, whether or not it has restarted; if no connection had ever been established there, many systems will not actively reply at all):
    [Figure: test result]
    The picture above shows that after the server exits abnormally, an active connect from the client triggers the server side to send an RST.

  2. Closing a socket whose recv buffer is not empty:
    [Figure: test result]
    The test result above shows that closing a socket that still has unread received data makes the local end send an RST to the peer.

  3. Data arrives on a socket that has already been closed:
    [Figure: test result]
    The test result above shows that receiving data on a closed socket triggers the kernel to send an RST to the peer.

**Summary:** When a client keeps receiving unexplained RSTs from the backend, check the three situations above.
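Case 1 is easy to reproduce from the client side: connecting to a port nobody is listening on makes the peer's kernel answer the SYN with an RST, which surfaces as ECONNREFUSED from connect(). A minimal sketch; 127.0.0.1 and port 5999 are arbitrary test values.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5999);                 /* assume nothing listens here */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 &&
        errno == ECONNREFUSED) {
        /* The SYN was answered with an RST by the peer's kernel. */
        printf("connect refused: peer replied with RST\n");
    }
    close(fd);
    return 0;
}
```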
The basic concept of the sliding window:
The window is the amount of data the sender is allowed to have in flight on the network at the same time without acknowledgement from the receiver.

There are many parameters in the TCP protocol that affect actual traffic; here we mainly analyze the impact of the window.
The problem the sliding window solves:
To obtain the best connection rate, TCP uses the window for flow control, and the sliding window is the main mechanism. The window allows the source to transmit segments on a given connection without waiting for the target to return an ACK. In one sentence: the size of the window determines how much data can be transmitted without needing an acknowledgement from the peer. The official definition: "The amount of octets that can be transmitted without receiving an acknowledgement from the other side."
Sliding window principle:
The sender's send window is calculated from the ACKs returned against the receiver's receive window. The Window Size field in the TCP header actually refers to the receiver's window, i.e. the receive window, and tells the sender how much data the receiver can currently accept, which achieves part of the goal of flow control. In fact, throughout the sending process TCP is also measuring the current state of the network in order to keep transmission healthy and stable; that is what congestion control is for. Transmitted data must be acknowledged by the peer, and the data to send falls into the following four categories:
[Figure: send buffer and send window categories]
(1) Data that has been sent and acknowledged by the peer (in the buffer, outside the send window)
(2) Data that has been sent but not yet acknowledged (in the buffer, inside the send window)
(3) Data that has not been sent but may be sent now (in the buffer, inside the send window)
(4) Data that has not been sent and may not be sent yet (in the buffer, outside the send window)

In this way the TCP window gradually slides forward to send new data. The basis for sliding is that previously sent data has been ACKed, i.e. the peer has received it; only then can the window slide forward and new data be sent. You can see that the window size has an important impact on throughput, and that the ACK response is closely tied to system latency. Note that if the sender's window is too large and the receiver cannot keep up, the receiver will shrink or even close the window; conversely, if the window is set too small, the bandwidth cannot be fully used. So careful window tuning matters for systems that must adapt to different latencies and bandwidths.
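As a purely conceptual illustration (the field names follow the usual SND.UNA / SND.NXT / SND.WND terminology, not any particular kernel's code), the four categories can be expressed with three marks on the send sequence space:

```c
#include <stdint.h>

/* Conceptual sender-side sliding-window bookkeeping (illustrative names). */
struct send_window {
    uint32_t snd_una;  /* oldest byte sent but not yet ACKed       */
    uint32_t snd_nxt;  /* next byte to be sent                     */
    uint32_t snd_wnd;  /* window size advertised by the receiver   */
};

/* Category (1): bytes  <  snd_una                     -> sent and ACKed
 * Category (2): bytes in [snd_una, snd_nxt)           -> sent, awaiting ACK
 * Category (3): bytes in [snd_nxt, snd_una + snd_wnd) -> may be sent now
 * Category (4): bytes >= snd_una + snd_wnd            -> must wait for the window to slide */
static uint32_t bytes_sendable_now(const struct send_window *w) {
    uint32_t in_flight = w->snd_nxt - w->snd_una;
    return in_flight < w->snd_wnd ? w->snd_wnd - in_flight : 0;
}
```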

How does TCP guarantee in-order delivery of the application's data (delayed ACKs, plus reassembly of segments by sequence number):
The TCP protocol requires that when a data segment is received, an acknowledgement be sent to the other party. But a bare acknowledgement is relatively expensive (a 20-byte IP header plus a 20-byte TCP header), so it is best to piggyback the ACK on response data. When TCP sends an ACK to the peer, the following rules apply:

  1. When there is response data to send, the ACK is sent to the peer immediately together with that data.

  2. If there is no response data, the ACK is delayed to see whether it can be sent together with some response data; this is called "Delayed ACK". The delay is at most 500 ms, usually 200 ms. If there is data to send within that 200 ms, the ACK is sent to the peer immediately along with the data. Note that the 200 ms here is not the maximum time from receiving the peer's data to sending the ACK; it refers to a timer started by the kernel that checks every 200 ms whether there is an ACK to send. For example, if the timer starts at 0 ms and the peer's data segment arrives at 185 ms, the ACK will be sent at 200 ms at the latest, not at 385 ms.

  3. If a second data segment from the peer arrives while an ACK is still pending, the ACK is sent immediately. If three segments arrive one after another, the ACK is sent immediately when the second one arrives; whether the third one is acknowledged immediately on arrival then depends on the two rules above.
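On Linux you can ask the kernel to skip the delayed-ACK wait on a given socket with TCP_QUICKACK; note that the option is not sticky, so latency-sensitive code often re-applies it after each read. A hedged, Linux-specific sketch:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Request immediate ACKs instead of delayed ACKs on this socket.
 * Linux resets this hint internally, so it is often re-applied after recv(). */
static int disable_delayed_ack(int fd) {
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}
```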

TCP window size
When the earliest TCP implementations were used for wide-area transmission, link speeds rarely exceeded 56 Kb/s, so only 16 bits were reserved in the TCP header for the window size, and the advertised buffer could be at most 64 KB. To break this limit, RFC 1323 defines TCP window scaling: a window scale factor is negotiated during the three-way handshake at the start of the connection (SYN, SYN-ACK, ACK), and afterwards each segment carries a window size value, so the final window size is the product of the two.
Window size value: 64, i.e. 0000 0000 0100 0000 (16 bits)
Window size scaling factor: 256, i.e. 2^8 (as advertised in the first packet)
The actual window size is 16,384 (64 * 256)
The window size here means that the sender can transmit up to 16,384 bytes before it must stop and wait for the peer's ACK. As the two sides keep exchanging data, the window can be narrowed or widened by changing the window size value, but note: the window scale factor must stay the same for the life of the connection. RFC 1323 allows a shift count of up to 14, which means the maximum window can reach roughly 1 GB, which is very large. This mechanism is not always enabled by default and depends on the operating system; Linux seems to enable it by default, while Windows disables it by default.
[Figure: Wireshark packet capture example]

TCP window parameters
For the TCP window to actually control the flow rate, real-world use is a coordination process between both ends, which also involves TCP slow start, congestion avoidance, the congestion window, and congestion control in general (the classic additive-increase / multiplicative-decrease behaviour). One thing worth remembering: the sending rate is determined by min(congestion window, receive window). The receive window is discussed below.
TCP window optimization settings
Since the TCP window is so important, how should it be set? A simple rule is twice the BDP. BDP means bandwidth-delay product, the product of bandwidth and latency, where the bandwidth is that of the worst link on the path:
buffer size = 2 * bandwidth * delay
A simple way to measure the delay is to use ping to get the round-trip time (RTT), in which case this can be written as:
buffer size = bandwidth * RTT
Why twice? Think of it this way: if the sliding window were exactly one bandwidth-delay product, then after the last byte of the window is sent, the sender must wait one one-way delay for the peer's ACK before it can continue. With twice that size, the sender can keep sending data while waiting for the ACK, and by the time the ACK arrives the extra data has just finished being sent, which improves efficiency.
For example: the bandwidth is 20 Mbps and ping gives a one-way delay of about 20 ms. Then 20 Mbps / 8 * 0.02 s ≈ 52,428 bytes (treating 1 Mbit as 2^20 bits), so the optimal window is 104,856 bytes = 2 * 52,428. The sender has to wait for an ACK only after sending 104,856 bytes; by the time half of that has been transmitted, the peer has (ideally) already received the first half and returned an ACK, and by the time that ACK arrives the remaining half has just been sent, so the sender never sits idle waiting for ACKs.
Did you notice? The window here is clearly larger than 64 KB, which is exactly where the window scaling mechanism described above comes in.
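A hedged sketch of that arithmetic in C, together with how such a value is typically applied to a socket. The 2x rule and the 20 Mbps / 20 ms numbers come from the example above (1 Mbit is treated as 2^20 bits, which is how the 52,428-byte figure arises); setting SO_SNDBUF/SO_RCVBUF before connect() or listen() is the usual way to influence the buffers, and the OS may round or cap the values.

```c
#include <stdio.h>
#include <sys/socket.h>

/* buffer size = 2 * bandwidth * one-way delay (the "2x BDP" rule of thumb) */
static int bdp_buffer_bytes(double bandwidth_bits_per_s, double one_way_delay_s) {
    return (int)(2.0 * (bandwidth_bits_per_s / 8.0) * one_way_delay_s);
}

/* Apply the value to a not-yet-connected socket; the kernel may adjust it. */
static int size_socket_buffers(int fd, int bytes) {
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0) return -1;
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}

int main(void) {
    /* 20 Mbps (binary megabits, matching the 52,428-byte figure) and 20 ms */
    int bytes = bdp_buffer_bytes(20.0 * 1024 * 1024, 0.020);
    printf("recommended buffer: %d bytes\n", bytes);  /* ~104,857 bytes */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    size_socket_buffers(fd, bytes);
    return 0;
}
```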
TCP window flow control
Now let's see how flow is actually controlled. TCP transmission is closely tied to the window size, and the window itself exists to control the flow: if the sender transmits faster than the receiver can absorb, packets are lost, hence flow control. Flow control requires that on every interaction each side advertise its receive window "rwnd", which indicates the maximum amount of data it can currently buffer; this is mainly for the receiver's benefit. In plain terms, it lets the sender know how many more bowls of rice the receiver can eat. If the window decays to zero, the sender can no longer send: the receiver is full and must digest first. If it keeps getting stuffed anyway, the result is packet loss.

[Figure: flow control]

Slow Start
Although flow control can keep the sender from overwhelming the receiver, it cannot prevent the network itself from being overloaded, because the receive window "rwnd" only reflects the situation of the individual server, not the overall state of the network.
To avoid overloading the network, slow start introduces the concept of a congestion window "cwnd", which indicates the maximum amount of unacknowledged data the sender is allowed to have in flight. The difference between "cwnd" and "rwnd" is that "cwnd" is purely an internal parameter of the sender and does not need to be advertised to the receiver. Its initial value is usually small; then, as packets are acknowledged by the receiver, the window grows exponentially. It's a bit like a boxing match: at first you don't know your opponent and mostly probe; as you gain confidence you gradually throw heavier punches.
[Figure: congestion window expansion]
During slow start, as "cwnd" grows, the network may become overloaded, which shows up externally as packet loss. Once that happens, "cwnd" shrinks quickly so that the network can recover.
[Figure: congestion window and packet loss]

Note: the amount of unacknowledged data actually in flight on the network is bounded by the smaller of "rwnd" and "cwnd".
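Pulling the last few paragraphs together, here is a toy, Reno-flavoured sketch of how "cwnd" might evolve: exponential growth while below the slow start threshold "ssthresh" (explained in the next paragraph), roughly linear growth above it, and a multiplicative cut when a loss is detected. This is only an illustration of the shape of the behaviour, not any kernel's actual code, and every number in it is made up.

```c
#include <stdio.h>

#define MSS 1460

int main(void) {
    double cwnd = 3 * MSS;        /* small initial window (see further below)   */
    double ssthresh = 64 * 1024;  /* arbitrary initial slow start threshold     */

    for (int rtt = 0; rtt < 20; rtt++) {
        int lost = (rtt == 12);   /* pretend a loss is detected on this round   */
        if (lost) {
            ssthresh = cwnd / 2;  /* multiplicative decrease                    */
            cwnd = ssthresh;      /* fast-recovery style; a timeout would drop to 1 MSS */
        } else if (cwnd < ssthresh) {
            cwnd *= 2;            /* slow start: exponential growth per RTT     */
        } else {
            cwnd += MSS;          /* congestion avoidance: ~1 MSS per RTT       */
        }
        printf("rtt=%2d cwnd=%.0f ssthresh=%.0f\n", rtt, cwnd, ssthresh);
    }
    return 0;
}
```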
Congestion Avoidance
From the introduction to slow start we can see that the sender avoids overloading the network by controlling the size of "cwnd". In this process, packet loss is not so much a network fault as a feedback mechanism: it is how we perceive that congestion has occurred, so that we can adjust the sending strategy. There is also a slow start threshold "ssthresh": when "cwnd" is below "ssthresh" the connection is in the slow-start phase; when "cwnd" is above "ssthresh" it is in the congestion-avoidance phase, where "cwnd" no longer grows exponentially as in slow start but grows roughly linearly, in order to avoid congestion. There are multiple algorithm implementations for this phase; the default is usually kept.
How to adjust "rwnd" to a reasonable value
An abnormally low TCP transfer rate is very often caused by a receive window "rwnd" that is too small, especially on high-latency networks. A reasonable value for "rwnd" depends on the BDP, the product of bandwidth and delay. Assuming the bandwidth is 100 Mbps and the delay is 100 ms, the calculation is:
BDP = 100 Mbps * 100 ms = (100 / 8) * (100 / 1000) = 1.25 MB

To maximize throughput in this situation, the receive window "rwnd" should be no smaller than 1.25 MB.
How to adjust "cwnd" to a reasonable value Generally speaking, the initial value of "cwnd" depends on the size of MSS. The calculation method is as follows:

min(4 * MSS, max(2 * MSS, 4380))
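As a quick numeric check of this formula (just the arithmetic, not a kernel setting):

```c
#include <stdio.h>

static unsigned int initial_cwnd_bytes(unsigned int mss) {
    unsigned int four_mss = 4 * mss, two_mss = 2 * mss;
    unsigned int upper = two_mss > 4380 ? two_mss : 4380;   /* max(2*MSS, 4380) */
    return four_mss < upper ? four_mss : upper;             /* min(4*MSS, ...)  */
}

int main(void) {
    printf("%u\n", initial_cwnd_bytes(1460));  /* prints 4380, i.e. 3 * 1460 */
    return 0;
}
```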

The MSS on standard Ethernet is usually 1460 bytes, so the initial value of "cwnd" works out to 3 MSS. When we stream video or download software, the effect of the initial "cwnd" is not noticeable, because the amount of data transferred is large and the transfer lasts long enough that even a small initial "cwnd" accelerates to the full window in a relatively short time; the difference is basically negligible. The following diagrams show one run captured with IxChariot.
[Figure: IxChariot capture]
The figure above is not the slow start and congestion control process of a real network environment; it comes from a user-mode protocol stack I implemented, which uses its own set of strategies. The only difference from a real environment is the strategy by which the sender changes the window size. The process in the figures below uses a strategy of growing the window by 1 per ACK, capped at the receiver's window size; the real strategy would be cwnd = cwnd * 2. The point is simply to illustrate that the sliding window and congestion control process is decided by the sender based jointly on the network conditions and the receiver's buffer capacity.
[Figures: setting cwnd in the user-mode protocol stack test]

When we browse the web, however, the situation is different, because the amount of data transferred is small and the transfer is short. If the initial "cwnd" in the slow-start phase is small, the communication may well be over before the window has had time to grow to full size. It's like Bolt running a 100-meter race: if he starts slowly, then even if he accelerates quickly he may not get a good result, because the finish line arrives before he is fully up to speed.


Origin blog.csdn.net/wangrenhaioylj/article/details/108372258