Computer Network Notes-4 (Detailed Explanation of TCP Protocol)

TCP header

The TCP header structure is shown in the figure.
[Figure: TCP header structure]

  • Two 16-bit port numbers identify the source and destination ports. In TCP communication, the client normally uses an ephemeral port chosen automatically by the system, while the server uses a well-known service port (the ports used by well-known services are listed in /etc/services).

  • The 32-bit sequence number numbers every byte of the byte stream in one direction of transmission. The initial sequence number (ISN) is a value chosen by the system when the connection is set up. A TCP segment carries a chunk of data, and the segment's sequence number is the sequence number of its first byte.

  • The 32-bit acknowledgment number. If A and B are communicating over TCP, each segment sent by A carries A's own sequence number and also an acknowledgment number for what it has received from B. The acknowledgment number is the sequence number of the next byte expected from the other side: the sequence number of the received segment plus its data length (a SYN or FIN counts as one byte, which is why the handshake acknowledges with sequence number plus one).

  • The 4-bit header length gives the length of the TCP header in 32-bit (4-byte) words. The maximum TCP header is therefore 15 * 4 = 60 bytes.

  • The 6 flag bits are:

    1. The URG flag indicates whether the urgent pointer is valid.
    2. The ACK flag indicates whether the acknowledgment number is valid. A TCP segment that carries the ACK flag is called an acknowledgment segment.
    3. The PSH flag indicates that the receiving application should read the data from the TCP receive buffer right away to make room for subsequent data (data that is not read stays in the receive buffer).
    4. The RST flag indicates that the connection is being reset and the other party must re-establish it. A TCP segment carrying the RST flag is called a reset segment.
    5. The SYN flag indicates a request to establish a connection. A segment carrying it is called a synchronization segment.
    6. The FIN flag notifies the other party that this end is closing the connection. A segment carrying it is called a finish segment.
  • The 16-bit window size is used for flow control. The window here is the receiver window (RWND). It tells the other party how many more bytes the local receive buffer can hold, so that the other party can control its sending rate.

  • The 16-bit checksum is filled in by the sender and verified by the receiver. It is a 16-bit one's-complement (Internet) checksum, not a CRC, and it covers the data as well as the header (plus a pseudo-header built from the IP addresses).

  • The 16-bit urgent pointer is an offset. The segment's sequence number plus this offset marks the sequence number of the urgent data, which the sender uses to deliver urgent data to the receiver.

The fields above are fixed and occupy 20 bytes. They can be followed by up to 40 bytes of options.
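As a rough illustration of the layout described above, the following Python sketch unpacks the 20-byte fixed header with the standard struct module. The names TCPHeader and parse_tcp_header are made up for this example, and only the six classic flag bits listed above are decoded.

```python
import struct
from typing import NamedTuple

# Bit masks of the six classic flags inside the 16-bit offset/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


class TCPHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int  # in bytes
    flags: dict
    window: int
    checksum: int
    urgent_ptr: int
    options: bytes


def parse_tcp_header(segment: bytes) -> TCPHeader:
    """Decode the fixed 20-byte header plus any options (hypothetical helper)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # Network byte order: two 16-bit ports, two 32-bit numbers, four 16-bit fields.
    src, dst, seq, ack, off_flags, window, checksum, urg = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    header_len = (off_flags >> 12) * 4  # the 4-bit field counts 32-bit words
    if header_len < 20 or header_len > len(segment):
        raise ValueError("invalid header length")
    flags = {name: bool(off_flags & bit) for name, bit in FLAG_BITS.items()}
    return TCPHeader(src, dst, seq, ack, header_len, flags,
                     window, checksum, urg, segment[20:header_len])


# Build a bare SYN segment by hand and decode it again.
syn = struct.pack("!HHIIHHHH", 54321, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
print(parse_tcp_header(syn))
```

The header length is validated before the options are sliced out, because a value below 5 words or beyond the end of the buffer cannot describe a valid header.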

TCP three-way handshake

The TCP three-way handshake process is shown in the figure.
[Figure: TCP three-way handshake]
The three-way handshake means that the client and server exchange a total of 3 packets when a TCP connection is established. Its main purpose is to confirm that both parties can send and receive normally, and to let each side announce its initial sequence number in preparation for reliable transmission. In essence, the client connects to a specified port on the server, the two sides establish a TCP connection, synchronize their sequence and acknowledgment numbers, and exchange TCP window size information.

Assuming that the client starts in the CLOSED state and the server in the LISTEN state, the three-way handshake proceeds as follows:

  • First handshake: the client sends a SYN segment to the server carrying the client's initial sequence number ISN(c). The client is now in the SYN_SENT state.

    The header has the synchronization bit SYN=1 and the initial sequence number seq=x. A SYN=1 segment cannot carry data, but it consumes one sequence number.

  • Second handshake: after the server receives the client's SYN, it responds with its own SYN segment carrying its own initial sequence number ISN(s). At the same time it uses the client's ISN + 1 as the acknowledgment number, indicating that it has received the client's SYN. The server is now in the SYN_RCVD state.

    In this acknowledgment segment, SYN=1, ACK=1, acknowledgment number ack=x+1, and initial sequence number seq=y.

  • Third handshake: after the client receives the SYN+ACK segment, it sends an ACK segment that uses the server's ISN + 1 as the acknowledgment number, indicating that it has received the server's SYN. The client is now in the ESTABLISHED state. When the server receives this ACK, it also enters the ESTABLISHED state, and the two parties have established a connection.

    The acknowledgment segment has ACK=1, acknowledgment number ack=y+1, and sequence number seq=x+1 (the SYN used seq=x, so the next segment uses x+1). The ACK segment may carry data; if it carries none, it does not consume a sequence number.

The end that sends the first SYN performs an active open. The other end, which receives that SYN and sends back its own SYN, performs a passive open.

In socket programming, when the client executes connect(), a three-way handshake will be triggered.
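The sequence and acknowledgment numbers exchanged in the three steps can be traced with a small simulation. This is only a sketch: Segment and three_way_handshake are hypothetical names, no real packets are sent, and the ISNs are passed in by hand.

```python
from dataclasses import dataclass

MOD = 2**32  # sequence numbers are 32 bits wide and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: int = 0  # only meaningful when the ACK flag is set


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three handshake segments and the final state of each side."""
    states = {"client": "CLOSED", "server": "LISTEN"}

    # First handshake: SYN with seq = x. The SYN consumes one sequence number.
    syn = Segment("SYN", seq=client_isn)
    states["client"] = "SYN_SENT"

    # Second handshake: SYN+ACK with seq = y and ack = x + 1.
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=(syn.seq + 1) % MOD)
    states["server"] = "SYN_RCVD"

    # Third handshake: ACK with seq = x + 1 and ack = y + 1.
    ack = Segment("ACK", seq=(client_isn + 1) % MOD, ack=(syn_ack.seq + 1) % MOD)
    states["client"] = "ESTABLISHED"
    states["server"] = "ESTABLISHED"  # once the ACK arrives

    return [syn, syn_ack, ack], states


segments, final_states = three_way_handshake(client_isn=100, server_isn=300)
for segment in segments:
    print(segment)
print(final_states)
```

The modulo keeps the arithmetic correct when an ISN is close to the top of the 32-bit range.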

Reasons for the "three-way" handshake

The purpose of each handshake is as follows:

  • First handshake: the client sends a packet and the server receives it.
    From this the server can conclude that the client's sending ability and the server's receiving ability are normal.
  • Second handshake: the server sends a packet and the client receives it.
    From this the client can conclude that the server's receiving and sending abilities and the client's own receiving and sending abilities are normal. At this point, however, the server cannot yet confirm that the client's receiving ability is normal.
  • Third handshake: the client sends a packet and the server receives it.
    From this the server can conclude that the client's receiving and sending abilities are normal, and that its own sending and receiving abilities are normal as well.

Therefore, a three-way handshake is required to confirm whether the receiving and sending capabilities of both parties are normal.

Semi-connected queue

After the server receives a SYN from the client, it is in the SYN_RCVD state and the connection is not yet fully established. The server places connection requests in this state in a queue, which we call the semi-connected queue (also known as the SYN queue).

There is also a fully connected queue (the accept queue): connections that have completed the three-way handshake are placed in it. If a queue is full, packets may be dropped.

Regarding the number of SYN-ACK retransmissions: after the server sends the SYN-ACK packet, if it does not receive the client's acknowledgment it retransmits once, waits for a while, and if there is still no acknowledgment it retransmits a second time, and so on. Once the number of retransmissions exceeds the maximum configured on the system, the system deletes the connection from the semi-connected queue.
Note that the wait before each retransmission is not constant. It generally grows exponentially, for example 1s, 2s, 4s, 8s...
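The exponential schedule can be written out as a short calculation. The function name synack_retransmit_schedule is hypothetical, and the 1-second initial timeout and 5 retries used in the example are assumptions chosen to resemble common Linux settings (tcp_synack_retries); real systems may use different values.

```python
def synack_retransmit_schedule(retries: int, initial_timeout: float = 1.0):
    """Return (waits before each retransmission, total time before giving up).

    Assumes the wait simply doubles after every attempt.
    """
    waits = [initial_timeout * 2**i for i in range(retries)]
    # After the last retransmission the server waits one more doubled
    # interval before it removes the entry from the semi-connected queue.
    total = sum(waits) + initial_timeout * 2**retries
    return waits, total


waits, total = synack_retransmit_schedule(retries=5)
print(waits)  # [1.0, 2.0, 4.0, 8.0, 16.0]
print(total)  # 63.0
```

Under these assumptions a single forged SYN keeps its slot in the semi-connected queue for about a minute, which is why shortening the timeout or lowering the retry count is one of the defenses against SYN floods.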

ISN

When one end sends its SYN to establish a connection, it chooses an initial sequence number for that connection. The ISN changes over time, so each connection gets a different ISN. In the original specification the ISN can be regarded as a 32-bit counter that increases by 1 every 4 microseconds. Choosing the sequence number this way prevents a delayed segment from an earlier connection from showing up later and being misinterpreted by one of the parties.

One important function of the three-way handshake is that the client and the server exchange their ISNs, so that each side knows how to reassemble the data by sequence number when it receives data later. If the ISN were fixed, an attacker could easily guess the subsequent acknowledgment numbers, so the ISN is generated dynamically.
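One common way to combine the time-based counter with unpredictability is to add a keyed hash of the connection's addresses and ports to the clock, in the spirit of RFC 6528. The sketch below is an illustration only: generate_isn is a hypothetical name, SHA-256 is an arbitrary choice of hash, and the secret would have to be a random per-host value in practice.

```python
import hashlib
import time
from typing import Optional

MOD = 2**32


def generate_isn(local_ip: str, local_port: int, remote_ip: str, remote_port: int,
                 secret: bytes, now_us: Optional[int] = None) -> int:
    """ISN = 4-microsecond clock + keyed hash of the 4-tuple (illustrative sketch)."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    clock = (now_us // 4) % MOD  # ticks once every 4 microseconds
    material = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode() + secret
    offset = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + offset) % MOD


isn = generate_isn("10.0.0.1", 40000, "10.0.0.2", 80, secret=b"per-host-random-key")
print(isn)
```

Within one connection 4-tuple the value still advances with the clock, while an observer who does not know the secret cannot derive the ISN of a different connection from one it has seen.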

Carry data

The third handshake can carry data, but the first and second handshakes cannot.

If the first handshake could carry data, someone who wanted to attack the server could stuff a large amount of data into the SYN of every first handshake. The attacker does not care whether the server can send and receive normally, and simply keeps sending SYN segments, forcing the server to spend a lot of time and memory receiving them.

SYN attack

The server allocates resources for a connection during the second handshake, whereas the client allocates its resources only when the three-way handshake completes, so the server is vulnerable to SYN flood attacks.

In a SYN attack, the client forges a large number of non-existent source IP addresses in a short period and keeps sending SYN packets to the server. The server replies with SYN-ACK packets and waits for the client's acknowledgment. Because the source addresses do not exist, the server keeps retransmitting until it times out. These forged SYN packets occupy the semi-connected queue for a long time, so legitimate SYN requests are dropped because the queue is full, which can cause network congestion or even bring the system down. The SYN attack is a typical DoS/DDoS attack.

SYN attacks are easy to detect. If you see a large number of semi-connected (SYN_RECV) connections on the server, especially with random source IP addresses, you can be fairly sure it is a SYN attack. On Linux/Unix, you can use the netstat command that comes with the system to detect SYN attacks.

netstat -n -t | grep SYN_RECV

The common methods to defend against SYN attacks are as follows:

  • Shorten the SYN timeout
  • Increase the maximum number of semi-connections
  • Filter at the gateway (firewall)
  • Use SYN cookies
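The idea behind SYN cookies is that the server does not store anything for a half-open connection. Instead it derives its ISN from the connection's addresses and ports, and recomputes it when the third handshake arrives. The sketch below is heavily simplified and does not follow the bit layout used by any real kernel: make_cookie, check_cookie, the HMAC construction, and the coarse time counter are all assumptions made for illustration.

```python
import hashlib
import hmac

MOD = 2**32


def make_cookie(four_tuple, client_isn: int, counter: int, secret: bytes) -> int:
    """Server ISN computed from the 4-tuple, the client's ISN and a slow counter."""
    message = f"{four_tuple}|{counter}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return (int.from_bytes(digest[:4], "big") + client_isn) % MOD


def check_cookie(four_tuple, seq: int, ack: int, counter: int,
                 secret: bytes, max_age: int = 1) -> bool:
    """Validate the third handshake: ack must be cookie + 1, seq is client ISN + 1."""
    cookie = (ack - 1) % MOD
    client_isn = (seq - 1) % MOD
    # Accept cookies issued during the current or a recent counter period.
    return any(
        make_cookie(four_tuple, client_isn, counter - age, secret) == cookie
        for age in range(max_age + 1)
    )


conn = ("203.0.113.7", 40000, "198.51.100.1", 80)  # example addresses
secret = b"server-side-secret"                      # hypothetical key
cookie = make_cookie(conn, client_isn=1000, counter=7, secret=secret)
print(check_cookie(conn, seq=1001, ack=(cookie + 1) % MOD, counter=7, secret=secret))
```

Because a forged SYN never leads to a matching ACK, it costs the server one hash computation instead of a slot in the semi-connected queue.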

TCP four-way wave

The process of TCP's four-way wave is shown in the figure.
[Figure: TCP four-way wave]
Establishing a connection takes a three-way handshake, while terminating one takes a four-way wave. This is a consequence of TCP's half-close. A half-close means that one end of the connection can still receive data from the other end after it has finished sending its own.
Either the client or the server can initiate the wave. Suppose both parties start in the ESTABLISHED state and the client initiates the close. The four-way wave then proceeds as follows:

  • First wave: the client sends a FIN segment carrying a sequence number. The client is now in the FIN_WAIT1 state.

    That is, the client sends a connection release segment (FIN=1, sequence number seq=u), stops sending data, actively closes the TCP connection, and enters the FIN_WAIT1 (termination wait 1) state to wait for the server's acknowledgment.

  • Second wave: after receiving the FIN, the server sends an ACK segment that uses the client's sequence number + 1 as its acknowledgment number, indicating that the client's segment has been received. The server is now in the CLOSE_WAIT state.

    That is, after receiving the connection release segment the server sends an acknowledgment segment (ACK=1, acknowledgment number ack=u+1, sequence number seq=v) and enters the CLOSE_WAIT (close wait) state. TCP is now half-closed: the client-to-server direction of the connection has been released. After receiving the server's acknowledgment, the client enters the FIN_WAIT2 (termination wait 2) state and waits for the server's connection release segment.

  • Third wave: when the server also wants to disconnect, it sends a FIN segment with a sequence number, just as the client did in the first wave. The server is now in the LAST_ACK state.

    That is, once the server has no more data to send to the client, it sends a connection release segment (FIN=1, ACK=1, sequence number seq=w, acknowledgment number ack=u+1) and enters the LAST_ACK (last acknowledgment) state to wait for the client's acknowledgment.

  • Fourth wave: after receiving the FIN, the client sends an ACK segment in response, using the server's sequence number + 1 as its acknowledgment number. The client is now in the TIME_WAIT state, where it waits for a while to make sure the server has received the ACK before it enters the CLOSED state. When the server receives the ACK, it closes the connection and enters the CLOSED state.

    That is, after the client receives the server's connection release segment, it sends an acknowledgment segment (ACK=1, seq=u+1, ack=w+1) and enters the TIME_WAIT (time wait) state. The TCP connection is not yet released at this point; the client enters the CLOSED state only after the 2MSL period set by the time-wait timer has elapsed.

Receiving a FIN only means that no more data will flow in that direction. It is normal for the client to perform the active close and enter TIME_WAIT. The server usually performs the passive close and does not enter the TIME_WAIT state.

In socket programming, the wave is triggered when either party calls close().
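The state changes and sequence numbers of the four waves can be expressed as a small transition table. This is a sketch of the close path only, with the client as the active closer; TRANSITIONS, run and wave_segments are hypothetical names, and the event strings are made up for readability.

```python
# (current state, event) -> next state, restricted to the close path described above.
TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT1",    # active close, first wave
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",   # passive close, replies with ACK
    ("FIN_WAIT1", "recv ACK"): "FIN_WAIT2",      # second wave received
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",      # third wave
    ("FIN_WAIT2", "recv FIN"): "TIME_WAIT",      # replies with ACK, fourth wave
    ("LAST_ACK", "recv ACK"): "CLOSED",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}


def run(events, state="ESTABLISHED"):
    """Apply the events in order and return every state visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


def wave_segments(u: int, v: int, w: int):
    """The four segments, using the same u, v, w as in the text."""
    return [
        {"from": "client", "flags": "FIN", "seq": u},
        {"from": "server", "flags": "ACK", "seq": v, "ack": u + 1},
        {"from": "server", "flags": "FIN+ACK", "seq": w, "ack": u + 1},
        {"from": "client", "flags": "ACK", "seq": u + 1, "ack": w + 1},
    ]


print(run(["send FIN", "recv ACK", "recv FIN", "2MSL timeout"]))  # client side
print(run(["recv FIN", "send FIN", "recv ACK"]))                  # server side
print(wave_segments(u=500, v=900, w=950))
```

An event that is not valid in the current state raises a KeyError, which makes it easy to see that only the active closer ever passes through TIME_WAIT.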

Reasons for waving "four times"

When the server receives the client's SYN connection request, it can send SYN+ACK right away: the ACK acknowledges the request and the SYN synchronizes. When a connection is being closed, however, the server may not be able to close the socket immediately upon receiving the FIN, so it can only reply with an ACK first to tell the client, "I received the FIN you sent." The server can send its own FIN only after it has finished sending all of its data, so the ACK and the FIN cannot be sent together. That is why four waves are required.

TIME_WAIT status

Each TCP implementation must choose a maximum segment lifetime (MSL), which is the longest time a segment can exist in the network before it is discarded. This time is bounded because TCP segments travel through the network as IP datagrams, and IP datagrams have a TTL field that limits their lifetime.

Given the MSL of an implementation, the rule is: when TCP performs an active close and sends the final ACK, the connection must stay in the TIME_WAIT state for twice the MSL. This lets TCP send the final ACK again in case it is lost (the other end times out and retransmits its final FIN).

Another consequence of the 2MSL wait is that the socket pair defining the connection (the client's IP address and port number, the server's IP address and port number) cannot be reused while the connection is in the 2MSL wait. It can be used again only after the 2MSL period ends.

Why wait for 2MSL

The wait ensures that the last ACK segment sent by the client can reach the server. This ACK may be lost, in which case the server in the LAST_ACK state never receives an acknowledgment for its FIN+ACK. The server then times out and retransmits the FIN+ACK, and the client retransmits the acknowledgment and restarts the time-wait timer. In the end both the client and the server can close normally. If the client did not wait for 2MSL but released the connection right after sending the ACK, then once the ACK was lost the server could not enter the CLOSED state normally.

There are two reasons:

  1. Ensure that the last ACK segment sent by the client can reach the server.
    This ACK segment may be lost, so that the server in the LAST_ACK state does not receive an acknowledgment of the FIN+ACK segment it sent. The server times out and retransmits the FIN+ACK segment, and the client receives the retransmitted segment within 2MSL. The client then retransmits the acknowledgment and restarts the 2MSL timer, and finally both the client and the server enter the CLOSED state. If the client did not wait in the TIME_WAIT state but released the connection immediately after sending the ACK segment, it could not receive the FIN+ACK segment retransmitted by the server and would not send the acknowledgment again, so the server could not enter the CLOSED state normally.

  2. Prevent a "stale connection request segment" from appearing in a later connection.
    After the client sends the last ACK segment and waits for 2MSL, all segments generated during the lifetime of this connection will have disappeared from the network, so no old connection request segment from it can show up in the next new connection.
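The first reason can be made concrete with a toy timer that restarts whenever a retransmitted FIN arrives. TimeWaitTimer is a hypothetical class, time is passed in explicitly instead of being read from a clock, and the 30-second MSL is only an example value.

```python
class TimeWaitTimer:
    """Toy model of the client's 2MSL wait after sending the final ACK."""

    def __init__(self, msl: float, now: float):
        self.msl = msl
        self.expires_at = now + 2 * msl
        self.acks_sent = 1  # the final ACK that moved the client into TIME_WAIT

    def state(self, now: float) -> str:
        return "TIME_WAIT" if now < self.expires_at else "CLOSED"

    def on_retransmitted_fin(self, now: float) -> bool:
        """Answer a retransmitted FIN; returns True if the ACK was sent again."""
        if now >= self.expires_at:
            return False  # already CLOSED, nobody is left to answer
        self.acks_sent += 1
        self.expires_at = now + 2 * self.msl  # restart the 2MSL timer
        return True


timer = TimeWaitTimer(msl=30, now=0)
print(timer.state(now=59))                 # still TIME_WAIT
print(timer.on_retransmitted_fin(now=40))  # the first ACK was lost, answer again
print(timer.state(now=99), timer.state(now=100))
```

If the client closed immediately instead, on_retransmitted_fin would have no state left to answer from, which is exactly the situation where the server cannot leave LAST_ACK normally.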

TCP state transfer process

The TCP state transition process is shown in the figure.
[Figure: TCP state transition diagram]
The thick dotted line represents the state transitions of the server-side connection, and the thick solid line represents the state transitions of the client connection.

TCP reliability

TCP uses mechanisms such as sequence number, acknowledgement response, timeout retransmission, window control, and congestion control to ensure reliability.

Sequence number, confirmation response, timeout retransmission

When data arrives at the receiver, the receiver sends back an acknowledgment to indicate that the segment has been received, and the acknowledgment number indicates the sequence number of the data it expects next. If the sender does not receive the acknowledgment in time, either the data or the acknowledgment may have been lost. In that case the sender retransmits after waiting for a certain period.

Window control and fast retransmission (duplicate acknowledgments)

TCP uses window control to increase transmission speed: within the window, the sender does not have to wait for an acknowledgment before sending the next segment. The window size is the maximum amount of data that can be sent without waiting for an acknowledgment. Without window control, the sender must wait for the acknowledgment of each segment before sending the next, and any segment whose acknowledgment does not arrive must be retransmitted after a timeout.
With window control, if the segment 1001-2000 is lost, every later segment that arrives causes the receiver to send another acknowledgment with number 1001, meaning "I still want the data starting at 1001." If the sender receives the same acknowledgment three times in a row, it retransmits the segment immediately.
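The lost-segment example can be replayed in a few lines. The sketch assumes fixed 1000-byte segments numbered from 1 as in the text, and counts three duplicates after the first acknowledgment before triggering; receiver_acks and fast_retransmit_trigger are hypothetical names.

```python
def receiver_acks(arrivals, segment_size=1000, first_seq=1):
    """Cumulative acknowledgment numbers sent as segments arrive (by start sequence)."""
    received = set()
    next_expected = first_seq
    acks = []
    for start in arrivals:
        received.add(start)
        # Advance over every segment that is now contiguous with the stream.
        while next_expected in received:
            next_expected += segment_size
        acks.append(next_expected)
    return acks


def fast_retransmit_trigger(acks, threshold=3):
    """Return the sequence number to resend once `threshold` duplicates are seen."""
    last, duplicates = None, 0
    for ack in acks:
        if ack == last:
            duplicates += 1
            if duplicates == threshold:
                return ack
        else:
            last, duplicates = ack, 0
    return None


# Segment 1001-2000 is lost; 2001, 3001 and 4001 still arrive.
acks = receiver_acks([1, 2001, 3001, 4001])
print(acks)                           # [1001, 1001, 1001, 1001]
print(fast_retransmit_trigger(acks))  # 1001
# Once the retransmitted segment arrives, the acknowledgment jumps ahead.
print(receiver_acks([1, 2001, 3001, 4001, 1001])[-1])  # 5001
```

The jump to 5001 at the end shows why the segments received out of order do not need to be sent again.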

Congestion control

If the window is set to be large and the sender continuously sends a large amount of data, it may cause network congestion or even network paralysis. Therefore, TCP performs congestion control in order to prevent this situation.
[Figure: congestion window size over time]

  • Slow start: define a congestion window (cwnd) and set it to 1 segment at the beginning (in practice it is generally set to 2-4 SMSS, where SMSS is the sender's maximum segment size). Each acknowledgment received then increases cwnd by one segment, so the congestion window doubles after every RTT.
  • Congestion avoidance: set a slow start threshold (ssthresh), generally 65536 bytes at the beginning (16 in the figure). When the congestion window reaches this threshold, it no longer grows exponentially but linearly (congestion window size +1 per RTT) to avoid congestion.
    A timeout retransmission is treated as a sign of congestion. Once one occurs, the threshold is set to half of the current window size, the window size is reset to the initial value 1, and slow start begins again.
  • Fast retransmission: receiving 3 duplicate acknowledgments means that 3 later segments have arrived but an earlier segment was lost, so the lost segment is retransmitted immediately. Fast retransmission and fast recovery proceed as follows:
    1. When the third duplicate acknowledgment is received, set ssthresh = max(FlightSize / 2, 2 * SMSS), retransmit the lost segment immediately, and set cwnd = ssthresh + 3 * SMSS.
    2. For each further duplicate acknowledgment received, set cwnd = cwnd + SMSS. The sender can then send a new TCP segment if the new cwnd allows it.
    3. When an acknowledgment of new data arrives, set cwnd = ssthresh (the new slow start threshold calculated in step 1).

The result is that during TCP communication the throughput rises gradually, drops when congestion occurs, and then rises slowly again, so the network is not easily paralyzed.
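The rise-and-drop pattern can be reproduced with a coarse simulation that advances one RTT at a time. The sketch works in units of SMSS, treats cwnd as the FlightSize, and collapses fast recovery into its end result (cwnd = ssthresh), so it shows the shape of the curve and not the per-ACK behaviour. simulate_cwnd and the event names are made up for this example.

```python
def simulate_cwnd(events, initial_cwnd=1, initial_ssthresh=16):
    """Return cwnd (in SMSS) after each RTT.

    events holds one entry per RTT: "ack" (everything acknowledged),
    "timeout" (retransmission timeout) or "3dup" (three duplicate ACKs).
    """
    cwnd, ssthresh = initial_cwnd, initial_ssthresh
    history = [cwnd]
    for event in events:
        if event == "timeout":
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1                        # back to slow start
        elif event == "3dup":
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh                 # value once fast recovery ends
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 per RTT
        history.append(cwnd)
    return history


events = ["ack"] * 6 + ["timeout"] + ["ack"] * 4 + ["3dup"] + ["ack"] * 2
print(simulate_cwnd(events))
# [1, 2, 4, 8, 16, 17, 18, 1, 2, 4, 8, 9, 4, 5, 6]
```

The timeout sends the window back to 1, while three duplicate acknowledgments only halve it, which is the practical difference between the two loss signals.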

The congestion control algorithm described in this article is the classic loss-based one. Newer algorithms such as Google's BBR are now also widely used.

BBR algorithm

The BBR algorithm addresses two problems of earlier congestion control:

  1. The feedback is poor. For example, CUBIC grows the window along the convex and concave regions of a cubic curve, but its sawtooth drop is too abrupt and its congestion avoidance strategy is too conservative.
  2. The congestion algorithm gets taken over.
    When the TCP congestion control machinery detects packet loss (i.e. an RTO, N duplicate ACKs, and so on), the TCP core takes over from the congestion control algorithm completely and controls the congestion window itself. The problem is that this so-called packet loss may not be a real loss; it is only what TCP believes to be a loss.

Generally speaking, the congestion control logic before BBR runs in two phases: a normal phase and an abnormal phase. In the normal phase, the pluggable TCP congestion control algorithm drives the window adjustment. In the abnormal phase, the TCP core's congestion control state machine takes over the window calculation from the congestion control algorithm. The logic is as follows:

static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked, int flag)
{
    if (tcp_in_cwnd_reduction(sk)) { // abnormal mode
        /* Reduce cwnd if state mandates */
        // Before the window-reduction logic is entered, tcp_fastretrans_alert
        // has to gather the abnormal-state information and handle the abnormal process.
        tcp_cwnd_reduction(sk, acked_sacked, flag);
    } else if (tcp_may_raise_cwnd(sk, flag)) { // normal mode, or a safe abnormal mode
        /* Advance cwnd if state allows */
        tcp_cong_avoid(sk, ack, acked_sacked);
    }
    tcp_update_pacing_rate(sk);
}

BBR no longer allows the congestion control state machine to take over in abnormal mode:

static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked, int flag, const struct rate_sample *rs)
{
    const struct inet_connection_sock *icsk = inet_csk(sk);
    // New logic: if the algorithm's callback declares that it can handle
    // any congestion situation itself, hand control over to it.
    if (icsk->icsk_ca_ops->cong_control) {
        icsk->icsk_ca_ops->cong_control(sk, rs);
        // Return directly; the TCP core no longer gets involved.
        return;
    }
    // This is the old logic.
    if (tcp_in_cwnd_reduction(sk)) {
        /* Reduce cwnd if state mandates */
        // If not in the Open state... remember that tcp_cwnd_reduction
        // is not under the control of the congestion control algorithm.
        tcp_cwnd_reduction(sk, acked_sacked, flag);
    } else if (tcp_may_raise_cwnd(sk, flag)) {
        /* Advance cwnd if state allows */
        tcp_cong_avoid(sk, ack, acked_sacked);
    }
    tcp_update_pacing_rate(sk);
}

BBR continuously measures the maximum bandwidth (max-bw) and the minimum RTT (min-rtt) over a time window of the connection, computes the sending rate and the congestion window from them, and adjusts its gain coefficients according to the measured actual bandwidth bw and max-rtt.
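The two windowed measurements and the values derived from them can be sketched as follows. This is not the kernel implementation: BBRModel is a hypothetical class, the windows are counted in samples instead of round trips or seconds, and the gain values are plain parameters chosen for the example.

```python
from collections import deque


class BBRModel:
    """Windowed max-bw / min-rtt estimator with rate and window derived from them."""

    def __init__(self, bw_window=10, rtt_window=10, pacing_gain=1.0, cwnd_gain=2.0):
        # deque(maxlen=...) silently drops the oldest sample once the window is full.
        self.bw_samples = deque(maxlen=bw_window)
        self.rtt_samples = deque(maxlen=rtt_window)
        self.pacing_gain = pacing_gain
        self.cwnd_gain = cwnd_gain

    def on_ack(self, delivered_bytes: float, interval_s: float, rtt_s: float) -> None:
        """Record one delivery-rate sample and one RTT sample."""
        self.bw_samples.append(delivered_bytes / interval_s)
        self.rtt_samples.append(rtt_s)

    @property
    def max_bw(self) -> float:
        return max(self.bw_samples)

    @property
    def min_rtt(self) -> float:
        return min(self.rtt_samples)

    def pacing_rate(self) -> float:
        """Sending rate in bytes per second."""
        return self.pacing_gain * self.max_bw

    def cwnd_bytes(self) -> float:
        """Congestion window as a multiple of the bandwidth-delay product."""
        return self.cwnd_gain * self.max_bw * self.min_rtt


model = BBRModel()
model.on_ack(delivered_bytes=1000, interval_s=0.5, rtt_s=0.25)
model.on_ack(delivered_bytes=1000, interval_s=0.25, rtt_s=0.125)
model.on_ack(delivered_bytes=1000, interval_s=1.0, rtt_s=0.5)
print(model.max_bw, model.min_rtt)              # 4000.0 0.125
print(model.pacing_rate(), model.cwnd_bytes())  # 4000.0 1000.0
```

Note that neither value depends on a loss event, which is the point made above: the window is computed from measurements and not handed over to a loss-driven state machine.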

The BBR algorithm eliminates this unnecessary sawtooth. The sawtooth was essentially what drove TCP before BBR: the various algorithms kept increasing the window blindly, and as soon as TCP believed a packet had been lost (which may not be a real loss, hence increasingly complex mechanisms such as DSACK), the algorithm left behind an ssthresh and the state machine took over all the logic. That is where the tip of each tooth is. In other words, the sawtooth is produced by the "work handover" between the TCP congestion state machine and the TCP congestion control algorithm when a congestion event occurs. BBR removes this unnecessary handover, so the sawtooth becomes blunt or is even flattened.
It is not that Vegas, CUBIC, and the like cannot detect congestion; TCP simply does not give them full control. This may be a legacy of the original TCP implementation. The concept of ssthresh, for example, is not actually needed by many algorithms. BBR does not use ssthresh (ssthresh reflects the coupling between the congestion algorithm and the TCP congestion state machine; BBR has no such coupling, so it does not need ssthresh).


Origin blog.csdn.net/MinutkiBegut/article/details/113848991