Computer Networks: A Review of the TCP Protocol

Preface

This article reviews the TCP protocol.
TCP stands for Transmission Control Protocol.

Main features of TCP

  • Connection-oriented transport protocol
    An application must establish a TCP connection before it can use TCP, and must release that connection once the data transfer is complete.
  • Supports unicast only
    Each TCP connection has exactly two endpoints (sockets) and carries only point-to-point traffic; multicast and broadcast are not supported.
  • Provides reliable delivery
    TCP delivers data to the peer without errors, without loss, without duplication, and in order.
  • The transmission unit is the segment
    TCP uses the traditional "segment" as its unit of transmission.
    A segment is a block of data that TCP obtains by splitting the data received from the application layer.
  • Only one TPDU format
    The TCP segment header already contains the fields needed by the various TPDU types, so the different functions are realized mainly through the control flags in that header.

  • Supports full-duplex communication
    Both ends of a TCP connection have send and receive buffers that temporarily hold data for two-way communication.
  • TCP connections are based on byte streams, rather than message streams.
    TCP does not transmit individual messages independently like UDP, but instead transmits in byte stream mode without preserving message boundaries.
  • The segment size and the number of segments sent at a time are variable.
    The size of a segment is determined by the window size advertised by the peer and by the currently available send window (which reflects current network congestion); it is also limited by the size of the data handed down by the application layer and by the MTU of the networks along the route.
    Likewise, the number of segments that can be sent at one time is not fixed: TCP may send a single segment or several segments, as long as the total stays within the currently available send window.
    In addition, if the data that the application process writes to the TCP buffer is too long, TCP splits it into several segments; conversely, if the data is too short, TCP may wait until enough has accumulated in the buffer and send it as one segment (this is the source of the so-called "sticky packet" and "packet splitting" behavior).
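
The splitting described in the last item can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the helper names `segment` and `sendable` and the 1000-byte MSS are made up for the example, and a real TCP stack also takes timers and congestion state into account.

```python
def segment(data: bytes, mss: int) -> list[bytes]:
    """Split an application byte string into chunks of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [data[i:i + mss] for i in range(0, len(data), mss)]


def sendable(segments: list[bytes], window: int) -> list[bytes]:
    """Return the leading segments whose total size fits in the send window."""
    out, used = [], 0
    for seg in segments:
        if used + len(seg) > window:
            break
        out.append(seg)
        used += len(seg)
    return out


segments = segment(b"x" * 2500, mss=1000)   # three segments, the last one shorter
burst = sendable(segments, window=2000)     # only what fits in the window
print([len(s) for s in segments], len(burst))
```

Because the receiver sees one continuous byte stream, an application that needs message boundaries has to add its own framing on top.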

TCP segment

Message

The uppercase names ACK, SYN, and FIN denote flag bits, each of which is either 1 or 0;
the lowercase names ack and seq denote the acknowledgment number and the sequence number.

  • Source port and destination port
    These are the TCP port numbers of the sending and receiving parties. A port together with its host's IP address uniquely identifies an endpoint, that is, a socket.
    Ports are a transport-layer concept; the source and destination IP addresses are added in the IP header at the network layer.

  • Sequence number seq:
    Occupies 4 bytes and marks the position of the segment's data in the stream. TCP numbers every data byte sent on the connection, and the number of the first byte (the initial sequence number) is generated randomly by the local end. Once the bytes are numbered, each segment is given a sequence number: seq is the number of the first data byte in that segment.

  • Acknowledgment number ack:
    Occupies 4 bytes and gives the sequence number of the first data byte that the receiver expects in the next segment from the peer. The sequence number is the number of the first data byte carried in a segment, whereas the acknowledgment number is the number of the next byte expected; therefore the acknowledgment number equals the number of the last byte of the received segment plus 1.

  • Synchronize SYN: used to synchronize sequence numbers when a connection is established.
    SYN=1 with ACK=0 indicates a connection request segment. If the peer agrees to the connection, its response segment carries SYN=1 and ACK=1.
    Therefore, SYN=1 indicates either a connection request or a connection acceptance.
    The SYN flag is set to 1 only while the connection is being established; once the handshake is complete, it is 0.

  • Acknowledgment ACK:
    Occupies 1 bit. The acknowledgment number field is valid only when ACK=1; when ACK=0, the acknowledgment number is invalid.

  • Finish FIN:
    Used to release a connection.
    FIN=1 means that the sender of this segment has finished sending its data and requests that the connection be released.

  • Window size
    Indicates the size of the receive window on the host that sends this segment, that is, the maximum number of bytes that this host can currently accept. The field tells the peer how many bytes, starting from the acknowledgment number carried in this segment, the local end is currently willing to receive; the peer uses this value as the basis for sizing its send window.
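
To make the fields above concrete, here is a minimal sketch that decodes the fixed 20-byte part of a TCP header with Python's `struct` module. The `TcpHeader` type, the `parse_tcp_header` function, and the sample port and sequence values are assumptions made for this illustration; options, the checksum, and the urgent pointer are ignored.

```python
import struct
from typing import NamedTuple


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int   # in bytes
    syn: bool
    ack_flag: bool
    fin: bool
    window: int


def parse_tcp_header(raw: bytes) -> TcpHeader:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    if len(raw) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    src, dst, seq, ack, off_flags, window, _checksum, _urgent = struct.unpack(
        "!HHIIHHHH", raw[:20]
    )
    return TcpHeader(
        src_port=src,
        dst_port=dst,
        seq=seq,
        ack=ack,
        header_len=(off_flags >> 12) * 4,   # data offset counts 32-bit words
        syn=bool(off_flags & 0x02),
        ack_flag=bool(off_flags & 0x10),
        fin=bool(off_flags & 0x01),
        window=window,
    )


# A hand-built SYN+ACK header: ports 80 -> 50000, seq=1000, ack=501, window=65535.
raw = struct.pack("!HHIIHHHH", 80, 50000, 1000, 501, (5 << 12) | 0x12, 65535, 0, 0)
hdr = parse_tcp_header(raw)
print(hdr)
```

The acknowledgment number in `hdr.ack` is only meaningful here because `hdr.ack_flag` is set, which matches the rule given for the ACK flag.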

Why TCP is a reliable connection

Checksums, sequence numbers, acknowledgments, and retransmission.

  1. Byte numbering mechanism
    TCP numbers the data part of its segments byte by byte, so that every byte can be transmitted and received in order.
    Each byte in the data part of a segment (the header is excluded) has its own sequence number, and the "sequence number" field of a segment is filled with the sequence number of the first data byte in that segment.

  2. Segment acknowledgment mechanism
    TCP requires the receiver to acknowledge the segments it receives by returning a segment with the ACK flag set. The acknowledgment number tells the sender which data has been received correctly: every byte before the acknowledgment number.
    ACK is the flag that indicates whether the "acknowledgment number" field is valid. Only when ACK is 1 does the acknowledgment number carry the meaning described above; otherwise the field is meaningless.

    • TCP can send several segments in a row.
      The sender does not have to wait for the peer's acknowledgment (a segment with ACK set to 1) before sending the next segment, which greatly improves transmission efficiency. However, the number of segments that can be sent in a row is limited both by the "window size" value returned by the peer and by the currently available send window, because the sender must buffer every segment that has not yet been acknowledged, and that data occupies part of the send window.
    • Only contiguously received segments are acknowledged.
      Assume that every segment is 100 bytes long and that the receiver has received the four segments with sequence numbers 1, 101, 201, and 401, while the segment with sequence number 301 has not yet arrived. The acknowledgment number returned by the receiver can then only be 301, not 501; that is, only the first three segments are acknowledged. Segment 401 is not acknowledged yet, because segment 301 in the middle is still missing. Once segment 301 arrives, the receiver can return an acknowledgment number of 501, which means that both segments 301 and 401 have been received correctly.
    • Segments with non-contiguous sequence numbers are buffered first.
      For example, a host receives segments with sequence numbers 1, 101, 201, 301, 601, 401, 801, and 501 from its peer, in that order (again assuming 100-byte segments). The host first delivers the four segments 1, 101, 201, and 301 to the application layer and sends an acknowledgment with acknowledgment number 401, so that these four segments can be removed from the receive window and their space released. Segments 601 and 801 arrive ahead of missing data and are held in the receive window. Once segment 501 has arrived, the data is reassembled in the order 401, 501, 601 and delivered to the application layer, and an acknowledgment with acknowledgment number 701 is sent, after which these three segments are removed from the receive window as well. Segment 801 still remains buffered in the receive window, because segment 701 has not been received yet.

  3. Timeout retransmission mechanism
    TCP keeps a retransmission timer, which is started when a segment is sent. If the segment has not been acknowledged by the peer when the timer expires, the segment with the corresponding sequence number is retransmitted.
    TCP uses an adaptive algorithm to adjust the retransmission timeout dynamically.
    There is also a fast retransmission mechanism, which retransmits immediately without waiting for the timer to expire.

  4. Selective acknowledgment (Selective ACK, SACK) mechanism
    With SACK support, only the missing data needs to be retransmitted; data that has already been received correctly is not sent again.
    Assume that the receiver has received the five segments with sequence numbers 1, 101, 201, 401, and 501. When it sends the acknowledgment with acknowledgment number 301, it records in the SACK option the two out-of-order segments it already holds: 401 (bytes 401 to 500) and 501 (bytes 501 to 600). The sender then knows that segments 401 and 501 do not need to be sent again and that only segment 301 must be retransmitted. This saves network resources and improves transmission efficiency.
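
The cumulative acknowledgment and SACK examples above can be replayed with a small simulation. It is a sketch under the same assumptions as the text (fixed 100-byte segments, numbering from 1); the function names `receive` and `sack_blocks` are invented for the illustration. One deliberate difference: adjacent buffered segments are merged into a single block whose right edge is the sequence number just past the last buffered byte, so segments 401 and 501 show up as the one range 401 to 601.

```python
def sack_blocks(buffered, seg_len):
    """Merge buffered segments into (left_edge, right_edge) blocks.

    right_edge is the sequence number just past the last buffered byte.
    """
    blocks = []
    for seq in sorted(buffered):
        if blocks and blocks[-1][1] == seq:
            blocks[-1] = (blocks[-1][0], seq + seg_len)   # extend the last block
        else:
            blocks.append((seq, seq + seg_len))
    return blocks


def receive(arrivals, seg_len=100, first_seq=1):
    """Simulate a receiver; return one (ack, sack_blocks) reply per arrival."""
    expected = first_seq             # next in-order byte the receiver waits for
    buffered = set()                 # sequence numbers of out-of-order segments
    replies = []
    for seq in arrivals:
        if seq >= expected:          # anything below `expected` was already delivered
            buffered.add(seq)
        while expected in buffered:  # deliver everything that is now contiguous
            buffered.remove(expected)
            expected += seg_len
        replies.append((expected, sack_blocks(buffered, seg_len)))
    return replies


replies = receive([1, 101, 201, 301, 601, 401, 801, 501])
print([ack for ack, _ in replies])   # [101, 201, 301, 401, 401, 501, 501, 701]
print(replies[-1][1])                # [(801, 901)]

ack, blocks = receive([1, 101, 201, 401, 501])[-1]
print(ack, blocks)                   # 301 [(401, 601)]
```

In the second run the sender learns from the reply that only the range starting at 301 and ending before 401 is missing.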

Flow control

Flow control matches the sending rate to the receiving rate of the two communicating parties. Its goal is to keep the sender from transmitting so fast that the receiver cannot keep up. It is a point-to-point behavior between the two ends of a connection.

  • Purpose of flow control: keep the sender from transmitting too fast, so that the receiver can keep up.

  • Flow control scheme: TCP uses a sliding window mechanism to implement flow control.

  • During communication, the receiver dynamically adjusts the sender's send window according to the size of its receive buffer, that is, the receive window rwnd (the receiver sets the window field of its acknowledgment segments to tell the sender the value of rwnd). The sender's send window is the minimum of the receive window rwnd and the congestion window cwnd.

  • Persistence timer
    TCP keeps a persistence timer for each connection. As soon as one side of a TCP connection receives a zero-window notification from the other side, it starts the persistence timer. When the timer expires, it sends a zero-window probe segment, and the receiver replies with its current window value. If the window is still 0, the sender restarts the persistence timer.
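
The interaction between the advertised window and the sender can be illustrated with a toy simulation. Everything here is an assumption made for the sketch: the `Receiver` class, the `send_round` helper, the buffer and window sizes, and the simplification that the application reads 200 bytes whenever a zero-window probe would be sent. No real timers are involved.

```python
class Receiver:
    """Receiver with a fixed-size buffer that advertises its free space as rwnd."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size
        self.buffered = 0                      # bytes waiting for the application

    @property
    def rwnd(self) -> int:
        return self.buffer_size - self.buffered

    def accept(self, nbytes: int) -> int:
        """Store incoming bytes and return the window to advertise."""
        if nbytes > self.rwnd:
            raise ValueError("sender exceeded the advertised window")
        self.buffered += nbytes
        return self.rwnd

    def app_read(self, nbytes: int) -> int:
        """The application drains data; return the window to advertise."""
        self.buffered -= min(nbytes, self.buffered)
        return self.rwnd


def send_round(pending: int, rwnd: int, cwnd: int) -> int:
    """Bytes the sender may transmit now: limited by min(rwnd, cwnd)."""
    return min(pending, rwnd, cwnd)


rx = Receiver(buffer_size=400)
window = rx.rwnd
pending, cwnd = 1000, 300        # hypothetical values
log = []
while pending > 0:
    n = send_round(pending, window, cwnd)
    if n == 0:
        # Zero window: this is where the persistence timer would fire and a
        # probe would be sent; here the application simply reads some data.
        window = rx.app_read(200)
        log.append("probe")
        continue
    pending -= n
    window = rx.accept(n)
    log.append(n)
print(log)
```

The log shows the sender stalling each time the receive buffer fills up and resuming only after the window reopens.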

Congestion control

  • Condition for congestion: the sum of the demands for a resource > the available resource.
    Congestion control takes into account the bandwidth of every link in the network and the data processing capability of the intermediate devices. Its aim is to keep data from clogging the network, that is, the data sent by the sending end must not exceed what the network and the receiving end can process. It is an end-to-end behavior.
    The causes of congestion can be complex. For example, congestion may result if the buffer space of a node along the path of the TCP connection is too small, if its forwarding capability is too low, if the bandwidth of some link is too small, or if the peer's receiving capability is low.
    For example, suppose a user unilaterally improves the forwarding capacity of an intermediate router but ignores the bandwidth of the links along the path. The router now forwards data faster, but as a result more and more data queues up on the links, which not only fails to solve the congestion problem but makes it worse.
    As another example, suppose the user unilaterally enlarges the router's buffer without also considering its forwarding capacity and the link bandwidth. More data can be held temporarily on the router, but the data waits longer because the queue is longer than before, and it ends up being retransmitted because of timeouts. The more data is retransmitted, the heavier the network load becomes, eventually leading to even more serious congestion.

  • The purpose of congestion control: to prevent excessive data from being injected into the network

  • The congestion control scheme is:
    slow start, congestion avoidance; fast retransmission, fast recovery

  • Send window = min(receive window rwnd, congestion window cwnd)
    Receive window: the value that the receiver advertises to the sender based on its receive buffer; it reflects the capacity of the receiver.
    Congestion window: the window value that the sender sets according to its own estimate of network congestion; it reflects the current capacity of the network.

Slow start and congestion avoidance

Slow start is TCP's initial congestion prevention scheme. The basic idea is that when a TCP connection begins to transmit data, the congestion window, which bounds the amount of data that can be sent at a time, is increased gradually: a small amount of data is sent first as a probe, and as the acknowledgments for these segments arrive, the amount of data sent grows step by step until it reaches a previously set limit, the slow start threshold SSTHRESH.
When the congestion window size CWND becomes greater than or equal to SSTHRESH, the "congestion avoidance" scheme takes over. The basic idea is that once CWND has reached SSTHRESH, the congestion window grows by only 1 MSS per RTT (the time a segment needs for a round trip between the sender and the receiver) instead of doubling, so that it increases linearly rather than exponentially as in slow start. This growth is clearly slower than in the slow start phase. When data loss occurs, SSTHRESH is reduced to half of the current CWND, CWND is set to 1, and the sender re-enters the slow start phase, and so on.

In short: slow start grows the window exponentially until it reaches the slow start threshold SSTHRESH, after which congestion avoidance applies additive (linear) increase. When data loss occurs, SSTHRESH is reduced multiplicatively to half of the current CWND, cwnd is set to 1, and the slow start phase begins again.
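
The window evolution described above can be traced with a short simulation. It is a simplified sketch, not a faithful TCP implementation: the window is counted in whole MSS units per RTT, losses are injected at hypothetical round numbers, and the floor of 2 MSS for SSTHRESH is an assumption borrowed from RFC 5681 rather than something stated in this article.

```python
def simulate_cwnd(rounds: int, ssthresh: int, loss_rounds: set[int]) -> list[int]:
    """Return the congestion window (in MSS) at the start of each RTT round."""
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Loss detected: multiplicative decrease, then restart slow start.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: double each RTT, but do not overshoot the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: additive increase of one MSS per RTT.
            cwnd += 1
    return history


history = simulate_cwnd(rounds=12, ssthresh=16, loss_rounds={7})
print(history)   # [1, 2, 4, 8, 16, 17, 18, 19, 1, 2, 4, 8]
```

The run shows exponential growth up to the threshold of 16, linear growth afterwards, and a restart from 1 after the loss in round 7.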

Fast retransmission and fast recovery

  • Fast retransmission
    When the receiver gets a segment that arrives out of order, its TCP entity immediately sends a duplicate ACK instead of waiting until it has data of its own to send and piggybacking the acknowledgment. When the sender has received three duplicate ACKs, it considers the segment indicated by the "acknowledgment number" field lost and retransmits it at once, without waiting for the retransmission timer to expire.

  • Fast recovery
    The basic idea of the fast recovery algorithm: when the third duplicate ACK is received, SSTHRESH is set to half of the current CWND and CWND is set to this new SSTHRESH value (rather than to 1) to reduce the network load. The "congestion avoidance" algorithm described above is then executed, so that CWND grows slowly and the network does not become congested again.
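
The sender-side reaction to duplicate ACKs can be sketched as below. The class name `FastRecoverySender`, the window values, and the floor of 2 MSS are assumptions for the illustration; the temporary window inflation that real implementations apply during recovery is left out.

```python
class FastRecoverySender:
    """Counts duplicate ACKs and applies fast retransmit and fast recovery."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, cwnd: int, ssthresh: int) -> None:
        self.cwnd = cwnd            # congestion window, in MSS
        self.ssthresh = ssthresh    # slow start threshold, in MSS
        self.last_ack = None
        self.dup_acks = 0
        self.retransmitted: list[int] = []

    def on_ack(self, ack: int) -> None:
        if ack == self.last_ack:
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                # Fast retransmit: resend the segment the receiver keeps asking for.
                self.retransmitted.append(ack)
                # Fast recovery: halve the threshold and continue from there,
                # without falling back to cwnd = 1.
                self.ssthresh = max(self.cwnd // 2, 2)
                self.cwnd = self.ssthresh
        else:
            self.last_ack = ack
            self.dup_acks = 0


sender = FastRecoverySender(cwnd=16, ssthresh=32)
for ack in (101, 201, 301, 301, 301, 301):   # 301 is repeated three extra times
    sender.on_ack(ack)
print(sender.retransmitted, sender.cwnd, sender.ssthresh)
```

After the third duplicate of 301, the segment starting at 301 is queued for retransmission and both the window and the threshold are 8.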

TCP three-way handshake

  1. Connection request:
    The client sends a SYN packet (seq = x, SYN = 1) to the server and enters the SYN_SENT state, waiting for the server's confirmation.

  2. The server receives the SYN packet:
    It must acknowledge the client's SYN and at the same time send its own SYN, that is, a SYN+ACK packet (seq = y, ack = x + 1, SYN = 1, ACK = 1). The server then enters the SYN_RECV state.

  3. The client receives the SYN+ACK packet from the server:
    It sends an acknowledgment packet ACK (seq = x + 1, ack = y + 1, ACK = 1) to the server. Once this packet has been sent, the client and the server enter the ESTABLISHED state (the TCP connection is set up), and the three-way handshake is complete.
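
The sequence and acknowledgment numbers in the three steps can be checked with a small model. The `Segment` dataclass and the `three_way_handshake` function are hypothetical names for this sketch; nothing is sent over a network, and the only addition to the text is that the arithmetic wraps around at 2^32 because the fields are 4 bytes wide.

```python
import random
from dataclasses import dataclass

MOD = 2 ** 32   # sequence numbers are 4 bytes wide and wrap around


@dataclass
class Segment:
    seq: int
    ack: int = 0
    SYN: int = 0
    ACK: int = 0


def three_way_handshake(x: int, y: int):
    """Return the three segments exchanged and the final states.

    x and y are the initial sequence numbers chosen by client and server.
    """
    states = {"client": "CLOSED", "server": "LISTEN"}

    # Step 1: client -> server
    syn = Segment(seq=x, SYN=1)
    states["client"] = "SYN_SENT"

    # Step 2: server -> client, acknowledging x and sending its own SYN
    syn_ack = Segment(seq=y, ack=(syn.seq + 1) % MOD, SYN=1, ACK=1)
    states["server"] = "SYN_RECV"

    # Step 3: client -> server, acknowledging y
    ack = Segment(seq=(x + 1) % MOD, ack=(syn_ack.seq + 1) % MOD, ACK=1)
    states["client"] = "ESTABLISHED"
    states["server"] = "ESTABLISHED"   # once the final ACK has arrived
    return [syn, syn_ack, ack], states


x, y = random.getrandbits(32), random.getrandbits(32)
segments, states = three_way_handshake(x, y)
for seg in segments:
    print(seg)
```

Each SYN consumes one sequence number, which is why both acknowledgment numbers are the peer's initial sequence number plus 1.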

TCP four-way wave (connection termination)

  1. The client process sends a connection release segment (seq = u, FIN = 1) and stops sending data.
    The client then enters the FIN-WAIT-1 state.
    TCP stipulates that a FIN segment consumes one sequence number even if it carries no data.

  2. The server receives the connection release segment and sends an acknowledgment (seq = v, ack = u + 1, ACK = 1).
    The server then enters the CLOSE-WAIT state.
    The TCP server notifies the upper-layer application process that the client-to-server direction has been released. The connection is now half-closed: the client has no more data to send, but if the server sends data, the client must still accept it. This situation lasts for the whole duration of the CLOSE-WAIT state.

  3. After the client receives the server's acknowledgment, it enters the FIN-WAIT-2 state and
    waits for the server to send its connection release segment (until then, it must still accept any remaining data that the server sends).

  4. After the server has sent its remaining data, it sends a connection release segment to the client (seq = w, ack = u + 1, FIN = 1, ACK = 1). Because the connection was half-closed, the server has probably sent some more data in the meantime, so its sequence number at this point is assumed to be seq = w. The server then enters the LAST-ACK state and waits for the client's acknowledgment.

  5. After the client receives the server's connection release segment, it must send an acknowledgment (seq = u + 1, ack = w + 1, ACK = 1) and enters the TIME-WAIT state. Note that the TCP connection has not been released yet: only after 2*MSL (MSL is the maximum segment lifetime) has elapsed and the client has released the corresponding TCB (transmission control block) does it enter the CLOSED state.

  6. As soon as the server receives the client's acknowledgment, it enters the CLOSED state. Likewise, once its TCB has been released, the TCP connection is terminated. As can be seen, the server ends the TCP connection earlier than the client.
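
The state changes in the six steps can be written down as two small transition tables. This is only a sketch of the sequence described above: the event labels and the `run` helper are made up for the illustration, and cases such as both sides closing at the same time are not covered.

```python
# (state, event) -> next state, limited to the states named in the steps above.
CLIENT_CLOSE = {
    ("ESTABLISHED", "send FIN"): "FIN-WAIT-1",
    ("FIN-WAIT-1", "recv ACK"): "FIN-WAIT-2",
    ("FIN-WAIT-2", "recv FIN, send ACK"): "TIME-WAIT",
    ("TIME-WAIT", "2MSL timeout"): "CLOSED",
}
SERVER_CLOSE = {
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE-WAIT",
    ("CLOSE-WAIT", "send FIN"): "LAST-ACK",
    ("LAST-ACK", "recv ACK"): "CLOSED",
}


def run(table, events, state="ESTABLISHED"):
    """Apply the events in order and return the list of states visited."""
    path = [state]
    for event in events:
        state = table[(state, event)]   # KeyError: event not valid in this state
        path.append(state)
    return path


client_path = run(CLIENT_CLOSE, ["send FIN", "recv ACK", "recv FIN, send ACK", "2MSL timeout"])
server_path = run(SERVER_CLOSE, ["recv FIN, send ACK", "send FIN", "recv ACK"])
print(" -> ".join(client_path))
print(" -> ".join(server_path))
```

The client path has one more step than the server path, the 2MSL wait, which is why the server reaches CLOSED first.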

Introduction to TCP Fast Open

  • First connection:
    The user sends a SYN packet to the server and requests a TFO cookie;
    the server generates a cookie by encrypting the user's IP address and sends it to the user along with the SYN-ACK;
    the user stores the TFO cookie.

  • Subsequent connections:
    The user sends the server a SYN packet that carries the TFO cookie together with the request data;
    the server checks the cookie (it either decrypts the cookie and compares the IP address, or re-encrypts the IP address and compares the result with the received cookie).
    If verification succeeds, the server sends SYN+ACK to the user and can start transmitting data to the user before the user's ACK arrives (saving 1 RTT).
    If verification fails, the server discards the data carried in the TFO request and replies with a SYN-ACK that acknowledges only the SYN sequence number; the connection is then completed by a normal three-way handshake.

  • Two major benefits of TFO:
    Better network utilization (application data can be transmitted during the three-way handshake, saving 1 RTT) and
    better network security (the cookie check can help guard against SYN flood attacks).
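
The cookie check can be illustrated with the "recompute and compare" variant mentioned above. Note the substitution: this sketch uses a keyed hash (HMAC-SHA256 truncated to 8 bytes) in place of encryption, purely for illustration. The names `SERVER_KEY`, `make_cookie`, `check_cookie`, and `handle_syn` are hypothetical and do not correspond to any real TCP stack.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(16)   # hypothetical per-server secret


def make_cookie(client_ip: str, key: bytes = SERVER_KEY) -> bytes:
    """Derive a cookie from the client's IP address (keyed hash, 8 bytes)."""
    return hmac.new(key, client_ip.encode(), hashlib.sha256).digest()[:8]


def check_cookie(client_ip: str, cookie: bytes, key: bytes = SERVER_KEY) -> bool:
    """Recompute the cookie for this IP and compare in constant time."""
    return hmac.compare_digest(make_cookie(client_ip, key), cookie)


def handle_syn(client_ip, cookie, data, key=SERVER_KEY):
    """Decide whether data carried in the SYN may be passed to the application."""
    if cookie is not None and check_cookie(client_ip, cookie, key):
        return "SYN+ACK", data       # data accepted before the handshake completes
    return "SYN+ACK", None           # data dropped; fall back to the normal handshake


cookie = make_cookie("192.0.2.10")                       # issued on the first connection
print(handle_syn("192.0.2.10", cookie, b"GET /"))        # same client: data accepted
print(handle_syn("198.51.100.7", cookie, b"GET /"))      # other address: data dropped
```

Because the cookie is bound to the client's address, a SYN with a forged source address cannot get its data accepted with a cookie issued to someone else.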

Silly Window Syndrome

Window-based flow control schemes, such as the one used by TCP, can lead to a condition known as "Silly Window Syndrome" (SWS). When it occurs, small segments are exchanged over the connection instead of full-sized ones.
The phenomenon can arise at either end: the receiver can advertise a small window (rather than waiting until a larger window is available), and the sender can send a small amount of data (rather than waiting for more data so that it can send a large segment). Measures can be taken at either end to avoid silly window syndrome.

The problem comes down to small packets: because the sender and the receiver process data at mismatched rates, many small packets end up on the network. Measures for avoiding too many small packets on the network, such as the Nagle algorithm, address exactly this. Under the sliding window mechanism, if the rates of the sender and the receiver differ greatly, this silly state also occurs: each segment the sender transmits carries a comparatively large header and very little data.
On the receiving side, if the application is very slow and reads only 1 byte or a few bytes at a time, the receive buffer fills up quickly and the receiver advertises a window of 0 bytes, so the sender stops sending. When the application then reads 1 byte, the receiver advertises a window of 1 byte; the sender receives this notification and sends 1 byte of data. Transmission efficiency is very low in this situation.
Likewise, if the sending program writes one byte at a time, the data is still transmitted byte by byte even though the window is large enough, which is also very inefficient.
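
Both countermeasures can be reduced to simple decision rules. The sketch below is a simplification and the function names are made up: `nagle_can_send` follows the idea of the Nagle algorithm (send a full segment at once, or a small one only when nothing is in flight), and `window_to_advertise` follows the receiver-side rule from RFC 1122 (keep advertising 0 until the free space reaches one MSS or half the buffer), which this article does not describe in detail.

```python
def nagle_can_send(queued: int, mss: int, unacked: int, window: int) -> bool:
    """Sender side: may the queued data be transmitted now?"""
    if queued == 0:
        return False
    if queued >= mss and window >= mss:
        return True                    # a full-sized segment can go out at once
    # A small segment is allowed only when no earlier data is unacknowledged.
    return unacked == 0 and queued <= window


def window_to_advertise(free: int, buffer_size: int, mss: int) -> int:
    """Receiver side: advertise 0 until the free space is worth announcing."""
    if free >= min(mss, buffer_size // 2):
        return free
    return 0


print(nagle_can_send(queued=1, mss=1000, unacked=500, window=4000))    # False
print(nagle_can_send(queued=1, mss=1000, unacked=0, window=4000))      # True
print(window_to_advertise(free=1, buffer_size=4000, mss=1000))         # 0
print(window_to_advertise(free=1000, buffer_size=4000, mss=1000))      # 1000
```

Applications that need low latency for small writes can turn the Nagle algorithm off with the `TCP_NODELAY` socket option, at the cost of sending more small segments.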

Q&A

1. Why is there a three-way handshake when connecting, but a four-way handshake when closing?

Because when the server receives the client's SYN connection request, it can reply directly with a single SYN+ACK segment: the ACK acknowledges the request and the SYN synchronizes the server's own sequence number.
When the connection is closed, however, a server that receives a FIN may not be able to close the socket immediately, so it can only reply with an ACK first, telling the client, "I received the FIN you sent." Only after the server has sent all of its remaining data can it send its own FIN, so the ACK and the FIN cannot be sent together. Therefore four steps are required.

2. Why does the TIME_WAIT state have to last 2MSL (twice the maximum segment lifetime) before returning to the CLOSED state?

In principle, once all four segments have been sent, the client could enter the CLOSED state directly. However, we must assume that the network is unreliable and that the last ACK may be lost, so the TIME_WAIT state exists to allow a lost ACK to be resent.
The client sends the final ACK, but that ACK may be lost. If the server does not receive the ACK, it keeps retransmitting its FIN segment. The client therefore cannot close immediately; it must make sure that the server has received the ACK.
After sending the ACK, the client enters the TIME_WAIT state and sets a timer for 2MSL. If it receives the FIN again within this time, it resends the ACK and waits another 2MSL.
If no FIN arrives again within 2MSL, the client concludes that the ACK was received successfully and ends the TCP connection.
2MSL is twice the MSL (Maximum Segment Lifetime). MSL is the maximum time a segment can survive in the network, so 2MSL is the maximum time needed for a segment to be sent and its reply to come back.

3. What if the connection has been established, but the client suddenly fails?

TCP has a keep-alive timer. If the client fails, the server obviously cannot wait forever and waste resources. The server resets this timer every time it receives data from the client; the timer is usually set to 2 hours. If the server has not received any data from the client for two hours, it sends a probe segment and then sends another one every 75 seconds. If there is still no response after 10 probe segments, the server considers the client to have failed and closes the connection.
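
On many systems, an application can switch keep-alive on for a socket and adjust these timers through socket options. The sketch below uses Python's standard `socket` module with the values from the answer above (2 hours, 75 seconds, 10 probes). The `enable_keepalive` helper is a made-up name, and the three `TCP_KEEP*` options are not available on every platform, so they are applied only where the module exposes them.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 7200,
                     interval: int = 75, count: int = 10) -> None:
    """Turn on TCP keep-alive and tune its timers where the platform allows it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Platform-dependent options: skip any the socket module does not define.
    tuning = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", count))
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            pass   # defined by Python but rejected by this operating system


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print(keepalive_on)
```

No connection is made here; the options are simply set on an unconnected socket and would take effect once it is connected.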

4. Why can't the connection be established with a two-way handshake?

First of all, the channel is unreliable, yet we need to send data reliably over it. For this purpose, a three-way handshake is the theoretical minimum: it is not merely something the TCP protocol happens to require, but what is needed to transmit data reliably over an unreliable channel.
The book "Computer Network" explains that the purpose of the three-way handshake is "to prevent an invalid connection request segment from suddenly arriving at the server and causing an error."
The situation is as follows. The first connection request segment sent by client A is not lost, but for some reason it is held up at a network node for a long time and reaches the other end (server B) only some time after the connection has been released. It is an invalid segment by then, but when B receives it, B mistakes it for a new connection request from A and sends an acknowledgment to A, agreeing to establish a connection. Without the three-way handshake, B would consider the new connection established as soon as it sent the acknowledgment. A, however, has not requested a connection, so it ignores B's acknowledgment and sends no data to B. B keeps waiting for data that never comes, and a lot of B's resources are wasted.
With the three-way handshake, this does not happen. After B receives the outdated segment, it sends an acknowledgment to A. A has not requested a connection, so it does not confirm B's acknowledgment. Because B receives no confirmation, it knows that the connection has not been established.

Origin blog.csdn.net/u014099894/article/details/112340850