Detailed explanation of TCP principles

1. TCP segment format

A TCP segment (header diagram omitted; sourced from the Internet) contains the following fields (a small header-parsing sketch follows the list):

  1. Source Port and Destination Port: 16-bit fields indicating the port numbers of the sender's and receiver's applications or services.

  2. Sequence Number: A 32-bit field used to number each byte in the TCP data stream to ensure the orderly transmission of data.

  3. Acknowledgment Number: A 32-bit field indicating the number of the next byte expected to be received, used to confirm successfully received data.

  4. Data Offset: A 4-bit field giving the length of the TCP header in 4-byte units. Since the maximum 4-bit value is 15, the TCP header can be at most 15 × 4 = 60 bytes long.

  5. Control bits (Flags): including the URG (urgent pointer is valid), ACK (acknowledgment number is valid), PSH (push), RST (reset connection), SYN (establish connection), and FIN (end connection) flag bits, used to control the establishment, maintenance, and closing of a TCP connection.

  6. Window Size: 16-bit field indicating the buffer size of the receiver for flow control.

  7. Checksum: A 16-bit field used to detect errors or damage in TCP segments during transmission.

  8. Urgent Pointer: 16-bit field, only valid when the URG flag is set, indicating the end position of urgent data.

  9. Options: Optional fields used for optional extensions, such as selective acknowledgment (SACK) and timestamps.

  10. Data: Optional field used to carry application layer data.
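To make the field layout concrete, here is a minimal parsing sketch (not from the original article) that decodes the fixed 20-byte portion of a TCP header from a byte array in Java; the class name and output format are illustrative only.

```java
import java.nio.ByteBuffer;

// Minimal sketch: decode the fixed 20-byte part of a TCP header.
// Field widths follow the list above; names and output are illustrative only.
public class TcpHeaderSketch {
    public static void parse(byte[] headerBytes) {   // assumes headerBytes.length >= 20
        ByteBuffer buf = ByteBuffer.wrap(headerBytes);
        int srcPort  = buf.getShort() & 0xFFFF;        // 16-bit source port
        int dstPort  = buf.getShort() & 0xFFFF;        // 16-bit destination port
        long seq     = buf.getInt() & 0xFFFFFFFFL;     // 32-bit sequence number
        long ack     = buf.getInt() & 0xFFFFFFFFL;     // 32-bit acknowledgment number
        int offFlags = buf.getShort() & 0xFFFF;        // data offset (4 bits) + reserved + flags
        int dataOffsetWords = (offFlags >> 12) & 0xF;  // header length in 4-byte units
        int flags    = offFlags & 0x3F;                // URG/ACK/PSH/RST/SYN/FIN
        int window   = buf.getShort() & 0xFFFF;        // 16-bit window size
        int checksum = buf.getShort() & 0xFFFF;        // 16-bit checksum
        int urgent   = buf.getShort() & 0xFFFF;        // 16-bit urgent pointer

        System.out.printf("ports %d -> %d, seq=%d, ack=%d, headerLen=%d bytes, win=%d%n",
                srcPort, dstPort, seq, ack, dataOffsetWords * 4, window);
        System.out.printf("flags: SYN=%b ACK=%b FIN=%b%n",
                (flags & 0x02) != 0, (flags & 0x10) != 0, (flags & 0x01) != 0);
    }
}
```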

2. TCP principle

It can also be seen from the protocol format above that, compared with UDP, TCP adds many fields, and most of them serve two goals: reliability and efficiency. These are also the core characteristics of the TCP transport protocol:

  • The control mechanisms TCP provides for data transmission are mainly reflected in two aspects: reliability and efficiency.
  • These mechanisms resemble the design principle of multi-threading: improve transmission efficiency as much as possible while still guaranteeing that data is transmitted reliably.

1. Acknowledgment mechanism (confirmation response)

One of TCP's major features is reliable transmission. Reliability here does not mean that 100% of the data is guaranteed to arrive; it means that the receiver returns an acknowledgment to the sender, so the sender always knows whether its data arrived. The acknowledgment mechanism is the core mechanism behind TCP's transmission reliability.

In network transmission, because network conditions constantly change, packets can easily arrive out of order ("sent later, arrives first"): a packet sent later may reach the receiver before a packet sent earlier, which is delayed or has not yet arrived. For example:

As the example above shows, out-of-order arrival can cause transmission errors, but this can be solved by numbering the data. Real TCP transmission therefore introduces the sequence number and the acknowledgment number.

TCP assigns a number to every byte of data; this is the sequence number:

Suppose the sender needs to transmit a 10000-byte file over TCP. To send it, the sender splits the file into several segments, each carrying part of the data plus related control information (sequence number, acknowledgment number, etc.). If the file is divided into 10 segments of 1000 bytes each, the sender assigns each segment a sequence number that identifies the position of its first byte within the entire data stream.
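As a small illustration of that numbering, here is a sketch (assuming, as the article's examples do, that numbering starts at 1) of the sequence number each of the 10 segments would carry and the ACK it should elicit:

```java
// Sketch: number 10 segments of 1000 bytes each, starting from sequence number 1.
public class SequenceNumbering {
    public static void main(String[] args) {
        int segmentSize = 1000;
        for (int i = 0; i < 10; i++) {
            long firstByte = 1L + (long) i * segmentSize;   // sequence number of the segment
            long lastByte  = firstByte + segmentSize - 1;   // last byte carried by this segment
            long expectedAck = lastByte + 1;                // ACK the receiver returns for it
            System.out.printf("segment %2d: bytes %d-%d, expected ACK %d%n",
                    i + 1, firstByte, lastByte, expectedAck);
        }
    }
}
```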

ACK segments and the acknowledgment number:

In TCP, an ACK segment is a confirmation message that tells the sender a given data packet was received successfully. When the receiver gets a data packet, it returns an ACK to the sender, indicating that the packet arrived. The ACK usually carries an acknowledgment number (Acknowledgement Number), which is the sequence number of the next byte the receiver expects to receive.


Acknowledgment number rules:

1. The acknowledgment number is the sequence number of the byte right after the last byte of data received. For example, if the data received covers bytes 1-1000, the acknowledgment number is 1001.
2. An acknowledgment number of 1001 indicates that all bytes with sequence numbers below 1001 (i.e., bytes 1-1000) have been received.
3. An acknowledgment number of 1001 also means the receiver expects the next byte to have sequence number 1001, i.e., it is asking the sender for the data starting at 1001.

Back to the "sent later, arrives first" issue:

Suppose the receiver is sent three packets covering bytes 1-1000, 1001-2000, and 2001-3000, but for network reasons they arrive in the order 2001-3000, 1-1000, 1001-2000, i.e., out of order. The receiver first reorders these packets (TCP maintains a receive buffer and takes on this "sorting" task) into the correct order 1-1000, 1001-2000, 2001-3000, so the application layer always reads data in the same order it was sent. The receiver then returns an ACK telling the sender that these packets have been received and that the next expected byte has sequence number 3001 (the sequence number of the next packet).
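As a minimal sketch of that receiver-side reordering idea (the class and method names are made up for illustration; this is not the kernel's actual implementation):

```java
import java.util.TreeMap;

// Sketch: reorder out-of-order segments before delivering them to the application.
public class ReorderBuffer {
    private final TreeMap<Long, byte[]> pending = new TreeMap<>(); // seq -> payload
    private long nextExpected = 1;                                 // next byte we want

    // Called for every arriving segment; returns the cumulative ACK to send back.
    public long onSegment(long seq, byte[] payload) {
        pending.put(seq, payload);
        // Deliver any segments that are now contiguous with what we already have.
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            byte[] data = pending.pollFirstEntry().getValue();
            nextExpected += data.length;          // application reads data in order here
        }
        return nextExpected;                      // acknowledgment number
    }

    public static void main(String[] args) {
        ReorderBuffer rb = new ReorderBuffer();
        System.out.println(rb.onSegment(2001, new byte[1000])); // arrives first -> still ACK 1
        System.out.println(rb.onSegment(1,    new byte[1000])); // -> ACK 1001
        System.out.println(rb.onSegment(1001, new byte[1000])); // -> ACK 3001
    }
}
```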

2. Timeout retransmission mechanism

During network transmission, due to network congestion, media failure and other reasons, the sent data packet may not arrive, which is what we often call packet loss.

For packet loss, there are two main scenarios:

1. The data packet sent is lost and the receiver cannot receive it, so no ACK will be returned. The sender cannot receive the ACK and considers the packet lost.
2. The receiver received the data packet, but the ACK packet was lost. The sender also failed to receive the ACK and considered the packet lost.

Scenario 1 : If host A does not receive a confirmation response from B within a specific time interval, it will resend.

Scenario 2 : During retransmission, host B may receive a lot of duplicate data, so TCP must be able to identify duplicate packets and discard them. The sequence number mentioned earlier makes this deduplication easy.

Timeout retransmission time limit
In order to ensure high-performance communication in any environment, TCP will dynamically calculate the maximum timeout time:

  1. In Linux (the same is true for BSD Unix and Windows), the timeout is controlled in units of 500ms, and the timeout for each timeout retransmission is an integer multiple of 500ms.
  2. If you still get no response after retransmitting once, wait 2*500ms before retransmitting.
  3. If still no response is received, wait 4*500ms for retransmission. And so on, increasing exponentially.
  4. When a certain number of retransmissions is accumulated, TCP considers that there is an abnormality in the network or the peer host and forcibly closes the connection.

To sum up, for TCP timeout retransmission: retransmit if it can be retransmitted, close the connection if retransmission is not possible, and ensure transmission as much as possible!
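As a rough sketch of the backoff schedule described in the list above (the 500 ms base unit comes from the article; the retry cap of 5 is an assumption for illustration, as the real limits are OS-specific):

```java
// Sketch: timeout-retransmission backoff as described above (values illustrative).
public class RetransmitBackoff {
    public static void main(String[] args) {
        long baseMs = 500;          // base unit from the article
        int maxRetries = 5;         // assumed cap before giving up
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            long waitMs = baseMs << attempt;   // 500, 1000, 2000, 4000, ... (exponential)
            System.out.printf("retry %d: wait %d ms before retransmitting%n", attempt, waitMs);
        }
        System.out.println("still no ACK -> assume the network/peer is abnormal and close the connection");
    }
}
```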

3. Connection management mechanism

TCP connection establishment: the three-way handshake

A handshake is a network interaction between the communicating parties. The three-way handshake amounts to three interactions between client and server that establish the connection, with each side recording information about the other.

  1. Host A sends a SYN message to Host B, indicating a request to establish a connection.
  2. After receiving the SYN message, host B returns an ACK message to host A, which is used to inform host A that it has received the SYN message it sent. At the same time, B will also send a SYN message to A, hoping to establish a connection with host A.
  3. After host A receives B's SYN+ACK message, it will send an ACK message to B. At this point, the connection between host A and host B is successfully established and data transmission can be carried out.

Note : The above process is completed automatically in the system kernel; the application cannot interfere with it. Once a connection is established, accept takes the established connection out of the kernel and hands it to the application.
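From the application's perspective the whole handshake is hidden inside connect and accept. A minimal Java sketch (localhost and port 8888 are arbitrary choices, not values from the article):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch: the three-way handshake happens inside the kernel during connect();
// accept() merely takes an already-established connection out of the kernel queue.
public class HandshakeDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        ServerSocket server = new ServerSocket(8888);    // kernel now answers SYNs on port 8888

        Thread client = new Thread(() -> {
            try (Socket s = new Socket("127.0.0.1", 8888)) {   // connect(): SYN, SYN+ACK, ACK
                System.out.println("client: connection established " + s);
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        client.start();

        try (Socket conn = server.accept()) {            // hand the established connection to the app
            System.out.println("server: accepted " + conn);
        }
        client.join();
        server.close();
    }
}
```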

Supplement : SYN here refers to the synchronization segment, with which one side requests to establish a connection with the other. SYN is actually a flag bit in the TCP segment, initially 0. The SYN+ACK segment mentioned above simply has both the SYN and ACK flags set to 1.

The role of the three-way handshake : the three-way handshake is essentially a probing process ("throwing a stone to test the path"), verifying that both the client's and the server's sending and receiving capabilities are working. This is the foundation for the reliable transmission that follows!

TCP disconnection: the four waves

  1. Host A sends a FIN message to Host B, hoping to close the connection. The FIN flag is set to 1, indicating the end of the transmission.
  2. After receiving the FIN message, host B returns an ACK message to host A. Host B notifies host A that it has received the FIN message it sent. However, host B may still have untransmitted data at this time, so B must continue to send data until it is completed.
  3. After host B completes the data transmission, it sends a FIN message to host A, indicating that host B has completed all data transmission and is ready to close the connection. The FIN flag is set to 1.
  4. After host A receives the FIN message sent by B, it returns an ACK message to B to confirm that it has received B's FIN message. At this time, A notifies B that the connection can be closed.

Why can't ACK and FIN be combined here?

ACK and FIN are triggered at different times. The ACK is handled by the kernel and is returned as soon as the FIN is received, whereas the FIN is controlled by application code: it is triggered when the Socket's close method is called or when the process ends.

Supplement : Assuming a client-server disconnection process, although the client process ends, the TCP connection is still there (kernel maintenance) until the four waves are completed, and the server is the same.

4. Sliding window

In the acknowledgment mechanism above, every data segment sent waits for its own ACK before the next segment is sent, i.e., a send-one-wait-one scheme. Its major drawback is poor performance. To improve efficiency, the sliding-window mechanism can be used instead: multiple segments are sent at once, so their ACK waiting times overlap, which is far more efficient than sending and waiting one at a time.

Principle of sliding window mechanism

  1. The window size is the maximum amount of data that can stay in flight without waiting for an acknowledgment. Suppose the window size is four segments: the first four segments are sent directly, without waiting for any ACK (a send-side sketch follows this list).
  2. After the first ACK arrives, the sliding window slides forward and the fifth segment is sent, and so on.
  3. In order to maintain this sliding window, the operating system kernel needs to open a send buffer to record which data is currently unanswered. Only data that has been confirmed can be deleted from the buffer.
  4. The larger the window, the higher the throughput rate of the network.
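A simplified send-side sketch of this bookkeeping: segments inside the window go out without waiting, unacknowledged segments stay in a send buffer, and the window slides forward as cumulative ACKs arrive. Segment-granularity tracking and all names here are simplifications, not how a real kernel stores it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: segment-level sliding window on the sender side (simplified).
public class SlidingWindowSender {
    private static final int SEGMENT = 1000;
    private final int windowSize;                              // max unacked segments in flight
    private final Deque<Long> unacked = new ArrayDeque<>();    // first bytes of in-flight segments
    private long nextSeq = 1;

    SlidingWindowSender(int windowSize) { this.windowSize = windowSize; }

    void trySend() {
        // Send as long as the window is not full -- no waiting for ACKs in between.
        while (unacked.size() < windowSize) {
            System.out.println("send segment starting at " + nextSeq);
            unacked.addLast(nextSeq);
            nextSeq += SEGMENT;
        }
    }

    void onAck(long ackNo) {
        // A cumulative ACK frees every segment that ends before ackNo; the window slides.
        while (!unacked.isEmpty() && unacked.peekFirst() + SEGMENT <= ackNo) {
            unacked.pollFirst();
        }
        trySend();                     // window slid forward, keep sending
    }

    public static void main(String[] args) {
        SlidingWindowSender sender = new SlidingWindowSender(4);
        sender.trySend();         // sends 1, 1001, 2001, 3001 without waiting
        sender.onAck(1001);       // segment 1-1000 confirmed -> send 4001
        sender.onAck(5001);       // everything up to 5000 confirmed -> send 5001..8001
    }
}
```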


Packet loss scenario 1 under the sliding window mechanism : the data packet arrives, but the ACK is lost.


Unlike the send-one-wait-one case, where a lost ACK forces a timeout retransmission, a lost ACK under the sliding window actually has no impact on reliability, because data is sent in batches and the ACK waits overlap. Because of how the acknowledgment number works, an ACK confirms all data before that number. If the ACKs for 1001, 3001, and 4001 are lost, receiving any later, larger acknowledgment, for example 5001, still confirms that everything before 5001 arrived. So even if some ACKs are lost, overall reliability is unaffected; later ACKs cover for them.

Scenario 2 : Packet loss.

If a data packet itself is lost under the sliding window mechanism, "fast retransmit" (also called high-speed retransmission control) is triggered:

  1. When a segment is lost, the sender keeps receiving the same ACK. In the scenario above, ACKs for 1001 keep arriving, as if reminding the sender "what I want is 1001";
  2. If the sending host receives the same ACK three times in a row, it retransmits the corresponding data; here, after three identical "1001" responses, it retransmits bytes 1001-2000 (a counting sketch of this rule follows the list);
  3. Once the receiver gets the retransmitted data, the ACK it returns jumps to the latest acknowledgment number. After 1001-2000 arrives, the next ACK is 7001, because bytes 2001-7000 had already been received and were sitting in the kernel's receive buffer on the receiving side.
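A sketch of the duplicate-ACK counting described above, following the article's rule of retransmitting after three identical ACKs (the class and bookkeeping are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: trigger a retransmission after three identical (duplicate) ACKs.
public class FastRetransmit {
    private final Map<Long, Integer> dupAcks = new HashMap<>();

    void onAck(long ackNo) {
        int count = dupAcks.merge(ackNo, 1, Integer::sum);
        if (count == 3) {
            // The receiver keeps asking for ackNo -> that segment was lost.
            System.out.println("3 duplicate ACKs for " + ackNo
                    + ": retransmit segment starting at " + ackNo);
        }
    }

    public static void main(String[] args) {
        FastRetransmit fr = new FastRetransmit();
        // Segment 1001-2000 was lost; later segments keep producing ACK 1001.
        fr.onAck(1001);
        fr.onAck(1001);
        fr.onAck(1001);   // third identical ACK -> retransmit 1001-2000
        fr.onAck(7001);   // after the retransmission arrives, the ACK jumps to 7001
    }
}
```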

5. Flow control

The speed at which the receiving end can process data is limited. If the sender sends too fast and fills the receiver's buffer, continuing to send will cause packet loss and a chain reaction of retransmissions.

Therefore, TCP supports determining the sending speed of the sender based on the processing capabilities of the receiving end. This mechanism is called Flow Control;

In the returned ACK segment, the "window size" field takes effect; its value is the window size suggested by the receiver (it can be understood as the remaining free space in the receive buffer).

The actual send window is determined jointly by flow control and congestion control: the smaller of the two windows is used.
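A tiny sketch of the receive-buffer intuition mentioned above (all numbers are made up for illustration):

```java
// Sketch: the window advertised in an ACK is essentially the free space left in the receive buffer.
public class AdvertisedWindow {
    public static void main(String[] args) {
        int receiveBufferSize = 64 * 1024;   // illustrative buffer size
        int bytesBuffered     = 48 * 1024;   // data received but not yet read by the application
        int advertisedWindow  = receiveBufferSize - bytesBuffered;
        System.out.println("window size carried in the ACK: " + advertisedWindow + " bytes");
    }
}
```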

6. Congestion control

Although the sliding window lets TCP send large amounts of data efficiently and reliably, problems still arise if it is left unrestricted. In the flow control described above, the receiver limits the sending rate according to its own processing capacity; in actual transmission there is another mechanism, congestion control, that gauges the capacity of the transmission path itself.

Congestion control actually means understanding the current network congestion status and then deciding at what speed to transmit data.

  1. A new concept is introduced here: the congestion window. When sending starts, a slow-start mechanism is used: a small window is given first, and a small amount of data is sent to probe the path.
  2. Each time data is sent, the congestion window is compared with the flow-control window, and the smaller value is taken as the actual send window.
  3. Note that "slow start" only means it starts slow; the growth rate is very fast. The congestion window grows exponentially, so the window reaches a fairly large value in a short time and quickly approaches the capacity bottleneck of the current transmission path.
  4. To keep this growth from overshooting the limit, a slow-start threshold is introduced: once the exponential growth reaches the threshold, it switches to smooth linear growth, letting the transmission rate gradually approach the ceiling.
  5. Growth then continues until packet loss occurs, at which point the current window size is considered to have reached the path's limit. The window immediately falls back to a small initial value and the whole process repeats (a toy simulation follows this list).
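A toy simulation of the growth pattern described above; this is a sketch only. The initial window, threshold, receiver window, and the round where loss occurs are made-up numbers, units are "segments" rather than bytes, and real TCP applies more nuanced rules.

```java
// Sketch: slow start (exponential), then linear growth past the threshold,
// then a fall back to a small window when loss occurs. Numbers are illustrative.
public class CongestionWindowSim {
    public static void main(String[] args) {
        int cwnd = 1;              // congestion window, starts small (slow start)
        int ssthresh = 16;         // slow-start threshold
        int rwnd = 64;             // receiver's flow-control window
        for (int round = 1; round <= 12; round++) {
            int sendWindow = Math.min(cwnd, rwnd);   // actual window = smaller of the two
            System.out.printf("round %2d: cwnd=%d, send window=%d%n", round, cwnd, sendWindow);

            boolean lossThisRound = (round == 10);   // pretend loss happens in round 10
            if (lossThisRound) {
                ssthresh = cwnd / 2;                 // remember roughly where the limit was
                cwnd = 1;                            // fall back and start probing again
            } else if (cwnd < ssthresh) {
                cwnd *= 2;                           // slow start: exponential growth
            } else {
                cwnd += 1;                           // past the threshold: linear growth
            }
        }
    }
}
```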

7. Delayed response

The key factor determining TCP efficiency is the window size. The sender keeps sending data while the receiver keeps consuming data from its receive buffer. If the receiver returns an ACK immediately upon receiving data, the window size carried in that ACK is some value N. If instead the receiver waits a short while, letting the application consume some data first, the window size it reports is a larger value N+, and clearly N+ > N.

That is the delayed response: by delaying briefly, the receiving program gets a chance to consume more data, the window size fed back is larger, and the sender can send faster while staying within what the receiver can handle.

So can every packet's response be delayed? Certainly not; there are two limits (sketched after the list):

  1. Quantity limit: Answer every N packets.
  2. Time limit: Respond once if the maximum delay time is exceeded.
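A rough sketch of those two limits; the values N = 2 segments and 200 ms are assumptions for illustration, not figures from the article, and the clock handling is simplified.

```java
// Sketch: delay the ACK until either N segments have arrived or a time limit expires.
public class DelayedAckSketch {
    private static final int  MAX_UNACKED_SEGMENTS = 2;    // assumed quantity limit
    private static final long MAX_DELAY_MS = 200;          // assumed time limit

    private int  unackedSegments = 0;
    private long firstUnackedAt = -1;

    void onSegmentReceived(long nowMs) {
        if (unackedSegments == 0) firstUnackedAt = nowMs;
        unackedSegments++;
        maybeAck(nowMs);
    }

    void maybeAck(long nowMs) {
        boolean quantityHit = unackedSegments >= MAX_UNACKED_SEGMENTS;
        boolean timeHit = unackedSegments > 0 && nowMs - firstUnackedAt >= MAX_DELAY_MS;
        if (quantityHit || timeHit) {
            System.out.println("send ACK covering " + unackedSegments + " segment(s) at t=" + nowMs + "ms");
            unackedSegments = 0;
        }
    }

    public static void main(String[] args) {
        DelayedAckSketch d = new DelayedAckSketch();
        d.onSegmentReceived(0);     // 1 segment buffered, no ACK yet
        d.onSegmentReceived(50);    // 2 segments -> quantity limit hit, ACK sent
        d.onSegmentReceived(100);   // 1 segment buffered again
        d.maybeAck(350);            // 250 ms passed -> time limit hit, ACK sent
    }
}
```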

8. Piggybacking on responses

In many cases the client and server also work in a "one request, one response" pattern at the application layer: the client sends a request and the server returns a response. On top of the delayed response, the ACK can therefore hitch a ride. The ACK is the kernel's responsibility and is normally returned immediately after data arrives, while the application's response is returned only after its code has run, so the two naturally happen at different times. With the delayed-response mechanism, however, the ACK and the response are very likely to be merged into a single datagram.

Every datagram transmitted goes through a series of processes such as encapsulation and demultiplexing: one datagram is encapsulated and demultiplexed once, two datagrams twice. The piggybacked-response mechanism merges the two into one, reducing the number of encapsulation/demultiplexing passes and thereby improving efficiency.

It is precisely because of this piggybacked-response mechanism that the "four waves" can sometimes be completed in three segments: the passive side's ACK and FIN are merged into one.

9. Byte-stream orientation (the sticky packet problem)

A major feature of TCP is that it is byte-stream oriented. This makes TCP reads and writes very flexible, but it also hides a trap: the "sticky packet" problem.

Suppose that, over TCP, I send the following pieces of data:

The imperialists invaded us,
enslaved us, and wanted to carve up
our land.

Imagine that from the perspective of the application layer, you will see the following information in the receive buffer:

The imperialists invaded us, enslaved us, and wanted to divide our land.

Wait, why are they sharing our sweet potatoes?

The example above is a joke built on the sticky-packet problem (in the original Chinese, misplacing the boundary in the byte stream regroups the characters for "carve up our land" into "sweet potato"). When A sends multiple application-layer datagrams to B in a row, the data piles up in B's receive buffer back to back. When B's application reads it, it is hard to tell where one complete application-layer datagram ends and the next begins, and it may read half a packet, one and a half packets, and so on.

So how do we avoid it? In the earlier "Network Programming" chapter, when we used the transport-layer TCP protocol to write a TCP echo client and server, we made the following agreement precisely in order to separate application-layer packets:

1. Each request is a string
2. Use \n (line feed) to separate requests from each other.

The above is in fact a simple custom application-layer protocol. In general, there are two main ways to deal with the sticky-packet problem (a sketch of the first follows the list):

  1. Use a delimiter (as above)
  2. Agree on a length (a fixed size or an explicit length prefix)
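A minimal sketch of the delimiter approach from the echo example, framing each request with a newline; the stream setup mirrors a typical Java socket program, but this is not the exact code from the earlier chapter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch: frame application-layer requests with a line break so the receiver can split
// the TCP byte stream back into complete requests (solution 1 above).
public class DelimiterFraming {
    private final PrintWriter out;
    private final BufferedReader in;

    DelimiterFraming(Socket socket) throws IOException {
        this.out = new PrintWriter(socket.getOutputStream(), true);
        this.in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
    }

    // Sender side: println appends the line terminator that marks the end of one request.
    void sendRequest(String request) {
        out.println(request);
    }

    // Receiver side: readLine() consumes bytes up to the next line break, i.e. exactly one
    // application-layer request, no matter how TCP happened to split or merge the bytes.
    String readRequest() throws IOException {
        return in.readLine();
    }
}
```

The length-based approach (solution 2) instead prefixes each message with its byte count, which is more suitable for binary payloads where the delimiter character might appear in the data itself.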

Trivia : UDP does not have the "sticky packet problem". The fundamental reason is that the UDP protocol is datagram-oriented and has clear boundaries between them.

10. Abnormal situations (machine power outage/network cable disconnection: heartbeat packet)

Process shutdown/process crash : The process terminates and the socket file is also closed. However, the operating system kernel still maintains the Tcp connection until the four waves are completed, so it is no different from normal shutdown.

Machine shutdown/restart : a shutdown or restart kills all user processes first, which also triggers the four waves; ideally they complete during this process. If they do not, for example the peer sends a FIN but the local machine powers down before it can ACK, the peer will retransmit the FIN. After several retransmissions with no ACK, the peer tries to reset the connection, and if that also fails, it releases the connection.

The machine is powered off / the network cable is unplugged : the connection disappears instantly, with no time for any of the waves. Two situations arise:

  1. If the peer is the sender, it will stop receiving ACKs. It will time out and retransmit, try to reset the connection if retransmission keeps failing, and finally release the connection.
  2. If the peer is the receiver, it cannot immediately tell whether the other end simply has nothing to send for the moment or has "died". TCP has a built-in keep-alive timer for this: even though the peer is the receiver, it periodically sends a heartbeat packet to the other side. As long as each heartbeat gets a response, the other end is considered alive; if the heartbeats go unanswered, the other end is judged to have "died" (see the sketch after this list).
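At the socket-API level, TCP keep-alive can simply be switched on; the probe interval and count are controlled by the operating system. A minimal Java sketch (the host and port are placeholders, not values from the article):

```java
import java.io.IOException;
import java.net.Socket;

// Sketch: enable TCP keep-alive so the kernel periodically probes an idle peer
// and eventually drops the connection if the probes go unanswered.
public class KeepAliveSketch {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("example.com", 80)) {   // placeholder host/port
            socket.setKeepAlive(true);    // probe timing and count are OS-level settings
            System.out.println("keep-alive enabled: " + socket.getKeepAlive());
            // ... normal request/response traffic would go here ...
        }
    }
}
```

In practice, many applications also implement their own application-level heartbeat messages, because the default OS keep-alive interval is very long (often on the order of hours).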

For more details, please refer to the TCP RFC standard document

Origin blog.csdn.net/LEE180501/article/details/133342304