Detailed Explanation of TCP Protocol Features

TCP Protocol Features

TCP is connection-oriented, reliable, byte-stream-oriented, and full-duplex.

TCP protocol segment format

TCP segment = TCP header + TCP payload

  • Source/destination port number: indicates which process the data comes from and which process it is going to;
  • 32-bit sequence number / 32-bit acknowledgment number: used to tell apart the individual pieces of data being sent and acknowledged;
  • 4-bit header length: gives the length of the TCP header (the TCP header is variable-length, whereas the UDP header is fixed at 8 bytes)

Note: The unit of the 4-bit header length field is not 1 byte but 4 bytes, so the maximum TCP header length is 15 * 4 = 60 bytes

  • 6 reserved bits: set aside for future extensions

Extending or upgrading a network protocol is very expensive. If TCP gains new features in the future, these reserved bits can be used for them

  • 6 flag bits
    • URG: whether the urgent pointer is valid
    • ACK: whether the acknowledgment number is valid
    • PSH: prompts the receiving application to read the data from the TCP buffer immediately
    • RST: asks the peer to reset the connection; a segment carrying the RST flag is called a reset segment
    • SYN: requests to establish a connection; a segment carrying the SYN flag is called a synchronization segment
    • FIN: notifies the peer that the local end is about to close; a segment carrying the FIN flag is called a finish segment
  • 16-bit window size: how much more data the receiver can currently accept (see the flow control section below)
  • 16-bit checksum: same purpose as the UDP checksum. It detects whether the data was corrupted in transit, but it does not provide any security guarantee
  • 16-bit urgent pointer: identifies which part of the data is urgent;
  • Options: the part before the options has a fixed length of 20 bytes (formula: options length = header length - 20 bytes)

If the header length field is 5, the whole TCP header is 20 bytes (5 x 4 bytes), which means there are no options.
If the header length field is 15, the whole TCP header is 60 bytes (15 x 4 bytes), which means the options take up 40 bytes.
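
The header-length arithmetic above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `parse_tcp_header` and the hand-built sample segment are made up for this example, and the code follows the classic layout described here (4-bit header length, 6 reserved bits, 6 flag bits).

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (classic layout)."""
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = offset_flags >> 12      # upper 4 bits, counted in 4-byte units
    header_len = data_offset * 4          # convert to bytes
    flags = offset_flags & 0x3F           # lower 6 bits hold the flag bits
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "options_len": header_len - 20,   # options length = header length - 20
        "flags": [n for i, n in enumerate(names) if flags & (1 << i)],
        "window": window,
    }

# Hypothetical SYN segment without options: header length field 5, SYN bit set.
sample = struct.pack("!HHIIHHHH", 50000, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
info = parse_tcp_header(sample)
```

With a header length field of 5 the function reports a 20-byte header and 0 bytes of options; with 15 it reports 60 and 40.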

TCP principle

Acknowledgment (reliability mechanism)

A sends a message to B, and B returns an acknowledgment (ACK) after receiving it. Once A receives the acknowledgment, it knows that the data it just sent was successfully received by B.

Consider a more complex situation.
Messages sent later may arrive earlier on the network, so the order in which replies are received can change. The exchange could then be read as "Go to eat?: No" and "Go to study?: OK", which is not what was meant.

To solve this "sent later, arrives earlier" problem, both the transmitted data and the acknowledgments are numbered.
In this way, the number shows which piece of data a given acknowledgment is for.

Whether a segment is an acknowledgment is determined by the ACK flag in its header. If the ACK flag is 1, the segment is an acknowledgment (its acknowledgment number is valid); if it is 0, it is not.

In fact, since TCP is byte-stream-oriented, TCP sequence numbers are also assigned per byte.

The bytes of a TCP stream are numbered consecutively, and the sequence number in the header of each TCP segment only needs to be the sequence number of the first byte of that segment's data.
Knowing the sequence number of the first byte and the length of the payload, TCP can easily work out the sequence number of every byte.

The acknowledgment number is the sequence number of the last byte received in order, plus 1.

For example, an acknowledgment number of 1001 means:
1. All data with sequence numbers below 1001 has been received.
2. A should continue sending from sequence number 1001 (B is asking A for the data starting at 1001).
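
As a small worked example of the rule above, the acknowledgment number can be computed from a segment's sequence number and payload length. The helper name `next_ack` is hypothetical, and the wrap-around at 2^32 reflects the 32-bit width of the field.

```python
def next_ack(seq: int, payload_len: int) -> int:
    """Acknowledgment number = sequence number of the last byte received + 1."""
    last_byte = seq + payload_len - 1
    return (last_byte + 1) % 2**32    # the field is 32 bits wide, so it wraps around

# Bytes 1..1000 were received, so the receiver asks for byte 1001 next.
ack = next_ack(seq=1, payload_len=1000)
```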

Summary: TCP's reliable transmission is guaranteed mainly by the acknowledgment mechanism. The acknowledgment (ACK) lets the sender know for certain whether a transmission succeeded, and the sequence number and acknowledgment number are introduced to tell apart the individual pieces of data.

Timeout retransmission (reliability mechanism)

The discussion of acknowledgments above only covered the case where transmission goes smoothly, not the case where something goes wrong (such as packet loss).
There are two cases of packet loss.

Case 1: the data is lost

  • After host A sends data to B, the data may fail to reach host B because of network congestion or other reasons;
  • If host A does not receive an acknowledgment from B within a certain time interval, it retransmits the data

Case 2: the ACK is lost
In the second case the data is also retransmitted, so host B may receive duplicate data. TCP removes the duplicates and puts the data back in order.

TCP has a storage area called the "receive buffer" (a region of memory in the receiver's operating system kernel), and every TCP socket has its own receive buffer.
When host B receives data from host A, B's network card reads the data and the kernel places it in the receive buffer of the corresponding socket. The application then calls getInputStream and uses read to read the data from the receive buffer.
Using this receive buffer, TCP checks the sequence numbers of the data to detect duplicates. If a segment is a duplicate, the later copy is discarded, and the received data is put back in order.

Summary: because of the de-duplication and reordering mechanisms (both rely on the sequence number in the TCP header), the sender can simply retransmit whenever an ACK does not arrive in time, and the receiver takes care of any duplicates.
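
The de-duplication and reordering described above can be sketched with a toy receive buffer. This is a simplified model, not how any real kernel implements it: the class name `ReceiveBuffer` is made up, and it assumes retransmitted segments start at the same sequence numbers as the originals (no partial overlaps).

```python
class ReceiveBuffer:
    """Toy receive buffer: de-duplicates and reorders segments by sequence number."""

    def __init__(self, initial_seq: int):
        self.next_expected = initial_seq   # next in-order byte we are waiting for
        self.pending = {}                  # seq -> payload of out-of-order segments
        self.stream = bytearray()          # in-order bytes ready for the application

    def receive(self, seq: int, payload: bytes) -> int:
        """Accept one segment and return the acknowledgment number to send back."""
        if seq >= self.next_expected and seq not in self.pending:
            self.pending[seq] = payload    # new data: keep it
        # otherwise it is a duplicate (already delivered or already buffered): drop it
        while self.next_expected in self.pending:
            chunk = self.pending.pop(self.next_expected)
            self.stream += chunk
            self.next_expected += len(chunk)
        return self.next_expected

buf = ReceiveBuffer(initial_seq=1)
# "world" arrives first, then "hello", then a retransmitted copy of "hello".
acks = [buf.receive(6, b"world"), buf.receive(1, b"hello"), buf.receive(1, b"hello")]
```

The application still reads `helloworld` exactly once, and the duplicate only triggers another ACK.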

TCP's reliable transmission comes from acknowledgment + timeout retransmission.
The acknowledgment covers the case where transmission goes smoothly, and the timeout retransmission covers the case where something goes wrong.
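
To make the two loss cases concrete, here is a minimal stop-and-wait sketch driven by a scripted channel instead of a real network or timer. The function name `simulate_retransmission` and the outcome labels are assumptions made for this illustration.

```python
def simulate_retransmission(deliveries, max_tries: int = 5):
    """Stop-and-wait sender over a scripted channel.

    `deliveries` lists one outcome per attempt:
    "ok" (data and ACK both arrive), "data_lost", or "ack_lost".
    Returns (attempts_used, copies_seen_by_receiver).
    """
    copies_at_receiver = 0
    outcomes = iter(deliveries)
    for attempt in range(1, max_tries + 1):
        outcome = next(outcomes, "ok")
        if outcome != "data_lost":
            copies_at_receiver += 1          # the receiver got the segment (maybe again)
        if outcome == "ok":
            return attempt, copies_at_receiver   # the ACK arrived before the timeout
        # No ACK within the timeout: the sender cannot tell which case happened,
        # so it simply retransmits.
    raise TimeoutError("no acknowledgment after %d attempts" % max_tries)

result = simulate_retransmission(["data_lost", "ack_lost", "ok"])
```

Here the sender needs three attempts and the receiver sees two copies of the data, which is why de-duplication on the receiving side is required.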

Connection Management (Reliability Mechanism) (Frequently Asked in Interviews)

Under normal circumstances, TCP establishes a connection with a three-way handshake and closes it with a four-way wave

Three-way handshake


Several states in the connection establishment phase:
1. LISTEN:
The server is listening for connection requests. It is ready, and a client can connect at any time. This is like a mobile phone that is switched on with good signal and can take a call at any time.

2. SYN_SENT: the connection has been requested. The SYN segment sent at this point normally carries no data.
3. SYN_RCVD: a connection request has been received from the other side.

4. ESTABLISHED:
The connection is established and normal communication can begin. This is like the other party picking up the call. Once the client is in this state, the ACK segment it sends can carry data.

The so-called three-way handshake is in essence a "four-step" exchange.
Each side sends a connection request (SYN) to the other, and each side answers the other's request with an ACK. The two steps in the middle (the server's ACK and the server's SYN) can be combined into one segment, which gives the "three-way handshake".
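
The exchange can be traced step by step in a short sketch. It is only an illustration: the function name and the initial sequence numbers 100 and 300 are arbitrary, and it relies on the fact that a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus 1.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Trace the segments and final states of a normal connection setup."""
    client, server = "CLOSED", "LISTEN"
    trace = []   # entries: (sender, flags, sequence number, acknowledgment number)

    # 1) client -> server: SYN
    client = "SYN_SENT"
    trace.append(("client", "SYN", client_isn, None))

    # 2) server -> client: the server's ACK and its own SYN merged into one segment
    server = "SYN_RCVD"
    trace.append(("server", "SYN+ACK", server_isn, client_isn + 1))

    # 3) client -> server: ACK of the server's SYN
    client = "ESTABLISHED"
    trace.append(("client", "ACK", client_isn + 1, server_isn + 1))
    server = "ESTABLISHED"   # reached once the final ACK is received

    return client, server, trace

client_state, server_state, segments = three_way_handshake(100, 300)
```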

Question 1: Why are the two steps in the middle merged?
Answer: It comes down to the cost of encapsulation and decapsulation. Encapsulating and decapsulating two packets costs more than doing so for one.

Question 2: Could the connection be established with a two-way handshake?
Answer: No.
First handshake: the client sends a packet and the server receives it.
The server now knows that the client can send and the server can receive.
Second handshake: the server sends a packet and the client receives it.
The client now knows that sending and receiving work on both sides. The server, however, still does not know whether its own sending works.
Third handshake: the client sends another packet and the server receives it.
The server now also knows that sending and receiving work on both sides.
Therefore three handshakes are needed to confirm that both sides can send and receive.

The purpose of the three-way handshake:
1. Let the two sides establish an "awareness" of each other.
2. Verify that both sides can send and receive.
3. Let the two sides negotiate some important parameters during the handshake and synchronize their state (for example, the initial sequence numbers).

Four-way wave

The four-way wave is very similar to the three-way handshake: each side sends a disconnect request (FIN) to the other, and each side answers with an ACK.
Several states in the connection termination phase (assuming the client closes first):
1. FIN_WAIT_1: when the client actively calls close, it sends a finish segment (FIN) to the server and enters FIN_WAIT_1.

2. CLOSE_WAIT (appears on the side that closes passively): when the client actively closes the connection (by calling close), the server receives the finish segment (FIN), returns an acknowledgment (ACK), and enters CLOSE_WAIT.

3. FIN_WAIT_2: when the client receives the server's acknowledgment (ACK) of its FIN, it enters FIN_WAIT_2 and starts waiting for the server's finish segment (FIN).

4. LAST_ACK: being in CLOSE_WAIT means the server is preparing to close the connection (it first has to finish processing earlier data). When the server actually calls close, it sends a FIN to the client, enters LAST_ACK, and waits for the final ACK (the client's confirmation that it received the FIN).

5. TIME_WAIT (appears on the side that closes actively): when the client receives the server's finish segment (FIN), it sends the final ACK and enters TIME_WAIT.

[TIME_WAIT -> CLOSED] The client waits for 2MSL (MSL = Maximum Segment Lifetime) before entering the CLOSED state.

6. CLOSED: when the server receives the ACK for its FIN, it closes the connection completely and enters the CLOSED state.

The purpose of TIME_WAIT: like the three-way handshake, the four-way wave can suffer packet loss. If the server does not receive the final ACK, it retransmits its FIN. The TIME_WAIT state keeps the connection around for a while to handle the loss of the final ACK: when the client receives the retransmitted FIN, it can still reply with an ACK.
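
The state changes during a normal close can be written down as a small transition table. This is a sketch of the common path only (client closes first, nothing is lost); the event names such as `call_close` and `recv_fin` are invented for this example.

```python
# (current state, event) -> next state, for a normal close started by the client
TRANSITIONS = {
    ("ESTABLISHED", "call_close"): "FIN_WAIT_1",   # active side sends FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",       # replies with the final ACK
    ("TIME_WAIT", "2msl_elapsed"): "CLOSED",
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",     # passive side replies with ACK
    ("CLOSE_WAIT", "call_close"): "LAST_ACK",      # passive side sends its own FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}

def run(events, state="ESTABLISHED"):
    """Apply events in order and return every state visited."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited

client_path = run(["call_close", "recv_ack", "recv_fin", "2msl_elapsed"])
server_path = run(["recv_fin", "call_close", "recv_ack"])
```

Note that the server only leaves CLOSE_WAIT on the `call_close` event, which is triggered by the application rather than by the kernel.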

Question 1: Why can the two middle steps be merged in the three-way handshake but not in the four-way wave?
Answer:

  • The three-way handshake is carried out entirely in the kernel. As soon as the server's kernel receives the SYN, it sends the ACK and its own SYN at the same time, so the two can be merged.
  • In the four-way wave, the FIN is not sent by the kernel on its own. It is triggered when the application calls the socket's close method (or the process exits), while the ACK is sent by the kernel immediately after the peer's FIN is received. There is usually a time gap between the two, so they usually cannot be merged.

Question 2: Why does TIME_WAIT last 2MSL?
MSL is the maximum lifetime of a TCP segment, so keeping TIME_WAIT for 2MSL ensures that any undelivered or delayed segments in both directions have expired (otherwise, if the server restarted immediately, it could receive late data from the previous process, and that data would very likely be wrong for the new one);
it also ensures, in theory, that the final segment arrives reliably (if the final ACK is lost, the server resends its FIN; even though the client process may be gone by then, the TCP connection still exists and the final ACK can be sent again).

Question 3: A server has a large number of connections in the CLOSE_WAIT state. What is the reason?
Answer: The server is not closing its sockets correctly, so the four-way wave never completes. This is a bug in the server program, and adding the missing close call fixes it.
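
A minimal sketch of the fix, written in Python rather than the Java API mentioned earlier: the handler closes its socket as soon as it sees end-of-stream from the peer. The function name `handle_connection` is hypothetical, and `socket.socketpair()` is used here only as a stand-in for an accepted TCP connection so the example runs without a network.

```python
import socket

def handle_connection(conn: socket.socket) -> bytes:
    """Read until the peer closes, then close our side as well."""
    received = bytearray()
    with conn:                     # guarantees conn.close() even if an error occurs
        while True:
            chunk = conn.recv(4096)
            if not chunk:          # empty read: the peer has finished sending
                break
            received += chunk
    return bytes(received)

# Stand-in for a client and the server-side socket returned by accept().
client, server = socket.socketpair()
client.sendall(b"bye")
client.close()                     # the client closes first
data = handle_connection(server)   # the server reads end-of-stream, then closes too
```

If the `with conn:` block (or an explicit close in a finally clause) is missing, the passive side would stay in CLOSE_WAIT until the process exits.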

Sliding window (efficiency mechanism)

With the acknowledgment strategy discussed above, every data segment sent must be acknowledged with an ACK before the next segment is sent. This performs poorly, because the sender spends most of its time waiting.

The sliding window mechanism is used to improve performance.
How it works: send in batches and wait in batches, so that a single waiting period covers the ACKs for a whole group of segments. In essence this reduces the time spent waiting for ACKs.

Window size: the maximum amount of data that can be sent without waiting for an acknowledgment. In this example the window size is 4000 bytes (four segments of 1000 bytes);

  • The first four segments can be sent directly without waiting for any ACK;
  • After the first ACK is received, the sliding window moves forward and the fifth segment is sent, and so on;
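
The behaviour described in the two bullets can be reproduced with a small loss-free simulation. It is a sketch under assumptions: the function name is made up, segments are a fixed 1000 bytes, the total is a multiple of the segment size, and exactly one ACK comes back between sending bursts.

```python
def sliding_window_send(total_bytes: int, window: int, mss: int, first_seq: int = 1):
    """Return a log of events for a loss-free transfer with a fixed window."""
    log = []
    base = first_seq                 # oldest byte not yet acknowledged
    next_seq = first_seq             # next byte to send
    end = first_seq + total_bytes
    while base < end:
        # Send while the amount of unacknowledged data stays within the window.
        while next_seq < end and next_seq + mss - base <= window:
            log.append(("send", next_seq))
            next_seq += mss
        # One ACK arrives for the oldest segment, so the window slides forward.
        base += mss
        log.append(("ack", base))
    return log

events = sliding_window_send(total_bytes=6000, window=4000, mss=1000)
```

The log starts with four sends (1, 1001, 2001, 3001), then the ACK 1001, and only then the fifth segment starting at 4001.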

The two cases of packet loss are discussed below.
Case 1: the data packets arrived, but an ACK was lost.
In this case losing some of the ACKs does not matter, because a later ACK confirms the same data;

Case 2: a data packet is lost.

  • When a segment is lost, the sender keeps receiving ACKs with the same number, such as 1001. They tell the sender that the data 1001-2000 has not been received and should be resent.
  • Once the receiver gets the retransmitted data starting at 1001, the next ACK it returns is 7001, because it had already received 2001-7000 earlier and stored it in the receive buffer.

This mechanism is also called "fast retransmit".
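
The example with the lost segment 1001-2000 can be replayed in code. Both function names are hypothetical, and the threshold of three duplicate ACKs is the commonly used value rather than something stated in the text above.

```python
def receiver_acks(arrivals, mss=1000, first_seq=1):
    """Cumulative ACKs produced as segments (given by starting seq) arrive."""
    expected, buffered, acks = first_seq, set(), []
    for seq in arrivals:
        buffered.add(seq)
        while expected in buffered:      # advance over every contiguous segment
            expected += mss
        acks.append(expected)            # always ACK the next byte still missing
    return acks

def fast_retransmit_seq(acks, dup_threshold=3):
    """Return the seq to resend once an ACK has been repeated dup_threshold times."""
    last, dups = None, 0
    for ack in acks:
        if ack == last:
            dups += 1
            if dups == dup_threshold:
                return ack
        else:
            last, dups = ack, 0
    return None

# Segment 1001-2000 is lost; 2001..7000 arrive, then the retransmitted 1001 arrives.
acks = receiver_acks([1, 2001, 3001, 4001, 5001, 6001, 1001])
resend = fast_retransmit_seq(acks)
```

The receiver answers 1001 six times and then jumps straight to 7001, and the sender decides to resend from 1001 without waiting for a timeout.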

Flow Control (Reliability Mechanism)

Flow control is a mechanism that limits the size of the sending window

Question 1: Why control the window size?

  • A window that is too large consumes a lot of system resources
  • With a window that is too large, the sender hardly waits for ACKs at all, and reliability cannot be guaranteed
  • The receiver's processing capacity has to be taken into account

Question 2: In the TCP segment format the window size is 16 bits. Does this mean the maximum window size is 64 KB?
Answer: No. TCP introduces a window scale factor M in the options, so the actual window size is the value of the window field shifted left by M bits.
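
The shift can be checked with a one-line calculation. The helper name `effective_window` is made up for this sketch; the upper limit of 14 for the shift count comes from the window scale option as defined in RFC 7323.

```python
def effective_window(window_field: int, shift: int) -> int:
    """Actual window in bytes = 16-bit window field shifted left by the scale factor."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= 14:           # RFC 7323 caps the shift count at 14
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift

unscaled = effective_window(65535, 0)   # 65535 bytes, just under 64 KB
scaled = effective_window(65535, 7)     # 65535 * 128 bytes, roughly 8 MB
```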

The job of flow control is to adjust the sender's rate to the receiver's processing capacity (judged by the free space left in the receiver's receive buffer).

When the sender sends data, the receiver checks how much free space is left in its receive buffer and returns this value to the sender in the ACK. The sender uses this value to set the window size for the next round of sending.
When the window size is 0, the sender pauses. While it is paused, the sender periodically sends a window probe segment. This segment carries no application data; it only serves to trigger an ACK so the sender can learn the current window size.

Note: The window size limits the sender, but it is reported to the sender through the window size field in the header of the receiver's ACK.
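
The interaction between the advertised window and the sender can be sketched as follows. This is a simplified model with invented names (`Receiver`, `send_round`): data is sent in whole 1000-byte segments, and a zero-length `accept(0)` call plays the role of the window probe.

```python
class Receiver:
    """Toy receiver that advertises the free space left in its receive buffer."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self.used = 0

    def accept(self, nbytes: int) -> int:
        """Store incoming bytes and return the window advertised in the ACK."""
        self.used += nbytes
        return self.buffer_size - self.used

    def app_read(self, nbytes: int) -> None:
        """The application drains data, freeing buffer space."""
        self.used -= min(nbytes, self.used)

def send_round(receiver: Receiver, advertised: int, mss: int = 1000):
    """Send as many whole segments as the window allows; return (sent, new window)."""
    sent = (advertised // mss) * mss
    return sent, receiver.accept(sent)   # sent == 0 behaves like a window probe

rx = Receiver(buffer_size=3000)
first = send_round(rx, advertised=3000)       # fills the buffer, window drops to 0
stalled = send_round(rx, advertised=first[1]) # window is 0: nothing but a probe
rx.app_read(2000)                             # the application frees 2000 bytes
resumed = send_round(rx, advertised=rx.accept(0))
```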

Congestion control (reliability mechanism)

Flow control takes the receiver's processing capacity into account, while congestion control, described next, takes into account the capacity of the intermediate nodes along the path. Together they determine the sender's window size (the smaller of the two values).

In essence, congestion control gradually finds a suitable window size by trial and error.


  • In round 0 the window size is 1 unit, so data is sent very slowly. If transmission goes smoothly, the window size is doubled.
  • In the initial phase the window is small, so as long as no packets are lost the window size doubles every round (exponential growth).
  • Once the window size reaches the threshold, exponential growth changes to linear growth (provided there is still no packet loss).
  • When packet loss occurs, the sending rate is close to the limit of the network, so the window is shrunk to a small value and the process of exponential then linear growth is repeated. The congestion window is therefore not a fixed value; it keeps changing dynamically.
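
The growth pattern in the list above can be traced numerically. This sketch is an assumption-laden simplification: the function name is invented, the window is counted in whole segments per round, and on loss the threshold is set to half the current window (the classic behaviour, which the text above does not spell out).

```python
def congestion_window_trace(rounds: int, ssthresh: int, loss_rounds=()):
    """Return the congestion window (in segments) at the start of each round."""
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            ssthresh = max(cwnd // 2, 2)     # assumed: new threshold is half the window
            cwnd = 1                         # shrink to a small value and start over
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # exponential growth below the threshold
        else:
            cwnd += 1                        # linear growth at or above the threshold
    return trace

trace = congestion_window_trace(rounds=12, ssthresh=16, loss_rounds={7})
```

With a threshold of 16 and a loss in round 7, the window doubles up to 16, grows by one per round up to 19, and then restarts from 1.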

TCP is a very complex protocol, and the features described above are not all of it. To learn more about TCP, read the RFC standards documents.

Origin blog.csdn.net/m0_63904107/article/details/130181386