[TCP protocol] message format, mechanism for reliable data transmission (1)

Hello, everyone~ I am your old friend: Protect Xiao Zhou ღ  


This issue covers the TCP (Transmission Control Protocol) concepts you need for network programming. It first explains the message format of the TCP protocol. After the message format, you will learn two of the mechanisms TCP uses to ensure reliable data transmission: acknowledgment response and timeout retransmission. These are core mechanisms of TCP. You will also see how the receive buffer sorts and deduplicates data.


This issue is included in the blogger's column : JavaEE_Protect Xiao Zhouღ's Blog-CSDN Blog

Suitable for programming beginners, interested friends can subscribe to view other "JavaEE Basics".
Stay tuned for more highlights: Protect Xiaozhou ღ *★,°*:.☆( ̄▽ ̄)/$:*.°★*'


 1. Introduction to network programming

Real network communication is built on the five-layer TCP/IP network model. The protocols are layered from top to bottom: application layer -> transport layer -> network layer -> data link layer -> physical layer. Under this layered design, each lower-layer protocol provides services (an interface, or API) to the layer above it. An upper-layer protocol can call the layer directly below it, but cross-layer calls are not allowed. Every layer except the application layer is encapsulated by the operating system, which exposes the transport layer through an interface (API) called Socket. Application-layer development based on Socket is what we usually call network programming.

Socket mainly provides interfaces for two transport-layer protocols:

1. The Socket interface based on the UDP transport protocol

2. The Socket interface based on the TCP transport protocol
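To make the idea of "application-layer development based on Socket" concrete, here is a minimal sketch of a TCP echo over the loopback interface using Java's standard `ServerSocket` and `Socket` classes. The class name `LoopbackEchoSketch` and the method `echoOnce` are made up for this illustration; the server handles exactly one connection and is not meant as a production design.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class LoopbackEchoSketch {

    /** Starts a one-shot echo server on a free loopback port, sends text, returns the echoed reply. */
    static String echoOnce(String text) {
        byte[] request = text.getBytes(StandardCharsets.UTF_8);
        // Port 0 asks the operating system to pick any free port.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    // Copy every received byte back until the client closes its sending side.
                    peer.getInputStream().transferTo(peer.getOutputStream());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                client.getOutputStream().write(request);
                client.shutdownOutput(); // tell the server that no more data will follow
                byte[] reply = client.getInputStream().readAllBytes();
                return new String(reply, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello TCP"));
    }
}
```

Note that the application code only reads and writes byte streams; everything below the transport layer is handled by the operating system.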


Briefly describe the main functions of the protocols at each level:

Application layer: the program being developed needs some networking function and some pattern of network interaction. This part is designed and implemented by the programmer according to the actual business requirements (including the choice of transport-layer protocol).

Transport layer: mainly responsible for delivering data between the two communicating applications. For example, if TCP detects that data was not transmitted successfully, it sends another copy so that the data reaches the destination from the source. The transport layer provides services to the application layer.

Network layer: mainly responsible for addressing and routing. It uses the IP address to locate the destination host in the network and forwards data hop by hop along a path toward it.

Data link layer: mainly responsible for exchanging data between two adjacent nodes. For example, frames handled by the network card are handed over to the physical layer for conversion into signals.

Physical layer: the hardware that carries binary data in various forms, such as high and low voltage levels on network cables, optical signals in optical fibers, and radio waves.


2. Introduction to TCP transport protocol

TCP (Transmission Control Protocol) is a connection-oriented and reliable transmission protocol. It belongs to the transport layer in the Internet protocol model and is also the most widely used protocol in daily development. The TCP protocol can guarantee reliable data transmission.

This blog will revolve around how the TCP protocol ensures reliable data transmission.

Features of the TCP protocol:

  1. Connection-oriented: before transmitting data, the two communicating parties first establish a connection.
  2. Reliable transmission: TCP has a series of measures such as the acknowledgment response mechanism and the timeout retransmission mechanism. When the connection itself is fine but some data was not transmitted successfully, the data is resent.
  3. Byte-stream oriented: data is transmitted as a continuous stream of bytes, with no built-in message boundaries.
  4. Buffers: during network communication, buffers store the data waiting to be sent or already received. Buffers help coordinate the transmission speed between the sender and the receiver and reduce the risk of data loss or disorder.
  5. No limit on the size of transferred data: because transmission is a byte stream, any amount of data can be sent as long as the connection exists.
  6. Full-duplex communication: both parties can send and receive at the same time, as in a telephone call.

 2.1 TCP protocol message format

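When we learn a protocol, it is best to start by analyzing its message format.

Structural diagram of a TCP message:

Looking carefully at the TCP header structure, we find that the fixed TCP header occupies 20 bytes, that is, 160 binary bits, arranged in five rows of 32 bits each.

In the data offset row, the fields are 4 bits (data offset) + 6 bits (reserved) + 6 bits (flag bits) + 16 bits (window size) = 32 bits. A diagram that draws the reserved field as only 4 bits makes this row add up to 30 bits and the whole header to 158 bits, as if two binary bits were "wasted". They are not: in the original specification (RFC 793) the reserved field is 6 bits wide, and later specifications reassigned some of those bits as additional flags.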

TCP packet analysis:

Source port and destination port: each occupies 16 binary bits. A port can be regarded as the identity of an application on a host. Every program that uses the network is bound to a port number, either assigned automatically by the operating system or specified manually by the programmer. Note that, for the same protocol, two programs on the same host cannot be bound to the same port number at the same time. Port numbers 0 to 1023 are called "well-known port numbers" and are reserved for well-known services, for example HTTP on port 80. The port numbers identify which application sent the message and which application should receive it.

Sequence number and acknowledgment number: each occupies 32 binary bits and is used for reliable data transmission. Every byte of the transmitted byte stream is numbered in order, so that the data can be delivered in the same order in which it was sent. The sequence number is the number of the first byte of data carried in the current segment, and the acknowledgment number is the sequence number of the next byte that the receiver expects to receive.

Sender: 1. Good morning buddy, 2. Are you interested in going out on weekends?

Recipient: 2. Are you interested in going out on weekends? 1. Good morning buddy.

This situation is obviously unreliable. The purpose of the sequence number and acknowledgment number is to ensure that data is delivered in the same order in which it was sent.

Data offset: occupies 4 binary bits and gives the length of the TCP header in units of 4 bytes. It tells the receiving end where the data part begins after the TCP header.

Flag bits: ACK, SYN, FIN, RST, PSH, and URG, each occupying one binary bit.

If a flag bit is set to 1, that flag is in effect.

ACK indicates that the acknowledgment number is valid and is used to confirm receipt of data;

SYN is used to request a connection and synchronize sequence numbers when the connection is established;

FIN is used to close the connection;

RST is used to reset the connection;

PSH prompts the receiver to hand the data to the application as soon as possible instead of letting it wait in the receive buffer;

URG indicates that the TCP segment contains urgent data.

Window size: used for flow control. It tells the sender the maximum number of bytes of unacknowledged data it may send, so the receiver is not overwhelmed. The sliding window, which builds on this field, will be described in detail later.

TCP checksum: used to detect whether the TCP header or data was corrupted during transmission (for example, by electromagnetic interference), which ensures the correctness of the data.
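As a rough illustration of how such a checksum detects corruption, the sketch below computes the 16-bit ones' complement "Internet checksum" over a byte array. This is a simplification: the real TCP checksum also covers a pseudo-header built from IP-layer information, which is omitted here. The class name `InternetChecksumSketch` is invented for this example.

```java
class InternetChecksumSketch {

    /** Returns the 16-bit ones' complement checksum of the given bytes. */
    static int checksum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int high = data[i] & 0xFF;
            // An odd-length input is padded with a zero byte.
            int low = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0;
            sum += (high << 8) | low;
        }
        // Fold any carries above 16 bits back into the low 16 bits.
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16);
        }
        return (int) (~sum & 0xFFFF);
    }

    /** A block that already contains its own checksum verifies when the result is 0. */
    static boolean isIntact(byte[] dataIncludingChecksum) {
        return checksum(dataIncludingChecksum) == 0;
    }
}
```

The receiver runs the same calculation over the received bytes, checksum field included. If any bit was flipped on the way, the result is very likely to be non-zero and the segment is discarded.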

Urgent pointer: used only when the URG flag is set to 1, used to indicate the end position of urgent data.

Reserved: The reserved fields in the TCP header are mainly used for future use, compatibility with old versions, and error prevention, and provide scalability and flexibility for the TCP protocol.
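The field descriptions above can be turned into a small parser. The following sketch reads the 20-byte fixed header from a raw byte array with `java.nio.ByteBuffer`; the class `TcpHeaderSketch` and its field names are hypothetical, and only the six flags discussed in this article are decoded. Options and the data part are ignored.

```java
import java.nio.ByteBuffer;

class TcpHeaderSketch {
    final int sourcePort;
    final int destinationPort;
    final long sequenceNumber;
    final long acknowledgmentNumber;
    final int headerLengthBytes;
    final boolean urg, ack, psh, rst, syn, fin;
    final int windowSize;
    final int checksum;
    final int urgentPointer;

    TcpHeaderSketch(byte[] segment) {
        if (segment.length < 20) {
            throw new IllegalArgumentException("a TCP header needs at least 20 bytes");
        }
        // ByteBuffer is big-endian by default, which matches network byte order.
        ByteBuffer buf = ByteBuffer.wrap(segment);
        sourcePort = buf.getShort() & 0xFFFF;            // 16 bits
        destinationPort = buf.getShort() & 0xFFFF;       // 16 bits
        sequenceNumber = buf.getInt() & 0xFFFFFFFFL;     // 32 bits, kept unsigned in a long
        acknowledgmentNumber = buf.getInt() & 0xFFFFFFFFL;
        int offsetAndFlags = buf.getShort() & 0xFFFF;    // data offset, reserved bits, flags
        headerLengthBytes = (offsetAndFlags >>> 12) * 4; // data offset counts 4-byte words
        urg = (offsetAndFlags & 0x20) != 0;
        ack = (offsetAndFlags & 0x10) != 0;
        psh = (offsetAndFlags & 0x08) != 0;
        rst = (offsetAndFlags & 0x04) != 0;
        syn = (offsetAndFlags & 0x02) != 0;
        fin = (offsetAndFlags & 0x01) != 0;
        windowSize = buf.getShort() & 0xFFFF;
        checksum = buf.getShort() & 0xFFFF;
        urgentPointer = buf.getShort() & 0xFFFF;
    }
}
```

Masking with `0xFFFF` and `0xFFFFFFFFL` is needed because Java's `short` and `int` are signed, while every TCP header field is unsigned.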


3. TCP mechanism to ensure reliable data transmission 

3.1 Acknowledgment response

Acknowledgment response: in the TCP protocol, every time data is sent, the receiver sends back an acknowledgment to tell the sender that the data has been received. If the sender does not receive the acknowledgment within a certain time, it assumes that the data was not delivered successfully and retransmits it.

This is the core mechanism by which the TCP protocol achieves reliable transmission.

The ACK (acknowledge) flag is used to confirm receipt of data. From this feedback, the sender can tell that the receiver has successfully received what was sent. It also means that transmission "efficiency" is reduced: after sending, the sender has to wait for the other party's feedback, and if no feedback arrives for a long time, it tries to resend~

TCP acknowledgment response (ACK) refers to a mechanism in which the receiver sends a TCP segment with an acknowledgment flag bit to the initiator as a response after successfully receiving the TCP data packet sent by the initiator.


In network communication, we often run into messages that are "sent later but arrive first". QQ chat had this problem a couple of years ago, and I have run into it myself. What does it mean? When I send my friend a burst of messages, the order in which the other party receives them should in theory match the order in which I sent them. In practice that is not always the case. Sometimes messages do not arrive in the order we expect, which can easily cause misunderstandings.

There are many possible reasons why one message arrives later than another in network communication. Here are several:

  1. Network congestion: when there is too much data in the network, congestion delays or drops some messages while others are transmitted and processed normally. In this case, a message sent later may reach the receiver before a message sent earlier.

  2. Transmission errors: errors such as packet damage or loss may occur during transmission. If a packet is damaged or lost, the sender has to resend it once it notices the missing acknowledgment, so messages sent later may reach the receiver before the retransmitted earlier message.

  3. Queuing delay: when multiple messages arrive at a node at about the same time, they may be put in a queue for processing. Even if a message sent earlier is queued first, other factors (such as message size or processing time) can cause a message sent later to reach the receiver before it.


To solve the problem of "messages sent later arriving first", the messages can be numbered: a "sequence number" is assigned to each sent message, and an "acknowledgment number" is given in the response message.

TCP data transmission uses exactly this: sequence numbers and acknowledgment numbers.

The TCP protocol numbers each byte of data from front to back, and this number is called the "sequence number". Because TCP is a byte-stream protocol, there is no such thing as "one message" or "two messages" at this level.

In the one-way example here, the acknowledgment number in the sender's own data segment carries no useful meaning. What matters is the acknowledgment number that the receiver sends back.

The rule for the acknowledgment number: take the sequence number of the last byte of data received from the sender and add one, which gives the sequence number of the next byte.

For example, suppose the sender sends the bytes numbered 1 to 1000 and the receiver replies with acknowledgment number 1001. This value has two meanings:

1. All data with sequence numbers < 1001 has been received by the receiver.

2. The receiver next wants the sender to send the byte stream starting from 1001.
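The acknowledgment rule above comes down to one addition. The sketch below assumes a segment whose first byte has sequence number `firstByteSeq` and which carries `payloadLength` bytes; both parameter names and the class `AckNumberSketch` are made up for illustration. Because the field is 32 bits wide, the result wraps around to 0 after the largest value.

```java
class AckNumberSketch {

    /** Acknowledgment number for a segment: sequence number of the last byte received, plus one. */
    static long expectedAck(long firstByteSeq, int payloadLength) {
        // The header field is 32 bits wide, so the value wraps around at 2^32.
        return (firstByteSeq + payloadLength) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // Bytes 1..1000 received, so the receiver asks for byte 1001 next.
        System.out.println(expectedAck(1, 1000));
    }
}
```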

With this mechanism we can guarantee the order of the data. Data is not handed to the application the moment it arrives at the receiver. As mentioned above, TCP has buffers, and the receive buffer sorts incoming data by "sequence number". Even if segments are reordered in transit, the buffer acts as a safety net~ The application always reads the data in the same order in which it was sent.
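Here is a minimal sketch of how a receive buffer might sort by sequence number and drop duplicates, using a `TreeMap` keyed by the sequence number of each segment's first byte. It is an illustration under simplifying assumptions, not how any real TCP stack is implemented: segments never partially overlap, sequence numbers do not wrap around, and the class `ReceiveBufferSketch` is invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

class ReceiveBufferSketch {
    // Out-of-order segments wait here, sorted by the sequence number of their first byte.
    private final TreeMap<Long, byte[]> pending = new TreeMap<>();
    // Bytes that are in order and ready for the application to read.
    private final ByteArrayOutputStream ready = new ByteArrayOutputStream();
    private long nextExpected;

    ReceiveBufferSketch(long initialSequenceNumber) {
        this.nextExpected = initialSequenceNumber;
    }

    /** Accepts one segment and returns the acknowledgment number to send back. */
    long receive(long seq, byte[] payload) {
        if (seq >= nextExpected) {
            // A segment that is already buffered is a duplicate and is ignored.
            pending.putIfAbsent(seq, payload);
        }
        // Segments below nextExpected were already delivered, so they are dropped.
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            byte[] next = pending.pollFirstEntry().getValue();
            ready.write(next, 0, next.length);
            nextExpected += next.length;
        }
        return nextExpected;
    }

    /** What the application would read: always in sending order, without duplicates. */
    String readAll() {
        return new String(ready.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

If the second segment arrives first, `receive` keeps it in `pending` and keeps acknowledging the first missing byte. Once the first segment arrives, both are released to the application in order.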

If the transmission goes well, the receiver will send back an acknowledgment ( a "TCP header" where ACK = 1 ).


3.2 Timeout retransmission

During data transmission, "packet loss" (missing data) is inevitable. Fortunately, the TCP protocol has the acknowledgment response mechanism. If no acknowledgment (ACK) comes back for some part of the data, the sender treats that part as lost and sends it again. If there is still no response from the receiver after several retransmissions, the sender concludes that the connection itself has a problem and tries to reset and re-establish it, as described below.

Timeout retransmission: in the TCP protocol, if the sender does not receive an acknowledgment, it performs a timeout retransmission. The sender sets a timeout period, and if no acknowledgment is received within this period, the data is resent.

The sender's criterion for packet loss is that, within a certain period of time, no acknowledgment (a TCP segment with the ACK flag set) has been received.

There are three possible situations at this point:

1. The data is lost during transmission. The receiver never receives it, so it does not send back a TCP segment with the acknowledgment flag (ACK).

2. The receiver receives the data and sends back a TCP segment with the acknowledgment flag (ACK) set, but that segment is lost on the way.

3. The sender receives no feedback within the timeout and triggers a timeout retransmission, and the retransmitted data is lost as well.

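First case: the sender times out and retransmits, and the receiver receives the data for the first time and acknowledges it.

Second case: the sender times out and retransmits, and the receiver, which already has this data, recognizes the duplicate by its sequence number, discards it, and acknowledges again.

Third case: the sender times out again, waits longer, and retransmits once more.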

When several packets in a row are lost ("no feedback from the receiver"), there is a high probability that the network environment has a serious problem. TCP handles this situation by continuing to retransmit, but every time another retransmission is lost, the timeout grows longer (the frequency of retransmission is reduced). If, after several consecutive retransmissions, the sender still gets no feedback from the receiver (no segment with the ACK flag set), it tries to reset and re-establish the connection between the two parties. If that also fails, the TCP protocol gives up on this network communication.
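The retransmission behaviour described above can be sketched as a loop with a growing timeout. The simulation below makes several assumptions that are not stated in this article: the timeout doubles after every loss, the number of retries is capped by a `maxRetries` parameter, and the network is replaced by a predicate `ackArrives` that says whether a given transmission attempt is acknowledged. Real TCP implementations derive their timeouts from measured round-trip times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class RetransmitSketch {

    /**
     * Simulates sending one segment. Returns the list of timeouts (in milliseconds)
     * the sender waited through before an acknowledgment finally arrived.
     */
    static List<Long> sendWithRetries(IntPredicate ackArrives, long initialTimeoutMs, int maxRetries) {
        List<Long> waits = new ArrayList<>();
        long timeout = initialTimeoutMs;
        // Attempt 1 is the original transmission; later attempts are retransmissions.
        for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
            if (ackArrives.test(attempt)) {
                return waits; // acknowledged, nothing more to do
            }
            waits.add(timeout); // the timer expired without an ACK
            timeout *= 2;       // assumption: wait twice as long next time
        }
        throw new IllegalStateException(
                "no acknowledgment after " + (maxRetries + 1) + " transmissions, giving up");
    }

    public static void main(String[] args) {
        // The first two transmissions are lost, the third one is acknowledged.
        System.out.println(sendWithRetries(attempt -> attempt == 3, 200L, 5));
    }
}
```

Reducing the retransmission frequency this way avoids flooding a network that is probably already in trouble, and the exception stands in for the point where TCP stops retrying and tears the connection down.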


4. Summary

TCP (Transmission Control Protocol) is one of the most widely used protocols in Internet communication. Data transmitted over TCP arrives reliably and in order, although waiting for acknowledgments costs some transmission efficiency.

The core of learning a protocol is to be familiar with its message format.

Acknowledgment response:

When the receiver receives the sender's data, it sends back a segment whose TCP header has ACK = 1, meaning "I have received the data".

If the sender does not get the receiver's acknowledgment within a certain period of time, it retransmits the "data packet". If there is still no feedback, the sender reduces the frequency of retransmissions. After a certain number of attempts, TCP tries to reset and re-establish the connection, and if that fails, it terminates this network communication.

To solve the problem that data may arrive in a different order from the one in which it was sent, TCP numbers every byte of the byte stream. The sequence number field describes the number of the current "data packet". In the one-way example, the acknowledgment number in the sender's own segment has no actual meaning, while the acknowledgment number fed back by the receiver describes the sequence number of the next data it expects, so the sender knows which part of the data the other party needs next. Then the receive buffer plays its role: since the transmitted data is numbered, the buffer can sort it by sequence number, which guarantees the order of the data.

If the receiver receives the data successfully but its ACK is lost, the sender retransmits after a timeout, and the receive buffer deduplicates the data so that the application sees each piece of data only once.


OK, that wraps up this issue: [TCP protocol] message format and mechanisms for reliable data transmission (1). Today we mainly learned the message structure of TCP and two mechanisms that guarantee reliable data transmission, both of which are core mechanisms. I hope it is helpful to everyone. If anything is wrong, please point it out and correct me~

 

Next issue preview: [TCP protocol] Connection management: "three-way handshake, four-way wave"

Thank you to everyone who read this article, and more exciting events are coming: Protect Xiaozhou ღ *★,°*:.☆( ̄▽ ̄)/$:*.°★* 

Met you, all the stars are falling on my head...


Origin blog.csdn.net/weixin_67603503/article/details/130307122