Linux [Network Basics] The UDP and TCP Protocols

Foreword

The transport layer is one of the key layers in the network architecture; it is mainly responsible for transmitting data between processes on two hosts. The common transport-layer protocols are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

1. UDP protocol

1. UDP protocol format
(figure: UDP header format)
16-bit source port: which port the data was sent from, i.e. which process sent it.
16-bit destination port: which port the data is going to, i.e. which process should receive it.
16-bit UDP length: the total length of the entire datagram (UDP header + UDP data).
16-bit UDP checksum: verifies that the data was not corrupted in transit. The datagram passes through many link devices on its way; if even one byte is damaged during forwarding, the whole datagram is considered corrupted. If checksum verification fails at the receiver, the datagram is silently discarded and the sender is not notified.
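To make the layout above concrete, here is a small sketch that unpacks the four 16-bit fields of the fixed 8-byte UDP header from a raw segment. The port numbers and payload are made-up example values, not taken from any real capture:

```python
import struct

def parse_udp_header(segment: bytes):
    """Parse the fixed 8-byte UDP header: source port, destination port,
    total length, and checksum, each 16 bits in network (big-endian) order."""
    src, dst, length, checksum = struct.unpack("!HHHH", segment[:8])
    return {"src_port": src, "dst_port": dst,
            "length": length, "checksum": checksum,
            "payload": segment[8:8 + (length - 8)]}

# Hypothetical segment: port 5000 -> 53, total length 8 + 5 = 13 bytes
seg = struct.pack("!HHHH", 5000, 53, 13, 0) + b"hello"
hdr = parse_udp_header(seg)
print(hdr["src_port"], hdr["dst_port"], hdr["length"], hdr["payload"])
```

Note that the length field counts the header as well as the data, which is why the payload is `length - 8` bytes.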

2. Characteristics of UDP
UDP is connectionless, unreliable, and datagram-oriented.
Connectionless
UDP can send data without establishing a connection first: knowing the peer's IP address and port number is enough to transmit directly. It is like sending a parcel: you only need the recipient's address, and you do not need to contact them in advance.
Unreliable
UDP cannot guarantee that data reaches the peer safely and in order. It has no retransmission or acknowledgment mechanism, so delivery is not guaranteed and packets may be lost along the way; even when a packet is lost, UDP neither reports an error nor retransmits. The UDP header carries no sequence number and the protocol is connectionless, so the arrival order of data is not guaranteed either; if ordering matters, it must be managed at the application layer.
Datagram-oriented
The application cannot flexibly control the number of reads and writes or the length of each transfer, and the size of a single transmission is limited. A UDP datagram is sent and received as a whole; the protocol never splits or merges data.
The length field is only 16 bits and the header occupies 8 bytes, so a datagram cannot exceed 65535 bytes in total.
After UDP prepends a header to the message passed down from the application layer, it hands the datagram directly to the network layer. For larger data, the application layer must therefore split it into packets, manage their sequence, and send them over multiple transmissions.
Because UDP records the data length in the header, a whole message is sent and received as a unit.
The receive buffer must therefore be large enough: if the buffer is smaller than a datagram, the excess is truncated and lost, because UDP never delivers half a datagram, nor more than one datagram per read.
For example, when transmitting 100 bytes of data over UDP:
if the sender calls sendto once to send 100 bytes, the receiver must call recvfrom once to receive all 100 bytes; it cannot call recvfrom ten times in a loop, receiving 10 bytes each time.
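The datagram boundary described above can be demonstrated with a connected pair of datagram sockets. This sketch uses an AF_UNIX socketpair to keep the demo self-contained (no network needed); real UDP sockets preserve message boundaries the same way:

```python
import socket

# A connected pair of datagram sockets stands in for a UDP sender/receiver;
# AF_UNIX keeps the demo self-contained, but real UDP behaves the same way.
sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

sender.send(b"x" * 100)           # one 100-byte datagram
whole = receiver.recv(4096)       # one recv returns the whole datagram
print(len(whole))                 # 100 -- not 10 reads of 10 bytes each

sender.send(b"y" * 100)
part = receiver.recv(10)          # buffer smaller than the datagram:
print(len(part))                  # only 10 bytes arrive; the rest is discarded

sender.close(); receiver.close()
```

The second receive shows why the buffer must be large enough: the excess 90 bytes of the truncated datagram are gone, not held for the next read.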

3. UDP buffers
A UDP socket can both read and write, i.e. it is full-duplex.
UDP has no real send buffer. A call to sendto hands the data straight to the kernel, which passes it to the network-layer protocol for subsequent transmission.
Sending side: after the application-layer data is prepended with a UDP header, it is submitted directly to the network layer.
UDP does have a receive buffer, but it cannot guarantee that the order of received datagrams matches the order in which they were sent; if the buffer is full, newly arriving UDP data is discarded.
Receiving side: after the UDP header is removed, the data is delivered to the application layer.
The UDP protocol does not guarantee in-order arrival of data.

4. Application-layer protocols based on UDP
NFS: Network File System
TFTP: Trivial File Transfer Protocol
DHCP: Dynamic Host Configuration Protocol
BOOTP: Bootstrap Protocol (for booting diskless devices)
DNS: Domain Name System

5. How to achieve reliable transmission over UDP
UDP by itself cannot achieve reliable transmission, since it guarantees neither ordering nor delivery. Instead, we can borrow TCP's reliability mechanisms and implement similar logic at the application layer:

  • Introduce sequence numbers to guarantee data order
  • Introduce an acknowledgment mechanism to confirm that the peer received the data
  • Introduce a timeout retransmission mechanism to ensure data is not lost

2. TCP protocol

(1) TCP protocol format

(figure: TCP header format)
Source/destination port numbers: identify which process the data comes from and which process it goes to.
32-bit sequence number / 32-bit acknowledgment number: discussed in detail later.
4-bit TCP header length: the number of 32-bit words (4-byte units) in the TCP header; the maximum header length is therefore 15 * 4 = 60 bytes.
6 flag bits:
URG: whether the urgent pointer is valid
ACK: whether the acknowledgment number is valid
PSH: prompts the receiving application to read the data out of the TCP buffer immediately
RST: the other side requests that the connection be re-established; a segment carrying the RST flag is called a reset segment
SYN: requests to establish a connection; a segment carrying the SYN flag is called a synchronization segment
FIN: notifies the other side that this end is about to close; a segment carrying the FIN flag is called a finish segment
16-bit window size: the amount of data that can be received starting from the position indicated by the acknowledgment number; TCP does not allow sending more data than this. It is used to implement the sliding window mechanism for flow control.
16-bit checksum: filled in by the sender and covering both the TCP header and the TCP data; if verification fails at the receiver, the data is considered corrupted.
16-bit urgent pointer: identifies which part of the data is urgent (out-of-band) data; such data has higher priority and is delivered ahead of ordinary data.
Options (up to 40 bytes): used to improve TCP's transmission performance, mainly for negotiating and describing parameters. Since the TCP header is at most 60 bytes and the fixed part occupies 20 bytes, the options can be 0 to 40 bytes.
The options are as shown in the figure :
(figure: TCP header options)
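The field layout above can be checked with a small parser for the fixed 20-byte part of the header. The sample segment below is a hypothetical SYN with made-up port and sequence numbers:

```python
import struct

def parse_tcp_header(segment: bytes):
    """Parse the fixed 20-byte TCP header (big-endian). The 4-bit data
    offset sits in the top of the 16-bit word that also holds the flags."""
    (src, dst, seq, ack,
     off_flags, window, checksum, urg) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = (off_flags >> 12) & 0xF        # header length in 32-bit words
    flags = off_flags & 0x3F                     # URG/ACK/PSH/RST/SYN/FIN bits
    return {"seq": seq, "ack": ack,
            "header_len": data_offset * 4,       # in bytes, at most 15 * 4 = 60
            "SYN": bool(flags & 0x02), "ACK": bool(flags & 0x10),
            "FIN": bool(flags & 0x01), "window": window}

# Hypothetical SYN segment: 20-byte header (offset = 5 words), SYN flag set
raw = struct.pack("!HHIIHHHH", 1234, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
hdr = parse_tcp_header(raw)
print(hdr["header_len"], hdr["SYN"], hdr["ACK"])
```

The header-length arithmetic shows why 60 bytes is the ceiling: the 4-bit offset can hold at most 15, and 15 words of 4 bytes is 60.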

(2) Confirmation response mechanism

TCP places a sequence number and an acknowledgment number in the header of every segment it sends.
Through these two numbers, the sender tells the receiver the sequence number and length of the data it sent, and the receiver replies telling the sender what data it has received and where the sender should continue from.

seq: the starting sequence number of this piece of data; it equals the ack of the previous piece.
ack: the acknowledgment number for the data sent by the other side, telling it that all data before this position has been received. The acknowledgment number equals the starting sequence number of that data plus its length, i.e. seq + len.
len: the length of this piece of data.
(figure: sequence and acknowledgment numbers exchanged between sender and receiver)
But if some earlier data is lost in transit, then even if segments 1025-2048 and 2049-3072 have arrived, they cannot be acknowledged yet, because an acknowledgment guarantees that all data before the acknowledgment number has arrived. This design prevents unnecessary retransmissions when an acknowledgment itself is lost. As shown below:

(figure: cumulative acknowledgment with a lost segment)
Even if earlier data arrives late, or is retransmitted due to packet loss or delay, once the retransmitted data is received, the previously sent data is acknowledged cumulatively and sorted by sequence number in the receive buffer.
Each ACK carries an acknowledgment number, which tells the sender: this is the data I have received; continue sending from here.
Why does TCP need sequence numbers?
To ensure that TCP segments can be delivered in order.
Why does TCP carry two numbers, a sequence number and an acknowledgment number?
Because TCP is full-duplex: the two communicating sides send data at the same time, each using its own sequence numbers, and the acknowledgment number is each side's acknowledgment of the other side's data.
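The cumulative-acknowledgment rule above (the ack covers everything contiguous from the start, and stops at the first gap) can be sketched in a few lines. The segment tuples are illustrative, mirroring the 1025-2048/2049-3072 example in the text:

```python
def cumulative_ack(received_segments, start=1):
    """Given (seq, length) pairs that have arrived (possibly out of order),
    return the acknowledgment number: one past the last byte of the
    longest contiguous run beginning at `start`."""
    by_seq = {seq: length for seq, length in received_segments}
    ack = start
    while ack in by_seq:               # walk the contiguous run
        ack += by_seq.pop(ack)
    return ack                         # first byte not yet received

# Segments 1-1000 and 2001-3000 arrived, but 1001-2000 was lost in transit:
print(cumulative_ack([(1, 1000), (2001, 1000)]))   # stops at the gap
print(cumulative_ack([(1, 1000), (1001, 1000)]))   # contiguous: ack advances
```

The first call returns 1001 even though later bytes have arrived, which is exactly why the out-of-order segments in the text cannot be acknowledged until the gap is filled.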

(3) Timeout retransmission mechanism

After host A sends data to host B, the data may fail to reach B due to network congestion or other causes.
If host A does not receive an acknowledgment from B within a specific time interval, it retransmits. Note that A's failure to receive an acknowledgment may also simply mean that the ACK itself was lost, as shown below:
(figures: retransmission when the data is lost, and when the ACK is lost)
In the latter case host B receives duplicate data, so TCP must be able to identify duplicate packets and discard them. The sequence numbers described above make this deduplication easy.
The timeout must also be chosen reasonably:
ideally, it is the minimum time within which an acknowledgment can almost always be returned.
However, this time varies with the network environment.
If the timeout is too long, overall retransmission efficiency suffers.
If the timeout is too short, duplicate packets may be sent frequently.

To ensure reasonably high-performance communication in any environment, TCP computes the timeout dynamically.
On Linux (and likewise on BSD Unix and Windows), the timeout is managed in units of 500 ms, so each timeout is an integer multiple of 500 ms.
If no response arrives after the first retransmission, TCP waits 2 * 500 ms before retransmitting again.
If there is still no response, it waits 4 * 500 ms, and so on, increasing exponentially.
After a certain number of retransmissions have accumulated, TCP concludes that the network or the peer host is abnormal and forcibly closes the connection.
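The doubling schedule described above can be written down directly. This is a simplified sketch of the idea (real kernels refine the timeout with RTT estimates; the 500 ms base and retry count here just follow the text):

```python
def retransmission_timeouts(base_ms=500, max_retries=5):
    """Exponential backoff sketch: the timeout is an integer multiple of a
    500 ms unit and doubles after every unanswered retransmission."""
    return [base_ms * (2 ** i) for i in range(max_retries)]

print(retransmission_timeouts())  # [500, 1000, 2000, 4000, 8000]
```

After the listed attempts all time out, the sender would give up and close the connection, as the paragraph above describes.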

(4) Connection management mechanism

TCP is connection-oriented, reliable, and byte-stream-oriented. A key part of how it maintains a reliable connection is the three-way handshake used to establish the connection and the four waves used to tear it down.
(1) TCP three-way handshake
1. First, understand the three-way handshake through the packet names and the connection states of the two sides:
(figure: three-way handshake packet exchange and state transitions)
After the client sends a SYN packet, its state changes to SYN_SENT. When the server receives the SYN, the server's state changes to SYN_RECV. When the client receives the server's SYN and ACK, the client's state becomes ESTABLISHED. When the server receives the client's ACK, the server's state becomes ESTABLISHED. At this point client and server have completed the three-way handshake, i.e. a bidirectional connection has been established.
Client state transitions:
[CLOSED -> SYN_SENT] The client calls connect and sends a synchronization segment.
[SYN_SENT -> ESTABLISHED] If connect succeeds, the client enters the ESTABLISHED state and can begin reading and writing data.
Server state transitions:
[CLOSED -> LISTEN] The server enters the LISTEN state after calling listen and waits for clients to connect (connections are only processed in the listening state).
[LISTEN -> SYN_RCVD] Once a connection request (synchronization segment) arrives, the connection is placed in the kernel's queue and a SYN+ACK segment is sent to the client.
2. An intuitive picture:
First handshake, client: "Are you there? This is C. Can you hear me?"
Second handshake, server: "I can hear you. Can you hear me?"
Third handshake, client: "I can hear you too. We can hear each other, so let's start communicating."
3. Further questions
What does each handshake accomplish?
First handshake: the client synchronizes its initial sequence number seq and receive window win (among other things), and at the same time its ability to send is tested.
Second handshake: the server replies with an acknowledgment number ack, proving that the client can send; at the same time the server synchronizes its own seq and win, and its own ability to send is tested.
Third handshake: the client replies with an acknowledgment number ack, proving that the server can send. Both sides' information is now synchronized and the reliability check is complete.
Would two handshakes suffice?
The first two handshakes carry the synchronization information and are absolutely necessary; the only candidate for removal is the third. But from the analysis above, without the third handshake the client knows its own sending works and its synchronization is complete, yet the server cannot determine whether its own sending works, nor whether its synchronization information was received. To establish a connection, both sides must be confirmed able to send and receive and currently online, so the third handshake is indispensable.
With only two handshakes there would also be two bad consequences:
1. If a client sends many requests in a row, the server establishes a connection for each one, seriously wasting resources. A SYN flood attack exploits exactly this: the client maliciously never sends the third handshake packet. After the OS sends the second handshake packet, it allocates resources for the half-open connection and places it in the half-open connection queue; too many of these fill the server's half-open queue so that normal user requests cannot be served, while the half-open connections themselves consume resources and degrade server performance.
2. If the client disconnects after sending a SYN, or the SYN arrives after such a long delay that the client has already disconnected, the connection attempt has failed, yet the server still creates a socket for it, pointlessly wasting resources.
Would a fourth handshake help?
We might ask: what if the third handshake packet is lost? Should a fourth handshake confirm the third? The third handshake can indeed be lost, but if a fourth confirmed the third, a fifth would be needed to confirm the fourth, and so on; extra rounds add cost without ever closing the gap, so such a protocol would be both impossible to finish and inefficient. Instead, TCP relies on timeout retransmission: the server starts a timer after sending the second handshake, and if the third handshake's ACK does not arrive before the timeout, it resends the SYN+ACK; after several failures it treats the connection as failed. The three-way handshake only verifies that full-duplex communication is possible; it does not guarantee that the connection will definitely be established.
A fourth handshake is also unnecessary for another reason: the SYN that establishes the connection and the ACK that confirms the reply can be sent together, so there is no need to separate them into extra steps.
Why does each side need its own sequence numbers?
Essentially, to maintain reliable transmission the client maintains one set of sequence numbers and the server maintains another:

  • client -> server: data consumes (seq) the client's sequence numbers; when the server tells the client it has received the data, the acknowledgment (ack) refers to the client's numbers.
  • server -> client: data consumes (seq) the server's sequence numbers; when the client tells the server it has received the data, the acknowledgment (ack) refers to the server's numbers.
    What should the server do when the three-way handshake fails?
    1. If the server never receives the SYN, it does nothing (no connection was established at all; the SYN may simply have been lost).
    2. If the server has sent the SYN and ACK but never receives the client's ACK, the client may be offline; the server sends an RST to reset the connection and releases the resources it holds.
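The state transitions walked through above can be collected into a tiny table-driven state machine. The event names are informal labels made up for this sketch; the states follow the text:

```python
# Minimal sketch of the three-way-handshake state transitions described above.
# Keys are (role, current state, event); values are the next state.
TRANSITIONS = {
    ("client", "CLOSED",   "connect/send SYN"):       "SYN_SENT",
    ("client", "SYN_SENT", "recv SYN+ACK/send ACK"):  "ESTABLISHED",
    ("server", "CLOSED",   "listen"):                 "LISTEN",
    ("server", "LISTEN",   "recv SYN/send SYN+ACK"):  "SYN_RCVD",
    ("server", "SYN_RCVD", "recv ACK"):               "ESTABLISHED",
}

def step(role, state, event):
    """Advance one endpoint of the handshake by a single event."""
    return TRANSITIONS[(role, state, event)]

# Client path: CLOSED -> SYN_SENT -> ESTABLISHED
c = step("client", "CLOSED", "connect/send SYN")
c = step("client", c, "recv SYN+ACK/send ACK")

# Server path: CLOSED -> LISTEN -> SYN_RCVD -> ESTABLISHED
s = step("server", "CLOSED", "listen")
s = step("server", s, "recv SYN/send SYN+ACK")
s = step("server", s, "recv ACK")

print(c, s)  # both sides end up ESTABLISHED
```

Both paths reaching ESTABLISHED corresponds to the bidirectional connection being fully set up.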

(2) TCP's four waves
1. First, analyze the packet names and the states of both sides. The four-wave process is shown below:
(figure: four-wave packet exchange and state transitions)
First wave (client to server): the client sends a FIN to the server requesting to disconnect and enters the FIN_WAIT_1 state. Sending FIN means this end will send no more data, but it can still receive data.

Second wave (server to client): the server replies with an ACK to confirm receipt of the FIN and enters the CLOSE_WAIT state. At this point the server will receive no more data from the client (its read side is effectively closed) but may still have data to send, so it waits for the upper-layer application to finish processing.

Third wave (server to client): once the server has also finished its writes, it sends a FIN to the client requesting to disconnect and enters the LAST_ACK state.

Fourth wave (client to server): the client receives the server's FIN and replies with an ACK, entering the TIME_WAIT state. After receiving the ACK the server disconnects and enters CLOSED; the client must wait 2MSL before it, too, enters CLOSED.
Reason: while the actively closing side is in TIME_WAIT, if its final ACK is lost, the passively closing side will retransmit the FIN after at most one MSL, and the actively closing side can then resend the ACK. If instead the actively closing side moved straight to CLOSED after one MSL, it could neither receive the retransmitted FIN nor resend the ACK. Hence 2MSL = one MSL for the lost ACK + one MSL for the retransmitted FIN.
2. An intuitive picture
First wave, C: "I have said everything I wanted to say."
Second wave, S: "I heard everything you said, but I am not finished yet."
Third wave, S: "Now I have said everything I wanted to say."
Fourth wave, C: "We are both done, so let's end the call."
3. Further questions
Why does disconnecting require four waves?
1. A TCP connection is full-duplex. When disconnecting, each side must settle the same question: do I still have data to send? The four waves synchronize the answer in both directions:
First wave: C tells S that all of its data has been sent; C can reclaim its send buffer and S its receive buffer.
Second wave: S tells C that it has received the close notification.
Third wave: S tells C that all of its data has been sent; S can reclaim its send buffer and C its receive buffer.
Fourth wave: C tells S that it has received the close notification.
As this shows, none of the four waves can be dropped. If one is missing, one side cannot reliably notify the other that its data has all been sent; the connection cannot be torn down reliably, and TCP would become an unreliable protocol.
2. Sending a FIN only means the actively closing side will send no more data, not that it will receive no more. So the passively closing side may keep sending data after replying with its ACK, and only sends its own FIN once all of its data is out; the actively closing side then replies with an ACK and the connection is torn down. This is why the ACK can be merged with the SYN when establishing a connection, but the FIN cannot be merged with the ACK when closing one.
The role of TIME_WAIT
In the TIME_WAIT state, the actively closing side does not call close and release its resources immediately; it waits until the passively closing side has had a chance to receive the final ACK and close, and only then closes itself.
Without TIME_WAIT, the actively closing side would send the ACK and disconnect immediately, and the following two problems could occur.

If a newly started client happens to bind the same address and port as the original client, it may receive the FIN retransmitted by the passively closing side of the old connection, interfering with the new connection.

If the final ACK sent by the actively closing side is lost, the passively closing side remains stuck in the LAST_ACK state and retransmits its FIN after a timeout. If a newly started client then sends a SYN to that endpoint, the peer in LAST_ACK expects an ACK rather than a SYN, so it sends an RST and resets the new connection attempt.

Therefore, to avoid these situations, TIME_WAIT makes the actively closing side wait long enough to ensure the passively closing side receives the ACK, and, if it does not, long enough for it to retransmit the FIN.

What causes a large number of TIME_WAIT sockets on a host, and how should it be handled?
TIME_WAIT appears on the actively closing side, so a large number of them means the host is actively closing many connections, possibly due to malicious traffic or crawlers. The usual remedies are to shorten the TIME_WAIT wait, or to enable address reuse so that a new socket can bind an address/port still occupied by a TIME_WAIT socket (which otherwise cannot be re-bound).

What causes a large number of CLOSE_WAIT sockets on a host, and how should it be handled?
CLOSE_WAIT is entered after the passively closing side receives the FIN from the actively closing side. The active side has stopped sending, so the passive side's read end is effectively closed, but the passive side must still finish sending all of its own data; it therefore waits for the upper-layer application to finish processing, and only then sends its FIN. A large number of CLOSE_WAIT sockets means the upper-layer application on the passive side has a problem: it is not closing its sockets correctly to release resources, so the FIN is never sent and the connections remain stuck in CLOSE_WAIT.

(5) Mechanisms to avoid packet loss and retransmission

(1) Sliding window

As introduced above, with the acknowledgment mechanism and the timeout retransmission mechanism, TCP can already ensure that data arrives at the peer reliably and in order. But used alone, these two mechanisms force the sender to wait for the acknowledgment of each piece of data before releasing the next, so overall sending efficiency is low.
1. The sliding window mechanism lets the sender put multiple packets onto the network at once; when the acknowledgment for the earliest outstanding packet arrives, the window slides backward (essentially updating an index) to admit the next packets that may be sent.
2. The sliding window mechanism raises the throughput of both sender and receiver and improves the efficiency of data transmission between them.
1. Understanding the sliding window
When TCP sends one segment at a time and waits for an acknowledgment of each before sending the next, communication performance is poor, and the longer the round-trip time, the worse it gets.
(figure: per-segment acknowledgment versus windowed sending)
Since per-segment send-and-wait performs badly, we can instead send multiple segments concurrently, greatly improving performance (in effect, the waiting times of multiple segments overlap). To solve this problem, TCP introduces the concept of a window, which limits the degradation of performance even when round-trip times are high. When acknowledgments no longer cover each segment individually but a larger unit, the total transfer time shrinks dramatically: the sending host does not have to wait for an acknowledgment after each segment, but keeps sending.
(figure: sending multiple segments within one window)
The window size is the maximum amount of data that can remain in flight without waiting for an acknowledgment. In the figure the window is 4 segments, i.e. 4000 bytes.
The first four segments are sent directly, without waiting for any ACK.
After the first ACK arrives, the sliding window moves backward and the fifth segment is sent, and so on.
To maintain the sliding window, the OS kernel keeps a send buffer recording which data is still unacknowledged; only acknowledged data may be removed from the buffer.
The larger the window, the higher the network throughput.

When an acknowledgment arrives, the window slides to the position of the acknowledged sequence number, so multiple segments can be sent back to back and communication performance improves. This is called the sliding window mechanism. As shown in the figure:
(figure: the sliding window over the send buffer)
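The send-then-slide behavior described above can be traced with a short simulation. The event log is illustrative: ACKs are assumed to arrive in order, one per outstanding segment, which is the idealized case in the figures:

```python
def sliding_window_send(total_segments, window_size):
    """Trace the sliding-window idea: keep up to `window_size`
    unacknowledged segments in flight; each arriving ACK slides the
    window and releases the next segment."""
    events = []
    next_to_send = 0
    base = 0                                  # oldest unacknowledged segment
    while base < total_segments:
        # fill the window with as many sends as it allows
        while next_to_send < total_segments and next_to_send < base + window_size:
            events.append(("send", next_to_send))
            next_to_send += 1
        events.append(("ack", base))          # ACK for the oldest in flight
        base += 1                             # the window slides by one
    return events

ev = sliding_window_send(6, 4)
print(ev[:5])  # the first four sends go out back-to-back before any ACK
```

With a window of 4, four segments leave before the first acknowledgment, instead of the strict send-wait-send-wait alternation of the unwindowed scheme.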
Supplementary questions about the sliding window
1. How large is the sliding window?
The window maintained by the TCP sender is the set of packets it may put onto the network at once. Its size is dynamic: it depends on the window size advertised by the receiver.
A small advertised window does not stay small forever: as the application layer calls recv to drain data from the receive buffer, space frees up, the receiver can accept more data from the sender, and the advertised window changes accordingly.
Conclusion:
The receiver informs the sender of its receive capacity through the window size, and the sender dynamically adjusts the amount of data it sends according to that advertised capacity.
2. What if a data packet reaches the peer, but the ACK it returns is lost?
In this case the data has already arrived at the peer, so no resend is needed. Without window control, any data whose acknowledgment failed to return would be retransmitted anyway. With window control, a lost ACK often requires no retransmission at all: acknowledgments are cumulative, so a later ACK confirms everything before it, and losing some ACKs does not matter.
(figure: lost ACKs covered by later cumulative acknowledgments)
3. What if a transmitted packet, i.e. some segment, is lost outright?
When the receiver does not get the segment with the expected sequence number, it keeps acknowledging the data it has received so far. Once the sender receives the same acknowledgment three times in a row, it concludes that the segment was lost and retransmits it. This mechanism reacts faster than the timeout mechanism.
For example, when a segment is lost, the sender keeps receiving ACKs like 1001, as if the receiver were repeating "I want the data starting from 1001".
If the sending host receives the same "1001" three times in a row, it retransmits the data 1001-2000. Once the receiver gets it, the next ACK it returns is 7001, because segments 2001-7000 had already arrived and were being held in the receive buffer of the receiving host's OS kernel.
(figure: fast retransmit triggered by three duplicate ACKs)
Fast retransmit: when a packet is lost in the network but the sender's timeout has not yet fired, the receiver's repeated acknowledgments quickly pinpoint the starting sequence number of the lost packet, so the sender can retransmit without waiting for the timeout. This mechanism is called fast retransmit (also known as high-speed retransmission control).
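The duplicate-ACK trigger described above can be sketched in a few lines. The ACK stream below reproduces the 1001/7001 example from the text; the threshold of three repeats follows the rule stated there:

```python
def fast_retransmit(acks, dup_threshold=3):
    """Detect fast-retransmit triggers: seeing the same acknowledgment
    number repeated `dup_threshold` more times after its first arrival
    triggers an immediate retransmission of the data starting there."""
    retransmitted = []
    dup_count = 0
    last_ack = None
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                retransmitted.append(ack)   # resend data starting at `ack`
        else:
            last_ack, dup_count = ack, 0
    return retransmitted

# Segment 1001-2000 lost: the receiver keeps acking 1001, then, once the
# retransmission fills the gap, jumps straight to 7001.
print(fast_retransmit([1001, 1001, 1001, 1001, 7001]))
```

The jump from 1001 to 7001 after the retransmission is the cumulative acknowledgment at work: segments 2001-7000 were already buffered at the receiver.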

(2) Flow control

The receiver can only process data at a limited speed. If the sender transmits too fast, the receiver's buffer fills up; further sending then causes packet loss, which in turn triggers retransmission and a chain of other problems. TCP therefore adjusts the sender's speed according to the receiver's processing capacity. This mechanism is called flow control.
If the receiver has to discard data it should have received, the retransmission mechanism fires again and network bandwidth is wasted for nothing. Concretely: the receiving host tells the sending host how much data it can currently accept, and the sender sends no more than that limit. This limit is the window size; as introduced above, its value is determined by the receiving host.
The TCP header has a dedicated field for advertising the window size: the receiving host puts the amount of buffer space it can accept into this field. The larger the value, the higher the network throughput; but once the receiver's buffer approaches overflow, it advertises a smaller window, throttling the amount of data sent. In other words, the sending host controls its output according to the receiving host's instructions, and together this forms TCP flow control.

a. The receiver puts the buffer size it can accept into the "window size" field of the TCP header and advertises it to the sender in its ACKs.
b. The larger the window size field, the higher the network throughput.
c. Once the receiver finds its buffer nearly full, it sets the window size to a smaller value and notifies the sender.
d. After the sender learns of the smaller window, it slows down its sending.
e. If the receiver's buffer is completely full, it sets the window to 0; the sender then stops sending data, but must periodically send a window probe segment so that the receiver has an opportunity to report the new window size.
How does the sender learn that the receiver can accept data again?
1. The sender actively sends a window probe packet (carrying a fixed 1 byte of data) to query the receiver's capacity.
2. The receiver actively sends a window update notification to the sender.

How does the receiving end tell the sending end the window size? The TCP header has a 16-bit window field that carries it. But the largest 16-bit value is 65535, so is the maximum TCP window 65535 bytes? No: the TCP options (up to 40 bytes in the header) can also carry a window scale factor M, and the actual window size is the value of the window field shifted left by M bits.
Does the sender transmit a full sliding window of data right from the start?
No. It must also consider whether the network's forwarding capacity can handle a whole window of data at once.
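The window scale computation above is just a left shift. A minimal sketch, assuming the scale factor M negotiated at connection setup (RFC 7323 limits M to at most 14):

```python
def effective_window(window_field, scale_m):
    """Actual window = 16-bit window field shifted left by M bits."""
    assert 0 <= window_field <= 0xFFFF and 0 <= scale_m <= 14
    return window_field << scale_m

print(effective_window(0xFFFF, 0))  # 65535: no scaling, the old limit
print(effective_window(0xFFFF, 7))  # 8388480: roughly an 8 MB window
```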

(3) Congestion control

Background:
With window control, TCP no longer needs an acknowledgment for every single segment and can keep many packets in flight at once. However, pouring out a large amount of data at the very start of a connection can cause its own problems.
Computer networks are generally shared environments, so the network may already be congested by traffic between other hosts. Injecting a sudden burst of data into a congested network can make things far worse, even paralyze the network.
To prevent this, at the start of communication TCP limits the amount of data it sends using a value computed by an algorithm called slow start.
Understanding Congestion Control
Although the sliding window lets TCP send large amounts of data efficiently and reliably, blasting out data in the first round trips can still cause trouble.
Many hosts share the network, and it may already be congested. Sending a flood of data without knowing the current network state is likely to make things worse.
TCP therefore introduces a slow-start mechanism: it sends a small amount of data first to probe the state of the network, and only then decides how fast to transmit.

Specific process:
This introduces a concept called the congestion window. When transmission begins, the congestion window is set to 1 segment (1 MSS).
insert image description here
Each time an ACK is received, the congestion window is increased by 1 MSS, which doubles the window once per round trip during slow start.
Before sending, the sender compares the congestion window with the sliding window advertised by the receiving host and uses the smaller value as the actual send window.
Conclusion
1. Network data travels over a shared network: the same links simultaneously forward data for many different hosts.
2. The routers and switches along the path have an upper limit on their forwarding capability.
3. The transmission media along the path (optical fiber, twisted pair) have limited capacity.
When the two TCP endpoints send data, to avoid frequent retransmissions they must take the network's forwarding capacity into account, not just each other's receiving ability.
Congestion control mechanism: in essence, a way for the sender to control how much data it sends.
Actual amount sent = min(sender's sliding window (determined by the receiver's capacity), congestion window (determined by the network's congestion level))
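The min() rule above in one line, as a sketch (window sizes in bytes; the example values assume a 1460-byte MSS):

```python
def send_limit(rwnd, cwnd):
    """At most min(receiver window, congestion window) bytes in flight."""
    return min(rwnd, cwnd)

print(send_limit(rwnd=64 * 1024, cwnd=10 * 1460))   # 14600: network-limited
print(send_limit(rwnd=8 * 1460, cwnd=100 * 1460))   # 11680: receiver-limited
```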
Observation
The congestion window growth described above is exponential. "Slow start" only means the starting value is small; the growth itself is very fast. To keep it from growing without bound, the congestion window cannot simply keep doubling.
A threshold called the slow start threshold is introduced here.
Once the congestion window exceeds this threshold, it no longer grows exponentially but linearly.
When a TCP connection starts, the slow start threshold equals the maximum window size.
On each timeout retransmission, the slow start threshold becomes half of its previous value, and the congestion window is reset to 1. As shown in the figure:
insert image description here
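The growth pattern in the figure can be sketched as a toy round-by-round simulation (units: MSS). This is an illustration of the rules stated above, not real kernel logic; the starting ssthresh of 8 is an assumed example value:

```python
def next_cwnd(cwnd, ssthresh, timeout=False):
    """One round of the toy model: slow start below ssthresh,
    linear growth above it, reset on a timeout."""
    if timeout:
        return 1, max(cwnd // 2, 2)    # cwnd back to 1, ssthresh halved
    if cwnd < ssthresh:
        return cwnd * 2, ssthresh      # slow start: double per round
    return cwnd + 1, ssthresh          # congestion avoidance: +1 per round

cwnd, ssthresh, trace = 1, 8, []
for _ in range(8):                     # eight loss-free rounds
    trace.append(cwnd)
    cwnd, ssthresh = next_cwnd(cwnd, ssthresh)
print(trace)  # [1, 2, 4, 8, 9, 10, 11, 12]: exponential, then linear
```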
A small amount of packet loss merely triggers timeout retransmission; heavy packet loss is taken as a sign of network congestion. When a TCP connection starts, network throughput rises gradually, and as soon as congestion occurs, the throughput drops immediately.
Summary
1. Early TCP, after detecting network congestion, resets the congestion window to 1 (one segment) and grows it again from slow start. When congestion occurs, the slow start threshold is readjusted: slow start threshold (new) = congestion window size at the moment of congestion / 2.
2. Resetting the congestion window all the way back to 1 MSS on every loss is overly pessimistic: a single lost packet may be a normal, isolated event (for example a brief network glitch) rather than real congestion, yet it would still force the window back to 1, then slow start, then congestion avoidance all over again. For this situation, a fast recovery mechanism is needed: after a fast retransmit (triggered by three duplicate ACKs), the congestion window is set to the new slow start threshold instead of 1, and growth continues linearly from there.

In order to achieve congestion control, TCP uses four mechanisms: slow start, congestion avoidance, fast retransmit, and fast recovery.

(5) Mechanisms for improving transmission performance

(1) Delayed response

Delayed response mechanism concept
After receiving data, the receiver waits for a short period before acknowledging. If during that period the application layer reads data out of the receive buffer (via the recv function), then when the acknowledgment is finally sent, it can advertise a larger window to the sender.
If the receiving host acknowledged immediately every time, it might return a small window, because the buffer had just been filled by the incoming data. When the sender receives such a small window advertisement, it uses it as an upper limit on what it sends, reducing network utilization (a problem specific to window control, technically known as silly window syndrome). For this reason, a mechanism is introduced that does not return an acknowledgment immediately upon receiving data, but delays for a period of time.
For example, suppose the receive buffer is 1M and 500K of data arrives at once. If the receiver responds immediately, the advertised window is 500K.
But the application may actually be very fast and consume the 500K from the buffer within 10 ms, bringing the free space back to 1M.
In this case the receiver is nowhere near its processing limit; even with a larger window it could keep up. If the receiver waits a while before responding, say 200 ms, then the window it advertises is 1M.
Must every packet be acknowledged?
No. The larger the window, the greater the network throughput and the higher the transmission efficiency; the goal is to improve efficiency as much as possible while keeping the network uncongested. Typical delayed-acknowledgment limits:
Quantity limit: acknowledge every N packets.
Time limit: acknowledge once the maximum delay time is exceeded.
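The two limits combine into a simple decision rule. A hedged sketch: N = 2 and the 200 ms figure are illustrative defaults, not values mandated by TCP; real stacks choose their own:

```python
def should_ack(pending, elapsed_ms, n=2, max_delay_ms=200):
    """ACK now if N segments are pending, or the oldest unacknowledged
    segment has waited longer than the maximum delay."""
    return pending >= n or elapsed_ms >= max_delay_ms

print(should_ack(pending=1, elapsed_ms=10))    # False: keep waiting
print(should_ack(pending=2, elapsed_ms=10))    # True: quantity limit hit
print(should_ack(pending=1, elapsed_ms=200))   # True: time limit hit
```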

(2) Piggybacking

The concept of piggybacked acknowledgment:
It applies to scenarios where both sides send data to each other in quick interaction: the ACK is placed inside the data packet being sent back to the peer.
Building on delayed acknowledgment, note that in many cases client and server also exchange requests and responses at the application layer: the client says "hello" to the server, and the server replies with a "hello". The ACK can then hitch a ride on the server's "hello" response and travel back to the client with it.

If an acknowledgment were returned immediately upon receiving data, piggybacking would be impossible: the received data must first be handed to the application, which processes it and generates the response, and only then can data and acknowledgment be sent together. In other words, piggybacking is not possible without delayed acknowledgments. Delayed acknowledgment is thus a useful mechanism that improves network utilization and reduces processing load.

(3) Fast retransmission

Described earlier, in the reliability mechanisms above.

(6) Keep alive mechanism

Heartbeat mechanism: used to judge whether both ends of an idle connection are still in a normal state.
When a connection goes idle, a keep-alive timer starts recording the idle time. Once the connection has been idle for more than 2 hours, the server's TCP actively sends a keep-alive probe packet (heartbeat packet) to the client.
If there is no response after 10 probes, the peer is considered to be in an abnormal state and the connection is actively closed (if no response is received, a probe is sent every 75 s, ten times in total).
If there is a response, the connection is considered normal, and the keep-alive timer restarts its timing.
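These timers can be enabled and tuned per socket. A sketch using the Linux-specific option names (TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT); the 7200 s / 75 s / 10-probe values mirror the defaults described above:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)       # turn keep-alive on
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 7200)   # idle time before 1st probe
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 75)    # interval between probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 10)      # probes before giving up

keepalive_on = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(keepalive_on)  # 1
s.close()
```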

3. TCP is byte-stream oriented

TCP is oriented to the byte stream:
From the sending side: when an application sends data, it calls the send function to hand the data to the transport layer's TCP protocol, which stores it temporarily in the send buffer. Do not assume that TCP transmits data in the same units as the application's send calls.
Each segment TCP sends is no larger than the MSS; TCP follows its own rules when segmenting the stream.
From the receiving side: once data arrives in the receiver's receive buffer, the receiving application can call the recv function to read any number of bytes.

Creating a TCP socket creates a send buffer and a receive buffer in the kernel:
When write is called, the data is first copied into the send buffer.
If the data is long, it is split into multiple TCP segments for sending.
If the data is short, it may wait in the buffer until more data accumulates, or until some other suitable moment, before being sent.
When receiving data, the data also arrives at the receive buffer of the kernel from the network card driver.
The application can then call read to get data from the receive buffer.
To put it simply, a TCP connection has both a send buffer and a receive buffer, so data can be both read from and written to the connection. This is called full duplex.
Because of the buffers, TCP reads and writes do not need to match one to one.
For example:
When writing 200 bytes of data, you can call write once with all 200 bytes, or call write 200 times, writing one byte at a time.
When reading 200 bytes of data, you likewise need not consider how it was written: you can read 200 bytes in one call, or read one byte at a time, repeated 200 times.
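This mismatch is easy to demonstrate. The sketch below uses a local stream socket pair as a stand-in for a TCP connection: 200 one-byte writes come out as one continuous stream of 200 bytes on the other side (they may arrive in one read or several; the loop handles both):

```python
import socket

# A connected pair of stream sockets standing in for a TCP connection.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
for _ in range(200):
    a.sendall(b"x")               # 200 separate one-byte writes
a.close()

data = b""
while chunk := b.recv(4096):      # read until the peer closes
    data += chunk
b.close()
print(len(data))  # 200: the stream has no write boundaries
```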

4. TCP sticky packet problem

A byte-stream-oriented service has no obvious boundaries between pieces of data, which leads to the TCP sticky packet problem. To solve it, the application-layer protocol must mark the boundaries, e.g. application-layer header + application data + delimiter (such as \r\n).
TCP sticky packet
Because the TCP protocol is byte-stream oriented, there is no obvious boundary between one piece of data and the next: when data is read with the recv function, TCP does not distinguish whether it was sent in the first call or the second, but simply returns up to the requested number of bytes. It is therefore possible to read incomplete data, or two pieces of data stuck together.
1. First of all, it must be clear that the "package" in the sticky packet problem refers to the data packet of the application layer.
2. In the protocol header of TCP, there is no field such as "packet length" like UDP, but there is a field such as sequence number.
3. From the perspective of the transport layer, TCP segments arrive one by one; they are ordered by sequence number and placed into the buffer.
4. From the perspective of the application layer, all that is visible is a continuous run of bytes.
5. The application program, seeing only such a run of bytes, cannot tell where one complete application-layer packet ends and the next begins.
insert image description here
As shown in the figure above, the client sends "6+7" and then "7+8" to the server, but the server cannot distinguish the first message from the second when receiving: the two get pasted together and it reads "6+77+8". This is the sticky packet problem.

Solving the TCP sticky packet problem
means making the boundary between two packets explicit.
1. Send fixed-length messages: fix a byte size for every send and receive. In reality the data we send varies in length, and setting a different fixed length for each piece of data defeats the purpose, so this method is mostly of theoretical interest.
2. "Package" the data at the application layer: add a header describing the data (for example, its length) in front of it, and/or a delimiter at the end.
Application data = "zxcvbnm"
Packaged data = application-layer header describing the data + "zxcvbnm" + delimiter
insert image description here
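Option 2 above can be sketched with a fixed 4-byte length header in front of each message. The pack/unpack_all helper names are assumptions for illustration; the receiver splits a stuck-together byte stream back into messages and waits when a message is only partially received:

```python
import struct

def pack(msg: bytes) -> bytes:
    """Prefix the message with its length as a 4-byte big-endian int."""
    return struct.pack("!I", len(msg)) + msg

def unpack_all(stream: bytes):
    """Split a received byte stream into complete messages; a trailing
    partial message is left for the next read."""
    msgs, off = [], 0
    while off + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, off)
        if off + 4 + length > len(stream):
            break                          # partial message: wait for more data
        msgs.append(stream[off + 4 : off + 4 + length])
        off += 4 + length
    return msgs

stream = pack(b"6+7") + pack(b"7+8")       # two requests "stuck" together
print(unpack_all(stream))  # [b'6+7', b'7+8']: boundaries recovered
```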

Does the UDP protocol have a "sticky packet problem"?
No. 1. For UDP, the packet length field is preserved until the data is delivered to the upper layer, and UDP hands data to the application layer one datagram at a time, so there is a clear data boundary.
2. From the application layer's perspective, when using UDP you either receive a complete UDP message or nothing at all; there is no "half a message".
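The contrast with the TCP example above can be shown on the loopback interface: two sendto calls arrive as two separate datagrams, never stuck together, regardless of the receive buffer size (a local sketch; on loopback the two datagrams normally arrive in order):

```python
import socket

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))              # let the OS pick a free port
addr = rx.getsockname()

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"6+7", addr)
tx.sendto(b"7+8", addr)

first, _ = rx.recvfrom(4096)           # one datagram per recvfrom,
second, _ = rx.recvfrom(4096)          # regardless of buffer size
print(first, second)  # b'6+7' b'7+8'
tx.close(); rx.close()
```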

Origin blog.csdn.net/m0_59292239/article/details/132021661