[Network Programming·Transport Layer] Classic interview-style notes (八股文) on UDP and TCP




 Table of contents

1. Port number division

2. Some instructions

1. pidof (used to view process id)

2. netstat (check network status)

3. UDP protocol

1. UDP protocol format

2. How the UDP protocol encapsulates, unpacks, and demultiplexes

2.1 Encapsulation and unpacking

2.2 Demultiplexing

3. Characteristics of UDP protocol

3.1 Characteristics of UDP protocol

3.2 UDP protocol buffer

3.3 UDP protocol 16-bit UDP length

4. TCP protocol (Transmission Control Protocol)

1. TCP protocol format

2. Reliability of TCP protocol

2.1 Reflection of unreliability

2.2 How to ensure reliability

3. Header of TCP protocol

3.1 Encapsulation and unpacking (4-bit header length)

3.2 Demultiplexing (16-bit source and destination port numbers)

3.3 32-bit sequence number and confirmation sequence number of TCP protocol (this field is useful for sliding window and timeout retransmission deduplication)

3.4 16-bit window size of TCP protocol (used to control the speed of sending messages)

3.5 Six flag bits of TCP protocol (differentiating message types)

4. Confirmation response mechanism (ACK)

5. Timeout retransmission mechanism

6. Connection management mechanism

6.1 Three-way handshake (connection establishment is initiated by the client)

6.2 Four-way wave (disconnection is a matter for both parties)

7. Flow control

8. Sliding window/fast retransmission mechanism

8.1 The nature of sliding windows

8.2 The role of sliding windows

8.3 Some questions and answers about sliding windows

9. Congestion control

10. Delayed response

11. Piggybacking on responses

12. Understand TCP’s byte stream orientation and UDP’s datagram orientation

13. Sticky packet problem

14. The problem of abnormal TCP connection establishment

5. Summary of UDP/TCP protocol

1. TCP reliability and performance

2. Applicable scenarios of UDP and TCP protocols

3. Regarding the second parameter of listen


1. Port number division

The port number is a 16-bit unsigned integer with a value range of 0-65535.

0 - 1023: well-known port numbers; widely used application-layer protocols such as HTTP, FTP, and SSH have fixed port numbers in this range.

The following are some conventional port numbers:

ssh server, using port 22

ftp server, using port 21

telnet server, using port 23

http server, using port 80

https server, using port 443

1024 - 65535: Port number dynamically assigned by the operating system. The port number of the client program is assigned by the operating system from this range. 

2. Some instructions

1. pidof (used to view process id)

pidof httpServer | xargs kill -9   # xargs converts standard input into command-line arguments

2. netstat (check network status)

n   Refuse to resolve names/aliases; show everything that can be shown numerically as numbers
l   List only services in the LISTEN state
p   Show the name of the program that owns each connection
t   (tcp) Show TCP-related sockets only
u   (udp) Show UDP-related sockets only
a   (all) Show all sockets (LISTEN entries are not shown by default)

3. UDP protocol

1. UDP protocol format

If the checksum is incorrect, the packet will be discarded.

The UDP header is essentially a structure object or a structure containing bit fields.

struct udphdr {
    uint16_t uh_sport;  /* source port number */
    uint16_t uh_dport;  /* destination port number */
    uint16_t uh_ulen;   /* UDP datagram length (header + data) */
    uint16_t uh_sum;    /* checksum */
};

2. How the UDP protocol encapsulates, unpacks, and demultiplexes

2.1 Encapsulation and unpacking

When we customized the protocol before, we used special characters to distinguish headers and payloads.

The UDP protocol directly follows the fixed header length method and considers the first 8 bytes to be the header.

2.2 Demultiplexing

The UDP header contains a 16-bit destination port number. On the receiving host, a process at the application layer is bound to this port; that process can read the data through its file descriptor and hand it to the specific application-layer protocol for processing.

3. Characteristics of UDP protocol

3.1 Characteristics of UDP protocol

The process of UDP transmission is similar to sending a letter. Just send the letter and it's done. It doesn't care whether the other party receives it or not.

Connectionless: as long as the peer's IP address and port number are known, data is transmitted directly, without establishing a connection;

Unreliable: There is no confirmation mechanism and no retransmission mechanism; if the segment cannot be sent to the other party due to network failure, resulting in packet loss, the UDP protocol will not return any error message to the application layer;

Datagram-oriented: the application cannot flexibly control how much data is read or written, or how many times; every sendto corresponds to exactly one recvfrom. Whatever length of message the application layer hands to UDP, UDP sends as-is, neither splitting nor merging. Using UDP to transmit 100 bytes of data: if the sender calls sendto once to send 100 bytes, the receiver must call recvfrom once to receive those 100 bytes; it cannot call recvfrom 10 times in a loop and receive 10 bytes each time.

A successful UDP read will read a message. We do not need to consider the header issue, we only need to do the serialization and deserialization of the payload.
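A minimal sketch of the one-sendto-per-recvfrom rule just described, assuming an illustrative peer at 127.0.0.1:8080 (address and port are arbitrary choices, not from the original text):

/* Sketch: one sendto() transmits one complete 100-byte datagram; the peer
 * must take it with a single recvfrom(), not ten 10-byte reads.
 * The address 127.0.0.1:8080 is an assumption for illustration only. */
#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &peer.sin_addr);

    char msg[100] = "hello udp";              /* a 100-byte datagram */
    /* One sendto() sends one complete datagram, boundaries preserved. */
    sendto(fd, msg, sizeof(msg), 0, (struct sockaddr *)&peer, sizeof(peer));

    close(fd);
    return 0;
}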

3.2 UDP protocol buffer

There is no need to worry about multiple UDP packets sticking together, because UDP has no real send buffer. Data passed to sendto is handed directly to the kernel (the application layer sends one datagram, the transport layer sends one datagram out), and the kernel passes it to the network-layer protocol for the subsequent transmission.

UDP does have a receive buffer. However, this receive buffer cannot guarantee that the order of received UDP datagrams matches the order in which they were sent; if the buffer is full, newly arriving UDP data is discarded. Since a UDP socket can both read and write, the UDP protocol is full-duplex.

3.3 UDP protocol 16-bit UDP length

There is a 16-bit UDP length field in the UDP header, which means the maximum length of a UDP datagram is 2^16 bytes, i.e. 64KB (including the 8-byte UDP header). To transmit data larger than 64KB, the application layer has to split it into multiple packets and reassemble them manually at the receiving end, as sketched below.
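A minimal sketch of such application-layer chunking, under the assumption of a hypothetical send_chunk() helper that wraps sendto() on an already-addressed UDP socket and a chunk size of 1400 bytes (chosen to also stay under a typical MTU):

/* Sketch: split a large buffer into UDP-sized chunks at the application layer.
 * send_chunk() is a hypothetical helper standing in for sendto(); the chunk
 * size 1400 is an assumption, well under the 64KB UDP limit. */
#include <stddef.h>

#define CHUNK 1400

void send_chunk(const char *data, size_t len);   /* hypothetical helper */

void send_large(const char *data, size_t total)
{
    size_t off = 0;
    while (off < total) {
        size_t n = total - off;
        if (n > CHUNK)
            n = CHUNK;
        send_chunk(data + off, n);   /* one UDP datagram per chunk */
        off += n;
    }
    /* The receiver must number and reassemble the chunks (and cope with
     * loss and reordering) itself, since UDP gives no such guarantees. */
}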

4. TCP protocol (Transmission Control Protocol)

1. TCP protocol format

The header is a structure object:

struct tcphdr
{
    uint16_t th_sport;  /* source port number */
    uint16_t th_dport;  /* destination port number */
    uint32_t th_seq;    /* sequence number */
    uint32_t th_ack;    /* acknowledgment number */
    uint8_t th_off;     /* data offset: length of the TCP header, in units of 4 bytes */
    uint8_t th_flags;   /* control flags such as SYN, ACK, FIN */
    uint16_t th_win;    /* receive window size */
    uint16_t th_sum;    /* checksum */
    uint16_t th_urp;    /* urgent pointer */
};

2. Reliability of TCP protocol

2.1 Reflection of unreliability

Network transmission covers long distances and passes through many device nodes along the way, so problems such as packet loss, reordering, checksum errors, and duplicate segments can occur en route.

2.2 How to ensure reliability

Only when a message receives a response from the other party can we be sure it was delivered. Whenever the two parties communicate, there is always a most recent message that has not yet been acknowledged. For example, the message "It is 12 o'clock" in the figure has not yet received a response from the other party, so whether it has been received cannot be guaranteed.

The response to a message can be sent on its own, or it can be combined with data to be sent and transmitted in a single message. For example, in the picture above, the server later replies "I received your message that it is now 12 o'clock; let's go out and play at 13 o'clock", which stuffs the acknowledgment and the data to be transmitted into one message.

Whether the client sends a message to the server or the server sends a message to the client, every message must be responded to. Of course, there is no need to respond to pure response messages.

3. Header of TCP protocol

3.1 Encapsulation and unpacking (4-bit header length)

The first 20 bytes of a TCP segment can be interpreted as structured data, from which the 4-bit header length field in the standard header can be extracted. Total TCP header length = 4-bit header length * 4 bytes, ranging from 20 bytes (the minimum) to 60 bytes. Subtracting the 20-byte standard header from that total gives the size of the options; once the entire header is removed, what remains is the payload.
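A rough sketch of the header-length arithmetic just described, assuming a raw segment laid out as in RFC 793 (the data offset occupies the high 4 bits of byte 12):

/* Sketch: unpack the 4-bit header length from a raw TCP segment. */
#include <stddef.h>
#include <stdint.h>

size_t tcp_header_len(const uint8_t *seg)
{
    uint8_t data_offset = seg[12] >> 4;   /* 4-bit header length field */
    return (size_t)data_offset * 4;       /* in bytes: 20 .. 60 */
}

/* Given the total segment length, the payload is whatever follows the
 * header (standard 20 bytes plus any options). */
size_t tcp_payload_len(const uint8_t *seg, size_t seg_len)
{
    return seg_len - tcp_header_len(seg);
}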

3.2 Demultiplexing (16-bit source and destination port numbers)

The TCP header contains a 16-bit destination port number. When the server starts it binds this port, so the application-layer process corresponding to a TCP segment can be found. The operating system maintains the mapping between port numbers and the PCBs of networked processes with a hash table.

3.3 32-bit sequence number and confirmation sequence number of TCP protocol (this field is useful for sliding window and timeout retransmission deduplication)

A client may send multiple requests in a short period of time, and the order in which these requests arrive at the server is not fixed. So when the client receives a response from the server, how can it tell which of its earlier requests the response corresponds to? This is what the 32-bit sequence number and 32-bit acknowledgment number in the TCP header are for.

Sequence Number is a field used to identify data packets in the TCP protocol. It represents the sequence number of the first byte in the byte stream of the data packet sent by the sender. The purpose of the sequence number is to ensure the order and integrity of the data packets. In the TCP protocol, each data packet has a unique sequence number.

The acknowledgment number (ACK Number) is a field in the TCP protocol used to confirm that the receiver has successfully received the data packet. It indicates the sequence number of the next data packet the receiver expects to receive. In the TCP protocol, the receiver needs to send an acknowledgment packet (ACK packet) to tell the sender that the data packet has been successfully received. The ACK Number field in the acknowledgment packet is the sequence number of the next data packet the receiver expects to receive.

The reason for using two sets of sequence numbers is that the TCP protocol is full-duplex, and both the client and the server may act as senders or receivers, so two sets of sequence numbers need to be used.

3.4 16-bit window size of TCP protocol (used to control the speed of sending messages)

When TCP sends data, if the sender sends too fast, the receiver may not have time to read it and the segments will fill up the peer's receive buffer. If the sender keeps sending after that buffer is full, the newly arrived segments are simply discarded, so the sending rate must be controlled.

The TCP header has a 16-bit window size field, which is filled in with the amount of free space in the sender's own receive buffer. When the segment reaches the peer, the peer knows how much room is left in that buffer and can control how fast it sends data.

The existence of the 16-bit window size enables the communicating parties to exchange receiving capabilities and achieve flow control.

3.5 Six flag bits of TCP protocol (differentiating message types)

A server may face many different kinds of clients and receive messages that initiate connections, acknowledge data, and so on. To distinguish the different message types, the TCP header defines several one-bit flags.

  1. URG: when set, the urgent pointer field is valid
  2. ACK: acknowledges that a segment has been received
  3. PSH: urges the receiver to have its upper layer take the data out of the receive buffer (push) as soon as possible, even if the buffer is not full
  4. RST: reset; forcefully closes an abnormal connection
  5. SYN: requests to establish a connection
  6. FIN: requests to close a connection

4. Confirmation response mechanism (ACK)

TCP numbers every byte of data; that number is the sequence number. Each ACK carries a corresponding acknowledgment number, which tells the sender which data has been received and where the next transmission should start.

5. Timeout retransmission mechanism

A major unreliability of network transmission is the problem of packet loss. The solution of TCP protocol is to timeout and retransmit.

If host A sends data to B and packets are lost due to network problems, B will naturally not respond to A. If host A finds that it has not received a confirmation response for a period of time, it will resend.

If host A sends the message to host B and host B receives the message but the response is lost, host A will retransmit it after a period of time.

In this way, the receiver will receive two identical messages and receive duplicate data, which also reflects unreliability. The receiver needs to perform deduplication based on the 32-bit sequence number.

Retransmission time: Since data transmission is related to the network environment at that time, the retransmission time cannot be set to a fixed value. Instead, the operating system dynamically adjusts the retransmission time.

In order to ensure high-performance communication in any environment, TCP dynamically calculates this maximum timeout.

In Linux (the same is true for BSD Unix and Windows), the timeout is controlled in units of 500ms, and the timeout for each timeout retransmission is an integer multiple of 500ms.

If you still get no response after retransmitting once, wait 2*500ms before retransmitting.

If there is still no response, wait 4*500ms for retransmission. And so on, increasing exponentially.

When a certain number of retransmissions is accumulated, TCP considers that there is an abnormality in the network or the peer host and forcibly closes the connection.
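A toy sketch of this backoff schedule (500 ms doubling each time); the retry limit of 5 here is an arbitrary assumption for illustration, not the value any particular kernel uses:

/* Toy sketch of the exponential backoff described above:
 * 500 ms, 2*500 ms, 4*500 ms, ... until a retry limit is reached. */
#include <stdio.h>

#define BASE_MS     500
#define MAX_RETRIES 5   /* arbitrary assumption for illustration */

int main(void)
{
    unsigned timeout = BASE_MS;
    for (int attempt = 1; attempt <= MAX_RETRIES; ++attempt) {
        printf("retransmission %d: wait %u ms\n", attempt, timeout);
        timeout *= 2;             /* exponential growth */
    }
    /* After enough failed retransmissions, TCP assumes the network or the
     * peer host is down and forcibly closes the connection. */
    return 0;
}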

6. Connection management mechanism

The three-way handshake is the basis for creating a TCP connection structure. The TCP connection structure is a data structure that maintains timeout retransmission, sequential arrival, flow control, connection status, etc. to ensure reliability.

The connection-oriented nature of TCP indirectly ensures the reliability of communication. At the same time, because the UDP protocol is connectionless, it does not maintain data structures such as timeout retransmission, sequential arrival, and flow control, so UDP is unreliable. 

  1. CLOSED: Initial state, indicating that the TCP connection has not been established or has been terminated.
  2. LISTEN: Indicates that the TCP port is waiting for a connection request.
  3. SYN-SENT: Indicates that the TCP connection request has been sent to the remote host and is waiting for response confirmation.
  4. SYN-RECEIVED: Indicates that the TCP connection request has been accepted by the server and a reply has been sent to confirm the connection request.
  5. ESTABLISHED: Indicates that the TCP connection has been successfully established and data transmission can begin.
  6. FIN-WAIT-1: Indicates that the local endpoint has sent a FIN to close its side of the connection and is waiting for the remote endpoint's acknowledgment.
  7. CLOSE-WAIT: Indicates that a close request (FIN) has been received from the peer; the local end can still send its remaining data before closing its own side.
  8. FIN-WAIT-2: Indicates that the local close has been acknowledged, but the remote endpoint may still have data to transmit; the local end is waiting for the remote endpoint's close request.
  9. LAST-ACK: Indicates that the passive closer has sent its own FIN and is waiting for the final acknowledgment that completes the connection close.
  10. TIME-WAIT: Indicates that the TCP connection has been closed normally and is waiting for all related network messages to be cleared. It usually takes a fixed time interval before entering the CLOSED state.

6.1 Three-way handshake (connection establishment is initiated by the client)

SYN_SENT is a state in the TCP/IP protocol. It represents a TCP client that has sent a connection request (SYN) and is waiting for the other party to return a connection confirmation (ACK). During the TCP three-way handshake connection process, when the client sends a SYN packet, the client's TCP status will change to SYN_SENT. In this state, the TCP client will continue to try to send SYN packets to the server until it receives an ACK packet returned by the server to establish a connection.

SYN_RECV is also a state in the TCP/IP protocol. During connection establishment, after receiving the client's SYN, the server replies with a SYN+ACK, indicating that it has received the client's connection request and is asking the client to confirm the connection; at this point the server's TCP state becomes SYN_RECV. In this state the server waits for the client's ACK to complete the three-way handshake. If the server does not receive the client's ACK within a period of time, it assumes its SYN+ACK was lost and retransmits it after a timeout.

The client considers the three-way handshake complete as soon as it sends the ACK of the third handshake, while the server considers it complete only after it receives that ACK, so the two sides finish at slightly different times. The three-way handshake is not guaranteed to succeed; in particular, the client cannot know whether its final ACK was lost, but mechanisms such as timeout retransmission handle this problem.

Why is it necessary to shake hands three times?

Because TCP is at the transport layer and is managed by the operating system, maintaining TCP connections has time and space costs.

If only one handshake were required, i.e. the client only had to send a single SYN to establish a connection, a malicious client could initiate connections like crazy. The server pays a cost to maintain each connection, so the number of connections it could accept would shrink rapidly (a SYN flood).

A two-way handshake has the same problem: after the client sends SYN and the server replies SYN+ACK, the server already considers the connection established, so SYN flooding is still possible.

Why three handshakes, then? First, TCP is a full-duplex protocol, and three handshakes are the minimum cost needed to verify that communication works in both directions. Second, to mitigate SYN flooding: in the three-way handshake the connection only counts as established on the server after the client's final ACK, so a client that wants to hold connections open on the server must hold them open on its own side too and bear the same cost. The three-way handshake therefore effectively avoids SYN attacks from a single host. (Note, however, that defending against SYN flood attacks is not something the TCP protocol itself solves; TCP's main task is network communication!)

A four-way handshake would not work either: the server would send the last ACK and establish the connection on its side first, and as long as the client arranges not to accept that connection, no connection is established on the client, so the risk of a SYN flood returns.

For any odd number of handshakes greater than three, three already meet the requirements; increasing the count only wastes both parties' time.

6.2 Four-way wave (disconnection is a matter for both parties)

Disconnection is a matter for both parties and requires the consent of both parties.

During the four wave phases, both the client and the server can actively initiate a disconnection.

When one party wants to disconnect, the other party may also want to disconnect, which may cause four waves to become three waves.

The final state of the party that actively disconnects is TIME_WAIT, while the party that is passively disconnected enters CLOSE_WAIT after the first two waves. If a large number of connections on that side sit in CLOSE_WAIT, either the server is under too much pressure and has not had time to call close (it still has data that has not been pushed to the upper layer), or the call to close has simply been forgotten.

After four waves are completed, the party that actively disconnects will maintain TIME_WAIT for a period of time.

How long to maintain it: twice the MSL. (MSL: the longest time a TCP segment can survive in the network; after this time the segment is discarded. The default value of MSL is 2 minutes; on CentOS 7 it is 1 minute.)

Why it needs to be maintained for a period of time: 1. The active closer cannot be sure the peer received the last ACK of the four-way wave, so it waits for a while; if the passive closer retransmits the FIN of the third wave, the active closer can retransmit the ACK of the fourth wave, doing its best to make the fourth wave succeed. 2. When both parties disconnect, old segments may still be drifting in the network, and time is needed for all of them to disappear (otherwise, if the server restarted immediately, it might receive stale data intended for the previous process, and that data would very likely be wrong).

If the server shuts down first and is restarted shortly afterwards, it can no longer bind the port it used last time; it would have to wait until the previous connection reaches CLOSED before the port can be reused. For a heavily visited server, a crash would leave a large number of ports unbindable for a while, which is unreasonable. Using setsockopt() with SO_REUSEADDR allows a socket to bind the port even while the previous connection is still in TIME_WAIT (it also allows multiple sockets with the same port number but different IP addresses).

/* See getsockopt(2) / setsockopt(2) */
#include <sys/types.h>          /* See NOTES */
#include <sys/socket.h>

int getsockopt(int sockfd, int level, int optname,
               void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname,
               const void *optval, socklen_t optlen);

/* Allow the listening socket to reuse the address/port even if the
 * previous connection is still in TIME_WAIT. */
int opt = 1;
setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

7. Flow control

As mentioned in the chapter above, the TCP segment has a 16-bit window size field. In fact, the TCP header options (up to 40 bytes) can also contain a window scale factor M; the actual window size is the value of the window field shifted left by M bits. These fields are used to exchange the size of each side's receive buffer; during the three-way handshake the two parties exchange their receive buffer sizes in passing.

When the receive buffer is full, there are two strategies: 1. when the upper layer at the receiving end takes data away, the receiver sends a window update notification to the sender; 2. the sender asks the receiver for its window size from time to time (a window probe).

8. Sliding window/fast retransmission mechanism

8.1 The nature of sliding windows

The sliding window is essentially part of the send buffer: the portion that has been sent (or can be sent directly) but has not yet been acknowledged is called the sliding window.

The movement of the sliding window is essentially the movement of the array subscript.

8.2 The role of sliding windows

  1. Flow control: The TCP sliding window mechanism allows the receiver to control the sending rate of the sender to ensure that the receiver can receive and process data within the processing capabilities. The receiver informs the sender by adjusting the size of the sliding window, and it is also able to dynamically adjust the size of the sliding window based on the receiver's processing power. The sliding window improves TCP's data transmission efficiency to a certain extent.
  2. Congestion control: The TCP sliding window mechanism can help control congestion in the network. When network congestion occurs, the receiver can reduce the size of the sliding window to slow down the data sending rate, thereby avoiding further load pressure on the network. By dynamically adjusting the size of the sliding window, TCP can adaptively control the data transmission rate according to the degree of network congestion.
  3. Reliable transmission: TCP sliding window mechanism provides reliable data transmission. The sender will only send the data of the next sliding window after receiving the data confirmed by the receiver, and the receiver will also confirm the received data at the same time. If the sender does not receive an acknowledgment or the receiver receives out-of-order data, the sender can selectively retransmit the data through the sliding window mechanism to ensure reliable transmission of data.

8.3 Some questions and answers about sliding windows

How is the initial size of the sliding window set, and how does it change later?

The sliding window size is the maximum amount of data that can be sent directly without waiting for an acknowledgment. The window size in the picture above is 4000 bytes (four segments). The size of the sliding window is related to the receiving capability of the other party.

Does the sliding window always slide to the right? Can it slide to the left?

The sliding window may slide to the right, or it may stay still (when the peer's receive buffer is full, the window pauses). It never slides to the left, because the data to its left has already been sent and acknowledged.

Does the size of the sliding window change, and if so, how?

The sliding window changes dynamically according to the receiving capability of the other party, and may increase or decrease.

Responses to the data in the sliding window may arrive out of order. If data in the middle or at the tail is acknowledged first, how should the sliding window be handled?

Out-of-order ACKs do not affect the window; only when the ACK for the leftmost data arrives does the window move to the right.

The most feared problem, packet loss:

1. The data arrives but the ACK is lost. This is not a problem, because TCP's 32-bit acknowledgment number is cumulative. For example, if the ACK for sequence numbers 1001-2000 is lost, then as long as the sender receives the ACK for 2001-3000 (an acknowledgment larger than 2000), it knows that everything before it has been received, and the window slides directly to 3001.

2. The data itself is lost (the server never receives it, so of course there is no ACK). When the sender subsequently receives three ACKs carrying the same acknowledgment number, the fast retransmission mechanism is triggered, as sketched below.
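A toy sketch of the duplicate-ACK rule: the sender counts ACKs that repeat the same acknowledgment number, and the third duplicate triggers a fast retransmit; the retransmit_from() helper is hypothetical:

/* Toy sketch of fast retransmit: three ACKs carrying the same acknowledgment
 * number trigger an immediate retransmission of the missing segment. */
#include <stdint.h>

void retransmit_from(uint32_t seq);   /* hypothetical: resend data from seq */

static uint32_t last_ack = 0;
static int dup_count = 0;

void on_ack(uint32_t ack)
{
    if (ack == last_ack) {
        if (++dup_count == 3)         /* third duplicate ACK */
            retransmit_from(ack);     /* fast retransmit, no timeout needed */
    } else {
        last_ack = ack;               /* window slides; reset the counter */
        dup_count = 0;
    }
}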

What if the window keeps sliding to the right and runs out of space at the end of the buffer?

The send buffer is organized into a ring structure by the operating system kernel. (Similar to the producer-consumer model of the ring structure)

Since the sliding window takes into account the receiving capability of the peer, why can't all the data in the sliding window be sent out at once?

This is because the data-link-layer MAC frame limits its payload to 1500 bytes, so the maximum payload of a single transport-layer transmission is 1460 bytes (the maximum segment size, MSS). A single transport-layer payload larger than 1460 bytes is still allowed, but the IP layer would then have to fragment and reassemble the oversized message, which increases the probability of packet loss.

9. Congestion control

There are too many cars on the road, so traffic jams are inevitable. The same is true for the network. Multiple hosts conducting TCP communications at the same time will also cause network congestion. Each receiving host suddenly experiences a large number of packet losses. If a large number of lost packets are rashly retransmitted over a timeout at this time, it will undoubtedly worsen the situation of the network.

When congestion occurs, TCP enables the slow start mechanism: every host first sends only a small amount of data to probe the path (reducing the number of segments in flight and greatly easing the pressure on the network), and only after learning the network's condition does it increase the sending rate. As shown below:

The figure also introduces the concept of a congestion window (whose size changes with the network's condition), which reflects the carrying capacity of the network.

Slow start mechanism:

1. When sending starts, define the congestion window size as 1;

2. Each time an ACK is received, the congestion window increases by 1, so per round trip it doubles (1 becomes 2, 2 becomes 4, 4 becomes 8: exponential growth). In the early phase this exponential growth is the slow-start probing of the path; if probing succeeds, the network is fine, and the rapid exponential growth quickly restores communication to full speed. Doubling forever in the later phase is meaningless, so the congestion window cannot simply keep doubling; a threshold called the slow-start threshold is introduced, and once the congestion window exceeds it, growth becomes linear instead of exponential.

3. The actual sent window size = min (congestion window, the 16-bit window size in the TCP header provided by the peer host);

Explanation for the third point:

The sending capability of the sender depends on the following three situations:

1. The sliding window size of the sender;

2. The 16-bit window size in the TCP header provided by the peer host. (Peer receiving buffer)

3. The congestion window of the current network

1. When TCP starts, the slow start threshold is equal to the maximum value of the window; (equal to 24 in the figure)

2. During each timeout and retransmission, the slow start threshold will become half of the original value, and the congestion window will be reset to 1 (the threshold in the figure is reduced from 24 to 12)
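A toy sketch of these rules, with all sizes in MSS units and the growth modeled per round trip for simplicity; the starting threshold of 24 follows the figure:

/* Toy sketch of the congestion-control rules above (sizes in MSS units). */
static unsigned cwnd = 1;        /* congestion window, starts at 1 */
static unsigned ssthresh = 24;   /* slow-start threshold (24 as in the figure) */

unsigned send_window(unsigned peer_window)
{
    /* actual send window = min(congestion window, peer's advertised window) */
    return cwnd < peer_window ? cwnd : peer_window;
}

void on_round_trip_acked(void)
{
    if (cwnd < ssthresh)
        cwnd *= 2;               /* slow start: exponential growth */
    else
        cwnd += 1;               /* past the threshold: linear growth */
}

void on_timeout(void)
{
    ssthresh = cwnd / 2;         /* threshold drops to half */
    if (ssthresh < 2)
        ssthresh = 2;
    cwnd = 1;                    /* window resets and slow start restarts */
}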

10. Delayed response

Suppose the sender sends a segment into the peer's receive buffer, and after receiving it the peer finds that only 1M of space remains. If it responds immediately and reports 1M, the sender will perform flow control based on a 1M window next time. If the peer instead delays its response, there is a good chance that the upper layer will take some data out of the receive buffer in the meantime, so the peer can advertise a larger window and also reduce the number of responses it sends (TCP's 32-bit acknowledgment number guarantees that all messages with earlier sequence numbers have been received). Delayed acknowledgments can therefore increase network throughput.

Of course, this does not mean that every packet is delayed in response.

1. Quantity limit: respond once every N packets;

2. Time limit: Respond once if the maximum delay time is exceeded; (must be less than the timeout retransmission time)

The specific number and timeout period vary depending on the operating system; generally N is 2 and the timeout period is 200ms;
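On Linux, delayed acknowledgments can be temporarily disabled per socket with the TCP_QUICKACK option; a hedged sketch follows (the effect is not permanent, so applications that rely on it typically set it again after each read):

/* Sketch: ask Linux to send ACKs immediately on this socket instead of
 * delaying them.  The setting is not sticky, so it is usually reapplied
 * after each recv(). */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

void disable_delayed_ack(int connfd)
{
    int one = 1;
    setsockopt(connfd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
}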

11. Piggybacking on responses

When one end sends a message to the other, the receiving end can reduce the number of segments transmitted by setting the ACK flag to 1, filling in the 32-bit acknowledgment number, attaching its own payload, and sending it all as one segment. (A response carrying a payload is a common communication pattern in TCP and improves transmission efficiency.)

12. Understand TCP’s byte stream orientation and UDP’s datagram orientation

The essence of the application layer calling write is to copy data into the send buffer; if the send buffer is full, write blocks. The essence of calling read is to copy data out of the receive buffer; if the receive buffer is empty, read blocks. Since both sides have both a send buffer and a receive buffer, TCP is a full-duplex communication protocol.

The TCP protocol does not care how long the messages handed down by the application layer are; in its eyes, everything the application layer gives it is just bytes. Because of the buffers, the reads and writes of the two sides do not need to match: how much the upper layer writes or reads, and how those bytes are combined back into messages, is entirely the application's problem to solve. For example, the application layer may call write several times to write a total of 100 bytes, and the peer may call read once to read all 100 bytes. (UDP, by contrast, is datagram-oriented: each send transmits one complete message and each receive returns one complete message, which can then be serialized and deserialized; this transmission of independent messages is what "datagram-oriented" means.)

13. Sticky packet problem

As mentioned above, TCP does not assemble messages; the application layer has to solve the sticky packet problem by itself.

The essence of solving the sticky packet problem is to make the boundaries between messages clear.

How to clarify boundaries:

1. Fixed-length strategy: for fixed-length packets, read a fixed size every time; for example, if the Request structure mentioned above has a fixed size, just read sizeof(Request) bytes at a time starting from the beginning of the buffer;

2. Length-field strategy: for variable-length packets, agree on a field at the start of the header that holds the total packet length, so the end of the packet is known (see the sketch after this list);

3. Delimiter strategy: for variable-length packets, clear delimiters can also be placed between packets (the application-layer protocol is defined by the programmer; the only requirement is that the delimiter does not conflict with the message body).
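A minimal sketch of the length-field strategy (strategy 2), assuming a 4-byte network-order length prefix and a simplified application-level receive buffer:

/* Sketch of length-prefixed framing: a 4-byte network-order length precedes
 * each message.  try_extract() returns the size of one complete message
 * copied into 'out', or 0 if the buffer does not yet hold a full message. */
#include <arpa/inet.h>   /* ntohl */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

size_t try_extract(char *buf, size_t *buf_len, char *out, size_t out_cap)
{
    if (*buf_len < 4)
        return 0;                          /* length prefix not complete yet */

    uint32_t msg_len;
    memcpy(&msg_len, buf, 4);
    msg_len = ntohl(msg_len);

    if (msg_len > out_cap || *buf_len < 4 + msg_len)
        return 0;                          /* whole message not here yet */

    memcpy(out, buf + 4, msg_len);         /* hand one full message upward */

    /* Remove the consumed bytes; the rest stays for the next extraction. */
    memmove(buf, buf + 4 + msg_len, *buf_len - 4 - msg_len);
    *buf_len -= 4 + msg_len;
    return msg_len;
}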

14. The problem of abnormal TCP connection establishment

1. The TCP client or server process (or both) crashes or exits.

Before the crash, the client and server were connected. When a process exits, the operating system automatically performs the four-way wave on its behalf and disconnects normally (no different from calling close manually).

2. The machine is rebooted.

The operating system terminates the processes before restarting, so this case is no different from a normal process exit.

3. The machine is powered off or the network cable is unplugged.

The end that lost power has no chance to notify the other end. The other end will periodically probe the connection and will release it after several probes go unanswered. (Much of this liveness management also happens at the application layer: certain application-layer protocols, such as HTTP with long connections, periodically check the peer's status as well; QQ, for example, periodically retries the TCP three-way handshake to reconnect after a disconnection.)

5. Summary of UDP/TCP protocol

1. TCP reliability and performance

Reliability:

Checksum, sequence number (arrival in order), confirmation response, timeout retransmission, connection management, flow control, congestion control

Performance improvements:

Sliding window, fast retransmission, delayed response, piggyback response, flow control, congestion control

2. Applicable scenarios of UDP and TCP protocols

UDP is used in fields that demand high transmission speed and real-time behavior, such as live streaming, early QQ, and video transmission; in addition, UDP can be used for broadcasting.

TCP is used for reliable transmission and is used in scenarios such as file transfer and important status updates;

Which protocol is used ultimately depends on the company's leadership.

3. Regarding the second parameter of listen

int listen(int sockfd, int backlog);

The TCP protocol maintains a full-connection queue for the upper layer; it holds connections that have completed the three-way handshake but have not yet been accepted. This queue must exist: much like queuing outside a restaurant, as soon as the previous customer leaves, the next one in line can be seated immediately, which greatly improves TCP's responsiveness. Of course, the queue cannot be too long either (would you be willing to wait in an overly long line?), because maintaining it has a cost. The length of this full-connection queue is affected by the second parameter of listen.

With the second parameter of listen set to 2, when the fourth connection is established (client port 36648) you can see that the client believes the connection succeeded, yet on the server side the connection is stuck in SYN_RECV. (The client initiates the connection; the server receives the SYN, enters SYN_RECV, and replies with SYN+ACK, so the client completes its side of the handshake. But when the client sends the ACK of the third handshake, the server ignores it and does not establish the connection; it stays in SYN_RECV, which is called the half-connection state.)

At the bottom layer, TCP allows at most backlog + 1 full connections; connections beyond that remain half connections. (If a half connection does not complete the handshake soon, the server automatically closes it.)

1. Half-connection queue (holds requests in the SYN_SENT / SYN_RECV state)

2. Full-connection queue (accept queue) (holds connections that are in the ESTABLISHED state but have not yet been taken by accept at the application layer); a sketch of the experiment follows.
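A minimal sketch of the experiment described above: a server that listens with a backlog of 2 and never calls accept, so connections pile up in the full-connection queue and later handshakes stall in SYN_RECV (port 8080 is an arbitrary choice):

/* Sketch: listen with a backlog of 2 and never call accept(), so at most
 * backlog + 1 connections become fully established; later handshakes stall
 * in SYN_RECV on the server. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);              /* arbitrary port for the demo */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    bind(listenfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listenfd, 2);   /* second parameter: full-connection queue length */

    /* Never call accept(); connect several clients and watch the states
     * with netstat to see ESTABLISHED vs. SYN_RECV. */
    for (;;)
        sleep(1);
}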

(Note: You can use Wireshark to analyze the TCP communication process.)

Origin: blog.csdn.net/gfdxx/article/details/132126000