Understanding the TCP protocol in five minutes

TCP is one of the core protocols of the Internet. This article introduces the basics.

1. The role of the TCP protocol

The Internet is built on a stack of protocols. TCP is just one layer of that stack, with its own job to do.

(Picture description: TCP sits above the Ethernet and IP protocols and below the application-layer protocols.)

The lowest layer, the Ethernet protocol, specifies how electrical signals are formed into packets and solves point-to-point communication within a subnet.

(Picture description: The Ethernet protocol solves point-to-point communication within a local area network.)

However, the Ethernet protocol cannot solve how multiple LANs communicate with each other. That is handled by the IP protocol.

(Picture description: The IP protocol can connect multiple LANs.)

The IP protocol defines its own set of address rules, called IP addresses. It implements routing, which allows host A on one local area network to send messages to host B on another.

(Picture description: Routers are based on the IP protocol. LANs are connected to each other by routers.)

The principle of routing is simple. A router has many network ports on its back, each connected to a network cable. Inside the router is a routing table, which specifies that packets for IP address range A leave through port one, packets for range B leave through port two, and so on. Data packets are forwarded by following these "guide signs".
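The routing-table idea can be sketched in a few lines of Python. This is only an illustration: the address ranges and the port names (`port1`, `port2`, `port3`) are made up, and the "most specific matching range wins" rule is the usual convention for IP routing rather than something spelled out in this article.

```python
import ipaddress

# Hypothetical routing table: (destination address range, outgoing port).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "port1"),
    (ipaddress.ip_network("10.0.0.0/8"), "port2"),
    (ipaddress.ip_network("0.0.0.0/0"), "port3"),  # default route, matches everything
]

def lookup(destination: str) -> str:
    """Return the outgoing port of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, port) for net, port in ROUTES if addr in net]
    # The route with the longest prefix is the most specific match.
    _, port = max(matches, key=lambda match: match[0].prefixlen)
    return port

print(lookup("192.168.1.7"))   # falls in the 192.168.1.0/24 range
print(lookup("10.20.30.40"))   # falls in the 10.0.0.0/8 range
print(lookup("8.8.8.8"))       # only the default route matches
```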

(Picture description: The routing table of the local machine, which shows the interface that packets for each IP destination are sent to.)

The IP protocol is only an addressing protocol and does not guarantee the integrity of the data. If a router drops packets (for example, its buffer is full and newly arriving packets are discarded), something has to detect which packet was lost and resend it. That is the job of the TCP protocol. Simply put, the role of TCP is to ensure the integrity and reliability of data communication, so that lost packets do not result in lost data.

2. The size of the TCP packet

The size of an Ethernet packet has an upper limit, initially 1518 bytes and later raised to 1522 bytes. Of these, 1500 bytes are payload and 22 bytes are header information. The IP packet sits in the payload of the Ethernet packet. It has its own header, which takes at least 20 bytes, so the payload of an IP packet is at most 1480 bytes.

(Picture description: IP packets travel inside Ethernet packets, and TCP packets travel inside IP packets.)

TCP packets sit in the payload of IP packets. A TCP packet needs at least 20 bytes of header information of its own, so its maximum payload is 1480 - 20 = 1460 bytes. Because the IP and TCP headers often carry additional information, the TCP payload is in practice around 1400 bytes. A 1500-byte message therefore needs two TCP packets. One major improvement in HTTP/2 is that it compresses HTTP header information, so that an HTTP request can fit in a single TCP packet instead of being split across several, which improves speed.

(Picture description: The payload of an Ethernet packet is 1500 bytes, and the payload of a TCP packet is about 1400 bytes.)
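The arithmetic in this section can be checked with a short sketch. The constant and function names are invented for illustration; the byte counts are the ones quoted above.

```python
import math

ETHERNET_PAYLOAD = 1500  # bytes of payload in one Ethernet packet
IP_HEADER_MIN = 20       # minimum IP header size in bytes
TCP_HEADER_MIN = 20      # minimum TCP header size in bytes

def max_tcp_payload(ip_header: int = IP_HEADER_MIN,
                    tcp_header: int = TCP_HEADER_MIN) -> int:
    """Bytes left for application data after the IP and TCP headers."""
    return ETHERNET_PAYLOAD - ip_header - tcp_header

def packets_needed(message_bytes: int, payload_per_packet: int) -> int:
    """Number of TCP packets required to carry a message of the given size."""
    return math.ceil(message_bytes / payload_per_packet)

print(max_tcp_payload())            # 1460
print(packets_needed(1500, 1400))   # 2
```

Larger headers shrink the payload: calling `max_tcp_payload(ip_header=40, tcp_header=60)` gives the 1400-byte figure used as a working estimate in this article.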

3. The number of the TCP packet (SEQ)

With a payload of about 1400 bytes per packet, a large amount of data must be split across many packets. For example, a 10 MB file needs more than 7,100 packets. When sending, TCP assigns each packet a sequence number (SEQ for short), so that the receiver can put the packets back in order and, if a packet is lost, work out which one is missing.

The number of the first packet is a random value. For ease of understanding, call it packet 1 here. Assuming the payload of this packet is 100 bytes, the number of the next packet must be 101. In other words, each packet yields two numbers: its own number and the number of the next packet. From these, the receiver knows the order in which the packets must be reassembled into the original file.

(Picture description: The current packet number is 45943 and the next packet number is 46183, which shows that the payload of this packet is 240 bytes.)
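A minimal sketch of the numbering scheme described above: each packet's number is the previous number plus the previous payload length. The function name `segment` and the tuple layout are assumptions made for this example, and the first number is passed in rather than chosen at random.

```python
def segment(data: bytes, first_seq: int, payload_size: int):
    """Split data into (seq, next_seq, chunk) tuples of at most payload_size bytes."""
    packets = []
    seq = first_seq
    for offset in range(0, len(data), payload_size):
        chunk = data[offset:offset + payload_size]
        # The next packet's number is this packet's number plus its payload length.
        packets.append((seq, seq + len(chunk), chunk))
        seq += len(chunk)
    return packets

packets = segment(b"x" * 250, first_seq=1, payload_size=100)
for seq, next_seq, chunk in packets:
    print(seq, next_seq, len(chunk))
# 1 101 100
# 101 201 100
# 201 251 50
```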

4. Assembly of TCP packets

After TCP packets are received, reassembly is done by the operating system. The application does not handle TCP packets directly and does not need to care about the details of data communication. Unless the line fails, the application always receives complete data. The data the application needs is carried inside TCP packets in its own format (for example, HTTP).

TCP provides no mechanism for indicating the size of the original file; that is specified by the application-layer protocol. For example, HTTP has a Content-Length header that gives the size of the message body. The operating system simply keeps receiving TCP packets and assembles them in order, with none missing.

The operating system does not process the data inside the TCP packets. Once the packets are assembled, it hands the data over to the application. Each TCP packet carries a port parameter, which identifies the application that the data should be handed to.

(Picture description: The system hands the assembled data to the corresponding application according to the port in the TCP packet. In the figure, port 21 is the FTP server, port 25 is the SMTP service, and port 80 is the web server.)

Once the application receives the assembled raw data, a browser, for example, uses the Content-Length field of the HTTP protocol to read out exactly one message at a time. This also means that a single TCP connection can carry multiple HTTP exchanges.
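The two steps above, in-order assembly followed by cutting the stream with Content-Length, can be sketched as follows. This is a simplified illustration, not how an operating system or browser is actually implemented: the function names are invented, and the "messages" contain only a Content-Length header and a body, without the other lines of a real HTTP message.

```python
def reassemble(packets):
    """Order (seq, chunk) pairs by sequence number and join the payloads."""
    stream = b""
    expected = None
    for seq, chunk in sorted(packets):
        if expected is not None and seq != expected:
            raise ValueError(f"missing data before sequence number {seq}")
        stream += chunk
        expected = seq + len(chunk)
    return stream

def split_messages(stream: bytes):
    """Cut a byte stream into message bodies using each Content-Length header."""
    bodies = []
    while stream:
        header, _, rest = stream.partition(b"\r\n\r\n")
        length = int(header.split(b"Content-Length:")[1].strip())
        bodies.append(rest[:length])
        stream = rest[length:]
    return bodies

# Two simplified messages sent back to back over one connection.
original = (b"Content-Length: 5\r\n\r\nhello"
            b"Content-Length: 3\r\n\r\nbye")
# The packets arrive out of order; sequence numbers start at 1.
arrived = [(41, original[40:]), (1, original[:20]), (21, original[20:40])]

print(split_messages(reassemble(arrived)))  # [b'hello', b'bye']
```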

5. Slow start and ACK

Naturally, the server would like to send data packets as fast as possible, ideally all at once. But sending too fast risks packet loss. Limited bandwidth, overheated routers, buffer overflows, and many other factors can cause packets to be dropped. On a poor line, the faster you send, the more you lose.

The ideal is to reach the highest rate the line permits. But how do we find out the ideal rate for the other party's line? The answer is to probe gradually. To balance efficiency and reliability, TCP includes a slow start mechanism. Transmission starts slowly, and the rate is then adjusted according to packet loss: if no packets are lost, the sender speeds up; if packets are lost, it slows down.

The Linux kernel sets the starting point with the constant TCP_INIT_CWND: when communication begins, the sender sends 10 packets at a time, that is, the "send window" has a size of 10. It then stops, waits for the receiver's acknowledgement, and continues sending.

By default, the receiver sends one confirmation message for every two TCP packets it receives. This confirmation message is called an acknowledgement, abbreviated as ACK. An ACK carries two pieces of information:

The number of the next packet the receiver expects
The remaining capacity of the receiver's receive window

With these two pieces of information, plus the number of the latest packet it has sent, the sender can estimate roughly how fast the receiver is taking in data and reduce or increase its sending rate accordingly. The amount it may send at one time is called the "send window", and the size of this window is variable.
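As a rough sketch of how a sender might combine these numbers: the gap between the next number it would send and the number in the ACK is data still in transit, and the remaining receive window minus that gap is how much more it can send without overrunning the receiver. The function names are hypothetical, and the sketch deliberately ignores the congestion-related limit described in this section.

```python
def bytes_in_flight(next_seq_to_send: int, ack_number: int) -> int:
    """Data sent but not yet acknowledged by the receiver."""
    return next_seq_to_send - ack_number

def sendable_bytes(next_seq_to_send: int, ack_number: int,
                   receiver_window: int) -> int:
    """How much more the sender may send without overrunning the receiver."""
    in_flight = bytes_in_flight(next_seq_to_send, ack_number)
    return max(0, receiver_window - in_flight)

# The sender's next number is 5001, the ACK asks for 3001, the window is 4000.
print(bytes_in_flight(5001, 3001))        # 2000
print(sendable_bytes(5001, 3001, 4000))   # 2000
```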

(Picture description: Each ACK carries the number of the next expected packet and the remaining capacity of the receive window. Both parties send ACKs.)

Note that because TCP communication is bidirectional, both parties need to send ACKs. The window sizes of the two parties are likely to differ. An ACK is just a few simple fields, so it is usually combined with data and sent in the same packet.

(Picture description: The figure shows four communications. In the first, host A sends host B a packet numbered 1 with a length of 100 bytes, so in the second communication host B's ACK number is 1 + 100 = 101, and in the third communication host A's packet number is also 101. Similarly, in the second communication host B sends host A a packet numbered 1 with a length of 200 bytes, so in the third communication host A's ACK number is 201, and in the fourth communication host B's packet number is also 201.)

Even on a connection with high bandwidth and a good line, TCP always starts at 10 packets and increases gradually until it reaches the highest transmission rate. This is TCP's slow start.
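The "speed up without loss, slow down on loss" behavior can be illustrated with a toy simulation. The specific rules used here, doubling the window after a loss-free round and halving it after a loss, are simplifying assumptions for this sketch, not a description of what the Linux kernel does in detail.

```python
INITIAL_WINDOW = 10  # packets, the starting value quoted above

def simulate(rounds_with_loss):
    """Return the window size used in each round.

    rounds_with_loss holds one boolean per round: True if a loss was detected.
    """
    window = INITIAL_WINDOW
    history = []
    for lost in rounds_with_loss:
        history.append(window)
        if lost:
            window = max(1, window // 2)  # assumed rule: halve after a loss
        else:
            window *= 2                   # assumed rule: double when nothing is lost
    return history

print(simulate([False, False, False, True, False]))  # [10, 20, 40, 80, 40]
```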

6. Handling lost data packets

TCP can guarantee the integrity of data communication. How does it do this? As mentioned earlier, the number of the next packet can be derived from each packet. If the next packet is not received, the number in the ACK does not change.

For example, suppose the receiver has received packet No. 4 but not packet No. 5. Its ACK records that it expects packet No. 5. If packet No. 5 arrives a little later, the next ACK updates the number. If packet No. 5 still has not arrived but packet No. 6 or No. 7 has, the number in the ACK stays the same and keeps showing packet No. 5. The result is a series of duplicate ACKs.

If the sender receives three consecutive duplicate ACKs, or receives no ACK at all before a timeout, it concludes that a packet has been lost, namely packet No. 5, and sends it again. Through this mechanism, TCP guarantees that no data is lost.

(Picture description: When host B does not receive packet No. 100, it keeps sending the same ACK, which triggers host A to resend packet No. 100.)
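The duplicate-ACK mechanism can be sketched with the same simplified packet numbers (1, 2, 3, ...) used in the example above. Both function names are invented for this illustration, the receiver is assumed to acknowledge every arrival, and the timeout path is left out.

```python
def receiver_acks(arrivals):
    """Return the ACK (next expected packet number) sent after each arrival."""
    received = set()
    expected = 1
    acks = []
    for number in arrivals:
        received.add(number)
        # Advance past every packet that has now arrived in sequence.
        while expected in received:
            expected += 1
        acks.append(expected)
    return acks

def detect_loss(acks, threshold=3):
    """Return the packet to resend once `threshold` duplicate ACKs are seen, else None."""
    previous = None
    duplicates = 0
    for ack in acks:
        if ack == previous:
            duplicates += 1
            if duplicates == threshold:
                return ack
        else:
            previous = ack
            duplicates = 0
    return None

acks = receiver_acks([1, 2, 3, 4, 6, 7, 8])  # packet 5 never arrives
print(acks)               # [2, 3, 4, 5, 5, 5, 5]
print(detect_loss(acks))  # 5
```

Once the resent packet 5 arrives, the receiver's ACK jumps straight to 9, because packets 6 to 8 were already waiting.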

Origin blog.csdn.net/Coo123_/article/details/105225059