TCP: How to ensure that a page file is delivered completely to the browser?

When measuring the performance of a Web page, one important metric is FP (First Paint): the time from when the page starts loading to when it is first painted. This metric directly affects the bounce rate. A faster page response means more page views (PV), higher engagement, and a higher conversion rate. So what affects FP? One important factor is network loading speed.

To optimize the loading speed of Web pages, you need a good understanding of the network, and the key to that is a solid understanding of network protocols. Whether you use HTTP or WebSocket, both are built on TCP/IP. Once you understand these principles well enough, you will know how to optimize Web performance and will be able to locate Web problems more easily. In addition, the design ideas behind TCP/IP can broaden your knowledge and improve your overall understanding of a project and your problem-solving ability.

So, in this article, I'll give you an overview of how TCP/IP works in the Web world. Protocols are not the focus of this column, so I will use HTTP as the context and analyze the core path of a network request from my own perspective. If you want a deeper understanding of network protocols, I recommend teacher Liu Chao's column "Interesting Talk about Network Protocol" and teacher Tao Hui's video course "Web Protocol Detailed Explanation and Packet Capture Actual Combat".

OK, let's get back to the topic and start today's content. On the network, a file is usually divided into many data packets for transmission, and packets can easily be lost or corrupted in transit. So how do we ensure that the page file is delivered completely to the browser?

This article answers that question from the perspective of the data packet.

The "journey" of a packet

Next, I will describe the data transmission process from three angles: "how the data packet reaches the host", "how the host hands the data packet to the application", and "how the data is delivered completely to the application".

The Internet is actually an architecture composed of concepts and protocols. A protocol is a well-known set of rules and standards: if all parties agree to follow it, they can communicate without barriers.

Data on the Internet is transmitted in packets. If the data to be sent is large, it is split into many small packets. For example, the audio you are listening to now is transmitted as many small packets, not as one large file sent all at once.
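As a minimal sketch of this splitting, the snippet below cuts a block of data into fixed-size pieces. The function name `split_into_packets` and the 1460-byte payload size are my own assumptions for illustration (1460 bytes is a common payload size on Ethernet, but the real value depends on the network).

```python
def split_into_packets(data: bytes, max_payload: int) -> list:
    """Cut data into consecutive pieces of at most max_payload bytes."""
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]


# Hypothetical example: 3500 bytes of data, 1460 bytes of payload per packet.
packets = split_into_packets(b"x" * 3500, 1460)
sizes = [len(p) for p in packets]
print(sizes)  # [1460, 1460, 580]
```

Joining the pieces back together in the same order gives the original data, which is exactly what the receiving side has to achieve.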

1. IP: Send the data packet to the destination host

For data packets to be transmitted on the Internet, they must conform to the Internet Protocol (IP) standard. Every device on the Internet has a unique address. The address is just a number, much like a household's mailing address: as long as you know a household's exact address, you can send a package to it and the logistics system will deliver it to its destination.

A computer's address is called an IP address. Visiting any website is really just your computer requesting information from another computer.

To send a data packet from host A to host B, host B's IP address is attached to the packet before transmission so that it can be routed correctly. Host A's own IP address is attached as well, so that host B can reply to host A. This additional information is packed into a data structure called the IP header. The IP header sits at the beginning of the IP packet and includes the IP version, source IP address, destination IP address, time to live (TTL), and other fields. If you want to know more about the IP header, you can refer to this link.
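To make the IP header more concrete, here is a rough sketch that packs and unpacks a 20-byte IPv4 header (without options) using Python's `struct` module. The helper names and the example addresses are assumptions of mine, and the checksum field is simply left at 0, so this is an illustration of the layout rather than a header a real network stack would accept.

```python
import ipaddress
import struct

# version/IHL, TOS, total length, identification, flags/fragment offset,
# TTL, protocol, header checksum, source address, destination address
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")  # 20 bytes, no options


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      ttl: int = 64, protocol: int = 17) -> bytes:
    version_ihl = 0x45  # version 4, header length 5 * 4 = 20 bytes
    total_length = IPV4_HEADER.size + payload_len
    return IPV4_HEADER.pack(
        version_ihl, 0, total_length, 0, 0, ttl, protocol,
        0,  # checksum is not computed in this sketch
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )


def parse_ipv4_header(header: bytes) -> dict:
    (version_ihl, _tos, total_length, _ident, _frag,
     ttl, protocol, _checksum, src, dst) = IPV4_HEADER.unpack(header[:IPV4_HEADER.size])
    return {
        "version": version_ihl // 16,
        "header_length": (version_ihl % 16) * 4,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,  # 17 means the payload is UDP, 6 means TCP
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


# Hypothetical hosts A and B, using documentation-only addresses.
header = build_ipv4_header("192.0.2.1", "198.51.100.7", payload_len=9)
fields = parse_ipv4_header(header)
print(fields)
```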

To make this easier to understand, I first divide the network into a simple three-layer structure, as shown in the following figure:

Simplified three-layer transmission model of IP network

Let's follow a packet's journey from host A to host B:

  • The upper layer hands the packet containing "geek time" to the network layer;
  • The network layer attaches an IP header to the packet to form a new IP packet and hands it to the bottom layer;
  • The bottom layer transmits the packet to host B over the physical network;
  • The packet reaches the network layer of host B, where host B strips off the IP header and hands the data part to the upper layer;
  • Finally, the packet containing "geek time" reaches the upper layer of host B.

2. UDP: Send the data packet to the application

IP is a very low-level protocol. It is only responsible for getting the data packet to the other computer; that computer still does not know which program should receive it. Should it go to the browser or to Honor of Kings? So we need a protocol built on top of IP that can deal with applications. The most common one is the User Datagram Protocol, or UDP.

The most important piece of information in UDP is the port number. A port number is just a number, and every program that wants to access the network must bind to one. UDP uses the port number to deliver a packet to the intended program. So IP uses the IP address to send the packet to the right computer, and UDP uses the port number to hand the packet to the right program. Just as the addresses are packed into the IP header, the port number is packed into a UDP header, which is combined with the original data to form a new UDP packet. Besides the destination port, the UDP header also contains the source port number and other information.

To support UDP, I expand the previous three-layer structure into a four-layer structure by adding a transport layer between the network layer and the upper layer, as shown in the following figure:
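The role of the port number can be sketched roughly as follows. The UDP header really is four 16-bit fields (source port, destination port, length, checksum), but the `applications` table, the port numbers, and the function names here are hypothetical stand-ins for what the operating system does when programs bind to ports.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # source port, destination port, length, checksum


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = UDP_HEADER.size + len(payload)
    # A checksum of 0 means "not computed", which UDP over IPv4 allows.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def deliver(datagram: bytes, applications: dict) -> str:
    """Hand the payload to whichever program is bound to the destination port."""
    _src_port, dst_port, length, _checksum = UDP_HEADER.unpack(datagram[:UDP_HEADER.size])
    payload = datagram[UDP_HEADER.size:length]
    handler = applications.get(dst_port)
    if handler is None:
        return "dropped: no program bound to port %d" % dst_port
    return handler(payload)


# Hypothetical programs bound to hypothetical ports.
applications = {
    5000: lambda data: "browser got " + data.decode(),
    6000: lambda data: "game got " + data.decode(),
}

datagram = build_udp_datagram(40000, 5000, b"geek time")
result = deliver(datagram, applications)
print(result)  # browser got geek time
```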

Simplified four-layer transmission model of UDP network

Let's follow a packet's route from host A to host B:

  • The upper layer hands the data packet containing "geek time" to the transport layer;
  • The transport layer prepends a UDP header to the data packet to form a new UDP packet and hands it to the network layer;
  • The network layer attaches an IP header to the packet to form a new IP packet and hands it to the bottom layer;
  • The bottom layer transmits the packet to host B over the physical network;
  • The packet reaches the network layer of host B, where host B strips off the IP header and passes the data part to the transport layer;
  • At the transport layer, the UDP header is stripped off, and the data part is handed to the upper-layer application identified by the port number in the UDP header;
  • Finally, the packet containing "geek time" reaches the upper-layer application on host B.
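The steps above can be sketched as a pair of functions, one walking down the layers on host A and one walking back up on host B. To keep it short I use toy headers that carry only ports and addresses; these are not the real UDP or IP wire formats, and all names and addresses are assumptions for illustration.

```python
import ipaddress
import struct

TOY_UDP = struct.Struct("!HH")   # source port, destination port only
TOY_IP = struct.Struct("!4s4s")  # source address, destination address only


def send_down(message: bytes, src: tuple, dst: tuple) -> bytes:
    """Host A: upper layer -> transport layer -> network layer."""
    (src_ip, src_port), (dst_ip, dst_port) = src, dst
    udp_packet = TOY_UDP.pack(src_port, dst_port) + message  # transport layer adds its header
    ip_header = TOY_IP.pack(ipaddress.IPv4Address(src_ip).packed,
                            ipaddress.IPv4Address(dst_ip).packed)
    return ip_header + udp_packet  # network layer adds its header, then the bottom layer sends it


def receive_up(ip_packet: bytes) -> tuple:
    """Host B: network layer -> transport layer -> upper layer."""
    _src_ip, _dst_ip = TOY_IP.unpack(ip_packet[:TOY_IP.size])  # network layer strips the IP header
    udp_packet = ip_packet[TOY_IP.size:]
    _src_port, dst_port = TOY_UDP.unpack(udp_packet[:TOY_UDP.size])  # transport layer strips the UDP header
    return dst_port, udp_packet[TOY_UDP.size:]


wire = send_down(b"geek time", ("192.0.2.1", 40000), ("198.51.100.7", 5000))
port, message = receive_up(wire)
print(port, message)  # 5000 b'geek time'
```

Each layer only looks at its own header and treats everything after it as opaque data, which is why a new layer could be added without changing the others.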

When data is sent over UDP, many factors can cause packet errors. UDP can check whether the data is correct, but it provides no retransmission mechanism for bad packets: it simply discards them. The sender also has no way of knowing whether a packet reached its destination.

Although UDP cannot guarantee reliable delivery, it is very fast, so it is used where speed matters more than strict data integrity, such as online video and interactive games.
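Here is a simplified sketch of "check, then discard". It uses the 16-bit ones' complement Internet checksum, the same family of checksum UDP uses, but computed over the payload only (real UDP also covers the header and a pseudo-header). The `receive` function is a hypothetical stand-in for the receiving host.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement sum of the data, then inverted."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += int.from_bytes(data[i:i + 2], "big")
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total // 0x10000)  # fold the carry back in
    return ~total & 0xFFFF


def receive(payload: bytes, checksum: int):
    """Return the payload if it verifies; otherwise drop it silently."""
    if internet_checksum(payload) != checksum:
        return None  # discarded: no error report, no retransmission request
    return payload


payload = b"geek time"
checksum = internet_checksum(payload)

intact = receive(payload, checksum)
damaged = receive(b"geek tame", checksum)  # one byte corrupted in transit
print(intact, damaged)
```

Note that nothing flows back to the sender in either case, which is why the sender cannot tell a delivered packet from a discarded one.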

3. TCP: Deliver the data to the application in its entirety

For browser requests, e-mail, and other applications that require reliable data transmission, using UDP raises two problems:

  • Data packets are easily lost during transmission;
  • A large file is split into many small data packets, which may take different routes and arrive at the receiving end at different times, and UDP does not know how to reassemble these packets into the complete file.

To solve these two problems, TCP was introduced. TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport layer protocol. Compared with UDP, TCP has the following two characteristics:

  • For packet loss, TCP provides a retransmission mechanism;
  • TCP introduces a packet ordering mechanism so that out-of-order packets can be reassembled into a complete file.

Like the UDP header, the TCP header includes the destination port and the source port. It also provides a sequence number, so that the receiving end can put the data packets back in order.
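As a rough sketch of reordering by sequence number: each segment below carries the byte offset of its data as its sequence number, and the receiver sorts by that number before joining. Starting at 0 is an assumption made for readability (real TCP starts from a randomly chosen initial sequence number), and `reassemble` is a name I made up.

```python
def reassemble(segments: list) -> bytes:
    """Sort (sequence number, data) pairs and join them, checking for gaps."""
    data = b""
    for seq, chunk in sorted(segments, key=lambda segment: segment[0]):
        if seq != len(data):
            raise ValueError("missing data before sequence number %d" % seq)
        data += chunk
    return data


# Segments arrive out of order because they took different routes.
arrived = [(5, b"time"), (0, b"geek"), (4, b" ")]
page = reassemble(arrived)
print(page)  # b'geek time'
```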

Let's take a look at the transmission process of a single packet under TCP:

Simplified four-layer transmission model of TCP network

From the figure above, you should be able to see how a data packet is transmitted over TCP. The transmission of a single TCP packet is similar to that of UDP; the difference is that the information in the TCP header is used to guarantee the integrity of a large piece of data.

Next, let's look at the complete TCP connection process. It shows how TCP implements retransmission and packet ordering.

As the figure below shows, the life cycle of a complete TCP connection has three phases: "connection establishment", "data transmission", and "disconnection".

Life cycle of a TCP connection

  • First, the connection establishment phase. In this phase, the client and the server establish a connection through a "three-way handshake". TCP provides connection-oriented communication, which means that the two ends make preparations before data communication begins. The three-way handshake means that, to establish a TCP connection, the client and the server exchange a total of three packets to confirm the connection.
  • Second, the data transmission phase. In this phase, the receiving end must acknowledge each data packet: after receiving a packet, it sends an acknowledgment packet back to the sending end. If the sending end sends a packet and does not receive an acknowledgment within a specified time, it treats the packet as lost and retransmits it. Likewise, a large file is split into many small packets for transmission. After these packets arrive, the receiving end sorts them by the sequence number in the TCP header to reconstruct the complete data.
  • Finally, the disconnection phase. After the data has been transmitted, the connection is terminated through a "four-way wave" (four packets exchanged), which ensures that both sides can close the connection.
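The acknowledge-or-retransmit idea in the data transmission phase can be simulated in a few lines. This is a deliberately simple "send one, wait for its acknowledgment" model, which is an assumption of mine: real TCP keeps many packets in flight and uses more elaborate timers. The `lost_attempts` set is a hypothetical way to script which transmissions get lost.

```python
def send_reliably(chunks: list, lost_attempts: set, max_tries: int = 5) -> tuple:
    """Send chunks one at a time; resend any chunk whose acknowledgment never arrives."""
    received = {}
    log = []
    for seq, chunk in enumerate(chunks):
        for attempt in range(1, max_tries + 1):
            if (seq, attempt) in lost_attempts:
                # No acknowledgment comes back before the timer expires.
                log.append("seq=%d attempt=%d: lost, timeout, retransmit" % (seq, attempt))
                continue
            received[seq] = chunk
            log.append("seq=%d attempt=%d: delivered, ACK received" % (seq, attempt))
            break
        else:
            raise TimeoutError("segment %d was never acknowledged" % seq)
    data = b"".join(received[seq] for seq in sorted(received))
    return data, log


# Script the first transmission of segment 1 to be lost.
data, log = send_reliably([b"geek", b" ", b"time"], lost_attempts={(1, 1)})
for line in log:
    print(line)
print(data)  # b'geek time'
```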

At this point, you should understand that TCP sacrifices transmission speed to ensure reliability: the "three-way handshake" and the per-packet acknowledgment mechanism roughly double the number of packets exchanged during transmission.

Summary

Well, that's all for this section. Let me give a brief summary.
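A back-of-the-envelope count shows where this overhead comes from. It follows the simplified model used in this article, where every data packet gets its own acknowledgment; the constant and function names are mine, and real TCP stacks often acknowledge several packets at once, so treat the numbers as an upper-bound illustration.

```python
HANDSHAKE = ["SYN", "SYN-ACK", "ACK"]    # three-way handshake
TEARDOWN = ["FIN", "ACK", "FIN", "ACK"]  # four packets to disconnect


def udp_packet_count(data_packets: int) -> int:
    return data_packets  # data only: no setup, no acknowledgments


def tcp_packet_count(data_packets: int) -> int:
    acknowledgments = data_packets  # simplified: one ACK per data packet
    return len(HANDSHAKE) + data_packets + acknowledgments + len(TEARDOWN)


for n in (1, 10, 100):
    print(n, udp_packet_count(n), tcp_packet_count(n))
```

For 100 data packets this gives 100 packets over UDP and 207 over TCP, which is where "roughly double" comes from; for very small transfers the fixed handshake and teardown cost dominates.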

  • Data on the Internet is transmitted in data packets, and packets are prone to loss or corruption during transmission.
  • IP is responsible for delivering data packets to the destination host.
  • UDP is responsible for delivering data packets to a specific application.
  • TCP guarantees the complete transmission of data. A TCP connection has three phases: connection establishment, data transmission, and disconnection.

In fact, understanding TCP is a step toward fully understanding HTTP, including what it can and cannot do, and in turn toward understanding why HTTP/2 and QUIC, the basis of the future HTTP/3, were introduced. This is a step-by-step process from the simple to the deep. I hope you will work steadily and learn each step and each protocol well; the rest will follow naturally.

Extended quiz

1. A browser can open multiple tabs at the same time. Do they use the same port? If so, how does the data know which tab to go to?

The port on the server side is the same (for example, 80 or 443), but each TCP connection has its own combination of local and remote addresses and ports. The browser's network process knows which tab each TCP connection corresponds to, so after receiving data it distributes the data to the corresponding rendering process.
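One way to picture this bookkeeping: a TCP connection is identified by four values together (local address, local port, remote address, remote port), so two connections to the same server port still differ in their local port. The table and function names below are hypothetical; a real browser's network process is far more involved, and tabs can also share a connection.

```python
# (local_ip, local_port, remote_ip, remote_port) -> tab that owns the connection
connections = {}


def open_connection(tab: str, local_port: int,
                    local_ip: str = "192.0.2.10",
                    remote: tuple = ("198.51.100.7", 443)) -> tuple:
    key = (local_ip, local_port, remote[0], remote[1])
    connections[key] = tab
    return key


def route_incoming(src_ip: str, src_port: int, dst_ip: str, dst_port: int):
    # For an incoming packet, the source is the server and the destination is our local end.
    return connections.get((dst_ip, dst_port, src_ip, src_port))


# Two tabs talk to the same server port but from different local ports.
open_connection("tab 1", 50001)
open_connection("tab 2", 50002)

tab_for_packet = route_incoming("198.51.100.7", 443, "192.0.2.10", 50002)
print(tab_for_packet)  # tab 2
```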

2. Does the browser render while TCP is still transmitting data? If an earlier packet is lost, does the browser have to wait for it before handling the later packets? How is this kind of real-time rendering handled, given that packets must be in order?

Once the browser has received the response headers, including Content-Type, it prepares the rendering process, and as soon as response body data arrives, DOM parsing begins. At the HTTP level there is no need to worry about packet loss, because loss and retransmission are handled in the TCP layer, which hands data to HTTP complete and in order. So even with real-time rendering, if a packet is lost, rendering of the data after it can only continue once the retransmission has arrived.
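The waiting behavior can be sketched with a small receive buffer that only releases data to the application when there is no gap before it. Sequence numbers are byte offsets starting at 0, which is a simplification, and the class name `InOrderReceiver` is my own.

```python
class InOrderReceiver:
    """Buffers out-of-order segments and releases only contiguous data."""

    def __init__(self):
        self.next_seq = 0  # offset of the next byte the application is waiting for
        self.pending = {}  # segments that arrived early, keyed by sequence number

    def on_segment(self, seq: int, chunk: bytes) -> bytes:
        self.pending[seq] = chunk
        released = b""
        while self.next_seq in self.pending:
            piece = self.pending.pop(self.next_seq)
            released += piece
            self.next_seq += len(piece)
        return released  # what the upper layer (HTTP, then the parser) gets right now


receiver = InOrderReceiver()
first = receiver.on_segment(0, b"first ")    # in order: released immediately
second = receiver.on_segment(13, b"third")   # arrives early: held back, nothing released
third = receiver.on_segment(6, b"second ")   # the retransmitted gap arrives: both are released
print(first, second, third)
```

The parser can start working on `first` right away, but it sees nothing of the third segment until the missing second one has been retransmitted.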

3. Are HTTP and WebSocket both application layer protocols?

Both are application layer protocols. The name WebSocket is rather confusing: it is not the same thing as an operating system socket. A WebSocket connection starts with an HTTP handshake and then adds the ability for the server to actively push messages to the client.

4. Regarding "data may be lost or corrupted during transmission": where do lost packets go? Do they just vanish? What happens to a corrupted packet, and why does it go wrong?

Causes include network fluctuations, physical line failures, equipment failures, interception by malicious programs, and network congestion.

Origin blog.csdn.net/qq_47443027/article/details/127261551