The TCP/IP protocol stack explained simply | Hand-written code to implement a network protocol stack

The TCP/IP protocol stack is a collection of network protocols that forms the backbone of network communication. It defines how devices connect to the Internet and how data is transmitted between them. TCP/IP uses a four-layer structure: the application layer, the transport layer, the network layer, and the link layer. Each layer relies on the services provided by the layer below it to do its own job. The working principle of TCP/IP is hard to grasp for two reasons: we work at the application layer most of the time and rarely need to care about the lower layers, and the protocol system itself is large and complex, with a high barrier to entry. Put simply, the question is: what does a host's data go through before it reaches another host? Let's explore this process.

0. Physical medium

The physical medium is the physical means of connecting computers. Common examples are optical fiber, twisted pair, and radio waves. The medium determines how the signals (0 and 1) are transmitted, and differences between media determine the bandwidth, speed, transmission distance, and interference resistance of the signal.

The TCP/IP protocol stack is divided into four layers. Each layer communicates through specific protocols, and all of this communication must eventually be converted into signals of 0 and 1 that travel through the physical medium to reach the other computer. The physical medium is therefore the cornerstone of network communication.

The picture below gives an overview of the basic framework of the TCP/IP protocol:

When a request is initiated via HTTP, the protocols of the application layer, transport layer, network layer, and link layer wrap the request in turn, each adding its own header. The link layer finally produces Ethernet data packets, which are transmitted to the other host through the physical medium. After the other host receives a data packet, it uses the corresponding protocols to unpack it layer by layer, and the application layer data is finally handed over to the application program.
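The wrapping and unwrapping described above can be sketched in a few lines of Python. This is only a toy model: the bracketed "headers" are placeholders rather than real protocol headers, and the names `LAYERS`, `encapsulate`, and `decapsulate` are made up for this illustration.

```python
# Toy encapsulation: each layer prepends its own header to whatever it
# receives from the layer above. The header bytes are placeholders.
LAYERS = ["application", "transport", "network", "link"]

def encapsulate(data: bytes) -> bytes:
    """Wrap data from the application layer down to the link layer."""
    packet = data
    for layer in LAYERS:
        header = f"[{layer}]".encode()
        packet = header + packet
    return packet

def decapsulate(packet: bytes) -> bytes:
    """Strip the headers in reverse order, link layer first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not packet.startswith(header):
            raise ValueError(f"missing {layer} header")
        packet = packet[len(header):]
    return packet

wire = encapsulate(b"GET / HTTP/1.1")
print(wire)  # b'[link][network][transport][application]GET / HTTP/1.1'
print(decapsulate(wire))  # b'GET / HTTP/1.1'
```

The link layer header ends up outermost because it is added last, which is why the receiver has to remove it first.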

Network communication is like sending a parcel by express delivery. The layers of packaging around the product are the various protocols, and they carry the product information, delivery address, recipient, contact information, and so on. Delivery vehicles, delivery stations, and couriers are also needed before the product finally reaches the user's hands.

Normally a parcel is not delivered directly. It is first forwarded to the appropriate delivery station, and that station then dispatches it.

The delivery vehicle is the physical medium, the delivery station is the gateway, the courier is the router, the delivery address is the IP address, and the contact information is the MAC address.

The courier is responsible for forwarding the parcel between delivery stations. Each station checks the province and city in the delivery address to decide whether the parcel must be forwarded to another station. When the parcel reaches the target station, that station finds the recipient using the contact information and delivers the parcel.

With the overall concept, let's take a closer look at the division of labor at each level.

1. Link layer

Network communication means transmitting data with a specific meaning to the other party through the physical medium. Simply sending 0s and 1s is meaningless. To transmit meaningful data, the 0s and 1s must be grouped into bytes, the meaning of each group of signals must be identified, and the groups must be sent in order. Ethernet stipulates that one group of signals is a data packet, called a frame. The protocol that lays down this rule is the Ethernet protocol. A complete Ethernet data packet is shown in the figure below:

The entire data frame consists of three parts: header, data, and tail. The header is fixed at 14 bytes and contains the destination MAC address, the source MAC address, and the type. The data part is at least 46 bytes and at most 1500 bytes; if the data to be transmitted is longer, it must be divided into multiple frames. The tail is fixed at 4 bytes and holds the frame check sequence, which is used to determine whether the data packet was damaged during transmission. In short, the Ethernet protocol groups signals into data frames and sends the frames to the receiver through the physical medium. So how does Ethernet identify the receiver?
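As a rough illustration of this layout, the sketch below packs a frame with Python's `struct` module. The function names `build_frame` and `check_frame` are hypothetical. `zlib.crc32` uses the same CRC-32 polynomial as the Ethernet frame check sequence, but the byte order in which it is appended here is an assumption of this sketch, not a statement about what goes on the wire.

```python
import struct
import zlib

ETH_HEADER = struct.Struct("!6s6sH")  # destination MAC, source MAC, type
MIN_PAYLOAD, MAX_PAYLOAD = 46, 1500

def build_frame(dst: bytes, src: bytes, eth_type: int, payload: bytes) -> bytes:
    """Return header (14 bytes) + padded payload + check sequence (4 bytes)."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload must be split across several frames")
    payload = payload.ljust(MIN_PAYLOAD, b"\x00")  # pad short payloads to 46 bytes
    body = ETH_HEADER.pack(dst, src, eth_type) + payload
    return body + struct.pack("<I", zlib.crc32(body))

def check_frame(frame: bytes) -> bool:
    """Recompute the check sequence and compare it with the 4-byte tail."""
    body, tail = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == tail

dst = bytes.fromhex("4C0F6E12D219")
src = bytes.fromhex("020000000001")  # made-up source address
frame = build_frame(dst, src, 0x0800, b"hello")  # 0x0800 marks an IPv4 payload
print(len(frame), check_frame(frame))
```

A payload of 5 bytes is padded to the 46-byte minimum, so the smallest frame built this way is 14 + 46 + 4 = 64 bytes long.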

Ethernet stipulates that every device connected to the network must have a network adapter, that is, a network card, and that data packets travel from one network card to another. The address of the network card is the sending or receiving address of the data packet, namely the MAC address contained in the frame header. The MAC address is the identity of each network card and, like the number on an ID card, is globally unique. A MAC address is 6 bytes long and written in hexadecimal. The first three bytes are the manufacturer number and the last three bytes are the serial number of the network card, for example 4C-0F-6E-12-D2-19.
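To make the two halves of the address concrete, here is a small sketch that splits a MAC address string into its manufacturer part and serial part. The helper name `split_mac` is invented for this example.

```python
import string

def split_mac(mac):
    """Split a MAC address into (manufacturer part, serial part)."""
    parts = mac.replace(":", "-").split("-")
    if len(parts) != 6:
        raise ValueError("a MAC address has 6 bytes")
    for part in parts:
        # each byte must be written as exactly two hexadecimal digits
        if len(part) != 2 or any(c not in string.hexdigits for c in part):
            raise ValueError(f"invalid byte {part!r}")
    parts = [p.upper() for p in parts]
    return "-".join(parts[:3]), "-".join(parts[3:])

print(split_mac("4C-0F-6E-12-D2-19"))  # ('4C-0F-6E', '12-D2-19')
```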

With the MAC address in place, Ethernet broadcasts the data packet to all hosts in the subnet. Each host in the subnet that receives the packet reads the destination MAC address in the header and compares it with its own MAC address. If they match, the host processes the packet further; if not, it discards the packet.
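The compare-and-discard step can be modeled with a dictionary of hosts. This is a simulation only; the host names, the second MAC address, and the function `deliver` are all hypothetical.

```python
def deliver(frame_dst, hosts):
    """Return the names of the hosts that accept a frame sent to frame_dst.

    Every host "sees" the frame, but only a host whose own MAC address
    matches the destination keeps it; the others discard it.
    """
    accepted = []
    for name, mac in hosts.items():
        if mac.lower() == frame_dst.lower():
            accepted.append(name)
    return accepted

subnet = {
    "host-a": "4C-0F-6E-12-D2-19",
    "host-b": "4C-0F-6E-12-D2-20",
}
print(deliver("4c-0f-6e-12-d2-19", subnet))  # ['host-a']
```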

Therefore, the main job of the link layer is to group electrical signals and form data frames with specific meanings, and then send them to the receiver through the physical medium in the form of broadcast.

2. Network layer

For the above process, there are several details worthy of our consideration:

  • How does the sender know the receiver's MAC address?

  • How does the sender know whether the receiver is in the same subnet as itself?

  • If the receiver is not in the same subnet, how is the data packet sent to it?

In order to solve these problems, the network layer introduced three protocols, namely the IP protocol, the ARP protocol, and the routing protocol.

[1] IP protocol

From the previous introduction, we know that the MAC address is only related to the manufacturer, and has nothing to do with the network in which it is located, so it is impossible to judge whether two hosts belong to the same subnet through the MAC address.

Therefore, the network layer has introduced the IP protocol and formulated a new set of addresses, enabling us to distinguish whether two hosts belong to the same network. This set of addresses is the network address, which is the so-called IP address.

There are currently two versions of IP addresses, IPv4 and IPv6. An IPv4 address is 32 bits long and is usually written as four decimal numbers separated by dots. The IP protocol divides these 32 bits into two parts: the first part is the network address, and the second part is the address of the host within the local network. Where the split falls depends on the address class. Take the class C address 192.168.24.1 as an example: the first 24 bits are the network address and the last 8 bits are the host address. If two IP addresses are in the same subnet, their network addresses must be the same. To determine the network address within an IP address, the IP protocol also introduces the subnet mask: a bitwise AND of the IP address and the subnet mask yields the network address.

Since the IP addresses of the sender and the receiver are both known (they are passed in by the application layer protocol), we can AND each IP address with the subnet mask and compare the results to judge whether the two addresses are in the same subnet.
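The bitwise AND can be tried out with Python's standard `ipaddress` module. The function names `network_address` and `same_subnet`, and the mask 255.255.255.0 (the usual mask for a 24-bit network part), are choices made for this sketch.

```python
import ipaddress

def network_address(ip, mask):
    """AND the IP address with the subnet mask to get the network address."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))

def same_subnet(ip_a, ip_b, mask):
    """Two hosts share a subnet when their network addresses are equal."""
    return network_address(ip_a, mask) == network_address(ip_b, mask)

MASK = "255.255.255.0"
print(network_address("192.168.24.1", MASK))              # 192.168.24.0
print(same_subnet("192.168.24.1", "192.168.24.200", MASK))  # True
print(same_subnet("192.168.24.1", "192.168.25.1", MASK))    # False
```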

[2] ARP protocol

ARP, the Address Resolution Protocol, is a network layer protocol that obtains a MAC address from an IP address. It works as follows:

ARP first issues a request packet whose header contains the IP address of the target host. The link layer wraps this packet in an Ethernet packet, which Ethernet then broadcasts to all hosts in the subnet. Each host that receives the packet takes the IP address out of the header and compares it with its own IP address. If they match, the host replies with its own MAC address; if not, it discards the packet. From the reply, ARP learns the MAC address of the target machine. ARP also stores the returned MAC address and the corresponding IP address in a local ARP cache for a certain period of time, so that the next request can be answered directly from the cache, which saves resources. Entering arp -a in cmd lists the ARP entries cached on the machine.
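The cache-then-broadcast behavior can be sketched as a small class. This is a simulation, not a real ARP implementation: `ArpCache`, the `broadcast_request` callback, the 60-second lifetime, and the sample addresses are all assumptions made for illustration.

```python
import time

class ArpCache:
    """Remember IP-to-MAC mappings for a limited time."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # ip -> (mac, expiry time)

    def resolve(self, ip, broadcast_request):
        """Return the MAC for ip, broadcasting only on a cache miss."""
        now = self.clock()
        entry = self.entries.get(ip)
        if entry is not None and entry[1] > now:
            return entry[0]              # still fresh: answer from the cache
        mac = broadcast_request(ip)      # ask every host in the subnet
        if mac is not None:
            self.entries[ip] = (mac, now + self.ttl)
        return mac

# Stand-in for the subnet: only the host that owns the IP "replies".
subnet_hosts = {"192.168.24.7": "4C-0F-6E-12-D2-19"}
cache = ArpCache()
print(cache.resolve("192.168.24.7", subnet_hosts.get))  # broadcast, then cached
print(cache.resolve("192.168.24.7", subnet_hosts.get))  # served from the cache
```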

[3] Routing protocol

The working principle of ARP shows that its MAC addressing is limited to a single subnet. The network layer therefore introduces routing protocols. First, the IP protocol is used to determine whether the two hosts are in the same subnet. If they are, ARP is used to look up the corresponding MAC address, and the data packet is then broadcast to the host within the subnet. If they are not, Ethernet forwards the data packet to the gateway of the local subnet for routing. Gateways are the bridges between subnets on the Internet, so the packet may be forwarded several times from gateway to gateway until it reaches the subnet where the target IP is located. There the target machine's MAC address is obtained through ARP, and the data packet is finally broadcast to the receiver.

The physical device that carries out routing is the router. In the complex world of networks, the router acts as a traffic hub: it selects routes according to channel conditions and forwards data packets along the best path.
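The first decision in this process, deliver directly or hand the packet to the gateway, might look like this in Python. The function `next_hop` and the gateway address 192.168.24.254 are hypothetical; real routers consult routing tables that this sketch leaves out.

```python
import ipaddress

def next_hop(src_ip, dst_ip, mask, gateway_ip):
    """Return the IP whose MAC address must be resolved with ARP.

    Same subnet: the destination itself. Otherwise: the local gateway,
    which forwards the packet toward the destination subnet.
    """
    local_net = ipaddress.ip_network(f"{src_ip}/{mask}", strict=False)
    if ipaddress.ip_address(dst_ip) in local_net:
        return dst_ip
    return gateway_ip

GATEWAY = "192.168.24.254"  # assumed gateway of the local subnet
print(next_hop("192.168.24.1", "192.168.24.8", "255.255.255.0", GATEWAY))
print(next_hop("192.168.24.1", "10.0.0.8", "255.255.255.0", GATEWAY))
```

The first call returns the destination address itself, while the second returns the gateway address because 10.0.0.8 lies outside 192.168.24.0/24.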

[4] IP data packet

The packet that is packaged at the network layer is called an IP packet. The structure of an IPv4 packet is shown in the figure below:

The IP data packet consists of two parts: the header and the data. The header is at least 20 bytes long and mainly contains the destination IP address and the source IP address. The destination IP address is what gateways use to make routing decisions. With a 20-byte header, the data part can be at most 65515 bytes, so in theory the total length of an IP data packet can reach 65535 bytes. The data part of an Ethernet packet, however, holds at most 1500 bytes. An IP data packet that exceeds this size has to be divided and sent in multiple frames.
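To get a feel for the numbers, the following sketch works out how an IP payload would be cut up to fit 1500-byte Ethernet payloads. It relies on one IPv4 detail not mentioned above: fragment offsets are counted in units of 8 bytes, so every fragment except the last carries a multiple of 8 bytes. The function name `fragment_sizes` is made up for this example, and a fixed 20-byte header is assumed.

```python
def fragment_sizes(payload_len, mtu=1500, header_len=20):
    """Return (fragment offset in 8-byte units, data bytes) per fragment."""
    max_data = (mtu - header_len) // 8 * 8  # 1480 bytes for the defaults
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        fragments.append((offset // 8, size))
        offset += size
    return fragments

print(fragment_sizes(4000))        # [(0, 1480), (185, 1480), (370, 1040)]
print(len(fragment_sizes(65515)))  # 45
```

So even the largest possible IP data packet fits into 45 Ethernet frames under these assumptions.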

Therefore, the main work of the network layer is to define network addresses, distinguish network segments, MAC addressing in subnets, and route data packets of different subnets.

3. Transport layer

The link layer defines the identity of the host, that is, the MAC address, while the network layer defines the IP address, which identifies the network segment where the host is located. With these two addresses, data packets can be sent from one host to another. In practice, though, a data packet is sent by a particular application on one host and received by a particular application on the other. Each computer may be running many applications at the same time, so once the data packet arrives at the host, there is no way to tell which application should receive it.

The transport layer introduces the UDP protocol to solve this problem. To identify each application, UDP defines ports. Each application on a host must use its own unique port number, and every data packet transmitted over the network must carry port information. When the data packet arrives at the host, the corresponding application can then be found by its port number. The data packet defined by UDP is called a UDP data packet, and its structure is as follows:

A UDP data packet consists of two parts: header and data. The header is 8 bytes long and mainly contains the source port and the destination port. The data can be at most 65527 bytes, so the whole data packet can be up to 65535 bytes long.
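The 8-byte header is simple enough to pack by hand. In the sketch below, the four 16-bit fields are source port, destination port, length, and checksum; the checksum is left at 0 instead of being computed, which keeps the example short. The names `build_udp` and `parse_udp` and the port numbers are invented for illustration.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # source port, destination port, length, checksum

def build_udp(src_port, dst_port, data):
    """Prepend an 8-byte UDP header to data (checksum left at 0)."""
    length = UDP_HEADER.size + len(data)
    if length > 65535:
        raise ValueError("data exceeds the 65527-byte limit")
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + data

def parse_udp(datagram):
    """Return (source port, destination port, data)."""
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack(datagram[:8])
    return src_port, dst_port, datagram[8:length]

packet = build_udp(50000, 8080, b"hello")
print(len(packet))        # 13
print(parse_udp(packet))  # (50000, 8080, b'hello')
```

On the receiving host, the destination port returned by `parse_udp` is what selects the application that gets the data.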

The UDP protocol is relatively simple and easy to implement, but it has no acknowledgment mechanism. Once a data packet has been sent, there is no way to know whether the other party received it, so its reliability is poor. The TCP protocol was created to solve this problem and improve reliability. TCP, the Transmission Control Protocol, is a connection-oriented, reliable, byte-stream-based communication protocol. Loosely speaking, TCP is like UDP with an acknowledgment mechanism added: every data packet sent must be acknowledged. If a data packet is lost, no acknowledgment arrives and the sender must resend the packet.
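The acknowledge-and-resend idea can be sketched as a simple retry loop. This is a deliberately simplified stop-and-wait model, not how TCP is actually implemented; `send_reliably`, `make_lossy_channel`, and the retry limit are hypothetical.

```python
def send_reliably(packet, channel, max_tries=5):
    """Resend packet until the channel reports an acknowledgment.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_tries + 1):
        if channel(packet):
            return attempt
    raise TimeoutError("no acknowledgment received")

def make_lossy_channel(drops):
    """Return a channel that loses the first `drops` packets, then delivers."""
    state = {"remaining": drops}
    def channel(packet):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return False  # packet lost, so no acknowledgment comes back
        return True       # packet delivered and acknowledged
    return channel

print(send_reliably(b"data", make_lossy_channel(2)))  # 3
```

With two losses, the third attempt is the first one to be acknowledged.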

To ensure reliable transmission, TCP uses a three-way handshake: a reliable connection must be established with the other party before any data is officially sent or received. Since the real process is fairly complicated, here is an informal description:

Host A: I want to send you data. Is that OK?

Host B: Yes. When will you send it?

Host A: I'm sending it right away, so get ready to receive!

After these three exchanges, host A starts sending actual data to host B. UDP, by contrast, is connectionless: it does not establish a connection with the other party but sends the data packet directly. TCP can therefore detect lost packets and resend them, but this comes at a price. Compared with UDP, TCP is more complicated to implement, consumes more connection resources, and transmits more slowly.

A TCP data packet, like a UDP data packet, consists of a header and data. One difference is that TCP data has no length limit and can in theory be arbitrarily long. For the sake of network efficiency, however, a TCP data packet usually does not exceed the length of an IP data packet, so that a single TCP data packet does not have to be split.
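In real TCP, the three exchanges correspond to a SYN segment, a SYN+ACK segment, and an ACK segment, each carrying sequence and acknowledgment numbers. The sketch below only lists those three segments; it opens no real connection, and the function name and the initial sequence numbers 100 and 300 are arbitrary choices for illustration.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three segments as (sender, flags, seq, ack) tuples.

    Each side acknowledges the other's initial sequence number plus one.
    Sequence numbers are 32 bits wide, hence the modulo.
    """
    wrap = 2 ** 32
    syn = ("A", "SYN", client_isn, None)
    syn_ack = ("B", "SYN+ACK", server_isn, (client_isn + 1) % wrap)
    ack = ("A", "ACK", (client_isn + 1) % wrap, (server_isn + 1) % wrap)
    return [syn, syn_ack, ack]

for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```

For the values above, host B acknowledges 101 and host A acknowledges 301, which is how each side confirms that it really heard the other.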

To sum up, the main work of the transport layer is to define ports, identify application identities, and implement port-to-port communication. The TCP protocol can ensure the reliability of data transmission.

4. Application layer

In theory, with the support of the three layers above, data can already travel from an application on one host to an application on another. But the data transmitted at this point is a raw byte stream, which programs cannot easily interpret and which is awkward to work with. The application layer therefore defines a variety of protocols that standardize the data format; common ones are HTTP, FTP, and SMTP. HTTP is a widely used application layer protocol, mainly used for data communication in B/S (browser/server) architectures. Its message format is as follows:

In Request Headers, Accept indicates the data format the client expects to receive, and Content-Type indicates the format of the data the client sends. In Response Headers, Content-Type indicates the format of the server's response, which generally matches one of the formats listed in the request's Accept header.

With this specification, the server can correctly parse the data sent by the client once it receives the request. When the request has been processed, the server returns the result in the format the client asked for, and the client parses the result according to the format the server declares.
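A minimal sketch of what such a message looks like as text, and how the headers can be read back, is shown below. The helper names `build_request` and `parse_headers`, the host example.com, and the path /api are placeholders; production code would normally rely on an HTTP library rather than string handling like this.

```python
def build_request(method, path, host, headers, body=b""):
    """Assemble an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    return "\r\n".join(lines).encode() + b"\r\n\r\n" + body

def parse_headers(message):
    """Return the headers of a message as a dict with lower-case names."""
    head, _, _body = message.partition(b"\r\n\r\n")
    headers = {}
    for line in head.decode().split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

request = build_request(
    "POST", "/api", "example.com",
    {"Accept": "application/json", "Content-Type": "application/json"},
    b'{"a": 1}',
)
print(request.decode())
print(parse_headers(request)["content-type"])  # application/json
```

The server would read Content-Type to decide how to parse the body, and Accept to decide which format to use for the response.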

Therefore, the main job of the application layer is to define the data format and interpret the data according to the corresponding format.

5. The whole process

First, let's sort out the responsibilities of each layer:

  • Link layer: Group 0 and 1, define data frames, confirm the physical address of the host, and transmit data;

  • Network layer: Define the IP address, confirm the network location where the host is located, and perform MAC addressing through IP, and route and forward external network data packets;

  • Transport layer: define the port, confirm the identity of the application on the host, and deliver the data packet to the corresponding application;

  • Application layer: Define the data format and interpret the data according to the corresponding format.

Then we connect the responsibilities of the layers into one simple description:

When you enter a URL and press the Enter key, the application layer protocol first defines the format of the request packet. The transport layer protocol then adds the port numbers of both parties to identify the communicating applications. Next, the network layer protocol adds the IP addresses of both parties to identify their network locations. Finally, the link layer protocol adds the MAC addresses of both parties to identify their physical locations, groups the data into data frames, and broadcasts the frames through the transmission medium to the other host. If the two hosts are on different network segments, the data packet is first forwarded to the gateway router and reaches the target host after several hops. After the target machine receives the data packets, it reassembles the frame data with the corresponding protocol and parses it layer by layer, until the application layer protocol parses the request and hands it over to the server for processing.

6. Summary

The above is a brief introduction to the four-layer TCP/IP model. In reality each layer contains many protocols, and each protocol does a great deal of work. But it helps to first get a clear picture of the overall structure and the basic role of each layer, and then fill in the details; that way everything may be easier to understand.

 

Origin blog.csdn.net/Linuxhus/article/details/113884831