In-depth understanding of TCP/IP protocol

The TCP/IP protocol stack is a collection of network protocols that forms the backbone of network communication. It defines how electronic devices connect to the Internet and how data is transmitted between them. TCP/IP uses a four-layer structure: the application layer, the transport layer, the network layer, and the link layer. Each layer relies on the services provided by the layer below it to do its own work. Because we spend most of our time at the application layer, we rarely need to think about the lower layers; in addition, the network protocol system is large and complex, and the barrier to entry is high. As a result, it is hard to see how TCP/IP actually works, that is, what steps a host's data goes through before it reaches another host. Let's walk through that process:

0. Physical medium

The physical medium is the physical means of connecting computers. Common examples are optical fiber, twisted pair, and radio waves. The medium determines how the signals that represent 0 and 1 are transmitted, and differences between media determine the bandwidth, speed, transmission distance, and resistance to interference.

The TCP/IP protocol stack is divided into four layers. Each layer communicates with its counterpart on the other host using a specific protocol, and all of this communication must eventually be converted into signals of 0 and 1 that travel over the physical medium to reach the other computer. The physical medium is therefore the cornerstone of network communication.

Network communication is like express delivery. The goods a user buys are wrapped layer by layer, and each layer of wrapping corresponds to a protocol. The protocols describe the size of the goods, the recipient, the contact information, and the delivery address, while the delivery vehicle is the physical medium. For some remote places, a parcel cannot be delivered directly and has to be forwarded along the way. This is where the labels on the parcel come in: they record the forwarding address, the sender's information, and so on. That is the role the many TCP/IP protocols play.

Let's first understand the data flow of the TCP/IP protocol through a picture:

When a user initiates a request over HTTP, the protocols of the application layer, transport layer, network layer, and link layer wrap the request in turn, each adding its own header, and the link layer finally produces an Ethernet frame. The frame is transmitted to the other host over the physical medium. When the other host receives it, the corresponding protocols unwrap it layer by layer, and the application-layer data is finally handed to the application for processing.
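To make the wrapping and unwrapping concrete, here is a toy sketch in Python. The bracketed labels such as `[TCP]` are invented stand-ins for real headers, and the layer name `APP` is an assumption for illustration; real headers are binary structures, not text tags.

```python
# Toy model of layered encapsulation: each layer prepends its own header.
# The header contents are illustrative labels, not real wire formats.
LAYERS = ["APP", "TCP", "IP", "ETH"]  # application layer down to link layer


def encapsulate(payload: bytes) -> bytes:
    """Wrap the payload from the application layer down to the link layer."""
    data = payload
    for name in LAYERS:
        data = f"[{name}]".encode() + data
    return data


def decapsulate(frame: bytes) -> bytes:
    """Strip the headers in the opposite order, from the link layer upward."""
    data = frame
    for name in reversed(LAYERS):
        header = f"[{name}]".encode()
        if not data.startswith(header):
            raise ValueError(f"expected {name} header")
        data = data[len(header):]
    return data


frame = encapsulate(b"hello")
print(frame)               # b'[ETH][IP][TCP][APP]hello'
print(decapsulate(frame))  # b'hello'
```

The outermost header belongs to the link layer because it is added last, which is why the receiver removes it first.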

With the overall picture in place, let's look at the division of labor at each layer and the corresponding protocols in detail:

1. Link layer

Network communication means transmitting meaningful data to the other party over the physical medium. Simply sending 0s and 1s is meaningless, so the bits must be grouped, the characteristics of each group identified, and the groups sent in order. Ethernet specifies that one group of signals forms a data packet, called a frame. The protocol that defines this rule is the Ethernet protocol. A complete Ethernet frame is shown in the following figure:

The whole frame consists of three parts: header, data, and trailer. The header is fixed at 14 bytes and contains the destination MAC address, the source MAC address, and the type. The data part is at least 46 bytes and at most 1500 bytes; if the data to be transmitted is longer, it must be split across multiple frames. The trailer is fixed at 4 bytes and holds the frame check sequence, which is used to detect whether the frame was damaged in transit. In this way, the Ethernet protocol groups signals into frames and sends the frames to the receiver over the physical medium. So how does Ethernet identify the receiver?
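The layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, the type value 0x0800 (IPv4) is not mentioned in the text, and the check sequence is computed with `zlib.crc32` and stored least significant byte first, a detail real network cards handle in hardware.

```python
import struct
import zlib

HEADER_LEN = 14      # destination MAC (6) + source MAC (6) + type (2)
MIN_DATA = 46
MAX_DATA = 1500
TYPE_IPV4 = 0x0800   # assumed type value for this example


def build_frame(dst_mac: bytes, src_mac: bytes, data: bytes,
                frame_type: int = TYPE_IPV4) -> bytes:
    """Assemble header + data + 4-byte trailer into one frame."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if len(data) > MAX_DATA:
        raise ValueError("data too long; split it across several frames")
    data = data.ljust(MIN_DATA, b"\x00")  # pad short data to the 46-byte minimum
    header = dst_mac + src_mac + struct.pack("!H", frame_type)
    trailer = struct.pack("<I", zlib.crc32(header + data))
    return header + data + trailer


def frame_is_intact(frame: bytes) -> bool:
    """Recompute the check sequence and compare it with the trailer."""
    body, trailer = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == trailer


frame = build_frame(bytes.fromhex("aabbccddeeff"),
                    bytes.fromhex("4c0f6e12d219"), b"hi")
print(len(frame))              # 64
print(frame_is_intact(frame))  # True
```

A two-byte message still yields a 64-byte frame (14 + 46 + 4), because the data part is padded up to its minimum size.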

Ethernet requires every device connected to the network to have a network adapter, that is, a network card, and frames are always transmitted from one network card to another. The network card addresses are the sending and receiving addresses of the frame, namely the MAC addresses contained in the frame header. The MAC address is the identity of each network card, much like the number on an ID card, and is intended to be globally unique. A MAC address is 6 bytes long and is written in hexadecimal: the first three bytes are the manufacturer number and the last three bytes are the serial number of the network card, for example 4C-0F-6E-12-D2-19.
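As a small illustration, the following Python sketch splits a MAC address into its manufacturer part and its serial part. The function name `split_mac` is made up for this example, and accepting `:` as an alternative separator is an extra convenience, not something the text requires.

```python
def split_mac(mac: str):
    """Split a MAC address string into (manufacturer, serial) parts."""
    parts = mac.strip().replace(":", "-").split("-")
    if len(parts) != 6:
        raise ValueError("a MAC address has exactly 6 bytes")
    octets = [int(p, 16) for p in parts]  # raises ValueError on bad hex
    if any(o > 0xFF for o in octets):
        raise ValueError("each byte must be between 00 and FF")
    as_text = [f"{o:02X}" for o in octets]
    return "-".join(as_text[:3]), "-".join(as_text[3:])


print(split_mac("4C-0F-6E-12-D2-19"))  # ('4C-0F-6E', '12-D2-19')
```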

With MAC addresses in place, Ethernet broadcasts the frame to all hosts in the subnet. Each host in the subnet that receives the frame reads the destination MAC address in the header and compares it with its own MAC address. If they match, the host processes the frame further; if they differ, it discards the frame.
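The compare-and-discard step might look like this in Python. The host names and MAC values are invented for the example. The sketch also accepts the all-ones address FF-FF-FF-FF-FF-FF, which Ethernet reserves for frames meant for every host; the text above does not mention it, so treat that line as an addition.

```python
BROADCAST = bytes.fromhex("ffffffffffff")  # all-ones address: every host keeps it


def accepts(frame: bytes, own_mac: bytes) -> bool:
    """A host keeps a frame only if the destination MAC is its own (or broadcast)."""
    destination = frame[:6]  # the first 6 header bytes are the destination MAC
    return destination == own_mac or destination == BROADCAST


def deliver(frame: bytes, hosts: dict) -> list:
    """Every host on the subnet sees the frame; return the names of those that keep it."""
    return [name for name, mac in hosts.items() if accepts(frame, mac)]


hosts = {
    "A": bytes.fromhex("4c0f6e12d219"),
    "B": bytes.fromhex("4c0f6e12d21a"),
    "C": bytes.fromhex("4c0f6e12d21b"),
}
# Frame from A to B: destination MAC, source MAC, type, then data.
frame = hosts["B"] + hosts["A"] + b"\x08\x00" + b"payload"
print(deliver(frame, hosts))  # ['B']
```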

Therefore, the main job of the link layer is to group signals into frames with a specific meaning and then broadcast them to the receiver over the physical medium.

2. Network layer

For the above process, there are several details worth thinking about:

How does the sender know the recipient's MAC address?

How does the sender know that the receiver belongs to the same subnet as itself?

If the receiver is not in the same subnet as the sender, how can the data packet reach it?

To solve these problems, the network layer introduces three protocols: the IP protocol, the ARP protocol, and the routing protocol.

[1] IP protocol

From the previous section, we know that a MAC address depends only on the manufacturer and has nothing to do with the network the host is on. It is therefore impossible to tell from MAC addresses whether two hosts belong to the same subnet.

The network layer therefore introduces the IP protocol, which defines a new set of addresses that lets us tell whether two hosts belong to the same network. These addresses are network addresses, commonly called IP addresses.

There are currently two versions of IP addresses, IPv4 and IPv6. An IPv4 address is 32 bits long and is usually written as four decimal numbers separated by dots. The IP protocol divides the 32 bits into two parts: the first part is the network address and the second part is the address of the host within that network. Where the split falls depends on the address class. Taking the class C address 192.168.24.1 as an example, the first 24 bits are the network address and the last 8 bits are the host address. If two IP addresses are in the same subnet, their network addresses must be the same. To determine the network part of an IP address, the IP protocol also introduces the subnet mask: a bitwise AND of the IP address and the subnet mask yields the network address.

Since the IP addresses of the sender and receiver are both known (they are passed down from the application layer), we can AND each address with the subnet mask and compare the results to judge whether the two hosts are in the same subnet.
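The AND operation is easy to try out in Python with the standard `ipaddress` module. The helper names and the mask 255.255.255.0 (the usual mask for a 24-bit network part) are choices made for this example.

```python
import ipaddress


def network_address(ip: str, mask: str) -> str:
    """Bitwise AND of the IP address and the subnet mask."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_bits & mask_bits))


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Two hosts share a subnet when their network addresses are equal."""
    return network_address(ip_a, mask) == network_address(ip_b, mask)


mask = "255.255.255.0"
print(network_address("192.168.24.1", mask))              # 192.168.24.0
print(same_subnet("192.168.24.1", "192.168.24.200", mask))  # True
print(same_subnet("192.168.24.1", "192.168.25.1", mask))    # False
```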

[2] ARP protocol

The Address Resolution Protocol (ARP) is a network layer protocol that obtains a MAC address from an IP address. It works as follows:

ARP first sends a request packet whose header contains the IP address of the target host. The packet is wrapped again at the link layer into an Ethernet frame, which Ethernet broadcasts to all hosts in the subnet. Each host that receives the frame takes out the IP address in the header and compares it with its own IP address. If they match, the host replies with its own MAC address; if not, it discards the packet. From the reply, ARP learns the MAC address of the target machine. ARP also stores the returned MAC address and the corresponding IP address in the local ARP cache for a certain period of time, so that the next request can be answered directly from the cache, which saves resources. Entering arp -a at the command line (cmd) lists the locally cached ARP entries.
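The request, reply, and cache steps can be modeled roughly as follows. This is a simulation, not a real ARP implementation: the class `ArpCache`, the function `resolve`, the 60-second lifetime, and the `hosts` dictionary standing in for the machines on the subnet are all assumptions made for the sketch.

```python
import time


class ArpCache:
    """IP-to-MAC table whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # ip -> (mac, expiry time)

    def store(self, ip: str, mac: str) -> None:
        self._entries[ip] = (mac, self._clock() + self._ttl)

    def lookup(self, ip: str):
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, expires = entry
        if self._clock() >= expires:  # entry is too old: forget it
            del self._entries[ip]
            return None
        return mac


def resolve(ip: str, cache: ArpCache, hosts: dict):
    """Return the MAC for ip, asking the subnet only on a cache miss."""
    mac = cache.lookup(ip)
    if mac is not None:
        return mac
    # "Broadcast": every host compares the requested IP with its own.
    for host_ip, host_mac in hosts.items():
        if host_ip == ip:          # the owner replies with its MAC
            cache.store(ip, host_mac)
            return host_mac
    return None                    # nobody on this subnet owns the address


cache = ArpCache()
hosts = {"192.168.24.7": "4C-0F-6E-12-D2-19"}
print(resolve("192.168.24.7", cache, hosts))  # 4C-0F-6E-12-D2-19
```

Passing the clock in as a parameter is a design choice that makes the expiry logic easy to test without waiting.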

[3] Routing protocol

The way ARP works shows that its MAC addressing is limited to a single subnet. The network layer therefore introduces routing. First, the IP protocol is used to determine whether the two hosts are in the same subnet. If they are, the MAC address is looked up through ARP and the packet is delivered directly within the subnet. If they are not, the packet is forwarded to the gateway of the local subnet for routing. A gateway is a bridge between subnets on the Internet, and the packet may be forwarded by several gateways before it reaches the subnet where the target IP is located. There, the MAC address of the target machine is obtained through ARP, and the packet is finally sent to the receiver by broadcast.
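The sender's decision can be sketched as a single function: deliver directly when the destination shares the subnet, otherwise hand the packet to the gateway. The function name `next_hop` and the addresses are invented for the example, and a real host consults a routing table that may hold many entries rather than one gateway.

```python
import ipaddress


def next_hop(src_ip: str, dst_ip: str, mask: str, gateway_ip: str) -> str:
    """Return the IP whose MAC address the sender must look up with ARP."""
    # strict=False lets us build the network from a host address plus mask.
    local_net = ipaddress.IPv4Network(f"{src_ip}/{mask}", strict=False)
    if ipaddress.IPv4Address(dst_ip) in local_net:
        return dst_ip      # same subnet: deliver directly
    return gateway_ip      # different subnet: hand over to the gateway


print(next_hop("192.168.24.1", "192.168.24.9", "255.255.255.0", "192.168.24.254"))
# 192.168.24.9
print(next_hop("192.168.24.1", "10.0.0.5", "255.255.255.0", "192.168.24.254"))
# 192.168.24.254
```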

The physical device that carries out routing is the router. In a complex network, the router acts as a transportation hub: it selects a route according to channel conditions and forwards data packets along the best path.

[4] IP packet

The data packets packaged at the network layer are called IP data packets. The structure of IPv4 data packets is shown in the following figure:

An IP data packet consists of a header and data. The header is at least 20 bytes long and mainly contains the source IP address and the destination IP address; the destination IP address is the basis for gateway routing. With a 20-byte header, the data part can be at most 65515 bytes, so in theory the total length of an IP data packet can reach 65535 bytes. The data part of an Ethernet frame, however, holds at most 1500 bytes. If an IP data packet exceeds this size, it must be split into fragments and sent in multiple frames.
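A quick calculation shows how the splitting works out. The sketch below assumes a 20-byte header on every fragment and uses one IPv4 detail the text does not cover: fragment offsets are counted in units of 8 bytes, so every fragment except the last carries a multiple of 8 data bytes. The function name `fragment` is made up.

```python
def fragment(data_len: int, frame_data_max: int = 1500, header_len: int = 20):
    """Return (offset, length, more_fragments) for each piece of the IP data."""
    # Room left for data in one frame, rounded down to a multiple of 8.
    max_piece = (frame_data_max - header_len) // 8 * 8
    pieces = []
    offset = 0
    while offset < data_len:
        length = min(max_piece, data_len - offset)
        more = offset + length < data_len
        pieces.append((offset, length, more))
        offset += length
    return pieces


for piece in fragment(4000):
    print(piece)
# (0, 1480, True)
# (1480, 1480, True)
# (2960, 1040, False)
print(len(fragment(65515)))  # 45
```

So a maximum-size IP data packet would need 45 Ethernet frames under these assumptions.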

Therefore, the main work of the network layer is to define network addresses, distinguish network segments, resolve MAC addresses within a subnet, and route packets between different subnets.

3. Transport layer

The link layer defines the identity of the host, that is, the MAC address, while the network layer defines the IP address, which identifies the network segment the host is on. With these two addresses, packets can be sent from one host to another. In reality, however, a data packet is sent by an application on one host and received by an application on the other. Each computer may be running many applications at the same time, so when a packet arrives at the host, these two addresses alone cannot tell which application should receive it.

The transport layer therefore introduces the UDP protocol to solve this problem. To identify each application, UDP defines the port. Each application on the same host needs a unique port number, and every data packet transmitted over the network must carry port information. When a packet arrives at the host, the corresponding application can then be found by its port number. The data packets defined by UDP are called UDP data packets, and their structure is as follows:

A UDP data packet consists of a header and data. The length of the header is 8 bytes, mainly including the source port and the destination port; the maximum data size is 65527 bytes, and the maximum length of the entire data packet can reach 65535 bytes.
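The 8-byte header is simple enough to build by hand. In this Python sketch the header holds four 16-bit fields: source port, destination port, total length, and checksum. The last two fields are standard UDP but are not spelled out in the text, and the checksum is left at 0 (meaning "not computed" for UDP over IPv4) to keep the example short. The function names and port numbers are invented.

```python
import struct

UDP_HEADER_LEN = 8
MAX_UDP_LEN = 65535


def build_udp(src_port: int, dst_port: int, data: bytes) -> bytes:
    """Prepend the 8-byte UDP header to the data."""
    length = UDP_HEADER_LEN + len(data)
    if length > MAX_UDP_LEN:
        raise ValueError("data too long for one UDP data packet")
    checksum = 0  # simplification: 0 means no checksum was computed
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + data


def parse_udp(packet: bytes):
    """Return (source port, destination port, data)."""
    src_port, dst_port, length, _checksum = struct.unpack(
        "!HHHH", packet[:UDP_HEADER_LEN])
    return src_port, dst_port, packet[UDP_HEADER_LEN:length]


packet = build_udp(50000, 53, b"hello")
print(len(packet))        # 13
print(parse_udp(packet))  # (50000, 53, b'hello')
```

On the receiving host, the destination port returned by `parse_udp` is what selects the application that gets the data.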

UDP is simple and easy to implement, but it has no acknowledgement mechanism: once a data packet is sent, the sender cannot know whether the other party received it, so reliability is poor. TCP was created to solve this problem and make transmission reliable. The Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based communication protocol. Put simply, TCP can be thought of as UDP with an acknowledgement mechanism: every data packet sent must be acknowledged. If a data packet is lost, no acknowledgement arrives, and the sender must resend it.
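The acknowledge-or-resend idea can be shown with a deliberately simplified model that sends one packet at a time and waits for its acknowledgement. Real TCP keeps many packets in flight and uses timers, so this is only a sketch of the principle; `send_reliably`, `make_lossy_channel`, and the retry limit are all hypothetical.

```python
def send_reliably(packets, channel, max_tries: int = 5) -> int:
    """Send packets one at a time, resending each until it is acknowledged.

    channel(seq, packet) is a hypothetical function that returns True when an
    acknowledgement came back and False when it did not.
    Returns the total number of transmissions.
    """
    transmissions = 0
    for seq, packet in enumerate(packets):
        for _ in range(max_tries):
            transmissions += 1
            if channel(seq, packet):
                break          # acknowledged: move on to the next packet
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return transmissions


def make_lossy_channel(lose_first_attempt_of):
    """Build a channel that loses the first attempt of the given packets."""
    already_lost = set()
    delivered = []

    def channel(seq, packet):
        if seq in lose_first_attempt_of and seq not in already_lost:
            already_lost.add(seq)
            return False       # packet lost, so no acknowledgement
        delivered.append(packet)
        return True

    return channel, delivered


channel, delivered = make_lossy_channel({1})
print(send_reliably([b"a", b"b", b"c"], channel))  # 4
print(delivered)                                   # [b'a', b'b', b'c']
```

Three packets cost four transmissions here because the second one had to be sent twice, yet the receiver still ends up with all three in order.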

To make transmission reliable, TCP adds a three-way handshake: before any data is formally sent or received, a reliable connection must be established with the other party. Since the process is fairly complicated, here is an informal picture of it:

Host A: I want to send data to you, okay?

Host B: Yes, when will you send it?

Host A: I will send it right away, so get ready to receive it!

After these three exchanges, host A formally sends data to host B. UDP, by contrast, is a connectionless protocol: it does not establish a connection with the other party but simply sends the data packet. TCP can therefore ensure that lost data packets are detected and resent, but this comes at a price: compared with UDP, TCP is more complex to implement, consumes more connection resources, and transmits more slowly.
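In protocol terms, the three exchanges are usually called SYN, SYN+ACK, and ACK, names the conversation above leaves out. The following Python sketch only computes the sequence and acknowledgement numbers each side would send; it does not open a real connection, and the starting numbers 1000 and 5000 are arbitrary.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def three_way_handshake(a_start: int, b_start: int):
    """Return the three handshake messages between host A and host B."""
    return [
        # A: "I want to send data to you, okay?"
        {"from": "A", "flags": "SYN", "seq": a_start, "ack": None},
        # B: "Yes", which also acknowledges A's starting number.
        {"from": "B", "flags": "SYN+ACK", "seq": b_start,
         "ack": (a_start + 1) % SEQ_MOD},
        # A: "I will send it right away", acknowledging B's starting number.
        {"from": "A", "flags": "ACK", "seq": (a_start + 1) % SEQ_MOD,
         "ack": (b_start + 1) % SEQ_MOD},
    ]


for message in three_way_handshake(1000, 5000):
    print(message)
```

Each acknowledgement number is the other side's sequence number plus one, which is how both hosts confirm that they heard each other.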

Like UDP data packets, TCP data packets consist of a header and data. The difference is that the TCP header has no field that limits the length of the packet, so in theory a TCP data packet can be arbitrarily long. In practice, to keep the network efficient, a TCP data packet is usually no longer than what fits in one IP data packet, so that a single TCP packet does not have to be split up again.

To sum up, the main job of the transport layer is to define ports, identify applications, and implement port-to-port communication. The TCP protocol additionally ensures reliable data transmission.

4. Application layer

In theory, with the three layers above, data can already travel from an application on one host to an application on another. At this point, however, the data is just a byte stream, which programs cannot readily interpret and which is hard to work with. The application layer therefore defines various protocols to standardize the data format. Common examples are HTTP, FTP, and SMTP. HTTP is a widely used application layer protocol, mainly for data communication in browser/server (B/S) architectures. Its message format is as follows:

In the request headers, Accept indicates the data format the client expects to receive, and Content-Type indicates the format of the data the client sends. In the response headers, Content-Type indicates the format of the server's response, which is generally consistent with the format given in the request's Accept header.

With this specification, the server can correctly parse the data sent by the client when it receives a request. After processing the request, it returns the result in the format the client asked for, and the client parses the result according to the format the server returned.
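To see these headers in context, the sketch below assembles a raw HTTP/1.1 request as bytes and reads the headers back. The host `example.com`, the path `/api`, and the JSON body are placeholders, and the two helper functions are written for this example rather than taken from any library.

```python
def build_request(method: str, path: str, host: str,
                  headers: dict, body: bytes = b"") -> bytes:
    """Assemble request line, headers, blank line, and body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


def parse_headers(message: bytes) -> dict:
    """Return the headers as a dict with lowercase names."""
    head, _, _ = message.partition(b"\r\n\r\n")
    result = {}
    for line in head.decode("ascii").split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        result[name.strip().lower()] = value.strip()
    return result


request = build_request(
    "POST", "/api", "example.com",
    {"Accept": "application/json", "Content-Type": "application/json"},
    b'{"a": 1}',
)
headers = parse_headers(request)
print(headers["accept"])        # application/json
print(headers["content-type"])  # application/json
```

Here the client both sends JSON (Content-Type) and asks for JSON back (Accept), so the server knows how to read the body and how to format its reply.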

Therefore, the main job of the application layer is to define the data format and interpret the data according to the corresponding format.

5. The whole process

First, let's sort out the responsibilities of each layer of the model:

  • Link layer: group 0s and 1s, define data frames, confirm the physical address of the host, and transmit data;
  • Network layer: define the IP address, confirm the network location of the host, perform MAC addressing through IP, and route and forward data packets bound for other networks;
  • Transport layer: define the port, confirm the identity of the application on the host, and deliver the data packet to the corresponding application;
  • Application layer: define the data format and interpret the data according to the corresponding format.

Then, to tie the responsibilities of each layer together in one easy-to-follow description:

When you enter a URL and press Enter, the application layer protocol first defines the format of the request; the transport layer protocol then adds the port numbers of both parties to identify the communicating applications; next, the network layer protocol adds the IP addresses of both parties to identify their network locations; finally, the link layer protocol adds the MAC addresses of both parties to identify their physical locations, groups the data into frames, and broadcasts them to the other host over the transmission medium. If the two hosts are on different network segments, the packet is first forwarded to the gateway router and, after being forwarded several times, finally reaches the target host. When the target machine receives the frames, it reassembles the data using the corresponding protocol and parses it layer by layer; the application layer protocol does the final parsing and hands the result to the server for processing.

6. Summary

The above is a brief introduction to the four-layer model of TCP/IP. In reality each layer has many protocols, and each protocol does a great deal. But it helps to start with a clear overall structure and a grasp of the basic role of each layer, and then fill in the details; that way it may all be easier to understand.

 

[Reprint|Original link: http://www.cnblogs.com/onepixel/p/7092302.html]
