Five-layer Internet Protocol Stack (1)-bottom-up

Source: Ruan Yifeng's web log

Table of Contents

I. Overview

1.1 Five-layer model

1.2 Layers and protocols

II. The physical layer

III. The link layer

3.1 Definition

3.2 Ethernet protocol

3.3 MAC address

3.4 Broadcast

IV. The network layer

4.1 The origin of the network layer

4.2 IP protocol

4.3 IP packet

4.4 ARP protocol

V. The transport layer

5.1 The origin of the transport layer

5.2 UDP protocol

5.3 TCP protocol

VI. The application layer


I. Overview

1.1 Five-layer model

The Internet is implemented in several layers. Each layer has its own function and, like the floors of a building, each layer is supported by the one below it.

The user touches only the top layer and does not feel the layer below. To understand the Internet, we must start from the bottom layer and understand the functions of each layer from the bottom up.

There are different models for how to divide the layers: some have seven layers and some have four. I think the Internet is easiest to explain when divided into five layers.

As shown in the figure above, the bottom layer is called the "Physical Layer" and the top layer is called the "Application Layer". The three layers in between (from bottom to top) are the "Link Layer", the "Network Layer", and the "Transport Layer". The lower the layer, the closer it is to the hardware; the higher the layer, the closer it is to the user.

What they are called does not really matter. Just know that the Internet is divided into several layers.

1.2 Layers and protocols

Each layer exists to perform one function. To achieve these functions, everyone needs to abide by common rules.

The rules that everyone follows are called "protocols".

Many protocols are defined at every layer of the Internet. Collectively, these protocols are called the "Internet Protocol Suite". They are the core of the Internet. The sections below introduce the function of each layer, focusing on the main protocols of that layer.

II. The physical layer

We start from the bottom layer.

If computers are to be networked, what is the first thing to do? Of course, the computers must first be connected, using optical fiber, cables, twisted pair, radio waves, and so on.

This is called the "physical layer", the physical means of connecting computers. It mainly specifies some electrical characteristics of the network, and its role is to transmit the electrical signals that represent 0 and 1.

III. The link layer

3.1 Definition

Bare 0s and 1s have no meaning on their own. A method of interpretation must be specified: how many electrical signals count as a group? What does each signal bit mean?

This is the function of the "link layer", which is above the "physical layer" and determines the grouping of 0 and 1.

3.2 Ethernet protocol

In the early days, each company had its own way of grouping electrical signals. Gradually, a protocol called "Ethernet" became dominant.

Ethernet stipulates that a group of electrical signals form a data packet, called a "frame". Each frame is divided into two parts: header (Head) and data (Data).

"Header" contains some description items of the data packet, such as sender, receiver, data type, etc.; "data" is the specific content of the data packet.

The "header" is fixed at 18 bytes (strictly speaking, a 14-byte header at the front plus a 4-byte check sequence at the end of the frame). The "data" part is at least 46 bytes and at most 1500 bytes long. Therefore, the shortest "frame" is 64 bytes and the longest is 1518 bytes. If the data is longer than that, it must be split across multiple frames.
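The size limits above can be sketched in Python. This is only an illustration of the arithmetic, not a real Ethernet implementation: the names `split_into_frames` and `frame_sizes` are made up for this example, the 18 bytes of overhead appear only as a count, and padding short data with zero bytes is an assumption of the sketch.

```python
from typing import List

HEADER_BYTES = 18      # per-frame overhead, as stated in the text
MIN_DATA_BYTES = 46    # shortest allowed "data" part
MAX_DATA_BYTES = 1500  # longest allowed "data" part


def split_into_frames(payload: bytes) -> List[bytes]:
    """Split a payload into "data" parts that each fit in one frame.

    A final part shorter than 46 bytes is padded with zero bytes
    (the padding value is an assumption made for this sketch).
    """
    parts = []
    for start in range(0, len(payload), MAX_DATA_BYTES):
        part = payload[start:start + MAX_DATA_BYTES]
        if len(part) < MIN_DATA_BYTES:
            part = part.ljust(MIN_DATA_BYTES, b"\x00")
        parts.append(part)
    return parts


def frame_sizes(payload: bytes) -> List[int]:
    """Return the total size in bytes of each frame needed for the payload."""
    return [HEADER_BYTES + len(part) for part in split_into_frames(payload)]
```

For example, `frame_sizes(b"x" * 3000)` returns `[1518, 1518]`: 3000 bytes of data need two full-size frames.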

3.3 MAC address

As mentioned above, the "header" of an Ethernet packet contains information about the sender and receiver. So, how are the sender and receiver identified?

Ethernet stipulates that all devices connected to the network must have a "network card" interface. Data packets must be transmitted from one network card to another. The address of the network card is the sending address and receiving address of the data packet, which is called the MAC address.

Every network card leaves the factory with a MAC address that is unique in the world. The address is 48 bits long and is usually written as 12 hexadecimal digits.

The first 6 hexadecimal digits identify the manufacturer, and the last 6 are the serial number that the manufacturer gives to that network card. With the MAC address, the network card can be located, and so can the path a data packet must take.
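As a small sketch of that structure, the following Python function splits a MAC address into its two halves. The function name `parse_mac` and the sample address used below are hypothetical and chosen only for illustration.

```python
def parse_mac(mac: str):
    """Split a MAC address into (manufacturer part, card serial part, integer value)."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    value = int(digits, 16)  # the whole address as one 48-bit integer
    return digits[:6], digits[6:], value
```

Calling `parse_mac("00:1A:2B:3C:4D:5E")` yields the manufacturer part `"001a2b"` and the serial part `"3c4d5e"`.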

3.4 Broadcast

Defining the address is only the first step. There are more steps to come.

First, how does one network card know the MAC address of another network card?

The answer is that the ARP protocol solves this problem. It is introduced later. For now, you only need to know that the receiver's MAC address must be known before an Ethernet data packet can be sent.

Secondly, even with the MAC address, how can the system accurately deliver the data packet to the receiver?

The answer is that Ethernet uses a very "primitive" method. Instead of sending the data packet to the receiver accurately, it sends it to all computers in the network so that each computer can determine whether it is the receiver.

In the above figure, computer No. 1 sends a data packet to computer No. 2, and computers No. 3, No. 4, and No. 5 in the same subnet all receive this packet as well. Each of them reads the "header" of the packet, finds the receiver's MAC address, and compares it with its own MAC address. If the two are the same, the computer accepts the packet for further processing; otherwise it discards the packet. This way of sending is called "broadcasting".
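The accept-or-discard decision described above can be simulated in a few lines of Python. This is a toy model under stated assumptions: the `Host` class, the `broadcast` function, and the dictionary used as a "frame" are all invented for this sketch and do not correspond to any real network API.

```python
class Host:
    def __init__(self, name, mac):
        self.name = name
        self.mac = mac.lower()
        self.accepted = []  # "data" parts of frames addressed to this host

    def receive(self, frame):
        """Accept the frame if its destination MAC is ours, otherwise discard it."""
        if frame["dst"].lower() == self.mac:
            self.accepted.append(frame["data"])
            return True
        return False


def broadcast(frame, hosts):
    """Hand the frame to every host except the sender; return who accepted it."""
    acceptors = []
    for host in hosts:
        if host.mac == frame["src"].lower():
            continue  # the sender does not deliver to itself
        if host.receive(frame):
            acceptors.append(host.name)
    return acceptors
```

With five hosts on the subnet and a frame sent from No. 1 to No. 2, every other host sees the frame but only No. 2 keeps it.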

With the definition of the data packet, the MAC address of the network card, and the broadcast transmission method, the "link layer" can transmit data between multiple computers.

IV. The network layer

4.1 The origin of the network layer

The Ethernet protocol relies on the MAC address to send data. In theory, the Shanghai network card can find the Los Angeles network card by relying solely on the MAC address, which is technically achievable.

However, this has a major disadvantage. Ethernet sends data packets by broadcast, so every member of the network receives a copy of each packet. This is not only inefficient, but also limited to the subnet where the sender is located. In other words, if two computers are not on the same subnet, the broadcast cannot reach from one to the other. This design is reasonable; otherwise every computer on the Internet would receive every packet, which would be a disaster.

The Internet is a huge network composed of countless subnets. It is hard to imagine computers in Shanghai and Los Angeles being on the same subnet; that is almost impossible.

Therefore, a method must be found to distinguish which MAC addresses belong to the same subnet and which do not. If the receiver is on the same subnet, the packet is sent by broadcast; otherwise it is sent by "routing". ("Routing" refers to how data packets are distributed to different subnets. This is a big topic and is not covered in this article.) Unfortunately, the MAC address itself cannot make this distinction. It is related only to the manufacturer, not to the network the device is on.

This led to the birth of the "network layer". Its function is to introduce a new set of addresses so that we can distinguish whether different computers belong to the same subnet. This set of addresses is called "network address", or "web address" for short.

Therefore, once the "network layer" exists, each computer has two kinds of addresses: a MAC address and a network address. There is no connection between the two. The MAC address is bound to the network card, and the network address is assigned by the administrator. The pairing between them is arbitrary.

The network address helps us determine the subnet where the computer is located, and the MAC address sends the data packet to the target network card in the subnet. Therefore, it can be logically inferred that the network address must be processed first, and then the MAC address.

4.2 IP protocol

The protocol that specifies the network address is called the IP protocol. The address it defines is called an IP address.

Currently, the fourth version of the IP protocol, referred to as IPv4, is widely used. This version stipulates that the network address consists of 32 binary bits.

Traditionally, we use decimal numbers divided into four segments to represent the IP address, from 0.0.0.0 to 255.255.255.255.

Every computer on the Internet is assigned an IP address. This address is divided into two parts: the first part represents the network, and the second part represents the host. For example, take the 32-bit IP address 172.16.254.1. If its network part is the first 24 bits (172.16.254), then its host part is the last 8 bits (the final 1). Computers on the same subnet must have the same network part in their IP addresses, which means that 172.16.254.2 should be on the same subnet as 172.16.254.1.

However, the problem is that we cannot judge the network part from the IP address alone. Let's take 172.16.254.1 as an example. Whether its network part is the first 24 bits, the first 16 bits, or even the first 28 bits, you can't tell from the IP address.

So, how can we judge from the IP address whether two computers belong to the same subnet? This requires another parameter, the "subnet mask".

The "subnet mask" is a parameter that describes the subnet. It has the same form as an IP address and is also a 32-bit binary number. Its network part is all 1s and its host part is all 0s. For example, if the IP address 172.16.254.1 is known to have a network part of the first 24 bits and a host part of the last 8 bits, then the subnet mask is 11111111.11111111.11111111.00000000, which is 255.255.255.0 in decimal.

Knowing the "subnet mask", we can determine whether any two IP addresses are on the same subnet. The method is to AND each IP address with the subnet mask (a result bit is 1 only if both input bits are 1, otherwise it is 0), and then compare the two results. If they are the same, the two addresses are on the same subnet; otherwise they are not.

For example, suppose the IP addresses 172.16.254.1 and 172.16.254.233 both have the subnet mask 255.255.255.0. Are they on the same subnet? ANDing each address with the subnet mask gives 172.16.254.0 in both cases, so they are on the same subnet.
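The AND comparison is easy to try out in Python. The sketch below uses the standard library's `ipaddress` module only to convert between dotted notation and integers; the function names `network_part` and `same_subnet` are invented for this example.

```python
import ipaddress


def network_part(ip: str, mask: str) -> str:
    """AND an IPv4 address with a subnet mask and return the result in dotted form."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Two addresses are on the same subnet if their ANDed results are equal."""
    return network_part(ip_a, mask) == network_part(ip_b, mask)
```

Here `network_part("172.16.254.233", "255.255.255.0")` returns `"172.16.254.0"`, and `same_subnet("172.16.254.1", "172.16.254.233", "255.255.255.0")` returns `True`, matching the worked example.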

To sum up, there are two main functions of the IP protocol. One is to assign IP addresses to each computer, and the other is to determine which addresses are on the same subnet.

4.3 IP packet

The data sent according to the IP protocol is called an IP packet. It is not difficult to imagine that it must include IP address information.

But as mentioned earlier, the Ethernet packet contains only the MAC address and has no field for the IP address. So do we need to modify the data definition and add another field?

The answer is no, we can put the IP data packet directly into the "data" part of the Ethernet data packet, so there is no need to modify the Ethernet specifications. This is the benefit of the hierarchical structure of the Internet: changes in the upper layer do not involve the structure of the lower layer.

Specifically, IP data packets are also divided into two parts: "header" and "data".

The "header" part mainly includes information such as version, length, and IP address, while the "data" part is the specific content of the IP data packet. After it is put into the Ethernet packet, the Ethernet packet becomes the following.

The length of the "header" part of an IP data packet is 20 to 60 bytes, and the total length of the entire data packet is a maximum of 65,535 bytes. Therefore, in theory, the "data" part of an IP data packet can be up to 65,515 bytes long. As mentioned earlier, the "data" part of an Ethernet packet is only 1500 bytes long. Therefore, if the IP data packet exceeds 1500 bytes, it needs to be divided into several Ethernet data packets and sent separately.
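To make the "header" plus "data" layout concrete, here is a sketch that builds and parses a minimal 20-byte IPv4 header with Python's `struct` module. Several things are assumptions of the sketch: the function names are invented, the header carries no options, the time-to-live value is arbitrary, and the checksum field is left at 0 rather than computed as a real implementation would.

```python
import socket
import struct

# version/IHL, type of service, total length, identification, flags/offset,
# time to live, protocol, checksum, source address, destination address
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")  # 20 bytes, no options


def build_ipv4_packet(src: str, dst: str, data: bytes) -> bytes:
    version_ihl = (4 << 4) | 5  # version 4, header length 5 x 4 = 20 bytes
    total_length = IPV4_HEADER.size + len(data)
    if total_length > 65535:
        raise ValueError("IP packet would exceed 65,535 bytes")
    header = IPV4_HEADER.pack(
        version_ihl, 0, total_length, 0, 0,
        64,   # time to live (arbitrary value for this sketch)
        17,   # protocol carried in the "data" part (17 = UDP)
        0,    # checksum left at 0; a real header carries a computed value
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return header + data


def parse_ipv4_packet(packet: bytes) -> dict:
    fields = IPV4_HEADER.unpack(packet[:IPV4_HEADER.size])
    header_length = (fields[0] & 0x0F) * 4
    return {
        "version": fields[0] >> 4,
        "header_length": header_length,
        "total_length": fields[2],
        "src": socket.inet_ntoa(fields[8]),
        "dst": socket.inet_ntoa(fields[9]),
        "data": packet[header_length:fields[2]],
    }
```

With a 20-byte header, the largest "data" part this sketch accepts is 65,515 bytes, which is the theoretical limit given above.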

4.4 ARP protocol

Regarding the "network layer", there is one last point to explain.

Because IP data packets are sent in Ethernet data packets, we must know two addresses at the same time, one is the other party's MAC address and the other is the other party's IP address. Normally, the other party's IP address is known (explained later), but we don't know its MAC address.

Therefore, we need a mechanism to get the MAC address from the IP address.

This can be divided into two situations. In the first case, if the two hosts are not in the same subnet, there is no way to get the MAC address of the other party. The only way is to send the data packet to the "gateway" where the two subnets connect, and let the gateway handle it.

In the second case, if the two hosts are on the same subnet, we can use the ARP protocol to get the other party's MAC address. ARP also sends out a data packet (carried inside an Ethernet data packet) that contains the IP address of the host being queried. The field for the other party's MAC address is filled with FF:FF:FF:FF:FF:FF, which marks the packet as a "broadcast". Every host on the subnet receives this packet, takes out the IP address, and compares it with its own. If the two match, the host replies and reports its MAC address to the sender; otherwise it discards the packet.
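The same-subnet case can be modelled as a toy lookup. In this sketch the subnet is just a dictionary from IP address to MAC address, and `arp_lookup` is a hypothetical name; real ARP exchanges binary packets over Ethernet, which is not shown here.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def arp_lookup(target_ip, subnet_hosts):
    """Simulate an ARP query on one subnet.

    subnet_hosts maps each host's IP address to its MAC address.
    Returns the MAC address of the host that owns target_ip, or None.
    """
    request = {"dst_mac": BROADCAST_MAC, "query_ip": target_ip}
    for ip, mac in subnet_hosts.items():
        # Every host receives the broadcast and compares the queried IP with its own.
        if ip == request["query_ip"]:
            return mac  # only the matching host replies
        # All other hosts silently discard the request.
    return None
```

A query for an address that no host on the subnet owns returns `None`, which corresponds to the first case above, where the packet has to go to the gateway instead.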

In short, with the ARP protocol, we can get the MAC address of the host in the same subnet, and can send the data packet to any host.

V. The transport layer

5.1 The origin of the transport layer

With the MAC address and IP address, we can already establish communication between any two hosts on the Internet.

The next problem is that there are many programs on the same host that need to use the Internet. For example, you browse the web while chatting online with your friends. When a data packet is sent from the Internet, how do you know whether it represents the content of a web page or the content of an online chat?

In other words, we also need a parameter that indicates which program (process) a data packet is intended for. This parameter is called the "port", which is in effect a number assigned to each program that uses the network card. Each data packet is sent to a specific port on the host, so different programs can each get the data they need.

"Port" is an integer between 0 and 65535, exactly 16 binary bits. Ports from 0 to 1023 are occupied by the system, and users can only choose ports greater than 1023. Whether it is browsing the web or chatting online, the application will randomly select a port and then contact the corresponding port of the server.

The function of "transport layer" is to establish "port-to-port" communication. In contrast, the function of the "network layer" is to establish "host-to-host" communication. As long as the host and port are determined, we can achieve communication between programs. Therefore, the Unix system refers to the host + port as a "socket". With it, you can develop network applications.
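The host + port pairing can be seen directly in Python's `socket` module. The sketch below sends one datagram from one socket to another on the same machine over the loopback address 127.0.0.1. The function name `send_over_loopback` is invented for the example, and letting the operating system choose the ports (by binding to port 0) is a choice made for this sketch.

```python
import socket


def send_over_loopback(message: bytes):
    """Send one UDP datagram to a port on this machine and read it back.

    Returns (data, receiver_address, sender_address), where each address
    is a (host, port) pair.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # do not wait forever if nothing arrives
        receiver_address = receiver.getsockname()
        sender.sendto(message, receiver_address)
        data, sender_address = receiver.recvfrom(65535)
        return data, receiver_address, sender_address
```

Both sockets share the same host address here, so the port number is the only thing that tells the two programs apart.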

5.2 UDP protocol

Now we must add port information to the data packet, which requires a new protocol. The simplest implementation is called the UDP protocol. Its format is little more than the data with the port numbers added in front.

UDP data packet is also composed of two parts: "header" and "data".

The "header" part mainly defines the sending port and the receiving port, and the "data" part is the specific content. Then, put the entire UDP data packet into the "data" part of the IP data packet. As mentioned earlier, the IP data packet is placed in the Ethernet data packet, so the entire Ethernet data packet now becomes the following:

The UDP data packet is very simple. The "header" part has only 8 bytes in total, and the total length does not exceed 65,535 bytes, which fits into an IP data packet.
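The 8-byte header is simple enough to build by hand. In the sketch below, the header consists of four 16-bit fields: source port, destination port, total length, and checksum. The function names are made up for this example, and the checksum is left at 0 as a simplification rather than computed.

```python
import struct

# source port, destination port, length of header + data, checksum
UDP_HEADER = struct.Struct("!HHHH")  # 8 bytes


def build_udp_packet(src_port: int, dst_port: int, data: bytes) -> bytes:
    length = UDP_HEADER.size + len(data)
    if length > 65535:
        raise ValueError("UDP packet would exceed 65,535 bytes")
    # Checksum is left at 0 in this sketch instead of being computed.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + data


def parse_udp_packet(packet: bytes):
    """Return (source port, destination port, data)."""
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack(packet[:UDP_HEADER.size])
    return src_port, dst_port, packet[UDP_HEADER.size:length]
```

The bytes returned by `build_udp_packet` are what would be placed in the "data" part of an IP data packet.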

5.3 TCP protocol

The advantage of the UDP protocol is that it is relatively simple and easy to implement, but the disadvantage is that it has poor reliability. Once a data packet is sent, it is impossible to know whether the other party has received it.

In order to solve this problem and improve network reliability, the TCP protocol was born. This protocol is very complex, but it can be approximated as a UDP protocol with a confirmation mechanism, which requires confirmation every time a data packet is sent. If a data packet is missing, the confirmation cannot be received, and the sender knows that it is necessary to resend the data packet.
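The "send, wait for confirmation, resend if none arrives" idea can be simulated without any real network. The following is a deliberately simplified stop-and-wait model, not how TCP is actually implemented: the function `send_reliably`, the `deliveries_lost` parameter that scripts which transmissions the simulated network drops, and the retry limit are all assumptions of this sketch.

```python
def send_reliably(packets, deliveries_lost, max_attempts=5):
    """Send packets one at a time, resending any that is not confirmed.

    deliveries_lost is a set of (packet_index, attempt_number) pairs that
    the simulated network drops; attempt numbers start at 1.
    Returns (received_packets, attempts_needed_per_packet).
    """
    received = []
    attempts_per_packet = []
    for index, packet in enumerate(packets):
        for attempt in range(1, max_attempts + 1):
            if (index, attempt) in deliveries_lost:
                continue  # no confirmation arrives, so the sender tries again
            received.append(packet)  # the receiver got it and confirms
            attempts_per_packet.append(attempt)
            break
        else:
            raise RuntimeError(
                f"packet {index} not confirmed after {max_attempts} attempts"
            )
    return received, attempts_per_packet
```

If the first two transmissions of the second packet are lost, `send_reliably([b"a", b"b", b"c"], {(1, 1), (1, 2)})` still delivers all three packets in order, and reports that the second one needed three attempts.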

Therefore, the TCP protocol can ensure that data will not be lost. Its disadvantages are that the process is complicated, it is difficult to implement, and it consumes more resources.

TCP data packets, like UDP data packets, are embedded in the "data" part of IP data packets. There is no limit on the length of TCP data packets, and theoretically it can be infinitely long, but in order to ensure the efficiency of the network, usually the length of TCP data packets will not exceed the length of IP data packets to ensure that a single TCP data packet does not need to be split.

VI. The application layer

The application program receives the "transport layer" data, and then it must be interpreted. Since the Internet is an open architecture, data sources are diverse, and the format must be specified in advance, otherwise it will be impossible to interpret.

The role of the "application layer" is to specify the data format of the application.

For example, the TCP protocol can transfer data for various programs, such as Email, WWW, FTP, and so on. Then, there must be different protocols that specify the format of email, web pages, and FTP data, and these application protocols constitute the "application layer."
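As one illustration of what "specifying the data format" means, the sketch below builds and parses a minimal HTTP request, the text format used for web pages. It is a simplified example rather than a complete HTTP implementation: the function names are invented, only a `GET` request with two headers is produced, and `example.com` is a placeholder host.

```python
def build_http_request(host: str, path: str = "/") -> bytes:
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_http_request(raw: bytes):
    """Return (method, path, version, headers) from a request built above."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, path, version, headers
```

Because both sides agree on this format in advance, the receiving program can tell the method, the path, and the headers apart in what would otherwise be an opaque run of bytes.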

This is the highest level, directly facing the user. Its data is placed in the "data" part of the TCP packet. Therefore, the current Ethernet packet becomes the following.

So far, the entire five-layer structure of the Internet has been explained from the bottom up. This is the system's perspective, explaining how the Internet is structured. In the next article, I will go in the opposite direction and look, from the user's point of view and from the top down, at how this structure completes a network data exchange.

Origin blog.csdn.net/weixin_37719279/article/details/82846226