Introduction to Internet Protocol (1)

We use the Internet every day. Have you ever wondered how it works?

Billions of computers around the world are connected, and any two of them can communicate. A network card in Shanghai sends a signal, and a network card in Los Angeles receives it, even though neither knows the other's physical location. Isn't that remarkable?

The core of the Internet is a series of protocols, collectively known as the "Internet Protocol Suite". They specify in detail how computers are connected and networked. Once you understand these protocols, you understand the principles of the Internet.

Below are my study notes. Because these protocols are large and complex, I wanted to put together a concise framework that helps me grasp them as a whole. To keep things simple and easy to understand, I have made many simplifications. Some parts are neither comprehensive nor precise, but they should explain the principles of the Internet clearly.

=================================================

Getting Started with Internet Protocols

I. Overview

1.1 Five-layer model

The Internet is implemented in several layers. Each layer has its own function and, like the floors of a building, each layer is supported by the layer below it.

Users touch only the top layer and never notice the layers below. To understand the Internet, you must start at the bottom and work out the function of each layer from the bottom up.

There are different ways to divide the layers: some models use seven layers, and some use four. I find the Internet easiest to explain with five layers.

[Figure: the five-layer model]

As shown in the figure above, the bottom layer is called the "Physical Layer", the top layer is called the "Application Layer", and the three layers in the middle (from bottom to top) are the "Link Layer", the "Network Layer", and the "Transport Layer". The lower the layer, the closer to the hardware; the higher the layer, the closer to the user.

It doesn't really matter what they're called. Just know that the Internet is divided into layers.

1.2 Layers and Protocols

Each layer serves a function. In order to achieve these functions, it is necessary for everyone to abide by common rules.

The rules that everyone follows are called "protocols".

Many protocols are defined at each layer of the Internet. Collectively they are called the "Internet Protocol Suite", and they are the core of the Internet. The sections below introduce the function of each layer, focusing on its main protocols.

II. Physical layer

We start with the bottom layer.

What is the first thing to do when computers need to be networked? Connect them, of course. You can use fiber-optic cable, electrical cable, twisted pair, radio waves, and so on.

[Figure: physical connections between computers]

This is the "physical layer": the physical means of connecting computers together. It mainly specifies the electrical characteristics of the network and is responsible for transmitting the electrical signals that represent 0s and 1s.

III. Link layer

3.1 Definitions

Raw 0s and 1s have no meaning on their own. A way of interpreting them must be specified: how many signals form a group, and what does each bit mean?

That is the function of the "link layer", which sits above the "physical layer" and determines how the 0s and 1s are grouped.

3.2 Ethernet Protocol

In the early days, each company had its own way of grouping electrical signals. Gradually, a protocol called "Ethernet" became dominant.

Ethernet specifies that a group of electrical signals forms a data packet called a "frame". Each frame is divided into two parts: the header (Head) and the data (Data).

[Figure: an Ethernet frame, made up of a header (Head) and data (Data)]

"Header" contains some description items of the data packet, such as sender, receiver, data type, etc.; "Data" is the specific content of the data packet.

The length of the "header" is fixed at 18 bytes (14 bytes in front of the data plus a 4-byte checksum that follows it, counted together here for simplicity). The "data" is at least 46 bytes and at most 1500 bytes long. Therefore, an entire "frame" is between 64 and 1518 bytes long. If the data is longer than that, it must be split across multiple frames.
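The size limits above can be illustrated with a small sketch. It is not a real Ethernet implementation: the function name `split_into_frames` and the constants are assumptions made for this example, and the 18 bytes of per-frame overhead are treated as an opaque block.

```python
HEADER_LEN = 18   # per-frame overhead, as described in the text
MIN_DATA = 46     # shortest allowed "data" part, in bytes
MAX_DATA = 1500   # longest allowed "data" part, in bytes


def split_into_frames(payload: bytes) -> list:
    """Split payload into chunks that each fit the "data" part of one frame.

    A chunk shorter than MIN_DATA is padded with zero bytes.
    """
    chunks = []
    for start in range(0, len(payload), MAX_DATA):
        chunk = payload[start:start + MAX_DATA]
        if len(chunk) < MIN_DATA:
            chunk = chunk + bytes(MIN_DATA - len(chunk))
        chunks.append(chunk)
    return chunks


frames = split_into_frames(bytes(4000))
frame_sizes = [HEADER_LEN + len(data) for data in frames]
print(frame_sizes)  # [1518, 1518, 1018]
```

A 4000-byte payload needs three frames here: two full ones and one partial one.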

3.3 MAC address

As mentioned above, the "header" of an Ethernet packet contains information about the sender and receiver. So, how are senders and receivers identified?

Ethernet stipulates that every device connected to the network must have a "network card" interface. Packets are sent from one network card to another. The addresses of the network cards serve as the sending and receiving addresses of the data packet; they are called MAC addresses.

Every network card leaves the factory with a globally unique MAC address. It is 48 bits long and is usually written as 12 hexadecimal digits.

[Figure: a MAC address]

The first six hexadecimal digits identify the manufacturer, and the last six are the serial number that the manufacturer assigned to the card. With the MAC address, you can locate the network card and determine the path of the data packet.
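As a rough illustration of this split, the sketch below separates a MAC address into its manufacturer and serial parts. The helper name `split_mac` and the sample address are assumptions made for this example only.

```python
def split_mac(mac: str) -> tuple:
    """Return (manufacturer_part, serial_part) of a MAC address string."""
    digits = mac.replace(":", "").replace("-", "").upper()
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # First six hex digits: manufacturer; last six: serial number.
    return digits[:6], digits[6:]


manufacturer, serial = split_mac("00:B0:D0:86:BB:F7")  # hypothetical address
print(manufacturer, serial)  # 00B0D0 86BBF7
```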

3.4 Broadcast

Defining the address is only the first step; there is more to do.

First of all, how does one NIC know the MAC address of another NIC?

The answer is the ARP protocol, which is introduced later. All you need to know here is that the receiver's MAC address must be known before an Ethernet packet can be sent.

Second, even with the MAC address, how can the system accurately deliver the packet to the receiver?

The answer is that Ethernet uses a very "primitive" method. It does not deliver the data packet precisely to the receiver; instead, it sends the packet to every computer on the network and lets each computer decide for itself whether it is the receiver.

[Figure: computer No. 1 broadcasting a data packet on a subnet]

In the above figure, computer No. 1 sends a data packet to computer No. 2, and computers No. 3, 4, and 5 in the same subnet will all receive this packet. They read the "header" of the packet, find the recipient's MAC address, and compare it with their own MAC address. If the two are the same, they accept the packet for further processing, otherwise discard the packet. This way of sending is called "broadcasting".
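The accept-or-discard decision can be sketched as a toy simulation. The host names, MAC addresses, and the `broadcast` function are all hypothetical; a real network card makes this comparison in hardware.

```python
# Hypothetical MAC addresses for the five computers in the example.
HOSTS = {
    "computer1": "02:00:00:00:00:01",
    "computer2": "02:00:00:00:00:02",
    "computer3": "02:00:00:00:00:03",
    "computer4": "02:00:00:00:00:04",
    "computer5": "02:00:00:00:00:05",
}


def broadcast(frame, hosts):
    """Show the frame to every other host; return who accepts and who discards."""
    accepted, discarded = [], []
    for name, mac in hosts.items():
        if mac == frame["src"]:
            continue  # the sender does not process its own frame
        if mac == frame["dst"]:
            accepted.append(name)    # receiver address matches: keep the frame
        else:
            discarded.append(name)   # not addressed to this host: drop it
    return accepted, discarded


frame = {"src": HOSTS["computer1"], "dst": HOSTS["computer2"], "data": b"hello"}
accepted, discarded = broadcast(frame, HOSTS)
print(accepted)   # ['computer2']
print(discarded)  # ['computer3', 'computer4', 'computer5']
```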

With the definition of the data packet, the MAC address of the network card, and the sending method of the broadcast, the "link layer" can transmit data between multiple computers.

IV. Network layer

4.1 The origin of the network layer

The Ethernet protocol relies on MAC addresses to send data. In theory, MAC addresses alone would be enough for a network card in Shanghai to find a network card in Los Angeles; this is technically possible.

However, doing so has a major disadvantage. Ethernet sends data packets by broadcast, so every member of the network receives a copy of every packet. This is not only inefficient but also limited to the subnet where the sender is located. In other words, if two computers are not on the same subnet, a broadcast cannot pass between them. This design is reasonable; otherwise every computer on the Internet would receive every packet, which would be a disaster.

The Internet is a giant network composed of countless sub-networks. It is almost unimaginable that computers in Shanghai and Los Angeles would be on the same sub-network.

[Figure: the Internet as a collection of sub-networks]

Therefore, we need a way to tell which MAC addresses belong to the same subnet and which do not. If two computers are on the same subnet, packets are sent by broadcast; otherwise, they are sent by "routing". ("Routing" means forwarding packets between different subnets; it is a big topic that this article does not cover.) Unfortunately, MAC addresses alone cannot do this: a MAC address depends only on the manufacturer, not on the network where the card is located.

This led to the birth of the "network layer". Its role is to introduce a new set of addresses that lets us tell whether different computers belong to the same subnet. These addresses are called "network addresses".

Therefore, once the "network layer" exists, each computer has two kinds of addresses: a MAC address and a network address. The two are unrelated. The MAC address is bound to the network card, the network address is assigned by an administrator, and the pairing between them is arbitrary.

The network address helps us determine which subnet the computer is on, and the MAC address sends the packet to the target NIC in that subnet. Therefore, it can be logically deduced that the network address must be processed first, and then the MAC address.

4.2 IP Protocol

The protocol that specifies the network address is called the IP protocol. The address it defines is called an IP address.

Currently, the fourth version of the IP protocol, or IPv4 for short, is widely used. This version specifies that network addresses consist of 32 binary bits.

[Figure: a 32-bit IPv4 address]

Conventionally, we represent IP addresses as decimal numbers divided into four segments, from 0.0.0.0 all the way up to 255.255.255.255.
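The relationship between the 32 bits and the four decimal segments can be shown with a short sketch. The function names `to_dotted` and `from_dotted` are made up for this example; Python's standard `ipaddress` module offers the same conversions.

```python
def to_dotted(value: int) -> str:
    """Format a 32-bit integer as four decimal segments."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("an IPv4 address must fit in 32 bits")
    # Each segment is one byte, taken from the most significant end first.
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def from_dotted(text: str) -> int:
    """Parse four decimal segments back into a 32-bit integer."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not an IPv4 address: {text!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part
    return value


print(to_dotted(0))                                  # 0.0.0.0
print(to_dotted(0xFFFFFFFF))                         # 255.255.255.255
print(format(from_dotted("172.16.254.1"), "032b"))   # the same address as 32 bits
```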

Every computer on the Internet is assigned an IP address. The address has two parts: the first part identifies the network and the second part identifies the host. Take the 32-bit IP address 172.16.254.1 as an example. If its network part is the first 24 bits (172.16.254), then its host part is the last 8 bits (the final 1). Computers on the same subnet must have the same network part in their IP addresses, so 172.16.254.2 should be on the same subnet as 172.16.254.1.

However, the IP address alone does not tell us which bits form the network part. For 172.16.254.1, the network part could be the first 24 bits, the first 16 bits, or even the first 28 bits; the address itself does not say.

So how can we tell from their IP addresses whether two computers belong to the same subnet? This requires another parameter, the "subnet mask".

The "subnet mask" is a parameter that describes the subnet. It has the same form as an IP address: a 32-bit binary number, with all 1s in the network part and all 0s in the host part. For example, for the IP address 172.16.254.1, if it is known that the network part is the first 24 bits and the host part is the last 8 bits, then the subnet mask is 11111111.11111111.11111111.00000000, which is 255.255.255.0 in decimal.

Knowing the "subnet mask", we can tell whether any two IP addresses are on the same subnet. The method is to AND each IP address with the subnet mask (a result bit is 1 only if both input bits are 1; otherwise it is 0) and compare the results. If the results are the same, the two addresses are on the same subnet; otherwise, they are not.

For example, suppose the IP addresses 172.16.254.1 and 172.16.254.233 both have the subnet mask 255.255.255.0. Are they on the same subnet? ANDing each address with the subnet mask gives 172.16.254.0 in both cases, so they are on the same subnet.
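The AND comparison in this example can be reproduced in a few lines. This is a minimal sketch using Python's standard `ipaddress` module only to convert between text and integers; the function names `network_part` and `same_subnet` are assumptions made for illustration.

```python
import ipaddress


def network_part(ip: str, mask: str) -> str:
    """AND an IP address with a subnet mask and return the result as text."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_bits & mask_bits))


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Two addresses share a subnet if their masked results are equal."""
    return network_part(ip_a, mask) == network_part(ip_b, mask)


print(network_part("172.16.254.1", "255.255.255.0"))                    # 172.16.254.0
print(same_subnet("172.16.254.1", "172.16.254.233", "255.255.255.0"))   # True
print(same_subnet("172.16.254.1", "172.16.253.1", "255.255.255.0"))     # False
```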

To sum up, the IP protocol has two main functions: assigning an IP address to each computer, and determining which addresses are on the same subnet.

4.3 IP packets

The data sent according to the IP protocol is called an IP packet. It is not difficult to imagine that it must include IP address information.

But as mentioned earlier, Ethernet packets only contain MAC addresses, and there is no field for IP addresses. So do we need to change the Ethernet data definition and add another field?

The answer is no. We can put the IP packet directly into the "data" part of the Ethernet packet, so the Ethernet specification does not need to change at all. That is the beauty of the Internet's layered structure: changes in an upper layer do not affect the structure of the layers below.

Specifically, IP packets are also divided into two parts, "header" and "data".

[Figure: an IP packet, made up of a header and data]

The "header" part mainly includes information such as version, length, and IP addresses, and the "data" part is the actual content of the IP packet. After the IP packet is placed inside an Ethernet packet, the Ethernet packet looks like this.

[Figure: an IP packet inside the data part of an Ethernet packet]

The "header" of an IP packet is 20 to 60 bytes long, and the whole packet is at most 65,535 bytes long. So, in theory, the "data" part of an IP packet can be up to 65,515 bytes long. As mentioned earlier, the "data" part of an Ethernet packet holds at most 1500 bytes. Therefore, an IP packet larger than 1500 bytes must be split across several Ethernet packets and sent in pieces.
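The arithmetic of this splitting can be sketched as follows. The sketch rests on assumptions the text does not spell out: a minimal 20-byte header with no options, each piece carrying its own copy of the IP header, and every piece except the last carrying a multiple of 8 data bytes (which is how IPv4 fragmentation counts offsets). The function name `fragment_sizes` is made up for this example.

```python
IP_HEADER = 20            # assumed: minimal IPv4 header, no options
ETHERNET_MAX_DATA = 1500  # the "data" part of one Ethernet packet


def fragment_sizes(total_length: int) -> list:
    """Return the number of data bytes carried by each piece of an IP packet."""
    if not IP_HEADER <= total_length <= 65535:
        raise ValueError("total length must be between 20 and 65535 bytes")
    per_piece = ETHERNET_MAX_DATA - IP_HEADER  # room left after the header
    per_piece -= per_piece % 8                 # offsets are counted in 8-byte units
    remaining = total_length - IP_HEADER
    sizes = []
    while remaining > 0:
        take = min(per_piece, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


sizes = fragment_sizes(65535)
print(len(sizes), sizes[0], sizes[-1])  # 45 1480 395
```

Under these assumptions, the largest possible IP packet needs 45 Ethernet packets, while a 1500-byte IP packet fits in one.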

4.4 ARP protocol

There is one last point to note about the "network layer".

Because IP packets are sent inside Ethernet packets, we must know two addresses at the same time: the other party's MAC address and its IP address. Usually, the IP address is known (explained later), but the MAC address is not.

So, we need a mechanism to be able to get the MAC address from the IP address.

There are two cases. In the first case, the two hosts are not on the same subnet. Then there is no way to obtain the other host's MAC address, and the data packet can only be sent to the "gateway" that connects the two subnets and left for the gateway to handle.

In the second case, the two hosts are on the same subnet, and we can use the ARP protocol to obtain the other host's MAC address. ARP also sends a data packet (carried inside an Ethernet packet) that contains the IP address of the host being queried. The destination MAC address field is filled with FF:FF:FF:FF:FF:FF, which marks it as a "broadcast" address. Every host on the subnet receives this packet, extracts the IP address from it, and compares it with its own IP address. If the two match, the host replies and reports its MAC address to the sender; otherwise, it discards the packet.
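The second case can be sketched as a toy lookup. This only simulates the query-and-reply logic described above, not the real ARP packet format; the host table, its addresses, and the function name `arp_query` are hypothetical.

```python
BROADCAST_MAC = "FF:FF:FF:FF:FF:FF"

# Hypothetical hosts on one subnet, as (ip, mac) pairs.
SUBNET_HOSTS = [
    ("172.16.254.1", "02:00:00:00:00:01"),
    ("172.16.254.2", "02:00:00:00:00:02"),
    ("172.16.254.233", "02:00:00:00:00:E9"),
]


def arp_query(target_ip, hosts):
    """Broadcast a query for target_ip; return the MAC address in the reply."""
    request = {"dst_mac": BROADCAST_MAC, "target_ip": target_ip}
    for ip, mac in hosts:               # the broadcast reaches every host
        if ip == request["target_ip"]:  # only the owner of the IP answers
            return mac
        # every other host discards the request
    return None                         # nobody on this subnet has that IP


print(arp_query("172.16.254.233", SUBNET_HOSTS))  # 02:00:00:00:00:E9
print(arp_query("172.16.254.77", SUBNET_HOSTS))   # None
```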

In short, with the ARP protocol we can obtain the MAC address of a host on the same subnet, and so we can send data packets to any host.

V. Transport layer

5.1 The origin of the transport layer

With the MAC address and IP address, we can already establish communication between any two hosts on the Internet.

The next problem is that many programs on the same host need to use the network. For example, you may be browsing the web while chatting with friends online. When a packet arrives from the Internet, how do you know whether it holds the content of a web page or of an online chat?

That is to say, we need one more parameter to indicate which program (process) a packet is meant for. This parameter is called the "port"; it is effectively a number assigned to each program that uses the network card. Each data packet is sent to a specific port on the host, so different programs each get the data they need.

"Port" is an integer between 0 and 65535, which fits exactly in 16 bits. Ports 0 to 1023 are reserved for the system, and users can only choose ports above 1023. Whether browsing the web or chatting online, the application picks a random port and then contacts the corresponding port on the server.

The function of the "transport layer" is to establish "port-to-port" communication. In contrast, the function of the "network layer" is to establish "host-to-host" communication. As long as the host and port are determined, we can communicate between programs. Therefore, Unix systems call the combination of host and port a "socket". With sockets, network applications can be developed.
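How a port directs a packet to the right program can be sketched with a lookup table. This is a toy model, not how an operating system actually implements sockets; the port numbers, program names, and the `deliver` function are assumptions made for this example.

```python
# Hypothetical table of which program on this host is using which port.
LISTENERS = {
    50123: "web browser",
    50456: "chat client",
}


def deliver(packet, listeners):
    """Return the program that should receive the packet, or None."""
    port = packet["dst_port"]
    if not 0 <= port <= 65535:
        raise ValueError("a port must fit in 16 bits")
    return listeners.get(port)


socket_address = ("172.16.254.1", 50123)  # host + port together form a "socket"
packet = {
    "dst_host": socket_address[0],
    "dst_port": socket_address[1],
    "data": b"<html>...</html>",
}
print(deliver(packet, LISTENERS))  # web browser
```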

5.2 UDP protocol

Now we have to add port information to the packet, which requires a new protocol. The simplest implementation is the UDP protocol, whose format does little more than put the port numbers in front of the data.

UDP packets are also composed of "header" and "data".

[Figure: a UDP packet, made up of a header and data]

The "header" part mainly defines the sending port and the receiving port, and the "data" part is the actual content. The entire UDP packet is then placed in the "data" part of the IP packet and, as mentioned earlier, the IP packet is placed in the Ethernet packet, so the entire Ethernet packet now looks like this:

[Figure: a UDP packet inside an IP packet inside an Ethernet packet]

UDP packets are very simple: the "header" is only 8 bytes in total, and the total length does not exceed 65,535 bytes, so a UDP packet fits into an IP packet.
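The 8-byte header can be made concrete with Python's standard `struct` module. The header holds four 16-bit fields: sending port, receiving port, total length, and checksum. The function names below are assumptions made for illustration, and the checksum is left at 0 (meaning "not computed") to keep the sketch short.

```python
import struct

UDP_HEADER_FORMAT = "!HHHH"  # four unsigned 16-bit fields, network byte order


def build_udp(src_port: int, dst_port: int, data: bytes) -> bytes:
    """Prepend an 8-byte UDP header to data."""
    length = 8 + len(data)  # header plus data
    if length > 65535:
        raise ValueError("a UDP packet cannot exceed 65,535 bytes")
    checksum = 0            # simplification: checksum not computed
    header = struct.pack(UDP_HEADER_FORMAT, src_port, dst_port, length, checksum)
    return header + data


def parse_udp(packet: bytes) -> tuple:
    """Return (src_port, dst_port, data) from a UDP packet."""
    src_port, dst_port, length, _checksum = struct.unpack(
        UDP_HEADER_FORMAT, packet[:8]
    )
    return src_port, dst_port, packet[8:length]


packet = build_udp(50123, 53, b"hello")
print(len(packet))        # 13
print(parse_udp(packet))  # (50123, 53, b'hello')
```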

5.3 TCP protocol

The advantage of the UDP protocol is that it is relatively simple and easy to implement; its disadvantage is poor reliability. Once a data packet is sent, there is no way to know whether the other party received it.

To solve this problem and improve reliability, the TCP protocol was created. TCP is very complex, but it can be thought of, roughly, as UDP with an acknowledgment mechanism: every data packet sent must be acknowledged. If a packet is lost, no acknowledgment arrives, and the sender knows it must resend the packet.

Therefore, TCP can ensure that data is not lost. Its disadvantages are that the process is complicated, it is hard to implement, and it consumes more resources.
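The acknowledge-and-resend idea can be sketched as a tiny simulation. This is a deliberately simplified model that sends one packet at a time and waits for its acknowledgment; real TCP is far more elaborate. The function name `send_reliably` and the way losses are scripted through `drop_attempts` are assumptions made for this example.

```python
def send_reliably(packets, drop_attempts, max_tries=5):
    """Send packets one by one, resending each until it is acknowledged.

    drop_attempts is a set of (packet_index, attempt_number) pairs that the
    simulated network loses. Returns a log of what happened.
    """
    log = []
    for index, _packet in enumerate(packets):
        for attempt in range(1, max_tries + 1):
            lost = (index, attempt) in drop_attempts
            log.append((index, attempt, "lost" if lost else "acked"))
            if not lost:
                break  # acknowledgment received, move on to the next packet
        else:
            raise RuntimeError(f"packet {index} was never acknowledged")
    return log


log = send_reliably([b"a", b"b", b"c"], drop_attempts={(1, 1), (1, 2)})
for entry in log:
    print(entry)
```

In this run, the second packet is lost twice and only gets through on its third attempt, while the other two packets are acknowledged immediately.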

TCP packets, like UDP packets, are embedded in the "data" part of IP packets. TCP packets have no length limit and can in theory be infinitely long. In practice, to keep the network efficient, a TCP packet usually does not exceed the length of an IP packet, so that a single TCP packet does not need to be split.

VI. Application layer

The application receives data from the "transport layer" and must then interpret it. Because the Internet is an open architecture and data comes from many different sources, the data format must be agreed on in advance; otherwise the data cannot be interpreted at all.

The role of the "application layer" is to specify the data format of the application.

For example, TCP can carry data for many kinds of programs, such as email, WWW, and FTP. There must therefore be different protocols that specify the formats of email, web pages, and FTP data, and these application protocols make up the "application layer".

This is the highest layer, directly facing the user. Its data is placed in the "data" part of the TCP packet. Therefore, the Ethernet packet now looks like this.

[Figure: application data inside a TCP packet inside an IP packet inside an Ethernet packet]
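The nesting of the layers can be sketched in a few lines. The headers here are placeholder byte strings rather than real header formats, and the application data is an assumed example of a web request; the point is only the order in which each layer wraps the one above it.

```python
def wrap(header: bytes, data: bytes) -> bytes:
    """Put data behind a header, forming the packet of the next layer down."""
    return header + data


app_data = b"GET / HTTP/1.1\r\n\r\n"                 # application layer (assumed example)
tcp_packet = wrap(b"[TCP header]", app_data)         # transport layer
ip_packet = wrap(b"[IP header]", tcp_packet)         # network layer
frame = wrap(b"[Ethernet header]", ip_packet)        # link layer

print(frame)
```

The receiver undoes the wrapping in the opposite order: it strips the Ethernet header first, then the IP header, then the TCP header, and hands what remains to the application.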

So far, the five-layer structure of the Internet has been explained from the bottom up. This is the system's view of how the Internet is built. In the next article, I will go the other way and look, from the user's point of view, at how this structure is used top-down to complete a network data exchange.

