[Linux] Network layer protocol: IP

We must accept criticism, because it helps us escape the illusion of narcissism, so that we do not stay morally and intellectually intoxicated for long and end up destroying ourselves through self-admiration. In fact, we are far more hypocritical and dark than we imagine.



1. The relationship between IP and TCP (TCP provides the strategy, IP provides the capability)

1.
We studied TCP earlier. When explaining it, we simply said, from a god's-eye view, that host A sends data segments to host B. But does host A really send data segments directly to the other side? It does not: the transport layer hands its data segments down to the network layer. So what role does TCP play in moving data across the network? What is the core job of the network layer? And how should we understand these two layers of the protocol stack?

2.
The network layer provides the ability to send a data packet from host A to host B across the network, but does that mean it always succeeds? In the real world nothing is guaranteed 100%; everything happens with some probability. For example, Zhang San, the top math student in your class, has the ability to score 150 (full marks) on every math test. Does that mean he is guaranteed to score 150 every time? Not necessarily! We can only say he has a very high probability of scoring 150. Now suppose Zhang San's father is the school principal and demands a 150 on every test: whenever Zhang San falls short, his father voids the test and has him retake it until he scores 150. That is the strategy his father provides to Zhang San.
In the same way, the network layer provides the ability to send a data packet from host A to host B across the network, but that does not mean it is guaranteed to succeed! The network might be congested, so that packets are lost on a large scale. The receiver's capacity might be too small while the sender transmits too fast, so that the receiver cannot keep up and discards many packets. Or the other side might not reply for a long time; what should the sender do then? So the network layer faces not only the task of sending packets across the network, but also the various problems that can arise during transmission. Who solves those problems? The Transmission Control Protocol, TCP.

3.
So we say TCP provides the strategy for sending data packets across the network: timeout retransmission, acknowledgments, flow control, congestion control, the sliding window, piggybacked acknowledgments, delayed acknowledgments, and so on. When packets sent over the network run into reliability problems, TCP decides how to handle them.
The IP layer provides the ability to send data packets across the network. For example, the IP layer looks up the routing table using the destination IP in the header to determine the packet's next hop. IP is only responsible for delivering the packet to the next hop. Any problems that arise along the way are handled by the strategies TCP provides, and IP itself does not try to fix them.
Therefore, TCP provides the strategy for transmitting data across networks, while IP provides the ability to do so. Combined, the two ensure that data packets are reliably delivered from host A to host B across the network. This is one reason many people call the suite TCP/IP: together, these two protocols reliably get data packets to the target host across the network, which is the essence of network communication.

4.
So how does the IP layer send data packets across the network? (A simplified view.)
The destination IP is central to path selection: it determines how the packet will be routed. The hosts and routers in the figure below are called nodes. Each node determines the packet's next hop from the destination IP (in practice, by looking it up in a routing table). An IP address is made up of a target network part and a target host part. The leading bits of the 32-bit destination IP identify the target network, and the remaining bits identify the target host within that network. Because a LAN contains many hosts, the target host still has to be located within the target network segment after the packet reaches it.


5.
The picture below is a photo of my router, on which you can see the router's IP address and MAC address. Routing is not limited to routers: a host can also route data packets and select a route for each packet it sends.


2. Understanding the IP header (datagram orientation revisited)

The three header fields highlighted in green are covered at the end of the article.


1.
For any protocol, two questions always come first: how is the header separated from the payload, and to which upper-layer protocol is the payload delivered?
The 4-bit header length field works the same way as in TCP: it gives the header length in units of 4 bytes. The largest 4-bit value is 15, so an IP header is 20 to 60 bytes long. Like TCP, IP has optional fields (options) after the fixed 20-byte header, and a packet may or may not include them, depending on requirements. IP also has a 16-bit total length field covering header plus payload. So the receiver reads 4-bit header length × 4 bytes to get the IP header, then takes 16-bit total length minus the header length to get the payload. These two fields are enough to separate header and payload.
The 8-bit protocol field identifies the upper-layer protocol carried in the payload: ICMP is 0000 0001 (1), TCP is 0000 0110 (6), and UDP is 0001 0001 (17). Using this field, IP hands the payload up to the right upper-layer protocol.
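The steps above can be sketched in Python:
- multiply the header length by 4 to find the header;
- subtract the header length from the total length to find the payload;
- use the protocol field to pick the upper layer.

This is an illustrative parser under simple assumptions. It reads only the fixed 20-byte part, does not verify the checksum or handle fragmentation, and the sample packet is hand-built with made-up addresses.

```python
import struct

PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

def parse_ipv4(packet: bytes):
    """Split an IPv4 packet into (selected header fields, payload)."""
    # Fixed 20-byte header; "!" means network (big-endian) byte order.
    (ver_ihl, tos, total_len, _ident, _flags_frag,   # ident/flags: end of article
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4        # 4-bit header length, unit = 4 bytes
    payload = packet[header_len:total_len]   # total length minus header length
    fields = {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "tos": tos,
        "total_len": total_len,
        "ttl": ttl,
        "protocol": PROTOCOLS.get(proto, proto),  # who gets the payload
        "checksum": checksum,
        "src": ".".join(map(str, src)),
        "dst": ".".join(map(str, dst)),
    }
    return fields, payload

# Hand-built sample: 20-byte header (checksum left as 0) + 4 payload bytes.
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 24, 0, 0, 64, 17, 0,
                  bytes([192, 168, 1, 10]), bytes([10, 0, 8, 2]))
fields, payload = parse_ipv4(hdr + b"abcd")
print(fields["protocol"], fields["header_len"], payload)  # UDP 20 b'abcd'
```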

2.
Why do both UDP and IP have a 16-bit total length field in their headers, while TCP does not? Both can also determine their header length: UDP's header has a fixed length (8 bytes), and IP has the 4-bit header length field.
The reason is simple. UDP is datagram-oriented, and IP must also deal in whole packets; IP is not byte-stream-oriented. So even when TCP runs on top of IP, its data still travels as discrete packets once it reaches the IP layer, however byte-stream-oriented TCP itself is! When the peer receives an IP packet, it must determine exactly where that packet's payload is before delivering it to the TCP layer. At the TCP layer, the payload is no longer treated as a unit, because TCP is byte-stream-oriented. How to interpret those bytes is up to the application layer.
So the difference between datagram-oriented UDP and byte-stream-oriented TCP can be seen from two angles: whether the header fields give the payload size, and which socket interfaces are used for communication.
(1) A datagram-oriented protocol, after separating header from payload, must hand exactly that payload to the upper layer, because it is a discrete message. Each complete message's payload is delivered as a unit, and messages between the two sides correspond one-to-one.
A byte-stream-oriented protocol, after separating header and payload, does not need to hand each payload to the upper layer as a unit; it simply appends the payload to the receive buffer. Once payloads accumulate in the receive buffer, there is no way to tell which bytes came from which message. TCP does not care, because that is not TCP's problem to solve. How to interpret the bytes is up to the application layer.
(2) recv/send versus recvfrom/sendto show not only that TCP is connection-oriented and UDP is connectionless, but also that one is byte-stream-oriented and the other datagram-oriented. sendto must be given the destination socket address of each datagram, and recvfrom reports the source socket address of each datagram it returns, because every datagram stands on its own. When a datagram arrives, the peer's application layer must be able to obtain the payload of exactly one complete message from the transport layer. This mirrors how IP delivers exactly one packet's payload to its upper layer. Both are datagram-oriented: they deal in complete messages, not byte streams!
recv and send, by contrast, operate over an established TCP connection. send only needs to keep pushing data into the connection as a byte stream: it copies data into the local send buffer, and TCP carries it into the peer's receive buffer. It does not care whether the upper layer on the other side can recover the payload of a complete message. TCP's attitude is: I am byte-stream-oriented, so whether your upper layer can reassemble complete messages is none of my business; my job is to move bytes as a stream. The so-called TCP "sticky packet" problem (message boundaries not being preserved) is not TCP's problem. Solving it is the application layer's job!
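Since TCP leaves message boundaries to the application layer, one common application-level fix is length-prefixed framing. The sketch below assumes a hypothetical 4-byte big-endian length prefix, which is only one of several possible conventions. It simulates recv() returning the byte stream in arbitrary pieces, without opening real sockets.

```python
import struct

def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message

class Deframer:
    """Reassemble whole messages from arbitrary chunks of a byte stream."""

    def __init__(self) -> None:
        self.buffer = bytearray()  # plays the role of the receive buffer

    def feed(self, chunk: bytes) -> list[bytes]:
        self.buffer += chunk
        messages = []
        while len(self.buffer) >= 4:
            (length,) = struct.unpack("!I", self.buffer[:4])
            if len(self.buffer) < 4 + length:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self.buffer[4:4 + length]))
            del self.buffer[:4 + length]
        return messages

# Two messages sent back to back, then read in awkward pieces,
# the way recv() on a TCP socket is allowed to return them.
stream = frame(b"hello") + frame(b"world!")
d = Deframer()
received = []
for chunk in (stream[:3], stream[3:11], stream[11:]):
    received += d.feed(chunk)
print(received)  # [b'hello', b'world!']
```

The same idea works with a delimiter instead of a length prefix. Either way, the application, not TCP, decides where one message ends.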

3.
The 16-bit header checksum is filled in by the sender. On receipt, the receiver recomputes the checksum over the IP header only, not the payload, using the 16-bit one's-complement sum defined for IP (it is not a CRC). If the check shows that the header was corrupted in transit, the packet is discarded.
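For reference, the IPv4 header checksum is the 16-bit one's-complement sum described in RFC 1071. Below is a minimal Python sketch. The sample header bytes are illustrative, and the checksum field (bytes 10-11) is zeroed before the sender computes it.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """16-bit one's-complement checksum over an IPv4 header."""
    if len(header) % 2:          # IPv4 headers are multiples of 4 bytes anyway
        header += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:           # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Sender: compute over the header with the checksum field set to 0.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
csum = ipv4_checksum(header)
print(hex(csum))  # 0xb861

# Receiver: with the checksum filled in, an intact header checks to 0.
filled = header[:10] + struct.pack("!H", csum) + header[12:]
print(ipv4_checksum(filled))  # 0
```

Any nonzero result on the receiving side means the header was damaged, and the packet is dropped.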
If the sending TCP layer receives no acknowledgment for a long time, it triggers timeout retransmission and resends one or more data segments in its sliding window. The IP protocol itself therefore does not guarantee reliability. Reliability problems that arise as data crosses the network are not IP's concern: packet loss, out-of-order segments, bit flips, duplicate segments. TCP handles them with segment reordering and deduplication, timeout retransmission, and so on, while IP is only responsible for carrying data across the network.

4.
The 8-bit TTL (Time To Live) field limits the number of router hops a datagram may traverse before reaching the destination host.
In a complex network topology, nobody can guarantee that a datagram will reach its destination safely and without error. A datagram may be forwarded around a ring indefinitely (a routing loop). Such a packet is useless, because it will never arrive. Path selection can also go wrong: an optimal path may exist, yet network conditions send the datagram on a long detour, so it takes a very long time to arrive. Such a packet is nearly useless too: it wastes time and lowers the overall efficiency of data transmission. It is better to retransmit the datagram once network conditions improve and let it be routed along a better path.
So routers need the ability to discard such packets outright, and this is exactly what the 8-bit TTL provides. Once a packet has passed through as many routers as its TTL allows, the next router stops forwarding it and discards it.
The sender sets the initial TTL, commonly 64 (the Linux default; Windows uses 128). Each router that forwards the packet decrements the TTL by 1. When the TTL reaches 0, the router discards the packet and typically returns an ICMP Time Exceeded message to the sender. In practice a packet crosses far fewer than 64 routers on its way to the destination. A packet that has used up a TTL of 64 almost certainly went wrong somewhere during forwarding! TTL therefore stops routing loops from running forever and keeps invalid packets from piling up in the network.
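A toy simulation shows how TTL bounds a routing loop. The router names are hypothetical, and the initial TTL of 64 is the common default mentioned above. The sketch models only the decrement-and-drop rule, not the ICMP message.

```python
def forward_in_loop(ttl: int, loop: list[str]) -> tuple[str, int]:
    """Forward a packet around a routing loop until its TTL runs out.

    Returns the router that dropped the packet and how many routers handled it.
    """
    if not loop:
        raise ValueError("loop must contain at least one router")
    hops = 0
    while True:
        router = loop[hops % len(loop)]  # the packet keeps circling the ring
        hops += 1
        ttl -= 1                         # each router decrements TTL by 1
        if ttl <= 0:                     # TTL exhausted: drop, do not forward
            return router, hops

print(forward_in_loop(64, ["R1", "R2", "R3"]))  # ('R1', 64)
```

Without the TTL check, the loop above would never end, which is exactly what happens to a packet caught in a real routing loop.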


5.
When we wrote TCP socket code earlier, the port number the server passed to bind was really handed to the server host's transport layer. The transport layer uses it to deliver incoming messages to the right process. The destination IP address the client specified was really handed to the client host's network layer. The network layer uses it to route the IP packet across the network to the destination host.

6.
The 4-bit version field is normally 0100 (4), meaning IPv4. IPv6 and IPv4 are not compatible: IPv6 uses a different header format with version 6, so one cannot simply replace the other. For IPv4 packets the field is always 4, so in practice it is a fixed value.
The 8-bit type of service (TOS) field lets the IP layer offer different kinds of service depending on the needs of the application layer. It consists of:
- a 3-bit precedence field (now ignored);
- a 4-bit TOS field;
- 1 unused bit (which must be 0).

At most one of the 4 TOS bits may be set to 1. They request, respectively, minimum delay, maximum throughput, maximum reliability, and minimum cost. Applications can set the TOS field according to their needs.
For example, interactive login programs such as ssh and telnet want minimum delay, while a file transfer program such as ftp wants maximum throughput.
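Here is a small decoder for the TOS byte, following the classic layout described above (3-bit precedence, 4 TOS bits, 1 unused bit). It is a sketch for illustration only. The bit values come from the classic RFC 1349 definition, and modern networks usually reinterpret this byte as DSCP and ECN. On Linux, an application can request a TOS value with setsockopt(IPPROTO_IP, IP_TOS, ...).

```python
# Classic meaning of the four TOS bits, as values within the 8-bit TOS byte.
TOS_BITS = {
    0x10: "minimize delay",          # e.g. ssh, telnet
    0x08: "maximize throughput",     # e.g. ftp data transfer
    0x04: "maximize reliability",
    0x02: "minimize monetary cost",
}

def decode_tos(tos: int) -> dict:
    """Split a TOS byte into precedence, requested service and unused bit."""
    flags = [name for bit, name in TOS_BITS.items() if tos & bit]
    if len(flags) > 1:
        raise ValueError("at most one TOS bit may be set")
    return {
        "precedence": tos >> 5,                     # top 3 bits, largely ignored
        "service": flags[0] if flags else "normal",
        "unused_bit": tos & 0x01,                   # must be 0
    }

print(decode_tos(0x10))  # {'precedence': 0, 'service': 'minimize delay', 'unused_bit': 0}
```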

3. Network segment division

1. Why divide networks into segments? (Divide and conquer, so that the Internet can locate a target host quickly)

1.
For example, a school is divided into many colleges, and each college into many majors. Each major has students in different years, and each student has a student number. To manage all its students, the school must first be able to find any given student. To make this easier, the school sets up many positions, such as class monitor and student union president.
Suppose Zhang San from the Faculty of Science finds a student card. He knows it does not belong to the Faculty of Science, because student card numbers are structured. The first few digits encode the student's college and major, and the remaining digits encode information such as the enrollment year and the student's number within the major. So Zhang San can tell at a glance that the card is not from his college, and he hands it to his college's student union president. The student union presidents of all the colleges share a group chat. There the president from Electronic Information Engineering sees the card and says, "Isn't this one of ours?" He takes the card back to his college and asks the presidents of each major which major it belongs to. One major's president claims it and posts in his major's group: "Who lost a student card? Come and claim it!" Finally, Li Si gets his student card back.


2.
The key detail in this story is how candidates are ruled out as the card travels:
- When the card reaches the group of college presidents and the Electronic Information Engineering president recognizes it, every other college is ruled out at once.
- When it reaches that college's group of major presidents, every major other than Li Si's is ruled out at once.
- Finally, in the group for Li Si's major, students check one by one until Li Si confirms the card is his.

So many non-target candidates can be eliminated at once because of divide and conquer: a lot of partitioning has been done at the macro level. The Internet is divided into network segments. A country is divided into provinces, provinces into cities, cities into districts or counties, and districts or counties into streets. Why divide things this way? So that the country can quickly locate any person and thus manage its population.
The same is true for the Internet. Why divide it into network segments? So that a host can be located quickly. Each step of the search rules out entire subnets at once instead of checking hosts one by one, which makes the search efficient. This is why an IP address is split into a target network part and a target host part.

2. How are subnets divided?

2.1 Classful addressing (routers build LANs)

1.
An IP address is divided into two parts: the network number and the host number. The router is the key device for building a LAN. A router bridges at least two subnets, so it belongs to both network segment 1 and network segment 2. The network numbers of segment 1 and segment 2 must differ; otherwise, the router cannot tell which of its segments a packet should be forwarded to! Each network segment a router bridges must therefore have a different network number. A router's address in its LAN is often the network number with host number 1 (for example, 192.168.1.1). Not every router is configured this way, though; it depends on the network administrator's settings and the specific topology. Most home routers, however, follow this convention.

Below is the IP configuration of my laptop when it is connected to a mobile hotspot. The default gateway address is 192.168.20.219, not the usual ".1" address. Here the gateway is the phone sharing the hotspot, and the address simply reflects how that device assigns addresses. The ".1" convention is a common default, not a rule.

2.
A network segment is simply a group of hosts that share the same network number, with an egress router for the group. A new host added to the segment must have the same network number as the other hosts, but a unique host number. By assigning network and host numbers sensibly, every host on the interconnected network gets a distinct IP address. Two hosts may even share a host number, as long as their network numbers differ.
Managing the IP addresses in a subnet by hand is tedious, so DHCP (Dynamic Host Configuration Protocol) is used to assign host IP addresses dynamically. The device that manages a subnet's IP addresses is usually its router, and today's mainstream routers have built-in DHCP.

3.
Classful addressing is an old scheme. It still exists, but we won't focus on it. Instead, we will study the newer scheme, which uses a subnet mask to divide network segments flexibly.
The classful scheme shown below is very coarse-grained. For example, a Class A network can hold 16,777,216 (2^24) host addresses, but almost no LAN has more than 16 million hosts. So few organizations apply for Class A addresses, and most of the addresses in a Class A block are wasted. There are only about 4.3 billion IPv4 addresses, not even one per person on Earth, so wasting them like this is a serious problem. In practice, Class B addresses were the most requested: each Class B network holds 65,536 host addresses. The result was a shortage of Class B addresses alongside large-scale waste of Class A addresses. The classful scheme is simply not flexible enough and easily wastes IPv4 addresses.
Therefore, someone proposed a new network segment division scheme, CIDR.
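Under classful addressing, the class of an address can be read from its leading bits: 0 for A, 10 for B, 110 for C, 1110 for D, 1111 for E. That amounts to checking the first octet. Here is a minimal sketch using addresses that appear elsewhere in this article. The host-bit counts are the fixed classful ones that make the scheme so coarse.

```python
def address_class(ip: str) -> str:
    """Classful address class, decided by the first octet's leading bits."""
    first = int(ip.split(".")[0])
    if first < 128:   # leading bit 0
        return "A"
    if first < 192:   # leading bits 10
        return "B"
    if first < 224:   # leading bits 110
        return "C"
    if first < 240:   # leading bits 1110 (multicast)
        return "D"
    return "E"        # leading bits 1111 (reserved)

# Fixed number of host bits per class: no flexibility at all.
HOST_BITS = {"A": 24, "B": 16, "C": 8}

for ip in ("10.0.8.2", "140.252.20.0", "192.168.1.1"):
    cls = address_class(ip)
    print(ip, cls, 2 ** HOST_BITS[cls], "addresses per network")
```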


2.2 CIDR (introducing the subnet mask)

1.
CIDR (Classless Inter-Domain Routing) does not divide network addresses into classes. Unlike classful addressing, CIDR makes the network prefix variable-length, written for example as 192.168.101.26/24. The part before / is the IP address, and the number after / is the count of 1 bits in the subnet mask. Here that means the first 24 bits of the address are the network number. Converted to a subnet mask, /24 is 255.255.255.0, and the network number of 192.168.101.26/24 is 192.168.101.0.
The network number is obtained by a bitwise AND of the IP address and the subnet mask. A subnet mask is a run of 1 bits followed by a run of 0 bits, reading from left to right. Changing how many leading bits are 1 changes the length of the network number, and with it the length of the host number. So adjusting the number of leading 1 bits in the mask adjusts both the network number and how many hosts that network can hold. This greatly improves IPv4 address utilization and avoids wasting addresses.
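The bitwise AND described above can be written directly. In this minimal sketch, the helper names are my own. A /N prefix is turned into a mask of N one-bits followed by zero-bits, and the mask is then ANDed with the address.

```python
def ip_to_int(ip: str) -> int:
    """Dotted-decimal IPv4 address -> 32-bit integer."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_ip(n: int) -> str:
    """32-bit integer -> dotted-decimal IPv4 address."""
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def prefix_to_mask(prefix_len: int) -> int:
    """/24 -> 0xFFFFFF00: prefix_len one-bits followed by zero-bits."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF

def network_number(ip: str, prefix_len: int) -> str:
    """Network number = IP address AND subnet mask."""
    return int_to_ip(ip_to_int(ip) & prefix_to_mask(prefix_len))

print(int_to_ip(prefix_to_mask(24)))         # 255.255.255.0
print(network_number("192.168.101.26", 24))  # 192.168.101.0
```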

2.
Next, let's work through two concrete subnets and see their network numbers and how many hosts each can hold.
In example 1, the bitwise AND of the IP address and the subnet mask gives 140.252.20.0, which is the network number. The last 8 bits of the mask are 0, so the host number is the last 8 bits. The subnet's address range is therefore 140.252.20.0 to 140.252.20.255, with 2^8 = 256 possible host numbers.
In example 2, the subnet mask in binary is 11111111.11111111.11111111.11110000, so the network number is the first 28 bits and the host number is the last 4 bits. The bitwise AND of the IP address and the mask gives the network number 140.252.20.64. With only 4 host bits there are 2^4 = 16 host numbers, so the subnet covers just 16 IPv4 addresses (140.252.20.64 to 140.252.20.79).
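The two examples can be checked with Python's standard ipaddress module. One assumption: the text does not give the figure's host address, so 140.252.20.68 is used here as a hypothetical address consistent with both network numbers.

```python
import ipaddress

# Hypothetical host address, consistent with both network numbers in the text.
HOST = "140.252.20.68"

for mask in ("255.255.255.0", "255.255.255.240"):
    net = ipaddress.ip_interface(f"{HOST}/{mask}").network  # address AND mask
    print(f"mask {mask}: network {net.network_address}, "
          f"range {net.network_address} - {net.broadcast_address}, "
          f"{net.num_addresses} addresses")
```

The first line of output reports 140.252.20.0 with 256 addresses, and the second reports 140.252.20.64 with 16 addresses, matching the two examples.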


3.
After the bitwise AND, the host number can range from all 0s to all 1s, which gives the number of IPv4 addresses the subnet contains. The all-0s and all-1s host numbers have special uses:
- The all-0s host number is reserved and never assigned to a host; it denotes the network itself.
- The all-1s host number is also reserved and never assigned; it is the broadcast address. A message sent to it is received by every host in the network segment.

Likewise, loopback addresses such as 127.0.0.1 cannot be used on the public network; they refer only to the local host itself. So not all 2^32 addresses are usable on the public Internet, because some are reserved for special purposes.
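The two reserved addresses fall out of simple bit operations. AND with the mask clears the host bits, giving the network address. OR with the inverted mask sets them all, giving the broadcast address. Here is a sketch with hypothetical helper names. It is valid for prefixes up to /30; /31 and /32 are special cases not covered here.

```python
def to_int(ip: str) -> int:
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def to_ip(n: int) -> str:
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def reserved_addresses(ip: str, prefix_len: int) -> tuple[str, str, int]:
    """Return (network address, broadcast address, usable host count)."""
    if not 0 <= prefix_len <= 30:
        raise ValueError("sketch only handles prefixes /0 to /30")
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    network = to_int(ip) & mask                  # host bits all 0
    broadcast = network | (~mask & 0xFFFFFFFF)   # host bits all 1
    usable = 2 ** (32 - prefix_len) - 2          # minus the two reserved addresses
    return to_ip(network), to_ip(broadcast), usable

print(reserved_addresses("192.168.101.26", 24))
# ('192.168.101.0', '192.168.101.255', 254)
```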

4.
Below are the IPv4 address and subnet mask of my Windows machine's wireless LAN adapter. The subnet mask shows a 24-bit network number and an 8-bit host number. So the segment can theoretically hold 256 addresses, which is typical of most home wireless LANs. Not all 256 can be assigned to hosts, though, because some have special purposes. For example, the network address and the broadcast address are reserved and never assigned to hosts in the LAN.
Strictly speaking, my computer should not have a public IP, so how can it see one? My computer is on my home LAN; private and public IPs are explained below. In fact, the public IP seen from a LAN host is the WAN-port IP of the router that connects the LAN to the public network. When the LAN communicates with the public network, its traffic appears to come from that address.


4. Private IP and public IP

1. Limitation on the number of IP addresses and division of private IP addresses

1.
An IPv4 address is a 4-byte (32-bit) integer, so there are at most 2^32, about 4.29 billion, IPv4 addresses, which is not enough! Moreover, IP addresses are assigned not per host but per network interface card (NIC): each NIC on a network needs at least one IP address. A NIC usually has just one, but it can be given several, for example when it must communicate with multiple subnets (network segments). Today's mainstream laptops typically have two NICs integrated on the motherboard, one wired and one wireless. If every networked device were given a globally unique IP address, there would definitely not be enough!

2.
Some may ask: don't we have CIDR? CIDR solves the problem of how to divide network segments. It improves address utilization so that fewer addresses are wasted, but it cannot raise the absolute upper limit on the number of IP addresses. So it does not solve the shortage.

3.
There are three common ways to solve or alleviate the IP address shortage:
(1) DHCP allocates IP addresses dynamically, and only to devices that are currently connected; a computer that is not connected holds no IP address. This alleviates the shortage somewhat, but the effect is modest.
(2) In today's Internet, the mainstream solution is NAT combined with private IP addresses. NAT translates a private IP address into a public IP address so that the host can access services on the public network. Private IP addresses can be reused on a large scale, and after translation, many LAN hosts end up sharing one public IP to reach the public network. NAT is very important and is covered in detail later; this is just a brief overview.
(3) The most direct solution is IPv6, which raises the absolute upper limit on the number of IP addresses. For various reasons, however, IPv6 has not yet replaced IPv4 worldwide. Even so, many companies in China already use IPv6 internally and translate IPv6 addresses to IPv4 when accessing the public network. IPv6 may become universal one day, but for now we should learn IPv4 first.

4.
Think of the IP address space as a big cake. Part of it goes to private IPs and part to public IPs. Another part is not used by end users at all, only by intermediate network nodes such as routers and base stations.
A LAN is not directly connected to the public network, so in theory it could use any IP addresses. However, RFC 1918 specifies the address ranges to use when building a private network, and we call these private IPs. When we said earlier that IP addresses are unique, we meant that public IPs are unique; private IPs can be reused. That is what eases the shortage: huge numbers of LAN hosts reuse the same private addresses.
Private IP addresses fall into three blocks:
(1) 10.*, i.e. 10.0.0.0/8: the first 8 bits are the fixed network number, giving 16,777,216 addresses. The remaining 24 bits can be split into network and host numbers with a subnet mask (common in company intranets).
(2) 172.16.* to 172.31.*, i.e. 172.16.0.0/12: the first 12 bits are fixed, giving 1,048,576 addresses. The remaining 20 bits can be split into network and host numbers with a subnet mask (common in company intranets and schools).
(3) 192.168.*, i.e. 192.168.0.0/16: the first 16 bits are fixed, giving 65,536 addresses. The remaining 16 bits can be split into network and host numbers with a subnet mask (common in home networks).
The second block has 16 times as many addresses as the third, because its fixed prefix covers 16 values of the second octet (172.16 through 172.31), each with 65,536 addresses.
In total, the three private blocks contain 16,777,216 + 1,048,576 + 65,536 = 17,891,328 addresses (about 17.9 million). The rest of the 2^32 space, minus other special-purpose ranges such as loopback and multicast, is public address space.
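Here is a quick sanity check of these numbers and of membership in the three RFC 1918 blocks, using the standard ipaddress module. The blocks are listed explicitly because the module's own is_private property also covers other special ranges such as loopback. The sample addresses are the cloud server addresses shown below, plus two boundary cases for the 172.16.0.0/12 block.

```python
import ipaddress

# The three private blocks defined by RFC 1918.
PRIVATE_BLOCKS = [ipaddress.ip_network(n)
                  for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(ip: str) -> bool:
    """True if ip falls inside one of the three RFC 1918 blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in PRIVATE_BLOCKS)

sizes = [block.num_addresses for block in PRIVATE_BLOCKS]
print(sizes, sum(sizes))  # [16777216, 1048576, 65536] 17891328
print(is_rfc1918("10.0.8.2"), is_rfc1918("43.142.224.5"))      # True False
print(is_rfc1918("172.31.255.255"), is_rfc1918("172.32.0.1"))  # True False
```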

5.
Below is a screenshot of the private and public IPs of my cloud server. It has the private IP 10.0.8.2 and the public IP 43.142.224.5. Its subnet mask is 255.255.252.0 (binary 11111111.11111111.11111100.00000000). The last 10 bits are the host number, so the subnet holds at most 1,024 addresses, and the first 22 bits give the network number 10.0.8.0. I have deployed many services on this server, such as SSH (port 22), FTP (21), SFTP (22) and TELNET (23). SFTP normally runs over SSH on its default port 22 and is more secure than traditional FTP.
Cloud servers are often configured with multiple network interfaces to support multiple IP addresses. Examples are a public IP for the public network and a private IP for the provider's internal network (here, Tencent's).


2. The process of sending intranet data packets to the public network

2.1 NAT technology

1.
The following is the process of sending datagrams from the intranet to the server on the public network. From this picture, we can get a lot of information and clarify many confusing concepts.
(1) Two connected network segments cannot share the same network number, while two unconnected segments can.
Two interconnected segments are bridged by a router. When that router receives a datagram from outside and must forward it to one of the segments it manages, it cannot decide where to send it if those segments all have the same network number. That is why nobody configures a router's LANs this way.
If there is no bridging device between two segments, they use different routers. Take the two segments on the left of the figure below: both have the network number 192.168.1.0, but they are not connected, so neither router is ever unsure which segment a datagram belongs to. Each router manages only its own home LAN and the two LANs have no overlap, so the two home LANs may use the same network number.
(2) The home router bridges two subnets: the LAN inside the home, and the LAN formed by the operator's router and other routers.
A device like a router typically bridges many subnets, so it has multiple network interfaces, each attached to a different subnet and able to connect to a different network segment.
The router in the figure bridges two segments, so it needs an IP address in each of them. For example, the home router on the right of network segment 1 has the intranet IP 192.168.1.1 in the home LAN and the intranet IP 10.1.1.2 in the LAN formed between routers. The former is usually called the LAN port IP: the router's own IP in the LAN it manages internally. The latter is usually called the WAN port IP: the router's own IP when it acts as a host in an outer LAN.
Likewise, the operator's router on the right of network segment 1 has its own LAN port IP and WAN port IP. It manages other routers internally, so its LAN port IP is 10.1.1.1, and toward the outside, as a host, it has the public WAN port IP 122.77.241.4.
(3) The same private IP can appear in different LANs.
If network segments are not bridged to each other, it is perfectly normal for the same private IP to appear in each, and all of them can still access the public network, because unconnected segments do not affect each other.
If segments are bridged to each other, their network numbers must not overlap, and the same private IP must not appear in both. For example, suppose network segment 1 is 192.168.0.0 with subnet mask FF FF 00 00 and network segment 2 is 192.168.101.0 with subnet mask FF FF FF 00, and both contain a host with the IP 192.168.101.26. When the bridging router forwards an incoming message to that address, it ANDs the address with each mask: 192.168.101.26 AND 255.255.0.0 = 192.168.0.0, and 192.168.101.26 AND 255.255.255.0 = 192.168.101.0. The address matches both segments, so the router cannot tell which host should get the message. This must not happen!
Therefore, segments that are bridged to each other must not contain the same private IP. The router can prevent this, because it assigns addresses through DHCP.
Looking at all LANs together, whether bridged or not, the three classes of private IP addresses can be reused widely and frequently. This helps solve the shortage of IPv4 addresses: large numbers of private IPs are reused across different network segments, which saves many public IPs for use on the public network, while all of these hosts can still access the same server on the public network.
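Here is the AND test from point (3) as a small Python sketch. network_number, matching_segments and SEGMENTS are hypothetical names; the two segments and masks are the ones from the example.

```python
import ipaddress

def network_number(ip: str, mask: str) -> str:
    """Bitwise AND of an IPv4 address with a subnet mask."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))

# The two bridged segments from the example: (name, network number, mask).
SEGMENTS = [
    ("segment 1", "192.168.0.0", "255.255.0.0"),
    ("segment 2", "192.168.101.0", "255.255.255.0"),
]

def matching_segments(dest_ip: str) -> list:
    """Names of all segments whose network number equals dest_ip AND mask."""
    return [name for name, net, mask in SEGMENTS
            if network_number(dest_ip, mask) == net]

if __name__ == "__main__":
    print(matching_segments("192.168.101.26"))  # matches both segments
    print(matching_segments("192.168.5.9"))     # matches only segment 1
```

Real routers break such ties by choosing the longest matching prefix, but then any host in segment 1 that uses a 192.168.101.x address becomes unreachable, which is why bridged segments are kept from overlapping in the first place.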

2.
When a datagram from the LAN is sent to the public network, its source IP address is replaced along the way. The green arrows in the figure show the path of the datagram. I will skip routing here (it is covered in detail later) and only describe what happens at each next hop. When the datagram leaves the client host, it first reaches the egress router of its LAN, the home router in the figure, which replaces the source IP field in the IP header with its own WAN port IP 10.1.1.2. When the home router passes the datagram to the operator's router, that router in turn replaces the source IP with its own WAN port IP 122.77.241.4. Finally, the operator's router sends the message on to the server host on the public network.
This technique of repeatedly replacing the source IP in a datagram with the router's WAN port IP is NAT. It helps solve the IPv4 address shortage for two reasons: first, dividing networks into LANs lets a large number of private IPs be reused; second, NAT lets hosts with private IPs access servers on the public network. Strictly speaking, NAT is not what lets a datagram from a private IP reach a server on the public network; that is achieved by the destination IP together with the routing table of each node. NAT is usually deployed as NAPT, an extension of NAT that translates port numbers as well as IP addresses. So what NAT really solves is how the reply gets back from the server, not how the datagram is forwarded from the internal network to the public network.
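A minimal sketch of the NAPT idea for a single router: outbound packets get their (private IP, port) rewritten to (WAN IP, allocated port), and the translation table is what lets the reply find its way back. NaptRouter, Packet, the starting port 40000 and the server address 203.0.113.10 (a documentation-only range) are all hypothetical; real NAPT entries also track the protocol and the remote endpoint and expire after a timeout, which this toy ignores.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class NaptRouter:
    """Toy NAPT: maps (LAN ip, LAN port) <-> (WAN ip, allocated port)."""

    def __init__(self, wan_ip: str, first_port: int = 40000):
        self.wan_ip = wan_ip
        self.next_port = first_port
        self.out_map: Dict[Tuple[str, int], int] = {}  # (lan ip, lan port) -> wan port
        self.in_map: Dict[int, Tuple[str, int]] = {}   # wan port -> (lan ip, lan port)

    def outbound(self, pkt: Packet) -> Packet:
        """Rewrite the source of a LAN -> public packet, creating an entry if needed."""
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return replace(pkt, src_ip=self.wan_ip, src_port=self.out_map[key])

    def inbound(self, pkt: Packet) -> Packet:
        """Rewrite the destination of a reply using the table built by outbound()."""
        if pkt.dst_ip != self.wan_ip or pkt.dst_port not in self.in_map:
            raise LookupError("no translation entry; packet is dropped")
        lan_ip, lan_port = self.in_map[pkt.dst_port]
        return replace(pkt, dst_ip=lan_ip, dst_port=lan_port)

if __name__ == "__main__":
    home = NaptRouter("10.1.1.2")  # WAN port IP of the home router in the figure
    print(home.outbound(Packet("192.168.1.2", 5000, "203.0.113.10", 80)))
    print(home.outbound(Packet("192.168.1.3", 5000, "203.0.113.10", 80)))
    print(home.inbound(Packet("203.0.113.10", 80, "10.1.1.2", 40001)))
```

Both LAN hosts use source port 5000, yet they get different WAN ports, so the router can still tell their replies apart even though both share the single source IP 10.1.1.2.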


2.2 Questions that arise

1. Can a router maintain multiple Lan port IPs or multiple Wan port IPs?

A router bridges different subnets, so it sits in several network segments at once and must have its own IP address in each of them, and these addresses must differ. A router may manage several network segments and may also belong to several LANs formed by multiple routers, so it may maintain multiple LAN port IPs or multiple WAN port IPs. To give a router multiple IP addresses, you simply configure it with additional network interfaces.

2. Why does NAT technology need to continuously replace the source ip?

Without source IP replacement, the server would see a private source address that it cannot route a reply to, so the LAN host would effectively have no public IP. Source IP replacement does not help route the datagram to the next hop; that is handled by the destination IP and each node's routing table. Its value lies elsewhere: it addresses the shortage of IPv4 addresses by letting different LAN hosts share one public IPv4 address.

Multiple LAN hosts share one public IPv4 address: this is the greatest value of NAT. It eases the IPv4 shortage because many private IPs can share the same public IPv4 address.

Hiding the internal network topology: this can be regarded as a side effect. It is worth knowing, but the function above is the real value of NAT, so make sure you remember that one! When an internal device accesses the Internet through a NAT router, the outside world sees only the NAT router's public IP address, not the device's real IP address, so the internal network topology stays hidden.

3. Why can the host on the LAN still find the public IP?

In fact, every host in a LAN can look up its own public IP. In the figure below, the public IP of the four client hosts on the left is the WAN port IP 122.77.241.4/24 of the operator's router: when their datagrams are finally forwarded to the public network, the source IP is replaced with the WAN port IP of the router connected to the public network, so all of these LAN hosts share the same public IP. The same holds for the two client hosts on the right: they share the WAN port IP 122.77.241.5/24 of the router on the right that connects to the public network.


4. Will the hosts in the LAN share a public IP?

They certainly will! The IPv4 shortage can be eased because LANs use private IP addresses that can be reused widely and frequently. The hosts with these private IPs still need to access the Internet, so each of them needs a corresponding public IP, yet there are only about 4.3 billion IPv4 addresses in total. So many hosts in a LAN must share one public IP.
Below are the public IPs I got when my phone and my laptop were both connected to my home LAN and each visited the same public-IP lookup website. The public IP of my phone is the same as that of my laptop. I also checked my sister's and my mother's phones, and their public IPs are the same as those of my two devices!


5. Can hosts in two LANs communicate directly?

Hosts in two different LANs cannot communicate directly by sending datagrams to each other. As we said, in most cases the destination IP must be a public IP, and private IPs are heavily duplicated across LANs. If a datagram's destination IP is a private IP, the datagram has no way of knowing where to go, because many LANs in the network may contain a host with that private address. So the hosts in the two LANs below cannot communicate directly.

6. What is the logic of chatting with my friends on QQ? My friends and I are all in the LAN!

We wrote the socket code for a UDP chat room earlier, and its logic is similar to QQ's.
Both the QQ client and the QQ server are written by Tencent. When we log in to QQ, the local client process establishes a TCP connection with Tencent's QQ server. When you send a message to a friend, you do not send it to your friend directly: the message first goes to Tencent's server, which then pushes it to your friend. If your friend is online and has also established a connection with the server, your friend receives the forwarded message immediately. If your friend is offline, the server caches the messages and pushes them the next time your friend comes online.
The push mechanism is simple. Once you log in to QQ, the server establishes a connection with you, giving a socket sockfd1 for communicating with you, and the server temporarily stores your message in a buffer. Once your friend's QQ client has also connected to the server, there is a second socket sockfd2 for communicating with your friend, and the server only needs to call send on sockfd2 with the buffered data to push your message to your friend.
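To make the store-and-forward logic concrete, here is an in-memory Python sketch. It is not Tencent's implementation: RelayServer is a hypothetical name, and each user's connection (sockfd1, sockfd2 above) is modelled as a plain list, with append standing in for calling send on that socket.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Message = Tuple[str, str]  # (sender, text)

class RelayServer:
    """Toy relay: online users get messages pushed at once,
    offline users get them on their next login."""

    def __init__(self):
        # user -> "connection", modelled as a list the server appends to
        self.connections: Dict[str, List[Message]] = {}
        # user -> messages buffered until that user comes online
        self.pending: Dict[str, Deque[Message]] = defaultdict(deque)

    def login(self, user: str) -> List[Message]:
        conn: List[Message] = []
        self.connections[user] = conn
        while self.pending[user]:            # flush the offline buffer in order
            conn.append(self.pending[user].popleft())
        return conn

    def logout(self, user: str) -> None:
        self.connections.pop(user, None)

    def send(self, sender: str, receiver: str, text: str) -> None:
        msg = (sender, text)
        if receiver in self.connections:     # receiver online: push now
            self.connections[receiver].append(msg)
        else:                                # receiver offline: buffer it
            self.pending[receiver].append(msg)

if __name__ == "__main__":
    server = RelayServer()
    alice = server.login("alice")
    server.send("alice", "bob", "hi")        # bob offline: buffered
    bob = server.login("bob")                # buffered message pushed on login
    server.send("bob", "alice", "hello")     # alice online: pushed immediately
    print(bob, alice)
```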

So clients in different LANs cannot communicate directly; a server on the public network must relay the communication between the two of you. In the figure, both hosts have established a TCP connection with the server, and the server acts as a message transfer station that forwards your message to your friend. The path from a client host to the server and back is well defined because the server's public IP is unique, whereas the client hosts cannot be addressed directly because their IPs are private and not unique.

7.
Summary:
(1) Datagrams from the internal network reach the public network thanks to the destination IP and the routing table of each node in the network.
(2) NAT replaces the source IP to ease the IPv4 shortage, letting the hosts in a LAN share one IPv4 address.
(3) Getting a datagram back from the public server depends on NAPT, which replaces the port as well as the source IP, so that datagrams can travel from the internal network to the server on the public network and also find their way back.

5. Routing of IP packets (destination IP + node routing table)

1.
Routing is the most important core job of the IP layer. It is really a process of asking for directions hop by hop, and when asking for directions, where you want to go matters most; that destination is the destination IP. In most cases the destination IP is a public IP. If it is an intranet IP, the communication probably takes place only between hosts inside one internal network, such as a military network. To protect network security and sensitive information, a military network is isolated from the Internet; its internal communication usually uses private (intranet) IP addresses, and these messages are forwarded only inside the military network and never routed on the Internet.
Back to the topic. When a message is routed, it effectively asks each node: I want to reach the destination host, where should I go next? Every node in the network has its own network layer and maintains a routing table, and after looking up that table the node tells the datagram where to go. There are generally two kinds of answers. One is: go to router A next; I do not know the rest of the way, but you can ask A when you get there. The other is: I do not know how you should go, but you can ask router B, who may know. These correspond to the two outcomes of a routing table lookup: either the node knows which next hop the datagram should take, or it does not, in which case it hands the datagram to its default gateway.


2.
When the IP layer routes a datagram, the two most important pieces are the destination IP and the routing table.
Let's look at the general structure of a routing table and its entries. Note that not only routers have routing tables: every node in the network does, because hosts, like routers, have their own network layer.
First, my own cloud server host has a very simple routing table with only three entries. The first entry is the default gateway mentioned earlier: it says which host to hand the datagram to when no other entry matches. The Gateway column holds the gateway address. A network directly connected to a local network card needs no gateway address, and since there may be several local network cards, there may be several directly connected networks without one; for such entries the datagram is sent straight onto the directly connected network through the interface of the corresponding card. The second entry is the network segment my cloud server is in, to which its network card's interface is directly connected. The third entry is for link-local addresses, which are used within a local area network. They differ from the local loopback address: loopback is used only within this host, while link-local addresses let devices on the same LAN communicate. A link-local address is configured automatically without relying on DHCP; for IPv4 it is picked pseudo-randomly, with the generator typically seeded from the device's MAC address.


The following two network interfaces correspond to my wireless and wired network cards respectively

3.
Let's take a look at how a specific routing table forwards datagrams.
When a router receives a datagram to forward, it traverses its routing entries. For each entry it ANDs the destination IP address with the entry's Genmask and compares the result with the entry's Destination; if they are equal, it sends the datagram out through that entry's Iface interface toward that Destination (via the entry's Gateway, if it has one).
Example 1: the destination IP 192.168.56.3 ANDed with 255.255.255.0 gives 192.168.56.0, which does not match the first routing entry, so the traversal continues. It matches the next entry, so the datagram is sent out through the eth1 interface to the target host 192.168.56.3. Because the target network 192.168.56.0 is directly connected to the current host, no router is needed in between; the datagram goes straight to the destination host.
Example 2: the destination IP 202.10.1.2 matches none of the first three entries. Only when it is ANDed with the Genmask of the last entry does the result, 0.0.0.0, match. The datagram is then sent out through the eth0 interface according to this default route, to the next-hop address 192.168.10.1. That address is the LAN's egress router, which determines the following hop by checking its own routing table.
The destination network number of a datagram keeps changing along the way, because each LAN it passes through uses a different subnet mask. At every node what matters is whether the destination IP ANDed with an entry's Genmask equals that entry's Destination; if so, the message is sent toward that target network. Since the Genmasks of successive LANs differ, so does the target network number computed at each hop.
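The lookup walked through in the two examples can be sketched in Python. The table below is my assumed reconstruction of the routing table in the figure (the 127.0.0.0/255.0.0.0 loopback entry in particular is a guess based on the lo interface mentioned in the next paragraph), and Route and lookup are hypothetical names.

```python
import ipaddress
from typing import List, NamedTuple, Optional, Tuple

class Route(NamedTuple):
    destination: str
    gateway: Optional[str]  # None = directly connected ("*" in route output)
    genmask: str
    flags: str
    iface: str

# Assumed reconstruction of the routing table in the figure.
TABLE: List[Route] = [
    Route("192.168.10.0", None, "255.255.255.0", "U", "eth0"),
    Route("192.168.56.0", None, "255.255.255.0", "U", "eth1"),
    Route("127.0.0.0", None, "255.0.0.0", "U", "lo"),
    Route("0.0.0.0", "192.168.10.1", "0.0.0.0", "UG", "eth0"),  # default route
]

def lookup(dest_ip: str, table: List[Route] = TABLE) -> Optional[Tuple[str, str]]:
    """Walk the entries in order; return (iface, next hop) for the first entry
    whose Destination equals dest_ip AND Genmask, or None if nothing matches."""
    dest = int(ipaddress.IPv4Address(dest_ip))
    for r in table:
        mask = int(ipaddress.IPv4Address(r.genmask))
        if dest & mask == int(ipaddress.IPv4Address(r.destination)):
            # directly connected: deliver to the host itself; otherwise to the gateway
            next_hop = r.gateway if r.gateway else dest_ip
            return r.iface, next_hop
    return None

if __name__ == "__main__":
    print(lookup("192.168.56.3"))  # example 1
    print(lookup("202.10.1.2"))    # example 2
```

The entries are walked in order, as described above. Linux actually selects the matching entry with the longest prefix; walking a table sorted from longest to shortest Genmask, as this one is, gives the same result.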


4.
This routing table has three Iface interfaces: two Ethernet interfaces and one local loopback interface. So the host or router it belongs to sits in two LANs, 192.168.10.0 and 192.168.56.0. The node (I like to call hosts and routers collectively nodes) has a different private IP in each of these segments, so it is probably equipped with two network cards, corresponding to the eth0 and eth1 interfaces and directly connected to the two segments. The Flags of the default gateway entry are usually UG: U means the route is up (in use), and G means the route goes through a gateway. The most typical feature of the default gateway entry is that its Genmask is all zeros.


6. Fragmentation of IP packets

1. The impact of MTU on IP protocol

1.
What travels between routers is indeed an IP packet, but within a LAN what is actually transmitted is a MAC frame. In other words, each LAN carries data in the form of frames. When a frame reaches the next hop, that hop's network layer decides where the packet goes next, and the packet is then carried across the next network segment in a new frame. So what actually runs on the wire is data frames, not bare IP packets.
At the data link layer there is a MAC frame protocol, most commonly Ethernet. Ethernet specifies that the payload of a MAC frame may not exceed the MTU (maximum transmission unit) of 1500 bytes. Does the IP layer decide how much data is transmitted? No, TCP controls the size of the transmitted data. TCP is byte-stream oriented and decides when to send data and how much to send each time. This is why the sliding window holds multiple data segments rather than one: since the MTU limits the payload of a single frame to 1500 bytes, TCP sends several segments through the sliding window instead of combining them into one large segment.
The TCP header option with kind=2 lets the two sides negotiate the MSS (maximum segment size). The MSS is generally MTU - 40, i.e. 1460 bytes, where the 40 subtracted bytes are the 20-byte IP header plus the 20-byte TCP header (both without options).
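The MTU - 40 arithmetic as a tiny Python helper. max_tcp_payload is a hypothetical name, and the tcp_options parameter is my addition, to show that options carried in each TCP header (for example, 12 bytes for the padded timestamp option) shrink the payload that actually fits in a segment.

```python
IPV4_MIN_HEADER = 20  # bytes, IPv4 header without options
TCP_MIN_HEADER = 20   # bytes, TCP header without options

def max_tcp_payload(mtu: int, ip_options: int = 0, tcp_options: int = 0) -> int:
    """Largest TCP payload that fits in one link-layer frame without IP fragmentation."""
    return mtu - (IPV4_MIN_HEADER + ip_options) - (TCP_MIN_HEADER + tcp_options)

if __name__ == "__main__":
    print(max_tcp_payload(1500))                  # plain Ethernet
    print(max_tcp_payload(1500, tcp_options=12))  # with the timestamp option
```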

2.
But what if TCP does not limit the size of a single data segment and hands down more than 1460 bytes, say 3000 bytes? What should IP do? IP says: the MAC frame will not let me pass down a datagram larger than 1500 bytes, and now TCP is giving me such a large segment. Isn't that putting me in an impossible position?
Think of TCP as the company leader, IP as the project manager, and the MAC layer as the programmers. The leader says: we are going to build a chat app like WeChat and take business away from Tencent; project manager, go do it. The project manager tells the programmers: the leader wants you to build WeChat, get on it. The programmers say: we can't do that, find someone more capable. The project manager is caught in the middle: the leader gave the order and the programmers say it can't be done, so the manager must come up with a solution. For example, he tells the programmers: I know this project is hard, so we will break it into much smaller milestones. Every time you finish one, I will report it upstairs and you can collect a generous bonus from finance, and when the project ships I will give each of you a red envelope out of my own pocket and ask the leader for a free 3-day vacation for everyone. How about that? Hearing there is both money and time off, the programmers pick up their keyboards and get to work: forget WeChat, they would build the company a browser too (just kidding). They will certainly work hard for the company!

3.
That was a real-life analogy, but how does the actual IP layer solve the problem? The IP layer fragments the data segment TCP hands it, and the peer reassembles the fragments in its own IP layer.
Fragmentation and reassembly are therefore the IP layer's own behavior, decoupled from the layers above and below: neither the data link layer nor the transport layer cares about it at all.

2. How to fragment and assemble?

1.
To work out how to fragment and reassemble, we can break the question into several smaller ones; once those are answered, the big question is answered too.
(1) How do we tell that a packet is a fragment?
Of the 3 flag bits, the first is reserved, the second means Don't Fragment, and the third is the More Fragments flag. When a datagram is fragmented, every fragment except the last has the third bit set to 1, and only the last fragment has it set to 0.
So when a packet is received: if its More Fragments bit is 1, it is a fragment and more follow; if its More Fragments bit is 0 but its 13-bit fragment offset is greater than 0, it is the last fragment. Only a packet with the bit at 0 and an offset of 0 is unfragmented.
(2) How do we recognize fragments belonging to the same datagram?
By the 16-bit Identification field. Every datagram has its own Identification value, and when a datagram is split, each fragment copies that value, so all fragments of the same datagram carry the same identifier.
(3) Which fragment comes first and which comes last? Have they all arrived, or were some lost?
The 13-bit fragment offset gives the position of the fragment's payload within the original datagram's payload, measured in units of 8 bytes. From the offset and the length of each fragment, we can determine the order of the fragments and whether all of them have arrived or some are missing.
(4) How do we reassemble the fragments into a complete datagram?
Sort the received fragments by offset in ascending order. Each fragment's offset (in bytes) should equal the total payload length of all the fragments before it; if every fragment lines up this way and the last one has the More Fragments bit at 0, the datagram is complete.
(5) How do we make sure the reassembled datagram is correct?
The header of the reassembled datagram has a 16-bit IP header checksum, which covers the IP header only. In addition, once the IP header is removed and the TCP segment is delivered upward, TCP verifies its own 16-bit checksum, which covers the whole segment, header and data. Together these checks tell us whether the reassembled datagram is correct.


3. The impact of fragmentation on TCP and UDP? (increasing the overall packet loss probability)

1.
First of all, fragmentation is not a good thing, especially for communication across networks! That is why it is not the mainstream case.
In most cases the segment handed down by the transport layer does not exceed MTU - 40, so IP never needs to fragment and reassemble. In other words, most of the time the leader is a good leader and the programmers are good programmers.

2.
When a datagram is split into several fragments and any one of them is lost in transit, the receiver cannot reassemble the datagram and discards all the fragments it has collected. The sender's TCP layer, receiving no acknowledgment for a long time, triggers timeout retransmission and resends the entire segment, not just the lost piece; the sender's IP layer fragments it again, and the whole fragmentation and reassembly process starts over.
Suppose each fragment reaches the peer with probability 99%. If a segment is split into three fragments, the probability that all three arrive is 99% × 99% × 99% ≈ 97%, so the overall loss probability rises from about 1% to about 3%. Fragmentation thus increases the overall probability of packet loss, which is why the transport layer generally should not send overly large segments; the MSS should be set sensibly, and its value is negotiated during the three-way handshake.
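The loss arithmetic as a Python sketch, under the simplifying assumption that each fragment is delivered independently with the same probability; delivery_probability and loss_probability are hypothetical names.

```python
def delivery_probability(p_fragment: float, n_fragments: int) -> float:
    """Reassembly succeeds only if all n fragments arrive (assumed independent)."""
    return p_fragment ** n_fragments

def loss_probability(p_fragment: float, n_fragments: int) -> float:
    """Probability that at least one fragment is lost, so the whole datagram is lost."""
    return 1 - delivery_probability(p_fragment, n_fragments)

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(loss_probability(0.99, n), 6))
```

With p = 0.99, an unfragmented packet is lost 1% of the time, while a datagram split into three fragments is lost about 2.97% of the time.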

3.
Next, let's fragment a concrete IP datagram. Suppose the datagram is 3020 bytes: a 20-byte header plus 3000 bytes of payload. The first fragment is 1500 bytes: it keeps the original 20-byte header and carries the first 1480 bytes of payload. The second fragment can likewise carry only the next 1480 bytes of the original payload, with a new IP header in front. That leaves the last 40 bytes, which form one more fragment, again with a new IP header in front, 60 bytes in total.
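The same split can be computed with a short Python sketch. fragment is a hypothetical helper; it assumes a 20-byte IP header without options, and it rounds each non-final fragment's payload down to a multiple of 8 bytes because the offset field counts 8-byte units (1480 already is one).

```python
from typing import List, Tuple

IP_HEADER = 20  # bytes, IPv4 header without options

def fragment(total_length: int, mtu: int = 1500) -> List[Tuple[int, int, int, int]]:
    """Split an IPv4 datagram of total_length bytes (header included) for a link
    with the given MTU. Returns (offset_field, payload_len, MF, fragment_length)."""
    payload = total_length - IP_HEADER
    # every fragment except the last must carry a multiple of 8 payload bytes
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    frags = []
    pos = 0
    while pos < payload:
        size = min(max_chunk, payload - pos)
        more = 1 if pos + size < payload else 0
        frags.append((pos // 8, size, more, size + IP_HEADER))
        pos += size
    return frags

if __name__ == "__main__":
    for offset, size, mf, length in fragment(3020):
        print(f"offset={offset:4} payload={size:5} MF={mf} total={length}")
```

For the 3020-byte datagram the offset fields are 0, 185 and 370, i.e. byte positions 0, 1480 and 2960, and only the last fragment has MF = 0.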

4. Divide an IP packet by yourself


Origin blog.csdn.net/erridjsis/article/details/132066805