[Linux] Data Link Layer: Ethernet Protocol

Restraint is not equal to oppression, and calmness and rationality are not equal to indifference and numbness.



1. Ethernet frames and LAN forwarding packets

1. The principle of LAN forwarding (based on Ethernet protocol)

1.
(1) IP gives us the ability to send packets across networks. That ability is built on subnetting, the destination IP address, and a routing-table lookup at every node. But before a packet can cross networks, it must first be deliverable to the right host inside a single LAN. Only with that capability can a packet hop from one LAN to the next until it finally reaches the destination host.
So cross-network transmission is essentially the result of forwarding a packet across many LANs in turn. To understand how a packet travels through the whole network, we only need to understand how it is forwarded inside one LAN.
(2) The most common LAN technologies today are Ethernet, wireless LAN, and Token Ring (all three use MAC addresses at the data link layer). IBM developed Token Ring back in the 1970s, but in the 1980s LAN technology shifted to Ethernet, and most vendors that had been selling Token Ring equipment left the market, so Token Ring has long since declined. Later, as we entered the era of mobile devices, a research team abroad developed wireless LAN technology in 1990, the basis of what we now call Wi-Fi, which made wireless transmission about as fast and stable as wired networks. The technology was patented in the United States in 1996.
What we are studying today is Ethernet.


2.
(1) The following is the format of the MAC frame. There are two main kinds of MAC frame. The first carries an ordinary payload handed down from the upper layer, namely an IP packet. The second carries a payload that belongs to the data link layer itself: an ARP request/response or a RARP request/response. We will come back to this second kind when we discuss ARP. Together, ARP and Ethernet are the two most important parts of wired LAN communication today. The second kind of frame ends with a PAD field, which is pure padding. The MAC frame requires a payload of at least 46 bytes, and an ARP or RARP message is only 28 bytes long, so 18 bytes of padding are added. The receiver simply ignores the padding, so PAD has no effect on communication and we can ignore it too.
(2) A MAC frame contains a 6-byte destination MAC address, a 6-byte source MAC address, a 2-byte type field, and a 4-byte CRC at the end. A type of 0800 means the frame carries an IP packet, while 0806 and 8035 mean it carries an ARP or a RARP request/response. Error-checking fields are an old friend: we have already met checksums in the transport layer and the network layer, and the data link layer uses a CRC. The structure of the MAC frame is simple. The most important fields are the type, the source MAC address, and the destination MAC address.
(3) As with every protocol, two questions come up: how are the header and payload separated, and how is the payload demultiplexed to the right upper-layer protocol?
The MAC frame uses fixed-length fields to separate the header from the payload. Read the first 14 bytes from the front to get the header, then read the last 4 bytes from the end to get the CRC. Whatever lies between is the payload. Demultiplexing relies on the type field: 0800 is delivered to IP, 0806 to ARP, and 8035 to RARP.
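The separation and demultiplexing steps above can be sketched in a few lines of Python. This is only an illustration: the names `parse_frame`, `format_mac`, and `ETHERTYPES` are made up for this example, and the sample frame (source address, zeroed payload and CRC) is hypothetical.

```python
import struct

# EtherType values named in the text above.
ETHERTYPES = {0x0800: "IP", 0x0806: "ARP", 0x8035: "RARP"}
HEADER_LEN = 14   # 6-byte destination + 6-byte source + 2-byte type
CRC_LEN = 4       # trailer at the very end of the frame

def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in raw)

def parse_frame(frame: bytes) -> dict:
    """Split a MAC frame into header fields, payload, and CRC trailer."""
    if len(frame) < HEADER_LEN + CRC_LEN:
        raise ValueError("frame is shorter than header plus CRC")
    # Read the fixed 14-byte header from the front (network byte order).
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:HEADER_LEN])
    return {
        "dst": format_mac(dst),
        "src": format_mac(src),
        "type": ethertype,
        # Demultiplex on the type field.
        "deliver_to": ETHERTYPES.get(ethertype, "unknown"),
        # Everything between the header and the CRC is the payload.
        "payload": frame[HEADER_LEN:-CRC_LEN],
        # Read the 4-byte CRC from the end.
        "crc": frame[-CRC_LEN:],
    }

# Hypothetical frame: broadcast destination, made-up source, type 0806.
sample = bytes.fromhex("ffffffffffff" "001122334455" "0806") + bytes(46) + bytes(CRC_LEN)
parsed = parse_frame(sample)
print(parsed["dst"], parsed["src"], parsed["deliver_to"], len(parsed["payload"]))
```

Because both the header and the trailer have fixed lengths, no length field is needed to find the payload boundaries.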

insert image description here
The following shows demultiplexing upward through the data link layer, network layer, and transport layer. Each layer demultiplexes on a field in its own header: the 16-bit type field of the Ethernet frame (0800, 0806, 8035), the 8-bit protocol field of the IP header (1 for ICMP, 6 for TCP, 17 for UDP), and the 16-bit destination port number of TCP or UDP.
insert image description here
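As a rough sketch of the network-layer and transport-layer steps of this demultiplexing, the snippet below reads the 8-bit protocol field of an IPv4 header and then the 16-bit destination port. The function name `demux_ip` and the sample addresses and ports are assumptions made for illustration only.

```python
import struct

# Protocol numbers named in the text above.
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

def demux_ip(packet: bytes) -> tuple:
    """Return (protocol name, destination port or None) for an IPv4 packet."""
    header_len = (packet[0] & 0x0F) * 4               # IHL counts 32-bit words
    protocol = IP_PROTOCOLS.get(packet[9], "unknown")  # 8-bit protocol field
    if protocol in ("TCP", "UDP"):
        # TCP and UDP both begin with source port, then destination port.
        _src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
        return protocol, dst_port
    return protocol, None

# Hypothetical packet: a 20-byte IPv4 header (protocol 17) followed by a UDP header.
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
udp_header = struct.pack("!HHHH", 40000, 53, 8, 0)
print(demux_ip(ip_header + udp_header))
```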

3.
(1) When m1 builds a MAC frame and sends it onto the LAN, every host on the LAN receives it. Hosts m2 to m7 read the frame header, find that the destination MAC address is not their own, and discard the frame at their own data link layer. Only m8 keeps it, because the destination MAC address is its own. m8 then separates the header from the payload and passes the payload upward, layer by layer, until it reaches m8's application layer. If m8 wants to reply to m1, the principle is the same: it encapsulates a frame whose destination MAC address is m1's and sends it onto the LAN. This time only m1 keeps the frame, and every other host discards it at its own data link layer. This is how Ethernet communicates within a LAN. (For example: the teacher calls on Zhang San to answer a question. The whole class hears it, but only Zhang San stands up to answer.)
(2) A network card also has a mode called promiscuous mode, which is off by default. In this mode the card does not discard any frame, including frames addressed to other hosts, and passes everything upward. This is how many LAN packet capture tools work.
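The accept-or-discard decision each network card makes can be sketched as below. The MAC addresses assigned to m1 through m8 and the function name `nic_accepts` are hypothetical. Accepting the broadcast address is standard Ethernet behavior that matters later for ARP.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

def nic_accepts(dst_mac: str, own_mac: str, promiscuous: bool = False) -> bool:
    """Decide whether a network card keeps a frame or discards it."""
    if promiscuous:
        return True   # promiscuous mode keeps every frame
    return dst_mac.lower() in (own_mac.lower(), BROADCAST)

# Hypothetical LAN: hosts m1..m8 with made-up MAC addresses.
hosts = {f"m{n}": f"00:00:00:00:00:{n:02x}" for n in range(1, 9)}

# m1 sends a frame addressed to m8; every other host sees it on the wire.
dst = hosts["m8"]
keepers = [name for name, mac in hosts.items()
           if name != "m1" and nic_accepts(dst, mac)]
print(keepers)

# A capture tool on m3 with promiscuous mode enabled keeps the frame too.
print(nic_accepts(dst, hosts["m3"], promiscuous=True))
```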


2. Ethernet MTU and MAC address

1.
The MAC address is the physical address of a machine's network interface. The 6-byte MAC address is intended to be globally unique, and the address space is large enough for that: 48 bits give 2^48 addresses, which is about 281 trillion. According to statistics from the end of 2020, there were about 30 billion networked devices worldwide, so MAC addresses are not scarce the way IPv4 addresses are. There are definitely enough of them!
When a packet is routed through the network, the IP addresses describe the source host and the final destination host, while the MAC addresses describe the start and end of the current hop.
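A quick arithmetic check of the size of the 48-bit address space, using the article's figure of 30 billion devices as the point of comparison:

```python
mac_addresses = 2 ** 48          # every possible 48-bit value
devices = 30_000_000_000         # device count quoted in the text

print(mac_addresses)             # 281474976710656
print(mac_addresses // devices)  # 9382 addresses available per device
```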

2.
In a LAN, the longer a frame is, the more likely it is to be involved in a collision, just as a bigger soldier on the battlefield is a bigger target and more likely to be hit. On the other hand, each frame should carry a worthwhile amount of data. It should not be too short, like splitting one sentence into three. So a frame's payload should be neither too large nor too small. Ethernet specifies the range: the payload cannot exceed the MTU of 1500 bytes and cannot be smaller than 46 bytes.
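The two limits can be expressed as a small helper. This is a minimal sketch, and `pad_payload` is a name invented for the example. In a real stack, an oversized payload would already have been fragmented at the IP layer instead of raising an error here.

```python
MIN_PAYLOAD = 46   # lower limit of the Ethernet payload, in bytes
MTU = 1500         # upper limit of the Ethernet payload, in bytes

def pad_payload(payload: bytes) -> bytes:
    """Pad a short payload with zero bytes up to the 46-byte minimum."""
    if len(payload) > MTU:
        raise ValueError("payload exceeds the MTU; the upper layer must fragment it")
    return payload + bytes(max(0, MIN_PAYLOAD - len(payload)))

arp_message = bytes(28)                 # an ARP message is 28 bytes long
padded = pad_payload(arp_message)
print(len(padded) - len(arp_message))   # 18 bytes of PAD
```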

3.
(1) The sending host is not the only one that may fragment a packet. Routers along the path may too. For example, in the figure below, the sending host's first link uses FDDI, whose larger MTU of 4352 bytes improves throughput. But the router's link toward the destination host is Ethernet, with an MTU of 1500 bytes, so the router has to fragment the IP packet at its own network layer.
Another possibility: the host has already fragmented the packet at the IP layer for an MTU of 1500, but some router on the path has an MTU of 500. That router may then fragment the fragments again.
(2) If we do not want nodes on the path to fragment the packet, we can use the 3-bit flags field in the IP header to prohibit fragmentation (the DF bit). If a node's MTU is too small and the packet would need fragmenting but fragmentation is prohibited, the router discards the packet. The sender receives no acknowledgment for a long time and retransmits after a timeout. The retransmitted packet may then be routed along a different path, possibly one whose MTU is large enough to carry it whole.
(3) If the goal is not the path with the highest throughput but the path with the fastest delivery, we can reduce the size of the data, so that the packet can take the fastest path without needing to be fragmented along the way.

insert image description here
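The fragmentation arithmetic can be sketched as follows. The sketch assumes a 20-byte IP header without options and uses the rule that fragment offsets are counted in 8-byte units. The function name `fragment` and its return shape are choices made for this example, not a real API.

```python
def fragment(total_len: int, mtu: int, header_len: int = 20,
             dont_fragment: bool = False):
    """Split one IP packet for a link with the given MTU.

    Returns a list of (offset in 8-byte units, data bytes, more-fragments flag),
    or None when the packet must be fragmented but fragmentation is prohibited.
    """
    data_len = total_len - header_len
    if total_len <= mtu:
        return [(0, data_len, False)]          # fits, no fragmentation needed
    if dont_fragment:
        return None                            # the router discards the packet
    # Each fragment carries its own header; data size must be a multiple of 8.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < data_len:
        size = min(max_data, data_len - offset)
        more = offset + size < data_len
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

# A 4352-byte packet from the FDDI link entering an Ethernet link (MTU 1500).
print(fragment(4352, 1500))
# The same packet with fragmentation prohibited.
print(fragment(4352, 1500, dont_fragment=True))
```

The last fragment is the only one with the more-fragments flag cleared, which is how the receiver knows it has seen the end of the original packet.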

4.
We have in fact covered the following topics before: the relationship between MSS and MTU; the number of segments the sliding window sends; the MSS option (kind=2) in the TCP header options, which the two sides use to negotiate a segment size so that the IP layer does not have to fragment; and the effect IP fragmentation has on packet loss. These were discussed in the earlier IP-layer and TCP-layer articles, so here they are only mentioned briefly. If you have forgotten them, see the articles I wrote before.

[Linux] Transport layer protocol: UDP and TCP

[Linux] Network layer protocol: IP
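As a brief reminder of the arithmetic behind MSS, here is a minimal sketch assuming 20-byte IP and TCP headers with no options. The helper names are invented for the example, and taking the smaller of the two announced values is a simplification of the negotiation.

```python
IP_HEADER = 20    # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IP packet."""
    return mtu - IP_HEADER - TCP_HEADER

def effective_mss(local_mtu: int, peer_mtu: int) -> int:
    """Each side announces its own MSS; the smaller value limits the segments."""
    return min(mss_for_mtu(local_mtu), mss_for_mtu(peer_mtu))

print(mss_for_mtu(1500))          # 1460
print(effective_mss(4352, 1500))  # 1460
```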

2. Data collision in LAN

1. How to solve the data collision in the LAN? (collision detection and collision avoidance algorithms)

1.
(1) In a LAN, several hosts may well be sending data at the same moment, and when they do, their signals interfere with one another. In computing this is called a data collision. (The teacher calls on Zhang San, but the whole class is chattering, so Zhang San cannot hear what the teacher is saying, and the other students drown each other out as well.)
(2) The data link layer therefore requires that only one host send at any given time. If several hosts send at once, their data collides on the wire and becomes invalid.
(3) How does a host tell that the data it sent has collided? The data m1 sends is also received by m1 itself. If what m1 receives does not match what it sent, the received frame will fail the CRC check, which means the frame m1 sent has collided. This is why a LAN is also called a collision domain.
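A rough sketch of how a damaged frame fails the CRC check, using Python's `zlib.crc32`. The helper names are invented for this example, interference is simulated by flipping the bits of a single byte, and the byte order of the appended CRC is chosen only for illustration and does not reproduce the on-wire layout.

```python
import zlib

def with_crc(data: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the data, as the sender would."""
    return data + zlib.crc32(data).to_bytes(4, "big")

def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC over the data and compare it with the trailer."""
    data, crc = frame[:-4], frame[-4:]
    return zlib.crc32(data).to_bytes(4, "big") == crc

def collide(frame: bytes, position: int) -> bytes:
    """Simulate interference by flipping every bit of one byte."""
    damaged = bytearray(frame)
    damaged[position] ^= 0xFF
    return bytes(damaged)

sent = with_crc(b"hello from m1")
print(crc_ok(sent))               # True: received exactly what was sent
print(crc_ok(collide(sent, 3)))   # False: the frame was damaged on the wire
```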


2.
How to ensure that in a collision domain, only one host can send data at any time?
In fact, different LAN communication technologies have different solutions.
(1) Token Ring circulates a special piece of data, the token, around the LAN, and only the host currently holding the token may send. When it has finished sending, it releases the token back onto the LAN. This guarantees that only one host is sending at any time, so collisions caused by simultaneous senders cannot occur.
(2) When host A and host B collide while sending, Ethernet's strategy kicks in. It is very simple: A and B both stop sending, each waits for a random length of time, and then each tries again. While A and B are waiting, other hosts on the LAN can send frames, and because the two waiting times are random and usually different, the chance that A and B collide again when they resend is very low.
(3) The strategy Ethernet applies when data collides is called the collision detection and collision avoidance algorithm (standardized as CSMA/CD). It really is that simple. Each host sends only a limited amount of data at a time, and do not underestimate how fast the electrical or optical signal (the binary data) propagates through the LAN. A LAN is not very large and does not hold very many hosts, so with signals this fast the probability of a collision is low to begin with. That is why a strategy that sounds unreliable, waiting a while and then resending, is actually very effective.
In fact, when Ethernet was introduced around 1980, many people were not optimistic about it. It is not only we who find the collision detection and avoidance algorithm too random. People at the time thought so too, and some professional bodies concluded that, in theory, Token Ring's way of avoiding collisions should be more efficient and more reliable. But once Ethernet was actually deployed, it turned out to work very well. It then spread quickly and widely, while Token Ring gradually fell out of use.
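The text above only says the wait is random. The Ethernet standard makes this concrete with truncated binary exponential backoff: after the n-th collision a host waits a random number of slot times between 0 and 2^min(n, 10) - 1, and gives up after 16 attempts. The sketch below illustrates that rule. The function name is invented, and the 51.2 microsecond slot time applies to 10 Mbit/s Ethernet.

```python
import random

SLOT_TIME_US = 51.2   # slot time of 10 Mbit/s Ethernet, in microseconds
MAX_ATTEMPTS = 16     # the frame is dropped after this many collisions

def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Pick how many slot times to wait after the given number of collisions."""
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions, the frame is dropped")
    k = min(collisions, 10)          # the window stops growing after 10 collisions
    return rng.randrange(2 ** k)     # uniform in 0 .. 2^k - 1

rng = random.Random()
for n in (1, 2, 3):
    slots = backoff_slots(n, rng)
    print(f"collision {n}: wait {slots} slots ({slots * SLOT_TIME_US:.1f} us)")
```

Doubling the window after each collision means that two hosts that keep colliding quickly become unlikely to pick the same wait again.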

3.
Since collisions can occur when several hosts on the LAN send data, does that mean that if one host keeps sending garbage onto the LAN without running the collision detection and avoidance algorithm, the other hosts will never be able to send? Has that host effectively taken down the LAN?
That is indeed the case. Tools that do this exist, though I do not know any by name. If you find one, you could try it on your own home LAN to see whether it really brings the network down.

2. Another way to look at the LAN (a system perspective)

We said above that only one host on the LAN can be sending at any moment. Can we then treat the LAN as a critical resource? The collision detection and avoidance algorithm protects that resource and ensures only one host accesses it at a time. Is that not much like a mutex or a condition variable? And is the Token Ring token not equivalent to a mutex? Whoever holds the lock may access the critical resource!
So the system and the network are inseparable. They may be separate at the code level, but in their design concepts and in the ideas behind particular strategies, the two inevitably overlap.
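The analogy can be made concrete with a toy simulation in which each host is a thread and the shared medium is guarded by a lock that plays the role of the token. This is only an illustration of the idea, not a model of real network hardware, and all names are invented.

```python
import threading

medium = threading.Lock()   # the LAN as a critical resource (the "token")
active = 0                  # hosts transmitting right now
max_active = 0              # highest number of simultaneous senders observed
log = []

def host(name: str, frames: int) -> None:
    """Send a number of frames, holding the lock while on the medium."""
    global active, max_active
    for i in range(frames):
        with medium:                       # only the lock holder may transmit
            active += 1
            max_active = max(max_active, active)
            log.append(f"{name} frame {i}")
            active -= 1

threads = [threading.Thread(target=host, args=(f"m{n}", 100)) for n in range(1, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(log), max_active)   # 400 frames sent, never more than 1 sender at once
```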

3. When the LAN is very large, how do we reduce the probability of data collisions? (the switch divides the collision domain + hardware forwarding)

1.
A LAN must not be too large, because more hosts means a higher probability of collision.
A phone's Wi-Fi is a wireless LAN, and it shares its medium much as Ethernet does. Suppose everyone at school gathers on the playground for an event and connects to the campus Wi-Fi. They are now all on the same LAN, and the probability of collision becomes very high. Each collision triggers the collision detection and avoidance algorithm, so the phone or other connected device has to wait a while. What we notice is that the network feels slow and laggy.
Mobile data can be slow for a similar reason, because there may be only one base station nearby. Suppose you are in a large lecture with many classmates who use China Unicom SIM cards. When you all go online from the same lecture hall, your data and theirs is forwarded to the nearby Unicom base station, and the base station's capacity has an upper limit. With all of you sending to it at once, data may collide on the wireless channel, so browsing on your phone feels laggy. The base station may also already be heavily loaded and unable to process requests in time, which increases network latency. Either way, the phone feels very slow.

The following is a picture of a base station near my home. I don’t know which operator it belongs to. I will take a look someday.
insert image description here
2.
If the LAN is very large, for example in a school, and I still want to improve transmission efficiency and reduce the probability of collisions, is there a way? Yes: introduce a switch.
(1) A switch reduces the probability of collisions by dividing the collision domain. Only one device in each collision domain can send at any time, so having fewer devices per collision domain lowers the probability of a collision.
(2) A switch forwards in hardware. It can deliver a frame directly to the destination device instead of broadcasting it to the whole LAN. This point-to-point delivery limits how far frames propagate through the network and so reduces the collision probability. For example, when hosts on the left communicate with each other, the switch does not forward their frames into the collision domain on the right. If a collision occurs among the hosts on the left, the switch does not forward the collided data to the right either, so the damage does not spread. The same holds for the right side. And when a host on the left wants to reach a host on the right, the switch forwards the frame directly to the destination device on the right.
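The forwarding behavior described above can be sketched as a toy switch. One assumption goes beyond the text: the sketch has the switch learn which port each MAC address lives on from the source address of incoming frames, and flood a frame only while the destination is still unknown. The class name, port names, and host names are hypothetical.

```python
class Switch:
    """Toy two-sided switch that keeps traffic inside its own collision domain."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}   # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is forwarded to."""
        self.mac_table[src_mac] = in_port            # learn where the sender is
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Destination not known yet: send to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []                                # same side, do not forward
        return [out_port]                            # deliver directly

sw = Switch(["left", "right"])
print(sw.receive("left", "A", "B"))    # B unknown, flooded to the right
print(sw.receive("left", "B", "A"))    # A is on the left, stays on the left
print(sw.receive("right", "C", "A"))   # forwarded directly to the left
```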


3. ARP protocol

1. The process of ARP converting a known ip address to an unknown MAC address

1.
When we discussed Ethernet earlier, we said that a packet is sent to its next hop within the LAN, and that this is how packets travel across multiple networks. But what actually travels over the cable to the next hop must be a data frame. To send a frame, the network layer hands the packet down and the link layer adds a frame header containing the source and destination MAC addresses. The source MAC address is easy, but how does the sender know the MAC address of the next-hop node? Nobody has told the data link layer what the destination MAC address is. Earlier we were speaking from a God's-eye view when we said the packet is sent to the next-hop host. In real communication the MAC header must be filled in, and the destination MAC address is unknown, so how can the header be built? A key role is missing here: the ARP protocol. We do not know the destination MAC address, but we do know the next hop's IP address! ARP converts that IP address into a MAC address. Once the MAC address is known, the IP packet can be handed down, the MAC layer adds the frame header, and the frame is sent out through the Ethernet interface (Iface) to the next-hop host.
We can think of ARP as sitting above the MAC layer within the data link layer. If the sender does not know the next-hop node's MAC address, it first obtains it through ARP, then hands the packet to the MAC layer, which adds the frame header and sends the frame onto the LAN.

2.
The following is the format of a MAC frame that carries an ARP request/response. Hardware type is the link-layer network type, and 1 means Ethernet. Protocol type is the kind of address being resolved, and 0x0800 means an IP address. Hardware address length is 6, the length of an Ethernet MAC address, and protocol address length is 4, the length of an IP address. These four fields have fixed values here and need no further attention.
An op field of 1 means an ARP request, and 2 means an ARP response. Of the last four fields (sender Ethernet address, sender IP address, destination Ethernet address, destination IP address), only the destination Ethernet address cannot be filled in, because it is not known yet. Until the destination MAC address has been obtained, this field is generally set to all F (some implementations use all zeros instead). The core fields are these five: op and the four addresses.

insert image description here
Whatever the LAN technology, it needs a way to convert an IP address into a MAC address, because wireless LAN, Ethernet, Token Ring, and the like all use MAC addresses.
Because ARP sits above the MAC frame, an Ethernet frame can carry not only IP packets but also ARP requests and responses.
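To make the field layout concrete, the sketch below packs a 28-byte ARP request and wraps it in a MAC frame of type 0806 with 18 bytes of padding. The function names and all addresses are hypothetical, the CRC trailer is left out, and the unknown destination Ethernet address is filled with all F as the text describes.

```python
import socket
import struct

BROADCAST = "ff:ff:ff:ff:ff:ff"

def mac_bytes(mac: str) -> bytes:
    """Convert aa:bb:cc:dd:ee:ff into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def build_arp(op: int, sender_mac: str, sender_ip: str,
              target_mac: str, target_ip: str) -> bytes:
    """Pack the ARP fields in network byte order (28 bytes in total)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IP address
        6,        # hardware address length
        4,        # protocol address length
        op,       # 1 = request, 2 = response
        mac_bytes(sender_mac), socket.inet_aton(sender_ip),
        mac_bytes(target_mac), socket.inet_aton(target_ip),
    )

# Hypothetical addresses for the requester and the next hop.
arp = build_arp(1, "00:11:22:33:44:55", "192.168.1.10", BROADCAST, "192.168.1.1")
header = mac_bytes(BROADCAST) + mac_bytes("00:11:22:33:44:55") + struct.pack("!H", 0x0806)
frame = header + arp + bytes(46 - len(arp))   # PAD up to the 46-byte minimum
print(len(arp), len(frame))                   # 28 60
```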


3.
(1) An ARP request works much like Ethernet communication itself. Host m1 fills in every field of the ARP header at the ARP layer. The most important is the destination IP, which is the IP address of the next-hop node, not of the final destination host. It is easy to fill in: look up the node's routing table to find the next hop's IP address. The request is then handed to the MAC layer, which adds the MAC header and sends the frame onto the LAN.
(2) Because the destination MAC address in the MAC header is all F, every host on the LAN receives the frame and separates the header from the payload. Each host sees that the op field of the ARP message is 1, meaning an ARP request, and compares its own IP address with the destination IP address in the request. The host whose address matches knows the request is meant for it and reads the rest of the ARP header. That completes the ARP request.
(3) The receiving host then builds an ARP response and sends it back to the source host. This is easy, because from the request it already knows the sender's MAC address and IP address. It fills in its own IP address and MAC address, sets op to 2 to mark an ARP response, fills in the fixed fields, adds a MAC frame header, and sends the frame onto the LAN.
When the hosts on the LAN receive this frame, every host other than m1 discards it at the MAC layer, because the destination MAC address in the header does not match its own. Only m1 separates the header from the payload and delivers the ARP response to its ARP layer. After reading the response, m1 knows the next-hop node's MAC address. It can now encapsulate the IP packet properly and send a frame of type 0800 onto the LAN.
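The request and response exchange can be simulated in memory as below. Everything here is a simplified model: `Host`, `ArpMessage`, and `resolve` are names invented for the example, the addresses are made up, and "sending onto the LAN" is just a loop over all hosts.

```python
from dataclasses import dataclass, field
from typing import Optional

BROADCAST = "ff:ff:ff:ff:ff:ff"

@dataclass
class ArpMessage:
    op: int            # 1 = request, 2 = response
    sender_mac: str
    sender_ip: str
    target_mac: str
    target_ip: str

@dataclass
class Host:
    mac: str
    ip: str
    arp_table: dict = field(default_factory=dict)

    def on_frame(self, dst_mac: str, msg: ArpMessage) -> Optional[ArpMessage]:
        if dst_mac not in (self.mac, BROADCAST):
            return None                          # discarded at the MAC layer
        if msg.op == 1 and msg.target_ip == self.ip:
            # The request is for this host: answer with its own addresses.
            return ArpMessage(2, self.mac, self.ip, msg.sender_mac, msg.sender_ip)
        if msg.op == 2:
            self.arp_table[msg.sender_ip] = msg.sender_mac
        return None

def resolve(requester: Host, lan: list, next_hop_ip: str) -> Optional[str]:
    """Broadcast an ARP request and return the MAC address that comes back."""
    request = ArpMessage(1, requester.mac, requester.ip, BROADCAST, next_hop_ip)
    for host in lan:                             # every host receives the broadcast
        reply = host.on_frame(BROADCAST, request)
        if reply is not None:
            for receiver in lan:                 # the reply is seen by all, kept by one
                receiver.on_frame(reply.target_mac, reply)
    return requester.arp_table.get(next_hop_ip)

lan = [Host(f"00:00:00:00:00:{n:02x}", f"192.168.1.{n}") for n in range(1, 9)]
m1, m8 = lan[0], lan[7]
print(resolve(m1, lan, m8.ip))
```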

4.
(1) The opposite situation also exists: we know a MAC address but not the IP address that goes with it. Suppose an administrator knows only a router's MAC address and wants to check whether the IP address assigned to the router has changed. A host can then issue a RARP request to obtain the IP address that corresponds to that MAC address.
(2) The RARP format is the same as ARP's and the principle is similar: the request is broadcast, and the reply carries the IP address for the given MAC address back to the source host in a RARP response. Since it works so much like ARP, I will not go into detail here.


2. ARP cache

(1) The sending host first determines the next-hop node's IP address at the IP layer, from the destination IP and the routing table, and then needs to send a frame onto the LAN. If it does not know the next hop's MAC address, it issues an ARP request to resolve the known IP address into the unknown MAC address. It can then encapsulate the MAC frame and transmit the data.
(2) A LAN is not very large, so a router can send ARP requests to all the hosts on the LAN it manages, learn their MAC addresses in advance, and build an ARP cache table in which each IP-to-MAC mapping is stored as one entry. The next time a packet arrives, once the router has determined the next hop at its IP layer, it looks up the next hop's IP address in the ARP cache table to get the MAC address, with no need for another ARP request. It then encapsulates the IP packet in a MAC frame and sends it onto the LAN, where the next-hop node receives the frame and carries on processing the packet.
(3) ARP lookups are needed constantly, so to keep every node from issuing an ARP request before each frame, hosts cache as well as routers. Once a host has communicated with another host on the LAN, it caches that host's MAC address, and the next packet to that host needs no ARP request.
(4) Note that these cache entries live only for minutes and are then discarded. IP addresses on a LAN change: DHCP assigns addresses to connected devices dynamically, and when an IP address changes, the old IP-to-MAC mapping is no longer valid. Keeping entries for only a few minutes ensures that a stale mapping is soon dropped.
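A minute-level cache of this kind can be sketched as a dictionary whose entries carry an expiry time. The class name `ArpCache`, the 120-second lifetime, and the injectable clock are assumptions made for the example. Real systems choose their own timeouts.

```python
import time

class ArpCache:
    """IP-to-MAC mappings that are discarded after a fixed lifetime."""

    def __init__(self, ttl_seconds: float = 120.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock       # injectable so the example can control time
        self.entries = {}        # ip -> (mac, expiry time)

    def put(self, ip: str, mac: str) -> None:
        self.entries[ip] = (mac, self.clock() + self.ttl)

    def get(self, ip: str):
        """Return the cached MAC, or None if it is missing or has expired."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[ip]     # stale mapping: a new ARP request is needed
            return None
        return mac

now = [0.0]                                   # fake clock for the demonstration
cache = ArpCache(ttl_seconds=120.0, clock=lambda: now[0])
cache.put("192.168.1.1", "00:11:22:33:44:55")
print(cache.get("192.168.1.1"))               # still cached
now[0] = 121.0                                # two minutes later
print(cache.get("192.168.1.1"))               # None, the entry has expired
```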

The ARP caches shown on the right of the figure below are from my cloud server and my Windows machine, respectively.
insert image description here

3. Man-in-the-middle ARP spoofing

1.
Becoming a man in the middle is actually simple. Keep forcing both communicating parties to update their ARP caches so that each maps the other's IP address to the middleman's MAC address. The middleman then relays the packets each side sends on to the other. That completes the ARP spoofing, and the middleman now sees the traffic between the two parties.
There is a countermeasure, though: HTTPS, which encrypts the contents of the packets. The article linked below covers the details.

[Linux] Application layer protocol: HTTP and HTTPS


2.
If we only want to experiment, we do not even need to be a middleman. To knock a particular host off the network, capture the ARP request that host sends for the router (find out the router's IP address in advance, then filter the captured packets for requests whose destination IP is the router's). Then build an ARP response carrying a random value in place of the router's MAC address and send it back to the host. The host can no longer reach the Internet: the destination MAC address in its frames is wrong, so the frames never reach the router.


Origin blog.csdn.net/erridjsis/article/details/132132982