[Network Programming] Data Link Layer Protocol - Ethernet Protocol

1. Introduction


We know that data exchanged between hosts on two different LANs is not delivered directly; it is forwarded by routers "hop by hop".
The essence of cross-network transmission is that the data is forwarded through a series of local area networks (subnets), one after another.

Therefore, to understand how data is forwarded across networks, one must first understand how data is forwarded within a LAN.
That is the job of the Ethernet protocol.

  • Principles of LAN Communication

Two hosts in the same LAN can communicate directly.
For example:

Communication in a LAN is like a teacher calling Zhang San's name in a classroom. Everyone can hear it, but each of the other students works out that the call is not for them and ignores it, so only Zhang San handles the message. In this way, the communication between the teacher and Zhang San can be regarded as 1-to-1 direct communication.

2. Ethernet protocol

2.1 MAC frame format

For two hosts in a LAN to communicate, the data must be encapsulated in MAC frames.

  • The source address and destination address are the hardware addresses of the network cards (also called MAC addresses). Each is 48 bits long and is burned into the network card at the factory.
  • The frame protocol type field has three values that matter here: 0x0800 for the IP protocol, 0x0806 for the ARP protocol, and 0x8035 for the RARP protocol.
  • At the end of the frame is the CRC check code.
    The destination address, source address, and type field together form the header. The middle part is the data part, which contains the upper-layer headers plus the payload (for example, HTTP encapsulated in TCP, encapsulated in IP).

2.2 How does the MAC frame separate the header from the payload & deliver upwards?

  • Header separated from payload

The MAC frame is separated using its fixed-length header and trailer: strip the first 14 bytes (the header) and the last 4 bytes (the CRC), and the rest is the payload.

  • Which protocol is the payload delivered up to?

The header of the MAC frame contains a 2-byte type field, so after separating the header and payload, the payload can be delivered to the corresponding upper-layer protocol according to this field.
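The two steps above (strip the fixed-length header and trailer, then dispatch on the type field) can be sketched in a few lines of Python. This is only an illustration: the function name `split_mac_frame`, the demo addresses, and the placeholder CRC are made up for the example, and the CRC is not verified.

```python
import struct

# Type-field values for the three upper-layer protocols.
ETHERTYPES = {0x0800: "IP", 0x0806: "ARP", 0x8035: "RARP"}

HEADER_LEN = 14  # 6-byte destination + 6-byte source + 2-byte type
CRC_LEN = 4      # trailing check code


def split_mac_frame(frame: bytes):
    """Split a raw frame into (dst, src, protocol name, payload, crc)."""
    if len(frame) < HEADER_LEN + CRC_LEN:
        raise ValueError("frame too short")
    # "!" means network byte order with no padding, so this reads exactly 14 bytes.
    dst, src, eth_type = struct.unpack("!6s6sH", frame[:HEADER_LEN])
    payload = frame[HEADER_LEN:-CRC_LEN]
    crc = frame[-CRC_LEN:]
    protocol = ETHERTYPES.get(eth_type, "unknown")
    return dst, src, protocol, payload, crc


# Hypothetical frame: made-up addresses, a 46-byte payload, and a dummy CRC.
demo_frame = (
    bytes.fromhex("aabbccddeeff")      # destination MAC
    + bytes.fromhex("112233445566")    # source MAC
    + struct.pack("!H", 0x0806)        # type field: ARP
    + b"\x00" * 46                     # payload
    + b"\xde\xad\xbe\xef"              # placeholder, not a real checksum
)
dst, src, protocol, payload, crc = split_mac_frame(demo_frame)
print(protocol, len(payload))
```

A real receiver would also recompute the CRC and drop the frame on a mismatch before delivering the payload.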

2.3 MAC address

For LAN communication, each host must have its own unique identifier. Each machine must be equipped with a network card, and each network card has a serial number, which is its MAC address. The MAC address is 48 bits (6 bytes) long and is meant to be unique worldwide (in fact, uniqueness within the LAN is enough).


2.4 LAN forwarding principle (based on protocol)

Assume that MAC1 wants to send data to MAC7. It first encapsulates a MAC frame.
The data link layer of every host receives this frame, separates the header from the payload, and checks the destination MAC address, which is MAC7. A host that finds the destination is not itself discards the frame directly; the host that finds the destination is itself delivers the payload upward.

After processing, MAC7 sends a response back to MAC1.
The sending process is the same as above.

In conclusion:

In fact, every host in the LAN receives the MAC frame, but a host whose own MAC address does not match the destination MAC address discards it directly at the data link layer.

A network card also has a promiscuous mode: in this mode it does not discard any data frames and delivers them all upward. This is the principle behind LAN packet capture tools.
It also shows why encrypting data with HTTPS is necessary.
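The accept-or-discard decision at the data link layer can be modeled roughly as follows. The function `nic_accepts` and the sample addresses are hypothetical; the sketch also treats the all-F broadcast address as accepted, which is how broadcast frames reach every host.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def nic_accepts(own_mac: str, dst_mac: str, promiscuous: bool = False) -> bool:
    """Return True if the link layer should deliver the frame's payload upward."""
    if promiscuous:
        return True  # capture mode: no frame is discarded
    dst = dst_mac.lower()
    return dst == own_mac.lower() or dst == BROADCAST


# Hypothetical addresses for illustration.
me = "00:11:22:33:44:07"
print(nic_accepts(me, "00:11:22:33:44:07"))                    # frame addressed to us
print(nic_accepts(me, "00:11:22:33:44:01"))                    # someone else's frame
print(nic_accepts(me, "00:11:22:33:44:01", promiscuous=True))  # packet-capture mode
```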

  • Data collision

Since all hosts on an Ethernet share one communication channel, data collisions may occur when multiple hosts send data at the same time.

It's similar to when I want to talk to Li Si in the classroom, but everyone is talking to each other, so Li Si can't hear me.

Solution:
Only one host is allowed to send data at any given time.

How is this guaranteed?
Two methods:
1️⃣ Token Ring: only the host that currently holds the token may send, analogous to a mutex.
2️⃣ Ethernet: if a collision occurs, the host stops sending and tries again after a random period of time. This approach is called the host's collision detection and collision avoidance algorithm (in Ethernet, CSMA/CD with random backoff).

If a host keeps sending garbage data to the LAN without performing collision detection and collision avoidance, the other hosts on the LAN will be unable to communicate.

From this, we can regard the LAN as a critical resource: through collision detection + collision avoidance, only one host can successfully send at any time.
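The "wait a random time, then retry" step can be sketched as below. The text only says the wait is random; the doubling window shown here is the truncated binary exponential backoff used by classic Ethernet, with the 10 Mbit/s slot time, so treat the constants and the function name `backoff_delay_us` as assumptions of this example.

```python
import random

SLOT_TIME_US = 51.2   # slot time of classic 10 Mbit/s Ethernet, in microseconds
MAX_EXPONENT = 10     # the waiting window stops growing after 10 collisions
MAX_ATTEMPTS = 16     # the frame is abandoned after 16 failed attempts


def backoff_delay_us(collisions: int, rng: random.Random) -> float:
    """Pick the random wait after the given number of collisions on one frame."""
    if not 1 <= collisions <= MAX_ATTEMPTS:
        raise ValueError("give up: too many collisions for this frame")
    k = min(collisions, MAX_EXPONENT)
    slots = rng.randrange(2 ** k)   # uniform integer in 0 .. 2^k - 1
    return slots * SLOT_TIME_US


rng = random.Random()
for n in (1, 2, 3):
    print(n, backoff_delay_us(n, rng))
```

Because each host draws its own random number, two hosts that just collided are unlikely to retry at the same instant, and the window widens if they keep colliding.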

  • Switch

The larger the LAN, the higher the probability of collision, which is why switches exist.

  • The switch can identify a localized collision and does not forward the collided data.
    For example, a collision on the left side of the switch does not affect MAC3 sending messages to MAC4.

  • The switch does not forward normally sent data that does not need to cross it.
    For example, if MAC1 sends a message to MAC5 and both are on the same side, the right side of the switch has no need to receive the message, so the collision probability on the right side is reduced.

The core function of the switch is to divide the collision domain.
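A toy model of this forwarding decision is shown below. The placement of hosts in `SIDE_OF` is an assumption inferred from the two examples above (MAC1 and MAC5 on the left, MAC3 and MAC4 on the right), and `should_forward` is a hypothetical helper; a real switch learns which port each MAC address lives on from the source addresses of the frames it sees.

```python
# Assumed placement of hosts on the two sides of the switch.
SIDE_OF = {"MAC1": "left", "MAC5": "left", "MAC3": "right", "MAC4": "right"}


def should_forward(src: str, dst: str) -> bool:
    """Forward a frame across the switch only if the receiver is on the other side."""
    return SIDE_OF[src] != SIDE_OF[dst]


print(should_forward("MAC1", "MAC5"))  # same side: stays in the left collision domain
print(should_forward("MAC1", "MAC3"))  # different sides: the switch must forward it
```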

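So when a host sends data, should each frame be long or short? The longer a frame occupies the shared channel, the more likely it is to collide with another host's data, so a frame should not be too long. This is one reason the frame size is limited.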

2.5 IP address and MAC address

The IP address describes the overall start and end points of the journey.
The MAC address describes the start and end of each leg along the way.

IP is the big goal, and MAC is each small goal on the way to the big goal.

Therefore, while data is being routed, the source and destination IP addresses can be understood as not changing, whereas the source and destination MAC addresses change at every hop.
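To make the "IP stays, MAC changes" point concrete, here is a small simulation. All names (`Frame`, `frames_along_path`, the addresses) are invented for the example, and it simplifies by giving each device a single MAC address, whereas a real router has one per interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


def frames_along_path(src_ip: str, dst_ip: str, hop_macs: list) -> list:
    """Build the frame seen on each leg; hop_macs lists sender, routers, receiver."""
    return [
        Frame(src_mac=a, dst_mac=b, src_ip=src_ip, dst_ip=dst_ip)
        for a, b in zip(hop_macs, hop_macs[1:])
    ]


# Hypothetical path: host A -> router R1 -> router R2 -> host B.
path = ["MAC_A", "MAC_R1", "MAC_R2", "MAC_B"]
legs = frames_along_path("10.0.0.2", "192.168.1.9", path)
for leg in legs:
    print(leg)
```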

2.6 MTU

MTU (Maximum Transmission Unit) describes the maximum amount of data that the underlying data frame can carry at one time. The limit comes from the physical layer beneath each kind of data link layer. The MTU of Ethernet is generally 1500 bytes.

The MAC frame format specifies a payload length in the range [46, 1500] bytes.

MTU can be regarded as the parcel size limit when sending express delivery. Different physical layers may have different MTUs.

2.6.1 The influence of MTU on IP protocol

If the number of bytes the IP layer sends at one time exceeds the MTU, the data must be fragmented. I covered fragmentation in the previous chapter, [Network Programming] Network Layer Protocol - IP Protocol.

A little more here:

While forwarding data, a router may also fragment it, because different networks have different MTUs. To prevent this, we can set the Don't Fragment (DF) flag in the IP header to 1. If a link on the path then has a smaller MTU than the packet, the router discards the packet directly instead of fragmenting it, and the path is re-selected after retransmission, so that a path with higher throughput can be chosen.
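The arithmetic behind fragmentation can be sketched as follows, assuming a 20-byte IP header without options. The function `fragment` is a hypothetical helper that only computes sizes and offsets; it returns an empty list to represent a packet dropped because the Don't Fragment flag is set.

```python
IP_HEADER_LEN = 20  # IP header without options


def fragment(payload_len: int, mtu: int = 1500, dont_fragment: bool = False):
    """Return a list of (offset_bytes, length, more_fragments) for an IP payload."""
    max_data = mtu - IP_HEADER_LEN
    if payload_len <= max_data:
        return [(0, payload_len, False)]     # fits in one frame, no fragmentation
    if dont_fragment:
        return []                            # too big and DF is set: packet dropped
    # Offsets are stored in units of 8 bytes, so every fragment except the
    # last one must carry a multiple of 8 bytes.
    chunk = max_data - (max_data % 8)
    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(chunk, payload_len - offset)
        more = offset + length < payload_len
        fragments.append((offset, length, more))
        offset += length
    return fragments


print(fragment(4000))                       # three fragments on Ethernet
print(fragment(4000, dont_fragment=True))   # dropped
```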

2.6.2 Impact of MTU on UDP and TCP protocols

If the data carried by UDP or TCP is too large and exceeds the MTU, it is fragmented at the IP layer, and losing any one fragment loses the whole datagram. For UDP the data is simply lost; for TCP the whole segment has to be retransmitted.
So it can be concluded that fragmentation is bad.

  • How to reduce fragmentation

If the transport layer controls the amount of data delivered to IP at one time so that it is not too large, the data does not need to be fragmented at the IP layer.
As a transmission control protocol, TCP keeps its payload under a certain threshold. This threshold is called the MSS (Maximum Segment Size).
The maximum payload of a MAC frame is the MTU, and the maximum payload of a TCP segment is the MSS. Because the TCP header and the IP header are normally 20 bytes each, in general MSS = MTU - 20 - 20. The MTU is generally 1500 bytes, so the MSS is generally 1460 bytes.
Therefore, it is generally recommended that TCP keep each segment's data within 1460 bytes, which reduces the possibility of fragmentation.

This also explains why the multiple segments within the sliding window cannot be merged and sent as one: a single segment is not allowed to be too large.
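The MSS calculation and the resulting segmentation can be written out as a short sketch. It assumes option-free 20-byte IP and TCP headers, and `split_into_segments` is an illustrative helper, not how a real TCP stack is organized.

```python
MTU = 1500
IP_HEADER = 20    # IP header without options
TCP_HEADER = 20   # TCP header without options
MSS = MTU - IP_HEADER - TCP_HEADER


def split_into_segments(data: bytes, mss: int = MSS) -> list:
    """Cut application data into pieces that each fit in one TCP segment."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]


segments = split_into_segments(b"x" * 4000)
print(MSS, [len(s) for s in segments])
```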

3. ARP protocol

3.1 The role of the ARP protocol

When two hosts A and B on different subnets communicate, the data eventually reaches router D in host B's local area network. D and B belong to the same LAN, so the data must be encapsulated in a MAC frame to be delivered, but the packet contains only B's IP address, not B's MAC address.

This requires a process by which the router obtains the MAC address of host B.

The role of the ARP protocol is to obtain the MAC address of the target host based on the IP address.

  • Macroscopic Workflow of ARP Protocol

Imagine a teacher who comes to a classroom for the first time and does not know anyone, but knows everyone's student number. How does the teacher learn everyone's name?
The teacher can call out a student number, get a reply from that student, and thereby establish the mapping between student number and name.

When the router has data to send to the target host, it encapsulates an ARP request carrying the target IP and broadcasts it. The host whose IP matches receives the ARP request and encapsulates an ARP response containing its own MAC address. The router then knows the MAC address of the target host and can encapsulate the data packet into a MAC frame for transmission.

3.2 Format of ARP datagram

Before the ARP datagram format, one point to note:
ARP messages are carried as the payload of MAC frames, so although ARP works together with IP, the ARP protocol sits above the MAC frame as one of its upper-layer protocols.
Therefore, the payload of a MAC frame is not always an IP packet; it may also be an ARP request/response.

  • ARP packet format

The first three fields in the format (destination address, source address, frame type) are the header of the MAC frame, so the actual ARP message consists only of the fields after them:

The hardware type refers to the network type of the link layer; 1 is Ethernet.
The protocol type refers to the type of address to be converted; 0x0800 is an IP address.
The hardware address length is 6 bytes for Ethernet, because MAC addresses are 48 bits.
The protocol address length is 4 bytes for IP, because IP addresses are 32 bits.
The op field is 1 for an ARP request and 2 for an ARP response.

The last four fields (sender Ethernet address, sender IP address, destination Ethernet address, destination IP address) carry the addresses used by ARP requests and responses.
If the value of one of these fields is not yet known, such as the destination MAC address, fill it with all F to indicate that it is not set.
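The field layout described above adds up to 28 bytes for Ethernet and IPv4, and can be packed and unpacked with Python's `struct` module. The helper names `pack_arp` and `unpack_arp` and the sample addresses are invented for this sketch.

```python
import socket
import struct

# hardware type, protocol type, hw addr len, proto addr len, op,
# sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_LEN = struct.calcsize(ARP_FORMAT)  # 28 bytes


def pack_arp(op: int, sender_mac: bytes, sender_ip: str,
             target_mac: bytes, target_ip: str) -> bytes:
    """Serialize an ARP message for Ethernet (type 1) and IP (0x0800)."""
    return struct.pack(
        ARP_FORMAT, 1, 0x0800, 6, 4, op,
        sender_mac, socket.inet_aton(sender_ip),
        target_mac, socket.inet_aton(target_ip),
    )


def unpack_arp(packet: bytes) -> dict:
    """Parse the first 28 bytes of an ARP message into named fields."""
    hw, proto, hlen, plen, op, smac, sip, tmac, tip = struct.unpack(
        ARP_FORMAT, packet[:ARP_LEN]
    )
    return {
        "hardware_type": hw, "protocol_type": proto,
        "hw_len": hlen, "proto_len": plen, "op": op,
        "sender_mac": smac, "sender_ip": socket.inet_ntoa(sip),
        "target_mac": tmac, "target_ip": socket.inet_ntoa(tip),
    }


# Hypothetical request: the target MAC is unknown, so it is filled with all F.
request = pack_arp(1, bytes.fromhex("112233445566"), "192.168.1.1",
                   b"\xff" * 6, "192.168.1.7")
print(len(request), unpack_arp(request)["target_ip"])
```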

3.3 Workflow of ARP protocol

The macro workflow of the ARP protocol was discussed above. Now let's fill in the fields of the ARP message.

3.3.1 ARP request process

Suppose that router A constructs an ARP request to find host B.

  • Build an ARP request

Fill in 1 for the hardware type, because the communication is over Ethernet.
Fill in 0x0800 for the protocol type, because an IP address is being converted.
Set the hardware address length and the protocol address length to 6 and 4 respectively, because a MAC address is 48 bits and an IP address is 32 bits.
Fill in the op field with 1, because this is an ARP request.
The sender Ethernet address and sender IP address are the MAC address and IP address of router A.
The destination Ethernet address and destination IP address are the MAC address and IP address of host B. Because the MAC address of host B is unknown, fill it with all F.

So far the message has only been encapsulated at the ARP layer. It must be handed to the data link layer for encapsulation before it can be sent onto the LAN.
So now add the header of the Ethernet frame:

The destination MAC address is unknown, so fill it with all F (FF:FF:FF:FF:FF:FF, the broadcast address).
For the source address, fill in the MAC address of router A.
Fill in 0x0806 for the type, because the payload is an ARP message.
Finally, add the CRC check.

After the MAC frame is encapsulated, router A broadcasts it to the LAN.
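Putting the two encapsulation steps together, a broadcast ARP request frame could be assembled roughly like this. The function `build_arp_request_frame` and the addresses are hypothetical. The sketch pads the 28-byte ARP message up to the 46-byte minimum payload and leaves out the CRC, on the assumption that the network card appends it.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
MIN_PAYLOAD = 46  # minimum MAC frame payload in bytes


def build_arp_request_frame(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Return header + padded ARP request (no CRC) asking who owns target_ip."""
    # Step 1: ARP layer. op = 1 (request); the unknown target MAC is all F.
    arp = struct.pack(
        "!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, 1,
        sender_mac, socket.inet_aton(sender_ip),
        BROADCAST_MAC, socket.inet_aton(target_ip),
    )
    # Step 2: data link layer. Broadcast destination, our MAC as source, type ARP.
    header = struct.pack("!6s6sH", BROADCAST_MAC, sender_mac, ETHERTYPE_ARP)
    payload = arp.ljust(MIN_PAYLOAD, b"\x00")  # pad 28 bytes up to 46
    return header + payload


frame = build_arp_request_frame(bytes.fromhex("112233445566"),
                                "192.168.1.1", "192.168.1.7")
print(len(frame), frame[:14].hex())
```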

Assume that the MAC2 host receives this frame. After unpacking it, the host finds that the destination MAC is all F, which means broadcast, so it accepts the frame. Seeing that the frame type field is 0x0806, it knows the payload is an ARP request or response and delivers the payload of the MAC frame up to the ARP layer.
When the ARP layer receives the packet, it first checks the op field to determine whether it is a request or a response. The value is 1, so it is a request. It then extracts the destination IP field, and if that IP is not its own, it discards the packet directly at the ARP layer.

3.3.2 ARP response process

  • Build an ARP response

Fill in 2 for the op field, which means response.
For the destination Ethernet address, fill in the MAC address of router A.
The sender Ethernet address and sender IP address are now those of host B. The other fields are the same as in the ARP request.

To send it onto the LAN, add the MAC frame header, this time with router A's MAC address as the destination.
After the MAC frame is encapsulated, host B sends it to the LAN.

All hosts receive this MAC frame. A host that finds the destination MAC address in the frame header is not its own discards the frame directly and does not pass it up to the ARP layer.
When the ARP layer of router A receives the packet, it first sees that the op field is 2, so this is an ARP response. It then extracts the sender Ethernet address and sender IP address. At this point, router A has obtained the MAC address of host B.

To sum up:
When any ARP layer receives a packet, it first looks at the op field. If it is a request addressed to this host, build a response. If it is a response, extract the sender IP and sender MAC address to learn the other party's IP-to-MAC mapping.
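This decision logic can be sketched with plain dictionaries standing in for parsed ARP messages. The function `handle_arp`, the field names, and the addresses are assumptions made for the example.

```python
def handle_arp(packet: dict, my_ip: str, my_mac: str, cache: dict):
    """Process one ARP message; return a response dict, or None if nothing is sent."""
    if packet["op"] == 1:                      # request
        if packet["target_ip"] != my_ip:
            return None                        # not for us: discard at the ARP layer
        return {
            "op": 2,                           # response
            "sender_mac": my_mac, "sender_ip": my_ip,
            "target_mac": packet["sender_mac"],
            "target_ip": packet["sender_ip"],
        }
    if packet["op"] == 2:                      # response: learn the mapping
        cache[packet["sender_ip"]] = packet["sender_mac"]
    return None


# Hypothetical exchange between router A and host B.
request = {"op": 1, "sender_mac": "MAC_A", "sender_ip": "IP_A",
           "target_mac": "ff:ff:ff:ff:ff:ff", "target_ip": "IP_B"}
reply = handle_arp(request, my_ip="IP_B", my_mac="MAC_B", cache={})

cache_a = {}
handle_arp(reply, my_ip="IP_A", my_mac="MAC_A", cache=cache_a)
print(cache_a)
```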

3.4 ARP cache table

We know that LAN communication uses MAC addresses, so the ARP protocol is needed to obtain a MAC address from an IP address. Do we have to go through this process every time?

No, the result is cached. Whenever an ARP exchange completes, the mapping between the corresponding host's IP address and MAC address is recorded. Each host maintains an ARP cache table, which we can view with the arp -a command.

Note that the entries in the cache table have an expiration time, generally 20 minutes. If an entry is not used again within 20 minutes, it becomes invalid, and an ARP request must be sent again the next time the hardware address of that host is needed. This is mainly because the IP address assigned to a host can change.
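A cache with this "expire if unused for 20 minutes" behavior might look like the sketch below. The class `ArpCache` is hypothetical and is not how any particular operating system implements its table; the injectable `clock` parameter exists only so the expiry can be exercised without waiting.

```python
import time


class ArpCache:
    """IP -> MAC table whose entries expire when unused for ttl_seconds."""

    def __init__(self, ttl_seconds: float = 20 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # ip -> (mac, time of last use)

    def put(self, ip: str, mac: str) -> None:
        self._entries[ip] = (mac, self._clock())

    def get(self, ip: str):
        """Return the cached MAC, or None if missing or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, last_used = entry
        now = self._clock()
        if now - last_used > self._ttl:
            del self._entries[ip]        # stale: caller must send a new ARP request
            return None
        self._entries[ip] = (mac, now)   # using the entry restarts its timer
        return mac


cache = ArpCache()
cache.put("192.168.1.7", "aa:bb:cc:dd:ee:ff")
print(cache.get("192.168.1.7"))
```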

3.5 RARP protocol

RARP (Reverse Address Resolution Protocol) is a protocol for obtaining an IP address from a MAC address.
Sometimes we know only the MAC address and not the IP address, although this is very rare.

If we know a host's MAC address in the same LAN, we can already send a frame to it directly, so we can simply send a message asking for its IP address.

3.6 ARP spoofing

Suppose there is a local area network in which each host holds an ARP cache table mapping the other hosts' IP addresses to their MAC addresses.
Now a middleman, MAC3, appears. It forges a large number of fake ARP messages, sending the mapping IP4:MAC3 to MAC1 and the mapping IP1:MAC3 to the host that owns IP4.

Both victims overwrite their cache entries, so the traffic between them now passes through MAC3. In this way, MAC3 becomes a man-in-the-middle, and this operation is called ARP spoofing.
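The reason this works is that a naive ARP layer trusts whatever mapping it is told. The in-memory simulation below shows only the effect on a cache table; it sends nothing on a network, and the host names, table contents, and function `accept_arp_message` are all made up for illustration.

```python
# Host 1's ARP cache before the attack (symbolic addresses).
cache_host1 = {"IP3": "MAC3", "IP4": "MAC4"}


def accept_arp_message(cache: dict, sender_ip: str, sender_mac: str) -> None:
    """A naive ARP layer: it believes any claimed mapping and overwrites the entry."""
    cache[sender_ip] = sender_mac


def next_hop_mac(cache: dict, dst_ip: str) -> str:
    """The MAC address a frame for dst_ip will actually be sent to."""
    return cache[dst_ip]


before = next_hop_mac(cache_host1, "IP4")
accept_arp_message(cache_host1, "IP4", "MAC3")   # forged claim: "IP4 is at MAC3"
after = next_hop_mac(cache_host1, "IP4")
print(before, "->", after)
```

Since anyone on the LAN who receives the frames can read them, this is another reason to encrypt application data, as with HTTPS.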




Origin blog.csdn.net/qq_66314292/article/details/131831351