A Discussion of Data Link Layer Protocols
Abstract The data link layer is the second layer of the reference model, sitting between the physical layer and the network layer. Building on the services of the physical layer, it provides services to the network layer; its most basic service is to transmit the data handed down from the network layer reliably to the network layer of the target node, achieving a point-to-point connection. The data link layer has two sublayers, the logical link control (LLC) sublayer and the medium access control (MAC) sublayer. The former mainly handles the exchange of frames between two stations, providing reliable frame transfer along with error control, flow control, and so on. The latter provides medium access control services, mainly solving how to allocate the right to use the channel when stations compete for a shared channel in a local area network. This article describes in detail some of the issues related to the protocols used at the data link layer, such as framing, error control, medium access control, and MAC addressing.
Keywords data link layer; protocol; framing; error control; medium access control; MAC addressing
1. Introduction
With the continuous development and progress of human society, the global information age has arrived, and people's requirements for information keep growing. In order to transmit and process information more effectively, reliably, and safely, research on computer communication networks has become essential. In a computer communication network, the transmission of information starts with encapsulation at one end and ends with decapsulation at the other. Connections in computer communication networks are usually implemented in a layered manner. After layering, each layer works independently and the layers are connected through interfaces. The lower layer serves the upper layer, which reduces the complexity of the protocol work and gives good flexibility: a change in any one layer does not affect the other layers, making the system easy to maintain. In addition, each layer can be implemented with different technology, which reduces implementation complexity and facilitates standardization. There are three commonly used layering schemes in computer communication networks, namely the OSI seven-layer model, the TCP/IP (DoD) four-layer model, and the five-layer model that combines the two. Figure 1 shows the mapping relationship of the three reference models.
Figure 1 The mapping relationship of the three reference models
Against this background, we can discuss the network architecture more precisely. The abstract objects that make up the network architecture are called protocols. Each protocol defines two interfaces. The first is a service interface defined for other objects on the same computer that want to use its communication services; this service interface defines the operations that local objects can perform on the protocol, that is, the services it provides. The second is a peer interface defined for a peer entity on another machine; this peer interface defines the format and meaning of the information exchanged between peer entities to implement the communication service. From Figure 1 we can also see that the transport layer and the network layer are the two core layers of the reference model, but the data link layer is equally important and indispensable. The data link layer is the second layer of the reference model, between the physical layer and the network layer. It provides services to the network layer on the basis of the services provided by the physical layer. It has two sublayers, the logical link control (LLC) sublayer and the medium access control (MAC) sublayer, each with its own functions. This article describes in detail some protocols used by the data link layer to realize its functions, covering framing, error control, medium access control, MAC addressing, and so on.
2. Framing
Each layer of the reference model defines its own protocol data unit (PDU). The PDU of the data link layer is called a frame. How is framing done? There are three main methods, which are briefly introduced below together with the related protocols.
The first framing method is used by byte-oriented protocols. It treats each frame as a collection of bytes and locates the frame body (the IP packet) in units of bytes. There are two approaches: the start-stop marker approach used by the BISYNC and PPP protocols, and the byte-counting approach used by the DDCMP protocol. Figure 2 shows the frame formats defined by the BISYNC and DDCMP protocols. BISYNC uses specific characters to indicate the start and end of a frame. The start of a frame is marked by the SYN (synchronization) character, and the data portion of the frame is enclosed between the special characters STX (start of text) and ETX (end of text); the SOH (start of header) field plays the same role for the header as STX does for the body. The problem is that a byte identical to the ETX character may also appear in the frame body, in which case the receiver will believe the frame has ended early, producing an erroneous frame. This can be solved by using an escape character to escape any "ETX bytes" inside the frame; this solution is called character stuffing. The PPP protocol has a similar structure and handling, except that delimiters are added only at the beginning and end of the frame, and both are represented by the same flag field "01111110".
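As a rough illustration of character stuffing, the sketch below escapes every control byte that happens to appear in the frame body. The control-byte values follow the conventional BISYNC assignments, but the helper names are invented for this example and the code is only a simplified model, not the full protocol:

    # Simplified character-stuffing sketch (illustrative only, not the full BISYNC protocol).
    SYN, STX, ETX, DLE = 0x16, 0x02, 0x03, 0x10  # conventional control-byte values

    def stuff(body: bytes) -> bytes:
        """Escape any DLE/STX/ETX byte in the body by prefixing it with DLE."""
        out = bytearray()
        for b in body:
            if b in (DLE, STX, ETX):
                out.append(DLE)          # escape character
            out.append(b)
        return bytes(out)

    def frame(body: bytes) -> bytes:
        """Wrap the stuffed body in SYN SYN STX ... ETX markers."""
        return bytes([SYN, SYN, STX]) + stuff(body) + bytes([ETX])

    def unstuff(payload: bytes) -> bytes:
        """Receiver side: drop the escape characters again."""
        out, escaped = bytearray(), False
        for b in payload:
            if not escaped and b == DLE:
                escaped = True
                continue
            out.append(b)
            escaped = False
        return bytes(out)

    if __name__ == "__main__":
        body = bytes([0x41, ETX, 0x42])          # body happens to contain an ETX byte
        f = frame(body)
        assert unstuff(f[3:-1]) == body          # strip SYN SYN STX ... ETX, then unstuff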
Figure 2 BISYNC frame format (top) and DDCMP frame format (bottom)
The byte-counting method used in the DDCMP protocol uses a Count field to indicate the number of bytes contained in the frame body, and the receiver uses this field to determine where the frame ends. This method has a serious weakness: once the Count field is corrupted in transmission, the receiver cannot detect the end of the frame correctly. This kind of error is called a framing error (the start-stop marker method can also suffer from it). A framing error may additionally cause subsequent frames to be received incorrectly, so the error propagates and causes many further problems.
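The toy deframer below shows how count-based framing works and how a corrupted Count field desynchronizes the receiver; the two-byte count field is a simplification for illustration, not the exact DDCMP layout:

    # Minimal count-based deframing sketch (simplified; not the exact DDCMP field layout).
    def split_frames(stream: bytes):
        """Each frame is encoded as a 2-byte big-endian Count followed by Count body bytes."""
        frames, i = [], 0
        while i + 2 <= len(stream):
            count = int.from_bytes(stream[i:i + 2], "big")
            body = stream[i + 2:i + 2 + count]
            frames.append(body)
            i += 2 + count           # a corrupted count shifts this offset for ALL later frames
        return frames

    if __name__ == "__main__":
        good = (3).to_bytes(2, "big") + b"abc" + (2).to_bytes(2, "big") + b"xy"
        print(split_frames(good))                 # [b'abc', b'xy']
        corrupted = (5).to_bytes(2, "big") + b"abc" + (2).to_bytes(2, "big") + b"xy"
        print(split_frames(corrupted))            # framing error propagates to the later frame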
The second framing method is used by bit-oriented protocols such as HDLC. Unlike byte-oriented protocols, it does not care about byte boundaries; it treats the frame as a collection of bits and locates the frame in units of bit sequences (a sequence may cross byte boundaries). Figure 3 shows the HDLC frame format, which also uses a specific bit sequence, "01111110", to mark the start and end of a frame, but this sequence can also appear inside the frame body. HDLC uses bit stuffing to solve this problem. On the sending side, except when transmitting the start and end flags, a 0 is inserted after every five consecutive 1s before the next bit is sent. On the receiving side, after five consecutive 1s have been received, the next bit is examined: if it is 0, it must be a stuffed bit and is removed; if it is 1, the following bit is examined as well, and if that bit is 0 the sequence is the end-of-frame flag, otherwise an error has occurred (of course, errors may also have occurred in the earlier bits) and the whole frame is discarded. All of these methods are affected most severely when errors occur in the very fields needed to locate the frame body. The third framing method is time-based framing, which is mostly used in the Synchronous Optical Network (SONET). It mainly relies on multiplexing and is not described in detail here.
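The following sketch, operating on lists of bits, implements the stuffing and unstuffing rule just described; it handles only the frame body (flag detection and error handling are omitted), so it is an illustration rather than a complete HDLC implementation:

    # HDLC-style bit stuffing sketch over lists of bits (illustrative only).
    FLAG = [0, 1, 1, 1, 1, 1, 1, 0]

    def bit_stuff(bits):
        """Sender: after five consecutive 1s inside the frame body, insert a 0."""
        out, ones = [], 0
        for b in bits:
            out.append(b)
            ones = ones + 1 if b == 1 else 0
            if ones == 5:
                out.append(0)
                ones = 0
        return out

    def bit_unstuff(bits):
        """Receiver: a 0 that follows five consecutive 1s is a stuffed bit; drop it."""
        out, ones, i = [], 0, 0
        while i < len(bits):
            b = bits[i]
            out.append(b)
            if b == 1:
                ones += 1
                if ones == 5:
                    i += 2        # skip the 0 the sender stuffed after five 1s
                    ones = 0
                    continue
            else:
                ones = 0
            i += 1
        return out

    if __name__ == "__main__":
        body = [1, 1, 1, 1, 1, 1, 0, 1]          # contains six 1s in a row
        framed = FLAG + bit_stuff(body) + FLAG   # stuffed body can no longer contain the flag
        assert bit_unstuff(bit_stuff(body)) == body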
Figure 3 HDLC frame format
3. Error control
Because of noise and interference, errors will occur in the frames received by the receiver. For reliable frame transmission, the frame formats shown in Figure 2 and Figure 3 include a CRC field, which is used to check for transmission errors. There are two kinds of coding techniques for handling errors: error-detecting codes and error-correcting codes. With the former, the receiver can only detect an error, discard the erroneous frame, and rely on the sender to retransmit it; with the latter, the receiver can correct the error itself using the error-correcting code (its correcting ability is limited, and retransmission is still needed when that ability is exceeded). In practice, however, error-correcting codes such as Hamming codes have relatively large overhead and high complexity and are therefore used less often. The most commonly used algorithm for checking transmission errors is the Cyclic Redundancy Check (CRC). Other fairly common, simpler schemes include two-dimensional parity and the checksum (the latter is generally not used at the data link layer).
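As a small illustration of how a CRC field is used, the sketch below appends a CRC-32 value (one common CRC, available in Python's standard zlib module) to the frame body and checks it on reception; the particular CRC a given link protocol uses, such as the 16-bit CRC-CCITT of HDLC, may differ:

    import zlib

    # Sender appends a CRC-32 over the frame body; receiver recomputes it and compares.
    # (CRC-32 is just one common polynomial; HDLC, for instance, uses a 16-bit CRC-CCITT.)
    def add_crc(body: bytes) -> bytes:
        return body + zlib.crc32(body).to_bytes(4, "big")

    def check_crc(frame: bytes) -> bool:
        body, received = frame[:-4], int.from_bytes(frame[-4:], "big")
        return zlib.crc32(body) == received

    if __name__ == "__main__":
        frame = add_crc(b"hello, link layer")
        assert check_crc(frame)                     # error-free frame is accepted
        corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
        assert not check_crc(corrupted)             # single-bit error detected; frame discarded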
The basic idea of any error checking algorithm is to add redundant bits to the frame in order to determine whether an error has occurred. These redundant bits carry no information and are used only for error detection; the error-detecting ability is determined by the Hamming distance between codewords. The coding rules are not described in detail here, but two points should be clear. First, because the redundant bits themselves may also be corrupted, the receiver may judge a frame to be erroneous even when the information bits are error-free, discard it, and ask the sender to retransmit; this is a false alarm. Second, the information bits and the redundant bits may be corrupted at the same time in a way that leads the receiver to judge the frame error-free and accept it; this is a missed detection. The probability of both cases is quite small, the latter especially so, so the overall performance is still very good.
For reliable transmission, once the receiver detects an error, it must discard the error frame, and then ask the sender to retransmit. The retransmission can be controlled through two mechanisms: confirmation and timeout. Acknowledgement mechanism means that after receiving a frame, if the receiver detects that there is no error, it will directly send or piggyback an ACK signal to the sender. When the sender receives the ACK, it indicates that the frame is sent successfully. If the sender still does not receive the ACK within a specified period of time, it is determined as a timeout and the frame is retransmitted. The strategy of using acknowledgement and timeout to achieve reliable transmission is called Automatic Repeat Request (ARQ). There are also three commonly used ARQ algorithms.
The simplest ARQ algorithm is the stop-and-wait algorithm. Its idea is very simple: after the sender transmits a frame, it sends the next frame only if the current frame is acknowledged within the specified time; otherwise it retransmits the frame after the timeout. The acknowledgement-and-timeout mechanism has one problem: if the acknowledgement frame is lost in transit, the sender retransmits the frame after the timeout, and the receiver may mistake the retransmission for the next frame, causing duplication. Frames must therefore be numbered during transmission, which is necessary in all ARQ algorithms. In addition, the stop-and-wait algorithm has a significant shortcoming: it transmits only one frame at a time and then has to wait for the acknowledgement, which can waste link capacity and time and results in low efficiency.
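The toy simulation below sketches stop-and-wait with a 1-bit sequence number over a lossy channel; the loss probability, helper names, and printed messages are invented for the example:

    import random

    # Toy stop-and-wait sender/receiver over a lossy "channel" (illustrative only).
    # Frames carry a 1-bit sequence number so a retransmission is not mistaken for a new frame.
    random.seed(1)

    def lossy(item, loss_prob=0.3):
        """Return None to simulate a lost frame or a lost ACK."""
        return None if random.random() < loss_prob else item

    def receiver(expected_seq, frame):
        seq, data = frame
        if seq == expected_seq:
            print("receiver: delivered", data)
            return 1 - expected_seq, seq        # advance expected seq, ACK this seq
        return expected_seq, seq                # duplicate: re-ACK, do not deliver again

    def send_all(messages):
        seq, expected = 0, 0
        for data in messages:
            while True:                          # retransmit until the ACK for `seq` arrives
                frame = lossy((seq, data))
                if frame is not None:
                    expected, acked = receiver(expected, frame)
                    if lossy(acked) == seq:
                        break                    # acknowledged; move on to the next frame
                print("sender: timeout, retransmitting", data)
            seq = 1 - seq

    if __name__ == "__main__":
        send_all(["frame-A", "frame-B", "frame-C"])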
To solve the problem of low efficiency, a new ARQ algorithm, the sliding window algorithm, was developed. This algorithm allows the sender to have up to a window's worth of frames outstanding at one time, and the receiver must acknowledge each frame. Figure 4 shows the timelines of the stop-and-wait algorithm and the sliding window algorithm. Within this algorithm there are two mechanisms for retransmission after a timeout. One is go-back-N: the first timed-out frame and all frames after it are retransmitted, and in this case acknowledgements can follow the cumulative-acknowledgement principle. The other is selective retransmission, which retransmits only the timed-out frames; here care must be taken to preserve the order of frames. With this kind of ARQ algorithm, attention should be paid to choosing a suitable window size and numbering the frames sensibly so as to control the flow.
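A minimal sketch of the sender-side bookkeeping for the go-back-N variant is given below; the class name, window size, and printed messages are invented for illustration, and the timer and the real channel are left out:

    # Go-back-N window bookkeeping sketch (sender side only, illustrative).
    class GoBackNSender:
        def __init__(self, window_size):
            self.window = window_size
            self.base = 0            # oldest unacknowledged frame
            self.next_seq = 0        # next frame to send

        def can_send(self):
            return self.next_seq < self.base + self.window

        def send(self):
            assert self.can_send()
            print(f"send frame {self.next_seq}")
            self.next_seq += 1

        def on_ack(self, ack_num):
            # cumulative ACK: everything up to and including ack_num is confirmed
            self.base = max(self.base, ack_num + 1)

        def on_timeout(self):
            # go back N: retransmit every outstanding frame from base onwards
            print(f"timeout: retransmitting {self.base} .. {self.next_seq - 1}")
            self.next_seq = self.base

    if __name__ == "__main__":
        s = GoBackNSender(window_size=4)
        while s.can_send():
            s.send()                 # frames 0..3 fill the window
        s.on_ack(1)                  # cumulative ACK for 0 and 1 slides the window
        while s.can_send():
            s.send()                 # frames 4 and 5 can now be sent
        s.on_timeout()               # frame 2 timed out: resend 2..5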
Figure 4 The timeline of the stop-and-wait algorithm (top) and the sliding window algorithm (bottom)
The last ARQ approach uses concurrent logical channels. It still runs the simple stop-and-wait algorithm, but on several logical channels at once, so it can keep the link full and achieve higher efficiency. However, it cannot preserve the order of frames and provides no flow control, so it is not widely used today.
It is worth noting that many current data link layer technologies omit the error control function and leave reliable transmission to higher-layer protocols, such as the transport layer or the application layer. Which layer should provide reliable transmission depends on many factors, and the ideas discussed in this section apply not only to the data link layer but to other layers as well.
4. Medium access control
Channel resources are generally limited but shared by many users, so if they are allocated unreasonably, conflicts will occur. By allocating channel resources dynamically with a multiple access protocol, conflicts can be reduced and channel utilization improved. Multiple access protocols include random access protocols and controlled access protocols. With the former, stations occupy the channel at random, which may cause conflicts; with the latter, stations are assigned the channel, mainly through channel multiplexing, and no conflicts occur. Typical random access protocols include ALOHA, CSMA, CSMA/CD, and CSMA/CA, while controlled access methods include TDM, FDM, and so on. This article mainly introduces the random access protocols.
The ALOHA protocol comes in two versions, pure ALOHA and slotted ALOHA. In pure ALOHA, any station may transmit immediately after a frame is generated (collisions are possible) and determines from the feedback signal on the channel whether the transmission succeeded. If the transmission fails, the frame is sent again after a random delay. Because data is sent without regard to the channel state, collisions can occur at any time. Slotted ALOHA divides time into slots whose length corresponds to the transmission time of one frame. New frames are still generated at random, but slotted ALOHA does not allow them to be transmitted at arbitrary times: every frame must be sent at the beginning of a slot. Collisions can therefore occur only at slot boundaries, and a collision wastes at most one slot; if a single station occupies a slot, its transmission succeeds without conflict. This reduces the probability of collision, but channel utilization is still not high.
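The toy simulation below gives a feel for this: each of a handful of stations transmits in a slot with a fixed probability, and a slot counts as successful only when exactly one station transmits. The station count and probability are made-up parameters, chosen so that the measured throughput lands near the well-known 1/e ≈ 0.37 bound:

    import random

    # Tiny slotted-ALOHA simulation: in each slot every station transmits with probability p,
    # and a slot succeeds only when exactly one station transmits (numbers are illustrative).
    def slotted_aloha(stations=10, p=0.1, slots=100_000, seed=0):
        rng = random.Random(seed)
        successes = 0
        for _ in range(slots):
            senders = sum(1 for _ in range(stations) if rng.random() < p)
            if senders == 1:
                successes += 1          # exactly one sender: no collision in this slot
        return successes / slots

    if __name__ == "__main__":
        # With stations * p = 1 the throughput stays close to 1/e (about 0.37 frames per slot),
        # i.e. channel utilization remains modest even without uncontrolled collisions.
        print(f"throughput per slot: {slotted_aloha():.3f}")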
The CSMA protocol is the carrier sense multiple access protocol, whose defining feature is "listen before transmitting". It has three modes: non-persistent, 1-persistent, and p-persistent. Non-persistent means that after listening, the station transmits if the medium is idle; otherwise it waits a randomly distributed time and then listens again, which can leave periods with no data transmission and waste capacity. 1-persistent means that after listening, the station transmits if the medium is idle; otherwise it keeps listening and transmits as soon as the medium becomes idle, and if a collision occurs it waits a randomly distributed time and listens again. Consequently, if two or more stations are waiting, a collision is inevitable the moment the medium becomes free. p-persistent builds on 1-persistent: when the medium is free, the station transmits with probability p and defers for one time unit with probability (1 − p), which further reduces the chance of collision. Note, however, that because of propagation delay, collisions may still occur with any of these methods after the data has been sent.
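The sketch below contrasts the three persistence rules for a single station; the toy channel_idle function and the returned strings are stand-ins for real carrier sensing and timing, so this illustrates only the decision logic:

    import random

    # Sketch of the three CSMA persistence rules for a single station (illustrative only).
    def nonpersistent(channel_idle, rng):
        if channel_idle():
            return "transmit"
        return "wait a random time, then sense again"

    def one_persistent(channel_idle, rng):
        while not channel_idle():
            pass                      # keep listening until the medium goes idle
        return "transmit immediately (collision likely if others were also waiting)"

    def p_persistent(channel_idle, rng, p=0.3):
        while True:
            while not channel_idle():
                pass                  # wait for the medium to become idle
            if rng.random() < p:
                return "transmit"     # with probability p, send now
            # with probability 1 - p, defer one time unit and repeat

    if __name__ == "__main__":
        rng = random.Random(42)
        idle = lambda: rng.random() < 0.5   # toy channel that is idle half the time
        print(nonpersistent(idle, rng))
        print(one_persistent(idle, rng))
        print(p_persistent(idle, rng))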
A typical Ethernet (802.3) uses a bus structure and has shown strong vitality. Its multiple access protocol is CSMA with collision detection (CD). CSMA/CD listens before transmitting and keeps listening while transmitting: the difference from plain CSMA is that every station also receives its own signal while sending and monitors the transmission, and if the received signal differs from what was sent, a collision has occurred. When the sending station detects the collision, it immediately stops sending the frame and transmits a short jamming signal (the collision-enforcement signal) to notify every station on the network that a collision has occurred. The station, and every other station involved, then waits a randomly distributed time and retransmits the frame following the CSMA/CD procedure. This waiting time can be determined by the binary exponential backoff algorithm. The uncertainty introduced by collisions makes the average data rate of an Ethernet far lower than its peak data rate. Also, because of propagation delay, CSMA/CD has a collision window: the maximum collision-detection time of the network is twice the propagation time between the two most distant stations. To make collision detection work, the slot time must equal the collision window, and the time needed to send a valid frame must be greater than the collision window.
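The binary exponential backoff rule mentioned above can be sketched as follows, using the classic 10 Mb/s Ethernet parameters (a slot time of 51.2 µs, the backoff range capped at 1023 slots, and the attempt abandoned after 16 collisions); treat it as an illustration rather than a complete MAC implementation:

    import random

    # Binary exponential backoff after a CSMA/CD collision: after the i-th collision the station
    # waits k slot times, with k drawn uniformly from 0 .. 2^min(i, 10) - 1.
    SLOT_TIME = 51.2e-6   # seconds; classic 10 Mb/s Ethernet slot time (the collision window)

    def backoff_delay(collision_count: int, rng: random.Random) -> float:
        if collision_count > 16:
            raise RuntimeError("too many collisions: give up and report an error")
        k = rng.randrange(0, 2 ** min(collision_count, 10))   # range capped at 0..1023
        return k * SLOT_TIME

    if __name__ == "__main__":
        rng = random.Random(7)
        for attempt in range(1, 6):
            print(f"after collision {attempt}: wait {backoff_delay(attempt, rng) * 1e6:.1f} microseconds")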
Figure 5 The hidden node problem (top) and the exposed node problem (bottom)
Wireless networks differ from Ethernet. Consider the two situations in Figure 5, where the two circles represent the communication ranges of A and C respectively. In the first situation, suppose A and C both want to communicate with B at the same time, but A and C cannot hear each other. After listening, each finds that B is idle and sends a frame to B, and the two frames collide at B. This is called the hidden node problem. In the second situation, B can communicate with A and C, and C can communicate with B and D. If A is communicating with B and C wants to communicate with D, C will overhear the ongoing communication and refrain from sending to D, even though its transmission to D would not affect the communication between A and B; resources are therefore wasted. This is called the exposed node problem.
To address these two problems, 802.11 uses CSMA with collision avoidance (CA). In CSMA/CA, a station that senses an idle channel does not transmit immediately; it waits for a period of time and then first sends a small channel reservation frame, RTS. If a CTS frame comes back from the nearest access point, the station considers the channel free and proceeds to send its data. Before sending the data it again checks whether the channel is in use, and even if the channel is idle it waits a further random time before transmitting. If the receiver receives the frame correctly, it returns an acknowledgement frame (ACK) after a short interval; when the sender receives the ACK, it knows the data was transmitted correctly and, after another interval, can send the next data. Through the RTS-CTS mechanism and the ACK mechanism, CSMA/CA reduces the possibility of collisions.
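The toy model below walks through a successful RTS-CTS-data-ACK exchange over a lossy channel; the ToyChannel class, its busy and loss probabilities, and the retry limit are all invented for this sketch, and real 802.11 details such as DIFS/SIFS timing, NAV timers, and the contention window are omitted:

    import random

    # Toy model of the 802.11 RTS/CTS + ACK exchange (illustrative only).
    class ToyChannel:
        def __init__(self, busy_prob=0.3, loss_prob=0.2, seed=3):
            self.rng = random.Random(seed)
            self.busy_prob = busy_prob
            self.loss_prob = loss_prob

        def busy(self):
            return self.rng.random() < self.busy_prob   # stand-in for carrier sensing

        def deliver(self, frame):
            # any frame (RTS, CTS, data or ACK) is lost with probability loss_prob
            return None if self.rng.random() < self.loss_prob else frame

    def send_with_rts_cts(channel, data, max_attempts=5):
        for attempt in range(1, max_attempts + 1):
            while channel.busy():
                pass                                    # wait for an idle medium
            # reserve the medium with small frames first: RTS out, CTS back from the receiver
            if channel.deliver("RTS") is None or channel.deliver("CTS") is None:
                continue                                # no CTS: assume contention, back off and retry
            if channel.deliver(data) is None:           # the data frame itself may still be lost
                continue
            if channel.deliver("ACK") is not None:      # receiver confirms correct reception
                return f"delivered on attempt {attempt}"
        return "gave up after repeated failures"

    if __name__ == "__main__":
        print(send_with_rts_cts(ToyChannel(), "wifi frame"))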
CSMA/CD focuses mainly on detecting collisions: when a collision is detected it is handled accordingly, and the device is required to keep listening while it sends. CSMA/CA focuses mainly on avoiding collisions: as seen in the protocol, it often waits for a period of time before acting, avoids collisions as far as possible through backoff, and also sends small channel reservation frames first to test whether the channel is contended.
5. MAC addressing
Take Ethernet as an example. After the sender transmits a frame, the frame travels along the bus, and every device connected to the bus (usually a network adapter, also called a network card) can receive it. The adapter then checks whether the frame is addressed to itself: if so, it passes the frame to the host; otherwise it discards it. How does the network card decide, at the data link layer, whether a frame is addressed to itself? Mainly through the MAC address. The MAC address is a physical address consisting of 48 bits. The first 24 bits are assigned to the network card manufacturer by the IEEE, and the last 24 bits are allocated uniquely by the manufacturer; the address is burned into ROM when the card is manufactured, cannot be changed, and is unique in the world. Figure 6 shows the Ethernet frame format; other frame formats are similar. The Dest addr and Src addr fields carry the destination MAC address and the source MAC address, which uniquely identify the destination host and the source host.
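A short helper illustrating the 24/24-bit split of a MAC address is given below; the example address is made up:

    # Splitting a 48-bit MAC address into its vendor (OUI) part and device-specific part
    # (the example address is fictional, used only for illustration).
    def parse_mac(mac: str):
        octets = [int(part, 16) for part in mac.split(":")]
        assert len(octets) == 6, "a MAC address is 48 bits = 6 octets"
        oui = octets[:3]          # first 24 bits: assigned to the manufacturer by the IEEE
        nic = octets[3:]          # last 24 bits: assigned uniquely by that manufacturer
        fmt = lambda part: ":".join(f"{b:02x}" for b in part)
        return fmt(oui), fmt(nic)

    if __name__ == "__main__":
        print(parse_mac("00:1a:2b:3c:4d:5e"))   # ('00:1a:2b', '3c:4d:5e')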
Figure 6 Ethernet frame format
In addition, the Preamble field is the preamble, generally a 7-byte alternating sequence of 1s and 0s followed by a one-byte start-of-frame delimiter (bit-oriented framing is used). The Type field indicates the upper-layer protocol; for example, the value for the IP protocol is 2048 (0x0800). The CRC field is the cyclic redundancy check code used to detect errors; the checked range does not include the preamble. Body is the frame body; the Ethernet frame body contains at least 46 bytes and at most 1500 bytes of data. The minimum body length exists because of the collision window of the CSMA/CD protocol: a frame must be long enough for collisions to be detected, so data that is too short is padded. The maximum body length exists because an over-long frame would keep other stations from sending for a long time and might exceed the size of the receiving buffer, causing overflow; data that is too long can instead be split into several frames. From the host's point of view, the frame does not include the preamble and CRC, because the sender's network card adds the preamble and CRC field before transmission and the receiver's network card removes them on reception.
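The small sketch below pads a frame body to the 46-byte minimum and rejects bodies over 1500 bytes; header construction and CRC generation, which the network card handles, are left out:

    # Padding an Ethernet frame body to the 46-byte minimum (illustrative sketch only).
    MIN_BODY, MAX_BODY = 46, 1500

    def prepare_body(data: bytes) -> bytes:
        if len(data) > MAX_BODY:
            raise ValueError("body too long: the data must be split into several frames")
        if len(data) < MIN_BODY:
            # pad with zero bytes so the frame stays long enough for CSMA/CD collision detection
            data = data + bytes(MIN_BODY - len(data))
        return data

    if __name__ == "__main__":
        print(len(prepare_body(b"short payload")))      # 46
        print(len(prepare_body(bytes(200))))            # 200 (no padding needed)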
A network contains many LAN segments, and different LAN segments are generally connected by bridges or switches. Bridges and switches are layer-two switching devices: they make forwarding decisions by examining MAC addresses and do not care about upper-layer protocols or the network layer, so IPv4, IPv6, OSI packets, and so on can all pass through them. The following mainly introduces the working principle of the bridge. When a bridge connects LANs of different types, it can perform some conversion functions, such as changing the frame format by re-encapsulation, matching different data rates by buffering, and changing the frame length by splitting and reassembly. The bridge does not care about the content of a frame; it is responsible only for forwarding it correctly, so it is transparent. It connects multiple LANs together and receives every frame on the LANs attached to it. When a frame arrives, the bridge must decide whether to discard or forward it, and if it forwards, it must also know which LAN to forward to. In this way part of the traffic can be filtered out, reducing the chance of collision and improving network performance.
The decision is made by looking up the destination MAC address in an internal MAC address table. Initially this table is empty; the bridge fills it in gradually through backward learning. From the source address of an arriving frame, the bridge learns that the machine with that source address lies on the LAN from which the frame came, and records this in the MAC address table. In addition, because the topology of the network can change, every record added to the table is stamped with the current time. If the source address of a newly arrived frame is already in the table, its timestamp is simply updated. The bridge also scans the table periodically and deletes records that have timed out.
When a frame arrives, the bridge looks up the destination MAC address in the table. If the source LAN and the destination LAN are the same, the frame is discarded; if they differ, the frame is forwarded; and if the destination LAN is unknown, the frame is broadcast, with the MAC address table being maintained through backward learning all the while (a sketch of this logic follows below). However, for reliability the network often uses redundant links, so loops may appear; if they are not controlled, they cause broadcast storms, duplicate frame delivery, and instability of the MAC address table. To obtain a loop-free topology, the Spanning Tree Protocol (STP) is adopted. It stipulates that each network has one root bridge, each bridge has one root port, and each segment has one designated port; non-designated ports are not used, but if some point on the logical STP tree fails, non-designated ports are re-enabled. Note that the spanning tree algorithm can produce a spanning tree without logical loops in a network with physical loops, but it does not guarantee optimal paths; this is the price the STP algorithm pays. This idea of addressing and switching frames can also be applied at higher layers, for example IP addressing in routing.
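A minimal learning-bridge sketch tying together the backward learning and the forward/filter/flood decision described above is given here; the class name, port numbers, shortened addresses, and the simplified ageing logic are assumptions made for the illustration:

    import time

    # Minimal transparent-bridge sketch: backward learning plus the forward/filter/flood
    # decision (port numbers stand in for attached LAN segments; ageing is simplified).
    class LearningBridge:
        def __init__(self, ageing_seconds=300):
            self.table = {}                      # MAC address -> (port, timestamp)
            self.ageing = ageing_seconds

        def receive(self, frame_src, frame_dst, in_port):
            now = time.time()
            self.table[frame_src] = (in_port, now)        # backward learning (refresh timestamp)
            # drop entries not seen for a while, since the topology may have changed
            self.table = {mac: (p, t) for mac, (p, t) in self.table.items()
                          if now - t <= self.ageing}
            if frame_dst in self.table:
                out_port, _ = self.table[frame_dst]
                if out_port == in_port:
                    return "filter (same segment, discard)"
                return f"forward to port {out_port}"
            return "flood to every port except the incoming one"

    if __name__ == "__main__":
        bridge = LearningBridge()
        print(bridge.receive("aa:01", "bb:02", in_port=1))   # destination unknown -> flood
        print(bridge.receive("bb:02", "aa:01", in_port=2))   # both stations now learned
        print(bridge.receive("aa:01", "bb:02", in_port=1))   # forward to port 2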
A switch can be understood as a multi-port bridge. Its working principle is the same as that of a bridge, and at the physical layer it can replace the hub. Because a switch supports many ports, the network can be divided into more physical segments, giving the whole system higher bandwidth. Compared with a bridge, a switch also forwards data faster: when a bridge forwards a frame it usually has to receive and check the complete frame first, whereas a switch offers three forwarding modes, namely store-and-forward, direct (cut-through) forwarding, and fragment-free forwarding. Direct forwarding, for example, starts transmitting as soon as the destination port has been determined, without waiting for the whole frame to be received.
6. Summary
The data link layer is the second layer of the reference model, between the physical layer and the network layer. It provides services to the network layer on the basis of the services provided by the physical layer; its most basic service is to transmit the data handed down from the network layer reliably to the network layer of the target node, realizing a point-to-point connection. The data link layer uses a variety of protocols to achieve its functions, including framing, error control, medium access control, and MAC addressing.
The PDU of the data link layer is called a frame. There are three commonly used framing methods: byte-oriented protocols, bit-oriented protocols, and time-based framing, each used in different scenarios. The receiver uses an error-detecting code to check whether a frame contains errors; once an error is found, the sender must retransmit. The common error-control mechanisms are acknowledgement and timeout, and algorithms using these two mechanisms are called ARQ algorithms; they include the stop-and-wait algorithm, the sliding window algorithm, and others. The protocols used for medium access control include ALOHA, CSMA, CSMA/CD, and CSMA/CA, which are likewise used in different scenarios to solve different problems. In MAC addressing, different LANs are connected mainly through bridges or switches; after receiving a frame, a bridge or switch decides whether to discard, forward, or broadcast it, performing backward learning at the same time. It is worth noting that the ideas behind the protocols used at the data link layer can also be applied to other layers.