Final review of "Computer Network" [End]

Note

This blog is for personal review and reference only; the knowledge points are somewhat mixed, so treat it as a checklist when reviewing.
Search: Ctrl+F
Reference book:
"Computer Network (7th Edition)" by Xie Xiren

Link: https://pan.baidu.com/s/1WLF5CVDIlqRycrAjORxoKQ
Extraction code: kuoe

"Computer Networks - Top-Down Approach" Mechanical Industry Press

The reference links appearing in the text are supplementary material that replaces the introduction of some knowledge points; please browse the referenced blogs yourself.

The mind map at the end comes from the teacher's PPT and is very useful!

The last chapter is a dictionary of term abbreviations, which is being updated.

Please point out any problems found in the article.

Chapter 1 Computer Network Overview

Computer Networks and the Internet

Computer network: a network formed by interconnecting general-purpose, programmable hardware.
Internet: the world's largest computer network, formed by interconnecting many heterogeneous networks; it is a "network of networks".
Features:
Connectivity: connects user terminals to one another.
Resource sharing: sharing of information, software, hardware, and so on.
The TCP/IP protocol suite has gradually become the de facto standard for network interconnection.
Formal Internet standards go through two stages: Proposed Standard and Internet Standard.

Network Edge

The network edge consists of all the hosts connected to the Internet together with the access networks.
Hosts include personal computers, servers, supercomputers, smartphones, smart home appliances, smart wearables, network cameras, vehicles, and all kinds of network sensors.
An access network is the network that connects a host to its edge router; the edge router is the first router encountered on the way into the Internet core.
Access networks include ADSL access, hybrid fiber-coax networks, FTTH access, Ethernet access, WiFi access, cellular mobile access, and so on.
Different access networks use different transmission media.
Transmission media include twisted pair, coaxial cable, optical fiber, free space, and so on.

Access networks
ADSL access
ADSL stands for Asymmetric Digital Subscriber Line.
It uses frequency-division multiplexing to split the subscriber line into three relatively independent channels: telephone, upstream data, and downstream data, so a user can surf the Internet while making a phone call.

The hybrid fiber-coax (HFC) network uses the existing infrastructure of the cable TV network to provide broadband access.

Fiber To The Home (FTTH) is one of the FTTx technologies. The most common FTTH implementation is the Passive Optical Network (PON).
PON variants include narrowband PON, APON (ATM PON), EPON (Ethernet PON), GPON (Gigabit PON), and so on.

Connecting hosts to the edge router via Ethernet is called Ethernet access; access via a wireless LAN is called WiFi access.

Connecting hosts to the Internet via a cellular mobile network is called cellular mobile access.
Cellular mobile technology has gone through four generations, commonly known as 1G, 2G, 3G, and 4G; the fifth-generation (5G) standard is being formulated.

Transmission media
insert image description here

Network Core

The network core consists of various networks and the routers that connect them.
The network core provides communication services for the hosts at the network edge.
Computer networks use packet switching; the router is the most important device in the network core and the key component that implements packet switching. Its job is to forward each received packet to another network.

insert image description here
When the number of end systems is small, full interconnection can be used for point-to-point communication, but full interconnection requires N(N-1)/2 links, and adding the (N+1)-th end system requires N new lines.

Therefore, switching devices, called switches or switching nodes, are introduced.
All end systems connect to a switching node via subscriber lines, and the switching node controls the connections between end systems.
insert image description here
When the end systems are spread over a wide area, multiple switching nodes are needed, connected to one another by trunk lines.
When the switching scope grows even larger, more switching nodes are introduced, forming a switching network.
insert image description here

Communication requires three elements: end systems, transmission, and switching.
Communication networks use several switching methods; circuit switching, packet switching, and message switching are the three most typical ones.

· The traditional telephone network uses circuit switching;
· computer networks use packet switching;
· the early telegraph network used message switching.

From the perspective of communication resource allocation, switching means allocating transmission line resources in a particular way.

Circuit Switching
Circuit switching is a pre-allocation scheme for communication resources. It is a connection-oriented switching method; a call must go through three phases:

Connection establishment: e.g., dialing a number;
Data transfer: e.g., the conversation;
Connection release: e.g., hanging up.

Multiple telephone signals on a trunk line can share the line through channel multiplexing techniques such as frequency-division multiplexing and time-division multiplexing.
Features: the communication resources are pre-allocated to the two parties during the connection-establishment phase; for the entire duration of the call, the two terminals occupy the end-to-end communication resources.

Fixed resource allocation

Packet Switching
Computer network traffic is bursty; if circuit switching were used, the utilization of communication resources would be extremely low.
Therefore computer networks use packet switching. The emergence of packet-switching technology laid the foundation for the development of the Internet.
Packet switching is a dynamic allocation scheme for communication resources.
Features:

packetization;
store and forward;
link resources occupied segment by segment; virtual circuit or datagram mode.

A packet is the unit of data transmitted over the Internet.
The complete block of data to be sent is called a message.

insert image description here
The message is divided into smaller data segments, and control information is added in front of each segment to form a packet.
The added control information is called the header.

Store and forward

In the Internet, packet-switching nodes are also called routers. After a router receives a packet, it

  1. temporarily stores it;
  2. then, according to the control information in the header, finds a suitable interface and forwards the packet.

This way of working is called store and forward.

Hop by hop, each router processes each packet in this store-and-forward manner until the packet is finally delivered to the destination host.
The destination host reassembles the packets into the original message.
insert image description here
While a packet travelling from H1 to H4 is being transmitted on link A-F, the other links are not occupied by this communication; that is, packet switching occupies communication resources segment by segment.
Suppose that while host H1 is sending packets to host H4, host H7 is also sending packets to host H5; the packets of both communications pass through link F-C. The bandwidth of link F-C is not pre-allocated to either communication but is shared by both, so the utilization of communication resources is high.

Virtual-circuit mode
Packet switching has two modes: the virtual-circuit (VC) mode and the datagram mode.
The virtual-circuit mode is connection-oriented; the connection in a virtual circuit is not a physical connection but a logical one.
After the virtual circuit is established, during the data-transfer phase routers forward packets according to the virtual-circuit identifier, and packets belonging to the same virtual circuit traverse the network in order along the same path to the destination node.

Datagram mode
The datagram mode is connectionless, i.e., no connection needs to be established before sending data.
In the datagram mode, the router independently chooses a forwarding interface for each packet, so packets sent from the same source node to the same destination node may follow different paths and may arrive at the destination out of order.
The switching method currently used by the Internet is datagram-mode packet switching.

Problems with packet switching
Although packet switching improves resource utilization, it also brings the following problems:

Increased delay: packets queue at each router during store-and-forward, which adds some delay.
Increased overhead: every packet header carries control information, which adds a certain amount of overhead.

Message Switching
Message switching also uses store and forward.
The difference between message switching and packet switching is that
the data unit transmitted in message switching is a complete message, whereas packet switching transmits smaller packets.

Comparison of the three switching methods
insert image description here
When many nodes are traversed, the delay of message switching increases significantly. Packet switching can be viewed as a pipelined form of message switching, which reduces the delay significantly.

Performance of Packet-Switched Networks

The following metrics can be used to measure the performance of packet switching:

bandwidth;
throughput;
delay: processing delay, queuing delay, transmission (sending) delay, propagation delay;
packet loss rate;
utilization: channel utilization, network utilization.

Bandwidth
Bandwidth is short for frequency band width; its unit is the hertz (Hz).

· The bandwidth of a signal is the frequency range occupied by the different frequency components the signal contains.
· The bandwidth of a channel is the frequency range of the signals the channel allows to pass.

For example, the standard bandwidth of the traditional telephone signal is 3.1 kHz (from 300 Hz to 3.4 kHz), and the standard bandwidth of the traditional telephone channel is 4 kHz (from 0 Hz to 4 kHz).
In computer networks, bandwidth refers to the maximum amount of data that can be transmitted per unit time, i.e., the highest data rate; it describes the data-carrying capability of a channel, and its unit is bits per second (bit/s).
· For example, the bandwidth of traditional Ethernet is 10 Mbit/s.
The two senses of bandwidth are essentially the same: the former is the frequency-domain term and the latter the time-domain term.

Throughput
Throughput is the actual amount of data passing through a network or interface per unit time, in bits per second (bit/s).

Taking file transfer as an example, the rate at which the host is receiving the file at any instant is the instantaneous throughput, and the average rate computed after the host has received the whole file is the average throughput.

End-to-end throughput is an important indicator of computer network performance.
End-to-end throughput is limited by the network bandwidth.
End-to-end throughput is also affected by other traffic in the network.

Delay
A packet starts at the source host, passes through a series of routers, and finally reaches the destination host; the time spent in this process is called the end-to-end delay. The end-to-end delay is composed of the processing delay, the queuing delay, the transmission delay, and the propagation delay.

Processing delay: the time a node takes to process a packet after receiving it.
Queuing delay: the delay caused by a packet waiting in the input queue or output queue after entering a router.

Transmission delay: the time required for a node to push a packet onto the link, also called the sending delay.

Transmission delay = packet length (bit) / sending rate (bit/s)

Propagation delay: the time it takes an electromagnetic wave to propagate a given distance over the channel.

Propagation delay = channel length (m) / propagation speed of electromagnetic waves on the channel (m/s)

End-to-end delay = processing delay + queuing delay + transmission delay + propagation delay
The round-trip end-to-end delay is usually called round-trip time (Round-Trip Time, RTT)
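As a quick check of these formulas, here is a minimal Python sketch; the packet size, link rate, link length, propagation speed, and the fixed processing/queuing delays are all assumed values chosen only for illustration:

```python
# Delay components for a single link (all input values are assumptions).
packet_length_bits = 1500 * 8        # a 1500-byte packet
sending_rate_bps   = 100e6           # 100 Mbit/s link
link_length_m      = 2000e3          # 2000 km of fiber
propagation_speed  = 2e8             # ~2*10^8 m/s in fiber

transmission_delay = packet_length_bits / sending_rate_bps   # packet length / sending rate
propagation_delay  = link_length_m / propagation_speed       # channel length / propagation speed

processing_delay = 0.1e-3            # assumed small, load-dependent in practice
queuing_delay    = 0.5e-3            # assumed small, load-dependent in practice

end_to_end = processing_delay + queuing_delay + transmission_delay + propagation_delay
print(f"transmission = {transmission_delay*1e3:.3f} ms, propagation = {propagation_delay*1e3:.3f} ms")
print(f"end-to-end delay ≈ {end_to_end*1e3:.3f} ms")
```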

Packet loss rate
When packets arrive at a router faster than the router can send them out, the router may discard arriving packets; this phenomenon is called packet loss.
Packet loss indicates that the network is congested. The packet loss rate largely reflects the degree of congestion and is often used to evaluate and measure network performance.
Packet loss rate = (Ns - Nr) / Ns
where Ns is the total number of packets sent, Nr is the total number of packets received, and Ns - Nr is the total number of lost packets.

Utilization
Utilization includes channel utilization and network utilization.
Channel utilization is the percentage of time during which a channel is in use (data is passing through it). Network utilization is the weighted average of the channel utilizations over the whole network.
Higher channel utilization is not always better: as the utilization of a channel increases, the delay introduced by that channel grows rapidly.
Under suitable assumptions, the relationship between delay and network utilization is:

D = D0 / (1-U)

where D is the current delay of the network, D0 is the delay when the network is idle, and U is the network utilization. If the channel or network utilization is too high, a very large delay results.
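A small sketch of the two formulas above, using assumed packet counts and an assumed idle-network delay, shows how quickly D grows as U approaches 1:

```python
# Packet loss rate and delay-vs-utilization (all input values are assumptions).
Ns, Nr = 10_000, 9_950                 # packets sent / received
loss_rate = (Ns - Nr) / Ns
print(f"packet loss rate = {loss_rate:.2%}")

D0 = 10e-3                             # idle-network delay: 10 ms
for U in (0.1, 0.5, 0.9, 0.99):
    D = D0 / (1 - U)                   # D = D0 / (1 - U)
    print(f"U = {U:.2f} -> D = {D*1e3:.1f} ms")
```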

Network Architecture

Layering, protocols, and services
To reduce the complexity of network design, most networks are designed in layers. In the layered design approach, each layer is built on top of the layer below it, and the lower layer provides specific services to the upper layer.
The agreement between the peer layers of the two communicating parties is called a protocol.
Each layer's protocol performs the functions of that layer independently of the protocols of the other layers.
Layered design offers good flexibility and low coupling, is easy to develop and maintain, and is convenient for standardization.

In the Internet, the rules, standards, and conventions established for exchanging data in the network are called network protocols, or protocols for short.
A network protocol consists of three main elements:

①Syntax: the format of the data and control information;
②Semantics: the meaning of the control information;
③Synchronization: a detailed specification of the order of events.

The collection of the protocols of all layers is called the protocol stack or protocol suite.
Any hardware or software process that sends or receives messages is called an entity. Entities at the same layer on different hosts are called peer entities.
The implementation of each layer's protocol guarantees the provision of services to the entities of the layer above.

insert image description here
A protocol is "horizontal": it governs the exchange of information between peer entities.
A service is "vertical": it governs the exchange of information between entities in adjacent layers.
The data unit exchanged between peer entities is usually called a protocol data unit (PDU);
the data unit exchanged between entities in adjacent layers is usually called a service data unit (SDU).

The collection of the layers and their protocols is called the network architecture.
The TCP/IP protocol suite is the de facto international standard of the Internet and defines a four-layer protocol stack.
The OSI reference model defines a seven-layer protocol stack.
The architecture actually adopted by the Internet (for study purposes) is a five-layer protocol stack.

insert image description here
The five-layer Internet protocol stack
Application layer:

Main task: solve a particular class of application problems through inter-process communication.
Common protocols:
Domain Name System DNS; HyperText Transfer Protocol HTTP; Dynamic Host Configuration Protocol DHCP; Simple Mail Transfer Protocol SMTP, etc.
Protocol data unit: message.

Transport layer:

Main task: provide end-to-end communication services to application processes.
Common protocols:
Transmission Control Protocol TCP: connection-oriented, reliable;
User Datagram Protocol UDP: connectionless, unreliable.
Protocol data unit: TCP: segment; UDP: user datagram.

Network layer:

Main task: provide host-to-host communication services to the layer above, including routing and packet forwarding.
Common protocols:
Internet Protocol IP; Internet Control Message Protocol ICMP; Internet Group Management Protocol IGMP; Address Resolution Protocol ARP, etc.
Protocol data unit:
IP protocol: IP packet, or IP datagram.

Data link layer:

Main task:
provide communication services between adjacent nodes to the layer above, including framing, addressing, error control, media access control, and so on.
Common protocols:
the Ethernet protocol, the Point-to-Point Protocol PPP, wireless LAN (WLAN) protocols, etc.
Protocol data unit: frame.

Physical layer:

Main task: transmit the bit stream transparently.
Common protocols: tied to the actual transmission media; different physical-layer protocols are defined for different transmission media.
Protocol data unit:
symbol. A symbol can be understood as a pulse signal; a symbol may carry one bit, it may carry several bits, and several symbols may also jointly carry one bit.

insert image description here
Encapsulation and decapsulation
Layer-by-layer encapsulation happens when data is sent, and layer-by-layer decapsulation happens when data is received.
insert image description here
Multiplexing and demultiplexing
Multiplexing can occur at several layers; each layer has its own kind of identifier that indicates which upper-layer protocol the encapsulated data belongs to. Multiplexing happens during encapsulation; demultiplexing happens during decapsulation.

Control Plane and Data Plane

The operation of a packet-switched network involves processing two kinds of packets: control packets and data packets.
Control packets carry information that tells a node how to forward data, while data packets carry the data sent by user programs.
The most important function of the control plane is routing; it also includes error reporting, system configuration and management, and resource allocation.
The most important function of the data plane is packet forwarding.

In a traditional computer network, every packet-switching device contains both a data plane and a control plane, so its control plane is distributed.
In a software-defined network (SDN), the control plane and the data plane are separated: a packet-switching device contains only the data plane, and the control plane resides in a logically centralized controller, so the control plane is centralized.

Chapter 2 IP Address

In the TCP/IP architecture, the IP address is the most basic concept. Every device connected to the Internet has at least one IP address.
An IP address is a network-layer address used on the Internet to identify a host. Strictly speaking, an IP address identifies a network interface on a host.
IP addresses are now assigned and managed by ICANN, the Internet Corporation for Assigned Names and Numbers.
An IP address is a 32-bit binary number; for ease of writing and memorization it is expressed in dotted decimal notation.

insert image description here
The 32-bit IP address is divided into 4 groups of 8 bits each; each group is written as a decimal number and the groups are separated by dots. This is the dotted decimal notation of an IP address.
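A small Python sketch of this mapping between the dotted notation and the underlying 32-bit value (the address itself is just an arbitrary example):

```python
# Convert between dotted decimal notation and the 32-bit integer it encodes.
addr = "192.168.1.10"

# dotted decimal -> 32-bit integer
octets = [int(x) for x in addr.split(".")]
value = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
print(f"{addr} = {value} = 0b{value:032b}")

# 32-bit integer -> dotted decimal
back = ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
assert back == addr
```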

Hierarchical structure
IP addresses are hierarchical: an IP address consists of several parts that correspond to a certain hierarchical structure of the Internet.
An IP address consists of two parts: a network part and a host part.
The network part of an IP address indicates which network the host is attached to; all hosts attached to the same network have the same network part in their IP addresses.
The host part of an IP address uniquely identifies a host on a particular network.
With hierarchical IP addresses, routers can forward packets based only on the network part of the IP address, without considering the host part.
Evolution of the addressing scheme
The IP addressing scheme has gone through three historical stages:
Classful addressing: divides IP addresses into five classes, A, B, C, D, and E; the most basic addressing scheme. RFC 790, passed in 1981, contains the provisions for class A, B, and C addresses.
Subnetting: an improvement built on classful addressing; RFC 950, passed in 1985, contains the relevant provisions.
Classless addressing: the addressing scheme based on Classless Inter-Domain Routing (CIDR), which is the scheme currently in use. After being proposed in RFC 1519 in 1993, it was quickly adopted; in 2006 RFC 1519 was replaced by RFC 4632.

Classful addressing

The classful addressing scheme uses two-level addressing: each unicast IP address consists of two fields.
IP address ::= {<network number>, <host number>}
The network part is the network number; the host part is the host number.
Since different networks may contain different numbers of hosts, a simple approach is to allocate address spaces of different sizes to different networks according to their expected number of hosts; this is the classful addressing scheme.
Routers forward packets based on the network part of the IP address.

Five Classes of IP Addresses
insert image description here
The IP address space is divided into five classes, named A, B, C, D, and E.

Class A, B, and C address spaces are used for unicast addresses;
Class D addresses are used for multicast addresses;
Class E addresses are reserved addresses.

Class A, B, and C IP addresses
Class A, B, and C addresses are unicast addresses, which consist of two parts: the network number and the host number. In the unicast address space, some addresses are used for special purposes and are not used as unicast addresses.

insert image description here
Among the special addresses, those whose host-number part is all 1s are called broadcast addresses.
The address whose network number and host number are both all 1s is used for broadcasting on the local network and is also called the limited broadcast address; the broadcast range is limited to the sender's own network, and routers do not forward limited broadcasts.
An address whose network number specifies a particular network and whose host number is all 1s is used to send broadcast datagrams to that network; this kind of broadcast is also called a directed broadcast.
The original Internet proposal supported router forwarding of directed broadcasts and enabled it by default, but RFC 2644 changed this policy and requires routers to disable forwarding of directed broadcasts by default.
The local loopback test addresses are used for inter-process communication within the host itself; usually only 127.0.0.1 is used.

After excluding the special addresses above, the remaining addresses are the assignable addresses of class A, B, and C networks.

insert image description here
In each network, addresses whose host number is all 1s or all 0s cannot be assigned.
In class A, network numbers 0 and 127 cannot be assigned.

IP addresses of routers and hosts
Hosts on the same network have the same network number in their IP addresses;
hosts on different networks have different network numbers in their IP addresses.
A router always has multiple IP addresses, and each interface of the router has a different network number for its IP address.

Subnetting

When a large unit or organization sets up several LANs, it is hard to assign IP addresses to these LANs under the two-level addressing scheme.
RFC 950 proposed subnetting to solve this kind of problem.
Subnetting extends the IP address from a two-level addressing scheme to a three-level one and includes two methods:

  1. Fixed-length subnetting
  2. Variable-length subnetting

Fixed-length subnetting
Subnetting borrows a few bits from the host-number part of the IP address to use as the subnet number, so that the two-level IP address becomes a three-level IP address inside the organization.

IP address ::= {<network number>, <subnet number>, <host number>}

Routers on the Internet still treat the network number as the network part of the IP address;
the organization's border router and internal routers record network number + subnet number as the network address and treat this network address as the network part of the IP address.
Routers forward packets based on the network part of the IP address.

Example:
Take the class B network 139.9.0.0 as an example of subnetting. Suppose the organization that applied for this class B address borrows 8 bits from the host number as the subnet number; the structure of the IP address after subnetting is shown in the figure.
insert image description here
In this example at most 2^8 = 256 subnets are supported, and each subnet can contain at most 2^8 - 2 = 254 hosts. After subnetting, IP addresses whose host number is all 0s or all 1s still cannot be assigned to hosts. In this example all subnet numbers have the same length; this method of subnetting is called fixed-length subnetting.

Assume the organization has set up 3 LANs and the network administrator has allocated subnet numbers 5, 10, and 15 to them; the remaining subnet numbers are reserved for now.
After subnetting, the class B network 139.9.0.0 still appears as a single network to the outside. When the border router and the internal routers receive an IP datagram, they forward the packet according to the network address.
insert image description here
Subnet mask
After subnetting, the routers inside the network must be able to distinguish packets destined for different subnets.
RFC 950 defines the subnet mask, which is used to obtain the network address from an IP address.
The 1s in the subnet mask correspond to the network number and subnet number in the IP address, and the 0s correspond to the host number in the subnetted IP address.
The subnet mask can be written in dotted decimal notation, or as the number of 1 bits in the mask, called the prefix length.
In the example above, the subnet masks of the three subnets are all 255.255.255.0 and the prefix length is 24.

Common subnet masks:
insert image description here
Default subnet masks:

The default subnet mask of class A address is 255.0.0.0;
the default subnet mask of class B address is 255.255.0.0;
the default subnet mask of class C address is 255.255.255.0

Bitwise AND with the subnet mask
A bitwise AND of the IP address and its subnet mask yields the corresponding network address.
In the example above, the border router receives an IP datagram whose destination IP address is 139.9.10.11; the process of ANDing this address with the subnet mask 255.255.255.0 is shown below.

insert image description here
The calculated network address is 139.9.10.0.
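The same AND operation can be reproduced with a few lines of Python, using the example address and mask from above:

```python
# Bitwise AND of an IP address with its subnet mask gives the network address.
import ipaddress

ip   = int(ipaddress.IPv4Address("139.9.10.11"))
mask = int(ipaddress.IPv4Address("255.255.255.0"))

network = ip & mask                      # keep the bits selected by the 1s in the mask
print(ipaddress.IPv4Address(network))    # -> 139.9.10.0
```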

Variable-length subnetting
Dividing a network into several subnets of different sizes can meet the needs of subnets with different numbers of hosts; this method of subnetting is called variable-length subnetting.
In variable-length subnetting the subnet numbers of the subnets have different lengths, so the prefix lengths of their subnet masks differ; such masks are called variable-length subnet masks (VLSM).

In the fixed-length subnetting example of the previous section, if subnet 1 is further divided into two subnets whose subnet masks are set to 255.255.255.128 (prefix length 25), we obtain an example of variable-length subnetting.
After variable-length subnetting, the whole network still appears as one network to the outside.
insert image description here
insert image description here

Classless addressing

The classless addressing scheme was introduced to relieve the pressure on IPv4 address allocation.

Building on VLSM, the IETF developed the classless addressing scheme, officially named Classless Inter-Domain Routing (CIDR).
CIDR eliminates the traditional concepts of class A, B, and C addresses and of subnetting, and can allocate the IPv4 address space more efficiently.
CIDR changes the IP address from three-level addressing back to two-level addressing, but this is now classless two-level addressing.

IP address ::= {<network prefix>, <host number>}

The network part is the network prefix; the host part is the host number.
Routers forward packets based on the network part of the IP address.

CIDR uses slash notation, also called CIDR notation: a slash "/" is appended to the IP address, followed by the number of bits occupied by the network prefix.

For example, in the IP address 222.22.65.10/20 the network prefix is 20 bits long.

The network prefix eliminates the predefined boundary between the network number and the host number in an IP address, allowing IP addresses to be allocated at a finer granularity.
Consecutive IP addresses with the same network prefix form a CIDR address block.

Given any address in a CIDR address block, we can find the starting address (i.e., the minimum address) and the maximum address of the block, as well as the number of addresses in the block.
insert image description here
This address block contains 2^12 = 4096 addresses, of which at most 2^12 - 2 = 4094 can be assigned to hosts. An address block is usually identified by its smallest address together with the length of the network prefix, so the block above can be written as
222.22.64.0/20.
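These numbers can be reproduced with Python's ipaddress module, starting from any address in the block:

```python
# Derive the CIDR block 222.22.64.0/20 from one of its member addresses.
import ipaddress

block = ipaddress.ip_network("222.22.65.10/20", strict=False)

print(block)                      # 222.22.64.0/20  (smallest address + prefix length)
print(block.network_address)      # 222.22.64.0     (minimum address)
print(block.broadcast_address)    # 222.22.79.255   (maximum address)
print(block.num_addresses)        # 4096
print(block.num_addresses - 2)    # 4094 assignable host addresses
```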

CIDR masks
In CIDR, an address mask is used to derive the network prefix from an IP address. The address mask consists of a string of 1s followed by a string of 0s, and the number of 1s equals the length of the network prefix. Address masks are also called CIDR masks, or simply masks.

The network prefix is obtained by a bitwise AND of the IP address and the CIDR mask. Although CIDR no longer uses the concepts of subnet and subnet number, an organization that has been assigned a CIDR address block can still divide it into subnets as needed; these subnets have longer network prefixes than the organization's whole block.

Route aggregation
CIDR not only improves the efficiency of IPv4 address allocation; it can also be used to reduce the number of entries in routers' routing tables and improve router performance. This is achieved through route aggregation.
Route aggregation merges the network prefixes of adjacent CIDR address blocks into a shorter network prefix, so the aggregated routing information covers a larger address space.
Since some aggregated CIDR blocks contain several class C addresses, route aggregation is also called supernetting.

Two CIDR address blocks can be aggregated only if the following conditions are met (a sketch follows the list):

①the two address blocks are adjacent and of the same size;
②the first n bits of the two address blocks are identical;
③the set of IP addresses covered by the CIDR block is the same before and after aggregation.
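A minimal Python sketch of route aggregation, using two adjacent /24 blocks chosen only for illustration:

```python
# Aggregate two adjacent, equal-sized CIDR blocks into one shorter prefix.
import ipaddress

blocks = [
    ipaddress.ip_network("192.168.0.0/24"),
    ipaddress.ip_network("192.168.1.0/24"),
]

aggregated = list(ipaddress.collapse_addresses(blocks))
print(aggregated)    # [IPv4Network('192.168.0.0/23')] -- same addresses, one entry
```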

insert image description here
In CIDR there is one special case in which IP addresses with host number all 0s or all 1s may be used.
When two routers are connected by a point-to-point link, each endpoint needs an IP address, and the network between the two routers contains only two IP addresses. To save addresses, RFC 3021 recommends assigning the two addresses contained in a /31 block to the two routers.
Correspondingly, for this special case in IPv6, RFC 6164 recommends using a /127 block.

Special-purpose IP addresses

Common special-purpose addresses include:

private network addresses
"link-local" addresses
carrier-grade NAT shared addresses
test network addresses for documentation
loopback test addresses
the limited broadcast address

Private network addresses
A private network is a network used internally by an enterprise or organization.
IP addresses inside a private network do not need to be applied for from ICANN. RFC 1918 and RFC 6890 designate three blocks of IP address space as private network addresses; they are also called private addresses and are used only for communication between hosts and routers inside the private network.

①10.0.0.0-10.255.255.255 (10.0.0.0/8);
②172.16.0.0-172.31.255.255 (172.16.0.0/12);
③192.168.0.0-192.168.255.255 (192.168.0.0/16).

When IP addresses are assigned inside a private network, they only need to be unique within that private network.
When a host inside a private network needs to communicate with a host on the Internet, network address translation (NAT) is required.

Link-local addresses
A host's IP address can be configured manually or automatically.
Manual configuration is called static IP address configuration; automatic configuration uses the Dynamic Host Configuration Protocol (DHCP).
If automatic configuration is chosen but the host fails to obtain an IP address, the operating system automatically assigns the host a link-local address.
Link-local addresses are specified by RFC 3927 and consist of one /16 address block: 169.254.0.0/16.
Link-local addresses are used only for communication between hosts on the same physical network that are configured with link-local addresses.

Carrier-grade NAT shared addresses
ISPs are also called carriers. Because public IPv4 addresses are in very short supply, carriers can no longer obtain enough new public IP addresses.
To meet the needs of new users accessing the Internet, RFC 6598 designates the /10 address block 100.64.0.0/10 as the carrier-grade NAT (Carrier-Grade NAT, CGN) shared address space, written as CGN addresses.
CGN addresses can only be used inside an ISP's internal network, and every ISP may use them.
Traffic from users reaching the Internet through the ISP therefore passes through NAT twice.

insert image description here
Test network addresses for documentation

Technical specifications and documents often need to use example networks. To avoid the conflicts that could arise from using addresses already assigned to others, RFC 5737 reserves three address blocks dedicated to test networks in documentation:

①TEST-NET-1: 192.0.2.0-192.0.2.255 (192.0.2.0/24);
②TEST-NET-2: 198.51.100.0-198.51.100.255 (198.51.100.0/24);
③TEST-NET-3: 203.0.113.0-203.0.113.255 (203.0.113.0/24).

Test network addresses for documentation never appear on the public Internet.

IP address planning and allocation

Planning and allocating IP addresses with CIDR mainly involves the following three steps:

(1) Determine the number and size of the CIDR address blocks needed.

Considering the locations of the buildings and the number of departments, determine how many CIDR address blocks are needed.
Based on the number of information points in each building or department, determine how many IP addresses each CIDR block must contain.
The number of IP addresses in each CIDR block is an integer power of 2, and the smallest and largest addresses in the block cannot be assigned to hosts.

(2) Determine the masks.

Based on the size of each CIDR block, calculate and determine its mask.

(3) Allocate the CIDR address blocks.
The following rules should be followed when allocating CIDR blocks (a sketch follows the list):

①assign network prefixes to the larger address blocks first;
②address blocks on the same path should share the same prefix, so that routes can be aggregated;
③reserve some address blocks for future expansion.
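As a rough illustration of steps (1)-(3), here is a hedged Python sketch that carves a campus block into department subnets, allocating the larger blocks first; the campus block 10.10.0.0/22 and the per-department host counts are hypothetical values, not from the example that follows:

```python
# Hypothetical CIDR planning sketch: split one block into right-sized subnets.
import ipaddress

campus = ipaddress.ip_network("10.10.0.0/22")            # assumed campus block (1024 addresses)
demand = {"dept_A": 500, "dept_B": 200, "dept_C": 100}   # assumed information-point counts

def needed_prefix(hosts: int) -> int:
    """Smallest block (power of 2) that still leaves room for the all-0s/all-1s addresses."""
    size = 1
    while size - 2 < hosts:
        size *= 2
    return 32 - size.bit_length() + 1

free, plan = [campus], {}
for name, hosts in sorted(demand.items(), key=lambda kv: -kv[1]):   # rule ①: larger blocks first
    prefix = needed_prefix(hosts)
    block = free.pop(0)
    while block.prefixlen < prefix:                      # split until the piece fits exactly
        first, rest = list(block.subnets(prefixlen_diff=1))
        block, free = first, [rest] + free               # rule ③: unused halves stay reserved
    plan[name] = block

for name, block in plan.items():
    print(name, block, f"({block.num_addresses - 2} assignable hosts)")
```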

Example
Distribution of information points in a campus network
insert image description here
insert image description here
insert image description here
insert image description here

Chapter 3 The Application Layer

Principles of Application-Layer Protocols

insert image description here

The core of a network application is programs that run on different end systems and communicate with each other over the network.
"Host A communicates with host B over the network" actually means that some network application program running on host A communicates with some network application program running on host B.
In computer networking we are not particularly concerned with communication between processes on the same host, but with how application processes running on different end systems (possibly with different operating systems) communicate through the computer network.

At the Internet's application layer, application processes communicate in one of two ways:

  1. the client-server (Client-Server, C/S) model
  2. the peer-to-peer (Peer-to-Peer, P2P) model

Client-server model

"Client" and "server" both refer to the application processes involved in the communication.
The client-server model describes the relationship of serving and being served between processes:

the client is the requester of the service;
the server is the provider of the service.

Characteristics
Client process:
①It starts running when invoked by the user and, when it needs to communicate, actively initiates communication with the server to request service. The client process must therefore know the server process's address, which consists of the host's IP address and the port bound to the process.
②It does not require special hardware or a very complex operating system.
Server process:
①A process dedicated to providing a certain service; it can handle requests from one or more remote or local clients at the same time.
②Once started, it runs continuously, passively waiting for and accepting communication requests from multiple client processes; the server process therefore does not need to know the clients' addresses in advance.
③It generally requires powerful hardware and a capable operating system.

Peer-to-peer model
The idea of the peer-to-peer model is that the content shared in the network is no longer stored on a central server, and the end nodes do not explicitly distinguish which one is the service requester and which the service provider.
Each node in a peer-to-peer connection acts both as a client accessing other nodes' resources and as a server providing resources for other nodes to access. In essence, the peer-to-peer model is still the client-server model.

Process communication
In a communication session between a pair of processes, the process that initiates the communication is called the client process, and the process that waits for communication requests at the start of the session is called the server process.

Application processes send messages to and receive messages from the network through a software interface called a socket.
The socket is the interface between the application process and the transport-layer protocol.
The socket is also a mechanism the application process uses when interacting with the operating system to obtain network communication services.
insert image description here
Another meaning of "socket" is "IP address + port", which can also be understood as a process address.
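A minimal client-server sketch over TCP sockets in Python illustrates the "IP address + port" meaning; the loopback address and port 50007 are assumptions for a local test:

```python
# A client and a server process communicating through sockets on one machine.
import socket
import threading

ready = threading.Event()

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 50007))      # the server socket = IP address + port
    srv.listen(1)
    ready.set()                         # signal that the server is waiting
    conn, addr = srv.accept()           # addr is the client's (IP, port)
    conn.sendall(b"hello from server")
    conn.close()
    srv.close()

threading.Thread(target=server, daemon=True).start()
ready.wait()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", 50007))       # the client actively contacts the server's "process address"
print(cli.recv(1024).decode())          # -> hello from server
cli.close()
```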

The World Wide Web [URL / HTML / HTTP]

insert image description here

The World Wide Web (WWW), or Web for short, is a large-scale, online repository of information.

The World Wide Web is a distributed hypermedia system and an extension of the hypertext system.
Web applications use the Uniform Resource Locator (URL) to locate information resources, the HyperText Markup Language (HTML) to describe them, and the HyperText Transfer Protocol (HTTP) to transfer them.

The three specifications URL, HTML, and HTTP constitute the core construction technology of the Web and are the cornerstones that support its operation.

HTTP request and response process

The browser encapsulates the URL into an HTTP request and sends it to the Web server;
after receiving the request, the Web server uses the URL to locate the resource, encapsulates the resource into an HTTP response, and sends it back to the browser;
the browser parses the HTML file, renders it, and displays it to the user.

insert image description here
URL, HTML, and HTTP
The three specifications URL, HTML, and HTTP solve three key problems faced by Web applications:

  1. URLs solve the problem of how to identify resources distributed throughout the Internet.

  2. HTML solves the problem of standardizing Web documents and hyperlinks, so that documents of different styles created by different people can be displayed in a unified form on all kinds of hosts, and cross-site resource access becomes more convenient.

  3. HTTP solves the problem of transferring information resources across the Web.

URL
The Uniform Resource Locator (URL) indicates the location of a resource on the Internet and the method used to access it.
The general form of a URL consists of the following three parts:

<protocol>://<host>:<port>/<path>

"<protocol>://": the protocol used to access the resource, also called the access method.
"<host>:<port>": the host that stores the resource and the server process that handles this URL.
If the server uses the well-known port registered for that protocol, ":<port>" can be omitted. "<path>": the specific location of the resource on that host.
If a default resource is configured for some directory on the server, the file name can be omitted from "<path>"; if a default resource is configured for the root directory, the whole "/<path>" part can be omitted.
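A small Python sketch of splitting a URL into these three parts (the URL itself is only an example):

```python
# Split a URL into protocol, host, port, and path with the standard library.
from urllib.parse import urlsplit

url = "http://www.zzu.edu.cn:80/index.html"
parts = urlsplit(url)

print(parts.scheme)    # protocol -> http
print(parts.hostname)  # host     -> www.zzu.edu.cn
print(parts.port)      # port     -> 80 (the well-known port, so it may be omitted in the URL)
print(parts.path)      # path     -> /index.html
```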

HTML documents
The HyperText Markup Language (HTML) is the standard language for creating Web pages; the current version is HTML 5.0.
HTML uses tags to describe Web documents. HTML tags are keywords surrounded by angle brackets and usually come in pairs, e.g. <body></body>, where the first tag is the start tag and the second the end tag. An HTML tag is composed as follows:

<tag-name [attribute-name[=attribute-value]] ...>(text content)</tag-name>

All the code from the start tag to the end tag is called an HTML element.
An HTML document consists of a set of nested elements.
HTML defines dozens of elements for different objects; for example, the <img> element defines an image, the <p> element defines a paragraph, and the <a> element defines a hyperlink.

The goal of HTML is to specify the structure of a document, not its appearance.
To control how a document is presented, Cascading Style Sheets (CSS) are normally used.

HTML documents come in three kinds: static, dynamic, and active:

  1. Static HTML documents: stored on the Web server once authored; their content does not change in response to data sent by the browser.
  2. Dynamic HTML documents: created when the browser accesses the server; when the browser's request arrives, the server maps the URL to an application, and the application creates an HTML document based on the data in the request.
  3. Active HTML documents: move the work of creating the document to the browser; the document the server sends back contains a script, and after the browser runs the script a complete active HTML document is obtained.

HyperText Transfer Protocol HTTP
The HyperText Transfer Protocol HTTP is at the heart of the World Wide Web.
Currently HTTP/1.1 and HTTP/2 are Internet recommended standards.

The HTTP client program is usually a browser, and the HTTP server program is usually a Web server. A browser and a Web server communicate by exchanging HTTP messages.
HTTP defines the structure of these messages and the way clients and servers exchange them.
The purpose of HTTP is to enable browsers to obtain resources from Web servers; resources include text, sound, images, and other multimedia files.

Working process of the HTTP protocol

  1. The client browser parses the URL;
  2. the browser asks the DNS server to resolve the host's domain name in the URL into an IP address;
  3. the DNS server returns the resolved IP address;
  4. the browser establishes a TCP connection with the Web server using that IP address and port;
  5. the browser sends an HTTP request message;
  6. the server process returns an HTTP response message;
  7. the browser interprets and displays it;
  8. the TCP connection is released.
    insert image description here
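The steps above can be sketched with a raw TCP socket in Python; example.com is used only as a stand-in server, and the request shown is a minimal non-persistent (Connection: close) exchange:

```python
# Resolve a host, open a TCP connection, send one HTTP request, read the response.
import socket

host = "example.com"
ip = socket.gethostbyname(host)                 # steps 2-3: DNS resolution

sock = socket.create_connection((ip, 80))       # step 4: TCP connection to port 80
request = (
    f"GET / HTTP/1.1\r\n"                       # request line
    f"Host: {host}\r\n"                         # header line
    f"Connection: close\r\n"                    # non-persistent: close after this response
    f"\r\n"                                     # empty line ends the headers
)
sock.sendall(request.encode())                  # step 5: HTTP request message

response = b""
while chunk := sock.recv(4096):                 # step 6: HTTP response message
    response += chunk
sock.close()                                    # step 8: release the TCP connection

print(response.split(b"\r\n")[0].decode())      # status line, e.g. "HTTP/1.1 200 OK"
```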

Note: The HTTP protocol is a stateless protocol

Persistent connections and non-persistent connections
The TCP protocol at the transport layer is connection-oriented. If the client-server application chooses TCP as the transport layer protocol, there are two ways to design the application

  1. Each request/response pair needs to be sent through a separate TCP connection. This design method is called a non-persistent connection.
  2. A series of requests and their responses are sent over the same TCP connection. This design method is called a persistent connection.

HTTP/1.0 only supports non-persistent connections;
HTTP/1.1 supports both non-persistent connections and persistent connections.

In the non-persistent connection mode, retrieving each file incurs an overhead of two RTTs, which is very inefficient.

Two working modes of persistent connections
HTTP/1.1 uses persistent connections by default, and its persistent connections support two working modes:

  1. Non-pipelined mode: the client can send the next HTTP request only after receiving the previous HTTP response.
  2. Pipelined mode: the client may send new HTTP requests before receiving the responses to earlier ones; after the requests arrive at the server one after another, the server can send back the responses one after another.

In HTTP/1.1 the pipelined mode still requires the server to return responses in order. If one request takes a long time to process, then even if the server has already finished processing the later requests (for example, using multiple threads), it must wait until the response to that slow request has been sent before it can send the later responses.

HTTP/2 improves on this by allowing multiple HTTP request messages and HTTP response messages to be sent interleaved on the same connection.

HTTP message format
HTTP has two types of messages: request messages and response messages.

HTTP is text-oriented; each field in a message is an ASCII string.

HTTP request messages and response messages each consist of three parts:

  1. Start line: in a request message it is called the request line; in a response message, the status line.
  2. Header lines: information describing the browser, the server, or the message body. There may be several header lines or none; each contains a header field name and its value.
  3. Entity body: called the request body in a request message (this field is usually not used in HTTP requests) and the response body in a response message.

HTTP request message
insert image description here

insert image description here

insert image description here

HTTP Response Message
insert image description here
insert image description here
insert image description here
Cookies
The HTTP protocol is stateless, and each request is independent of the others.
When the business logic needs to relate multiple requests to one another, cookies can be used to make up for the statelessness of HTTP.
A cookie involves four components:

①a Set-Cookie header line in the HTTP response message;
②a Cookie header line in the HTTP request message;
③the cookie kept on the user's system, in memory or on disk, managed by the user's browser;
④a back-end database at the Web server.

Proxy servers
A proxy server is also called a Web cache.
The proxy server keeps copies of recently requested resources in its own storage; when a new request arrives, if the proxy finds that the requested resource is identical to a cached one, it returns the cached copy instead of fetching the resource again via its URL.

insert image description here
insert image description here
Deploying and using proxy servers has two advantages:

  1. Proxy servers reduce the overall traffic on the Web and thus improve network performance.
  2. Proxy servers can reduce the response time of browser requests.

Content distribution networks
A content distribution network (CDN) is also a kind of Web cache.
Companies that provide content-distribution services, CDN companies for short, deploy many proxy servers across the Internet, so that a large amount of traffic is served locally.
The proxy servers deployed by a CDN company are also called CDN clusters.
Once the CDN clusters are deployed, the CDN company can dynamically direct customers' HTTP requests to one of the proxy servers in the CDN.
A CDN does not rely on customers configuring a proxy server in their browsers; instead it relies on DNS to direct different customers to different proxy servers.

Example
insert image description here

  1. The client browser queries the DNS system for the IP address of www.abc.cn.
  2. The DNS system returns different results depending on which client issued the query.
    For example, for user A on China Unicom, the DNS system returns the IP address P1 of the CDN cluster deployed inside China Unicom.
  3. The client browser initiates an HTTP request to the CDN cluster based on the DNS query result.
  4. Following the proxy-server workflow, the CDN cluster either returns the resource directly to the client browser or requests the resource from the origin server.

Problems with HTTP/1.x
With the development of the Internet, HTTP/1.x can no longer meet the needs of the modern Web, mainly because:

  1. Head-of-line blocking: the server processes requests and returns responses strictly in order and must buffer multiple requests, which ties up more resources;
  2. Limit on the number of concurrent TCP connections: accessing a Web server over many concurrent TCP connections consumes a lot of server resources, so a browser usually opens at most 5 to 10 concurrent TCP connections to a server;
  3. No header compression scheme: HTTP message headers carry a lot of content, yet the headers change little from request to request;
  4. Insecure plaintext transmission: HTTP has to rely on the Transport Layer Security (TLS) protocol to achieve encrypted transmission.

New features of HTTP/2

Binary frames: the HTTP/1.1 message format is text-based, whereas HTTP/2 adopts a new binary format.
Stream-based multiplexing: HTTP/2 introduces the concept of streams; each HTTP request/response pair is treated as one stream, and streams are multiplexed over the same TCP connection, so frames of different streams can be sent interleaved.
Server push: HTTP/2 allows the server to send multiple responses to a single client request, i.e. the server can push additional resources to the client.
Header compression: HTTP/2 compresses the header information of requests and responses using the HPACK compression format.
Enhanced security: mainstream browsers such as Chrome and Firefox have publicly stated that they support only encrypted HTTP/2, and HTTP/2 further strengthens the use of TLS.

Detailed introduction: New features of HTTP/2

HTTP/2 solves many problems of older HTTP versions, but two problems remain:

  1. TCP connection-establishment delay: HTTP runs over TCP, so a TCP connection must be established before an HTTP request can be sent, and TCP's three-way handshake adds considerable delay;
  2. Head-of-line blocking is not completely solved: HTTP/2 removes the application-layer head-of-line blocking caused by "first in, first out" responses, but it cannot remove the head-of-line blocking caused by TCP's loss-and-retransmission mechanism.

Both problems are caused by the transport-layer protocol TCP; solving them at the root requires replacing TCP.

Improvements in HTTP/3
Compared with HTTP/2, HTTP/3 makes the following improvements:

  1. HTTP/3 no longer transmits data over TCP; it runs over the QUIC protocol on top of UDP, so there is no TCP connection-establishment delay.
  2. QUIC adds functionality on top of UDP, implementing TCP-like flow control and reliable transmission.
  3. QUIC integrates the encryption functions of TLS, which also reduces the delay of key negotiation.
  4. QUIC carries multiple independent logical data streams over one connection, which solves TCP's head-of-line blocking problem.
  5. TCP implements congestion control in kernel space, whereas HTTP/3 implements congestion control in user space.
  6. HTTP/3 uses the QPACK compression scheme, which is compatible with HPACK.

insert image description here

Domain Name System DNS

insert image description here

The 32-bit IP address is hard to remember. To make addresses easier to remember and use, the hostname was introduced.

In the ARPANET era the network was small, and the correspondence between host names and IP addresses was kept in a file named hosts.

Since 1983 the Internet has used a hierarchical, tree-structured naming scheme: any host or router connected to the Internet can have a unique hierarchical name, called a domain name.

The distributed Domain Name System (DNS) resolves domain names to IP addresses.

The Domain Name System is a core service of the Internet, providing name-resolution services for all kinds of network applications.

Name servers run on UDP port 53.

Domain name space

The set of names used in DNS makes up the domain name space. The early Internet used a flat (non-hierarchical) domain name space; the current Internet uses a hierarchical, tree-structured one.
A domain is a manageable division of the domain name space. Domains can be divided into subdomains, and subdomains can be further divided into subdomains of subdomains, forming top-level domains, second-level domains, third-level domains, and so on.
Syntactically, each domain name consists of a sequence of labels separated by dots.
A domain name that uniquely identifies a host on the Internet is called a fully qualified domain name (FQDN), with the following format:
[hostname].[domain].[tld]

where hostname is the host name, domain can be any subdomain, and tld is the top-level domain.

For example, the domain name www.zzu.edu.cn is the fully qualified domain name of the Web server of Zhengzhou University.

insert image description here
It is more intuitive to represent the structure of the domain name space as a domain name tree.
Top-level domains are managed by ICANN; domains at every other level are managed by the management agency of the domain one level above.

Domain name servers
The DNS system uses a large number of domain name servers, organized hierarchically; each name server is authoritative for only a part of the domain name system.
According to the role they play, domain name servers are divided into four types:

· root name servers
· top-level domain name servers
· authoritative name servers
· local name servers

Root domain name servers
Root name servers are the highest-level domain name servers; a root name server knows the domain names and corresponding IP addresses of all top-level domain name servers.
Currently the root name servers use 13 domain names with different IP addresses, namely (a-m).root-servers.net, run by 12 independent organizations.
All root name servers are deployed using anycast. With anycast, a DNS resolver no longer needs to know the real IP address of a particular DNS server; it only needs to know the anycast address to communicate with the locally optimal instance anywhere in the world.

The operator of each root server independently manages its own anycast instances, so the number of instances varies greatly between root servers: for example, the B root server has only 4 anycast instances, while the E root server has 308.

As of June 2021, 1,380 root name server instances had been installed around the world, 37 of them in China.

Top-level domain name servers: responsible for managing all second-level domains registered under that top-level domain.

· When it receives a DNS query request, a top-level domain name server gives a corresponding answer.
· The answer may be the final IP address, or it may be the IP address of the name server to be queried next.

Authoritative domain name servers: the name servers responsible for managing a particular zone.

· The scope managed by an authoritative name server is called a zone; each organization is responsible for drawing its zone boundaries.
· A zone may be equal to or smaller than a domain.
· The authoritative name server stores the mappings from domain names to IP addresses for all hosts in its zone.

Local domain name servers: the name servers that directly provide domain-name-resolution services to users.

· When a host sends a DNS query request, the query message is sent to the local domain name server.
· Each Internet service provider (ISP) can have a local domain name server, sometimes called the default domain name server.

Resource records
Each entry stored in a domain name server is called a resource record, and the file that holds the resource records is called a zone file.
The resource records stored in all the domain name servers together constitute the distributed DNS database.
There are many types of DNS resource records; the common ones are:

  1. A record: an address record, also called a host record, which maps a domain name to the host's IPv4 address.
  2. AAAA record: similar to the A record, but maps a domain name to the host's IPv6 address.
  3. SOA record: the Start Of Authority record. The SOA record is mandatory in every zone file
    and must be the first record in the zone file.
    The SOA record describes necessary parameters such as the zone the file belongs to and the fully qualified domain name of the zone's primary name server.
  4. NS record: the name server record, indicating which DNS server resolves the domain. The NS
    record maps a domain name to the FQDN of the DNS server that manages that domain.
  5. MX record: the mail exchange record, used to specify the domain's mail server. The MX record maps the domain name
    to the FQDN of the domain's mail server.
  6. CNAME record: an alias record, used to map several domain names to the same host.
  7. PTR record: a pointer record, also called a reverse record. A PTR record maps an IP address
    back to a domain name.

When a resolver sends a domain name to a DNS server for resolution, the response it gets from the DNS server consists of the resource records associated with that domain name; each DNS response message contains one or more resource records.

For example:
the authoritative domain name server of the domain edu.cn receives a request to resolve the domain name www.zzu.edu.cn. Since this name server is not the authoritative name server for the host www.zzu.edu.cn, the response it returns contains an NS record and an A record:
the NS record indicates that the authoritative name server managing www.zzu.edu.cn is dns.zzu.edu.cn;
the A record gives the IP address of dns.zzu.edu.cn.

Therefore, this DNS response directs the DNS resolver to send its next query to dns.zzu.edu.cn.
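From an application's point of view, all of this is hidden behind the resolver library: the program asks for a name and gets back the addresses from the A/AAAA records. A small sketch using Python's standard library (the host name is the example above; the actual answers depend on the live DNS data):

```python
# Ask the system resolver (which queries the local DNS server) for a name.
import socket

for family, _, _, _, sockaddr in socket.getaddrinfo("www.zzu.edu.cn", 80, proto=socket.IPPROTO_TCP):
    record_type = "A" if family == socket.AF_INET else "AAAA"
    print(record_type, sockaddr[0])    # the resolved IPv4/IPv6 address
```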

DNS name resolution can use two methods: recursive query and iterative query.

Recursive query

If the domain name server queried by the DNS client does not know the IP address of the queried domain name, that name server itself continues to send query requests to other name servers on behalf of the DNS client, until the result is obtained.

The query a host sends to its local domain name server is usually a recursive query.

Iterative query
When a domain name server receives an iterative query request, it either gives the IP address of the host being queried, or tells the DNS client which name server it should query next, instead of performing the next query on the client's behalf.

The queries the local domain name server sends to other domain name servers are iterative queries.

DNS cache

To improve DNS query efficiency, reduce the load on the root name servers, and reduce the number of DNS query messages on the Internet, caches are widely used in domain name servers.

The cache stores recently queried domain names together with records of where the mapping information was obtained. To keep the cache correct, the name server sets a timer for each entry and deletes entries that exceed a reasonable lifetime.

The local DNS server caches not only final query results but also the IP addresses of top-level domain name servers, so it can bypass the root DNS servers in the query chain.

Dynamic Host Configuration Protocol DHCP

insert image description here

The Dynamic Host Configuration Protocol (DHCP) is widely used on the Internet to configure network parameters automatically.

The configuration information generally includes the IP address, the subnet mask, the default router's IP address, and the local domain name server's IP address.

The IP address and other network parameters that a DHCP server assigns to a DHCP client are temporary and can only be used for a limited period of time, called the lease period.

DHCP clients use UDP port 68, and DHCP servers use UDP port 67.

When a DHCP client starts up, it looks for a DHCP server by sending a broadcast message. This broadcast is a local-network broadcast and cannot be forwarded by routers.

This would require a DHCP server on every network.

insert image description here
To avoid having to deploy too many DHCP servers, DHCP solves this problem with the DHCP relay agent:

  1. The DHCP relay agent is configured with the IP address of the DHCP server.
  2. When the relay agent receives the discovery message broadcast by a DHCP client, it forwards the message to the DHCP server as a unicast and waits for the reply.
  3. After receiving the DHCP server's reply, the relay agent sends the reply back to the DHCP client.

insert image description here

Example
insert image description here

  1. The DHCP client broadcasts a DHCP Discover message from UDP port 68. Destination IP address: 255.255.255.255; source IP address: 0.0.0.0.

  2. The DHCP server responds by broadcasting a DHCP Offer message.

insert image description here

3. The DHCP client selects a server from the received Offer messages and broadcasts a DHCP Request message to the selected DHCP server.
4. After receiving the Request message, the DHCP server broadcasts a DHCP ACK message to the DHCP client.

Only when the DHCP ACK message is received does the DHCP client actually obtain, and start using, the assigned IP address.
insert image description here
After the lease begins, when the elapsed time reaches 0.875T the client again unicasts a DHCP Request message to request renewal of the lease.
insert image description here
After receiving the unicast DHCP Request message, the DHCP server unicasts a DHCP ACK message or a DHCP
NAK message to the DHCP client.

If the DHCP client receives a DHCP ACK message, it updates the lease period and resets the timer.
If the DHCP client receives a DHCP NAK message, it stops using the original IP address and sends a DHCP Discover message to request an address again.

insert image description here
For the detailed format of DHCP messages, see the blog post: DHCP message format.

Email [SMTP / POP3 / IMAP]

insert image description here

insert image description here
①The sender composes an e-mail with a user agent, sends it to the sender's mail server via SMTP, and the message is stored temporarily in the sender's mail server's cache.

②The sender's mail server scans the mail cache periodically; when it finds mail, it sends the mail to the recipient's mail server via SMTP, and the recipient's mail server stores the mail in user B's mailbox.

If delivery fails, the sender's mail server keeps the mail in a queue and tries to send it again later.
If delivery is still unsuccessful within the specified time limit, the mail is deleted and the sender is notified of the failure.

③When the recipient accesses their mail, the mail server verifies the recipient's identity. The recipient reads the mail using either the POP3 protocol or the IMAP protocol.

Simple Mail Transfer Protocol SMTP

SMTP was originally specified in RFC 821 and is currently specified in RFC 5321.

SMTP works in the client-server mode on TCP port 25.
Every mail server can act both as an SMTP client and as an SMTP server.
The SMTP process responsible for sending mail is the SMTP client, and the SMTP process responsible for receiving mail is the SMTP server.
SMTP specifies 14 commands and 21 kinds of responses.

Each command consists of a few letters, and each response is generally a single line of information starting with a 3-digit code, followed by a very brief text description (or none).

Earlier versions of SMTP had some drawbacks:

  1. Sending e-mail required no authentication, which led to a lot of spam on the Internet.
    · Solutions:

1. use the POP protocol account for identity authentication;
2. RFC 5321 defines Extended SMTP (ESMTP), which adds client identity
authentication.

  2. SMTP only supports sending ASCII text, not binary data.
    · Solution:

the Multipurpose Internet Mail Extensions (MIME) were defined, which support the transfer of binary data.
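A hedged sketch of sending a message over SMTP from a user agent's point of view; the server name, addresses, and credentials are placeholders, and authentication is shown only for servers that require it:

```python
# Hand a message to an SMTP server (placeholder server/account values).
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "SMTP test"
msg.set_content("Hello, this is a test message.")   # MIME handles the body encoding

with smtplib.SMTP("smtp.example.com", 25) as server:  # SMTP works over TCP port 25
    server.login("alice", "password")                 # ESMTP authentication, if the server requires it
    server.send_message(msg)
```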

Post Office Protocol POP3
POP3 is version 3 of the Post Office Protocol, specified in RFC 1939.
POP3 works in the client-server mode, and its server runs on TCP port 110. POP3 supports two modes of operation: "download and keep" and "download and delete".

Internet Message Access Protocol IMAP
IMAP version 4 is specified in RFC 3501.
IMAP works in the client-server mode, and its server runs on TCP port 143.
IMAP is also a mail-access protocol, and it is an online protocol.
Until the user issues a command to delete a message, the message remains stored in the mailbox on the IMAP server.
IMAP allows the user agent to read only certain parts of a message, for example only its header.
IMAP also provides users with commands to create folders and to move messages from one folder to another.

The e-mail message format is omitted here.

Chapter 4 The Transport Layer

Overview of the Transport Layer

insert image description here

运输层的主要任务是向应用进程提供端到端的逻辑通信服务
运输层协议是在网络边缘部分的主机中实现的,只有主机的协议栈才有运输层。
路由器在转发分组时只用到协议栈的下三层

insert image description here运输层协议的作用范围是发送方进程到接收方进程

  1. At the sender, the transport layer protocol entity receives data from the sending application process, encapsulates the data into transport layer data units according to the method agreed by the transport layer protocol, and hands them to the lower layer entity for processing;
  2. At the receiver, the transport layer entity receives transport layer data units from the lower layer entity, decapsulates them, and delivers the data to the receiving application process.

Two important transport layer protocols
The most important protocols in the transport layer are the Transmission Control Protocol TCP and the User Datagram Protocol UDP.
insert image description here
insert image description here

Transport layer multiplexing and ports
Transport layer multiplexing means encapsulating data from multiple applications into the same kind of transport layer protocol data unit.
Transport layer demultiplexing means distributing the data encapsulated in the same kind of transport layer protocol data unit to different application processes.

To realize the multiplexing and demultiplexing of the transport layer, an identifier is required to identify different application processes .
In an operating system of a computer, a process identifier is generally used to identify a process. But different operating systems have different process identifier formats.
In order to enable processes on different operating systems to communicate with each other, it is necessary to select a uniform identifier that has nothing to do with the operating system to identify the processes in communication.
Transport layer protocols use port numbers to identify application processes; a port number is also referred to simply as a port.

The headers of both TCP and UDP contain a source port field and a destination port field.

The source port field is used to identify the sender process, and the destination port field is used to identify the receiver process.
The source port is usually used as part of the "return address" when the receiver processes the reply.

The port field is 16 bits long, and its value range is 0 to 65535.
The port of the transport layer only has local meaning, that is, it identifies the application process in the computer.
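As an illustration of port-based multiplexing and demultiplexing, here is a minimal sketch using Python's standard socket module; the port number 4499 is an arbitrary choice for this example.

```python
import socket

# A minimal UDP echo server: the OS demultiplexes incoming UDP datagrams
# to this process based on the destination port number.
PORT = 4499  # arbitrary example port in the user (registered) range

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.bind(("0.0.0.0", PORT))          # claim the local port
    data, addr = sock.recvfrom(2048)      # addr = (source IP, source port)
    # The source port in addr is the "return address" used for the reply.
    sock.sendto(data, addr)
```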

insert image description here
Port Types
IANA divides ports into three types: system ports, user ports, and dynamic ports.

insert image description here
Common service programs and system ports
insert image description here

User Datagram Protocol UDP

insert image description here

UDP provides the minimum service at the transport layer, including multiplexing and demultiplexing functions and error detection functions.

insert image description here
Connectionless:
No need to establish a connection before sending

Message-oriented:

  1. UDP preserves application layer packet boundaries .
    ·The message delivered by the application process is directly encapsulated into the UDP user datagram as the data part of UDP.
  2. UDP neither splits nor merges application layer messages, and sends one message at a time.

UDP packet length

The length of a UDP message is determined by the application process.
Messages that are too long or too short both reduce communication efficiency:
If the message is too long, the lower-layer IP protocol may need to fragment it during transmission, which lowers transmission efficiency.
If the message is too short, the headers added by each layer during encapsulation account for a larger proportion, which also lowers efficiency.
Typical UDP application processes keep the message length within 512 bytes, for example DNS and DHCP.

insert image description here

Best-effort delivery
The essence of best-effort delivery is unreliable delivery.
However, UDP does not arbitrarily discard user datagrams;
UDP provides an end-to-end error detection function from sender to receiver, providing an error-free acceptance service.
UDP does not handle possible loss, duplication, or out-of-order arrival of user datagrams, so UDP does not provide a reliable delivery service.
An application process that uses UDP must implement reliable delivery itself if it needs it.

UDP does not provide flow control.
UDP does not provide congestion control either.

That is to say, when UDP sends user datagrams, it considers neither the receiver's current state and processing capacity, nor the network's current congestion and carrying capacity.

Once the application process passes data to UDP, UDP immediately encapsulates it into a user datagram and sends it out.

Therefore, the timing of UDP sending user datagrams is controlled by the application process.

UDP user datagram format
In the Linux virtual network environment, build a UDP communication instance network topology.
Use the nc command of Linux to carry out UDP communication
insert image description here
interception of UDP packets
insert image description here .
insert image description here
Source port
The source port is the port number of the sender, occupying 16 bits.
The source port number is optional. If the UDP sender does not need the other party to reply, this field is allowed to be set to all 0.
In the user datagram intercepted by the UDP communication instance, the source port value is 0xc15e, i.e., 49502. This port is an ephemeral (dynamic) port.

insert image description here
Destination port
The destination port is the port number of the receiver, occupying 16 bits.
This field is required when the receiver's UDP delivers packets to the application layer.
In the user datagram intercepted by the UDP communication instance, the destination port value is 0x1193, i.e., 4499. This port is a registered (user) port.

insert image description here
Length
Length refers to the total length of the UDP header and UDP data, occupying 16 bits.
The minimum value of the length is 8 bytes.
In the user datagram intercepted by the UDP communication instance, the value of the length field is 0x0013, i.e., 19 bytes. In this example, the UDP data part is 11 bytes long and the UDP header is 8 bytes long.

insert image description here
Checksum
The UDP checksum is an end-to-end checksum that occupies 16 bits.
The UDP checksum is calculated by the initial sender and checked by the final receiver; it is used to check whether a bit error has occurred during end-to-end transmission.
User datagrams that fail the verification are simply discarded by UDP.
In the user datagram intercepted by the UDP communication instance, the value of the checksum field is 0x92be, and the verification passes.

The calculation range of the UDP checksum covers the UDP header, the UDP data part, and a pseudo-header.
The pseudo-header is derived from certain fields in the IP header and UDP header, and has a total length of 12 bytes.
The pseudo-header is not the real header of the user datagram; it is temporarily added in front of the UDP user datagram to participate in the checksum calculation when the computer computes the checksum.
The pseudo-header is neither passed to the lower layer nor submitted to the upper layer, and it is never encapsulated for transmission.
In the transport layer's TCP protocol, the calculation of the TCP checksum also uses a similar pseudo-header.

insert image description here
Pseudo-header
The source IP address and destination IP address are derived from the IP header.
The protocol number field is also derived from the IP header; for UDP, the value of this field is 17.
The UDP length field is derived from the UDP header.

The UDP checksum is computed by taking the one's complement of the 16-bit one's complement sum of all 16-bit words in the calculation range.
This checksum calculation method is also used in the IP protocol and the TCP protocol.
The UDP checksum calculation requires 16-bit alignment, that is, an even number of bytes, but the length of a UDP user datagram is allowed to be an odd number of bytes.
Therefore, for odd-length user datagrams, UDP appends an all-0 padding byte at the end.
This padding byte is only for the calculation and verification of the checksum, and will not actually be transmitted
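The checksum procedure described above can be sketched roughly as follows in Python; the function names are invented for this example, and IPv4 addresses are assumed to be given as dotted-decimal strings.

```python
import socket
import struct

def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum; pad with a zero byte if the length is odd."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # wrap carries around
    return total

def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Checksum over pseudo-header + UDP header + data (checksum field set to 0)."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, len(udp_segment)))  # zero, protocol 17, UDP length
    return (~ones_complement_sum16(pseudo + udp_segment)) & 0xFFFF
```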

insert image description here

Principle of reliable transmission

insert image description here

The so-called reliable transmission service means providing a reliable logical channel for the upper-layer entity: data transmitted through the channel will suffer neither bit errors nor loss, and all data will be delivered in the order it was sent.

Protocols that provide reliable transport services are called reliable transport protocols . An ideal reliable channel satisfies the following two assumptions:

(1) The transmitted data will not cause bit errors, packet loss or delay;
(2) The receiving rate of the receiver can be as fast as the sending rate of the sender.

This section starts with an ideal reliable channel, removes assumptions step by step, and discusses how to achieve reliable transmission on unreliable channels.

The reliable-transmission principle protocols discussed in this section only consider one-way data communication. For full-duplex data communication, reliable transmission is implemented according to the same principles described in this section.

For convenience of description, we refer to the sender as A (Alice) and the receiver as B (Bob).
The principle of reliable transmission discussed in this section applies to general computer network protocols, so this section uses the term protocol data unit (PDU) in the subsequent discussion.

Stop-and-wait protocol
Ideally, transmitting data over a reliable channel clearly does not require any protocol for reliable transmission.

We remove the second assumption of the ideal channel and retain assumption (1): the channel introduces no bit errors, packet loss, or delay.

In order to ensure that receiver B can correctly receive and process the received data, we add a flow control mechanism:

When B receives a PDU, completes the processing, and is ready to receive the next PDU, B sends a confirmation PDU to A, which is recorded as ACK(Acknowledgment).
Each time A sends a PDU, it must stop sending and wait for the ACK from B. After receiving the ACK, A can send the next PDU.

This kind of protocol with flow control, which stops and waits every time a PDU is sent, is called the stop-and-wait protocol, or SW protocol for short.

A stop-and-wait protocol with flow control only is denoted the SW1.0 protocol.

We now remove the first assumption of the ideal channel; the data transmitted in the channel may then suffer bit errors, packet loss, or delay.
insert image description here

Consider three situations respectively:
data PDU error or loss
On the basis of SW1.0, add the following measures to obtain the SW2.0 protocol:

  1. When B receives a PDU, it can detect errors through checksum calculation or other measures. B discards erroneous PDUs directly without sending an ACK.
  2. Add a timeout retransmission mechanism for the sender: A sets a timeout timer after sending a PDU. If the timeout timer expires and still does not receive the ACK sent by B, A will retransmit the previously sent PDU.
  3. If the ACK sent by B is received before the timeout timer expires, the timeout timer is canceled.
  4. Obviously, if the data PDU sent by A is lost during transmission, B will not receive the PDU, and will not send ACK. After the timeout timer expires, A will resend the lost PDU.

With the timeout retransmission mechanism, erroneous or lost PDUs can be retransmitted automatically without a request from the receiver. Such a protocol is called an Automatic Repeat reQuest (ARQ) protocol.

ACK error or missing

If the acknowledgment ACK sent by B to A is wrong or lost during the transmission process, since A cannot receive the correct ACK, when the timeout timer expires, A will resend the previously sent PDU.

But B has already received the PDU correctly. To prevent B from handing the duplicate PDU to the upper-layer entity, a sequence number is added to the data PDU on the basis of SW2.0, yielding the SW3.0 protocol.

  1. Each time A sends a PDU, it adds 1 to the sequence number and writes it into the sequence number field of the new PDU.
  2. The PDU retransmitted after timeout has the same sequence number as the erroneous or lost PDU.
  3. B can judge whether the received PDU is a duplicate according to the sequence number. If it is a duplicate PDU, it means that the ACK sent by B to A has not been delivered correctly, so B discards the duplicate PDU and retransmits the ACK .

ACK Delay
When the ACK sent by B to A is transmitted in the channel, it may reach A with a delay.

If A has retransmitted a certain PDU, when a late ACK arrives, A cannot determine which data PDU the ACK is for, and SW3.0 will be invalid.

On the basis of SW3.0, a sequence number is added to the ACK, yielding the SW4.0 protocol.

  1. Each time B receives a data PDU, it takes out the sequence number of the PDU, and writes the sequence number into the acknowledgment number field of the ACK when sending the ACK, so as to explain which data PDU the ACK confirms.
  2. A can judge whether the received ACK is a duplicate according to the confirmation number. If it is a duplicate ACK, A discards the duplicate ACK without any other processing.

Stop and wait protocol controls

The stop-and-wait protocol adds the following control measures to achieve reliable transmission over an unreliable channel (a sketch combining them follows the list).

1. Flow control mechanism based on acknowledgment feedback;
2. Automatic retransmission mechanism based on timeout timer;
3. Duplicated PDU identification mechanism based on sequence number and acknowledgment number.
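A minimal sketch combining the three measures above over a UDP socket; the 1-second timeout, the 1-byte sequence-number format, and the ACK format are assumptions of this example rather than part of any standard.

```python
import socket

TIMEOUT = 1.0  # assumed retransmission timeout, in seconds

def sw_send(sock: socket.socket, peer, payloads):
    """Stop-and-wait sender: one PDU in flight, alternating 1-bit sequence number."""
    sock.settimeout(TIMEOUT)
    seq = 0
    for data in payloads:
        pdu = bytes([seq]) + data               # 1-byte sequence number + data
        acked = False
        while not acked:
            sock.sendto(pdu, peer)              # (re)transmit the current PDU
            try:
                while True:
                    ack, _ = sock.recvfrom(16)  # ACK carries the acknowledged number
                    if ack and ack[0] == seq:   # the expected ACK: stop waiting
                        acked = True
                        break
                    # duplicate/old ACK: discard it and keep waiting
            except socket.timeout:
                pass                            # timer expired: retransmit
        seq ^= 1                                # alternate the sequence number
```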

Channel utilization for the stop-and-wait protocol

example:

Consider two hosts, A and B, about 3000 km apart, communicating using the stop-and-wait protocol. Assume that hosts A and B are connected by a channel with a transmission rate of 1 Gbit/s (10^9 bit/s), that the data PDU length is 1500 bytes, that the processing delay and queuing delay of all nodes on the path from A to B are ignored, and that host B's delay in processing data PDUs and sending ACKs is also ignored.

insert image description here

Calculate the channel utilization of sender A

insert image description here
The channel utilization rate of the stop-and-wait protocol is low, resulting in a great waste of communication resources.
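Assuming a signal propagation speed of 2×10^8 m/s (a value not stated explicitly in the example above), the utilization can be worked out as follows.

```python
# Channel utilization of the stop-and-wait protocol for the example above.
# Assumption: signal propagation speed in the medium is 2e8 m/s.
L = 1500 * 8            # PDU length in bits
R = 1e9                 # transmission rate in bit/s
d = 3000e3              # distance in metres
v = 2e8                 # assumed propagation speed in m/s

Td = L / R              # transmission delay ~= 12 microseconds
Tp = d / v              # propagation delay  = 15 ms
U = Td / (Td + 2 * Tp)  # sender utilization (ACK transmission time ignored)
print(f"U = {U:.4%}")   # about 0.04%
```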

Sequential ARQ protocol
insert image description here

In order to improve transmission efficiency, pipeline transmission can be adopted.
The pipeline transmission method makes the data continuously transmitted on the channel, which can obtain a higher channel utilization rate.

A reliable transmission protocol that adopts pipelined transmission is called a continuous ARQ protocol, also known as a sliding window protocol.
According to the error recovery method, continuous ARQ protocols are divided into two types: Go-Back-N (GBN) continuous ARQ and Selective Repeat (SR) continuous ARQ.

Sliding window
The communication parties that implement the sliding window protocol maintain a window according to their own buffer space.
The sender maintains a send window swnd, and the receiver maintains a receive window rwnd.
insert image description here
insert image description here
For the sending window:
the pointer P1 points to the earliest unconfirmed PDU,
the pointer P2 points to the next PDU to be sent, and
the pointer P3 points to the first PDU outside the sending window.

PDUs in the interval [0, P1-1] have been sent and acknowledged;
PDUs in the interval [P1, P2-1] have been sent but not yet acknowledged;
PDUs in the interval [P2, P3-1] are allowed to be sent but have not yet been sent;
PDUs with sequence numbers greater than or equal to P3 are not allowed to be sent.

The interval [P1, P3-1] is called the send window.
The length of the send window is N = P3 - P1.
In this example, the length of the send window is a fixed value of 10.

When the sender receives the acknowledgment ACKs for PDUs No. 3 and No. 4, the send window slides forward, as shown in figure (b). After sliding, the P1 pointer points to PDU No. 5.
Since the window length is a fixed value in this example, the P3 pointer also slides forward accordingly, keeping the send window length at 10.

By convention, "forward" points in the direction of increasing time, and "backward" points in the direction of decreasing time.

insert image description here
For receive windows:

Pointer P4 points to the next PDU to be received
Pointer P5 points to the first PDU outside the receiving window

PDUs in the interval [0, P4-1] have been received and acknowledged;
PDUs in the interval [P4, P5-1] are allowed to be received;
PDUs with sequence numbers greater than or equal to P5 are not allowed to be received.

The interval [P4, P5-1] is the receive window, and the length of the receive window is N' = P5 - P4.
In this example, the length of the receive window is a fixed value of 10.

When the receiver receives PDU No. 3, since it has already buffered PDU No. 4, the receiver can send ACKs for PDUs No. 3 and No. 4 in succession.
After sending ACK No. 4, the receive window slides forward, as shown in figure (b). After sliding, the P4 pointer points to PDU No. 5.
Since the window length is a fixed value in this example, the P5 pointer also slides forward accordingly, keeping the receive window length at 10.

Cumulative Acknowledgments
The receiver is allowed to send acknowledgment ACKs in a cumulative acknowledgment manner.
Cumulative acknowledgment means that the receiver does not need to send an ACK for every received PDU; instead, after receiving several PDUs, it sends an ACK for the last PDU that arrived in sequence, indicating that all PDUs up to and including that one have been received correctly.

Go-Back-N (GBN) protocol
Comparing the basic concepts of the stop-and-wait protocol and the sliding window protocol, it is not hard to see that the stop-and-wait protocol is essentially a sliding window protocol with a send window length of 1 and a receive window length of 1.

The GBN protocol is a sliding window protocol in which the length of the sending window is greater than 1 and the length of the receiving window is equal to 1.

We observe the behavior of the sender in the GBN protocol in an example of the GBN protocol running with a sending window length of 4 and a receiving window length of 1
insert image description here

  1. If the sending window is not full, assemble a PDU with the data in the sending buffer, send it out, and register the timeout timer; if the sending window is full, wait for the sending window to slide.
  2. If the confirmation ACK is received, the timeout timer of the PDU confirmed by the ACK and the previous PDU is canceled. Then calculate and slide the current sending window according to the confirmation number of the ACK and the length of the sending window.
  3. If a timeout event is detected, the timeout PDU is retransmitted.

Receiver Behavior in GBN Protocol

  1. If the received PDU falls within the receiving window, it receives the PDU, sends an acknowledgment ACK for the PDU, and slides the receiving window.
  2. If the received PDU does not fall within the receiving window, the PDU is discarded and an ACK is sent to confirm the last correct PDU.

Channel Utilization of GBN

Observing the operation process of the GBN protocol, it can be found that the transmission in the pipeline mode makes the data in the channel continuously transmitted, which can indeed improve the utilization rate of the channel.

However, since the receiving window is only 1, all PDUs that arrive after the lost or erroneous PDU are retransmitted by the sender, even if these out-of-order PDUs are all correct. This processing method causes a waste of channel resources.

From the perspective of the sender, once a timeout retransmission event occurs, it needs to go back N steps and resend all subsequent PDUs starting from the timeout PDU.
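A minimal sketch of the GBN sender behavior described above; the unbounded sequence-number space and the send/timer callbacks are simplifications for illustration (real protocols use modular sequence numbers and concrete timers).

```python
class GBNSender:
    """Go-Back-N sender: window of N, cumulative ACKs, single timer."""

    def __init__(self, window_size, send, start_timer, stop_timer):
        self.N = window_size
        self.base = 0           # oldest unacknowledged sequence number (P1)
        self.next_seq = 0       # next sequence number to use (P2)
        self.buffer = {}        # seq -> pdu, kept for possible retransmission
        self.send, self.start_timer, self.stop_timer = send, start_timer, stop_timer

    def on_data(self, pdu):
        if self.next_seq < self.base + self.N:   # window not full
            self.buffer[self.next_seq] = pdu
            self.send(self.next_seq, pdu)
            if self.base == self.next_seq:
                self.start_timer()
            self.next_seq += 1
        # else: window full, caller must wait until the window slides

    def on_ack(self, ack):                        # cumulative ACK
        if ack >= self.base:
            for s in range(self.base, ack + 1):
                self.buffer.pop(s, None)
            self.base = ack + 1                   # slide the window
            if self.base == self.next_seq:
                self.stop_timer()
            else:
                self.start_timer()

    def on_timeout(self):                         # go back N
        self.start_timer()
        for s in range(self.base, self.next_seq):
            self.send(s, self.buffer[s])          # resend all unacknowledged PDUs
```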

Selective repeat (SR) protocol

The stop-and-wait protocol is a sliding window protocol with a send window length of 1 and a receive window length of 1.
The GBN protocol is a sliding window protocol whose send window length is greater than 1 and whose receive window length equals 1.
The SR protocol is a sliding window protocol whose send window length is greater than 1 and whose receive window length is also greater than 1.

In the SR protocol, the receiver uses the sequence number of the last in-sequence PDU to cumulatively acknowledge all PDUs that arrived in sequence, and at the same time uses selective acknowledgment (SACK) to individually acknowledge PDUs that arrive out of sequence.

The SACK here is the selective acknowledgment of the SR principle protocol, which is different from the TCP selective acknowledgment option introduced in later chapters of this book.

We observe an example of the operation of the SR protocol with a sending window length of 4 and a receiving window length of 4

insert image description here
Sender Behavior in SR Protocol

  1. If the sending window is not full, assemble a PDU with the data in the sending buffer, send it out, and register the timeout timer; if the sending window is full, wait for the sending window to slide.
  2. If the confirmation ACK is received, the timeout timer of the PDU confirmed by the ACK and the previous PDU is canceled. Then calculate and slide the current sending window according to the confirmation number of the ACK and the length of the sending window.
  3. If the selection confirmation SACK is received, the timeout timer of the PDU confirmed by the SACK is canceled.
  4. If a timeout event is detected, the timeout PDU is retransmitted.

Receiver Behavior in SR Protocol

  1. If the received PDU falls within the receiving window, and the PDU is a PDU that arrives in sequence, then receive the PDU, send a cumulative acknowledgment ACK to all correct PDUs that arrived in sequence, and slide the receiving window.
  2. If the received PDU falls within the receiving window, but the PDU arrives out of sequence, the PDU is buffered, the selection acknowledgment SACK for the PDU is sent, and the acknowledgment ACK for the last correct PDU is resent.

The concept of negative confirmation NAK

The selective repeat SR protocol can be combined with a negative acknowledgment strategy: when the receiver detects an erroneous PDU, it sends a negative acknowledgment (NAK).

At the sender, receiving the NAK can trigger the retransmission of the PDU without waiting for the corresponding timeout timer to expire, so the protocol performance can be improved.

Transmission Control Protocol TCP

insert image description here

The TCP protocol is a connection-oriented reliable transport protocol that provides functions such as connection management, reliable transmission, flow control, and congestion control.

insert image description here

A TCP connection is a logical connection, and TCP regards the connection as its most basic abstraction.
The endpoint of a TCP connection is called a socket.
As defined in RFC793, a socket consists of an IP address concatenated with a port number:

socket = (IP address : port number)

Each TCP connection has one and only two endpoints , and each TCP connection is uniquely identified by two sockets at both ends of the communication.

The hosts at both ends of the TCP connection need to maintain the TCP connection state.

Once the connection is established, the TCP process in the host will set up and maintain the send buffer and receive buffer

Connection-oriented
A connection must be established before communication.

Byte stream-oriented
insert image description here
TCP does not preserve application layer message boundaries

The interaction between the application process and the TCP process is a data block at a time, but the TCP process regards these data blocks as a series of unstructured byte streams

TCP takes out the byte stream from the sending buffer at an appropriate time and encapsulates it into a segment and sends it out

TCP packet length
The TCP packet segment length is determined by the TCP process

The length of the byte stream that the TCP process takes out of the send buffer and places into a segment is limited by the maximum segment size (MSS) and has nothing to do with application layer message boundaries.

MSS refers to the maximum length of the data part in a TCP segment, excluding the TCP header.

When TCP establishes a connection, it determines the MSS value through negotiation.

other features

TCP uses a byte-oriented sliding window protocol to achieve a reliable delivery service.
The data transmitted through the TCP connection can be guaranteed to be error-free, not lost, not repeated, and arrive in order.

TCP adopts a window-based flow control mechanism . The receiver sends the receiving window value to the sender, and the sender adjusts the length of the sending window according to the value to control the sending rate.
TCP can detect network congestion based on timeout retransmission events and fast retransmission events , slow down the sending speed, and perform congestion control.

TCP also supports congestion control using **Explicit Congestion Notification (ECN)**.

The sending timing of the TCP sending segment is controlled by the TCP process.

TCP message format
On the host ns57C, make a 3500-byte text file and name it 3500.0.
Initiate TCP communication from host ns56A to host ns57C, and read the file 3500.0. Intercept the TCP segments on ns57C.
insert image description here
insert image description here

There are more pictures in the following parts, and the text can refer to the blog:
http://c.biancheng.net/view/6441.html
https://blog.csdn.net/a19881029/article/details/29557837

insert image description here

insert image description hereinsert image description here

insert image description here
insert image description here
insert image description here
insert image description here
insert image description here
insert image description hereinsert image description here
insert image description here
insert image description hereinsert image description hereinsert image description hereinsert image description hereinsert image description here
insert image description here
insert image description hereinsert image description here
Reserved bits

CWR: Congestion window reduction.
When CWR=1, it indicates that according to ECN echo, the sender has reduced the sending rate.
ECE: ECN echo.
A segment with ECE=1 is an explicit congestion notification from the receiver, indicating that the segment previously sent by the sender has encountered network congestion.
CWR and ECE are used for TCP's explicit congestion control, which will be introduced later.

URG: urgent data flag.
When URG=1, the urgent pointer field takes effect, indicating that the message segment contains urgent data, and the location of the urgent data is indicated by the urgent pointer field.
In 2011, RFC6093 suggested that urgent data should no longer be used.
ACK: Acknowledgment flag.
When ACK=1, the confirmation number field takes effect, indicating that the message segment contains confirmation information.
TCP stipulates that all segments after the connection is established must set ACK to 1 .

PSH: push sign.
When PSH=1, it indicates that the sender requires the receiver to deliver the data in the segment to the upper layer as soon as possible. In most TCP/IP implementations, including Berkeley Socket, the PSH flag is set to 1 to indicate that there is no data to be sent in the sender's cache. This flag is always set to 1 when dealing with interactive mode connections such as telnet.

RST: Reset connection.
When RST=1, it indicates that an error occurred in the TCP connection and the connection needs to be canceled.
The segment with RST=1 is usually called RST报文段.

SYN: Synchronous connection
When SYN=1, it indicates that the message segment is a TCP connection establishment request.
The segment with SYN=1 is usually called SYN报文段.
FIN: terminate the connection.
When FIN=1, it indicates that the sender's data has been sent and it requests to release the TCP connection.
A segment with FIN=1 is usually called a FIN segment.
insert image description hereinsert image description here

insert image description here
insert image description here
insert image description hereinsert image description here
options
insert image description hereinsert image description here
insert image description here
insert image description here

insert image description here

insert image description here

insert image description here
Calculation of the effective maximum segment length of the sender
TCP The sender needs to calculate the effective maximum segment length EMSS when encapsulating the segment.
EMSS is limited by the RMSS sent by the other party, and also by the sender's own MTU value, TCP option length, and IP option length.
RFC1122 stipulates that the calculation formula of EMSS is as follows:

EMSS = min(RMSS + 20,MSS_S) - TCPhdrsize - IPoptionsize

Among them:
RMSS is the MSS value in the MSS option sent by the other party;
MSS_S is the maximum size of a segment (including the TCP header) that the sender can send, calculated as:

MSS_S = sender's MTU - 20;

TCPhdrsize is the length of the TCP header including options;
IPoptionsize is the length of the IP options.

Do a simple transformation of the EMSS calculation formula stipulated in RFC1122, and you can get:

EMSS=min(RMTU,SMTU)-TCPhdrsize-IPhdrsize

That is, the EMSS value equals the smaller of the sender's and receiver's MTU values minus the TCP header length and the IP header length.

In concrete TCP implementations, the calculation of the effective maximum segment length EMSS also needs to consider the path MTU (PMTU) limitation. The PMTU is the smallest MTU among all the links on the entire network path.
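A small sketch of the simplified EMSS formula above; the MTU values in the example call are illustrative only.

```python
def emss(sender_mtu: int, receiver_mtu: int,
         tcp_hdr_size: int = 20, ip_hdr_size: int = 20) -> int:
    """Effective maximum segment length per the simplified formula:
    EMSS = min(SMTU, RMTU) - TCPhdrsize - IPhdrsize."""
    return min(sender_mtu, receiver_mtu) - tcp_hdr_size - ip_hdr_size

# Example: typical Ethernet MTUs and no TCP/IP options -> 1460 bytes
print(emss(1500, 1500))
```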

When the TCP sender sends data, in order to improve transmission efficiency, it tries to encapsulate TCP segments according to the EMSS value as much as possible.
A segment encapsulated according to the EMSS value is called a full-sized segment.

insert image description here
RMSS
RFC1122 stipulates that the default value of RMSS is 536 bytes.
If the MSS option is not included in the SYN segment, TCP sets the RMSS to 536 bytes.
In early computer networks, the X.25 protocol was widely used, and its MTU value was 576 bytes. This MTU value minus the 20-byte fixed IP header and the 20-byte fixed TCP header yields exactly 536 bytes.

In the current network environment, the most typical RMSS value is 1460 bytes.

The acknowledgment number of WS
insert image description here
insert image description hereinsert image description here
SACK-P and SACK
TCP acknowledgments are cumulative. Therefore, the TCP receiver cannot use the acknowledgment number field to acknowledge segments that arrive out of sequence.
RFC2018 defines the selective acknowledgment SACK option, which is used to acknowledge segments that arrive out of sequence.
If a TCP communicating party wants to use the SACK option, it needs to add the SACK Permitted (SACK-P) option in the initial SYN segment.

insert image description here
insert image description here
TS
insert image description here

insert image description here
insert image description here

TCP connection management

insert image description here

You can refer to the video:

TCP's three-way handshake and four-way wave

The end that actively establishes the connection is called the client, and the end that passively waits for the connection to be established is called the server.

In the process of establishing a TCP connection, three segment exchanges between the client and the server are required; this is called the three-way handshake, or three-message handshake.

First, server process B passively opens the connection, moving from the CLOSED state to the LISTEN state, and waits for a connection establishment request from the client.

insert image description here

In the first handshake, client process A sets the SYN flag to 1, selects an initial sequence number ISN(A), takes server B's IP address and port number as parameters, constructs a TCP segment, and sends it to B.

This segment is called the SYN segment.

Although the length of the data part of the SYN message segment is 0, it occupies a 1-byte number to facilitate the server to confirm the SYN request.

The client enters the SYN-SENT state from the CLOSED state.

2nd handshake

After receiving client A's SYN segment, server process B sends its own SYN segment as a response.

In the message segment, set the SYN flag to 1 , select the initial sequence number ISN (B); and set the ACK flag to 1 , and use ISN (A)+1 as the confirmation number.

This segment is called the SYN-ACK segment.

The SYN-ACK segment also occupies one sequence number, making it convenient for the client to acknowledge it.
The server goes from the LISTEN state to the SYN-RCVD state.

3rd handshake

After receiving the SYN-ACK message segment from server B, client process A sends a confirmation message segment.

In this segment, the ACK flag is set to 1, and ISN(B)+1 is used as the acknowledgment number. The sequence number field is ISN(A)+1.

This segment is called an ACK segment .

The ACK segment may or may not carry data. If no data is carried, the sequence number will not be occupied, and the sequence number field in the subsequent data segment sent by client A is still ISN(A)+1.


After sending the ACK, client A enters the ESTABLISHED state from the SYN-SENT state.

At this point, for client process A, the TCP connection has been established, and data transmission can begin.
After server B receives the ACK segment from client A, it enters the ESTABLISHED state from the SYN-RCVD state.
At this point, server process B can also start data transmission.


insert image description here
RST resets the connection
If the client process sends a SYN segment request to a Socket to establish a TCP connection,

However, the port pointed to by the Socket is not bound to the server application process, that is, no service process on the port is in the LISTEN state.

Then the TCP process on the server will set the RST flag, send the RST segment to the client process, and refuse to establish a connection.
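As an illustration, connecting to a local port on which no process is listening typically causes the operating system to answer with an RST, which Python surfaces as ConnectionRefusedError; the port number below is assumed to be unused on the local machine.

```python
import socket

# Connecting to a port with no listening process: the server-side TCP
# replies with an RST segment, which surfaces here as ConnectionRefusedError.
try:
    with socket.create_connection(("127.0.0.1", 4499), timeout=2):  # assumed-unused port
        print("unexpectedly connected")
except ConnectionRefusedError:
    print("connection refused (RST received)")
```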

TCP connection release
insert image description here
After the data transmission is over, either party to the communication can actively release the TCP connection.

The end that releases the connection actively is called the client, and the end that releases the connection passively is called the server.

In the process of TCP releasing the connection , four message segment exchanges between the client and the server are required , which is called four-way handshake or four-message handshake .

Assume that process A actively releases the connection. In the process of data transmission, the last byte number sent by process A to process B is recorded as x-1,

The number of the last byte sent by process B to process A is denoted as y-1.

The first handshake
· A sends a release connection request segment to B.

In this segment, A sets the FIN flag to 1 and fills in the sequence number as x.

Since TCP stipulates that the ACK flag be set to 1 in all segments after the connection is established, A also sets the ACK flag to 1 and fills in the acknowledgment number as y to acknowledge the last byte of data received.

This segment is generally called a FIN segment.

The length of the data part of the FIN message segment is 0, but it occupies 1 byte number, so as to facilitate the confirmation of the FIN request by the other party in the communication.

The client enters the FIN-WAIT-1 state from the ESTABLISHED state.

The second handshake
After receiving A's release connection request, server process B should confirm immediately.

In the acknowledgment segment, B sets the ACK flag to 1, fills in the acknowledgment number as x+1, and fills in the sequence number as y.

If the ACK contains no data, it occupies no sequence number.
The server goes from the ESTABLISHED state to the CLOSE-WAIT state.
After the client receives the acknowledgment segment, it enters the FIN-WAIT-2 state from the FIN-WAIT-1 state.

At this time, the TCP connection is in the half-closed state:
A can no longer send data; if B still has data, it can continue to send, and A still needs to receive it.

The third handshake
In the example on the left, in the half-closed state, B does not send data.
When process B needs to release the connection, it also needs to send a FIN request.

In this segment, the FIN flag is set to 1, and the sequence number is still y. B also sets the ACK flag to 1, and the acknowledgment number is still x+1.

Although the FIN segment does not contain data, it occupies one sequence number.

The server goes from the CLOSE-WAIT state to the LAST-ACK state.

The fourth handshake
After receiving B's FIN request, client process A should confirm immediately.

In the acknowledgment segment, A sets the ACK flag to 1, fills in the acknowledgment number as y+1, and fills in the sequence number as x+1.

After client A sends the acknowledgment segment, it enters the TIME-WAIT state from the FIN-WAIT-2 state.

Connection release
After server process B receives this last ACK segment, it enters the CLOSED state from the LAST-ACK state. At this point, the TCP connection is closed for B.
Client process A needs to wait for a time of 2MSL in the TIME-WAIT state before entering the CLOSED state.

MSL stands for the Maximum Segment Lifetime. RFC793 recommends an MSL of 2 minutes.

When the 2MSL timer expires, A enters the CLOSED state from the TIME-WAIT state. For A, the TCP connection is now closed.

insert image description here
insert image description here

TCP's reliable transmission

insert image description here

TCP's reliable transmission is implemented with a sliding window protocol that operates in units of bytes.

Features of TCP reliable transmission:

1. The serial number in the TCP window is not numbered in units of PDUs, but numbered in units of bytes.
2. The send window and receive window of TCP are both greater than 1.
3. The length of the sending window and receiving window of TCP is not fixed, but changes dynamically.
4.TCP supports multiple retransmission mechanisms: timeout retransmission, fast retransmission and SACK retransmission.

Sliding window in bytes
TCP's sliding window operates on the same principle as the continuous ARQ protocol.

Case Analysis:

insert image description here
insert image description here
insert image description hereinsert image description here

insert image description here
insert image description here

Receive Buffer and Receive Window

Due to the influence of time delay, observing the above TCP communication example from the host ns56A, the order of the obtained message segments is different.

Assuming that ns56A sends confirmation information immediately after receiving the message segment, observe the TCP communication example in this section on ns56A, and the sequence of the obtained TCP message segment should be:

insert image description here
insert image description here
insert image description here
insert image description here
insert image description here
timeout retransmission

If a segment is lost or corrupted, TCP uses the previously introduced timeout retransmission mechanism to automatically retransmit the segment that has timed out without being acknowledged.

TCP's timeout retransmission is similar to the GBN protocol: with cumulative acknowledgments, it cannot individually acknowledge segments that arrive out of sequence.

The idea of TCP timeout retransmission is very simple, but choosing the retransmission timeout RTO is more complicated.

TCP measures the round-trip time RTT, calculates the smoothed round-trip time, and from it calculates the retransmission timeout RTO.

Estimation of round-trip time RTT
TCP records the sending time of a segment and the time the corresponding acknowledgment is received; the difference between the two is used as an RTT measurement, also known as an RTT sample, denoted RTTsam.
TCP maintains a weighted average of the RTT, called the smoothed round-trip time, denoted SRTT.
Every time a measurement is performed, TCP calculates a new smoothed round-trip time SRTT according to the following formula:

SRTT = (1 - α) × SRTT + α × RTTsam

In RFC6298, it is recommended that α take a value of 0.125.

The initial value of SRTT shall be set to the first valid RTT sample.

A weighted average like SRTT is called an exponentially weighted moving average; the closer a sample is to the current moment, the greater its weight.

Estimation of overtime retransmission time RTO

RFC6298 defines the RTT variation, denoted RTTV, to estimate the degree to which RTT samples deviate from SRTT.

The RTT variation is also an exponentially weighted moving average. Every time an RTTsam is obtained, TCP calculates RTTV according to the following formula:

RTTV = (1 - β) × RTTV + β × |SRTT - RTTsam|

In RFC6298, it is recommended that β take a value of 0.25.

The initial value of RTTV is set to half of the value of the first RTT sample.

The timeout retransmission time RTO should be slightly greater than the smoothed round-trip time SRTT.

Every time an RTTsam is obtained, TCP updates SRTT and RTTV, and then calculates RTO according to the following formula:

RTO = SRTT + max(G, 4 × RTTV)

G in the formula is the system's clock granularity: even if the calculated RTTV approaches zero, the RTO should still be at least one clock granularity larger than SRTT.
In the Linux system, the TCP clock granularity is 1ms, so the RTO is at least 1ms larger than the SRTT.
RFC6298 suggests setting an upper bound and a lower bound for the RTO. The suggested value of the upper bound is 60 seconds, and the suggested value of the lower bound is 1 second .
Before valid RTT samples are obtained, RFC6298 recommends setting the initial value of RTO to 1 second.
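A minimal sketch of the SRTT/RTTV/RTO bookkeeping following the formulas above; the 1 ms clock granularity and the 1 s / 60 s bounds are the values mentioned in the text, used here for illustration.

```python
class RtoEstimator:
    """RFC6298-style RTO estimation (SRTT, RTTV, RTO); times in seconds."""

    ALPHA, BETA = 0.125, 0.25
    G = 0.001                 # assumed clock granularity (1 ms, as on Linux)
    RTO_MIN, RTO_MAX = 1.0, 60.0

    def __init__(self):
        self.srtt = None
        self.rttv = None
        self.rto = 1.0        # initial RTO before any valid sample

    def update(self, rtt_sample: float) -> float:
        if self.srtt is None:                     # first valid sample
            self.srtt = rtt_sample
            self.rttv = rtt_sample / 2
        else:
            self.rttv = (1 - self.BETA) * self.rttv + self.BETA * abs(self.srtt - rtt_sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_sample
        self.rto = self.srtt + max(self.G, 4 * self.rttv)
        self.rto = min(max(self.rto, self.RTO_MIN), self.RTO_MAX)
        return self.rto

    def on_retransmit(self) -> float:
        self.rto = min(self.rto * 2, self.RTO_MAX)  # Karn: double RTO on retransmission
        return self.rto
```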

RTT sample measurement
insert image description here
Karn algorithm

Karn's algorithm consists of two parts:

  1. After the segment is retransmitted, the segment is not used as the RTT sample.
  2. Every time a segment is retransmitted, the RTO is doubled until no more retransmissions occur.

The Karn algorithm enables TCP to distinguish between valid and invalid samples, ensuring that the RTO calculation results are more reasonable

RTT sample measurement based on timestamp option
Earlier we introduced TCP's timestamp option, which can be used for round-trip time measurement.
When the sender receives an acknowledgment, the round-trip time can be obtained by subtracting the echoed timestamp from the current time.

Using the timestamp option to calculate the round-trip time can obviously avoid the above-mentioned ambiguity, so it is not necessary to use the first part of the Karn algorithm.

insert image description here
Fast retransmission
The timeout retransmission mechanism can achieve reliable transmission, but it has the following disadvantages:

  1. The timeout retransmission mechanism of TCP is similar to the GBN protocol.
  2. TCP's timeout retransmission mechanism will bring greater network load.
  3. According to part ② of the Karn algorithm, the timeout retransmission event will also cause a rapid increase in RTO, which will cause a decrease in network utilization

More efficient fast retransmission mechanisms are defined in RFC5681 and RFC6582.

The fast retransmission mechanism does not rely on retransmission timer expiry; instead, it triggers retransmission based on feedback from the receiver. The fast retransmission mechanism detects packet loss and triggers retransmission through duplicate ACK events.

Since the TCP acknowledgment number is cumulative, when the TCP receiver receives an out-of-sequence segment, the acknowledgment number in the ACK it sends is the same as that of the ACK acknowledging the last in-sequence segment. Such an ACK that re-acknowledges a segment is called a duplicate ACK.

We first introduce the receiver's strategy for sending ACK

ACK-sending policy of the TCP receiver; duplicate ACK threshold
insert image description here
insert image description here

Since the network layer does not guarantee in-order delivery of datagrams, when the TCP sender receives just one duplicate ACK, it cannot tell whether packet loss or out-of-order delivery has occurred.

RFC5681 stipulates that the default value of the duplicate ACK threshold (DupThresh) is 3. Note: the duplicate ACK threshold may be adjusted.

When 3 repeated ACKs are received, it is considered that the message segment after the message segment that has been confirmed 4 times (1 normal confirmation and 3 repeated confirmations) has been lost.

When the TCP sender receives 3 duplicate ACKs, it performs fast retransmission: it retransmits the lost segment immediately without waiting for the retransmission timer to expire.
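A minimal sketch of the duplicate-ACK counting that triggers fast retransmission; the retransmit callback is a hypothetical placeholder, and DupThresh uses the default value described above.

```python
DUP_THRESH = 3  # default duplicate-ACK threshold from RFC5681

class FastRetransmitDetector:
    """Counts duplicate ACKs and triggers fast retransmission at the threshold."""

    def __init__(self, retransmit):
        self.retransmit = retransmit   # hypothetical callback: retransmit(seq)
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int):
        if ack_no == self.last_ack:                       # duplicate ACK
            self.dup_count += 1
            if self.dup_count == DUP_THRESH:
                self.retransmit(ack_no)                   # resend the segment the receiver is missing
        elif self.last_ack is None or ack_no > self.last_ack:
            self.last_ack, self.dup_count = ack_no, 0     # new data acknowledged
```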

insert image description here

Valid ACK

The fast retransmission mechanism belongs to the class of selective repeat protocols. After fast retransmission starts, TCP can retransmit only one segment before receiving a valid ACK.

A valid ACK is an ACK that acknowledges newly arrived data. Valid ACKs include two types: full ACKs and partial ACKs; the distinction between them depends on the definition of the recovery point.

When the sender starts the fast retransmission algorithm, it has already sent a number of out-of-sequence segments; RFC6582 refers to the highest sequence number the sender has sent at this moment as the recovery point.

insert image description here
insert image description here
Key points of the fast retransmission algorithm
The main points of the fast retransmission algorithm of the TCP NewReno version can be summarized as follows:

  1. On receiving 3 duplicate ACKs: record the recovery point, start the fast retransmission algorithm, and retransmit the lost segment;
  2. On receiving a partial ACK: immediately retransmit the next lost segment;
  3. On receiving a full ACK: exit fast retransmission.

TCP's fast retransmission mechanism in effect uses duplicate ACKs as an implicit negative acknowledgment (NAK).

Advantage: compared with timeout retransmission, fast retransmission repairs packet loss more promptly and effectively, improving retransmission efficiency.

Drawback: although multiple lost segments within one window can be retransmitted quickly, the second retransmission only happens after the acknowledgment for the first retransmission is received, so the interval between two retransmissions is more than one RTT. This is inefficient and can still easily trigger timeout retransmission.

SACK retransmission

The fast retransmission mechanism only learns about the next lost segment after receiving a partial ACK, so it can retransmit only one segment within one RTT.

The SACK option of TCP is used to describe segments that have arrived out of sequence.

The SACK option information carried in duplicate ACKs can reflect multiple data gaps at the receiver, so the sender can retransmit multiple segments within one RTT according to the SACK information.

SACK-based retransmission mechanisms also belong to the class of selective repeat protocols. However, TCP's SACK retransmission mechanism is quite different from the selective repeat principle protocol introduced earlier.
TCP is a protocol that operates in a complex network environment; its SACK retransmission mechanism takes more factors into account, and the behavior of the communicating parties is more complicated.

For convenience, the data receiver that uses the SACK option is referred to as the SACK receiver, and the data sender that uses the SACK option is referred to as the SACK sender.

Behavior of the SACK receiver
After receiving an out-of-sequence segment, the SACK receiver temporarily stores the out-of-sequence data in its receive buffer, and then generates and sends back to the sender a segment containing the SACK option.
For the SACK sender, judged by its acknowledgment number, a segment containing the SACK option counts as a duplicate ACK.

A SACK option can contain multiple byte blocks; a byte block in the SACK option is simply called a SACK block.
RFC2018 specifies the rules for the receiver to generate SACK options as follows:

① When generating SACK options, the receiver should fill in as many SACK blocks as possible.
②The first SACK block must indicate the sequence number of the out-of-sequence data that triggers the SACK option.
③ The other SACK blocks indicate the sequence numbers of out-of-sequence data received recently; these SACK blocks were already filled in SACK options sent earlier.

Behavior of the SACK sender
In addition to recording the received cumulative confirmation information, the SACK sender also needs to record the received SACK information.
According to the accumulated acknowledgment information and SACK information, the SACK sender maintains a data structure for recording the sequence number range of out-of-sequence data blocks that have been correctly received and the sequence number range of data gaps .
In RFC6675, this data structure is called the scoreboard.

insert image description here
After receiving the SACK option, the sender starts SACK retransmission if it determines that the segment is lost.

According to RFC6675, based on the duplicate ACK threshold (DupThresh), SACK retransmission is started when one of the following two conditions is met:

  1. DupThresh discontiguous SACK blocks have been received;
  2. Highest data sequence number in the SACK blocks - cumulative ACK number > (DupThresh - 1) × MSS.

The above two conditions can trigger SACK retransmission in the following situations.

  1. 3 discontinuous repeated ACKs are received, and the sender's message segment is small, and the first condition is met, and SACK retransmission is started.
  2. When 3 consecutive repeated ACKs are received, the second condition is met, and SACK retransmission is started.
  3. The sender has lost multiple data segments, and the number of repeated ACKs sent back by the receiver is less than three, but the second condition is met, and SACK retransmission is started.
  4. In the design of the TCP protocol, there is no confirmation and retransmission mechanism for "pure ACK" that does not contain data. If the duplicate ACK sent by the receiver is lost, but the duplicate ACK that arrives later meets the second condition, that is, as long as the sequence number of the received SACK block is large enough, SACK retransmission can also be triggered.

insert image description here

insert image description hereinsert image description here
insert image description here

When SACK is retransmitted, TCP retransmits the vacant segments sequentially from low sequence numbers to high sequence numbers according to the information in the "scoreboard". TCP will not send new data until there are no vacant segments.

Advantages : The SACK-based retransmission algorithm is more flexible than the fast retransmission algorithm in judging the loss of a segment, and can retransmit multiple vacant segments within one RTT. In the case of severe packet loss, it is faster than the fast retransmission algorithm. The retransmission algorithm is more efficient and less likely to trigger timeout retransmissions.

Note 1 : After receiving a SACK, the SACK sender cannot clear the corresponding data in its retransmission buffer. Only after receiving the cumulative ACK can it clear the corresponding data in its retransmission buffer.

Note 2 : RFC2018 stipulates that when TCP initiates timeout retransmission, the information in SACK should be ignored. Even if the SACK confirmation has been received, all message segments after the timed-out message segment need to be retransmitted.

Comparison of three TCP retransmissions
insert image description here

TCP flow control

insert image description here

If the receiver application process reads data relatively slowly, and the sender sends too much data too fast, the sent data may cause the receiving buffer to overflow .

The TCP flow control mechanism adjusts the sending rate based on the advertised window length carried in ACK segments. This approach provides explicit state information from the receiver and avoids overflow of the receiver's buffer.

The stop-and-wait protocol and the continuous ARQ protocol both use a fixed-length send window, which cannot be adjusted to the receiver's situation.
The TCP protocol uses a variable-length send window, which is set according to the receiver's advertised window.

The process of TCP flow control

Each time the receiver receives a segment, it recalculates its receiving window length.
In earlier TCP implementations, the TCP receiver was assigned a fixed-size receive buffer, and the length of the receive window was calculated using the following formula:

Receive window length = number of bytes in the receive buffer - number of in-sequence bytes already buffered but not yet read

Newer TCP implementations add an automatic tuning algorithm for the TCP receive window length. This algorithm comprehensively considers factors such as the currently available buffer capacity and the bandwidth-delay product of the connection, adjusts the receive buffer allocated to the TCP connection, and then calculates the receive window length.

When the receiver sends confirmation information to the sender, it fills the calculated receive window length into the window field in the TCP header and notifies the sender

The sender requires that its sending window must be less than or equal to the notification window . When the impact of congestion control is not considered, the sender sets the sending window equal to the notification window.
The sender sends the segment according to its sending window.

Zero window notification / window update message / window probe message

During flow control, if the receive buffer is exhausted, the receiver sets the advertised window length to 0 and sends a zero window notification to the sender; the sender is then not allowed to send any new data.

After the receiver regains available buffer space, it actively sends a window update message to the sender. Window update messages usually contain no data; they are pure ACKs.

In the design of the TCP protocol, there is no acknowledgment or retransmission mechanism for pure ACKs that carry no data. If the window update message is lost, the sender keeps waiting for a window update while the receiver keeps waiting for new data, and the protocol falls into a deadlock state.

To avoid this deadlock, the TCP sender maintains a persistence timer.

Once a zero window notification is received, the sender starts the persistence timer; when the timer expires, the sender sends a window probe message to query whether the receiver's advertised window has changed.

Silly window syndrome

If the available buffer space obtained by the receiver is very small after the application process reads the data , sending a window update message at this time will cause a decrease in transmission efficiency.

In extreme cases, the sender and the receiver end up exchanging segments containing only 1 byte of data. RFC813 calls this phenomenon the silly window syndrome.

To avoid the silly window syndrome, RFC1122 suggests that TCP send a window update message only when one of the following two conditions is met:

  1. The available buffer can accommodate a full-length segment
  2. Available buffer reaches half of receive buffer space.

Nagle algorithm

When the TCP sender sends data, it will try to encapsulate the full-length segment according to the EMSS value.

For interactive applications , the timeliness of this sending mechanism of TCP is poor.

If every byte of data from an interactive application were encapsulated and sent separately, transmission efficiency would be very low. For interactive applications, TCP widely uses the Nagle algorithm, which balances transmission efficiency and timeliness (a sketch of the sending decision follows the list below).

  1. If the sending application process sends the data to be sent byte by byte to the sending buffer of TCP, the sender sends the first data byte first, and caches all subsequent data bytes.
  2. When the sender receives the acknowledgment of the first data byte, it encapsulates all the data in the sending buffer into a message segment and sends it out, and at the same time continues to buffer the subsequent arriving data.
  3. Only proceed to send the next segment after receiving an acknowledgment for the previous segment.
  4. In addition, a segment is sent immediately when the buffered data has reached half the size of the send window or the maximum segment length has been reached.
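A minimal sketch of the sending decision just listed, assuming a hypothetical transmit primitive; it is a simplification, not the exact logic of any particular TCP stack.

```python
class NagleSender:
    """Nagle-style sending decision: small data is buffered while a segment is
    unacknowledged; a segment is sent when an ACK arrives, when a full EMSS of
    data has accumulated, or when the buffered data reaches half the send window."""

    def __init__(self, transmit, emss: int, send_window: int):
        self.transmit = transmit        # hypothetical primitive: transmit(bytes)
        self.emss = emss
        self.send_window = send_window
        self.buffer = b""
        self.unacked = False            # is a previously sent segment still unacknowledged?

    def on_app_data(self, data: bytes):
        self.buffer += data
        self._maybe_send()

    def on_ack(self):
        self.unacked = False            # previous segment acknowledged
        self._maybe_send()

    def _maybe_send(self):
        full = len(self.buffer) >= self.emss
        half_window = len(self.buffer) >= self.send_window // 2
        if self.buffer and (not self.unacked or full or half_window):
            segment, self.buffer = self.buffer[:self.emss], self.buffer[self.emss:]
            self.transmit(segment)
            self.unacked = True
```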

TCP congestion control

insert image description here

The phenomenon in which a router cannot handle traffic arriving at a high rate and is forced to drop packets is called congestion.

A router in a congested state is called a congested node.
Congestion occurs for many reasons:

nodes have limited buffer space,
output link capacity is too low,
node processors are not powerful enough,
and so on.

Congestion is a complex comprehensive problem, which cannot be solved by simple measures such as increasing resources.

Network congestion will bring many negative effects:
insert image description here
insert image description here
Congestion control
The goal of congestion control is to prevent the network from entering a congested state, that is, to keep the network load before point B. Congestion control means using related algorithms to control the behavior of the TCP sender, preventing too many packets from entering the network and avoiding overload of routers or links in the network. Congestion control is implemented by limiting the sender's sending rate.

Congestion control and flow control:

insert image description here
TCP congestion control method
insert image description here
In 1988, Van Jacobson proposed the initial TCP congestion control algorithm, which is based on the principle of data packet conservation :

A new packet isn't put into the network until an old packet leaves.

The congestion control algorithms proposed by Jacobson, including slow start and congestion avoidance, laid the cornerstone of TCP's congestion control.

insert image description here
TCP's congestion control method is window-based.
TCP adds a state variable called the congestion window (cwnd).

The length of the congestion window depends on the degree of network congestion and changes dynamically with it. The sender keeps its send window no larger than the congestion window so as to control the sending rate. The principle of TCP congestion control is:

If no congestion occurs in the network, increase the congestion window to raise the sending rate and improve throughput;
If congestion occurs in the network, decrease the congestion window to lower the sending rate and reduce the network load.

Note:
Final send window length = min(advertised window length, congestion window length)
In the discussion in this section, the influence of the advertised window is temporarily ignored.

How does the TCP sender monitor the congestion level of the network?
Monitor the three previously introduced retransmission events:

Timeout retransmission: the retransmission timer expires;
Fast retransmission: 3 duplicate ACKs are received;
SACK retransmission: one of the two conditions specified in RFC6675 is met.

When the above three retransmission events occur, TCP considers that there are different degrees of network congestion, and applies different congestion control algorithms for processing.

slow start

The 慢开始算法 must be executed when a TCP connection is first established or after a timeout retransmission event occurs.
After the TCP connection is established, the 初始拥塞窗口 must be set, denoted the 初始窗口IW.
RFC5681 stipulates that the IW value is 2 to 4 SMSS (sender maximum segment size); the specific rules are as follows:
insert image description here
Each time a 有效ACK is received, the congestion window is increased by a value not exceeding 1 SMSS.
The calculation formula stipulated in RFC5681 is as follows:
cwnd += min(N,SMSS)
where N represents the number of bytes that have not been confirmed before and are now confirmed by the ACK that has just arrived.

Obviously, when N<SMSS, every time an ACK is received, the increment of cwnd is smaller than SMSS.

In most cases, the message segment sent by TCP is a full-length message segment. At this time, each time an ACK is received, cwnd increases by 1 SMSS.

From TCP sending a round of message segments to TCP receiving confirmation of these message segments, the elapsed time is approximately equal to one RTT, which we call a transmission round . Using the term transfer round makes it easier for us to describe TCP的拥塞控制算法.

In the slow start phase, the 拥塞窗口cwnd grows exponentially with transmission rounds: cwnd doubles every time a transmission round passes.

insert image description here
insert image description here
关于延迟确认:
快重传中已经介绍,RFC5681中规定:如果TCP接收方收到的两个报文段间隔时间小于500ms,则每两个报文段发送一个ACK,这称为延迟确认

如果采用延迟确认,那么慢开始阶段cwnd的增长速度将放缓。

在某些操作系统的 TCP/IP 实现中,在慢开始阶段采用了快速确认模式,即慢开始阶段不使用延迟确认。

慢开始–ssthresh
什么时候结束这种指数增长呢?慢开始算法提供了以下几种策略:

  1. 拥塞窗口增长超出慢开始阈值ssthresh
  2. 监测到重传事件时

TCP维持一个状态变量叫慢开始阈值ssthresh,也译作慢开始门限。

当cwnd<ssthresh时,TCP采用慢开始算法;
当cwnd>ssthresh时,TCP停用慢开始算法,改用拥塞避免算法:
当cwnd=ssthresh时,TCP选用慢开始算法或者拥塞避免算法。
ssthresh的初值应设置得尽可能高,然后ssthresh值随拥塞控制而调整。

慢开始–超时事件
当监测到超时事件时,TCP停止cwnd的增长,按照以下公式计算ssthresh:
ssthresh = max(FlightSize/2 , 2 x SMSS)
其中,FlightSize为在途数据量,代表已经发出但尚未被累积确认的字节数。
在不考虑通知窗口的限制时,可以近似认为FlightSize ≈ cwnd , 此时ssthresh的计算公式变换为:
ssthresh = max(cwnd/2 , 2 X SMSS)
然后,将cwnd设为1,重新执行慢开始算法。

insert image description here
慢开始算法要点小结:

  1. The initial value of IW is 2~4 SMSS, and the initial value of ssthresh is as high as possible
  2. Every time a valid ACK is received, cwnd+1 smss; that is, every time a round passes, cwnd doubles
  3. If cwnd > ssthresh, stop slow start and execute congestion avoidance algorithm
  4. When a timeout event occurs, set ssthresh = cwnd/2, then set cwnd=1, and re-execute the slow start
  5. When a fast retransmission or SACK retransmission event occurs, execute the fast recovery algorithm

congestion avoidance
insert image description here
insert image description here

Fast recovery
When 3个重复ACK are detected, or 收到的SACK满足RFC6675的两个条件之一, TCP starts 快重传 or SACK重传, and at the same time starts the fast recovery algorithm. The RFCs stipulate that the fast recovery algorithm is implemented together with the fast retransmission algorithm.

RFC5681 specifies the fast retransmission and fast recovery algorithms of the TCP Reno version.
· RFC6582 specifies the fast retransmission and fast recovery algorithms of the TCP NewReno version.
· RFC6675 stipulates the fast retransmission and fast recovery algorithm after enabling SACK support.

In all three versions, after the adjustments made during the fast recovery phase, when the fast recovery algorithm exits, both the cwnd value and the ssthresh value equal half of the cwnd value at the moment fast recovery was started.

After exiting the fast recovery algorithm, TCP starts 拥塞避免算法.

**Note:** During the execution of the TCP fast recovery algorithm, if a timeout retransmission event is detected, TCP will exit the fast recovery algorithm, set cwnd to 1 SMSS, and re-execute the slow start algorithm.

Congestion control state transition
According to different execution algorithms, TCP classic congestion control includes three stages: slow start, congestion avoidance, and fast recovery.

The three stages function as follows:

  1. The slow start phase is the phase in which TCP detects the current network transmission capacity.
  2. The congestion avoidance phase is a stable operation phase of TCP, during which TCP continues to detect possible network resources.
  3. The fast recovery phase is a phase in which TCP adjusts and restores stable operation after discovering network congestion.

insert image description here
The AIMD algorithm
TCP's classic congestion control algorithm is called the AIMD算法:

In the congestion avoidance phase, TCP increases the congestion window cwnd linearly, slowly probing the network's transmission capacity. This feature is called additive increase.

In the fast recovery phase, after the quick adjustment and recovery, TCP sets the slow start threshold ssthresh and the congestion window cwnd to half of the cwnd value when fast recovery was triggered. This feature is called multiplicative decrease.
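
Below is a minimal sketch of how cwnd and ssthresh might evolve per transmission round under slow start, congestion avoidance (additive increase) and loss events (multiplicative decrease). cwnd is counted in units of SMSS, and the full RFC 5681 state machine (for example, the internals of fast recovery) is deliberately simplified.

```python
SMSS = 1  # count cwnd in units of SMSS for simplicity

def next_round(cwnd, ssthresh, event=None):
    """Return (cwnd, ssthresh) after one transmission round."""
    if event == "timeout":
        # ssthresh = max(cwnd/2, 2*SMSS), then restart slow start with cwnd = 1
        return 1 * SMSS, max(cwnd // 2, 2 * SMSS)
    if event == "3dupack":
        # fast retransmit / fast recovery: multiplicative decrease
        half = max(cwnd // 2, 2 * SMSS)
        return half, half
    if cwnd < ssthresh:
        return min(cwnd * 2, ssthresh), ssthresh  # slow start: exponential growth
    return cwnd + SMSS, ssthresh                  # congestion avoidance: +1 SMSS per round

cwnd, ssthresh = 1 * SMSS, 16 * SMSS
for rnd in range(1, 16):
    event = "timeout" if rnd == 8 else None
    cwnd, ssthresh = next_round(cwnd, ssthresh, event)
    print(f"round {rnd}: cwnd={cwnd}, ssthresh={ssthresh}")
```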

insert image description here
AIMD algorithm fairness discussion

What kind of congestion control algorithm is fair?

If, under the control of the congestion control algorithm, the multiple TCP connections passing through a certain node end up sharing the bandwidth resources equally, the algorithm is fair.

·For example: n TCP connections all pass through a router with a maximum processing capacity of X bit/s. Without a congestion control mechanism, if the sender of each connection sends data at its maximum rate, the network becomes congested. Under the adjustment of the congestion control algorithm, if the sending rate of each connection eventually converges to X/n bit/s, the congestion control algorithm is fair.

AIMD algorithm is fair

Two factors affecting the send window value
The two factors that affect the send window swnd are:

  1. Notification window awnd: TCP flow control requires that the sending window of TCP is less than or equal to the notification window
  2. Congestion window cwnd: TCP congestion control requires the sender's sending window to be less than or equal to the congestion window
    swnd = min(awnd,cwnd)

Explanation of the above formula:

当awnd较小时,决定发送方发送速率的是TCP的流量控制
当cwnd较小时,决定发送方发送速率的是TCP的拥塞控制

Network layer assisted congestion control
Network layer assisted congestion control methods include: Active Queue Management AQM and Explicit Congestion Notification ECN
insert image description here

Active Queue Management:

尾部丢弃 (tail drop) strategy: when the queue is full, all subsequently arriving packets are discarded.

The router's 尾部丢弃 causes a batch of packets to be lost, and the TCP senders' 拥塞控制算法 then reduce their congestion window values accordingly.

When multiple TCP connections pass through the bottleneck router in the network, the router's tail drop strategy will cause multiple TCP connections to reduce the congestion window value at the same time. This phenomenon is called TCP全局同步.

In order to avoid the global synchronization phenomenon, the IETF proposed 主动队列管理AQM.

AQM does not wait until the router's queue is full before discarding packets, but when the queue length reaches a certain value or when the network has some signs of congestion, it actively discards some of the arriving packets.

AQM only causes 部分TCP连接降低拥塞窗口值, avoiding global synchronization .
A typical AQM approach is随机早期检测RED

Random Early Detection AQM:

To implement the RED algorithm, the router needs to pre-set three parameters: the minimum queue length threshold min_th, the maximum threshold max_th, and the maximum packet loss probability max_p.
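
A rough sketch of the RED drop-probability curve defined by these three parameters follows; the exponentially weighted averaging of the queue length used by real RED is omitted, and the parameter values in the example are made up for illustration.

```python
def red_drop_probability(avg_queue_len, min_th, max_th, max_p):
    """Drop probability as a function of the (averaged) queue length."""
    if avg_queue_len < min_th:
        return 0.0          # below the minimum threshold: never drop
    if avg_queue_len >= max_th:
        return 1.0          # above the maximum threshold: always drop
    # between the thresholds: ramp linearly from 0 up to max_p
    return max_p * (avg_queue_len - min_th) / (max_th - min_th)

print(red_drop_probability(25, min_th=20, max_th=40, max_p=0.1))  # 0.025
```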
insert image description here
Explicit congestion notification
insert image description here

Chapter 5 Network Layer

Network Layer Overview

The network layer is the most important layer in the Internet architecture , and its main task is to provide host-to-host communication services to the upper layer.

The switching method adopted by the Internet is 分组交换; the key equipment realizing packet switching is the 路由器 in the core part of the network.

The network layer in the router is the focus of this chapter.

Control plane and data plane of traditional network

A router is a specialized computer with multiple interfaces, each connected to a different network. The networks that a router connects can be 异构 (heterogeneous).

The main functions of the router include 分组转发and 路由选择, wherein the packet forwarding function belongs to the data plane, and the routing function belongs to the control plane.

In a traditional network, each router is composed of 实现路由选择功能的控制平面and 实现分组转发功能的数据平面.

Structure of the Router
The core component of the control plane is 路由选择处理机.
The data plane consists of 一组输入接口, 一组输出接口 and 交换结构.
insert image description here
The control plane of traditional networks is implemented in a distributed manner . Every router contains a control plane.

Each router exchanges network topology information with other routers through routing protocols, and independently maintains routing tables (forwarding tables)

The data plane of the traditional network adopts the forwarding policy based on the destination address.
The router looks up the forwarding table according to the destination IP address of the received packet, and forwards the packet.
insert image description here
Control and data planes of software-defined networking

Software-defined networking (SDN) builds a 可编程控制 (programmably controlled) network architecture by separating the control plane from the data plane.

The network switching equipment of SDN only needs to implement the functions of the data plane, and the functions of the control plane are realized on the remote controller.

To distinguish them from traditional routers, the 受控网络交换设备 in SDN are called SDN网元 or SDN交换机.

The control plane of SDN is implemented in a centralized manner .

All of the SDN 控制逻辑 is implemented in the SDN controller, and the SDN controller controls and manages the SDN switches through the 控制数据平面接口CDPI.

The SDN controller maintains the 流表 and sends the flow tables to the SDN switches via the OpenFlow协议.

SDN controllers open programmability to network control applications through the 北向接口.

The data plane of SDN adopts a general forwarding strategy , that is, a forwarding strategy based on a flow table. The matching domain of the flow table is a collection of header fields. The SDN forwarding strategy can match multiple header fields in the protocol stack.

Advantages and disadvantages of data plane and control plane separation:

insert image description here
The current network is still dominated by traditional networks

Although 软件定义网络SDN has been proposed and developed for many years, it still cannot completely replace traditional networks, for the following reasons:

① There is still no unified international standard for SDN;
② A large number of traditional network devices have been deployed on the Internet;
③ The role of the inter-autonomous-system routing protocol of the Internet, the Border Gateway Protocol, is still irreplaceable;

In the current Internet, traditional networks still occupy a large market. The discussion in this book is still dominated by traditional network

The main agreements in this chapter:

Internet Protocol IP: the core protocol of the network layer; the data of transport-layer protocols such as TCP and UDP is carried in IP datagrams.
Internet Control Message Protocol ICMP: provides diagnostic and control information related to network configuration and the disposition of IP datagrams.
Routing protocols: protocols used to exchange routing information, link state information or network topology information between routers, mainly including the 路由信息协议RIP, the 开放最短路径优先OSPF协议 and the 边界网关协议BGP.
Multi-Protocol Label Switching MPLS: provides connection-oriented quality of service for network-layer protocols such as IP, and supports traffic engineering, load balancing, MPLS VPN, etc.; it is widely used by carriers and ISPs.
Related protocol: Address Resolution Protocol ARP

Internet Protocol IP

The IP protocol is a protocol designed to implement 网际互联.

The protocol data unit of IP is usually called IP分组or IP数据报.

The IP protocol shields the implementation details of the underlying network. After the IP protocol is adopted, the protocol entities above the network layer do not need to consider the implementation details of the specific network.

The network that uniformly adopts the IP protocol is also called IP网络or abbreviated IP网.

In an IP network, IP addresses with the same network prefix belong to the same network; IP addresses with different network prefixes belong to different networks.

There are currently two versions of the IP protocol in use, namely IPV4and IPv6.

insert image description here
Scope of the IP protocol
insert image description here
The scope of the IP protocol is 源主机的网络接口to 目的主机的网络接口.
The IP protocol only provides simple, flexible, connectionless, best-effort datagram services to the upper layers.
Each IP datagram 独立发送has nothing to do with the IP datagrams before and after it.

IP datagram format

An IP datagram consists of two parts: a header and data. The fixed part of the header is 20 字节 long and is mandatory for all IP datagrams. Following the fixed portion of the header are optional fields of variable length.

Every IP datagram begins with an IP header. The source computer constructs this IP header, and the destination computer processes the data using the information encapsulated in the IP header. The IP header contains a lot of information, such as source IP address, destination IP address, datagram length, IP version number, etc. Each piece of information is called a field.

insert image description here
insert image description here

The minimum length of the IP header is 20 bytes. The meaning of each field in the above figure is as follows:

1) Version
Occupies 4 bits, indicating the version of the IP protocol. The IP protocol versions used by both communicating parties must be consistent. The currently most widely used IP protocol version number is 4, i.e. IPv4.
2) The header length (Internet Header Length IHL)
occupies 4 bits, and the maximum representable decimal value is 15. The unit of the number represented by this field is a 32-bit word length (a 32-bit word length is 4 bytes). Therefore, when the IP header length is 1111 (that is, 15 in decimal), the header length reaches 60 bytes. When the length of the header of an IP packet is not an integral multiple of 4 bytes, it must be filled with the last padding field.

数据部分永远在 4 字节的整数倍开始,这样在实现 IP 协议时较为方便。首部长度限制为 60 字节的缺点是,长度有时可能不够用,之所以限制长度为 60 字节,是希望用户尽量减少开销。最常用的首部长度就是 20 字节(即首部长度为 0101),这时不使用任何选项。
3) 区分服务(tos)
也被称为服务类型,占 8 位,用来获得更好的服务。这个字段在旧标准中叫做服务类型,但实际上一直没有被使用过。1998 年 IETF 把这个字段改名为区分服务(Differentiated Services,DS)。只有在使用区分服务时,这个字段才起作用。
4) 总长度(totlen)
首部和数据之和,单位为字节。总长度字段为 16 位,因此数据报的最大长度为 2^16-1=65535 字节。
5) 标识(identification)
用来标识数据报,占 16 位。IP 协议在存储器中维持一个计数器。每产生一个数据报,计数器就加 1,并将此值赋给标识字段。当数据报的长度超过网络的 MTU,而必须分片时,这个标识字段的值就被复制到所有的数据报的标识字段中。具有相同的标识字段值的分片报文会被重组成原来的数据报。
6) 标志(flag)
占 3 位。第一位未使用,其值为 0。第二位称为 DF(不分片),表示是否允许分片。取值为 0 时,表示允许分片;取值为 1 时,表示不允许分片。第三位称为 MF(更多分片),表示是否还有分片正在传输,设置为 0 时,表示没有更多分片需要发送,或数据报没有分片。
7) 片偏移(offsetfrag)
Occupies 13 bits. When a datagram is fragmented, this field marks the relative position of the fragment within the original datagram. The fragment offset uses 8 bytes as its unit. Therefore, except for the last fragment, the data part of every other fragment is an integer multiple of 8 bytes (64 bits).
8) The time to live (TTL)
indicates the lifetime of the datagram in the network, occupying 8 bits. This field is set by the originating host of the datagram. Its purpose is to prevent undeliverable datagrams from being transmitted indefinitely across the network, thereby consuming network resources.

The router decrements the TTL value by 1 before forwarding the datagram. If the TTL value decreases to 0, the datagram is discarded and not forwarded. Therefore, TTL indicates the maximum number of routers that the datagram can pass through in the network. The maximum value for TTL is 255. If the initial value of TTL is set to 1, it means that this datagram can only be transmitted in the local area network.
9) Protocol
Indicates the protocol type used by the data carried by the data message, occupying 8 bits. This field can facilitate the IP layer of the destination host to know according to which protocol to process the data part. Different protocols have specific protocol numbers.

例如,TCP 的协议号为 6,UDP 的协议号为 17,ICMP 的协议号为 1。
10) 首部检验和(checksum)
用于校验数据报的首部,占 16 位。数据报每经过一个路由器,首部的字段都可能发生变化(如TTL),所以需要重新校验。而数据部分不发生变化,所以不用重新生成校验值。
11) 源地址
表示数据报的源 IP 地址,占 32 位。
12) 目的地址
表示数据报的目的 IP 地址,占 32 位。该字段用于校验发送是否正确。
13) 可选字段
该字段用于一些可选的报头设置,主要用于测试、调试和安全的目的。这些选项包括严格源路由(数据报必须经过指定的路由)、网际时间戳(经过每个路由器时的时间戳记录)和安全限制。
14) 填充
由于可选字段中的长度不是固定的,使用若干个 0 填充该字段,可以保证整个报头的长度是 32 位的整数倍。
15) 数据部分
表示传输层的数据,如保存 TCP、UDP、ICMP 或 IGMP 的数据。数据部分的长度不固定。
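
As a small illustration of the fixed 20-byte header layout described above, the sketch below unpacks the fields with Python's struct module. Option parsing and checksum verification are omitted, and the function name is just an example.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte IPv4 header (options not handled)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,            # IHL is in 32-bit words
        "total_length": total_len,
        "identification": ident,
        "DF": bool(flags_frag & 0x4000),
        "MF": bool(flags_frag & 0x2000),
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # unit is 8 bytes
        "ttl": ttl,
        "protocol": proto,                             # 6 = TCP, 17 = UDP, 1 = ICMP
        "header_checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
```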

补充
区分服务:
支持区分服务DS功能的结点称为DS结点
跳过区分服务的细节介绍。
IP协议中,对IP数据报采取的转发处理行为称为每跳行为(Per-Hop Behavior,PHB)。
不同的PHB种类代表了不同种类的服务质量。
6位DS字段可以定义64个区分服务码点DSCP。

insert image description here
insert image description here
Default PHB (Default PHB, DF PHB)
The default value of DSCP is all 0, which means that the conventional best-effort delivery IP datagram forwarding strategy is adopted.

Class Selector PHB group (Class Selector PHB, CS PHB)
RFC2474 stipulates that, in order from high to low, bits 0-2 of the DS field remain compatible with the precedence definition of the early Type of Service field; the PHBs corresponding to DSCP values whose bits 3-5 are all 0 are called CS PHBs.

Assured Forwarding PHB (Assured Forwarding PHB, AF PHB)

RFC2597 defines the AF group. According to the order from high to low, the AF group uses bits 0-2 of DSCP to divide the traffic into four AF categories, which are 001, 010, 011 and 100 respectively. For each AF class, use bits 3-5 of DSCP to divide three "discarding priorities", from the lowest discarding priority to the highest discarding priority are 010, 100 and 110 respectively.

An IP datagram whose AF class is i and whose discard priority is j is marked AFij; for example, an IP datagram whose DSCP value is 010110 is marked AF23.
For different AF classes, RFC2597 requires DS nodes to allocate different forwarding resources, such as cache or bandwidth.

The discard priority is only used to cooperate with the active queue management AQM policy of the router. In the same AF class, higher packet loss probability applies to packets with higher "discard priority".

Expedited forwarding PHB (expedited forwarding PHB, EF PHB)
RFC3246 defines expedited forwarding PHB, DSCP value is 101110.
Expedited forwarding EF provides non-congested network services. For EF traffic, the output rate of DS nodes is required to be greater than the input rate. EF traffic is queued only after other EF traffic in one router's queue.
Capacity-Admitted Traffic
is defined by RFC5865, with a DSCP value of 101100. This DSCP is named VOICE-ADMIT and is mainly used for VoIP services.
Less effort PHB (Lower-Effort PHB, LE PHB)
Defined by RFC8622, the DSCP value is 000001, mainly used for low-priority traffic, such as search engine crawlers.

IP fragmentation and reassembly
When the total length of an IP datagram exceeds the MTU, fragmentation is required.
In IPv4, fragmentation can be performed on the 发送方主机 as well as on 任何中间路由器.
Note: IPv6 fragmentation is only allowed on the sending host.
IP reassembly can only be carried out on the 最终目的主机.
The total length field, identification field, flag field and slice offset field are used to complete fragmentation and reassembly of IP.

During the IP fragmentation operation, the header of the original IP datagram is copied into each IP datagram fragment, the values of the 总长度, 标志, 片偏移 and other fields are modified as required, and the 首部校验和 is recalculated.

The length of the data portion of an IP datagram fragment needs to meet the following three conditions:

① 数据部分长度 + 首部长度 ≤ MTU;
② 数据部分长度是8字节的整数倍(最后一个分片可以不满足该条件);
③ 数据部分长度取满足以上两个条件的数值中的最大值。
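
A small sketch that applies the three conditions above to compute fragment data lengths, assuming every fragment carries a plain 20-byte header (no IP options); the numbers in the example are made up.

```python
def fragment_data_lengths(total_data_len, mtu, header_len=20):
    """Split a data part of total_data_len bytes into fragment-sized pieces."""
    max_data = mtu - header_len          # condition 1: data + header <= MTU
    max_data -= max_data % 8             # conditions 2 and 3: largest multiple of 8 bytes
    sizes, remaining = [], total_data_len
    while remaining > 0:
        size = min(max_data, remaining)  # the last fragment may be shorter
        sizes.append(size)
        remaining -= size
    return sizes

print(fragment_data_lengths(3800, 1420))   # e.g. [1400, 1400, 1000]
```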

insert image description here
Common Protocol Field Values
insert image description here

IP packet forwarding

互联网中的主机和路由器都维护了至少一张路由表,用来实现分组转发功能。
主机或路由器查找路由表,将IP数据报从某个网络接口转发出去的过程称为IP分组转发
当互联网中的结点要把一个IP数据报发送给目的主机时,需要判断目的主机是否与自己直接相连。

  1. 转发结点与目的结点位于相同网络,则直接相连。
  2. 如果直接相连,则不需要经过任何路由器,IP数据报就直接发送到目的主机,这个过程称为直接交付。
  3. 如果不是直接相连,则必须把IP数据报发送给某个路由器,由该路由器将IP数据报交付到目的主机,这个过程称为间接交付。

路由表

IP协议没有规定路由表或转发表的精确格式。为了支持CIDR,路由表中每个项目至少应包含以下字段:目的地址、掩码、下一跳和转发接口。

目的地址
目的地址是一个32位值,用于与掩码操作结果做匹配。可以代表以下三种含义:

①目的主机地址:当掩码是32位,即掩码为255.255.255.255时,目的地址仅能匹配某一个主机的IP地址,这样的路由表项目称为特定主机路由
②所有主机:当掩码长度是0位,即掩码为0.0.0.0,且目的地址字段值为0.0.0.0时,该目的地址可以匹配所有的IP地址,这样的路由表项目称为默认路由
③目的网络前缀:当掩码长度是1~31位,目的地址能匹配某个CIDR网络前缀,这样的路由表项目称为目的网络路由

掩码:
掩码指CIDR掩码,长度32位,可以用来和IP数据报中的目的IP地址做掩码操作
下一跳:
下一跳是一个IP地址,指向一个直接相连的路由器,IP数据报将被转发到该地址。
转发接口:
转发接口是一个网络层使用的标识符,用以指明将IP数据报发送到下一跳的网络接口。

路由表的维护可以由系统管理员手动进行,也可以由一个或多个路由选择协议维护。

结点进行分组转发的过程如下:

  1. 获取目的IP地址ID:
    解析待发送IP数据报的首部,读取目的IP地址ID。
  2. 按照最长前缀匹配算法搜索路由表:
    在路由表中搜索所有与ID“匹配”的路由项目。
    所谓“匹配”是指:将ID与路由项目的掩码字段做按位与操作,得到的结果与该项目的目的地址字段值相同。在所有与ID匹配的路由项目中,选出掩码中1的位数最多的路由项目,即最长前缀匹配。
  3. 按照最长前缀匹配的路由项目进行转发
    读取最长前缀匹配的路由项目的接口字段和下一跳字段,将IP数据报从指定接口发送出去。
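
The following is a minimal sketch of the longest-prefix-match lookup described in the steps above; the routing table entries are illustrative only.

```python
import ipaddress

def lookup(routing_table, dest_ip):
    """routing_table: list of (destination, mask, next_hop, interface) tuples."""
    dest = int(ipaddress.IPv4Address(dest_ip))
    best = None
    for destination, mask, next_hop, iface in routing_table:
        m = int(ipaddress.IPv4Address(mask))
        # "match" = (destination IP AND mask) equals the entry's destination field
        if dest & m == int(ipaddress.IPv4Address(destination)):
            if best is None or m > best[0]:     # keep the longest (largest) mask
                best = (m, next_hop, iface)
    return best and best[1:]

table = [
    ("0.0.0.0",       "0.0.0.0",         "10.0.0.1", "eth0"),  # default route
    ("192.168.1.0",   "255.255.255.0",   "10.0.0.2", "eth1"),
    ("192.168.1.128", "255.255.255.128", "10.0.0.3", "eth2"),
]
print(lookup(table, "192.168.1.200"))   # ('10.0.0.3', 'eth2') -- the longest prefix wins
```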

insert image description here

insert image description here

insert image description here
insert image description here
路由表的快速查找
经过精心设计的路由查找算法不需要遍历路由表,即可高效、快速地找到最长前缀匹配项目
最容易想到的方法是按照掩码长度的顺序存储路由表,掩码越长的表项存储位置越靠前,这样按序查找后,最先找到的匹配就是最长前缀匹配。各种路由表的实现中,有更多更高效的算法,如Linux中实现的Hash查找算法和Trie树查找算法。

网际控制报文协议ICMP

The Internet Control Message Protocol (ICMP) is responsible for transmitting error messages and other control information. It must be implemented together with the IP protocol and is generally considered a part of the network layer.
ICMP messages can be divided into two categories: ICMP差错报告报文and ICMP查询/信息报文.
insert image description here

Common type values ​​of ICMP packet format
insert image description here
insert image description here

Common Code Values
insert image description here
​​ICMP Error Report Messages
The data fields in all ICMP Error Report messages have the same format.

Contains a complete copy of the header of the original IP datagram (the IP datagram that caused the error), and the first n bytes of the data portion of the original IP datagram.
insert image description here
The ICMP error report message should contain as much data as possible in the original IP datagram, but ensure that the length of the newly generated IP datagram does not exceed 576字节.

Extended ICMP message format
insert image description here
insert image description here

RFC4884 specifies the format of the ICMP扩展结构, but the objects carried in the extension structure are specified by other RFC documents.

For example: RFC4950 specifies the label stack object for MPLS. The label stack object can be used in destination unreachable packets and timeout packets. When the traceroute program is used, it can be used to record the labels in the forwarding path.
Another example: RFC8335 applies the extended structure to ICMP query/information messages, uses the extended structure to extend the echo request and echo response messages, and proposes a new type of network reachability detection tool Probe.

Destination Unreachable Error Report Message
The Destination Unreachable Error Report message is used to indicate that the IP datagram cannot be delivered to the destination, and there are many possible reasons.
ICMPv4 defines 16 different codes for this message, 5 of which are more commonly used, namely:

Network unreachable
Host unreachable
Port unreachable
Fragmentation required
Communication prohibited by management

Redirection error report message
If a router receives an IP datagram and finds that it is not the best route to deliver the IP datagram to the destination address after looking up the routing table, the router sends a redirection error report message to the source host, while still forwarding that IP datagram to the correct next-hop router.

The router compares the 输入接口 on which the IP datagram arrived with the 下一跳转发接口; if the input interface equals the output interface, a 重定向差错报告报文 needs to be generated.

Timeout Error Report Message

Each router decrements the TTL value in an IP datagram by 1 when forwarding it.
When the TTL value reaches 0, the router discards the IP datagram and sends a 超时差错报告报文 to the source host.
insert image description here
Situations where ICMP error report messages are not generated
According to the provisions of RFC1812, ICMP error report messages should not be generated and sent in the following situations:

ICMP差错报告报文;
第一个IP分片以外的其它IP数据报片;
IP首部校验和验证失败的IP数据报;
目的地址是IPv4广播地址或IPv4多播地址的IP数据报;
作为链路层广播的IP数据报;
源IP地址不是单播地址的IP数据报,或者源地址无效(全零地址、环回地址等)的IP数据报。

ICMP Application Examples

ping / traceroute / tracert / TCP path MTU discovery
insert image description here
ping
insert image description here

traceroute / tracert
insert image description here
TCP's path MTU discovery
insert image description here
insert image description here

routing protocol

The algorithm that realizes the routing function is called the routing algorithm .
The network protocol that 传递必要信息 (conveys the necessary information) for the routing algorithm is called a routing protocol.

The routing algorithm is the core of a routing protocol; its purpose is 找到从发送方到接收方的最佳路由.
Routing algorithms in the Internet define and measure the "best" route according to certain metrics. These metrics can be collectively referred to as cost , and usually the best route refers to the route with the lowest cost .
The goal of the routing algorithm is to find the path with the lowest cost from the source to the destination , that is, the best path.

insert image description here
A graph can be used to formally describe the routing algorithm.

A graph G = (N, E) consists of a set N of vertices and a set E of edges.
A vertex in the graph represents a router or other node, which is the point at which packet forwarding decisions are made.

An edge between two vertices represents a link between two adjacent routers. Each edge has a value representing its cost.
For any edge (x, y) in E, we denote by c(x, y) the cost of edge (x, y).
Once each edge in the graph has a given cost, the routing algorithm finds the lowest cost path from the source to the destination, the best path.

Routing protocols pass and provide edge "costs" to the routing algorithm.
Different routing protocols use different metrics and have different definitions of edge costs.
Even if the network topology is the same, different routing protocols may obtain different optimal routes.

Commonly used routing algorithms include:

  1. Distance Vector Algorithm (DV)
  2. Link State Algorithm (LS)
  3. Path Vector Algorithm (PV)

Autonomous Systems (AS)
The scale of the Internet poses enormous difficulties for routing protocols. The solution to routing scalability is to introduce hierarchy.
To enable hierarchical routing, the Internet is divided into many 自治系统(Autonomous System,AS)

Autonomous system AS refers to a group of routers under the management of a single technology, these routers use a routing protocol and common metrics within the autonomous system.

RFC4271 emphasizes that the key of autonomous system AS is to present a single and consistent routing strategy to other ASs .
Each autonomous system has a globally unique autonomous system number ASN. ASN is managed and assigned by IANA. It was initially specified as a 16-bit value, and RFC6793 extended it to 32 bits.

With the introduction of autonomous systems, Internet routing protocols are divided into two categories:

Interior Gateway Protocol IGP
Exterior Gateway Protocol EGP

insert image description here
The relationship between autonomous systems, interior gateway protocols, and exterior gateway protocols
insert image description here
Routing Information Protocol RIP

RIP is the official standard of the Internet, stipulated by RFC2453, the current version 2, recorded as RIPv2.
RIPv2 supports CIDR, and its biggest advantage is 简单.
RIPv2 supports simple authentication, and RFC4822 provides a supplementary encryption authentication mechanism for RIPv2.
RIP uses a distance vector algorithm (DV) .

Distance Vector Algorithm (DV)

DV algorithm does not need to know the global information of the network, it is an iterative, asynchronous and distributed algorithm.
The basis of the DV algorithm is Bellman-Fordthe equation

d(x, y) = min_v { c(x, v) + d(v, y) }, v ∈ {neighbor vertices of x}

The Bellman-Ford equation gives a way to find the lowest cost way from vertex x to vertex y . The DV algorithm uses the Bellman-Ford equation to solve the shortest path.

Idea: The minimum distance from me to the destination is equal to the minimum value of the sum of the distance from me to the neighbor and the minimum distance from the neighbor to the destination (there are one or more neighbors).
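
A minimal sketch of one distance-vector update step based on the Bellman-Ford equation above; the node names and cost values are made up for illustration.

```python
INF = float("inf")

def dv_update(costs_to_neighbours, neighbour_vectors, destinations):
    """d(x, y) = min over neighbours v of c(x, v) + d(v, y)."""
    my_vector = {}
    for y in destinations:
        my_vector[y] = min(
            (c_xv + neighbour_vectors[v].get(y, INF)
             for v, c_xv in costs_to_neighbours.items()),
            default=INF,
        )
    return my_vector

# x has neighbours a (cost 1) and b (cost 4); their advertised distances to y:
print(dv_update({"a": 1, "b": 4}, {"a": {"y": 5}, "b": {"y": 1}}, ["y"]))  # {'y': 5}
```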

Reference Blog:
Distance Vector DV Algorithm

RIP protocol overview
RIP stipulates that the cost of all routers to the network directly connected to it is 1.

RIP refers to the lowest cost between two routers as "distance", also known as "hop count". Every time a router passes through, the hop count increases by 1.

RIP allows a maximum path "distance" of 15, so a "distance" equal to 16 is equivalent to unreachable.

RIP is only suitable for small autonomous systems .

RIP Update Algorithm
The routing information data maintained in the router running RIP is composed of multiple routing items, and
the message of RIP protocol also includes multiple routing items. A routing entry in the RIP packet from router G is as follows:

insert image description here
insert image description here
Other provisions of the RIP protocol

Routers initially only know routing information for directly connected networks.

RIP stipulates that every interval 30秒, the router needs to send RIP路由更新报文to all neighbors. The routing items in the RIP routing update message include all routing information known by the router. Each router only exchanges and updates routing information with a limited number of neighboring routers.

RIP stipulates a 超时计时器 with a default of 3 minutes. If the timeout timer expires without a routing update message having been received from neighbor router G, all routes whose next hop is G are marked invalid, that is, their distance is set to 16.

RIP stipulates a 垃圾回收计时器 with a default of 2 minutes. When the garbage collection timer expires, invalid routing entries are deleted.

example

insert image description here
"Count to Infinity" Problem
insert image description here
insert image description here
insert image description here
The Solution to the Count to Infinity Problem

split horizon with poison reverse:
简单水平分割 : send routing updates to a neighbor without items learned from that neighbor:
带毒性逆转的水平分割: send routing updates to a neighbor with items learned from that neighbor, but set the distance of these items to 16 .

Routing loops between two routers can be completely avoided. However, when three or more routers form a loop and learn routes from each other, split horizon with poison reverse cannot avoid the count-to-infinity problem.

Trigger update:
Once a router finds that the distance of a routing entry has changed, it immediately sends 路由更新信息 to its neighbor routers.
Using the above mechanisms together reduces the probability of the count-to-infinity problem to a very low level, but still cannot completely eliminate it.

RIP packet format
insert image description here
Reference link:
RIP packet format

Open Shortest Path First OSPF
OSPF is the official standard of the Internet, and IPv4 uses the second version of OSPF, which is stipulated by RFC2328.

OSPF also supports CIDR, and its biggest feature is that it supports layering again in the autonomous system .
OSPF supports identity authentication .
OSPF adopts the link state algorithm .

Link State Algorithm (LS Algorithm)

The LS algorithm is an algorithm that uses 全局信息.

In the LS algorithm, the network topology and all link costs are known , which can be used as the input of the LS algorithm to calculate the minimum cost.
In practice, link costs are usually notified to other routers in the network through link state broadcasts. The result of the link state broadcast is that all routers in the network have consistent data containing the network topology and all link costs, which is called the link state database.

The LS algorithm utilizes the link state database for routing calculation.

基于链路状态的路由选择分为两个阶段:链路状态广播阶段路由计算阶段。OSPF采用的LS算法是Dijkstra算法(前向算法)
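
A minimal sketch of the Dijkstra computation used in the route calculation stage, run over an assumed link-state database given as a nested dictionary {node: {neighbour: cost}}; the topology in the example is invented.

```python
import heapq

def dijkstra(lsdb, source):
    """Lowest cost from source to every node in the link-state database."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry, already improved
        for v, cost in lsdb.get(u, {}).items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (dist[v], v))
    return dist

lsdb = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
print(dijkstra(lsdb, "A"))                 # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```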

参考链接:
OSPF介绍

LS算法介绍

OSPF协议概述

OSPF协议非常复杂和繁琐,
本节仅从以下几个方面介绍OSPF协议的基本原理:

·OSPF的区域划分
·OSPF的链路状态广播
·OSPF的工作过程

OSPF的区域划分
insert image description here

OSPF在自治系统内引入了层次结构,将一个自治系统划分为多个区域。
在上层的区域叫做主干区域,用于连通其它区域。

OSPF规定每个区域必须有一32位的标识符。主干区域的标识符为0.0.0.0,一般称为区域0。划分区域后,链路状态广播被限制在区域内部。

OSPF自治系统内包括四种类型的路由器

  1. 内部路由器
  2. 主干路由器
  3. 区域边界路由器
  4. AS边界路由器

内部路由器:广播发送自己的链路状态信息,参与区域内的路由计算,并从区域边界路由器那里学习其它区域的路由信息。

主干路由器:主干区域的内部路由器。

区域边界路由器:属于多个区域,执行LS算法的多个拷贝,每个拷贝参与一个区域的路由计算。
区域边界路由器负责将所属区域的路由信息汇总后发往主干区域,也将来自主干区域的路由信息汇总后发往自己所属的区域。

AS border router : run OSPF to obtain the routing information in the AS, and also run external gateway protocols such as BGP to learn the routing information outside the AS, and advertise the external routing information in the entire AS.

stub area
insert image description here

OSPF defines a class of special areas that do not need to receive routing information from outside the area; these special areas are called stub areas.
When an area has only one egress, or when the choice of egress does not depend on the topology outside the area, the area can be configured as a stub area.

OSPF Link State Broadcasting
OSPF adopts 可靠洪泛the method to implement link state broadcasting.
The basic steps of a reliable flooding method include flooding and confirmation .
A router running OSPF creates one or more link-state advertisement LSAs , encapsulated in OSPF packets.

OSPF defines 5 types of LSA, and each LSA contains at least the following information:

  1. Create the router ID of the LSA;
  2. The router's 直接相邻的链路信息, including link costs;
  3. Link state sequence number (LS sequence number);
  4. Link state aging time (LS age)

Reliable flooding
LS洪泛 : When router X receives an LSA from router Y in the same area, X will check whether the LSA is new. Only when the new LSA is received , router X updates its LSDB and forwards the new LSA from all other interfaces except the receiving interface.

LS确认: The router that receives the LS update will send the confirmation information to the adjacent router through the current receiving interface. LS confirms that flooding is not required.

insert image description here
insert image description here
The working process of OSPF
insert image description here
Reference link:
Detailed explanation of OSPF

OSPF packet format
insert image description here
Reference link:
OSPF packet format

Characteristics of five types of LSA
insert image description here
OSPF
insert image description here
Border Gateway Protocol BGP

Border Gateway Protocol BGP is the de facto standard of the current inter-domain routing protocol, its current version number is 4, recorded
as BGP-4, and specified by RFC4271.
BGP has long been considered one of the most complex parts of the Internet.

BGP supports CIDR and supports 鉴别.
BGP adopts the 路径向量算法 to find an effective path.
BGP selects the "best" route based on 策略.

The path vector algorithm of BGP only exchanges reachability information with its neighbor routers , and the routing information includes the complete path information to the destination network, so it is called the path vector algorithm .

However, BGP does not seek to find a path with the lowest cost. BGP only seeks to find an effective path leading to the destination network without loops .
When there are multiple valid paths, BGP selects the "best" path based on policies.

Reasons for using path vector algorithm
insert image description here
BGP - Autonomous System AS Classification

According to the different traffic allowed, AS can be classified as follows:

本地流量 refers to traffic whose origin and destination are both within the AS;
中转流量 refers to traffic that passes through an AS in transit.

insert image description here

insert image description here

Each AS has one or more AS border routers .
AS boundary routers are routers responsible for packet forwarding between ASs.
Each AS can have 多个网络前缀.

insert image description here
路径向量交换
每一个参与BGP的AS必须至少选择一个BGP发言人
通常选择AS边界路由器作为BGP发言人,但AS边界路由器并不一定必须成为BGP发言人。
BGP发言人之间通过TCP通信,交换路由信息。
建立TCP连接的两个BGP发言人,彼此成为对方的邻站或对等站。

insert image description here

BGP路由信息的构成:

BGP路由=[前缀,BGP属性]

前缀支持CIDR,由[目的IP地址,掩码]构成

BGP属性有很多,最重要的是AS-PATH属性,AS-PATH属性包含到目的网络的完整路径信息

BGP还可以包含很多其它属性。

insert image description here
AS-PATH属性可以防止环路出现。

AS-PATH中,每个AS用唯一编号ASN表示

对于桩AS,可以不分配ASN

当收到的AS-PATH中包含自己的编号时,说明出现环路,丢弃。

基于策略的最佳路由选择

AS1到N2有两条路径,应该如何选择?
AS1可以根据预先配置的路由选择策略来进行选择。
根据不同的属性可以设定不同的规则。
规则举例:AS跳数较少的路由较好。
所谓策略就是一系列按照顺序执行的规则

insert image description here
insert image description here
insert image description here
insert image description here
BGP报文种类
当TCP连接建立后,BGP首先发送OPEN报文,如果邻站接受这种关系,就用KEEPALIVE报文响应。这样,两个BGP发言人就建立了对等站关系。

When BGP first runs, BGP peers exchange their complete BGP路由表. Afterwards, only the changed parts are updated when routes change. This saves network bandwidth and reduces router processing overhead.

Once a peer-to-peer relationship is established, both parties need to send periodically KEEPALIVE报文to verify that they are still online.
The BGP speaker can use the UPDATE message to announce the addition of a new route, or announce the withdrawal of an outdated route.

When a BGP error is detected, BGP sends a NOTIFICATION message, and then closes the BGP connection

Integration of inter-domain routing and intra-domain routing
After a BGP发言人 receives routing information from its peer, how is the inter-domain routing information made known to the other routers in the local AS?
1. The simplest case: a stub AS with only one AS border router:

· The AS border router only needs to import a default route into the interior gateway protocol, and then the interior gateway protocol diffuses the default route to all routers in the AS.

2. The situation of multi-connected AS and transit AS
Since there are multiple AS border routers, the AS internal router needs to decide which AS border router to send the IP datagram from.
The BGP routing table is very large. If the BGP routing table is imported into the interior gateway protocol and the inter-domain routing information is diffused to the internal routers of the AS by relying on the interior gateway protocol, the network load and the computing overhead of the router will be greatly increased.

Solution: BGP extends its scope of application to the interior of the autonomous system AS. RFC4271 divides BGP into two forms: 外部BGP(External BGP, EBGP) and 内部BGP(Internal BGP, IBGP).

BGP running between different ASs is called EBGP.
BGP running inside the same AS is called IBGP.

There is no difference between EBGP and IBGP in essence, both are for the purpose of transmitting inter-domain routing information

insert image description here
insert image description here
BGP packet format
insert image description here
insert image description here

Concepts related to private network

Network Address Translation (NAT)
专用网 refers to a dedicated network within an enterprise or institution.

Hosts on a private network need 网络地址转换NAT to communicate with hosts on the public Internet.

Addresses on a private network are called 专网地址(Private Address), and they only need to be unique within a certain range. These IP addresses can be reused in different parts of the Internet. Therefore, also known as 可重用地址.

Network address translation is also used to delay the exhaustion of the IP address space.
A globally unique IP address that must be applied for from IANA before use is called a 全球地址(Global Address) or 公网地址 (public IP address).

insert image description here
To use NAT, NAT软件 must be installed on the router that connects the private network to the Internet.

A router with NAT software installed is called a NAT router, and it should have at least one valid global address.
There are two forms of NAT:

  1. Basic NAT
  2. Network Address and Port Translation NAPT

Basic NAT
insert image description here
network address and port translation NAPT

insert image description here

NAT(Network Address Translation,网络地址转换): NAT translates an IP address in the IP datagram header into another IP address, so that a small number of public IP addresses can represent a larger number of private IP addresses. Basic NAT only supports address translation and does not support port mapping.
NAPT(网络地址端口转换): NAPT supports port mapping and allows multiple hosts to share one public IP address, so that multiple machines behind the NAT can interact with the outside world at the same time. NAT that supports port translation can be divided into two categories: source address translation (SNAT) and destination address translation (DNAT). The NAT mentioned below refers to NAPT.
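
A minimal sketch of the NAPT translation-table idea: outbound traffic from a private (IP, port) pair is rewritten to the single public IP with a fresh public port, and replies are mapped back. The class name, port range and addresses are invented for illustration.

```python
class Napt:
    """Toy NAPT mapping table (no timeouts, TCP/UDP details omitted)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}    # (private_ip, private_port) -> public_port
        self.back = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip, private_port):
        key = (private_ip, private_port)
        if key not in self.out:              # allocate a new public port on first use
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def translate_inbound(self, public_port):
        return self.back.get(public_port)    # None if there is no mapping

nat = Napt("203.0.113.5")
print(nat.translate_outbound("192.168.0.10", 51000))  # ('203.0.113.5', 40000)
print(nat.translate_inbound(40000))                   # ('192.168.0.10', 51000)
```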

Regardless of whether NAT or NAPT is used, the address mapping table of the NAT router has two implementation methods:
动态映射and 静态映射.
insert image description here
insert image description here
VPN
Links between the private networks within an organization are usually dedicated links.

insert image description here
A private network that uses the public Internet as the communication carrier between an institution's private networks is called a 虚拟专用网(VPN).

tunnel technology

The technology of creating a virtual link to build a VPN on the public Internet is called tunnel technology .
A tunnel can be thought of as a virtual link between a pair of nodes that can in fact be separated by any number of networks.
The most basic tunnel is the IP in IP tunnel, that is, a tunnel constructed using the IP协议 itself.

By providing the IP addresses of the routers at both ends of the tunnel to the virtual link, a virtual link can be created between the routers at both ends of the tunnel.

In the routing tables of the routers at both ends of the tunnel, this virtual link is like an ordinary link. A tunnel link has one 虚接口号.
The essence of tunneling is re-encapsulation.
Datagrams forwarded through the virtual interface will be encapsulated again, and datagrams from the virtual interface will be decapsulated again.

insert image description here
Tunneling protocols
The IP in IP隧道 is specified by RFC1853.
In an IP in IP tunnel, the data is transmitted across the Internet in 明文 (plaintext) form, and no data security guarantee is provided.
In practice, the protocol used to establish a tunnel usually encrypts the original IP datagram before encapsulating it again.
Common protocols used to build VPNs include:

Generic Routing Encapsulation (GRE)
Point-to-Point Tunneling Protocol (PPTP)
Layer Two Tunneling Protocol (L2TP)
IP Security (Internet Protocol Security, IPsec)

Multiprotocol Label Switching MPLS

多协议标记交换(Multi-Protocol Label Switching,MPLS) is a protocol developed by the MPLS working group of the IETF; it tries to combine some characteristics of virtual circuits with the flexibility of datagrams.
It became an Internet recommended standard in 2001. It is mainly used in the following three aspects:

①使不知道如何转发IP数据报的设备支持基于目的IP地址的转发;
②利用显式路由,支持负载均衡和流量工程;
③利用BGP,支持对等模式VPN。

A router that supports MPLS technology is called a Label Switching Router (LSR).
A set composed of many adjacent LSRs is called MPLS域.
The important feature of MPLS is to mark each IP datagram with a fixed-length "mark" at the entrance of the MPLS domain, and then forward the IP datagram according to the mark.
The MPLS label is a field in the MPLS header, and the MPLS header is located between the data link layer header and the network layer IP header.

insert image description here
Incoming and outgoing labels (入标记和出标记)

An LSR has both 标记交换 and 路由选择 functions.
In order to support and record MPLS labels, the LSR adds two fields to the routing table: 入标记 and 出标记.
入标记: the MPLS label carried in a received packet;
出标记: the MPLS label carried in the packet it sends out.

insert image description here
insert image description here
MPLS forwards IP packets

The MPLS 入口结点 adds a label to each packet according to its destination IP address.

The MPLS 域内结点 forward according to the label and perform label swapping.

The MPLS 出口结点 removes the label.
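
A minimal sketch of per-LSR label forwarding: look up the incoming label, swap it for the outgoing label (or pop it at the egress), and send the packet out of the listed interface. The label numbers and interface names are invented.

```python
def forward(label_table, in_label):
    """label_table: {incoming label: (outgoing label or None, output interface)}."""
    out_label, out_iface = label_table[in_label]
    if out_label is None:
        return f"pop label, deliver as a plain IP datagram via {out_iface}"  # egress LSR
    return f"swap label {in_label} -> {out_label}, forward via {out_iface}"  # interior LSR

lsr_table = {21: (38, "if1"), 38: (None, "if2")}
print(forward(lsr_table, 21))
print(forward(lsr_table, 38))
```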
insert image description here

Advantages of MPLS
Faster forwarding of IP packets

Except for the MPLS ingress node, the other LSRs only need to look up the 入标记 when forwarding packets.
The lookup algorithm for the destination IP address is the 最长前缀匹配算法, whereas the 标记查找算法 is an 精确匹配算法 (exact match), which is more efficient and faster than destination IP address lookup.

Support forwarding based on destination IP address

虽然我们替换了路由表的查找算法,但路由选择算法可以不做改变。
基于目的地址分配标记后,分组经过的路径与不使用MPLS时经过的路径是相同的
以前不知道如何转发IP数据报的设备,在MPLS域内能够支持IP数据报的转发
这种结果可以应用在ATM上,也可以应用在光交换设备上

MPLS中的核心概念就是转发等价类FEC
转发等价类FEC是指路由器按照同样方式对待的IP数据报的集合。
显然,每个MPLS标记与一个转发等价类FEC一一对应。
上例中,路由表中每个相同的网络前缀是一个FEC,即所有网络前缀相同的分组,都沿着相同的路径进行转发。这种转发等价类等价于基于目的IP地址的转发
FEC是非常强大和灵活的概念,划分FEC的方法不受什么限制,并不局限在基于目的IP地址划分FEC。

可以将所有源地址与目的地址都相同的IP数据报划分为一个FEC,
也可以将具有某种服务质量需求的IP数据报划分为一个FEC。


insert image description here
Format of the MPLS header
Label value: occupies 20 bits, so in theory at most 2^20 labels are allowed. RFC3032 stipulates that labels 0~15 are reserved for special purposes.
Traffic Class (TC): occupies 3 bits; initially these 3 bits were reserved for experiments. RFC5462 renamed them the traffic class field, which carries 服务质量信息 and 显式拥塞通知ECN.
Bottom-of-stack flag S: occupies 1 bit. It is used when multiple MPLS headers are inserted to form a 标记栈; the S bit of the MPLS header at the bottom of the stack, i.e. the last MPLS header, is set to 1.
Time to live TTL: occupies 8 bits. Each time the packet passes through an LSR, the TTL is decremented by 1; when the TTL reaches 0, the packet is discarded.


MPLS supports traffic engineering
Packets carrying 相同标记 are all forwarded along the same path. In the MPLS domain, such a path is called a label switched path (LSP).
Obviously, a forwarding equivalence class corresponds to a label switched path LSP.
Each LSP is determined when the ingress node assigns a label to the packet. This method, in which the ingress node determines the forwarding path through the MPLS domain, is called explicit routing.
Explicit routing is completely different from the "hop-by-hop routing" discussed earlier. Explicit routing can be applied to load balancing and traffic engineering.

insert image description here
Traditional hop-by-hop routing only forwards according to the destination IP address.
Explicit routing can flexibly define FEC, such as dividing FEC according to different sources of packets, assigning labels, and completing load balancing of network traffic.
The practice of network administrators using self-defined FEC to balance network loads is called traffic engineering.
In MPLS traffic engineering, explicit routes do not require manual calculation by the network administrator; they can be computed automatically using the 约束最短路径优先算法 (Constrained Shortest Path First algorithm).

支持MPLS VPN
insert image description here
重叠模式VPN和对等模式VPN
insert image description here
对等模式VPN

在MPLS网络上,构建对等模式VPN,是MPLS最广泛的应用,也是MPLS在互联网工程界被广泛部署的原因之一。
RFC4364规定了利用MPLS和BGP实现对等模式VPN的技术方案。这种VPN被称为MPLS L3VPN,也称为BGP/MPLS VPN.
MPLS的显式路由可以用来构建隧道,BGP被用来传递VPN的路由信息和分发MPLS标记。
RFC4760对BGP-4进行了多协议扩展,使BGP可以为三层VPN等多种协议传递路由信息。扩展后的BGP称为MP-BGP,属于HBGP.
MP-BGP还可以为MPLS分发专网标记,以区别VPN用户数据的归属。

BGP/MPLS VPN

CE设备将VPN的路由信息发送给PE设备。
PE设备之间利用MP-BGP交换路由信息,得到到达其它VPN子网的路由
利用MP-BGP分发MPLS标记,
这些MPLS标记用来在PE设备之间构建MPLS隧道。

insert image description here
insert image description here
insert image description here
insert image description here

第六章 数据链路层

数据链路层概述

在IEEE 802的系列标准中,将所有运行了数据链路层(二层)协议的网络设备称为站点或站(station)。

结点通常是指运行了网络层(三层)协议的网络设备,如主机、路由器。
链路是指从一个站点到相邻站点之间的一段物理线路(有线或无线),中间没有任何其它的站点,也称为物理链路
数据链路是指在一段物理链路之上增加了控制数据传输的协议软件或硬件,也称为逻辑链路。

insert image description here
The protocol data unit (PDU) of the data link layer is called a 帧 (frame).
The source address and destination address contained in each data frame are called 硬件地址or 物理地址.

The source address of the data frame is the hardware address of the sending interface of the sender node; the
destination address of the data frame is the hardware address of the receiving interface of the next-hop node.

Each time the frame passes through a router, the 源地址 and 目的地址 in the data frame change.

Scope of the data link layer protocol
insert image description here
The main task of the data link layer is to provide communication services between adjacent nodes. The scope of action of most data link layer protocols is between adjacent nodes .

When the wireless AP is connected to the Ethernet and the WLAN, the scope of the Ethernet protocol and the WLAN protocol is small, and cannot cover between two adjacent nodes, but only between two adjacent stations.

data link layer channel

There are two main types of channels used by the data link layer: 广播信道and点对点信道

insert image description here

Encapsulation into frames
封装成帧 is the most basic function of the data link layer.

Since frames are transmitted at the physical layer 比特流, the data link layer must be able to determine the boundaries of each frame when it is received. Determining where a frame begins and where a frame ends is called frame delimitation.

Frame length and MTU
insert image description here
The 帧长 equals the length of the frame's data part plus the lengths of the frame header and frame trailer.

The upper limit of the length of the data part that the data link layer protocol can transmit is called 最大传送单元MTU.

Obviously, in order to improve the transmission efficiency of the frame, the length of the data part of the frame should be as close as possible to the MTU.

Frame Delimitation Methods
Typical frame delimitation methods include:

  1. flag character method
  2. flag bit method
  3. Special Physical Layer Coding Method

The network layer protocol does not need to understand the frame delimitation method adopted by the data link layer, that is, the frame delimitation method is transparent to the network layer, which is called transparent transmission .

Different frame delimitation methods implement transparent transmission in different ways.

Flag character method
定界方法 : For protocols that use characters as the basic transmission unit, special characters can be designated as frame start and frame end flag characters, called frame delimiters.

透明传输: If the flag character appears in the data, it will interfere with the realization of the frame delimitation function. The character filling method is used to realize transparent transmission. (Escapes)
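
A minimal sketch of the character (byte) stuffing idea: any flag or escape byte occurring in the payload is escaped before framing, so the receiver never mistakes data for a delimiter. The byte values loosely follow PPP's conventions, but the simple prefix-escape scheme here (no XOR with 0x20) is an illustration, not the real PPP procedure.

```python
FLAG, ESC = b"\x7e", b"\x7d"   # frame delimiter and escape byte

def stuff(payload: bytes) -> bytes:
    """Escape ESC and FLAG bytes in the payload, then add the delimiters."""
    body = payload.replace(ESC, ESC + ESC).replace(FLAG, ESC + FLAG)
    return FLAG + body + FLAG

def unstuff(frame: bytes) -> bytes:
    """Strip the delimiters and undo the escaping."""
    body = frame[1:-1]
    return body.replace(ESC + FLAG, FLAG).replace(ESC + ESC, ESC)

data = b"\x01\x7e\x02\x7d\x03"
assert unstuff(stuff(data)) == data
print(stuff(data).hex())
```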

insert image description here
Typical protocol: When the PPP protocol is used for asynchronous transmission, the flag character method is used.
insert image description here
Flag bit method
定界方法 : For a protocol that uses bits as the basic transmission unit, a special bit combination can be specified as a frame start or frame end mark.

透明传输: The bit stuffing method is used to avoid the flag bit combination in the transmitted data and realize transparent transmission.

insert image description here
Special physical layer coding method
定界方法 : Use special coding of the physical layer to mark the boundary of the frame. Requirement: Redundant coding is included in the physical layer coding scheme .

透明传输: Since redundant coding does not appear in regular data, transparent transmission can be achieved without additional processing.

insert image description here
insert image description here
addressing

The data link layer protocol on the broadcast channel must have an addressing function , and the network interface of the host or router on the broadcast channel must have a hardware address to be able to send and receive data frames.

There is only one sender and receiver on the point-to-point channel, so the addressing function of the data link layer protocol is not necessary.
insert image description here

Both the Ethernet protocol and the WLAN protocol use 48位 hardware addresses, also called MAC addresses.
On an Ethernet, the network interfaces of hosts and routers must have MAC地址, but the interfaces of a layer-2 switch are allowed to have no MAC address.

MAC address
IEEE stipulates two types of extended unique identifier EUI, which are EUI-48 with a length of 6 bytes and EUI-64 with a length of 8 bytes.
EUI-48 is usually used as the hardware address of IEEE802 network equipment, such as 以太网MAC地址, WLAN的MAC地址.

The MAC address is divided into two parts, each of which occupies 3 bytes.

  1. Organizationally Unique Identifier OUI: the first three bytes (upper 24 bits), assigned by the IEEE Registration Authority;
  2. Extended identifier EI: the last three bytes (lower 24 bits), which are assigned by the manufacturer who obtained the OUI.

The lowest bit of the first (highest) byte is called the I/G位;
the second-lowest bit of the first byte is called the U/L位;
the address with all 48 bits set to 1 is the broadcast address.
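
A small sketch of inspecting those bits on an EUI-48 address string; the helper name and the example addresses are invented.

```python
def mac_info(mac: str) -> dict:
    octets = bytes(int(part, 16) for part in mac.split(":"))
    first = octets[0]
    return {
        "I/G bit (group address)": bool(first & 0x01),   # 0 = individual, 1 = group
        "U/L bit (locally administered)": bool(first & 0x02),
        "broadcast": octets == b"\xff" * 6,              # all 48 bits are 1
        "OUI": mac.upper()[:8],                          # the first three bytes
    }

print(mac_info("00:1a:2b:3c:4d:5e"))
print(mac_info("ff:ff:ff:ff:ff:ff"))
```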
insert image description here
Error control
Any communication link is not ideal, errors will occur in the transmission process, 0 may become 1, 1 may also become 0, this is called 比特差错.
In the data link layer, different protocols provide different degrees of error control functions, including 无差错接受or 可靠传输.

Data link layer protocols on links with very low bit error rates such as optical fibers, coaxial cables, and twisted pairs, such as Ethernet protocols and PPP protocols , only provide 无差错接受functions, not 可靠传输functions.

Data link layer protocols on links with high bit error rates such as wireless links, for example the wireless LAN (WLAN) protocol, rely on an acknowledgment-and-retransmission mechanism to provide 可靠传输 between neighboring MAC站 (MAC stations).

No matter it is the realization of error-free reception function or reliable transmission function, an error detection algorithm is needed to discover bit errors.

Error Detection Algorithm CRC

At the data link layer, the currently widely used error detection algorithm is Cyclic Redundancy Check (CRC)

Redundant code and frame check sequence FCS
The sender divides the data into groups and sends one group of data at a time.
Assuming each group of data has k bits, denote the data to be sent as M.
The sender uses the CRC algorithm to append r redundant bits for error detection after the data M, and sends them together.
For a frame, the redundant code added for error detection is usually called the 帧检验序列FCS.
Adding the r bits increases the overhead of data transmission, but provides error detection capability.

insert image description here
insert image description here

CRC checking is also known as 多项式编码 (polynomial coding); its basic idea is to regard a bit string as a polynomial with coefficients 0 or 1, so that operations on bit strings are interpreted as polynomial arithmetic.

The generator polynomial P(x) agreed upon by the sender and the receiver must have its highest and lowest coefficients equal to 1.

The carefully selected generator polynomial P(x) can ensure that the rate of missed judgment of the CRC algorithm is extremely low.

CRC-32 is used by IEEE in various data link layer protocols including Ethernet:
insert image description here
insert image description here
CRC is an error detection code
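
A minimal bitwise sketch of the CRC idea described above: append r zero bits to the data M, divide by the generator using XOR, and use the r-bit remainder as the FCS; the receiver checks that the received data plus FCS leaves an all-zero remainder. The 4-bit generator 1011 (x^3 + x + 1) is only an illustration, not CRC-32.

```python
def crc_remainder(data_bits: str, generator: str) -> str:
    """Remainder of (data_bits followed by r zeros) divided by generator, as a bit string."""
    r = len(generator) - 1
    bits = list(data_bits + "0" * r)            # M shifted left by r bits
    for i in range(len(data_bits)):
        if bits[i] == "1":                      # XOR the generator in at this position
            for j, g in enumerate(generator):
                bits[i + j] = str(int(bits[i + j]) ^ int(g))
    return "".join(bits[-r:])                   # the r-bit FCS

fcs = crc_remainder("101110", "1011")
print(fcs)                                      # '110'
# The receiver's check: data + FCS must leave an all-zero remainder.
assert crc_remainder("101110" + fcs, "1011") == "000"
```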

Media access control
媒体接入控制(Medium Access Control,MAC)协议 is used to specify how the shared channel is accessed and used.

On a point-to-point channel, there is only one sender and one receiver, and as long as the channel is full-duplex, there is no need for a MAC protocol. For example, PPP does not require the MAC protocol .

在广播信道上的MAC协议也称为多点接入协议,或多路访问协议
实现媒体接入控制的方法可以分为三类:

  1. 静态信道分配方法、
  2. 随机接入方法
  3. 受控接入方法。
    其中,静态信道分配方法属于物理层的方法。

静态信道分配方法
常见的静态信道划分方法包括频分复用时分复用码分复用
码分复用也称为码分多址CDMA
insert image description here
insert image description here

随机接入方法

随机接入方法是一种基于争用的信道分配方法。
随机接入的特点是所有站点可随机地发送数据
如果恰巧有两个或更多的站点在同一时刻发送数据,那么在共享信道上就会产生碰撞,也称为发生了冲突。随机接入方法必须有处理冲突的方法

随机接入的MAC协议主要包括:

  1. 纯ALOHA
  2. 隙ALOHA
  3. CSMA/CD:带有碰撞检测的载波监听多点接入
  4. CSMA/CA:带有碰撞避免的载波监听多点接入

ALOHA
insert image description here
CSMA/CD

不再采用统一长度的帧,在ALOHA协议的基础之上增加了更多的控制措施。
可以用以下四句话描述:

  1. 发送前先监听
  2. 闲则发送,忙则等待
  3. 边发送边监听
  4. 碰撞则停发,随机退避重传

CSMA/CD采用二进制指数退避算法进行随机退避。随着重传次数的增加,增大随机退避的时间范围。
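
A minimal sketch of the truncated binary exponential backoff commonly specified for CSMA/CD Ethernet: after the i-th collision, wait a random number of contention periods drawn from 0 to 2^min(i,10) − 1, and give up after 16 attempts. The function name is illustrative.

```python
import random

def backoff_slots(collision_count: int) -> int:
    """Number of contention periods (slot times) to wait after a collision."""
    if collision_count > 16:
        raise RuntimeError("16 collisions in a row: the frame is discarded")
    k = min(collision_count, 10)                 # the exponent is capped at 10
    return random.randint(0, 2 ** k - 1)         # pick uniformly from 0 .. 2^k - 1

print([backoff_slots(i) for i in range(1, 6)])   # the range widens as collisions repeat
```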

CSMA/CD的争用期、传输期和空闲期

insert image description here
Application of CSMA/CD
insert image description here
When the distributed coordination function DCF is used in a wireless local area network WLAN, the MAC protocol adopted is the CSMA/CA协议.

In the wireless local area network, since the collision detection cannot be performed , as long as the data is sent, the transmission will not be stopped halfway, but the entire frame will be sent. Therefore, CSMA/CA replaces the collision detection mechanism in Ethernet with acknowledgment retransmission and collision avoidance mechanism .

Although the collision avoidance mechanism can reduce the probability of collision, it cannot completely avoid the collision.

Controlled access

The controlled access method is a contention-free channel allocation method.
Its defining feature is that the data sent by a station is subject to certain control, so collisions never occur. Controlled access protocols include polling and token passing.
insert image description here

Flow control is an optional feature of the data link layer protocol.

Flow control is used to solve the problem that the sending speed of the sender exceeds the processing speed of the receiving side.

In the TCP/IP protocol family, the TCP protocol at the transport layer relies on the sliding window mechanism to realize end-to-end flow control. Therefore, many data link layer protocols do not provide a flow control function, leaving flow control to the transport layer.

However, the Ethernet protocol does define a flow control function.
In Ethernet, flow control is optional; it can be enabled by the user or through auto-negotiation, but in practice it is generally not used.

Ethernet defines two flow control mechanisms: the back-pressure mechanism and the pause frame.
insert image description here
Main protocols in this chapter
Ethernet Protocol Ethernet
Address Resolution Protocol ARP
Wireless Local Area Network Protocol WLAN
Point-to-Point Protocol PPP
insert image description here

Ethernet

insert image description here

Development trend of Ethernet
After decades of development, the development trend of Ethernet includes the following aspects:
from low speed to high speed,
from shared transmission medium to exclusive transmission medium,
from LAN to MAN and then to WAN

From low speed to high speed
The earliest experimental Ethernet ran at only 3 Mbit/s.
The first standardized Ethernet ran at 10 Mbit/s and is called traditional Ethernet.
The 100 Mbit/s Ethernet released in 1995 is called Fast Ethernet.
Ethernet with rates above 100 Mbit/s is known as high-speed Ethernet and runs over single-mode fiber or multi-mode fiber.
Ethernet standards with rates up to 400 Gbit/s were released in 2019 and 2020.

insert image description here
From shared transmission media to exclusive transmission media

Traditional Ethernet uses coaxial cable as a shared transmission medium and is also known as shared Ethernet.
Multiple stations are attached to a shared bus, and the CSMA/CD protocol coordinates their data transmissions.
With the advent of 10BASE-T, stations are connected to a hub through twisted pair instead of coaxial cable, giving rise to star-shaped shared Ethernet.
· It still uses the CSMA/CD protocol.

With the development and popularity of 100BASE-T Ethernet, switches gradually replaced hubs.
A network in which switches completely replace hubs and which works in full-duplex mode is called a full-duplex network.
Each station in a full-duplex network has exclusive use of its transmission medium, and the CSMA/CD protocol is no longer used.
insert image description here
From the local area network to the metropolitan area network and then to the wide area network
insert image description here
Ethernet frame format
The original Ethernet standard was jointly launched by DEC, Intel, and Xerox and was called the DIX Ethernet standard. After the alliance released the Ethernet V2 standard, it stopped updating it.
The IEEE 802 committee slightly modified the Ethernet V2 frame format and introduced the 802.3 standard Ethernet frame format.
To support the interconnection of various types of LANs, IEEE 802.3 divides the data link layer into two sublayers: the Media Access Control (MAC) sublayer and the Logical Link Control (LLC) sublayer.

Therefore, the data link layer header is divided into a MAC header and an LLC header. Their relationship is as follows:
insert image description here

As other types of LANs were eliminated by Ethernet, IEEE withdrew the 802.2 standard in 2010, and the LLC header is no longer used in the 802.3 Ethernet standard. This section discusses Ethernet without considering the LLC sublayer.
Currently, the LLC header is still used in 802.11 wireless LANs.

802.3 Ethernet frame format
insert image description here
insert image description hereinsert image description here
In an IEEE packet, the preamble, the start frame delimiter SFD, and the carrier extension field belong to the physical layer; the other fields belong to the data link layer and together constitute the MAC frame.

Preamble
Occupies 7 bytes of alternating 1s and 0s. Its purpose is to let the receiving adapter quickly adjust its clock frequency when a MAC frame arrives, achieving bit synchronization.
Start frame delimiter SFD
Occupies 1 byte with the value 10101011. The first six bits serve the same purpose as the preamble; the final two consecutive 1s indicate that the frame is about to be transmitted.

Destination address:
6 bytes, the receiver's MAC address.
Source address:
6 bytes, the sender's MAC address.

Type/Length
2 bytes. For historical reasons this field has two meanings. When the value is less than or equal to 1500, it is interpreted as a length field, giving the number of bytes of MAC client data in a basic frame.
When the value is greater than or equal to 1536, it is interpreted as a type field, identifying the MAC client protocol type.

MAC Client Data
The MAC client data contains an optional tag plus the upper-layer protocol data.
IEEE 802.3 defines three types of Ethernet frames: the basic frame, the Q-tagged frame, and the envelope frame.
The basic frame carries no tag; the Q-tagged frame and the envelope frame carry tags.

insert image description here
In all three frame types, the maximum length of the encapsulated upper-layer protocol data is 1500 bytes; in other words, the MTU of Ethernet is 1500 bytes.

Padding
The minimum Ethernet frame length is 64 bytes. When the data handed down by the upper layer is shorter than 46 bytes, the encapsulated Ethernet frame would be shorter than 64 bytes, so the Ethernet protocol entity must fill the padding field with 0s.
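
A minimal sketch of this padding rule, assuming the field sizes described above (a 6 + 6 + 2 byte header and a 4-byte FCS); the helper name is illustrative.

```python
# Pad the MAC client data so that the whole frame reaches the 64-byte minimum.
MIN_FRAME = 64          # bytes, not counting preamble and SFD
HEADER = 6 + 6 + 2      # destination + source + type/length
FCS_LEN = 4

def pad_client_data(data: bytes) -> bytes:
    min_data = MIN_FRAME - HEADER - FCS_LEN    # = 46 bytes
    if len(data) < min_data:
        data = data + b"\x00" * (min_data - len(data))
    return data

print(len(pad_client_data(b"hello")))     # 46: padded up from 5 bytes
print(len(pad_client_data(b"x" * 100)))   # 100: already long enough, no padding
```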

Frame check sequence FCS
Occupies 4 bytes and carries the CRC check value. The check covers the destination address, source address, type/length, MAC client data, and padding fields.
Ethernet's FCS is computed with the IEEE-standard CRC-32 algorithm.

Carrier extension
In 1 Gbit/s Ethernet operating in half-duplex mode, a carrier extension field consisting entirely of 0s must be appended after a short frame in order to keep the CSMA/CD protocol valid.
In full-duplex mode, this field is not needed.

Interframe space

Ethernet stipulates that after each frame is sent, a station must stop and wait for a short period before sending the next frame. This period is called the interframe space. It serves two purposes:

  1. When the CSMA/CD protocol is running, pausing transmission leaves the channel idle, making it easier for multiple stations to contend for the channel;
  2. The pause marks the end of a frame, and when a new frame is sent the preamble and SFD mark its start (as stipulated for traditional 10 Mbit/s Ethernet).

The interframe space stipulated by IEEE for Ethernet is 96 bit times.

For 10Mbit/s traditional Ethernet, the interframe interval is 9.6μs; for 100Mbit/s Fast Ethernet, the interframe interval is 0.96μs
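
These values follow directly from the 96-bit rule, as the quick calculation below shows.

```python
# Interframe gap = 96 bit times, so the duration shrinks as the rate grows.
for rate_bps in (10e6, 100e6, 1e9):
    gap_us = 96 / rate_bps * 1e6
    print(f"{rate_bps / 1e6:.0f} Mbit/s -> {gap_us:.3f} microseconds")
# 10 Mbit/s -> 9.600, 100 Mbit/s -> 0.960, 1000 Mbit/s -> 0.096
```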

Bridges and Switches

We can use repeaters or hubs to extend the geographical coverage of Ethernet.
Both repeaters and hubs are physical layer devices, and all stations connected through the interfaces of a repeater or hub are in the same collision domain.

A collision domain is the range that the physical layer signal sent by a station can reach.

Among the stations in one collision domain, only one station can send data at any given time; otherwise a collision occurs.
With the popularity of 100BASE-T networks and switches, hubs were eventually replaced by switches. A switch is essentially a multi-interface, high-performance bridge.
A switch works at the data link layer and is also called a layer-2 switch.
The stations connected under a single switch port are in the same collision domain, while different switch ports belong to different collision domains.
insert image description here

The working principle of the switch
insert image description here
After the switch receives a data frame, it first fills in the switching table by self-learning, and then forwards the data frame.
When an entry in the switching table reaches its aging time (its "validity period"), the switch deletes that entry.
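
The sketch below illustrates this self-learning and forwarding logic with a plain dictionary; the 300-second aging time is a commonly used default, and all names are illustrative.

```python
import time

AGING_TIME = 300.0        # seconds; a commonly used default, assumed here
switching_table = {}      # source MAC -> (incoming port, time last seen)

def handle_frame(src_mac: str, dst_mac: str, in_port: int, ports: list):
    # Self-learning: remember which port the source address was seen on.
    switching_table[src_mac] = (in_port, time.time())
    # Delete entries whose aging time has expired.
    now = time.time()
    for mac in list(switching_table):
        if now - switching_table[mac][1] > AGING_TIME:
            del switching_table[mac]
    # Forwarding: use the table if possible, otherwise flood.
    if dst_mac in switching_table:
        out_port = switching_table[dst_mac][0]
        return [] if out_port == in_port else [out_port]
    return [p for p in ports if p != in_port]    # flood to every other port

print(handle_frame("AA", "BB", in_port=1, ports=[1, 2, 3, 4]))  # flood: [2, 3, 4]
print(handle_frame("BB", "AA", in_port=3, ports=[1, 2, 3, 4]))  # learned: [1]
```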

Example of the switch's self-learning and forwarding algorithms
insert image description here

insert image description here

Spanning Tree Protocol
When multiple switches are interconnected, redundant links may cause frames to circulate in loops, leading to broadcast storms, switching-table thrashing, and other problems.

insert image description here
virtual local area network

IEEE stipulates that an address whose 48 bits are all 1 is the broadcast address, and a frame whose destination address is the broadcast address is called a broadcast frame. A broadcast frame is delivered to all hosts on the same Ethernet.

With the widespread use of switched Ethernet, the number of hosts on a single Ethernet keeps growing. To use network bandwidth more effectively, the range of Ethernet broadcasts must be limited.

The range that a broadcast frame sent by a station can reach is called a broadcast domain.
The stations connected to all ports of a switch are in the same broadcast domain.
A router does not forward an Ethernet broadcast it receives, so the stations connected under each router interface form their own broadcast domain.

insert image description here
virtual local area network

Virtual LANs can be used to isolate broadcast domains within the same Ethernet.
A virtual local area network (VLAN) is a logical group of LAN segments that is independent of physical location, as shown in the figure below.
A switch that supports the VLAN function is called a VLAN switch.

insert image description here
Q tag frame format

Tag Protocol Identifier TPID:
insert image description here
Three Q Tags:
insert image description here
Priority Code Point PCP:
insert image description here
Drop Eligible Indicator DEI:
insert image description here
VLAN ID:
insert image description here

Address Resolution Protocol ARP

The ARP protocol provides a way to resolve a network-layer IP address into the corresponding data link layer hardware address.

What ARP needs to do, within each network along the path, is resolve the IP address of the next-hop node into its corresponding MAC address.
The Ethernet protocol can then write the resolved MAC address into the frame header as the destination MAC address to complete the transmission.
For indirect delivery, the next-hop IP address is obtained from the routing table; for direct delivery, the next-hop IP address is the destination address itself.

insert image description here
ARP address resolution process
insert image description here
ARP message format:

insert image description here
Reference link:
ARP packet format

ARP cache

The ARP cache is used to store the latest mapping between IP addresses and MAC addresses .
Each network interface of a host or router maintains an ARP cache table.
The ARP cache table includes three fields:

IP地址
MAC地址
生存时间

When a host needs to resolve the MAC address corresponding to an IP address, it first queries its own ARP cache table and only sends a broadcast ARP request when no entry is found.
Using the ARP cache greatly reduces the number of ARP requests and the amount of broadcast traffic in the network.
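
A minimal sketch of such a cache is shown below; the 120-second lifetime and the function names are illustrative assumptions, not values taken from any standard.

```python
import time

ARP_LIFETIME = 120.0      # seconds; illustrative value
arp_cache = {}            # IP address -> (MAC address, time learned)

def lookup(ip: str):
    entry = arp_cache.get(ip)
    if entry and time.time() - entry[1] < ARP_LIFETIME:
        return entry[0]          # cache hit: no broadcast needed
    return None                  # cache miss: send a broadcast ARP request

def learn(ip: str, mac: str):
    """Record or refresh a mapping, e.g. from an ARP reply or request."""
    arp_cache[ip] = (mac, time.time())

learn("192.168.1.1", "00-11-22-33-44-55")
print(lookup("192.168.1.1"))   # 00-11-22-33-44-55
print(lookup("192.168.1.2"))   # None -> would trigger an ARP request
```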

ARP caching policy

  1. After a host sends an ARP request, when it receives an ARP response, it will record the IP address and MAC address of the other party in the ARP cache table.
  2. If a host receives an ARP request, and the destination address of the ARP request is the host, the host will record the sender's IP address and MAC address in the ARP request in the ARP cache.
  3. If a host receives an ARP request, and the sender's IP address in the ARP request is already in the local ARP cache table, the host will update the sender's IP address and MAC address in the ARP cache table.

ARP Cache Deletion Policy

  1. When the survival time exceeds the specified time, the ARP cache record will be deleted.

  2. RFC 1122 specifies a method for validating ARP cache entries.

    The host uses the unicast polling method to periodically send unicast ARP requests to the remote host .
    If no ARP response is received from the other party for two consecutive unicast ARP requests, delete the corresponding ARP cache record.

Wireless LAN

The wireless local area network WLAN is also called Wi-Fi. Its standard is IEEE 802.11, which includes the WLAN MAC layer and physical layer standards; the latest version is 802.11-2020.
802.11 defines a variety of physical layer standards, but they all use the same MAC protocol and the same MAC frame structure.

insert image description here
Composition of WLAN
There are two types of 802.11 WLAN: the infrastructure network and the ad hoc (self-organizing) network.

insert image description here
Access point AP: the station that provides access to the distribution system and is also responsible for forwarding traffic within a BSS.
Portal: the logical point that provides interworking between an 802.11 WLAN and a non-802.11 LAN.
The AP and portal functions can be implemented in a single device.
Distribution system DS: connects multiple APs together.
The medium used to connect APs within the distribution system is called the distribution system medium DSM. The most common DSM is 802.3 media.
A distribution system connected by a wireless medium is called a wireless distribution system WDS.

Mesh: a mesh network, whose stations are called mesh stations.
Mesh gateway: a mesh station that provides access to the distribution system.

insert image description here
The Basic Service Set (BSS)
The basic service set BSS is the basic building block of the 802.11 architecture; it consists of multiple wireless stations.
According to the types and functions of BSS stations, the 802.11-2020 standard divides BSS into:

  1. Independent Basic Service Set (Independent BSS, IBSS)
  2. Personal Basic Service Set (Personal BSS, PBSS)
  3. Infrastructure Basic Service Set (infrastructure BSS)
  4. Mesh Basic Service Set (Mesh BSS, MBSS)

The independent basic service set
insert image description here
The independent basic service set IBSS is the most basic and simplest BSS in an 802.11 WLAN; it consists of multiple stations.

The stations in an IBSS are peers; they are directly connected and can communicate directly.
Stations in an IBSS have no forwarding capability, so the network has no multi-hop transmission function.
An IBSS contains no infrastructure such as access points, portals, or distribution systems; it is an ad hoc network.

Personal basic service set PBSS
The personal basic service set PBSS is similar to the IBSS: it also consists of multiple stations, and the stations can communicate directly.
However, the stations of a PBSS are not all peers. In a PBSS, exactly one station plays the role of the PBSS control point PCP. The PCP is responsible for sending beacon frames and for providing contention-based channel access services to the stations in the PBSS.
A PBSS can only be established by DMG stations, which operate in the 60 GHz band as specified by IEEE 802.11ad.

Infrastructure basic service set (infrastructure BSS)
Extended service set ESS
insert image description here
Mesh basic service set MBSS
insert image description here
The mesh network evolved from ad hoc networks and has multi-hop transmission capability.
An MBSS consists of mesh stations, which can act as source stations, destination stations, or forwarding stations.
An MBSS allows access to the distribution system DS through one or more mesh gateways.
All stations in an MBSS can communicate with each other over the wireless medium, so an MBSS can also serve as the distribution system medium DSM.
A mesh gateway and an AP can be combined and implemented in a single device.

The MAC frames of the wireless LAN

IEEE 802.11-2020 defines many types of MAC frame formats.
In all 802.11 MAC frames, the first 2 bits are defined identically: they form the Protocol Version (PV) field.

At present, the 802.11 standard defines two version numbers, PV0 and PV1.

MAC frames in PV0 format are the ones used in the vast majority of 802.11 WLANs;
MAC frames in PV1 format are defined in 802.11ah; this format is optimized for the Internet of Things and better suits low-power application scenarios.

The first 8 bits of a PV0-format MAC frame are always defined as:

  1. protocol version (2 bits)
  2. type (2 bits)
  3. subtype (4 bits)
    The format of the remaining parts depends on the type and subtype.
    802.11 defines three types of MAC frames, namely

data frames
management frames
control frames

There are many subtypes; refer to 802.11-2020.

Data frame format

This part has many tables and example figures, which are not explained in text here; the school's PPT slides are pasted as a reference.

(∪.∪ )…zzz

insert image description here
A WLAN data frame consists of three parts: the MAC header, the frame body, and the frame check sequence FCS.

insert image description hereinsert image description here

insert image description here

insert image description here
insert image description here
insert image description here
insert image description here
insert image description hereinsert image description here

insert image description here
insert image description here
insert image description here
insert image description here
insert image description here
insert image description here
insert image description here
insert image description here
insert image description here
Conversion of 802.11 frames to 802.3 frames
When the 802.11 WLAN and 802.3 Ethernet are interconnected, the AP needs to complete the format conversion between 802.11 frames and 802.3 frames.
In the network topology shown in the figure below, consider the process of host H3 sending IP datagrams to host H1. In the 802.11 data frame sent by host H3:

Address 1 is MAC_AP2 (RA)
Address 2 is MAC_3 (TA/SA)
Address 3 is MAC_1 (DA)

insert image description here
When AP2 receives the data frame, it needs to convert it into an 802.3 Ethernet frame format and forward it to the switch.
In the converted 802.3 frame: the destination address is MAC_1, and the source address is MAC_3.
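
For the to-DS direction used in this example, the address mapping can be sketched as follows; the function name is illustrative.

```python
# Map the three 802.11 addresses of a to-DS data frame onto an 802.3 frame.
def wlan_to_ethernet(addr1_ra: str, addr2_ta_sa: str, addr3_da: str):
    """Return (destination, source) for the converted 802.3 frame."""
    # Address 1 (the receiving AP) is dropped; addresses 3 and 2 remain.
    return addr3_da, addr2_ta_sa

dst, src = wlan_to_ethernet("MAC_AP2", "MAC_3", "MAC_1")
print(dst, src)   # MAC_1 MAC_3 -- matches the example above
```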
insert image description here
Frame check sequence FCS
insert image description here
Management frames
Management frames are mainly used to create, maintain, and terminate the association between a station and an AP.

Common management frames include the beacon frame, association request frame, association response frame, probe request frame, and probe response frame.

Association means creating a virtual wire between a wireless station and an AP. The AP sends data frames only to the wireless stations associated with it, and a wireless station sends data frames to other stations or to the Internet only through the AP it is associated with.

A station can establish an association with an AP by passive scanning or by active scanning.

insert image description here
Control frames
Control frames are mainly used for channel reservation and data frame acknowledgment.

Common control frames include the acknowledgment frame ACK, the request-to-send frame RTS, and the clear-to-send frame CTS.
802.11 WLAN uses the stop-and-wait protocol to achieve reliable transmission at the data link layer.

When a station in the WLAN receives a data frame correctly, it waits for one short interframe space SIFS and then sends back an acknowledgment frame ACK.
The 802.11 stop-and-wait protocol operates between the directly sending station and the directly receiving station.
insert image description here

Hidden station problem
Wireless stations H1 and H2 are both associated with the AP, but they are not within each other's wireless signal range.
insert image description here

If the two stations send data to the AP at the same time, a collision obviously occurs. However, because of the limited wireless signal coverage, neither side can discover through carrier sensing that the other is transmitting. This is called the hidden station problem.

802.11 allows RTS frames and CTS frames to be used to reserve the channel, avoiding collisions caused by hidden stations.

RTS frames and CTS frames
insert image description here
Before sending a data frame, the sending station first sends an RTS frame to the receiving station (the AP), using the duration field to indicate the total time needed to transmit the data frame and its acknowledgment.
When the receiving AP receives the RTS frame, it sends a CTS frame in response.
The CTS frame serves two purposes:

it gives the sending station explicit permission to send,
and it tells the other stations not to send for the indicated duration.

The short interframe space SIFS is used between the RTS and CTS frames, and between the data frame and the acknowledgment frame.
In practice, 802.11 implementations usually provide a configuration option called the RTS threshold: only frames longer than the threshold trigger an RTS frame to reserve the channel.

RTS, CTS, and ACK frames
insert image description here
The MAC protocols of the wireless LAN
The 802.11-2020 standard uses three methods to solve the wireless medium access control problem:

  1. Distributed Coordination Function (DCF)

DCF is the basic medium access control method in WLANs. It is used for contention-based services and is the foundation of HCF and MCF.
The 802.11 standard requires all stations to implement DCF.

  2. Hybrid Coordination Function (HCF)

HCF supports QoS in 802.11e and 802.11n; all QoS-capable devices must implement HCF.
HCF includes two methods, controlled access and contention access:
HCF controlled access, i.e. HCF Controlled Channel Access (HCCA);
HCF contention access, i.e. Enhanced Distributed Channel Access (EDCA).

  3. Mesh Coordination Function (MCF)

MCF is used only in an MBSS and is outside the scope of this book.
The earlier 802.11 standards also defined a Point Coordination Function (PCF). PCF saw very little use and has been dropped from 802.11-2020.

insert image description here
Distributed Coordination Function DCF
The distributed coordination function DCF uses the CSMA/CA protocol, that is, the carrier sense multi-point access protocol with collision avoidance .

  1. Carrier sense is used to detect whether the transmission medium is busy,
  2. Multi-point access is used to ensure that each wireless terminal can access the transmission medium fairly,
  3. Collision avoidance is used to reduce the probability of collision, and it is expected that only one wireless terminal can access the transmission medium within a specified time.

Access control measures adopted by DCF
  1. The 802.11 WLAN implements the stop-and-wait protocol and uses the acknowledgment frame ACK to replace the collision detection mechanism of 802.3.

802.11 wireless stations cannot send and receive data at the same time, and cannot detect while sending.
In 802.11, if the sending station does not receive the ACK frame, it considers the sending failed and retransmits the data frame.
It is stipulated in 802.11 that broadcast frames and multicast frames do not need to be acknowledged.

  2. 802.11 uses a collision avoidance mechanism to reduce the probability of collisions.

When the channel is detected to be idle, the WLAN station first uses the binary exponential backoff algorithm to randomly back off for a period of time before allowing data to be sent.

  3. 802.11 carrier sense includes physical carrier sense and virtual carrier sense.

802.11 also uses RTS frames/CTS frames to reserve channels to solve the problem of hidden stations and further reduce the probability of collisions.

802.11 Collision Avoidance Mechanism
The 802.11 collision avoidance mechanism includes the following aspects:

  1. interframe spaces, which provide priority for access to the wireless medium
  2. carrier sense, including physical carrier sense and virtual carrier sense
  3. random backoff, using the 802.11 binary exponential backoff algorithm

Interframe Space
After a frame has been sent, a short amount of time must be waited before the next frame can be sent. This period of time is called the interframe space IFS.
insert image description here
Carrier sense

Before starting a transmission, an 802.11 CSMA/CA device must perform carrier sensing to check whether the transmission medium is occupied.
802.11 carrier sensing includes two methods: physical carrier sense and virtual carrier sense.

Physical and virtual carrier sensing are evaluated together; the channel is judged idle only when both methods consider it idle.

Physical Carrier Sense
Each 802.11 physical layer specification must provide a function to assess whether the channel is idle, usually based on energy and waveform recognition.
This function is called Clear Channel Assessment (CCA) and is used to determine whether the medium is currently busy.

Virtual Carrier Sense
Each station that performs virtual carrier sensing maintains a local counter called the Network Allocation Vector (NAV), which estimates how long the channel will remain busy.

As long as the NAV is non-zero, the channel is considered busy.
When a sending station transmits a unicast frame, it sets the duration field.
The virtual carrier sensing mechanism examines the duration field of every MAC frame it hears and updates the NAV accordingly:
when a station hears a duration value larger than its local NAV, it updates the local NAV to that value.
The NAV counts down according to the local clock, and when an ACK frame is heard the local NAV is reset to 0.
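
A minimal sketch of this NAV update rule, keeping the NAV in microseconds and leaving out the clock-driven countdown; all names are illustrative.

```python
nav_us = 0   # local Network Allocation Vector, in microseconds

def on_frame_heard(duration_us: int, is_ack: bool = False):
    """Update the local NAV from the duration field of an overheard frame."""
    global nav_us
    if is_ack:
        nav_us = 0                   # an ACK was heard: the exchange is over
    elif duration_us > nav_us:
        nav_us = duration_us         # keep the largest duration heard so far

def channel_busy(physically_busy: bool) -> bool:
    # The channel is idle only if both physical and virtual sensing agree.
    return physically_busy or nav_us > 0

on_frame_heard(300)          # e.g. an overheard RTS reserving 300 microseconds
print(channel_busy(False))   # True: virtual carrier sense reports busy
on_frame_heard(0, is_ack=True)
print(channel_busy(False))   # False
```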

Duration fields of data frames, RTS frames, and CTS frames
insert image description here
Random backoff
In an 802.11 WLAN, a station that needs to send data must run the random backoff algorithm.

Exception: the station is sending its first frame and the channel was sensed idle, rather than having just turned from busy to idle.

The WLAN random backoff algorithm is the binary exponential backoff algorithm.
The backoff time equals a random number multiplied by the slot time.

The slot time depends on the physical layer standard and is usually on the order of tens of microseconds.
The random number is an integer chosen at random from the interval [0, CW], where CW is the contention window, an integer satisfying aCWmin <= CW <= aCWmax.
aCWmin and aCWmax are defined by the physical layer standard.
CW starts from the constant aCWmin specified by the physical layer and grows as powers of two minus one as the number of retransmissions increases, until it reaches aCWmax.
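
The sketch below shows this growth rule, assuming aCWmin = 15 and aCWmax = 1023, which are typical values for common physical layer standards; the real constants depend on the PHY in use.

```python
import random

A_CW_MIN, A_CW_MAX = 15, 1023   # assumed typical values; PHY-dependent in reality

def contention_window(retries: int) -> int:
    # CW grows as powers of two minus one: 15, 31, 63, ..., capped at aCWmax.
    cw = (A_CW_MIN + 1) * (2 ** retries) - 1
    return min(cw, A_CW_MAX)

def backoff_slots(retries: int) -> int:
    """Pick the random backoff as an integer number of slots in [0, CW]."""
    return random.randint(0, contention_window(retries))

print([contention_window(i) for i in range(8)])
# [15, 31, 63, 127, 255, 511, 1023, 1023]
```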

How the contention window CW changes during random backoff
insert image description here

The Random Backoff Timer is the last timer before a wireless station transmits a frame.

When the channel changes from busy to idle and remains idle for DIFS, the random backoff timer starts counting .
During the backoff process, the station will listen to the channel every time a time slot passes

If the channel is idle, the value of the random backoff timer is decremented by 1;
if the channel is busy, the random backoff timer is suspended until the channel turns from busy to idle again, and the timer resumes after DIFS time.

When the random backoff timer of the station counts down to 0 , it means that the station competes for the channel and can send data.

insert image description hereinsert image description here
Hybrid coordination function HCF
In 2004, the 802.11e standard added QoS functions to the WLAN.
The QoS functions in subsequent 802.11 standards are basically derived from the design of 802.11e. The hybrid coordination function HCF supports the QoS function and includes:

HCF Controlled Channel Access HCCA: media access is controlled by polling; HCCA is rarely used.
Enhanced Distributed Channel Access EDCA: an improved media access control mechanism based on DCF.

We only discuss EDCA here.
EDCA makes two main improvements over DCF:

· Transmission Opportunity (TXOP)
· Access Category (AC)

Transmission Opportunity TXOP

In traditional DCF, once a station wins the channel it obtains the chance to send one frame, i.e. **"contend once, transmit one frame"**. This leads to the rate anomaly problem:
insert image description here

Improvement: TXOP changes the contention model to **"contend once, transmit for a period of time"**.
Once a station wins the channel, it obtains a period of transmission time during which it may transmit multiple data frames, called a frame burst.

During a frame burst, SIFS is used between frames.
The TXOP transmission time can be protected by virtual carrier sensing.

Access Category AC
insert image description here
Four access categories:
insert image description here
Traffic is classified by user priority

insert image description here
insert image description here
The EDCA model

In EDCA, the interframe space is no longer the DIFS of DCF but AIFS[AC].

insert image description here

Point-to-Point Protocol PPP

The **Point-to-Point Protocol (PPP)** is widely used in traditional dial-up Internet access, ADSL access networks, fiber access networks, and SDH networks.

PPP is derived from another widely used protocol, the High-level Data Link Control (HDLC) protocol.

PPP is used to carry network-layer packets over full-duplex point-to-point links. It is specified by RFC 1661 and RFC 1662 and is an official Internet standard.

PPP is actually a collection of protocols with three main components:

  1. A method of encapsulating network-layer packets onto a serial link.
  2. A Link Control Protocol (LCP) for establishing connections, negotiating options, testing the line, and releasing connections.
  3. A set of Network Control Protocols (NCPs). Each network control protocol supports a different network-layer protocol, for example the network control protocol IPCP for IP.

PPP frame format
insert image description here
PPP frame format

The following is an introduction to each part of the PPP frame format:

Flag:
The first field at the head of the frame and the last field at the tail are both flag fields, each occupying 1 byte.
The flag field marks the beginning or end of a frame.
The value of the flag field is 0x7E, which is 01111110 in binary.

When PPP is used for asynchronous transmission, character (byte) stuffing is used to achieve transparent transmission.
PPP stipulates that the escape character is 0x7D, and stuffing is performed as follows:

  1. Every 0x7E byte that appears in the data field is converted into the 2-byte sequence (0x7D, 0x5E).
  2. Every 0x7D byte that appears in the data field is converted into the 2-byte sequence (0x7D, 0x5D).
  3. If an ASCII control character (a byte whose value is less than 0x20) appears in the data field, a 0x7D byte is inserted before it and 0x20 is added to the character's code. For example, 0x03 becomes (0x7D, 0x23) and 0x11 becomes (0x7D, 0x31).

The receiver needs to do the reverse transformation after receiving the data.
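
A minimal sketch of these three stuffing rules and of the receiver's inverse transformation; the function names are illustrative.

```python
FLAG, ESC = 0x7E, 0x7D

def byte_stuff(payload: bytes) -> bytes:
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESC) or b < 0x20:
            out.append(ESC)
            out.append(b ^ 0x20)   # 0x7E -> 0x5E, 0x7D -> 0x5D, 0x03 -> 0x23
        else:
            out.append(b)
    return bytes(out)

def byte_unstuff(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        if data[i] == ESC:
            out.append(data[i + 1] ^ 0x20)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

stuffed = byte_stuff(bytes([0x7E, 0x03, 0x41]))
print(stuffed.hex())                 # 7d5e7d2341
print(byte_unstuff(stuffed).hex())   # 7e0341 -- the original data is restored
```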

When PPP is used for synchronous transmission, bit stuffing is used to achieve transparent transmission:
When the sender encounters five consecutive 1s in the data, it immediately inserts a 0 after them.
When the receiver sees five consecutive 1s, it deletes the 0 that follows.
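
Similarly, a minimal sketch of bit stuffing and unstuffing, operating on a bit string for clarity:

```python
def bit_stuff(bits: str) -> str:
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == "1" else 0
        if ones == 5:
            out.append("0")   # insert a 0 after five consecutive 1s
            ones = 0
    return "".join(out)

def bit_unstuff(bits: str) -> str:
    out, ones, i = [], 0, 0
    while i < len(bits):
        out.append(bits[i])
        ones = ones + 1 if bits[i] == "1" else 0
        if ones == 5:
            i += 1            # skip the stuffed 0 that follows five 1s
            ones = 0
        i += 1
    return "".join(out)

stuffed = bit_stuff("0111111011111100")
print(stuffed)                 # 011111010111110100 -- two zeros were inserted
print(bit_unstuff(stuffed))    # 0111111011111100  -- the original is restored
```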

Address and control:
The PPP address field comes from HDLC and occupies 1 byte. HDLC can be used on multipoint links, where the address field identifies the receiver.
But PPP is used only on point-to-point links, where there is only one receiver, so the address field is always set to 0xFF.

The PPP control field also comes from HDLC and occupies 1 byte. In HDLC the control field specifies the frame type,
but PPP uses only one kind of frame, so the control field is always set to 0x03.

These two fields are effectively meaningless in PPP.
When PPP uses LCP to negotiate link-layer parameters, these two fields are usually omitted.

Protocol:
The protocol field is used to indicate which protocol data is encapsulated in the data part, occupying 1 or 2 bytes.
By default, the protocol field occupies 2 bytes, but when the PPP protocol uses LCP for link layer option negotiation, it is allowed to configure the protocol field as a compressed form of 1 byte.

When the protocol field is 0x0021, it means that the data field encapsulates an IP datagram.
When the protocol field is 0xC021, it means that the data field encapsulates an LCP packet.
When the protocol field is 0x8021, it means that the data field encapsulates an IPCP packet

Data and padding:
The default maximum transmission unit MTU of PPP is 1500 bytes.
When PPP uses LCP to negotiate link-layer options, the MTU can be configured to other values.

FCS
The FCS field occupies 2 or 4 bytes.
The PPP FCS is computed with the CRC algorithm over the entire frame except the FCS field itself and the flag fields.

By default, PPP uses a 16-bit FCS, whose generator polynomial corresponds to the binary number 10001000000100001.
When PPP uses LCP to negotiate link-layer options, the CRC can be extended from 16 bits to 32 bits; in that case the generator polynomial is the same CRC-32 used by Ethernet.

PPP link states

insert image description here
The details of the link state transitions are not elaborated here.

The PPPoE protocol
As Ethernet came to dominate the market, using Ethernet to access the Internet became a broadband access solution.

ISPs needed an access control scheme that could run over Ethernet and bill on a per-user basis.

RFC 2516 specifies the PPPoE (PPP over Ethernet) protocol, which encapsulates PPP frames inside Ethernet frames. Using PPP, every user station on the Ethernet can establish a PPP session with a remote PPPoE service station.
This service station is located inside the ISP and is called the Access Concentrator (AC).
Once a user obtains an IP address from the ISP through the PPP session, the ISP can associate that IP address with the specific user, thereby achieving per-user billing.

Discovery phase and PPP session phase
insert image description here
PPPoE frame format
insert image description here

Version and type fields:
Each occupies 4 bits; currently both take the value 0x01.
Code field:
Occupies 8 bits and distinguishes the various types of PPPoE frames used in the discovery phase and the PPP session phase.

Session identifier field:
Occupies 16 bits; every PPPoE session has a unique identifier.
Length field:
Occupies 16 bits and gives the length of the PPPoE payload in bytes.

Payload field:
Depending on the code, the data encapsulated in the payload part is different.
During the discovery phase, the payload field encapsulates the various frames that PPPoE uses to assign session identifiers.
During the PPP session phase, the payload part encapsulates the PPP frame.

Discovery phase
insert image description here
Session phase
In the PPPoE session phase, PPPoE data is encapsulated in Ethernet frames, and the type field of the Ethernet frame is set to 0x8864.

The Ethernet frames exchanged use the MAC addresses of the remote access concentrator AC and of the user station as the destination and source addresses.

The session identifier field of the PPPoE frame is filled with the session identifier obtained during the discovery phase.

The upper-layer PPP entities then operate just as in an ordinary PPP session.
When LCP sends a termination request frame to end a PPP session, the PPPoE session is also terminated.

PPPoE also defines a PPPoE session termination frame; the AC and the user station can also actively send this frame to end the session.

Mind Map Summary

Chapter One

insert image description here
insert image description here
insert image description hereinsert image description here

insert image description here

Chapter Two

insert image description here
insert image description here
insert image description here

Chapter Three

insert image description here
insert image description here
insert image description here
insert image description here

Chapter Four

insert image description here
insert image description here
insert image description here
insert image description here

Chapter Five

insert image description here
insert image description here
insert image description here

Chapter Six

insert image description here
insert image description here
insert image description here
insert image description here

Glossary of terms

To be updated.
