How Virtual Routers Work

1. Overview of Virtual Router

1. Introduction to Virtual Router

A virtual router (VR) is a logical device that emulates the functions of a physical router in software and hardware. Each VR has a logically independent routing table and forwarding table, so that address space can be reused across different VPNs while routing and forwarding remain isolated within each VPN.

Multiple virtual routers with different logical architectures and routing functions can be created within one physical router. Each virtual router runs its own routing protocol instances independently and has dedicated I/O ports, caches, address space, routing tables, and network management software, and can thus provide virtualized node and link resources for the network. Virtual backbone routers can offer customers low-cost, dedicated backbone-network control and security management functions; the software that controls and manages these virtual routing devices adopts a modular design.

If this software runs on a true multi-process operating system (such as Unix), it can also support multiple instances, that is, multiple virtual routers at the same time. The process of each virtual router is separated from those of the other routers, and the memory it uses is protected by the operating system. This ensures a high degree of data security and also eliminates data collisions between virtual routers that imperfect software modules might otherwise cause.

Many carrier-class routers rely on hardware to achieve wire-speed performance for packet forwarding to and from high-speed Synchronous Optical Network (SONET) and Synchronous Digital Hierarchy (SDH) network connections. For systems using virtual routing functions, these hardware functions can be subdivided logically.

In addition, the hardware can be flexibly assigned to a dedicated virtual router, so that the software module implementing the virtual routing function fully controls the physical ports and switching paths used to send and receive packets. The packet cache and switching table of each virtual router are limited in size by the resources it occupies, which ensures that virtual routers do not affect each other.

Virtual routing technology enables each virtual router to run its own instances of routing protocol software (such as OSPF and BGP) and network management software, such as the Simple Network Management Protocol (SNMP) or a command line, so each virtual router can be monitored and managed independently. Running network protocol instances independently gives each virtual router a completely independent IP address domain with no conflicts between them, and each virtual router can be managed as a separate running entity. The user-based security module ensures that the network management functions and information belonging to a given virtual router are open only to authorized users; in addition, the packet forwarding paths of the virtual routers are independent of each other, so administrators can configure capabilities individually for each virtual router.

A large burst of traffic through one virtual router affects only that router and not the others, which guarantees stable network performance for end users. In addition, each virtual router provides independent policies and the Internet Engineering Task Force Differentiated Services (IETF DS) capability, enabling it to offer fully customized services to end users. By configuring the I/O ports of the virtual router, received packets can be counted to ensure that the traffic does not exceed the agreed amount; at the same time, the virtual router can distribute its packets to different queues to achieve different qualities of service.

Virtual routers are already used in practice. Nortel's Accelar1000 routing switch uses virtual routing ports to route between VLANs: each VLAN can be configured as an IP subnet, with routing performed between these subnets or VLANs. The Accelar1000 supports virtual router ports without reducing its performance.

2. Basic functions of virtual router

The basic functions of a virtual router can be defined as routing processing, packet forwarding, and virtual router-specific services.

1) Routing processing function

Use a routing protocol (unicast routing protocol or multicast routing protocol) to obtain a network topology view, and construct and maintain a routing table. Static routing can also be configured manually.

2) Packet forwarding function

  • IP packet inspection

Check the version number and packet header fields and calculate the header checksum.

  • Analyze the destination IP address and look up the routing table

Determine the packet's output interface and the next-hop node toward the destination IP address. The possible results of the lookup are: local delivery, where the destination IP address is the address of one of the router's interfaces; unicast delivery to a single output port, where the packet is sent to the next-hop router or the final destination; and multicast delivery to a group of output ports, which depends on the router's knowledge of group membership.

  • Packet TTL Control

The router adjusts the TTL field to prevent packets from looping endlessly in the network: a packet delivered locally must have a TTL greater than 0; a packet forwarded outward first has its TTL decremented by 1, and the TTL must be rechecked before the actual forwarding. Packets whose TTL has expired are discarded, and an error message may be sent back to the packet's sender.

  • Checksum calculation

After the TTL field is changed, the header checksum must be recalculated; a short sketch of the lookup, TTL, and checksum steps appears below.
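To make the lookup, TTL, and checksum steps above concrete, here is a minimal Python sketch. The routing entries are illustrative, the field offsets assume the standard IPv4 header layout, and the incremental checksum update follows the well-known RFC 1624 formula; this is a sketch of the per-packet steps, not any particular router's implementation.

    import ipaddress

    # Illustrative routing table: (prefix, next hop, output interface).
    ROUTES = [
        (ipaddress.ip_network("0.0.0.0/0"), "10.0.0.1", "eth0"),        # default route
        (ipaddress.ip_network("192.168.0.0/16"), "10.0.1.1", "eth1"),
        (ipaddress.ip_network("192.168.10.0/24"), "10.0.2.1", "eth2"),
    ]

    def lookup(dst: str):
        """Longest-prefix match: among all matching prefixes, the most specific wins."""
        addr = ipaddress.ip_address(dst)
        matches = [r for r in ROUTES if addr in r[0]]
        return max(matches, key=lambda r: r[0].prefixlen)[1:] if matches else None

    def decrement_ttl(header: bytearray):
        """Decrement TTL (byte 8) and incrementally update the checksum (bytes 10-11, RFC 1624)."""
        ttl = header[8]
        if ttl <= 1:
            return None                         # TTL expired: drop and possibly notify the sender
        header[8] = ttl - 1
        old_word = (ttl << 8) | header[9]       # 16-bit word holding TTL and protocol
        new_word = ((ttl - 1) << 8) | header[9]
        csum = ~((header[10] << 8) | header[11]) & 0xFFFF
        csum += (~old_word & 0xFFFF) + new_word
        csum = (csum & 0xFFFF) + (csum >> 16)   # fold carries back into 16 bits
        csum = (csum & 0xFFFF) + (csum >> 16)
        csum = ~csum & 0xFFFF
        header[10], header[11] = csum >> 8, csum & 0xFF
        return header

    print(lookup("192.168.10.7"))               # ('10.0.2.1', 'eth2'): the /24 is the longest match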

3) IP maximum transmission unit (Maximum Transmission Unit, MTU) discovery mechanism

To fit the MTU of the output network interface, fragmentation is sometimes required, which has a relatively large impact on performance. Because the IP MTU discovery mechanism is now widely applied, fragmentation operations are rare.

4) Dedicated service for virtual router

Additional functions beyond the core functions are: packet translation, re-encapsulation, traffic prioritization and authentication, access control, and the addition and removal of virtual routers.

In addition, the router also has network management functions, including SNMP agent and management information base (Management Information Base, MIB) and so on.

In order to provide independent functions for different customer networks, it is necessary to logically divide and differentiate management of the hardware and software of the router. Specifically, the following requirements should also be met.

  1. Isolation: Multiple routing instances and their associated routing tables and forwarding tables need an efficient virtualization method that achieves complete logical independence and separation; because they share the same underlying hardware, packets must be classified, isolated, and processed independently.
  2. Heterogeneous network support capability: The attached networks may be based on completely different, mutually dissimilar architectures with obvious heterogeneity, for example in the naming, addressing, and routing methods of new data networks and mobility-first networks. Because their packet processing procedures differ, virtual routers must be able to connect to various heterogeneous networks and process the corresponding heterogeneous information.
  3. Support for VR scalability: Virtual routers need to meet scalability requirements in forwarding performance and must also be able to expand dynamically in terms of router virtualization; the number of virtualized units can be increased by adding logical resources or simply by connecting additional physical devices.
  4. High-performance forwarding capability: In order to meet the traffic requirements of applications, the performance of the instances running in the virtual router must be able to match the forwarding speed of the hardware.
  5. Different management capabilities are provided depending on whether the administrator is a customer or a system administrator.
  6. Complete security and protection between routing instances, one instance cannot observe the state of other instances.
  7. Channel-based abstraction of point-to-point links.
  8. The components that need to be instantiated include the following.
  9. Routing protocols: the Routing Information Protocol (RIP), OSPF, Intermediate System to Intermediate System (IS-IS), BGP, etc.
  10. Virtual instances related to firewall, NAT, IP forwarding and security services.
  11. QoS functions and algorithms.
  12. SNMP or other management protocol stack.
  13. TCP and UDP and their applications.
  14. L2TP and PPP (if remote VPN is provided).
  15. Remote Authentication Dial In User Service (RADIUS): Interconnects multiple servers based on the context of authentication and accounting.

3. Virtual router structure

The basic structure of a virtual router is shown in the following figure:

The whole structure is composed of 3 parts, namely the virtual router template, the global management module and several running instances of the virtual router (the range indicated by the dotted rectangle).

1. Virtual router template

All routing protocols, management platforms, and TCP/IP protocol stacks in this architecture are realized by the same object code, which we call the "virtual router template"; it is the collection of all software code of the router system. This template is the template for all virtual router running instances, and the functions supported by each virtual router instance are only a subset of those in the template. That is, if the virtual router template does not contain the code for a certain protocol, that protocol is unavailable in every instance.

For example, if the BGP protocol is not implemented in the template, any virtual router will not provide this protocol.

The virtual router template consists of four functional parts: the virtual TCP/IP protocol stack, the virtual routing protocols, the virtual management platform, and the global management interface. The first three parts use encapsulation technology so that the same object code can run as multiple instances, each virtual router corresponding to one running instance; the global management interface connects to the global management module so that all running instances of the entire system can be managed.

2. Virtual router running instance

A running instance of a virtual router is actually a virtual router, which includes all key elements of the router, such as interfaces, routing protocols, and management interfaces. Each virtual router instance has its own TCP/IP protocol stack, forwarding information library, routing protocol and virtual router management platform. They all depend on the same object code, but belong to different running instances.

Virtual router management is part of the overall management function of the system and has two aspects. One is that an authorized virtual router administrator (Virtual Router Administrator, VRA) logs in through Telnet and then uses the command line interface (Command Line Interface, CLI) to configure the virtual router. The other is to manage the virtual router with a standard network management protocol such as SNMP, a function completed by the SNMP subagent.

3. Global Management Module

The global management module is divided into two parts, namely the global configuration module and the SNMP master agent. The latter implements global management through the SNMP sub-agent in each virtual router, which will not be described in detail here.

The global configuration module is used by the global administrator through the console, and includes functions such as adding and deleting virtual routers, assigning interfaces, adding and deleting protocols supported by virtual routers, and all public operations involving multiple virtual routers.

In order to realize a high-performance programmable router that supports virtualization, the overall architecture of the router can be divided into three planes: data, control, and management, as shown in the figure below.

The data plane is mainly responsible for receiving, identifying, classifying, and forwarding network packets. The control plane is mainly responsible for router virtualization, that is, implementing multiple relatively independent logical routers within one physical router; it performs route calculation, handles abnormal packets, and can also carry out simple configuration management of the data plane.

The management plane is mainly responsible for the deployment and maintenance of various protocols of multiple virtual routers and the management and configuration of various parameters of the routers, which is convenient for users to manage a single virtualized router and the entire virtualized router system.

4. Data plane structure

The virtual router structure adopts the idea of separating the data plane and the control plane, and its structure mainly includes a hardware-based data plane and a general-purpose processor-based control plane.

The data plane mainly implements virtualization and forwards and processes packets at high speed according to the rules configured by the control plane; packets enter the data plane through physical ports. A high-speed packet classification module separates the packets into different processing pipelines for forwarding according to the classification rules, and finally the output port is determined and the packet is sent out. The data plane uses an efficient packet classification mechanism to realize the virtualization and isolation features, and achieves heterogeneous-network support and scalability through a multi-pipeline design. Within each pipeline, users can apply high-speed lookup algorithms to achieve high-speed packet forwarding.

The overall structure of the virtual router data plane is shown in the figure below.

1. Input queue

Packets arriving on physical ports, such as Ethernet or optical fiber ports, are processed by the physical layer device (PHY) chip and then parsed by the MAC layer into packets that the hardware can process, such as New Data Network (NDN) and IP packet formats. The packets received on each port are queued in a separate queue for processing.

2. Parallel packet classification

The data packets queued in the same queue may belong to different virtual networks and be based on different architectures. Therefore, a high-speed packet classification mechanism is required to determine the destination pipeline of each data packet, so as to facilitate separate processing and search.

Assume the rate of each port is Ri and that the packets of all queues are handled by a single unified packet classification module; the classifier must then process packets at the aggregate rate ΣRi. At a port rate of 10 Gb/s with 4 ports, packet classification must reach a throughput of 40 Gb/s. This is a great challenge for traditional packet classification algorithms: conventional approaches based on ternary content-addressable memory (TCAM), Tries, or space partitioning are difficult to apply and incur high power consumption and cost.

In order to reduce the complexity of packet classification and make full use of the parallelism provided by programmable hardware, this system does not use a separate packet classification module. Instead, packet classifications are distributed across queues, as shown in the figure below. 

Packets take two paths before input scheduling. The first is parallel packet classification: each classification module is responsible for classifying a single queue, which reduces the required classification rate to the speed of a single queue and allows traditional packet classification algorithms to be used. The second is to buffer the packet header data until classification completes, so each classification module handles at most one first-in, first-out (FIFO) buffer's depth of header data. Taking IP packets as an example, if 5-tuples are used as the classification key and the FIFO is 64 bits wide, a FIFO with a depth of 6 is enough to buffer the packet header.
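A minimal sketch of the per-queue classification idea described above, assuming an IPv4 header without options followed by TCP/UDP ports; the pipeline-selection rule (a hash of the 5-tuple) is only a stand-in for a real rule lookup.

    import struct

    NUM_PIPELINES = 4   # illustrative; one classifier instance runs per input queue

    def extract_5tuple(header: bytes):
        """Parse the IPv4 5-tuple from buffered header words (assumes IHL=5, no options)."""
        proto = header[9]
        src_ip, dst_ip = struct.unpack_from("!II", header, 12)
        src_port, dst_port = struct.unpack_from("!HH", header, 20)
        return src_ip, dst_ip, src_port, dst_port, proto

    def classify(header: bytes) -> int:
        """Per-queue classifier: map the buffered header to a destination pipeline."""
        return hash(extract_5tuple(header)) % NUM_PIPELINES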

3. Input scheduling

Once the destination pipeline of a packet has been determined by the classification rules, the packet must be dispatched to that pipeline efficiently. Traditional schemes include round robin and weighted round robin, which ensure fairness of the input. Because this system schedules toward multiple target pipelines, the traditional polling scheme must be improved: without adopting the Virtual Output Queue (VOQ) queuing strategy, the number of polling pointers is made equal to the number of pipelines. The specific algorithm is described as follows.

  • Let the number of input ports be N and the number of processing pipelines be M (N > M). Each input port maintains state information Sj (1 ≤ j ≤ N) indicating the destination pipeline of its queue, and each pipeline maintains an input pointer Pi (1 ≤ i ≤ M) indicating the input queue currently served by that pipeline. In the initial state, Pi = i and Sj = 0.
  • When the buffer of an input queue is non-empty and a pipeline is idle, that pipeline's input pointer Pi polls from its current position for the first non-empty queue k; if Sk = i, then Pi = k. A software sketch of this polling follows.
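A rough software rendering of the pointer-based polling above, assuming that Sj records which pipeline the head packet of queue j is destined for and that a pipeline only claims queues destined for it; all names are illustrative.

    def schedule_round(pipelines_idle, queues, S, P):
        """One scheduling round: each idle pipeline i claims the first non-empty queue k
        (polling from its pointer P[i]) whose destination state S[k] equals i."""
        N = len(queues)
        for i, idle in enumerate(pipelines_idle):
            if not idle:
                continue                                 # pipeline busy this round
            for step in range(N):
                k = (P[i] + step) % N                    # poll from the current pointer
                if queues[k] and S[k] == i:              # non-empty queue destined for pipeline i
                    packet = queues[k].pop(0)
                    P[i] = k                             # remember where polling stopped
                    dispatch(packet, pipeline=i)
                    break

    def dispatch(packet, pipeline):
        # Placeholder for handing the packet to the chosen processing pipeline.
        pass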

4. Processing pipeline

The processing pipeline is the key part of packet forwarding and processing in the router, and it determines both how packets are processed and how flexible that processing is. Different pipelines carry different architectures, and the packet processing procedures for these applications differ. Within a given processing pipeline, each field of the packet is parsed out and the packet is looked up and forwarded according to the processing method of its architecture; for example, lookup schemes based on TCAM, Tries, or Bloom filters can be used.

A multi-table lookup scheme within a single pipeline must select the target entry from homogeneous lookup tables, and the lookup modules of these tables must have the same structure or be connected in series. If the function of a virtual router needs to change, the key modules of these structures must be modified, and such modifications inevitably affect the functions and performance of the other virtual routers as well as their isolation. In the structure described here, different application schemes and architectures can be deployed in different pipelines; for example, an IPv4 router can coexist with OpenFlow (a software-defined networking technology that classifies network flows according to flow tables, processes them according to established rules, and separates data forwarding from routing control).

5. Output Arbitration

After lookup and forwarding, the packets from the different pipelines must be buffered and switched uniformly. This system adopts a stateless scheduling algorithm: the order in which each pipeline's packets are buffered is determined by polling, and the output buffer queues are likewise drained to each port by a round-robin scheduling algorithm.

6. Output buffering

The switching mechanism using output queuing provides the best performance. In order to achieve 100% throughput, the link rate of the peripheral memory needs to reach the sum of the link rates of all ports. 

5. Control plane structure

To support reconfigurability in the control software of a virtualized router, first, a current mainstream multi-process operating system should be chosen as the kernel of the router operating system so as to avoid security and deployability problems; second, the resources and permissions of each functional component running on the router operating system must be restricted, so that malicious components or implementation defects cannot affect system availability; third, because hardware platform implementations vary widely among router manufacturers, a dedicated hardware abstraction layer must abstract the functions of the hardware platform and establish a unified hardware-platform description model and access interface for use by the upper-layer operating system, forwarding modules, and routing protocol modules.

Based on the above analysis, the control plane structure of the reconfigurable virtualized router is organized as shown in the following figure.

1. Composition of the model

The whole model consists of the following four layers.

1) Reconfigurable router hardware platform

Including various underlying hardware devices of reconfigurable routers, such as line cards, high-speed packet switching networks, and main control cards.

2) Kernel operating system layer

It is composed of hardware abstraction layer, reconfigurable router OS kernel and kernel virtualization service platform, and provides resource management, scheduling and system function services for various component function modules of the upper layer.

  • Hardware abstraction layer: perform functional abstraction on the hardware platform to establish a unified hardware platform abstract description model and access interface for access by upper-layer operating systems, forwarding modules, and routing protocol modules. This layer shields the implementation details of the underlying hardware, and is the key to supporting independent development and dynamic deployment of third-party components.
  • Reconfigurable router OS kernel: Based on the current mainstream multi-process operating system design, the key problem to be solved is to establish a process and thread scheduling mechanism suitable for routers, provide a high-performance inter-process communication mechanism and support efficient large memory copy and sharing mechanism.
  • Kernel virtualization service platform: Independent user-level functional components are isolated from each other through operating system-level virtualization technology, which supports control over the resource access policies and system service permissions of the components inside each virtual machine. Operating system-level virtualization only requires modifying the operating system kernel, and it preserves the processing performance of the host, so it provides good support for the security, openness, deployability, and resource isolation of the various functional components of the reconfigurable virtual router.

3) Routing and forwarding layer

It mainly includes a packet forwarding processing layer and a routing protocol control layer, which provide open control functions to the upper layer through standard forwarding control APIs and network-awareness and routing control APIs, respectively. Both layers reside in user space: the operating system passes packets received by the network interface up from kernel space to the packet forwarding processing layer, where the corresponding forwarding component performs table lookup and forwarding.

Since packet forwarding in high-performance routers is mainly done in hardware, the performance requirements on software forwarding are modest, so these user-space forwarding components will not become the system's performance bottleneck. At the same time, the kernel virtualization service platform ensures isolation between the forwarding components and the routing components, so a problem in a single component will not affect the availability and stability of the entire system.

4) Application plugin layer

When network applications need to deploy basic functional services to the network to accelerate network application execution and improve transmission efficiency, it is necessary to deploy application plug-ins on the reconfigurable virtual router control plane.

The application plug-in layer provides the basic operating environment for application plug-ins, achieves resource and security isolation between plug-ins through the kernel virtualization service platform, and uses the standard forwarding control API and the network-awareness and routing control API to perceive the network and control routing and forwarding behavior.

2. Key issues to be addressed

The key issues to be solved in the reconfigurable virtual router control plane model are as follows.

  1. High-performance kernel virtualization technology: A large number of functional components must be deployed in the control plane of the reconfigurable virtual router, and each functional component requires an independent virtualized operating environment. The high-performance kernel virtualization technology for reconfigurable routers must therefore support a large number of independent virtualized operating environments as well as flexible access control policies and resource quota policies; these capabilities are critical to the performance of the control plane.
  2. Reconfigurable router hardware abstract model and access interface: The hardware platforms of different router manufacturers differ considerably, and as technology develops, new hardware devices and new interconnection and forwarding structures are constantly introduced. Establishing a unified abstract description model of the hardware platform that defines the hardware access interface for the upper-layer operating system, forwarding modules, and routing protocol modules is the key to the control plane design of reconfigurable virtual routers.
  3. High-performance inter-process communication and memory sharing technology: Reconfigurable virtual router components have independent operating spaces, and control interactions between components must be completed through inter-process communication or memory sharing. Because existing high-end routers hold very large routing tables, sharing this massive routing information between components poses a huge challenge to inter-process communication and memory sharing technologies.
  4. Packet forwarding control model and open access interface technology: As the fastest-growing wide-area infrastructure, the Internet is absorbing traditional telecommunication and TV networks and gradually becoming the basic platform for next-generation triple play. In the reconfigurable virtual router, defining a packet forwarding control model and a general access interface for the forwarding processing components that can cover the forwarding requirements of all three networks is the basic requirement that determines whether reconfigurable virtual router technology can become the bearer platform for future network technologies.
  5. Network-aware routing control model and open access interface technology: With triple-network convergence becoming the main trend of future network technology, the reconfigurable virtual router control plane must support, in addition to traditional routing protocol control functions, the various signaling protocols of the telecommunication and TV networks, so a routing control model covering the control protocol functions of all three networks must be defined. At the same time, to let upper-layer application plug-ins perceive network topology and routing information, a network-awareness interface and a routing control interface based on the reconfigurable platform must be defined.

2. The core of virtual router technology

A programmable virtual router is a new type of router that supports both programmability and virtualization: the former allows users to customize how packets are processed, and the latter allows multiple router instances to run in parallel on the same physical router platform.

In order to meet the requirements of the virtual network, the router platform needs to support virtualization, so that multiple clients can run in isolation in parallel. Programmable virtual routers need to provide users with flexible programmability, allowing users to customize the packet format and processing process, so as to support the experiment and deployment of new protocols.

Diverse application requirements in data center networks (such as high reliability, low latency, and high throughput, etc.) require network devices to support differentiated packet forwarding methods (multi-path routing and shortest path routing, etc.). The programmable virtual router supports flexible programming, and can flexibly modify data packet forwarding methods (such as search keywords and methods, etc.) according to the needs of different applications.

In addition, a large data center interconnects massive numbers of servers and provides services in the form of virtual machines and physical machines. Because different services require privacy control and traffic isolation, the data center network must provide a good virtualization mechanism to ensure that the subnets of different services are isolated and independent of each other while dynamically sharing the server resources of the data center network.

1. Technical system and challenges

The technical system of a programmable virtual router includes three aspects, namely router virtualization, programmability, and packet forwarding performance, as shown in the figure below.

1. Virtualization

In a programmable virtual router, virtualization introduces two technical issues, resource allocation and isolation, and scalability. In a virtual router platform, multiple virtual router instances share physical resources such as network bandwidth, CPU, and memory. The basic requirement for resource allocation is to meet the application requirements of each virtual router instance. Under this premise, it is necessary to allocate physical resources as fairly as possible to avoid problems such as resource waste caused by excessive consumption of resources by several virtual router instances.

In addition, different virtual router instances need to be isolated from each other under the premise of sharing resources, so as to prevent the failure of a single router instance from affecting the normal work of other router instances. However, resource allocation and isolation will lead to serious performance problems. Under the premise of rationally allocating resources and ensuring the isolation between virtual router instances, how to minimize the impact of virtualization on performance is also an important challenge for the resource allocation and isolation mechanism. In a programmable virtual router, multiple virtual router instances run simultaneously, and each router instance has a forwarding engine. With the continuous increase of application requirements and the expansion of network scale, the number of virtual router instances continues to increase, resulting in a continuous increase in the number of forwarding engines.

On the one hand, the demand of the computing components in the forwarding engine, such as header protocol parsing and packet modification, for computing resources such as CPU, registers, and FPGA logic grows rapidly as the number of virtual router instances increases; on the other hand, the demand of the storage components in the forwarding engine, such as the forwarding information base (Forwarding Information Base, FIB), for storage resources such as TCAM, static RAM (Static RAM, SRAM), and dynamic RAM (Dynamic RAM, DRAM) grows linearly with the number of router instances. How to support as many forwarding engines, and thus as many virtual router instances, as possible with limited computing and storage resources is the scalability challenge in programmable virtual routers.

2. Programmability

It is mainly reflected in four aspects: the programmable interface, the programmable system architecture, programmable flexibility, and programming difficulty. The programmable interface should be a well-defined development interface, so that users can focus on developing business functions without caring about the specific implementation of the underlying layer.

The programmable system architecture needs to abstract the basic unit of data packet processing, define the unified interface between functional units, and realize the user-defined data packet processing flow through the selection and combination of existing functional units. How to design a programmable architecture and ensure that the performance of the system is not affected is a challenge for the programmable system architecture.

Enhancing the programmability of individual functional units (such as the packet header parsing unit) within the programmable system architecture can further improve the programmability of the system and satisfy user-defined packet processing requirements, so that users rarely need to develop new functional units. However, there is a tension between the programmable flexibility of a functional unit and the complexity of its implementation, which is the main challenge for flexibility; and when users do need to develop new functional units, providing efficient programming methods that reduce programming difficulty and development time is the challenge of ease of programming.

3. Packet forwarding performance

In the virtual router, the packet forwarding performance is mainly affected by the I/O virtualization performance, the efficiency of the driver and kernel-level packet sending and receiving mechanism, and the performance of the packet search algorithm. The virtual network interfaces of multiple virtual router instances share the physical network interface, which requires the support of I/O virtualization technology. However, I/O virtualization technology increases system I/O overhead and reduces packet forwarding performance because it needs to be responsible for packet distribution between virtual interfaces and physical interfaces.

In addition, within a router instance, the efficiency of the driver and the kernel-level packet transmit and receive mechanisms also directly affects packet reception and transmission performance. In traditional routers, lookup-and-match operations such as IP lookup and packet classification are the key bottleneck for fast packet forwarding. These bottlenecks still exist in the virtual router environment, and because non-IP protocols may be carried, the packet lookup-and-match problem is more complicated than traditional IP lookup and 5-tuple packet classification, so the challenge is even more serious.

In order to solve the above-mentioned key technical challenges in programmable virtual routers, researchers have conducted research on resource allocation and isolation issues, scalability issues, programmability issues, and forwarding performance issues brought about by virtualization.

2. Router virtualization technology

In order to realize router virtualization, it is necessary to focus on solving problems such as resource allocation and isolation, and scalability of computing and storage resources.

1. Resource allocation and isolation

Currently, resource allocation and isolation in programmable virtual routers are usually implemented using traditional server virtualization technologies.

For example, vRouter uses OpenVZ and Xen to implement resource allocation and isolation, and SwitchBlade uses OpenVZ virtualization technology. The Programmable Virtual Router Platform (ProgrammablE virtuAl Router pLatform, PEARL) is a project carried out by the Institute of Computing Technology of the Chinese Academy of Sciences with the support of the national 973 and 863 programs. It uses Linux Containers (LinuX Containers, LXC), a lightweight virtualization technology that does not require a hypervisor and is based on features of newer Linux kernels such as Cgroups. LXC can virtualize a complete system environment (rootfs) or provide a virtualized operating environment for just one or a few applications.

In full virtualization, a software middle layer, the hypervisor (virtual machine monitor), sits between the operating system and the underlying hardware and manages the hardware resources so that they can be shared and accessed by multiple upper-layer guest operating systems. Full virtualization can support different operating systems but cannot emulate different hardware platforms.

Paravirtualization also uses a software middle layer to isolate the underlying hardware from the operating system; the difference is that the guest operating system's code is modified so that the guest itself knows it is running on a virtualization platform. This technique can significantly improve the I/O performance of the virtualized system.

Operating system-level virtualization uses mechanisms provided by the operating system (such as namespaces in Linux) to put a group of processes into a "container" and isolate them from the processes of the host operating system, and it uses these mechanisms to limit the CPU and memory usage of the group, thereby achieving the resource isolation and allocation goals of virtualization. This type of virtualization can only run on a specific operating system (almost always Linux), and all virtual machines (called "Servers" in this technology) share one kernel.
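As a rough illustration of the container mechanisms mentioned above, the sketch below writes resource limits into a legacy cgroup-v1 hierarchy and attaches a process group; the controller paths and values are assumptions about a typical Linux setup (on cgroup v2 the file names differ), and this is not the actual mechanism used by OpenVZ or LXC.

    import os

    def confine(pid: int, name: str, mem_bytes: int, cpu_shares: int) -> None:
        """Cap memory and CPU share for a process group via cgroup-v1 controllers."""
        for ctrl, knob, value in (
            ("memory", "memory.limit_in_bytes", mem_bytes),
            ("cpu", "cpu.shares", cpu_shares),
        ):
            path = f"/sys/fs/cgroup/{ctrl}/{name}"
            os.makedirs(path, exist_ok=True)
            with open(f"{path}/{knob}", "w") as f:
                f.write(str(value))
            with open(f"{path}/tasks", "w") as f:   # attach the process to the controller
                f.write(str(pid))

    # e.g. confine(vrouter_pid, "vr1", 512 * 1024 * 1024, 256)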

2. Resource scalability

Scalability research in programmable virtual routers mainly considers how to support the different packet processing requirements of multiple new network protocols under resource constraints, and how to support as many virtual router instances running simultaneously as possible (that is, the parallel operation, trial, and deployment of as many network protocols and services as possible).

To improve scalability, researchers proposed a unified data plane abstraction, that is, a single abstraction that supports the special packet processing requirements of different new network protocols. On this basis, they further studied the scalability issues caused by the coexistence of multiple data planes and proposed methods such as data plane module sharing, forwarding table merging, and migration of virtual data planes between software and hardware.

1) Unified data plane abstraction

OpenFlow provides a unified data plane abstraction for TCP/IP protocol packet forwarding, which abstracts the data plane into three parts: rule extraction, search matching and action execution.

First, several important fields in the packet header (such as a 10-tuple) are extracted as the key for matching; then exact or wildcard matching is performed on the fields of the key; finally the packet is forwarded, dropped, modified, or otherwise handled according to the matching result. OpenFlow's unified data plane can meet the requirements of the various protocols in the TCP/IP family. However, this data plane abstraction does not consider support for new network protocols and has difficulty meeting their special packet processing requirements.
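A toy rendering of this three-stage abstraction (extract key fields, match exactly or with wildcards, execute an action); the field set, rules, and actions are illustrative and do not reproduce the actual OpenFlow 10-tuple encoding or message formats.

    WILDCARD = None   # a None field matches anything

    # Flow table: (match fields, action); fields are (src_ip, dst_ip, dst_port, proto).
    FLOW_TABLE = [
        ((WILDCARD, "10.0.0.5", 80, "tcp"), ("forward", "port2")),
        ((WILDCARD, WILDCARD, WILDCARD, WILDCARD), ("drop", None)),   # table-miss rule
    ]

    def extract_key(pkt: dict):
        """Rule extraction: pull the match fields out of the parsed packet."""
        return (pkt["src_ip"], pkt["dst_ip"], pkt["dst_port"], pkt["proto"])

    def process(pkt: dict):
        """Search and match in table order, then return the action to execute."""
        key = extract_key(pkt)
        for match, action in FLOW_TABLE:
            if all(m is WILDCARD or m == k for m, k in zip(match, key)):
                return action
        return ("drop", None)

    print(process({"src_ip": "1.2.3.4", "dst_ip": "10.0.0.5", "dst_port": 80, "proto": "tcp"}))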

To solve this problem, LabelCast proposes a universal data plane abstraction consisting of two parts: a Label table and a Cast table. LabelCast maps network addresses to fixed-length labels and realizes stateless fast label lookup and forwarding through the Label table; the Cast table describes server computing and storage resources and supports complex network protocol semantics and stateful deep packet processing services. LabelCast therefore supports not only the common rule-based forwarding methods of existing routers but also new network forwarding services, such as name-based forwarding in Named Data Networking (NDN). In other words, LabelCast uses a unified data plane abstraction to support the packet processing requirements of multiple different network protocols, improving the scalability of the data plane.

2) Data plane module sharing

By analyzing the characteristics of the basic functional modules of the virtual router data plane, packet processing modules can be divided into shareable modules and non-shareable modules. The former are stateless modules that need not maintain any state related to a virtual router instance and can therefore be shared between the data planes of different virtual routers, such as the header parsing, checksum calculation, and TTL-decrement units; the latter are stateful modules that must maintain information related to a specific virtual router instance, such as the forwarding table module.

Based on the above analysis, the researchers designed a virtual router data plane based on programmable hardware FPGA, which effectively improves the utilization of hardware resources by sharing modules between various data planes. The test results show that 8 virtual router data planes can be implemented on one NetFPGA data packet processing card, which saves about 75% to 79% of logic resources compared with the independent implementation of 8 virtual router data planes.

The above method improves the scalability of the virtual router data plane to a certain extent by sharing computing resources between the data planes of different virtual routers. However, since forwarding table resources are not shared among virtual router instances, the total forwarding table size grows linearly with the number of instances. With 8 virtual router instances, 87% of the storage resources in the FPGA chip are occupied, so limited physical storage further restricts any increase in the number of supported virtual router instances.

3) Forwarding table merging

The forwarding tables of different virtual router instances differ, so they cannot be directly shared and reused. However, by exploiting the similarity between the forwarding tables of different instances, multiple tables can be merged, which effectively reduces their storage resource requirements and further improves the scalability of the programmable virtual router.

Forwarding tables stored in different types of memory use very different data structures and merging algorithms. According to the type of physical memory used to store the forwarding table, merging methods can be divided into SRAM-based and TCAM-based approaches.

A commonly used data structure for SRAM-based (or DRAM-based) forwarding table lookup is the Trie tree, and the Trie Overlap method merges and compresses the forwarding tables of multiple virtual router instances into one Trie tree. By exploiting the similarity between the prefixes of different forwarding tables (whose corresponding Trie trees are also similar), multiple forwarding tables can be merged into a single Trie tree for storage. The following figure shows the Trie Overlap method merging two forwarding tables, FIB 1 and FIB 2, into one Trie tree.

Compared with storing multiple Trie trees separately, this method effectively reduces the number of nodes in the merged Trie tree and thereby saves storage space. However, each node of the merged Trie tree stores as many next-hop pointers as there are virtual router instances (two next-hop pointers per node in the figure), so as the number of virtual router instances grows, the storage occupied by each node becomes too large.
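The sketch below shows the core of this merging idea on a binary Trie tree: prefixes from both FIBs are inserted into a single trie, and each node carries one next-hop slot per virtual router instance; the prefixes and next hops are illustrative.

    class Node:
        __slots__ = ("children", "next_hops")
        def __init__(self, num_vrs):
            self.children = [None, None]           # binary trie: branch on bit 0 / 1
            self.next_hops = [None] * num_vrs      # one next-hop slot per VR instance

    def insert(root, vr_id, prefix_bits, next_hop, num_vrs):
        """Insert one prefix of a given virtual router into the shared (merged) trie."""
        node = root
        for b in prefix_bits:
            if node.children[b] is None:
                node.children[b] = Node(num_vrs)
            node = node.children[b]
        node.next_hops[vr_id] = next_hop

    def lookup(root, vr_id, addr_bits):
        """Longest-prefix match for one VR instance over the merged trie."""
        node, best = root, None
        for b in addr_bits:
            if node.next_hops[vr_id] is not None:
                best = node.next_hops[vr_id]
            node = node.children[b]
            if node is None:
                return best
        return node.next_hops[vr_id] or best

    # Merge FIB 1 and FIB 2 (prefixes given as bit lists) into one trie.
    root = Node(num_vrs=2)
    insert(root, 0, [1, 0], "A", 2)        # FIB 1: prefix 10*  -> next hop A
    insert(root, 1, [1, 0, 1], "B", 2)     # FIB 2: prefix 101* -> next hop B
    print(lookup(root, 0, [1, 0, 1, 1]))   # VR 0 sees A; VR 1 would see B for the same bits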

This growth in node size causes two problems. First, accessing each node requires multiple memory operations, which increases the total number of memory accesses per IP lookup and reduces lookup speed. Second, the Trie tree requires a large amount of storage and is difficult to fit into small, high-speed memory. To solve the problem of oversized nodes, the Trie Overlap method applies the leaf-pushing technique, pushing all next-hop pointers stored in intermediate nodes down to the leaf nodes so that intermediate nodes need not store next-hop information. Leaf pushing significantly reduces the size of Trie tree nodes in the Trie Overlap method, but it makes incremental updates difficult.

To avoid this problem, a prefix bitmap is introduced into the merged Trie tree node to separate the node from the next-hop information. Each time a virtual router instance is added, the node size grows by only 1 bit, which effectively improves the scalability of the node size; leaf pushing is avoided during Trie tree merging, so fast incremental updates remain possible while scalability improves.
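A minimal sketch of such a bitmap node, assuming the bitmap marks which virtual router instances have a prefix ending at the node while the next hops themselves live in a separate table; the field names are illustrative.

    class BitmapNode:
        """Merged-trie node with a prefix bitmap: bit v is set when VR v has a prefix
        terminating at this node, so adding a VR costs one bit instead of a pointer."""
        __slots__ = ("children", "bitmap", "nh_index")

        def __init__(self):
            self.children = [None, None]
            self.bitmap = 0          # one bit per virtual router instance
            self.nh_index = None     # index into a separate next-hop table

        def set_prefix(self, vr_id: int) -> None:
            self.bitmap |= 1 << vr_id

        def has_prefix(self, vr_id: int) -> bool:
            return bool((self.bitmap >> vr_id) & 1)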

When the prefixes of different FIBs are highly similar, the Trie Overlap method compresses well; however, the prefixes of the multiple forwarding tables in a virtual router are not necessarily very similar, and in that case the Trie Overlap method alone may not achieve significant compression. To convert the Trie trees of FIBs with dissimilar prefixes into similar Trie trees, some experts proposed the Trie Braiding mechanism: by adding a branch bit to each Trie tree node, the left and right children of every node are allowed to swap positions, so that dissimilar Trie trees can be adjusted into similar ones and then merged and compressed with the Trie Overlap method.

Studies found that because the FIB prefixes of VPN routers contain different common prefixes, it is difficult to make the corresponding Trie trees similar even with the Trie Braiding mechanism. A Multiroot method was therefore proposed that allows merging to start from an intermediate node of the Trie tree, so as to exploit the similarity of subtrees as much as possible.

The basic idea of TCAM-based FIB storage is to store the prefixes one by one in the TCAM and to store the next-hop information corresponding to each prefix in the attached SRAM. If the forwarding table of each virtual router instance is stored separately, the TCAM storage requirement grows linearly with the number of virtual router instances (FIBs), and it is difficult to support the FIBs of a large number of virtual router instances in the limited TCAM storage space.

To solve this problem, researchers store shared prefixes in TCAM by mining the similarity between different FIB prefixes, thereby reducing TCAM storage space requirements.

Multiple forwarding tables share the basic data structure of prefixes in TCAM, as shown in the figure below. 

If the same prefix appears in different FIBs (the next hops need not be the same), the entries can be combined into one prefix, with all of the prefix's next-hop information stored sequentially in the SRAM. For example, entries <P, NH1> and <P, NH2> belonging to two FIBs can be merged into a single entry <P, [NH1, NH2]> by sharing the prefix.
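A small software model of this shared-prefix layout: one prefix list shared by all instances, with per-instance next hops stored side by side; the prefixes and next hops are illustrative.

    def merge(fibs):
        """Merge per-VR FIBs {vr_id: {prefix: next_hop}} into (prefix, [per-VR next hops])."""
        prefixes = sorted({p for fib in fibs.values() for p in fib},
                          key=lambda p: -int(p.split("/")[1]))   # longest prefixes first, as in a TCAM
        return [(p, [fibs[vr].get(p) for vr in sorted(fibs)]) for p in prefixes]

    print(merge({0: {"192.168.0.0/24": "NH1", "10.0.0.0/8": "NH3"},
                 1: {"192.168.0.0/24": "NH2"}}))
    # -> [('192.168.0.0/24', ['NH1', 'NH2']), ('10.0.0.0/8', ['NH3', None])]
    # The None slot is the kind of "invalid next hop" that FIB Completion, described below, fills in.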

Based on this shared-prefix data structure, the forwarding tables of different virtual router instances can be merged together, significantly reducing TCAM storage requirements; however, this naive merging may lead to incorrect IP lookup results. To avoid this problem, two TCAM-based FIB merging methods have been proposed: FIB Completion and FIB Splitting.

The former merges all forwarding table prefixes into one TCAM and fills all invalid next hops with correct next hop results. This method has the best scalability, but the update overhead is large in the worst case. The latter merges the disjoint leaf prefixes in the forwarding table into one TCAM, and stores the remaining prefixes in another TCAM in an isolated manner.

This method also maintains good scalability and guarantees a small upper bound on update overhead. Compared with the traditional isolated storage method, the above two methods can effectively reduce the storage space requirements of multiple forwarding tables. When storing the forwarding tables of 14 core routers, the storage space requirements of TCAM are reduced by 92% and 82% respectively. 

4) Software and hardware virtual data plane migration

Efficient processing module sharing and forwarding table merging can improve the utilization of computing resources and storage resources respectively, thereby improving the scalability of the hardware virtual data plane. However, it is an objective fact that high-speed hardware resources are limited at present, and it is difficult to fundamentally solve the scalability problem only by improving the utilization rate of hardware resources. To support more virtual router instances, a software virtual data plane with good scalability must be used.

The researchers propose combining software virtual routers and hardware virtual routers to further improve scalability: a small number of high-throughput virtual router data planes are implemented in NetFPGA, while a larger number of low-throughput virtual router data planes are implemented in OpenVZ on servers. The hardware and software data planes support the same user-defined interface, ensuring that the location of a data plane (in FPGA hardware or in server software) is transparent to the user. A virtual router data plane can be dynamically migrated between software and hardware to meet dynamically changing application requirements.

Migrating a virtual router data plane from hardware to software is straightforward. First, a new OpenVZ virtual machine instance is created on the server, the Click software router is run inside it, and Click is configured so that its packet forwarding behaves exactly like the hardware data plane being migrated. Second, the forwarding table of the hardware data plane being migrated is copied in full to the newly running software router. Finally, the configuration of the packet distribution module is changed so that packets destined for the hardware data plane are redirected to the software router.
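The three steps just described, written out as a hedged orchestration sketch; every helper here (create_openvz_instance, start_click, copy_fib, redirect_traffic) is a hypothetical placeholder standing in for platform-specific operations, not an actual API of OpenVZ, Click, or NetFPGA.

    def create_openvz_instance(name):             # hypothetical placeholder
        print(f"create container {name}")
        return name

    def start_click(vm, config):                  # hypothetical placeholder
        print(f"start Click in {vm} with {config}")

    def copy_fib(src, dst, entries):              # hypothetical placeholder
        print(f"copy {len(entries)} FIB entries from {src} to {dst}")

    def redirect_traffic(old_plane, new_plane):   # hypothetical placeholder
        print(f"redirect traffic from {old_plane} to {new_plane}")

    def migrate_hw_to_sw(hw_plane_id, click_config, fib_entries):
        """Hardware-to-software migration of one virtual data plane (sketch of the 3 steps)."""
        vm = create_openvz_instance(f"sw-{hw_plane_id}")   # 1. new container running Click
        start_click(vm, click_config)
        copy_fib(hw_plane_id, vm, fib_entries)             # 2. copy the forwarding table
        redirect_traffic(hw_plane_id, vm)                  # 3. re-point the packet distributor
        return vm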

Migration from software to hardware is subject to some restrictions. If the forwarding mechanisms of the software and hardware virtual router instances are exactly the same, copying the forwarding table is enough to migrate the software instance to hardware; if the hardware data plane does not support the forwarding mechanism used by the software virtual router instance, simply copying the forwarding table cannot meet the needs of migration.

In that case the logical functions of the hardware data plane must be modified, and an online dynamic reconfiguration method has been proposed to solve this problem. When the hardware data plane must be significantly modified to satisfy the migration, all hardware virtual data planes are first migrated to software and the traffic destined for them is redirected to the corresponding software routers; the entire FPGA is then reprogrammed to implement the new packet forwarding mechanism in the hardware data plane; finally, the forwarding table of the software router instance to be migrated is copied to the corresponding hardware data plane and the packet distribution module is configured to redirect the traffic back to the hardware virtual router.

The combination of software and hardware virtual routers can well solve the scalability problem of virtual router instances. However, in the above method, there may be a large overhead when migrating software routers to hardware. When a software router is migrated to hardware, if all virtual data planes in the hardware cannot implement the forwarding mechanism required by the software router, all hardware virtual data planes must be migrated to software. The entire FPGA is then reprogrammed, and finally the corresponding virtual router instance is migrated back to hardware. Experimental results show that the above migration process takes more than 10 seconds when there is only one virtual data plane in the hardware, during which time the network data packets cannot be forwarded in the FPGA hardware.

To reduce the migration time, it has been proposed to place the forwarding logic and forwarding table of each hardware virtual data plane in its own partially reconfigurable region. When the forwarding mechanism of a hardware data plane must be modified, the entire FPGA does not need to be reprogrammed: using the partial dynamic reconfiguration feature of the FPGA, only the affected local region is reprogrammed, which significantly reduces the migration time. At the same time, packet forwarding in the rest of the FPGA hardware data plane need not stop during migration. Experimental results show that this method can make migration nearly 20 times faster than reprogramming the entire FPGA.

The following table summarizes research progress on the router virtualization problem:

Router virtualization raises problems of resource allocation and isolation and of resource scalability. The resource allocation and isolation mechanism is usually implemented with mature server virtualization technology, and there is a tension between isolation and forwarding performance; the resource scalability problem stems mainly from the contradiction between scarce computing and storage resources and the constantly growing application requirements of virtual router instances. Researchers proposed a unified data plane abstraction to support packet processing for multiple different network protocols and to broaden the generality of the data plane.

When multiple data planes coexist, sharing and multiplexing the packet processing modules of multiple virtual routers and merging their forwarding tables effectively improves hardware resource utilization and thus the scalability of hardware virtual data planes. However, even with high hardware utilization, the number of hardware virtual data planes that can be supported remains relatively small.

To further improve scalability, researchers proposed combining software and hardware virtual routers, implementing a small number of high-performance router instances in hardware and a large number of low-throughput, highly flexible instances in software. Dynamic migration of router instances between software and hardware is supported, and partial dynamic reconfiguration is used to reduce migration overhead, so that dynamically changing application requirements can be met.

3. Programmability of virtual routers

Software systems and programmable hardware systems such as FPGAs are inherently programmable and can meet custom packet processing needs. The goal of studying programmability in virtual routers is therefore to propose simpler programming mechanisms that reduce the difficulty of user programming, and faster programming mechanisms that shorten the time needed to design, test, and deploy prototype systems.

To achieve this goal, researchers have designed open programmable interfaces, modular pipeline architectures, flexible programmable modules, and easy-to-use network packet processing compilers.

1. Open programmable interface

The OpenFlow protocol defines an open programmable interface that allows users to customize packet forwarding behavior through software, regardless of the specific implementation of the underlying switch. The programmable interface consists of a set of predefined messages, such as controller-to-switch messages, asynchronous messages, and symmetric messages, together with their data structures; through these messages, OpenFlow switches and controllers exchange instructions and status information.
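As a rough illustration of the kind of abstraction such an interface exposes, the sketch below models a flow table entry with match fields and actions in the spirit of OpenFlow; the class and field names are illustrative simplifications, not the actual OpenFlow message definitions:

# Illustrative sketch of an OpenFlow-style match/action flow entry.
# Field names are simplified; they are not the OpenFlow wire format.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Match:
    in_port: Optional[int] = None      # None acts as a wildcard
    eth_src: Optional[str] = None
    eth_dst: Optional[str] = None
    ip_src: Optional[str] = None
    ip_dst: Optional[str] = None
    tcp_dst: Optional[int] = None

@dataclass
class FlowEntry:
    match: Match
    actions: List[str] = field(default_factory=list)  # e.g. ["output:2"]
    priority: int = 0

def matches(entry: FlowEntry, pkt: dict) -> bool:
    # A field set to None is treated as a wildcard.
    for name, wanted in vars(entry.match).items():
        if wanted is not None and pkt.get(name) != wanted:
            return False
    return True

# A controller could install an entry like this on a switch:
table = [FlowEntry(Match(ip_dst="10.0.0.1", tcp_dst=80), ["output:2"], priority=10)]
pkt = {"in_port": 1, "ip_dst": "10.0.0.1", "tcp_dst": 80}
best = max((e for e in table if matches(e, pkt)), key=lambda e: e.priority, default=None)
print(best.actions if best else ["drop"])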

This programmable interface supports lookup and forwarding for existing TCP/IP network protocols well, but it does not support complex stateful deep packet processing. LabelCast therefore defines basic service primitives (buffer, thread, and registration primitives) as the programming interface for developing user-defined services.

The extended LabelCast resource container provides the user programming environment and the various resources required for service operation. Users use the buffer and thread primitives to manage storage and computing resources respectively, and use the registration primitive (that is, an open service registration interface) to dynamically load custom services. Because the services defined by LabelCast support protocol semantics and state-dependent deep packet processing, this programming interface can easily be extended to support new network protocols.

2. Modular pipeline architecture

The modular pipeline architecture mainly addresses the architectural problems of programmable systems. The idea of modular programming is to turn the basic units of packet processing in a router into modules and to realize a custom packet processing pipeline by selecting and combining these basic modules. Based on this idea, the modular router data plane Click was proposed, which supports user-defined packet lookup and forwarding.
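The sketch below illustrates the composition idea behind a Click-style modular data plane: small elements, each doing one packet-processing step, chained into a pipeline. The element classes here are illustrative stand-ins, not Click's actual element library:

# Illustrative sketch of a Click-style modular pipeline: each element
# performs one processing step and hands the packet to the next element.

class Element:
    def __init__(self):
        self.next = None
    def push(self, pkt):
        raise NotImplementedError

class CheckIPHeader(Element):
    def push(self, pkt):
        if pkt.get("ttl", 0) > 0:          # drop packets with an expired TTL
            self.next.push(pkt)

class DecTTL(Element):
    def push(self, pkt):
        pkt["ttl"] -= 1
        self.next.push(pkt)

class LookupIPRoute(Element):
    def __init__(self, routes):
        super().__init__()
        self.routes = routes               # destination -> output port (simplified)
    def push(self, pkt):
        pkt["out_port"] = self.routes.get(pkt["dst"], "default")
        self.next.push(pkt)

class ToDevice(Element):
    def push(self, pkt):
        print(f"sending packet for {pkt['dst']} out of port {pkt['out_port']}")

def chain(*elements):
    for a, b in zip(elements, elements[1:]):
        a.next = b
    return elements[0]

pipeline = chain(CheckIPHeader(), DecTTL(),
                 LookupIPRoute({"10.0.0.1": "eth1"}), ToDevice())
pipeline.push({"dst": "10.0.0.1", "ttl": 64})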

In addition, a modular router control plane has also been proposed: the eXtensible Open Router Platform (XORP) supports user-defined route computation and routing management functions. A modular router can therefore be built from Click and XORP. The current mainstream Linux operating system can also provide the same router data plane functions as Click with appropriate adjustments and configuration.

However, the Click data plane does not consider support for virtualization, and its pure software implementation also leads to poor packet lookup and forwarding performance. To solve these problems, the modular virtual router SwitchBlade was implemented on NetFPGA hardware. In the data plane, SwitchBlade adopts a modular pipeline architecture and implements multiple isolated virtual data planes on the FPGA, achieving both good programmability and high forwarding performance. In the control plane, based on the virtualization environment provided by OpenVZ, Quagga runs in each virtual machine as the routing software.

The strength of programmability in the above modular router structures depends on whether the pre-implemented basic modules (basic processing units) are functionally complete and whether they can be freely selected and combined. However, the pre-implemented basic modules in any system are limited, and there are two basic ways to further improve programmability: one is to implement flexible programmable modules, enhancing the flexibility of each individual basic module; the other is to reduce the development difficulty of basic modules so that users can quickly implement custom modules.

3. Flexible programmable modules

Enhancing the flexibility of each basic module on top of a modular pipeline architecture can further improve programmability. If a packet header parsing module implements only basic IPv4 and IPv6 protocol parsing and destination IP address extraction, its flexibility is very limited and it can only serve IP lookup; if the module supports multi-field keyword extraction with custom offsets and custom lengths, it can supply header extraction for a variety of lookup operations such as MAC address lookup, IP lookup, and five-tuple matching.
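A minimal sketch of such a configurable extractor is shown below: each keyword field is described by a byte offset and length into the packet, so the same module can feed MAC lookup, IP lookup, or five-tuple matching. The offsets in the example are the standard Ethernet/IPv4/TCP ones, but the extractor itself is an illustrative assumption, not any specific system's module:

# Sketch of a keyword extractor driven by (offset, length) field descriptors,
# so one module can serve different lookup operations. Illustrative only.

def extract_key(packet: bytes, fields):
    # Concatenate the configured (offset, length) slices into one lookup key.
    return b"".join(packet[off:off + length] for off, length in fields)

# Field layouts for an untagged Ethernet/IPv4/TCP packet (standard offsets):
MAC_DST    = [(0, 6)]                         # destination MAC
IPV4_DST   = [(30, 4)]                        # destination IPv4 address
FIVE_TUPLE = [(23, 1),                        # IP protocol
              (26, 4), (30, 4),               # source / destination IP
              (34, 2), (36, 2)]               # source / destination port

pkt = bytes(range(64))                        # dummy 64-byte packet
print(extract_key(pkt, MAC_DST).hex())
print(extract_key(pkt, IPV4_DST).hex())
print(extract_key(pkt, FIVE_TUPLE).hex())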

For the traditional TCP/IP protocol suite, OpenFlow proposes very flexible programmable modules. It abstracts packet processing into three modules: header parsing, keyword matching, and action execution. The header parsing module extracts a 10-tuple from the packet as the lookup keyword; the keyword matching module supports both wildcard matching and exact matching on the 10-tuple; and the action execution module defines and implements the packet actions commonly found in routers. These flexible programmable modules satisfy most of the packet processing requirements common in current TCP/IP routers.

However, OpenFlow's matching method, based on a predefined 10-tuple, still has programming limitations. First, keyword extraction is restricted to the predefined 10-tuple: if a user's new protocol uses other packet header fields (outside the range defined by the 10-tuple), OpenFlow cannot support that protocol well. To compensate for this, the keywords defined by OpenFlow have grown longer and longer, extending from the initial 10-tuple to a 36-tuple. Second, OpenFlow's tuple definitions are based on the standard TCP/IP protocols, so its programmable range is confined to the TCP/IP family. For non-IP lookups, such as NDN lookups, OpenFlow's predefined tuples cannot meet the requirements.

The Configurable pAcket Forwarding Engine (CAFE) and SwitchBlade both designed an arbitrary-bit extractor in their packet header parsing modules. The ultimate goal is to support free combination of arbitrary bit fields from the packet header, thereby supporting user-defined lookup keyword extraction and new protocol types.

The more flexible this header parsing module is, the better its programmability and its support for new protocols, but also the higher its complexity, and the complex extraction process brings performance challenges to header parsing. To realize a high-performance and flexible header parsing module, a 400 Gb/s header parsing pipeline was designed on FPGA: the parsing of each header is mapped to one stage of the pipeline, and within each stage flexible custom multi-field extraction is realized through user-configured microcode.

4. Packet Processing Compiler

If the packet processing requirements of a user's new protocol cannot be met through free combination of basic modules and flexible configuration of programmable modules, the user needs to write the corresponding packet processing unit and then insert it into the modular pipeline architecture as a plug-in or use it to replace existing modules.

For software packet processing modules, a unified and friendly programming interface makes user programming convenient. For hardware programmable modules, however, even with well-defined module interfaces, user programming is much harder, because it involves learning and using hardware description languages such as VHDL (VHSIC Hardware Description Language) and Verilog and understanding low-level hardware details.

To reduce the programming difficulty for users, especially for hardware modules, researchers have designed network packet processing compilers that allow users to write packet processing functions in familiar high-level languages, such as C/C++ or a scripting language, which the compiler then converts into the corresponding hardware module implementation.

For the FPGA-based modular virtual router data plane, a Click-style packet processing compiler, ReClick, was designed to reduce the difficulty of hardware module programming. Users familiar with Click's programming style will find it easy to use ReClick for hardware packet processing.

A Packet Parsing Language (PPL) was also designed for the header parsing pipeline mentioned above; it defines the format and processing rules of each header in a C++-like style, making it convenient for users to define custom processing for new protocol headers.

The following table shows the progress of research on programmability issues in virtual routers:

Researchers have designed open programmable interfaces to simplify user programming, so that users do not need to care about the specific implementation of the underlying layer. They have also proposed modular packet processing architectures in which most of the modules in common packet processing flows are designed and implemented in advance, together with a unified and friendly interface; users can reuse and combine existing modules to meet most common packet processing needs.

Users with special needs only have to focus on the packet processing functions specific to their new protocol and then plug the customized module into the packet processing pipeline architecture. The more flexible each important module in the modular architecture is, especially the header parsing module, the stronger the overall architecture's adaptability to new protocols; at the same time, this reduces the workload of developing new systems and avoids a large amount of redevelopment.

When user development cannot be avoided, network packet processing compilers provide users with a familiar high-level programming language interface, reducing the difficulty and threshold of user programming and enabling rapid programming of user-defined protocols.

4. Forwarding performance

In the programmable virtual router, the overhead brought by I/O virtualization, the efficiency of the data packet sending and receiving mechanism, and the performance of the data packet search and matching operation become important factors that limit the forwarding performance of the virtual router. In order to improve forwarding performance, a large number of virtual I/O acceleration technologies have been proposed by the industry and academia to optimize the driver and kernel-level packet sending and receiving mechanism, and to speed up the performance of search and matching operations.

1. I/O virtualization acceleration technology

In a programmable virtual router, each virtual router instance has its own virtual network interface. The virtual interfaces of different router instances are logically isolated from each other while physically sharing the same physical network interface, as shown in the following figure.

To realize the abstraction of virtual network interfaces, the virtual machine monitor is responsible for distributing packets between the physical network interface and each virtual network interface and participates in every I/O operation of every virtual router instance; as a result, packet distribution in the virtual machine monitor becomes the bottleneck introduced by I/O virtualization.

Researchers have designed Virtio, a standardized paravirtualized I/O interface for the various virtualization platforms on the Linux operating system, which adopts a number of optimizations to improve the I/O performance of the guest operating system. KVM virtualization currently supports Virtio.

Although paravirtualized I/O improves the I/O performance of the guest operating system, problems such as excessive context-switching overhead remain. The key to solving them is to remove the hypervisor from the I/O path and allow the guest operating system to access the physical device directly. Direct access to physical devices breaks down into two issues: DMA to the guest operating system's device buffers and interrupt remapping.

Intel VT-d technology solves these two problems and provides a direct path for data from the underlying hardware to the guest operating system. SR-IOV provides a way to virtualize a single PCI device into multiple devices, giving each virtual machine its own memory space, interrupts, and DMA streams.

Its architectural design allows one I/O device to support multiple virtual functions, thus providing a way to share an I/O device's physical functions and I/O ports without software emulation. In addition, Xen developers developed PCI passthrough technology to support direct access to PCI devices, which enables the guest operating system to bypass the virtual machine monitor when accessing the device and thereby obtain I/O performance close to that of the native Linux environment.

2. Optimization of packet sending and receiving mechanism

The I/O virtualization acceleration technology reduces the additional I/O overhead introduced by virtualization. On this basis, the optimization of the data packet sending and receiving mechanism can further improve the data packet forwarding performance.

The traditional interrupt-driven I/O method has the advantages of fast response and low latency, but when the packet rate is very high, frequent interrupt handling significantly increases the software burden. RouteBricks replaces the traditional interrupt method with polling, which greatly improves the 64-byte small-packet forwarding performance of general-purpose hardware; each bus transaction transfers the addresses of multiple packets in a batch, making full use of the Peripheral Component Interconnect Express (PCIe) bus bandwidth; and the multi-queue feature of the network card is used to bind different queues to different cores, greatly improving the parallelism of packet processing. Using these three methods together, the 64-byte packet forwarding performance of a single server reaches 9.7 Gb/s.
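The sketch below illustrates the combination of polling, batching, and per-core queues described above; the queue objects and thread-per-core arrangement are illustrative placeholders, not a real NIC driver API:

# Illustrative sketch of poll + batch + per-core queue packet reception.
import queue
import threading
import time

BATCH_SIZE = 32

def rx_worker(core_id, rx_queue):
    # Poll one NIC queue (conceptually pinned to one core), in batches,
    # instead of taking an interrupt per packet.
    while True:
        batch = []
        while len(batch) < BATCH_SIZE:
            try:
                batch.append(rx_queue.get_nowait())
            except queue.Empty:
                break
        if not batch:
            time.sleep(0.001)             # nothing arrived; keep polling
            continue
        process_batch(core_id, batch)     # amortize per-packet costs over the batch

def process_batch(core_id, batch):
    print(f"core {core_id}: forwarded {len(batch)} packets")

# One queue per core, as with a multi-queue NIC.
queues = [queue.Queue() for _ in range(2)]
for core_id, q in enumerate(queues):
    threading.Thread(target=rx_worker, args=(core_id, q), daemon=True).start()

for i in range(64):                       # simulate arriving packets
    queues[i % len(queues)].put(f"pkt-{i}")
time.sleep(0.2)                           # let the workers drain the queues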

PacketShader optimizes the Linux kernel protocol stack, reducing the kernel's sk_buff structure from 208 bytes to 8 bytes, and uses batch processing to greatly improve packet processing performance.

Experiments show that this optimization alone raises packet throughput to 10.5 Gb/s, a 13.5-fold improvement.

PacketShader also exploits the multi-core and multi-queue features of the hardware and takes the non-uniform memory access (NUMA) architecture into account, ultimately reaching a forwarding performance of 40 Gb/s.

3. Search algorithm optimization

Packet lookup operations have always been a major performance bottleneck in software virtual router designs. The traditional IP lookup algorithm can also be applied to the programmable virtual router to realize fast IP lookup. However, the programmable virtual router not only needs to support fast IP lookup, but also needs to support non-IP protocols, such as support for content name-based lookup in the NDN protocol.

The content names in NDN have a hierarchical structure similar to an HTTP Uniform Resource Locator (URL). Compared with traditional IP addresses, NDN names are variable in length and have no upper bound, which makes longest name-prefix matching an important challenge in NDN lookup. To address this, an efficient name-substring encoding method has been proposed, together with an improved state transition array to speed up longest name-prefix matching.
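As a rough illustration of longest name-prefix matching over hierarchical names, the sketch below uses a simple component-level trie; the cited work uses substring encoding and state transition arrays, so this is only a conceptual stand-in, not that algorithm:

# Conceptual sketch of longest name-prefix matching for hierarchical
# NDN-style names, using a component-level trie. Illustrative only.

class NamePrefixTrie:
    def __init__(self):
        self.root = {}          # component -> child dict
        self.faces = {}         # prefix tuple -> outgoing face

    def insert(self, prefix: str, face: str):
        node, comps = self.root, tuple(prefix.strip("/").split("/"))
        for comp in comps:
            node = node.setdefault(comp, {})
        self.faces[comps] = face

    def longest_prefix_match(self, name: str):
        node, matched, best = self.root, [], None
        for comp in name.strip("/").split("/"):
            if comp not in node:
                break
            node = node[comp]
            matched.append(comp)
            if tuple(matched) in self.faces:
                best = self.faces[tuple(matched)]
        return best

fib = NamePrefixTrie()
fib.insert("/video", "face1")
fib.insert("/video/movies/hd", "face2")
print(fib.longest_prefix_match("/video/movies/hd/matrix/seg1"))  # face2
print(fib.longest_prefix_match("/video/news/today"))             # face1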

Researchers evaluated the prototype system of Content-Centric Networking (CCN), analyzed the performance bottlenecks of its forwarding plane, and identified three main problems affecting NDN forwarding performance: exact string matching that supports fast updates, longest prefix matching over variable-length, unbounded names, and maintenance of large-scale flow tables. To address these issues, five design principles were proposed to guide the design of an NDN forwarding engine with scalable performance.

The following table shows the research progress on forwarding performance issues in programmable virtual routers:

Among them, the performance overhead of I/O virtualization is a special problem brought about by the introduction of virtualization technology, which has received extensive attention from academia and industry. Some virtual I/O performance acceleration technologies, such as Intel VT-d technology and SR-IOV technology, have been widely used in servers and network card products.

Traditional packet sending and receiving mechanisms and IP lookup operations remain performance bottlenecks in virtualized environments and continue to receive attention from the academic community. Routing lookup in non-IP environments, especially under the NDN architecture, has gradually attracted attention as research on new architectures deepens, and it has become an important new research direction.

3. Virtual Router Redundancy Protocol

The Virtual Router Redundancy Protocol (VRRP) is a protocol for implementing router redundancy. It was first specified in RFC 2338 and was formulated with reference to Cisco's proprietary Hot Standby Router Protocol (HSRP).

VRRP simplifies the mechanism proposed by HSRP and minimizes the extra load that the redundancy function places on the network. The latest VRRP protocol is defined in RFC 3768; the original RFC 2338 was obsoleted after the release of the new Request for Comments (RFC) document, and the new version further simplifies some functions.

1. Introduction to VRRP protocol

In a network based on the TCP/IP protocol suite, routes must be specified to ensure communication between devices that are not directly connected. There are currently two common ways to specify routes: one is dynamic learning through routing protocols such as the Routing Information Protocol (RIP) and Open Shortest Path First (OSPF); the other is static configuration. Running a dynamic routing protocol on every terminal is unrealistic, and most client operating system platforms do not support dynamic routing protocols; even when they do, they are limited by administrative overhead, convergence, security, and other issues.

Therefore, terminal IP devices are generally configured with static routes, usually by designating one or more default gateways for the terminal device. Static routing simplifies network management and reduces the communication overhead of terminal devices, but it has a drawback: if the router serving as the default gateway fails, all communication that uses this gateway as the next hop is interrupted. Even if multiple default gateways are configured, switching to a new gateway is not possible without rebooting the end device. Using VRRP avoids this defect of statically designated gateways.

VRRP is a fault-tolerant protocol. Usually, all hosts in a network set a default route, as shown in the figure below.

In this way, the packets sent by the host whose destination address is not in the local network segment will be sent to the router RA through the default route, thus realizing the communication between the host and the external network. When the router RA is damaged, all hosts in this network segment that use RA as the next hop of the default route will interrupt the communication with the outside. VRRP was proposed to solve the above problems, and it provides a design for a local area network (such as Ethernet) with multicast or broadcast capabilities.

VRRP is an election protocol that dynamically assigns the responsibility for a virtual router to one of the VRRP routers on the LAN. The VRRP router that controls the virtual router's IP addresses is called the master router, and it is responsible for forwarding packets sent to these virtual IP addresses. The election process provides a dynamic failover mechanism, allowing the virtual router's IP address to serve as the default first-hop router for end hosts even if the master router becomes unavailable. The advantage of VRRP is higher availability of the default path without configuring dynamic routing or router discovery protocols on every end host. VRRP packets are encapsulated in IP packets for transmission.

VRRP is sometimes called "virtual routing backup protocol", and its network structure is shown in the figure below.

There are two routers, RA and RB, and VRRP is designed to provide a virtual router for the LAN. The two routers are associated with a virtual router, and this virtual router acts as a PC gateway to communicate with the outside world.

These two routers form a VRRP group, one of which is the master router. The rest of the routers are backup routers. Multiple routers in the VRRP group are mapped to a virtual router. VRRP ensures that there is only one router sending packets on behalf of the virtual router at the same time, and the host sends data packets to the virtual router.

The master router and the backup router have the following meanings.

  1. The master router is the router that actually forwards packets in the VRRP group.
  2. A backup router is a router in the listening state in the VRRP group.

Using VRRP, you can set a virtual IP address as the default router manually or through Dynamic Host Configuration Protocol (DHCP). During configuration, the virtual IP address and virtual ID in router RA must be set the same as those in router RB, and the virtual IP address is shared between routers.

If the master router becomes unavailable, the virtual IP address is mapped to the IP address of a backup router (which becomes the new master router); a router in the standby state is elected to replace the original master. VRRP makes the hosts on the LAN appear to use only a single router and maintains routing connectivity even when the currently used first-hop router fails. VRRP provides network redundancy at the OSI network layer, can also be used to balance packet traffic, and is defined for both IPv4 and IPv6.

2. Terms related to VRRP protocol

The VRRP protocol involves many concepts and terms. In addition, when configuring the VRRP protocol, you must specify the parameters related to the protocol operation. These are the basis for understanding the VRRP protocol.

1. The concept of virtualization in VRRP

1) Virtual router

A VRRP router refers to a router running VRRP, that is, a physical entity; a virtual router is created by the VRRP protocol and is a logical concept.

To clients on the LAN, the virtual router appears to physically exist. A group of VRRP routers work together to form a virtual router, which presents itself as a logical router with a single fixed IP address and MAC address. There can be multiple virtual router groups, and a real router can be a member of one or more of them.

2) Virtual IP address

The virtual router is configured with an IP address, which may be the same as or different from the interface addresses of the master and backup routers. The virtual router's IP address is in fact the users' default gateway. When the master router fails, the backup router takes over the routing task and the virtual IP address remains unchanged. All hosts on the LAN must point their gateway to the virtual IP address, and VRRP advertisement packets carry the virtual IP address.

3) Virtual Router ID (Virtual Router ID, VRID)

The VRID identifies a virtual router group; routers in the group are referred to by this identifier. VRRP stipulates that the VRID is a number in the range 1 to 255. It has no default value, so if it is not configured at deployment time, VRRP cannot start.

4) Virtual instance (Instance)

A VRRP instance is a program running on an actual router; its state includes whether the router running it is currently the master or a backup, and the VRID of the group to which the router belongs.

2. Parameter definition

1) Priority

The priority is the number used to decide which router becomes the master: the larger the number, the more likely the router is to become the master. The master router must carry its own priority in its advertisements, and the priority can be modified manually.

2) Advertisement Interval (Advertisement_Interval)

In a router group, the master router periodically sends advertisements to the other routers to announce its status so that they remain in the backup state. An advertisement is sent every Advertisement_Interval; the value of this interval, measured in seconds, is also carried in the advertisement.

3) Master router timeout (Master_Down_Interval)

If a backup router does not receive an advertisement from the master router within Master_Down_Interval, it considers the master to have failed, and a new round of master election takes place.

4) Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time

No configuration is required.

5) Skew time (Skew_Time)

To prevent all backup routers from competing for the master role at the same time when the master fails, a Skew_Time is defined: Skew_Time = (256 - priority)/256. Thus the higher a router's priority, the sooner it will attempt to become the master.
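A small worked computation of these two timers, following the formulas just given (a minimal sketch; the one-second advertisement interval matches the default mentioned in the packet format later in this section):

# Worked example of the VRRP timers defined above:
#   Skew_Time            = (256 - priority) / 256
#   Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time

def skew_time(priority: int) -> float:
    return (256 - priority) / 256.0

def master_down_interval(advertisement_interval: float, priority: int) -> float:
    return 3 * advertisement_interval + skew_time(priority)

for prio in (254, 100, 50):
    print(prio, round(master_down_interval(1.0, prio), 3))
# Higher priority -> smaller skew time -> the backup with the highest
# priority times out first and takes over as master.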

3. Other concepts

1) The owner of the virtual IP address (IP address owner)

A router interface can be configured with multiple addresses (for example on sub-interfaces), but there can be only one primary IP address. When the primary IP address of a router interface is the same as the virtual router's IP address, that real router is called the virtual IP address owner.

2) Preemption mode (Preempt_Mode)

If preemption is enabled (true) and a newly joined router advertises a higher priority than the current master, it replaces the current master; when the original master receives an advertisement with a priority higher than its own, it switches to the backup role. If preemption is disabled (false), the master does not act on advertisements with a higher priority than its own and simply discards them. By default, preemption mode is enabled.

3) Authentication method

Specifies the authentication method the router uses when sending packets; authentication can be enabled or disabled. In addition, the VRID, virtual IP address, Advertisement_Interval, and authentication method must be consistent within the same router group; otherwise the backup group will be thrown into confusion, and multiple master routers may even appear.

3. VRRP function points

To eliminate the network outages caused by a single point of failure of the default router in a static routing environment, to ensure that internal and external communication is unaffected when a failover switches device roles, and to avoid modifying the network parameters of the internal network, VRRP has the following functional requirements.

  1. Redundant backup, that is, in a backup group, when the main router fails for some reason, the backup router in the group can quickly conduct a new round of elections to generate a new main router, and restore the communication between the LAN and the outside world in time.
  2. Allows hosts in the LAN to specify a virtual IP address as a gateway to communicate with the outside world, and the virtual IP address is shared by multiple routers. When the main router is switched, the host can continue to communicate with the outside world without configuring a new gateway, that is, the switch is transparent to the host.
  3. Like the physical router, the virtual router should present a unified MAC address and virtual IP address to the outside world, and can normally respond to ARP requests for the virtual IP address and forward data packets.
  4. The network interruption time caused by master router switchover should be configurable, and recovery can be as fast as 3 to 4 seconds.
  5. Minimize unnecessary service disruption: after the master router is elected, apart from the VRRP advertisements periodically sent by the master, there is no unnecessary communication between the master and the backup routers. A backup router with a lower or equal priority cannot trigger a state transition, so the master can continue to work stably and protocol operation does not consume the limited transmission bandwidth with heavy traffic.
  6. Provide users with a convenient and feasible command line interface and management interface, and provide flexible, easy-to-operate and simple configuration implementations. The software should be able to analyze and process user commands, and output error messages to users when errors are found.
  7. The configurable parameters of protocol operation include standby group ID, virtual IP address, priority, preemption mode, and time interval of VRRP broadcast packets, etc.
  8. In order to maintain the stability and order of the network, the operation of the protocol must ensure that only one main router exists in a standby group in a stable state.
  9. Multiple virtual routers are allowed to be configured in one real router, that is, redundancy and load balancing are realized at the same time. But it must be ensured that a standby group can only have a unique ID and IP address, and a standby group must contain at least one virtual router. The virtual IP addresses of different backup groups cannot overlap, and a backup group can only have one IP address owner.
  10. During protocol message exchange, the receiver of the message can verify the legitimacy of the message, update its status in time for the correct message, and report error information and discard the illegal message.
  11. The operation of the virtual router should try not to affect the normal function of the physical router.
  12. Provides the interface monitoring function, which can update its own priority in time according to the status change of the monitored interface.

The working mechanism of VRRP has many similarities with Cisco's HSRP. The main difference is that in HSRP a separate IP address must be configured as the virtual router's external address, and this address cannot be the interface address of any member of the group.

Using VRRP does not require modifying the existing network structure, which protects existing investment to the greatest extent; it requires minimal management cost while greatly improving network performance, and thus has great application value.

Compared with HSRP, VRRP has the following features.

  1. Functionally, VRRP is very similar to HSRP, but one of VRRP's main security advantages is that it allows an authentication mechanism to be established among the devices participating in a VRRP group. Unlike HSRP, which requires that the virtual router's address not be the IP address of any of the member routers, VRRP allows this. To ensure that end hosts do not have to relearn the MAC address after a failure, the MAC address to be used is specified as 00-00-5E-00-01-VRID, where VRID is the ID of the virtual router (equivalent to an HSRP group identifier).
  2. VRRP's state machine is simpler than HSRP's. HSRP has six states, namely Initial, Learn, Listen, Speak, Standby, and Active, and eight events; VRRP has only three states, namely Initialize, Master, and Backup, and five events.
  3. HSRP has three kinds of messages, Hello, Resign, and Coup, and three states in which messages can be sent; VRRP has a single advertisement message, sent periodically by the master router to announce its existence. The message carries the parameters of the virtual router and is used in the election of the master router.
  4. HSRP carries its packets over UDP (it sends hello messages to the multicast address 224.0.0.2 using UDP port 1985), whereas VRRP packets are carried directly in IP datagrams.
  5. VRRP defines three main authentication methods: no authentication, simple plaintext passwords, and strong authentication based on the IP Authentication Header (AH) protocol using a Hash-based Message Authentication Code (HMAC) with MD5. MD5 HMAC uses a shared secret key to generate the hash value: the router sending a VRRP packet computes the MD5 hash and places it in the advertisement; the receiver uses the same key to recompute the hash over the packet content and header. If the results match, the message really comes from a trusted host; if not, it must be discarded. This prevents attackers who gain access to the LAN from sending advertisements that could affect the election process or otherwise disrupt the network.
  6. VRRP includes a mechanism to protect VRRP packets from being injected from a remote network (the TTL is set to 255 and checked on receipt), which rules out most attacks other than local ones; HSRP, on the other hand, uses a TTL of 1 in its messages.
  7. The master down interval of VRRP is 3 × advertisement interval + skew time.

4. Working principle of VRRP

1. Working process

The VRRP protocol virtualizes two or more routers on a LAN into one device and presents a single virtual router IP address to the outside. Within the router group, the router that actually owns this external IP address, if it works normally, is the master router, or a master is chosen by election. The master router implements the network functions for the virtual router's IP address, such as answering ARP requests, responding to Internet Control Message Protocol (ICMP) messages, and forwarding data. In addition, the master broadcasts an advertisement about itself on the LAN every Advertisement_Interval. The other routers are in the backup state and perform no external network functions other than receiving VRRP status advertisements from the master. When the master router fails (goes down), a backup router takes over the network functions of the original master, so that regardless of how the switchover happens, the terminal devices always see a single consistent IP address and MAC address, which minimizes the impact of the switchover on the terminals.

When configuring the VRRP protocol, you need to configure a virtual router ID (VRID) and a priority value on each router. The priority determines whether a router can become the master when the group is first established; the VRID identifies the group and ranges from 1 to 255. Externally, the group presents a single virtual MAC address in the format 00-00-5E-00-01-[VRID].
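A minimal sketch of how the virtual MAC address can be derived from the VRID, following the 00-00-5E-00-01-[VRID] format just described:

# Build the VRRP virtual MAC address 00-00-5E-00-01-[VRID] from a VRID.

def vrrp_virtual_mac(vrid: int) -> str:
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    return "00-00-5E-00-01-{:02X}".format(vrid)

print(vrrp_virtual_mac(1))    # 00-00-5E-00-01-01
print(vrrp_virtual_mac(49))   # 00-00-5E-00-01-31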

VRID is the VRRP virtual group number, written as two hexadecimal digits. Routers with the same VRID belong to the same group. Within a VRRP router, the {interface, VRID} 2-tuple uniquely identifies a VRRP virtual router. Within the same backup group, the virtual MAC address does not change when the master router changes, and the virtual router answers ARP requests with the virtual MAC address, so that switchover between the master and backup routers is transparent and the single point of failure is eliminated.

The VRRP protocol transmits its packets using multicast, and there is only one type of control packet, the VRRP advertisement. It is encapsulated in IP multicast datagrams with group address 224.0.0.18, and its propagation is limited to a single LAN. This ensures that VRRP can be used on different types of LANs and that VRIDs can be reused in different networks. VRRP packets are sent with the special virtual MAC address rather than the MAC address of the router's own network card. To reduce the network bandwidth consumed while VRRP is running, only the master router periodically sends VRRP advertisements.

The VRRP advertisement serves two purposes: it is used in the election of the master router, and it announces the master's status to the other members of the backup group. Once a router has become the master, it broadcasts its priority, IP address, and other information; backup routers only receive VRRP data and do not send it. If the master advertises a priority of 0, or if no advertisement has been sent when Master_Down_Interval expires, the master is considered to have failed, and the backup routers declare themselves master and send advertisements to re-elect a master.

When there are multiple backup routers in the group, several masters may appear temporarily. Each of them then compares the priority in received VRRP advertisements with its own local priority: if the local priority is lower, it changes back to the backup state; otherwise it keeps its state. Through this process, the router with the highest priority is elected as the new master, completing VRRP's backup function.

To secure the VRRP protocol, two authentication measures are provided: plaintext authentication and IP header authentication. Plaintext authentication requires that a router joining a VRRP group supply the same VRID and the same plaintext password; it can prevent configuration errors within the LAN but cannot prevent the password from being obtained by network sniffing. IP header authentication provides higher security and can prevent attacks such as packet replay and modification.

2. Working principle of VRRP

The following figure shows the implementation principle of VRRP:

VRRP organizes a group of routers in a LAN (including a master router and several backup routers) into a virtual router, called "a backup group".

This virtual router has its own IP address, 10.100.10.1 (this address may be the same as the interface address of one of the routers in the backup group), and the routers in the backup group also have their own IP addresses (for example, the master router's address is 10.100.10.2 and the backup router's address is 10.100.10.3). Hosts on the LAN know only the virtual router's IP address, not the actual addresses of the master or backup routers.

They set their default routing next-hop address to the virtual router's IP address 10.100.10.1, so hosts in the network communicate with other networks through this virtual router. If the main router in the standby group fails, the standby router will select a new main router through the election strategy to continue to provide routing services for the hosts in the network, so that the hosts in the network can communicate with the external network without interruption. 

3. Election mechanism

In a LAN, a VRRP standby group consists of routers with the same VRID, and the administrator can manually set the VRID, which ranges from 1 to 255. A router running the VRRP protocol can join different backup groups and participate in the backup of different master routers.

In a VRRP standby group, only one router performs the function of a virtual router, and this router is the master router. The main router is elected based on the priority. The optional range of priority is 0~255, but the range of priority that can be configured by the administrator is 1~254.

The router with the highest priority becomes the master, and the owner of the virtual IP address is generally regarded as the master. Priorities are configured based on link speed and cost, router performance and reliability, and other management policies. If priorities are equal, the interface IP addresses are compared and the router with the largest IP address becomes the master; the other routers act as backups, monitoring the master's status at all times and preparing to take over its work.

VRRP provides a priority preemption mechanism. If this policy is configured, the high-priority standby router will replace the low-priority primary router and become the new primary router.

When the master router is working normally, it sends a VRRP multicast advertisement at a fixed interval to tell the routers in the backup group that it is still operating. The interval of the VRRP advertisement can be set by the administrator.

If the master router stops sending VRRP packets within that interval and there is only one other router in the backup group, that router changes its state to master; if the backup group contains several routers, the routers that want to change state must go through the election mechanism, so that the VRRP network remains stable.

Each router participating in the election both receives and sends packets. If the priority in a received advertisement is lower than its own, the router continues to send its own advertisements; if it is higher, the router returns to the backup state. Through this process, only one candidate remains in the VRRP backup group, and that router changes its state to become the master.
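The comparison rule described here (higher priority wins; if priorities are equal, the larger interface IP address wins) can be sketched as follows; the router tuples are purely illustrative:

# Sketch of the VRRP master-election comparison: highest priority wins,
# ties are broken by the larger interface IP address. Data is illustrative.
import ipaddress

routers = [
    {"name": "RA", "priority": 100, "ip": "10.100.10.2"},
    {"name": "RB", "priority": 100, "ip": "10.100.10.3"},
    {"name": "RC", "priority": 90,  "ip": "10.100.10.4"},
]

def election_key(r):
    return (r["priority"], int(ipaddress.IPv4Address(r["ip"])))

master = max(routers, key=election_key)
print(f"master: {master['name']}")   # RB: equal priority, larger IP address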

5. VRRP state transition

During actual operation, VRRP has three basic states: the initial state, the backup state, and the master state. Transitions between the latter two begin once VRRP starts.

Any actual router in any virtual group can only be in one of the three states at any moment, and it is the initial state from when the router just configures each parameter to when it is enabled or when the router interface is down. Once VRRP is started, state transition begins.

Both the active state and the standby state are running states, and the main state is responsible for forwarding data packets; the standby state is responsible for monitoring the operation of the main router. In a backup group, only one VRRP router with the highest priority is in the master state, and the rest are in the standby state.

The transition of the 3 states is shown in the figure below:

The trigger timing of each conversion process is as follows.

  1. When a router in the backup group starts the VRRP protocol and its priority is defined as the maximum value of 255.
  2. When a router in master state receives a shutdown message.
  3. When a router in the backup group starts the VRRP protocol and its priority is defined as less than 255.
  4. When a router in standby state receives a shutdown message.
  5. When a router in standby state does not receive the message from the master router within the response time of the master router.
  6. When a router in the master state receives a packet with a higher priority than itself.

The behavior of routers in different states is described below.

1. Initial state

The router enters this state after the system starts. When VRRP is enabled on an interface connected to the network, the router transitions to the backup state (when its priority is not 255) or the master state (when its priority is 255). In this state the router does not process VRRP messages.

Its logic flow is as follows:

IF the priority equals 255 (that is, this VRRP router is the owner of the IP address; a router whose real and virtual addresses are the same obtains this automatically)
{
    Send an ARP message to all hosts on the LAN announcing the mapping from the virtual IP address to the virtual MAC address, so that LAN hosts can encapsulate packets with this MAC and IP address; start the Advertisement_Interval timer and periodically send VRRP advertisements; change this router's state to Master.
} ELSE
{
    Create an advertisement timeout timer with interval Master_Down_Interval; transition the state from Initialize to Backup.
}

This process shows that a router whose real address is the same as the virtual address and whose priority is 255 is the preferred router in the entire VRRP topology; it does not need to compare itself with other routers and enters the master state directly. A router whose priority is not 255 enters the backup state directly and starts the backup state's timeout timer to begin monitoring the existence and state of the master router.

2. Standby state

This state is used to monitor the reachability and state of the main state router. When the router is in this state, it will do the following work.

  • Receive the VRRP multicast message sent by the master router, and learn its status from it.
  • Do not respond to ARP requests for virtual IP addresses.
  • Discard IP packets whose destination MAC address is the virtual MAC address.
  • Discard IP packets whose destination IP address is the virtual IP address.

The logic flow in this state is as follows:

IF the interface is shut down (for example because the link goes down)
{ Stop the Master timeout timer; move the state from Backup back to Initialize }
IF the Master_Down_Interval timer expires
{
    IF preemption is allowed
    {
        IF the delay time is 0 or the preemption-delay flag is set
        {
            Send a VRRP advertisement and a gratuitous ARP message for the virtual IP address, announcing this router as Master; start the Advertisement_Interval timer; change the state from Backup to Master.
        }
        ELSE
        { Set the advertisement timer to the delay time and set the preemption-delay flag. }
    }
    ELSE (preemption is not allowed)
    { Reset the Master_Down_Interval timer to 0 and continue to monitor the Master }
}
IF an advertisement is received
{
    IF the priority in the advertisement is 0
    { Set the advertisement timer interval to Skew_Time }
    ELSE
    {
        IF preemption mode is FALSE, or the priority in the advertisement is greater than the local priority
        { Reset the Master_Down_Interval timer to 0 }
        ELSE { Discard the advertisement }
    }
}

The transition mechanism of this state can be divided into three types.

  • If the link of a router in this state goes down or the administrator manually disables the VRRP function, the router first stops the Advertisement_Interval timer, and then its state changes directly from backup to the initial state.
  • If no such interruption occurs, the master timeout timer keeps running. If no VRRP advertisement is received before it expires and preemption is allowed with a delay time of 0, the backup router declares itself master, broadcasts gratuitous ARP messages for the virtual IP address, and sets the Advertisement_Interval timer; if preemption is allowed and the delay time is not 0, the advertisement timer is set to the delay time; if preemption is not allowed, Master_Down_Interval is used to re-time the master.
  • The cases above cover the situation where no advertisement is received from the master. If an unusual advertisement is received, the router's state may also change. If the priority field in a received advertisement is lower than the local priority, the external router is considered to have deliberately lowered its priority, for example because the administrator has configured it to stop being the master, and that router can still forward data normally; if such low-priority advertisements are still received three times within Master_Down_Interval, the timeout occurs, the router's state moves from backup to master, and it participates in the election of the master router.

If a VRRP advertisement with priority 0 is received (meaning the master router is probably being shut down), a backup router does not migrate to the master state immediately, because if all backup routers received the priority-0 advertisement at the same time and all migrated to the master state, the resulting flood of VRRP packets would confuse the network. Instead, the state machine sets the master timeout timer to Skew_Time; since Skew_Time = (256 - Priority)/256, the higher the priority, the sooner the router enters the master state.

Thus, although all routers may receive the priority-0 advertisement at the same time, they do not all enter the master state because of it. The router that enters the master state first broadcasts a VRRP advertisement, so the other routers receive an advertisement with a higher priority than their own before they would enter the master state; they then reset their master timeout timers and remain in the backup state. This reduces the switchover time when a gateway fails, speeds up topology convergence as much as possible, and reduces the burden on the network.

3. Main state

When in this state, the VRRP router must assume the role of a virtual router and be responsible for the following work.

  1. Periodically send VRRP broadcast packets.
  2. Send an ARP message so that each host in the network knows the virtual MAC address corresponding to the virtual IP address.
  3. Respond to the ARP request of the virtual IP address, and the response is the virtual MAC address instead of the real MAC address of the interface.
  4. Forward IP packets whose destination MAC address is the virtual MAC address.
  5. If it is the owner of the virtual IP address (IP Address Owner), it will receive the IP packet whose destination IP address is the virtual IP address; otherwise, it will discard the IP packet.

The logic flow in this state is as follows:

IF a LAN-facing interface of the Master router is shut down
{
    Delete the Advertisement_Interval timer; send a VRRP advertisement with priority 0, indicating that this router gives up the Master role; change this router's state from Master to Backup.
}
IF the Advertisement_Interval timer expires
{
    Send this router's normal VRRP advertisement; reset the Advertisement_Interval timer and start timing again.
}
IF an advertisement is received from another router {
    IF the advertisement shows that the other router's priority is 0
    {
        Immediately send this router's own VRRP advertisement; reset this router's Advertisement_Interval timer.
    }
    ELSE
    {
        IF the priority in the advertisement is greater than the local router's priority, or the priorities are equal but the sender's interface IP address is greater than the local router's actual interface IP address
        {
            Change the advertisement timer interval from Advertisement_Interval to Master_Down_Interval; change the local router's state from Master to Backup.
        }
        ELSE
        { Discard the advertisement }
    }
}

This state process defines the following three situations in total:

  1. When a LAN-facing port of the master router goes down unexpectedly, it sends a VRRP advertisement with priority 0, deletes the advertisement timer, and migrates from the master state to the backup state.
  2. When the master router's Advertisement_Interval timer expires, it sends a VRRP advertisement containing its priority and timer information to show that it is still the master.
  3. When the master router receives a challenge from another router: if the challenger's priority is 0, or it is lower than or equal to its own and the challenger's interface IP address is smaller than its own, the master immediately sends its own VRRP advertisement to prove that it remains eligible and resets its local timer; if the challenger's priority is higher, or the priorities are equal and the challenger's interface IP address is larger, the master changes its advertisement timer to the master down timer and changes its state from master to backup.

6. Message format

The source address of the VRRP broadcast message is the primary IP address of the main router interface; the destination IP address must be a multicast address, that is, 224.0.0.18, which is a multicast address within the scope of the local link. For this type of IP message, the router ignores its TTL and does not forward it; in addition, the TTL field value of the IP datagram must be 255, and the VRRP router will discard the VRRP message whose TTL value is not equal to 255. The protocol field of the IP datagram header is 112 in decimal.

The VRRP protocol is a protocol running on the network layer. In the static mode, it has only one protocol message, that is, the Advertisement broadcast message; in the dynamic mode, there are two types of messages: advertisement and payload. VRRP packet encapsulation is similar to ICMP packets, which are encapsulated in IP packets for transmission on the network, and then MAC addresses are encapsulated in front of IP packets to form a frame and then transmitted on the network.

The following table shows the packet format of VRRP:

The MAC frame header is divided into two parts, the source MAC address and the destination MAC address, as shown in the following table. 

1. Link layer protocol header-MAC frame header

There are two types of destination MAC addresses. If it is a broadcast message, it is the multicast MAC address corresponding to the multicast address 224.0.0.18; if it is a payload message, it is the virtual MAC address of the entire standby group.

The source MAC address also has two cases. For an advertisement message it is the virtual MAC address of the backup group, 00-00-5E-00-01-[VRID], where VRID is the virtual router ID in hexadecimal, so there can be at most 255 VRRP routers on the same network segment; for a payload message it is the actual MAC address of the interface.

2. IP packet format

The IP packet format of VRRP is shown in the following table:

The source IP address is the primary IP address of the master router's interface. The destination IP address has two cases: for an advertisement message it is the multicast address 224.0.0.18, a link-local multicast address that routers never forward regardless of the TTL value; for a payload message it is the virtual IP address of the backup group.

The lifetime in the VRRP IP packet is 255. If a router in the VRRP standby group receives a VRRP packet with a TTL value not equal to 255, it will discard it.

The value of the protocol field in the VRRP IP packet is 112 in decimal.

3. Packet encapsulation format

The format of VRRP packets encapsulated in IP packets is shown in the following table. 

Notes:

  1. Version number: the VRRP version number, 4 bits; defined as 2 in RFC 3768.
  2. Type: VRRP packet type, 4 bits. Currently, only one type is defined, that is, the broadcast message, and the value is 1.
  3. VRID: virtual route ID, 8 bits. A VRID is used to uniquely identify a VRRP standby group, and the VRID in the message indicates the standby group to which the message belongs.
  4. Priority: the priority of the VRRP router sending the message, an 8-bit unsigned integer in the range 0-255. A priority of 0 has a special meaning: it indicates that the master is relinquishing its master role, which causes the remaining backup routers to hold another round of master election. If a router's interface IP address is the virtual IP address, it is the IP address owner and its priority is automatically set to 255; otherwise the configurable range is 1-254, with a default of 100 if the administrator does not configure it.
  5. Number of virtual IP addresses: 8 bits, indicating the number of virtual IP addresses in the VRRP broadcast message, normally 1.
  6. Mode: 4 bits, indicating the standby group load sharing mode. If it is static load balancing, it is 0; if it is dynamic load balancing, it is 1.
  7. Authentication type: 4 bits. The authentication types in the same backup group must be the same; if a VRRP router in the backup group receives an advertisement whose authentication type differs from its own, it discards it. The three authentication types are as follows.
    0: no authentication; the authentication data field is set to all zeros, or the receiver ignores the authentication data.
    1: authentication with a simple unencrypted character string; the authentication data field carries the authentication string. The receiver compares it with its own configured string and discards the message if they differ. This method offers essentially no security, since an attacker who captures the packet can read the string directly.
    2: MD5 authentication; the authentication data field carries the authentication string and other content to be authenticated, and a message digest is produced with the MD5 algorithm.
  8. VRRP broadcast period: The period for VRRP to send broadcast packets, 8 bits. The unit is second, and the default is 1 second. The broadcast period of VRRP must be the same in a standby group.
  9. Checksum: 16 bits, used to check whether the message is damaged during network transmission. The sender of the message writes 0 in this field, and then calculates the checksum of the entire message and fills it in this field. After receiving the message, the receiver uses the same method to calculate the checksum, and the result must be 0.
  10. IP address: The virtual IP address of the standby group.
  11. Authentication Data: Authentication data, fields and types are specified by the value of the Authentication Type field. 
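As a rough, simplified sketch of assembling an advertisement from the fields listed above, the code below packs one virtual IP address and fills in a standard one's-complement Internet checksum as described in item 9; it follows this document's field list in simplified form, not any particular implementation:

# Simplified sketch of building a VRRP advertisement from the fields above.
# Field layout follows this document's description (version/type, VRID,
# priority, address count, mode/auth type, interval, checksum, virtual IP).
import socket
import struct

def internet_checksum(data: bytes) -> int:
    # One's-complement sum of 16-bit words (checksum field assumed zero).
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data)//2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_advertisement(vrid, priority, virtual_ip, interval=1,
                        mode=0, auth_type=0):
    header = struct.pack(
        "!BBBBBBH",
        (2 << 4) | 1,              # version 2, type 1 (advertisement)
        vrid,
        priority,
        1,                         # number of virtual IP addresses
        (mode << 4) | auth_type,   # mode and authentication type nibbles
        interval,                  # advertisement interval in seconds
        0,                         # checksum placeholder
    ) + socket.inet_aton(virtual_ip)
    csum = internet_checksum(header)
    return header[:6] + struct.pack("!H", csum) + header[8:]

adv = build_advertisement(vrid=1, priority=100, virtual_ip="10.100.10.1")
print(adv.hex())
print(internet_checksum(adv) == 0)   # the receiver's recomputation yields zero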

4. Verification when receiving data

The following verifications are performed when receiving VRRP packets, and unsatisfactory packets will be discarded.

  1. TTL must be 255.
  2. The VRRP version number must be 2.
  3. The data fields in a packet must be complete.
  4. Checksum must be correct.
  5. It must be verified that the VRID value is configured in the receiving NIC and that the local router is not the owner of the IP address.
  6. It must be verified that the VRRP authentication type is consistent with the configuration.
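The checks in this list can be strung together as a simple receive-side filter. The sketch below uses the same simplified field layout as the packing example above; the configuration dictionary and field offsets are illustrative assumptions:

# Sketch of the receive-side checks listed above (illustrative only).
import struct

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data)//2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def accept_advertisement(ip_ttl, vrrp_pkt: bytes, config: dict) -> bool:
    if ip_ttl != 255:                              # 1. TTL must be 255
        return False
    if vrrp_pkt[0] >> 4 != 2:                      # 2. VRRP version must be 2
        return False
    if len(vrrp_pkt) < 12:                         # 3. packet must be complete
        return False
    if internet_checksum(vrrp_pkt) != 0:           # 4. checksum must verify
        return False
    vrid, auth_type = vrrp_pkt[1], vrrp_pkt[4] & 0x0F
    if vrid not in config["vrids"]:                # 5. VRID configured locally
        return False
    if config.get("is_address_owner"):             # 5. owner does not accept these
        return False
    if auth_type != config["auth_type"]:           # 6. auth type must match
        return False
    return True

# Build a minimal dummy advertisement (version 2, type 1, VRID 1) to test.
hdr = bytes([0x21, 1, 100, 1, 0, 1, 0, 0]) + bytes([10, 100, 10, 1])
csum = internet_checksum(hdr)
pkt = hdr[:6] + csum.to_bytes(2, "big") + hdr[8:]
print(accept_advertisement(255, pkt,
      {"vrids": {1}, "auth_type": 0, "is_address_owner": False}))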

5. Payload message format

The format of the VRRP payload packet is shown in the following table:

The value of the payload field is 0 to 100, and the MAC field is the actual MAC address of the corresponding interface of the standby router. The IP address field is the actual IP address of the interface, and its type is 2. The meanings of other fields are similar to those in VRRP broadcast packets. 
