DPDK's virtual switch framework OvS

Introduction to DPDK

DPDK is a collection of libraries and drivers for fast packet processing on the x86 platform. It is not a network protocol stack: it provides neither Layer 2/Layer 3 forwarding nor firewall/ACL functions, but all of these can easily be built on top of DPDK.

DPDK's advantage is that packet data moves between user space and the NIC directly, bypassing the kernel, which is where the acceleration comes from. The overall structure is shown in the figure:

Comparison between the traditional socket path and DPDK:

DPDK key technical points:

  1. Hugepage support improves memory access efficiency (larger pages mean fewer TLB misses).

  2. UIO support allows the driver to live in application space; in other words, the NIC driver runs in user space, which removes the repeated copying of packets between kernel space and user space.

  3. Linux CPU affinity support binds the control plane thread and each data plane thread to different CPU cores, so threads are not scheduled back and forth between cores.

  4. Lock-free design: lock-free ring buffers are used for queue management, which avoids lock contention and speeds up access.

  5. Batched packet send and receive: multiple packets are handled per burst so per-packet overhead is amortised and descriptor accesses stay cache friendly, and buffers come from a pre-allocated memory pool instead of being repeatedly allocated and freed.

  6. PMD (poll mode driver): the user-mode driver polls the NIC instead of taking interrupts, which reduces context-switch overhead and makes zero copy between virtual machine and host easier to implement. A command-line sketch of how these options surface in practice follows this list.
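As a rough illustration (not from the original article), these mechanisms are visible in the EAL options of DPDK's bundled testpmd tool; the core numbers and memory sizes below are placeholder assumptions, and the binary is named testpmd in older DPDK releases:

# Reserve 2 MB hugepages (point 1); writing this file requires root
echo 1024 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# -l 0-1 pins the main and polling threads to cores 0 and 1 (point 3),
# --socket-mem draws memory from the hugepage pool (point 1), and the
# forwarding loop is a busy poll with burst RX/TX (points 5 and 6)
sudo dpdk-testpmd -l 0-1 -n 4 --socket-mem 512 -- --burst=32 -i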

OVS+DPDK

With its rich feature set, Open vSwitch is widely used as a multi-layer virtual switch in cloud environments. Its main job is to provide Layer 2 network access for the VMs on a physical machine, working at Layer 2 alongside the other physical switches in the cloud environment.

The traditional host-side OVS runs in kernel space, and exchanging data with the guest's virtio devices requires multiple switches between kernel space and user space, which becomes a performance bottleneck.

The difference between OVS+DPDK and plain OVS can be summarised by the following figure:

In early versions of OVS, to alleviate the cost of the multi-level flow table lookup, OVS used a Microflow Cache in the kernel. The Microflow Cache is a hash-based exact-match lookup table (O(1)); each entry caches the result of the multi-level lookup and keeps per-connection state. It reduces the number of times packets have to go to the multi-level tables in user space: after the first packet of a flow is looked up in user space, subsequent packets hit the Microflow Cache in the kernel, which speeds up the lookup. However, in a network with a large number of short flows the Microflow Cache hit rate is very low, so most packets still have to go to user space for the multi-level flow table lookup, and forwarding performance is limited.

Later, to address the problems of the Microflow Cache, OVS replaced it with a Megaflow Cache. Unlike the exact-match hash lookup of the Microflow Cache, the Megaflow Cache supports lookups with wildcards, so it reduces the number of packets sent to user space for lookup. Yu Zhihui's blog analysed the data structure and lookup process of megaflows at the time; that material is not repeated here, see the link above. However, because OVS implements Megaflow Cache lookups with tuple space search (described below), the average lookup has to scan about half of the tuple tables: if the chain contains m tuple tables and a hit is equally likely in any of them, the expected number of tables examined is (1 + 2 + ... + m) / m = (m + 1) / 2, i.e. O(m/2). Comparing lookup costs, O(1) < O(m/2). So relative to the Microflow Cache, the Megaflow Cache reduces the number of user-space lookups but increases the cost of each kernel-side lookup.

For this reason, the current version of OVS uses a Megaflow Cache + Microflow Cache organisation, keeping the Microflow Cache as a first-level cache that is consulted first when a packet arrives. The meaning of this Microflow Cache has changed, though: the original Microflow Cache was a real exact-match hash table, while in recent versions it is just an index that points at the Megaflow Cache entry the flow last hit. The first lookup for a packet therefore no longer needs a linear scan of the chain and can go straight to one of the Megaflow tuple tables. The lookup costs of the three schemes compare as follows: Microflow Cache only, O(1) in the kernel but a low hit rate for short flows; Megaflow Cache only, O(m/2) per kernel lookup; Microflow index + Megaflow Cache, O(1) to reach the right tuple table when the index hits, falling back to O(m/2) otherwise.
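These cache layers can also be observed with standard OVS commands (an illustrative aside, not from the original article):

# For the userspace (DPDK) datapath: per-PMD counters for exact-match cache
# (EMC, the "microflow" level) hits versus megaflow classifier hits and misses
sudo ovs-appctl dpif-netdev/pmd-stats-show
# Dump the installed megaflow entries; the wildcarded fields are visible in each entry
sudo ovs-appctl dpctl/dump-flows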

DPDK's high-performance user-space NIC drivers, hugepage memory and lock-free data structures make it straightforward to reach 10G NIC line rate. ovs-dpdk moves the whole data path, from VM to VM and from NIC to VM, into user space, which greatly improves OVS performance.

In addition, ovs-dpdk combines the advantages of DPDK with vhost-user technology. vhost-user is a user-space vhost backend that achieves zero copy of data between the virtual machine and the host.

Comparison between native ovs and ovs-dpdk

(1) The native OVS packet processing flow is as follows:

  1. After a packet arrives at the NIC, it is handed to the Datapath (the kernel module);

  2. The Datapath checks whether the exact-match flow cache can forward this packet directly. If no matching entry is found in the cache, the kernel sends an upcall over netlink to vswitchd in user space;

  3. vswitchd consults its tables to find the packet's destination port; operating on the OpenFlow flow tables here involves interacting with ovsdb and ovs-ofctl;

  4. The kernel flow table is updated with the new entry;

  5. The packet is reinjected into the datapath and resent;

  6. The flow table is queried again, the packet now hits the exact forwarding rule and is forwarded accordingly (see the commands after this list).
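To relate these steps to real commands (an illustrative aside; "ovsbr0" is a placeholder bridge name):

# OpenFlow tables that vswitchd consults in user space (step 3)
sudo ovs-ofctl dump-flows ovsbr0
# Flow entries installed into the kernel datapath cache (step 4)
sudo ovs-dpctl dump-flows
# Upcall statistics: packets that missed the kernel cache and went to user space (step 2)
sudo ovs-appctl upcall/show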

(2) The ovs-dpdk approach:

The user-space process takes over the NIC directly for packet send and receive. ovs-dpdk uses an "IO exclusive core" model, in which each port is assigned a dedicated core for send/receive and interrupt handling is replaced by polling, which significantly improves IO performance.
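A minimal sketch of this configuration, assuming OVS was built with DPDK support; the core mask and memory size are placeholder values:

# Enable DPDK in ovs-vswitchd and reserve 1024 MB of hugepage memory on NUMA node 0
sudo ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
sudo ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="1024"
# Pin the PMD (polling) threads to cores 2 and 3 (bitmask 0xC), i.e. the "IO exclusive core" idea
sudo ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC
# Optionally give a DPDK port two receive queues so two PMD cores can poll it in parallel
sudo ovs-vsctl set Interface dpdk0 options:n_rxq=2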

Summary:

Use ovs-dpdk

Hardware requirements

The network card must support DPDK, see: http://dpdk.org/doc/nics

The CPU must support the features DPDK needs; for example, 1 GB hugepage support can be checked with: cat /proc/cpuinfo | grep pdpe1gb

A DPDK-capable physical NIC is not strictly required, because DPDK also ships a virtio PMD, so virtio devices can be used as DPDK ports.

Turn on hugepage support

hua@node1:~$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on pci=assign-busses"
GRUB_CMDLINE_LINUX="transparent_hugepage=never hugepagesz=2M hugepages=64 default_hugepagesz=2M"
hua@node1:~$ mkdir /mnt/huge
hua@node1:~$ mount -t hugetlbfs nodev /mnt/huge
hua@node1:~$ echo 8192 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
hua@node1:~$ cat /proc/meminfo | grep HugePages_
HugePages_Total:    8192
HugePages_Free:     8192
HugePages_Rsvd:        0
HugePages_Surp:        0
hua@node1:~$ grep Hugepagesize /proc/meminfo
Hugepagesize:       2048 kB

Configure the NIC to use the uio_pci_generic driver. The virtual machine can then use the PF as a DPDK port, or use a VF in passthrough mode:

For the PF you can use either the ixgbe driver or the DPDK PF driver; for the VF, either ixgbevf or the DPDK VF driver. If you want to use the DPDK PF driver, add iommu=pt to the kernel command line in grub.
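A minimal sketch of the binding step, assuming DPDK's dpdk-devbind.py tool is available and the NIC sits at the placeholder PCI address 0000:01:00.0:

# Load the generic UIO module and rebind the NIC from its kernel driver to it
sudo modprobe uio_pci_generic
sudo dpdk-devbind.py --bind=uio_pci_generic 0000:01:00.0
# Confirm which devices are now bound to a DPDK-compatible driver
sudo dpdk-devbind.py --status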

Communication is also possible in the following ways:

Test scenario setup

sudo ovs-vsctl add-br ovsbr0
sudo ovs-vsctl set bridge ovsbr0 datapath_type=netdev
sudo ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk   # Port name should begin with dpdk
# Create a regular OVS port for the VM:
sudo ovs-vsctl add-port ovsbr0 intP1 -- set Interface intP1 type=internal
sudo ip addr add 192.168.10.129/24 dev intP1
sudo ip link set dev intP1 up
sudo tcpdump -i intP1
# Or create a vhost-user port for the VM:
sudo ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
# Start the VM with the vhost-user port:
sudo qemu-system-x86_64 -enable-kvm -m 128 -smp 2 \
    -chardev socket,id=char0,path=/var/run/openvswitch/vhost-user1 \
    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
    -device virtio-net-pci,netdev=mynet1,mac=52:54:00:02:d9:01 \
    -object memory-backend-file,id=mem,size=128M,mem-path=/mnt/huge,share=on \
    -numa node,memdev=mem -mem-prealloc \
    -net user,hostfwd=tcp::10021-:22 -net nic \
    /bak/images/openstack_demo.img
# Inside the VM: sudo ifconfig eth0 192.168.9.108/24
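As an optional check (not part of the original walkthrough), the vhost-user wiring can be verified once the VM is up; the names match the example above:

# Socket created by OVS for the dpdkvhostuser port
ls -l /var/run/openvswitch/vhost-user1
# Bridge and port layout
sudo ovs-vsctl show
# Which PMD core polls which port and queue
sudo ovs-appctl dpif-netdev/pmd-rxq-show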

The schematic diagram of using vhost-user is as follows:

dpdk-virtio-net

Original address: https://blog.csdn.net/qq_15437629/article/details/77887910

