KVM virtualization network optimization technology summary


Source http://blog.51cto.com/xiaoli110/1558984

 

The path a packet takes from the virtual machine to the physical network is:

Virtual machine -> QEMU virtual network card -> virtualization layer -> kernel bridge -> physical network card
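On a typical host this path can be inspected directly. A quick sketch, assuming the kernel bridge is named br0 and the guest's tap device shows up as vnetX:

# list the tap devices attached to the kernel bridge
brctl show br0
# or, with iproute2 only
ip link show master br0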


In general, KVM network optimization aims to make the virtual machine traverse fewer layers on the way to the physical network card, up to the point where the virtual machine owns the physical network card exclusively and uses it exactly as the physical host would, achieving the same network performance as the physical machine.

 

Option 1 Fully virtualized network card and virtio


The difference between virtio and a fully virtualized network card
A fully virtualized network card is entirely emulated by the virtualization layer. A para-virtualized network card, by contrast, modifies the guest operating system through its driver.
virtio essentially tells the virtual machine: "hey, you are running on a virtualization platform, so let's make some changes together so you get better performance on that platform."

  
Regarding when to use virtio:
Windows virtual machines using virtio can experience brief network drops (flapping). If the network load of a Windows virtual machine is low, a fully virtualized network card such as e1000 is recommended; if the network load is high, SR-IOV or PCI device assignment is recommended instead. virtio keeps evolving, so hopefully the Windows flapping problem will become less and less common.

KVM was designed for Linux, so for Linux guests feel free to use the virtio driver.
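As a minimal illustration of the difference (a sketch, assuming a host bridge named br0), the two choices differ only in the model line of the libvirt interface definition:

<!-- fully emulated NIC, e.g. for a Windows guest under light network load -->
<interface type='bridge'>
  <source bridge='br0'/>
  <model type='e1000'/>
</interface>

<!-- paravirtualized virtio NIC, recommended for Linux guests -->
<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>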

 

Option 2 vhost_net and macvtap


vhost_net lets the virtual machine's network traffic bypass the user-space virtualization layer and talk to the kernel directly, which improves the virtual machine's network performance.

macvtap, in turn, bypasses the traditional kernel bridge.

 

To use vhost_net, you must use a virtio paravirtualized network card;

vhost_net virtual machine xml configuration:

<interface type='bridge'>
  <mac address=''/>
  <source bridge='br0'/>
  <model type='virtio'/>
  <driver name='vhost'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

If you are not using vhost_net, use the qemu driver instead:

<driver name='qemu'/>
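A quick way to confirm on the host that vhost_net is actually in use (a sketch; exact names can vary by distribution):

# the kernel module and its character device must be present
lsmod | grep vhost_net
ls -l /dev/vhost-net
# after the guest starts, a vhost kernel worker thread appears per virtio NIC
ps -ef | grep vhost-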

macvtap virtual machine xml configuration:

<interface type='direct'>
  <mac address='00:16:3e:d5:d6:77'/>
  <source dev='lo' mode='bridge'/>
  <model type='e1000'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

Note: macvtap has poor performance on Windows virtual machines and is not recommended

vhost_net and macvtap compared

macvlan allows a single physical network card to carry multiple MAC addresses, so that multiple Ethernet interfaces can be created in software on top of it; it works at layer 2.
macvtap replaces the TUN/TAP plus bridge combination of kernel modules. It is built on the macvlan module and exposes the same tap-device interface that TUN/TAP provides.
A virtual machine attached through macvtap can pass data directly through that tap device interface to the corresponding macvtap port in the kernel.
vhost-net is an optimization of virtio. virtio itself was designed for communication between the guest front end and the VMM back end, reducing switches between root mode and non-root mode under hardware virtualization.
With vhost-net, once execution returns to root mode the data no longer has to be handed to user space and then switched back into the kernel to reach the tap device; the back end lives in the kernel, so those extra user/kernel mode switches are eliminated. It is hard to pin vhost-net to a particular layer, but it is best described as an optimization of layer 2 network data transmission.
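For reference, a macvtap interface can also be created by hand with iproute2, which makes the macvlan/macvtap relationship visible (a sketch, assuming the physical NIC is eth0):

# create a macvtap port on top of eth0 in bridge mode and bring it up
ip link add link eth0 name macvtap0 type macvtap mode bridge
ip link set macvtap0 up
# the matching character device /dev/tapN appears, where N is the new interface index
ip link show macvtap0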

 

Option 3 Exclusive use of a physical network card (PCI passthrough)


Configuring network card passthrough for a virtual machine

Use the lspci command to view the PCI device information:

04:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
04:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)

You can also use virsh nodedev-list --tree to get information

+- pci_0000_00_07_0
|   |
|   +- pci_0000_04_00_0
|   |   |
|   |   +- net_p1p1_00_1b_21_88_69_dc
|   |  
|   +- pci_0000_04_00_1
|       |
|       +- net_p1p2_00_1b_21_88_69_dd

Use virsh nodedev-dumpxml pci_0000_04_00_0 to get the xml configuration information:

[root@] # virsh nodedev-dumpxml pci_0000_04_00_0
<device>
  <name>pci_0000_04_00_0</name>
  <parent>pci_0000_00_07_0</parent>
  <driver>
    <name>e1000e</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>4</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x105e'>82571EB Gigabit Ethernet Controller</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
  </capability>
</device>

Edit the virtual machine xml file and add the pci device information:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </source>
</hostdev>

The domain, bus, slot, and function values come from the dumpxml output above. Define the virtual machine and then start it. Note that the device being attached is a physical device, so the corresponding driver must be installed inside the guest system.
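PCI passthrough also requires the IOMMU (Intel VT-d or AMD-Vi) to be enabled on the host; a brief sketch of the usual checks, assuming an Intel platform:

# add intel_iommu=on to the kernel command line (edit /etc/default/grub and regenerate the grub config), then reboot
# verify the IOMMU is active
dmesg | grep -i -e DMAR -e IOMMU
# managed='yes' detaches the NIC from the host driver automatically;
# the manual equivalent (spelled nodedev-dettach in older libvirt releases) is:
virsh nodedev-detach pci_0000_04_00_0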

 

Option 4 SR-IOV technology

 

The principle of SR-IOV
SR-IOV is short for Single Root I/O Virtualization. It is a standard for sharing a PCIe device among virtual machines. It is currently used mostly in network devices, although in theory it can support other PCI devices as well. SR-IOV requires hardware support.


The following content is from the Oracle documentation; the link is
http://docs.oracle.com/cd/E38902_01/html/E38873/glbzi.html


Physical Function (PF)
A PCI function that supports the SR-IOV capability, as defined in the SR-IOV specification. The PF contains the SR-IOV capability structure and manages the SR-IOV functionality. It is a full-featured PCIe function that can be discovered, managed, and handled like any other PCIe device, and it has a complete set of configuration resources that can be used to configure or control the PCIe device.
Virtual Function (VF)
A function associated with a Physical Function. A VF is a lightweight PCIe function that shares one or more physical resources with the Physical Function and with the other VFs associated with that same Physical Function. A VF is only allowed to have the configuration resources needed for its own behavior.
Each SR-IOV device can have one Physical Function (PF), and each PF can have up to 64,000 Virtual Functions (VFs) associated with it. The PF creates VFs through registers designed specifically for this purpose.
Once SR-IOV is enabled in the PF, the PCI configuration space of each VF can be accessed through the PF's bus, device, and function number (routing ID). Each VF has a PCI memory space used to map its register set. The VF device driver operates on this register set to enable its functionality, and the VF appears as a real PCI device. After a VF is created, it can be assigned directly to an IO guest domain or to individual applications (such as Oracle Solaris Zones on a bare-metal platform). This allows virtual functions to share the physical device and perform I/O without CPU and hypervisor software overhead.
Advantages of SR-IOV
The SR-IOV standard allows PCIe devices to be shared efficiently among IO guest domains. An SR-IOV device can have hundreds of Virtual Functions (VFs) associated with a Physical Function (PF). VF creation is controlled dynamically by the PF through registers designed to enable the SR-IOV capability. By default, SR-IOV is disabled and the PF behaves as a traditional PCIe device.
Devices with SR-IOV capability offer the following benefits:
Performance - direct hardware access from the virtual machine environment.
Lower cost - capital and operating savings, including:
    power savings
    fewer adapters
    simplified cabling
    fewer switch ports
Using SR-IOV
Load the SR-IOV capable kernel module:
modprobe igb
Activate the virtual functions (VFs):
modprobe igb max_vfs=7
A gigabit NIC supports at most 8 VFs (vf0-vf7). Among gigabit NICs, the Intel I350 currently has the best support; the 82576 also supports SR-IOV, but only for Linux guests, not Windows.
A 10-gigabit NIC supports up to 64 VFs (vf0-vf63); Intel's newer generation of 10-gigabit NICs, such as the x520 and x540, all support SR-IOV.
If you need to change the number of VFs, remove the module and load it again:
modprobe -r igb
To make the setting permanent, write it to a configuration file:
echo "options igb max_vfs=7" >>/etc/modprobe.d/igb.conf
You can see both the physical network cards and the virtual (VF) network cards with the lspci command:

# lspci | grep 82576    
0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)    
0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)    
0b:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)    
0b:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

 
Each VF (sub network card) appears as a PCI device that a virtual machine can use exclusively; list them with virsh:

# virsh nodedev-list | grep 0b    
pci_0000_0b_00_0    
pci_0000_0b_00_1    
pci_0000_0b_10_0    
pci_0000_0b_10_1    
pci_0000_0b_10_2    
pci_0000_0b_10_3    
pci_0000_0b_10_4    
pci_0000_0b_10_5    
pci_0000_0b_10_6    
pci_0000_0b_11_7    
pci_0000_0b_11_1    
pci_0000_0b_11_2    
pci_0000_0b_11_3    
pci_0000_0b_11_4    
pci_0000_0b_11_5

Virtual machine network card xml configuration:

<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0' bus='11' slot='16' function='0'/>
  </source>
</interface>
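Note that bus='11' and slot='16' above are decimal values; they correspond to the VF at PCI address 0b:10.0 in the lspci output (0x0b = 11, 0x10 = 16). libvirt also accepts the hexadecimal form used elsewhere in this article:

<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0000' bus='0x0b' slot='0x10' function='0x0'/>
  </source>
</interface>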

 

Option 5 NIC multi-queue

CentOS 7 begins to support virtio network card multi-queue, which can greatly improve the network performance of virtual machines. The configuration method is as follows:

xml network card configuration of the virtual machine:

<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
  <driver name='vhost' queues='N'/>
</interface>

N can be 1 to 8; up to 8 queues are supported.

Execute the following command inside the virtual machine to enable the multi-queue NIC:

# ethtool -L eth0 combined M


M ranges from 1 to N (M must be less than or equal to N).
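Inside the guest, ethtool with a lowercase -l can confirm the current and maximum queue counts (a quick check, assuming the NIC is eth0):

# ethtool -l eth0
Under "Pre-set maximums", Combined should equal N from the host XML; under "Current hardware settings", Combined is the M set above.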

 

Personally, I think KVM network optimization will increasingly rely on hardware; 10-gigabit NICs combined with SR-IOV will become more and more popular, although the live migration problem still needs to be solved.

 
