KVM virtualization network optimization technology summary
Source http://blog.51cto.com/xiaoli110/1558984
The path of a complete packet from the virtual machine to the physical machine is:

Virtual machine -- QEMU virtual NIC -- virtualization layer -- kernel bridge -- physical NIC

KVM network optimization, in general, means letting the virtual machine's traffic traverse fewer layers on its way to the physical NIC, up to the point where the VM owns the physical NIC exclusively and uses it exactly as a physical host would, achieving network performance equal to the physical machine's.
Option 1: Fully virtualized NIC vs. virtio

The difference between virtio and a fully virtualized NIC:
A fully virtualized NIC is emulated entirely by the virtualization layer, while a paravirtualized NIC modifies the guest operating system through drivers.
virtio essentially tells the virtual machine: "you are running on a virtualization platform, so let's make some changes together so you get better performance on it."
On usage scenarios for virtio: Windows virtual machines using virtio can suffer intermittent network drops. If a Windows VM's network load is not high, a fully virtualized NIC such as e1000 is recommended; if the load is high, SR-IOV or PCI device assignment is recommended instead. virtio keeps evolving, and hopefully the Windows drop problem will become rarer and rarer.
KVM is designed for Linux, so feel free to use the virtio driver for Linux guests.
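For a Linux guest, you can confirm that the virtio driver is actually in use by looking at the driver name ethtool reports. A minimal sketch, parsing a sample of `ethtool -i` output; the interface name eth0 and the sample text are illustrative, on a real guest you would run `ethtool -i eth0` directly:

```shell
# Sample `ethtool -i eth0` output from a guest with a paravirtualized NIC:
sample='driver: virtio_net
version: 1.0.0
firmware-version:
bus-info: 0000:00:03.0'

# The driver line tells you whether virtio is actually in use.
driver=$(printf '%s\n' "$sample" | awk -F': ' '/^driver/ {print $2}')
echo "$driver"
```

If the driver field shows e1000 or rtl8139 instead, the guest is still on an emulated (fully virtualized) NIC.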
Option 2: vhost_net and macvtap

vhost_net lets the virtual machine's network traffic bypass the user-space virtualization layer and talk to the kernel directly, thereby improving the VM's network performance.
macvtap bypasses the kernel bridge.
To use vhost_net, you must use a virtio paravirtualized NIC.
vhost_net virtual machine XML configuration:

<interface type='bridge'>
  <mac address=''/>
  <source bridge='br0'/>
  <model type='virtio'/>
  <driver name="vhost"/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
If you are not using vhost_net, the driver line is instead:

<driver name="qemu"/>
macvtap virtual machine XML configuration:

<interface type='direct'>
  <mac address='00:16:3e:d5:d6:77'/>
  <source dev='lo' mode='bridge'/>
  <model type='e1000'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
Note: macvtap performs poorly on Windows virtual machines and is not recommended for them.
Comparison of vhost_net and macvtap:
macvlan configures multiple MAC addresses on the same physical NIC, so that multiple Ethernet ports can be exposed in software; it is a link-layer (layer 2) function.
macvtap replaces the TUN/TAP and bridge kernel modules. It is based on the macvlan module and provides the same interface that a tap device provides in TUN/TAP, so a virtual machine attached to a macvtap port passes data directly to the corresponding macvtap port in the kernel.
vhost-net is an optimization of virtio. virtio was originally designed as a front end in the guest communicating with a back end in the VMM, reducing the switching between root mode and non-root mode under hardware virtualization. Without vhost-net, sending data to the tap device still requires leaving kernel mode for user space (QEMU) and then switching back into kernel mode; with vhost-net the back end lives in the host kernel, so those extra kernel/user transitions disappear, further reducing privilege-level switching. It is hard to say exactly which layer vhost-net belongs to, but it is an optimization of layer-2 data transport.
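One way to see that vhost_net is really active: the vhost back end shows up on the host as kernel threads named vhost-<qemu pid>. A small sketch that counts such threads in a captured `ps -ef` sample; the sample lines and PIDs are made up for illustration, on a real host you would pipe `ps -ef` directly:

```shell
# Captured `ps -ef` fragment from a host running one VM with vhost_net
# (illustrative; the vhost-5321 kernel thread belongs to qemu pid 5318):
ps_sample='root      5321     2  0 10:02 ?  00:00:11 [vhost-5318]
qemu      5318     1  3 10:02 ?  00:12:40 /usr/libexec/qemu-kvm -name vm01'

# Count the vhost kernel threads; zero means vhost_net is not in use.
vhost_threads=$(printf '%s\n' "$ps_sample" | grep -c 'vhost-')
echo "$vhost_threads vhost thread(s)"
```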
Option 3: Dedicated physical NIC for the virtual machine

How to configure NIC passthrough for a virtual machine:

1. Use lspci to view PCI device information:

04:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
04:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
You can also use virsh nodedev-list --tree to get this information:

+- pci_0000_00_07_0
   |
   +- pci_0000_04_00_0
   |  |
   |  +- net_p1p1_00_1b_21_88_69_dc
   |
   +- pci_0000_04_00_1
      |
      +- net_p1p2_00_1b_21_88_69_dd
2. Use virsh nodedev-dumpxml pci_0000_04_00_0 to get the device's XML description:

[root@]# virsh nodedev-dumpxml pci_0000_04_00_0
<device>
  <name>pci_0000_04_00_0</name>
  <parent>pci_0000_00_07_0</parent>
  <driver>
    <name>e1000e</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>4</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x105e'>82571EB Gigabit Ethernet Controller</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
  </capability>
</device>
3. Edit the virtual machine XML file and add the PCI device information:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </source>
</hostdev>
The domain, bus, slot, and function values come from the nodedev-dumpxml output above. Redefine the virtual machine and then start it. Note that what is attached is a physical device, so the corresponding driver needs to be installed inside the guest.
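The node-device names that virsh expects are just the lspci addresses rewritten. A small helper sketch that converts one form into the other; the function name and the 0000 PCI domain are assumptions for illustration:

```shell
# Turn an lspci-style address like "04:00.0" into the libvirt node-device
# name ("pci_0000_04_00_0") used by `virsh nodedev-dumpxml` and
# `virsh nodedev-detach`. Assumes PCI domain 0000.
bdf_to_nodedev() {
    # $1 = bus:slot.function, e.g. 04:00.0
    printf 'pci_0000_%s\n' "$1" | tr ':.' '__'
}

bdf_to_nodedev 04:00.0
```

Because the hostdev entry above uses managed='yes', libvirt detaches the device from its host driver and reattaches it automatically; with managed='no' you would have to run virsh nodedev-detach on that name yourself before starting the VM.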
Option 4: SR-IOV

The principle of SR-IOV
SR-IOV is short for Single Root I/O Virtualization. It is a standard for sharing a PCIe device among virtual machines; it is currently used mostly in network devices, and in theory it can support other PCI devices as well. SR-IOV requires hardware support.
The following content is from the Oracle website:
http://docs.oracle.com/cd/E38902_01/html/E38873/glbzi.html
Physical Function (PF)
A PCI function that supports SR-IOV, as defined in the SR-IOV specification. A PF contains the SR-IOV capability structure and manages the SR-IOV functionality. PFs are fully featured PCIe functions that can be discovered, managed, and manipulated like any other PCIe device. A PF has full configuration resources and can be used to configure or control the PCIe device.

Virtual Function (VF)
A function associated with a physical function. A VF is a lightweight PCIe function that shares one or more physical resources with the physical function and with other VFs associated with the same physical function. A VF is only allowed to have configuration resources for its own behavior.

Each SR-IOV device can have one physical function (PF), and each PF can have up to 64,000 virtual functions (VFs) associated with it. The PF creates VFs through registers designed with attributes dedicated to this purpose.

Once SR-IOV is enabled in the PF, the PCI configuration space of each VF can be accessed through the PF's bus, device, and function number (routing ID). Each VF has a PCI memory space used to map its register set. The VF device driver operates on the register set to enable its functionality, and the VF appears as an actual PCI device. After creation, a VF can be assigned directly to an I/O guest domain or to individual applications (such as Oracle Solaris Zones on bare-metal platforms). This capability lets virtual functions share a physical device and perform I/O without CPU and hypervisor software overhead.
Advantages of SR-IOV
The SR-IOV standard allows PCIe devices to be shared efficiently among I/O guest domains. An SR-IOV device can have hundreds of virtual functions (VFs) associated with a physical function (PF). The creation of VFs can be controlled dynamically by the PF through registers designed to enable SR-IOV. By default, SR-IOV is disabled and the PF behaves as a traditional PCIe device.

Devices with SR-IOV capability offer the following benefits:
Performance: direct hardware access from the virtual machine environment.
Reduced cost: capital and operational savings, including:
  Power savings
  Fewer adapters
  Simplified cabling
  Fewer switch ports
Using SR-IOV

Load the SR-IOV-capable kernel module:
modprobe igb
Activate the virtual functions (VFs):
modprobe igb max_vfs=7
A gigabit NIC supports at most 8 VFs, numbered 0-7. Among gigabit NICs, the Intel I350 currently has the best support; the 82576 also supports SR-IOV, but only when the guest is Linux, and Windows guests are not supported.
A 10-gigabit NIC supports up to 64 VFs, numbered 0-63; Intel's newer generation of 10-gigabit NICs, such as the X520 and X540, all support SR-IOV.
If you need to change the number of VFs, remove the module and load it again:
modprobe -r igb
To make the setting permanent, write it to a configuration file:
echo "options igb max_vfs=7" >> /etc/modprobe.d/igb.conf
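On newer kernels there is also a driver-independent way to create VFs through the sriov_numvfs sysfs interface, which avoids reloading the module. A hedged sketch, assuming the PF sits at PCI address 0000:0b:00.0 (substitute your own); this is host configuration, not a runnable sample:

```shell
# Check the hardware's VF limit for this PF:
cat /sys/bus/pci/devices/0000:0b:00.0/sriov_totalvfs

# The count must be reset to 0 before it can be changed:
echo 0 > /sys/bus/pci/devices/0000:0b:00.0/sriov_numvfs
# Then create 7 VFs on this PF:
echo 7 > /sys/bus/pci/devices/0000:0b:00.0/sriov_numvfs
```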
With lspci you can now see both the physical NICs (PFs) and their virtual functions (VFs):

# lspci | grep 82576
0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
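To hand these VFs to libvirt it helps to pull just the addresses out of the lspci output. A sketch that filters for the Virtual Function lines; the here-string reuses two lines of the sample output above, and on a real host you would pipe lspci directly:

```shell
# Two lines of the lspci output shown above (illustrative sample):
lspci_sample='0b:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0b:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)'

# Keep only the PCI addresses of the VFs; on a real host:
#   lspci | awk '/Virtual Function/ {print $1}'
vfs=$(printf '%s\n' "$lspci_sample" | awk '/Virtual Function/ {print $1}')
echo "$vfs"
```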
These VFs are visible to libvirt as node devices, and each one can be dedicated to a virtual machine:

# virsh nodedev-list | grep 0b
pci_0000_0b_00_0
pci_0000_0b_00_1
pci_0000_0b_10_0
pci_0000_0b_10_1
pci_0000_0b_10_2
pci_0000_0b_10_3
pci_0000_0b_10_4
pci_0000_0b_10_5
pci_0000_0b_10_6
pci_0000_0b_10_7
pci_0000_0b_11_0
pci_0000_0b_11_1
pci_0000_0b_11_2
pci_0000_0b_11_3
pci_0000_0b_11_4
pci_0000_0b_11_5
Virtual machine NIC XML configuration:

<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0' bus='11' slot='16' function='0'/>
  </source>
</interface>
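The address values above are the lspci address 0b:10.0 written in decimal (bus 11 = 0x0b, slot 16 = 0x10); the 0x-prefixed hex form is the more common spelling in libvirt XML. A small conversion sketch (the helper name is made up):

```shell
# Split a VF address like "0b:10.0" into the attributes used by the
# <address> element, in 0x-prefixed hex form. Assumes PCI domain 0000.
bdf_to_xml_addr() {
    local bus=${1%%:*}        # "0b"
    local rest=${1#*:}        # "10.0"
    local slot=${rest%%.*}    # "10"
    local func=${rest##*.}    # "0"
    printf "domain='0x0000' bus='0x%s' slot='0x%s' function='0x%s'\n" \
        "$bus" "$slot" "$func"
}

bdf_to_xml_addr 0b:10.0
```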
Option 5: NIC multi-queue

Starting with CentOS 7, virtio NICs support multi-queue, which can greatly improve virtual machine network performance. Configure it as follows:
XML NIC configuration of the virtual machine:

<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
  <driver name='vhost' queues='N'/>
</interface>
N ranges from 1 to 8; at most 8 queues are supported.
Execute the following command in the virtual machine to enable the multi-queue NIC:

# ethtool -L eth0 combined M

M ranges from 1 to N and must be less than or equal to N.
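Since M must not exceed N, a guest setup script would typically clamp the requested channel count. A minimal sketch; the variable values are illustrative, and the echoed command is what you would actually run on the guest:

```shell
N=4            # queues='N' value from the interface definition
M=8            # desired channel count, e.g. the number of vCPUs

# Clamp M to the queue count configured on the host side.
if [ "$M" -gt "$N" ]; then
    M=$N
fi
echo "ethtool -L eth0 combined $M"
```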
Personally, I think KVM network optimization will mainly be driven by hardware: 10 GbE plus SR-IOV will become more and more popular, but the problem of live migration with SR-IOV still needs to be solved.