KVM CPU virtualization

1.1 Why virtualize the CPU

In the x86 context, virtualization refers to running one or more guest operating systems (Guest OS) on top of a host operating system (Host OS). The technique requires few or no modifications to the guest operating system. Originally, the x86 architecture did not satisfy the Popek and Goldberg virtualization requirements, which made building an ordinary virtual machine monitor for x86 processors very complicated. In 2005 and 2006, Intel and AMD respectively extended their x86 architectures to solve this problem, along with other virtualization difficulties.

1.2 CPU privilege rings: Ring 0, Ring 1, ...

(Figure: x86 privilege rings, Ring 0 to Ring 3)

Ring 0 through Ring 3 are the CPU's privilege levels: Ring 0 is the most privileged, followed by Ring 1, Ring 2 and Ring 3. Take Linux on x86 as an example: the operating system (kernel) code runs at the highest level, Ring 0, where it can use privileged instructions, control interrupts, modify page tables, access devices, and so on. Application code runs at the lowest level, Ring 3, where such privileged operations are not allowed. If an application needs to do something privileged, for example access a disk or write a file, it must go through a system call. During a system call the CPU switches from Ring 3 to Ring 0 and jumps to the kernel code that implements the call; the kernel accesses the device on the application's behalf, and execution then returns from Ring 0 to Ring 3. This process is also called switching between user mode and kernel mode.
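To get a feel for how often these Ring 3 / Ring 0 round trips happen, you can count the system calls a trivial command makes (just an illustration; it assumes the strace tool is installed):

strace -c ls /tmp    # each counted system call is one Ring 3 -> Ring 0 -> Ring 3 transition
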
Here virtualization runs into a problem: the host operating system already works in Ring 0, so the guest operating system cannot also run in Ring 0. The guest, however, does not know this; it still tries to execute the same privileged instructions it always has, and without the privilege those instructions simply cannot work. This is where the hypervisor (VMM) steps in. (The VMM runs in Ring 0, usually in the form of a driver, since drivers must run in Ring 0 to drive devices.) The usual approach is this: when the guest operating system executes a privileged instruction, the CPU raises an exception (that is the CPU's mechanism: an instruction executed without sufficient privilege triggers an exception); the VMM catches the exception, translates and emulates the instruction inside the exception handler, and finally returns control to the guest operating system. The guest believes its privileged instruction worked and keeps running. The performance loss, however, is very large: an instruction that used to execute directly now has to go through a complicated exception-handling path.
Paravirtualization appeared to address this. The idea of paravirtualization is to let the guest operating system know that it is running on a virtual machine: the privileged instructions that would have executed in Ring 0 on a physical machine are rewritten to cooperate with the VMM in some other way. This amounts to modifying the operating system code, customizing it for a new "architecture". That is why a guest kernel for Xen-style paravirtualization is a dedicated, customized kernel version, in the same sense that x86, MIPS and ARM each have their own kernel builds. With this approach there is no catch-exception / translate / emulate path, so the performance loss is very low; that is the advantage of Xen-style paravirtualization. It is also why Xen paravirtualization supports only Linux guests and cannot virtualize Windows: Microsoft will not modify its code.
Later, CPU vendors began to support virtualization in hardware and the situation changed. Take the x86 CPU: Intel introduced the Intel VT technology. A CPU that supports Intel VT has two operating modes, VMX root operation and VMX non-root operation, and both modes support the four privilege levels Ring 0 through Ring 3. This is very convenient: the VMM can run in VMX root operation mode while the guest OS runs in VMX non-root operation mode. In other words, the hardware itself now distinguishes host from guest, so full virtualization no longer has to rely on the "catch exception - translate - emulate" approach. As CPU vendors keep strengthening their virtualization support, the performance of hardware-assisted full virtualization is gradually approaching that of paravirtualization, and since full virtualization does not require modifying the guest operating system, it is likely to be the direction of future development.
Xen is the most typical paravirtualization product, but Xen now also supports hardware-assisted full virtualization; it could not resist the trend. KVM and VMware have always been full virtualization.

1.3 Classification of virtualization technologies

Current virtualization technology is mainly divided into three types:
1. Platform virtualization
Platform virtualization is virtualization aimed at computers and operating systems. It is the most common form of virtualization; products such as Hyper-V, Xen and VMware are applications of this technology.
2. Resource virtualization
Resource virtualization is virtualization aimed at specific system resources, such as memory or network resources.
3. Application virtualization
The most typical example of application virtualization is Java: compiled programs run inside a designated virtual machine (the JVM).

1.3.1 Full virtualization

(Figure: full virtualization)

Full virtualization completely simulates the computer's underlying hardware for the virtual machine, including the processor, physical memory, clock and all kinds of peripherals. With this approach, neither the original hardware nor the operating system needs to be changed.
The virtual machine software can be seen as a specific software interface through which all access to the computer's physical hardware goes. This interface is provided by the VMM (the Hypervisor). The VMM presents a completely simulated computer environment on top of the underlying hardware, and the guest operating system (the operating system that would otherwise run directly on the machine) is demoted to a lower privilege level (from Ring 0 to Ring 1).
Simply put, in full virtualization the VMM must run at the highest privilege level in order to fully control the host system, while the Guest OS is demoted to run at a non-privileged level; the operations the Guest OS used to perform at the privileged level are carried out on its behalf by the VMM.
Full virtualization could not be achieved on early x86 platforms. It was not until 2005 and 2006 that AMD and Intel added the AMD-V and Intel VT-x extensions, respectively. Intel VT-x uses the protection rings to properly control the privileged (kernel-mode) operations of virtual machines. Before that, however, many VMMs on the x86 platform already came very close to full virtualization, and even claimed to support it, for example Adeos, Mac-on-Linux, Parallels Desktop for Mac, Parallels Workstation, VMware Workstation, VMware Server, VirtualBox, Win4BSD and Win4Lin Pro.

1.3.2 Paravirtualization

(Figure: paravirtualization)

Paravirtualization works by modifying the Guest OS so that code that would run at the privileged level instead interacts with the VMM. Virtualization software built on this technique performs very well. By modifying the operating system kernel, the instructions that cannot be virtualized are replaced with hypercalls that communicate directly with the underlying virtualization layer; the core operations are then performed by the virtualization layer.
A typical example of paravirtualization is VMware Tools: the VMware Tools service provides a back-door service for the virtualization layer, through which many privileged-level operations can be performed. Software that uses paravirtualization includes Denali, Xen, and so on. (Xen can operate in both full-virtualization and paravirtualization modes.)

1.3.3 Hardware-assisted virtualization

Hardware-assisted virtualization means using special instructions provided by the processor to implement efficient full virtualization, for example Intel VT and AMD-V.
With Intel VT and AMD-V, the VMM and the Guest OS are completely isolated from each other; at the same time, the CPU adds a new root mode for virtualization, which is what achieves the isolation between the Guest OS and the VMM.
In hardware-assisted virtualization, the hardware provides structural support that helps create the virtual machine monitor and lets guest operating systems run independently. Hardware-assisted virtualization first appeared in 1972 on the IBM System/370, which ran VM/370, the first virtual machine operating system. In 2005 and 2006, Intel and AMD added hardware support for virtualization. Software that supports hardware-assisted virtualization includes Linux KVM, VMware Workstation, VMware Fusion, Microsoft Virtual PC, Xen, Parallels Desktop for Mac, VirtualBox and Parallels Workstation.

1.3.4 Operating system-level virtualization

(Figure: operating system-level virtualization)

Operating system-level virtualization is mostly used for VPS hosting. On a traditional operating system, all user processes essentially run on the same operating system instance, so a defect in the kernel inevitably affects the other running processes.
Operating system-level virtualization is a lightweight virtualization technique used on server operating systems; it is very simple and also very powerful.
The technique isolates processes by letting the kernel create multiple virtual operating system instances (kernel plus libraries). A program running in one instance cannot see what processes are running in other instances and cannot communicate with them.
On Unix-like operating systems this technique originated from the standard chroot mechanism and evolved from there. Besides the isolation mechanism, the kernel usually also provides resource-management features, so that the interaction of a single software container with other containers is kept to a minimum.
The best-known application of this technique is OpenVZ, although OpenVZ's overselling problem has drawn a lot of criticism from small hosting customers.

1.3.5 Comparison of various types of virtualization technology

 
Full virtualization using binary translation:
  • Implementation technique: binary translation (BT) and direct execution
  • Guest OS modification / compatibility: no modification of the guest operating system is needed; best compatibility
  • Performance: poor
  • Representative products: VMware Workstation / QEMU / Virtual PC

Hardware-assisted virtualization:
  • Implementation technique: the guest runs in non-root mode, and privileged instructions trap into root mode for execution
  • Guest OS modification / compatibility: no modification of the guest operating system is needed; best compatibility
  • Performance: the CPU has to switch between the two modes, which adds overhead, but it is gradually approaching the performance of paravirtualization
  • Representative products: VMware ESXi / Microsoft Hyper-V / Xen 3.0 / KVM

OS-assisted / paravirtualization:
  • Implementation technique: hypercalls
  • Guest OS modification / compatibility: the guest operating system must be modified to support hypercalls, so it cannot run on bare hardware or on other hypervisors; poor compatibility; Windows is not supported
  • Performance: good; the CPU virtualization overhead is almost zero, and virtual machine performance is close to that of the physical machine
  • Representative products: Xen

1.3.6 cgroups

cgroups (short for control groups) is a Linux kernel feature for limiting, controlling and isolating the resource usage (CPU, memory, disk I/O, etc.) of groups of processes.
The project was started by Google engineers (mainly Paul Menage and Rohit Seth) in 2006, originally under the name process containers. In 2007, because the term container already had several different meanings in the Linux kernel, it was renamed cgroups to avoid confusion, and it was merged into kernel version 2.6.24. Many features have been added since then.
One design goal of cgroups is to provide a unified interface for a wide range of uses, from controlling a single process (as nice does) to operating-system-level virtualization (as in OpenVZ, Linux-VServer and LXC). cgroups provides:

  • Resource limiting: a group can be set not to exceed a configured memory limit, which can also include virtual memory.
  • Prioritization: some groups can be given a larger share of CPU or disk I/O throughput.
  • Accounting: measures how many resources a group actually uses, for example for billing purposes.
  • Control: freezing a group, or checkpointing and restarting it.
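
As a minimal sketch of what this looks like in practice with the cgroup v1 cpu controller (the mount point /sys/fs/cgroup/cpu and the group name "demo" are only assumptions; the exact paths differ per distribution and kernel version):

mkdir /sys/fs/cgroup/cpu/demo                             # create a new control group
echo 50000 > /sys/fs/cgroup/cpu/demo/cpu.cfs_quota_us     # allow 50 ms of CPU time...
echo 100000 > /sys/fs/cgroup/cpu/demo/cpu.cfs_period_us   # ...per 100 ms period, i.e. half a core
echo $$ > /sys/fs/cgroup/cpu/demo/tasks                   # move the current shell into the group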

(Figure: cgroups)

1.4 KVM CPU virtualization

KVM is a full-virtualization solution that relies on hardware assistance from the CPU, so it requires a CPU with virtualization extensions.

1.4.1 CPU physical characteristics

Use the numactl command to view the physical CPU layout of the host:

[root@clsn /root]
# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 12276 MB
node 0 free: 7060 MB
node distances:
node   0 
  0:  10  21 

To support KVM, the CPU's Intel vmx or AMD svm extension must be present and enabled:

[root@clsn /root]
# egrep "(vmx|svm)" /proc/cpuinfo
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm arat epb dts tpr_shadow vnmi flexpriority ept vpid
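
If the flag is present, it is also worth checking that the KVM kernel modules are loaded and that the /dev/kvm device node exists (a quick sanity check; the second module is kvm_intel or kvm_amd depending on the CPU vendor):

lsmod | grep kvm     # expect kvm plus kvm_intel (or kvm_amd)
ls -l /dev/kvm       # the character device that qemu-kvm drives through ioctl calls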

1.4.2 Multi-CPU server architectures: SMP, MPP, NUMA

From the point of view of system architecture, current commercial servers can be roughly divided into three categories:

  • Symmetric multi-processing (SMP: Symmetric Multi-Processor): all CPUs share all resources, such as the bus, memory and the I/O system, and there is only one copy of the operating system (or database). The defining characteristic of such a system is that everything is shared: the CPUs are all equal and access memory, peripherals and the operating system in the same way. The main problem with SMP servers is that their scalability is very limited; experience shows that CPU utilization is best with 2 to 4 CPUs.
  • Massively parallel processing (MPP: Massive Parallel Processing): consists of multiple SMP servers connected by a node interconnect network that work together on the same task; from the user's point of view it is a single server system. Its essential characteristic is that the SMP servers (each called a node) are connected through the node network and each node accesses only its own local resources (memory, storage, etc.): a completely share-nothing structure. MPP is a distributed-memory model that can bring more processors into one system; each node has its own memory and can itself be configured as SMP or non-SMP, and the nodes are connected to form the overall system. MPP can be roughly understood as a large-scale SMP cluster, and it generally relies on software to coordinate the nodes.
  • Non-uniform memory access (NUMA: Non-Uniform Memory Access): its basic characteristic is having multiple CPU modules, each consisting of several CPUs (for example 4) with independent local memory, I/O slots and so on. A single physical server can support hundreds of CPUs. NUMA does have a drawback: because the latency of remote memory access is far higher than that of local memory, system performance does not grow linearly as the number of CPUs increases.
[root@clsn /root]
#uname -a
Linux clsn.io 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31 17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Note: This machine is SMP architecture
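
Besides uname and numactl, lscpu summarizes the socket/core/thread layout and the NUMA nodes in one place (shown only as an illustration; the output naturally differs per machine):

lscpu | egrep 'Socket|Core|Thread|NUMA'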

1.5 Creating a KVM virtual machine

1.5.1 Overview of the KVM execution environment

CPUs that support virtualization add new features. Taking Intel VT as an example, it adds two operating modes: VMX root mode and VMX non-root mode. Generally, the host operating system and the VMM run in VMX root mode, while guest operating systems and their applications run in VMX non-root mode. Because both modes support all the rings, the guest can run in the rings it needs (the guest OS in ring 0, its applications in ring 3), and the VMM also runs in the rings it needs (for KVM, QEMU runs in ring 3 and the KVM module runs in ring 0).
The CPU switching between the two modes is called a VMX transition. Entering non-root mode from root mode is called a VM entry; returning to root mode from non-root mode is called a VM exit. By controlling these transitions, the CPU alternates between executing VMM code and Guest OS code.
For KVM, the VMM runs in VMX root mode. When the Guest OS needs to execute, the VMM issues the VMLAUNCH instruction to switch the CPU into VMX non-root mode and start executing guest code; that is the VM entry. When the Guest OS needs to leave that mode, the CPU automatically switches back into VMX root mode; that is the VM exit. As you can see, KVM guest code runs directly on the physical CPU under the control of the VMM. QEMU only uses KVM to control how the virtual machine's code is executed by the CPU; it does not execute that code itself. In other words, the CPU is not truly turned into a software "virtual CPU" handed to the guest; the guest's code runs on the physical CPU itself.
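On a running host you can watch these VM entries and exits happen by counting the corresponding KVM tracepoints (a rough illustration; it assumes perf is installed and the kvm:* tracepoints are available on your kernel):

perf stat -e 'kvm:kvm_entry' -e 'kvm:kvm_exit' -a sleep 10   # count VM entries/exits host-wide for 10 seconds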

(Figure: KVM architecture: qemu-kvm, /dev/kvm and the guest)

As seen above:

  1. qemu-kvm controls the virtual machine through a series of ioctl commands on /dev/kvm.
  2. A KVM virtual machine is simply a Linux qemu-kvm process, scheduled by the Linux process scheduler just like any other Linux process.
  3. A KVM virtual machine consists of virtual memory, virtual CPUs and virtual I/O devices; the virtual memory and the vCPUs are implemented by the KVM kernel module, and the virtual I/O devices are implemented by QEMU.
  4. The memory of a KVM guest is part of the address space of the qemu-kvm process.
  5. The vCPUs of a KVM virtual machine run as threads within the qemu-kvm process (see the quick check below).
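
Points 2 and 5 are easy to verify on a host that is running a guest (illustrative commands only; they assume a single guest whose process is named qemu-kvm, while on other distributions it may be qemu-system-x86_64):

ps -ef | grep [q]emu                      # the whole virtual machine is one ordinary Linux process
ls /proc/$(pidof qemu-kvm)/task | wc -l   # its thread count includes one thread per vCPU plus I/O and worker threads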



1.6 How guest (VM) code is run

A normal Linux kernel has two execution modes: kernel mode and user mode.
To support CPUs with hardware virtualization, KVM adds a third mode to the Linux kernel, guest mode, which corresponds to the CPU's VMX non-root mode.

1.6.1 The KVM kernel module

The KVM kernel module acts as the bridge between User mode and Guest mode:

  • qemu-kvm, in User mode, drives the virtual machine's execution through ioctl commands;
  • after the KVM kernel module receives such a request, it does some preparatory work, such as loading the VCPU context into the VMCS (virtual machine control structure), and then drives the CPU into VMX non-root mode to begin executing guest code.
The three modes divide the work as follows:
  • Guest mode: executes the guest's non-I/O code, and causes the CPU to exit this mode when necessary;
  • Kernel mode: responsible for switching the CPU into Guest mode to execute Guest OS code, and for handling the CPU when it exits Guest mode;
  • User mode: performs I/O operations on behalf of the guest.

(Figure: the three KVM execution modes: User, Kernel, Guest)

1.6.2 QEMU-KVM compared with native QEMU

Compared with native QEMU, QEMU-KVM changes the following:

  • native QEMU fully virtualizes the CPU through binary instruction translation, while the modified QEMU-KVM instead calls into the KVM kernel module through ioctl commands;
  • native QEMU is single-threaded, while QEMU-KVM is multi-threaded.
    On the host, Linux treats a QEMU virtual machine as a single process containing several kinds of threads:
  • an I/O thread that manages the emulated devices;
  • vCPU threads that run the guest code;
  • other threads, such as the event-loop processing thread and threads for offloaded tasks; see the example below.
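
You can list those threads for a running guest (again just a sketch, assuming a single qemu-kvm process; newer QEMU versions name the vCPU threads something like "CPU 0/KVM", while older ones may not set thread names at all):

ps -T -p $(pidof qemu-kvm) -o tid,comm    # one line per thread: vCPU threads, the I/O thread, worker threads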

(Figure: the QEMU-KVM process and its threads)

1.7 Guest CPU topology and models

KVM supports SMP and NUMA multi-CPU architectures for both the host and the guest. For an SMP guest, use the "-smp" parameter:

kvm -smp <n>[,cores=<ncores>][,threads=<nthreads>][,sockets=<nsocks>][,maxcpus=<maxcpus>]
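
For example, a 4-vCPU guest presented as 2 sockets x 2 cores x 1 thread could be started like this (purely illustrative; the binary name and the memory/disk options are placeholders for your own configuration):

qemu-kvm -m 2048 -smp 4,sockets=2,cores=2,threads=1 -drive file=guest.img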

For a NUMA guest, use the "-numa" parameter:

kvm -numa <nodes>[,mem=<size>][,cpus=<cpu[-cpu>]][,nodeid=<node>]
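
And a sketch of a two-node NUMA guest that splits 4 vCPUs and 2 GB of memory across two virtual nodes (the mem=/cpus= form shown matches the older QEMU releases used in this article; newer releases prefer memory backend objects):

qemu-kvm -m 2048 -smp 4 \
    -numa node,mem=1024,cpus=0-1,nodeid=0 \
    -numa node,mem=1024,cpus=2-3,nodeid=1 \
    -drive file=guest.img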

The CPU model defines which features of the host CPU are exposed to the guest operating system. To allow safe migration between hosts whose CPUs support different features, qemu-kvm by default does not expose every feature of the host CPU to the guest.
You can run the qemu-kvm -cpu ? command to obtain the list of CPU models supported by the host:

[root@clsn /root]
#kvm -cpu ?
x86       Opteron_G5  AMD Opteron 63xx class CPU                      
x86       Opteron_G4  AMD Opteron 62xx class CPU                      
x86       Opteron_G3  AMD Opteron 23xx (Gen 3 Class Opteron)          
x86       Opteron_G2  AMD Opteron 22xx (Gen 2 Class Opteron)          
x86       Opteron_G1  AMD Opteron 240 (Gen 1 Class Opteron)           
x86          Haswell  Intel Core Processor (Haswell)                  
x86      SandyBridge  Intel Xeon E312xx (Sandy Bridge)                
x86         Westmere  Westmere E56xx/L56xx/X56xx (Nehalem-C)          
x86          Nehalem  Intel Core i7 9xx (Nehalem Class Core i7)       
x86           Penryn  Intel Core 2 Duo P9xxx (Penryn Class Core 2)    
x86           Conroe  Intel Celeron_4x0 (Conroe/Merom Class Core 2)   
x86      cpu64-rhel5  QEMU Virtual CPU version (cpu64-rhel5)          
x86      cpu64-rhel6  QEMU Virtual CPU version (cpu64-rhel6)          
x86             n270  Intel(R) Atom(TM) CPU N270   @ 1.60GHz          
x86           athlon  QEMU Virtual CPU version 0.12.1                 
x86         pentium3                                                  
x86         pentium2                                                  
x86          pentium                                                  
x86              486                                                  
x86          coreduo  Genuine Intel(R) CPU           T2600  @ 2.16GHz 
x86           qemu32  QEMU Virtual CPU version 0.12.1                 
x86            kvm64  Common KVM processor                            
x86         core2duo  Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz 
x86           phenom  AMD Phenom(tm) 9550 Quad-Core Processor         
x86           qemu64  QEMU Virtual CPU version 0.12.1                 

Recognized CPUID flags:
  f_edx: pbe ia64 tm ht ss sse2 sse fxsr mmx acpi ds clflush pn pse36 pat cmov mca pge mtrr sep apic cx8 mce pae msr tsc pse de vme fpu
  f_ecx: hypervisor rdrand f16c avx osxsave xsave aes tsc-deadline popcnt movbe x2apic sse4.2|sse4_2 sse4.1|sse4_1 dca pcid pdcm xtpr cx16 fma cid ssse3 tm2 est smx vmx ds_cpl monitor dtes64 pclmulqdq|pclmuldq pni|sse3
  extf_edx: 3dnow 3dnowext lm|i64 rdtscp pdpe1gb fxsr_opt|ffxsr fxsr mmx mmxext nx|xd pse36 pat cmov mca pge mtrr syscall apic cx8 mce pae msr tsc pse de vme fpu
  extf_ecx: perfctr_nb perfctr_core topoext tbm nodeid_msr tce fma4 lwp wdt skinit xop ibs osvw 3dnowprefetch misalignsse sse4a abm cr8legacy extapic svm cmp_legacy lahf_lm

Each hypervisor has its own strategy that defines which CPU features are exposed to the guest by default; what the guest actually sees also depends on its configuration. qemu32 and qemu64 are the basic guest CPU models, but other models can be used as well. You can use the -cpu parameter of the qemu-kvm command to specify the guest's CPU model, and also to specify additional CPU features. "-cpu" exposes all the features of the specified CPU model to the guest, even features that the physical host CPU does not support; in that case QEMU/KVM emulates those features, which may cause some performance degradation.
On RedHat Linux 6, cpu64-rhel6 is used as the default guest CPU model; you can also specify a particular CPU model and feature explicitly:

[root@clsn /root]
#qemu-kvm -cpu Nehalem,+aes
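
If migration between different hosts is not a concern, you can instead pass the host CPU model through (a sketch only; exactly which flags the guest then sees still depends on the QEMU/KVM version):

qemu-kvm -cpu host -m 2048 -smp 2 -drive file=guest.img   # expose (nearly) all host CPU features to the guest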

For more details, please refer to: https://clsn.io/clsn/lx194.html

1.8 Deciding the number of vCPUs

It is not the case that the more vCPUs you give a guest, the better it performs; thread switching costs a lot of time, so you should allocate the minimum number of vCPUs the workload needs.
The total number of guest vCPUs on a host should not exceed the total number of physical CPU cores. If it does not, there is no CPU contention and every vCPU thread runs on its own physical CPU core; if it does, some threads have to wait for a CPU and threads get switched on and off cores, which adds overhead.
Workloads can be divided into compute-bound and I/O-bound. A compute-bound workload needs more vCPUs, and may even warrant CPU affinity, pinning specific physical CPU cores to the guest.

1.8.1 Steps for determining the number of vCPUs

When creating a VM, the following steps can help determine the appropriate number of vCPUs.

(1) Understand the application and set an initial value

Determine whether the application is business-critical and whether there is a Service Level Agreement. Make sure you understand in depth whether the application running in the virtual machine is multi-threaded, and whether the vendor supports multi-threading and SMP (symmetric multi-processing). Also take as a reference how many CPUs the application required when it ran on a physical server. If no reference information is available, set 1 vCPU as the initial value and then watch resource usage closely.
(2) Observe resource usage
Decide on an observation period and watch the virtual machine's resource usage during that time (example monitoring commands are shown after these steps). Depending on the application's characteristics and requirements, the period can be days or even weeks. Observe not only the VM's CPU usage but also the CPU usage of the application inside the guest operating system.
In particular, distinguish between average CPU usage and peak CPU usage.
If 4 vCPUs are allocated and the application's CPU usage in the VM peaks at 25%, i.e. it can use at most 25% of all CPU resources, that indicates the application is single-threaded and can use only one vCPU (4 x 25% = 1).
If the average is below 38% and the peak below 45%, consider reducing the number of vCPUs.
If the average is above 75% and the peak above 90%, consider increasing the number of vCPUs.
(3) Change the number of vCPUs and observe again
Change as little as possible each time: if 4 vCPUs may be needed, first set 2 vCPUs and check whether the performance is acceptable.
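
For step (2), ordinary host-side tools are enough to collect the averages and peaks over time (illustrative commands; sar and pidstat come from the sysstat package, and the qemu-kvm process name is an assumption):

sar -u 60 60                          # host-wide CPU usage, one sample per minute for an hour
pidstat -u -p $(pidof qemu-kvm) 60    # per-minute CPU usage of the guest's qemu-kvm process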

1.9 References

https://clsn.io/clsn/lx194.html
http://www.cnblogs.com/popsuper1982/p/3815398.html
http://frankdenneman.nl/2013/09/18/vcpu-configuration-performance-impact-between-virtual-sockets-and-virtual-cores/
https://www.dadclab.com/archives/2509.jiecao
https://zh.wikipedia.org/zh-hans/%E5%88%86%E7%BA%A7%E4%BF%9D%E6%8A%A4%E5%9F%9F
https://www.cnblogs.com/cyttina/archive/2013/09/24/3337594.html
https://www.cnblogs.com/sammyliu/p/4543597.html
http://www.cnblogs.com/xusongwei/archive/2012/07/30/2615592.html
https://blog.csdn.net/hshl1214/article/details/62046736
