Common APIs for KVM virtualization

        Once the kvm module is loaded, the character device /dev/kvm is created. It is a standard character device, so the usual open and close calls work on it, but interaction with KVM goes through ioctl rather than read and write. Functionally, the KVM API falls into three major categories:

1. System ioctls: global parameter setting and control of the virtualization system.

2. VM ioctls: control of an individual virtual machine, such as memory setup and VCPU creation.

3. VCPU ioctls: parameters of a specific VCPU, such as reading and writing registers and interrupt control.

        KVM operations usually start by opening the /dev/kvm device file, which yields a file descriptor (fd). Further operations are issued as ioctl calls on that fd. For example, the KVM_CREATE_VM command creates a virtual machine and returns a new fd that represents it. Ioctls on this VM fd then control the virtual machine: for instance, KVM_CREATE_VCPU creates a VCPU for it and returns yet another fd for that VCPU.

1. System ioctls

        System ioctls are issued on the /dev/kvm fd and control the KVM operating environment, including global parameter settings and virtual machine creation. The main commands are:

        KVM_CREATE_VM: creates a KVM virtual machine

        KVM_GET_API_VERSION: queries the current KVM API version

        KVM_GET_MSR_INDEX_LIST: gets the list of MSRs that KVM supports for guests (x86)

        KVM_CHECK_EXTENSION: checks whether a given extension (capability) is supported

        KVM_GET_VCPU_MMAP_SIZE: returns the size of the memory region that each VCPU shares with user space (the kvm_run structure, plus any extra pages)
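Below is a minimal sketch of how a VMM might use these system ioctls on the /dev/kvm fd before it creates any VM. The helper name kvm_probe is hypothetical, and checking KVM_CAP_USER_MEMORY is only an example of a capability query.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical helper: sanity-check a /dev/kvm fd with system ioctls.
 * Returns 0 on success and stores the per-VCPU mmap size in *mmap_size;
 * returns -1 (leaving *mmap_size untouched) if any check fails. */
int kvm_probe(int kvm_fd, long *mmap_size)
{
    long size;

    /* The stable API reports version 12, exported as KVM_API_VERSION. */
    if (ioctl(kvm_fd, KVM_GET_API_VERSION, 0) != KVM_API_VERSION)
        return -1;

    /* KVM_CHECK_EXTENSION returns 0 if unsupported, > 0 if supported. */
    if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY) <= 0)
        return -1;

    /* Size of the region later mmap'ed from each VCPU fd; it is at
     * least sizeof(struct kvm_run). */
    size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
    if (size < (long)sizeof(struct kvm_run))
        return -1;

    *mmap_size = size;
    return 0;
}
```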

        KVM_CREATE_VM is the most important of these. It creates a virtual machine and returns an fd that represents it. The newly created virtual machine has no VCPUs, memory, or other resources; these must be added by issuing further ioctls on the returned fd.

2. VM ioctls

        VM ioctls control an individual virtual machine. Most of them are issued on the fd returned by KVM_CREATE_VM. They cover operations such as configuring memory, creating VCPUs, and running the virtual machine through its VCPUs. The main commands are:

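        KVM_CREATE_VCPU: creates a VCPU for the virtual machine and returns its fd

        KVM_RUN: runs the guest on a VCPU and reports why it stopped in the kvm_run structure (strictly speaking a VCPU ioctl, issued on the fd returned by KVM_CREATE_VCPU)

        KVM_CREATE_IRQCHIP: creates an in-kernel interrupt controller model (on x86, a virtual PIC and IOAPIC); VCPUs created afterwards get a local APIC connected to it

        KVM_IRQ_LINE: sets the level of an interrupt line (GSI) on the in-kernel interrupt controller

        KVM_GET_IRQCHIP: reads the state of the in-kernel interrupt controller

        KVM_SET_IRQCHIP: writes the state of the in-kernel interrupt controller

        KVM_GET_DIRTY_LOG: returns a bitmap of the pages in a memory slot that were dirtied since the previous call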
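Below is a short sketch of the interrupt-controller ioctls issued on the VM fd. The helper names are hypothetical. Raising and then lowering the line to deliver an edge-triggered interrupt is one common convention (kvmtool does this, for example), not the only one.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical helper: create the in-kernel interrupt controller.
 * Call it before KVM_CREATE_VCPU so that the VCPUs get a local APIC. */
int create_irqchip(int vm_fd)
{
    return ioctl(vm_fd, KVM_CREATE_IRQCHIP, 0) < 0 ? -1 : 0;
}

/* Hypothetical helper: pulse GSI `gsi` (raise, then lower) with
 * KVM_IRQ_LINE. Returns 0 on success, -1 on error. */
int pulse_irq(int vm_fd, uint32_t gsi)
{
    struct kvm_irq_level lvl = { .irq = gsi, .level = 1 };

    if (ioctl(vm_fd, KVM_IRQ_LINE, &lvl) < 0)
        return -1;
    lvl.level = 0;
    return ioctl(vm_fd, KVM_IRQ_LINE, &lvl) < 0 ? -1 : 0;
}
```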

        KVM_CREATE_VCPU and KVM_RUN are the two most important of these. KVM_CREATE_VCPU creates a VCPU for the virtual machine and returns its fd. Calling KVM_RUN on that fd starts the virtual machine running; in other words, it schedules the VCPU.
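Below is a minimal sketch of the KVM_RUN loop. It assumes the VCPU already has memory and register state. The kvm_run area is mapped from the VCPU fd at offset 0, using the size reported by KVM_GET_VCPU_MMAP_SIZE. The helper name run_vcpu is hypothetical, and port I/O exits are resumed without emulating anything.

```c
#include <errno.h>
#include <linux/kvm.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Hypothetical helper: map the VCPU's kvm_run area, then call KVM_RUN
 * repeatedly. Port I/O exits are simply resumed (a real VMM would emulate
 * the access first); any other exit ends the loop. Returns the last
 * exit_reason, or -1 if mapping or KVM_RUN itself fails. */
int run_vcpu(int kvm_fd, int vcpu_fd)
{
    int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
    if (size < 0)
        return -1;

    struct kvm_run *run = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpu_fd, 0);
    if (run == MAP_FAILED)
        return -1;

    int reason = -1;
    for (;;) {
        if (ioctl(vcpu_fd, KVM_RUN, 0) < 0) {
            if (errno == EINTR)      /* a signal interrupted the run: re-enter */
                continue;
            reason = -1;
            break;
        }
        reason = (int)run->exit_reason;
        if (reason != KVM_EXIT_IO)   /* HLT, MMIO, SHUTDOWN, errors, ... */
            break;
    }

    munmap(run, (size_t)size);
    return reason;
}
```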

        Inside the kernel, a virtual machine is represented by struct kvm (include/linux/kvm_host.h). Each KVM_CREATE_VM call creates a new kvm object. It holds the VM's VCPUs, memory, APIC, IRQ, MMU, event, and related information. The structure is internal to the KVM kernel module, where it tracks the state of the virtual machine; user space never accesses it directly.

struct kvm {
        spinlock_t mmu_lock;
        struct mutex slots_lock;
        struct mm_struct *mm; /* userspace tied to this vm */
        struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];/* memory slots of the VM, used for GPA->HVA translation in memory virtualization */
        struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];/* the VM's VCPUs; KVM_MAX_VCPUS is the maximum number KVM supports */

        /*
         * created_vcpus is protected by kvm->lock, and is incremented
         * at the beginning of KVM_CREATE_VCPU.  online_vcpus is only
         * incremented after storing the kvm_vcpu pointer in vcpus,
         * and is accessed atomically.
         */
        atomic_t online_vcpus;
        int created_vcpus;
        int last_boosted_vcpu;
        struct list_head vm_list;
        struct mutex lock;
        struct kvm_io_bus __rcu *buses[KVM_NR_BUSES];
#ifdef CONFIG_HAVE_KVM_EVENTFD
        struct {
                spinlock_t        lock;
                struct list_head  items;
                struct list_head  resampler_list;
                struct mutex      resampler_lock;
        } irqfds;
        struct list_head ioeventfds;
#endif
        struct kvm_vm_stat stat;/* runtime statistics of the VM, e.g. page-table and MMU counters */
        struct kvm_arch arch;
        refcount_t users_count;
#ifdef CONFIG_KVM_MMIO
        struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
        spinlock_t ring_lock;
        struct list_head coalesced_zones;
#endif

        struct mutex irq_lock;
#ifdef CONFIG_HAVE_KVM_IRQCHIP
        /*
         * Update side is protected by irq_lock.
         */
        struct kvm_irq_routing_table __rcu *irq_routing;
#endif
#ifdef CONFIG_HAVE_KVM_IRQFD
        struct hlist_head irq_ack_notifier_list;
#endif

#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
        struct mmu_notifier mmu_notifier;
        unsigned long mmu_notifier_seq;
        long mmu_notifier_count;
#endif
        long tlbs_dirty;
        struct list_head devices;
        bool manual_dirty_log_protect;
        struct dentry *debugfs_dentry;
        struct kvm_stat_data **debugfs_stat_data;
        struct srcu_struct srcu;
        struct srcu_struct irq_srcu;
        pid_t userspace_pid;
        struct kvm_mig_opt mig_opt;
};

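        The kvm_run structure is defined in include/uapi/linux/kvm.h. It is the region that a VCPU shares with user space. User space maps it with mmap on the VCPU fd (offset 0, size from KVM_GET_VCPU_MMAP_SIZE). After each KVM_RUN returns, exit_reason and the matching union member describe why the guest exited, which makes this structure the main window into what KVM is doing at run time.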

/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
struct kvm_run {
        /* in */
        __u8 request_interrupt_window;/* ask KVM_RUN to return as soon as an interrupt can be injected into the guest */
        __u8 immediate_exit;
        __u8 padding1[6];
        /* out */
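        __u32 exit_reason;/* why KVM_RUN returned (KVM_EXIT_*) */
        __u8 ready_for_interrupt_injection; /* answer to request_interrupt_window: when set, an interrupt can be injected now */
        __u8 if_flag; /* guest interrupt-enable flag; only valid when the in-kernel local APIC is not used */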
        __u16 flags;
        /* in (pre_kvm_run), out (post_kvm_run) */
        __u64 cr8;
        __u64 apic_base;
#ifdef __KVM_S390
        /* the processor status word for s390 */
        __u64 psw_mask; /* psw upper half */
        __u64 psw_addr; /* psw lower half */
#endif
        union {
                /* KVM_EXIT_UNKNOWN */
                struct {
                        __u64 hardware_exit_reason;
                } hw;
                /* KVM_EXIT_FAIL_ENTRY */
                struct {
                        __u64 hardware_entry_failure_reason;
                } fail_entry;
                /* KVM_EXIT_EXCEPTION */
                struct {
                        __u32 exception;
                        __u32 error_code;
                } ex;
                /* KVM_EXIT_IO */
                struct {
#define KVM_EXIT_IO_IN  0
#define KVM_EXIT_IO_OUT 1
                        __u8 direction;
                        __u8 size; /* bytes */
                        __u16 port;
                        __u32 count;
                        __u64 data_offset; /* relative to kvm_run start */
                } io; /* filled in when a port I/O instruction caused the VM exit */
                /* KVM_EXIT_DEBUG */
                struct {
                        struct kvm_debug_exit_arch arch;
                } debug;
                /* KVM_EXIT_MMIO */
                struct {
                        __u64 phys_addr;
                        __u8  data[8];
                        __u32 len;
                        __u8  is_write;
                } mmio;
                /* KVM_EXIT_HYPERCALL */
                struct {
                        __u64 nr;
                        __u64 args[6];
                        __u64 ret;
                        __u32 longmode;
                        __u32 pad;
                } hypercall; /*hypercall exit*/
                /* KVM_EXIT_TPR_ACCESS */
                struct {
                        __u64 rip;
                        __u32 is_write;
                        __u32 pad;
                } tpr_access;
                /* KVM_EXIT_S390_SIEIC */
                struct {
                        __u8 icptcode;
                        __u16 ipa;
                        __u32 ipb;
                } s390_sieic;
                /* KVM_EXIT_S390_RESET */
#define KVM_S390_RESET_POR       1
#define KVM_S390_RESET_CLEAR     2
#define KVM_S390_RESET_SUBSYSTEM 4
#define KVM_S390_RESET_CPU_INIT  8
#define KVM_S390_RESET_IPL       16
                __u64 s390_reset_flags;
                /* KVM_EXIT_S390_UCONTROL */
                struct {
                        __u64 trans_exc_code;
                        __u32 pgm_code;
                } s390_ucontrol;
                /* KVM_EXIT_DCR (deprecated) */
                struct {
                        __u32 dcrn;
                        __u32 data;
                        __u8  is_write;
                } dcr;
                /* KVM_EXIT_INTERNAL_ERROR */
                struct {
                        __u32 suberror;
                        /* Available with KVM_CAP_INTERNAL_ERROR_DATA: */
                        __u32 ndata;
                        __u64 data[16];
                } internal;
                /* KVM_EXIT_OSI */
                struct {
                        __u64 gprs[32];
                } osi;
                /* KVM_EXIT_PAPR_HCALL */
                struct {
                        __u64 nr;
                        __u64 ret;
                        __u64 args[9];
                } papr_hcall;
                /* KVM_EXIT_S390_TSCH */
                struct {
                        __u16 subchannel_id;
                        __u16 subchannel_nr;
                        __u32 io_int_parm;
                        __u32 io_int_word;
                        __u32 ipb;
                        __u8 dequeued;
                } s390_tsch;
                /* KVM_EXIT_EPR */
                struct {
                        __u32 epr;
                } epr;
                /* KVM_EXIT_SYSTEM_EVENT */
                struct {
#define KVM_SYSTEM_EVENT_SHUTDOWN       1
#define KVM_SYSTEM_EVENT_RESET          2
#define KVM_SYSTEM_EVENT_CRASH          3
                        __u32 type;
                        __u64 flags;
                } system_event;
                /* KVM_EXIT_S390_STSI */
                struct {
                        __u64 addr;
                        __u8 ar;
                        __u8 reserved;
                        __u8 fc;
                        __u8 sel1;
                        __u16 sel2;
                } s390_stsi;
                /* KVM_EXIT_IOAPIC_EOI */
                struct {
                        __u8 vector;
                } eoi;
                /* KVM_EXIT_HYPERV */
                struct kvm_hyperv_exit hyperv;
                /* Fix the size of the union. */
                char padding[256];
        };
        /* 2048 is the size of the char array used to bound/pad the size
         * of the union that holds sync regs.
         */
        #define SYNC_REGS_SIZE_BYTES 2048
        /*
         * shared registers between kvm and userspace.
         * kvm_valid_regs specifies the register classes set by the host
         * kvm_dirty_regs specified the register classes dirtied by userspace
         * struct kvm_sync_regs is architecture specific, as well as the
         * bits for kvm_valid_regs and kvm_dirty_regs
         */
        __u64 kvm_valid_regs;
        __u64 kvm_dirty_regs;
        union {
                struct kvm_sync_regs regs;
                char padding[SYNC_REGS_SIZE_BYTES];
        } s;
};
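The sketch below shows how user space typically reads this structure after KVM_RUN returns. The helper names are hypothetical. For a KVM_EXIT_IO exit, the transferred bytes are not stored inside the io member; they sit io.data_offset bytes past the start of kvm_run, as io.count items of io.size bytes each.

```c
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: readable name for some common exit reasons. */
const char *exit_reason_name(uint32_t reason)
{
    switch (reason) {
    case KVM_EXIT_UNKNOWN:        return "KVM_EXIT_UNKNOWN";
    case KVM_EXIT_IO:             return "KVM_EXIT_IO";
    case KVM_EXIT_HLT:            return "KVM_EXIT_HLT";
    case KVM_EXIT_MMIO:           return "KVM_EXIT_MMIO";
    case KVM_EXIT_SHUTDOWN:       return "KVM_EXIT_SHUTDOWN";
    case KVM_EXIT_FAIL_ENTRY:     return "KVM_EXIT_FAIL_ENTRY";
    case KVM_EXIT_INTERNAL_ERROR: return "KVM_EXIT_INTERNAL_ERROR";
    case KVM_EXIT_SYSTEM_EVENT:   return "KVM_EXIT_SYSTEM_EVENT";
    default:                      return "other";
    }
}

/* Hypothetical helper for KVM_EXIT_IO: return a pointer to the I/O data
 * and its length (io.size bytes times io.count repetitions; count > 1 for
 * string instructions such as rep outsb). For an OUT exit the guest's data
 * is there; for an IN exit the VMM writes the result there before the
 * next KVM_RUN. */
uint8_t *kvm_io_data(struct kvm_run *run, size_t *len)
{
    *len = (size_t)run->io.size * run->io.count;
    return (uint8_t *)run + run->io.data_offset;
}
```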

3. VCPU ioctls

        VCPU ioctls configure a specific VCPU and are mostly issued on the fd returned by KVM_CREATE_VCPU. They cover register reads and writes, interrupt settings, memory settings, clock management, debugging switches, and so on, and they can reconfigure the virtual machine while it runs. The main commands are grouped as follows:

1. Register control

    KVM_GET_REGS: gets the general-purpose registers

    KVM_SET_REGS: sets the general-purpose registers

    KVM_GET_SREGS: gets the special registers (segment registers, control registers, etc.)

    KVM_SET_SREGS: sets the special registers

    KVM_GET_MSRS: gets the values of the given MSRs

    KVM_SET_MSRS: sets the values of the given MSRs

    KVM_GET_FPU: gets the floating-point (FPU) state

    KVM_SET_FPU: sets the floating-point (FPU) state

    KVM_GET_XSAVE: gets the VCPU's XSAVE area (extended processor state)

    KVM_SET_XSAVE: sets the VCPU's XSAVE area

    KVM_GET_XCRS: gets the VCPU's extended control registers (XCRs)

    KVM_SET_XCRS: sets the VCPU's extended control registers
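As an x86-only illustration of the register ioctls, the sketch below makes a freshly created VCPU execute real-mode code at a low guest physical address. It uses a read-modify-write on the special and general-purpose registers. The helper name set_real_mode_entry is hypothetical. The values follow the pattern common in minimal VMM examples: a flat CS with base 0 and selector 0, and RFLAGS = 0x2, because bit 1 of RFLAGS is reserved and always 1.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* x86 only. Hypothetical helper: make a fresh (real-mode) VCPU start at
 * guest physical address `entry`, which must be below 64 KiB because CS
 * base is set to 0 and RIP holds the 16-bit offset. Returns 0 or -1. */
int set_real_mode_entry(int vcpu_fd, uint64_t entry)
{
    struct kvm_sregs sregs;
    struct kvm_regs regs;

    /* Special registers: keep the reset values except for a flat CS. */
    if (ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) < 0)
        return -1;
    sregs.cs.base = 0;
    sregs.cs.selector = 0;
    if (ioctl(vcpu_fd, KVM_SET_SREGS, &sregs) < 0)
        return -1;

    /* General-purpose registers: entry point and reserved RFLAGS bit. */
    if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0)
        return -1;
    regs.rip = entry;
    regs.rflags = 0x2;
    if (ioctl(vcpu_fd, KVM_SET_REGS, &regs) < 0)
        return -1;
    return 0;
}
```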

2. Interrupt and event management

    KVM_INTERRUPT: queues an external interrupt for injection into the VCPU (only valid when no in-kernel local APIC is used)

    KVM_SET_SIGNAL_MASK: sets the signal mask that is in effect while the VCPU executes inside KVM_RUN

    KVM_GET_CPU_EVENTS: gets the VCPU's pending events (interrupts, NMIs, or exceptions) and related state

    KVM_SET_CPU_EVENTS: sets the VCPU's pending events, such as interrupts, NMIs, or exceptions
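Below is a sketch of KVM_INTERRUPT combined with the request_interrupt_window and ready_for_interrupt_injection fields of kvm_run. It applies to a VMM that has no in-kernel irqchip. The helper name try_inject is hypothetical, and `run` is assumed to point at the VCPU's mmap'ed kvm_run area. A VMM would call the helper just before each KVM_RUN.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical helper (no in-kernel irqchip). Returns 0 if `vector` was
 * queued with KVM_INTERRUPT; 1 if the guest cannot take an interrupt yet,
 * in which case an interrupt window is requested so that KVM_RUN exits
 * with KVM_EXIT_IRQ_WINDOW_OPEN once it can; -1 on error. */
int try_inject(int vcpu_fd, struct kvm_run *run, uint32_t vector)
{
    if (!run->ready_for_interrupt_injection) {
        run->request_interrupt_window = 1;
        return 1;
    }

    struct kvm_interrupt irq = { .irq = vector };
    if (ioctl(vcpu_fd, KVM_INTERRUPT, &irq) < 0)
        return -1;

    run->request_interrupt_window = 0;  /* nothing else pending in this sketch */
    return 0;
}
```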

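3. Memory management

    KVM_TRANSLATE: translates a guest virtual (linear) address into a guest physical address, using the VCPU's current address-translation mode

    KVM_SET_USER_MEMORY_REGION: creates, modifies, or deletes a guest memory slot that maps guest physical memory onto user-space memory (a VM ioctl, issued on the VM fd)

    KVM_SET_TSS_ADDR: sets the guest physical address of a three-page region used for the TSS (Intel architecture specific; a VM ioctl)

    KVM_SET_IDENTITY_MAP_ADDR: sets the guest physical address of a one-page region used for an identity-mapped page table (Intel architecture specific; a VM ioctl)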

4. Other: for example, CPUID setup (KVM_SET_CPUID2) and the debugging interface (KVM_SET_GUEST_DEBUG)
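As an x86-only illustration of CPUID setup, the sketch below copies the CPUID leaves that KVM supports on the host into a VCPU. KVM_GET_SUPPORTED_CPUID is a system ioctl on the /dev/kvm fd. It fails with E2BIG when the caller's entry count is too small, so the sketch grows the buffer and retries, as QEMU does. The helper name copy_supported_cpuid and the starting size of 64 entries are assumptions.

```c
#include <errno.h>
#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* x86 only. Hypothetical helper: give the VCPU the host-supported CPUID
 * leaves. Returns 0 on success, -1 on error. */
int copy_supported_cpuid(int kvm_fd, int vcpu_fd)
{
    for (unsigned nent = 64; nent <= 4096; nent *= 2) {
        struct kvm_cpuid2 *cpuid =
            calloc(1, sizeof(*cpuid) + nent * sizeof(struct kvm_cpuid_entry2));
        if (!cpuid)
            return -1;
        cpuid->nent = nent;

        if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid) == 0) {
            /* KVM set nent to the number of entries it filled in. */
            int r = ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid);
            free(cpuid);
            return r < 0 ? -1 : 0;
        }

        int err = errno;
        free(cpuid);
        if (err != E2BIG)           /* only "buffer too small" is retried */
            return -1;
    }
    return -1;
}
```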

Inside the kernel, each VCPU is represented by struct kvm_vcpu (include/linux/kvm_host.h), which holds the VCPU-related state:

struct kvm_vcpu {
        struct kvm *kvm;/* the virtual machine this VCPU belongs to */
#ifdef CONFIG_PREEMPT_NOTIFIERS
        struct preempt_notifier preempt_notifier; /* VCPU preemption notifier */
#endif
        int cpu;
        int vcpu_id;/*vcpu id*/
        int srcu_idx;
        int mode;
        u64 requests;
        unsigned long guest_debug;

        int pre_pcpu;
        struct list_head blocked_vcpu_list;

        struct mutex mutex;
        struct kvm_run *run; /* kvm_run area shared with user space (records the VM's run state) */

        int guest_xcr0_loaded;
        struct swait_queue_head wq;
        struct pid __rcu *pid;
        int sigset_active;
        sigset_t sigset;
        struct kvm_vcpu_stat stat;
        unsigned int halt_poll_ns;
        bool valid_wakeup;

#ifdef CONFIG_HAS_IOMEM
        int mmio_needed;
        int mmio_read_completed;
        int mmio_is_write;
        int mmio_cur_fragment;
        int mmio_nr_fragments;
        struct kvm_mmio_fragment mmio_fragments[KVM_MAX_MMIO_FRAGMENTS];
#endif

#ifdef CONFIG_KVM_ASYNC_PF
        struct {
                u32 queued;
                struct list_head queue;
                struct list_head done;
                spinlock_t lock;
        } async_pf;
#endif

#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
        /*
         * Cpu relax intercept or pause loop exit optimization
         * in_spin_loop: set when a vcpu does a pause loop exit
         *  or cpu relax intercepted.
         * dy_eligible: indicates whether vcpu is eligible for directed yield.
         */
        struct {
                bool in_spin_loop;
                bool dy_eligible;
        } spin_loop;
#endif
        bool preempted;
        bool ready;
        struct kvm_vcpu_arch arch;
        struct dentry *debugfs_dentry;
};

Origin blog.csdn.net/qq_28693567/article/details/130124669