Interrupt Mechanism in Virtualization: Exploration of X86 and PIC 8259A (Part 1)

This series explores interrupt virtualization in depth: it starts from the foundations of the X86 architecture and the PIC 8259A, moves on to programming the IOAPIC and MSI, and ends with the practical use of MSIX in Broiler devices, giving a comprehensive analysis of how interrupt virtualization has evolved.

X86 Interrupt Mechanism

In a computer, the CPU runs far faster than its peripherals. In early systems, software that wanted to know whether an external device had completed an IO operation had to poll the device for its status, so the CPU spent much of its time on useless queries and overall performance was low. The interrupt mechanism was introduced to solve this problem. An interrupt is a mechanism by which an external device notifies the CPU after it has completed some work. Interrupts free the CPU from polling and greatly improve IO efficiency. In the X86 architecture, an interrupt is an electrical signal generated by a hardware peripheral and sent to an input pin of the interrupt controller; the interrupt controller then sends a corresponding signal to the processor. Once the processor detects this signal, it suspends its current work and handles the interrupt instead.

An X86 CPU provides two external pins for interrupts: NMI and INTR. NMI is the non-maskable interrupt, usually used for events such as power failure and physical memory parity errors. INTR is the maskable interrupt, which can be masked through the interrupt mask bit; it mainly accepts interrupt signals from external hardware, which are passed to the CPU by the interrupt controller. There are two common interrupt controllers: the 8259A PIC and the APIC.

PIC 8259A

In the X86 architecture, the traditional PIC (Programmable Interrupt Controller) consists of two external 8259A chips connected in cascade. Each chip can handle up to 8 different IRQs, and the INT output line of the Slave PIC is connected to the IRQ2 pin of the Master PIC, so 15 IRQ lines are available. The interrupt pins have priorities: IR0 has the highest priority and IR7 the lowest. There are three important registers inside the PIC:

  • IRR (Interrupt Request Register): 8 bits, one for each of the interrupt pins IR0 ~ IR7. A set bit indicates that the corresponding pin has received an interrupt that has not yet been submitted to the CPU.
  • ISR (In Service Register): 8 bits. A set bit means that the interrupt of the corresponding pin has been submitted to the CPU, but the CPU has not yet finished processing it.
  • IMR (Interrupt Mask Register): 8 bits. A set bit means that the corresponding pin is masked.

In addition, the PIC has an EOI bit. When the CPU finishes processing an interrupt, it writes this bit to inform the PIC that interrupt processing is complete. The PIC submits an interrupt to the CPU as follows:

1. One or more IR pins raise a level signal; if the corresponding interrupt is not masked, the matching bit in the IRR is set.

2. The PIC pulls the INT pin high to notify the CPU that an interrupt has occurred.

3. The CPU responds to the PIC through the INTA pin, indicating that the interrupt request has been received.

4. After the PIC receives the INTA response, it clears the highest-priority bit in the IRR and sets the corresponding bit in the ISR.

5. The CPU sends a second pulse through the INTA pin, and the PIC places the highest-priority Vector on the data lines.

6. The PIC waits for the CPU to write EOI.

7. After receiving EOI, the PIC clears the highest-priority bit in the ISR.
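
The register flow in the steps above can be sketched as a small software model. This is only an illustrative toy in Python, assuming fixed priority (IR0 highest) and a single chip; the class name Pic8259A, its method names, and the vector base 0x20 used in the usage note are hypothetical and are not taken from any real implementation.

```python
class Pic8259A:
    """Toy model of one 8259A in fixed-priority mode (IR0 is highest)."""

    def __init__(self, vector_base):
        self.vector_base = vector_base  # assumed to have been programmed by software
        self.irr = 0  # Interrupt Request Register: pending requests
        self.isr = 0  # In Service Register: interrupts being serviced
        self.imr = 0  # Interrupt Mask Register: masked pins

    @staticmethod
    def _highest(bits):
        """Return the lowest set bit index (highest priority), or None."""
        for pin in range(8):
            if bits & (1 << pin):
                return pin
        return None

    def int_asserted(self):
        """True if the INT pin would be raised towards the CPU."""
        pending = self._highest(self.irr & ~self.imr & 0xFF)
        if pending is None:
            return False
        in_service = self._highest(self.isr)
        # Only a request of higher priority than the one in service interrupts.
        return in_service is None or pending < in_service

    def raise_irq(self, pin):
        """Steps 1-2: latch an unmasked request and report the INT state."""
        if self.imr & (1 << pin):
            return False  # masked pins are ignored in this simplified model
        self.irr |= 1 << pin
        return self.int_asserted()

    def acknowledge(self):
        """Steps 3-5: INTA cycle, move the bit from IRR to ISR, return the vector."""
        pin = self._highest(self.irr & ~self.imr & 0xFF)
        if pin is None:
            raise RuntimeError("no pending interrupt")
        self.irr &= ~(1 << pin)
        self.isr |= 1 << pin
        return self.vector_base + pin

    def eoi(self):
        """Steps 6-7: EOI clears the highest-priority bit in the ISR."""
        pin = self._highest(self.isr)
        if pin is not None:
            self.isr &= ~(1 << pin)
```

For example, with a hypothetical vector base of 0x20, raising IR3 and then IR1 and acknowledging once yields the vector for IR1 first, because IR1 has the higher priority; IR3 stays pending in the IRR until EOI is written.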


APIC: Local APIC and I/O APIC

The PIC works on UP (uniprocessor) platforms but cannot be used on MP (multiprocessor) platforms, so the APIC (Advanced Programmable Interrupt Controller) came into being. The APIC consists of two parts: the Local APIC (LAPIC), located in the CPU, and the I/O APIC, located in the south bridge of the mainboard. Their relationship is shown in the figure above.

Each logical processor has its own Local APIC, and each Local APIC contains a set of registers that control the generation, transmission and reception of local and external interrupts, and that are also used to generate and send IPIs. The Local APIC registers are mapped into the system's memory address space as MMIO, so they can be accessed like physical memory. The default physical base address of the Local APIC registers is 0xFEE00000. In x2APIC mode, the Local APIC registers are mapped to MSRs instead, so they are accessed with the RDMSR and WRMSR instructions.

The Local APIC uses a set of LVT (Local Vector Table) registers to generate and handle local interrupt sources. The LINT0 and LINT1 registers of the LVT correspond to the processor's LINT0 and LINT1 pins, which can be connected directly to external I/O devices or to an 8259A-compatible external interrupt controller. Typically, LINT0 serves as the processor's INTR pin and is connected to the INTR output of the external 8259 interrupt controller, and LINT1 serves as the processor's NMI pin and is connected to the NMI request of an external device.

The IO APIC usually has 24 pins, without priorities, for connecting external devices. After receiving an interrupt signal on a pin, the IO APIC looks up the RTE (Redirection Table Entry) for that pin in the PRT (Programmable Redirection Table) set up by software. From the fields of the RTE it formats an interrupt message containing all the information about the interrupt, which is then sent over the system bus to the Local APIC of a specific CPU. After receiving the message, the Local APIC submits the interrupt to the CPU at an appropriate time. The IO APIC also has its own registers, which are likewise mapped into the memory address space through MMIO. In the APIC system, the general process of interrupt delivery is as follows:

1. IO APIC receives an interrupt signal generated by a certain pin

2. Look up the PRT table to get the RTE corresponding to the pin

3. Format an interrupt message according to the RTE fields, and determine which CPU's LAPIC to send to

4. Send interrupt information through the system bus

5. Local APIC receives the interrupt message and judges whether it is accepted by itself

6. Local APIC confirms acceptance, sets the corresponding bit in IRR, and confirms whether it is processed by CPU

7. Confirm that the interrupt is handled by the CPU, obtain the highest priority interrupt from the IRR, set the corresponding bit in the ISR, and submit the interrupt. For edge-triggered interrupts, the corresponding bit in IRR is cleared at this time.

8. After the CPU finishes processing the interrupt, the software writes the EOI register to notify the completion of the interrupt processing. For level-triggered interrupts, the corresponding bit in the IRR is cleared. Local APIC submits the next interrupt.
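
Steps 2 and 3 above (looking up the RTE and formatting a message from its fields) can be illustrated with a short decoder. The sketch below is in Python and assumes the 64-bit RTE layout of the classic I/O APIC (vector in bits 0-7, delivery mode in bits 8-10, destination mode in bit 11, polarity in bit 13, trigger mode in bit 15, mask in bit 16, destination in bits 56-63). The names Rte, decode_rte and route are hypothetical helpers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rte:
    vector: int            # bits 0-7: vector delivered to the Local APIC
    delivery_mode: int     # bits 8-10: 0 means fixed delivery
    logical_dest: bool     # bit 11: False = physical, True = logical
    active_low: bool       # bit 13: pin polarity
    level_triggered: bool  # bit 15: False = edge, True = level
    masked: bool           # bit 16: True = pin is masked
    destination: int       # bits 56-63: target APIC ID (or logical set)


def decode_rte(raw):
    """Split a 64-bit redirection table entry into its fields."""
    return Rte(
        vector=raw & 0xFF,
        delivery_mode=(raw >> 8) & 0x7,
        logical_dest=bool((raw >> 11) & 1),
        active_low=bool((raw >> 13) & 1),
        level_triggered=bool((raw >> 15) & 1),
        masked=bool((raw >> 16) & 1),
        destination=(raw >> 56) & 0xFF,
    )


def route(redirection_table, pin):
    """Return (destination, vector) for a pin, or None if the pin is masked."""
    rte = decode_rte(redirection_table[pin])
    if rte.masked:
        return None
    return (rte.destination, rte.vector)
```

A table entry such as `(1 << 56) | (1 << 15) | 0x31` would describe a level-triggered interrupt with vector 0x31 sent to the Local APIC with ID 1.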

On an MP (multiprocessor) platform, multiple CPUs need to work together, and the inter-processor interrupt (IPI) provides a means of communication between them. A CPU can send an interrupt to one or more specified CPUs through the ICR (Interrupt Command Register) of its Local APIC. The OS usually uses IPIs for tasks such as process migration, interrupt balancing, and TLB flushing.

Interrupt key concepts

Interrupts can be divided into synchronous interrupts and asynchronous interrupts. A synchronous interrupt is generated by the CPU control unit while an instruction is being executed. It is called synchronous because the CPU issues the interrupt only after an instruction has finished executing, never in the middle of one, and then handles it immediately; a system call is an example. Asynchronous interrupts are generated by other hardware devices at arbitrary times with respect to the CPU clock signal, which means they can arrive between any two instructions and are not necessarily handled by the CPU immediately after they are generated.

In the Intel X86 architecture, synchronous interrupts are called exceptions, and asynchronous interrupts are called interrupts. Interrupts can be divided into maskable interrupts and non-maskable interrupts. Exceptions can be divided into faults, traps and aborts. In the X86 architecture, each interrupt is assigned a unique number called its vector (an 8-bit unsigned integer).

In the protected mode of the X86 architecture, the system uses the Interrupt Descriptor Table (IDT) as the interrupt vector table. It holds 256 descriptors in total, and the index into the IDT is the interrupt vector. The IDT is essentially a large array whose base address and length are held in the IDTR register. It stores gates (interrupt gates, trap gates, and task gates), which are the entry points through which interrupts and exceptions reach their handlers.


PIN/IRQ/GSI/VECTOR

The concepts of PIN, IRQ, GSI and Vector are easy to confuse. IRQ is a product of the PIC era. Since ISA devices are usually connected to fixed PIC pins, the IRQ of a device actually refers to the number of the PIC pin it is connected to. IRQ implies interrupt priority, e.g. IRQ0 has higher priority than IRQ1. In the APIC era, for backward compatibility, it is still customary to use IRQ to denote the interrupt number of a device, but an IRQ below 16 may no longer correspond to the IOAPIC pin of the same number. Pin is the pin number of the IOAPIC, the counterpart of IRQ in the PIC era. The maximum value of Pin is limited by the number of IOAPIC pins, and the current value range is [0, 23]. GSI (Global System Interrupt) is a concept introduced in the APIC era; it assigns a unique interrupt number to each interrupt source in the system.

There are 3 I/O APICs in the figure above. IO-APIC0 has 24 pins and its GSI base is 0; the GSI of each pin is GSI = GSI_Base + Pin, so the GSI range of IO-APIC0 is [0, 23]. IO-APIC1 has 16 pins and a GSI base of 24, so its GSI range is [24, 39], and so on. ACPI requires that the 16 ISA IRQs be identity mapped to GSI [0, 15]. Vector is the index of an interrupt in the IDT, and it is a CPU concept. Each IRQ (or GSI) corresponds to a Vector. In PIC mode, the vector of an IRQ is Vector = Start_Vector + IRQ; in APIC mode, the vector of an IRQ/GSI is allocated by the operating system.
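
The two formulas above can be checked with a few lines of arithmetic. The Python sketch below uses the two I/O APICs whose sizes are given in the text (24 pins at GSI base 0, 16 pins at GSI base 24); the function names and the start vector 0x20 in the usage note are hypothetical and serve only as an illustration.

```python
# (gsi_base, pin_count) for IO-APIC0 and IO-APIC1 as described in the text.
IOAPICS = [(0, 24), (24, 16)]


def gsi_of(gsi_base, pin):
    """GSI = GSI_Base + Pin."""
    return gsi_base + pin


def find_ioapic(gsi):
    """Return (ioapic_index, pin) for a GSI, or None if no I/O APIC owns it."""
    for index, (base, pins) in enumerate(IOAPICS):
        if base <= gsi < base + pins:
            return index, gsi - base
    return None


def pic_vector(start_vector, irq):
    """In PIC mode: Vector = Start_Vector + IRQ."""
    return start_vector + irq
```

For instance, pin 5 of IO-APIC1 has GSI 24 + 5 = 29, and with an assumed start vector of 0x20, IRQ 6 in PIC mode maps to vector 0x26.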

The operating system's processing flow for interrupts/exceptions

Although operating systems differ in how they implement interrupt/exception handling, the basic flow is the same. It begins when an interrupt or exception occurs and interrupts the currently executing task, and then proceeds as follows:

1. The CPU uses the vector to index the IDT, obtains the corresponding gate, and from it the entry address of the handler

2. Save the context of the interrupted task and jump to the interrupt handler for execution

3. For an interrupt, the handler must write the EOI register as an acknowledgement after processing is complete; an exception does not need this

4. Restore the context of the interrupted task and prepare to return

5. Return from the interrupt/exception processing function, resume the interrupted task, and make it continue to execute


X86 Interrupt Virtualization

In a virtualization scenario, the VMM must present the Guest OS with a virtual interrupt architecture similar to the physical one. The figure above shows the interrupt architecture of a virtual machine. As on a physical platform, each VCPU has a virtual Local APIC for receiving interrupts, and the virtual platform also includes a virtual I/O APIC or a virtual PIC for sending interrupts. Like the VCPU, the virtual Local APIC, virtual I/O APIC and virtual PIC are all maintained by the VMM. When a virtual device needs to send an interrupt, it calls the interface of the virtual I/O APIC. The virtual I/O APIC selects the corresponding virtual Local APIC based on the interrupt request and calls its interface to send the request, and the virtual Local APIC in turn uses the event injection mechanism of VT-x to inject the interrupt into the corresponding VCPU. The main task of interrupt virtualization is therefore to implement the virtual PIC, virtual I/O APIC and virtual Local APIC, and to implement the generation, collection and injection of virtual interrupts.

PIC 8259A Virtualization
IOAPIC Virtualization

PCI/PCIe devices support not only Line-Based PCI Interrupt Routing but also the more modern PCI Message-Signalled Interrupt (MSI), which allows a device to use more interrupts than the IOAPIC/PIC can provide. MSI/MSIX also serve individual PCI Functions better and allow interrupts to be routed directly to a specified LAPIC.

MSI Interrupt Virtualization
MSIX Interrupt Virtualization

When Broiler triggers a PIC/IOAPIC interrupt, the virtual machine has to VM-EXIT and then VM-ENTRY. The interrupt to be injected is written into the VM_ENTRY_INTR_INFO_FIELD field of the VMCS, and on VM-ENTRY that field is checked for an interrupt to inject; if one is present, the corresponding interrupt is triggered in the guest on VM-ENTRY. When the Guest OS finishes processing the interrupt it has to write EOI, which also causes a VM-EXIT. If the virtual machine is in sleep when Broiler requests the injection, KVM emulates sending an IPI so that the virtual machine takes a VM-EXIT. As hardware capabilities have improved, developers have looked for ways to let the hardware perform interrupt injection without a VM-EXIT. APICv solves this problem well: with the Posted Interrupt method, the virtual machine can complete interrupt injection and EOI without any VM-EXIT.
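
To make the contents of VM_ENTRY_INTR_INFO_FIELD more concrete, the following Python sketch encodes an external interrupt into the 32-bit VM-entry interruption-information format defined by Intel VT-x (vector in bits 0-7, interruption type in bits 8-10, valid flag in bit 31). The constant names are modelled on those used in the Linux KVM headers, and encode_ext_intr is a hypothetical helper; this is an illustration of the field layout, not KVM's code.

```python
INTR_TYPE_EXT_INTR = 0 << 8    # interruption type 0: external interrupt
INTR_INFO_VALID_MASK = 1 << 31  # bit 31: the field holds an event to inject


def encode_ext_intr(vector):
    """Build the VM-entry interruption-information value for an external interrupt."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return vector | INTR_TYPE_EXT_INTR | INTR_INFO_VALID_MASK


def is_valid(info):
    """True if the valid bit is set, i.e. an event is injected on VM-ENTRY."""
    return bool(info & INTR_INFO_VALID_MASK)
```

With this layout, injecting vector 0x26 corresponds to writing the value 0x80000026 into the field; a value with bit 31 clear means that nothing is injected on the next VM-ENTRY.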

VM_ENTRY_INTR_INFO_FIELD Interrupt Injection(TODO)
IPI Virtualization(TODO)
APICv Virtualization(TODO)
Posted Interrupt(TODO)

PIC 8259A Virtualization

A virtual PIC is essentially a virtual device, so it can be emulated either in user space (QEMU/Broiler) or in the kernel by KVM. Following the PIC hardware specification, the virtual PIC presents in software the same interface as the physical PIC. Broiler places the PIC emulation in KVM, so the vPIC is an In-Kernel device. The vPIC emulates the basic interrupt handling of the 8259A: Broiler submits an interrupt to the vPIC, and the vPIC selects the appropriate interrupt to inject according to interrupt priority. Injection forces the VM to VM-EXIT, and the interrupt to be injected is written into the designated field of the virtual machine's VMCS. On the next VM-ENTRY the pending injection is detected and the virtual machine takes the interrupt once it runs. This section gives a comprehensive introduction to the vPIC interrupt virtualization process and is divided into the following chapters:

  • 8259A Interrupt Controller Programming
  • vPIC creation
  • Broiler vPIC Interrupt Configuration
  • Broiler devices use vPIC interrupts
  • vPIC Interrupt Injection

8259A Interrupt Controller Programming

In the x86 architecture, the vPIC implements interrupt emulation by emulating the logic of the 8259A interrupt controller, with a focus on how the 8259A's internal registers behave. Programming the 8259A, by contrast, describes how the operating system uses the chip. This section introduces the 8259A programming logic, which helps in understanding the vPIC emulation. According to the PIC hardware specification, the PIC mainly provides the following interfaces for software:

  • 4 initialization command words (Initialization Command Words): ICW1/ICW2/ICW3/ICW4
  • 3 Operation Command Words: OCW1/OCW2/OCW3

The ICW1 register is used to initialize the connection mode and interrupt trigger mode of the 8259A. BIT0 indicates whether ICW4 is needed, and it must be set in the X86 architecture. BIT1 selects the connection mode: set means a single chip, cleared means two cascaded 8259As. BIT3 selects the interrupt trigger mode: set means level triggered, cleared means edge triggered. BIT4 must be set in the x86 architecture. ICW1 is written to port 0x20 of the master 8259A and port 0xA0 of the slave 8259A.

The ICW2 register is used to set the initial interrupt vector: BIT3-BIT7 hold the upper five bits of the vector, and BIT0-BIT2 are filled in by the 8259A with the number of the IR pin that raised the interrupt. ICW2 is written to port 0x21 of the master 8259A and port 0xA1 of the slave 8259A.

The ICW3 register is used to specify the cascade pin between the master and slave 8259A. In the x86 architecture, the slave 8259A is cascaded to the IRQ2 pin of the master 8259A, so the ICW3 value of the master 8259A is 0x04 (a bitmask with BIT2 set) and the ICW3 value of the slave 8259A is 0x02 (the number of the master pin it is attached to). ICW3 is written to port 0x21 of the master 8259A and port 0xA1 of the slave 8259A.

The ICW4 register is used to set the operating mode of the 8259A. BIT0 (uPM) must be set in the x86 architecture. BIT1 is AEOI: if set, interrupts end automatically (Auto EOI); if cleared, the 8259A must receive an EOI to complete interrupt processing. BIT2-BIT3 select the buffered mode: if BIT3 is cleared the chip works in non-buffered mode, and if BIT3 is set it works in buffered mode, in which case BIT2 distinguishes master and slave (set for the master 8259A, cleared for the slave 8259A). BIT4 selects the nesting mode: set means special fully nested mode, cleared means fully nested mode. ICW4 is written to port 0x21 of the master 8259A and port 0xA1 of the slave 8259A.

The figure above shows a simple piece of code that initializes the master and slave 8259A.
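
As a rough textual sketch of the ICW1-ICW4 sequence described above, the Python function below returns the ordered list of (port, value) writes that initializes both chips. The vector bases 0x20 and 0x28 are assumptions chosen for illustration (a commonly used remapping), and init_sequence is a hypothetical helper; on real hardware each pair would be issued with an outb-style port write.

```python
MASTER_CMD, MASTER_DATA = 0x20, 0x21
SLAVE_CMD, SLAVE_DATA = 0xA0, 0xA1


def init_sequence(master_base=0x20, slave_base=0x28):
    """Return the (port, value) writes that initialize a cascaded 8259A pair."""
    for base in (master_base, slave_base):
        if base & 0x07 or not 0 <= base <= 0xF8:
            raise ValueError("vector base must be a multiple of 8 below 256")
    icw1 = 0x11  # BIT4 set, BIT0 set (ICW4 needed), cascaded, edge triggered
    icw4 = 0x01  # BIT0 (uPM) set, normal EOI, non-buffered, fully nested
    return [
        (MASTER_CMD, icw1),          # ICW1 to the master
        (MASTER_DATA, master_base),  # ICW2: vector base of the master
        (MASTER_DATA, 0x04),         # ICW3: slave attached to IR2 (bitmask)
        (MASTER_DATA, icw4),         # ICW4
        (SLAVE_CMD, icw1),           # ICW1 to the slave
        (SLAVE_DATA, slave_base),    # ICW2: vector base of the slave
        (SLAVE_DATA, 0x02),          # ICW3: cascade identity of the slave
        (SLAVE_DATA, icw4),          # ICW4
    ]
```

Returning the writes as data rather than performing port IO keeps the sketch runnable anywhere and makes the order of the command words easy to inspect.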

The OCW1 register is used to mask the interrupt sources connected to the 8259A. When a bit is set, the corresponding interrupt source is masked; when a bit is cleared, the corresponding interrupt source is not masked. OCW1 is written to port 0x21 of the master 8259A and port 0xA1 of the slave 8259A.

The OCW2 register is used to set the interrupt termination method and the priority mode. BIT0-BIT2 specify the interrupt level that the command applies to. BIT3-BIT4 must be 0 in the X86 architecture. BIT5 is the EOI bit, which is meaningful when interrupts are ended manually: setting it clears the corresponding ISR bit at the end of the interrupt, and clearing it has no effect. BIT6 is the SL bit: if set, BIT0-BIT2 specify the interrupt to be ended; if cleared, the interrupt currently being processed is ended and the highest-priority bit of the ISR is cleared. BIT7 is the R bit: if set, rotating priority is used and the priorities of the IR pins rotate through 0-7; if cleared, fixed priority is used, where a lower IR number means a higher priority. OCW2 is written to port 0x20 of the master 8259A and port 0xA0 of the slave 8259A.

The OCW3 register is used to set the special mask mode and the query mode. BIT0 is the RIS flag: set selects the ISR for reading, cleared selects the IRR. BIT1 is the RR flag: set issues a register read command, cleared means no register is read. BIT2 is the P flag: set means a poll command that queries the current highest-priority interrupt, and cleared has no effect. BIT3-BIT4 are fixed at 0x01 in the X86 architecture (BIT3 set, BIT4 cleared). BIT5 is the SMM flag: set means the chip works in special mask mode, cleared means it does not. BIT6 is the ESMM flag: set means special mask mode is enabled, cleared means special mask mode is turned off. OCW3 is written to port 0x20 of the master 8259A and port 0xA0 of the slave 8259A.
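
The bit layouts of OCW2 and OCW3 translate directly into a few command bytes that operating systems commonly write. The following Python sketch builds them from the bit positions described above; the constant and function names are hypothetical and are used only for illustration.

```python
# OCW2 bits (written to port 0x20 / 0xA0)
OCW2_EOI = 1 << 5  # end of interrupt
OCW2_SL = 1 << 6   # BIT0-BIT2 name the interrupt level
OCW2_R = 1 << 7    # rotating priority

# OCW3 bits (written to port 0x20 / 0xA0)
OCW3_RIS = 1 << 0    # 1 = read ISR, 0 = read IRR
OCW3_RR = 1 << 1     # issue a register read command
OCW3_FIXED = 1 << 3  # BIT3 set and BIT4 clear identify OCW3


def ocw2_nonspecific_eoi():
    """End the interrupt currently in service (highest-priority ISR bit)."""
    return OCW2_EOI


def ocw2_specific_eoi(level):
    """End the interrupt on the given IR level (0-7)."""
    if not 0 <= level <= 7:
        raise ValueError("level must be in 0..7")
    return OCW2_SL | OCW2_EOI | level


def ocw3_read_irr():
    """Select the IRR for the next read of the command port."""
    return OCW3_FIXED | OCW3_RR


def ocw3_read_isr():
    """Select the ISR for the next read of the command port."""
    return OCW3_FIXED | OCW3_RR | OCW3_RIS
```

These helpers produce the familiar values 0x20 for a non-specific EOI, 0x62 for a specific EOI on IR2 (often used for the cascade pin), and 0x0A / 0x0B for selecting the IRR or the ISR.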

vPIC creation

Creating the vPIC has two main parts: the first virtualizes a vPIC device in kernel space, and the second emulates the default pins. During initialization Broiler calls the kvm_init() function, which uses ioctl() to pass the KVM_CREATE_IRQCHIP command to KVM in order to create a vPIC. KVM's kvm_arch_vm_ioctl() function receives the command and dispatches it to the KVM_CREATE_IRQCHIP branch. That branch calls two main functions: kvm_pic_init(), which creates the virtual vPIC device and emulates its IO ports, and kvm_setup_default_irq_routing(), which emulates the default pins of the vPIC.

KVM uses the struct kvm_pic data structure to describe a vPIC device. The pics[] member reflects the two 8259A chips of the x86 architecture: pics[0] is the master PIC and pics[1] is the slave PIC. Each element of pics[] is a struct kvm_kpic_state, whose members emulate the registers inside the PIC device. For example, irr corresponds to the Interrupt Request Register, imr to the Interrupt Mask Register, and isr to the In Service Register. The output member indicates whether the vPIC has a new interrupt that needs to be injected. dev_master, dev_slave and dev_eclr are the three devices emulated by KVM; the first two emulate the master PIC device and the slave PIC device respectively. The irq_states[] array emulates the state of each vPIC pin.

Returning to how KVM emulates the vPIC device: KVM calls kvm_pic_init(), which sets up the master vPIC device, the slave vPIC device and the eclr device by calling kvm_iodevice_init(), and provides these devices with emulated IO port reads and writes. IO port reads are ultimately handled by picdev_read(), and IO port writes by picdev_write(). The function then registers ports 0x20/0x21/0xa0/0xa1 with KVM by calling kvm_io_bus_register_dev(). Whenever the Guest OS accesses these IO ports, a VM-EXIT occurs with the reason EXIT_REASON_IO_INSTRUCTION.

Following the logic of the 8259A chip, the picdev_read() function emulates reads of the ICW/OCW registers. A read of port 0x20/0x21/0xa0/0xa1 first uses the address to determine whether the master or the slave vPIC is addressed, and then calls pic_ioport_read() to read the relevant member of the corresponding struct kvm_kpic_state, completing the emulated IO port read.

Similarly, the picdev_write() function emulates writes to the ICW/OCW registers. A write to port 0x20/0x21/0xa0/0xa1 first uses the address to determine whether the master or the slave vPIC is addressed, and then calls pic_ioport_write() to update the relevant member of the corresponding struct kvm_kpic_state, completing the emulated IO port write.
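
The idea of choosing the master or slave chip from the port address and then reading or writing a member of its state can be sketched as follows. This Python model is deliberately minimal and is not KVM's code: it assumes the chips are already initialized, handles only OCW1 writes and OCW3 register-select writes, and all class and attribute names are hypothetical.

```python
PIC_PORTS = (0x20, 0x21, 0xA0, 0xA1)


class KpicState:
    """Per-chip register state, loosely mirroring irr/imr/isr."""

    def __init__(self):
        self.irr = 0
        self.imr = 0
        self.isr = 0
        self.read_isr = False  # which register a command-port read returns


class VPic:
    def __init__(self):
        self.pics = [KpicState(), KpicState()]  # [0] master, [1] slave

    def _select(self, addr):
        if addr not in PIC_PORTS:
            raise ValueError("not a PIC port")
        return self.pics[addr >> 7]  # 0x20/0x21 -> 0, 0xA0/0xA1 -> 1

    def read(self, addr):
        chip = self._select(addr)
        if addr & 1:
            return chip.imr  # data port: Interrupt Mask Register
        return chip.isr if chip.read_isr else chip.irr

    def write(self, addr, value):
        chip = self._select(addr)
        if addr & 1:
            chip.imr = value & 0xFF  # OCW1: set the mask
        elif (value & 0x18) == 0x08 and (value & 0x02):
            chip.read_isr = bool(value & 0x01)  # OCW3 with RR set: RIS selects ISR/IRR
```

Shifting the address right by 7 is one compact way to tell the two port ranges apart, since 0x20 and 0x21 have bit 7 clear while 0xA0 and 0xA1 have it set.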

The last step of vPIC creation is the emulation of the default pins. As analyzed earlier, the x86 architecture has two cascaded 8259A chips, one master and one slave. Each 8259A has 8 pins, and Pin2 of the master is connected to the INT pin of the slave. KVM emulates the pins by calling kvm_setup_default_irq_routing(), which invokes the common routing code. The vPIC pins are mainly set up in the setup_routing_entry() function, which sets the interrupt emulation function of each vPIC routing entry to kvm_set_pic_irq() and adds the emulated pins to the hash list.

KVM uses the struct kvm_irq_routing_table data structure to maintain the pins of the vPIC and vIOAPIC. The chip[] array stores the information for all pins, and the nr_rt_entries member gives the total number of pins. In addition, the map[] hash list maintains the mapping between pins and chips, that is, whether a pin belongs to the vPIC or the vIOAPIC. KVM uses the struct kvm_kernel_irq_routing_entry data structure to describe a GSI, which can correspond to a vPIC pin or to a vIOAPIC pin. Its gsi member holds the GSI number of the pin, the set callback performs the interrupt emulation for the pin, and the irqchip member holds the chip and pin information for the pin. Finally, the link member chains the emulated pins into the map[] hash list of struct kvm_irq_routing_table.

The struct kvm_irq_routing_table data structure maintains the mapping between vPIC and vIOAPIC pins and GSIs, so that the GSI number can be derived from a device pin, while the map[] hash list maintains the relationship from GSI to pins. In KVM the pins of the emulated vPIC and vIOAPIC can overlap, so one GSI number may correspond to a vPIC pin, to a vIOAPIC pin, or to both. KVM therefore uses the map[] hash list with the GSI as the key and struct kvm_kernel_irq_routing_entry as the list member, which finally forms the logical structure shown in the figure above.

The pins of the interrupt devices that KVM supports by default are shown in the figure above. From the definition of the ROUTING_ENTRY2 macro, the pin range of the vPIC is [0, 15] and the pin range of the vIOAPIC is [0, 23], so the vPIC and vIOAPIC share pins [0, 15].
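
The default layout just described (vPIC on GSI 0-15, vIOAPIC on GSI 0-23, with GSI 0-15 shared) can be modelled as a map keyed by GSI whose values are lists of routing entries. The Python sketch below is only an approximation of that idea: the chip labels and function names are hypothetical, and a plain dictionary stands in for the map[] hash list.

```python
def default_routing():
    """Build a GSI -> [(chip, pin), ...] map for the default layout."""
    table = {}
    for gsi in range(24):
        entries = []
        if gsi < 8:
            entries.append(("pic_master", gsi))     # master vPIC pins 0-7
        elif gsi < 16:
            entries.append(("pic_slave", gsi - 8))  # slave vPIC pins 0-7
        entries.append(("ioapic", gsi))             # vIOAPIC pins 0-23
        table[gsi] = entries
    return table


def chips_for_gsi(table, gsi):
    """List every chip that receives an interrupt raised on this GSI."""
    return [chip for chip, _pin in table.get(gsi, [])]
```

Looking up GSI 9 in this model returns both the slave vPIC (pin 1) and the vIOAPIC (pin 9), while GSI 20 is routed to the vIOAPIC only, which matches the sharing of [0, 15] described above.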

Broiler vPIC Interrupt Configuration

When KVM creates the vPIC, it also creates a default set of vPIC and vIOAPIC pin mappings. Although these can be used as they are, Broiler, as hypervisor software, can still customize the mapping between vPIC pins and GSIs. The figure above shows Broiler's virtual interrupt pin connections. The layout planned by Broiler is: the 8 pins of the master vPIC exclusively occupy GSI0-GSI7, the first 4 pins of the slave vPIC exclusively occupy GSI8-GSI11, the last 4 pins of the slave vPIC share GSI12-GSI15 with pins 12-15 of the vIOAPIC, and pins 16-23 of the vIOAPIC exclusively occupy GSI16-GSI23.

Broiler remaps the GSIs in the broiler_irq_init() function: lines 16-20 map GSI0-GSI7 to IRQCHIP_MASTER, lines 33-25 map GSI8-GSI15 to IRQCHIP_SLAVE, and lines 27-29 map GSI12-GSI23 to IRQCHIP_IOAPIC, so the vPIC and vIOAPIC share GSI12-GSI15. Finally, it calls ioctl() with KVM_SET_GSI_ROUTING to make the mapping take effect in KVM.

The vPIC is emulated as an In-Kernel device. When Broiler changes the interrupt mapping, it must use the ioctl() function to pass the KVM_SET_GSI_ROUTING command, which updates the mapping between the vPIC interrupt pins and the GSIs. When KVM's kvm_vm_ioctl() function receives the KVM_SET_GSI_ROUTING command, it calls kvm_set_irq_routing() to perform the update; the core of the update is done by kvm_set_routing_entry(), which sets the interrupt emulation function of the master and slave vPIC pins to kvm_set_pic_irq().

The struct kvm_irq_routing_table data structure maintains the mapping between vPIC pins and GSIs, so that the GSI number can be derived from a device pin, while the map[] hash list maintains the relationship from GSI to pins. The pins emulated in KVM can overlap, so one GSI number may correspond to a vPIC pin, to a vIOAPIC pin, or to both at the same time. KVM therefore uses the map[] hash list with the GSI as the key and struct kvm_kernel_irq_routing_entry as the list member. After the mapping is complete, Broiler's vPIC pins are connected as shown above.
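
Broiler's customized layout can be written down as a flat list of routing entries, similar in spirit to what is handed to KVM with KVM_SET_GSI_ROUTING. The Python sketch below is a guess at the shape of that table rather than Broiler's actual code: it assumes that the vIOAPIC range ends at GSI 23 (24 pins, upper bound exclusive), and the function names are hypothetical.

```python
def broiler_routing():
    """Return a list of (gsi, chip, pin) entries for the planned layout."""
    entries = []
    for gsi in range(0, 8):
        entries.append((gsi, "IRQCHIP_MASTER", gsi))     # master vPIC: GSI0-GSI7
    for gsi in range(8, 16):
        entries.append((gsi, "IRQCHIP_SLAVE", gsi - 8))  # slave vPIC: GSI8-GSI15
    for gsi in range(12, 24):                            # assumed upper bound
        entries.append((gsi, "IRQCHIP_IOAPIC", gsi))     # vIOAPIC: GSI12 and up
    return entries


def chips_for(entries, gsi):
    """List the chips that a given GSI is routed to."""
    return [chip for entry_gsi, chip, _pin in entries if entry_gsi == gsi]
```

In this model GSI 5 reaches only the master vPIC, GSI 13 reaches both the slave vPIC and the vIOAPIC, and GSI 20 reaches only the vIOAPIC.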

Broiler devices use vPIC interrupts

After Broiler configures the vPIC interrupts, the next step is for devices to use the interrupts that the vPIC provides. The devices currently supported by Broiler include PCI devices and IO port devices. For IO port devices, Broiler provides the irq_alloc_from_irqchip() function to allocate an interrupt from the vPIC; for PCI devices that use INTX interrupts, Broiler provides the pci_assign_irq() function. An allocated vPIC interrupt can be level triggered or edge triggered: a device emulated by Broiler uses the broiler_irq_line() function for level triggering and the broiler_irq_trigger() function for edge triggering. The following example shows an IO port device in Broiler that uses level triggering to send an interrupt to the front-end driver in the Guest OS:

From inside the virtual machine, the master PIC is mapped to ports 0x20-0x21 of the system IO space and the slave PIC to ports 0xa0-0xa1. Next, a device is virtualized in Broiler that contains an asynchronous IO register; writing to this register triggers an interrupt (source location: foodstuff/Broiler-interrup-vPIC.c).

Broiler emulates a device with a range of IO ports, [0x6020, 0x6030], that contains two registers: the first, IRQ_NUM_REG, is used to obtain the IRQ assigned to the device, and the second is an asynchronous IO register; writing to it triggers an interrupt. On line 70 Broiler calls irq_alloc_from_irqchip() to allocate an IRQ number from the vPIC, and then calls broiler_irq_line() to set the IRQ level to low. Lines 74-103 implement the asynchronous IO: when the Guest OS writes to the IO port, the irq_threads thread is woken up, and its job is to inject an IRQ into the Guest OS. Line 45 calls broiler_irq_line() to drive the line high, which generates an interrupt. At this point KVM receives the KVM_IRQ_LINE ioctl command and injects an interrupt into the Guest OS. With the emulated device in place on the Broiler side, a driver inside the Guest OS handles the interrupt. BiscuitOS already supports deploying this driver, and the deployment steps are as follows:

# Switch to the BiscuitOS directory
cd BiscuitOS
make linux-5.10-x86_64_defconfig
make menuconfig

  [*] Package --->
      [*] KVM --->
          [*] vInterrupt: Broiler vPIC interrupt

# Save the configuration and make it take effect
make

# Enter the Broiler directory
cd output/linux-5.10-x86_64/package/Broiler-vPIC-interrupt-default/
# Download the source code
make download
# Compile and run the source code
make
make install
make pack
# Pack the Broiler rootfs
cd output/linux-5.10-x86_64/package/BiscuitOS-Broiler-default/
make build

Broiler-vPIC-interrupt-default Gitee @link

The source code of the Guest OS driver is relatively simple. First, the driver registers the device's IO ports, [0x6020, 0x6030], in the IO space through request_resource(). It then calls ioport_map() to map the IO ports into virtual memory, reads the interrupt number used by the device from port 0x6024 on line 52, and calls request_irq() on line 53 to register the interrupt handler. The trigger mode of the interrupt is set to high-level trigger, and the handler is Broiler_irq_handler(), which simply prints the interrupt number once an interrupt has been received. Finally, the driver writes to port 0x6020, which makes the device send an interrupt to the CPU. Next, run this case in BiscuitOS:

After Broiler boots the BiscuitOS system, load the driver Broiler-vPIC-interrupt-default.ko. The driver loads successfully, and after about 5 s the interrupt handler receives the interrupt from the device; the IRQ shown is 6. Checking the IO space shows that ports 0x6020-0x6030 are allocated to "Broiler PIO vPIC". Finally, the /proc/interrupts node shows the interrupt mapping, that is, the interrupt used by Broiler-PIO-vPIC that appears in the picture at the beginning.

vPIC Interrupt Injection

The vPIC interrupt injection flow is shown in the above figure and is divided into two major parts. In the first part, Broiler passes the KVM_IRQ_LINE command to KVM through ioctl() to ask KVM to perform a vPIC interrupt injection. Broiler and KVM run asynchronously at this stage: the Guest OS is running and no VM-EXIT has occurred. When the vPIC in KVM decides, after interrupt evaluation, to inject an interrupt into the Guest, it raises a KVM_REQ_EVENT request to KVM and sends an IPI to force the virtual machine to VM-EXIT. The second part begins when the virtual machine performs VM-ENTRY again after the VM-EXIT. At VM-ENTRY, the VM_ENTRY_INTR_INFO_FIELD field of the VMCS is checked for an interrupt to inject; if one is present, the virtual machine takes the interrupt right after VM-ENTRY. The task of the second part is therefore to write the interrupt to be injected into VM_ENTRY_INTR_INFO_FIELD before VM-ENTRY. KVM does this by calling vmcs_write32() from vmx_inject_irq(), and the virtual machine receives the interrupt after VM-ENTRY. KVM runs synchronously at this stage. The rest of this section analyzes the process in detail.

The pic_set_irq1() function simulates the generation of an interrupt: the Interrupt Request Register (IRR) of the vPIC sets the corresponding bit when an interrupt arrives. The implementation shows that the IRR accepts both level-triggered and edge-triggered interrupts. Lines 93-100 handle level-triggered interrupts: if the level is true, that is, the interrupt pin is driven high, the IRR sets the bit for that pin and the same bit is set in last_irr; if the level is low, the interrupt is withdrawn and the corresponding IRR bit is cleared. Lines 102-110 handle edge-triggered interrupts. In lines 103-108, if the bit for the pin in last_irr is low, the pin has just produced a rising edge, so the IRR sets the bit for that pin; otherwise only last_irr records the level. Lines 108-109 handle a low level: a falling edge does not trigger an interrupt, so only last_irr records the level. At line 112 the function checks whether the Interrupt Mask Register has masked the interrupt, returning -1 if it is masked and 1 otherwise.
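The behaviour just described can be modeled in a few lines of user-space C. This is a simplified sketch written from the description above, not the KVM source: struct pic_model, pic_model_set_irq() and the elcr field (one bit per pin, set when the pin is level-triggered) are names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of one 8259A's request state. */
struct pic_model {
    uint8_t irr;      /* Interrupt Request Register */
    uint8_t imr;      /* Interrupt Mask Register */
    uint8_t last_irr; /* last level seen on each pin */
    uint8_t elcr;     /* bit set = pin is level-triggered (assumed name) */
};

/* Apply a new level to pin irq. Returns -1 if the pin is masked, else 1. */
int pic_model_set_irq(struct pic_model *s, int irq, int level)
{
    assert(irq >= 0 && irq < 8);
    uint8_t mask = (uint8_t)(1u << irq);

    if (s->elcr & mask) {
        /* Level-triggered: the request follows the pin level. */
        if (level) {
            s->irr |= mask;
            s->last_irr |= mask;
        } else {
            s->irr &= (uint8_t)~mask;
            s->last_irr &= (uint8_t)~mask;
        }
    } else {
        /* Edge-triggered: only a low-to-high transition raises a request. */
        if (level) {
            if (!(s->last_irr & mask))
                s->irr |= mask;
            s->last_irr |= mask;
        } else {
            s->last_irr &= (uint8_t)~mask;
        }
    }
    return (s->imr & mask) ? -1 : 1;
}
```

With an edge-triggered pin, holding the line high after the request has been consumed does not raise a second request; the line must go low and high again first.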

The pic_get_irq() function performs interrupt evaluation, deciding whether an interrupt should be sent to the CPU for processing. Line 137 combines the IRR and IMR registers to find the pending interrupts that are not masked, and line 138 obtains the highest priority among them according to the priority policy. Line 146 then obtains the interrupt currently being serviced from the ISR register together with its priority, and lines 150-156 compare the two priorities to decide whether the PIC has an interrupt to send to the CPU.
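A minimal sketch of this evaluation, assuming the fixed priority order described earlier (IR0 highest, IR7 lowest) and ignoring priority rotation and special nested modes. The structure and function names are illustrative, not KVM's.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative register set of one PIC. */
struct pic_eval {
    uint8_t irr; /* pending requests */
    uint8_t imr; /* masked pins */
    uint8_t isr; /* interrupts currently in service */
};

/* Lowest set pin number (0 = highest priority), or 8 when no bit is set. */
static int eval_highest_priority(uint8_t bits)
{
    for (int pin = 0; pin < 8; pin++)
        if (bits & (1u << pin))
            return pin;
    return 8;
}

/* Returns the pin to deliver to the CPU, or -1 if nothing may be delivered. */
int pic_model_get_irq(const struct pic_eval *s)
{
    assert(s);
    /* Requests that are pending and not masked. */
    int pending = eval_highest_priority((uint8_t)(s->irr & ~s->imr));
    if (pending == 8)
        return -1;

    /* A request wins only if it outranks whatever is being serviced. */
    int in_service = eval_highest_priority(s->isr);
    return (pending < in_service) ? pending : -1;
}
```

For example, with IR3 and IR6 both pending and nothing in service, the model selects IR3; if IR4 is already in service, a pending IR6 has to wait.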

The pic_update_irq() function determines whether the master and slave vPICs have a pending interrupt and, if so, marks it so that the interrupt can be injected when the virtual machine performs VM-ENTRY again. At line 167 the function checks whether the slave vPIC has an interrupt that needs to be processed by the CPU. If it does, the function enters the branch at line 168, where it asserts pin 2 (IRQ2) of the master vPIC to simulate the interrupt coming from the slave vPIC, so that the master vPIC receives it. The function then calls pic_get_irq() again on line 175 to determine whether the master vPIC has an interrupt that needs to be processed by the CPU, and if so, calls pic_irq_request() to tell KVM about it.
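The cascade logic can be sketched as follows. This is a simplified user-space model with illustrative names (struct chip, struct dual_pic, dual_pic_update()); it assumes fixed priorities and treats the master's pin 2 as a plain request bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct chip {
    uint8_t irr, imr, isr;
};

struct dual_pic {
    struct chip master, slave;
    bool output; /* true when the master has an interrupt for the CPU */
};

/* Highest-priority deliverable pin of one chip, or -1 if none. */
static int chip_pending(const struct chip *c)
{
    uint8_t ready = (uint8_t)(c->irr & ~c->imr);

    for (int pin = 0; pin < 8; pin++) {
        if (c->isr & (1u << pin))
            return -1; /* equal or higher priority already in service */
        if (ready & (1u << pin))
            return pin;
    }
    return -1;
}

/* Propagate a slave request to master pin 2, then evaluate the master. */
void dual_pic_update(struct dual_pic *p)
{
    assert(p);
    if (chip_pending(&p->slave) >= 0)
        p->master.irr |= 1u << 2; /* slave INT output feeds master IR2 */
    p->output = chip_pending(&p->master) >= 0;
}
```

A request on the slave (for example its pin 1, which corresponds to IRQ9) therefore only reaches the CPU if pin 2 of the master is not masked.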

The pic_unlock() function raises a KVM_REQ_EVENT request to KVM and makes the virtual machine VM-EXIT. It calls kvm_make_request() on line 62 to raise the KVM_REQ_EVENT request. If KVM detects a KVM_REQ_EVENT request before VM-ENTRY, it checks whether there is an interrupt that needs to be injected through the VM_ENTRY_INTR_INFO_FIELD field. An interrupt can only be injected at VM-ENTRY, so for a virtual machine that has not exited, KVM forces a VM-EXIT by sending an IPI to the VCPU. This completes the first, asynchronous part of vPIC interrupt injection; the next step is to write the interrupt to be injected into VM_ENTRY_INTR_INFO_FIELD before the virtual machine performs VM-ENTRY again.

Before the virtual machine performs VM-ENTRY again, KVM calls vcpu_enter_guest() to run its checks. The function finds the KVM_REQ_EVENT request at line 8865 and calls inject_pending_event() at line 8873 to inject the interrupt.
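The request handshake between the two parts can be pictured as a bit in a per-VCPU request word: one side sets it, the other tests and clears it before entering the guest. The sketch below is a single-threaded model with illustrative names; the bit number REQ_EVENT_BIT is an assumption for the example, and real KVM uses atomic bit operations because the two sides run on different CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REQ_EVENT_BIT 0u /* hypothetical bit number, for illustration only */

struct vcpu_model {
    uint64_t requests; /* one bit per pending request */
};

/* Producer side: mark a request as pending. */
void model_make_request(struct vcpu_model *v, unsigned int bit)
{
    assert(bit < 64);
    v->requests |= UINT64_C(1) << bit;
}

/* Consumer side: report whether the request was pending, and clear it. */
bool model_check_request(struct vcpu_model *v, unsigned int bit)
{
    assert(bit < 64);
    uint64_t mask = UINT64_C(1) << bit;

    if (!(v->requests & mask))
        return false;
    v->requests &= ~mask;
    return true;
}

/* Before guest entry: inject pending_vector only if an event was requested.
 * Returns the injected vector, or -1 when nothing is injected. */
int model_enter_guest(struct vcpu_model *v, int pending_vector)
{
    if (model_check_request(v, REQ_EVENT_BIT) && pending_vector >= 0)
        return pending_vector;
    return -1;
}
```

Because the check also clears the bit, a single request leads to at most one injection attempt, which mirrors the one-shot nature of the flow described above.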

Inside inject_pending_event(), the kvm_cpu_has_injectable_intr() call at line 8310 tells the function that a vPIC interrupt needs to be injected, so it enters the branch at line 8311 and, for the vPIC interrupt, the branch at line 8315. There it first calls kvm_queue_interrupt() to record the vector of the interrupt to be injected, and then calls vmx_inject_irq() to write the interrupt into the VM_ENTRY_INTR_INFO_FIELD field of the VMCS.

The vmx_inject_irq() function performs the final injection step. The implementation shows that the interrupt number to be injected is obtained at line 4500, and that vmcs_write32() is called at line 4519 to store that irq in the VM_ENTRY_INTR_INFO_FIELD field. This completes the vPIC interrupt injection: at VM-ENTRY the field is found to be set, and the interrupt is delivered to the virtual machine once it resumes.

VM_ENTRY_INTR_INFO_FIELD is a 32-bit field. The Vector field holds the vector number of the injected interrupt. The Deliver Error Code field indicates whether an error code needs to be written to the Guest stack. The Valid field indicates that an interrupt needs to be injected into the Guest OS; it is cleared automatically on VM-EXIT. The Interrupt Type field indicates the type of the injected event. The supported types are as follows:

  • 0: External Interrupt
  • 1: Reserved
  • 2: NMI
  • 3: Hardware exception (e.g. #PF)
  • 4: Software interrupt (INT n)
  • 5: Privileged software exception (INT 1)
  • 6: Software exception (INT 3 or INTO)
  • 7: Other event
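As a sketch of how such a value is put together, the helper below packs the fields according to the layout documented for the VM-entry interruption-information field in the Intel SDM: vector in bits 7:0, type in bits 10:8, deliver-error-code in bit 11 and valid in bit 31. The macro, enum and function names are chosen for illustration and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define INFO_VECTOR_MASK   0x000000ffu /* bits 7:0 */
#define INFO_TYPE_SHIFT    8           /* bits 10:8 */
#define INFO_DELIVER_CODE  (1u << 11)  /* push an error code */
#define INFO_VALID         (1u << 31)  /* an event is to be injected */

enum info_type {
    INFO_TYPE_EXT_INTR    = 0,
    INFO_TYPE_NMI         = 2,
    INFO_TYPE_HARD_EXC    = 3,
    INFO_TYPE_SOFT_INTR   = 4,
    INFO_TYPE_PRIV_SW_EXC = 5,
    INFO_TYPE_SOFT_EXC    = 6,
    INFO_TYPE_OTHER       = 7
};

/* Build the 32-bit value that would be written to the field. */
uint32_t make_intr_info(uint8_t vector, enum info_type type,
                        int deliver_error_code)
{
    assert((unsigned int)type <= 7u);

    uint32_t info = INFO_VALID;
    info |= (uint32_t)vector & INFO_VECTOR_MASK;
    info |= (uint32_t)type << INFO_TYPE_SHIFT;
    if (deliver_error_code)
        info |= INFO_DELIVER_CODE;
    return info;
}
```

For an external interrupt with vector 0x26 (a vector chosen only for this example), make_intr_info(0x26, INFO_TYPE_EXT_INTR, 0) yields 0x80000026.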

Original author: Journey to the Linux Kernel

 


Origin blog.csdn.net/youzhangjing_/article/details/132120102