Analysis of the heterogeneous communication process between the A core and the M core

More and more products now use a heterogeneous architecture that combines an M core with an A core: the M core covers the real-time requirements, while the A core provides the software ecosystem and computing power. Examples include NXP's i.MX8 series, Renesas' RZ/G2L series and TI's AM62x series. Although these processors differ in brand and performance, the principles of multi-core communication are essentially the same: messages are passed through registers and interrupts, and data is transferred through shared memory.

Description of the overall architecture of the communication process

 

1. Implementation principle of hardware layer communication

At the hardware layer, a region of physical DDR memory is divided into two parts: a TX Vring buffer (transmit virtual ring buffer) and an RX Vring buffer (receive virtual ring buffer). The M core sends data through the TX Vring area and reads received data from the RX Vring area; for the A core it is the other way around.

The processor provides a Messaging Unit (MU) module. The two cores communicate and coordinate by passing messages through the MU, commands are transferred between the M core and the A core via register-based interrupts, and up to 4 groups of MUs can carry messages bidirectionally. An interrupt notifies the other side of the status of a data transfer, can carry up to 4 bytes of data, and can wake the other side from low-power mode, which makes the MU an important means of keeping dual-core communication real-time.

Let's take a look at the specific process of passing one message from CoreA to CoreB:

Register input and output communication model

(1) CoreA writes the data;
(2) The MU clears the TX empty bit to 0 and sets the RX full bit to 1;
(3) A receive interrupt request is generated, notifying CoreB through the receive status register that the receive register is full and the data can be read;
(4) CoreB responds to the interrupt and reads the data;
(5) After CoreB has read the data, the MU clears the RX full bit to 0 and sets the TX empty bit back to 1;
(6) The status register generates a transmit interrupt request to CoreA, telling it that CoreB has read the data and the transmit register is empty again.
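
The handshake above can be sketched in a few lines of C. The register layout below is purely hypothetical -- real MU register names, offsets and bit positions differ per SoC (see the i.MX8 / RZ/G2L / AM62x reference manuals) -- it only illustrates the empty/full handshake, and it polls instead of using interrupts:

```c
/* Hypothetical MU register map, for illustration only. */
#include <stdint.h>

#define MU_BASE    0x30AA0000u                                  /* assumed base address   */
#define MU_TR0     (*(volatile uint32_t *)(MU_BASE + 0x00))     /* transmit register      */
#define MU_RR0     (*(volatile uint32_t *)(MU_BASE + 0x10))     /* receive register       */
#define MU_SR      (*(volatile uint32_t *)(MU_BASE + 0x20))     /* status register        */
#define MU_SR_TE0  (1u << 23)                                   /* TX0 empty bit (assumed)*/
#define MU_SR_RF0  (1u << 27)                                   /* RX0 full bit  (assumed)*/

/* CoreA side, steps (1)-(2): writing TR0 clears "TX empty" locally and
 * raises "RX full" plus a receive interrupt on the other core.          */
static void mu_send(uint32_t msg)
{
    while (!(MU_SR & MU_SR_TE0))        /* wait until the TX slot is empty */
        ;
    MU_TR0 = msg;
}

/* CoreB side, steps (3)-(6), polled here for brevity: reading RR0 clears
 * "RX full" and re-raises "TX empty" (and a transmit interrupt) at CoreA. */
static uint32_t mu_recv(void)
{
    while (!(MU_SR & MU_SR_RF0))        /* wait for "receive full"         */
        ;
    return MU_RR0;
}
```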

2. Realization of RPMsg communication based on Virtio at the driver layer

Virtio is a general I/O virtualization framework: an abstraction layer above the devices that handles the notification mechanism and control flow between front end and back end, and it provides the transport layer for data communication between heterogeneous cores. In the virtualization case, the hypervisor emulates a series of virtio devices, such as virtio-net and virtio-blk, and makes them available through API calls inside the virtual machine. Virtio consists of four parts: the front-end drivers, the back-end drivers, the vrings, and the unified communication interface between them. Compared with other emulated I/O methods, virtio reduces VM exits and data copying, which can greatly improve I/O performance. Computers have different bus standards; virtio typically uses the PCI bus (although it can also be implemented on other buses), so every virtio device appears as a PCI device.

  • Virtio front-end driver

The virtio front-end drivers live in the Linux kernel and run inside the virtual machine (VM). Different device types have different drivers, such as virtio-net, virtio-blk and virtio-pci, and all of them interact with the back-end drivers through a unified interface.

  • virtio layer

The virtio layer implements the virtual queue (virtqueue) interface, which acts as the bridge between front end and back end. Different device types use different numbers of virtqueues: virtio-net, for example, uses two virtqueues, one for receiving and one for sending, while the virtio-blk driver uses only one. A virtqueue is really a connection point spanning the guest operating system and the hypervisor; it can be implemented in any way, provided that the guest OS and the virtio back end follow the same standard and implement it in a matching manner.
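
From the front-end driver's point of view, "sending" a buffer is just adding it to a virtqueue and kicking the back end. The following is a minimal sketch using the in-kernel virtio API (virtqueue_add_outbuf / virtqueue_kick); error handling is trimmed and the virtqueue `vq` is assumed to have been set up in the driver's probe path:

```c
#include <linux/virtio.h>
#include <linux/scatterlist.h>

static int send_on_virtqueue(struct virtqueue *vq, void *buf, unsigned int len)
{
    struct scatterlist sg;
    int err;

    sg_init_one(&sg, buf, len);                 /* describe the buffer        */

    /* Queue the buffer as device-readable ("out") data; "buf" also acts as
     * the token handed back by virtqueue_get_buf() once it has been used.  */
    err = virtqueue_add_outbuf(vq, &sg, 1, buf, GFP_KERNEL);
    if (err)
        return err;

    virtqueue_kick(vq);                         /* notify the back end        */
    return 0;
}
```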

  • virtio-ring layer

Virtio-ring (vring) is the concrete implementation of a virtqueue. It implements a ring buffer that records the state of the front-end driver's requests and of the back-end handler's processing. It can hold multiple I/O requests from the front-end driver at once and hand them to the back-end driver for batch processing, which ultimately calls the device driver on the host to perform the physical I/O. In this way requests can be batched by agreement instead of trapping to the hypervisor for every single I/O request from the guest, which improves the efficiency of the exchange between the guest and the hypervisor.
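
Concretely, a vring is made up of three areas laid out in shared memory: a descriptor table, an available ring written by the driver, and a used ring written by the device. The structures below are a simplified paraphrase of the standard layout in the virtio spec (the kernel's include/uapi/linux/virtio_ring.h uses __virtio16/__virtio32/__virtio64 types and extra flags not shown here):

```c
#include <stdint.h>

struct vring_desc {            /* descriptor table: one entry per buffer     */
    uint64_t addr;             /* guest-physical address of the buffer       */
    uint32_t len;              /* buffer length in bytes                     */
    uint16_t flags;            /* e.g. NEXT (chained), WRITE (device writes) */
    uint16_t next;             /* index of the next descriptor in a chain    */
};

struct vring_avail {           /* written by the front end (driver)          */
    uint16_t flags;
    uint16_t idx;              /* where the next available entry goes        */
    uint16_t ring[];           /* descriptor indexes offered to the device   */
};

struct vring_used_elem {
    uint32_t id;               /* head of the descriptor chain consumed      */
    uint32_t len;              /* bytes actually written by the device       */
};

struct vring_used {            /* written by the back end (device)           */
    uint16_t flags;
    uint16_t idx;
    struct vring_used_elem ring[];
};
```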

  • Virtio backend driver

The virtio back-end driver is located in QEMU, and the back-end device has two main functions:

  1. Emulation of virtio backend devices;
  2. Processing the requests sent from the virtual machine according to the virtio protocol.

In QEMU's implementation, a virtio device is a PCI device that QEMU emulates for the virtual machine; it follows the PCI specification defined by PCI-SIG and has a configuration space, interrupt configuration and so on. The virtio back-end driver runs in the host and performs the back-end operations on the hardware device, for example handing a packet to the kernel protocol stack to complete the virtual machine's operation on network data.

The RPMsg message framework is the framework the Linux kernel implements on top of Virtio buffer queues for message communication between the main processing core and the co-processing core. When a client driver needs to send a message, RPMsg encapsulates it in a Virtio buffer and adds it to the buffer queue to complete the send; when the message bus receives a message sent by the coprocessor, it dispatches it to the appropriate client driver for processing.
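
On the Linux side this is exposed through the rpmsg bus: a client driver is matched by channel name, gets a default endpoint in probe(), sends with rpmsg_send() and receives through its callback. The sketch below shows the shape of such a driver; the channel name "rpmsg-demo" is an assumption for illustration, since real channel names are announced by the remote firmware:

```c
#include <linux/module.h>
#include <linux/rpmsg.h>

/* Called by the rpmsg bus when a message arrives from the remote core. */
static int demo_cb(struct rpmsg_device *rpdev, void *data, int len,
                   void *priv, u32 src)
{
    dev_info(&rpdev->dev, "got %d bytes from 0x%x\n", len, src);
    return 0;
}

static int demo_probe(struct rpmsg_device *rpdev)
{
    static const char hello[] = "hello M core";

    /* Send through the default endpoint created for this channel. */
    return rpmsg_send(rpdev->ept, (void *)hello, sizeof(hello));
}

static void demo_remove(struct rpmsg_device *rpdev) { }

static const struct rpmsg_device_id demo_id_table[] = {
    { .name = "rpmsg-demo" },            /* assumed channel name */
    { },
};
MODULE_DEVICE_TABLE(rpmsg, demo_id_table);

static struct rpmsg_driver demo_driver = {
    .drv.name  = "rpmsg_demo",
    .id_table  = demo_id_table,
    .probe     = demo_probe,
    .callback  = demo_cb,
    .remove    = demo_remove,
};
module_rpmsg_driver(demo_driver);
MODULE_LICENSE("GPL");
```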

At the driver layer, on the A core Linux adopts the RPMsg framework + Virtio driver model and wraps RPMsg as a tty device file for the application layer to use. On the M core, Virtio is ported and a simplified version of RPMsg is used, since the full framework involves mutexes and semaphores; FreeRTOS is used to complete the encapsulation of the flow. The flow chart is shown below.

Flow chart of data transfer between main processing core and co-processing core

(1) Core0 sends data to Core1, packing it into the Virtio avail list area through the rpmsg_send function;
(2) A free buffer in shared memory is located through the avail list, and the data is placed into that shared memory;
(3) Core1 is notified of the arrival of the data through an interrupt, and the shared-memory buffer moves from the avail list area to the used area;
(4) Core1 receives the interrupt, triggers the rpmsg receive callback, obtains from the used area the physical address of the shared memory holding the data, and completes the reception;
(5) Core0 is notified through an interrupt that the data has been received, and the shared-memory buffer moves from the used area back to the avail area, ready for the next transfer.
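
On the M core, the simplified RPMsg mentioned above corresponds to NXP's rpmsg_lite, whose main functions are named in the application-layer section below. The following is only a sketch of how the receive/send cycle above looks from the M-core side: the function signatures and constants (RL_NO_FLAGS, RL_BLOCK, the link id, the shared-memory base address and endpoint address) are taken from memory of the rpmsg_lite SDK and may differ between SDK versions and boards.

```c
#include "rpmsg_lite.h"
#include "rpmsg_queue.h"

#define SHMEM_BASE ((void *)0xB8000000u)    /* assumed vring shared-memory base */
#define LOCAL_EPT  30u                      /* assumed local endpoint address   */

void rpmsg_demo_task(void)
{
    struct rpmsg_lite_instance *rl;
    struct rpmsg_lite_endpoint *ept;
    rpmsg_queue_handle q;
    char buf[64];
    uint32_t src, len;

    /* Map the vrings in shared DDR; link id 0 is assumed here. */
    rl  = rpmsg_lite_remote_init(SHMEM_BASE, 0, RL_NO_FLAGS);
    q   = rpmsg_queue_create(rl);
    ept = rpmsg_lite_create_ept(rl, LOCAL_EPT, rpmsg_queue_rx_cb, q);

    for (;;) {
        /* Block until the A core fills a buffer and raises an interrupt. */
        if (rpmsg_queue_recv(rl, q, &src, buf, sizeof(buf), &len,
                             RL_BLOCK) == RL_SUCCESS) {
            /* Echo back: take a free buffer from the avail ring, copy the
             * payload and kick the A core. */
            rpmsg_lite_send(rl, ept, src, buf, len, RL_BLOCK);
        }
    }
}
```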


3. Application layer dual-core communication implementation

At the application layer, the A core can simply use the open, write and read functions on the device file under /dev; on the M core, the rpmsg_lite_remote_init, rpmsg_lite_send and rpmsg_queue_recv functions are used, which will not be elaborated here. From the perspective of the overall structure, the relationship is shown in the figure below.
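
As a concrete example of the A-core side, the sketch below opens the RPMsg tty device and exchanges a message with the M core using plain POSIX calls. The device node name /dev/ttyRPMSG0 is an assumption; the actual name depends on the SoC vendor's rpmsg-tty driver and device tree.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char rx[64];
    const char tx[] = "hello M core";

    int fd = open("/dev/ttyRPMSG0", O_RDWR);      /* assumed device node     */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    write(fd, tx, strlen(tx));                    /* ends up in rpmsg_send() */

    ssize_t n = read(fd, rx, sizeof(rx) - 1);     /* reply from the M core   */
    if (n > 0) {
        rx[n] = '\0';
        printf("M core replied: %s\n", rx);
    }

    close(fd);
    return 0;
}
```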

 

Origin blog.csdn.net/youzhangjing_/article/details/131475972