Chuanglong walks you through inter-core communication on TI and Xilinx heterogeneous multi-core SoC processors

First, what is a heterogeneous multi-core SoC processor?

As the name suggests, an SoC that integrates multiple processor cores of different architectures on a single chip is called a heterogeneous multi-core SoC processor. For example:

  1. TI's OMAP-L138 (DSP C674x + ARM9) and AM5708 (DSP C66x + ARM Cortex-A15) SoC processors;
  2. Xilinx's ZYNQ (ARM Cortex-A9 + Artix-7 / Kintex-7 programmable logic) SoC processors.

Second, what are the advantages of heterogeneous multi-core SoC processors?

Compared with single-core processors, a heterogeneous multi-core SoC processor combines the strengths of its constituent architectures, bringing advantages in performance, cost, power consumption and size: each architecture does what it does best and plays to its own unique strengths. For example:

  1. ARM is inexpensive and low-power, and is good at control tasks and multimedia display;
  2. DSP is a natural fit for digital signal processing and excels at dedicated arithmetic operations;
  3. FPGA is good at high-speed, multi-channel data acquisition and signal transmission.

Meanwhile, the cores of a heterogeneous multi-core SoC processor can rapidly transfer and share data through a variety of inter-core communication mechanisms, achieving a 1 + 1 > 2 effect.

Third, common inter-core communication methods

To get the most out of a heterogeneous multi-core SoC processor, the key lies, besides the hardware packaging provided by the semiconductor manufacturer, in the design of the inter-core communication mechanisms. Several inter-core communication methods commonly used on TI and Xilinx heterogeneous multi-core SoC processors are described below.

  1. OpenCL

OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. It provides a unified programming environment that lets software developers write portable yet efficient code, and it applies broadly to multi-core processors (CPUs), graphics processing units (GPUs), Cell-type architectures, digital signal processors (DSPs) and other parallel processors. It has broad prospects in fields such as energy, electric power, rail transit, industrial automation, medical, communications and defense.

On a heterogeneous multi-core SoC processor, OpenCL treats one programmable core as the host and the other cores as devices. The application running on the host (the host program) manages the code (kernels) executed on the devices and is responsible for making data available to them. A device consists of one or more compute units. For example, on the TI AM5728 heterogeneous multi-core SoC processor, each C66x DSP core is a compute unit.

An OpenCL runtime generally comprises the following two components:

  1. A host program that uses the OpenCL API to create and submit kernels for execution.
  2. Kernels written in the cross-platform OpenCL kernel language.
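
To make the host/device split concrete, here is a minimal host-program sketch in C (vector addition). It assumes an OpenCL 1.1/1.2-capable runtime such as TI's DSP OpenCL implementation; the kernel source, buffer sizes and the use of CL_DEVICE_TYPE_ACCELERATOR to select the DSPs are illustrative assumptions, and error checking is trimmed for brevity.

```c
/* Minimal OpenCL host sketch: offload a vector-add kernel to the DSP
 * compute units of an AM57xx-class device. Illustrative only. */
#include <CL/cl.h>
#include <stdio.h>

static const char *kSrc =
    "__kernel void vadd(__global const int *a, __global const int *b,"
    "                   __global int *c) {"
    "    int i = get_global_id(0);"
    "    c[i] = a[i] + b[i];"
    "}";

int main(void)
{
    enum { N = 1024 };
    int a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    /* On TI's runtime the DSPs enumerate as ACCELERATOR devices;
       other platforms may need CL_DEVICE_TYPE_DEFAULT. */
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_ACCELERATOR, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    /* Make the input data available to the device, allocate the output. */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    /* Submit the kernel and read the result back to the host. */
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

    printf("c[42] = %d\n", c[42]);   /* expect 126 */

    clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
    clReleaseKernel(k); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
    return 0;
}
```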

2. DCE

DCE (Distributed Codec Engine) is a distributed codec engine: a complete GStreamer-based video processing framework that TI provides for AM57x heterogeneous multi-core SoC processors.

DCE involves three hardware modules, namely the MPU core, the IPU2 core and the IVA-HD hardware accelerator, whose main roles are as follows:

MPU: runs the ARM-side GStreamer user-space application, which controls the libdce module. libdce communicates with the IPU2 over the RPMsg-based ARM IPC framework.

IPU2: hosts the DCE server, which communicates with the ARM over the RPMsg-based framework and uses the Codec Engine and Framework Components to control the IVA-HD accelerator.

IVA-HD: the hardware accelerator used for video/image encoding and decoding.
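
As a rough illustration of how the ARM-side GStreamer application drives the DCE path, the C sketch below builds a playback pipeline with gst_parse_launch(). The element names ducatih264dec and kmssink, and the file path, are assumptions (the decoder name comes from TI's gst-ducati plugin) and may differ across Processor SDK versions; treat this as a sketch rather than the exact TI demo code.

```c
/* Sketch: decode H.264 through the DCE path from a GStreamer C application.
 * The "ducatih264dec" and "kmssink" element names are assumptions; use the
 * decoder and sink elements shipped with your Processor SDK. */
#include <gst/gst.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    GError *err = NULL;
    GstElement *pipeline = gst_parse_launch(
        "filesrc location=/home/root/test.mp4 ! qtdemux ! h264parse "
        "! ducatih264dec ! kmssink",
        &err);
    if (!pipeline) {
        g_printerr("Pipeline creation failed: %s\n", err->message);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    /* Block until an error or end-of-stream message arrives on the bus. */
    GstBus *bus = gst_element_get_bus(pipeline);
    GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
                          GST_MESSAGE_ERROR | GST_MESSAGE_EOS);

    if (msg) gst_message_unref(msg);
    gst_object_unref(bus);
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
    return 0;
}
```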

3. IPC

IPC (Inter-Processor Communication) is a set of modules designed to facilitate communication between processors. The communication mechanisms include message passing, streams and linked lists. These modules provide services and functions for communication between the ARM and DSP cores of a heterogeneous multi-core SoC processor.

The advantages and disadvantages of the above inter-core communication methods on TI heterogeneous multi-core SoC processors are compared below:

 

OpenCL

Advantages:
  1. Easy migration between devices
  2. No need to understand the memory architecture
  3. No need to worry about MPAX and the MMU
  4. No need to worry about cache coherency
  5. No need to build, configure or use IPC between the ARM and DSP
  6. No need to become an expert in DSP code, architecture or optimization

Disadvantages:
  1. Cannot control the system memory layout to hand-optimize DSP code

DCE

Advantages:
  1. Accelerates the multimedia codec process
  2. Simplifies multimedia application development when used with GStreamer and the TI GStreamer plugins

Disadvantages:
  1. Not suitable for non-codec algorithms
  2. Adding a new codec algorithm takes effort
  3. Requires DSP programming knowledge

IPC

Advantages:
  1. Full control over DSP configuration
  2. DSP code can be optimized
  3. The same API is supported across multiple TI platforms

Disadvantages:
  1. Requires understanding of the memory architecture
  2. Requires understanding of DSP configuration and programming
  3. Messages are limited to small sizes (less than 512 bytes)
  4. TI proprietary API

 

4. AXI

AXI (Advanced eXtensible Interface) is a bus protocol proposed by ARM. Xilinx began supporting the AXI bus with its 6-series FPGAs and currently uses the AXI4 version.

ZYNQ provides three types of AXI bus:

(a) AXI4 (for high-performance memory-mapped requirements): mainly for communication that needs high-performance address mapping; it is a memory-mapped interface that allows burst transfers of up to 256 data beats.

(b) AXI4-Lite (for simple, low-throughput memory-mapped communication): a lightweight memory-mapped interface for single transfers that occupies very few logic resources.

(c) AXI4-Stream (for high-speed streaming data): used for high-speed streaming data transfers; the address channel is removed and unlimited burst sizes are allowed.

The AXI protocol defines the point-to-point interfaces on which the bus is built; AXI4, AXI4-Lite and AXI4-Stream all belong to the AXI4 protocol. The two ends of an AXI bus connection are divided into a master end and a slave end, which generally need to be connected through an AXI Interconnect, a block that provides a switching mechanism for connecting one or more AXI masters to one or more AXI slave devices.

The main role of the AXI Interconnect is to connect and manage multiple masters and slaves. Because AXI supports out-of-order transactions, which rely on the masters' ID signals, and different masters may issue the same ID, the AXI Interconnect solves this problem: it processes the ID signals from the different masters so that each ID becomes unique.

The AXI protocol separates the read address channel, read data channel, write address channel, write data channel and write response channel, and each channel has its own handshake protocol. The channels do not interfere with one another yet remain interdependent. This is one of the reasons AXI is so efficient.
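
To ground this on the ZYNQ side, the sketch below shows one common way the ARM (PS) talks to PL logic over AXI4-Lite: mapping the peripheral's registers into user space through /dev/mem. The base address 0x43C00000 and the register offsets are purely illustrative assumptions; use the address assigned to your IP in the Vivado address editor.

```c
/* Sketch: access an AXI4-Lite slave in the ZYNQ PL from Linux user space.
 * The base address and register offsets are assumptions for illustration. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define AXI_LITE_BASE 0x43C00000u   /* hypothetical PL peripheral base */
#define MAP_SIZE      0x1000u       /* one 4 KiB page */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* Map the peripheral's register window into this process. */
    volatile uint32_t *regs = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, AXI_LITE_BASE);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    regs[0] = 0x1;                          /* write control register at 0x00 */
    printf("status = 0x%08x\n", regs[1]);   /* read  status  register at 0x04 */

    munmap((void *)regs, MAP_SIZE);
    close(fd);
    return 0;
}
```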

Fourth, IPC inter-core communication development

Below, we take a Chuanglong AM57x (AM5728/AM5708) evaluation board as an example to explain IPC inter-core communication development.

  1. Development environment
  • RTOS Processor-SDK 04.03.00.05.
  • Linux-4.9.65 / Linux-RT-4.9.65 kernel.
  • IPC development kit version: 3.47.01.00.

IPC (Inter-Processor Communication) provides processor-independent APIs that can be used for inter-core communication in a multi-core environment, for communication with other threads on the same core (inter-process), and for communication with peripherals (device). IPC defines the following communication components:

  • Notify
  • MessageQ
  • ListMP
  • GateMP
  • HeapBufMP
  • HeapMemMP
  • FrameQ (commonly used for raw video data)
  • RingIO (commonly used for audio data)

The interfaces of these communication components have the following in common:

  • All IPC communication component interfaces follow a standardized naming convention.
  • On the HLOS side, every IPC module must be initialized with _setup() and destroyed with _destroy(); some modules also provide a _config() interface for configuration before initialization.
  • All instances must be created with _create() and deleted with _delete().
  • For deeper use of an IPC instance, _open() must be called to obtain a handle, and _close() must be called to release the handle when the IPC is no longer needed.
  • Most IPC configuration is done in SYS/BIOS; where XDC configuration is supported, the static configuration method can be used.
  • Every IPC module supports trace information for debugging, with different trace levels.
  • Some IPC modules provide dedicated APIs for extracting profiling information.

 

This section mainly demonstrates the use of the MessageQ communication component.

2. MessageQ mechanism

  1. MessageQ module features
  • Supports sending and receiving variable-length messages.
  • A MessageQ has a single reader and can have multiple writers.
  • Works for messaging on both homogeneous and heterogeneous multiprocessor systems, and can also be used for message passing between threads on a single processor.
  • Powerful and easy to use.

  2. MessageQ mechanism code walkthrough

MessageQ communication mainly involves a sender and a receiver. The commonly used API functions are described below:

  • MessageQ_Handle MessageQ_create(String name, MessageQ_Params *params): creates a message queue; the queue name is the basis for a later MessageQ_open.
  • Int MessageQ_open(String name, MessageQ_QueueId *queueId): opens a created message queue and obtains its queue ID (the ID must be unique, so the name used to create the message queue must be unique).
  • MessageQ_Msg MessageQ_alloc(UInt16 heapId, UInt32 size): allocates message space from a heap, so a heap must be opened first to obtain the heapId; the requested size must account for the MessageQ_Msg header structure.
  • MessageQ_registerHeap(HeapBufMP_Handle_upCast(heapHandle), HEAPID): registers a heap and assigns a heapId to it as its unique identifier.
  • Int MessageQ_put(MessageQ_QueueId queueId, MessageQ_Msg msg): sends a message to the message queue identified by queueId.
  • Int MessageQ_get(MessageQ_Handle handle, MessageQ_Msg *msg, UInt timeout): receives a message from the message queue.
  • MessageQ_free(MessageQ_Msg msg): frees the msg space. Take care to free messages that are no longer needed, otherwise memory problems will result.

The ex02_messageq routine is used below to illustrate how the MessageQ mechanism is used:

The operation flow chart of the routine is as follows:

The actual code is analyzed below, following the flow described above:

ARM:

a) Create the host message queue and open the slave message queue.

b) Send a message to the slave message queue, listen on the host message queue, and wait for the reply.

c) Send a shutdown message to the slave queue.

DSP:

a) Create the slave message queue.

b) Listen on the slave message queue and return a message to the host side.

c) On receiving the shutdown message, stop the task.
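
A simplified ARM-side (Linux host) sketch of these steps is shown below. The queue names, message layout and use of heap ID 0 are illustrative assumptions rather than the exact ex02_messageq code, and the shutdown message of step c) is omitted for brevity.

```c
/* Simplified ARM (host) side of the MessageQ flow described above.
 * Queue names, heap ID and payload are illustrative; see the
 * ex02_messageq example in the IPC SDK for the full version. */
#include <ti/ipc/Ipc.h>
#include <ti/ipc/MessageQ.h>
#include <stdio.h>
#include <string.h>

#define HOST_QUEUE   "HOST:MsgQ:01"    /* assumed queue names */
#define SLAVE_QUEUE  "SLAVE:MsgQ:01"

typedef struct {
    MessageQ_MsgHeader hdr;            /* required header, first member */
    char text[64];                     /* user payload (< 512 bytes total) */
} App_Msg;

int main(void)
{
    Ipc_start();                                       /* bring up IPC */

    MessageQ_Handle hostQ = MessageQ_create(HOST_QUEUE, NULL);

    MessageQ_QueueId slaveQId;
    while (MessageQ_open(SLAVE_QUEUE, &slaveQId) < 0)
        ;                                              /* wait for the DSP queue */

    /* Allocate a message (heap ID 0 assumed available) and send it. */
    App_Msg *msg = (App_Msg *)MessageQ_alloc(0, sizeof(App_Msg));
    MessageQ_setReplyQueue(hostQ, (MessageQ_Msg)msg);  /* tell the DSP where to reply */
    strcpy(msg->text, "hello DSP");
    MessageQ_put(slaveQId, (MessageQ_Msg)msg);

    /* Wait for the DSP's reply on the host queue, then free it. */
    MessageQ_Msg reply;
    MessageQ_get(hostQ, &reply, MessageQ_FOREVER);
    printf("reply: %s\n", ((App_Msg *)reply)->text);
    MessageQ_free(reply);

    MessageQ_close(&slaveQId);
    MessageQ_delete(&hostQ);
    Ipc_stop();
    return 0;
}
```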

3. Memory access and address mapping

  • Address Mapping

First, a memory management unit (MMU) sits between the DSP/IPU subsystems and the L3 interconnect; it translates all virtual addresses (i.e., the addresses seen by the DSP/IPU subsystem) into physical addresses (i.e., the addresses seen from the L3 interconnect).

DSP: MMU0 serves the DSP core, MMU1 serves the local EDMA.

IPU: IPUx_UNICACHE_MMU performs the first-level mapping, IPUx_MMU performs the second-level mapping.

The resource tables rsc_table_dspx.h and rsc_table_ipux.h configure the mapping relationships of the DSP/IPU subsystems; before the firmware starts, the mapping relationships are written into the MMU registers, completing the mapping process.

To view the mapping between physical addresses and virtual addresses:

DSP1 (under the default configuration, the mmu1 and mmu2 configurations are the same):

cat /sys/kernel/debug/omap_iommu/40d01000.mmu/pagetable

cat /sys/kernel/debug/omap_iommu/40d02000.mmu/pagetable

 

DSP2 (under the default configuration, the mmu1 and mmu2 configurations are the same):

cat /sys/kernel/debug/omap_iommu/41501000.mmu/pagetable

cat /sys/kernel/debug/omap_iommu/41502000.mmu/pagetable

 

IPU1:

cat /sys/kernel/debug/omap_iommu/58882000.mmu/pagetable

IPU2:

cat /sys/kernel/debug/omap_iommu/55082000.mmu/pagetable

Resource_physToVirt(UInt32 pa, UInt32 *da);

Resource_virtToPhys(UInt32 da, UInt32 *pa);
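
These helpers come from the remoteproc Resource module of the IPC package and run on the DSP/IPU firmware side, translating between the two address spaces according to the resource table. A hedged sketch of how they might be used, with illustrative values and trimmed error handling, follows:

```c
/* Sketch (DSP/IPU firmware side): translate between physical addresses seen
 * on the L3 interconnect and the device (virtual) addresses the core uses,
 * based on the mappings declared in rsc_table_dspx.h. Values are examples. */
#include <xdc/std.h>
#include <ti/ipc/remoteproc/Resource.h>

UInt32 example_translate(UInt32 physAddr)
{
    UInt32 devAddr  = 0;
    UInt32 physBack = 0;

    /* physical -> device (virtual) address, as mapped by the resource table */
    if (Resource_physToVirt(physAddr, &devAddr) != Resource_S_SUCCESS) {
        return 0;   /* address not covered by any resource-table entry */
    }

    /* device (virtual) -> physical, e.g. before handing a pointer back to ARM */
    Resource_virtToPhys(devAddr, &physBack);

    return devAddr;
}
```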

  • Memory Access
  1. CMA memory

The CMA memory region is used to store the code and data segments of the IPC program.

The dts file reserves several regions of space as slave-core memory (DDR space):

IPC-demo/shared/config.bld configures the starting address and size of each memory segment.

Taking DSP1 as an example to illustrate the CMA memory mapping relationship:

Looking at the system page table, the left-hand da (device address) column is the virtual address and the right-hand column is the corresponding physical address, so the virtual address 0x95000000 is mapped to the physical address 0x99100002.

cat /sys/kernel/debug/omap_iommu/40d01000.mmu/pagetable

2. Shared Memory

Shared memory is, in essence, a region of memory that "everyone" can access.

CMEM is a kernel-mode driver (on the ARM side) that allocates one or more blocks of contiguous memory. It exists to better manage the allocation and freeing of contiguous memory (one or more blocks) and to avoid memory fragmentation.

CMEM memory: reserved by Linux; this space is managed by the CMEM driver.

arch/arm/boot/dts/am57xx-evm-cmem.dtsi defines CMEM and reserves the shared memory space (DDR and OCMC space).

A maximum of 4 cmem{} blocks can be allocated; the number of cmem-buf-pools is not limited.

In actual use, the DSP and IPU access virtual addresses, so the mapping from virtual addresses to physical addresses must also be completed.

dsp1/rsc_table_dsp1.h defines the virtual-to-physical address mapping table, mapping the virtual address 0x85000000 to the physical address 0xA0000000. When the DSP side accesses the address 0x85000000, it is actually accessing the mapped physical address 0xA0000000.

cat /sys/kernel/debug/omap_iommu/40d01000.mmu/pagetable

Practical application:

a) Initialize CMEM.

b) Allocate memory space and convert the address to a physical address (when sending a msg, whether the physical address or the virtual address is transmitted is not fixed).

DSP-side processing: receive the physical address, convert it to a virtual address, operate on the data, and complete the operation. If an address needs to be passed from the DSP back to the ARM, the virtual address should first be converted to a physical address and then sent to the ARM side.
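
A hedged ARM-side sketch of steps a) and b), using the CMEM user-space API (CMEM_init/CMEM_alloc/CMEM_getPhys), is shown below; the buffer size, default allocation parameters and header path are assumptions that may vary with the Processor SDK version.

```c
/* Sketch (ARM side): allocate a physically contiguous buffer from CMEM,
 * obtain its physical address, and pass that address to the DSP in a
 * MessageQ message. Pool/block layout is defined by am57xx-evm-cmem.dtsi.
 * The header may be <cmem.h> on older SDKs. */
#include <ti/cmem.h>
#include <stdio.h>

int main(void)
{
    if (CMEM_init() < 0) {                         /* a) initialize CMEM */
        fprintf(stderr, "CMEM_init failed\n");
        return 1;
    }

    CMEM_AllocParams params = CMEM_DEFAULTPARAMS;
    void *buf = CMEM_alloc(0x100000, &params);     /* b) 1 MiB contiguous buffer */
    if (buf == NULL) { CMEM_exit(); return 1; }

    unsigned long phys = CMEM_getPhys(buf);        /* physical address for the DSP */
    printf("virt=%p phys=0x%lx\n", buf, phys);

    /* ... place 'phys' in the MessageQ payload; the DSP firmware converts it
     * back to its own virtual address (e.g. with Resource_physToVirt). */

    CMEM_free(buf, &params);
    CMEM_exit();
    return 0;
}
```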



Origin blog.csdn.net/Tronlong_/article/details/105089616