Using DMA to transfer data between HDL and Embedded C in FPGA

This project presents the basic structure of how to transfer data between HDL in the PL and embedded C running on the processor in the FPGA.

Introduction

Given the rise of hardware acceleration in FPGA designs for applications such as machine learning and artificial intelligence, now is a good time to peel back a few layers and learn the basics of passing data back and forth between HDL code running in the FPGA's programmable logic (PL) and the corresponding software running on a hard or soft processor in the FPGA.

Hardware acceleration boils down to implementing in hardware (the FPGA's programmable logic) certain functions that were previously implemented in software, running either on a host PC or on a soft or hard processor in the FPGA. To be an effective designer, you therefore need to be comfortable passing data back and forth between hardware and software.

In this case, a Zynq SoC (system on chip) FPGA is used, which has a hard-core ARM processor. The ARM core and peripherals are called the processing system or PS.

While there are a few different ways to transfer data between the PL and the PS, including writing your own custom interface, I think the most common mechanism is direct memory access (DMA). With DMA, the CPU in the ARM core simply initiates a data transfer between itself and the DDR and does not have to wait for the transfer to complete before moving on to other tasks. DMA also lets the CPU initiate transfers between external devices and the DDR.

In this project, I demonstrate the capabilities of DMA using the Xilinx AXI DMA IP, which converts between a memory-mapped interface and a stream interface on the AXIS bus. I write 32 bytes to memory in embedded C, transfer them to the PL over the memory-map-to-stream (MM2S) AXIS channel, pass each value through a register, and then transfer the data back to memory over the DMA IP's stream-to-memory-map (S2MM) channel.

While this example is far too simple for a heavy hardware acceleration application, this kind of high-speed data transfer can be complex and hard to learn when you are new to FPGAs, so this project focuses on how the DMA is used and how it behaves. I had intended to focus more on the data processing side, but I ran into so many small "gotchas" while implementing the DMA transactions that the data processing will have to wait for another project.

There are two main layers to using AXI DMA to control data transfer between HDL in the PL and C code in the PS:

  1. The AXI stream handshake signals in the PL's HDL code on the Memory Map to Stream (MM2S) and Stream to Memory Map (S2MM) channels. (The DMA's control interface uses regular AXI, but Vivado handles that automatically, so here we focus only on the AXI stream interface.)

  2. The sequence of DMA register reads and writes in the PS's C code.

AXI-Stream handshake in Verilog

The AXI stream interface uses a simple set of handshake signals to exchange data in embedded designs. Many of its signals are optional, but the ones that matter for DMA MM2S and S2MM transfers are tdata, tvalid, tready, tlast and tkeep. In AXI stream, the master interface sends data and the slave interface receives it.

  • tdata: The data bus

  • tvalid: Asserted by the master when the data on the tdata bus is valid

  • tready: Asserted by the slave when it is ready to accept data from the tdata bus

  • tlast: Asserted by the master during the last data word of the stream to tell the slave that no more data follows it

  • tkeep: Byte qualifier driven by the master, with one bit per byte of tdata indicating whether that byte is part of the stream
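As a rough illustration of the signals listed above, one transfer on the stream can be modelled in C as a small struct. This is only a sketch: the type name axis_beat_t, the helper axis_beat_bytes, and the 32-bit tdata width are assumptions made for the example and do not come from the project's source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one AXI stream transfer on a 32-bit tdata bus. */
typedef struct {
    uint32_t tdata;  /* payload word */
    uint8_t  tkeep;  /* one bit per byte lane of tdata; 1 = byte is valid */
    uint8_t  tlast;  /* 1 on the final transfer of the stream */
} axis_beat_t;

/* Count the valid bytes in one transfer (4 byte lanes for 32-bit tdata). */
static unsigned axis_beat_bytes(const axis_beat_t *b)
{
    unsigned n = 0;
    assert((b->tkeep & 0xF0u) == 0);  /* only 4 lanes exist in this model */
    for (unsigned lane = 0; lane < 4; ++lane)
        n += (b->tkeep >> lane) & 1u;
    return n;
}
```

With all four tkeep bits set, every byte of the 32-bit word counts as stream data; a final word with tkeep = 0x3 would carry only two valid bytes.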

Exactly how the AXI DMA IP uses this handshake to move data out of memory (MM2S) and into memory (S2MM) takes some care to get right, especially on the S2MM side.

The first thing to know about S2MM transactions on the AXI DMA can mostly be summed up in one sentence: the S2MM transaction must be set up and started by writing the appropriate values, in the appropriate order, to the DMA's control registers before any data is sent to the DMA, and the S2MM channel ends the transaction as soon as it sees the tlast signal.

In both S2MM and MM2S transactions, a data word is transferred on the tdata bus on every clock cycle in which tready and tvalid are both asserted. The master, which is responsible for tvalid, must therefore be careful not to hold tvalid with the same data for more than one clock cycle while the slave's tready is also asserted. Otherwise, the slave clocks in the same word twice, as two separate words. And because the number of bytes in the transfer must be specified in the DMA's registers, the DMA channel (S2MM in this case) will consider the transfer finished before it sees tlast, because the count is off.
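The handshake rule can be made concrete with a small host-side C model. It is a sketch under assumptions, not code from the project: axis_cycle_t and axis_capture are hypothetical names, and each array element stands for the wire values sampled on one rising clock edge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wire values seen on one clock edge (hypothetical model). */
typedef struct {
    uint32_t tdata;
    uint8_t  tvalid;  /* driven by the master */
    uint8_t  tready;  /* driven by the slave  */
} axis_cycle_t;

/* Copy every accepted word into out[] and return how many were accepted.
 * A word is accepted on each cycle where tvalid and tready are both 1. */
static size_t axis_capture(const axis_cycle_t *cyc, size_t ncycles,
                           uint32_t *out, size_t out_cap)
{
    size_t n = 0;
    assert(cyc && out);
    for (size_t i = 0; i < ncycles; ++i) {
        if (cyc[i].tvalid && cyc[i].tready && n < out_cap)
            out[n++] = cyc[i].tdata;
    }
    return n;
}
```

Feeding the model a master that keeps tvalid high with the same tdata for two cycles while tready is high yields two captured words rather than one, which is the miscount described above. A master that holds tvalid while tready is low, on the other hand, is doing the right thing: nothing is captured until tready rises.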

I wrote a simple state machine in Verilog that implements a slave AXI stream interface to receive data from the DMA's MM2S channel, passes each word of the stream through a register, and then implements a master AXI stream interface to send the data back to the S2MM channel. The register that the tdata values pass through is a placeholder for whatever custom data processing a hardware accelerator would do.
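To make the behaviour easier to picture, here is a behavioural C model of a pass-through of this kind. It is a simplified sketch, not a translation of the attached Verilog: the two-state structure, the type names and passthru_step are assumptions for illustration, and each call represents one clock cycle.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ST_RECV, ST_SEND } pt_state_t;

typedef struct {
    pt_state_t state;
    uint32_t   reg;       /* register the data passes through */
    uint8_t    reg_last;  /* tlast captured with the word */
} passthru_t;

/* Inputs sampled this cycle: slave side from MM2S, tready from S2MM. */
typedef struct { uint32_t s_tdata; uint8_t s_tvalid, s_tlast, m_tready; } pt_in_t;
/* Outputs driven this cycle. */
typedef struct { uint32_t m_tdata; uint8_t s_tready, m_tvalid, m_tlast; } pt_out_t;

static pt_out_t passthru_step(passthru_t *fsm, pt_in_t in)
{
    pt_out_t out = {0};
    assert(fsm);
    if (fsm->state == ST_RECV) {
        out.s_tready = 1;                 /* ready to accept one word */
        if (in.s_tvalid) {                /* handshake fires: latch it */
            fsm->reg = in.s_tdata;        /* custom processing would go here */
            fsm->reg_last = in.s_tlast;
            fsm->state = ST_SEND;
        }
    } else {
        out.m_tvalid = 1;                 /* hold the word until accepted */
        out.m_tdata  = fsm->reg;
        out.m_tlast  = fsm->reg_last;
        if (in.m_tready)                  /* handshake fires: word is sent */
            fsm->state = ST_RECV;
    }
    return out;
}
```

In this model each word is offered on the master side until the S2MM channel raises tready, and it is offered exactly once, so the word count seen by the DMA matches the count that came in.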

Screenshot from the ILA in Vivado showing the handshake timing produced by the state machine. The top is the MM2S side and the bottom is the S2MM side.

This is a flowchart of the Verilog state machine, and the actual file is linked at the end of this article. Note that the master and slave interfaces in the flowchart are named from the perspective of the Verilog state machine.

[Figure: flowchart of the Verilog state machine]

For the DMA IP itself, I left the scatter-gather option unchecked because the DMA is used in direct register mode. I left everything else at its default except for enabling unaligned transfers, which I found gave me a lot more leeway when writing the custom AXI stream interface to the DMA.

To add the Verilog state machine to the block design, I right-clicked on an empty area of the block design and selected the "Add Module..." option, which lists all valid Verilog modules that Vivado can find in the project's design sources for use in the block design.

It is worth noting that the signal naming convention follows the "s_axis" and "m_axis" standards for slave and master interfaces respectively.

DMA register read/write control sequence

Here is the basic sequence when using the DMA on bare metal:

  1. Reset the DMA by writing 1 to bit 2 of the MM2S control register (offset 0x00) and the S2MM control register (offset 0x30).

  2. Write the DDR address where the S2MM channel should write its data to the S2MM destination address register (offset 0x48).

  3. Start the S2MM channel by writing 1 to bit 0 of the S2MM control register (offset 0x30).

  4. Write the total number of bytes to be received into memory on the S2MM channel to the S2MM buffer length register (offset 0x58). This arms the S2MM transfer so that the DMA is ready to receive the data stream from the device in the FPGA logic. (No data actually moves until that device puts data on the AXI stream bus and asserts tvalid.)

  5. Write the DDR address of the data to be read by the MM2S channel to the MM2S source address register (offset 0x18).

  6. Start the MM2S channel by writing 1 to bit 0 of the MM2S control register (offset 0x00).

  7. Write the total number of bytes to be sent to the MM2S transfer length register (offset 0x28). This initiates the MM2S transfer from the DMA to the receiving device in the FPGA logic.
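The steps above could be sketched in C as follows. This is a minimal illustration rather than the project's actual code: reg_write and dma_loopback_start are hypothetical helpers, and the base pointer is supplied by the caller (on real hardware it would point at the AXI DMA base address assigned in the Vivado address editor). A real driver would typically also wait for the reset to finish and check the status registers, which is omitted here. Each register is written with a plain store, so all other control bits are left at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets and bits taken from the steps above. */
#define MM2S_DMACR   0x00u  /* MM2S control register */
#define MM2S_SA      0x18u  /* MM2S source address */
#define MM2S_LENGTH  0x28u  /* MM2S transfer length (bytes) */
#define S2MM_DMACR   0x30u  /* S2MM control register */
#define S2MM_DA      0x48u  /* S2MM destination address */
#define S2MM_LENGTH  0x58u  /* S2MM buffer length (bytes) */

#define DMACR_RUN    (1u << 0)
#define DMACR_RESET  (1u << 2)

/* Hypothetical helper: 32-bit write at a byte offset from the base. */
static void reg_write(volatile uint32_t *base, uint32_t off, uint32_t val)
{
    assert((off & 3u) == 0);  /* registers are word aligned */
    base[off / 4u] = val;
}

/* Arm S2MM first, then start MM2S, in the order described above. */
static void dma_loopback_start(volatile uint32_t *dma, uint32_t src_addr,
                               uint32_t dst_addr, uint32_t nbytes)
{
    reg_write(dma, MM2S_DMACR, DMACR_RESET);  /* step 1 */
    reg_write(dma, S2MM_DMACR, DMACR_RESET);

    reg_write(dma, S2MM_DA, dst_addr);        /* step 2 */
    reg_write(dma, S2MM_DMACR, DMACR_RUN);    /* step 3 */
    reg_write(dma, S2MM_LENGTH, nbytes);      /* step 4: S2MM is now armed */

    reg_write(dma, MM2S_SA, src_addr);        /* step 5 */
    reg_write(dma, MM2S_DMACR, DMACR_RUN);    /* step 6 */
    reg_write(dma, MM2S_LENGTH, nbytes);      /* step 7: MM2S transfer starts */
}
```

Because the function only takes a pointer, it can be exercised on a PC against an ordinary array standing in for the register space before it is tried on the board. For the 32-byte example in this project, nbytes would be 32.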

Remember that the S2MM channel must be set up and running before a device in the PL attempts to send data to it? That is why the steps above must be performed in order. Steps 2 - 4 configure and start the S2MM channel, and steps 5 - 7 configure and start the MM2S channel.

It's okay for other processing to happen between steps 4 and 5, but steps 2 - 4 must happen before steps 5 - 7. After step 4, the S2MM AXI stream channel asserts its tready signal and the HDL code can start sending data to it.

This also explains something I noticed in the example DMA project in SDK/Vitis when I first started using DMA. It always seemed to me that the sample code was trying to pull data into the DDR (by doing an S2MM transfer, XAXIDMA_DEVICE_TO_DMA, first) before it had written anything out of the DDR with the MM2S transfer, XAXIDMA_DMA_TO_DEVICE. However, the S2MM channel must be ready and waiting to receive data for the design to work properly and not lock up.

DMA can seem tricky when you are getting started in FPGA design, but it is very useful once you have figured it out.

Code

https://github.com/suisuisi/FPGATechnologyGroup/tree/main/AXIS-DMA

Origin blog.csdn.net/Pieces_thinking/article/details/134658056