PYNQ framework HLS development process memo

0. Design ideas

The purpose of this article is to document how a bit file generated with HLS is called from PYNQ, laying out the entire process and its details. It is a personal study note and will be updated and improved as the work progresses. I hope it is helpful for your work, and I would appreciate it if you point out any mistakes in the text. Thank you.

For now, the entire process is worked through with a matrix multiplication example, so that it can later be reused for your own IP.

The example used below is matrix multiplication. The HLS sources are available at https://download.csdn.net/download/u014798590/64515794,
which contains the matrix multiplication source file and its test file.

For any details not covered here, please refer to the article https://blog.csdn.net/qq_42334072/article/details/106769534

Tips:

01. Communication between PS and PL

According to "The Zynq Book", the PL (programmable logic) and the PS (processing system) communicate over AXI (Advanced eXtensible Interface).
For an introduction to AXI and how to use it, see Chapter 19 of "The Zynq Book" and Lab 4 in Chapter 4 of "UG871 HLS Tutorial".

The image example works as follows. Before the image is sent in, Python on the PS side converts it into an RGB bitmap. It then writes one frame of data through the AXI interface and reads the frame back after processing. In this example, the image is transferred over AXI-Stream (AXI_Stream).
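
The PS-side preparation described above can be sketched in Python as follows. This is a minimal, hypothetical sketch. It assumes that each pixel travels on the stream as one 32-bit word packed as 0x00RRGGBB. That is an assumption only: the real pixel format depends on how your HLS IP declares its AXI-Stream data. pack_rgb and unpack_rgb are illustrative names, not PYNQ APIs.

```python
import numpy as np

def pack_rgb(frame: np.ndarray) -> np.ndarray:
    """Pack an (H, W, 3) uint8 RGB frame into one uint32 word per pixel (0x00RRGGBB)."""
    if frame.ndim != 3 or frame.shape[2] != 3 or frame.dtype != np.uint8:
        raise ValueError("expected an (H, W, 3) uint8 RGB frame")
    rgb = frame.astype(np.uint32)
    words = (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]
    # Flatten in row-major order, i.e. the order in which pixels would be streamed
    return words.reshape(-1)

def unpack_rgb(words: np.ndarray, height: int, width: int) -> np.ndarray:
    """Inverse of pack_rgb: turn a flat uint32 buffer back into an (H, W, 3) uint8 frame."""
    w = np.asarray(words).reshape(height, width).astype(np.uint32)
    channels = [(w >> 16) & 0xFF, (w >> 8) & 0xFF, w & 0xFF]
    return np.stack(channels, axis=-1).astype(np.uint8)
```

On the board, the frame could come from np.array(Image.open(path).convert("RGB")), with Image imported from PIL. The packed array would then be copied into a contiguous buffer before the transfer.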

02. Approach for using your own IP

As the figure below shows, the IP designed in the tutorial (image_process) has two interfaces, one for input and one for output. In the block diagram, we connect this image IP to axi_dma. The image processing tutorial is very good learning material. In deep learning, an image IP like this is essentially a convolutional layer: it takes the original image as input and produces the convolved image. This will be studied later based on this example.
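
When such an IP implements a convolution, a software reference is useful for checking the hardware output. The NumPy sketch below computes a single-channel "valid" 2D convolution the way CNN layers usually define it: no kernel flip, which makes it a cross-correlation. The function name is hypothetical, and the sketch is not tied to the image_process IP.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 'valid' convolution as used in CNN layers (no kernel flip)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    if kh > ih or kw > iw:
        raise ValueError("kernel must not be larger than the image")
    out = np.zeros((ih - kh + 1, iw - kw + 1), dtype=np.result_type(image, kernel))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Multiply the kernel with the window under it and sum the products
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out
```

Comparing this output element by element with what the IP returns for the same input is a quick way to catch indexing or rounding mistakes in the HLS code.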

03. Add third-party boards in Vivado

Visit https://github.com/Avnet/bdf;
copy the folder for your board into the E:\Xilinx\Vivado\2018.3\data\boards\board_files folder (under your Vivado installation directory);
note that if the board has multiple versions, you only need to copy one of the folders;
in the project, switch to Parts/Boards, then search for and select the board.

1. HLS design

Please note that the chip model selected in HLS must be the same as the chip or board model used in the Vivado design. Otherwise, you may not be able to find your packaged IP in the BD (block design).

Overview:
Write the HLS source file, pass C simulation and synthesis, and then generate (export) the IP core.

1.1. Process Memo

1.1.5. Add header file

Before synthesis, first set the top function:
Project -> Project Settings -> Synthesis -> Top Function,
select the corresponding function,
then run C Synthesis.

1.2. Pragma design and optimization

For details, please refer to the HLS learning blog post "Xilinx HLS Study Notes 3 (for Loop Optimization)".
The following example optimizes the for loop by pipelining it.
When optimizing the program, you can configure this either with a pragma in the code or with an HLS directive. Both methods are shown separately below.
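
To see why pipelining pays off, here is a rough cycle-count model in Python. It is a simplification that ignores loop entry/exit overhead and memory-port conflicts, and the function name is hypothetical. The latency figures in the HLS synthesis report are the authoritative numbers. Without pipelining, each iteration waits for the previous one to finish. With a pipeline of initiation interval II, a new iteration starts every II cycles.

```python
from typing import Optional

def loop_latency_cycles(trip_count: int, depth: int, ii: Optional[int] = None) -> int:
    """Approximate cycles for a loop whose body takes `depth` cycles, run `trip_count` times.

    ii=None models an unpipelined loop (iterations run back to back);
    otherwise the loop is pipelined and starts a new iteration every `ii` cycles.
    """
    if trip_count <= 0:
        return 0
    if ii is None:
        return trip_count * depth
    # The first iteration takes the full depth; each later one adds only ii cycles
    return (trip_count - 1) * ii + depth
```

For example, a 4-iteration loop whose body takes 3 cycles needs about 12 cycles without pipelining, but about 6 cycles with II = 1.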

1.2.1. Directive method

First, give a name (label) to the for loop that needs to be unrolled or pipelined.
You can then see it in the Directive tab on the right.

Right-click the loop name and select Insert Directive. In the pop-up dialog box, choose the directive that fits your case.
Finally, the directive appears under the loop in the Directive tab, which indicates that the configuration has taken effect.

1.2.2. Pragma method

Alternatively, configure it directly in the source code with #pragma HLS PIPELINE. The final result is shown in the figure.

1.2.3. Interface configuration (to be completed)

In the same way, either of the two methods can be used to configure the interfaces. The function and definition of each interface will be covered in a later update.

1.3. Synthesize and package the IP

Finally, find the IP folder under this path (it will be used in the next step):
D:\ZYNQ_PACK\Zynq7020\HLS_Project\AXI_Test1\IP_Matrix\Matrix\solution1\impl
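
If you are unsure which subfolder to point Vivado at, a small Python helper can locate it. This sketch rests on one assumption: the packaged IP folder is the one that contains component.xml. In Vivado HLS exports this is typically the ip subfolder of impl. find_ip_repo is a hypothetical helper and is not part of any Xilinx tool.

```python
from pathlib import Path
from typing import Optional

def find_ip_repo(impl_dir: str) -> Optional[Path]:
    """Return the folder under impl_dir that contains component.xml, or None if absent."""
    matches = sorted(Path(impl_dir).rglob("component.xml"))
    # The folder holding component.xml is the one to select as the IP path in Vivado
    return matches[0].parent if matches else None
```

Usage: find_ip_repo(r"D:\ZYNQ_PACK\Zynq7020\HLS_Project\AXI_Test1\IP_Matrix\Matrix\solution1\impl") returns None if the IP has not been exported yet.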

2. Vivado design

Overview:
Import the IP core generated above and add the ZYNQ, AXI and other IPs. Configure them, connect them in the block diagram, and then export the bit, hwh and tcl files. Also record the address assigned to the IP.

2.1. Process Memo

1. Create the project, enter the file path and file name, and then click Next through the remaining pages.
2. Select the corresponding chip or board, such as the 7020 or Ultra96 (see section 03 above for third-party boards).
3. Create Block Design (BD)
4. Add the ZYNQ IP. Different boards use different processing system cores; this example uses the Ultra96, so the core is Zynq UltraScale+.
5. Run Block Automation.
6. Add your own IP: select the path of the IP core folder generated above, choosing the entire generated IP folder.

After the addition is complete, you can see the path of the IP.

Then, in the BD, search for the name of the IP we wrote ourselves and add it. The IP will then be displayed.
7. Configure the PS core.
8. During connection automation, configure HP0-HP2 to match the ports of the IP we wrote.
10. Finally, connect the interrupt signal line. Please note that connecting it directly here causes PYNQ 2.6 to report an interrupts error when loading the bit file. It is recommended to insert a Concat block into the connection, as shown in the second figure.

The final BD is shown in the figure (the function of each part will be explained later). You can click the Validate Design button (the check-mark icon) on the BD page to check whether there are any errors in the connections.
Please note that some guides indicate that Generate Output Products in step 11 is not necessary. Instead, you can directly generate the top-level file (Create HDL Wrapper) and then perform step 12 to generate the bitstream.

11. To export the result, right-click the block design under Design Sources. After Generate Output Products completes, select Create HDL Wrapper to generate the top-level file, and a dialog box will pop up.
After the top-level file is generated, generate the bitstream. The bit file is written to HLS_Project\AXI_Test1\project_1\project_1.runs\impl_1, and once it is generated, key information such as resource utilization is shown. Then switch back to the BD page and export Block Design -> tcl file.

2.1.1. File paths

When exporting the tcl file, you can choose the export directory yourself (the default is the root directory of the project).
The hwh file is under the following path:
Vivado 2018: AXI_Test1\project_1\project_1.srcs\sources_1\bd\Ultra96_Design\hw_handoff
Vivado 2020: AXI_Test1\project_1\project_1.gen\sources_1\bd\Ultra96_Design\hw_handoff
The bit file is in project folder\project_1.runs\impl_1, where the final wrapper.bit file is produced.
Once these three files are ready, this part is complete, and the bit, hwh, tcl and other files can be packaged and exported to the Jupyter project folder.
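
Copying and renaming the three files by hand is error-prone. Below is a small Python sketch that gathers them into one folder under a shared base name. It assumes you pass in the three source paths yourself, for example from the hw_handoff and impl_1 folders above. collect_overlay and the name "matrix" are hypothetical. The shared name matters because PYNQ's Overlay looks for the .hwh file next to the .bit file under the same base name.

```python
import shutil
from pathlib import Path

def collect_overlay(bit: str, hwh: str, tcl: str, dest_dir: str, name: str) -> dict:
    """Copy the .bit/.hwh/.tcl files into dest_dir, renamed to one shared base name."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = {}
    for src, ext in ((bit, ".bit"), (hwh, ".hwh"), (tcl, ".tcl")):
        target = dest / (name + ext)
        shutil.copyfile(src, target)  # overwrites an older copy with the same name
        copied[ext] = target
    return copied
```

The resulting folder can then be uploaded through Jupyter or FTP as described in section 3.1.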

2.2. View address

When the PS side calls the IP above for computation, it accesses the IP through its assigned address. Open the Address Editor in the BD to view it.
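
Instead of copying the base address by hand, you can also read it at run time. After the overlay is loaded, PYNQ's overlay.ip_dict maps each IP instance name to a dictionary that includes "phys_addr" and "addr_range". The helper below is a sketch that looks up one IP by a fragment of its name. The function and the instance name "Matrix_0" are hypothetical, so check the real key by printing overlay.ip_dict.keys().

```python
def find_ip_address(ip_dict: dict, name_fragment: str) -> tuple:
    """Return (phys_addr, addr_range) of the single IP whose name contains name_fragment."""
    hits = [key for key in ip_dict if name_fragment in key]
    if len(hits) != 1:
        raise KeyError(f"expected exactly one IP matching {name_fragment!r}, found {hits}")
    entry = ip_dict[hits[0]]
    return entry["phys_addr"], entry["addr_range"]

# On the board (sketch):
#   overlay = Overlay("design_1.bit")
#   IP_BASE_ADDRESS, ADDRESS_RANGE = find_ip_address(overlay.ip_dict, "Matrix")
```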

3. Calling from PYNQ

Overview:
Load the bit, hwh and tcl files mentioned above; load the bias, weight and other .dat files required for model inference; and, third, set the registers and offsets needed to operate and control the IP.
Note that the three files above must be renamed to share the same file name prefix (such as 1.tcl, 1.bit, 1.hwh) so that PYNQ can load them correctly.

3.1. Upload files and call

Files can be uploaded through the Jupyter web page, or directly to the root directory of the project via FTP.

from pynq import Overlay
from pynq import Xlnk    # PYNQ 2.6-era API; newer PYNQ releases deprecate Xlnk in favor of pynq.allocate
from pynq import MMIO
import numpy as np
import time

# Loading the .bit also loads the .hwh with the same base name from the same folder
overlay = Overlay("design_1.bit")

# Xlnk allocates physically contiguous (CMA) memory that the PL can access
xlnk = Xlnk()
xlnk.xlnk_reset()    # free CMA buffers left over from earlier runs
print('ok')

3.2. Set address

Set the addresses according to the screenshot in section 2.2:

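# Three physically contiguous int32 buffers of 4 elements each (inputs b and a, output c)
img_in_b = xlnk.cma_array(shape=(4,), dtype=np.int32)
in_buffer_address_b = img_in_b.physical_address

img_in_a = xlnk.cma_array(shape=(4,), dtype=np.int32)
in_buffer_address_a = img_in_a.physical_address

img_in_c = xlnk.cma_array(shape=(4,), dtype=np.int32)   # written by the IP
in_buffer_address_c = img_in_c.physical_address

print("in buffer_address_b:", in_buffer_address_b)
print("in buffer_address_a:", in_buffer_address_a)

# Input data: data_b goes into buffer b, data_a into buffer a
data_b = [2, 1, 4, 3]
data_a = [1, 2, 1, 0]
np.copyto(img_in_b, data_b)
np.copyto(img_in_a, data_a)

# Base address and register offsets of the HLS IP (from the Address Editor, section 2.2)
IP_BASE_ADDRESS = 0x0080000000
ADDRESS_RANGE = 0x30
FPGA_img_addr_AP_CTRL = 0x00   # control: ap_start / ap_done / ap_idle / ap_ready
FPGA_img_addr_GIE = 0x04       # global interrupt enable
FPGA_img_addr_IER = 0x08       # IP interrupt enable register
FPGA_img_addr_ISR = 0x0c       # IP interrupt status register
FPGA_img_addr_b = 0x10         # physical address of buffer b
FPGA_img_addr_a = 0x18         # physical address of buffer a
FPGA_img_addr_c = 0x20         # physical address of buffer c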
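
The shifts >> 2 and >> 1 used in the next step come from the standard ap_ctrl register that HLS generates for an s_axilite control interface. In that register, bit 0 is ap_start, bit 1 ap_done, bit 2 ap_idle and bit 3 ap_ready. The sketch below names those bits and adds a polling helper with a timeout, so that a misconfigured IP raises an error instead of hanging the notebook in an endless while True loop. decode_ap_ctrl and wait_for_bit are hypothetical helpers; read_reg can be any function that takes an offset, such as mmio.read.

```python
import time

# Bit positions in the HLS ap_ctrl register (offset 0x00 of the s_axilite control interface)
AP_START, AP_DONE, AP_IDLE, AP_READY = 0, 1, 2, 3

def decode_ap_ctrl(value: int) -> dict:
    """Split a raw AP_CTRL register value into named flags."""
    bits = (("ap_start", AP_START), ("ap_done", AP_DONE),
            ("ap_idle", AP_IDLE), ("ap_ready", AP_READY))
    return {name: bool((value >> bit) & 1) for name, bit in bits}

def wait_for_bit(read_reg, offset: int, bit: int, timeout_s: float = 1.0) -> None:
    """Poll read_reg(offset) until `bit` is set; raise TimeoutError instead of hanging."""
    deadline = time.monotonic() + timeout_s
    while not ((read_reg(offset) >> bit) & 1):
        if time.monotonic() > deadline:
            raise TimeoutError(f"bit {bit} at offset {offset:#x} not set within {timeout_s} s")
```

For example, the two while True loops in Matrix() could become wait_for_bit(mmio.read, FPGA_img_addr_AP_CTRL, AP_IDLE) and wait_for_bit(mmio.read, FPGA_img_addr_AP_CTRL, AP_DONE). ap_done is clear-on-read, so when waiting for completion, read AP_CTRL in only one place.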

3.3. Run the program

When the program runs, it must match the IP developed in HLS (register offsets and argument order). An example of the calling method is shown below.

def Matrix():
    mmio = MMIO(IP_BASE_ADDRESS, ADDRESS_RANGE)
    # Wait until the IP is idle (AP_CTRL bit 2 = ap_idle)
    while True:
        ap_idle = (mmio.read(FPGA_img_addr_AP_CTRL) >> 2) & 0x01
        if ap_idle:
            break

    # Tell the IP where the input and output buffers are in physical memory
    mmio.write(FPGA_img_addr_b, in_buffer_address_b)
    mmio.write(FPGA_img_addr_a, in_buffer_address_a)
    mmio.write(FPGA_img_addr_c, in_buffer_address_c)

    mmio.write(FPGA_img_addr_GIE, 0)       # interrupts off; completion is polled instead
    mmio.write(FPGA_img_addr_AP_CTRL, 1)   # set ap_start (bit 0)
    # Wait for ap_done (AP_CTRL bit 1)
    while True:
        ap_done = (mmio.read(FPGA_img_addr_AP_CTRL) >> 1) & 0x01
        if ap_done:
            break
    print("b_address:", mmio.read(FPGA_img_addr_b))
    print("a_address:", mmio.read(FPGA_img_addr_a))

time.sleep(3)

start = time.time()
Matrix()
stop = time.time()
time_cifar_fpga = stop - start

print("Matrix time:", time_cifar_fpga)

time.sleep(3)
print("The out is", img_in_c)
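
To check the printed result, you can do the same computation in software and compare it with img_in_c. This sketch rests on an assumption the article does not confirm: that the Matrix IP computes C = A @ B on 2x2 int32 matrices stored row-major in the 4-element buffers. If your HLS source uses a different operand order or layout, adjust the sketch accordingly. reference_matmul is a hypothetical helper.

```python
import numpy as np

def reference_matmul(a_flat, b_flat, n: int) -> np.ndarray:
    """Software reference for an n x n matrix product on row-major flattened data."""
    a = np.asarray(a_flat, dtype=np.int64).reshape(n, n)
    b = np.asarray(b_flat, dtype=np.int64).reshape(n, n)
    # Flatten the result back to row-major so it lines up with the output buffer
    return (a @ b).reshape(-1)

# On the board (sketch):
#   expected = reference_matmul(img_in_a, img_in_b, 2)
#   print("match:", np.array_equal(expected, img_in_c))
```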

4. Reference materials

"HLS Tutorial UG871"
"AXI Reference Guide UG1037"
"HLS User Guide UG902"
This article also draws on related videos by the Bilibili uploader Jiangyao.

Original article: blog.csdn.net/u014798590/article/details/121648687