0. Design ideas
This article documents how to call, from PYNQ, a bit file generated with HLS. Its goal is to make the whole process and its details clear. It is a personal study note and will be updated and improved as the work progresses. It may not necessarily help with your work, but I would be grateful if you pointed out any mistakes in the text. Thank you.
For now, the whole process is worked through using matrix multiplication as the example, so that it can later be applied to your own IP.
The following uses matrix multiplication as the example. The HLS source files are available at https://download.csdn.net/download/u014798590/64515794,
which contains the matrix multiplication source and test files.
For details not covered in this article, refer to https://blog.csdn.net/qq_42334072/article/details/106769534
Tips:
01. Communication between PS and PL
According to "The Zynq Book", PL (programmable logic) and PS (processing system) use AXI (Advanced eXtensible Interface) for communication.
For an introduction to AXI and how to use it, see Chapter 19 of "The Zynq Book" and Lab 4 in Chapter 4 of "UG871 HLS Tutorial".
For the image-processing design, the Python code on the PS side first converts the image into an RGB bitmap. It then writes one frame of data through the AXI interface and reads it back after processing. In this example, the image is transferred using AXI4-Stream (AXI_Stream).
02. Approach for using your own IP
As shown in the figure below, the IP designed in the example (image_process) has two interfaces, one for input and one for output. In the block diagram, the image IP is connected to axi_dma. This image processing tutorial is very good learning material. In deep learning, an image IP is essentially a convolutional layer: it takes the original image as input and outputs the convolved image. This will be covered later in this example.
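The PS-side preparation described above can be sketched as follows. This sketch makes two assumptions:

- The image has already been decoded into an H×W×3 uint8 NumPy array, for example with PIL.
- The stream IP expects one pixel per 32-bit word, laid out as 0x00RRGGBB. This layout is hypothetical.

The helper names pack_rgb_frame and unpack_rgb_frame are illustrative and are not taken from the routine.

```python
import numpy as np

def pack_rgb_frame(rgb):
    """Pack an (H, W, 3) uint8 RGB image into one uint32 word per pixel.

    Hypothetical word layout: bits 23..16 = R, 15..8 = G, 7..0 = B, top byte 0.
    """
    rgb = np.asarray(rgb, dtype=np.uint8)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) RGB array")
    r = rgb[..., 0].astype(np.uint32)
    g = rgb[..., 1].astype(np.uint32)
    b = rgb[..., 2].astype(np.uint32)
    # Row-major flattening yields one frame in raster-scan order.
    return ((r << 16) | (g << 8) | b).ravel()

def unpack_rgb_frame(words, height, width):
    """Inverse of pack_rgb_frame, for turning the processed frame back into an image."""
    words = np.asarray(words, dtype=np.uint32).reshape(height, width)
    out = np.empty((height, width, 3), dtype=np.uint8)
    out[..., 0] = (words >> 16) & 0xFF
    out[..., 1] = (words >> 8) & 0xFF
    out[..., 2] = words & 0xFF
    return out
```

The packed array can then be copied into the buffer handed to the DMA. If the real IP uses a different pixel format (for example 24-bit words, or BGR order), only the shifts need to change.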
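The image IP is essentially a convolutional layer, so a software reference model is useful for checking the hardware output. Below is a sketch of a plain "valid" 2D convolution with stride 1 and no padding on a single-channel image. Strictly speaking it is a cross-correlation, as in most deep-learning frameworks. conv2d_valid is a hypothetical helper, and a real IP may differ in padding, data type, and channel handling.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation, stride 1, no padding."""
    image = np.asarray(image, dtype=np.int64)
    kernel = np.asarray(kernel, dtype=np.int64)
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.int64)
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the window under the kernel, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

The explicit double loop mirrors the loop nest an HLS kernel would typically pipeline, which makes it easier to compare against the C code.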
03. Add third-party boards in Vivado
Visit the website https://github.com/Avnet/bdf;
copy the folder for your board into the E:\Xilinx\Vivado\2018.3\data\boards\board_files folder (if the board has multiple revisions, copy only one revision folder);
in the project's part selection, switch from Parts to Boards, then search for and select the board.
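The copy step can also be scripted. This is a minimal sketch that assumes the Avnet repository has already been cloned locally. install_board_files is a hypothetical helper, and both paths are placeholders to adapt. The Vivado path follows the one given above. dirs_exist_ok requires Python 3.8 or later.

```python
import shutil
from pathlib import Path

def install_board_files(board_src, vivado_board_dir):
    """Copy one board-definition folder into Vivado's board_files directory.

    board_src: folder for a single board (copy only one revision, see above).
    vivado_board_dir: e.g. E:\\Xilinx\\Vivado\\2018.3\\data\\boards\\board_files
    Returns the destination path.
    """
    src = Path(board_src)
    if not src.is_dir():
        raise FileNotFoundError(f"board folder not found: {src}")
    dest = Path(vivado_board_dir) / src.name
    # dirs_exist_ok lets the script be re-run after updating the clone.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

Restart Vivado after copying so the new board appears in the Boards list.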
1. HLS design
Note that the part selected in HLS must match the chip or board used in the Vivado design. Otherwise, the packaged IP may not show up when you search for it in the BD.
Overview:
Compile the finished HLS source, then export the IP core after testing and synthesis have completed.
1.1. Process Memo
1.1.5. Add header file
Before synthesis, set the top function first:
Project > Project Settings > Synthesis > Top Function,
and select the corresponding function.
Then run C Synthesis.
1.2. Pragma design and optimization
For details, please refer to the HLS-related learning blog, "Xilinx HLS Study Notes 3 (for Loop Optimization)"
The following example optimizes a for loop by pipelining it.
When optimizing the program, you can configure this either with a pragma in the code or with an HLS directive;
both methods are shown below.
1.2.1, Directive method
First, give a label to the for loop that needs to be unrolled or pipelined;
the label then appears in the Directive pane on the right.
Right-click the label and choose Insert Directive. In the pop-up dialog, select the directive that fits your case.
Finally, the new directive appears in the list, which shows that the configuration has taken effect.
1.2.2, Pragma method
Alternatively, add #pragma HLS PIPELINE directly inside the loop body.
Final result
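As a rough estimate of what PIPELINE buys, consider a loop with trip count N, iteration latency L cycles, and initiation interval II, ignoring loop entry and exit overhead. The latency is approximately:

```latex
T_{\text{no pipeline}} \approx N \cdot L
\qquad
T_{\text{pipeline}} \approx L + (N - 1) \cdot II
```

For example, with N = 4, L = 3, and II = 1, this gives 4 · 3 = 12 cycles without pipelining versus 3 + 3 · 1 = 6 cycles with it. The synthesis report gives the actual numbers for a given design.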
1.2.3. Interface configuration (to be completed)
Interfaces can be configured with the same two methods. (The function and definition of each interface type will be covered in a later update.)
1.3. Synthesis and IP packaging
Finally, find the IP folder under the following path (it will be used in the next step):
D:\ZYNQ_PACK\Zynq7020\HLS_Project\AXI_Test1\IP_Matrix\Matrix\solution1\impl
2. Vivado design
Overview:
Import the IP core generated above, add the Zynq PS, AXI, and other IP, and configure them. Then connect everything in the block diagram and export the bit, hwh, and tcl files. Also record the address assigned to the IP.
2.1. Process Memo
1. Create the project, enter the project path and name, and click Next through the remaining pages.
2. Select the corresponding chip or board, such as the 7020 or the Ultra96 (see above for third-party boards).
3. Create Block Design (BD)
4. Add the ZYNQ IP. Different boards use different PS cores; this example uses the Ultra96, so the core is the Zynq UltraScale+ MPSoC.
5. Run Block Automation.
6. Add your own IP:
add the IP folder generated above as an IP repository, selecting the whole generated IP folder.
Once it is added, the repository path is listed.
Then, in the BD, search for the IP by name and add the IP
we wrote; it will appear in the diagram.
7. Configure the PS core
8. In Run Connection Automation, assign HP0 to HP2
so that they match the AXI ports of the IP we wrote.
10. Finally, connect the interrupt signal. Note that connecting it directly causes PYNQ 2.6 to report an interrupt error when loading the bit file. It is recommended to insert a Concat block in between, as shown in Figure 2 below.
The final BD is shown in the figure below (the role of each part will be explained later). Click Validate Design (the check-mark icon) on the BD to check the connections for errors.
Note that some guides skip Generate Output Products in step 11. Instead, they create the top-level file directly (Create HDL Wrapper) and then go on to step 12 to generate the bitstream.
11. To export the result, right-click the block design under Design Sources and run Generate Output Products.
Then select Create HDL Wrapper to generate the top-level file;
the dialog below will pop up. After the top-level file
is generated, the figure below appears. Next, generate the bitstream. The bit file is written to HLS_Project\AXI_Test1\project_1\project_1.runs\impl_1, and key information such as resource utilization is reported. Then switch back to the BD page and export the block design as a tcl file (File > Export > Export Block Design).
2.1.1. File paths
You can choose the tcl export directory yourself (the default is the project root). The hwh file is located as follows:
For Vivado 2018: AXI_Test1\project_1\project_1.srcs\sources_1\bd\Ultra96_Design\hw_handoff
For Vivado 2020: AXI_Test1\project_1\project_1.gen\sources_1\bd\Ultra96_Design\hw_handoff
The bit file is in project folder\project_1.runs\impl_1, where the final wrapper .bit file is produced. Once these three files are ready, package the bit, hwh, and tcl files and copy them to the Jupyter project folder.
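Because the hw_handoff folder moved between Vivado versions, a small helper can check both locations. This is a sketch that assumes the directory layout described above. find_handoff_files is a hypothetical helper; the design name (Ultra96_Design here) and the project name (project_1) are parameters to adapt.

```python
from pathlib import Path

def find_handoff_files(project_dir, design_name, project_name="project_1"):
    """Locate the .hwh and .bit files of a Vivado project.

    project_dir: folder containing project_1.srcs / project_1.gen / project_1.runs
    Returns (hwh_path, bit_path); either is None if not found.
    """
    base = Path(project_dir)
    hwh = None
    # Vivado 2018 keeps hw_handoff under .srcs, Vivado 2020 under .gen.
    for tree in (f"{project_name}.srcs", f"{project_name}.gen"):
        handoff = base / tree / "sources_1" / "bd" / design_name / "hw_handoff"
        if handoff.is_dir():
            matches = sorted(handoff.glob("*.hwh"))
            if matches:
                hwh = matches[0]
                break
    impl = base / f"{project_name}.runs" / "impl_1"
    bits = sorted(impl.glob("*.bit")) if impl.is_dir() else []
    bit = bits[0] if bits else None
    return hwh, bit
```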
2.2. Viewing addresses
When the PS side uses the files above to run the computation, it accesses the IP through the following addresses. Open the Address Editor tab of the BD to view them.
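Once the overlay is loaded in PYNQ, the same base address and range can also be read from the .hwh metadata instead of being copied from the screenshot. Overlay.ip_dict maps each IP instance name to a dict that includes 'phys_addr' and 'addr_range'. The hypothetical helper below uses only those two keys, so it can be tested with a plain dict. The instance name "Matrix_0" is an assumed example.

```python
def ip_address_window(ip_dict, instance_name):
    """Return (base_address, address_range) for one IP in a PYNQ-style ip_dict."""
    try:
        entry = ip_dict[instance_name]
    except KeyError:
        available = ", ".join(sorted(ip_dict))
        raise KeyError(f"{instance_name!r} not found; available: {available}") from None
    return int(entry["phys_addr"]), int(entry["addr_range"])
```

On the board, this could be used as `base, rng = ip_address_window(overlay.ip_dict, "Matrix_0")` followed by `MMIO(base, rng)`. Here `overlay` is the Overlay object loaded in section 3.1.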
3. PYNQ call
Overview:
There are three steps:
1. Load the bit, hwh, and tcl files mentioned above.
2. Load the bias, weight, and other .dat files required for model inference.
3. Set the registers and offsets needed to operate and control the IP.
Note that the three files must share the same file name prefix (e.g. 1.tcl, 1.bit, 1.hwh) so that PYNQ can load them correctly.
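The renaming rule can be automated when copying the files over. This is a minimal sketch; stage_overlay is a hypothetical helper. It copies the bit, hwh, and (optionally) tcl files into one folder under a common prefix, so that, for example, Overlay("design_1.bit") finds design_1.hwh next to it.

```python
import shutil
from pathlib import Path

def stage_overlay(bit_path, hwh_path, out_dir, prefix, tcl_path=None):
    """Copy overlay files into out_dir as prefix.bit / prefix.hwh / prefix.tcl."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    sources = {".bit": bit_path, ".hwh": hwh_path}
    if tcl_path is not None:
        sources[".tcl"] = tcl_path
    staged = []
    for suffix, src in sources.items():
        dest = out / f"{prefix}{suffix}"
        shutil.copyfile(src, dest)  # same content, new common file name
        staged.append(dest)
    return staged
```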
3.1. Upload files and call
Files can be uploaded through the Jupyter web page or copied directly to the project's root directory via FTP.
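```python
from pynq import Overlay
from pynq import Xlnk  # deprecated in newer PYNQ releases in favor of pynq.allocate
from pynq.lib.video import *
from PIL import ImageDraw as PIL_ImageDraw
from PIL import ImageFont
from pynq import MMIO
from PIL import Image as PIL_Image
import numpy as np
import math
import os
import inspect
import matplotlib.pyplot as plt
import time
import ctypes

# Load the bitstream; the matching .hwh/.tcl must share the "design_1" prefix
overlay = Overlay("design_1.bit")
# Contiguous-memory allocator for the buffers the IP reads and writes
xlnk = Xlnk()
xlnk.xlnk_reset()  # free any CMA buffers left over from a previous run
print('ok')
```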
3.2. Setting addresses
Set the addresses according to the screenshot in section 2.2.
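```python
# Physically contiguous buffers for the two inputs and the output (4 int32 each)
img_in_b = xlnk.cma_array(shape=(4,), dtype=np.int32)
in_buffer_address_b = img_in_b.physical_address
img_in_a = xlnk.cma_array(shape=(4,), dtype=np.int32)
in_buffer_address_a = img_in_a.physical_address
img_in_c = xlnk.cma_array(shape=(4,), dtype=np.int32)
in_buffer_address_c = img_in_c.physical_address
print("in buffer_address_b:", in_buffer_address_b)
print("in buffer_address_a:", in_buffer_address_a)
print("in buffer_address_c:", in_buffer_address_c)

# Test data (note: list a is copied into buffer b, and list b into buffer a)
a = [2, 1, 4, 3]
b = [1, 2, 1, 0]
np.copyto(img_in_b, a)
np.copyto(img_in_a, b)

# Base address and range of the IP's AXI-Lite slave (from section 2.2)
IP_BASE_ADDRESS = 0x0080000000
ADDRESS_RANGE = 0x30
# Register offsets of the HLS-generated s_axilite interface
FPGA_img_addr_AP_CTRL = 0x00  # control: ap_start / ap_done / ap_idle / ap_ready
FPGA_img_addr_GIE = 0x04      # global interrupt enable
FPGA_img_addr_IER = 0x08      # IP interrupt enable
FPGA_img_addr_ISR = 0x0c      # IP interrupt status
FPGA_img_addr_b = 0x10        # physical address of buffer b
FPGA_img_addr_a = 0x18        # physical address of buffer a
FPGA_img_addr_c = 0x20        # physical address of buffer c
```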
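The AP_CTRL register at offset 0x00 follows the usual Vivado HLS block-level (ap_ctrl_hs) layout:

- bit 0: ap_start
- bit 1: ap_done, which is cleared when read
- bit 2: ap_idle
- bit 3: ap_ready
- bit 7: auto_restart

The sketch below decodes that register. It also wraps the busy-wait loops used in the next step with a timeout, so a hung IP does not block the notebook forever. It accepts any object with a read(offset) method, which PYNQ's MMIO provides. wait_for_bit and the 1-second default timeout are hypothetical choices.

```python
import time

AP_START, AP_DONE, AP_IDLE, AP_READY, AUTO_RESTART = 0, 1, 2, 3, 7

def decode_ap_ctrl(value):
    """Split an AP_CTRL register value into named status flags."""
    return {
        "ap_start": bool((value >> AP_START) & 1),
        "ap_done": bool((value >> AP_DONE) & 1),
        "ap_idle": bool((value >> AP_IDLE) & 1),
        "ap_ready": bool((value >> AP_READY) & 1),
        "auto_restart": bool((value >> AUTO_RESTART) & 1),
    }

def wait_for_bit(mmio, offset, bit, timeout_s=1.0, poll_s=0.0):
    """Poll mmio.read(offset) until `bit` is set; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        if (mmio.read(offset) >> bit) & 1:
            return
        if time.monotonic() > deadline:
            raise TimeoutError(f"bit {bit} at offset {offset:#x} not set within {timeout_s} s")
        if poll_s:
            time.sleep(poll_s)
```

With the names from the code above, `wait_for_bit(mmio, FPGA_img_addr_AP_CTRL, AP_IDLE)` waits before starting the IP, and `wait_for_bit(mmio, FPGA_img_addr_AP_CTRL, AP_DONE)` waits after writing 1 to AP_CTRL.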
3.3. Running the program
The calling sequence must match the interface of the function developed in HLS. An example is shown below.
```python
def Matrix():
    mmio = MMIO(IP_BASE_ADDRESS, ADDRESS_RANGE)
    # Wait until the IP is idle (AP_CTRL bit 2 = ap_idle)
    while True:
        ap_idle = (mmio.read(FPGA_img_addr_AP_CTRL) >> 2) & 0x01
        if ap_idle:
            break
    # Pass the physical buffer addresses to the IP's pointer arguments
    mmio.write(FPGA_img_addr_b, in_buffer_address_b)
    mmio.write(FPGA_img_addr_a, in_buffer_address_a)
    mmio.write(FPGA_img_addr_c, in_buffer_address_c)
    mmio.write(FPGA_img_addr_GIE, 0)      # disable interrupts; we poll instead
    mmio.write(FPGA_img_addr_AP_CTRL, 1)  # set ap_start
    # Wait for completion (AP_CTRL bit 1 = ap_done)
    while True:
        ap_done = (mmio.read(FPGA_img_addr_AP_CTRL) >> 1) & 0x01
        if ap_done:
            break
    print("b_address:", mmio.read(FPGA_img_addr_b))
    print("a_address:", mmio.read(FPGA_img_addr_a))

time.sleep(3)
start = time.time()
Matrix()
stop = time.time()
time_cifar_fpga = stop - start
print("Matrix time:", time_cifar_fpga)
time.sleep(3)
print("The out is", img_in_c)
```
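To check the result, the output buffer can be compared against a NumPy model. This sketch makes two assumptions, which should be checked against the HLS source before relying on it:

- The HLS function multiplies two row-major 2×2 matrices stored in the 4-element buffers.
- It computes C = A · B, where A is read from buffer a and B from buffer b. The operand order is hypothetical.

Recall that section 3.2 copies the list a into buffer b and the list b into buffer a. The sketch therefore compares against the buffers themselves rather than the Python lists.

```python
import numpy as np

def reference_matmul(buf_a, buf_b, n=2):
    """Multiply two flattened row-major n x n matrices and return flat A @ B."""
    A = np.asarray(buf_a, dtype=np.int64).reshape(n, n)
    B = np.asarray(buf_b, dtype=np.int64).reshape(n, n)
    return (A @ B).ravel()

def check_output(buf_a, buf_b, buf_c, n=2):
    """True if the FPGA result in buf_c matches the NumPy reference."""
    expected = reference_matmul(buf_a, buf_b, n)
    return np.array_equal(np.asarray(buf_c, dtype=np.int64), expected)
```

On the board, `check_output(img_in_a, img_in_b, img_in_c)` would be called after `Matrix()` returns. If it fails, try swapping the operands to match the HLS code.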
4. Reference materials
"HLS Tutorial UG871"
"AXI Reference Guide UG1037"
"HLS User Guide UG902"
This article also draws on related videos by the Bilibili uploader Jiangyao.