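HLS Optimization

HLS in Practice 1: Matrix Multiplication

  • Loop unrolling
  • Pipelining
  • Array optimization
  • Circuit optimization based on co-simulation

1. Introduction to parallel computation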

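The region labeled 1 (Loop 1.1.1) originally needs 4 x 2 = 8 cycles to complete. We now want Loop 1.1.1 to finish its computation in a single cycle, that is, to compute its four iterations at the same time, in parallel.
There are two ways to achieve this:
(The wording here is not rigorous; I will correct it once I know the precise formulation.)
Method 1: manual unrolling. Add an UNROLL directive to unroll the loop explicitly.
Method 2: automatic unrolling. Tell the tool that this piece of code should complete in one cycle and let it infer the unrolling on its own.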
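For reference, the loop nest under discussion can be sketched as follows. The article does not show matrix_mul.cpp, so this is a reconstruction under assumptions: `in_t` and `out_t` are plain fixed-width integers standing in for whatever the project header defines (the synthesis log suggests 8-bit inputs and a 16-bit result), and `check_matrix_mul` is a hypothetical helper whose inputs 0..15 were chosen because they reproduce the C values printed in the co-simulation log.

```cpp
#include <cassert>
#include <cstdint>

// Assumed element types; a Vivado HLS project would likely use ap_int<8> / ap_int<16>.
typedef std::int8_t  in_t;
typedef std::int16_t out_t;
const int N = 4;

// Baseline 4x4 matrix multiplication with no directives applied.
void matrix_mul(in_t A[N][N], in_t B[N][N], out_t C[N][N]) {
    for (int i = 0; i < N; ++i) {            // Loop 1
        for (int j = 0; j < N; ++j) {        // Loop 1.1
            C[i][j] = 0;
            for (int k = 0; k < N; ++k) {    // Loop 1.1.1: the four iterations to parallelize
                C[i][j] = static_cast<out_t>(C[i][j] + A[i][k] * B[k][j]);
            }
        }
    }
}

// Hypothetical usage check: A = B = {0, 1, ..., 15} in row-major order.
void check_matrix_mul() {
    in_t A[N][N], B[N][N];
    out_t C[N][N];
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            A[i][j] = static_cast<in_t>(i * N + j);
            B[i][j] = A[i][j];
        }
    }
    matrix_mul(A, B, C);
    assert(C[0][0] == 56);   // first value in the co-simulation output
    assert(C[3][3] == 506);  // last value in the co-simulation output
}
```

Without directives the innermost loop runs its four iterations one after another, which is where the 4 x 2 = 8 cycles come from.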

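2. Method 1: manual unrolling

(1) unroll

The goal is to optimize C[i][j] = C[i][j] + A[i][k] * B[k][j] so that its run time drops from 2 cycles to 1 cycle.
Select the Directive tab.

Click the item labeled 1; the tool automatically selects the item labeled 2.
Right-click the for Statement and choose Insert Directive.
In the Directive drop-down, select UNROLL.
For the destination, choose Source File, which writes the directive into the C source as a pragma. The Directive File option instead puts it in a separate constraints file, which means looking in two places while coding; that is inconvenient, so the former is usually chosen. Click OK.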
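The pragma is inserted into the loop automatically. Its effect is that the statement C[i][j] = C[i][j] + A[i][k] * B[k][j] now completes in one cycle instead of the original two (see the analysis of the synthesis results). This statement is now optimized, but this is still not the best result we are aiming for.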
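A minimal sketch of what the source might look like after this step, assuming the same reconstructed kernel (the element types and the `check_unrolled` helper are hypothetical, not taken from the article). The only directive is `#pragma HLS UNROLL` at the top of the innermost loop body; an ordinary C++ compiler ignores the pragma, so the function still behaves as a normal matrix multiplication in C simulation.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int8_t  in_t;   // assumed input type
typedef std::int16_t out_t;  // assumed output type
const int N = 4;

void matrix_mul(in_t A[N][N], in_t B[N][N], out_t C[N][N]) {
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            C[i][j] = 0;
            for (int k = 0; k < N; ++k) {
#pragma HLS UNROLL
                // Fully unrolled: the tool creates four copies of this statement.
                C[i][j] = static_cast<out_t>(C[i][j] + A[i][k] * B[k][j]);
            }
        }
    }
}

// Hypothetical check with A = B = {0, ..., 15}, matching the logged C values.
void check_unrolled() {
    in_t A[N][N], B[N][N];
    out_t C[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            A[i][j] = B[i][j] = static_cast<in_t>(i * N + j);
    matrix_mul(A, B, C);
    assert(C[0][1] == 62);
    assert(C[1][0] == 152);
}
```

Unrolling changes only the hardware schedule, not the arithmetic, so the functional result must stay identical to the baseline.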

(2) pipeline

The goal here is to compute the product of one row of A with one column of B in one go, in a single cycle.
Click the second for Statement.
Right-click, choose Insert Directive, select PIPELINE, and set the initiation interval (II) to 1.
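The tool automatically adds #pragma HLS PIPELINE II=1. We want C[i][j] = 0 and the third (innermost) loop together to take only one cycle, but in practice this directive alone cannot achieve that.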
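As a sketch of this step, again on the reconstructed kernel with assumed types, the pragma sits at the top of the body of the second loop. According to the synthesis log, the tool unrolls all sub-loops inside a pipelined loop, so an explicit UNROLL on the innermost loop is left out here; whether the author kept it is not shown in the article. `check_pipelined` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int8_t  in_t;   // assumed input type
typedef std::int16_t out_t;  // assumed output type
const int N = 4;

void matrix_mul(in_t A[N][N], in_t B[N][N], out_t C[N][N]) {
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
#pragma HLS PIPELINE II=1
            // Request that a new (i, j) iteration starts every cycle.
            C[i][j] = 0;
            for (int k = 0; k < N; ++k) {
                // Unrolled automatically because the enclosing loop is pipelined.
                C[i][j] = static_cast<out_t>(C[i][j] + A[i][k] * B[k][j]);
            }
        }
    }
}

// Hypothetical check with A = B = {0, ..., 15}, matching the logged C values.
void check_pipelined() {
    in_t A[N][N], B[N][N];
    out_t C[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            A[i][j] = B[i][j] = static_cast<in_t>(i * N + j);
    matrix_mul(A, B, C);
    assert(C[0][3] == 74);
    assert(C[2][0] == 248);
}
```

II=1 is only a target: the scheduler raises it when a resource, such as the number of memory ports, cannot keep up.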
Run C Synthesis again.

Starting C synthesis ...
D:/Xilinx201901/Vivado/2019.1/bin/vivado_hls.bat D:/xxxx/hls/matrix/matrix/solution1/csynth.tcl
INFO: [HLS 200-10] Running 'D:/Xilinx201901/Vivado/2019.1/bin/unwrapped/win64.o/vivado_hls.exe'
INFO: [HLS 200-10] For user 'summer' on host 'desktop-dck1feh' (Windows NT_amd64 version 6.2) on Fri Mar 12 17:13:58 +0800 2021
INFO: [HLS 200-10] In directory 'D:/10GraduationProject/hls/matrix'
Sourcing Tcl script 'D:/xxxx/hls/matrix/matrix/solution1/csynth.tcl'
INFO: [HLS 200-10] Opening project 'D:/10GraduationProject/hls/matrix/matrix'.
INFO: [HLS 200-10] Adding design file 'matrix/matrix_mul.h' to the project
INFO: [HLS 200-10] Adding design file 'matrix/matrix_mul.cpp' to the project
INFO: [HLS 200-10] Adding test bench file 'matrix/main.cpp' to the project
INFO: [HLS 200-10] Opening solution 'D:/10GraduationProject/hls/matrix/matrix/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 30ns.
INFO: [HLS 200-10] Setting target device to 'xc7z020-clg400-1'
INFO: [SYN 201-201] Setting up clock 'default' with a period of 30ns.
INFO: [SCHED 204-61] Option 'relax_ii_for_timing' is enabled, will increase II to preserve clock frequency constraints.
INFO: [HLS 200-10] Analyzing design file 'matrix/matrix_mul.cpp' ... 
INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:00:03 ; elapsed = 00:00:22 . Memory (MB): peak = 178.305 ; gain = 87.430
INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:00:03 ; elapsed = 00:00:22 . Memory (MB): peak = 178.305 ; gain = 87.430
INFO: [HLS 200-10] Starting code transformations ...
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:00:03 ; elapsed = 00:00:23 . Memory (MB): peak = 178.305 ; gain = 87.430
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 00:00:03 ; elapsed = 00:00:23 . Memory (MB): peak = 178.305 ; gain = 87.430
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'Loop-1.1' (matrix/matrix_mul.cpp:6) in function 'matrix_mul' for pipelining.
INFO: [HLS 200-489] Unrolling loop 'Loop-1.1.1' (matrix/matrix_mul.cpp:10) in function 'matrix_mul' completely with a factor of 4.
INFO: [XFORM 203-11] Balancing expressions in function 'matrix_mul' (matrix/matrix_mul.cpp:2)...3 expression(s) balanced.
INFO: [HLS 200-111] Finished Pre-synthesis Time (s): cpu = 00:00:03 ; elapsed = 00:00:23 . Memory (MB): peak = 178.305 ; gain = 87.430
INFO: [XFORM 203-541] Flattening a loop nest 'Loop-1' (matrix/matrix_mul.cpp:4:13) in function 'matrix_mul'.
INFO: [HLS 200-111] Finished Architecture Synthesis Time (s): cpu = 00:00:03 ; elapsed = 00:00:23 . Memory (MB): peak = 178.305 ; gain = 87.430
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'matrix_mul' ...
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'matrix_mul' 
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining loop 'Loop 1'.
WARNING: [SCHED 204-69] Unable to schedule 'load' operation ('A_V_load_2', matrix/matrix_mul.cpp:12) on array 'A_V' due to limited memory ports. Please consider using a memory core with more ports or partitioning the array 'A_V'.
INFO: [SCHED 204-61] Pipelining result : Target II = 1, Final II = 2, Depth = 3.
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111]  Elapsed time: 23.654 seconds; current allocated memory: 100.940 MB.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111]  Elapsed time: 0.127 seconds; current allocated memory: 101.231 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'matrix_mul' 
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/A_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/B_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/C_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on function 'matrix_mul' to 'ap_ctrl_hs'.
INFO: [SYN 201-210] Renamed object name 'matrix_mul_mac_muladd_8s_8s_16ns_16_1_1' to 'matrix_mul_mac_mubkb' due to the length limit 20
INFO: [RTGEN 206-100] Generating core module 'matrix_mul_mac_mubkb': 2 instance(s).
INFO: [RTGEN 206-100] Finished creating RTL model for 'matrix_mul'.
INFO: [HLS 200-111]  Elapsed time: 0.188 seconds; current allocated memory: 101.778 MB.
INFO: [HLS 200-111] Finished generating all RTL models Time (s): cpu = 00:00:05 ; elapsed = 00:00:26 . Memory (MB): peak = 178.305 ; gain = 87.430
INFO: [VHDL 208-304] Generating VHDL RTL for matrix_mul.
INFO: [VLOG 209-307] Generating Verilog RTL for matrix_mul.
INFO: [HLS 200-112] Total elapsed time: 25.61 seconds; peak allocated memory: 101.778 MB.
Finished C synthesis.

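Optimization result:
The computation now takes 34 cycles in total, which is already much faster than the previous 169 cycles. There is still room for improvement, though: we would like the computation to finish in 16 cycles.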
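Because the port type of arrays A and B was not specified above, they default to dual-port memories. A dual-port memory can deliver only two values per cycle, while each element of C needs four values from A (and four from B). Every element therefore takes two cycles, and the 4 x 4 = 16 elements take 32 cycles. (See the co-simulation section below for where this conclusion comes from.)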
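The cycle counts can be cross-checked with a little arithmetic. The sketch below is not part of the original project: `port_limited_ii` and `pipelined_loop_latency` are hypothetical helpers. They encode two commonly used relations: the memory-bound initiation interval is the number of reads per iteration divided by the number of ports, rounded up, and a pipelined loop takes (trip count - 1) x II + depth cycles.

```cpp
#include <cassert>

// Smallest initiation interval allowed by memory bandwidth alone:
// ceil(reads_per_iteration / ports).
int port_limited_ii(int reads_per_iteration, int ports) {
    assert(reads_per_iteration > 0 && ports > 0);
    return (reads_per_iteration + ports - 1) / ports;
}

// Cycles for a pipelined loop: the first iteration finishes after `depth`
// cycles, and each later iteration finishes `ii` cycles after the previous one.
int pipelined_loop_latency(int trip_count, int ii, int depth) {
    assert(trip_count > 0 && ii > 0 && depth > 0);
    return (trip_count - 1) * ii + depth;
}
```

With the values from the scheduler log (four reads of A per iteration, two ports, 16 iterations, Depth = 3), `port_limited_ii(4, 2)` gives 2 and `pipelined_loop_latency(16, 2, 3)` gives 33, one cycle short of the reported 34. The remaining cycle is presumably block-level control overhead, which this sketch does not model.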

There are two ways to solve this problem:
$$\begin{cases} \text{Array Partition} \\ \text{Array Reshape} \end{cases}$$

Co-simulation

The finding above can in fact be discovered by looking at the waveforms produced by co-simulation.
Run C/RTL Cosimulation
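Co-simulation reuses the C test bench (main.cpp in this project) to drive the generated RTL. The article does not show that file, so the following is only a guess at its shape: `run_testbench` is a hypothetical function whose body a real `main()` would contain, the inputs 0..15 are inferred from the printed results, and `matrix_mul` here is a plain stand-in for the design under test. The return value follows the usual convention that 0 means pass.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

typedef std::int8_t  in_t;   // assumed input type
typedef std::int16_t out_t;  // assumed output type
const int N = 4;

// Stand-in for the design under test (lives in matrix_mul.cpp in the project).
void matrix_mul(in_t A[N][N], in_t B[N][N], out_t C[N][N]) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            C[i][j] = 0;
            for (int k = 0; k < N; ++k)
                C[i][j] = static_cast<out_t>(C[i][j] + A[i][k] * B[k][j]);
        }
}

// Test bench body; main() in main.cpp would simply return this value.
int run_testbench() {
    in_t A[N][N], B[N][N];
    out_t C[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            A[i][j] = B[i][j] = static_cast<in_t>(i * N + j);

    matrix_mul(A, B, C);

    int errors = 0;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            int expected = 0;  // golden value computed in plain int arithmetic
            for (int k = 0; k < N; ++k) expected += A[i][k] * B[k][j];
            std::printf("C[%d,%d]%d\n", i, j, static_cast<int>(C[i][j]));
            if (C[i][j] != expected) ++errors;
        }
    }
    return errors == 0 ? 0 : 1;  // non-zero signals a failed test
}
```

The printed lines use the same `C[i,j]value` layout as the log that follows, which is why the results appear twice there: once for the C run and once for the post-check after RTL simulation.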
Run log

Starting C/RTL cosimulation ...
D:/Xilinx201901/Vivado/2019.1/bin/vivado_hls.bat D:/xxxx/hls/matrix/matrix/solution1/cosim.tcl
INFO: [HLS 200-10] Running 'D:/Xilinx201901/Vivado/2019.1/bin/unwrapped/win64.o/vivado_hls.exe'
INFO: [HLS 200-10] For user 'summer' on host 'desktop-dck1feh' (Windows NT_amd64 version 6.2) on Fri Mar 12 17:50:37 +0800 2021
INFO: [HLS 200-10] In directory 'D:/xxxx/hls/matrix'
Sourcing Tcl script 'D:/xxxx/hls/matrix/matrix/solution1/cosim.tcl'
INFO: [HLS 200-10] Opening project 'D:/xxxx/hls/matrix/matrix'.
INFO: [HLS 200-10] Opening solution 'D:/xxxx/hls/matrix/matrix/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 30ns.
INFO: [HLS 200-10] Setting target device to 'xc7z020-clg400-1'
INFO: [COSIM 212-47] Using XSIM for RTL simulation.
INFO: [COSIM 212-14] Instrumenting C test bench ...
   Build using "D:/Xilinx201901/Vivado/2019.1/msys64/mingw64/bin/g++"
   Compiling matrix_mul.cpp_pre.cpp.tb.cpp
   Compiling main.cpp_pre.cpp.tb.cpp
   Compiling apatb_matrix_mul.cpp
   Generating cosim.tv.exe
INFO: [COSIM 212-302] Starting C TB testing ... 
C[0,0]56
C[0,1]62
C[0,2]68
C[0,3]74
C[1,0]152
C[1,1]174
C[1,2]196
C[1,3]218
C[2,0]248
C[2,1]286
C[2,2]324
C[2,3]362
C[3,0]344
C[3,1]398
C[3,2]452
C[3,3]506
INFO: [COSIM 212-333] Generating C post check test bench ...
INFO: [COSIM 212-12] Generating RTL test bench ...
INFO: [COSIM 212-1] *** C/RTL co-simulation file generation completed. ***
INFO: [COSIM 212-323] Starting verilog simulation. 
INFO: [COSIM 212-15] Starting XSIM ...

D:\xxxx\hls\matrix\matrix\solution1\sim\verilog>set PATH= 

D:\xxxx\hls\matrix\matrix\solution1\sim\verilog>call D:/Xilinx201901/Vivado/2019.1/bin/xelab xil_defaultlib.apatb_matrix_mul_top glbl -prj matrix_mul.prj -L smartconnect_v1_0 -L axi_protocol_checker_v1_1_12 -L axi_protocol_checker_v1_1_13 -L axis_protocol_checker_v1_1_11 -L axis_protocol_checker_v1_1_12 -L xil_defaultlib -L unisims_ver -L xpm --initfile "D:/Xilinx201901/Vivado/2019.1/data/xsim/ip/xsim_ip.ini" --lib "ieee_proposed=./ieee_proposed" -s matrix_mul -debug wave 
INFO: [XSIM 43-3496] Using init file passed via -initfile option "D:/Xilinx201901/Vivado/2019.1/data/xsim/ip/xsim_ip.ini".
Vivado Simulator 2019.1
Copyright 1986-1999, 2001-2019 Xilinx, Inc. All Rights Reserved.
Running: D:/Xilinx201901/Vivado/2019.1/bin/unwrapped/win64.o/xelab.exe xil_defaultlib.apatb_matrix_mul_top glbl -prj matrix_mul.prj -L smartconnect_v1_0 -L axi_protocol_checker_v1_1_12 -L axi_protocol_checker_v1_1_13 -L axis_protocol_checker_v1_1_11 -L axis_protocol_checker_v1_1_12 -L xil_defaultlib -L unisims_ver -L xpm --initfile D:/Xilinx201901/Vivado/2019.1/data/xsim/ip/xsim_ip.ini --lib ieee_proposed=./ieee_proposed -s matrix_mul -debug wave 
Multi-threading is on. Using 6 slave threads.
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/glbl.v" into library work
INFO: [VRFC 10-311] analyzing module glbl
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_A_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_A_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_B_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_B_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_C_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_C_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/matrix_mul.autotb.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module apatb_matrix_mul_top
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/matrix_mul.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module matrix_mul
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/matrix_mul_mac_mubkb.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module matrix_mul_mac_mubkb_DSP48_0
INFO: [VRFC 10-311] analyzing module matrix_mul_mac_mubkb
Starting static elaboration
Completed static elaboration
Starting simulation data flow analysis
Completed simulation data flow analysis
Time Resolution for simulation is 1ps
Compiling module xil_defaultlib.matrix_mul_mac_mubkb_DSP48_0
Compiling module xil_defaultlib.matrix_mul_mac_mubkb(ID=1,NUM_ST...
Compiling module xil_defaultlib.matrix_mul
Compiling module xil_defaultlib.AESL_automem_A_V
Compiling module xil_defaultlib.AESL_automem_B_V
Compiling module xil_defaultlib.AESL_automem_C_V
Compiling module xil_defaultlib.apatb_matrix_mul_top
Compiling module work.glbl
Built simulation snapshot matrix_mul

****** Webtalk v2019.1 (64-bit)
  **** SW Build 2552052 on Fri May 24 14:49:42 MDT 2019
  **** IP Build 2548770 on Fri May 24 18:01:18 MDT 2019
    ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

source D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/xsim.dir/matrix_mul/webtalk/xsim_webtalk.tcl -notrace
INFO: [Common 17-186] 'D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/xsim.dir/matrix_mul/webtalk/usage_statistics_ext_xsim.xml' has been successfully sent to Xilinx on Fri Mar 12 17:51:17 2021. For additional details about this file, please refer to the WebTalk help file at D:/Xilinx201901/Vivado/2019.1/doc/webtalk_introduction.html.
INFO: [Common 17-206] Exiting Webtalk at Fri Mar 12 17:51:17 2021...

****** xsim v2019.1 (64-bit)
  **** SW Build 2552052 on Fri May 24 14:49:42 MDT 2019
  **** IP Build 2548770 on Fri May 24 18:01:18 MDT 2019
    ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

source xsim.dir/matrix_mul/xsim_script.tcl
# xsim {matrix_mul} -autoloadwcfg -tclbatch {matrix_mul.tcl}
Vivado Simulator 2019.1
Time resolution is 1 ps
source matrix_mul.tcl
## log_wave -r /
WARNING: [Simtcl 6-197] One or more HDL objects could not be logged because of object type or size limitations.  To see details please rerun the command with -verbose (-v).
## set designtopgroup [add_wave_group "Design Top Signals"]
## set coutputgroup [add_wave_group "C Outputs" -into $designtopgroup]
## set C_group [add_wave_group C(memory) -into $coutputgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/C_V_d0 -into $C_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/C_V_we0 -into $C_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/C_V_ce0 -into $C_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/C_V_address0 -into $C_group -radix hex
## set cinputgroup [add_wave_group "C Inputs" -into $designtopgroup]
## set B_group [add_wave_group B(memory) -into $cinputgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_V_q1 -into $B_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_V_ce1 -into $B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_V_address1 -into $B_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_V_q0 -into $B_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_V_ce0 -into $B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_V_address0 -into $B_group -radix hex
## set A_group [add_wave_group A(memory) -into $cinputgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_V_q1 -into $A_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_V_ce1 -into $A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_V_address1 -into $A_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_V_q0 -into $A_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_V_ce0 -into $A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_V_address0 -into $A_group -radix hex
## set blocksiggroup [add_wave_group "Block-level IO Handshake" -into $designtopgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_start -into $blocksiggroup
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_done -into $blocksiggroup
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_idle -into $blocksiggroup
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_ready -into $blocksiggroup
## set resetgroup [add_wave_group "Reset" -into $designtopgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_rst -into $resetgroup
## set clockgroup [add_wave_group "Clock" -into $designtopgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_clk -into $clockgroup
## set testbenchgroup [add_wave_group "Test Bench Signals"]
## set tbinternalsiggroup [add_wave_group "Internal Signals" -into $testbenchgroup]
## set tb_simstatus_group [add_wave_group "Simulation Status" -into $tbinternalsiggroup]
## set tb_portdepth_group [add_wave_group "Port Depth" -into $tbinternalsiggroup]
## add_wave /apatb_matrix_mul_top/AUTOTB_TRANSACTION_NUM -into $tb_simstatus_group -radix hex
## add_wave /apatb_matrix_mul_top/ready_cnt -into $tb_simstatus_group -radix hex
## add_wave /apatb_matrix_mul_top/done_cnt -into $tb_simstatus_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_A_V -into $tb_portdepth_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_B_V -into $tb_portdepth_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_C_V -into $tb_portdepth_group -radix hex
## set tbcoutputgroup [add_wave_group "C Outputs" -into $testbenchgroup]
## set tb_C_group [add_wave_group C(memory) -into $tbcoutputgroup]
## add_wave /apatb_matrix_mul_top/C_V_d0 -into $tb_C_group -radix hex
## add_wave /apatb_matrix_mul_top/C_V_we0 -into $tb_C_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/C_V_ce0 -into $tb_C_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/C_V_address0 -into $tb_C_group -radix hex
## set tbcinputgroup [add_wave_group "C Inputs" -into $testbenchgroup]
## set tb_B_group [add_wave_group B(memory) -into $tbcinputgroup]
## add_wave /apatb_matrix_mul_top/B_V_q1 -into $tb_B_group -radix hex
## add_wave /apatb_matrix_mul_top/B_V_ce1 -into $tb_B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/B_V_address1 -into $tb_B_group -radix hex
## add_wave /apatb_matrix_mul_top/B_V_q0 -into $tb_B_group -radix hex
## add_wave /apatb_matrix_mul_top/B_V_ce0 -into $tb_B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/B_V_address0 -into $tb_B_group -radix hex
## set tb_A_group [add_wave_group A(memory) -into $tbcinputgroup]
## add_wave /apatb_matrix_mul_top/A_V_q1 -into $tb_A_group -radix hex
## add_wave /apatb_matrix_mul_top/A_V_ce1 -into $tb_A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/A_V_address1 -into $tb_A_group -radix hex
## add_wave /apatb_matrix_mul_top/A_V_q0 -into $tb_A_group -radix hex
## add_wave /apatb_matrix_mul_top/A_V_ce0 -into $tb_A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/A_V_address0 -into $tb_A_group -radix hex
## save_wave_config matrix_mul.wcfg
## run all

// Inter-Transaction Progress: Completed Transaction / Total Transaction
// Intra-Transaction Progress: Measured Latency / Latency Estimation * 100%
//
// RTL Simulation : "Inter-Transaction Progress" ["Intra-Transaction Progress"] @ "Simulation Time"

// RTL Simulation : 0 / 1 [0.00%] @ "165000"
// RTL Simulation : 1 / 1 [100.00%] @ "1245000"

$finish called at time : 1365 ns : File "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/matrix_mul.autotb.v" Line 334
## quit
INFO: [Common 17-206] Exiting xsim at Fri Mar 12 17:51:29 2021...
INFO: [COSIM 212-316] Starting C post checking ...
C[0,0]56
C[0,1]62
C[0,2]68
C[0,3]74
C[1,0]152
C[1,1]174
C[1,2]196
C[1,3]218
C[2,0]248
C[2,1]286
C[2,2]324
C[2,3]362
C[3,0]344
C[3,1]398
C[3,2]452
C[3,3]506
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
INFO: [COSIM 212-211] II is measurable only when transaction number is greater than 1 in RTL simulation. Otherwise, they will be marked as all NA. If user wants to calculate them, please make sure there are at least 2 transactions in RTL simulation.
Finished C/RTL cosimulation.

Open Wave Viewer

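The tool automatically opens Vivado.
There are too many waveforms, so only the port signals are selected here.
Hold Shift and click the first and last signals to select them all, then right-click and choose Add to Wave Window.
The selected signals then appear in the wave window.
Click Float to show the waveform window full screen.
(The tutorial says that start and done each take one cycle, giving 32 + 2 = 34 cycles in total.)
Apart from the first element of C, which takes three cycles to appear, every subsequent element of C is produced two cycles after the previous one.

To compute one element of C per cycle, add ARRAY_PARTITION directives: matrix A is partitioned along dimension 2 and matrix B along dimension 1. Each resulting sub-array then needs only a single-port RAM.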

(3) ARRAY_PARTITION

Add the ARRAY_PARTITION directives.
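A sketch of the kernel with both partitioning directives in place, assuming the same reconstructed code and element types as before (`check_partitioned` is a hypothetical helper). The dimensions follow the synthesis log, which reports A partitioned completely in dimension 2 and B in dimension 1. Each of the four values of A and B that one element of C needs then comes from its own small memory.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int8_t  in_t;   // assumed input type
typedef std::int16_t out_t;  // assumed output type
const int N = 4;

void matrix_mul(in_t A[N][N], in_t B[N][N], out_t C[N][N]) {
// Split A by column index and B by row index, so A[i][0..3] and B[0..3][j]
// can all be read in the same cycle.
#pragma HLS ARRAY_PARTITION variable=A complete dim=2
#pragma HLS ARRAY_PARTITION variable=B complete dim=1
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
#pragma HLS PIPELINE II=1
            C[i][j] = 0;
            for (int k = 0; k < N; ++k) {
                C[i][j] = static_cast<out_t>(C[i][j] + A[i][k] * B[k][j]);
            }
        }
    }
}

// Hypothetical check with A = B = {0, ..., 15}, matching the logged C values.
void check_partitioned() {
    in_t A[N][N], B[N][N];
    out_t C[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            A[i][j] = B[i][j] = static_cast<in_t>(i * N + j);
    matrix_mul(A, B, C);
    assert(C[1][1] == 174);
    assert(C[2][3] == 362);
}
```

The partition dimensions mirror the access pattern: the unrolled loop varies k, which is the second index of A and the first index of B.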
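Note: before re-running C/RTL Cosimulation, close the Vivado window opened earlier. The new files overwrite the previous ones, and the run fails with an error otherwise.

After modifying the code, run C Synthesis again.

Run log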

Starting C synthesis ...
D:/Xilinx201901/Vivado/2019.1/bin/vivado_hls.bat D:/xxxx/hls/matrix/matrix/solution1/csynth.tcl
INFO: [HLS 200-10] Running 'D:/Xilinx201901/Vivado/2019.1/bin/unwrapped/win64.o/vivado_hls.exe'
INFO: [HLS 200-10] For user 'summer' on host 'desktop-dck1feh' (Windows NT_amd64 version 6.2) on Fri Mar 12 20:31:28 +0800 2021
INFO: [HLS 200-10] In directory 'D:/xxxx/hls/matrix'
Sourcing Tcl script 'D:/xxxx/hls/matrix/matrix/solution1/csynth.tcl'
INFO: [HLS 200-10] Opening project 'D:/xxxx/hls/matrix/matrix'.
INFO: [HLS 200-10] Adding design file 'matrix/matrix_mul.h' to the project
INFO: [HLS 200-10] Adding design file 'matrix/matrix_mul.cpp' to the project
INFO: [HLS 200-10] Adding test bench file 'matrix/main.cpp' to the project
INFO: [HLS 200-10] Opening solution 'D:/xxxx/hls/matrix/matrix/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 30ns.
INFO: [HLS 200-10] Setting target device to 'xc7z020-clg400-1'
INFO: [SYN 201-201] Setting up clock 'default' with a period of 30ns.
INFO: [SCHED 204-61] Option 'relax_ii_for_timing' is enabled, will increase II to preserve clock frequency constraints.
INFO: [HLS 200-10] Analyzing design file 'matrix/matrix_mul.cpp' ... 
INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:00:04 ; elapsed = 00:00:29 . Memory (MB): peak = 178.824 ; gain = 87.109
INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:00:04 ; elapsed = 00:00:29 . Memory (MB): peak = 178.824 ; gain = 87.109
INFO: [HLS 200-10] Starting code transformations ...
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:00:04 ; elapsed = 00:00:31 . Memory (MB): peak = 178.824 ; gain = 87.109
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 00:00:04 ; elapsed = 00:00:31 . Memory (MB): peak = 178.824 ; gain = 87.109
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'Loop-1.1' (matrix/matrix_mul.cpp:8) in function 'matrix_mul' for pipelining.
INFO: [HLS 200-489] Unrolling loop 'Loop-1.1.1' (matrix/matrix_mul.cpp:12) in function 'matrix_mul' completely with a factor of 4.
INFO: [XFORM 203-101] Partitioning array 'A.V' (matrix/matrix_mul.cpp:2) in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'B.V' (matrix/matrix_mul.cpp:2) in dimension 1 completely.
INFO: [XFORM 203-11] Balancing expressions in function 'matrix_mul' (matrix/matrix_mul.cpp:2)...3 expression(s) balanced.
INFO: [HLS 200-111] Finished Pre-synthesis Time (s): cpu = 00:00:04 ; elapsed = 00:00:31 . Memory (MB): peak = 178.824 ; gain = 87.109
INFO: [XFORM 203-541] Flattening a loop nest 'Loop-1' (matrix/matrix_mul.cpp:6:13) in function 'matrix_mul'.
INFO: [HLS 200-111] Finished Architecture Synthesis Time (s): cpu = 00:00:04 ; elapsed = 00:00:31 . Memory (MB): peak = 178.824 ; gain = 87.109
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'matrix_mul' ...
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'matrix_mul' 
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining loop 'Loop 1'.
INFO: [SCHED 204-61] Pipelining result : Target II = 1, Final II = 1, Depth = 2.
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111]  Elapsed time: 31.549 seconds; current allocated memory: 100.991 MB.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111]  Elapsed time: 0.195 seconds; current allocated memory: 101.240 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'matrix_mul' 
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/A_0_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/A_1_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/A_2_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/A_3_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/B_0_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/B_1_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/B_2_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/B_3_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/C_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on function 'matrix_mul' to 'ap_ctrl_hs'.
INFO: [SYN 201-210] Renamed object name 'matrix_mul_mac_muladd_8s_8s_16ns_16_1_1' to 'matrix_mul_mac_mubkb' due to the length limit 20
INFO: [RTGEN 206-100] Generating core module 'matrix_mul_mac_mubkb': 2 instance(s).
INFO: [RTGEN 206-100] Finished creating RTL model for 'matrix_mul'.
INFO: [HLS 200-111]  Elapsed time: 0.335 seconds; current allocated memory: 101.667 MB.
INFO: [HLS 200-111] Finished generating all RTL models Time (s): cpu = 00:00:06 ; elapsed = 00:00:34 . Memory (MB): peak = 178.824 ; gain = 87.109
INFO: [VHDL 208-304] Generating VHDL RTL for matrix_mul.
INFO: [VLOG 209-307] Generating Verilog RTL for matrix_mul.
INFO: [HLS 200-112] Total elapsed time: 34.524 seconds; peak allocated memory: 101.667 MB.
Finished C synthesis.

Synthesis result: the scheduler now meets the target, with Final II = 1 and Depth = 2 (see the log above).

Run C/RTL Cosimulation

Run log

Starting C/RTL cosimulation ...
D:/Xilinx201901/Vivado/2019.1/bin/vivado_hls.bat D:/xxxx/hls/matrix/matrix/solution1/cosim.tcl
INFO: [HLS 200-10] Running 'D:/Xilinx201901/Vivado/2019.1/bin/unwrapped/win64.o/vivado_hls.exe'
INFO: [HLS 200-10] For user 'summer' on host 'desktop-dck1feh' (Windows NT_amd64 version 6.2) on Fri Mar 12 21:04:07 +0800 2021
INFO: [HLS 200-10] In directory 'D:/xxxx/hls/matrix'
Sourcing Tcl script 'D:/xxxx/hls/matrix/matrix/solution1/cosim.tcl'
INFO: [HLS 200-10] Opening project 'D:/xxxx/hls/matrix/matrix'.
INFO: [HLS 200-10] Opening solution 'D:/xxxx/hls/matrix/matrix/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 30ns.
INFO: [HLS 200-10] Setting target device to 'xc7z020-clg400-1'
INFO: [COSIM 212-47] Using XSIM for RTL simulation.
INFO: [COSIM 212-14] Instrumenting C test bench ...
   Build using "D:/Xilinx201901/Vivado/2019.1/msys64/mingw64/bin/g++"
   Compiling matrix_mul.cpp_pre.cpp.tb.cpp
   Compiling main.cpp_pre.cpp.tb.cpp
   Compiling apatb_matrix_mul.cpp
   Generating cosim.tv.exe
INFO: [COSIM 212-302] Starting C TB testing ... 
C[0,0]56
C[0,1]62
C[0,2]68
C[0,3]74
C[1,0]152
C[1,1]174
C[1,2]196
C[1,3]218
C[2,0]248
C[2,1]286
C[2,2]324
C[2,3]362
C[3,0]344
C[3,1]398
C[3,2]452
C[3,3]506
INFO: [COSIM 212-333] Generating C post check test bench ...
INFO: [COSIM 212-12] Generating RTL test bench ...
INFO: [COSIM 212-1] *** C/RTL co-simulation file generation completed. ***
INFO: [COSIM 212-323] Starting verilog simulation. 
INFO: [COSIM 212-15] Starting XSIM ...

D:\xxxx\hls\matrix\matrix\solution1\sim\verilog>set PATH= 

D:\xxxx\hls\matrix\matrix\solution1\sim\verilog>call D:/Xilinx201901/Vivado/2019.1/bin/xelab xil_defaultlib.apatb_matrix_mul_top glbl -prj matrix_mul.prj -L smartconnect_v1_0 -L axi_protocol_checker_v1_1_12 -L axi_protocol_checker_v1_1_13 -L axis_protocol_checker_v1_1_11 -L axis_protocol_checker_v1_1_12 -L xil_defaultlib -L unisims_ver -L xpm --initfile "D:/Xilinx201901/Vivado/2019.1/data/xsim/ip/xsim_ip.ini" --lib "ieee_proposed=./ieee_proposed" -s matrix_mul -debug wave 
INFO: [XSIM 43-3496] Using init file passed via -initfile option "D:/Xilinx201901/Vivado/2019.1/data/xsim/ip/xsim_ip.ini".
Vivado Simulator 2019.1
Copyright 1986-1999, 2001-2019 Xilinx, Inc. All Rights Reserved.
Running: D:/Xilinx201901/Vivado/2019.1/bin/unwrapped/win64.o/xelab.exe xil_defaultlib.apatb_matrix_mul_top glbl -prj matrix_mul.prj -L smartconnect_v1_0 -L axi_protocol_checker_v1_1_12 -L axi_protocol_checker_v1_1_13 -L axis_protocol_checker_v1_1_11 -L axis_protocol_checker_v1_1_12 -L xil_defaultlib -L unisims_ver -L xpm --initfile D:/Xilinx201901/Vivado/2019.1/data/xsim/ip/xsim_ip.ini --lib ieee_proposed=./ieee_proposed -s matrix_mul -debug wave 
Multi-threading is on. Using 6 slave threads.
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/glbl.v" into library work
INFO: [VRFC 10-311] analyzing module glbl
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_A_0_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_A_0_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_A_1_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_A_1_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_A_2_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_A_2_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_A_3_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_A_3_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_B_0_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_B_0_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_B_1_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_B_1_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_B_2_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_B_2_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_B_3_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_B_3_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/AESL_automem_C_V.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module AESL_automem_C_V
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/matrix_mul.autotb.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module apatb_matrix_mul_top
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/matrix_mul.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module matrix_mul
INFO: [VRFC 10-2263] Analyzing SystemVerilog file "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/matrix_mul_mac_mubkb.v" into library xil_defaultlib
INFO: [VRFC 10-311] analyzing module matrix_mul_mac_mubkb_DSP48_0
INFO: [VRFC 10-311] analyzing module matrix_mul_mac_mubkb
Starting static elaboration
Completed static elaboration
Starting simulation data flow analysis
Completed simulation data flow analysis
Time Resolution for simulation is 1ps
Compiling module xil_defaultlib.matrix_mul_mac_mubkb_DSP48_0
Compiling module xil_defaultlib.matrix_mul_mac_mubkb(ID=1,NUM_ST...
Compiling module xil_defaultlib.matrix_mul
Compiling module xil_defaultlib.AESL_automem_A_0_V
Compiling module xil_defaultlib.AESL_automem_A_1_V
Compiling module xil_defaultlib.AESL_automem_A_2_V
Compiling module xil_defaultlib.AESL_automem_A_3_V
Compiling module xil_defaultlib.AESL_automem_B_0_V
Compiling module xil_defaultlib.AESL_automem_B_1_V
Compiling module xil_defaultlib.AESL_automem_B_2_V
Compiling module xil_defaultlib.AESL_automem_B_3_V
Compiling module xil_defaultlib.AESL_automem_C_V
Compiling module xil_defaultlib.apatb_matrix_mul_top
Compiling module work.glbl
Built simulation snapshot matrix_mul

****** Webtalk v2019.1 (64-bit)
  **** SW Build 2552052 on Fri May 24 14:49:42 MDT 2019
  **** IP Build 2548770 on Fri May 24 18:01:18 MDT 2019
    ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

source D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/xsim.dir/matrix_mul/webtalk/xsim_webtalk.tcl -notrace
INFO: [Common 17-186] 'D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/xsim.dir/matrix_mul/webtalk/usage_statistics_ext_xsim.xml' has been successfully sent to Xilinx on Fri Mar 12 21:04:47 2021. For additional details about this file, please refer to the WebTalk help file at D:/Xilinx201901/Vivado/2019.1/doc/webtalk_introduction.html.
webtalk_transmit: Time (s): cpu = 00:00:00 ; elapsed = 00:00:06 . Memory (MB): peak = 109.488 ; gain = 17.836
INFO: [Common 17-206] Exiting Webtalk at Fri Mar 12 21:04:47 2021...

****** xsim v2019.1 (64-bit)
  **** SW Build 2552052 on Fri May 24 14:49:42 MDT 2019
  **** IP Build 2548770 on Fri May 24 18:01:18 MDT 2019
    ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

source xsim.dir/matrix_mul/xsim_script.tcl
# xsim {matrix_mul} -autoloadwcfg -tclbatch {matrix_mul.tcl}
Vivado Simulator 2019.1
Time resolution is 1 ps
source matrix_mul.tcl
## log_wave -r /
WARNING: [Simtcl 6-197] One or more HDL objects could not be logged because of object type or size limitations.  To see details please rerun the command with -verbose (-v).
## set designtopgroup [add_wave_group "Design Top Signals"]
## set coutputgroup [add_wave_group "C Outputs" -into $designtopgroup]
## set C_group [add_wave_group C(memory) -into $coutputgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/C_V_d0 -into $C_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/C_V_we0 -into $C_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/C_V_ce0 -into $C_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/C_V_address0 -into $C_group -radix hex
## set cinputgroup [add_wave_group "C Inputs" -into $designtopgroup]
## set B_group [add_wave_group B(memory) -into $cinputgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_3_V_q0 -into $B_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_3_V_ce0 -into $B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_3_V_address0 -into $B_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_2_V_q0 -into $B_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_2_V_ce0 -into $B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_2_V_address0 -into $B_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_1_V_q0 -into $B_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_1_V_ce0 -into $B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_1_V_address0 -into $B_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_0_V_q0 -into $B_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_0_V_ce0 -into $B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/B_0_V_address0 -into $B_group -radix hex
## set A_group [add_wave_group A(memory) -into $cinputgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_3_V_q0 -into $A_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_3_V_ce0 -into $A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_3_V_address0 -into $A_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_2_V_q0 -into $A_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_2_V_ce0 -into $A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_2_V_address0 -into $A_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_1_V_q0 -into $A_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_1_V_ce0 -into $A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_1_V_address0 -into $A_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_0_V_q0 -into $A_group -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_0_V_ce0 -into $A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/A_0_V_address0 -into $A_group -radix hex
## set blocksiggroup [add_wave_group "Block-level IO Handshake" -into $designtopgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_start -into $blocksiggroup
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_done -into $blocksiggroup
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_idle -into $blocksiggroup
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_ready -into $blocksiggroup
## set resetgroup [add_wave_group "Reset" -into $designtopgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_rst -into $resetgroup
## set clockgroup [add_wave_group "Clock" -into $designtopgroup]
## add_wave /apatb_matrix_mul_top/AESL_inst_matrix_mul/ap_clk -into $clockgroup
## set testbenchgroup [add_wave_group "Test Bench Signals"]
## set tbinternalsiggroup [add_wave_group "Internal Signals" -into $testbenchgroup]
## set tb_simstatus_group [add_wave_group "Simulation Status" -into $tbinternalsiggroup]
## set tb_portdepth_group [add_wave_group "Port Depth" -into $tbinternalsiggroup]
## add_wave /apatb_matrix_mul_top/AUTOTB_TRANSACTION_NUM -into $tb_simstatus_group -radix hex
## add_wave /apatb_matrix_mul_top/ready_cnt -into $tb_simstatus_group -radix hex
## add_wave /apatb_matrix_mul_top/done_cnt -into $tb_simstatus_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_A_0_V -into $tb_portdepth_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_A_1_V -into $tb_portdepth_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_A_2_V -into $tb_portdepth_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_A_3_V -into $tb_portdepth_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_B_0_V -into $tb_portdepth_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_B_1_V -into $tb_portdepth_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_B_2_V -into $tb_portdepth_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_B_3_V -into $tb_portdepth_group -radix hex
## add_wave /apatb_matrix_mul_top/LENGTH_C_V -into $tb_portdepth_group -radix hex
## set tbcoutputgroup [add_wave_group "C Outputs" -into $testbenchgroup]
## set tb_C_group [add_wave_group C(memory) -into $tbcoutputgroup]
## add_wave /apatb_matrix_mul_top/C_V_d0 -into $tb_C_group -radix hex
## add_wave /apatb_matrix_mul_top/C_V_we0 -into $tb_C_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/C_V_ce0 -into $tb_C_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/C_V_address0 -into $tb_C_group -radix hex
## set tbcinputgroup [add_wave_group "C Inputs" -into $testbenchgroup]
## set tb_B_group [add_wave_group B(memory) -into $tbcinputgroup]
## add_wave /apatb_matrix_mul_top/B_3_V_q0 -into $tb_B_group -radix hex
## add_wave /apatb_matrix_mul_top/B_3_V_ce0 -into $tb_B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/B_3_V_address0 -into $tb_B_group -radix hex
## add_wave /apatb_matrix_mul_top/B_2_V_q0 -into $tb_B_group -radix hex
## add_wave /apatb_matrix_mul_top/B_2_V_ce0 -into $tb_B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/B_2_V_address0 -into $tb_B_group -radix hex
## add_wave /apatb_matrix_mul_top/B_1_V_q0 -into $tb_B_group -radix hex
## add_wave /apatb_matrix_mul_top/B_1_V_ce0 -into $tb_B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/B_1_V_address0 -into $tb_B_group -radix hex
## add_wave /apatb_matrix_mul_top/B_0_V_q0 -into $tb_B_group -radix hex
## add_wave /apatb_matrix_mul_top/B_0_V_ce0 -into $tb_B_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/B_0_V_address0 -into $tb_B_group -radix hex
## set tb_A_group [add_wave_group A(memory) -into $tbcinputgroup]
## add_wave /apatb_matrix_mul_top/A_3_V_q0 -into $tb_A_group -radix hex
## add_wave /apatb_matrix_mul_top/A_3_V_ce0 -into $tb_A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/A_3_V_address0 -into $tb_A_group -radix hex
## add_wave /apatb_matrix_mul_top/A_2_V_q0 -into $tb_A_group -radix hex
## add_wave /apatb_matrix_mul_top/A_2_V_ce0 -into $tb_A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/A_2_V_address0 -into $tb_A_group -radix hex
## add_wave /apatb_matrix_mul_top/A_1_V_q0 -into $tb_A_group -radix hex
## add_wave /apatb_matrix_mul_top/A_1_V_ce0 -into $tb_A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/A_1_V_address0 -into $tb_A_group -radix hex
## add_wave /apatb_matrix_mul_top/A_0_V_q0 -into $tb_A_group -radix hex
## add_wave /apatb_matrix_mul_top/A_0_V_ce0 -into $tb_A_group -color #ffff00 -radix hex
## add_wave /apatb_matrix_mul_top/A_0_V_address0 -into $tb_A_group -radix hex
## save_wave_config matrix_mul.wcfg
## run all

// Inter-Transaction Progress: Completed Transaction / Total Transaction
// Intra-Transaction Progress: Measured Latency / Latency Estimation * 100%
//
// RTL Simulation : "Inter-Transaction Progress" ["Intra-Transaction Progress"] @ "Simulation Time"

// RTL Simulation : 0 / 1 [0.00%] @ "165000"
// RTL Simulation : 1 / 1 [100.00%] @ "765000"

$finish called at time : 885 ns : File "D:/xxxx/hls/matrix/matrix/solution1/sim/verilog/matrix_mul.autotb.v" Line 622
## quit
INFO: [Common 17-206] Exiting xsim at Fri Mar 12 21:04:58 2021...
INFO: [COSIM 212-316] Starting C post checking ...
C[0,0]56
C[0,1]62
C[0,2]68
C[0,3]74
C[1,0]152
C[1,1]174
C[1,2]196
C[1,3]218
C[2,0]248
C[2,1]286
C[2,2]324
C[2,3]362
C[3,0]344
C[3,1]398
C[3,2]452
C[3,3]506
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
INFO: [COSIM 212-211] II is measurable only when transaction number is greater than 1 in RTL simulation. Otherwise, they will be marked as all NA. If user wants to calculate them, please make sure there are at least 2 transactions in RTL simulation.
Finished C/RTL cosimulation.

Waveform
[image]
The cycle count has dropped from 169 to 34 to 18, which meets our requirement.
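The combination that reached 18 cycles can be sketched as follows. This is a minimal sketch, not the author's source file: plain `int` stands in for the design's data type (the `_V` suffixes in the log suggest an arbitrary-precision type), the partition dimensions are inferred from the `A_0`..`A_3` and `B_0`..`B_3` memories in the log, and `fill_inputs` is a hypothetical test pattern chosen because it reproduces the logged results.

```cpp
// Sketch only. Outside Vivado HLS the pragmas are ignored, so this
// compiles as ordinary C++ (possibly with unknown-pragma warnings).
const int N = 4;

void matrix_mul(int A[N][N], int B[N][N], int C[N][N]) {
// Assumed dimensions: split A by column and B by row so the four
// operand pairs of one dot product can be read in the same cycle.
#pragma HLS ARRAY_PARTITION variable=A complete dim=2
#pragma HLS ARRAY_PARTITION variable=B complete dim=1
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
// Pipelining this loop unrolls the inner k loop automatically.
#pragma HLS PIPELINE II=1
            int acc = 0;
            for (int k = 0; k < N; k++) {
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}

// Hypothetical input pattern (an assumption, not taken from the post):
// both matrices hold 0..15 in row-major order.
void fill_inputs(int A[N][N], int B[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            A[i][j] = i * N + j;
            B[i][j] = i * N + j;
        }
    }
}
```

With this pattern, `C[0][0]` is 56 and `C[3][3]` is 506, the first and last values printed during C post checking.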

Resource comparison: no optimization, PIPELINE only, and PIPELINE with ARRAY_PARTITION

The figure below shows, in order, [no optimization], [PIPELINE only], and [PIPELINE with ARRAY_PARTITION].
[image]

(4) ARRAY_RESHAPE

Add the directive

[image]
[image]
After adding ARRAY_RESHAPE, the code looks like this:
[image]
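Since the screenshot is not reproduced here, the following is a hedged sketch of what the source might look like with ARRAY_RESHAPE in place of ARRAY_PARTITION. The `complete` type and the `dim` values are assumptions that mirror the usual partition settings for this kernel, and plain `int` stands in for the design's data type.

```cpp
// Sketch only. Outside Vivado HLS the pragmas are ignored.
const int N = 4;

void matrix_mul_reshape(int A[N][N], int B[N][N], int C[N][N]) {
// Assumed settings: merge each row of A and each column of B into
// one wide word, so a single read returns all four operands.
#pragma HLS ARRAY_RESHAPE variable=A complete dim=2
#pragma HLS ARRAY_RESHAPE variable=B complete dim=1
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
#pragma HLS PIPELINE II=1
            int acc = 0;
            for (int k = 0; k < N; k++) {
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}
```

The C behaviour is the same as with partitioning; only the generated memory interface differs.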

Run Synthesis

Result
[image]

What reshape cannot do compared with partition

[image]
[image]

[image]
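As general background on how the two directives differ, ARRAY_PARTITION produces several separate memories, each with its own address port, while ARRAY_RESHAPE produces one memory with wider words, so the elements packed into a word share a single address. The following is a plain software model of that wide-word layout, not HLS code; the 8-bit element width, the byte order, and the helper names `pack_row` and `unpack` are assumptions for illustration.

```cpp
#include <cstdint>

const int N = 4;

// Pack four 8-bit elements into one 32-bit word.
// Assumed order: element 0 lands in the least significant byte.
std::uint32_t pack_row(const std::uint8_t row[N]) {
    std::uint32_t word = 0;
    for (int k = 0; k < N; k++) {
        word |= static_cast<std::uint32_t>(row[k]) << (8 * k);
    }
    return word;
}

// Extract element k from a packed word.
std::uint8_t unpack(std::uint32_t word, int k) {
    return static_cast<std::uint8_t>((word >> (8 * k)) & 0xFFu);
}
```

In this model, reading one packed word yields a whole row at once, but two elements from different rows still need two reads, whereas fully separate memories could each be addressed independently.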

Run C/RTL Cosimulation

[image]
This also meets our requirement.
[image]

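(5) Latency test

[image]
[image]
Latency test result: without the latency directive the latency was 1. After setting the latency to 10, the computation time rose from 18 cycles to 27, which is 9 more than before and matches the expected increase of 10 - 1 = 9.
[image]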
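A hedged sketch of how such a test might look in source form is shown below. The post does not show where the directive was placed, so putting it inside the pipelined loop body and using `min=10` are both assumptions; plain `int` again stands in for the design's data type.

```cpp
// Sketch only. Outside Vivado HLS the pragmas are ignored.
const int N = 4;

void matrix_mul_latency(int A[N][N], int B[N][N], int C[N][N]) {
#pragma HLS ARRAY_PARTITION variable=A complete dim=2
#pragma HLS ARRAY_PARTITION variable=B complete dim=1
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
#pragma HLS PIPELINE II=1
// Assumed placement: ask the scheduler for at least 10 cycles of
// latency per iteration of the pipelined loop body.
#pragma HLS LATENCY min=10
            int acc = 0;
            for (int k = 0; k < N; k++) {
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}
```

The directive changes only the schedule, not the computed values, so the C simulation results stay the same while the cycle count reported by synthesis and co-simulation grows.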


Reposted from blog.csdn.net/weixin_40162095/article/details/114695910