The Road to Learning Tcl (4): Vivado Design Analysis

1. Objects in FPGA chip architecture

  With an elaborated, synthesized, or implemented design open, you can use the following commands to obtain the desired SLICE sites. SLICEs come in two types, SLICEL and SLICEM, and each contains LUTs, flip-flops, MUXes, and carry logic.

set all_slice [get_sites SLICE*]
set col_slice [get_sites SLICE_X0Y*]
set all_sliceL [get_sites -filter "SITE_TYPE == SLICEL"]
set all_sliceM [get_sites -filter "SITE_TYPE == SLICEM"]
# Use llength to check how many resources were returned
llength $all_slice
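
As a rough sketch of how the returned list can be inspected, the loop below tallies one column of SLICE sites by their SITE_TYPE property. It assumes a design is open in Vivado; the array name site_count is only illustrative.

```tcl
# Tally the SLICE sites in column X0 by site type (requires an open design).
array unset site_count
foreach site [get_sites SLICE_X0Y*] {
    set type [get_property SITE_TYPE $site]
    incr site_count($type)
}
# Print one line per site type, e.g. SLICEL and SLICEM.
foreach type [lsort [array names site_count]] {
    puts "$type: $site_count($type)"
}
```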

  A BEL (Basic Element) is the basic unit inside the FPGA. It is a device object, that is, part of the device structure, so even with an empty design you can still see BELs as soon as you open the Device view. BELs include flip-flops, lookup tables, carry chains, F7MUX, F8MUX, and F9MUX (taking the UltraScale family as an example; all of these basic units sit inside a SLICE). BELs also include the basic units inside a DSP. Use get_bels to obtain BEL resources.

get_bels -of [get_sites SLICE_X0Y0]
# Here =~ means pattern matching
get_bels -of [get_sites SLICE_X0Y0] -filter "TYPE =~ *6LUT || TYPE =~ *FF"

  Each SLICE is a basic site. In addition to SLICE, there are also DSP48, BLOCK RAM and other sites. One or more sites of the same type can form a tile.

get_tiles

  Different tiles are arranged in columns to form a clock region.

get_clock_regions

  An SLR (super logic region) consists of multiple clock regions. Single-die devices (the die being the silicon before it is packaged) contain only one SLR, while multi-die devices, also known as SSI devices, contain two or more SLRs.

get_slrs

  Once you have the objects, you can highlight them with highlight_objects, list them in a separate window with show_objects, and mark them with mark_objects. For highlighting and marking, the get_ variants (get_highlighted_objects, get_marked_objects) return the objects currently highlighted or marked, and the un variants (unhighlight_objects, unmark_objects) undo the effect.
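
The following sketch shows these commands used together. It assumes Vivado is running in GUI mode with a design open; the site pattern is just an example.

```tcl
# Highlight one column of slices, read the highlighted set back, then clear it.
set col_slice [get_sites SLICE_X0Y*]
highlight_objects -color red $col_slice
puts "highlighted objects: [llength [get_highlighted_objects]]"
unhighlight_objects $col_slice

# Marking follows the same pattern.
mark_objects -color blue $col_slice
puts "marked objects: [llength [get_marked_objects]]"
unmark_objects $col_slice
```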

2. Objects in the netlist

  When analyzing or debugging a design, or when writing constraints, you need to find the objects described by the RTL code, such as registers, memories, arithmetic units, or a particular clock, pin, net, or port.
  The five key object types in the netlist are cells, clocks, pins, nets, and ports. You can use the following commands to obtain them:

get_cells
get_clocks
get_pins
get_nets
get_ports
# current_instance sets the current hierarchy level; with no arguments, it resets it to the top-level module of the design
current_instance 

  Use -hier to search for the target objects through every level of the design hierarchy.

set dut [get_cells -hier ip_*]
# Open a new window that lists the objects returned by the search
show_objects $dut -name dut
# You can also search by a specific object property, such as NAME
set ip [get_cells -hier -filter "NAME =~ ip*" U0]

  Among these properties, another important one is REF_NAME (reference name). The reference names of common 7 series FPGA primitives are listed below.

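Description                                              REF_NAME
D flip-flop with clock enable and asynchronous clear     FDCE
D flip-flop with clock enable and asynchronous preset    FDPE
D flip-flop with clock enable and synchronous reset      FDRE
D flip-flop with clock enable and synchronous set        FDSE
Latch with asynchronous clear                            LDCE
Latch with asynchronous preset                           LDPE
Arithmetic unit built from DSP48                         DSP48E1
36 Kb FIFO built from Block RAM                          FIFO36E1
36 Kb memory built from Block RAM                        RAMB36E1
18 Kb FIFO built from Block RAM                          FIFO18E1
18 Kb memory built from Block RAM                        RAMB18E1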

  Normally, when looking for pins or nets, first obtain the cell they belong to, and then use -of to find the pins or nets of that cell.
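
A minimal sketch of this cell-first approach is shown below. The instance name u_core is hypothetical; replace it with a hierarchy that exists in your design, since an empty cell list makes the later -of lookups fail.

```tcl
# Step 1: cells. All FDRE flip-flops below the hypothetical instance u_core.
set ffs [get_cells -hier -filter {REF_NAME == FDRE && NAME =~ u_core/*}]
# Step 2: pins of those cells, restricted to the D input.
set d_pins [get_pins -of $ffs -filter {REF_PIN_NAME == D}]
# Step 3: nets connected to those pins.
set d_nets [get_nets -of $d_pins]
puts "[llength $ffs] cells, [llength $d_pins] pins, [llength $d_nets] nets"
```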

3. Clock analysis

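# Generate the clock report, which shows clock name, period, duty cycle, attributes, and source
report_clocks
# Generate the clock network report, which shows which clocks are missing a period constraint and whether any BUFGs are cascaded
report_clock_networks -name network_1
# Generate the clock resource utilization report
report_clock_utilization -name clkuti1
# Condensed report that focuses on the roots of the clock trees
report_clock_utilization -clock_roots_only -name clkuti1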

  In 7 series FPGAs, clock resources include the global clock buffer (BUFGCTRL), the regional clock buffers (BUFH/BUFR/BUFMR/BUFIO), and the clock generation modules (MMCM/PLL).

4. Timing analysis

  There are two ways to generate timing reports. One is to run report_timing or report_timing_summary directly. The other is to first obtain specific timing paths with get_timing_paths and then pass them to report_timing. The meaning of some common options is given below:

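-from                Start point of the timing path; can be a port, pin, cell, or clock
-to                  End point of the timing path; can be a port, pin, cell, or clock
-through             Node the timing path passes through; can be a pin, cell, or net
-delay_type          Delay type for the analysis: min analyzes hold, max analyzes setup, min_max analyzes both
-hold                Same as min
-setup               Same as max
-max_paths           Maximum number of timing paths to analyze (minimum value 1)
-nworst              Number of worst timing paths reported per end point (default 1)
-slack_lesser_than   Only analyze paths with slack less than the given value
-slack_greater_than  Only analyze paths with slack greater than the given value
-group               Analyze the timing paths of the given group, obtained with get_path_groups or defined with group_path
-of_objects          Timing path objects to report, as returned by get_timing_paths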
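
The second approach could look like the sketch below, which assumes a synthesized or implemented design with timing constraints is open. The report name failing_setup is arbitrary.

```tcl
# Collect up to 10 setup paths with negative slack, one per end point.
set paths [get_timing_paths -setup -max_paths 10 -nworst 1 -slack_lesser_than 0]

# Print a one-line summary per path using timing path properties.
foreach p $paths {
    puts [format "%8.3f ns  levels=%s  %s -> %s" \
        [get_property SLACK $p] \
        [get_property LOGIC_LEVELS $p] \
        [get_property STARTPOINT_PIN $p] \
        [get_property ENDPOINT_PIN $p]]
}

# Generate a full timing report for exactly these paths.
if {[llength $paths] > 0} {
    report_timing -of_objects $paths -name failing_setup
}
```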

5. Quality analysis

# Run after synthesis or after place and route
report_qor_assessment

  After running the command, check the QoR Assessment Score. The score ranges from 1 to 5; the higher the score, the easier timing closure will be. A score of 3 or below means timing needs to be improved first, and continuing with the later implementation steps is most likely not worthwhile.
  In the second part of the report, pay attention to Status. An item marked REVIEW has a large impact on timing closure and needs to be resolved.

# Generate a report of suggestions for improving design quality
report_qor_suggestions

  Running report_qor_suggestions on a routed .dcp can also return strategy recommendations based on machine learning. For this to work, the original implementation strategy must be Explore or Default.

6. Resource utilization analysis

# Generate the resource utilization report
report_utilization -name util -file util.rpt
# The report can also be saved to an .xlsx file
report_utilization -name util -spreadsheet_file util_table.xlsx -spreadsheet_table "Hierarchy"

7. Logic level analysis

  The number of logic levels is the number of combinational logic stages between the start cell and the end cell of a timing path. As a rule of thumb, one lookup table plus one net contributes about 0.5 ns of delay. Pipelining and retiming are common ways to reduce logic levels and achieve timing closure.

# Analyze the logic level distribution
report_design_analysis -logic_level_distribution -logic_level_dist_paths 100 -min_level 10 -max_level 100  -name logiclecela
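
Taking the 0.5 ns per level rule of thumb at face value, a rough logic level budget for a given clock period can be estimated as below. This is only a back-of-the-envelope sketch in plain Tcl: the proc name max_logic_levels is made up for illustration, and the estimate ignores clock skew, uncertainty, and setup time.

```tcl
# Rough upper bound on logic levels for a clock period, assuming each level
# (one LUT plus one net) costs delay_per_level_ns.
proc max_logic_levels {period_ns {delay_per_level_ns 0.5}} {
    return [expr {int(floor(double($period_ns) / $delay_per_level_ns))}]
}

# 200 MHz clock, 5 ns period: prints 10
puts [max_logic_levels 5.0]
```

Paths whose logic level count is well above this budget are natural candidates for pipelining or retiming.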

8. Complexity and congestion analysis

report_design_analysis -complexity -name cplx

  Three parameters deserve special attention: Rent, Average Fanout, and Total Instances.
  Rent reflects how heavily a module is interconnected. The higher the value, the heavier the interconnect. Heavy interconnect means the design consumes a large amount of global routing resources, which leads to routing congestion.

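Rent range
0~0.65     Normal
0.65~0.85  Needs extra attention if Total Instances exceeds 15000
>0.85      Place and route will fail if Total Instances exceeds 15000
Average Fanout range
<4   Normal
4~5  Placement may run into congestion. For an SSI device with Total Instances above 100000, it is difficult to fit the design into one SLR or to spread it across two SLRs
>5   Place and route may fail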

  Causes of a high Rent value include high LUT6 utilization (which also leads to high fanout) and high Block RAM and DSP utilization. When Rent or Average Fanout is high, you can synthesize the affected modules out of context (OOC) to prevent the tool from optimizing across module boundaries, which reduces LUT6 usage. You can also use block-level synthesis and its LUT_COMBINING option to prevent LUT combining and reduce LUT6 usage.
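
As an illustration of the block-level synthesis approach, the XDC line below turns off LUT combining for a single module. The instance name u_dense is hypothetical, and the constraint is assumed to live in an XDC file that is read by synthesis.

```tcl
# Hypothetical instance u_dense: replace with the hierarchical cell that shows
# a high Rent or Average Fanout in the complexity report.
# Disable LUT combining for this module only (block-level synthesis option).
set_property BLOCK_SYNTH.LUT_COMBINING 0 [get_cells u_dense]
```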

# Analyze the level of congestion
report_design_analysis -congestion -name cong

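  In the report, pay special attention to Type and Level.

Type        Cause
Global      High LUT6 utilization, too many control sets, unreasonable placement constraints
Long        High BRAM or DSP utilization, too many die-crossing nets
Short       High MUXF or carry chain utilization
Level       QoR impact
4 or below  Little impact
5           Some difficulty during place and route
6           Many difficulties; compile time increases significantly
7           Place and route will fail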

9. Fan-out analysis

# -load_types shows the load types
report_high_fanout_nets -load_types -name high
# -clock_regions shows the number of loads in each clock region
report_high_fanout_nets -clock_regions -name high
# -fanout_greater_than and -fanout_lesser_than restrict the fanout range
report_high_fanout_nets -fanout_greater_than 1000 -fanout_lesser_than 2000 -name high
# -cells restricts the analysis to the given cells, -max_nets limits the number of nets analyzed, and -timing shows timing information for the paths the nets are on
report_high_fanout_nets -cells cpuEngine -max_nets 4 -timing -name high
# Suppose 18 of the pins on net reset_reg are data pins; the following code finds those 18 pins
set net [get_nets reset_reg]
set mypin [get_pins -of $net -filter "DIRECTION==IN" -leaf]
set target_pin [filter $mypin "REF_PIN_NAME != CLR && REF_PIN_NAME != R && REF_PIN_NAME != PRE && REF_PIN_NAME != S"]
show_objects $target_pin -name data_pin

10. UFDM analysis

  UFDM stands for UltraFast Design Methodology. It is a design methodology proposed by Xilinx for Vivado that covers board-level planning, coding style, timing constraints, timing closure, and more.

report_methodology -name ufdm_1

11. Cross-clock domain analysis

report_clock_interaction -delay_type min_max -name timing_1
# Analyze clock domain crossing (CDC) paths for problems at the HDL level and at the constraint level
report_cdc -name cdc1

12. Constraint analysis

report_exceptions -name exceptions_1
# -coverage shows the coverage of the timing exception constraints
report_exceptions -coverage -name exceptions_2
# -write_valid_exceptions writes out the valid timing exceptions in the design; -write_merged_exceptions writes out the merged timing exceptions; both must be used together with -file
report_exceptions -write_valid_exceptions -file ./valid_exceptions.rpt
report_exceptions -write_merged_exceptions -file ./merged_exceptions.rpt
# Write the valid timing constraints to the specified file; -exclude_physical leaves out physical constraints
write_xdc -constraints VALID -exclude_physical ./valid_timing_constraints.xdc

Origin blog.csdn.net/weixin_44126785/article/details/132100353