HLS optimization and debugging tips

  1. Decide the optimization direction first: analyze the data flow and the data dependencies, weigh resources against throughput, and settle on optimization goals that are realistically achievable;

  2. Use the HLS synthesis report to analyze the overall latency and its breakdown, then start optimizing from the module that contributes most to the latency;

  3. When the results of csim and cosim disagree, the usual cause is an uninitialized variable or array: in csim an uninitialized variable typically reads as 0, whereas in cosim it does not. Sometimes a variable is left uninitialized because the design intends it to be written before it is read, but the actual execution order turns out to be different, so check this carefully. The other common cause of a mismatch is an out-of-bounds index when accessing an array.

  4. When csim passes but cosim hangs: if the design uses dataflow, check whether an empty or full stream is stalling one of the dataflow stages. If the design does not use dataflow, the cause is most likely one of the problems mentioned above.

  5. When optimizing the C model, the most important thing is to think like a hardware engineer. Here are some optimization tips:

    1. Optimize loops

      1. Reduce the loop nesting depth;

      2. Merge loops at the same nesting level;

    2. When the same BRAM address is read and then written (with an order and a count that are not fixed), the storage can be duplicated to break the dependency, at the cost of twice the resources;

    3. Arrays that are never in use at the same time can share the same storage through time-multiplexing;

    4. The INLINE pragma allows DSP resources to be reused as much as possible, but it affects timing, so the two must be weighed together. Taking multipliers as an example: when they are shared, the number of multipliers actually instantiated is the largest number of multiplications performed in the same clock cycle;

    5. When writing the code, never overlook incompletely initialized BRAM or the unknown values read through an out-of-range array index; either can cause cosim errors;

    6. For arrays of two or more dimensions, first consider whether the number of stored dimensions can be reduced, for example by computing the values along a dimension on the fly instead of storing them;

    7. Storage and latency trade off against each other. In the innermost loops especially, reading stored values instead of computing them on the fly can reduce latency without a large increase in storage;

    8. Pragma behavior differs between HLS tool versions, as do the optimization and synthesis strategies, so even when the tool reaches the expected II and resource usage on its own, it is best to add the appropriate pragmas explicitly.
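
As an illustration of the loop-merging tip and of stating the pragma explicitly, here is a minimal sketch in C++. The function names, the array length `N`, and the arithmetic are hypothetical and chosen only for the example; `#pragma HLS PIPELINE II=1` follows the Vivado HLS pragma syntax and is ignored by an ordinary C++ compiler. Whether the tool actually reaches II=1 depends on the design and the tool version.

```cpp
#include <cassert>
#include <cstdint>

constexpr int N = 64;  // hypothetical array length, for illustration only

// Before: two loops with the same trip count and an intermediate array.
// Scheduled one after the other, they cost roughly 2*N iterations plus
// the storage for tmp.
void scale_then_offset_split(const int32_t in[N], int32_t tmp[N], int32_t out[N]) {
    for (int i = 0; i < N; ++i) {
        tmp[i] = in[i] * 3;
    }
    for (int i = 0; i < N; ++i) {
        out[i] = tmp[i] + 7;
    }
}

// After: the two loops are merged into one, the intermediate array is gone,
// and the pipeline target is written out instead of left to the tool.
void scale_then_offset_merged(const int32_t in[N], int32_t out[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] * 3 + 7;
    }
}

// Small csim-style self-check: both versions must produce identical output.
void check_equivalence() {
    int32_t in[N], tmp[N], a[N], b[N];
    for (int i = 0; i < N; ++i) {
        in[i] = i - 10;
    }
    scale_then_offset_split(in, tmp, a);
    scale_then_offset_merged(in, b);
    for (int i = 0; i < N; ++i) {
        assert(a[i] == b[i]);
    }
}
```

Keeping the original version next to the optimized one and comparing them in the testbench is a cheap way to confirm that a restructuring has not changed the result.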

  6. For more information, please refer to the [HLS Development Collection](https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_3/ug902-vivado-high-level-synthesis.pdf).
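
The two usual causes of a csim/cosim mismatch described above, uninitialized storage and out-of-range array indices, can both be guarded against in the C code. The sketch below is only an illustration under assumed names: `histogram`, `BINS`, `SAMPLES`, and the binning rule are hypothetical, not taken from any particular design.

```cpp
#include <cassert>
#include <cstdint>

constexpr int BINS = 16;      // hypothetical number of histogram bins
constexpr int SAMPLES = 128;  // hypothetical number of input samples

void histogram(const uint8_t samples[SAMPLES], uint32_t hist[BINS]) {
    uint32_t local[BINS];

    // Explicit reset: do not rely on the memory happening to hold zeros
    // in csim, because the RTL memory in cosim gives no such guarantee.
    for (int b = 0; b < BINS; ++b) {
        local[b] = 0;
    }

    for (int i = 0; i < SAMPLES; ++i) {
        int idx = samples[i] / 8;  // 0..31, which can exceed BINS - 1
        if (idx >= BINS) {
            idx = BINS - 1;        // clamp instead of indexing out of range
        }
        assert(idx >= 0 && idx < BINS);  // sanity check during C simulation
        local[idx] += 1;
    }

    // Write every output element, so no entry of hist is left undefined.
    for (int b = 0; b < BINS; ++b) {
        hist[b] = local[b];
    }
}
```

Filling the output buffer with a non-zero pattern in the testbench before calling the function makes it easier to notice, already in csim, any element that the design never writes.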


Origin blog.csdn.net/u010379248/article/details/101988933