Xilinx FIFO Generator needs to pay attention to the resource consumption of BRAMs

Recommended articles

  1. Xilinx FIFO Generator needs to pay attention to RST reset
  2. Xilinx FIFO Generator needs to pay attention to Actual Depth
  3. Xilinx FIFO Generator needs to pay attention to asymmetric bit width
  4. Xilinx FIFO Generator needs to pay attention to the resource consumption of BRAMs

Background

I assumed that once I had picked an FPGA with 16 Mb of BRAM, I would not need to think about internal RAM usage; 16 Mb seemed like plenty.

As it turned out, I was careless.

I had read the official memory manual, UG473 (7 Series FPGAs Memory Resources), before, but I never paid much attention to how the bit width (WIDTH) and depth (DEPTH) of an internal RAM affect the amount of BRAM it uses.

Only when a BRAM overflow was reported during synthesis and implementation did I realize that the BRAM was being used inefficiently. The message was:

[Place 30-640] Place Check:This design requires more BRAM36/FIFO cells than are avaliable in the target device. This Design requires 534 of such cell types but only 445 compatible sites are avaliable in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.

The last time I saw this message, ILA debugging was capturing too much data, which also overflowed the BRAM.

Block RAM Introduction

According to the manual, 7 series devices provide 36Kb block RAMs. Each 36Kb block RAM contains two independently controlled 18Kb RAMs.

These 36Kb blocks can be cascaded to achieve deeper or wider storage with minimal timing loss.

According to the table, the Block RAMs of the 7K325T are arranged in 7 columns of 70 each, yet the device has a total of 445 36Kb Block RAMs (7 * 70 = 490, so 45 seem to be missing).

[Image: block RAM resource table from UG473]
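The figures quoted so far can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are my own, and the 534 figure is taken from the Place 30-640 message quoted earlier.

```python
# Figures quoted in the text for the 7K325T (names are illustrative only).
COLUMNS = 7
SITES_PER_COLUMN = 70
AVAILABLE_BRAM36 = 445   # 36Kb block RAMs available in the device
REQUIRED_BRAM36 = 534    # requested by the design, per the Place 30-640 message

full_grid = COLUMNS * SITES_PER_COLUMN          # sites if every column were full
missing_sites = full_grid - AVAILABLE_BRAM36    # the "45 fewer" noted above
total_kbits = AVAILABLE_BRAM36 * 36             # total block RAM capacity in Kb
shortfall = REQUIRED_BRAM36 - AVAILABLE_BRAM36  # how far the design overshoots

print(full_grid, missing_sites)                   # 490 45
print(total_kbits, round(total_kbits / 1024, 2))  # 16020 15.64
print(shortfall)                                  # 89
```

So the "16 Mb" device really offers about 15.6 Mb, and the design in question asked for 89 more 36Kb blocks than exist.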

RAM in FIFO

There is also an introduction to the use of internal resources in the manual:

Embedded dual-port or single-port RAM modules, ROM modules, synchronous FIFOs and data width converters are implemented using the Xilinx CORE Generator™ block memory modules. The CORE Generator FIFO Generator module can be used to generate a dual-clock FIFO. A synchronous or asynchronous (dual-clock) FIFO implementation does not require additional CLB resources for the FIFO control logic, because it uses dedicated hardware resources.

In other words, if block RAM is selected as the storage type of a FIFO, the capacity of the FIFO is limited by the capacity of the on-chip BRAM.

The manual also lists the capacity of each FIFO type as combinations of depth and bit width:

[Image: FIFO capacity table (depth and bit width per FIFO type) from UG473]

In Standard mode, an 18Kb BRAM is a RAM block 18 bits wide and 1K deep. Similarly, a 36Kb BRAM is a RAM block 36 bits wide and 1K deep.

As the table above shows, the widest configuration of one 18Kb BRAM is 36 bits * 512, and the widest configuration of one 36Kb BRAM is 72 bits * 512.

FIFO bit width and depth

If the bit width and depth of the user FIFO do not match an entry in the table above, the FIFO Generator automatically calculates the number of BRAMs required.

Examples are as follows:

  • If the bit width is 16bit and the depth is not greater than 1024, one 18K BRAM is required;
  • If the bit width is 32bit and the depth is not greater than 512, one 18K BRAM is required;

If the bit width is greater than 36 bits, a 36K BRAM is required; for example, if the bit width is 40 bits and the depth is not greater than 512, one 36K BRAM is required (at 1K depth a 36K BRAM is only 36 bits wide).
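The rule behind these examples can be sketched as a small lookup. The depth/width pairs below are the 18Kb and 36Kb FIFO configurations as I read them from UG473; the function names are hypothetical, and the sketch ignores details such as the actual usable depth reported by the FIFO Generator.

```python
# (depth, width) configurations per primitive, as listed in UG473.
# Treat these tables as an assumption and check them against the manual.
FIFO18_MODES = [(4096, 4), (2048, 9), (1024, 18), (512, 36)]
FIFO36_MODES = [(8192, 4), (4096, 9), (2048, 18), (1024, 36), (512, 72)]


def fits(modes, width_bits, depth):
    """True if some configuration is at least as wide and as deep as requested."""
    return any(width_bits <= w and depth <= d for d, w in modes)


def smallest_single_primitive(width_bits, depth):
    """Return '18K', '36K', or None if no single primitive is large enough."""
    if fits(FIFO18_MODES, width_bits, depth):
        return "18K"
    if fits(FIFO36_MODES, width_bits, depth):
        return "36K"
    return None


print(smallest_single_primitive(16, 1024))  # 18K
print(smallest_single_primitive(32, 512))   # 18K
print(smallest_single_primitive(40, 512))   # 36K
```

A result of `None` means the FIFO has to be built from several primitives, which is the case discussed next.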

Now take a FIFO that is very wide and very shallow, for example 256 bits wide and 16 deep.

In this case, 36K and 18K BRAMs must first be combined to make up the bit width. An 18K BRAM supports a bit width of up to 36 bits; a 36K BRAM supports a bit width of up to 72 bits.

72 * 4 = 288 >= 256

Therefore at least four 36K BRAMs are required, each in its widest configuration with a depth of 512. The Summary page of the FIFO Generator reports the same BRAM allocation:

[Image: FIFO Generator Summary page for the 256-bit, depth-16 FIFO]

The data to be buffered amounts to only 256 * 16 = 4 Kb, but it is implemented with 36 * 4 = 144 Kb of BRAM.

This is why the BRAM eventually ran out.

Now take a bit width of 512 bits and a depth of 16:

72 * 7 + 36 = 540 >= 512

Therefore at least seven 36K BRAMs and one 18K BRAM are required, again with a depth of 512. The Summary page of the FIFO Generator reports the same BRAM allocation:

[Image: FIFO Generator Summary page for the 512-bit, depth-16 FIFO]

As before, because the FIFO is so wide and so little of the depth is used, most of the allocated BRAM is wasted.
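The two worked examples can be reproduced with a short script. It is a simplified model of the width tiling described above, valid only for depths up to 512, and not the FIFO Generator's actual algorithm; the function names are my own.

```python
BITS_36K = 36 * 1024  # capacity of one 36K BRAM in bits
BITS_18K = 18 * 1024  # capacity of one 18K BRAM in bits


def tile_width(width_bits):
    """Return (n36, n18) needed to cover width_bits at depth <= 512.

    Assumes a 36K BRAM contributes 72 bits and an 18K BRAM 36 bits.
    """
    n36, remainder = divmod(width_bits, 72)
    n18 = 0
    if remainder > 36:
        n36 += 1   # leftover too wide for an 18K, take another 36K
    elif remainder > 0:
        n18 = 1    # leftover fits in a single 18K
    return n36, n18


def utilisation(width_bits, depth):
    """Fraction of the allocated BRAM bits that actually hold data."""
    n36, n18 = tile_width(width_bits)
    allocated = n36 * BITS_36K + n18 * BITS_18K
    return width_bits * depth / allocated


for width in (256, 512):
    n36, n18 = tile_width(width)
    print(width, n36, n18, f"{utilisation(width, 16):.1%}")
# 256 4 0 2.8%
# 512 7 1 3.0%
```

In both cases less than 3% of the allocated BRAM holds data.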

How to avoid the waste of BRAM

Choose the bit width and depth of the FIFO so that one BRAM is used as fully as possible before a second or further BRAM is needed.

For example, suppose you want to store 48 words of 768 bits each in the FIFO.

By the calculation above, this takes eleven 36K BRAMs, and only 48 of the 512 available depth entries are used.

Instead, you can split each 768-bit word into 4 fragments of 192 bits and write them to the FIFO at a width of 192 bits, so that the depth becomes 48 * 4 = 192.

This needs only three 36K BRAMs. A lot of space is still wasted, but far less than before.
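To see how the split factor affects the BRAM count, the same counting rule can be applied to several candidate splits. The sketch below counts in 18K-sized units (36 bits of width each) and pairs them into 36K blocks; it assumes the depth stays at or below 512, and the splits by 2 and 8 are my own additions for comparison.

```python
WORDS = 48       # number of 768-bit words to buffer
WORD_BITS = 768


def bram_cost(width_bits):
    """Return (n36, n18) for a FIFO of this width at depth <= 512."""
    halves = -(-width_bits // 36)  # ceiling division: 36-bit slices needed
    return divmod(halves, 2)       # two slices make one 36K, an odd one is an 18K


results = {}
for split in (1, 2, 4, 8):
    width = WORD_BITS // split
    depth = WORDS * split
    assert depth <= 512, "model only covers the 512-deep configuration"
    results[split] = (width, depth, *bram_cost(width))
    print(split, results[split])
# 1 (768, 48, 11, 0)
# 2 (384, 96, 5, 1)
# 4 (192, 192, 3, 0)
# 8 (96, 384, 1, 1)
```

Each tuple is (width, depth, number of 36K BRAMs, number of 18K BRAMs). Under this model, splitting further keeps reducing the BRAM count until the depth reaches 512.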

Depending on the requirements, after the data is read out of the FIFO it must also be concatenated back into the original 768-bit word (which is somewhat cumbersome).
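The splitting and re-splicing can be modelled in software before writing the HDL. The following is a behavioural sketch using Python integers as bit vectors; the fragment order (least significant fragment first) is an assumption, and the real design only needs the write side and the read side to agree on it.

```python
WORD_BITS = 768
FRAGMENT_BITS = 192
FRAGMENTS = WORD_BITS // FRAGMENT_BITS  # 4
MASK = (1 << FRAGMENT_BITS) - 1


def split_word(word):
    """Split a 768-bit word into four 192-bit fragments, low fragment first."""
    return [(word >> (i * FRAGMENT_BITS)) & MASK for i in range(FRAGMENTS)]


def join_fragments(fragments):
    """Reassemble the fragments read from the FIFO into one 768-bit word."""
    word = 0
    for i, fragment in enumerate(fragments):
        word |= fragment << (i * FRAGMENT_BITS)
    return word


original = (1 << 767) | 0xABCDEF  # arbitrary test pattern
fragments = split_word(original)
restored = join_fragments(fragments)
print(len(fragments), restored == original)  # 4 True
```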

Some limitations

The drawback is that data transfer takes longer, because each word is split into fragments that are written over several clock cycles. The more fragments, the longer it takes.

In addition, the read and write bit width of the FIFO is reduced while the read and write clocks remain unchanged, so the bandwidth of the FIFO is reduced as well.

I have not found another way to handle this yet. If you know a better one, please leave a comment.
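The cost can be put into numbers. The sketch below assumes one FIFO write per clock cycle and uses a hypothetical 100 MHz clock purely for illustration; the article does not state the actual clock frequency.

```python
CLOCK_HZ = 100_000_000  # hypothetical clock frequency, not from the article


def fifo_bandwidth_bps(width_bits, clock_hz):
    """Peak FIFO bandwidth in bits per second at one transfer per cycle."""
    return width_bits * clock_hz


def cycles_to_move(words, fragments_per_word):
    """Clock cycles needed to write all words when each is sent in fragments."""
    return words * fragments_per_word


print(fifo_bandwidth_bps(768, CLOCK_HZ))  # 76800000000
print(fifo_bandwidth_bps(192, CLOCK_HZ))  # 19200000000
print(cycles_to_move(48, 1), cycles_to_move(48, 4))  # 48 192
```

Splitting the 768-bit word into four fragments therefore cuts the peak bandwidth to a quarter and takes four times as many cycles to move the same 48 words.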

Still have questions

Why did Xilinx choose 36K as the capacity of a Block RAM?

Summary

When using a FIFO, pay attention to the data bit width and depth and to their effect on the amount of BRAM occupied, so that the design does not require more BRAM than the FPGA provides.

Origin blog.csdn.net/sinat_31206523/article/details/111938672