Tools/Plugins -- CACTI: a Cache/Memory analysis tool

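I recently came across a tool that can estimate DRAM access power. It is useful for designs that need to analyze the access power and latency of off-chip memory (DRAM), for example deep learning accelerators.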

1. Introduction

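CACTI is an analysis tool that takes a set of cache/memory parameters as input and computes the access time, power, cycle time, and area. It has been updated to version 7.0 and supports analysis of the following memory types: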

  • direct mapped caches
  • set-associative caches
  • fully associative caches
  • Embedded DRAM memories
  • Commodity DRAM memories

It also provides the following features:

  • Support for multi-ported uniform cache access (UCA) and multi-banked, multi-ported non-uniform cache access (NUCA).

  • Leakage power calculation that also takes temperature into account.

  • Router power model.

  • Interconnect model with different delay, power, and area properties including low-swing wire model.

  • An interface to perform trade-off analysis involving power, delay, area, and bandwidth.

  • All process specific values used by the tool are obtained from ITRS and currently, the tool supports 90nm, 65nm, 45nm, and 32nm technology nodes.

  • Chip IO model to calculate latency and energy for DDR bus. Users can model different loads (fan-outs) and evaluate the impact on frequency and energy. This model can be used to study LR-DIMMs, R-DIMMs, etc.

2. Usage

Source code: https://github.com/HewlettPackard/cacti
Technical report: http://www.hpl.hp.com/techreports/2013/HPL-2013-79.pdf

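I could not get it running on Windows (the C++ toolchain there lacked pthread, and I did not find a simple workaround), so I tested directly on CentOS. The basic usage is as follows: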

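  1. Download the C++ source from the source-code URL above and put it on a CentOS system.
  2. Enter the source directory and run make on the command line.
  3. Once the executable named cacti has been built, run
    ./cacti -infile ***.cfg
    where the .cfg file configures the memory properties and needs to be adjusted to match the DRAM in use. Here I simply ran one of the configuration files from the bundled samples: ./cacti -infile sample_config_files/ddr3_cache.cfg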
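
The run step above can also be scripted. The following is a minimal Python sketch, assuming the cacti binary has already been built with make inside the source directory. The helper names build_command and run_cacti are hypothetical and not part of CACTI; the only CACTI option used is the -infile flag shown above. The exit code is returned rather than checked so that the printed report is kept even if the process ends abnormally.

```python
import subprocess
from pathlib import Path


def build_command(cacti_dir, cfg_file):
    """Return the argument list for `./cacti -infile <cfg_file>`."""
    # Hypothetical helper: cacti_dir is the directory where `make` was run.
    # The binary path is made absolute so it still resolves after the
    # working directory is switched to cacti_dir.
    binary = Path(cacti_dir).resolve() / "cacti"
    return [str(binary), "-infile", str(cfg_file)]


def run_cacti(cacti_dir, cfg_file):
    """Run CACTI inside its source directory; return (exit code, report text)."""
    result = subprocess.run(
        build_command(cacti_dir, cfg_file),
        cwd=cacti_dir,        # cfg_file is interpreted relative to the source dir
        capture_output=True,
        text=True,
        check=False,          # do not raise on a non-zero or abnormal exit
    )
    return result.returncode, result.stdout
```

Calling run_cacti(".", "sample_config_files/ddr3_cache.cfg") from the source directory would correspond to the sample command above.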

At the end you get a detailed analysis report, pasted here:

Cache size                    : 8388608
Block size                    : 64
Associativity                 : 8
Read only ports               : 0
Write only ports              : 0
Read write ports              : 1
Single ended read ports       : 0
Cache banks (UCA)             : 1
Technology                    : 0.022
Temperature                   : 360
Tag size                      : 42
array type                    : Cache
Model as memory               : 0
Model as 3D memory       	 : 0
Access mode                   : 0
Data array cell type          : 0
Data array peripheral type    : 0
Tag array cell type           : 0
Tag array peripheral type     : 0
Optimization target           : 2
Design objective (UCA wt)     : 0 0 0 100 0
Design objective (UCA dev)    : 20 100000 100000 100000 100000
Cache model                   : 0
Nuca bank                     : 0
Wire inside mat               : 1
Wire outside mat              : 1
Interconnect projection       : 1
Wire signaling               : 1
Print level                   : 1
ECC overhead                  : 1
Page size                     : 8192
Burst length                  : 8
Internal prefetch width       : 8
Force cache config            : 0
Subarray Driver direction       : 1
iostate                       : READ
dram_ecc                      : NO_ECC
io_type                     : DDR3
dram_dimm                      : UDIMM
IO Area (sq.mm) = inf
IO Timing Margin (ps) = 35.8333
IO Votlage Margin (V) = 0.155
IO Dynamic Power (mW) = 1282.42 PHY Power (mW) = 232.752 PHY Wakeup Time (us) = 27.503
IO Termination and Bias Power (mW) = 3136.7

---------- CACTI (version 7.0.3DD Prerelease of Aug, 2012), Uniform Cache Access SRAM Model ----------

Cache Parameters:
    Total cache size (bytes): 8388608
    Number of banks: 1
    Associativity: 8
    Block size (bytes): 64
    Read/write Ports: 1
    Read ports: 0
    Write ports: 0
    Technology size (nm): 22

    Access time (ns): 3.03414
    Cycle time (ns):  1.84197
    Total dynamic read energy per access (nJ): 0.381869
    Total dynamic write energy per access (nJ): 0.446873
    Total leakage power of a bank (mW): 2520.29
    Total gate leakage power of a bank (mW): 4.71441
    Cache height x width (mm): 3.07383 x 2.89775

    Best Ndwl : 8
    Best Ndbl : 8
    Best Nspd : 2
    Best Ndcm : 1
    Best Ndsam L1 : 8
    Best Ndsam L2 : 1

    Best Ntwl : 16
    Best Ntbl : 8
    Best Ntspd : 8
    Best Ntcm : 1
    Best Ntsam L1 : 8
    Best Ntsam L2 : 2
    Data array, H-tree wire type: Global wires with 30% delay penalty
    Tag array, H-tree wire type: Global wires with 30% delay penalty

Time Components:

  Data side (with Output driver) (ns): 3.03414
	H-tree input delay (ns): 0.860695
	Decoder + wordline delay (ns): 0.607741
	Bitline delay (ns): 0.473783
	Sense Amplifier delay (ns): 0.00189739
	H-tree output delay (ns): 1.09002

  Tag side (with Output driver) (ns): 0.866708
	H-tree input delay (ns): 0.250295
	Decoder + wordline delay (ns): 0.0962495
	Bitline delay (ns): 0.078
	Sense Amplifier delay (ns): 0.00189739
	Comparator delay (ns): 0.0162774
	H-tree output delay (ns): 0.440265


Power Components:

  Data array: Total dynamic read energy/access  (nJ): 0.360657
	Total energy in H-tree (that includes both address and data transfer) (nJ): 0.270396
	Output Htree inside bank Energy (nJ): 0.263979
	Decoder (nJ): 0.000237668
	Wordline (nJ): 0.000275334
	Bitline mux & associated drivers (nJ): 0
	Sense amp mux & associated drivers (nJ): 0
	Bitlines precharge and equalization circuit (nJ): 0.00163006
	Bitlines (nJ): 0.0612354
	Sense amplifier energy (nJ): 0.0018371
	Sub-array output driver (nJ): 0.0249178
	Total leakage power of a bank (mW): 2357.99
	Total leakage power in H-tree (that includes both address and data network) ((mW)): 18.9776
	Total leakage power in cells (mW): 0
	Total leakage power in row logic(mW): 0
	Total leakage power in column logic(mW): 0
	Total gate leakage power in H-tree (that includes both address and data network) ((mW)): 0.0916133

  Tag array:  Total dynamic read energy/access (nJ): 0.0212128
	Total leakage read/write power of a bank (mW): 162.298
	Total energy in H-tree (that includes both address and data transfer) (nJ): 0.00268136
	Output Htree inside a bank Energy (nJ): 0.00104879
	Decoder (nJ): 0.000585105
	Wordline (nJ): 0.000356972
	Bitline mux & associated drivers (nJ): 0
	Sense amp mux & associated drivers (nJ): 0.000288214
	Bitlines precharge and equalization circuit (nJ): 0.00153419
	Bitlines (nJ): 0.0132631
	Sense amplifier energy (nJ): 0.00155643
	Sub-array output driver (nJ): 8.13397e-05
	Total leakage power of a bank (mW): 162.298
	Total leakage power in H-tree (that includes both address and data network) ((mW)): 0.23223
	Total leakage power in cells (mW): 0
	Total leakage power in row logic(mW): 0
	Total leakage power in column logic(mW): 0
	Total gate leakage power in H-tree (that includes both address and data network) ((mW)): 0.00146699


Area Components:

  Data array: Area (mm2): 7.28836
	Height (mm): 3.07383
	Width (mm): 2.3711
	Area efficiency (Memory cell area/Total area) - 73.1983 %
		MAT Height (mm): 0.716448
		MAT Length (mm): 0.540768
		Subarray Height (mm): 0.328909
		Subarray Length (mm): 0.26532

  Tag array: Area (mm2): 0.377107
	Height (mm): 0.716051
	Width (mm): 0.526648
	Area efficiency (Memory cell area/Total area) - 74.9106 %
		MAT Height (mm): 0.173381
		MAT Length (mm): 0.063873
		Subarray Height (mm): 0.0822272
		Subarray Length (mm): 0.027995

Wire Properties:

  Delay Optimal
	Repeater size - 42.0297 
	Repeater spacing - 0.0329013 (mm) 
	Delay - 0.216837 (ns/mm) 
	PowerD - 0.000279845 (nJ/mm) 
	PowerL - 0.0215298 (mW/mm) 
	PowerLgate - 9.15623e-05 (mW/mm)
	Wire width - 0.022 microns
	Wire spacing - 0.022 microns

  5% Overhead
	Repeater size - 17.0297 
	Repeater spacing - 0.0329013 (mm) 
	Delay - 0.226875 (ns/mm) 
	PowerD - 0.0001818 (nJ/mm) 
	PowerL - 0.00872349 (mW/mm) 
	PowerLgate - 3.70994e-05 (mW/mm)
	Wire width - 0.022 microns
	Wire spacing - 0.022 microns

  10% Overhead
	Repeater size - 15.0297 
	Repeater spacing - 0.0329013 (mm) 
	Delay - 0.235988 (ns/mm) 
	PowerD - 0.000174237 (nJ/mm) 
	PowerL - 0.00769899 (mW/mm) 
	PowerLgate - 3.27424e-05 (mW/mm)
	Wire width - 0.022 microns
	Wire spacing - 0.022 microns

  20% Overhead
	Repeater size - 12.0297 
	Repeater spacing - 0.0329013 (mm) 
	Delay - 0.257722 (ns/mm) 
	PowerD - 0.00016297 (nJ/mm) 
	PowerL - 0.00616223 (mW/mm) 
	PowerLgate - 2.62069e-05 (mW/mm)
	Wire width - 0.022 microns
	Wire spacing - 0.022 microns

  30% Overhead
	Repeater size - 10.0297 
	Repeater spacing - 0.0329013 (mm) 
	Delay - 0.28134 (ns/mm) 
	PowerD - 0.000155511 (nJ/mm) 
	PowerL - 0.00513773 (mW/mm) 
	PowerLgate - 2.18498e-05 (mW/mm)
	Wire width - 0.022 microns
	Wire spacing - 0.022 microns

  Low-swing wire (1 mm) - Note: Unlike repeated wires, 
	delay and power values of low-swing wires do not
	have a linear relationship with length. 
	delay - 0.0902442 (ns) 
	powerD - 2.8399e-06 (nJ) 
	PowerL - 1.71796e-07 (mW) 
	PowerLgate - 1.29017e-09 (mW)
	Wire width - 4.4e-08 microns
	Wire spacing - 4.4e-08 microns


Segmentation fault

In this output, the lines

Cache Parameters:
    Total dynamic read energy per access (nJ): 0.381869
    Total dynamic write energy per access (nJ): 0.446873

give the dynamic energy of a single read access and a single write access.
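
To use these two numbers in a design estimate, they can be pulled out of the report text and scaled by an access count. The sketch below is one possible way to do that in Python, assuming the report lines keep the "label (unit): value" shape seen above. The function names and the access counts are illustrative assumptions, and the result covers only per-access dynamic energy, not leakage or IO power.

```python
import re

# Matches report lines shaped like "<label> (<unit>): <number>".
LINE_RE = re.compile(
    r"^\s*(?P<label>.+?)\s*\((?P<unit>[^()]+)\):\s*"
    r"(?P<value>[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*$"
)

READ_KEY = ("Total dynamic read energy per access", "nJ")
WRITE_KEY = ("Total dynamic write energy per access", "nJ")


def parse_report(text):
    """Map (label, unit) to the first numeric value reported for it."""
    values = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            key = (m.group("label"), m.group("unit"))
            # Some labels repeat per array; keep the first (top-level) one.
            values.setdefault(key, float(m.group("value")))
    return values


def total_dynamic_energy_nj(values, reads, writes):
    """Estimate dynamic energy in nJ for the given numbers of accesses."""
    return reads * values[READ_KEY] + writes * values[WRITE_KEY]
```

With the values above, one million reads plus one million writes come to 10^6 × (0.381869 + 0.446873) nJ, which is about 0.83 mJ of dynamic energy.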

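For an explanation of the individual configuration-file entries, see the technical report mentioned above; I will look into them further when I have time.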

Reposted from www.cnblogs.com/lyc-seu/p/12934186.html