Capturing L2 cache lines-in events with opcontrol

# Check the cache sizes
$ cat /sys/devices/system/cpu/cpu0/cache/index*/size
32K      # L1 data cache (index0)
32K      # L1 instruction cache (index1)
256K     # L2 cache (index2)
15360K   # L3 cache (index3)

# Check the number of sets in each cache
# (number_of_sets is the set count, not the line size;
#  the line size itself is in coherency_line_size, 64 bytes here)
$ cat /sys/devices/system/cpu/cpu0/cache/index*/number_of_sets
64       # L1 data cache sets
64       # L1 instruction cache sets
512      # L2 cache sets
12288    # L3 cache sets

# Set up capture of the l2_lines_in event
$ sudo opcontrol --setup --event=l2_lines_in:100000

# Clear the sample buffers
$ sudo opcontrol --reset

# Start capturing
$ sudo opcontrol --start

# Run the program
$ java FalseSharing

# After the program finishes, dump the captured data
$ sudo opcontrol --dump

# Stop the profiling daemon
$ sudo opcontrol -h

# Report the results
$ opreport -l `which java`

Example output:

CPU: Intel Sandy Bridge microarchitecture, speed 2300.24 MHz (estimated)
Counted l2_lines_in events (L2 cache lines in) with a unit mask of 0x07 (all L2 cache lines filling L2) count 100000
samples  %        image name               symbol name
14914    100.000  anon (tgid:9752 range:0x7fddb8424000-0x7fddb8694000) anon (tgid:9752 range:0x7fddb8424000-0x7fddb8694000)

In this output, the higher the samples value, the more often the l2_lines_in event fired.
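The post never shows the source of FalseSharing. For reference, here is a minimal sketch of the classic false-sharing benchmark pattern; all names and constants below are my assumptions, not the original code. Four threads each write only their own volatile long, but the four longs are allocated back to back and therefore tend to fall on the same 64-byte cache line:

public final class FalseSharing implements Runnable {
    private static final int NUM_THREADS = 4;   // matches the 4 logical cores used below
    private static final long ITERATIONS = 500L * 1000L * 1000L;

    private static final VolatileLong[] longs = new VolatileLong[NUM_THREADS];
    static {
        for (int i = 0; i < longs.length; i++) {
            longs[i] = new VolatileLong();
        }
    }

    private final int index;

    public FalseSharing(int index) {
        this.index = index;
    }

    public void run() {
        long i = ITERATIONS;
        while (0 != --i) {
            longs[index].value = i;   // each thread writes only its own slot
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new FalseSharing(i));
        }
        long start = System.nanoTime();
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        System.out.println((System.nanoTime() - start) / 1_000_000_000.0 + " s");
    }

    static final class VolatileLong {
        volatile long value = 0L;   // adjacent instances can share one cache line
    }
}

FalseSharing2 would then be the contention-free variant; see the padding sketch after the first results table below.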

My local results follow.

CPU: Intel Sandy Bridge microarchitecture, speed 2300.24 MHz (estimated)

All tests in this first group ran on 4 logical CPU cores (the l1d, l2_lines_in, and l2_trans columns are oprofile sample counts).

Name                                 Notes                                                      Time (s)  l1d   l2_lines_in  l2_trans
FalseSharing                         L1D contention                                             259       1473  37439        156777
FalseSharing2                        no L1D contention                                          40        501   31778        41114
AffinityFalseSharingDifferentSocket  L1D contention, 2 logical cores on each of 2 sockets       217       1527  43370        101298
AffinityFalseSharingSameCore         L1D contention, 4 logical cores on 2 cores of one socket   26        400   13673        22382
AffinityFalseSharingSameSocket       L1D contention, 4 logical cores on 4 cores of one socket   44        912   35217        42213
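FalseSharing2 shows far less contention, presumably because each hot counter is padded onto its own cache line; its source is not shown in the post, so the following is only a sketch of the usual trick:

// Hypothetical sketch (not the original FalseSharing2 source): six extra
// longs add 48 bytes of padding, so together with the object header and
// the 8-byte value, neighbouring instances no longer fall on one
// 64-byte cache line.
final class PaddedVolatileLong {
    volatile long value = 0L;
    long p1, p2, p3, p4, p5, p6;   // padding fields, never read or written
}

Field-layout-based padding is JVM-dependent; from Java 8 onward the same effect is available via @sun.misc.Contended (run with -XX:-RestrictContended).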

A second group of tests, restricted to 2 logical cores:

Name                                  Notes                                                        Time (s)  l1d   l2_lines_in  l2_trans  Expected?
FalseSharing                          L1D contention                                               77        472   18941        25000     roughly as expected
FalseSharing2                         no L1D contention                                            51        271   6316         14925     roughly as expected
AffinityFalseSharingDifferentSocket   L1D contention, one logical core on each of 2 sockets        54        429   16803        28851
AffinityFalseSharing2DifferentSocket  no L1D contention, one logical core on each of 2 sockets     30
AffinityFalseSharingSameCore          L1D contention, 2 logical cores on one core                  21        284   10528        14900     to be confirmed
AffinityFalseSharing2SameCore         no L1D contention, 2 logical cores on one core               31        725   15930        30264     to be confirmed
AffinityFalseSharingSameSocket        L1D contention, 2 logical cores on 2 cores of one socket     35        661   16019        27445
AffinityFalseSharing2SameSocket       no L1D contention, 2 logical cores on 2 cores of one socket  20        322   11877        16513

Appendix 1: descriptions of the cache events used above, as printed by ophelp:

l1d: (counter: all)
        L1D cache events (min count: 2000000)
        Unit masks (default 0x1)
        ----------
        0x01: replacement L1D Data line replacements.
        0x02: allocated_in_m L1D M-state Data Cache Lines Allocated
        0x04: eviction L1D M-state Data Cache Lines Evicted due to replacement (only)
        0x08: all_m_replacement All Modified lines evicted out of L1D

l2_l1d_wb_rqsts: (counter: all)
        writebacks from L1D to the L2 cache (min count: 200000)
        Unit masks (default 0x4)
        ----------
        0x04: hit_e writebacks from L1D to L2 cache lines in E state
        0x08: hit_m writebacks from L1D to L2 cache lines in M state


l1d_pend_miss: (counter: 2)
        Cycles with L1D load Misses outstanding. (min count: 2000000)
        Unit masks (default 0x1)
        ----------
        0x01: pending Cycles with L1D load Misses outstanding.
        0x01: occurences This event counts the number of L1D misses outstanding occurences.
              (extra: edge cmask=1)

l1d_blocks: (counter: all)
        L1D cache blocking events (min count: 100000)
        Unit masks (default 0x1)
        ----------
        0x01: ld_bank_conflict Any dispatched loads cancelled due to DCU bank conflict
        0x05: bank_conflict_cycles Cycles with l1d blocks due to bank conflicts (extra: cmask=1)

l2_trans: (counter: all)
        L2 cache accesses (min count: 200000)
        Unit masks (default 0x80)
        ----------
        0x80: all_requests Transactions accessing L2 pipe
        0x01: demand_data_rd Demand Data Read requests that access L2 cache, includes L1D
              prefetches.
        0x02: rfo RFO requests that access L2 cache
        0x04: code_rd L2 cache accesses when fetching instructions including L1D code prefetches
        0x08: all_pf L2 or LLC HW prefetches that access L2 cache
        0x10: l1d_wb L1D writebacks that access L2 cache
        0x20: l2_fill L2 fill requests that access L2 cache
        0x40: l2_wb L2 writebacks that access L2 cache


l2_lines_in: (counter: all)
        L2 cache lines in (min count: 100000)
        Unit masks (default 0x7)
        ----------
        0x07: all L2 cache lines filling L2
        0x01: i L2 cache lines in I state filling L2
        0x02: s L2 cache lines in S state filling L2
        0x04: e L2 cache lines in E state filling L2


l2_lines_out: (counter: all)
        L2 cache lines out (min count: 100000)
        Unit masks (default 0x1)
        ----------
        0x01: demand_clean Clean line evicted by a demand
        0x02: demand_dirty Dirty line evicted by a demand
        0x04: pf_clean Clean line evicted by an L2 Prefetch
        0x08: pf_dirty Dirty line evicted by an L2 Prefetch
        0x0a: dirty_all Any Dirty line evicted

Appendix 2:

# Inspect the cache hierarchy

$ ls /sys/devices/system/cpu/cpu0/cache/
index0  index1  index2  index3

There are four directories:

index0: L1 data cache
index1: L1 instruction cache
index2: L2 cache
index3: L3 cache (this is the cache size reported in /proc/cpuinfo)


The files in each directory describe that cache. Taking this machine's cpu0/index0 as an example:

File                     Content   Meaning
type                     Data      a data cache (index1 reports Instruction)
level                    1         L1
size                     32K       32K total
coherency_line_size      64        64-byte cache line
physical_line_partition  1
ways_of_associativity    4         4-way set associative
number_of_sets           128       128 sets
shared_cpu_map           00000101  this cache is shared by CPU0 and CPU8

The size is consistent: 64 bytes/line * 4 ways * 128 sets = 32K.
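This cross-check can also be done programmatically. A minimal sketch, assuming the same sysfs layout as above (Linux only; the class name is mine), that verifies line size * ways * sets equals the reported size:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CacheGeometry {
    // Read one sysfs attribute file and strip the trailing newline.
    static String read(Path dir, String name) throws Exception {
        return new String(Files.readAllBytes(dir.resolve(name))).trim();
    }

    public static void main(String[] args) throws Exception {
        // cpu0's L1 data cache, as described above
        Path dir = Paths.get("/sys/devices/system/cpu/cpu0/cache/index0");
        int line = Integer.parseInt(read(dir, "coherency_line_size"));
        int ways = Integer.parseInt(read(dir, "ways_of_associativity"));
        int sets = Integer.parseInt(read(dir, "number_of_sets"));
        System.out.printf("%d * %d * %d = %dK (size file says %s)%n",
                line, ways, sets, (line * ways * sets) / 1024, read(dir, "size"));
    }
}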

A note on the format of shared_cpu_map: it looks like binary but is actually hexadecimal. Each bit represents one CPU, so a single hex digit covers 4 CPUs. Take the last 4 digits of 00000101 and expand them to binary:

CPU id             15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
0x0101 in binary    0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  1

So 0101 means cpu8 and cpu0: cpu0's L1 data cache is shared with cpu8 (the two are hardware threads of the same physical core).

Let's verify:

$ cat /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_map
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000101

Now look at index3's shared_cpu_map:

$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_map
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000f0f

CPU id             15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
0x0f0f in binary    0  0  0  0  1  1  1  1  0  0  0  0  1  1  1  1

cpu0-3 and cpu8-11 share the L3 cache.
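Decoding these masks by hand is error-prone. Here is a small sketch (the class name is mine) that expands a shared_cpu_map string into CPU ids:

import java.math.BigInteger;

public class SharedCpuMap {
    public static void main(String[] args) {
        // index3 mask from the example above; pass a different mask as args[0]
        String map = args.length > 0 ? args[0]
                : "00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000f0f";
        // The mask is comma-separated hex words, most significant first,
        // so stripping the commas leaves one big hex number.
        BigInteger bits = new BigInteger(map.replace(",", ""), 16);
        for (int cpu = 0; cpu < bits.bitLength(); cpu++) {
            if (bits.testBit(cpu)) {
                System.out.println("cpu" + cpu);   // prints cpu0-3 and cpu8-11 for 0x0f0f
            }
        }
    }
}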


References:
1. http://itindex.net/detail/37419-java-视角-理解

2. http://www.searchtb.com/2012/12/玩转cpu-topology.html


Reposted from flowaters.iteye.com/blog/2195805