Testing the impact of the CPU cache miss rate on application performance

It is hard to get an intuitive feel for how much the CPU cache affects program performance. This test uses the time command and valgrind's cachegrind tool to make that effect measurable, which in turn helps when optimizing a program.

1. Classic test code

cache1.c

#include <stdio.h>

#define MAXROW 8000
#define MAXCOL 8000

int main(int argc, char **argv)
{
	int i, j;
	/* static keeps this large array off the stack */
	static int x[MAXROW][MAXCOL];

	printf("starting\n");
	/* inner loop walks along a row: consecutive writes are adjacent in memory */
	for (i = 0; i < MAXROW; i++) {
		for (j = 0; j < MAXCOL; j++) {
			x[i][j] = i * j;
		}
	}
	printf("complete\n");
	return 0;
}

cache2.c

#include <stdio.h>

#define MAXROW 8000
#define MAXCOL 8000

int main(int argc, char **argv)
{
	int i, j;
	/* static keeps this large array off the stack */
	static int x[MAXROW][MAXCOL];

	printf("starting\n");
	/* inner loop walks down a column: consecutive writes are one full row apart */
	for (j = 0; j < MAXCOL; j++) {
		for (i = 0; i < MAXROW; i++) {
			x[i][j] = i * j;
		}
	}
	printf("complete\n");
	return 0;
}

2. Experimental results

[root@192 test]# gcc cache1.c -o cache1
[root@192 test]# time ./cache1 
starting
complete

real	0m0.359s
user	0m0.186s
sys	0m0.126s
[root@192 test]# gcc cache2.c -o cache2
[root@192 test]# time ./cache2 
starting
complete

real	0m2.033s
user	0m1.899s
sys	0m0.116s
[root@192 test]#

Comparing the sources, cache2.c differs from cache1.c only in the order of the two loops, yet the execution times differ greatly: cache1 finishes in 359 ms, while cache2 takes about 2 seconds. Now let's look at how each program uses the CPU cache.

[root@192 test]# valgrind --tool=cachegrind ./cache1
==6269== Cachegrind, a cache and branch-prediction profiler
==6269== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==6269== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==6269== Command: ./cache1
==6269== 
--6269-- warning: L3 cache found, using its data for the LL simulation.
starting
complete
==6269== 
==6269== I   refs:      768,153,385
==6269== I1  misses:            758
==6269== LLi misses:            754
==6269== I1  miss rate:        0.00%
==6269== LLi miss rate:        0.00%
==6269== 
==6269== D   refs:      448,071,036  (384,049,725 rd   + 64,021,311 wr)
==6269== D1  misses:      4,001,857  (      1,306 rd   +  4,000,551 wr)
==6269== LLd misses:      4,001,663  (      1,134 rd   +  4,000,529 wr)
==6269== D1  miss rate:         0.9% (        0.0%     +        6.2%  )
==6269== LLd miss rate:         0.9% (        0.0%     +        6.2%  )
==6269== 
==6269== LL refs:         4,002,615  (      2,064 rd   +  4,000,551 wr)
==6269== LL misses:       4,002,417  (      1,888 rd   +  4,000,529 wr)
==6269== LL miss rate:          0.3% (        0.0%     +        6.2%  )
[root@192 test]# valgrind --tool=cachegrind ./cache2
==6270== Cachegrind, a cache and branch-prediction profiler
==6270== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==6270== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==6270== Command: ./cache2
==6270== 
--6270-- warning: L3 cache found, using its data for the LL simulation.
starting
complete
==6270== 
==6270== I   refs:      768,153,385
==6270== I1  misses:            758
==6270== LLi misses:            754
==6270== I1  miss rate:        0.00%
==6270== LLi miss rate:        0.00%
==6270== 
==6270== D   refs:      448,071,036  (384,049,725 rd   + 64,021,311 wr)
==6270== D1  misses:     64,001,856  (      1,306 rd   + 64,000,550 wr)
==6270== LLd misses:      4,009,662  (      1,134 rd   +  4,008,528 wr)
==6270== D1  miss rate:        14.3% (        0.0%     +      100.0%  )
==6270== LLd miss rate:         0.9% (        0.0%     +        6.3%  )
==6270== 
==6270== LL refs:        64,002,614  (      2,064 rd   + 64,000,550 wr)
==6270== LL misses:       4,010,416  (      1,888 rd   +  4,008,528 wr)
==6270== LL miss rate:          0.3% (        0.0%     +        6.3%  )
[root@192 test]#

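The cache miss rates explain the gap: the D1 miss rate is 0.9% for cache1 but 14.3% for cache2. The difference comes entirely from writes. cache1 has about 4 million D1 write misses, while cache2 has about 64 million, roughly one for every array element written. The LL miss counts are almost identical in both runs, so the extra D1 misses of cache2 are served by the last-level cache, but they still cost time. This shows how strongly CPU cache misses affect program performance.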

Origin www.cnblogs.com/ZhaoKevin/p/12580575.html