Reprinted from https://blog.csdn.net/garfier/article/details/12489953#
Performance Analysis Tool
Performance is an important measure of software quality. Whether the software is an online service, an offline program, or a client application, performance is key to the user experience. Performance in the broad sense used here covers both speed and stability, and software testing should check both for each release. There are many ways to locate performance problems found during testing: developers can review the code, or use tools to profile it. What are the common performance analysis and tuning tools? The following two pages give a detailed summary:
- https://computing.llnl.gov/tutorials/performance_tools/#Considerations
- http://en.wikipedia.org/wiki/List_of_performance_analysis_tools
Fundamentals of Gprof
gprof tells you where your code spends its time and which functions are called most often, and it shows the calling relationships between functions at a glance. It is a profiling tool supported by the gcc/g++ compilers. When the -pg option is added at compile time, the compiler inserts a call to mcount at the beginning of each function. mcount therefore runs every time a function is called and records the call graph and call counts; together with the time spent in each function, this information is saved to the gmon.out file when the program exits. Note that the program must exit normally, either by returning from main or by calling exit(), because gmon.out is written only when exit() runs.
Using gprof therefore takes three main steps:
- Compile the program with the -pg option
- Run the program and let it exit normally
- Use gprof to view the gmon.out file
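The second step can be awkward for a long-running service that is normally stopped with a signal, because a process killed by a signal never reaches exit() and so never writes gmon.out. One possible workaround, sketched below, is to catch the signal, set a flag, and let the main loop return normally. The names run_until_stopped, request_stop and do_one_unit_of_work are hypothetical and only serve this illustration.

```cpp
#include <cassert>
#include <csignal>

// Set by the signal handler and polled by the work loop.
static volatile std::sig_atomic_t g_stop_requested = 0;

// The handler only sets a flag, which keeps it async-signal-safe.
extern "C" void request_stop(int /*signum*/) {
    g_stop_requested = 1;
}

// Hypothetical placeholder for the real work of the service.
static void do_one_unit_of_work() {}

// Runs until SIGINT/SIGTERM arrives or max_iterations units are done,
// then returns so the caller can return from main() normally.
// Returns the number of units of work completed.
long run_until_stopped(long max_iterations) {
    assert(max_iterations >= 0);
    std::signal(SIGINT, request_stop);
    std::signal(SIGTERM, request_stop);

    long done = 0;
    while (!g_stop_requested && done < max_iterations) {
        do_one_unit_of_work();
        ++done;
    }
    return done;
}
```

A main function would call run_until_stopped(...) and then return 0. Returning from main runs the exit handlers, which is when a program built with -pg writes gmon.out.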
Gprof usage example
#include <iostream>
using namespace std;

int add(int a, int b) { return a + b; }

int sub(int a, int b) { return a - b; }

// Prints one sum and one difference; it returns nothing, so it is declared void.
void call() {
    std::cout << add(1, 2) << std::endl;
    std::cout << sub(2, 4) << std::endl;
}

int main() {
    int a = 1, b = 2;
    cout << add(a, b) << endl;
    for (int i = 0; i < 10000; i++)
        call();
    return 0;
}
Compile with g++ and add the -pg parameter:
- g++ -o hello hello_grof.cpp -pg -g
Once the executable is built, readelf shows how it differs from a build without -pg: compare the relocation entries printed by readelf -r ./hello with those printed by readelf -r ./hello_normal. Then run ./hello so that it writes gmon.out, and view the profile with gprof:
- gprof -b ./hello gmon.out
- Flat profile:
- Each sample counts as 0.01 seconds.
- no time accumulated
- % cumulative self self total
- time seconds seconds calls Ts/call Ts/call name
- 0.00 0.00 0.00 10001 0.00 0.00 add(int, int)
- 0.00 0.00 0.00 10000 0.00 0.00 sub(int, int)
- 0.00 0.00 0.00 10000 0.00 0.00 call()
- 0.00 0.00 0.00 1 0.00 0.00 global constructors keyed to _Z3addii
- 0.00 0.00 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int)
- Call graph
- granularity: each sample hit covers 2 byte(s) no time propagated
- index % time self children called name
- 0.00 0.00 1/10001 main [7]
- 0.00 0.00 10000/10001 call() [10]
- [8] 0.0 0.00 0.00 10001 add(int, int) [8]
- -----------------------------------------------
- 0.00 0.00 10000/10000 call() [10]
- [9] 0.0 0.00 0.00 10000 sub(int, int) [9]
- -----------------------------------------------
- 0.00 0.00 10000/10000 main [7]
- [10] 0.0 0.00 0.00 10000 call() [10]
- 0.00 0.00 10000/10001 add(int, int) [8]
- 0.00 0.00 10000/10000 sub(int, int) [9]
- -----------------------------------------------
- 0.00 0.00 1/1 __do_global_ctors_aux [13]
- [11] 0.0 0.00 0.00 1 global constructors keyed to _Z3addii [11]
- 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [12]
- -----------------------------------------------
- 0.00 0.00 1/1 global constructors keyed to _Z3addii [11]
- [12] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [12]
- -----------------------------------------------
- Index by function name
- [11] global constructors keyed to _Z3addii (hello_grof.cpp) [9] sub(int, int) [10] call()
- [8] add(int, int) [12] __static_initialization_and_destruction_0(int, int) (hello_grof.cpp)
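The profile above reports "no time accumulated" because add and sub finish far faster than the 0.01-second sample interval, so only the call counts are informative. To see non-zero times, the profiled functions need to do measurable work. The following is a minimal sketch under that assumption; slow_sum_of_squares and fast_sum_of_squares are hypothetical names, and the amount of work needed depends on the machine.

```cpp
#include <cassert>
#include <cstdint>

// Deliberately slow: an O(n) loop that gives the sampler a chance to hit it.
std::uint64_t slow_sum_of_squares(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i * i;
    }
    return total;
}

// Closed form n(n+1)(2n+1)/6: constant time, so it should barely register.
std::uint64_t fast_sum_of_squares(std::uint64_t n) {
    assert(n <= 1000000);  // keeps the intermediate product within 64 bits
    return n * (n + 1) * (2 * n + 1) / 6;
}

// Calls both versions `repeats` times and reports whether they always agree.
bool compare_versions(std::uint64_t n, int repeats) {
    bool all_equal = true;
    for (int r = 0; r < repeats; ++r) {
        all_equal = all_equal &&
                    (slow_sum_of_squares(n) == fast_sum_of_squares(n));
    }
    return all_equal;
}
```

Called from main with a large n and many repeats, and compiled with -pg without optimization, one would expect most samples in the flat profile to fall in slow_sum_of_squares, while both functions show the same call count.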
To generate a call graph file in dot format, pipe the gprof output through gprof2dot.py:
- gprof -b ./hello gmon.out | gprof2dot.py > ~WWW/hello.dot
The resulting file can be viewed with Graphviz, for example with the Windows version of GVEdit.
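For readers who have not seen the dot format, the sketch below writes the caller/callee pairs and call counts from the call graph listed earlier as a minimal dot digraph. It is a hand-written illustration of the format only and does not reproduce what the conversion script emits; CallEdge, to_dot and sample_edges are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One arc of the call graph: caller -> callee, with the number of calls.
struct CallEdge {
    std::string caller;
    std::string callee;
    long calls;
};

// Renders the edges as a Graphviz digraph, one labelled arrow per edge.
std::string to_dot(const std::vector<CallEdge>& edges) {
    std::ostringstream out;
    out << "digraph callgraph {\n";
    for (const CallEdge& e : edges) {
        assert(!e.caller.empty() && !e.callee.empty());
        out << "  \"" << e.caller << "\" -> \"" << e.callee
            << "\" [label=\"" << e.calls << "\"];\n";
    }
    out << "}\n";
    return out.str();
}

// Edges and counts taken from the gprof call graph of the hello example.
std::vector<CallEdge> sample_edges() {
    return {{"main", "add", 1},
            {"main", "call", 10000},
            {"call", "add", 10000},
            {"call", "sub", 10000}};
}
```

Writing the string returned by to_dot(sample_edges()) to a .dot file gives Graphviz a graph with four labelled arrows to draw.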
Interpretation of Gprof output
If you drop the -b option from gprof -b ./hello, gprof also prints a detailed description of each field:
- % time: the percentage of the total running time of the program used by this function.
- cumulative seconds: a running sum of the number of seconds accounted for by this function and those listed above it.
- self seconds: the number of seconds accounted for by this function alone. This is the major sort for this listing.
- calls: the number of times this function was invoked, if this function is profiled, else blank.
- self ms/call: the average number of milliseconds spent in this function per call, if this function is profiled, else blank.
- total ms/call: the average number of milliseconds spent in this function and its descendents per call, if this function is profiled, else blank.
- name: the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parenthesis it shows where it would appear in the gprof listing if it were to be printed.
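The per-call columns follow from the other fields by simple arithmetic. The sketch below reproduces that arithmetic as read from the field descriptions above; the function names are hypothetical, and any input numbers are made up for illustration rather than taken from a real profile.

```cpp
#include <cassert>

// "% time": share of the program's total running time spent in the function itself.
double percent_time(double self_seconds, double program_total_seconds) {
    assert(program_total_seconds > 0.0);
    return 100.0 * self_seconds / program_total_seconds;
}

// "self ms/call": average milliseconds spent in the function itself per call.
double self_ms_per_call(double self_seconds, long calls) {
    assert(calls > 0);
    return 1000.0 * self_seconds / static_cast<double>(calls);
}

// "total ms/call": average milliseconds per call spent in the function
// and its descendants together.
double total_ms_per_call(double self_seconds, double children_seconds,
                         long calls) {
    assert(calls > 0);
    return 1000.0 * (self_seconds + children_seconds) /
           static_cast<double>(calls);
}
```

For instance, a function with 0.30 s of self time and 0.10 s spent in its children, called 1000 times in a program that ran for 0.50 s in total, would come out at about 60% of the time, 0.3 self ms/call and 0.4 total ms/call.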
Summary
gprof is a commonly used performance analysis tool. It has some shortcomings, which are also widely reported online:
- 1. Multi-thread support is poor, and results for multi-threaded programs are inaccurate.
- 2. The program must exit normally or through exit(); otherwise gmon.out is not written.
- 3. It can only analyze the user-space time consumed by the application while it runs; it cannot measure time spent in kernel space, so it is of no help for analyzing kernel-mode calls. It is not a good fit for programs that spend a large share of their time in system calls.
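One rough way to judge whether the third limitation matters for a given program is to compare the user and system CPU time the process has consumed. The sketch below does this with the POSIX getrusage call and therefore assumes a POSIX system; CpuTimes, current_cpu_times and system_time_fraction are hypothetical names.

```cpp
#include <cassert>
#include <sys/resource.h>  // getrusage, struct rusage
#include <sys/time.h>      // struct timeval

// User-space and kernel-space CPU time consumed by this process so far.
struct CpuTimes {
    double user_seconds;
    double system_seconds;
};

static double to_seconds(const timeval& tv) {
    return static_cast<double>(tv.tv_sec) +
           static_cast<double>(tv.tv_usec) / 1e6;
}

// Reads the CPU times of the calling process via getrusage(RUSAGE_SELF).
CpuTimes current_cpu_times() {
    rusage usage{};
    int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);
    (void)rc;  // silences the unused warning when NDEBUG is defined
    return {to_seconds(usage.ru_utime), to_seconds(usage.ru_stime)};
}

// Fraction of the CPU time spent in the kernel; 0 if no time was recorded.
double system_time_fraction(const CpuTimes& t) {
    double total = t.user_seconds + t.system_seconds;
    return total > 0.0 ? t.system_seconds / total : 0.0;
}
```

Calling current_cpu_times() just before the program exits and passing the result to system_time_fraction gives the share of CPU time spent in the kernel. If that share is large, a gprof profile will leave much of the running time unexplained.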