Valgrind is pronounced [ˈvælɡrɪnd]: the "grind" rhymes with "tinned", not with "find".
Installation of Valgrind tools
Valgrind is an open-source tool for memory debugging, memory leak detection, and performance analysis. Here are the steps to install Valgrind on Ubuntu or other Debian-based systems:
- Open a terminal.
- First, you need to update your system package list. The following commands can be used:
sudo apt-get update
- Then, install Valgrind with the following command:
sudo apt-get install valgrind
- Once the installation is complete, you can verify that Valgrind was installed successfully with the following command:
valgrind --version
If Valgrind was installed successfully, this will display its version number.
For other Linux distributions such as Fedora or CentOS, you can install Valgrind using the corresponding package manager such as yum or dnf. For example, on Fedora, you can install Valgrind with:
sudo dnf install valgrind
On macOS, you can use Homebrew to install Valgrind:
brew install valgrind
Note that Valgrind may not support recent versions of macOS.
On Windows, Valgrind is not directly available, but you can use it through the Windows Subsystem for Linux (WSL).
Cross compile Valgrind
Valgrind's source code includes all the libraries it needs to run, so in most cases you can compile and run Valgrind without additional dependencies. However, some basic development tools are required to compile Valgrind, including a C compiler (such as gcc) and the make tool.
If you plan to cross-compile Valgrind on a different platform, you will need a cross-compiler configured for that platform. You also need to make sure that your build environment includes all the headers and libraries Valgrind needs.
Here are the basic steps to cross-compile Valgrind:
- First, download the source code for Valgrind. You can download the latest source code from Valgrind's official website: http://valgrind.org/downloads/current.html
- Extract the source package and enter the source directory:
tar xvf valgrind-3.17.0.tar.bz2
cd valgrind-3.17.0
(Adjust the version number in the commands above to match the version you downloaded.)
- Configure the compilation environment. You need to specify your cross compiler and target platform. For example, if your cross compiler is arm-linux-gnueabi-gcc, and your target platform is arm-linux, you can use the following command:
./configure --host=arm-linux CC=arm-linux-gnueabi-gcc
- Compile Valgrind:
make
- Finally, copy the compiled Valgrind to your target system, or install it into your cross-compilation environment with:
make install
Note that this is just a basic example, and you may need to adjust these steps based on your specific needs and your cross-compilation environment.
What the Valgrind tool does
Valgrind is a very powerful tool, mainly used for detecting memory-management bugs and for CPU and memory profiling. Here are some basic uses:
- Memory leak detection
This is one of the most commonly used features of Valgrind. You can check your program for memory leaks with the following command:
valgrind --leak-check=yes your_program [your_program_arguments]
This will run your program and report any memory leaks after the program ends. The --leak-check=yes option tells Valgrind to check for memory leaks.
- Heap profiling with Massif
Massif is a Valgrind tool for analyzing how much heap memory your program uses. You can run Massif with the following command:
valgrind --tool=massif your_program [your_program_arguments]
This will generate a file called massif.out.pid, where pid is your program's process ID. You can view the contents of this file with the ms_print command:
ms_print massif.out.pid
- Profiling with Callgrind
Callgrind is a Valgrind tool for profiling the performance of your programs. You can run Callgrind with the following command:
valgrind --tool=callgrind your_program [your_program_arguments]
This will generate a file called callgrind.out.pid, where pid is your program's process ID. You can view the contents of this file with the callgrind_annotate command:
callgrind_annotate callgrind.out.pid
The above are just some basic usages of Valgrind. Valgrind has many other functions and options, you can refer to the official documentation of Valgrind to learn more: http://valgrind.org/docs/manual/index.html
Memcheck memory leak detection tool
Please note that your program and its dynamic libraries need to be compiled with debugging information (for example, using the gcc -g option), so that Valgrind can provide more detailed reports.
Routine inspection (generate report after program ends)
Valgrind's memory leak detection tool Memcheck can be accurate to the line number of the source code and tell you which line of code allocated memory that was not released correctly. However, in order to be able to do this, you need to include debug information when compiling your program.
If you compile your program with GCC or Clang, you can add the -g option to include debug information:
gcc -g -o your_program your_program.c
Then, you can run your program with Valgrind:
valgrind --tool=memcheck --leak-check=full ./your_program
In the reported results, Valgrind will show the line of code that caused the memory leak. For example:
==12345== 40 bytes in 1 blocks are definitely lost in loss record 1 of 2
==12345== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12345== by 0x108671: main (your_program.c:2)
In this example, your_program.c:2 means that the memory leak occurs on line 2 of the file your_program.c.
Note that if your program is compiled with optimization options such as -O2 or -O3, some optimizations may make line-number information inaccurate. If possible, you should compile your program without optimization when doing memory leak detection.
Important parameters
- --leak-check=yes tells Valgrind to do memory leak detection, but it only provides aggregate information for each leak point, such as the total number of bytes and blocks leaked.
- --leak-check=full provides more detailed information. In addition to the overall information about each leak point, it also displays each individual leaked block, including its size and the stack trace of the allocation. This can help you pinpoint the location of a memory leak more precisely.
- --show-leak-kinds=all shows all kinds of memory leaks: "definitely lost", "indirectly lost", "possibly lost", and "still reachable".
- --num-callers=n increases the depth of the reported call stacks to n frames.
If you use --leak-check=full, you'll get more verbose output:
==12345== 128 bytes in 1 blocks are definitely lost in loss record 1 of 1
==12345== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12345== by 0x108671: func (your_program.c:4)
==12345== by 0x108687: main (your_program.c:8)
This shows that the memory leak happens at your_program.c:4, in the function func, which can help you pinpoint the problem more precisely.
Long-running services
Valgrind by default reports memory leaks and other problems at the end of the program. However, if your program is a long-running service or you wish to view reports while it is running, you can use Valgrind's gdbserver mode, which allows you to interact with Valgrind at runtime.
Following are the basic steps to use gdbserver mode:
- First, start your program under Valgrind with the --vgdb=yes option, which tells Valgrind to start gdbserver on startup:
valgrind --vgdb=yes --leak-check=full your_program [your_program_arguments]
- In another terminal, you can use gdb to connect to Valgrind. First, find the process ID (PID) of your program. Then connect with the following commands:
gdb your_program
(gdb) target remote | vgdb
This starts gdb and connects it to Valgrind.
- Now you can use gdb's monitor command to interact with Valgrind. For example, you can check for memory leaks at runtime with the following command:
(gdb) monitor leak_check full reachable any
This tells Valgrind to immediately perform a full memory leak check and report all reachable and unreachable leaked memory.
Note that this is just a basic example, and you may need to adapt these steps to your specific needs. You can refer to Valgrind's official documentation to learn more about gdbserver mode: http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
Writing the report to a file
By default, Valgrind prints its report to the console when the process exits. But if the --show-leak-kinds=all parameter is enabled, the report can run to tens of thousands of lines, so it is better to write it to a file.
You can redirect Valgrind's output with shell redirection. Note that Valgrind writes its report to standard error, not standard output, so use the 2> operator. Here is an example:
valgrind --leak-check=full --show-leak-kinds=all your_program 2> output.txt
This command saves Valgrind's output to the file output.txt. You can replace output.txt with any filename you want (this also captures your program's own standard error; the --log-file option avoids that).
Additionally, Valgrind provides a --log-file option that you can use to specify the output file. Here is an example:
valgrind --leak-check=full --show-leak-kinds=all --log-file=output.txt your_program
This command has the same effect as the one above: it saves Valgrind's output to the file output.txt.
report analysis
example one
==4197== LEAK SUMMARY:
==4197== definitely lost: 6,624 bytes in 2 blocks
==4197== indirectly lost: 0 bytes in 0 blocks
==4197== possibly lost: 12,864 bytes in 34 blocks
==4197== still reachable: 404,895,424 bytes in 504,849 blocks
==4197== of which reachable via heuristic:
==4197== multipleinheritance: 240 bytes in 1 blocks
==4197== suppressed: 0 bytes in 0 blocks
==4197== Reachable blocks (those to which a pointer was found) are not shown.
==4197== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==4197==
==4197== For lists of detected and suppressed errors, rerun with: -s
==4197== ERROR SUMMARY: 32 errors from 32 contexts (suppressed: 0 from 0)
Analysis of example one
The following is an analysis of this report:
- definitely lost: 6,624 bytes in 2 blocks: 6,624 bytes were definitely lost in 2 blocks. This usually means the memory was allocated but not freed, and the program can no longer reach it. This is a serious memory leak.
- indirectly lost: 0 bytes in 0 blocks: there is no indirectly lost memory. Indirectly lost memory is memory that is only reachable through directly lost blocks, so it is considered lost as well.
- possibly lost: 12,864 bytes in 34 blocks: 12,864 bytes in 34 blocks may be lost. Possibly lost memory means the memory may still be in use, or it may have been lost (Valgrind only found interior pointers to it, not pointers to the start of the block).
- still reachable: 404,895,424 bytes in 504,849 blocks: 404,895,424 bytes in 504,849 blocks are still reachable. This memory was not freed when the program ended, but the program could still access it if needed. This isn't necessarily a problem, but if the number keeps growing, it could indicate a memory leak.
- suppressed: 0 bytes in 0 blocks: no errors were suppressed. Suppressed errors are those the user has explicitly told Valgrind to ignore.
- ERROR SUMMARY: 32 errors from 32 contexts: Valgrind found 32 errors in 32 different contexts.
This report indicates that your program may have a memory leak. You should check your code, especially the parts that allocate and deallocate memory, to make sure that all allocated memory is properly freed when you're done using it.
Here are a few keywords you should focus on when reviewing reports:
- "definitely lost": This means that memory was allocated but not released, and the program cannot access it again. This is a serious memory leak.
- "possibly lost": This indicates that memory may still be in use, or it may have been lost. This situation usually occurs when the program uses complex data structures, such as circular linked lists.
- "still reachable": This means that there is memory that was not freed at the end of the program, but the program can still access it if needed. This isn't necessarily a problem, but if the number keeps growing, it could lead to a memory leak.
- "indirectly lost": This means that memory is considered lost because the directly lost blocks contained the only pointers to it.
Example two
==4761== 320 bytes in 1 blocks are possibly lost in loss record 9,331 of 11,803
==4761== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==4761== by 0x40147D9: calloc (rtld-malloc.h:44)
==4761== by 0x40147D9: allocate_dtv (dl-tls.c:375)
==4761== by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==4761== by 0x4F08834: allocate_stack (allocatestack.c:430)
==4761== by 0x4F08834: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==4761== by 0xCEAF03A: zmq::thread_t::start(void (*)(void*), void*) (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761== by 0xCE9FB3B: zmq::epoll_t::start() (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761== by 0xCE8A35E: zmq::reaper_t::start() (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761== by 0xCE7C833: zmq::ctx_t::create_socket(int) (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761== by 0xCE7A2C5: zmq_socket (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761== by 0xCE633EA: PubCANOutput::InitPub() (spi-service-protocol.h:33)
==4761== by 0xCE63A93: PubCANOutput::IpcCANOutputPub(unsigned char const*, int) (spi-service-protocol.h:65)
==4761== by 0xCE60F90: Send (spi-protocol-convert.h:22)
==4761== by 0xCE60F90: Send (protocol-convert-base.h:49)
==4761== by 0xCE60F90: HobotADAS::SPIProtocolConvert::OnMessageEnd(long const&, long const&) (spi-protocol-convert.cc:134)
==4761== by 0x2AE98C: HobotADAS::CommPluginManager::ProcessMsg() (comm_plugin_manager.cpp:321)
==4761==
==4761== 320 bytes in 1 blocks are possibly lost in loss record 9,332 of 11,803
==4761== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==4761== by 0x40147D9: calloc (rtld-malloc.h:44)
==4761== by 0x40147D9: allocate_dtv (dl-tls.c:375)
==4761== by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==4761== by 0x4F08834: allocate_stack (allocatestack.c:430)
==4761== by 0x4F08834: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==4761== by 0xCEAF03A: zmq::thread_t::start(void (*)(void*), void*) (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761== by 0xCE9FB3B: zmq::epoll_t::start() (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761== by 0xCE8281C: zmq::io_thread_t::start() (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761== by 0xCE7C916: zmq::ctx_t::create_socket(int) (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761== by 0xCE7A2C5: zmq_socket (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761== by 0xCE633EA: PubCANOutput::InitPub() (spi-service-protocol.h:33)
==4761== by 0xCE63A93: PubCANOutput::IpcCANOutputPub(unsigned char const*, int) (spi-service-protocol.h:65)
==4761== by 0xCE60F90: Send (spi-protocol-convert.h:22)
==4761== by 0xCE60F90: Send (protocol-convert-base.h:49)
==4761== by 0xCE60F90: HobotADAS::SPIProtocolConvert::OnMessageEnd(long const&, long const&) (spi-protocol-convert.cc:134)
==4761== by 0x2AE98C: HobotADAS::CommPluginManager::ProcessMsg() (comm_plugin_manager.cpp:321)
Analysis of example two
This report shows possible memory leaks occurring in certain function calls in the libspi-protocol-convert.so library. Specifically, these functions include zmq::thread_t::start(), zmq::epoll_t::start(), zmq::reaper_t::start(), zmq::ctx_t::create_socket(), zmq_socket, PubCANOutput::InitPub(), PubCANOutput::IpcCANOutputPub(), and HobotADAS::SPIProtocolConvert::OnMessageEnd(). The underlying allocation happens in calloc, called from pthread_create@@GLIBC_2.34 when these functions create threads, and that memory may not be released correctly.
However, it is important to note that these leaks are marked as "possibly lost", which means Valgrind cannot determine whether they are actual leaks. Sometimes this is due to complex memory-management strategies or the behavior of specific library functions (for example, thread-local storage allocated by pthread_create). So while the report suggests there may be a memory leak, it is not definitive. You may need to further examine your code, especially the parts that involve memory allocation and deallocation, to determine whether there really is a leak.
Massif heap profiling tool
Massif is a Valgrind tool mainly used to analyze a program's heap memory usage at runtime. It can help developers find cases where the program consumes too much memory while running, especially when memory is released normally at program exit but keeps growing during execution.
Basic use of Massif
You can run Massif with the following command:
valgrind --tool=massif your_program [your_program_arguments]
This will generate a file called massif.out.pid, where pid is your program's process ID. You can view the contents of this file with the ms_print command:
ms_print massif.out.pid
This will show you the memory usage of your program while it is running. You can check this report to see if there are any unexpected spikes in memory usage, or if memory usage continues to increase over time.
Limitations of Massif
Note that Massif can only measure heap memory usage, not stack memory or other types of memory. If your program has problems with other types of memory, you may need to use other tools or techniques to detect it.
Advanced use of Massif
Although Massif can provide the overall memory usage of the program, it cannot directly tell you which module or which piece of code is continuously requesting memory. However, you can use some additional options and tools to get more detailed information.
Using the --alloc-fn option
If you know which function allocates memory (for example, if your module has its own memory-allocation wrapper), you can use the --alloc-fn option to tell Massif to treat this function as an allocator:
valgrind --tool=massif --alloc-fn=my_alloc your_program [your_program_arguments]
This will cause Massif to credit all memory allocated through my_alloc to the callers of my_alloc.
Using the --pages-as-heap option
This option causes Massif to treat all memory pages as heap, which lets you see all memory allocations, not just those made through functions such as malloc and new:
valgrind --tool=massif --pages-as-heap=yes your_program [your_program_arguments]
Profiling with Callgrind
Although Callgrind is primarily used for profiling, it can also display per-function memory usage. You can use Callgrind to see which function allocates the most memory:
valgrind --tool=callgrind your_program [your_program_arguments]
You can then use kcachegrind or another Callgrind data viewer to view the results.
Note that these methods may require some understanding of your code and its memory usage patterns. This can be difficult if your code is complex, or if it uses many different memory allocation functions.
Viewing the report
Massif is different from some other memory detection tools in that it generates a report while the program is running, rather than waiting for the program to end.
Massif's report is a text file, usually named massif.out.pid, where pid is the process ID of the running program. This report file contains detailed information about the program's memory usage while it runs.
Each row in the report represents a sampling point, showing the heap memory usage of the program at that moment. Each sampling point contains the following information:
- Time: the time from the start of the program to the sampling point.
- Memory usage: At this sampling point, the total amount of heap memory used by the program.
- Stack trace: At this sampling point, the program's stack trace information, showing which function calls lead to memory allocation.
You can view and parse this report file with the ms_print command:
ms_print massif.out.pid
ms_print formats the report file and prints it to the console, making it easier to read and understand. The output includes a graph of memory usage and details for each sampling point.
Callgrind Performance Analysis Tool
Valgrind's Callgrind tool is mainly used to collect runtime behavior information of the program, including the number of function calls, the number of instruction reads, and so on. However, it does not directly measure the execution time of a function. While instruction fetch counts and function call counts can provide some information about program performance, they don't directly tell you which functions are time-consuming.
Instruction fetch counts
"Instruction Fetches" is a metric that represents the number of times an instruction in a particular function was read (and possibly executed) during program execution. This metric can help you understand which functions are frequently executed in your program.
However, the number of instruction fetches is not directly equal to the execution time of the function. A function may have many instructions, but if those instructions are executed quickly, the execution time of the function may still be short. On the other hand, a function may have only a small number of instructions, but if those instructions take a long time to execute (for example, if they involve disk I/O or network communication), then the function's execution time may be very long.
So while instruction fetch counts can provide some information about a program's performance, it doesn't directly tell you which functions are time-consuming. If you want to know the execution time of a function, you may need to use other profiling tools or techniques, such as a CPU sampling profiler or a timer.
Useful options
Valgrind's Callgrind tool has many options that can be used to customize its behavior and output. Here are some options that may work for you:
- --dump-instr=yes: tells Callgrind to collect per-instruction information, not just per-function information. This gives you a more detailed picture of your program's behavior, but also makes Callgrind run slower and generate larger output files.
- --collect-jumps=yes: makes Callgrind collect jump information. This can help you understand your program's control flow, but also makes Callgrind run slower and generate larger output files.
- --branch-sim=yes: tells Callgrind to simulate the program's branch prediction. This can help you understand your program's branch-prediction efficiency, but also makes Callgrind run slower.
You can add these options when running Callgrind, for example:
valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes your_program [your_program_arguments]
You can then use callgrind_annotate or kcachegrind to view the results. These tools display the detailed information collected by Callgrind, including per-instruction and jump information.
Note that these options may make Callgrind run more slowly and produce larger output files. You should only use these options when you need these details.
report generation
You can redirect the output of callgrind_annotate to a file. On Unix-like systems (such as Linux or macOS), you can use the > symbol to redirect output. For example:
callgrind_annotate callgrind.out.pid > output.txt
This writes the output of callgrind_annotate to the file output.txt instead of displaying it on the terminal. If output.txt already exists, this command overwrites its contents; if it does not exist, the command creates it.
If you want to append the output to an existing file instead of overwriting it, you can use the >> notation:
callgrind_annotate callgrind.out.pid >> output.txt
This appends the output of callgrind_annotate to the end of output.txt.
Note that these commands may not work from the command prompt on Windows systems. If you are running Valgrind on a Windows system (for example, via WSL or Cygwin), you should use these commands in the corresponding Unix-like environment.
Opening the report in a visualizer
kcachegrind and qcachegrind are two tools for visualizing profiling data. They can read the output generated by Valgrind's Callgrind tool and display detailed call graphs.
Here are the basic steps on how to use both tools:
- Generate Callgrind output: First, use the Callgrind tool to generate profiling data. You can run your program under Callgrind with the following command:
valgrind --tool=callgrind your_program
In this command, your_program is the program you want to analyze. The command produces a file named callgrind.out.pid, where pid is the process ID of your program.
- Install kcachegrind or qcachegrind: Then install kcachegrind or qcachegrind through your package manager. For example, on Ubuntu you can install kcachegrind with:
sudo apt-get install kcachegrind
Alternatively, you can install qcachegrind with:
sudo apt-get install qcachegrind
- Open Callgrind's output: Finally, use kcachegrind or qcachegrind to open Callgrind's output. You can open the callgrind.out.pid file with:
kcachegrind callgrind.out.pid
or:
qcachegrind callgrind.out.pid
In both commands, callgrind.out.pid is Callgrind's output file.
kcachegrind and qcachegrind display a detailed call graph, which you can use to find your program's performance bottlenecks. You can click on the nodes in the graph to view detailed call information, including the number of calls, CPU time consumed, and so on.
Note: Flame graphs cannot be generated
callgrind_annotate does not natively support generating flame graphs. A flame graph is a visualization for displaying profiling data; flame graphs are typically used to display CPU usage, but can also display other types of performance data.
gprof2dot is a tool for generating call graphs. It accepts the output of various profilers, including gprof, oprofile, HProf, Xdebug, Visual Studio, and VTune, but it cannot directly parse the output of callgrind_annotate.
report analysis
Performance Analysis Report Fragment Example 1
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.3896' (creator: callgrind-3.18.1)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 3688459443
Trigger: Program termination
Profiled target: ./comm_plugin_manager (PID 3896, part 1)
Events recorded: Ir
Events shown: Ir
Event sort order: Ir
Thresholds: 99
Include dirs:
User annotated:
Auto-annotation: on

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
7,662,006,442 (100.0%) PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
1,335,146,312 (17.43%) ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6]
1,255,927,115 (16.39%) ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6]
767,318,807 (10.01%) ./malloc/./malloc/malloc.c:_int_free [/usr/lib/x86_64-linux-gnu/libc.so.6]
748,924,400 ( 9.77%) ./malloc/./malloc/malloc.c:_int_malloc [/usr/lib/x86_64-linux-gnu/libc.so.6]
424,929,976 ( 5.55%) ./malloc/./malloc/malloc.c:malloc [/usr/lib/x86_64-linux-gnu/libc.so.6]
191,346,740 ( 2.50%) ./malloc/./malloc/malloc.c:free [/usr/lib/x86_64-linux-gnu/libc.so.6]
152,418,873 ( 1.99%) ./malloc/./malloc/malloc.c:malloc_consolidate [/usr/lib/x86_64-linux-gnu/libc.so.6]
107,482,497 ( 1.40%) ./string/../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:__memcmp_avx2_movbe [/usr/lib/x86_64-linux-gnu/libc.so.6]
105,946,368 ( 1.38%) ???:operator new(unsigned long) [/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30]
96,170,106 ( 1.26%) ???:hobot::pack_sdk::Meta::GetDataIndex(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) const [/home/lzy/work/acvite_code/master/mfc5j3_appsw_libconvert/build/ubuntu/output/bin/comm_plugin_manager]
94,600,118 ( 1.23%) ./malloc/./malloc/arena.c:free
82,278,706 ( 1.07%) ???:std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const [/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30]
60,595,452 ( 0.79%) ???:ObstacleProto::WorldSpaceInfo::MergeFrom(ObstacleProto::WorldSpaceInfo const&) [/home/lzy/work/acvite_code/master/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so]
58,554,462 ( 0.76%) ./malloc/./malloc/malloc.c:unlink_chunk.constprop.0 [/usr/lib/x86_64-linux-gnu/libc.so.6]
54,806,504 ( 0.72%) ???:hobot::pack_sdk::Meta::GetTopicMeta(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::vector<long, std::allocator<long> >*, std::vector<long, std::allocator<long> >*, std::vector<void const*, std::allocator<void const*> >*, std::vector<unsigned long, std::allocator<unsigned long> >*) const [/home/lzy/work/acvite_code/master/mfc5j3_appsw_libconvert/build/ubuntu/output/bin/comm_plugin_manager]
52,638,193 ( 0.69%) ???:ObstacleProto::Obstacle::MergeFrom(ObstacleProto::Obstacle const&) [/home/lzy/work/acvite_code/master/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so]
52,069,931 ( 0.68%) ???:std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30]
47,538,816 ( 0.62%) ???:PerceptionBaseProto::Point::MergeFrom(PerceptionBaseProto::Point const&) [/home/lzy/work/acvite_code/master/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so]
46,026,687 ( 0.60%) ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S:__strlen_avx2 [/usr/lib/x86_64-linux-gnu/libc.so.6]
Analysis Fragment One
In this report, the percentages in parentheses represent the percentage of "Ir" (instruction read) events. This is a measure of how many times instructions in this function were fetched (and possibly executed) as a percentage of the total number of instruction fetches during program execution.
In this context, "Ir" stands for "Instruction read", which is the number of instructions read. This number indicates how many instructions were fetched by the processor during program execution.
The line "7,662,006,442 (100.0%) PROGRAM TOTALS" indicates that a total of 7,662,006,442 instructions were read during the entire program execution.
For example, consider the following line:
1,335,146,312 (17.43%) ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6]
This means that the instructions in the __memset_avx2_unaligned_erms function were fetched 1,335,146,312 times, which is 17.43% of the total instruction fetches.
This metric can help you understand where your program's performance bottlenecks are. If a function has a high percentage of instruction fetches, then it probably means that this function is a hotspot in your program, and you may need to optimize this function to improve the performance of your program.
Note that this metric does not directly indicate how many times a function was called or how long it took. The number of instruction fetches for a function may vary due to factors such as its size, the number of times it is called, its code complexity, etc. If you want to know how many times a function is called or how long it takes, you may need to use other profiling tools or techniques.
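As a sketch, a report like the fragment above can be produced with Callgrind plus callgrind_annotate; here demo.c is a made-up stand-in for the real program (it deliberately hammers memset so the libc routine shows up in the output):

```shell
# Build a trivial program to profile (a stand-in for your real binary).
# -g keeps symbol names readable in the annotated report.
cat > demo.c <<'EOF'
#include <string.h>
char buf[4096];
int main(void) {
    for (int i = 0; i < 1000; i++)
        memset(buf, i, sizeof(buf));   /* exercise libc memset */
    return buf[0];
}
EOF
gcc -g demo.c -o demo

# Run under Callgrind, writing the profile to a fixed file name.
valgrind --tool=callgrind --callgrind-out-file=callgrind.out ./demo

# Annotate: prints PROGRAM TOTALS followed by per-function Ir counts.
callgrind_annotate callgrind.out | head -n 20
```

The first column of the annotated output is the Ir count discussed above, and the parenthesized percentage is that count divided by PROGRAM TOTALS.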
Performance Analysis Report Fragment Example 2
. void IdAdaptor::UpdateMap(
. std::map<uint32_t, uint8_t> &current_map_,
. std::map<uint32_t, uint8_t> &history_map_,
. const std::vector<ObstacleProto::Obstacle> &object,
. uint16_t trackAge[],
40,766 ( 0.00%) uint8_t trackAgeSize) {
. // clear the current_map_
. current_map_.clear();
2,398 ( 0.00%) int objects_size = object.size();
. std::vector<uint32_t> new_id;
. std::vector<uint8_t> vec_current_id;
19,184 ( 0.00%) if ((trackAge == nullptr)|| (objects_size <= 0) || (trackAgeSize <= 0)) {
. return;
. }
Analysis Fragment Two
In the report, the percentages in parentheses still represent the percentage of "Ir" (instruction read) events. This is a measure of how many times instructions in this function were fetched (and possibly executed) as a percentage of the total number of instruction fetches during program execution.
For your function IdAdaptor::UpdateMap, instruction fetches account for 0.00% of total instruction fetches. This means that during the execution of your program, the number of times the instructions in this function were read is negligible compared to the instruction reads of the whole program.
This could be because the function is called less often, or the code for this function is shorter, or the execution of this function is optimized. In either case, this means that this function is probably not your program's performance bottleneck.
Note that this metric does not directly indicate how many times a function was called or how long it took. The number of instruction fetches for a function may vary due to factors such as its size, the number of times it is called, its code complexity, etc. If you want to know how many times a function is called or how long it takes, you may need to use other profiling tools or techniques.
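That said, Callgrind itself does record call counts alongside Ir. As a sketch (the option names are taken from the callgrind_annotate manual, so availability may vary by Valgrind version; t.c is a throwaway stand-in program), they can be pulled out of the same profile like this:

```shell
# Generate a Callgrind profile for a trivial program (stand-in for yours).
printf 'int main(void){return 0;}\n' > t.c
gcc -g t.c -o t
valgrind --tool=callgrind --callgrind-out-file=cg.out ./t

# Per-source-line annotation interleaved with the code (requires -g).
callgrind_annotate --auto=yes cg.out | head -n 40

# Caller/callee tree, including how often each call site fired.
callgrind_annotate --tree=both cg.out | head -n 40

# The raw profile file also records call counts directly on 'calls=' lines.
grep -c '^calls=' cg.out
```

For interactive exploration of call counts and call graphs, the KCachegrind GUI reads the same cg.out file.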
Comparison with other tools
perf and Valgrind
perf and Valgrind's Callgrind tool are both performance analyzers, but they have some important differences in design and usage.
- Data collection mode: perf is an event-based sampling profiler that periodically checks the state of the system (for example, every certain number of CPU cycles) and records the currently executing function. This approach can tell you which functions spend the most time on the CPU, but it may miss short, frequently called functions. Callgrind, on the other hand, is a simulation-based profiler that records every function call and instruction read of the program; this provides much more detailed information, but also incurs far greater performance overhead.
- Available information: perf can collect many types of performance events, including CPU cycles, cache hits/misses, branch mispredictions, and more, which helps you understand how your program behaves at the hardware level. Callgrind focuses instead on the program's own behavior, such as function calls and instruction fetches, and can also provide some information about memory usage and data flow.
- Ease of use and portability: perf is part of Linux and is available on all Linux systems, but not on other operating systems. Valgrind is a standalone tool that runs on several operating systems, including Linux, macOS, and Windows (via Cygwin or WSL).
Overall, perf and Callgrind are both powerful tools that provide different kinds of profiling information. Which one to choose depends on your specific needs; in some cases, using both together gives the most comprehensive performance analysis.
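As a sketch of the two workflows being compared (using /bin/true as a stand-in target; perf sampling may additionally require a permissive perf_event_paranoid setting on your system):

```shell
# perf: sample an actual run. Low overhead, statistical view of CPU time.
perf record -o perf.data -g /bin/true 2>/dev/null || echo "perf unavailable"
perf report -i perf.data --stdio 2>/dev/null | head -n 20

# Callgrind: simulate the run. Exact Ir counts, but typically far slower.
valgrind --tool=callgrind --callgrind-out-file=cg.out /bin/true
callgrind_annotate cg.out | head -n 20
```

A target as short-lived as /bin/true may yield few or no perf samples, which itself illustrates the sampling-vs-simulation difference: Callgrind still reports exact counts for the same run.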
gprof and Valgrind
gprof and Valgrind's Callgrind tool are two different performance analyzers with some important differences in design and use.
- Data collection mode: gprof is a sampling-based profiler that works by periodically interrupting program execution and recording the currently executing function. This approach can tell you which functions spend the most time on the CPU, but it may miss short, frequently called functions. Valgrind's Callgrind tool, on the other hand, is a simulation-based profiler that records every function call and instruction read of the program; this provides much more detailed information, but also incurs far greater performance overhead.
- Available information: gprof reports the number of calls to each function and each function's cumulative execution time, which helps you see which functions dominate the program's runtime. Callgrind provides more detailed information, including per-function instruction fetch counts, cache hits/misses, branch mispredictions, and so on, giving a deeper view of your program's behavior and performance bottlenecks.
- Ease of use and portability: gprof is part of GNU binutils and is available on all systems that support the GNU toolchain. However, to use gprof you must compile your program with the -pg option, which may slow it down. Valgrind is a standalone tool that runs on several operating systems, including Linux, macOS, and Windows (via Cygwin or WSL); it requires no special compile options, but its runtime overhead is usually greater than gprof's.
Overall, gprof and Valgrind are both powerful tools that provide different kinds of profiling information. Which one to choose depends on your specific needs; in some cases, using both together gives the most comprehensive performance analysis.
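The gprof workflow described above can be sketched as follows (t.c is a throwaway stand-in program):

```shell
# gprof requires the program to be compiled and linked with -pg.
printf 'int main(void){volatile int i;for(i=0;i<1000000;i++);return 0;}\n' > t.c
gcc -pg t.c -o t

# Running the instrumented binary writes gmon.out into the current directory.
./t

# gprof combines the binary and gmon.out into a flat profile and call graph.
gprof ./t gmon.out | head -n 20
```

By contrast, the very same binary needs no rebuild at all to run under Callgrind; only gprof requires the -pg instrumentation.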
Comparison chart
Tool | Data collection method | Available information | Ease of use and portability | Platform | Required compile options
---|---|---|---|---|---
Callgrind | Simulation-based profiler that records all function calls and instruction reads | Detailed per-function information, including instruction fetch counts, cache hits/misses, branch mispredictions, and more | Runs on multiple operating systems, including Linux, macOS, and Windows (via Cygwin or WSL) | Linux, macOS, Windows (Cygwin, WSL) | none
gprof | Sampling-based profiler that periodically interrupts program execution and records the currently executing function | Number of calls and cumulative execution time for each function | Part of GNU binutils; requires compiling with -pg | All systems that support the GNU toolchain | -pg
perf | Event-based sampling profiler that periodically checks the state of the system and records the currently executing function | Various hardware performance events, including CPU cycles, cache hits/misses, branch mispredictions, etc. | Part of Linux; only available on Linux systems | Linux | none
Intel VTune | Event-based sampling profiler that periodically checks the state of the system and records the currently executing function | Detailed analysis, including function execution time, CPU cache hit rate, branch prediction accuracy, etc. | Commercial Intel product; only available on systems with Intel CPUs | Windows, Linux, macOS | none
Visual Studio Profiler | Event-based sampling profiler that periodically checks the state of the system and records the currently executing function | Detailed analysis, including function execution time, CPU usage, memory usage, etc. | Part of Visual Studio; only available on Windows | Windows | none
Instruments (macOS) | Event-based sampling profiler that periodically checks the state of the system and records the currently executing function | Detailed analysis, including function execution time, CPU usage, memory usage, etc. | Part of Xcode; only available on macOS | macOS | none
Precautions
End the process gracefully
Do not use kill -9
Killing a running Valgrind process with kill -9 will affect the detection results. kill -9 sends SIGKILL, which terminates the process immediately without giving it any chance to clean up. For Valgrind, this means it may not be able to generate a full report, since reports are normally written when the instrumented program exits normally.
If you need to stop Valgrind while it is running, your best bet is to send a TERM signal with the kill command (without the -9 option). This asks Valgrind to terminate gracefully; it should be able to write its report and then exit.
kill <valgrind_pid>
However, if the Valgrind process fails to respond to the TERM signal for some reason, you may have to fall back to kill -9. Note that this may cause Valgrind to produce no report at all, or an incomplete one.
In general, avoid killing Valgrind while it is running so that you get complete and accurate reports. If you must stop it early, use the kill command (without -9) to terminate it gracefully.
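The difference between the two signals can be demonstrated with any process; in this sketch, sleep stands in for a running valgrind process:

```shell
# Start a stand-in long-running process (in real use: valgrind ... &).
sleep 300 &
pid=$!

# Plain kill sends SIGTERM: the process gets a chance to clean up
# (Valgrind uses this chance to write its report).
kill "$pid"
wait "$pid" 2>/dev/null
echo "exit status after SIGTERM: $?"   # 143 = 128 + 15 (SIGTERM)

# kill -9 sends SIGKILL: the process dies immediately, no cleanup possible.
sleep 300 &
pid=$!
kill -9 "$pid"
wait "$pid" 2>/dev/null
echo "exit status after SIGKILL: $?"   # 137 = 128 + 9 (SIGKILL)
```

The 128+N exit-status convention makes it easy to tell from a script which signal actually terminated the process.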
Ctrl+Z does not terminate the process
Pressing Ctrl+Z on the command line sends a SIGTSTP signal to the current foreground process (Valgrind in this case). This suspends the process but does not terminate it; you can resume it with the fg command.
For Valgrind, Ctrl+Z does not directly affect the detection results, since it merely suspends the process rather than terminating it. However, if you modify the state of the instrumented program while Valgrind is paused, or perform other operations that could influence the instrumentation, Valgrind's report may be affected.
In general, if you just want to pause Valgrind temporarily and resume it later, Ctrl+Z is fine; just avoid doing anything that might affect the detection results while it is paused.
Send the TERM signal using the defaults of kill or killall
- The kill command sends a signal to the specified process ID. For example, kill 12345 sends the TERM signal to the process with ID 12345.
- The killall command sends a signal to all processes whose names match. For example, killall myprogram sends the TERM signal to all processes named "myprogram".
If no signal is specified, both kill and killall default to sending the TERM signal, which requests that the process terminate gracefully. You can use the -s option to send a different signal, e.g., kill -s HUP 12345 or killall -s HUP myprogram.
Tools cannot be used at the same time
The different tools of Valgrind (such as Memcheck, Callgrind, Massif, etc.) cannot run at the same time. Every time you run Valgrind, you must choose a tool to use. This is because each tool has its own specific goals and methods, and they cannot be applied to the same running instance of the program at the same time.
For example, if you want to perform memory leak detection, you should use the Memcheck tool:
valgrind --tool=memcheck --leak-check=yes your_program [your_program_arguments]
If you want to perform profiling, you should use the Callgrind tool:
valgrind --tool=callgrind your_program [your_program_arguments]
If you want to do memory leak detection and profiling at the same time, you need to run Valgrind twice, once with Memcheck and once with Callgrind.
Note that while this may take more time, this ensures that the results you get are accurate and reliable. If you try to run both tools at the same time, it may cause confusing and inaccurate results.
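The two passes can be scripted back to back; in this sketch /bin/true stands in for your real program and its arguments:

```shell
#!/bin/sh
prog=/bin/true   # stand-in for your real program

# Pass 1: Memcheck for memory errors and leak detection.
valgrind --tool=memcheck --leak-check=yes --log-file=memcheck.log "$prog"

# Pass 2: Callgrind for profiling, on a separate run of the same binary.
valgrind --tool=callgrind --callgrind-out-file=callgrind.out "$prog"
```

Each pass produces its own output file (memcheck.log and callgrind.out here), so the results of the two tools never interfere with each other.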
Epilogue
Understanding these tools is an important step toward the next level in our programming journey. Mastering new skills and ideas always takes time and persistence; from a psychological point of view, learning proceeds through continuous trial, error, and adjustment, much like our brain gradually optimizing its "algorithm" for solving problems.
That is why, when we encounter mistakes, we should see them as opportunities to learn and improve rather than sources of frustration. By understanding and solving these problems, we not only fix the current code but also sharpen our programming skills and avoid repeating the same mistakes in future projects.
I encourage everyone to actively participate and continuously improve their programming skills. Whether you are a beginner or an experienced developer, I hope my blog can help you in your learning journey. If you find this article useful, please click to bookmark it, or leave your comments to share your insights and experiences. You are also welcome to make suggestions and questions about the content of my blog. Every like, comment, share and follow is the greatest support for me and the motivation for me to continue to share and create.
Read my CSDN homepage to unlock more exciting content: Bubble's CSDN homepage