[Valgrind: an integrated memory debugging, memory-leak detection, and performance-analysis tool for C/C++] A comprehensive guide to using the Valgrind tools under Linux



The name is pronounced [ˈvælɡrɪnd] (the "grind" rhymes with "tinned").

Installation of Valgrind tools

Valgrind is an open source tool for memory debugging, memory leak detection, and performance analysis. Here are the steps to install Valgrind on Ubuntu or other Debian-based systems:

  1. Open a terminal.
  2. First, update your system's package list with the following command:
    sudo apt-get update
    
  3. Then, install Valgrind with the following command:
    sudo apt-get install valgrind
    
  4. Once the installation is complete, you can verify that Valgrind was installed successfully with the following command:
    valgrind --version
    
    If Valgrind was successfully installed, this will display the version number of Valgrind.

For other Linux distributions such as Fedora or CentOS, you can install Valgrind using the corresponding package manager such as yum or dnf. For example, on Fedora, you can install Valgrind with:

sudo dnf install valgrind

On macOS, you can use Homebrew to install Valgrind:

brew install valgrind

Note that Valgrind may not support recent versions of macOS.

On Windows, Valgrind is not directly available, but you can use it through the Windows Subsystem for Linux (WSL).

Cross-compiling Valgrind

Valgrind's source code includes all the libraries it needs to run, so in most cases you can compile and run Valgrind without additional dependencies. However, some basic development tools are required to compile Valgrind, including a C compiler (such as gcc) and the make tool.

If you plan to cross-compile Valgrind on a different platform, you will need a cross-compiler configured for that platform. You also need to make sure that your build environment includes all the headers and libraries Valgrind needs.

Here are the basic steps to cross-compile Valgrind:

  1. First, download the source code for Valgrind. You can download the latest source code from Valgrind's official website: http://valgrind.org/downloads/current.html
  2. Extract the source package and enter the source directory:
    tar xvf valgrind-3.17.0.tar.bz2
    cd valgrind-3.17.0
    
    (Please adjust the version number in the above command according to the version of Valgrind you downloaded)
  3. Configure the compilation environment. You need to specify your cross compiler and target platform. For example, if your cross compiler is arm-linux-gnueabi-gcc, and your target platform is arm-linux, you can use the following command:
    ./configure --host=arm-linux CC=arm-linux-gnueabi-gcc
    
  4. Compile Valgrind:
    make
    
  5. Finally, you can copy the compiled Valgrind to your target system, or run make install to install it into your cross-compilation environment.

Note that this is just a basic example, and you may need to adjust these steps based on your specific needs and your cross-compilation environment.

What the Valgrind tool does

Valgrind is a very powerful tool mainly used for detecting memory-management bugs and for CPU and memory profiling. Here are some basic usages:

  1. Memory leak detection

    This is one of the most commonly used features of Valgrind. You can check your program for memory leaks with the following command:

    valgrind --leak-check=yes your_program [your_program_arguments]
    

    This will run your program and report any memory leaks after the program ends. The --leak-check=yes option tells Valgrind to check for memory leaks.

  2. Heap analysis with Massif

    Massif is a Valgrind tool for analyzing how much heap memory your program uses. You can run Massif with the following command:

    valgrind --tool=massif your_program [your_program_arguments]
    

    This will generate a file called massif.out.pid, where pid is your program's process ID. You can view the contents of this file with the ms_print command:

    ms_print massif.out.pid
    
  3. Profiling with Callgrind

    Callgrind is a Valgrind tool for profiling the performance of your programs. You can run Callgrind with the following command:

    valgrind --tool=callgrind your_program [your_program_arguments]
    

    This will generate a file called callgrind.out.pid, where pid is your program's process ID. You can view the contents of this file with the callgrind_annotate command:

    callgrind_annotate callgrind.out.pid
    

The above are just some basic usages of Valgrind. Valgrind has many other functions and options, you can refer to the official documentation of Valgrind to learn more: http://valgrind.org/docs/manual/index.html

Memcheck memory leak detection tool

Please note that your program and its shared libraries should be compiled with debugging information (for example, using gcc's -g option) so that Valgrind can provide more detailed reports.

Routine checking (report generated after the program ends)

Valgrind's memory leak detection tool Memcheck can pinpoint the exact source line, telling you which line of code allocated memory that was never freed. However, to be able to do this, it needs debug information to be included when your program is compiled.

If you compile your program with GCC or Clang, you can add the -g option to include debug information:

gcc -g -o your_program your_program.c

Then, you can run your program with Valgrind:

valgrind --tool=memcheck  --leak-check=full ./your_program

In the reported results, Valgrind will show the line of code that caused the memory leak. For example:

==12345== 40 bytes in 1 blocks are definitely lost in loss record 1 of 2
==12345==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12345==    by 0x108671: main (your_program.c:2)

In this example, your_program.c:2 means that the memory leak occurs at line 2 of the file your_program.c.

Note that if your program is compiled with optimization options such as -O2 or -O3, some optimizations may make the line-number information inaccurate. If possible, compile your program without optimization when doing memory leak detection.
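
As an illustration, a tiny program of roughly this shape (a hypothetical your_program.c; your line numbers will differ) produces a report like the one above:

/* your_program.c -- minimal, hypothetical example of a "definitely lost" block */
#include <stdlib.h>

int main(void) {
    char *buf = malloc(40);   /* 40 bytes allocated on the heap ...             */
    (void)buf;                /* ... the pointer is discarded when main returns */
    return 0;                 /* ... and the block is never freed               */
}

Compile it with gcc -g and run it under Valgrind as shown above to reproduce a similar report.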

Important parameters


  • --leak-check=yes tells Valgrind to do memory leak detection, but it only provides summary information for each leak point, such as the total number of bytes and blocks leaked.
  • --leak-check=full provides more detailed information. In addition to the summary for each leak point, it also shows each individual leaked block, including its size and the stack trace of the allocation that created it. This can help you pinpoint the location of the memory leak more precisely.
  • --show-leak-kinds=all shows all types of memory leaks, including "definitely lost", "indirectly lost", "possibly lost", and "still reachable".
  • --num-callers=n increases the depth of the recorded call stacks to n frames.

If you use --leak-check=full, you'll get more verbose output:

==12345== 128 bytes in 1 blocks are definitely lost in loss record 1 of 1
==12345==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12345==    by 0x108671: func (your_program.c:4)
==12345==    by 0x108687: main (your_program.c:8)

This shows that the memory leak happens in the function func at your_program.c:4, which can help you pinpoint the problem more precisely.
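
Putting the options above together, a typical full-detail invocation might look like this (a sketch; adjust --num-callers and the log file name to your needs):

valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --num-callers=20 --log-file=memcheck.txt ./your_program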

Long-running services

Valgrind by default reports memory leaks and other problems at the end of the program. However, if your program is a long-running service or you wish to view reports while it is running, you can use Valgrind's gdbserver mode, which allows you to interact with Valgrind at runtime.

Following are the basic steps to use gdbserver mode:

  1. First, start your program under Valgrind, using the --vgdb=yes option to tell Valgrind to start its gdbserver at startup:
    valgrind --vgdb=yes --leak-check=full your_program [your_program_arguments]
    
  2. In another terminal, you can use gdb to connect to Valgrind. First, you need to find the process ID (PID) of your program. Then, connect to Valgrind with the following command:
    gdb your_program
    (gdb) target remote | vgdb
    
    This will start gdb and connect to Valgrind.
  3. Now, you can use gdb's monitor command to interact with Valgrind. For example, you can use the monitor leak_check full reachable any command to check for memory leaks at runtime:
    (gdb) monitor leak_check full reachable any
    
    This will tell Valgrind to immediately perform a full memory leak check and report all reachable and unreachable memory leaks.

Note that this is just a basic example, and you may need to adapt these steps to your specific needs. You can refer to Valgrind's official documentation to learn more about gdbserver mode: http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
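
If you just want to experiment with this mode, a minimal long-running program along these lines (a hypothetical leaky_service.c) gives monitor leak_check something to report while it runs:

/* leaky_service.c -- hypothetical long-running program for trying out gdbserver
 * mode, e.g.: valgrind --vgdb=yes --leak-check=full ./leaky_service */
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    for (;;) {
        char *leaked = malloc(256);  /* the only pointer to this block is lost on */
        (void)leaked;                /* the next iteration: "definitely lost"     */
        sleep(1);
    }
}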

Writing the report to a file

By default, the memory leak report is printed to the console when the process ends. With the --show-leak-kinds=all option enabled, however, the report can run to tens of thousands of lines, so it is better to write it to a file.

You can redirect Valgrind's output to a file. Note that Valgrind writes its report to standard error, not standard output, so redirect that stream with 2> on the command line. Here is an example:

valgrind --leak-check=full --show-leak-kinds=all your_program 2> output.txt

This command saves Valgrind's report to the file output.txt. You can replace output.txt with any filename you want.

Additionally, Valgrind provides a --log-file option that you can use to specify the output file. Here is an example:

valgrind --leak-check=full --show-leak-kinds=all --log-file=output.txt your_program

This command has the same effect as the one above: it saves Valgrind's output to the file output.txt.

report analysis

Example one

==4197== LEAK SUMMARY:
==4197==    definitely lost: 6,624 bytes in 2 blocks
==4197==    indirectly lost: 0 bytes in 0 blocks
==4197==      possibly lost: 12,864 bytes in 34 blocks
==4197==    still reachable: 404,895,424 bytes in 504,849 blocks
==4197==                       of which reachable via heuristic:
==4197==                         multipleinheritance: 240 bytes in 1 blocks
==4197==         suppressed: 0 bytes in 0 blocks
==4197== Reachable blocks (those to which a pointer was found) are not shown.
==4197== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==4197==
==4197== For lists of detected and suppressed errors, rerun with: -s
==4197== ERROR SUMMARY: 32 errors from 32 contexts (suppressed: 0 from 0)

Analysis of example one

The following is an analysis of this report:

  • definitely lost: 6,624 bytes in 2 blocks: This indicates that 6,624 bytes of memory in 2 blocks are definitely lost. This usually means that the memory was allocated but never freed, and the program can no longer access it. This is a serious memory leak.
  • indirectly lost: 0 bytes in 0 blocks: This indicates that there is no indirectly lost memory. Indirectly lost memory refers to blocks that are only reachable through directly lost blocks, so they are counted as lost too.
  • possibly lost: 12,864 bytes in 34 blocks: This indicates that 12,864 bytes of memory in 34 blocks are possibly lost. "Possibly lost" means Valgrind found only interior pointers to these blocks, so the memory may still be in use or may have been lost.
  • still reachable: 404,895,424 bytes in 504,849 blocks: This means that 404,895,424 bytes of memory in 504,849 blocks are still reachable. This memory was not freed when the program ended, but the program could still have accessed it if needed. This isn't necessarily a problem, but if the number keeps growing, it could indicate a memory leak.
  • suppressed: 0 bytes in 0 blocks: This indicates that no errors were suppressed. Suppressed errors are those that the user has explicitly told Valgrind to ignore.
  • ERROR SUMMARY: 32 errors from 32 contexts: This indicates that Valgrind found 32 errors in 32 different contexts.

This report indicates that your program may have a memory leak. You should check your code, especially the parts that allocate and deallocate memory, to make sure that all allocated memory is properly freed when you're done using it.

Here are a few keywords you should focus on when reviewing reports:

  • "definitely lost": This means that memory was allocated but not released, and the program cannot access it again. This is a serious memory leak.
  • "possibly lost": This indicates that memory may still be in use, or it may have been lost. This situation usually occurs when the program uses complex data structures, such as circular linked lists.
  • "still reachable": This means that there is memory that was not freed at the end of the program, but the program can still access it if needed. This isn't necessarily a problem, but if the number keeps growing, it could lead to a memory leak.
  • "indirectly lost": This means that there is memory because the directly lost memory blocks contained pointers to this memory, so that memory is also considered lost.

Example two

==4761== 320 bytes in 1 blocks are possibly lost in loss record 9,331 of 11,803
==4761==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==4761==    by 0x40147D9: calloc (rtld-malloc.h:44)
==4761==    by 0x40147D9: allocate_dtv (dl-tls.c:375)
==4761==    by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==4761==    by 0x4F08834: allocate_stack (allocatestack.c:430)
==4761==    by 0x4F08834: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==4761==    by 0xCEAF03A: zmq::thread_t::start(void (*)(void*), void*) (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761==    by 0xCE9FB3B: zmq::epoll_t::start() (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761==    by 0xCE8A35E: zmq::reaper_t::start() (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761==    by 0xCE7C833: zmq::ctx_t::create_socket(int) (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761==    by 0xCE7A2C5: zmq_socket (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761==    by 0xCE633EA: PubCANOutput::InitPub() (spi-service-protocol.h:33)
==4761==    by 0xCE63A93: PubCANOutput::IpcCANOutputPub(unsigned char const*, int) (spi-service-protocol.h:65)
==4761==    by 0xCE60F90: Send (spi-protocol-convert.h:22)
==4761==    by 0xCE60F90: Send (protocol-convert-base.h:49)
==4761==    by 0xCE60F90: HobotADAS::SPIProtocolConvert::OnMessageEnd(long const&, long const&) (spi-protocol-convert.cc:134)
==4761==    by 0x2AE98C: HobotADAS::CommPluginManager::ProcessMsg() (comm_plugin_manager.cpp:321)
==4761==
==4761== 320 bytes in 1 blocks are possibly lost in loss record 9,332 of 11,803
==4761==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==4761==    by 0x40147D9: calloc (rtld-malloc.h:44)
==4761==    by 0x40147D9: allocate_dtv (dl-tls.c:375)
==4761==    by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==4761==    by 0x4F08834: allocate_stack (allocatestack.c:430)
==4761==    by 0x4F08834: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==4761==    by 0xCEAF03A: zmq::thread_t::start(void (*)(void*), void*) (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761==    by 0xCE9FB3B: zmq::epoll_t::start() (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761==    by 0xCE8281C: zmq::io_thread_t::start() (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761==    by 0xCE7C916: zmq::ctx_t::create_socket(int) (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761==    by 0xCE7A2C5: zmq_socket (in /home/lzy/work/acvite_code/test/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so)
==4761==    by 0xCE633EA: PubCANOutput::InitPub() (spi-service-protocol.h:33)
==4761==    by 0xCE63A93: PubCANOutput::IpcCANOutputPub(unsigned char const*, int) (spi-service-protocol.h:65)
==4761==    by 0xCE60F90: Send (spi-protocol-convert.h:22)
==4761==    by 0xCE60F90: Send (protocol-convert-base.h:49)
==4761==    by 0xCE60F90: HobotADAS::SPIProtocolConvert::OnMessageEnd(long const&, long const&) (spi-protocol-convert.cc:134)
==4761==    by 0x2AE98C: HobotADAS::CommPluginManager::ProcessMsg() (comm_plugin_manager.cpp:321)

Analysis of example two

This report shows possible memory leaks arising from certain function calls in the library libspi-protocol-convert.so. Specifically, the call chains pass through zmq::thread_t::start(), zmq::epoll_t::start(), zmq::reaper_t::start(), zmq::io_thread_t::start(), zmq::ctx_t::create_socket(), zmq_socket, PubCANOutput::InitPub(), PubCANOutput::IpcCANOutputPub(), and HobotADAS::SPIProtocolConvert::OnMessageEnd(), and end in calloc inside pthread_create@@GLIBC_2.34, which allocates thread-local storage for a newly created thread. That memory may not be released correctly. However, it is important to note that these blocks are marked as "possibly lost", which means Valgrind cannot determine whether they are real leaks. Sometimes this is due to complex memory-management strategies or the behavior of specific library functions (for example, certain threading functions). So, while the report suggests there may be a memory leak, it is not definitive. You may need to examine your code further, especially the parts that involve memory allocation and deallocation and thread creation and shutdown, to determine whether there really is a leak.

Massif heap profiling tool

Massif is a Valgrind tool mainly used to analyze a program's heap memory usage while it runs. It helps developers find cases where the program consumes too much memory at runtime, especially when all memory is released properly by the time the program ends but usage keeps growing while it runs.

Basic use of Massif

You can run Massif with the following command:

valgrind --tool=massif your_program [your_program_arguments]

This will generate a file called massif.out.pid, where pid is your program's process ID. You can view the contents of this file with the ms_print command:

ms_print massif.out.pid

This will show you the memory usage of your program while it is running. You can check this report to see if there are any unexpected spikes in memory usage, or if memory usage continues to increase over time.
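
For experimentation, a small, hypothetical program whose heap usage grows step by step (and is fully freed before exit, so Memcheck would see no leak) shows up clearly in Massif's graph:

/* grow.c -- hypothetical program whose heap usage grows over time,
 * useful for watching Massif's memory-usage graph change */
#include <stdlib.h>
#include <string.h>

int main(void) {
    enum { STEPS = 100, CHUNK = 64 * 1024 };
    char *blocks[STEPS];

    for (int i = 0; i < STEPS; ++i) {
        blocks[i] = malloc(CHUNK);        /* heap grows by 64 KiB per step */
        if (blocks[i])
            memset(blocks[i], 0, CHUNK);  /* touch the memory */
    }

    for (int i = 0; i < STEPS; ++i)       /* free everything before exit: no leak, */
        free(blocks[i]);                  /* but Massif still shows the peak usage */
    return 0;
}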

Limitations of Massif

Note that by default Massif measures only heap memory usage; stack profiling can be enabled with the --stacks=yes option, but other kinds of memory are not covered. If your program has problems with other types of memory, you may need other tools or techniques to detect them.

Advanced use of Massif

Although Massif can provide the overall memory usage of the program, it cannot directly tell you which module or which piece of code is continuously requesting memory. However, you can use some additional options and tools to get more detailed information.

Using the --alloc-fn option

If you know which function is allocating memory (for example, if your module has its own memory allocation wrapper), you can use the --alloc-fn option to tell Massif to treat that function as an allocation function:

valgrind --tool=massif --alloc-fn=my_alloc your_program [your_program_arguments]

This will cause Massif to credit all memory allocated by my_alloc to the callers of my_alloc.
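
As a sketch (hypothetical file and function names), a module with its own wrapper might look like this; with --alloc-fn=my_alloc, Massif charges the allocation to build_table() rather than to my_alloc() itself:

/* pool.c -- hypothetical module with its own allocation wrapper my_alloc().
 * Run as: valgrind --tool=massif --alloc-fn=my_alloc ./pool */
#include <stdlib.h>

void *my_alloc(size_t size) {
    /* thin wrapper; real code might add headers, statistics, etc. */
    return malloc(size);
}

static void build_table(void) {
    void *t = my_alloc(4096);   /* with --alloc-fn=my_alloc, Massif attributes */
    (void)t;                    /* this allocation to build_table()            */
}

int main(void) {
    build_table();
    return 0;
}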

Using the --pages-as-heap option

This option causes Massif to profile memory at the page level, treating all mapped pages as heap. This lets you see all memory allocations, not just those made through functions such as malloc and new:

valgrind --tool=massif --pages-as-heap=yes your_program [your_program_arguments]

Profiling with Callgrind

Although Callgrind is primarily a CPU profiler, its call counts can also give a rough idea of which functions perform the most allocations (for example, by looking at which callers reach malloc or operator new most often):

valgrind --tool=callgrind your_program [your_program_arguments]

You can then use kcachegrind or another Callgrind data viewer to view the results.

Note that these methods may require some understanding of your code and its memory usage patterns. This can be difficult if your code is complex, or if it uses many different memory allocation functions.

Viewing the report

Massif is different from some other memory-checking tools in that its data is collected as a series of snapshots while the program runs, rather than only as a summary produced at the end.

Massif's report is a text file usually named massif.out.pid, where pid is the process ID of the running program. This report file contains detailed information about the memory usage of the program while it is running.

Each row in the report represents a sampling point, showing the heap memory usage of the program at that moment. Each sampling point contains the following information:

  • Time: the time from the start of the program to the sampling point.
  • Memory usage: At this sampling point, the total amount of heap memory used by the program.
  • Stack trace: At this sampling point, the program's stack trace information, showing which function calls lead to memory allocation.

You can view and parse this report file with the ms_print command:

ms_print massif.out.pid

ms_print will format and output the contents of the report file to the console, making it easier to read and understand. The output includes a graph of memory usage and details for each sampling point.

Callgrind Performance Analysis Tool

Valgrind's Callgrind tool is mainly used to collect runtime behavior information of the program, including the number of function calls, the number of instruction reads, and so on. However, it does not directly measure the execution time of a function. While instruction fetch counts and function call counts can provide some information about program performance, they don't directly tell you which functions are time-consuming.

Instruction fetch counts

"Instruction Fetches" is a metric that represents the number of times an instruction in a particular function was read (and possibly executed) during program execution. This metric can help you understand which functions are frequently executed in your program.

However, the number of instruction fetches is not directly equal to the execution time of the function. A function may have many instructions, but if those instructions are executed quickly, the execution time of the function may still be short. On the other hand, a function may have only a small number of instructions, but if those instructions take a long time to execute (for example, if they involve disk I/O or network communication), then the function's execution time may be very long.

So while instruction fetch counts can provide some information about a program's performance, it doesn't directly tell you which functions are time-consuming. If you want to know the execution time of a function, you may need to use other profiling tools or techniques, such as a CPU sampling profiler or a timer.
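
A small, hypothetical example makes the difference concrete: under Callgrind, busy() dominates the Ir counts, while slow_io() barely registers even though it takes far more wall-clock time:

/* ir_vs_time.c -- hypothetical demo that instruction counts are not wall-clock time */
#include <unistd.h>

static volatile long sink;

static void busy(void) {
    long sum = 0;
    for (long i = 0; i < 10 * 1000 * 1000; ++i)  /* many cheap instructions */
        sum += i;
    sink = sum;
}

static void slow_io(void) {
    sleep(2);   /* very few instructions, but two seconds of real time */
}

int main(void) {
    busy();     /* dominates Callgrind's Ir counts       */
    slow_io();  /* nearly invisible in Ir, large in time */
    return 0;
}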

Useful options

Valgrind's Callgrind tool has many options that can be used to customize its behavior and output. Here are some options that may work for you:

  • --dump-instr=yes: This option tells Callgrind to collect per-instruction information, not just per-function information. This can give you a more detailed understanding of your program's behavior, but also makes Callgrind run slower and generate larger output files.
  • --collect-jumps=yes: This option lets Callgrind collect jump information in the program. This can help you understand your program's control flow, but also makes Callgrind run slower and generate larger output files.
  • --branch-sim=yes: This option tells Callgrind to simulate the branch prediction of the program. This can help you understand your program's branch prediction efficiency, but will also make Callgrind run slower.

You can add these options when running Callgrind, for example:

valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes your_program [your_program_arguments]

You can then use callgrind_annotateor kcachegrindto view the results. These tools can display detailed information collected by Callgrind, including per-instruction information and jump information.

Note that these options may make Callgrind run more slowly and produce larger output files. You should only use these options when you need these details.

report generation

You can redirect the output of callgrind_annotate to a file. On Unix-like systems (such as Linux or macOS), you can use the > symbol to redirect output. For example:

callgrind_annotate callgrind.out.pid > output.txt

This will write the output of callgrind_annotate to the file output.txt instead of displaying it on the terminal. If output.txt already exists, this command will overwrite its contents; if it does not exist, the command will create it.

If you want to append the output to an existing file instead of overwriting it, you can use the >> notation:

callgrind_annotate callgrind.out.pid >> output.txt

This will append the output of callgrind_annotate to the end of output.txt.

Note that these commands may not work from the command prompt on Windows systems. If you are running Valgrind on a Windows system (for example, via WSL or Cygwin), you should use these commands in the corresponding Unix-like environment.

Opening the report in a visualization tool

kcachegrind and qcachegrind are two tools for visualizing profiling data. They can read the output generated by Valgrind's Callgrind tool and display detailed call graphs.

Here are the basic steps on how to use both tools:

  1. Generating Callgrind output : First, you need to use the Callgrind tool to generate profiling data. You can run your program and generate Callgrind output with the following command:
    valgrind --tool=callgrind your_program
    
    In this command, your_program is the program you want to analyze. The command produces a file named callgrind.out.pid, where pid is the process ID of your program.
  2. Install kcachegrind or qcachegrind: Then, you need to install kcachegrind or qcachegrind. You can install both tools through your package manager. For example, on Ubuntu you can install kcachegrind with the following command:
    sudo apt-get install kcachegrind
    
    Alternatively, you can install qcachegrind with the following command:
    sudo apt-get install qcachegrind
    
  3. Open Callgrind's output: Finally, you can use kcachegrind or qcachegrind to open Callgrind's output. You can open the callgrind.out.pid file with the following command:
    kcachegrind callgrind.out.pid
    
    or:
    qcachegrind callgrind.out.pid
    
    In both commands, callgrind.out.pid is the output file produced by Callgrind.

kcachegrind and qcachegrind display a detailed call graph, which you can use to find your program's performance bottlenecks. You can click on the nodes in the graph to view detailed call information, including the number of calls, the cost attributed to each call, and so on.

Note: Flame graphs cannot be generated

callgrind_annotate does not natively support flame graph generation. A flame graph is a visualization for displaying a program's profiling data; flame graphs are typically used to display CPU usage, but can also show other types of performance data.

gprof2dot is a tool for generating call graphs. It accepts the output of various performance-analysis tools, including gprof, oprofile, HProf, Xdebug, Visual Studio, VTune, and others (it can also read raw callgrind.out files), but it cannot directly parse the annotated text output of callgrind_annotate.

report analysis

Performance Analysis Report Fragment Example 1

--------------------------------------------------------------------------------
Profile data file 'callgrind.out.3896' (creator: callgrind-3.18.1)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 3688459443
Trigger: Program termination
Profiled target:  ./comm_plugin_manager (PID 3896, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:
User annotated:
Auto-annotation:  on

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
7,662,006,442 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                      file:function
--------------------------------------------------------------------------------
1,335,146,312 (17.43%)  ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6]
1,255,927,115 (16.39%)  ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6]
  767,318,807 (10.01%)  ./malloc/./malloc/malloc.c:_int_free [/usr/lib/x86_64-linux-gnu/libc.so.6]
  748,924,400 ( 9.77%)  ./malloc/./malloc/malloc.c:_int_malloc [/usr/lib/x86_64-linux-gnu/libc.so.6]
  424,929,976 ( 5.55%)  ./malloc/./malloc/malloc.c:malloc [/usr/lib/x86_64-linux-gnu/libc.so.6]
  191,346,740 ( 2.50%)  ./malloc/./malloc/malloc.c:free [/usr/lib/x86_64-linux-gnu/libc.so.6]
  152,418,873 ( 1.99%)  ./malloc/./malloc/malloc.c:malloc_consolidate [/usr/lib/x86_64-linux-gnu/libc.so.6]
  107,482,497 ( 1.40%)  ./string/../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:__memcmp_avx2_movbe [/usr/lib/x86_64-linux-gnu/libc.so.6]
  105,946,368 ( 1.38%)  ???:operator new(unsigned long) [/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30]
   96,170,106 ( 1.26%)  ???:hobot::pack_sdk::Meta::GetDataIndex(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) const [/home/lzy/work/acvite_code/master/mfc5j3_appsw_libconvert/build/ubuntu/output/bin/comm_plugin_manager]
   94,600,118 ( 1.23%)  ./malloc/./malloc/arena.c:free
   82,278,706 ( 1.07%)  ???:std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const [/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30]
   60,595,452 ( 0.79%)  ???:ObstacleProto::WorldSpaceInfo::MergeFrom(ObstacleProto::WorldSpaceInfo const&) [/home/lzy/work/acvite_code/master/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so]
   58,554,462 ( 0.76%)  ./malloc/./malloc/malloc.c:unlink_chunk.constprop.0 [/usr/lib/x86_64-linux-gnu/libc.so.6]
   54,806,504 ( 0.72%)  ???:hobot::pack_sdk::Meta::GetTopicMeta(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::vector<long, std::allocator<long> >*, std::vector<long, std::allocator<long> >*, std::vector<void const*, std::allocator<void const*> >*, std::vector<unsigned long, std::allocator<unsigned long> >*) const [/home/lzy/work/acvite_code/master/mfc5j3_appsw_libconvert/build/ubuntu/output/bin/comm_plugin_manager]
   52,638,193 ( 0.69%)  ???:ObstacleProto::Obstacle::MergeFrom(ObstacleProto::Obstacle const&) [/home/lzy/work/acvite_code/master/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so]
   52,069,931 ( 0.68%)  ???:std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30]
   47,538,816 ( 0.62%)  ???:PerceptionBaseProto::Point::MergeFrom(PerceptionBaseProto::Point const&) [/home/lzy/work/acvite_code/master/mfc5j3_appsw_libconvert/build/ubuntu/output/lib/libspi-protocol-convert.so]
   46,026,687 ( 0.60%)  ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S:__strlen_avx2 [/usr/lib/x86_64-linux-gnu/libc.so.6]

Analysis Fragment One

In this report, the percentages in parentheses represent the percentage of "Ir" (instruction read) events. This is a measure of how many times instructions in this function were fetched (and possibly executed) as a percentage of the total number of instruction fetches during program execution.

In this context, "Ir" stands for "Instruction read", which is the number of instructions read. This number indicates how many instructions were fetched by the processor during program execution.

The line "7,662,006,442 (100.0%) PROGRAM TOTALS" indicates that a total of 7,662,006,442 instructions were read during the entire program execution.

For example, for the following line:

1,335,146,312 (17.43%)  ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6]

This means that the instructions in the function __memset_avx2_unaligned_erms were fetched 1,335,146,312 times, which is 17.43% of the total instruction fetches.

This metric can help you understand where your program's performance bottlenecks are. If a function has a high percentage of instruction fetches, then it probably means that this function is a hotspot in your program, and you may need to optimize this function to improve the performance of your program.

Note that this metric does not directly indicate how many times a function was called or how long it took. The number of instruction fetches for a function may vary due to factors such as its size, the number of times it is called, its code complexity, etc. If you want to know how many times a function is called or how long it takes, you may need to use other profiling tools or techniques.

Performance Analysis Report Fragment Example 2

       .           void IdAdaptor::UpdateMap(
        .             std::map<uint32_t, uint8_t> &current_map_,
        .             std::map<uint32_t, uint8_t> &history_map_,
        .             const std::vector<ObstacleProto::Obstacle>  &object,
        .             uint16_t trackAge[],
   40,766 ( 0.00%)    uint8_t trackAgeSize) {
        .             // clear the current_map_
        .             current_map_.clear();
    2,398 ( 0.00%)    int objects_size = object.size();
        .             std::vector<uint32_t> new_id;  
        .             std::vector<uint8_t> vec_current_id;
   19,184 ( 0.00%)    if ((trackAge == nullptr)|| (objects_size <= 0) || (trackAgeSize <= 0)) {
        .               return;
        .             } 

Analysis Fragment Two

In the report, the percentages in parentheses still represent the percentage of "Ir" (instruction read) events. This is a measure of how many times instructions in this function were fetched (and possibly executed) as a percentage of the total number of instruction fetches during program execution.

For the function IdAdaptor::UpdateMap, instruction fetches account for 0.00% of the total. This means that during the execution of your program, the number of times the instructions in this function were read is tiny compared to the number of instruction reads in the entire program.

This could be because the function is called rarely, or because its code is short, or because its execution has been optimized. In any case, it means that this function is probably not your program's performance bottleneck.

Note that this metric does not directly indicate how many times a function was called or how long it took. The number of instruction fetches for a function may vary due to factors such as its size, the number of times it is called, its code complexity, etc. If you want to know how many times a function is called or how long it takes, you may need to use other profiling tools or techniques.

Comparison with other tools

perf and Valgrind

perf and Valgrind's Callgrind tool are both performance-analysis tools, but they have some important differences in design and usage.

  1. Data collection mode: perf is an event-based sampling profiler that periodically checks the state of the system (for example, every certain number of CPU cycles) and records the currently executing functions. This approach can show which functions spend the most time on the CPU, but it may miss some short but frequent function calls. Callgrind, on the other hand, is a simulation-based profiler that records all function calls and instruction reads of a program, which provides more detailed information but also incurs much greater performance overhead.
  2. Available information: perf can collect various types of performance events, including CPU cycles, cache hits/misses, branch mispredictions, and more. This can help you understand how your program behaves at the hardware level. Callgrind, on the other hand, focuses on the behavior of the program itself, such as function calls and instruction fetches. It can also provide some information about memory usage and data flow.
  3. Ease of use and portability: perf is part of Linux and is available on all Linux systems, but not on other operating systems. Valgrind, on the other hand, is a standalone tool that runs on a variety of operating systems, including Linux, macOS, and Windows (via Cygwin or WSL).

Overall, perf and Callgrind are both powerful tools that provide different types of profiling information. Which one to choose depends on your specific needs. In some cases, you may find that using both tools together gives the most comprehensive performance analysis.

gprof and Valgrind

gprof and Valgrind are two different performance-analysis tools with some important differences in their design and use.

  1. Data collection mode: gprof is a sampling-based profiler that works by periodically interrupting program execution and recording the currently executing function. This approach can show which functions spend the most time on the CPU, but it may miss some short but frequent function calls. Valgrind's Callgrind tool, on the other hand, is a simulation-based profiler that records all function calls and instruction reads of a program, which provides more detailed information but also incurs much greater performance overhead.
  2. Available information: gprof provides the number of calls and the cumulative execution time of each function, which helps you understand which functions take up the most time. Callgrind, on the other hand, provides more detailed information, including the number of instruction fetches, cache hits/misses, branch mispredictions, and so on for each function. This can give you a deeper understanding of your program's behavior and performance bottlenecks.
  3. Ease of use and portability: gprof is part of GNU binutils, which is available on all systems that support the GNU toolchain. However, in order to use gprof, you need to compile your program with the -pg option, which may make your program run more slowly. Valgrind, on the other hand, is a standalone tool that runs on a variety of operating systems, including Linux, macOS, and Windows (via Cygwin or WSL). Using Valgrind does not require changing your compile options, but its performance overhead is usually greater than that of gprof.

Overall, gprof and Valgrind are both powerful tools that provide different types of profiling information. Which one to choose depends on your specific needs. In some cases, you may find that using both tools together gives the most comprehensive performance analysis.

comparison chart

  • Callgrind (Valgrind)
    Data collection method: simulation-based profiler that records all function calls and instruction reads.
    Available information: detailed per-function data, including instruction fetch counts, cache hits/misses, branch mispredictions, and more.
    Ease of use and portability: runs on multiple operating systems, including Linux, macOS, and Windows (via Cygwin or WSL).
    Platforms: Linux, macOS, Windows (Cygwin, WSL).
    Required compile options: none.
  • gprof
    Data collection method: sampling-based profiler that periodically interrupts program execution and records the currently executing function.
    Available information: number of calls and cumulative execution time of each function.
    Ease of use and portability: part of GNU binutils; programs must be compiled with the -pg option.
    Platforms: all systems that support the GNU toolchain.
    Required compile options: -pg.
  • perf
    Data collection method: event-based sampling profiler that periodically checks the state of the system and records the currently executing functions.
    Available information: various types of performance events, including CPU cycles, cache hits/misses, branch mispredictions, etc.
    Ease of use and portability: part of Linux; only available on Linux systems.
    Platforms: Linux.
    Required compile options: none.
  • Intel VTune
    Data collection method: event-based sampling profiler that periodically checks the state of the system and records the currently executing functions.
    Available information: detailed performance data, including function execution time, CPU cache hit rate, branch prediction accuracy, etc.
    Ease of use and portability: a commercial Intel product; only available on systems with Intel CPUs.
    Platforms: Windows, Linux, macOS.
    Required compile options: none.
  • Visual Studio Profiler
    Data collection method: event-based sampling profiler that periodically checks the state of the system and records the currently executing functions.
    Available information: detailed performance data, including function execution time, CPU usage, memory usage, etc.
    Ease of use and portability: part of Visual Studio; only available on Windows.
    Platforms: Windows.
    Required compile options: none.
  • Instruments (macOS)
    Data collection method: event-based sampling profiler that periodically checks the state of the system and records the currently executing functions.
    Available information: detailed performance data, including function execution time, CPU usage, memory usage, etc.
    Ease of use and portability: part of Xcode; only available on macOS.
    Platforms: macOS.
    Required compile options: none.

Precautions

End the process gracefully

Do not use kill -9

Killing a running Valgrind process with kill -9 will affect the detection results. kill -9 terminates the process immediately without giving it a chance to do any cleanup. In the case of Valgrind, this means it may not be able to generate a full report, since reports are normally produced when the instrumented program ends normally.

If you need to stop Valgrind while it is running, your best bet is to send a TERM signal with the kill command (without the -9 option). This asks the process to terminate gracefully, so Valgrind should be able to generate its report and then exit.

kill <valgrind_pid>

However, if the Valgrind process fails to respond to the TERM signal for some reason, you may have to fall back to kill -9. Note that this may cause Valgrind to fail to generate a report, or the generated report may be incomplete.

In general, you should avoid forcibly killing Valgrind while it is running, so that you get complete and accurate reports. If you do need to stop it, use the kill command (without the -9 option) to terminate it gracefully.
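
If your service needs to clean up before exiting, a conventional SIGTERM handler (a minimal sketch with hypothetical names) lets a plain kill end it gracefully so Valgrind can still write a complete report:

/* graceful.c -- hypothetical service skeleton that exits cleanly on SIGTERM,
 * so "kill <pid>" lets Valgrind finish and write its report */
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t stop = 0;

static void on_term(int sig) {
    (void)sig;
    stop = 1;                     /* only set a flag; do real cleanup in main() */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigaction(SIGTERM, &sa, NULL);

    char *state = malloc(1024);   /* resources owned by the service */

    while (!stop) {
        /* ... service work ... */
        sleep(1);
    }

    free(state);                  /* clean up, then return normally so    */
    return 0;                     /* Valgrind can produce its full report */
}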

Ctrl+Z does not terminate Valgrind

Pressing Ctrl+Z on the command line sends a SIGTSTP signal to the current foreground process (Valgrind in this case). This suspends the process but does not terminate it. You can use the fg command to resume a suspended process.

In the case of Valgrind, using Ctrl+Z does not directly affect its detection results, since it only suspends the process rather than terminating it. However, if you modify the state of the instrumented program while Valgrind is paused, or perform other operations that affect the instrumentation, this may affect Valgrind's reports.

In general, if you just want to temporarily stop Valgrind and resume later, using Ctrl+Z is fine. However, you should avoid doing anything that might affect the detection results while Valgrind is paused.

Send the TERM signal with kill or killall in their default form

  • The kill command sends a signal to the specified process ID. For example, kill 12345 sends the TERM signal to process ID 12345.

  • The killall command sends a signal to all processes whose names match. For example, killall myprogram sends the TERM signal to all processes named "myprogram".

    If no signal is specified, both kill and killall default to sending the TERM signal, which requests that the process terminate gracefully. You can use the -s option to specify a different signal, e.g., kill -s HUP 12345 or killall -s HUP myprogram.

tools cannot be used at the same time

The different tools of Valgrind (such as Memcheck, Callgrind, Massif, etc.) cannot run at the same time. Every time you run Valgrind, you must choose a tool to use. This is because each tool has its own specific goals and methods, and they cannot be applied to the same running instance of the program at the same time.

For example, if you want to perform memory leak detection, you should use the Memcheck tool:

valgrind --tool=memcheck --leak-check=yes your_program [your_program_arguments]

If you want to perform profiling, you should use the Callgrind tool:

valgrind --tool=callgrind your_program [your_program_arguments]

If you want to do memory leak detection and profiling at the same time, you need to run Valgrind twice, once with Memcheck and once with Callgrind.

Note that while this may take more time, this ensures that the results you get are accurate and reliable. If you try to run both tools at the same time, it may cause confusing and inaccurate results.

Epilogue

Understanding these tools is an important step toward the next level in our programming journey. However, mastering new skills and ideas always takes time and persistence. From a psychological point of view, learning is often accompanied by continuous trial and error and adjustment, which is like our brain gradually optimizing its "algorithm" for solving problems.

That's why, when we encounter mistakes, we should see them as opportunities to learn and improve, not just frustrations. By understanding and solving these problems, we not only fix the current code, but also improve our programming ability and prevent the same mistakes in future projects.

I encourage everyone to actively participate and continuously improve their programming skills. Whether you are a beginner or an experienced developer, I hope my blog can help you in your learning journey. If you find this article useful, please click to bookmark it, or leave your comments to share your insights and experiences. You are also welcome to make suggestions and questions about the content of my blog. Every like, comment, share and follow is the greatest support for me and the motivation for me to continue to share and create.


Read my CSDN homepage to unlock more exciting content: Bubble's CSDN homepage


Origin blog.csdn.net/qq_21438461/article/details/131668820