Linux: memory leak detection and performance analysis with the Valgrind tools

Preface

Memory leaks are a problem we often run into when debugging programs, and many tools exist for analyzing them. This article focuses on how to use the Valgrind tools.

Introduction and installation of Valgrind

Valgrind can be downloaded from the official website: http://valgrind.org/downloads/current.html#current
At the time of writing the latest version is Valgrind 3.15.0. Downloads from the official site can be very slow, so it is easier to install the tool directly from your distribution's package mirror, as follows:

sudo apt install valgrind

Run a simple command under valgrind to verify that the installation succeeded.

valgrind ls -l


Use of Valgrind

The Valgrind toolkit contains multiple tools, such as Memcheck, Cachegrind, Helgrind, Callgrind, Massif.

Memcheck

Memcheck is the most commonly used tool; it detects memory problems in a program. Every read and write of memory is checked, and every call to malloc()/free()/new/delete is intercepted. Memcheck can therefore detect the following errors:
(1) Use of uninitialised memory
(2) Reading or writing memory after it has been freed
(3) Reading or writing past the end of a malloc'd block
(4) Reading or writing inappropriate areas on the stack
(5) Memory leaks, where pointers to malloc'd blocks are lost forever
(6) Mismatched use of malloc/new/new[] versus free/delete/delete[]
(7) Overlapping src and dst pointers in memcpy() and related functions
These are among the most troublesome problems for C/C++ programmers, and Memcheck is designed to help with them.
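The sample program later in this article never actually leaks, so here is a minimal sketch of error class (5). The function names copy_string, leaky_length and clean_length are hypothetical and exist only for this illustration. If a program that calls leaky_length is run under valgrind --leak-check=full, the copied block should be reported as definitely lost, while clean_length should leave nothing behind.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s.
 * The caller owns the result and must free() it. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Leaky variant: the only pointer to the block is dropped on return. */
size_t leaky_length(const char *s)
{
    char *copy = copy_string(s);
    if (copy == NULL)
        return 0;
    return strlen(copy);        /* copy is never freed */
}

/* Fixed variant: the block is released before the pointer goes away. */
size_t clean_length(const char *s)
{
    char *copy = copy_string(s);
    if (copy == NULL)
        return 0;
    size_t len = strlen(copy);
    free(copy);
    return len;
}
```

Both functions return the same value; the difference is only visible to a tool such as Memcheck.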

Callgrind

Callgrind is a profiling tool similar to gprof, but it observes the running program at a finer granularity and provides more information. Unlike gprof, it needs no special options when compiling the source code, although compiling with debugging information is recommended. Callgrind collects data while the program runs, builds the function call graph, and can optionally simulate the cache. At the end of the run it writes the profile data to a file, which callgrind_annotate converts into a readable form.

Cachegrind

Cachegrind is a cache profiler. It simulates the first-level caches (I1 and D1) and the second-level cache of the CPU, so it can pinpoint cache misses and hits in the program. If needed, it can also report the number of cache misses, memory references and instructions executed for each line of code, each function, each module, and the whole program. This is a great help when optimizing a program.

Helgrind

Helgrind is mainly used to find data races in multithreaded programs. It looks for memory locations that are accessed by multiple threads without consistent locking. These locations are often where threads fall out of synchronization, leading to errors that are hard to track down. Helgrind implements a race detection algorithm called "Eraser", with further improvements to reduce the number of false reports. Helgrind was still considered experimental when this article was written.

Massif

Massif is a heap profiler. It measures how much heap memory the program uses and reports the size of heap blocks, heap administration blocks, and the stack. Massif can help us reduce memory usage. On modern systems with virtual memory, that can also speed up our programs and reduce the chance that they end up in swap.

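Usage: valgrind [options] prog-and-args
[options]: common options, applicable to all Valgrind tools

--tool=<name> The most commonly used option. Runs the Valgrind tool called <name>. The default is memcheck.

    memcheck ------> The most widely used Valgrind tool: a heavyweight memory checker that finds the vast majority of memory misuse during development, such as use of uninitialised memory, use of freed memory, and out-of-bounds accesses.

    callgrind ------> Mainly used to analyze problems in the program's function calls.

    cachegrind ------> Mainly used to find cache usage problems in the program.

    helgrind ------> Mainly used to find data races in multithreaded programs.

    massif ------> Mainly used to find heap and stack usage problems in the program.

    extension ------> You can use the functionality provided by the core to write your own memory debugging tool.

-h --help Show help information.
--version Show the version of the Valgrind core; each tool has its own version.
-q --quiet Run quietly, printing only error messages.
-v --verbose Print more detailed information, including error counts.
--trace-children=no|yes Follow child processes? [no]
--track-fds=no|yes Track open file descriptors? [no]
--time-stamp=no|yes Add timestamps to log messages? [no]
--log-fd=<number> Write log output to the given file descriptor [2=stderr]
--log-file=<file> Write output to the file filename.PID, where PID is the process ID of the running program
--log-file-exactly=<file> Write log output to exactly <file>
--log-file-qualifier=<VAR> Use the value of the environment variable VAR as the output file name. [none]
--log-socket=ipaddr:port Write log output to the socket at ipaddr:port

Note: the --log-file-exactly and --log-file-qualifier options, and the filename.PID behaviour of --log-file, come from older Valgrind releases. Check valgrind --help for the options your installed version supports.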
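To make the --track-fds option more concrete, here is a minimal sketch of a descriptor leak it is meant to reveal. The function names open_without_close and open_and_close are hypothetical. When a program calls the first function and exits, running it with --track-fds=yes should list the descriptor as still open at exit; the second variant closes it and should not appear.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical leaky helper: opens a file and never closes it.
 * The descriptor stays open until the process exits. */
int open_without_close(const char *path)
{
    int fd = open(path, O_RDONLY);
    return fd;                  /* callers that ignore fd leak it */
}

/* Fixed variant: the descriptor is closed before returning.
 * Returns 0 on success and -1 on failure. */
int open_and_close(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    return close(fd);
}
```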

Example test

Write an example program (test.c):

#include <stdlib.h>
#include <malloc.h>
#include <string.h>

void test()
{
        int *ptr = malloc( sizeof(int)* 10);
        ptr[10] =100;// out-of-bounds write

        memcpy(ptr+1,ptr,5); // memory stomp: source and destination regions overlap

        free(ptr);
        free(ptr);// double free

        int *p1;
        *p1 =10;// write through an invalid (uninitialised) pointer


}

int main(int argc,char** argv)
{
        test();
        return 0;
}

gcc -g  -o test -fno-inline   test.c

The -g option keeps debugging information and -fno-inline disables function inlining; without them valgrind cannot show the line numbers of the errors below.
First run the program directly: ./test
It aborts with the message "double free or corruption" and dumps core.
Now analyze it with the valgrind tool.

valgrind --tool=memcheck --leak-check=full --show-reachable=yes --trace-children=yes ./test

--leak-check=full reports each memory leak in detail,

--show-reachable=yes also reports blocks that are still reachable at exit,

--trace-children=yes follows child processes.
Output:

root@ubuntu:~# valgrind --tool=memcheck --leak-check=full --show-reachable=yes --trace-children=yes ./test
==43170== Memcheck, a memory error detector
==43170== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==43170== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==43170== Command: ./test
==43170== 
==43170== Invalid write of size 4  // invalid write: out-of-bounds access
==43170==    at 0x4005F4: test (test.c:8)
==43170==    by 0x400653: main (test.c:23)
==43170==  Address 0x5204068 is 0 bytes after a block of size 40 alloc'd
==43170==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==43170==    by 0x4005E7: test (test.c:7)
==43170==    by 0x400653: main (test.c:23)
==43170== 
==43170== Source and destination overlap in memcpy(0x5204044, 0x5204040, 5)// overlapping memory regions
==43170==    at 0x4C32513: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==43170==    by 0x400615: test (test.c:10)
==43170==    by 0x400653: main (test.c:23)
==43170== 
==43170== Invalid free() / delete / delete[] / realloc()// double free
==43170==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==43170==    by 0x40062D: test (test.c:13)
==43170==    by 0x400653: main (test.c:23)
==43170==  Address 0x5204040 is 0 bytes inside a block of size 40 free'd
==43170==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==43170==    by 0x400621: test (test.c:12)
==43170==    by 0x400653: main (test.c:23)
==43170==  Block was alloc'd at
==43170==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==43170==    by 0x4005E7: test (test.c:7)
==43170==    by 0x400653: main (test.c:23)
==43170== 
==43170== Use of uninitialised value of size 8 // use of an uninitialised, invalid pointer
==43170==    at 0x400632: test (test.c:16)
==43170==    by 0x400653: main (test.c:23)
==43170== 
==43170== Invalid write of size 4
==43170==    at 0x400632: test (test.c:16)
==43170==    by 0x400653: main (test.c:23)
==43170==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==43170== 
==43170== 
==43170== Process terminating with default action of signal 11 (SIGSEGV)// the program crashes because of the write through the invalid pointer
==43170==  Access not within mapped region at address 0x0
==43170==    at 0x400632: test (test.c:16)
==43170==    by 0x400653: main (test.c:23)
==43170==  If you believe this happened as a result of a stack
==43170==  overflow in your program's main thread (unlikely but
==43170==  possible), you can try to increase the size of the
==43170==  main thread stack using the --main-stacksize= flag.
==43170==  The main thread stack size used in this run was 8388608.
==43170== 
==43170== HEAP SUMMARY:
==43170==     in use at exit: 0 bytes in 0 blocks
==43170==   total heap usage: 1 allocs, 2 frees, 40 bytes allocated
==43170== 
==43170== All heap blocks were freed -- no leaks are possible
==43170== 
==43170== For counts of detected and suppressed errors, rerun with: -v
==43170== Use --track-origins=yes to see where uninitialised values come from
==43170== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0) // 5 errors in total
Segmentation fault (core dumped)
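
For comparison, a corrected counterpart of test() might look like the sketch below. It is an assumption about what the code was meant to do, not the original author's fix, and the name test_fixed is hypothetical. Each change addresses one of the reports above: the index stays in bounds, memmove replaces memcpy for the overlapping copy, the block is freed once, and the pointer is given valid storage before it is written through.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical corrected version of test().
 * Returns the value written through p1, or -1 if allocation fails. */
int test_fixed(void)
{
    int *ptr = calloc(10, sizeof(int));  /* 10 zero-initialised ints */
    if (ptr == NULL)
        return -1;

    ptr[9] = 100;               /* last valid index is 9, not 10 */

    memmove(ptr + 1, ptr, 5);   /* memmove permits overlapping regions */

    free(ptr);                  /* freed exactly once */
    ptr = NULL;                 /* free(NULL) would now be a no-op */

    int value = 0;
    int *p1 = &value;           /* point at real storage before writing */
    *p1 = 10;
    return value;
}
```

Run under the same valgrind command, a program that only calls test_fixed should finish without any of the five errors listed above.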

Using Callgrind:
As described above, Callgrind needs no special compile options (debugging information is still recommended). It collects data while the program runs, builds the function call graph, and writes the profile to a file at the end of the run. callgrind_annotate converts that file into a readable form.
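
A minimal sketch of code worth profiling with Callgrind is shown below. The names sum_to, cheap_path and expensive_path are hypothetical. The idea is that one call path does about a thousand times more work than the other, so if these functions are called from main, built with -g and without aggressive optimisation, and run under valgrind --tool=callgrind, the report from callgrind_annotate should attribute most of the instruction count to expensive_path and its call to sum_to.

```c
/* Sums 1..n with a plain loop, so the cost grows linearly with n. */
unsigned long long sum_to(unsigned long long n)
{
    unsigned long long total = 0;
    for (unsigned long long i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Cheap call path: about a thousand loop iterations. */
unsigned long long cheap_path(void)
{
    return sum_to(1000ULL);
}

/* Expensive call path: about a million loop iterations. */
unsigned long long expensive_path(void)
{
    return sum_to(1000000ULL);
}
```

Callgrind writes its data to a file named callgrind.out.<pid> in the current directory; pass that file to callgrind_annotate.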

Example of using Cachegrind:

#include <stdio.h>
#include <unistd.h>   /* sleep() */
void test()
{
    sleep(1);
}
void f()
{
    int i;
    for( i = 0; i < 5; i ++)
        test();
}
int main()
{
    f();
    printf("process is over!\n");
    return 0;
}

To use it, run: valgrind --tool=cachegrind ./test
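
The sleep-based program above spends its time blocked rather than touching memory, so it produces little of interest in a cache profile. The sketch below is a hypothetical alternative: the names grid, fill_grid, sum_row_major and sum_col_major are not from the original article. Both sum functions compute the same result, but the column-major loop strides through memory and would typically show more D1 read misses under Cachegrind than the row-major loop.

```c
#define N 512

static int grid[N][N];

/* Fills the grid with a simple, predictable pattern. */
void fill_grid(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            grid[i][j] = i + j;
}

/* Walks memory in the order it is laid out: consecutive addresses. */
long sum_row_major(void)
{
    long total = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += grid[i][j];
    return total;
}

/* Walks column by column: each access jumps N ints ahead in memory. */
long sum_col_major(void)
{
    long total = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            total += grid[i][j];
    return total;
}
```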

Using Helgrind:
As described above, Helgrind checks multithreaded programs for data races by looking for memory locations that several threads access without consistent locking.

Let's start with an example of a data race:

#include <stdio.h>
#include <pthread.h>
#define NLOOP 50
int counter = 0; /* incremented by threads */
void *threadfn(void *);

int main(int argc, char **argv)
{
    pthread_t tid1, tid2, tid3;

    pthread_create(&tid1, NULL, &threadfn, NULL);
    pthread_create(&tid2, NULL, &threadfn, NULL);
    pthread_create(&tid3, NULL, &threadfn, NULL);

    /* wait for all three threads to terminate */
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    pthread_join(tid3, NULL);

    return 0;
}

void *threadfn(void *vptr)
{
    int i, val;
    for (i = 0; i < NLOOP; i++) {
        val = counter;      /* unsynchronized read */
        printf("%x: %d \n", (unsigned int)pthread_self(), val + 1);
        counter = val + 1;  /* unsynchronized write */
    }
    return NULL;
}

The race condition in this program is in the loop body of threadfn, where counter is read into val and later written back as val + 1. The effect we want is for each of the three threads to increment the global variable 50 times, so that its final value is 150. Because there is no lock, the race condition can prevent the program from reaching that result. Let's see how Helgrind helps us detect the race.
First compile the program: gcc -o test thread.c -lpthread
Then run: valgrind --tool=helgrind ./test
The output is as follows:

49c0b70: 1
49c0b70: 2

==4666== Thread #3 was created
==4666==    at 0x412E9D8: clone (clone.S:111)
==4666==    by 0x40494B5: pthread_create@@GLIBC_2.1 (createthread.c:256)
==4666==    by 0x4026E2D: pthread_create_WRK (hg_intercepts.c:257)
==4666==    by 0x4026F8B: pthread_create@ (hg_intercepts.c:288)
==4666==    by 0x8048524: main (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==
==4666== Thread #2 was created
==4666==    at 0x412E9D8: clone (clone.S:111)
==4666==    by 0x40494B5: pthread_create@@GLIBC_2.1 (createthread.c:256)
==4666==    by 0x4026E2D: pthread_create_WRK (hg_intercepts.c:257)
==4666==    by 0x4026F8B: pthread_create@ (hg_intercepts.c:288)
==4666==    by 0x8048500: main (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==
==4666== Possible data race during read of size 4 at 0x804a028 by thread #3
==4666==    at 0x804859C: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==    by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666==    by 0x4048E98: start_thread (pthread_create.c:304)
==4666==    by 0x412E9ED: clone (clone.S:130)
==4666==  This conflicts with a previous write of size 4 by thread #2
==4666==    at 0x80485CA: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==    by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666==    by 0x4048E98: start_thread (pthread_create.c:304)
==4666==    by 0x412E9ED: clone (clone.S:130)
==4666==
==4666== Possible data race during write of size 4 at 0x804a028 by thread #2
==4666==    at 0x80485CA: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==    by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666==    by 0x4048E98: start_thread (pthread_create.c:304)
==4666==    by 0x412E9ED: clone (clone.S:130)
==4666==  This conflicts with a previous read of size 4 by thread #3
==4666==    at 0x804859C: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==    by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666==    by 0x4048E98: start_thread (pthread_create.c:304)
==4666==    by 0x412E9ED: clone (clone.S:130)
==4666==
49c0b70: 3
…
55c1b70: 51
==4666==
==4666== For counts of detected and suppressed errors, rerun with: -v
==4666== Use --history-level=approx or =none to gain increased speed, at
==4666== the cost of reduced accuracy of conflicting-access information
==4666== ERROR SUMMARY: 8 errors from 2 contexts (suppressed: 99 from 31)

Helgrind successfully locates the race: the two "Possible data race" reports point at the conflicting read and write of the same address in threadfn.
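
One common way to remove a race like this is to guard the shared counter with a mutex. The sketch below is an assumption about how the example could be fixed, not part of the original program; the names counter_lock, locked_threadfn and run_locked_counter are hypothetical. The read and the write of the counter now happen while the lock is held, so the three threads can no longer interleave inside the update.

```c
#include <pthread.h>

#define NLOOP 50
#define NTHREADS 3

static int counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: every update of counter happens under counter_lock. */
static void *locked_threadfn(void *arg)
{
    (void)arg;
    for (int i = 0; i < NLOOP; i++) {
        pthread_mutex_lock(&counter_lock);
        counter = counter + 1;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the threads, waits for them, and returns the final counter. */
int run_locked_counter(void)
{
    pthread_t tids[NTHREADS];
    int started = 0;

    counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tids[i], NULL, locked_threadfn, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return counter;  /* safe to read: all started threads have been joined */
}
```

Built with -lpthread and run under valgrind --tool=helgrind, a program calling run_locked_counter is expected to produce no data race report for counter.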

Massif

Massif profiles memory allocation and release. Developers can use it to understand a program's memory usage in depth and to optimize it. This is especially useful for C++, which performs many hidden allocations and deallocations.
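
To see what Massif measures, it helps to have a program whose heap usage rises and then falls. The sketch below is a hypothetical example; the name grow_then_release and its parameters are not from the original article. Called from main and run under valgrind --tool=massif, it should produce a profile with a visible peak, which ms_print can render from the massif.out.<pid> file that Massif writes.

```c
#include <stdlib.h>

/* Allocates nblocks blocks of block_size bytes, then frees them all.
 * Returns the number of payload bytes that were live at the peak. */
size_t grow_then_release(size_t nblocks, size_t block_size)
{
    void **blocks = malloc(nblocks * sizeof *blocks);
    size_t allocated = 0;
    size_t peak = 0;

    if (blocks == NULL)
        return 0;

    /* Growth phase: heap usage climbs with every iteration. */
    for (size_t i = 0; i < nblocks; i++) {
        blocks[i] = malloc(block_size);
        if (blocks[i] == NULL)
            break;
        allocated++;
        peak += block_size;
    }

    /* Release phase: heap usage falls back down. */
    for (size_t i = 0; i < allocated; i++)
        free(blocks[i]);
    free(blocks);

    return peak;
}
```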

Valgrind also provides Lackey and Nulgrind. Lackey is a small, rarely used tool; Nulgrind only shows developers how to create a tool. Neither is covered here.

Memory detection principle

[Figure: memory detection principle]

Quoted from: https://www.cnblogs.com/AndyStudy/p/6409287.html
https://www.linuxidc.com/Linux/2012-06/63754.htm