Memory leaks? Two go-to analysis methods and tools from a Tencent engineer

Introduction | Memory leaks are a recurring headache for developers, and traditional analysis tools such as gdb and Valgrind are inefficient at tracking them down. In this article, Xing Mengbang, a backend development engineer at Tencent, takes a mysql-proxy memory leak from a real TDSQL production environment as the case study and shares a general memory leak (growth) analysis method based on dynamic tracing. The article covers two techniques in detail, memory allocator behavior analysis and page fault event analysis, which together cover the common path of application memory allocation. With these methods, developers only need to examine a handful of code paths that may be leaking, which makes locating memory leak (growth) problems much more efficient.

Background

In a private deployment of TDSQL, the memory usage of the middleware mysql-proxy kept growing while it forwarded a large number of requests. This eventually triggered OOM and disrupted normal use of the user's services. While analyzing the problem, I ran into a common pain point: traditional analysis tools (gdb, Valgrind, etc.) are relatively inefficient, especially in private deployment scenarios. To address this, I describe a fairly general memory leak (growth) analysis method that helps developers locate the leaking code path more efficiently, reducing both the engineering effort and the impact on users.

Basic concepts

Before we expand on the memory leak (growth) analysis method, let's first understand some related basic concepts.

Memory leaks include kernel memory leaks and application memory leaks. Kernel memory leaks can be detected with kmemleak. This article focuses on application memory leaks, which can be subdivided into heap leaks and memory mapping area (memory mappings) leaks. When we talk about a memory leak, we usually mean a leak of physical memory: memory that is continually allocated and mapped to actual physical pages but never released. Such leaks are very harmful and need to be fixed immediately.

Leaks of virtual memory (virtual memory that is continually allocated but never backed by physical pages) are easy to overlook. They are less harmful, but still deserve attention because the number of memory mapping areas per process has an upper limit (set by vm.max_map_count, which defaults to 65530 on Linux).

The steps involved in application memory allocation are roughly as shown in the figure below. First, the application requests memory through malloc and its variants provided by the memory allocator (such as libc), and releases it through free. Second, the memory allocator grows the heap internally through the brk system call (used for small allocations). Third, the memory allocator creates memory mapping areas internally through the mmap system call (used for large allocations, by default those of at least 128 KB). Fourth, when the virtual memory obtained in the second or third step is written for the first time, a page fault exception is raised; the OS allocates actual physical pages, associates the virtual memory with them, and records the association in the page table.

Steps 1 to 3 deal only with virtual memory. Step 4 is where physical memory is actually allocated and the corresponding page table entries are created.
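The gap between steps 1 to 3 and step 4 can be observed directly. The following is a minimal Python sketch, assuming a Linux system with /proc mounted; the helper names pages_for, rss_pages and demo are illustrative and not part of any tool described in this article. It creates an anonymous private mapping (the same mechanism an allocator uses for large blocks) and reads the process's resident page count before mapping, after mapping, and after the first write to every page.

```python
import mmap
import os


def pages_for(nbytes, page_size):
    """Number of pages needed to back nbytes (rounded up)."""
    return (nbytes + page_size - 1) // page_size


def rss_pages():
    """Resident set size of this process in pages (Linux only).

    The second field of /proc/self/statm is the resident page count.
    """
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])


def demo(nbytes=64 * 1024 * 1024):
    """Return RSS (in pages) before mapping, after mapping, after touching."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    before = rss_pages()
    # Step 3: an anonymous private mapping reserves virtual memory only.
    region = mmap.mmap(-1, nbytes, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_map = rss_pages()
    # Step 4: the first write to each page raises a page fault, and the
    # kernel backs that page with physical memory.
    for offset in range(0, nbytes, page_size):
        region[offset] = 1
    after_touch = rss_pages()
    region.close()
    return before, after_map, after_touch


if __name__ == "__main__":
    before, after_map, after_touch = demo()
    print("RSS pages before mmap:   ", before)
    print("RSS pages after mmap:    ", after_map)
    print("RSS pages after touching:", after_touch)
```

On a typical system the second number should stay close to the first, while the third should grow by roughly the mapping size divided by the page size. Exact values depend on the kernel configuration (for example, transparent huge pages).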

Traditional analysis tools: gdb and Valgrind

While locating the mysql-proxy memory leak (growth) problem, the developers tried both Valgrind Memcheck and gdb. Valgrind Memcheck did not work well in practice. gdb eventually revealed the cause of the leak, but the whole process took a lot of time.

gdb is a widely used debugger whose strengths need no introduction. For memory leak or growth problems, however, its shortcomings are obvious: it interferes with the normal operation of the program, which makes it unsuitable for production environments, and it rarely points straight at the problem, so a fair understanding of the source code is required.

Valgrind Memcheck is a well-known memory leak analysis tool. It is very powerful and can quickly find leaks during development and debugging. Before using it, however, developers should be aware of the following. First, the program must be restarted and run as a child process of Valgrind, so it is not suitable for analyzing a process whose memory is already growing. Second, it replaces the default allocation functions such as malloc/free and slows the target process down by a factor of 20 to 30. Third, it does not support the tcmalloc and jemalloc memory allocators well (mysql-proxy uses jemalloc).

A general analysis method based on dynamic tracing

For an application that is already running with steadily growing memory, gdb and Valgrind Memcheck are of limited use. Dynamic tracing, by contrast, offers a general and easy-to-use approach. Calls to memory allocator functions, system calls, page fault exceptions and so on can all be treated as events. By tracing and aggregating these events, we can identify the specific code paths responsible for memory usage and quickly narrow down where the leak is, without first digging into the details of the source code.

This article covers two general analysis methods based on dynamic tracing: memory allocator behavior analysis and page fault event analysis. Together they cover the common path of application memory allocation.

1) Memory allocator behavior analysis

The overall idea of memory allocator (glibc, jemalloc, etc.) behavior analysis is as follows. First, it takes the application's point of view and focuses on the code paths through which the application allocates memory. Second, it dynamically traces the memory allocation functions and aggregates, per call stack, the total bytes of allocations that have not been released. The resulting analysis tool is memstacks.

  • New tool: memstacks

The tool can produce data for two kinds of flame graph. In the first mode, it traces only malloc and its variants, without offsetting them against free; the result is a flame graph of all memory allocations. In the second mode, it traces malloc and its variants together with free and computes the allocations that remain unreleased during the tracing period; the result is a flame graph of unreleased memory allocations.

The implementation is roughly as follows. It borrows from the existing BCC tools memleak and mallocstacks and supports emitting folded stacks, from which both the full memory allocation flame graph and the unreleased memory allocation flame graph can be generated. malloc (and its variants calloc and realloc) and free are traced dynamically with uprobes.

As shown in the figure above, the existing BCC tools memleak and mallocstacks each have their own advantages and disadvantages. The new tool memstacks combines the strengths of both: it can emit the folded stack format needed for either the full memory allocation flame graph or the unreleased memory allocation flame graph.
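The difference between the two modes comes down to how allocation and free events are accounted for. The following Python sketch illustrates only that accounting, replaying a recorded list of events; it is not the memstacks implementation, which traces a live process with uprobes. The function names, the event layout and the frame names in the usage example are all hypothetical.

```python
from collections import defaultdict


def outstanding_by_stack(events, offset_frees=True):
    """Sum allocated bytes per call stack.

    events: iterable of ("alloc", address, size, stack) or ("free", address),
            where stack is a tuple of frame names, outermost first.
    offset_frees: when True, a free cancels the matching allocation
                  (unreleased view); when False, frees are ignored
                  (full allocation view).
    """
    live = {}                   # address -> (size, stack)
    totals = defaultdict(int)   # stack -> bytes
    for event in events:
        if event[0] == "alloc":
            _, address, size, stack = event
            totals[stack] += size
            live[address] = (size, stack)
        elif event[0] == "free" and offset_frees:
            record = live.pop(event[1], None)
            if record is not None:   # only offset allocations we observed
                size, stack = record
                totals[stack] -= size
    return {stack: n for stack, n in totals.items() if n > 0}


def to_folded(totals):
    """Render totals as folded stacks: 'frame1;frame2;... bytes' per line."""
    return [";".join(stack) + " " + str(n) for stack, n in sorted(totals.items())]


if __name__ == "__main__":
    events = [
        ("alloc", 0x1000, 64, ("main", "handle_query", "store_param")),
        ("alloc", 0x2000, 128, ("main", "handle_query", "build_reply")),
        ("free", 0x2000),
        ("alloc", 0x3000, 64, ("main", "handle_query", "store_param")),
        ("free", 0x9000),  # allocated before tracing started: ignored
    ]
    print("\n".join(to_folded(outstanding_by_stack(events))))
```

Each output line has the form that flamegraph.pl consumes: semicolon-separated frames followed by a count. Frees of memory allocated before tracing began are ignored, because there is no recorded stack to attribute them to.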

  • Full memory allocation flame graph

Run the following commands to trace all calls to malloc and its variants in the mysql-proxy process for 60s and generate a flame graph of all memory allocations.

# Step 1. Trace for 60s and generate folded stacks of all memory allocations.
# The -a option traces all calls to malloc and its variants without tracing free to offset them. The -f option emits folded stacks, which step 2 turns into a flame graph.
./memstacks -p $(pgrep -nx mysql-proxy) -af 60 > all_mallocs.stacks

# Step 2. Generate the full memory allocation flame graph and write it to all_mallocs.svg.
./flamegraph.pl --color=mem --title="All malloc() bytes Flame Graph" --countname="bytes" < all_mallocs.stacks > all_mallocs.svg

The resulting flame graph, shown below, helps developers understand the key code paths through which mysql-proxy calls malloc and its variants.

  • Unreleased memory allocation flame graph

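Run the following commands to trace, for 60s, the calls to malloc and its variants in the mysql-proxy process that are not matched by a free, and generate a flame graph of unreleased memory allocations.

# Step 1. Trace for 60s and generate folded stacks of unreleased memory allocations.
# The -f option emits folded stacks, which step 2 turns into a flame graph.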
memstacks -p $(pgrep -nx mysql-proxy) -f 60 > unfreed_mallocs.stacks

# Step 2. Generate the unreleased memory allocation flame graph and write it to unfreed_mallocs.svg.
./flamegraph.pl --color=mem --title="Unfreed malloc() bytes Flame Graph" --countname="bytes" < unfreed_mallocs.stacks > unfreed_mallocs.svg

The flame graph is shown below. Unreleased memory totals 27.75 MB. During the tracing period, pidstat showed the RSS of the mysql-proxy process growing by close to 27 MB, which is broadly consistent with that figure.

Two code paths stand out as allocating memory without releasing it. According to the developers, tdsql::Item_param::set_str is where the mysql-proxy memory leak occurs; the other path turned out not to be a real leak. This points to a limitation of the tool: memory allocated near the end of the tracing period may simply not have been released yet, so the source code still has to be read to screen out such cases. It is also advisable to run the tool several times, compare the results, and exclude allocation paths that vary from run to run.

Expanding the code paths that allocate without releasing gives the following result:

Compared with the full memory allocation flame graph, the amount of data is reduced by nearly 60 times, and far fewer code paths need attention. The unreleased memory allocation flame graph is therefore the recommended one for analysis.
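The advice to compare several runs can be partly automated. The sketch below is one possible way to do it, assuming each run was saved as a folded-stack file in the format consumed by flamegraph.pl; the function names and the stacks in the usage example are hypothetical. It keeps only the stacks that show up as unreleased in every run, which tends to filter out short-lived allocations that merely happened to be outstanding when a trace ended.

```python
def parse_folded(lines):
    """Parse folded-stack lines ('a;b;c 123') into {stack: count}."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; frames may contain spaces.
        stack, _, count = line.rpartition(" ")
        totals[stack] = totals.get(stack, 0) + int(count)
    return totals


def persistent_stacks(runs):
    """Keep stacks that appear in every run, with their per-run counts."""
    if not runs:
        return {}
    common = set(runs[0])
    for run in runs[1:]:
        common &= set(run)
    return {stack: [run[stack] for run in runs] for stack in sorted(common)}


if __name__ == "__main__":
    run1 = parse_folded(["main;worker;store_param 100", "main;worker;cache 50"])
    run2 = parse_folded(["main;worker;store_param 180", "main;worker;tmp 10"])
    for stack, counts in persistent_stacks([run1, run2]).items():
        print(stack, counts)
```

A stack that survives this filter is only a candidate. As noted above, the source code still needs to be checked to confirm whether it is a real leak.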

2) Page fault event analysis

Compared with memory allocator behavior analysis, page fault event analysis offers another perspective. The overall idea is as follows. First, it takes the kernel's point of view and focuses on the code paths that trigger a page fault exception on the first write to memory, rather than on the code paths that request the allocation. The former are what makes the process RSS grow; the latter only allocate virtual memory that is not yet mapped to physical memory. Second, it traces page fault events and aggregates, per call stack, the total number of physical pages that have not been released. The resulting analysis tool is pgfaultstacks.

  • Existing analysis tools

The traditional tool perf, based on the software event page-faults:

perf record -p $(pgrep -nx mysql-proxy) -e page-faults -c 1 -g -- sleep 60

The BCC tool stackcount, based on the static tracepoint exceptions:page_fault_user:

stackcount -p $(pgrep -nx mysql-proxy) -U t:exceptions:page_fault_user

The existing analysis tools are convenient, but they only accumulate counts and do not account for physical memory released during tracing. The final figures are therefore usually inflated, which gets in the way of analyzing memory leaks (growth).

  • Page fault flame graph (existing tools)

Run the following commands to trace all page fault events of the mysql-proxy process for 60s and generate a page fault flame graph. perf record writes its samples to perf.data rather than to standard output, so the samples are converted to folded stacks with perf script and stackcollapse-perf.pl (shipped in the same FlameGraph repository as flamegraph.pl) before rendering.

perf record -p $(pgrep -nx mysql-proxy) -e page-faults -c 1 -g -- sleep 60
perf script | ./stackcollapse-perf.pl > pgfault.stacks

./flamegraph.pl --color=mem --title="Page Fault Flame Graph" --countname="pages" < pgfault.stacks > pgfault.svg

The flame graph is shown below. It records a total of 420,342 page fault events, but not every page fault allocates a new physical page (most do not), and the actual RSS growth of mysql-proxy was only a little over 60 MB.

  • New tool: pgfaultstacks

The implementation of this tool is roughly as follows. First, it improves on the existing way of counting page fault events: it filters out page faults on pages that already have a physical page, and once tracing ends it reads the memory map list of the target process and excludes physical pages that have since been released. This leaves only the physical memory that has really leaked.

Second, it traces page fault events dynamically through a tracepoint or kprobe, so the performance overhead is generally negligible.
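The two filters described above can be modeled offline. The following Python sketch is a simplified illustration and not the pgfaultstacks implementation: it approximates "the page already has a physical page" by counting each faulting page only once, and it treats a page as released when its address no longer falls inside any range of the target's /proc/<pid>/maps listing read after tracing. The function names, the 4 KiB page size and the sample addresses are assumptions.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_maps(maps_text):
    """Parse /proc/<pid>/maps text into a list of (start, end) address ranges."""
    ranges = []
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        start, end = line.split()[0].split("-")
        ranges.append((int(start, 16), int(end, 16)))
    return ranges


def pages_by_stack(faults, ranges, page_size=PAGE_SIZE):
    """Count still-mapped, first-touched pages per call stack.

    faults: iterable of (address, stack) in arrival order.
    ranges: mapped address ranges read after tracing finished.
    """
    seen = set()
    totals = defaultdict(int)
    for address, stack in faults:
        page = address // page_size
        if page in seen:
            continue          # repeated fault on a page already counted
        seen.add(page)
        if any(start <= address < end for start, end in ranges):
            totals[stack] += 1  # page is still mapped: count it
    return dict(totals)


if __name__ == "__main__":
    maps_text = "7f0000000000-7f0000004000 rw-p 00000000 00:00 0\n"
    faults = [
        (0x7F0000000010, "main;fill"),
        (0x7F0000000800, "main;fill"),     # same page as the first fault
        (0x7F0000001000, "main;fill"),
        (0x7F0000100000, "main;scratch"),  # region unmapped before tracing ended
    ]
    print(pages_by_stack(faults, parse_maps(maps_text)))
```

A real tool also has to handle cases this model ignores, such as a region being unmapped and then mapped again at the same address during tracing.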

  • Page fault flame graph (pgfaultstacks)

Run the following commands to trace, for 60s, the page fault events of the mysql-proxy process that pass the filter conditions, and generate a page fault flame graph.

# Step 1. Trace for 60s and generate folded page fault stacks. The -f option emits folded stacks, which step 2 turns into a flame graph.
pgfaultstacks -p $(pgrep -nx mysql-proxy) -f 60 > pgfault.stacks

# Step 2. Generate the page fault flame graph and write it to pgfault.svg.
./flamegraph.pl --color=mem --title="Page Fault Flame Graph" --countname="pages" < pgfault.stacks > pgfault.svg

The page fault flame graph is shown below. A total of 17,801 physical pages were added, which is broadly consistent with the RSS growth of the mysql-proxy process. The function to focus on is g_string_append_printf. (Note: this environment is not the one where the memory leak occurs; it is used only to demonstrate the page fault flame graph.)

Compared with the version produced by the existing tools, the amount of data is reduced by more than 20 times, and far fewer code paths need attention.
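The figures quoted in this section can be cross-checked with simple arithmetic. The sketch below assumes 4 KiB pages, which the article does not state explicitly; the function names are illustrative. It converts the 17,801 pages reported by pgfaultstacks into MiB and compares them with the 420,342 page fault events counted by the existing tools.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the common default on x86-64


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)


def reduction_factor(before, after):
    """How many times smaller 'after' is than 'before'."""
    return before / after


if __name__ == "__main__":
    # 17,801 pages reported by pgfaultstacks
    print(round(pages_to_mib(17801), 2))               # 69.54
    # 420,342 raw page fault events versus 17,801 retained pages
    print(round(reduction_factor(420342, 17801), 1))   # 23.6
```

Under that assumption, 17,801 pages is about 69.5 MiB, in line with an RSS growth of a little over 60 MB, and the ratio of roughly 23.6 matches the reduction of more than 20 times.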

Summary

Taking the mysql-proxy memory leak from a real TDSQL production environment as the case study, this article explored two general memory leak (growth) analysis methods based on dynamic tracing: memory allocator behavior analysis and page fault event analysis. It also improved on the existing analysis tools to produce the corresponding tools memstacks and pgfaultstacks, which developers are welcome to try. Users of these tools only need to examine a handful of code paths that may be leaking, which makes locating memory leak (growth) problems much more efficient. If you are struggling with memory leaks (growth), consider downloading the latest version of OpenCloudOS and trying the analysis methods and tools described in this article.

Origin blog.csdn.net/youzhangjing_/article/details/131722322