Baidu engineers take you to explore C++ memory management

1. Overview

ptmalloc is the default memory allocator of the open-source GNU C Library (glibc). Most Linux server programs currently use the malloc/free family of functions that ptmalloc provides, even though in many multi-threaded workloads it is generally considered slower than Meta's jemalloc and Google's tcmalloc. A server program calls ptmalloc's malloc/free to request and release memory, and ptmalloc manages that memory centrally with two goals:

  • Make requesting and releasing memory efficient for the user, and keep lock contention between threads that allocate concurrently as low as possible

  • Balance memory usage against malloc/free overhead when interacting with the operating system: reduce memory fragmentation and make system calls as rarely as possible

In brief, ptmalloc's memory management strategy is:

  • Request a block of memory from the operating system in advance and hold it to serve user malloc calls, while tracking both used and free memory

  • When the user calls free, take the released memory back under management and apply the management policy to decide whether to return it to the operating system

The rest of this article introduces the implementation and use of this classic memory manager for C/C++ programs in terms of ptmalloc's data structures, memory allocation, and advantages and disadvantages (using a 32-bit machine as the example).

2. Memory management

2.1 Data structure

To reduce lock contention between threads, ptmalloc divides memory allocation between a main allocation area (main_arena) and non-main allocation areas (non_main_arena). To make memory easier to manage, the memory requested in advance is divided into many chunks using the boundary tag method. In the ptmalloc allocator, malloc_chunk is the basic organizational unit and is used to manage the different types of chunk; chunks with similar functions and sizes are linked into a list called a bin.

main_arena and non_main_arena

The main allocation area and the non-main allocation areas are linked into a circular list, and each allocation area uses a mutex so that only one thread accesses it at a time. Each process has exactly one main allocation area but may have several non-main allocation areas, and the number of non-main allocation areas only grows, never shrinks. The main allocation area can use both the heap and the mmap mapping area of the process, that is, it can allocate memory with sbrk() and mmap(); a non-main allocation area can only allocate memory with mmap().

The management strategies for different arenas are roughly as follows:

  • Allocate memory

    • Check whether the thread's private variable already records an allocation area and try to lock it. If the lock succeeds, allocate from that allocation area; if no allocation area is recorded or the lock fails, traverse the circular list to find an unlocked allocation area

    • If no allocation area in the circular list is unlocked, create a new allocation area, add it to the circular list, lock it, and use it to serve the current thread's allocation

  • Free memory

    • First acquire the lock of the allocation area that owns the chunk being released. If another thread is using that allocation area, wait until it releases the mutex before freeing the memory
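The selection order for allocation described above can be sketched roughly as follows. This is a simplified illustration, not glibc's code: the Arena type, its atomic flag standing in for the mutex, and the function acquire_arena are all hypothetical names, and splicing a new arena into the ring is not itself protected against concurrent callers here.

```cpp
#include <atomic>

// Hypothetical, simplified arena: only the lock and the ring link are modeled.
struct Arena {
    std::atomic<bool> locked{false};  // stands in for the arena's mutex
    Arena* next = this;               // circular list; a lone arena points at itself

    // Returns true if the caller obtained the lock, false if it was already held.
    bool try_lock() { return !locked.exchange(true, std::memory_order_acquire); }
    void unlock() { locked.store(false, std::memory_order_release); }
};

// Returns a locked arena, trying in order:
//   1. the arena cached for the calling thread (may be null),
//   2. any unlocked arena found by walking the ring,
//   3. a newly created arena linked into the ring.
Arena* acquire_arena(Arena* cached, Arena& main_arena) {
    if (cached != nullptr && cached->try_lock()) return cached;

    Arena* a = &main_arena;
    do {
        if (a->try_lock()) return a;
        a = a->next;
    } while (a != &main_arena);

    Arena* fresh = new Arena;       // never deleted: the number of arenas only grows
    fresh->try_lock();              // lock it on behalf of the calling thread
    fresh->next = main_arena.next;  // splice into the ring right after the main arena
    main_arena.next = fresh;        // (a real allocator would guard this with a list lock)
    return fresh;
}
```

A thread would keep the returned pointer as its cached arena for the next call and call unlock() once the allocation is done.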

The main and non-main allocation areas share the same structure (struct malloc_state in glibc). Within it, fastbinsY and bins are the structures that actually manage and operate on memory chunks:

  • fastbinsY: used to save fast bins

  • bins[NBINS * 2 - 2]: the unsorted bin (1 bin, bin[1]), small bins (62 bins, bin[2]~bin[63]), and large bins (63 bins, bin[64]~bin[126]), 126 bins in total (NBINS = 128); bin[0] and bin[127] are not used
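As a rough picture of the two arrays listed above, here is a trimmed-down sketch of an allocation area. It is loosely modeled on glibc's struct malloc_state but omits most fields; ArenaSketch, BinKind, and bin_kind are hypothetical names, and the fast bin count of 10 is an assumption. The bins array has NBINS * 2 - 2 entries because each bin is represented by a pair of list-head pointers.

```cpp
#include <cstddef>

struct malloc_chunk;  // chunk header type, left opaque in this sketch

constexpr std::size_t NBINS = 128;     // from the text
constexpr std::size_t NFASTBINS = 10;  // assumed number of fast bins

// Simplified allocation area: only the members discussed in the text.
struct ArenaSketch {
    int mutex_placeholder;               // stands in for the arena's mutex
    malloc_chunk* fastbinsY[NFASTBINS];  // heads of the singly linked fast bins
    malloc_chunk* top;                   // the top chunk
    malloc_chunk* bins[NBINS * 2 - 2];   // two list-head pointers per regular bin
    ArenaSketch* next;                   // next allocation area in the circular list
};

enum class BinKind { Unused, Unsorted, Small, Large };

// Maps a bin number to the category given in the text.
inline BinKind bin_kind(std::size_t i) {
    if (i == 1) return BinKind::Unsorted;
    if (i >= 2 && i <= 63) return BinKind::Small;
    if (i >= 64 && i <= 126) return BinKind::Large;
    return BinKind::Unused;  // bin[0], bin[127], and anything out of range
}
```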

malloc_chunk and bins

ptmalloc manages the free chunks of the heap and of the mmap mapping area in a uniform way. When the user makes an allocation request, ptmalloc first tries to find a suitable free chunk and split it, which avoids frequent system calls and reduces allocation overhead. To make free chunks easier to manage and find, control information is stored before and after the space handed to the user. The members of the management structure malloc_chunk and their functions are:

  • mchunk_prev_size: the size of the previous chunk, valid only when that chunk is free

  • mchunk_size: the size of the current chunk

  • Attribute flags, stored in the low three bits of mchunk_size:

    • P: the previous chunk is in use (P = 1) or free (P = 0)

    • M: the current chunk was allocated from the mmap mapping area (M = 1) or from the heap (M = 0)

    • A: the current chunk belongs to the main allocation area (A = 0) or to a non-main allocation area (A = 1)

  • fd and bk: present only while the chunk is free; they link the free chunk into a free-chunk list for unified management
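The chunk header and its flag bits can be sketched as below. The field names and the three flag constants follow glibc; the struct name ChunkSketch and the small accessor functions are illustrative only. Because chunk sizes are multiples of 8, the low three bits of mchunk_size are free to carry the P, M, and A flags.

```cpp
#include <cstddef>

// Simplified chunk header (large-bin-only fields are omitted).
struct ChunkSketch {
    std::size_t mchunk_prev_size;  // size of the previous chunk, valid only if it is free
    std::size_t mchunk_size;       // size of this chunk; low three bits hold the flags
    ChunkSketch* fd;               // next free chunk (meaningful only while free)
    ChunkSketch* bk;               // previous free chunk (meaningful only while free)
};

constexpr std::size_t PREV_INUSE = 0x1;      // P: previous chunk is in use
constexpr std::size_t IS_MMAPPED = 0x2;      // M: chunk came from mmap
constexpr std::size_t NON_MAIN_ARENA = 0x4;  // A: chunk belongs to a non-main arena
constexpr std::size_t SIZE_BITS = PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA;

// Chunk size with the flag bits masked off.
inline std::size_t chunk_size(const ChunkSketch& c) { return c.mchunk_size & ~SIZE_BITS; }
inline bool prev_inuse(const ChunkSketch& c) { return (c.mchunk_size & PREV_INUSE) != 0; }
inline bool is_mmapped(const ChunkSketch& c) { return (c.mchunk_size & IS_MMAPPED) != 0; }
inline bool in_non_main_arena(const ChunkSketch& c) { return (c.mchunk_size & NON_MAIN_ARENA) != 0; }
```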

Based on chunk size and usage, the bins are divided as follows:

  • fast bins

    Fast bins hold only very small chunks and link them in a singly linked list. Chunks are added and removed only at the head of the list, which further speeds up allocation of small memory. The fast bins are a series of lists whose chunk sizes increase in steps of 8 bytes, and their chunks are generally not merged with neighbouring chunks.

  • unsorted bin

    The unsorted bin acts as a buffer in front of the small bins and large bins to speed up allocation, and places no limit on chunk size. Released chunks that do not go to the fast bins first enter the unsorted bin. When allocating, ptmalloc first checks whether the unsorted bin holds a suitable chunk, and if so splits it and returns it.

  • small bins

    Bins that hold chunks smaller than 512B are called small bins. The chunk sizes of adjacent small bins differ by 8 bytes, all chunks in the same small bin have the same size, and they are linked in a circular doubly linked list.

  • large bins

    Bins that hold chunks of 512B or more are called large bins. Each large bin holds chunks within a given size range, sorted in descending order of size; chunks of the same size are ordered by time.
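Because fast bins and small bins are both spaced 8 bytes apart, a chunk size maps to a bin index with a simple shift. The sketch below assumes the classic 32-bit layout used in this article (smallest chunk 16 bytes, 512B as the small/large boundary); the function names mirror glibc's macros, but the code is an illustration rather than the library's implementation.

```cpp
#include <cstddef>

constexpr std::size_t kMinLargeSize = 512;  // small/large boundary from the text

// Fast bins: 8-byte spacing, and the smallest chunk (16 bytes) maps to index 0.
constexpr std::size_t fastbin_index(std::size_t chunk_size) {
    return (chunk_size >> 3) - 2;
}

// Small bins: 8-byte spacing, so a 16-byte chunk lands in bin[2]
// and a 504-byte chunk in bin[63].
constexpr std::size_t smallbin_index(std::size_t chunk_size) {
    return chunk_size >> 3;
}

// Chunks below 512B belong to the small bins, everything else to the large bins.
constexpr bool in_smallbin_range(std::size_t chunk_size) {
    return chunk_size < kMinLargeSize;
}
```

With these definitions, the 62 small bins bin[2]~bin[63] cover chunk sizes 16, 24, ..., 504.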

Some chunks are not organized in bins at all, for example:

  • top chunk: the free memory at the top of the allocation area. When the bins cannot satisfy an allocation request, ptmalloc tries to allocate from the top chunk.

    • When the top chunk is larger than the requested size, it is split into two parts: the chunk returned to the user (user chunk) and the remainder, which becomes the new top chunk (remainder chunk)

    • When the top chunk is smaller than the requested size, it is extended through the sbrk (main_arena) or mmap (non_main_arena) system call
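The two top chunk cases can be illustrated with a small simulation that tracks only an offset and a size. TopChunk, carve_from_top, and grow_top are hypothetical names; grow_top merely stands in for the sbrk/mmap call, and the rule that the remainder must stay at least one minimum-sized chunk (16 bytes here) is an assumption of this sketch.

```cpp
#include <cstddef>

constexpr std::size_t kMinChunk = 16;  // assumed smallest valid chunk size

// Simulated top chunk: where it starts and how many bytes it spans.
struct TopChunk {
    std::size_t offset;
    std::size_t size;
};

// Splits `nb` bytes off the front of the top chunk. On success, stores the
// start of the user chunk in `user_offset` and leaves the remainder as the
// new top chunk. Returns false if the top chunk is too small.
inline bool carve_from_top(TopChunk& top, std::size_t nb, std::size_t& user_offset) {
    if (top.size < nb + kMinChunk) return false;  // remainder would be too small
    user_offset = top.offset;
    top.offset += nb;
    top.size -= nb;
    return true;
}

// Stand-in for extending the top chunk via sbrk (main arena) or mmap (non-main arena).
inline void grow_top(TopChunk& top, std::size_t bytes) {
    top.size += bytes;
}
```

A caller would try carve_from_top first and, on failure, call grow_top and retry.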

2.2 Memory allocation and release

The malloc and free processes can be summarized roughly as follows:

Memory allocation (malloc) process

1. Acquire the lock of the allocation area

2. Compute the actual chunk size needed for the request

3. If the chunk size < max_fast, look for a suitable chunk in the fast bins; if one exists, the allocation ends, otherwise continue with step 4

4. If the chunk size < 512B, look for a chunk in the small bins; if one exists, the allocation ends

5. A large block is requested, or no chunk was found in the small bins:

   a. Traverse the fast bins, merge adjacent chunks, and link them into the unsorted bin

   b. Traverse the chunks in the unsorted bin:

      - If a chunk can be split and allocated directly, the allocation ends

      - Otherwise put the chunk into the small bins or large bins according to its size. When the traversal is complete, go to step 6

6. A large block is requested, or no suitable chunk was found in the small bins or the unsorted bin, and the fast bins and the unsorted bin have been emptied:

   Search the large bins, traversing the list in reverse until the first chunk larger than the requested size is found; split it, put the remainder into the unsorted bin, and the allocation ends

7. If no suitable chunk is found in the fast bins or the bins, check whether the top chunk is large enough for the required chunk and, if so, allocate from the top chunk

8. If the top chunk cannot satisfy the request, it must be extended:

   a. In the main allocation area, if the requested memory < the allocation threshold (128KB by default), use brk() to allocate; if it is > the threshold, use mmap

   b. In a non-main allocation area, use mmap to allocate a block of memory
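Steps 2 to 5 can be made concrete with a little arithmetic. The sketch below assumes the classic 32-bit parameters (4-byte size field, 8-byte alignment, 16-byte minimum chunk, max_fast of 64 bytes); request2size mirrors the glibc macro of the same name, while FirstStop and first_stop are hypothetical helpers that only report which bins malloc would consult first.

```cpp
#include <cstddef>

constexpr std::size_t SIZE_SZ = 4;            // assumed: 32-bit size field
constexpr std::size_t MALLOC_ALIGN_MASK = 7;  // assumed: 8-byte alignment
constexpr std::size_t MINSIZE = 16;           // assumed: smallest chunk
constexpr std::size_t kMaxFast = 64;          // assumed default max_fast
constexpr std::size_t kSmallLimit = 512;      // small/large boundary from the text

// Step 2: add room for the size field, round up to the alignment,
// and never go below the minimum chunk size.
constexpr std::size_t request2size(std::size_t req) {
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)
               ? MINSIZE
               : ((req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK);
}

enum class FirstStop { FastBins, SmallBins, UnsortedAndLargeBins };

// Steps 3 to 5: which bins are consulted first for a given request.
inline FirstStop first_stop(std::size_t req) {
    const std::size_t nb = request2size(req);
    if (nb < kMaxFast) return FirstStop::FastBins;
    if (nb < kSmallLimit) return FirstStop::SmallBins;
    return FirstStop::UnsortedAndLargeBins;
}
```

For example, under these assumptions a 100-byte request becomes a 104-byte chunk and is looked up in the small bins first.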

Memory release (free) process

1. Acquire the lock of the allocation area

2. If the pointer passed to free is null, return

3. If the chunk was mapped from the mmap mapping area, call munmap() to release it

4. If the chunk is adjacent to the top chunk, merge it directly into the top chunk and go to step 8

5. If the chunk size > max_fast, put it into the unsorted bin and check whether it can be merged:

   a. If no merge is possible, the free ends

   b. If a merge takes place and the merged chunk is adjacent to the top chunk, go to step 8

6. If the chunk size < max_fast, put it into a fast bin and check whether it can be merged:

   a. The fast bin does not change the state of the chunk; if no merge is possible, the free ends

   b. If a merge is possible, go to step 7

7. In the fast bins, if the adjacent chunk is free, merge the two chunks and put the result into the unsorted bin. If the merged size is > 64KB, a fast bin consolidation is triggered: the chunks in the fast bins are traversed and merged, and the merged chunks are put into the unsorted bin. If a merged chunk is adjacent to the top chunk, it is merged into the top chunk; go to step 8

8. If the size of the top chunk > the shrink threshold (M_TRIM_THRESHOLD, 128KB by default), the main allocation area tries to return part of the top chunk to the operating system
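The order of the checks in the free process can be condensed into a small decision function. This is only a sketch of the flow described above: FreedChunk, FreeAction, classify_free, and should_trim are hypothetical names, merging of neighbouring chunks is not modeled, and the thresholds are the defaults quoted in the text for a 32-bit machine.

```cpp
#include <cstddef>

constexpr std::size_t kMaxFast = 64;                 // assumed default max_fast
constexpr std::size_t kShrinkThreshold = 128 * 1024; // 128KB default from the text

// The facts about a freed chunk that the decision depends on.
struct FreedChunk {
    std::size_t size;  // chunk size with flag bits removed
    bool is_mmapped;   // chunk came from the mmap mapping area
    bool next_is_top;  // chunk is directly adjacent to the top chunk
};

enum class FreeAction { Nothing, Munmap, MergeIntoTop, FastBin, UnsortedBin };

// Follows steps 2 to 6: null check, mmap check, top chunk check, then size.
inline FreeAction classify_free(const FreedChunk* c) {
    if (c == nullptr) return FreeAction::Nothing;
    if (c->is_mmapped) return FreeAction::Munmap;
    if (c->next_is_top) return FreeAction::MergeIntoTop;
    if (c->size < kMaxFast) return FreeAction::FastBin;
    return FreeAction::UnsortedBin;
}

// Step 8: the main allocation area tries to shrink once the top chunk is large enough.
inline bool should_trim(std::size_t top_size) {
    return top_size > kShrinkThreshold;
}
```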

3. Advantages and disadvantages

As glibc's default memory manager, ptmalloc meets the memory management needs of most large-scale projects, and its design has served as a reference for later memory managers.

This concludes the introduction to ptmalloc. The next few articles will continue with two well-known high-performance memory management libraries: jemalloc and tcmalloc.

Origin blog.csdn.net/yx5666/article/details/129099732