Implementing a high-performance memory pool in C++

I. Overview

In C/C++, memory management is a hard problem. Almost every program we write contains memory allocation logic, and that raises a series of questions: Is there enough memory available to allocate? What happens if allocation fails? How should the program manage the memory it already uses? In highly available software it is clearly unreasonable to simply request memory from the operating system and exit when memory runs out. A better approach is to consider how to manage and optimize the memory already in use, so that the software stays usable. In this project we implement a memory pool and use a stack structure to test its allocation performance. The goal is for the stack backed by our memory pool to perform much better than the same stack using std::allocator, and better than std::vector, as shown in the following figure:

 

Topics covered in this project

The C++ memory allocator std::allocator

Memory pool techniques

A hand-written template linked list stack

Performance comparison between the linked list stack and a std::vector-based stack

Introduction to memory pools

Memory pooling is one form of pooling technology. Normally we use the keywords new and delete to request memory from the operating system, so every request and every release has to go through the system's general-purpose allocation path. If these operations happen too frequently, they produce a large amount of memory fragmentation, which degrades allocation performance and can even cause allocations to fail.

The memory pool was created to solve this problem. Conceptually, requesting memory is nothing more than asking a memory allocator for a pointer. When the request goes to the operating system, the system has to perform complex memory-management scheduling before it can hand back a suitable pointer, and the request may still fail.

Each allocation therefore costs some time. Call that time T; n allocations then cost nT in total. If we know from the start roughly how much memory we will need, we can allocate one region of that size at the very beginning and serve later requests directly from it, so the time spent asking the system for memory is only about T. The larger n is, the more time is saved.
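The nT versus T argument can be illustrated with a minimal sketch. It assumes a hypothetical BumpArena type (not part of this project, and much simpler than the pool built later) that asks the system for one region up front and then hands out pieces of it by advancing an offset. Alignment is ignored, which is only acceptable here because every request has the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical illustration only: one upfront request, then n cheap handouts.
class BumpArena {
public:
    explicit BumpArena(std::size_t bytes)
        : base_(static_cast<char*>(::operator new(bytes))), size_(bytes), used_(0) {}
    ~BumpArena() { ::operator delete(base_); }
    BumpArena(const BumpArena&) = delete;
    BumpArena& operator=(const BumpArena&) = delete;

    // Hands out the next `bytes` bytes, or nullptr when the region is used up.
    void* allocate(std::size_t bytes) {
        if (bytes > size_ - used_) return nullptr;
        void* p = base_ + used_;
        used_ += bytes;
        return p;
    }

private:
    char* base_;
    std::size_t size_;
    std::size_t used_;
};

// Stores n ints in a single region and returns their sum.
inline long sumFromArena(int n) {
    BumpArena arena(static_cast<std::size_t>(n) * sizeof(int));
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        void* raw = arena.allocate(sizeof(int));
        assert(raw != nullptr);
        int* value = new (raw) int(i);  // construct the int in the arena
        sum += *value;
    }
    return sum;
}
```

Only the constructor talks to the system allocator; each call to allocate() is a bounds check and an addition. The memory pool in this project adds what the sketch lacks: growth by blocks and reuse of released slots.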

II. Main function design

We want to design and implement a high-performance memory pool, so we need something to compare it against: the existing default allocator. To compare allocation performance we also need a structure that allocates memory dynamically (for example, a linked list stack). The test program can be written as follows:

#include <iostream>   // std::cout, std::endl
#include <cassert>    // assert()
#include <ctime>      // clock()
#include <vector>     // std::vector

#include "MemoryPool.hpp"  // MemoryPool<T>
#include "StackAlloc.hpp"  // StackAlloc<T, Alloc>

// Number of elements to insert
#define ELEMS 10000000
// Number of repetitions
#define REPS 100

int main()
{
    clock_t start;

    // Use the STL default allocator
    StackAlloc<int, std::allocator<int> > stackDefault;
    start = clock();
    for (int j = 0; j < REPS; j++) {
        assert(stackDefault.empty());
        for (int i = 0; i < ELEMS; i++)
            stackDefault.push(i);
        for (int i = 0; i < ELEMS; i++)
            stackDefault.pop();
    }
    std::cout << "Default Allocator Time: ";
    std::cout << (((double)clock() - start) / CLOCKS_PER_SEC) << "\n\n";

    // Use the memory pool
    StackAlloc<int, MemoryPool<int> > stackPool;
    start = clock();
    for (int j = 0; j < REPS; j++) {
        assert(stackPool.empty());
        for (int i = 0; i < ELEMS; i++)
            stackPool.push(i);
        for (int i = 0; i < ELEMS; i++)
            stackPool.pop();
    }
    std::cout << "MemoryPool Allocator Time: ";
    std::cout << (((double)clock() - start) / CLOCKS_PER_SEC) << "\n\n";

    return 0;
}

In the code above, StackAlloc is a linked list stack that takes two template parameters: the first is the element type stored in the stack, and the second is the memory allocator the stack uses.

The allocator template parameter is therefore the only variable in the whole comparison: the default-allocator run passes std::allocator, and the memory pool run passes MemoryPool.

std::allocator is the default allocator provided by the C++ standard library. When we use new to obtain memory for a new object, the object's constructor is always called as part of the same expression. std::allocator separates memory allocation from object construction, so the memory it returns is raw and unconstructed.
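The separation of allocation and construction can be seen in a short sketch. It goes through std::allocator_traits rather than the allocator's own construct() and destroy() members that the article's code calls, because those members were deprecated in C++17 and removed in C++20. The function name constructInRawStorage is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Allocates raw storage, constructs a std::string in it, then tears it down
// in two separate steps. Returns the length of the constructed string.
inline std::size_t constructInRawStorage() {
    std::allocator<std::string> alloc;
    using Traits = std::allocator_traits<std::allocator<std::string>>;

    std::string* p = Traits::allocate(alloc, 1);  // raw, unconstructed memory
    Traits::construct(alloc, p, "memory pool");   // run the constructor in place
    std::size_t length = p->size();
    Traits::destroy(alloc, p);                    // run the destructor only
    Traits::deallocate(alloc, p, 1);              // hand the storage back
    return length;
}
```

The four calls correspond one-to-one to the allocate(), construct(), destroy() and deallocate() steps that the linked list stack performs for every node.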

Let's implement this linked list stack.

III. Template linked list stack

The structure of a stack is very simple and involves no complicated logic. Its member functions only need to cover two basic operations: push and pop. For convenience we may also want a few more methods: checking whether the stack is empty, clearing the stack, and getting the top element.

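#include <memory>

template <typename T>
struct StackNode_
{
    T data;
    StackNode_* prev;
};

// T is the stored object type and Alloc is the allocator to use;
// std::allocator is the default.
// Note: std::allocator's rebind, construct() and destroy() members were
// deprecated in C++17 and removed in C++20, so build this as C++11 to C++17.
template <typename T, typename Alloc = std::allocator<T> >
class StackAlloc
{
  public:
    // Typedefs to shorten the type names
    typedef StackNode_<T> Node;
    typedef typename Alloc::template rebind<Node>::other allocator;

    // Default constructor
    StackAlloc() { head_ = 0; }
    // Destructor
    ~StackAlloc() { clear(); }

    // Returns true when the stack is empty
    bool empty() { return (head_ == 0); }

    // Frees the memory of every element on the stack
    void clear();

    // Push
    void push(T element);

    // Pop
    T pop();

    // Returns the top element
    T top() { return (head_->data); }

  private:
    // Node allocator
    allocator allocator_;
    // Top of the stack
    Node* head_;
};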
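The rebind typedef is what turns an allocator for T into an allocator for nodes holding a T. As a hedged alternative to the member spelling used above, the sketch below performs the same conversion with std::allocator_traits::rebind_alloc, which still works in C++20. DemoNode and NodeAllocatorOf are hypothetical names introduced only for this example, and the sketch is shown with std::allocator only: allocator_traits also expects the allocator to define value_type, which the MemoryPool designed later in this article does not.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for StackNode_<T>.
template <typename T>
struct DemoNode {
    T data;
    DemoNode* prev;
};

// Maps an allocator for T to the matching allocator for DemoNode<T>.
template <typename T, typename Alloc = std::allocator<T>>
struct NodeAllocatorOf {
    using type =
        typename std::allocator_traits<Alloc>::template rebind_alloc<DemoNode<T>>;
};

// Allocates one node through the rebound allocator and reads its value back.
inline int roundTrip(int value) {
    using NodeAlloc = NodeAllocatorOf<int>::type;
    using Traits = std::allocator_traits<NodeAlloc>;

    NodeAlloc alloc;
    DemoNode<int>* node = Traits::allocate(alloc, 1);
    Traits::construct(alloc, node, DemoNode<int>{value, nullptr});
    int result = node->data;
    Traits::destroy(alloc, node);
    Traits::deallocate(alloc, node, 1);
    return result;
}
```

The user of the stack names an allocator for int, while the stack itself only ever allocates whole nodes, which is why the rebinding step is needed at all.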

Constructing, destructing, checking whether the stack is empty, and returning the top element are simple enough to be implemented directly in the definition above. Next we implement the three important functions clear(), push() and pop():

// Frees the memory of every element on the stack
void clear() {
    Node* curr = head_;
    // Pop the nodes one by one
    while (curr != 0)
    {
        Node* tmp = curr->prev;
        // Destroy the object first, then release its memory
        allocator_.destroy(curr);
        allocator_.deallocate(curr, 1);
        curr = tmp;
    }
    head_ = 0;
}

// Push
void push(T element) {
    // Allocate memory for one node
    Node* newNode = allocator_.allocate(1);
    // Call the node's constructor
    allocator_.construct(newNode, Node());

    // Link the node in as the new top
    newNode->data = element;
    newNode->prev = head_;
    head_ = newNode;
}

// Pop
T pop() {
    // Unlink the top node and return its element
    T result = head_->data;
    Node* tmp = head_->prev;
    allocator_.destroy(head_);
    allocator_.deallocate(head_, 1);
    head_ = tmp;
    return result;
}

So far, we have completed the entire template linked list stack. We can now comment out the memory pool part of main() and test the memory allocation of the linked list stack on its own. With the default allocator std::allocator and

#define ELEMS 10000000
#define REPS 100

the run took nearly a minute in total.

If that is longer than you are willing to wait, try reducing these two values.
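When experimenting with smaller values, a reusable timing helper can be convenient. The sketch below is a substitute for the article's clock() arithmetic, not the author's method: it uses std::chrono::steady_clock, which measures elapsed wall-clock time, whereas clock() is specified to report processor time. The name secondsTaken is an assumption made for this example.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double secondsTaken(F work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

The push and pop loops from main() can be wrapped in a lambda and passed to secondsTaken(), so each allocator is measured by the same code path.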

Summary

In this section we implemented a template linked list stack to use in the performance comparison. The code so far is shown below. In the next section we start implementing the high-performance memory pool itself.

// StackAlloc.hpp
#ifndef STACK_ALLOC_H
#define STACK_ALLOC_H

#include <memory>

template <typename T>
struct StackNode_
{
    T data;
    StackNode_* prev;
};

// T is the stored object type and Alloc is the allocator to use;
// std::allocator is the default.
template <class T, class Alloc = std::allocator<T> >
class StackAlloc
{
  public:
    // Typedefs to shorten the type names
    typedef StackNode_<T> Node;
    typedef typename Alloc::template rebind<Node>::other allocator;

    // Default constructor
    StackAlloc() { head_ = 0; }
    // Destructor
    ~StackAlloc() { clear(); }

    // Returns true when the stack is empty
    bool empty() { return (head_ == 0); }

    // Frees the memory of every element on the stack
    void clear() {
        Node* curr = head_;
        while (curr != 0)
        {
            Node* tmp = curr->prev;
            allocator_.destroy(curr);
            allocator_.deallocate(curr, 1);
            curr = tmp;
        }
        head_ = 0;
    }

    // Push
    void push(T element) {
        // Allocate memory for one node
        Node* newNode = allocator_.allocate(1);
        // Call the node's constructor
        allocator_.construct(newNode, Node());

        // Link the node in as the new top
        newNode->data = element;
        newNode->prev = head_;
        head_ = newNode;
    }

    // Pop
    T pop() {
        // Unlink the top node and return its element
        T result = head_->data;
        Node* tmp = head_->prev;
        allocator_.destroy(head_);
        allocator_.deallocate(head_, 1);
        head_ = tmp;
        return result;
    }

    // Returns the top element
    T top() { return (head_->data); }

  private:
    allocator allocator_;
    Node* head_;
};

#endif // STACK_ALLOC_H
// main.cpp
#include <iostream>
#include <cassert>
#include <ctime>
#include <vector>

// #include "MemoryPool.hpp"
#include "StackAlloc.hpp"

// Adjust these values to suit your machine
// Number of elements to insert
#define ELEMS 25000000
// Number of repetitions
#define REPS 50

int main()
{
    clock_t start;

    // Use the default allocator
    StackAlloc<int, std::allocator<int> > stackDefault;
    start = clock();
    for (int j = 0; j < REPS; j++) {
        assert(stackDefault.empty());
        for (int i = 0; i < ELEMS; i++)
            stackDefault.push(i);
        for (int i = 0; i < ELEMS; i++)
            stackDefault.pop();
    }
    std::cout << "Default Allocator Time: ";
    std::cout << (((double)clock() - start) / CLOCKS_PER_SEC) << "\n\n";

    // Use the memory pool
    // StackAlloc<int, MemoryPool<int> > stackPool;
    // start = clock();
    // for (int j = 0; j < REPS; j++) {
    //     assert(stackPool.empty());
    //     for (int i = 0; i < ELEMS; i++)
    //       stackPool.push(i);
    //     for (int i = 0; i < ELEMS; i++)
    //       stackPool.pop();
    // }
    // std::cout << "MemoryPool Allocator Time: ";
    // std::cout << (((double)clock() - start) / CLOCKS_PER_SEC) << "\n\n";

    return 0;
}

IV. Memory pool design

In the previous section, the template linked list stack used the default allocator to manage element memory, through these key interfaces: rebind::other, allocate(), deallocate(), construct() and destroy(). So that the memory pool can be dropped into the same code directly, it should expose the same interface:

#ifndef MEMORY_POOL_HPP
#define MEMORY_POOL_HPP

#include <climits>
#include <cstddef>
#include <cstdint>   // uintptr_t
#include <new>       // operator new, placement new
#include <utility>   // std::forward

template <typename T, size_t BlockSize = 4096>
class MemoryPool
{
  public:
    // Typedef to shorten the type name
    typedef T*              pointer;

    // Provides the rebind<U>::other interface
    template <typename U> struct rebind {
        typedef MemoryPool<U> other;
    };

    // Default constructor: initializes all slot pointers.
    // C++11 noexcept explicitly declares that this function does not throw.
    MemoryPool() noexcept {
        currentBlock_ = nullptr;
        currentSlot_ = nullptr;
        lastSlot_ = nullptr;
        freeSlots_ = nullptr;
    }

    // Destroys an existing memory pool
    ~MemoryPool() noexcept;

    // Allocates one object at a time; n and hint are ignored
    pointer allocate(size_t n = 1, const T* hint = 0);

    // Releases the slot that p points to
    void deallocate(pointer p, size_t n = 1);

    // Calls the constructor
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args);

    // Destroys an object in the pool, that is, calls its destructor
    template <typename U>
    void destroy(U* p) {
        p->~U();
    }

  private:
    // Storage for one object slot in the pool:
    // it holds either an object
    // or a pointer to another slot
    union Slot_ {
        T element;
        Slot_* next;
    };

    // Data pointer
    typedef char* data_pointer_;
    // Object slot
    typedef Slot_ slot_type_;
    // Pointer to an object slot
    typedef Slot_* slot_pointer_;

    // Points to the current memory block
    slot_pointer_ currentBlock_;
    // Points to the next unused object slot in the current block
    slot_pointer_ currentSlot_;
    // Marks the last object slot of the current block
    slot_pointer_ lastSlot_;
    // Head of the list of free (released) object slots
    slot_pointer_ freeSlots_;

    // Checks that the configured block size is not too small
    static_assert(BlockSize >= 2 * sizeof(slot_type_), "BlockSize too small.");
};

#endif // MEMORY_POOL_HPP

As the class design above shows, the memory pool uses linked lists to manage its memory. The pool first defines a fixed-size basic memory block (Block), and then defines an object slot (Slot_), a union that can be interpreted either as storage for an object or as a pointer to another slot. Four key pointers are then defined, with the following roles:

currentBlock_: points to the current memory block; each block links to the block allocated before it

currentSlot_: points to the next unused object slot in the current memory block

lastSlot_: marks the last object slot in the current memory block

freeSlots_: points to the head of the list of object slots that have been released and can be reused
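To make the slot layout concrete, the sketch below mirrors the Slot_ union with a hypothetical DemoSlot and estimates how many slots fit in one block. It assumes, as in the design above, that the first pointer-sized bytes of each block hold the link to the previous block. Alignment padding is ignored, so the result is an upper bound rather than the pool's exact count.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the pool's Slot_ union for element type T.
template <typename T>
union DemoSlot {
    T element;       // meaningful while the slot holds a live object
    DemoSlot* next;  // meaningful while the slot sits on the free list
};

// Upper bound on slots per block: one pointer-sized header links the blocks,
// and the rest is divided into slots.
template <typename T, std::size_t BlockSize = 4096>
constexpr std::size_t maxSlotsPerBlock() {
    return (BlockSize - sizeof(DemoSlot<T>*)) / sizeof(DemoSlot<T>);
}
```

Because the union is at least as large as a pointer, a pool of small objects such as char still spends a pointer's worth of bytes per slot. On a typical 64-bit platform with 8-byte pointers, a 4096-byte block therefore holds at most 511 int slots; the exact figure depends on the platform.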

After sorting out the design structure of the entire memory pool, we can start to implement the key logic.

V. Implementation

MemoryPool::construct() implementation

MemoryPool::construct() has the simplest logic: all it has to do is call the constructor of the object, so:

// Calls the constructor, using std::forward to forward the variadic arguments
template <typename U, typename... Args>
void construct(U* p, Args&&... args) {
    new (p) U (std::forward<Args>(args)...);
}

MemoryPool::deallocate() implementation

MemoryPool::deallocate() is called after the object in the slot has been destroyed. It does not hand memory back to the system; it pushes the slot onto the free list so that a later allocate() can reuse it. Its logic is not complicated either:

// Releases the slot that p points to
void deallocate(pointer p, size_t n = 1) {
    if (p != nullptr) {
        // reinterpret_cast forces the type conversion:
        // p must be converted to slot_pointer_ before next can be accessed
        reinterpret_cast<slot_pointer_>(p)->next = freeSlots_;
        freeSlots_ = reinterpret_cast<slot_pointer_>(p);
    }
}
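The free list behaves like a stack of released slots: the slot released last is the one handed out next. The sketch below shows this with a hypothetical TinyIntPool that has a fixed capacity and stores only int, which is a deliberate simplification of MemoryPool and not a replacement for it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity pool that only demonstrates the free list.
class TinyIntPool {
public:
    TinyIntPool() : next_(0), free_(nullptr) {}

    // Returns storage for one int, or nullptr when the pool is exhausted.
    int* allocate() {
        if (free_ != nullptr) {  // reuse a released slot first
            Slot* slot = free_;
            free_ = free_->next;
            return &slot->element;
        }
        if (next_ == kCapacity) return nullptr;
        return &slots_[next_++].element;
    }

    // Pushes the slot back onto the free list; no memory leaves the pool.
    void deallocate(int* p) {
        if (p == nullptr) return;
        Slot* slot = reinterpret_cast<Slot*>(p);
        slot->next = free_;  // the slot's own bytes store the link
        free_ = slot;
    }

private:
    union Slot {
        int element;
        Slot* next;
    };
    static const std::size_t kCapacity = 4;
    Slot slots_[kCapacity];
    std::size_t next_;
    Slot* free_;
};
```

Releasing slot a and then slot b leaves b at the head of the list, so the next two allocations return b and then a. No extra bookkeeping memory is needed, because a released slot no longer holds an object and its bytes can store the link.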

MemoryPool::~MemoryPool() implementation

The destructor is responsible for destroying the entire memory pool, so it must release, one by one, every memory block the pool originally requested:

// Destroys an existing memory pool
~MemoryPool() noexcept {
    // Walk the block list and release every block the pool allocated
    slot_pointer_ curr = currentBlock_;
    while (curr != nullptr) {
        slot_pointer_ prev = curr->next;
        operator delete(reinterpret_cast<void*>(curr));
        curr = prev;
    }
}

MemoryPool::allocate() implementation

MemoryPool::allocate() is the heart of the memory pool, but once the design of the pool is clear its implementation is not complicated:

// Allocates one object at a time; n and hint are ignored
pointer allocate(size_t n = 1, const T* hint = 0) {
    // If there is a free object slot, hand it out directly
    if (freeSlots_ != nullptr) {
        pointer result = reinterpret_cast<pointer>(freeSlots_);
        freeSlots_ = freeSlots_->next;
        return result;
    } else {
        // If the current block has run out of slots, allocate a new block
        if (currentSlot_ >= lastSlot_) {
            // Allocate a new memory block and link it to the previous one
            data_pointer_ newBlock = reinterpret_cast<data_pointer_>(operator new(BlockSize));
            reinterpret_cast<slot_pointer_>(newBlock)->next = currentBlock_;
            currentBlock_ = reinterpret_cast<slot_pointer_>(newBlock);
            // Pad the start of the block body so that slots meet the
            // alignment requirement of the element type
            data_pointer_ body = newBlock + sizeof(slot_pointer_);
            uintptr_t result = reinterpret_cast<uintptr_t>(body);
            size_t bodyPadding = (alignof(slot_type_) - result) % alignof(slot_type_);
            currentSlot_ = reinterpret_cast<slot_pointer_>(body + bodyPadding);
            lastSlot_ = reinterpret_cast<slot_pointer_>(newBlock + BlockSize - sizeof(slot_type_) + 1);
        }
        return reinterpret_cast<pointer>(currentSlot_++);
    }
}
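The padding expression in allocate() relies on unsigned wraparound, which is easy to misread. The sketch below isolates it in a hypothetical helper, paddingFor, under the assumption that the alignment is a power of two (true for any alignof result). Because the subtraction wraps modulo a power of two that the alignment divides, the remainder is still the distance to the next aligned address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes to add to `address` to reach the next multiple of `align`.
// Assumes `align` is a power of two.
inline std::size_t paddingFor(std::uintptr_t address, std::size_t align) {
    std::uintptr_t a = static_cast<std::uintptr_t>(align);
    return static_cast<std::size_t>((a - address) % a);  // wraps, then reduces
}

// Rounds `address` up to the next multiple of `align`.
inline std::uintptr_t alignUp(std::uintptr_t address, std::size_t align) {
    return address + paddingFor(address, align);
}
```

For example, address 13 with alignment 4 needs 3 bytes of padding to reach 16, and an address that is already aligned needs none.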

VI. Performance comparison with std::vector

For a stack, a linked list is actually not the best implementation: such a structure inevitably involves pointer operations, and it also spends extra space storing the pointers between nodes. We can instead simulate a stack with the push_back() and pop_back() operations of std::vector. Let's compare std::vector with the memory pool we implemented and see which is faster. Add the following code to the main function:

    // Compare the performance of the memory pool and std::vector
    std::vector<int> stackVector;
    start = clock();
    for (int j = 0; j < REPS; j++) {
        assert(stackVector.empty());
        for (int i = 0; i < ELEMS; i++)
            stackVector.push_back(i);
        for (int i = 0; i < ELEMS; i++)
            stackVector.pop_back();
    }
    std::cout << "Vector Time: ";
    std::cout << (((double)clock() - start) / CLOCKS_PER_SEC) << "\n\n";

At this point, when we recompile the code, we can see the gap:

 

The linked list stack with the default allocator is the slowest. The stack simulated with std::vector comes second, taking far less time than the linked list stack.

The implementation of std::vector is in some ways similar to the memory pool. When a std::vector runs out of capacity, it abandons its current memory region, requests a larger one, and copies (or moves) the existing elements into the new region as a whole.
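The reallocation behaviour can be observed by watching capacity() while elements are pushed. This is a small sketch with a hypothetical helper name, countReallocations; the exact growth factor is implementation-defined, so the code only counts capacity changes and makes no claim about their sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often push_back changes the capacity, that is, reallocates.
inline std::size_t countReallocations(std::size_t n, bool reserveFirst) {
    std::vector<int> v;
    if (reserveFirst) v.reserve(n);  // one upfront request, like a pool block
    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}
```

Without reserve(), only a small fraction of the pushes reallocate, which is why std::vector already beats one system allocation per node. With reserve(), none do.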

Finally, the memory pool we implemented takes the least time, that is, it has the best allocation performance. With that, the project is complete.

Summary

In this section we implemented the memory pool that was left unimplemented in the previous section, completing the goal of the whole project. The pool is both compact and efficient. Its complete code is as follows:

#ifndef MEMORY_POOL_HPP
#define MEMORY_POOL_HPP

#include <climits>
#include <cstddef>
#include <cstdint>   // uintptr_t
#include <new>       // operator new, placement new
#include <utility>   // std::forward

template <typename T, size_t BlockSize = 4096>
class MemoryPool
{
  public:
    // Typedef to shorten the type name
    typedef T*              pointer;

    // Provides the rebind<U>::other interface
    template <typename U> struct rebind {
        typedef MemoryPool<U> other;
    };

    // Default constructor.
    // C++11 noexcept explicitly declares that this function does not throw.
    MemoryPool() noexcept {
        currentBlock_ = nullptr;
        currentSlot_ = nullptr;
        lastSlot_ = nullptr;
        freeSlots_ = nullptr;
    }

    // Destroys an existing memory pool
    ~MemoryPool() noexcept {
        // Walk the block list and release every block the pool allocated
        slot_pointer_ curr = currentBlock_;
        while (curr != nullptr) {
            slot_pointer_ prev = curr->next;
            operator delete(reinterpret_cast<void*>(curr));
            curr = prev;
        }
    }

    // Allocates one object at a time; n and hint are ignored
    pointer allocate(size_t n = 1, const T* hint = 0) {
        if (freeSlots_ != nullptr) {
            pointer result = reinterpret_cast<pointer>(freeSlots_);
            freeSlots_ = freeSlots_->next;
            return result;
        }
        else {
            if (currentSlot_ >= lastSlot_) {
                // Allocate a new memory block
                data_pointer_ newBlock = reinterpret_cast<data_pointer_>(operator new(BlockSize));
                reinterpret_cast<slot_pointer_>(newBlock)->next = currentBlock_;
                currentBlock_ = reinterpret_cast<slot_pointer_>(newBlock);
                data_pointer_ body = newBlock + sizeof(slot_pointer_);
                uintptr_t result = reinterpret_cast<uintptr_t>(body);
                size_t bodyPadding = (alignof(slot_type_) - result) % alignof(slot_type_);
                currentSlot_ = reinterpret_cast<slot_pointer_>(body + bodyPadding);
                lastSlot_ = reinterpret_cast<slot_pointer_>(newBlock + BlockSize - sizeof(slot_type_) + 1);
            }
            return reinterpret_cast<pointer>(currentSlot_++);
        }
    }

    // Releases the slot that p points to
    void deallocate(pointer p, size_t n = 1) {
        if (p != nullptr) {
            reinterpret_cast<slot_pointer_>(p)->next = freeSlots_;
            freeSlots_ = reinterpret_cast<slot_pointer_>(p);
        }
    }

    // Calls the constructor, using std::forward to forward the variadic arguments
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        new (p) U (std::forward<Args>(args)...);
    }

    // Destroys an object in the pool, that is, calls its destructor
    template <typename U>
    void destroy(U* p) {
        p->~U();
    }

  private:
    // Storage for one object slot in the pool
    union Slot_ {
        T element;
        Slot_* next;
    };

    // Data pointer
    typedef char* data_pointer_;
    // Object slot
    typedef Slot_ slot_type_;
    // Pointer to an object slot
    typedef Slot_* slot_pointer_;

    // Points to the current memory block
    slot_pointer_ currentBlock_;
    // Points to the next unused object slot in the current block
    slot_pointer_ currentSlot_;
    // Marks the last object slot of the current block
    slot_pointer_ lastSlot_;
    // Head of the list of free (released) object slots
    slot_pointer_ freeSlots_;

    // Checks that the configured block size is not too small
    static_assert(BlockSize >= 2 * sizeof(slot_type_), "BlockSize too small.");
};

#endif // MEMORY_POOL_HPP


Origin blog.csdn.net/yx5666/article/details/127492365