Using tcmalloc with libhugetlbfs

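tcmalloc is a memory-allocation library that is easy to use and very efficient. libhugetlbfs manages huge-page memory. Because it raises the TLB hit rate, it is also a very useful and efficient library for programs.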
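The TLB claim can be made concrete with "TLB reach": the amount of memory a program can touch without a TLB miss, which is the number of TLB entries times the page size. The sketch below assumes a hypothetical 64-entry TLB (real entry counts vary by CPU and often differ per page size) and the common x86-64 page sizes of 4 KiB and 2 MiB; `TlbReach`, `kKiB` and `kMiB` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// TLB reach: memory addressable without a TLB miss = entries * page size.
constexpr std::uint64_t TlbReach(std::uint64_t entries, std::uint64_t page_size) {
  return entries * page_size;
}

constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kMiB = 1024 * kKiB;

// Assumption: a hypothetical 64-entry TLB, used only for illustration.
static_assert(TlbReach(64, 4 * kKiB) == 256 * kKiB, "4 KiB pages: 256 KiB reach");
static_assert(TlbReach(64, 2 * kMiB) == 128 * kMiB, "2 MiB pages: 128 MiB reach");
```

With the same number of entries, 2 MiB pages cover 512 times as much memory as 4 KiB pages, which is where the higher hit rate comes from for programs with large heaps.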

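Now we want to combine the two libraries. The first approach works at the shared-library level: used on its own, each library only needs to be linked at build time. But, but, but: if you link both at once, you will find that only one of them actually takes effect. In practice only tcmalloc runs, and huge pages are not used at all (presumably because libhugetlbfs's malloc support, enabled via HUGETLB_MORECORE, works through glibc's morecore hook, which tcmalloc bypasses by replacing malloc entirely).

With the first approach failing, we turn to the second: the code level. Code can do anything, so we study tcmalloc's source, find the place where it obtains memory from the system, and modify it there. Fortunately, libhugetlbfs provides an API that code can call directly.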
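To check whether huge pages are really in use, one option on Linux is to watch the kernel's huge page counters in /proc/meminfo while the program runs: if HugePages_Free never drops (and HugePages_Rsvd never rises), nothing is being served from the huge page pool. A minimal sketch; `MeminfoField` and `ReadMeminfo` are hypothetical helper names, not part of tcmalloc or libhugetlbfs.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Returns the numeric value of a field such as "HugePages_Free" from text in
// /proc/meminfo format, or -1 if the field is not present.
long MeminfoField(const std::string& text, const std::string& field) {
  std::istringstream in(text);
  std::string line;
  const std::string key = field + ":";
  while (std::getline(in, line)) {
    if (line.compare(0, key.size(), key) == 0) {
      std::istringstream rest(line.substr(key.size()));
      long value = -1;
      rest >> value;  // skips the padding spaces before the number
      return value;
    }
  }
  return -1;
}

// Reads /proc/meminfo (Linux only); returns an empty string if unavailable.
std::string ReadMeminfo() {
  std::ifstream f("/proc/meminfo");
  std::stringstream ss;
  ss << f.rdbuf();
  return ss.str();
}
```

For example, call `MeminfoField(ReadMeminfo(), "HugePages_Free")` before and after a large allocation and compare the two values.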

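Tracing through the code, we find that when tcmalloc finally requests memory from the system it has several allocation methods: one based on sbrk and one based on mmap.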

In system-alloc.cc:

void InitSystemAllocators(void) {
  MmapSysAllocator *mmap = new (mmap_space) MmapSysAllocator();
  SbrkSysAllocator *sbrk = new (sbrk_space) SbrkSysAllocator();

  // In 64-bit debug mode, place the mmap allocator first since it
  // allocates pointers that do not fit in 32 bits and therefore gives
  // us better testing of code's 64-bit correctness.  It also leads to
  // less false negatives in heap-checking code.  (Numbers are less
  // likely to look like pointers and therefore the conservative gc in
  // the heap-checker is less likely to misinterpret a number as a
  // pointer).
  DefaultSysAllocator *sdef = new (default_space) DefaultSysAllocator();
  if (kDebugMode && sizeof(void*) > 4) {
    sdef->SetChildAllocator(mmap, 0, mmap_name);
    sdef->SetChildAllocator(sbrk, 1, sbrk_name);
  } else {
    sdef->SetChildAllocator(sbrk, 0, sbrk_name);
    sdef->SetChildAllocator(mmap, 1, mmap_name);
  }
  sys_alloc = sdef;
}

So this is where we add a huge-page allocator. The code is as follows:

#include <config.h>
#include <errno.h>                      // for EAGAIN, errno
#include <fcntl.h>                      // for open, O_RDWR
#include <stddef.h>                     // for size_t, NULL, ptrdiff_t
#if defined HAVE_STDINT_H
#include <stdint.h>                     // for uintptr_t, intptr_t
#elif defined HAVE_INTTYPES_H
#include <inttypes.h>
#else
#include <sys/types.h>
#endif
#ifdef HAVE_MMAP
#include <sys/mman.h>                   // for munmap, mmap, MADV_DONTNEED, etc
#endif
#ifdef HAVE_UNISTD_H
#include <unistd.h>                     // for sbrk, getpagesize, off_t
#endif
#include <new>                          // for operator new
#include <gperftools/malloc_extension.h>
#include "base/basictypes.h"
#include "base/commandlineflags.h"
#include "base/spinlock.h"              // for SpinLockHolder, SpinLock, etc
#include "common.h"
#include "internal_logging.h"

extern "C"{
	#include <hugetlbfs.h>
};

extern "C"
{
	extern long gethugepagesize(void);
	extern int gethugepagesizes(long pagesizes[], int n_elem);
	extern int getpagesizes(long pagesizes[], int n_elem);
	extern void *get_huge_pages(size_t len, ghp_t flags);
	extern void free_huge_pages(void *ptr);
    extern void *get_hugepage_region(size_t len, ghr_t flags);
    extern void free_hugepage_region(void *ptr);
}

// added by liyu
class HugeSysAllocator : public SysAllocator {
public:
  HugeSysAllocator() : SysAllocator() {
  }
  void* Alloc(size_t size, size_t *actual_size, size_t alignment);
};
static char huge_space[sizeof(HugeSysAllocator)];


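// added by liyu
void* HugeSysAllocator::Alloc(size_t size, size_t *actual_size,
                              size_t alignment) {
  // Guard against overflow in the rounding below (check copied from
  // the sbrk allocator).
  if (static_cast<ptrdiff_t>(size + alignment) < 0) return NULL;

  // This doesn't overflow because TCMalloc_SystemAlloc has already
  // tested for overflow at the alignment boundary.
  size = ((size + alignment - 1) / alignment) * alignment;

  // get_huge_pages() expects a length that is a multiple of the huge
  // page size and returns memory starting on a huge-page boundary.
  long huge_page_size = gethugepagesize();
  if (huge_page_size <= 0) return NULL;   // no huge pages available
  const size_t huge_page = static_cast<size_t>(huge_page_size);

  // If tcmalloc wants more alignment than a huge page provides, request
  // enough slack to move the pointer forward without leaving the mapping.
  const size_t extra = (alignment > huge_page) ? alignment - huge_page : 0;
  const size_t request =
      ((size + extra + huge_page - 1) / huge_page) * huge_page;

  void* result = get_huge_pages(request, GHP_DEFAULT);
  //void* result = get_hugepage_region(request, GHR_DEFAULT);
  if (result == NULL) {
    return NULL;
  }

  // Is it aligned? If not, move forward into the slack reserved above.
  uintptr_t ptr = reinterpret_cast<uintptr_t>(result);
  if ((ptr & (alignment - 1)) != 0) {
    ptr += alignment - (ptr & (alignment - 1));
  }

  // "actual_size" indicates that the bytes from the returned pointer
  // p up to and including (p + actual_size - 1) have been allocated.
  // Report the whole usable tail so tcmalloc can use the rounding slack.
  if (actual_size) {
    *actual_size = request - (ptr - reinterpret_cast<uintptr_t>(result));
  }
  return reinterpret_cast<void*>(ptr);
}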
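The sizing arithmetic in such an allocator is easy to get wrong, so here it is in isolation. A minimal sketch, assuming that get_huge_pages() wants a length that is a multiple of the huge page size and returns a region starting on a huge-page boundary, and that all sizes are powers of two; `RoundUp`, `HugeRequestSize` and `AlignUp` are hypothetical helper names. The point it illustrates: moving the returned pointer forward to satisfy `alignment` only stays inside the mapping if that slack was requested up front.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds n up to the next multiple of unit (unit > 0).
inline std::size_t RoundUp(std::size_t n, std::size_t unit) {
  return ((n + unit - 1) / unit) * unit;
}

// Bytes to request so that `size` bytes starting at an `alignment`-aligned
// address fit inside a mapping that begins on a huge-page boundary.
// Worst case, the start must move forward by alignment - huge_page bytes.
inline std::size_t HugeRequestSize(std::size_t size, std::size_t alignment,
                                   std::size_t huge_page) {
  std::size_t extra = alignment > huge_page ? alignment - huge_page : 0;
  return RoundUp(size + extra, huge_page);
}

// Moves p forward to the next multiple of alignment (a power of two).
inline std::uintptr_t AlignUp(std::uintptr_t p, std::size_t alignment) {
  return (p + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
}
```

For example, with 2 MiB huge pages, a 1 MiB request with a small (say 8 KiB) alignment still needs a full 2 MiB mapping, and a 2 MiB request that must be 4 MiB aligned needs 4 MiB.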



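All you need to do is add the huge-page allocator, modeled on the sbrk allocator, and make it the one that is tried first. If the call to get_huge_pages() fails here, use get_hugepage_region() instead; remember to change the arguments accordingly, and test a few values to find suitable ones.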
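Trying the new allocator first means registering it in InitSystemAllocators() at index 0, ahead of sbrk and mmap. In gperftools' system-alloc.cc, DefaultSysAllocator keeps a fixed array of kMaxAllocators = 2 children, so a third child most likely also requires raising that constant; check your version. The sketch below does not use gperftools: SysAllocator is re-declared as a stand-in, and `MockAllocator`, `ChainAllocator` and `RunFallbackDemo` are hypothetical names, so that the try-in-order behavior of DefaultSysAllocator::Alloc can be observed in isolation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for tcmalloc's SysAllocator interface.
class SysAllocator {
 public:
  virtual ~SysAllocator() {}
  virtual void* Alloc(std::size_t size, std::size_t* actual_size,
                      std::size_t alignment) = 0;
};

// Hypothetical mock child: logs its name on every call and succeeds only
// while `works` is true. It returns a dummy pointer, not real memory.
class MockAllocator : public SysAllocator {
 public:
  MockAllocator(const char* name, bool works, std::vector<std::string>* log)
      : name_(name), works_(works), log_(log) {}
  void* Alloc(std::size_t size, std::size_t* actual_size, std::size_t) override {
    log_->push_back(name_);
    if (!works_) return nullptr;
    if (actual_size) *actual_size = size;
    return &dummy_;
  }
  void set_works(bool works) { works_ = works; }

 private:
  const char* name_;
  bool works_;
  std::vector<std::string>* log_;
  char dummy_ = 0;
};

// Fallback chain modeled on DefaultSysAllocator, widened to three children.
class ChainAllocator : public SysAllocator {
 public:
  static constexpr int kMaxAllocators = 3;
  ChainAllocator() {
    for (int i = 0; i < kMaxAllocators; i++) {
      allocs_[i] = nullptr;
      failed_[i] = true;
    }
  }
  void SetChildAllocator(SysAllocator* alloc, int index) {
    if (index >= 0 && index < kMaxAllocators && alloc != nullptr) {
      allocs_[index] = alloc;
      failed_[index] = false;
    }
  }
  void* Alloc(std::size_t size, std::size_t* actual_size,
              std::size_t alignment) override {
    for (int i = 0; i < kMaxAllocators; i++) {
      if (!failed_[i] && allocs_[i] != nullptr) {
        void* result = allocs_[i]->Alloc(size, actual_size, alignment);
        if (result != nullptr) return result;
        failed_[i] = true;  // skipped on later calls
      }
    }
    // Every child failed: clear the flags so one bad call is not permanent.
    for (int i = 0; i < kMaxAllocators; i++) failed_[i] = false;
    return nullptr;
  }

 private:
  SysAllocator* allocs_[kMaxAllocators];
  bool failed_[kMaxAllocators];
};

// Makes three requests and returns the children called, in call order.
std::vector<std::string> RunFallbackDemo() {
  std::vector<std::string> log;
  MockAllocator huge_alloc("huge", true, &log);
  MockAllocator sbrk_alloc("sbrk", true, &log);
  MockAllocator mmap_alloc("mmap", true, &log);
  ChainAllocator chain;
  chain.SetChildAllocator(&huge_alloc, 0);  // huge pages tried first
  chain.SetChildAllocator(&sbrk_alloc, 1);
  chain.SetChildAllocator(&mmap_alloc, 2);

  std::size_t got = 0;
  chain.Alloc(1 << 20, &got, 8192);  // served by "huge"
  huge_alloc.set_works(false);       // e.g. the huge page pool is exhausted
  chain.Alloc(1 << 20, &got, 8192);  // "huge" fails, "sbrk" serves
  huge_alloc.set_works(true);        // pool refilled...
  chain.Alloc(1 << 20, &got, 8192);  // ...but "huge" is still skipped
  return log;
}
```

One consequence worth knowing, if gperftools behaves like this sketch: once the huge-page child fails, it is skipped until every child has failed, so a temporarily exhausted huge page pool can leave tcmalloc on small pages for a long time.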

With the code changed, the next step is to change how the tcmalloc shared library is linked.

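Find the place where the pthread library is linked and append -lhugetlbfs after it. Remember to copy the built hugetlbfs library to /usr/local/lib and its header files to /usr/local/include/, and to add /usr/local/lib to ld.so.conf (then run ldconfig so the change takes effect).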

After that it is ready to use, and the performance is solid: a bit better than plain tcmalloc, roughly a 5-20% improvement.


Reposted from blog.csdn.net/liyu123__/article/details/84246729