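tcmalloc is a memory management library that is easy to use and very efficient. libhugetlbfs manages huge-page memory and raises the TLB hit rate; for a program it is also a very good and efficient library.
Now we want to combine the two. First approach: the shared-library level. Used on its own, either library only needs to be linked at build time. However, if you link both and use them together, you will find that only one of them ends up taking effect; more precisely, only tcmalloc is running and huge-page memory is not used at all. (Presumably this is because libhugetlbfs backs the heap by hooking glibc's malloc, which tcmalloc replaces.) The first approach failed, which leaves the second. Second approach: the source level. Reading the tcmalloc source shows that tcmalloc itself has to obtain memory from the system, so we find the place where it does that and modify it. Fortunately, libhugetlbfs exposes an API that can be called from code.
Tracing the code shows that the central allocator ultimately obtains memory from the system in several ways, including sbrk and mmap.
In system-alloc.cc: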
void InitSystemAllocators(void) {
  MmapSysAllocator *mmap = new (mmap_space) MmapSysAllocator();
  SbrkSysAllocator *sbrk = new (sbrk_space) SbrkSysAllocator();
  // In 64-bit debug mode, place the mmap allocator first since it
  // allocates pointers that do not fit in 32 bits and therefore gives
  // us better testing of code's 64-bit correctness. It also leads to
  // less false negatives in heap-checking code. (Numbers are less
  // likely to look like pointers and therefore the conservative gc in
  // the heap-checker is less likely to misinterpret a number as a
  // pointer).
  DefaultSysAllocator *sdef = new (default_space) DefaultSysAllocator();
  if (kDebugMode && sizeof(void*) > 4) {
    sdef->SetChildAllocator(mmap, 0, mmap_name);
    sdef->SetChildAllocator(sbrk, 1, sbrk_name);
  } else {
    sdef->SetChildAllocator(sbrk, 0, sbrk_name);
    sdef->SetChildAllocator(mmap, 1, mmap_name);
  }
  sys_alloc = sdef;
}
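To make the priority order concrete, here is a minimal sketch of a child-allocator chain. It is a simplified model, not the gperftools code: ToyAllocator, FailingAllocator, BufferAllocator and ToyChain are hypothetical names invented for illustration, and the real DefaultSysAllocator does more bookkeeping. It only shows why the slot index passed when registering a child decides which allocator is tried first.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for tcmalloc's SysAllocator interface.
class ToyAllocator {
 public:
  virtual ~ToyAllocator() {}
  virtual void* Alloc(std::size_t size) = 0;
};

// Always fails, like a huge-page pool that has run out of pages.
class FailingAllocator : public ToyAllocator {
 public:
  void* Alloc(std::size_t) override { return nullptr; }
};

// Hands out memory from a small fixed buffer.
class BufferAllocator : public ToyAllocator {
 public:
  void* Alloc(std::size_t size) override {
    if (size > sizeof(buffer_) - used_) return nullptr;
    void* p = buffer_ + used_;
    used_ += size;
    return p;
  }

 private:
  char buffer_[4096];
  std::size_t used_ = 0;
};

// Tries its children in slot order; slot 0 has the highest priority.
class ToyChain : public ToyAllocator {
 public:
  static constexpr int kMaxChildren = 3;

  void SetChild(ToyAllocator* child, int slot) {
    assert(slot >= 0 && slot < kMaxChildren);
    children_[slot] = child;
  }

  void* Alloc(std::size_t size) override {
    for (int i = 0; i < kMaxChildren; ++i) {
      if (children_[i] == nullptr) continue;
      void* p = children_[i]->Alloc(size);
      if (p != nullptr) {
        last_used_ = i;  // remember which child served the request
        return p;
      }
    }
    return nullptr;  // every child failed
  }

  int last_used() const { return last_used_; }

 private:
  ToyAllocator* children_[kMaxChildren] = {nullptr, nullptr, nullptr};
  int last_used_ = -1;
};
```

In this model a huge-page allocator would go in slot 0, with sbrk and mmap behind it as fallbacks. When applying the same idea to the real code, check how many child slots your gperftools version's DefaultSysAllocator provides; a third child may require raising that limit.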
So this is where a huge-page allocation method needs to be added. The code is as follows:
#include <config.h>
#include <errno.h> // for EAGAIN, errno
#include <fcntl.h> // for open, O_RDWR
#include <stddef.h> // for size_t, NULL, ptrdiff_t
#if defined HAVE_STDINT_H
#include <stdint.h> // for uintptr_t, intptr_t
#elif defined HAVE_INTTYPES_H
#include <inttypes.h>
#else
#include <sys/types.h>
#endif
#ifdef HAVE_MMAP
#include <sys/mman.h> // for munmap, mmap, MADV_DONTNEED, etc
#endif
#ifdef HAVE_UNISTD_H
#include <unistd.h> // for sbrk, getpagesize, off_t
#endif
#include <new> // for operator new
#include <gperftools/malloc_extension.h>
#include "base/basictypes.h"
#include "base/commandlineflags.h"
#include "base/spinlock.h" // for SpinLockHolder, SpinLock, etc
#include "common.h"
#include "internal_logging.h"
// Include the libhugetlbfs header with C linkage.
extern "C" {
#include <hugetlbfs.h>
}
// These should already be declared by hugetlbfs.h; they are repeated
// here as in the original patch.
extern "C" {
extern long gethugepagesize(void);
extern int gethugepagesizes(long pagesizes[], int n_elem);
extern int getpagesizes(long pagesizes[], int n_elem);
extern void *get_huge_pages(size_t len, ghp_t flags);
extern void free_huge_pages(void *ptr);
extern void *get_hugepage_region(size_t len, ghr_t flags);
extern void free_hugepage_region(void *ptr);
}
// added by liyu
class HugeSysAllocator : public SysAllocator {
 public:
  HugeSysAllocator() : SysAllocator() {
  }
  void* Alloc(size_t size, size_t *actual_size, size_t alignment);
};
static char huge_space[sizeof(HugeSysAllocator)];
// added by liyu
void* HugeSysAllocator::Alloc(size_t size, size_t *actual_size,
                              size_t alignment) {
  // Strict check carried over from the sbrk allocator: reject requests
  // whose padded size would be negative as a ptrdiff_t.
  if (static_cast<ptrdiff_t>(size + alignment) < 0) return NULL;
  // This doesn't overflow because TCMalloc_SystemAlloc has already
  // tested for overflow at the alignment boundary.
  size = ((size + alignment - 1) / alignment) * alignment;
  // "actual_size" indicates that the bytes from the returned pointer
  // p up to and including (p + actual_size - 1) have been allocated.
  if (actual_size) {
    *actual_size = size;
  }
  void* result = get_huge_pages(size, GHP_DEFAULT);
  //void* result = get_hugepage_region(size, GHR_DEFAULT);
  if (result == NULL) {
    return NULL;
  }
  // Is it aligned?
  uintptr_t ptr = reinterpret_cast<uintptr_t>(result);
  if ((ptr & (alignment - 1)) == 0) {
    return result;
  }
  // Not aligned. Only "size" bytes were requested, so moving the pointer
  // forward would make the caller run past the end of the region.
  // Release the pages and report failure instead. (Use
  // free_hugepage_region() if get_hugepage_region() was used above.)
  free_huge_pages(result);
  return NULL;
}
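All that is needed is to add a huge-page allocator modeled on the sbrk one and make sure it is tried first. If get_huge_pages() fails at this point, use get_hugepage_region() instead (according to the libhugetlbfs man pages, get_huge_pages() expects a length that is a multiple of the huge page size, while get_hugepage_region() does not). Remember to change the flags argument accordingly, and test several settings to find one that suits your workload.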
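The size rounding and alignment check used in Alloc can be tried in isolation. The sketch below repeats the same arithmetic in small helper functions; IsPowerOfTwo, RoundUp and AlignmentGap are hypothetical names chosen for illustration and are not part of tcmalloc or libhugetlbfs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when x is a nonzero power of two.
inline bool IsPowerOfTwo(std::size_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Rounds size up to the next multiple of alignment
// (the same formula Alloc applies to the requested size).
inline std::size_t RoundUp(std::size_t size, std::size_t alignment) {
  assert(alignment != 0);
  return ((size + alignment - 1) / alignment) * alignment;
}

// Number of bytes to skip so that addr becomes a multiple of alignment.
// Returns 0 when addr is already aligned. The mask trick requires a
// power-of-two alignment.
inline std::size_t AlignmentGap(std::uintptr_t addr, std::size_t alignment) {
  assert(IsPowerOfTwo(alignment));
  const std::uintptr_t remainder = addr & (alignment - 1);
  return remainder == 0 ? 0 : alignment - remainder;
}
```

If AlignmentGap returns a nonzero value for a region that was requested with exactly RoundUp(size, alignment) bytes, an aligned block of that size no longer fits inside the region. That is why a misaligned result has to be either over-allocated in advance or rejected.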
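With the code changes done, the next step is to change how the tcmalloc shared library is linked.
Find the place in the build files where the pthread library is linked and append -lhugetlbfs after it. Remember to copy the built hugetlbfs library to /usr/local/lib and its header to /usr/local/include/, and to add /usr/local/lib to ld.so.conf (then run ldconfig so the change takes effect).
After that it is ready to use. Performance is solid and somewhat better than plain tcmalloc, roughly a 5% to 20% improvement.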
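Since the first attempt ran without using any huge pages, it is worth confirming that the patched library really takes them. One possible check, sketched below, reads the huge-page counters that Linux exposes in /proc/meminfo (for example HugePages_Total and HugePages_Free). ReadFieldFromStream and ReadMeminfoField are hypothetical helper names, and the approach assumes a Linux system.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns the number that follows "key:" in meminfo-style text,
// or -1 if the key is missing or has no numeric value.
long ReadFieldFromStream(std::istream& in, const std::string& key) {
  assert(!key.empty());
  const std::string prefix = key + ":";
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, prefix.size(), prefix) == 0) {
      std::istringstream fields(line.substr(prefix.size()));
      long value = -1;
      if (fields >> value) return value;
      return -1;
    }
  }
  return -1;
}

// Convenience wrapper for the live system.
// Returns -1 if /proc/meminfo cannot be opened.
long ReadMeminfoField(const std::string& key) {
  std::ifstream in("/proc/meminfo");
  if (!in) return -1;
  return ReadFieldFromStream(in, key);
}
```

Calling ReadMeminfoField("HugePages_Free") before and after the program allocates a large block should show the free count dropping if huge pages are in use. If the value stays the same, the allocation probably went through sbrk or mmap instead.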