Foreword
This article discusses Curve's memory management from a user's perspective; it does not require the memory-management theory an allocator developer would need. The purpose is to share our experience with Linux memory management and memory-problem analysis gained during the development of Curve, centered on two real problems we encountered:
- The memory on the chunkserver could not be released
- The memory of the MDS grew slowly but continuously
Most memory problems surface late: they are usually found during long high-pressure stability tests (7×24 hours or more) or abnormal-scenario tests. This requires that during the testing phase we monitor not only io-related test metrics but also server resources such as memory, CPU and NIC usage, and check whether the collected metrics stay within expectations. The slow MDS growth above, for example, would not have been found during testing if we had only checked whether io was normal. And once a memory problem does appear, analyzing it is not easy either, especially in large software.
This article will expand on the following aspects:
- The memory layout of Curve processes.
- Memory allocation strategy: why an allocator is necessary, the problems it must solve and the characteristics it should have, and an example that illustrates how one allocator manages memory.
- Curve's memory management: which allocators the Curve components currently use, and why.
Curve is a cloud-native distributed storage system open-sourced by NetEase and a sandbox project of the CNCF (Cloud Native Computing Foundation). It consists of two parts: Curve block storage and Curve file storage.
2
Memory layout
Before talking about memory management, let's briefly review memory layout. Physical memory is the actual RAM in the machine; virtual memory hides physical memory from the process and gives it a convenient, uniform interface. Why virtual memory is needed, how virtual addresses are mapped to physical ones, and how that mapping is managed are beyond the scope of this article.
Linux maintains a separate virtual address space for each process, consisting of two parts: process virtual memory (user space) and kernel virtual memory (kernel space). This article mainly discusses the user space a process can operate on, which has the form shown in the figure below.
Now let's use pmap to look at the virtual address space of a running curve-mds process. pmap shows the memory map of a process; the command reads the information in /proc/[pid]/maps.
// pmap -X {pid} shows the memory layout of a process
sudo pmap -X 2804620
// the pmap -X output for curve-mds has many columns
Address Perm Offset Device Inode Size Rss Pss Referenced Anonymous ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked Mapping
// for readability, the columns after Pss are removed and the middle address ranges are elided
2804620: /usr/bin/curve-mds -confPath=/etc/curve/mds.conf -mdsAddr=127.0.0.1:6666 -log_dir=/data/log/curve/mds -graceful_quit_on_sigterm=true -stderrthreshold=3
Address Perm Offset Device Inode Size Rss Pss Mapping
c000000000 rw-p 00000000 00:00 0 65536 1852 1852
559f0e2b9000 r-xp 00000000 41:42 37763836 9112 6296 6296 curve-mds
559f0eb9f000 r--p 008e5000 41:42 37763836 136 136 136 curve-mds
559f0ebc1000 rw-p 00907000 41:42 37763836 4 4 4 curve-mds
559f0ebc2000 rw-p 00000000 00:00 0 10040 4244 4244
559f1110a000 rw-p 00000000 00:00 0 2912 2596 2596 [heap]
7f6124000000 rw-p 00000000 00:00 0 156 156 156
7f6124027000 ---p 00000000 00:00 0 65380 0 0
7f612b7ff000 ---p 00000000 00:00 0 4 0 0
7f612b800000 rw-p 00000000 00:00 0 8192 8 8
7f612c000000 rw-p 00000000 00:00 0 132 4 4
7f612c021000 ---p 00000000 00:00 0 65404 0 0
.....
7f6188cff000 ---p 0026c000 41:42 37750237 2044 0 0
7f61895b7000 r-xp 00000000 41:42 50201214 96 96 0 libpthread-2.24.so
7f61895cf000 ---p 00018000 41:42 50201214 2044 0 0 libpthread-2.24.so
7f61897ce000 r--p 00017000 41:42 50201214 4 4 4 libpthread-2.24.so
7f61897cf000 rw-p 00018000 41:42 50201214 4 4 4 libpthread-2.24.so
7f61897d0000 rw-p 00000000 00:00 0 16 4 4
7f61897d4000 r-xp 00000000 41:42 50200647 16 16 0 libuuid.so.1.3.0
7f61897d8000 ---p 00004000 41:42 50200647 2044 0 0 libuuid.so.1.3.0
7f61899d7000 r--p 00003000 41:42 50200647 4 4 4 libuuid.so.1.3.0
7f61899d8000 rw-p 00004000 41:42 50200647 4 4 4 libuuid.so.1.3.0
7f61899d9000 r-xp 00000000 41:42 37617895 9672 8904 8904 libetcdclient.so
7f618a34b000 ---p 00972000 41:42 37617895 2048 0 0 libetcdclient.so
7f618a54b000 r--p 00972000 41:42 37617895 6556 5664 5664 libetcdclient.so
7f618abb2000 rw-p 00fd9000 41:42 37617895 292 252 252 libetcdclient.so
7f618abfb000 rw-p 00000000 00:00 0 140 60 60
7f618ac1e000 r-xp 00000000 41:42 50201195 140 136 0 ld-2.24.so
7f618ac4a000 rw-p 00000000 00:00 0 1964 1236 1236
7f618ae41000 r--p 00023000 41:42 50201195 4 4 4 ld-2.24.so
7f618ae42000 rw-p 00024000 41:42 50201195 4 4 4 ld-2.24.so
7f618ae43000 rw-p 00000000 00:00 0 4 4 4
7fffffd19000 rw-p 00000000 00:00 0 132 24 24 [stack]
7fffffdec000 r--p 00000000 00:00 0 8 0 0 [vvar]
7fffffdee000 r-xp 00000000 00:00 0 8 4 0 [vdso]
ffffffffff600000 r-xp 00000000 00:00 0 4 0 0 [vsyscall]
======= ===== =====
1709344 42800 37113
- In the output above, the process image actually starts at 0x559f0e2b9000, not at 0x400000 as in classic layout diagrams. This is address space layout randomization (ASLR), which randomizes the positions of key parts of the process space (such as the stack, libraries and heap) to defend against attacks that rely on known addresses. The kernel setting has three values: 0 means off; the difference between 1 and 2 is that 2 additionally randomizes the heap (brk) base.
- The next three ranges, starting at 0x559f0e2b9000, 0x559f0eb9f000 and 0x559f0ebc1000, all map the curve-mds binary but with different permissions. From the linking point of view an ELF file contains code, data, BSS and other sections; at load time the loader does not map each section separately, but maps sections with the same permissions together. So the r-xp range is the code segment (readable and executable), the r--p range is read-only data, and the rw-p range holds the writable data and BSS segments.
- The range starting at 0x559f1110a000 corresponds to the runtime heap in the figure above; memory allocated dynamically at run time comes from this region. Note that it does not start right after the .bss section: there is a random offset here as well.
- The range starting at 0x7f6124000000 is the memory-mapping region from the figure: shared libraries are mapped here, and large user allocations and allocator arenas also live here. This region is expanded on in the next section on memory allocation strategy and is the focus of this article.
- The range starting at 0x7fffffd19000 is the stack, generally a few megabytes in size.
- The vvar, vdso and vsyscall ranges exist to speed up certain system calls: they let the process execute some calls (such as reading the time) without entering kernel mode. They are not discussed further here.
3
Memory allocation strategy
The kernel provides two system calls with which a process can allocate memory, corresponding to the heap and the memory-mapping region: brk and mmap.
- brk: allocates memory by moving the top of the heap (the program break).
- mmap: allocates memory in the memory-mapping region.
If developers had to allocate and release memory with the raw brk and mmap system calls, development would be error-prone and inefficient, so in practice they are rarely used directly. Instead, a memory management library sits between the application and the kernel: common general-purpose allocators include glibc's ptmalloc, Google's tcmalloc and Facebook's jemalloc. These libraries obtain regions from the kernel via the system calls above and hand them out through interfaces such as malloc and free. A general-purpose memory allocator should have the following characteristics:
- Low additional space overhead. For example, if the application asks for 5k and the allocator hands it 10k, half the space is wasted.
- Fast allocation and deallocation.
- As little memory fragmentation as possible.
- Generality, compatibility, portability and ease of debugging.
Let's walk through glibc's memory management (ptmalloc) with the following picture:
- malloc(30k): allocates memory by extending the top of the heap via the brk system call.
- malloc(20k): continues to extend the top of the heap via brk.
- malloc(200k): by default, requests larger than 128K (determined by M_MMAP_THRESHOLD, default 128K, adjustable) are allocated with the mmap system call.
- free(30k): this space is not returned to the operating system; ptmalloc continues to manage it. Because the block is not at the top of the heap, the heap cannot be shrunk with brk. The freed space can still be reused: a later malloc(10k), say, can be carved out of it without another brk call. But consider the situation where the top of the heap stays occupied while space below it has been freed by the application: since that space cannot be returned to the system, it becomes memory fragmentation.
- free(20k): after the application frees this space, ptmalloc merges it with the adjacent freed 30k block. If the free block at the top of the heap then exceeds M_TRIM_THRESHOLD, the heap is trimmed and that memory is returned to the operating system.
- free(200k): memory allocated with mmap is returned directly to the system when freed (via munmap).
How does ptmalloc handle multithreaded programs? If all threads allocated from a single area, contention on its lock would make allocation slow. ptmalloc therefore maintains multiple allocation areas (arenas), of two types: the main arena and dynamic arenas.
- Main arena: can allocate memory both on the heap and in the memory-mapping region;
- Dynamic arenas: allocate memory only in the memory-mapping region, reserving HEAP_MAX_SIZE (64M on 64-bit systems) from the kernel each time and splitting allocations out of that reservation. The main thread uses the main arena; other threads are attached to a dynamic arena the first time they call malloc, and once the limit on the number of arenas is reached, no new ones are created and threads share the existing ones. The number of dynamic arenas is at most (2 × cores + 1) on 32-bit systems and (8 × cores + 1) on 64-bit systems.
Let's take the following program as an example and look at how space is allocated in this case:
// Three threads in total:
// main thread: one 4K allocation
// thread 1:   100 allocations of 4K
// thread 2:   100 allocations of 4K
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <vector>

void* threadFunc(void* id) {
    std::vector<char*> malloclist;
    for (int i = 0; i < 100; i++) {
        malloclist.emplace_back((char*) malloc(1024 * 4));
    }
    sleep(300);  // wait here so the memory layout can be inspected with pmap
    return NULL;
}

int main() {
    pthread_t t1, t2;
    int id1 = 1;
    int id2 = 2;
    char* addr = (char*) malloc(4 * 1024);
    pthread_create(&t1, NULL, threadFunc, (void*) &id1);
    pthread_create(&t2, NULL, threadFunc, (void*) &id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    free(addr);
    return 0;
}
Let's use pmap to view the memory layout of this program:
741545: ./memory_test
Address Perm Offset Device Inode Size Rss Pss Mapping
56127705a000 r-xp 00000000 08:02 62259273 4 4 4 memory_test
56127725a000 r--p 00000000 08:02 62259273 4 4 4 memory_test
56127725b000 rw-p 00001000 08:02 62259273 4 4 4 memory_test
5612784b9000 rw-p 00000000 00:00 0 132 8 8 [heap]
**7f0df0000000 rw-p 00000000 00:00 0 404 404 404
7f0df0065000 ---p 00000000 00:00 0 65132 0 0
7f0df8000000 rw-p 00000000 00:00 0 404 404 404
7f0df8065000 ---p 00000000 00:00 0 65132 0 0**
7f0dff467000 ---p 00000000 00:00 0 4 0 0
7f0dff468000 rw-p 00000000 00:00 0 8192 8 8
7f0dffc68000 ---p 00000000 00:00 0 4 0 0
7f0dffc69000 rw-p 00000000 00:00 0 8192 8 8
7f0e00469000 r-xp 00000000 08:02 50856517 1620 1052 9 libc-2.24.so
7f0e005fe000 ---p 00195000 08:02 50856517 2048 0 0 libc-2.24.so
7f0e007fe000 r--p 00195000 08:02 50856517 16 16 16 libc-2.24.so
7f0e00802000 rw-p 00199000 08:02 50856517 8 8 8 libc-2.24.so
7f0e00804000 rw-p 00000000 00:00 0 16 12 12
7f0e00808000 r-xp 00000000 08:02 50856539 96 96 1 libpthread-2.24.so
7f0e00820000 ---p 00018000 08:02 50856539 2044 0 0 libpthread-2.24.so
7f0e00a1f000 r--p 00017000 08:02 50856539 4 4 4 libpthread-2.24.so
7f0e00a20000 rw-p 00018000 08:02 50856539 4 4 4 libpthread-2.24.so
7f0e00a21000 rw-p 00000000 00:00 0 16 4 4
7f0e00a25000 r-xp 00000000 08:02 50856513 140 140 1 ld-2.24.so
7f0e00c31000 rw-p 00000000 00:00 0 16 16 16
7f0e00c48000 r--p 00023000 08:02 50856513 4 4 4 ld-2.24.so
7f0e00c49000 rw-p 00024000 08:02 50856513 4 4 4 ld-2.24.so
7f0e00c4a000 rw-p 00000000 00:00 0 4 4 4
7ffe340be000 rw-p 00000000 00:00 0 132 12 12 [stack]
7ffe3415c000 r--p 00000000 00:00 0 8 0 0 [vvar]
7ffe3415e000 r-xp 00000000 00:00 0 8 4 0 [vdso]
ffffffffff600000 r-xp 00000000 00:00 0 4 0 0 [vsyscall]
====== ==== ===
153800 2224 943
Pay attention to the highlighted ranges starting at 0x7f0df0000000 and 0x7f0df8000000: each totals 65536K (64M), of which 404K has rw-p (readable and writable) permission and 65132K has ---p (inaccessible) permission. These are the dynamic arenas ptmalloc set up for thread 1 and thread 2: each arena reserves 64M from the kernel at a time and then splits the thread's allocations out of that 64M, making only the touched part readable and writable.
The same behavior can be seen at the system-call level. Trace the allocation-related calls with strace -f -e trace=brk,mmap,munmap -p {pid}:
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f624a169000
strace: Process 774601 attached
[pid 774018] mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6249968000
[pid 774601] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f6241968000
[pid 774601] munmap(0x7f6241968000, 40468480strace: Process 774602 attached
) = 0
[pid 774601] munmap(0x7f6248000000, 26640384) = 0
[pid 774602] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f623c000000
[pid 774602] munmap(0x7f6240000000, 67108864) = 0
Here pid 774018 (the main thread) creates an 8M+4K thread stack; thread 1 (pid 774601) first mmaps 128M of PROT_NONE space, then unmaps the head and the tail (note that 0x7f6241968000 + 40468480 = 0x7f6244000000, and 40468480 + 26640384 = 64M), keeping the 64M region 0x7f6244000000 ~ 0x7f6248000000. The point of mapping more than needed and returning the excess is to make the start address of the retained region aligned to a 64M boundary.
4
Curve's memory management
Two allocators are currently used in Curve: ptmalloc and jemalloc. MDS uses the default ptmalloc, while Chunkserver and Client use jemalloc.
Here we explain the two problems mentioned at the beginning of this article. The first is the slow memory growth of MDS, about 3G per day. The analysis of this problem went as follows:
1. First, look at the memory layout with pmap. The large number of fully resident anonymous 64M regions in the output below was suspicious.
2815659: /usr/bin/curve-mds -confPath=/etc/curve/mds.conf -mdsAddr=*.*.*.*:6666 -log_dir=/data/log/curve/mds -graceful_quit_on_sigterm=true -stderrthreshold=3
Address Perm Offset Device Inode Size Rss Pss Referenced Anonymous ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked Mapping
c000000000 rw-p 00000000 00:00 0 8192 4988 4988 4988 4988 0 0 0 0 0 0
c000800000 rw-p 00000000 00:00 0 57344 0 0 0 0 0 0 0 0 0 0
557c5abb6000 r-xp 00000000 41:42 55845493 9112 6488 6488 6488 0 0 0 0 0 0 0 /usr/bin/curve-mds
557c5b49c000 r--p 008e5000 41:42 55845493 136 136 136 136 136 0 0 0 0 0 0 /usr/bin/curve-mds
557c5b4be000 rw-p 00907000 41:42 55845493 4 4 4 4 4 0 0 0 0 0 0 /usr/bin/curve-mds
557c5b4bf000 rw-p 00000000 00:00 0 10040 2224 2224 2224 2224 0 0 0 0 0 0
557c5cce2000 rw-p 00000000 00:00 0 5604 5252 5252 5252 5252 0 0 0 0 0 0 [heap]
7f837f7ff000 ---p 00000000 00:00 0 4 0 0 0 0 0 0 0 0 0 0
7f837f800000 rw-p 00000000 00:00 0 8192 8 8 8 8 0 0 0 0 0 0
7f8380000000 rw-p 00000000 00:00 0 132 12 12 12 12 0 0 0 0 0 0
......
7fbcf8000000 rw-p 00000000 00:00 0 65536 65536 65536 65536 65536 0 0 0 0 0 0
7fbcfc000000 rw-p 00000000 00:00 0 65536 65536 65536 65528 65536 0 0 0 0 0 0
7fbd04000000 rw-p 00000000 00:00 0 65536 65536 65536 65520 65536 0 0 0 0 0 0
7fbd08000000 rw-p 00000000 00:00 0 65536 65536 65536 65536 65536 0 0 0 0 0 0
7fbd0c000000 rw-p 00000000 00:00 0 65536 65536 65536 65528 65536 0 0 0 0 0 0
7fbd10000000 rw-p 00000000 00:00 0 65536 65536 65536 65524 65536 0 0 0 0 0 0
7fbd14000000 rw-p 00000000 00:00 0 65536 65536 65536 65532 65536 0 0 0 0 0 0
7fbd18000000 rw-p 00000000 00:00 0 65536 65536 65536 65536 65536 0 0 0 0 0 0
7fbd1c000000 rw-p 00000000 00:00 0 65536 65536 65536 65524 65536 0 0 0 0 0 0
7fbd20000000 rw-p 00000000 00:00 0 65536 65536 65536 65524 65536 0 0 0 0 0 0
7fbd24000000 rw-p 00000000 00:00 0 65536 65536 65536 65512 65536 0 0 0 0 0 0
7fbd28000000 rw-p 00000000 00:00 0 65536 65536 65536 65520 65536 0 0 0 0 0 0
7fbd2c000000 rw-p 00000000 00:00 0 65536 65536 65536 65520 65536 0 0 0 0 0 0
7fbd30000000 rw-p 00000000 00:00 0 65536 65536 65536 65516 65536 0 0 0 0 0 0
......
======= ====== ====== ========== ========= ============== ============== =============== ==== ======= ======
7814504 272928 263610 272928 248772 0 0 0 0 0 0 KB
2. Check the pressure-related monitoring metrics on the MDS: the load on the MDS was small and the iops during the test were low, so the growth should not have been caused by request pressure.
3. Attach to the process with gdb -p {pid}, then use dump memory mem.bin {addr1} {addr2} to dump the memory of a specified address range and examine part of its content. The dumped content contained many strings, as well as some metric information, and could roughly be divided into a few categories.
4. With these clues, go through the code to see where such content is produced. The strings were file information and segment/chunk mapping information, encoded into key-value records for storage in etcd. The interfaces that allocate memory for them are GetFileInfo and several others; after locating these interfaces, we found that they fetch data from etcd through the cgo-based etcdclient. When data is transferred from the Go side to the MDS, the C-side buffer has to be released explicitly by the caller, and in one place this release had been forgotten when the code was written.
We also tried valgrind memcheck once during the analysis. The command:
valgrind --tool=memcheck --leak-check=full --show-reachable=yes
--trace-children=yes --track-origins=yes /usr/bin/curve-mds
-confPath=/etc/curve/mds.conf -mdsAddr=*.*.*.*:6666
-log_dir=/data/log/curve/mds -graceful_quit_on_sigterm=true -stderrthreshold=3
The output is very long; only part of it is excerpted here, and there is a fairly clear hint in the highlighted record:
==1559781== 13,440 bytes in 40 blocks are possibly lost in loss record 2,296 of 2,367
==1559781== at 0x4C2DBC5: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1559781== by 0x4011E31: allocate_dtv (dl-tls.c:322)
==1559781== by 0x40127BD: _dl_allocate_tls (dl-tls.c:539)
==1559781== by 0x628A189: allocate_stack (allocatestack.c:584)
==1559781== by 0x628A189: pthread_create@@GLIBC_2.2.5 (pthread_create.c:663)
==1559781== by 0x54F88D: bthread::TaskControl::add_workers(int) (in /usr/bin/curve-mds)
==1559781== by 0x54147B: bthread_setconcurrency (in /usr/bin/curve-mds)
==1559781== by 0x35A0C9: brpc::Server::Init(brpc::ServerOptions const*) (in /usr/bin/curve-mds)
==1559781== by 0x35AAD7: brpc::Server::StartInternal(in_addr const&, brpc::PortRange const&, brpc::ServerOptions const*) (in /usr/bin/curve-mds)
==1559781== by 0x35BCBC: brpc::Server::Start(butil::EndPoint const&, brpc::ServerOptions const*) (in /usr/bin/curve-mds)
==1559781== by 0x35BD60: brpc::Server::Start(char const*, brpc::ServerOptions const*) (in /usr/bin/curve-mds)
==1559781== by 0x19442B: curve::mds::MDS::StartServer() (in /usr/bin/curve-mds)
==1559781== by 0x194A61: curve::mds::MDS::Run() (in /usr/bin/curve-mds)
==1559781==
**==1559781== 85,608 bytes in 4,125 blocks are definitely lost in loss record 2,333 of 2,367
==1559781== at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1559781== by 0x56D6863: _cgo_728933f4f8ea_Cfunc__Cmalloc (_cgo_export.c:502)
==1559781== by 0x51A9817: runtime.asmcgocall (/home/xuchaojie/github/curve/thirdparties/etcdclient/tmp/go/src/runtime/asm_amd64.s:635)
==1559781== by 0xC00000077F: ???
==1559781== by 0x300000005: ???**
==1559781==
==1559781== LEAK SUMMARY:
==1559781== definitely lost: 85,655 bytes in 4,128 blocks
==1559781== indirectly lost: 0 bytes in 0 blocks
==1559781== possibly lost: 94,392 bytes in 125 blocks
==1559781== still reachable: 27,012,904 bytes in 13,518 blocks
==1559781== of which reachable via heuristic:
==1559781== newarray : 3,136 bytes in 2 blocks
==1559781== multipleinheritance: 1,616 bytes in 1 blocks
==1559781== suppressed: 0 bytes in 0 blocks
==1559781== Reachable blocks (those to which a pointer was found) are not shown.
==1559781== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1559781==
==1559781== For counts of detected and suppressed errors, rerun with: -v
==1559781== ERROR SUMMARY: 10331 errors from 988 contexts (suppressed: 0 from 0)
Chunkserver did not use jemalloc from the beginning; initially it used the default ptmalloc. The switch to jemalloc was prompted by the problem mentioned at the start of this article: memory on the chunkserver that could not be released during testing. The phenomenon was that the chunkserver's memory grew quickly during the test, then stayed at that level and was not released afterwards.
- The situation here differs from MDS, which holds mostly control requests and some metadata caches. Memory growth on the chunkserver generally comes from two places: requests sent by users, and data synchronization between the leader and the followers of a copyset. Both go through brpc. brpc's memory management involves two modules, IOBuf and ResourcePool. The space in IOBuf is generally used to hold user data; ResourcePool manages objects such as socket and bthread_id in 64K memory blocks. The detailed data structures of these modules are not explained here; interested readers can look at the brpc documentation and the IOBuf and ResourcePool source code.
- Looking at the trend metrics of these two modules shows that they handle memory differently: IOBuf returns the memory it occupies to the allocator, whereas memory in ResourcePool is not returned to ptmalloc but kept and managed by the pool itself.
- Combined with the memory allocation strategy in Section 3: if the space at the top of the heap is still in use, the freed space below it cannot be returned to the system. This can be confirmed by looking at how much memory is currently used on the heap and at the permissions of the mapped regions (whether there are many ---p ranges). Switching the chunkserver to jemalloc later resolved the problem.
Above are two different memory problems encountered in Curve, one in the MDS project code and one in the chunkserver, and they show why Curve selected different memory allocators for different components. When choosing an allocator, if you already have solid experience with your software's memory behavior you can decide up front; if not, you can pick one first and evaluate or replace it after analysis once problems appear. We hope this also gives readers some ideas for analyzing memory problems of their own~