[Linux] Several situations where the cache is not released

Foreword

In Linux, we often use the free command to check system memory usage. On a RHEL6 system, its output looks roughly like this:

[root@tencent64 ~]# free
             total       used       free     shared    buffers     cached
Mem:     132256952   72571772   59685180          0    1762632   53034704
-/+ buffers/cache:   17774436  114482516
Swap:      2101192        508    2100684

The default display unit is kB, and this server has 128G of memory, so the numbers look fairly large. Nearly everyone who has used Linux has run this command, yet the more ubiquitous a command is, the smaller the proportion of people who truly understand its output seems to be. Broadly, people's understanding of this output falls into three levels:

Doesn't understand it. This person's first reaction is: my god, the memory is nearly used up — more than 70G — but I'm hardly running any big programs. Why? Linux hogs memory!

Thinks they understand it. After some self-study, this person will typically conclude: well, by my professional judgment, only about 17G of memory is really in use, and plenty remains available. Buffers/cache take up a lot, which means some processes have been reading and writing files, but it doesn't matter — that memory gets reclaimed when it's needed.

Really understands it. This person's reaction makes them sound like the one who understands Linux the least: "That's what free shows; fine, I know. What? You're asking me whether the memory is enough? Of course I don't know! How on earth would I know how your program is written?"
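Incidentally, the "-/+ buffers/cache" line that the second-level person relies on is pure arithmetic over the Mem: row. Using the numbers from the sample output at the top (in kB), it can be reproduced by hand:

```shell
# Recompute free's "-/+ buffers/cache" line from the Mem: row (values in kB,
# taken from the sample output above). "used" on that line excludes
# buffers+cached; "free" includes them.
used=72571772; free_kb=59685180; buffers=1762632; cached=53034704

echo "used minus buffers/cache: $((used - buffers - cached))"   # → 17774436
echo "free plus buffers/cache:  $((free_kb + buffers + cached))" # → 114482516
```

These are exactly the 17774436 and 114482516 shown in the transcript.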

Judging by the technical articles on the Internet, the vast majority of people with some Linux knowledge are at the second level. The common belief is that the space occupied by buffers and cached can be released as free memory when memory pressure is high. But is that really true? Before discussing this topic, let's briefly introduce what buffers and cached actually are:

What is buffer/cache?

Buffer and cache are two heavily overloaded terms in computing, with different meanings in different contexts. In Linux memory management, "buffers" refers to the buffer cache and "cached" refers to the page cache. Historically, one of them (buffers) served as a write cache for I/O devices and the other (cached) as a read cache, where "I/O devices" mainly means block device files and regular files on a file system. Today their meanings have changed. In the current kernel, the page cache is, as its name suggests, a cache for memory pages: any memory that is allocated and managed in units of pages can use the page cache for caching. Not all memory is managed by pages, however; some is managed in units of blocks, and when that memory needs caching, the buffer cache handles it. (From this angle, wouldn't "block cache" be a better name?) Blocks, unlike pages, do not have a fixed length: the block size is determined by the block device in use, whereas a page is 4k on x86, whether 32-bit or 64-bit.

After understanding the difference between these two caching systems, you can understand what they can be used for.

What is page cache?

Page cache is mainly used to cache file data on a file system, especially when processes read from or write to files. If you think about it, it is also natural that mmap, a system call that maps files into memory, would use the page cache. In the current implementation, the page cache also serves as the cache for other file types, so in practice it is responsible for caching most block device files as well.

What is buffer cache?

Buffer cache is mainly designed to cache block data when the system reads and writes block devices. This means that operations performed directly on blocks — for example, formatting a file system — are cached through the buffer cache. The two cache systems generally work together: when we write to a file, the page cache content changes, and the buffer cache marks which buffers within the page were modified. That way, when the kernel performs dirty-data writeback, it does not need to write back the entire page, only the modified parts.
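The kernel reports the current size of both caches directly in /proc/meminfo, which is in fact where free gets its numbers. A quick way to look at just these two fields:

```shell
# Buffers = buffer cache, Cached = page cache (file pages; on most kernels
# this excludes the swap cache, reported separately as SwapCached)
grep -E '^(Buffers|Cached):' /proc/meminfo
```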

How to reclaim the cache?

The Linux kernel triggers memory reclaim when memory is nearly exhausted, in order to free memory for processes that urgently need it. In general, most of the memory freed by this operation comes from buffer/cache, especially when cache usage is large. Since the cache merely speeds up file access while memory is plentiful, it is natural to clear it under memory pressure and hand the space to processes as free memory. So in general, we consider buffer/cache space releasable, and that understanding is correct.

But clearing the cache is not free of cost. Knowing what the cache does, you can see that before cached pages can be released, the kernel must ensure the cached data is consistent with the data in the corresponding files. Cache clearing is therefore usually accompanied by a spike in system I/O: the kernel checks whether cached data matches the data on disk, writes back anything inconsistent, and only then reclaims the pages.

Besides clearing the cache automatically when memory is about to run out, we can also trigger cache clearing manually through the following file:

[root@tencent64 ~]# cat /proc/sys/vm/drop_caches 

The values that can be written to this file are 1, 2, and 3, with the following meanings:

echo 1 > /proc/sys/vm/drop_caches

Clears the pagecache.

echo 2 > /proc/sys/vm/drop_caches

Clears reclaimable slab objects, including the dentry (directory entry) and inode caches. The slab allocator is the kernel's mechanism for managing memory for small objects, and many kernel caches are built on top of it.

echo 3 > /proc/sys/vm/drop_caches

Clears both the pagecache and the slab cache objects.
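Note that drop_caches only discards clean pages; dirty pages are skipped until they have been written back. A common pattern (requiring root) is therefore to sync first — shown here as a sketch, not something to run casually on a busy production box:

```shell
# Flush dirty pages to disk first so drop_caches has maximal effect,
# then ask the kernel to discard the clean caches (requires root).
sync
echo 3 > /proc/sys/vm/drop_caches
```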

Can all cache be reclaimed?

We have looked at the cases where the cache can be reclaimed — so is there cache that cannot be reclaimed? Of course there is. Let's look at the first case:

tmpfs

As everyone knows, Linux provides a "temporary" file system called tmpfs, which turns part of memory into a file system so that memory can be used as directories and files. Most Linux systems today have a tmpfs mount named /dev/shm, which is exactly this. Of course, we can also create a tmpfs of our own manually, like this:

[root@tencent64 ~]# mkdir /tmp/tmpfs
[root@tencent64 ~]# mount -t tmpfs -o size=20G none /tmp/tmpfs/

[root@tencent64 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             10325000   3529604   6270916  37% /
/dev/sda3             20646064   9595940  10001360  49% /usr/local
/dev/mapper/vg-data  103212320  26244284  71725156  27% /data
tmpfs                 66128476  14709004  51419472  23% /dev/shm
none                  20971520         0  20971520   0% /tmp/tmpfs

So we created a new 20G tmpfs, and we can create up to 20G of files in /tmp/tmpfs. If the files we create actually occupy memory, which part of memory do their data occupy? Given what the page cache does, it is natural to expect that tmpfs, being a file system, would use page cache space for management. Shall we try it?

[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         36         89          0          1         19
-/+ buffers/cache:         15        111
Swap:            2          0          2
[root@tencent64 ~]# dd if=/dev/zero of=/tmp/tmpfs/testfile bs=1G count=13
13+0 records in
13+0 records out
13958643712 bytes (14 GB) copied, 9.49858 s, 1.5 GB/s
[root@tencent64 ~]# 
[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         49         76          0          1         32
-/+ buffers/cache:         15        110
Swap:            2          0          2

We created a 13G file in the tmpfs directory, and comparing the free output before and after, we see that cached grew by 13G — the file really does live in memory, and the kernel stores it in the cache. Now look at the line we care about: -/+ buffers/cache. It still tells us that 110G of memory is available. But is that really true? We can trigger memory reclaim manually and see how much can actually be reclaimed:

[root@tencent64 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         43         82          0          0         29
-/+ buffers/cache:         14        111
Swap:            2          0          2

As you can see, the cached space was not fully released as we might have imagined: 13G of it is still occupied by the file in /tmp/tmpfs. (Other non-releasable caches on this system account for the remaining 16G.) So when is the cache space occupied by tmpfs released? Only when the file is deleted. If the file is not deleted, then no matter how severe the memory pressure, the kernel will never automatically delete files in tmpfs to free cache space.

[root@tencent64 ~]# rm /tmp/tmpfs/testfile 
[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         30         95          0          0         16
-/+ buffers/cache:         14        111
Swap:            2          0          2

This is the first case we have analyzed in which the cache cannot be reclaimed. There are other cases, for example:

Shared memory

Shared memory is a common inter-process communication (IPC) mechanism provided by the system, but it cannot be exercised directly from the shell, so we need a small test program:

[root@tencent64 ~]# cat shm.c 

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

#define MEMSIZE 2048*1024*1023

int
main()
{
    int shmid;
    char *ptr;
    pid_t pid;
    struct shmid_ds buf;
    int ret;

    /* Create a private shared memory segment of just under 2G. */
    shmid = shmget(IPC_PRIVATE, MEMSIZE, 0600);
    if (shmid < 0) {
        perror("shmget()");
        exit(1);
    }

    ret = shmctl(shmid, IPC_STAT, &buf);
    if (ret < 0) {
        perror("shmctl()");
        exit(1);
    }

    printf("shmid: %d\n", shmid);
    printf("shmsize: %d\n", buf.shm_segsz);

    /* IPC_SET cannot actually resize a segment; the doubled value simply
       overflows when printed with %d below. */
    buf.shm_segsz *= 2;

    ret = shmctl(shmid, IPC_SET, &buf);
    if (ret < 0) {
        perror("shmctl()");
        exit(1);
    }

    printf("shmid: %d\n", shmid);
    printf("shmsize: %d\n", buf.shm_segsz);

    pid = fork();
    if (pid < 0) {
        perror("fork()");
        exit(1);
    }
    if (pid == 0) {
        /* Child: attach and initialize the segment. */
        ptr = shmat(shmid, NULL, 0);
        if (ptr == (void *)-1) {
            perror("shmat()");
            exit(1);
        }
        bzero(ptr, MEMSIZE);
        strcpy(ptr, "Hello!");
        exit(0);
    } else {
        /* Parent: wait for the child, attach, and read what it wrote.
           Note: the segment is deliberately NOT removed (no IPC_RMID). */
        wait(NULL);
        ptr = shmat(shmid, NULL, 0);
        if (ptr == (void *)-1) {
            perror("shmat()");
            exit(1);
        }
        puts(ptr);
        exit(0);
    }
}

The program is very simple: it requests a shared memory segment of just under 2G, then forks a child process to initialize it. After the child finishes, the parent prints the contents of the shared memory and exits — but it does not delete the shared memory before exiting. Let's look at memory usage before and after running it:

[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         30         95          0          0         16
-/+ buffers/cache:         14        111
Swap:            2          0          2
[root@tencent64 ~]# ./shm 
shmid: 294918
shmsize: 2145386496
shmid: 294918
shmsize: -4194304
Hello!
[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         32         93          0          0         18
-/+ buffers/cache:         14        111
Swap:            2          0          2
The cached space grew from 16G to 18G. Can this cache be reclaimed? Let's keep testing:

[root@tencent64 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         32         93          0          0         18
-/+ buffers/cache:         14        111
Swap:            2          0          2

The result: still not reclaimable. Observe that even when nobody is using this shared memory, it stays in the cache indefinitely until it is deleted. There are two ways to delete it: call shmctl() with IPC_RMID in the program, or use the ipcrm command. Let's delete it:

[root@tencent64 ~]# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00005feb 0          root       666        12000      4                       
0x00005fe7 32769      root       666        524288     2                       
0x00005fe8 65538      root       666        2097152    2                       
0x00038c0e 131075     root       777        2072       1                       
0x00038c14 163844     root       777        5603392    0                       
0x00038c09 196613     root       777        221248     0                       
0x00000000 294918     root       600        2145386496 0                       

[root@tencent64 ~]# ipcrm -m 294918
[root@tencent64 ~]# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00005feb 0          root       666        12000      4                       
0x00005fe7 32769      root       666        524288     2                       
0x00005fe8 65538      root       666        2097152    2                       
0x00038c0e 131075     root       777        2072       1                       
0x00038c14 163844     root       777        5603392    0                       
0x00038c09 196613     root       777        221248     0                       

[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         30         95          0          0         16
-/+ buffers/cache:         14        111
Swap:            2          0          2

After the shared memory is deleted, the cache is released normally. This behavior is analogous to tmpfs — and indeed, the kernel implements the in-memory storage of the XSI IPC mechanisms — shared memory (shm), message queues (msg), and semaphore arrays (sem) — on top of tmpfs, which is why shared memory behaves like tmpfs. Of course, shm generally consumes the most memory, which is why we emphasize it here. Speaking of shared memory, Linux also offers another way to share memory: mmap.
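Since kernel 2.6.32 (the RHEL6 kernel), /proc/meminfo also exposes a Shmem field, which counts exactly these tmpfs and shared-memory pages — i.e. the part of "cached" that drop_caches cannot touch:

```shell
# Shmem: pages used by tmpfs and shared memory; they are accounted inside
# Cached but cannot be reclaimed by drop_caches, only by deleting the backing
# files/segments
grep -E '^Shmem:' /proc/meminfo
```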

mmap

mmap() is a very important system call — something you cannot tell just from its functional description. Literally, mmap maps a file into a process's virtual address space, so the file's contents can be manipulated through memory operations. But the call is used far more widely than that: when malloc requests memory, the kernel satisfies small allocations via sbrk and large ones via mmap; when the exec family of calls runs a program, the kernel must load an executable file into memory, which it naturally also handles with mmap. Here we consider just one case: when mmap is used to request shared memory, does it occupy the cache the way shmget() does?

Similarly, we also need a simple test program:

[root@tencent64 ~]# cat mmap.c 
#include <stdlib.h>
#include <stdio.h>
#include <strings.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define MEMSIZE 1024*1024*1023*2
#define MPFILE "./mmapfile"

int main()
{
    void *ptr;
    int fd;

    fd = open(MPFILE, O_RDWR);
    if (fd < 0) {
        perror("open()");
        exit(1);
    }

    /* With MAP_ANON set, the kernel ignores fd and backs the mapping with
       anonymous shared (tmpfs-based) memory. */
    ptr = mmap(NULL, MEMSIZE, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANON, fd, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap()");
        exit(1);
    }

    printf("%p\n", ptr);
    bzero(ptr, MEMSIZE);

    sleep(100);

    munmap(ptr, MEMSIZE);
    close(fd);

    exit(1);
}

This time we skip the parent-child setup: a single process requests a 2G shared mapping with mmap, initializes it, sleeps 100 seconds, and then releases the mapping. So we have to check the system's memory usage during those 100 seconds of sleep to see which space it uses. Of course, the 2G file ./mmapfile must be created first. The result is as follows:

[root@tencent64 ~]# dd if=/dev/zero of=mmapfile bs=1G count=2
[root@tencent64 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         30         95          0          0         16
-/+ buffers/cache:         14        111
Swap:            2          0          2

Then execute the test program:

[root@tencent64 ~]# ./mmap &
[1] 19157
0x7f1ae3635000
[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         32         93          0          0         18
-/+ buffers/cache:         14        111
Swap:            2          0          2

[root@tencent64 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         32         93          0          0         18
-/+ buffers/cache:         14        111
Swap:            2          0          2

We can see that while the program runs, cached stays at 18G — 2G higher than before — and that this cache still cannot be reclaimed. Then we wait 100 seconds for the program to finish.

[root@tencent64 ~]# 
[1]+  Exit 1                  ./mmap
[root@tencent64 ~]# 
[root@tencent64 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           126         30         95          0          0         16
-/+ buffers/cache:         14        111
Swap:            2          0          2

After the program exits, the cached space is released. So memory requested via mmap with MAP_SHARED is also stored by the kernel in the cache, and that cache cannot be released before the process unmaps the memory. In fact, MAP_SHARED memory from mmap is also implemented via tmpfs inside the kernel. From this we can further infer that, since the read-only portions of shared libraries are kept in memory through shared mmap mappings, they too occupy cache that cannot simply be dropped.

Finally

Through three test cases, we have seen that the cache in Linux system memory cannot be released as free space in every case, and that even where it can, releasing it is not cost-free for the system. To summarize, remember the following points:

  • When a file cache is released, it causes high I/O — that is the price paid for the cache having sped up file access.

  • Files stored in tmpfs occupy cache space, and that cache is not released automatically unless the files are deleted.

  • Shared memory requested with shmget occupies cache space; unless it is removed with ipcrm or with shmctl and IPC_RMID, the corresponding cache space is never released automatically.

  • Memory mapped via mmap with the MAP_SHARED flag occupies cache space; unless the process munmaps it, the corresponding cache space is never released automatically.

In fact, both shmget shared memory and mmap shared mappings are implemented on top of tmpfs in the kernel, and tmpfs storage is entirely cache.

Once you understand all this, I hope your understanding of the free command can reach the third level described above. Memory usage is not a simple concept, and cache cannot truly be counted as free space. If we really want to judge whether memory usage on a system is reasonable, we need far more detailed knowledge and a more careful analysis of how the workload itself is implemented. Our experiments here were done on CentOS 6; the free output of other Linux versions may differ, and you can investigate the reasons yourself.
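For reference: since kernel 3.14, the kernel itself publishes an estimate called MemAvailable, and newer versions of free print it as an "available" column — a much better answer to "how much memory can I still use" than free plus buffers/cache:

```shell
# MemAvailable: the kernel's own estimate of memory available to new
# workloads; it counts reclaimable page cache but excludes things like
# shmem/tmpfs pages that cannot be dropped
grep -E '^MemAvailable:' /proc/meminfo
```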

Of course, this article does not cover every situation in which cache cannot be released. In your own application scenarios, which caches cannot be released?

Original link: https://blog.csdn.net/m0_71777195/article/details/128325248
