Why Redis snapshots use subprocesses

Why's THE Design is a series of articles about design decisions in computing. Each article asks a specific question and discusses the advantages and disadvantages of a design, and its impact on the concrete implementation, from several angles. If there is a question you would like covered, leave a message below the article.

Although we often think of Redis as a purely in-memory key-value store, we also rely on its persistence features. RDB and AOF are the two persistence mechanisms Redis provides, and RDB is Redis's data snapshot. In this article, we analyze why Redis uses a child process when persisting a snapshot, instead of directly dumping its in-memory data structures to disk.

Overview

Before analyzing today's question in detail, we first need to understand what RDB, Redis's snapshot persistence mechanism, is. RDB takes snapshots of the Redis dataset at regular intervals. Besides configuring that interval in the Redis configuration file, Redis also provides two commands that clients can use to generate an RDB file on demand: SAVE and BGSAVE. As their names suggest, the difference between the two is whether the work happens in the foreground or in the background.


The SAVE command blocks the current thread while it runs. Because Redis executes commands on a single thread, SAVE blocks all other client requests until it finishes. This is often unacceptable for Redis services that need strong availability guarantees.

We therefore usually use the BGSAVE command to generate, in the background, an RDB file containing all of Redis's data. When we run BGSAVE, Redis immediately forks a child process, and the child does the work of saving the in-memory data to disk in RDB format. Meanwhile, the Redis server keeps handling client requests while BGSAVE is running.

rdbSaveBackground is the function that saves data to disk in the background:

int rdbSaveBackground(char *filename, rdbSaveInfo *rsi) {
    pid_t childpid;

    if (hasActiveChildProcess()) return C_ERR;
    ...

    if ((childpid = redisFork()) == 0) {
        int retval;

        /* Child */
        redisSetProcTitle("redis-rdb-bgsave");
        retval = rdbSave(filename,rsi);
        if (retval == C_OK) {
            sendChildCOWInfo(CHILD_INFO_TYPE_RDB, "RDB");
        }
        exitFromChild((retval == C_OK) ? 0 : 1);
    } else {
        /* Parent */
        ...
    }
    ...
}

When BGSAVE is triggered, the Redis server calls redisFork to create a child process, and the child calls rdbSave to persist the data. Although we have omitted parts of the function, the overall structure is still clear. Interested readers can read the complete implementation in the Redis source code.

Using fork is ultimately meant to keep the Redis service available by not blocking the main process, but it raises two questions:

Why can the child process created by fork access the data in the parent process's memory?
Does fork bring additional performance overhead, and how can that overhead be avoided?

Since Redis chose fork to solve snapshot persistence, both questions must have satisfactory answers. First, the child process created by fork can access the data in the parent's memory. Second, the extra overhead of fork is acceptable compared with blocking the main thread. Only because both hold would Redis choose this solution.

Design

To analyze the two questions raised in the previous section, we need to understand the following points. They are the prerequisites for the Redis server's use of fork, and the key reasons it chose this implementation:

The parent and child processes produced by fork start out with identical memory contents and share resources, including physical memory;
fork does not bring significant performance overhead; in particular, copy-on-write lets it postpone copying large amounts of memory until a copy is actually needed;

Child process

In computer programming, especially on Unix and Unix-like systems, fork is the operation by which a process creates a copy of itself. It is usually a system call implemented by the operating system kernel, and it is the main way new processes are created on *nix systems.


When a program calls fork, it can use the return value to tell the parent from the child, so that each can do different work:

When fork returns 0, the current process is the child;
When fork returns a positive value, the current process is the parent, and the return value is the child's pid; if fork fails, it returns -1 in the parent and no child is created;

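#include <stdio.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        // child process
    } else if (pid > 0) {
        // parent process: pid is the child's process ID
    } else {
        perror("fork");   // fork failed, no child was created
    }
    return 0;
}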

The fork manual page describes how the memory of the parent and child relates after the call:

The child process and the parent process run in separate memory spaces. At the time of fork() both memory spaces have the same content. Memory writes, file mappings (mmap(2)), and unmappings (munmap(2)) performed by one of the processes do not affect the other.

In addition, the child process is an almost exact duplicate of the parent process, but the two differ in a few minor ways:

The child process has its own unique process ID;
The child's parent process ID is the same as the parent's process ID;
The child does not inherit its parent's memory locks;
The child's resource usage and CPU time counters are reset to zero;
…

The most important point is that, at the moment of fork, the memory of the parent and child processes is exactly the same, and writes made after fork do not affect each other. This solves the snapshot problem neatly. The child sees only the data in memory at a single point in time. Meanwhile, the parent can keep modifying its own memory without being blocked and without affecting the snapshot being generated.

Copy-on-write

Since the parent and child have exactly the same memory contents and their writes do not affect each other, does the child need a full copy of the parent's memory at the moment of fork? If it did, that would be close to disastrous for Redis, especially in the following two scenarios:

When a large amount of data is stored in memory, copying the whole memory space on every fork would consume a lot of time and resources, making the program unavailable for a period of time;
If Redis occupies 10 GB of memory and the physical or virtual machine's limit is only 16 GB, we could not persist Redis's data at all, because a full copy would need another 10 GB. Redis could therefore never use more than 50% of the machine's memory;

If these two problems could not be solved, using fork to produce a memory image would not be viable, and it could not be used in a real project.

Even outside Redis, copying all memory on fork would be unacceptable. Suppose we run a command from the shell: the shell forks a new process and then calls exec to run the program. The large amount of memory that fork would copy may be of no use at all to the child, yet it introduces huge extra overhead.

Copy-on-write exists to solve exactly this problem. As introduced at the start of this section, its main role is to postpone copying until a write actually happens, which avoids a large amount of pointless copying. Some early *nix systems did copy the parent's entire memory space immediately on fork, but on most systems today fork does not copy memory right away:


When fork is called, the kernel gives the parent and child separate virtual address spaces, so from each process's point of view it is accessing its own memory:

When virtual memory is actually accessed, the kernel maps it to physical memory; right after fork, the virtual pages of both processes map to the same physical pages, so the parent and child share physical memory;
When either the parent or the child writes to shared memory, the kernel copies memory in units of pages, and only the page being written is copied: the writing process gets the new copy, while the other process keeps using the original physical page;

In Redis, the child process only reads the shared memory and performs no writes, so only the parent's writes trigger copying, and only for the pages they touch. In most Redis services and databases, write requests are often far fewer than read requests. Using fork together with copy-on-write therefore performs very well and keeps the implementation of BGSAVE very simple.

Summary

Redis implements background snapshots in a very clever way: it builds them on fork and copy-on-write, which the operating system already provides. This shows that the author has a solid grasp of operating systems. Faced with a similar scenario, most people might instead think of manually implementing something like copy-on-write. That would not only add work but also increase the chance of bugs.

To briefly summarize why Redis takes RDB snapshots in a child process:

The child process created by fork starts with the same memory contents as the parent; the parent's later modifications are invisible to the child, and the two do not affect each other;
Creating a child with fork does not immediately copy large amounts of memory; memory is copied page by page only when it is modified, which avoids the performance problems of copying memory in bulk;

Of these two reasons, one lets the child access the parent's data, and the other keeps the extra overhead low. Both are indispensable, and together they explain why Redis uses a child process for snapshot persistence. Finally, here are some open questions for interested readers to consider:

The Nginx master process forks a group of worker processes that handle requests independently. What other services use this pattern?
Copy-on-write is a fairly common mechanism. Where else is it used besides Redis?