Detailed explanation of JuiceFS caching strategy

For a file system powered by the combination of an object storage and a database, caching is the key to efficient interaction between the local client and the remote services. Read and write data can be loaded into the cache in advance or asynchronously, and the client then interacts with the remote services in the background to upload or prefetch data. Compared with interacting with remote services directly, caching greatly reduces the latency of storage operations and improves data throughput.

Data consistency

JuiceFS provides a "close-to-open" consistency guarantee: when two or more clients read and write the same file at the same time, client A's modifications may not be immediately visible to client B. However, once the file is written and closed on client A, reopening it afterwards on any client guarantees access to the newly written data, whether or not it is on the same node.

"Close and reopen" is the minimum consistency guarantee provided by JuiceFS. In some cases it is not even necessary to reopen the file to access the latest written data: for example, when multiple applications access the same file through the same JuiceFS client (file changes are immediately visible to each other), or when viewing the latest data with the `tail -f` command.

Metadata cache

JuiceFS supports caching metadata both in the kernel and in client memory (i.e., the JuiceFS process) to improve metadata access performance.

Kernel metadata cache

Three kinds of metadata can be cached in the kernel: attributes, file entries, and directory entries. Their cache lifetimes are controlled by the following mount options:

--attr-cache value       attribute cache duration in seconds (default: 1)
--entry-cache value      file entry cache duration in seconds (default: 1)
--dir-entry-cache value  directory entry cache duration in seconds (default: 1)

By default, JuiceFS caches attributes, file entries, and directory entries in the kernel for 1 second, which speeds up lookup and getattr. When clients on multiple nodes use the same file system at the same time, metadata cached in the kernel can only be invalidated by the passage of time. In extreme cases this means node A may modify a file's metadata (e.g., chown) and node B will not see the update immediately; after the cache expires, however, all nodes will eventually see the modification made by A.
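For workloads where metadata rarely changes, the TTLs can be raised at mount time. The following is only a sketch: the metadata URL `redis://localhost:6379/1` and the mount point `/mnt/jfs` are placeholder values, and 5 seconds is an illustrative choice, not a recommendation.

```shell
# Hypothetical mount: raise the kernel metadata cache TTLs from 1 s to 5 s.
# Longer TTLs speed up lookup/getattr but widen the window during which
# other nodes may observe stale metadata.
juicefs mount \
    --attr-cache 5 \
    --entry-cache 5 \
    --dir-entry-cache 5 \
    redis://localhost:6379/1 /mnt/jfs
```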

Client-side in-memory metadata cache

Note : This feature requires JuiceFS version 0.15.0 and above.

When a JuiceFS client opens a file with `open()`, the file's attributes are automatically cached in client memory. If the `--open-cache` option is set to a value greater than 0, subsequent `getattr()` and `open()` operations return results from the in-memory cache immediately, as long as the cache has not expired.

When a file is read with `read()`, its chunk and slice information is automatically cached in client memory. While the cache is valid, reading the same chunk again returns the slice information from the in-memory cache immediately.

Tip : You can read "How JuiceFS Stores Files" to learn what chunks and slices are.

By default, if a file whose metadata has been cached in memory is not accessed by any process for more than 1 hour, all of its cached metadata is automatically evicted.
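As a sketch, the client-side metadata cache could be enabled at mount time like this (the TTL of 3 seconds, the metadata URL, and the mount point are all illustrative placeholders):

```shell
# Hypothetical mount: cache opened files' attributes in client memory for 3 s,
# so repeated getattr()/open() calls are served without a metadata round trip.
juicefs mount --open-cache 3 redis://localhost:6379/1 /mnt/jfs
```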

Data cache

JuiceFS also provides various caching mechanisms for data to improve performance, including the page cache in the kernel and the local cache of the node where the client is located.

Kernel data cache

Note : This feature requires JuiceFS version 0.15.0 and above.

Once a file has been read, the kernel automatically caches its content. If the file is opened again and has not been updated in the meantime (i.e., its mtime is unchanged), it can be read directly from the kernel cache for the best performance. Thanks to the kernel cache, repeated reads of the same file in JuiceFS can be very fast, with latency as low as a few microseconds and throughput of several GiB/s.

The JuiceFS client does not enable the kernel's write cache by default. Since Linux kernel 3.15, FUSE has supported "writeback-cache mode", in which `write()` system calls can complete very quickly because data is first staged in the kernel page cache. You can enable this mode with the `-o writeback_cache` mount option. It is recommended when very small pieces of data (e.g., around 100 bytes) need to be written frequently.
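A minimal sketch of enabling this mode follows; the metadata URL and mount point are placeholders.

```shell
# Hypothetical mount: enable FUSE writeback-cache mode so many tiny writes
# are merged in the kernel page cache before being flushed to the client.
juicefs mount -o writeback_cache redis://localhost:6379/1 /mnt/jfs
```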

Client read cache

The JuiceFS client automatically prefetches data into the cache according to the read pattern, improving sequential read performance. By default, 1 block is prefetched concurrently and cached locally while data is being read. The local cache can be placed on any local file system backed by hard disk, SSD, or memory.

The local cache can be adjusted with the following options when mounting the filesystem:

--prefetch value          prefetch N blocks concurrently (default: 1)
--cache-dir value         directory path(s) for the local cache; use colons to separate multiple paths (default: "$HOME/.juicefs/cache" or "/var/jfsCache")
--cache-size value        total size of cached objects, in MiB (default: 1024)
--free-space-ratio value  minimum ratio of free space (default: 0.1)
--cache-partial-only      only cache small random reads (default: false)

Additionally, if you want to keep JuiceFS's local cache in memory, there are two ways: set `--cache-dir` to `memory`, or set it to `/dev/shm/<cache-dir>`. The difference is that the former discards the cached data after the JuiceFS file system is remounted, while the latter keeps it; in terms of performance there is not much difference between the two.
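Putting these options together, a read-heavy deployment might look like the sketch below. All values here are illustrative assumptions (the `/dev/shm/jfscache` path, the 4 GiB limit, the prefetch depth, the metadata URL, and the mount point), not recommendations.

```shell
# Hypothetical mount: keep the read cache in memory via /dev/shm, cap it at
# 4 GiB, and prefetch 2 blocks concurrently for faster sequential reads.
juicefs mount \
    --cache-dir /dev/shm/jfscache \
    --cache-size 4096 \
    --prefetch 2 \
    redis://localhost:6379/1 /mnt/jfs
```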

The JuiceFS client writes data downloaded from object storage (as well as newly uploaded data smaller than 1 block) to the cache directory as soon as possible, without compression or encryption. Because JuiceFS generates a unique name for every block object written to the object store, and block objects are never modified, there is no need to worry about cache invalidation when file content is updated.

When the cache space reaches its upper limit (i.e., the cache size is greater than or equal to `--cache-size`) or the disk is nearly full (i.e., the ratio of free disk space drops below `--free-space-ratio`), the cache is cleaned up automatically. The current rule is to evict infrequently accessed files first, based on their access time.

Data caching can effectively improve random read performance. For applications that require higher random read performance, such as Elasticsearch and ClickHouse, it is recommended to set the cache path on a faster storage medium and allocate a larger cache space.

Client write cache

When writing data, the JuiceFS client caches it in memory and only uploads it to object storage once a chunk is filled, or when the application forces an upload with `close()` or `fsync()`. When `fsync()` or `close()` is called, the client waits until the data has been written to the object store and the metadata service has been notified before returning, thus ensuring data integrity.

In some cases, if local storage is reliable and its write performance is significantly better than writing over the network (e.g., an SSD disk), write performance can be improved by enabling asynchronous data upload: `close()` then does not wait for the data to reach the object store, but returns as soon as the data is written to the local cache directory.

The asynchronous upload feature is disabled by default and can be enabled with the following options:

--writeback  upload objects asynchronously in the background (default: false)

When a large number of small files need to be written in a short period, mounting the file system with `--writeback` is recommended to improve write performance; after the writes complete, you may remove the option and remount to regain higher reliability for subsequently written data. Enabling `--writeback` is also recommended for scenarios that require a large number of random writes, such as incremental backups of MySQL.
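For a bulk small-file ingestion, the flow described above can be sketched as follows; the metadata URL and mount point are placeholders.

```shell
# Hypothetical mount for a burst of small-file writes: close() returns once
# data reaches the local cache directory; uploads happen in the background.
juicefs mount --writeback redis://localhost:6379/1 /mnt/jfs

# After the ingestion finishes, remount without --writeback to restore
# synchronous uploads and higher durability for subsequent writes.
```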

Warning: When asynchronous upload is enabled, i.e. the file system is mounted with `--writeback`, do not delete the contents of the `<cache-dir>/<UUID>/rawstaging` directory, otherwise data will be lost.

When the cache disk is about to be full, caching of writes is suspended and data is uploaded directly to object storage instead (i.e., the client-side write cache is effectively turned off). With asynchronous upload enabled, the reliability of the cache itself directly affects the reliability of written data, so this feature should be used with caution in scenarios that demand high data reliability.

Summary

Finally, here is a question users often ask: **"Why is the cache size set to 50 GiB, but it actually takes up 60 GiB of space?"**

For the same amount of cached data, different file systems compute capacity differently. JuiceFS currently estimates usage by accumulating the sizes of all cached objects and adding a fixed overhead of 4 KiB per object, which is not exactly the same as the value reported by the `du` command. To prevent the cache disk from filling up, the client tries to reduce cache usage whenever the file system holding the cache directory runs low on space.
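The accumulation rule can be illustrated with a tiny sketch (this is not JuiceFS source code, and the object sizes below are made up):

```shell
# Illustrative sketch: JuiceFS estimates cache usage as the sum of cached
# object sizes plus a fixed 4 KiB overhead per object, so the estimate can
# exceed the raw data size.
total_kib=0
for size_kib in 100 2048 16; do      # example object sizes in KiB (made up)
  total_kib=$((total_kib + size_kib + 4))
done
echo "estimated cache usage: ${total_kib} KiB"
```

With many small objects, the per-object overhead adds up, which is one reason the reported cache usage can exceed the configured `--cache-size`.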

With the above, we now have a deeper understanding of how JuiceFS's caching mechanisms work. As the underlying file system, JuiceFS provides multiple caching mechanisms, including metadata caches and data read/write caches, while guaranteeing data consistency to the greatest extent possible. We hope this article helps you apply JuiceFS more effectively.

Recommended reading: Zhihu x JuiceFS: Using JuiceFS to Accelerate Flink Container Startup

If it is helpful, please follow our project Juicedata/JuiceFS ! (0ᴗ0✿)
