24 Questions Every JuiceFS Beginner Should Know

JuiceFS is an innovative software product, and many people trying it for the first time have questions about the product and how to use it. To help you understand JuiceFS and get started quickly, we have compiled answers to 24 classic questions. After reading them, you should have a clearer picture of JuiceFS and find it easier to use.

1. What are the basic capabilities of JuiceFS?

JuiceFS is a high-performance shared file system designed for cloud-native environments and released under the Apache 2.0 open source license. It is fully POSIX-compatible, lets you use almost any object storage as a massive local disk, and can be mounted and accessed simultaneously on different hosts across platforms and regions.

2. How does JuiceFS perform?

JuiceFS is a distributed file system. Metadata access latency is determined by one to two network round trips between the mount point and the metadata server (usually 1-3 ms), and data access latency is determined by the latency of the object storage (usually 20-100 ms). Sequential read and write throughput ranges from 50 MiB/s to 2800 MiB/s (see the fio test results), depending on network bandwidth and how compressible the data is.

JuiceFS has a built-in multi-level cache (invalidated automatically). Once the cache is warmed up, access latency and throughput come very close to those of a stand-alone file system (FUSE adds a small amount of overhead).

3. What does JuiceFS need in order to run?

Before running JuiceFS, you need to prepare a metadata engine and an object storage. The metadata engine stores file metadata such as name, size, and modification time, while the object storage stores the file contents.

Currently supported metadata engines include Redis, TiKV, MySQL, PostgreSQL, and others. For the current list of supported metadata engines and their configuration, see the document "How to set the metadata engine".

Even more object storage services are supported, covering practically all the common ones, such as AWS S3, Alibaba Cloud OSS, Huawei Cloud OBS, and Tencent Cloud COS. For convenient testing, the local disk is also supported as object storage. For the current list of supported object storage and their configuration, see the document "How to set up object storage".

4. What are the steps to use JuiceFS?

Using JuiceFS takes just two steps: first format a file system, then mount it locally. The following example uses JuiceFS with Redis to mount Alibaba Cloud OSS locally:

# 1. Format a file system
juicefs format \
--storage oss \
--bucket https://zhijian-dev.oss-cn-hangzhou.aliyuncs.com \
--access-key xxxx \
--secret-key xxxx \
redis://localhost:6379/1 \
test1

# 2. Mount the file system at /tmp/jfs in the background
juicefs mount -d redis://localhost:6379/1 /tmp/jfs

5. What is the fastest way to try JuiceFS?

What if you have neither Redis nor object storage at hand? You can still try JuiceFS. A metadata engine and an object storage are both required, but each can use its simplest option: SQLite, an embedded database, as the metadata engine, and the local disk as object storage. When the `--bucket` option is omitted from `juicefs format`, the local disk is used as object storage by default; the default storage path is /var/jfs for the root user and ~/.juicefs/local for ordinary users. This way, the JuiceFS binary is all you need, with no external components.

# 1. Format a file system using SQLite as the metadata engine
juicefs format "sqlite3://my-jfs.db" test1

# 2. Mount the file system at /tmp/jfs in the background
juicefs mount -d sqlite3://my-jfs.db /tmp/jfs

6. Can I mount it as a user other than `root`?

Yes, JuiceFS can be mounted by any user. The default cache directory is $HOME/.juicefs/cache (macOS) or /var/jfsCache (Linux). Make sure the user has write permission to this directory, or switch to another directory the user can write to. See "Client Read Caching" for more information.
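
As a rough sketch of switching to another cache directory, the snippet below creates a directory owned by the current user and passes it to the mount command with `--cache-dir`. The directory path is a hypothetical choice, and the metadata URL and mount point are borrowed from the SQLite example earlier in this article; adjust all three to your environment.

```shell
# Hypothetical cache location owned by the current user (an assumption,
# not a JuiceFS default).
CACHE_DIR="/tmp/jfs-cache-$(id -un)"
mkdir -p "$CACHE_DIR"

if [ -w "$CACHE_DIR" ]; then
  echo "cache directory is writable: $CACHE_DIR"
fi

# Mount as the current user, pointing the client at that cache directory.
# The metadata URL and mount point are the ones from the SQLite example.
if command -v juicefs >/dev/null 2>&1; then
  juicefs mount -d --cache-dir "$CACHE_DIR" sqlite3://my-jfs.db /tmp/jfs
else
  echo "juicefs not found in PATH; skipping mount"
fi
```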

7. How compatible is JuiceFS with POSIX?

JuiceFS uses pjdfstest and LTP to verify its POSIX compatibility. It passes all test cases in pjdfstest and most test cases in LTP.

8. How can JuiceFS data be accessed besides an ordinary mount?

In addition to ordinary mounting, the following methods are also supported:

Kubernetes CSI driver: Use JuiceFS as the storage layer of a Kubernetes cluster through the Kubernetes CSI driver. For details, see "Kubernetes uses JuiceFS".
Hadoop Java SDK: Access JuiceFS from the Hadoop ecosystem through a Java client compatible with the HDFS interface. For details, see "Hadoop using JuiceFS".
S3 Gateway: Access JuiceFS through the S3 protocol. For details, see "Configure JuiceFS S3 Gateway".
Docker Volume plugin: Use JuiceFS conveniently in Docker. For details, see "Docker uses JuiceFS".
WebDAV Gateway: Access JuiceFS through the WebDAV protocol.

9. Can Redis in Sentinel or Cluster mode be used as the JuiceFS metadata engine?

Yes. There is also a best-practices article on using Redis as the JuiceFS metadata engine for reference.

10. How do I test the performance of JuiceFS?

After mounting JuiceFS to a local directory, run the `juicefs bench` command on that directory. It performs read and write tests with large and small files in the directory. For example:

# /tmp/jfs is the local directory where JuiceFS is mounted
$ juicefs bench /tmp/jfs
Cleaning kernel cache, may ask for root privilege...
Password:
  Write big blocks count: 1024 / 1024 [==============================================================]  done
   Read big blocks count: 1024 / 1024 [==============================================================]  done
Write small blocks count: 100 / 100 [==============================================================]  done
 Read small blocks count: 100 / 100 [==============================================================]  done
  Stat small files count: 100 / 100 [==============================================================]  done
Benchmark finished!
BlockSize: 1 MiB, BigFileSize: 1024 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 1
+------------------+-----------------+--------------+
|       ITEM       |      VALUE      |     COST     |
+------------------+-----------------+--------------+
|   Write big file |   1236.96 MiB/s |  0.83 s/file |
|    Read big file |   2962.88 MiB/s |  0.35 s/file |
| Write small file |  2277.4 files/s | 0.44 ms/file |
|  Read small file |  2753.0 files/s | 0.36 ms/file |
|        Stat file | 16603.3 files/s | 0.06 ms/file |
+------------------+-----------------+--------------+

The `juicefs bench` command can also serve as a quick check after mounting to confirm that the JuiceFS service is working normally. For more on JuiceFS performance testing, see the performance evaluation guide.

11. How do I test the compatibility and performance of object storage?

Object storage is an important component of JuiceFS, and its correctness and performance directly affect the correctness and performance of JuiceFS. When JuiceFS misbehaves, it is therefore worth ruling out problems in the object storage first. To make this easier, JuiceFS has a built-in `juicefs objbench` command that quickly tests the correctness and performance of an object storage. Example:

$ juicefs objbench --storage minio  http://127.0.0.1:9000/testbucket --access-key admin --secret-key admin123
Start Functional Testing ...
+----------+---------------------+-------------+
| CATEGORY |         TEST        |    RESULT   |
+----------+---------------------+-------------+
|    basic |     create a bucket |        pass |
|    basic |       put an object |        pass |
|    basic |       get an object |        pass |
|    basic |       get non-exist |        pass |
|    basic |  get partial object |        pass |
|    basic |      head an object |        pass |
|    basic |    delete an object |        pass |
|    basic |    delete non-exist |        pass |
|    basic |        list objects |        pass |
|     sync |    put a big object |        pass |
|     sync | put an empty object |        pass |
|     sync |    multipart upload |        pass |
|     sync |  change owner/group | not support |
|     sync |   change permission | not support |
|     sync |        change mtime | not support |
+----------+---------------------+-------------+

Start Performance Testing ...
put small objects count: 100 / 100 [==============================================================]  done
get small objects count: 100 / 100 [==============================================================]  done
   upload objects count: 256 / 256 [==============================================================]  done
 download objects count: 256 / 256 [==============================================================]  done
     list objects count: 100 / 100 [==============================================================]  done
     head objects count: 100 / 100 [==============================================================]  done
   delete objects count: 100 / 100 [==============================================================]  done
Benchmark finished! block-size: 4096 KiB, big-object-size: 1024 MiB, small-object-size: 128 KiB, small-objects: 100, NumThreads: 4
+--------------------+--------------------+-----------------+
|        ITEM        |        VALUE       |       COST      |
+--------------------+--------------------+-----------------+
|     upload objects |        67.12 MiB/s | 59.59 ms/object |
|   download objects |       106.86 MiB/s | 37.43 ms/object |
|  put small objects |    508.2 objects/s |  1.97 ms/object |
|  get small objects |    728.0 objects/s |  1.37 ms/object |
|       list objects | 46890.01 objects/s |      2.13 ms/op |
|       head objects |   2861.2 objects/s |  0.35 ms/object |
|     delete objects |   2295.1 objects/s |  0.44 ms/object |
| change permissions |        not support |     not support |
| change owner/group |        not support |     not support |
|       update mtime |        not support |     not support |
+--------------------+--------------------+-----------------+

12. Unmounting the mount point reports a `Resource busy -- try 'diskutil unmount'` error

This means that a file or directory under the mount point is still in use, so it cannot be unmounted directly. Check (for example with the `lsof` command) whether an open terminal is sitting in a directory under the JuiceFS mount point, or whether an application is working on files under the mount point. If so, exit the terminal or application, then try unmounting the file system with the `juicefs umount` command.
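
The check described above might look like the following sketch. The mount point /tmp/jfs is taken from the earlier examples and is an assumption about your setup; `lsof` given a mount point lists the open files on that file system, including the working directories of shells sitting inside it.

```shell
# Mount point from the earlier examples; replace with your own.
MOUNT_POINT="/tmp/jfs"

# Show processes that still have files or directories open under the mount point.
if command -v lsof >/dev/null 2>&1; then
  lsof "$MOUNT_POINT" 2>/dev/null || echo "lsof reported no open files for $MOUNT_POINT"
fi

# Once those terminals or applications have exited, unmount the file system.
if command -v juicefs >/dev/null 2>&1; then
  juicefs umount "$MOUNT_POINT"
else
  echo "juicefs not found in PATH; skipping umount"
fi
```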

13. How do I destroy a file system?

Use the `juicefs destroy` command to destroy a file system. This command clears the related data in both the metadata engine and the object storage. For details on using this command, see the documentation.
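
A cautious way to wrap this is sketched below. `juicefs destroy` takes the metadata URL and the file system's UUID, and the UUID appears in the output of `juicefs status`. The helper `destroy_jfs` is a hypothetical wrapper written for this article, not part of JuiceFS, and the metadata URL is the one from the earlier Redis example.

```shell
# Hypothetical helper (not part of JuiceFS): destroy the file system behind a
# metadata URL, refusing to run unless both arguments are given explicitly.
destroy_jfs() {
  meta_url="$1"
  fs_uuid="$2"
  if [ -z "$meta_url" ] || [ -z "$fs_uuid" ]; then
    echo "usage: destroy_jfs META-URL UUID" >&2
    return 1
  fi
  # Removes the file system's data from the metadata engine and object storage.
  juicefs destroy "$meta_url" "$fs_uuid"
}

# Step 1: look up the UUID in the status output, for example:
#   juicefs status redis://localhost:6379/1
# Step 2: pass the UUID explicitly:
#   destroy_jfs redis://localhost:6379/1 <UUID from step 1>
```

Requiring the UUID as a separate argument makes it harder to destroy the wrong file system by mistyping a metadata URL.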

14. Where are the JuiceFS logs?

When JuiceFS is mounted in the background, logs are written to a log file. When it is mounted in the foreground, or when other foreground commands are run, logs are printed directly to the terminal.

The default log file on macOS is /Users/$User/.juicefs/juicefs.log

The default log file on Linux is /var/log/juicefs.log
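
A small sketch for locating and inspecting the log on either system follows. It assumes the default paths listed above and that `$HOME` on macOS expands to /Users/<your user>; if the log lives elsewhere in your setup, set `LOG_FILE` by hand.

```shell
# Pick the default log path for the current operating system.
case "$(uname -s)" in
  Darwin) LOG_FILE="$HOME/.juicefs/juicefs.log" ;;  # macOS default
  *)      LOG_FILE="/var/log/juicefs.log" ;;        # Linux default
esac

# Print the last lines of the log if it exists and is readable.
if [ -r "$LOG_FILE" ]; then
  tail -n 20 "$LOG_FILE"
else
  echo "log file not found or not readable: $LOG_FILE"
fi
```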

15. Why can't I see the original files stored in JuiceFS in the object storage?

Files stored in JuiceFS are split into Chunks, Slices, and Blocks before being written to object storage. As a result, you will not find your source files in the object storage platform's file browser; the bucket contains only a chunks directory and a number of numerically named directories and files. Don't panic, this is the secret behind the high performance of the JuiceFS file system! See "How JuiceFS stores files" for details.

16. What is the basic principle of JuiceFS random writing?

JuiceFS does not store the original file in object storage as-is. Instead, it splits the file into N data blocks (Blocks) of a fixed size (4 MiB by default), uploads them to object storage, and records the block IDs in the metadata engine. Logically, a random write overwrites the original content. In practice, the metadata of the overwritten range is marked as old data, only the new data blocks produced by the random write are uploaded to object storage, and the metadata for those new blocks is written to the metadata engine.

When the overwritten range is read, the latest metadata directs the read to the new data blocks uploaded during the random write. The old data blocks may be cleaned up automatically by the garbage collection task running in the background. In this way, the complexity of random writes is shifted to reads.

This is only a rough outline of the implementation; the actual read and write paths are considerably more complex. To dig deeper, study the two documents "JuiceFS internal implementation" and "reading and writing process" alongside the code.
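
To make the block arithmetic concrete, here is a simplified back-of-the-envelope sketch. The 10 MiB file and the 1 MiB write at offset 5 MiB are hypothetical numbers, and the model deliberately ignores the Chunk and Slice layers; it only shows how many 4 MiB blocks a file maps to and which original block range a random write overlaps.

```shell
BLOCK_SIZE=$((4 * 1024 * 1024))   # default block size: 4 MiB
FILE_SIZE=$((10 * 1024 * 1024))   # hypothetical 10 MiB file

# Number of blocks = ceil(FILE_SIZE / BLOCK_SIZE)
BLOCK_COUNT=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))

# Hypothetical random write: 1 MiB starting at offset 5 MiB.
WRITE_OFFSET=$((5 * 1024 * 1024))
WRITE_LEN=$((1024 * 1024))

# Indexes (0-based) of the original blocks whose range the write overlaps.
FIRST_BLOCK=$(( WRITE_OFFSET / BLOCK_SIZE ))
LAST_BLOCK=$(( (WRITE_OFFSET + WRITE_LEN - 1) / BLOCK_SIZE ))

echo "blocks: $BLOCK_COUNT, write overlaps block index $FIRST_BLOCK..$LAST_BLOCK"
```

In this model only the newly written data is uploaded; the overlapped range of the old block is superseded in metadata rather than rewritten in place.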

17. Why does the object storage footprint barely change after I delete files at the mount point?

The first possible reason is that the trash (recycle bin) feature is enabled. To keep data safe, the trash is enabled by default: deleted files are moved to the trash rather than actually removed, so the size of the object storage does not change. The trash retention time can be set with `juicefs format` or changed with `juicefs config`. See the trash documentation for more information.

The second reason is that JuiceFS deletes data in object storage asynchronously, so the space usage of the object storage changes more slowly. If you need to clean up the pending deletions in object storage right away, try running the `juicefs gc` command.
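
Both knobs can be sketched as follows, assuming the Redis metadata URL from the earlier example. The one-day retention is an arbitrary illustration, not a recommendation; `--trash-days` and `--delete` are existing options of `juicefs config` and `juicefs gc` respectively, but check `juicefs config --help` and `juicefs gc --help` for your version before relying on them.

```shell
# Metadata URL from the earlier Redis example; replace with your own.
META_URL="redis://localhost:6379/1"

if command -v juicefs >/dev/null 2>&1; then
  # Keep deleted files in the trash for 1 day (illustrative value).
  juicefs config --trash-days 1 "$META_URL"

  # Scan for objects that are no longer referenced and delete them.
  juicefs gc --delete "$META_URL"
else
  echo "juicefs not found in PATH; skipping"
fi
```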

18. Why does the size shown at the mount point differ from the space used in object storage?

From the answer to "What is the basic principle of JuiceFS random writing?", it follows that the space used in object storage is usually greater than or equal to the actual file size, especially after many overwrites in a short period have produced a large number of file fragments. Until compaction and reclamation are triggered, these fragments still take up space in object storage. They will not occupy space forever, though: each time a file is read or written, JuiceFS checks whether that file needs defragmenting and triggers it when necessary. You can also trigger compaction and reclamation manually with the `juicefs gc --compact --delete` command.

In addition, if compression is enabled for the JuiceFS file system (it is disabled by default), the objects in object storage may be smaller than the actual file size, depending on how well each type of file compresses.

If the factors above have been ruled out, check the storage class of the object storage you are using. The cloud provider may set a minimum billable size for certain storage classes. For example, the minimum billable size for Alibaba Cloud OSS Infrequent Access storage is 64 KB, so an object smaller than 64 KB is still counted as 64 KB.
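
The effect of a minimum billable size can be worked through with a quick calculation. The object count and object size below are hypothetical; only the 64 KB minimum comes from the example above.

```shell
MIN_UNIT_KB=64       # minimum billable size per object (from the example above)
OBJECT_COUNT=1000    # hypothetical number of small objects
OBJECT_SIZE_KB=4     # hypothetical size of each object

# Each object is billed at its real size or the minimum unit, whichever is larger.
if [ "$OBJECT_SIZE_KB" -lt "$MIN_UNIT_KB" ]; then
  BILLED_PER_OBJECT_KB=$MIN_UNIT_KB
else
  BILLED_PER_OBJECT_KB=$OBJECT_SIZE_KB
fi

ACTUAL_KB=$(( OBJECT_COUNT * OBJECT_SIZE_KB ))
BILLED_KB=$(( OBJECT_COUNT * BILLED_PER_OBJECT_KB ))

echo "actual: ${ACTUAL_KB} KB, billed: ${BILLED_KB} KB"
```

With these numbers, 4000 KB of real data is metered as 64000 KB, which is why many small objects can make the reported usage look far larger than the mount point suggests.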

19. Does the JuiceFS S3 gateway support advanced features such as multi-user management?

The built-in `gateway` subcommand does not support features such as multi-user management; it provides only basic S3 gateway functionality. If you need these advanced features, see our repository, which implements JuiceFS as a MinIO gateway backend and supports the full functionality of the MinIO gateway.

20. What is the difference between JuiceFS and XXX?

Please see the Technology Comparison document for more information.

21. Does JuiceFS support using a directory in the object storage as the value of the `--bucket` option?

This feature is not supported as of JuiceFS 1.0.0-rc3.

22. Does JuiceFS support reading existing data in object storage?

This feature is not supported as of JuiceFS 1.0.0-rc3.

23. Does JuiceFS currently support distributed caching?

This feature is not supported as of JuiceFS 1.0.0-rc3.

24. Is there an SDK available for JuiceFS?

As of JuiceFS 1.0.0-rc3, the community has two SDKs: the Java SDK, officially maintained by Juicedata and highly compatible with the HDFS interface, and the Python SDK, maintained by community users.

If you found this helpful, please follow our project, Juicedata/JuiceFS! (0ᴗ0✿)