Curve file storage: how to support tens of billions of files?

Curve file storage is a POSIX-compliant distributed file system suitable for private cloud, public cloud, and hybrid cloud environments. A Curve file system can store and access tens of billions of files.

Let's start with a brief introduction to the architecture of Curve file storage. A file system has to persist two kinds of information: the metadata of files, mainly inodes and dentries, and the data of files, that is, the content written by users. From the start, the design of the Curve file system had to account for multi-cloud support and for cost in large-scale data scenarios (where most data is cold), so data must be able to move between storage tiers with different performance. For this reason, metadata and data are stored separately.

The figure below is the architecture of the Curve file system.

  • Metadata is stored in a separate cluster to ensure high reliability, high availability, and high scalability.

  • Data has a variety of options. It can be stored in Curve block storage, in object storage on a public cloud, or in several storage backends with different performance at the same time, such as Curve block storage (SSD), Curve block storage (HDD), object storage (three replicas), object storage (EC), and object storage (archive). These backends also provide high reliability, high availability, and high scalability.
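To make the two metadata types concrete, here is a minimal sketch of how inodes and dentries relate during a path lookup. It is an illustration only, not Curve's actual data structures: the class names, fields, and the `MetadataStore` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

ROOT_INODE_ID = 1  # assumption: the root directory uses inode ID 1


@dataclass
class Inode:
    """Per-file attributes (hypothetical, heavily simplified)."""
    inode_id: int
    is_dir: bool
    length: int = 0  # file size in bytes


@dataclass(frozen=True)
class Dentry:
    """Maps a name inside a parent directory to an inode ID."""
    parent_inode_id: int
    name: str
    inode_id: int


class MetadataStore:
    """In-memory stand-in for a metadata service (illustration only)."""

    def __init__(self) -> None:
        self.inodes: Dict[int, Inode] = {
            ROOT_INODE_ID: Inode(ROOT_INODE_ID, is_dir=True)
        }
        self.dentries: Dict[Tuple[int, str], Dentry] = {}
        self._next_id = ROOT_INODE_ID + 1

    def create(self, parent_id: int, name: str,
               is_dir: bool = False, length: int = 0) -> Inode:
        # Allocate an inode, then link it into the parent with a dentry.
        inode = Inode(self._next_id, is_dir, length)
        self._next_id += 1
        self.inodes[inode.inode_id] = inode
        self.dentries[(parent_id, name)] = Dentry(parent_id, name, inode.inode_id)
        return inode

    def lookup(self, path: str) -> Inode:
        # Resolve one path component at a time through the dentries.
        current = ROOT_INODE_ID
        for name in filter(None, path.split("/")):
            current = self.dentries[(current, name)].inode_id
        return self.inodes[current]


store = MetadataStore()
data_dir = store.create(ROOT_INODE_ID, "data", is_dir=True)
store.create(data_dir.inode_id, "a.txt", length=42)
print(store.lookup("/data/a.txt"))
```

The file content itself never appears in this structure, which is what allows the data to live in a different storage backend from the metadata.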

How to support tens of billions of files

One of the important features of the Curve file system is that it is suited to storing massive numbers of files. So how does the Curve file system support tens of billions of files, and how does it maintain performance at that scale? From a theoretical point of view:

  • In terms of scale, each node in the metadata cluster of Curve file storage stores the inodes in a certain range (such as 1~10000) together with dentries. As the number of files grows, more storage nodes can be added, so the scale is theoretically unlimited.

  • In terms of performance, operations on a single file are no different when the number of files is large, but aggregate operations over metadata can have performance problems, such as du (calculating the capacity used by the current file system) and ls (listing the information of all files in a directory). These operations need to be optimized to maintain performance.
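The range-based layout described in the first point can be sketched as follows. This is a simplified illustration under stated assumptions, not Curve's implementation: the `Partition` and `PartitionTable` names, the fixed range size of 10000, and the node names are all hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    start: int  # first inode ID in the range (inclusive)
    end: int    # last inode ID in the range (inclusive)
    node: str   # hypothetical metadata node name


class PartitionTable:
    """Maps inode ID ranges to metadata nodes (illustration only)."""

    def __init__(self, ids_per_partition: int = 10000) -> None:
        self.ids_per_partition = ids_per_partition
        self.partitions: List[Partition] = []

    def add_node(self, node: str) -> Partition:
        # Scaling out: a new node takes the next unused inode range.
        start = self.partitions[-1].end + 1 if self.partitions else 1
        part = Partition(start, start + self.ids_per_partition - 1, node)
        self.partitions.append(part)
        return part

    def locate(self, inode_id: int) -> Partition:
        # Binary search over range starts, then check the upper bound.
        starts = [p.start for p in self.partitions]
        idx = bisect.bisect_right(starts, inode_id) - 1
        if idx < 0 or inode_id > self.partitions[idx].end:
            raise KeyError(f"no partition holds inode {inode_id}")
        return self.partitions[idx]


table = PartitionTable()
table.add_node("node-a")  # holds inodes 1..10000
table.add_node("node-b")  # holds inodes 10001..20000
print(table.locate(10001).node)
```

Because a lookup touches only the node that owns the inode range, single-file operations stay cheap as nodes are added, while an aggregate operation such as du has to visit every range that contains part of the tree.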

So how does the Curve file system actually perform?

First, here are several general-purpose testing tools for file systems.

  1. pjdfstest[1]: POSIX compatibility test. It has 3600+ regression test cases covering chmod, chown, link, mkdir, mkfifo, open, rename, rmdir, symlink, truncate, unlink, etc.

  2. mdtest[2]: metadata performance test. It performs operations such as open/stat/close on files or directories and returns a report.

  3. vdbench[3]: data consistency test. Vdbench is a widely used storage performance testing tool from Oracle. It supports both block device and file system performance testing, is very convenient for random-write consistency testing, and can report in real time which sectors have inconsistent data.

  4. fio[4]: data performance test.

The Curve file system has provided a separate way of stress testing metadata clusters since v2.3 (data clusters generally use Curve block storage and S3, so you can directly perform performance tests on these components).

  1. Build the file system with CurveAdm[5], and add one configuration item when preparing the client configuration file client.yaml[6]: s3.fakeS3=true[7].

  2. Use mdtest, vdbench, and the ImageNet dataset[8] as data sources to test the stability and performance of the file system in a mixed scenario of large and small files.

Based on an estimate from the metadata data structures, the metadata of tens of billions of files needs about 8 TB of logical space, and about 24 TB of actual storage with 3 replicas. Readers interested in running the test can use these figures as a reference.
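The estimate above can be checked with a few lines of arithmetic. This sketch assumes that "tens of billions" means 10 billion files and that 1 TB is 10^12 bytes; neither is stated in the original, and the resulting per-file figure is derived here rather than quoted from Curve.

```python
# Assumptions (not from the original article): 10 billion files, decimal TB.
FILES = 10_000_000_000
TB = 10**12

LOGICAL_BYTES = 8 * TB  # logical metadata space quoted in the article
REPLICAS = 3            # replica count quoted in the article

# Average metadata footprint per file implied by the 8 TB figure.
bytes_per_file = LOGICAL_BYTES // FILES

# Physical space once every byte is stored three times.
physical_tb = LOGICAL_BYTES * REPLICAS // TB

print(f"metadata per file: {bytes_per_file} bytes")  # 800 bytes
print(f"physical space: {physical_tb} TB")           # 24 TB
```

Under these assumptions, the 8 TB figure corresponds to roughly 800 bytes of metadata per file, covering its inode and dentry.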

How performance holds up under massive file storage

Curve file storage maintains relatively stable performance as the amount of existing data grows (the performance of stat requests drops by about 15%).

Scenario 1 (the case with a large number of test directories):

Test command: mdtest -z 2 -b 3 -I 10000 -d /mountpoint

Scenario 2 (the test directory has a deep hierarchy): 

Test command: mdtest -z 10 -b 2 -I 100 -d /mountpoint
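To get a feel for the size of the directory trees these two commands build, the sketch below computes the tree size from the mdtest options: -z is the tree depth, -b the branching factor, and -I the number of items per directory. It assumes items are created in every directory of the tree (that is, -L is not passed) and counts one MPI task only; the helper names are made up for this illustration.

```python
def dirs_in_tree(depth: int, branch: int) -> int:
    """Directories in a full tree: one root plus branch**level per level."""
    return sum(branch ** level for level in range(depth + 1))


def items_in_tree(depth: int, branch: int, items_per_dir: int) -> int:
    """Items created when every directory in the tree holds items_per_dir."""
    return dirs_in_tree(depth, branch) * items_per_dir


# Scenario 1: mdtest -z 2 -b 3 -I 10000
print(dirs_in_tree(2, 3), items_in_tree(2, 3, 10000))   # 13 130000

# Scenario 2: mdtest -z 10 -b 2 -I 100
print(dirs_in_tree(10, 2), items_in_tree(10, 2, 100))   # 2047 204700
```

Under these assumptions, the first command builds a shallow tree of 13 directories holding 10000 items each, while the second builds an 11-level tree of 2047 directories holding 100 items each.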

Curve file storage currently uses its own metadata cluster, which performs better than storing metadata in a distributed KV store (such as TiKV).

Note: This group of tests enabled fuseClient.enableMultiMountPointRename to guarantee transactional rename across multiple mount points, so its results deviate from the baseline data of the previous group.

Scenario 1 (the case with a large number of test directories): 

Test command: mdtest -z 2 -b 3 -I 10000 -d /mountpoint

Scenario 2 (the test directory has a deep hierarchy): 

Test command: mdtest -z 10 -b 2 -I 100 -d /mountpoint

Curve file storage is already in use in ES and AI scenarios, and the corresponding case studies will be shared in the future.

<Original author: Li Xiaocui, Curve Maintainer >

Reference links:

[1] pjdfstest: https://github.com/pjd/pjdfstest

[2] mdtest: https://github.com/LLNL/mdtest

[3] vdbench: https://www.oracle.com/downloads/server-storage/vdbench-downloads.html

[4] fio: https://github.com/axboe/fio

[5] CurveAdm: https://github.com/opencurve/curveadm/wiki

[6] client.yaml: https://github.com/opencurve/curveadm/wiki/curvefs-client-deployment#%E7%AC%AC-3-%E6%AD%A5%E5%87%86%E5%A4%87%E5%AE%A2%E6%88%B7%E7%AB%AF%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6

[7] s3.fakeS3=true: https://github.com/opencurve/curve/blob/5df72f5e1e2813e4bfa5d73672ea0f6a25630e74/curvefs/conf/client.conf#L128

[8] ImageNet dataset: https://www.kaggle.com/competitions/imagenet-object-localization-challenge/data

 

Curve is a high-performance, easy-to-operate, cloud-native open source distributed storage system. It can be applied to mainstream cloud-native infrastructure platforms: it connects to the OpenStack platform to provide high-performance block storage services for cloud hosts; it connects to Kubernetes to provide RWO, RWX, and other types of persistent storage volumes; and it connects to PolarFS as the high-performance storage base for cloud-native databases, supporting their storage-compute separation architecture.

Curve can also be used as cloud storage middleware, with S3-compatible object storage as the data storage engine, to provide cost-effective shared file storage for public cloud users.

  • GitHub: https://github.com/opencurve/curve

  • Official website: https://opencurve.io/

  • User forum: https://ask.opencurve.io/

  • WeChat group: search for the group assistant WeChat account OpenCurve_bot
