Comparison and Selection of Distributed File Systems

1. Distributed File System

A distributed file system (DFS) is a file system in which the physical storage resources it manages are not necessarily attached directly to the local node, but are connected to it over a computer network. Distributed file systems are designed around the client/server model: a typical network includes multiple servers accessed by multiple users. Some systems are also peer-to-peer, so a machine can act as both client and server. For example, a user can "publish" a directory for other clients to access; once accessed, the directory appears to the client as if it were a local drive.

Whether a distributed file system is a good one depends on three factors:

  1. The data storage method. Suppose there are 10 million data files. One option is to store all of them on a single node and keep a backup spread over N other nodes, each holding 10/N million files. Another is to distribute the files evenly across N nodes, each again holding 10/N million files. Whichever method is used, the goal is to keep the data safe and easy to retrieve.
  2. The data read rate. This covers the time to respond to the user's read request, locate the node that holds the file, read the file from the physical disk, transfer the data between nodes, and do any processing along the way. Together these determine the user experience: reads from the distributed file system must not be much slower than reads from a local file system. If a file that opens in 2 seconds locally takes more than 10 seconds on the distributed file system, the user experience suffers.
  3. The data security mechanism. Because data is scattered across nodes, the system must use redundancy, backup, mirroring, or similar methods so that data can be recovered when a node fails.
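The two layouts in item 1 can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names `even_split` and `primary_with_striped_backup` are hypothetical, and it assumes files are interchangeable so that only the counts matter.

```python
from typing import Dict, List


def even_split(total_files: int, nodes: int) -> List[int]:
    """Spread files as evenly as possible across `nodes` nodes."""
    base, extra = divmod(total_files, nodes)
    # The first `extra` nodes take one additional file each.
    return [base + (1 if i < extra else 0) for i in range(nodes)]


def primary_with_striped_backup(total_files: int, backup_nodes: int) -> Dict[str, List[int]]:
    """One node holds every file; the backup copy is striped over N other nodes."""
    return {
        "primary": [total_files],
        "backups": even_split(total_files, backup_nodes),
    }


if __name__ == "__main__":
    total = 10_000_000
    print(even_split(total, 4))
    print(primary_with_striped_backup(total, 4))
```

With the first layout a single node still has to hold and serve every file, while the second spreads both the storage and the read load; neither variant in this sketch models replication of the evenly distributed data.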

2. Introduction to mainstream distributed file systems

The current mainstream distributed file systems include GFS, HDFS, Ceph, Lustre, MooseFS, MogileFS, FastDFS, GlusterFS, TFS, and GridFS.

1. GFS (Google File System)

GFS is Google's proprietary, Linux-based distributed file system, developed to meet the company's own needs. Google has published some technical details of the system but has not released the software as open source.

2. HDFS (Hadoop Distributed File System)

HDFS is the distributed file system implemented by Hadoop. Hadoop was created by Doug Cutting, the founder of Apache Lucene, a widely used text search library. It originated in Apache Nutch, an open source web search engine that was itself part of the Lucene project. Apache Hadoop is an open source implementation of the MapReduce model, which was an important cornerstone of Google's empire.

Reference link:

http://hadoop.apache.org/docs/r2.9.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

3. Ceph

Ceph is a distributed file system that Sage Weil developed while studying for his PhD at the University of California, Santa Cruz; he completed his thesis on it.
Ceph uses the btrfs file system, which requires Linux 2.6.34 or later. Ceph is not yet mature, and neither is the btrfs it builds on; its official website states clearly that Ceph should not be used in a production environment.

Reference link:

https://github.com/ceph/ceph

https://ceph.com

4. Lustre

Lustre is a large-scale, secure, reliable, and highly available cluster file system developed and maintained by Sun. The main goal of the project is a next-generation cluster file system that supports more than 10,000 nodes and petabytes of storage. Lustre is already used in some products, such as HP SFS.

Reference link:

http://lustre.org/

5. MooseFS

MooseFS supports FUSE and is relatively lightweight. It depends on a single master server, which is a single point of failure, and its performance is relatively poor. It is written in C and is widely used in China.

Reference link:

https://moosefs.com

https://sourceforge.net/projects/moosefs/?source=directory

https://www.cnblogs.com/hjc4025/p/9956988.html

6. MogileFS

MogileFS is written in Perl and was developed by Danga Interactive, the company behind memcached. In China it is currently used by image hosting sites such as yupoo. MogileFS is a set of efficient automatic file backup components, developed at Six Apart and widely used on Web 2.0 sites including LiveJournal.

Reference link:

https://github.com/mogilefs

7. FastDFS

FastDFS is an open source, lightweight distributed file system similar to Google FS, written in pure C. It manages files and provides file storage, file synchronization, and file access (upload and download), addressing the problems of large-capacity storage and load balancing. It is especially suitable for online services built around files, such as photo album and video websites.

Reference link:

https://github.com/happyfish100/fastdfs

https://www.cnblogs.com/shenxm/p/8459292.html

8. GlusterFS

GlusterFS is an open source, scale-out distributed file system. It can allocate storage quickly as needs change, includes rich automatic failover features, and does away with the centralized metadata server. It is a scalable network file system suited to data-intensive tasks, offering scalability, high performance, and high availability. Gluster was acquired by Red Hat on October 7, 2011.
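Dropping the central metadata server means every client must be able to work out on its own where a file lives. One common way to do that is to hash the file path. The sketch below shows only that general idea and is not GlusterFS's actual placement algorithm; the names `locate` and `BRICKS` are hypothetical.

```python
import hashlib

# Hypothetical storage nodes; a real cluster would read these from configuration.
BRICKS = ["node-a", "node-b", "node-c"]


def locate(path: str, bricks=BRICKS) -> str:
    """Pick the storage node for `path` from the path alone, with no lookup service."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it onto the node list.
    index = int.from_bytes(digest[:8], "big") % len(bricks)
    return bricks[index]


if __name__ == "__main__":
    for p in ["/photos/2020/a.jpg", "/photos/2020/b.jpg", "/docs/report.pdf"]:
        print(p, "->", locate(p))
```

Because the result depends only on the path and the node list, any client computes the same answer without asking a server. Plain modulo hashing, as used here, remaps most files when the node list changes, so it is a teaching device rather than a production scheme.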

Reference link:

http://www.gluster.org

https://blog.csdn.net/liuaigui/article/details/6284551

9. TFS (Taobao File System)

TFS is a highly scalable, highly available, high-performance distributed file system for Internet services, aimed mainly at massive amounts of unstructured data. It runs on clusters of ordinary Linux machines and provides highly reliable, highly concurrent storage access to external clients. TFS stores huge numbers of small files for Taobao, usually no larger than 1 MB each, and is widely used across Taobao's applications. It uses an HA architecture and supports smooth expansion, which ensures the availability and scalability of the whole file system. Its flat data organization maps a file name directly to the file's physical address, which simplifies file access and contributes to TFS's good read and write performance.
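A flat layout of this kind means the file name itself carries enough information to find the data, so no directory tree has to be walked. The sketch below illustrates that idea with a made-up encoding. It is not the real TFS name format, and `encode_name`, `decode_name`, `resolve`, and `BLOCK_LOCATIONS` are all hypothetical.

```python
import base64
import struct

# Hypothetical map from block id to the data server that stores the block.
BLOCK_LOCATIONS = {7: "dataserver-1:3200", 8: "dataserver-2:3200"}


def encode_name(block_id: int, file_id: int) -> str:
    """Pack (block id, file id) into a URL-safe file name."""
    raw = struct.pack(">IQ", block_id, file_id)  # 4-byte block id, 8-byte file id
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_name(name: str) -> tuple:
    """Recover (block id, file id) from a name produced by encode_name."""
    raw = base64.urlsafe_b64decode(name.encode("ascii"))
    return struct.unpack(">IQ", raw)


def resolve(name: str) -> tuple:
    """Return (server, block id, file id) for a file name, using only the name."""
    block_id, file_id = decode_name(name)
    return BLOCK_LOCATIONS[block_id], block_id, file_id


if __name__ == "__main__":
    name = encode_name(7, 42)
    print(name, "->", resolve(name))
```

The trade-off is that clients cannot choose their own file names or directory structure, since the name is generated by the storage system.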

Reference link:

http://code.taobao.org/p/tfs/src/

10. GridFS

MongoDB is a well-known NoSQL database, and GridFS is a built-in MongoDB feature that provides a set of file operation APIs for storing files in MongoDB. GridFS stores a file in two collections: one holds the file index, and the other holds the file content. The content is split into chunks of a fixed size, and each chunk is stored as one document. Besides the file itself, this scheme also stores additional file attributes such as the MD5 value and file name. By default, GridFS splits files into 255 kB chunks.
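The two-collection layout can be sketched without a database. The code below is a simplified in-memory model, not the MongoDB driver API: `split_into_gridfs_docs` and `reassemble` are hypothetical helpers, a random hex string stands in for the document id, and the chunk size is passed in explicitly.

```python
import hashlib
import uuid


def split_into_gridfs_docs(filename: str, content: bytes, chunk_size: int):
    """Return (file_doc, chunk_docs) modelled on the two GridFS collections."""
    file_id = uuid.uuid4().hex  # stand-in for a MongoDB ObjectId
    file_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(content),
        "chunkSize": chunk_size,
        "md5": hashlib.md5(content).hexdigest(),
    }
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": content[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(content), chunk_size))
    ]
    return file_doc, chunk_docs


def reassemble(chunk_docs):
    """Concatenate chunks in order of their sequence number `n`."""
    return b"".join(c["data"] for c in sorted(chunk_docs, key=lambda c: c["n"]))


if __name__ == "__main__":
    data = b"x" * 1000
    file_doc, chunks = split_into_gridfs_docs("receipt.png", data, chunk_size=256)
    print(file_doc["length"], len(chunks))
    assert reassemble(chunks) == data
```

Reading only part of a file then amounts to fetching the chunks whose sequence numbers cover the requested byte range, rather than loading the whole file.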

Reference link:

https://docs.mongodb.com/manual/core/gridfs

3. Comparison of distributed file systems

1. Comprehensive comparison

| File system | Developer | Language | License | Ease of use | Applicable scenarios | Characteristics | Disadvantages |
|---|---|---|---|---|---|---|---|
| GFS | Google | | Not open source | | | | |
| HDFS | Apache | Java | Apache | Simple installation; professional official documentation | Storing very large files | Large batch reads and writes with high throughput; write once, read many times; sequential reads and writes | Hard to meet millisecond-level low-latency access; no concurrent writes to the same file by multiple users; not suitable for large numbers of small files |
| Ceph | Sage Weil, University of California, Santa Cruz | C++ | LGPL | Simple installation; professional official documentation | Large, medium, and small files in a single cluster | Distributed; no single point of dependency; written in C++; good performance | Based on the immature btrfs; not yet mature or stable itself; not recommended for production |
| Lustre | Sun | C | GPL | Complex; depends heavily on the kernel, which must be recompiled | Reading and writing large files | Enterprise-grade product; very large; deeply dependent on the kernel and ext3 | |
| MooseFS | Core Sp. z o.o. | C | GPL v3 | Simple installation; extensive official documentation; web interface for management and monitoring | Reading and writing large numbers of small files | Lightweight; widely used in China | Single point of dependence on the master server; relatively poor performance |
| MogileFS | Danga Interactive | Perl | GPL | | Mainly used on the web to handle massive numbers of small images | Key-value style metadata file system; much more efficient than MooseFS | Does not support FUSE |
| FastDFS | Yu Qing (Chinese developer) | C | GPL v3 | Simple installation; relatively active community | Small and medium files in a single cluster | No need to support POSIX, which reduces system complexity and improves efficiency; implements software RAID, improving concurrent processing and fault-tolerant data recovery; supports master-slave files and custom extensions; master-slave tracker services improve availability | No resumable transfers, not suitable for large files; no POSIX support, low versatility; high latency for file synchronization across the public network, requiring a matching fault tolerance strategy; the synchronization mechanism does not verify file correctness; downloads go through the API, a single-point performance bottleneck |
| GlusterFS | Z RESEARCH | C | GPL v3 | Simple installation; professional official documentation | Suitable for large files; small-file performance still has much room for optimization | No metadata server; stackable architecture (basic modules can be combined to build powerful features); linear scale-out; heavier than MooseFS | With no metadata server, load shifts to the client, which uses considerable CPU and memory; directory traversal is complex and inefficient because all storage nodes must be searched; deep paths are not recommended |
| TFS | Alibaba | C++ | GPL v2 | Complex installation; little official documentation | Small files across clusters | Tailored for small files with high random I/O performance; implements software RAID, improving concurrent processing and fault-tolerant data recovery; supports master-slave hot switchover, improving availability; supports master-slave cluster deployment, with the slave cluster providing read and standby functions | Not suitable for large files; no POSIX support, low versatility; no custom directory structure or file permission control; downloads go through the API, a single-point performance bottleneck; little official documentation, high learning cost |
| GridFS | MongoDB | C++ | | Simple installation | Usually used for large files (over 16 MB) | Parts of a file can be accessed without loading the whole file into memory, which keeps performance high; files and metadata are synchronized automatically | |

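2. Feature comparison

| File system | Data storage method | Cluster node communication protocol | Dedicated metadata store | Online expansion | Redundant backup | Single point of failure | Cross-cluster sync | FUSE mount | Access interface |
|---|---|---|---|---|---|---|---|---|---|
| HDFS | File | Proprietary protocol (TCP) | Uses MDS | Supported | | Exists | Not supported | Supported | Non-POSIX |
| Ceph | Object/file/block | Proprietary protocol (TCP) | Uses MDS | Supported | Supported | Exists | Not supported | Supported | POSIX |
| Lustre | Object | Proprietary protocol (TCP) / RDMA (remote direct memory access) | Dual MDS | Supported | Not supported | Exists | Unknown | Supported | POSIX/MPI |
| MooseFS | | Proprietary protocol (TCP) | Uses MFS | Supported | Supported | Exists | Not supported | Supported | POSIX |
| MogileFS | File | HTTP | Uses DB | Supported | Not supported | Exists | Not supported | Not supported | Non-POSIX |
| FastDFS | File/block | Proprietary protocol (TCP) | | Supported | Supported | None | Partially supported | Not supported | Non-POSIX |
| GlusterFS | File/block | Proprietary protocol (TCP) / RDMA (remote direct memory access) | None | Supported | Supported | None | Supported | Supported | POSIX |
| TFS | File | Proprietary protocol (TCP) | Uses NS | Supported | Supported | Exists | Supported | Unknown | Non-POSIX |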

What is POSIX?

POSIX stands for Portable Operating System Interface (of UNIX). It is a specification that applications on Unix systems commonly follow. An application written to POSIX can run across different Unix systems.
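In practice, a POSIX access interface means that ordinary file calls work unchanged against the mounted file system. The sketch below uses Python's `os` wrappers around the POSIX `open`, `write`, `lseek`, `read`, and `close` calls. The temporary directory stands in for a mount point, and the function name `posix_round_trip` and the file name `voucher.bin` are assumptions made for illustration.

```python
import os
import tempfile


def posix_round_trip(mount_point: str, payload: bytes) -> bytes:
    """Write a small file and read it back using low-level POSIX-style calls."""
    path = os.path.join(mount_point, "voucher.bin")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start of the file
        return os.read(fd, len(payload))
    finally:
        os.close(fd)


if __name__ == "__main__":
    # A temporary directory stands in for a FUSE mount of a POSIX-capable system.
    with tempfile.TemporaryDirectory() as mount_point:
        print(posix_round_trip(mount_point, b"hello"))
```

Systems listed as non-POSIX in the table are instead reached through their own client APIs, so existing applications need code changes to use them.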

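4. Selection reference

1. Classification by feature

Suitable as general-purpose file systems: Ceph, Lustre, MooseFS, GlusterFS;

Suitable for small-file storage: Ceph, MooseFS, MogileFS, FastDFS, TFS;

Suitable for large-file storage: HDFS, Ceph, Lustre, GlusterFS, GridFS;

Lightweight file systems: MooseFS, FastDFS;

Easy to use, with an active user base: MooseFS, MogileFS, FastDFS, GlusterFS;

Support FUSE mounting: HDFS, Ceph, Lustre, MooseFS, GlusterFS.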

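2. Initial screening

GFS is not open source, has a high learning cost, and lacks complete documentation of its features, so it is not considered for now;

Ceph is not yet mature or stable enough and has few production deployments, so it is excluded for now;

Lustre depends too heavily on the kernel and is not easy to install and use, so it is excluded for now;

TFS is complex to install and has little official documentation, which hinders later learning and use, so it is excluded for now;

The file systems remaining after initial screening are HDFS, MooseFS, MogileFS, FastDFS, GlusterFS, and GridFS.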

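3. Further screening based on requirements analysis

Requirements

  • A file system is needed to manage original vouchers. The vouchers are mainly small images; writes are infrequent, reads are frequent, and security requirements are high.
  • As the data volume grows during use, the number of images will become very large. Image reads should be as fast as possible, but extreme performance (millisecond level) is not required.
  • The file system needs well-developed redundant backup and fault tolerance mechanisms, should be as lean and durable as possible, and should be simple to install and configure and suitable for deployment in domestic (Chinese) environments.

Analysis

  1. First, the file system must suit the storage of massive numbers of small images. Suitable candidates: MooseFS, MogileFS, FastDFS.

  2. Second, it must support redundant backup. Suitable candidates: MooseFS, FastDFS, GlusterFS.

  3. Meets conditions 1 and 2 with a lean feature set: FastDFS.

  4. Meets conditions 1 and 2 with a comprehensive feature set: MooseFS.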
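The screening steps amount to set intersections, which the following sketch reproduces. The sets are transcribed from the lists and tables in this article. Using POSIX access as the stand-in for a "comprehensive" feature set is an assumption made here for illustration, and the helper name `screen` is hypothetical.

```python
# Shortlist left after the initial screening.
SHORTLIST = {"HDFS", "MooseFS", "MogileFS", "FastDFS", "GlusterFS", "GridFS"}
# Systems classified as suitable for small-file storage.
SMALL_FILES = {"Ceph", "MooseFS", "MogileFS", "FastDFS", "TFS"}
# Shortlisted systems named in step 2 as supporting redundant backup.
REDUNDANT_BACKUP = {"MooseFS", "FastDFS", "GlusterFS"}
# Assumption: "comprehensive" is approximated by POSIX access support.
POSIX_ACCESS = {"Ceph", "Lustre", "MooseFS", "GlusterFS"}


def screen(candidates, *feature_sets):
    """Keep only the candidates that appear in every feature set."""
    remaining = set(candidates)
    for feature_set in feature_sets:
        remaining &= feature_set
    return remaining


step1 = screen(SHORTLIST, SMALL_FILES)       # condition 1: small images
step2 = screen(SHORTLIST, REDUNDANT_BACKUP)  # condition 2: redundant backup
both = step1 & step2                         # meets conditions 1 and 2
comprehensive = both & POSIX_ACCESS
lean = both - POSIX_ACCESS

if __name__ == "__main__":
    print("condition 1:", sorted(step1))
    print("condition 2:", sorted(step2))
    print("lean:", sorted(lean), "comprehensive:", sorted(comprehensive))
```

Keeping the feature data in one place like this makes it easy to rerun the screening when a requirement changes, for example by adding a FUSE-mount set as a further filter.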

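Summary

MooseFS is comparatively full-featured: it supports online expansion, redundant backup, FUSE mounting, and a POSIX access interface. It does not support cross-cluster synchronization, has a single point of failure, and performs relatively poorly.

FastDFS has a lean feature set: it supports online expansion and redundant backup and partially supports cross-cluster synchronization. It does not support FUSE mounting or a POSIX access interface, has no single point of failure, and performs well.

The recommended candidates are FastDFS or MooseFS; the final choice can be made after refining the requirements further.

Note: This selection reference covers distributed file systems only; depending on system requirements, NFS or another more suitable type of file system may be chosen instead. It is limited to the file systems analyzed here, and other file systems may be better choices. It includes no precise performance test data, so it cannot provide an exact performance comparison.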

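5. References

Comparison of open source distributed storage systems [http://my.525.life/article?id=1510739742054]

Comparison of the distributed file systems MFS, Ceph, GlusterFS, and Lustre [https://www.cnblogs.com/zhiguo/p/3334993.html]

Developing your own file system with FUSE [https://www.ibm.com/developerworks/cn/linux/l-fuse]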

Origin blog.csdn.net/yym373872996/article/details/105650908