Introduction to Distributed File Systems

1. Lustre
Lustre is a large-scale, secure and reliable cluster file system with high availability, developed and maintained by Sun. The project's main goal is a next-generation cluster file system that supports more than 10,000 nodes and petabytes of storage.
Lustre is an open source cluster file system released under the GPL. In today's clusters, the speed of data exchange between computers and disks has not kept up with the growth in processor and memory speeds, which drags down application performance. Cluster file system software that raises I/O speeds has the potential to lower storage costs, change the way businesses purchase storage, and enter the general business computing market. This new kind of cluster file system uses the open source Lustre technology, developed by the US Department of Energy and commercially supported by Hewlett-Packard (HP). It significantly improves input/output (I/O) speed and has already had an impact in universities, national laboratories, and supercomputing research centers; in the next few years it is likely to enter ordinary commercial computing.
Runs on Linux; development language: C/C++.
Official website: http://lustre.org/

2.Hadoop
Hadoop is not just a distributed file system for storage; it is a framework for executing distributed applications on large clusters of general-purpose computing devices. It is released under the Apache license and written in Java. Its resource consumption is relatively high.
Official website: http://hadoop.apache.org/

3.MogileFS
MogileFS is an open source distributed file system. Its main features include:
1. Components at the application layer
2. No single point of failure
3. Automatic file replication
4. Better reliability than RAID
5. No RAID support required
MogileFS runs on Linux.
Official website: http://www.danga.com/

4.FastDFS
FastDFS is an open source distributed file system. It manages files, including file storage, file synchronization, and file access (upload and download), and addresses the problems of mass storage and load balancing. It is especially suitable for online services built around files, such as photo album websites and video websites.
The FastDFS server has two roles: tracker and storage node. The tracker mainly does scheduling work and plays the role of load balancing in access.
The storage node stores files and performs all file management functions: storage, synchronization, and providing access interfaces. FastDFS also manages file metadata. A file's metadata is its set of relevant attributes, each represented as a key-value pair, such as width=1024, where the key is width and the value is 1024. File metadata is a list of file attributes and can contain multiple key-value pairs.
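The division of labor between the two roles can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the FastDFS client API: the class names StorageNode and Tracker, the round-robin choice, and the file ID "photo-001" are all hypothetical and used purely for illustration.

```python
import itertools


class StorageNode:
    """Toy storage node: holds file contents plus per-file metadata."""

    def __init__(self, name):
        self.name = name
        self.files = {}      # file_id -> bytes
        self.metadata = {}   # file_id -> dict of key-value attributes

    def upload(self, file_id, data, meta=None):
        self.files[file_id] = data
        self.metadata[file_id] = dict(meta or {})

    def download(self, file_id):
        return self.files[file_id]


class Tracker:
    """Toy tracker: only schedules, it never touches file data.

    Round-robin is an assumption chosen for simplicity.
    """

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def pick_storage(self):
        return next(self._cycle)


nodes = [StorageNode("storage-1"), StorageNode("storage-2")]
tracker = Tracker(nodes)

# The client first asks the tracker where to go, then talks to that node.
node = tracker.pick_storage()
node.upload("photo-001", b"...image bytes...", {"width": "1024", "height": "768"})
print(node.name, node.metadata["photo-001"])
```

The point of the sketch is that the tracker only answers "which node?", while the file bytes and the key-value metadata live on the storage node.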
It is written in C/C++ and runs cross-platform.
Project address: https://github.com/happyfish100/fastdfs

5.NFS
The Network File System (NFS) is one of the file systems supported by FreeBSD.
NFS allows a system to share directories and files with others on the network. By using NFS, users and programs can access files on remote systems as if they were local files. Its advantages are:
1. Local workstations use less disk space, because commonly used data can be stored on one machine and accessed over the network.
2. Users do not need a home directory on every machine on the network. The home directory can be placed on an NFS server and made available everywhere on the network.
3. Storage devices such as floppy drives, CD-ROM drives, and Zip drives can be used by other machines on the network, which reduces the number of removable media devices needed across the network.
It is written in C/C++ and runs cross-platform.
Official website: http://www.tldp.org/HOWTO/NFS-HOWTO/index.html

6.OpenAFS
OpenAFS is an open source distributed file system that lets systems share files and resources across local area networks and wide area networks. OpenAFS is organized around a group of file servers called a cell. The identity of each server is usually hidden within the file system. A user logged in from an AFS client cannot tell which server they are working on because, from the user's point of view, they appear to be working on a single system with recognizable Unix file system semantics.
File system contents are usually replicated across the cell, so that the failure of one hard drive does not disrupt work on OpenAFS clients. OpenAFS requires a large client cache, up to 1 GB, to speed access to frequently used files. It is a very secure Kerberos-based system that uses access control lists (ACLs) for fine-grained access control, rather than relying on the usual Linux and Unix security model.
License: IBM Public License; runs on Linux.
Official website: http://www.openafs.org/

7.MooseFS
Moose File System is a fault-tolerant network distributed file system that spreads data across different servers in the network. Through FUSE, MooseFS makes the whole thing look like an ordinary Unix file system. One problem remains: it still does not solve the single point of failure. It is written in C and runs cross-platform.
The MooseFS file system structure includes the following four roles:
1 Management server: managing server (master)
2 Metadata log server: metalogger server (metalogger)
3 Data storage server: data servers (chunkservers)
4 Client: client computers that mount and use the file system
Function of each role:
1 Management server: manages the data storage servers, schedules file reads and writes, reclaims and recovers file space, and handles multi-node replication.
2 Metadata log server: backs up the master server's change log files (named changelog_ml.*.mfs) so that it can take over when the master server fails.
3 Data storage server: connects to the management server, follows its scheduling, provides storage space, and transfers data to and from clients.
4 Client: mounts the data storage managed by the remote management server through the FUSE kernel interface, so the shared file system is used in the same way as a local Unix file system.
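How the four roles fit together can be sketched roughly as follows. This is a simplified in-memory model, not MooseFS code: the class names, the "least-loaded servers first" placement rule, the default of two copies, and the treatment of each file as a single chunk are all assumptions made for illustration.

```python
class ChunkServer:
    """Data storage server: just holds chunk data."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}  # path -> bytes (one chunk per file, for simplicity)


class Metalogger:
    """Metadata log server: keeps a backup copy of the master's change log."""

    def __init__(self):
        self.changelog = []

    def record(self, entry):
        self.changelog.append(entry)


class Master:
    """Management server: decides where data lives; stores no file data."""

    def __init__(self, chunkservers, metalogger, copies=2):
        self.chunkservers = chunkservers
        self.metalogger = metalogger
        self.copies = copies
        self.locations = {}  # path -> list of ChunkServer objects
        self.changelog = []

    def allocate(self, path):
        # Assumed policy: place copies on the servers holding the fewest chunks.
        by_load = sorted(self.chunkservers, key=lambda s: len(s.chunks))
        targets = by_load[: self.copies]
        self.locations[path] = targets
        entry = ("CREATE", path, [s.name for s in targets])
        self.changelog.append(entry)
        self.metalogger.record(entry)  # backed up for recovery
        return targets

    def locate(self, path):
        return self.locations[path]


class Client:
    """Client: asks the master where data is, then talks to chunkservers."""

    def __init__(self, master):
        self.master = master

    def write(self, path, data):
        for server in self.master.allocate(path):
            server.chunks[path] = data

    def read(self, path):
        return self.master.locate(path)[0].chunks[path]


servers = [ChunkServer("chunk-1"), ChunkServer("chunk-2"), ChunkServer("chunk-3")]
metalogger = Metalogger()
master = Master(servers, metalogger)
client = Client(master)

client.write("/a.txt", b"hello")
print(client.read("/a.txt"), [s.name for s in master.locate("/a.txt")])
```

In the sketch the master is still a single object, which mirrors the single point of failure mentioned above; the metalogger only keeps enough history to rebuild the master's state.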
Official website: http://www.moosefs.org/
Reference articles:
1. MooseFS that we used together in those years
2. Distributed file system MFS (MooseFS) realizes storage sharing
http://sery.blog.51cto.com/10037/147756
http://sery.blog.51cto.com/10037/263515

8.GoogleFS
Google File System is a scalable distributed file system for large-scale, distributed applications that access large amounts of data. It runs on inexpensive commodity hardware, yet it provides fault tolerance and delivers high-performance service to a large number of users. Developed by Google.

9.GlusterFS
GlusterFS is a cluster file system that supports petabyte-scale data volumes. It is the core of Gluster, a scale-out storage solution. It is an open source distributed file system with strong horizontal scalability: by adding nodes it can support several petabytes of storage capacity and serve thousands of clients. GlusterFS aggregates physically distributed storage resources over TCP/IP or InfiniBand RDMA networks and manages the data under a single global namespace. It is based on a stackable user-space design, which gives it good performance across a variety of data workloads.
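One way to picture a single global namespace spread over several servers is to derive each file's location from a hash of its path. The sketch below illustrates that general idea only; it is not GlusterFS's actual distribution logic, and the brick names and the brick_for function are hypothetical.

```python
import hashlib

# Hypothetical storage locations ("bricks") contributed by three servers.
BRICKS = [
    "server1:/export/brick",
    "server2:/export/brick",
    "server3:/export/brick",
]


def brick_for(path, bricks=BRICKS):
    """Map a path in the global namespace to one brick by hashing the path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % len(bricks)
    return bricks[slot]


for path in ["/photos/cat.jpg", "/logs/2024/app.log", "/home/alice/notes.txt"]:
    print(path, "->", brick_for(path))
```

Because the location is computed from the path, any client can find a file without asking a central metadata server, which is what makes adding servers relatively cheap in designs of this kind.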
Official website: http://www.gluster.org/

10.TFS
TFS (Taobao FileSystem) is a highly scalable, highly available, high-performance distributed file system for Internet services, aimed mainly at massive amounts of unstructured data. It is built on clusters of ordinary Linux machines and provides highly reliable, highly concurrent storage access to external users. TFS gives Taobao storage for a huge number of small files, usually no larger than 1 MB each, which meets Taobao's demand for small-file storage and is widely used across Taobao applications. It adopts an HA architecture and smooth expansion to ensure the availability and scalability of the whole file system. In addition, its flat data organization maps a file name directly to the file's physical address, which simplifies file access and contributes to TFS's good read and write performance.
Overall structure:
A TFS cluster consists of two NameServer nodes (one active and one standby) and multiple DataServer nodes. These services run as user-level programs on ordinary Linux machines.
In TFS, a large number of small files (the actual data files) are merged into one large file, called a Block. Each Block has a cluster-wide unique number (Block Id). When a Block is created, the NameServer records the relationship between the Block and the DataServer. The actual data in the Block is stored on the DataServer. A DataServer machine generally runs several independent DataServer processes, each responsible for one mount point. A mount point is usually a directory on a separate disk, which limits the impact of a single disk failure.
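The idea of packing small files into a Block, and of a flat name that leads straight to the data, can be sketched like this. It is a minimal in-memory model under assumptions: the Block class, the "block_id:file_id" name format, and parse_name are hypothetical and do not reflect the real TFS on-disk layout or file name encoding.

```python
class Block:
    """Toy Block: many small files appended into one large byte buffer."""

    def __init__(self, block_id):
        self.block_id = block_id
        self.data = bytearray()
        self.index = {}          # file_id -> (offset, size) inside the block
        self._next_file_id = 1

    def append(self, payload):
        """Store a small file and return a flat name that encodes its location."""
        file_id = self._next_file_id
        self._next_file_id += 1
        self.index[file_id] = (len(self.data), len(payload))
        self.data += payload
        return f"{self.block_id}:{file_id}"  # hypothetical name format

    def read(self, file_id):
        offset, size = self.index[file_id]
        return bytes(self.data[offset:offset + size])


def parse_name(name):
    """Recover (block_id, file_id) from the flat file name."""
    block_id, file_id = name.split(":")
    return int(block_id), int(file_id)


block = Block(42)
blocks = {block.block_id: block}   # what a DataServer process might hold

name_a = block.append(b"small file A")
name_b = block.append(b"B")

block_id, file_id = parse_name(name_b)
print(name_b, blocks[block_id].read(file_id))
```

No directory tree is consulted on the read path: the name alone identifies the Block, and the Block's index gives the offset and size.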
The main functions of the NameServer are to manage and maintain information about Blocks and DataServers, including DataServers joining and leaving, heartbeat information, and establishing and releasing the mapping between Blocks and DataServers. Under normal circumstances, a Block exists on a DataServer. The primary NameServer is responsible for creating, deleting, replicating, balancing, and reorganizing Blocks. The NameServer does not read or write the actual data; that is done by the DataServers.
The main function of the DataServer is to store, read, and write the actual data.
For disaster recovery, the NameServer adopts an HA structure: two machines act as hot standbys for each other and run at the same time, one as the primary and the other as the backup. The primary is bound to an external VIP to provide service; when the primary goes down, the VIP is quickly rebound to the backup NameServer, which becomes the new primary and continues to serve requests. The HeartAgent in the figure performs this function.
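The primary/backup switchover can be modeled with a small sketch. Everything here is an assumption for illustration: the NameServer and HeartAgent classes are toy stand-ins named after the roles in the text, the alive flag replaces real heartbeat messages, and holds_vip replaces an actual virtual IP binding.

```python
class NameServer:
    """Toy NameServer with a liveness flag instead of real heartbeats."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.holds_vip = False


class HeartAgent:
    """Toy monitor: moves the VIP to the backup when the primary is down."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.primary.holds_vip = True

    def check(self):
        """Run one health check and return the server currently serving."""
        if not self.primary.alive and self.backup.alive:
            self.primary.holds_vip = False
            self.backup.holds_vip = True
            # The backup becomes the new primary; the failed node is now backup.
            self.primary, self.backup = self.backup, self.primary
        return self.primary


ns_a = NameServer("ns-a")
ns_b = NameServer("ns-b")
agent = HeartAgent(ns_a, ns_b)

print(agent.check().name)   # primary is healthy, nothing changes
ns_a.alive = False          # simulate the primary going down
print(agent.check().name)   # the VIP has moved to the former backup
```

Clients keep talking to the same VIP throughout, so they do not need to know which physical machine is currently the primary.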

Official website: http://code.taobao.org/p/tfs/wiki/index/

11.pNFS
The Network File System (NFS) is an important part of most local area networks (LANs). But NFS has not been suited to the demanding I/O-intensive programs of high-performance computing, at least until now. The latest revision of the NFS standard incorporates Parallel NFS (pNFS), a parallel implementation of file sharing that increases transfer rates by orders of magnitude.
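The gain from parallel access comes from spreading a file over several data servers so a client can fetch the pieces at the same time. The following sketch shows that striping idea in general terms; it is not the pNFS protocol. The stripe size, the function names, and the use of in-memory lists in place of data servers are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_SIZE = 4  # bytes per stripe; tiny on purpose so the example is readable


def stripe(data, n_servers, stripe_size=STRIPE_SIZE):
    """Distribute fixed-size stripes over the servers in round-robin order."""
    servers = [[] for _ in range(n_servers)]
    for start in range(0, len(data), stripe_size):
        index = (start // stripe_size) % n_servers
        servers[index].append(data[start:start + stripe_size])
    return servers


def parallel_read(servers):
    """Fetch every server's stripes concurrently, then interleave them back."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # list() stands in for a network fetch from one data server.
        fetched = list(pool.map(list, servers))
    pieces = []
    for row in range(max(len(s) for s in fetched)):
        for server_stripes in fetched:
            if row < len(server_stripes):
                pieces.append(server_stripes[row])
    return b"".join(pieces)


original = b"abcdefghijklmnopqrstuvwxyz"
layout = stripe(original, n_servers=3)
print(layout)
print(parallel_read(layout) == original)
```

With three servers each holding roughly a third of the file, the three fetches can overlap instead of queuing behind a single server.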
Development language: C/C++; runs on Linux.
Official website: http://www.pnfs.com/
