MooseFS: working mechanism, advantages, and disadvantages

1. Introduction to MooseFS

MooseFS is a fault-tolerant distributed network file system. It spreads data across multiple physical servers (or across separate disks or partitions) and keeps several copies of each piece of data. To a client, the whole cluster appears as a single resource, and file operations behave as they do on a UNIX-like file system.

2. Four roles in the MooseFS architecture

  • Management server (Master Server): also known as the metadata server. It manages the data storage servers, schedules file reads and writes, reclaims file space, and restores missing copies across nodes. MFS currently supports only one master, so it is a single point of failure and should run on a stable machine.
  • Metadata log server (Metalogger Server): backs up the management server's change log files (named changelog_ml.*.mfs) so that it can take over when the management server fails. This role was added in MFS 1.6. Metadata logs can be kept on the management server or on a separate server; a separate server is recommended for data safety and reliability. Note that the management server runs its own metadata logging daemon; the backup Metalogger Server is a client of that daemon and pulls the log files from the management server.
  • Data storage server (Chunk Server): the server that actually stores user data. It connects to the management server, follows its scheduling, provides storage space, and transfers data to and from clients. Files are split into chunks when stored, and the chunks are replicated among the data storage servers.
  • Client: mounts the storage managed by the remote management server through the FUSE kernel interface, so the shared file system can be used just like a local one.
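The Chunk Server description above (split a file into chunks, then replicate the chunks across servers) can be sketched roughly as follows. This is an illustration, not MooseFS code: the 64 MiB chunk size is an assumption, and `split_into_chunks`, `place_replicas`, and the round-robin placement policy are hypothetical names and choices made for this example.

```python
# Assumption: 64 MiB per chunk, chosen for illustration only.
CHUNK_SIZE = 64 * 1024 * 1024
MIB = 1024 * 1024


def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


def place_replicas(num_chunks, servers, copies):
    """Assign every chunk to `copies` distinct servers in round-robin order.

    Hypothetical policy: a real master would also weigh free space and load.
    """
    if copies > len(servers):
        raise ValueError("not enough chunk servers for the requested copies")
    return [
        [servers[(index + k) % len(servers)] for k in range(copies)]
        for index in range(num_chunks)
    ]


# Example: a 150 MiB file stored with two copies on three chunk servers.
chunks = split_into_chunks(150 * MIB)
placement = place_replicas(len(chunks), ["cs1", "cs2", "cs3"], copies=2)
for (offset, length), holders in zip(chunks, placement):
    print(offset // MIB, length // MIB, holders)
```

Under these assumptions the 150 MiB file becomes three chunks of 64, 64, and 22 MiB, and each chunk lands on two different servers.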

3. MooseFS working mechanism

1. Cluster architecture diagram

[Figure: MooseFS cluster architecture diagram]

2. Reading and writing mechanism

2.1 Reading mechanism

  1. The client contacts the Master Server to obtain the file's metadata, such as the locations of its chunks.
  2. The Master Server looks up its cached (in-memory) records and returns the location information to the client.
  3. Using that information, the client contacts the Chunk Server that holds the data.
  4. The Chunk Server returns the requested data to the client.
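The four read steps can be modelled with a small in-memory simulation. This is only a sketch of the message flow described above: `Master`, `ChunkServer`, and `read_file` are hypothetical names, and the real MooseFS protocol and data structures are not shown.

```python
class Master:
    """Toy metadata server: maps a path to chunk ids and chunk locations."""

    def __init__(self):
        self.files = {}      # path -> list of chunk ids
        self.locations = {}  # chunk id -> list of chunk server names

    def lookup(self, path):
        # Steps 1-2: answer the client from in-memory metadata.
        return [(cid, self.locations[cid]) for cid in self.files[path]]


class ChunkServer:
    """Toy data server: holds chunk contents keyed by chunk id."""

    def __init__(self):
        self.chunks = {}  # chunk id -> bytes

    def read(self, chunk_id):
        return self.chunks[chunk_id]


def read_file(path, master, servers):
    """Ask the master where the chunks live, then fetch them in order."""
    data = b""
    for chunk_id, holders in master.lookup(path):
        # Steps 3-4: fetch each chunk from the first server that holds it.
        data += servers[holders[0]].read(chunk_id)
    return data


# Example wiring: one file made of two chunks spread over two servers.
servers = {"cs1": ChunkServer(), "cs2": ChunkServer()}
servers["cs1"].chunks[1] = b"hello "
servers["cs2"].chunks[2] = b"world"
master = Master()
master.files["/demo.txt"] = [1, 2]
master.locations = {1: ["cs1"], 2: ["cs2"]}
print(read_file("/demo.txt", master, servers))
```

Note that file data never passes through the master in this model; the master only hands out locations, which matches the division of roles described earlier.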

2.2 Write mechanism

  1. The client contacts the Master Server and requests to write data.
  2. The Master Server looks up its cached records. For a new file, it asks the Chunk Servers to create the chunk objects that will hold the file.
  3. The Chunk Servers report to the Master Server that the chunk objects were created successfully.
  4. The Master Server sends the location information for the file to the client.
  5. The client writes the data to the corresponding Chunk Server.
  6. The Chunk Servers replicate the data among themselves and confirm success to one another.
  7. The Chunk Server reports the successful write to the client.
  8. The client notifies the Master Server that the write is complete.
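The eight write steps can be walked through with a toy model as well. Everything here is an assumption made for illustration: `ToyMaster`, `ToyChunkServer`, and `write_file` are hypothetical names, the file fits in a single chunk, and the master simply picks the first `copies` servers.

```python
class ToyChunkServer:
    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk id -> bytes

    def create_chunk(self, chunk_id):
        # Steps 2-3: allocate an empty chunk and acknowledge to the master.
        self.chunks[chunk_id] = b""
        return True

    def write(self, chunk_id, data, forward_to=()):
        # Step 5: store the data locally.
        self.chunks[chunk_id] = data
        # Step 6: pass the data on to the other replicas and collect their acks.
        return all(peer.write(chunk_id, data) for peer in forward_to)


class ToyMaster:
    def __init__(self, servers, copies=2):
        self.servers = servers
        self.copies = copies
        self.files = {}  # path -> (chunk id, length), filled in at step 8
        self.next_chunk_id = 0

    def allocate(self, path):
        # Steps 2-4: pick servers, have them create the chunk, return locations.
        chunk_id = self.next_chunk_id
        self.next_chunk_id += 1
        targets = self.servers[: self.copies]
        if not all(server.create_chunk(chunk_id) for server in targets):
            raise RuntimeError("chunk creation failed")
        return chunk_id, targets

    def commit(self, path, chunk_id, length):
        # Step 8: record that the client finished writing.
        self.files[path] = (chunk_id, length)


def write_file(path, data, master):
    chunk_id, targets = master.allocate(path)             # steps 1-4
    first, rest = targets[0], targets[1:]
    if not first.write(chunk_id, data, forward_to=rest):  # steps 5-7
        raise RuntimeError("replication failed")
    master.commit(path, chunk_id, len(data))              # step 8
    return chunk_id


servers = [ToyChunkServer(name) for name in ("cs1", "cs2", "cs3")]
master = ToyMaster(servers, copies=2)
write_file("/demo.txt", b"payload", master)
print([sorted(server.chunks) for server in servers])
```

In this sketch the client sends the data once and the first Chunk Server forwards it to the others, which is one plausible reading of step 6; the steps above do not specify the exact replication path.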

4. Pros and cons analysis

1. Advantages

  1. MFS is released under the GPL and is free to use; development and the community are active, and documentation is plentiful.
  2. Lightweight; easy to deploy, configure, and maintain.
  3. General-purpose file system; upper-layer applications can use it without modification.
  4. Low expansion cost: capacity can be added online without affecting the business, and the architecture is highly scalable.
  5. Highly available architecture: Chunk Servers and clients have no single point of failure, although the Master Server remains one (see Disadvantages).
  6. File objects are highly available: the number of copies kept per file is configurable, which can provide more redundancy than RAID 10.
  7. Balances load by distributing reads and writes across all servers, which speeds up read and write performance.
  8. Provides advanced features such as a Windows-like recycle bin, Java-like garbage collection (GC), and snapshots.
  9. MooseFS is written in C, and its design resembles that of the Google File System (GFS).
  10. Built-in web GUI for monitoring.
  11. Improves random read/write efficiency and the read/write efficiency of large numbers of small files.
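The configurable redundancy in item 6 comes with a storage cost that is easy to work out. The sketch below is plain arithmetic, not MooseFS code; it assumes every copy of a chunk sits on a different Chunk Server, and the function names are hypothetical.

```python
def raw_bytes_needed(logical_bytes, copies):
    """Raw disk space consumed when every byte is stored `copies` times."""
    return logical_bytes * copies


def survivable_server_losses(copies):
    """Servers that can fail without losing data.

    Assumption: each copy lives on a different chunk server.
    """
    return copies - 1


# Compare the cost of 1, 2, and 3 copies for 100 GB of logical data.
for copies in (1, 2, 3):
    print(copies, raw_bytes_needed(100, copies), survivable_server_losses(copies))
```

With two copies the overhead matches a two-way mirror such as RAID 10; three or more copies trade extra raw capacity for tolerance of additional server failures.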

2. Disadvantages

  1. The Master Server is a performance bottleneck. The MFS architecture resembles MySQL master-slave replication: the slaves are easy to scale out, but the master is not. The short-term countermeasure is to split the workload by business.
  2. As the total number of stored files grows, the Master Server's memory requirement keeps growing, because MFS caches the file system structure in the Master Server's memory. According to official figures, 25 million files need about 8 GB and 200 million files need about 64 GB. The short-term countermeasure is again to split by business.
  3. Robustness of the single-master design. The official solution synchronizes metadata from the Master Server to the Metalogger Server; if the Master Server fails, the Metalogger Server can be restored and promoted to Master Server, but recovery takes time. The single point of failure can also be addressed with a third-party high-availability setup (heartbeat + drbd + moosefs).
  4. The Metalogger Server copies metadata at fairly long intervals (adjustable).
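The two memory figures quoted in item 2 imply the same per-file cost: 8 GB / 25 million and 64 GB / 200 million both come to 320 bytes per file (taking 1 GB as 10^9 bytes). A rough planning helper based on that ratio might look like this; the linear model and the function names are assumptions for illustration, not an official sizing formula.

```python
import math

# Derived from the quoted figure of 8 GB for 25 million files (1 GB = 10**9 bytes).
BYTES_PER_FILE = 8 * 10**9 // (25 * 10**6)


def estimate_master_ram_gb(num_files):
    """Estimate Master Server RAM in GB, assuming memory grows linearly."""
    return num_files * BYTES_PER_FILE / 10**9


def clusters_needed(num_files, ram_gb_per_master):
    """How many separate clusters a split by business would need.

    Assumption: files divide evenly and each master has the same RAM.
    """
    return math.ceil(estimate_master_ram_gb(num_files) / ram_gb_per_master)


print(BYTES_PER_FILE)
print(estimate_master_ram_gb(200_000_000))
print(clusters_needed(500_000_000, 64))
```

Under this linear assumption, 500 million files would need about 160 GB of metadata memory, so they would have to be spread over three clusters if each master is limited to 64 GB.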

5. References

MooseFS official website [https://moosefs.com]

Introduction to MooseFS Distributed File System [https://www.cnblogs.com/hjc4025/p/9956988.html]

MooseFS of distributed file system [https://blog.51cto.com/zouqingyun/1698710]

Origin blog.csdn.net/yym373872996/article/details/105650977