FastDFS (1) Distributed File System


What is a file system

How is file data stored?


Distributed file system

  • A single computer has limited storage capacity and limited concurrent throughput. How can performance be improved?

  • Suppose I need to transport a ton of goods to Turpan:

    • Carried by 1 person: unthinkable
    • Carried by 50 people: still very hard
    • Carried by 500 people: easy for everyone

Clustering and distribution are two different concepts. Don't confuse them:

  • Distributed: different business modules are deployed on different servers, or one business module is split into several sub-modules that are deployed on different servers. This addresses high concurrency.

  • Cluster: the same business is deployed on multiple servers, which improves the high availability of the system.

  • Example:

    • A small restaurant used to have just one chef, who cut and washed the vegetables, prepared the ingredients, and cooked everything himself. As more and more guests arrived, one chef could no longer keep up, so a second chef was hired. Both chefs can cook, i.e. they play the same role, so the relationship between the two chefs is a "cluster".
    • To let the chefs concentrate on cooking and perfect their dishes, a prep cook was hired to cut vegetables and prepare ingredients. The relationship between the chefs and the prep cook is "distributed".
    • One prep cook was too busy to supply ingredients to both chefs, so another prep cook was hired. The relationship between the two prep cooks is again a "cluster".

Mainstream distributed file systems

HDFS

Hadoop Distributed File System;

A highly fault-tolerant system, suitable for deployment on inexpensive machines;

Provides high-throughput data access and is well suited to large-scale data applications;

HDFS uses a master-slave architecture: an HDFS cluster consists of one NameNode and N DataNodes;

The NameNode stores the metadata, and each file is split into blocks that are stored on different DataNodes.
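To make the master-slave split concrete, here is a toy sketch in Python (not HDFS code): a hypothetical `NameNode` class keeps only the metadata, while plain dictionaries stand in for the data nodes. The tiny block size and the round-robin block placement are assumptions for the demo. Real HDFS uses a much larger default block size (128 MB) and its own replica placement policy.

```python
from dataclasses import dataclass, field

# Each toy "data node" is just a dict: (file name, block index) -> block bytes.
Storage = dict[str, dict[tuple[str, int], bytes]]


@dataclass
class NameNode:
    """Toy name node: holds only metadata, never file contents."""
    datanodes: list[str]
    block_size: int
    # file name -> ordered list of (block index, data node holding that block)
    metadata: dict[str, list[tuple[int, str]]] = field(default_factory=dict)

    def put(self, name: str, data: bytes, storage: Storage) -> None:
        placements = []
        for offset in range(0, len(data), self.block_size):
            block_id = offset // self.block_size
            # Hypothetical placement: spread blocks round-robin over the data nodes.
            node = self.datanodes[block_id % len(self.datanodes)]
            storage[node][(name, block_id)] = data[offset:offset + self.block_size]
            placements.append((block_id, node))
        self.metadata[name] = placements

    def get(self, name: str, storage: Storage) -> bytes:
        # Look up the block list in the metadata, then fetch and concatenate the blocks.
        return b"".join(storage[node][(name, bid)] for bid, node in self.metadata[name])


nodes = ["dn1", "dn2", "dn3"]
storage: Storage = {n: {} for n in nodes}
nn = NameNode(nodes, block_size=4)
nn.put("a.txt", b"hello distributed world", storage)
print(nn.metadata["a.txt"])  # which data node holds each block
assert nn.get("a.txt", storage) == b"hello distributed world"
```

Reading a file always needs two steps: ask the name node where the blocks are, then collect them from the data nodes and merge them. That merge step is the overhead FastDFS avoids by not splitting files.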

GFS

Google File System

A scalable distributed file system for large, distributed applications that access large amounts of data;

Runs on inexpensive commodity hardware and provides fault tolerance;

Delivers high aggregate performance to a large number of users;

GFS uses a master-slave architecture: a GFS cluster consists of one master and many chunkservers;

A file is split into chunks that are stored on multiple chunkservers.

FastDFS

  • Written and open-sourced by Yu Qing, a senior architect at Taobao
  • Designed specifically for Internet workloads, with full consideration of mechanisms such as redundant backup, load balancing, and linear scaling, and with an emphasis on high availability and high performance. With FastDFS it is easy to build a high-performance file server cluster that provides file upload, download, and similar services
  • HDFS, GFS, etc. are general-purpose file systems. Their advantage is a good development experience, but the system is complex and its performance is only average.
  • By contrast, a dedicated distributed file system offers a less convenient experience but is simpler and faster. FastDFS is especially well suited to small files such as images and short videos: because it does not split files, there is no overhead from merging them back together.
  • Network communication uses sockets directly, which is fast

Working principle

FastDFS has two roles: the Tracker Server and the Storage Server;

The client first contacts the Tracker Server to upload or download a file;

The Tracker Server schedules a Storage Server, which completes the actual upload or download.
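The request flow can be sketched as a small in-process simulation. All class and method names (`TrackerServer.query_store`, `StorageServer.upload`, `client_upload`, etc.) are hypothetical and are not the FastDFS client API. Each group has only one storage server, so synchronization is left out, and the remote file names are simple counters rather than the real encoded names.

```python
import itertools


class StorageServer:
    def __init__(self, addr: str, group: str) -> None:
        self.addr, self.group = addr, group
        self.files: dict[str, bytes] = {}
        self._seq = itertools.count()

    def upload(self, data: bytes, ext: str) -> str:
        # Hypothetical naming scheme; real FastDFS encodes IP, timestamp, size, etc.
        remote_name = f"M00/00/00/file{next(self._seq)}.{ext}"
        self.files[remote_name] = data
        return f"{self.group}/{remote_name}"  # the file ID handed back to the client

    def download(self, remote_name: str) -> bytes:
        return self.files[remote_name]


class TrackerServer:
    def __init__(self, storages: list[StorageServer]) -> None:
        self.storages = storages
        self._rr = itertools.cycle(storages)

    def query_store(self) -> StorageServer:
        # Schedule an upload: plain round-robin over storage servers (assumed policy).
        return next(self._rr)

    def query_fetch(self, file_id: str) -> StorageServer:
        # Schedule a download: the first server in the group named by the file ID.
        group = file_id.split("/", 1)[0]
        return next(s for s in self.storages if s.group == group)


def client_upload(tracker: TrackerServer, data: bytes, ext: str) -> str:
    storage = tracker.query_store()   # 1. ask the tracker where to upload
    return storage.upload(data, ext)  # 2. send the bytes directly to that storage server


def client_download(tracker: TrackerServer, file_id: str) -> bytes:
    storage = tracker.query_fetch(file_id)  # 1. ask the tracker which server to read from
    _, remote_name = file_id.split("/", 1)
    return storage.download(remote_name)    # 2. read directly from that storage server
```

The tracker never touches the file bytes. It only tells the client which storage server to talk to, which is why it can stay lightweight.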


Tracker

  • Its role is load balancing and scheduling. It manages the storage servers (Storage Server) and can be thought of as the "big housekeeper, tracker, dispatcher"
  • Tracker Servers can be clustered for high availability; the selection strategy is "polling" (round-robin)
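A minimal sketch of the "polling" idea on the client side, assuming a hypothetical `TrackerPool` class that cycles through the tracker addresses in order and skips any tracker marked as unavailable. How the real FastDFS client detects a failed tracker is not modeled here.

```python
class TrackerPool:
    """Client-side round-robin ("polling") over a cluster of trackers."""

    def __init__(self, addrs: list[str]) -> None:
        self.addrs = list(addrs)
        self.down: set[str] = set()
        self._next = 0

    def mark_down(self, addr: str) -> None:
        self.down.add(addr)

    def mark_up(self, addr: str) -> None:
        self.down.discard(addr)

    def pick(self) -> str:
        # Try each tracker at most once, starting where the previous call stopped.
        for _ in range(len(self.addrs)):
            addr = self.addrs[self._next]
            self._next = (self._next + 1) % len(self.addrs)
            if addr not in self.down:
                return addr
        raise RuntimeError("no tracker available")


pool = TrackerPool(["tracker1", "tracker2", "tracker3"])
print([pool.pick() for _ in range(4)])
pool.mark_down("tracker2")  # tracker2 is now skipped until mark_up is called
print([pool.pick() for _ in range(3)])
```

Because every tracker holds the same view of the storage servers (each storage server reports to all of them), any live tracker can answer a request, so simple rotation with failover is enough.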

Storage

  • Its function is file storage: files uploaded by the client are ultimately stored on the storage servers
  • Storage servers are organized into groups. Servers in the same group are peers and synchronize data with each other, which provides data backup and therefore high availability. Servers in different groups do not communicate with each other.
  • If the servers in one group have different storage capacities, the group's usable capacity is that of the smallest server, so it is best to keep the software and hardware of servers in the same group identical
  • Each Storage Server connects to every Tracker Server in the cluster and periodically reports its status, such as remaining space, file synchronization status, and upload/download counts
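The "smallest server wins" rule and the periodic status reports can be sketched as follows. `Group`, `report`, and `pick_group` are hypothetical names, and choosing the group with the most free space for an upload is just one assumed policy used for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    # storage server address -> free space (MB) from its most recent status report
    free_mb: dict[str, int] = field(default_factory=dict)

    def report(self, addr: str, free_mb: int) -> None:
        """Record a periodic status report (heartbeat) from one storage server."""
        self.free_mb[addr] = free_mb

    def capacity_mb(self) -> int:
        # Every member keeps a full copy, so the group can hold only what its smallest member can.
        return min(self.free_mb.values(), default=0)


def pick_group(groups: list[Group]) -> Group:
    # Assumed upload policy: choose the group with the most usable free space.
    return max(groups, key=lambda g: g.capacity_mb())


g1, g2 = Group("group1"), Group("group2")
g1.report("10.0.0.1", 500)
g1.report("10.0.0.2", 200)  # the small disk limits all of group1
g2.report("10.0.0.3", 300)
g2.report("10.0.0.4", 350)
print(g1.capacity_mb(), g2.capacity_mb(), pick_group([g1, g2]).name)
```

The 300 MB of "extra" space on 10.0.0.1 is effectively wasted, which is why the servers in one group should have matching hardware.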

Upload/download principle


After the client uploads a file, the storage server returns a file ID to the client, for example:

group1/M00/02/11/aJxAeF21O5wAAAAAAAAGaEIOA12345.sh

  • Group name: the name of the storage group the file was uploaded to. It is returned by the storage server after a successful upload, and the client must save it itself

  • Virtual disk path:

    • The virtual path configured on the storage server, corresponding to the disk options store_path0, store_path1, ...
    • store_path0 corresponds to M00
    • store_path1 corresponds to M01
  • Two-level data directory:

    • The two levels of directories (02/11 in the example) that the storage server creates under the virtual disk path
  • File name:

    • Unlike the name of the file that was uploaded, it is generated by the storage server from specific information, including the storage server IP, creation timestamp, file size, file extension, and other information
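A sketch of splitting a file ID into the parts listed above. The regular expression, the hexadecimal reading of `Mxx`, and the `<store_path>/data/<d1>/<d2>/<file>` on-disk layout used by the hypothetical `local_path` helper are assumptions for illustration, not taken from the FastDFS source.

```python
import re
from typing import NamedTuple


class FileId(NamedTuple):
    group: str
    store_path_index: int    # M00 -> 0 (store_path0), M01 -> 1 (store_path1), ...
    subdirs: tuple[str, str]  # the two-level data directory, e.g. ("02", "11")
    filename: str


# group / Mxx / two hex-named directory levels / generated file name
_FILE_ID = re.compile(r"^([^/]+)/M([0-9A-Fa-f]{2})/([0-9A-Fa-f]{2})/([0-9A-Fa-f]{2})/([^/]+)$")


def parse_file_id(file_id: str) -> FileId:
    m = _FILE_ID.match(file_id)
    if m is None:
        raise ValueError(f"not a FastDFS-style file ID: {file_id!r}")
    group, mxx, d1, d2, name = m.groups()
    return FileId(group, int(mxx, 16), (d1, d2), name)


def local_path(fid: FileId, store_paths: list[str]) -> str:
    # Assumed layout: <store_pathN>/data/<d1>/<d2>/<filename>
    base = store_paths[fid.store_path_index]
    return f"{base}/data/{fid.subdirs[0]}/{fid.subdirs[1]}/{fid.filename}"


fid = parse_file_id("group1/M00/02/11/aJxAeF21O5wAAAAAAAAGaEIOA12345.sh")
print(fid)
print(local_path(fid, ["/data/fastdfs"]))
```

Since the group name and the path are both inside the file ID, the client only needs to keep this one string to find the file again later.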


Origin blog.csdn.net/weixin_49741990/article/details/113049359