GFS Distributed File System

1. Introduction: a brief introduction to GFS and its design principles

    GFS is a large-scale distributed file system built on inexpensive commodity servers.

Design Principles:

  • Failure of GFS components is the norm, not the exception. GFS runs on ordinary commodity PCs whose reliability is not strongly guaranteed, so components may stop working at any time.
  • Most of the files stored in GFS are large files of several gigabytes or more.
  • Most files are modified by appending new data at the end rather than overwriting existing data. Random writes are practically nonexistent. Once written, a file is only read, usually sequentially.

2. Overall structure

    The overall architecture of the GFS file system is shown in the figure:

 [Figure: GFS file system architecture]

The GFS file system consists of three parts:

  • GFS master server (master)
  • chunkserver
  • GFS client

(2.1) Master server (master)

    GFS uses a single-master (master-slave) structure. The advantage is that there is one global master node for the entire storage system, which makes management relatively simple. The disadvantage is that many requests have to go through the master, so it can easily become the bottleneck of the whole system.

     The role of the master includes the following aspects:

    i. Maintain the system metadata: the namespace (file and chunk namespaces), the mapping from files to chunks, and the storage locations of chunk replicas. The first two must be persisted to disk; the third can be rebuilt from chunkserver reports (the master needs less than 64 bytes of metadata to manage a 64 MB chunk).
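
As a rough illustration, the master's in-memory metadata could be organized along the following lines. This is a minimal Go sketch; all type and field names are hypothetical, not taken from GFS itself.

```go
package master

// ChunkHandle is the globally unique 64-bit identifier of a chunk.
type ChunkHandle uint64

// ChunkInfo is what the master keeps per chunk. Replica locations are
// volatile: they are rebuilt from chunkserver reports rather than persisted.
type ChunkInfo struct {
	Handle    ChunkHandle
	Version   uint64   // used for stale-replica detection (section 3.6)
	Locations []string // addresses of chunkservers holding a replica
}

// FileMeta maps a file to its ordered list of chunks.
type FileMeta struct {
	Chunks []ChunkHandle
}

// Metadata groups the master's state: the namespace and the file-to-chunk
// mapping must be persisted (operation log + checkpoints); chunk locations
// are filled in from chunkserver heartbeats.
type Metadata struct {
	Namespace map[string]*FileMeta       // persisted
	Chunks    map[ChunkHandle]*ChunkInfo // replica locations are volatile
}
```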

    ii. Load balancing: including chunk creation, re-replication, and rebalancing

        When a chunk is created, the storage locations of its replicas are chosen according to the following factors (see the sketch after this list):

       1) Prefer chunkservers whose disk utilization is below average;

       2) Limit the number of recent creations on each chunkserver, because creating a chunk usually means a large amount of write traffic will follow;

       3) To improve availability, GFS tries to place the replicas of the same chunk on different racks.
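
A minimal sketch of how those three factors could be combined when picking chunkservers for a new chunk. The names and the exact policy are illustrative assumptions; the real GFS policy is more involved.

```go
package placement

import "sort"

// ServerState is the per-chunkserver information the placement decision uses.
type ServerState struct {
	Addr            string
	DiskUtilization float64 // fraction of disk in use
	RecentCreations int     // chunks created on this server recently
	Rack            string
}

// pickReplicaTargets chooses up to n chunkservers for a new chunk: prefer low
// disk utilization, skip servers with too many recent creations, and spread
// replicas across racks.
func pickReplicaTargets(servers []ServerState, n, maxRecent int) []string {
	sort.Slice(servers, func(i, j int) bool {
		return servers[i].DiskUtilization < servers[j].DiskUtilization
	})
	var targets []string
	usedRacks := map[string]bool{}
	for _, s := range servers {
		if len(targets) == n {
			break
		}
		if s.RecentCreations >= maxRecent || usedRacks[s.Rack] {
			continue
		}
		targets = append(targets, s.Addr)
		usedRacks[s.Rack] = true
	}
	return targets
}
```

If there are not enough racks to satisfy the diversity constraint, this sketch returns fewer targets; a real implementation would relax the constraint instead.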

    iii. Garbage collection: GFS adopts a delayed collection strategy. When a file is deleted, GFS just renames it in the metadata to a hidden name that includes the deletion time. The master scans regularly, and if it finds that a file has been deleted for longer than a certain period of time, it removes its metadata from memory. When a chunkserver reports its chunks through the heartbeat, the master replies with the chunks that no longer exist in its metadata, and the chunkserver is then free to delete those replicas.
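
A compressed sketch of the delayed-collection idea. The hidden-name format and all names here are illustrative assumptions; the grace period is configurable in GFS and is assumed to be three days below.

```go
package gc

import (
	"strings"
	"time"
)

// gcGracePeriod is an assumed grace period before metadata is dropped.
const gcGracePeriod = 3 * 24 * time.Hour

// namespace maps a file path (starting with "/") to its metadata.
type namespace map[string]interface{}

// deleteFile reclaims nothing: it only renames the file to a hidden name that
// carries the deletion timestamp.
func deleteFile(ns namespace, path string, now time.Time) {
	hidden := "/.deleted/" + now.UTC().Format(time.RFC3339) + path
	ns[hidden] = ns[path]
	delete(ns, path)
}

// gcScan runs periodically on the master: hidden files older than the grace
// period have their metadata dropped. The chunk replicas themselves are freed
// later, when a chunkserver heartbeat mentions chunks the master no longer knows.
func gcScan(ns namespace, now time.Time) {
	for name := range ns {
		rest := strings.TrimPrefix(name, "/.deleted/")
		if rest == name {
			continue // not a hidden (deleted) file
		}
		slash := strings.Index(rest, "/")
		if slash < 0 {
			continue
		}
		if ts, err := time.Parse(time.RFC3339, rest[:slash]); err == nil && now.Sub(ts) > gcGracePeriod {
			delete(ns, name)
		}
	}
}
```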

    iv. Snapshots (can be taken almost instantly, with almost no interruption to other ongoing operations)

        Snapshots are created using copy-on-write. The procedure is as follows:

        1) Revoke, via the lease mechanism, the write permission on every chunk of the file, stopping writes to it (the master records this operation to disk as a log entry)

        2) The master copies the metadata, such as the file name, to generate the new snapshot file

        3) Increment the reference count on all chunks of the snapshotted file

        After the snapshot, when a client writes to one of the chunks involved in the snapshot, the steps are (see the sketch below):

        1) The client asks the master who currently holds the lease

         2) The master sees that the reference count of the chunk is greater than 1, tells the chunkservers to copy the chunk, and all subsequent appends from the client are redirected to the newly copied chunk
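
Putting the two lists above together, the copy-on-write bookkeeping might look roughly like this. It is a sketch under my own assumptions, not the actual GFS implementation; in particular, how the new handle is generated and how the chunkserver-side copy is triggered are elided.

```go
package snapshot

// master holds the chunk reference counts; a count greater than 1 means the
// chunk is shared between a live file and at least one snapshot.
type master struct {
	refCount map[uint64]int // chunk handle -> reference count
}

// snapshotChunks is step 3 of taking a snapshot: after leases are revoked and
// the file metadata duplicated, every chunk of the file gets its count bumped.
func (m *master) snapshotChunks(handles []uint64) {
	for _, h := range handles {
		m.refCount[h]++
	}
}

// prepareWrite is what happens when a client later asks to write chunk h:
// if the chunk is shared, the master has the chunkservers copy it first, and
// the write is redirected to the new chunk (newHandle is chosen by the master).
func (m *master) prepareWrite(h, newHandle uint64) (target uint64) {
	if m.refCount[h] > 1 {
		m.refCount[h]--           // the original chunk now has one fewer user
		m.refCount[newHandle] = 1 // the copy belongs only to the live file
		// ...ask each chunkserver holding h to clone it locally as newHandle...
		return newHandle
	}
	return h
}
```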

(2.2) Chunkserver

        Files in GFS are stored as fixed-size chunks (64 MB), and each chunk is identified by a globally unique 64-bit chunk handle. By default, each chunk is stored as three replicas.

3. Key issues

     3.1. Chunk size

      In GFS, the default chunk size is 64 MB, which is much larger than the block size of a typical file system (usually 4 KB).

      Advantages:

  • It reduces the number of interactions between the GFS client and the GFS master. When the chunk size is relatively large, the data of a chunk is likely to be read many times, so the client needs to ask the master for the chunk's location information less often.
  • For the same chunk, the GFS client can keep a persistent connection to the GFS chunkserver, improving read performance.
  • The larger the chunk size, the smaller the total size of the chunk metadata, which makes it possible to keep all chunk-related metadata in the GFS master's memory (a rough estimate follows this list).
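
A rough back-of-the-envelope estimate for the third point, using the figure of roughly 64 bytes of master metadata per 64 MB chunk mentioned in section 2.1:

      1 PiB of data ÷ 64 MiB per chunk ≈ 16.8 million chunks
      16.8 million chunks × 64 B of metadata ≈ 1 GiB of master memory

With a 4 KB block size, the same petabyte would need about 16,000 times as many entries, which would not fit comfortably in a single master's memory.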

      Disadvantages:

  • When the chunk size is larger, some files may consist of only a single chunk, so all reads and writes of such a file fall on one GFS chunkserver, which can become a hot spot

For the hotspot problem, the solution Google gives is for the application layer to avoid reading and writing the same chunk at high frequency at the same time. A possible alternative is also mentioned in which a GFS client reads the data from other GFS clients.

    3.2. Lease Mechanism

        GFS is designed with a single master node. If all reads and writes went through the master, it would become the bottleneck of the system. Therefore, in GFS the master delegates write authority over a chunk to a chunkserver through the lease mechanism. The chunkserver holding the lease is the primary; the others are secondaries. A lease covers a single chunk and lasts 60 seconds; if nothing has gone wrong when the lease expires, the master will usually grant the lease again to the chunkserver that held it before.
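
A minimal sketch of the lease bookkeeping on the master. The 60-second figure comes from the text above; everything else, including the names and the candidate-selection logic, is an illustrative assumption.

```go
package lease

import "time"

const leaseDuration = 60 * time.Second

// Lease records which chunkserver is currently the primary for a chunk.
type Lease struct {
	Primary string // address of the chunkserver holding write authority
	Expires time.Time
}

type master struct {
	leases map[uint64]*Lease // chunk handle -> current lease
}

// grantLease is called when a client asks who may coordinate writes to chunk h.
// If a valid lease exists it is reused; otherwise the master (re)grants one,
// preferring the chunkserver that held the previous lease. Assumes candidates
// (replica holders of h) is non-empty.
func (m *master) grantLease(h uint64, now time.Time, candidates []string) *Lease {
	if l, ok := m.leases[h]; ok {
		if now.Before(l.Expires) {
			return l // lease still valid: the same primary keeps write permission
		}
		// expired: try to give the lease back to the previous primary first
		candidates = append([]string{l.Primary}, candidates...)
	}
	l := &Lease{Primary: candidates[0], Expires: now.Add(leaseDuration)}
	m.leases[h] = l
	return l
}
```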

    3.3. Consistency: GFS guarantees that a record append succeeds at least once

        1) When an error occurs, the client re-issues the append, so a record may end up appended more than once on some replicas (duplicate records); there may also be identifiable padding records.

        2) GFS clients support concurrent appends. The ordering between multiple clients is not guaranteed, and even records appended successively by the same client may be interleaved with records from others.
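
Because appends are only guaranteed at-least-once, applications typically make records self-identifying so that padding and duplicates can be skipped when reading. A sketch of that reader-side filtering; GFS itself does not do this, and the record layout here is an assumption.

```go
package apprecords

// Record is an application-level record read back from a GFS file. How padding
// is recognized and how IDs are assigned is up to the application.
type Record struct {
	ID      uint64 // unique identifier written by the producer
	Padding bool   // true for filler records left by failed or partial appends
	Data    []byte
}

// filter drops padding and duplicate records, keeping the first occurrence of
// each ID.
func filter(records []Record) []Record {
	seen := make(map[uint64]bool)
	var out []Record
	for _, r := range records {
		if r.Padding || seen[r.ID] {
			continue
		}
		seen[r.ID] = true
		out = append(out, r)
	}
	return out
}
```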

    3.4. Fault tolerance mechanism

        1) Master fault tolerance mechanism:

          i. Operation log and checkpoints. The master's operation log and checkpoint files are replicated on multiple machines.

         ii. "Shadow" master servers: read-only mirrors that lag slightly behind the master, usually by less than one second.

        iii. The namespace and the file-to-chunk mapping metadata are persisted.

        2) Chunkserver fault tolerance mechanism

          i. Multiple replicas of each chunk are stored.

         ii. Checksums on stored data. Each GFS chunk is divided into blocks of a fixed size, and each block has a corresponding checksum. When reading a chunk replica, the chunkserver checks the read data against the checksum. If they do not match, an error is returned to the client and the master is notified. The client then reads the data from another replica, while the master clones the chunk from another replica for recovery. When the new replica is ready, the master tells the faulty chunkserver to delete the corrupted replica.
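
A sketch of the per-block checksum check a chunkserver might perform on read. The 64 KB block size and 32-bit checksums match the GFS paper; the use of CRC-32 and all names here are purely illustrative.

```go
package checksum

import (
	"errors"
	"hash/crc32"
)

const blockSize = 64 * 1024 // 64 KB blocks, each with its own 32-bit checksum

var errCorrupt = errors.New("checksum mismatch: replica is corrupt")

// verifyRead checks every block covered by a read against its stored checksum
// before the data is returned to the client. On a mismatch the chunkserver
// returns an error to the client and reports the corruption to the master.
func verifyRead(data []byte, blockChecksums []uint32, firstBlock int) error {
	for i := 0; i*blockSize < len(data); i++ {
		end := (i + 1) * blockSize
		if end > len(data) {
			end = len(data)
		}
		if crc32.ChecksumIEEE(data[i*blockSize:end]) != blockChecksums[firstBlock+i] {
			return errCorrupt
		}
	}
	return nil
}
```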

    3.5. Append process

       The append path of GFS is characterized by pipelining and the separation of the data flow from the control flow. Pipelining reduces latency. Separating the data flow from the control flow allows the data transfer to be optimized: the data is pushed along a carefully chosen chain of chunkservers, each machine forwarding it to the nearest chunkserver that has not yet received it.
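
A sketch of the chunkserver-side forwarding that makes the data flow pipelined: each server starts passing data to the next server in the chain as soon as pieces arrive, rather than waiting for the whole payload. The piece size, the chain representation, and the send callback are assumptions for illustration; the control message (the actual write request to the primary) travels separately.

```go
package pipeline

// forward is what each chunkserver in the chain conceptually does: buffer the
// incoming data locally and, at the same time, forward it to the next server.
// Forwarding in fixed-size pieces stands in for "forward as soon as bytes arrive".
func forward(data []byte, rest []string, send func(addr string, piece []byte)) {
	if len(rest) == 0 {
		return // end of the chain
	}
	const pieceSize = 64 * 1024
	for off := 0; off < len(data); off += pieceSize {
		end := off + pieceSize
		if end > len(data) {
			end = len(data)
		}
		send(rest[0], data[off:end]) // forward each piece as soon as it is available
	}
}
```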

       

   3.6. Stale replica detection

     When a chunkserver fails, its chunk replicas may become stale because they miss some mutations. The master keeps a version number for each chunk to distinguish up-to-date replicas from stale ones.

     Whenever the master grants a new lease on a chunk, it increments the chunk's version number and notifies the up-to-date replicas of the new version.
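
A minimal sketch of the version-number check (names are illustrative):

```go
package version

// chunkVersions is the master's record of the current version of each chunk.
type chunkVersions map[uint64]uint64

// grantLease: whenever the master grants a new lease on a chunk it bumps the
// version number; the new version is then sent to the up-to-date replicas.
func (v chunkVersions) grantLease(handle uint64) uint64 {
	v[handle]++
	return v[handle]
}

// isStale is used when a chunkserver reports a replica: if the reported
// version is older than the master's, the replica missed mutations and is
// garbage-collected rather than served to clients.
func (v chunkVersions) isStale(handle, reportedVersion uint64) bool {
	return reportedVersion < v[handle]
}
```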
