Hadoop 4 - HDFS Distributed File System Principles

I. Introduction

  1, Distributed file system structure

    A distributed file system stores data across multiple computer nodes in a cluster. The nodes fall into two categories:

    The master node (Master Node), also called the name node (NameNode)

    The slave nodes (Slave Node), also called data nodes (DataNode)

  2, Benefits of HDFS

    Compatibility with inexpensive hardware devices

    Streaming data reads and writes

    Support for large data sets

    A simple file model

    Strong cross-platform compatibility

  3, Limitations

    Not suitable for low-latency data access

    Cannot store large numbers of small files efficiently

    Does not support multiple writers or arbitrary modification of files

II. Concepts

  1, Blocks

    An HDFS block is 64 MB by default (128 MB in the 2.x versions). A file is split into multiple blocks, and the block is the basic unit of storage

    The block size is much larger than in an ordinary file system, which minimizes the seek overhead per block

    The block abstraction in HDFS brings several significant benefits:

      Support for large-scale file storage: files are stored in units of blocks, so a large file can be split into many blocks that are distributed across different nodes. A file's size is therefore not limited by the storage capacity of a single node and can be far larger than the storage capacity of any node in the network

      Simplified system design: storage management is simplified because the block size is fixed, which makes it easy to calculate how many blocks a node can hold; metadata management is also simplified, since metadata does not need to be stored together with the file blocks and can be managed separately by other systems

      Suitability for data backup: each block can be redundantly stored across multiple nodes, which improves the system's fault tolerance and availability (see the block-listing sketch below)
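
    As a concrete illustration, here is a minimal sketch using the HDFS Java API to list a file's blocks and the data nodes holding each one. The NameNode address and file path are placeholder assumptions, not values from this article, and a configured, reachable cluster is assumed.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.BlockLocation;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class ListBlocks {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address
              try (FileSystem fs = FileSystem.get(conf)) {
                  Path file = new Path("/data/example.txt");    // hypothetical file
                  FileStatus status = fs.getFileStatus(file);
                  // One BlockLocation per block; each lists the hosts storing a replica.
                  BlockLocation[] blocks =
                      fs.getFileBlockLocations(status, 0, status.getLen());
                  for (BlockLocation b : blocks) {
                      System.out.printf("offset=%d length=%d hosts=%s%n",
                          b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
                  }
              }
          }
      }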

  2, The name node (NameNode)

    The name node is responsible for managing the distributed file system's namespace (Namespace) and maintains two core data structures: FsImage and EditLog

      FsImage maintains the metadata of the file system tree and of all the files and folders in the tree

      EditLog is an operation log file that records all operations on files, such as create, delete, and rename

      The name node also records the location information of each file, i.e., the data nodes on which each block of the file resides

        

  3, The FsImage file

    The FsImage file contains serialized forms of all the directory and file inodes in the file system. Each inode is an internal representation of a file's or directory's metadata and contains information such as: the file's replication level, access and modification times, access permissions, the block size, and the blocks that make up the file. For a directory, it stores the modification time, permissions, and quota metadata

    The FsImage file does not record which data nodes each block is stored on. Instead, the name node keeps these mappings in memory: when a data node joins the HDFS cluster, it reports the list of blocks it holds to the name node, and it repeats this report periodically, so the name node's block map always stays up to date
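
    The FsImage/EditLog split is an instance of the general snapshot-plus-log pattern. Below is a toy, self-contained Java sketch of that pattern only; it is illustrative and is not HDFS source code.

      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      // Toy illustration of the FsImage/EditLog idea; not HDFS source code.
      class ToyNamespace {
          private final Map<String, String> snapshot = new HashMap<>(); // "FsImage"
          private final List<String[]> editLog = new ArrayList<>();     // "EditLog"

          // Updates are appended to the log, not applied to the snapshot.
          void create(String path, String meta) { editLog.add(new String[]{"CREATE", path, meta}); }
          void delete(String path)              { editLog.add(new String[]{"DELETE", path, null}); }

          // Startup/checkpoint: replay the log over the snapshot, then clear it.
          Map<String, String> checkpoint() {
              for (String[] op : editLog) {
                  if (op[0].equals("CREATE")) snapshot.put(op[1], op[2]);
                  else snapshot.remove(op[1]);
              }
              editLog.clear(); // a fresh, empty "EditLog"
              return snapshot;
          }
      }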

  4, Name node startup process

    1) Start the name node → load the FsImage file into memory → replay the EditLog → once in sync, the in-memory metadata becomes readable by clients

    2) Once the in-memory metadata mapping is complete → create a new FsImage file plus an empty EditLog file

    Note: once the name node is up, HDFS writes every update operation to the new EditLog file. This is because the FsImage file is generally large (GB scale is very common); if every update operation were applied directly to the FsImage file, the system would run very slowly. Writing to the EditLog does not have this problem, because the EditLog is much smaller. After each write operation is executed, and before a success code is sent back to the client, the edits file is synchronously updated

  5, The problem of the EditLog growing while the name node runs

    During operation, all update operations in HDFS → are written to the EditLog → over time → the EditLog file becomes very large

    This has little effect on HDFS while it is running, but on a restart the name node must load the entire FsImage into memory and then replay the EditLog entry by entry. When the EditLog file is very large, the name node starts very slowly, and during that period it is in safe mode and cannot serve writes, which affects users

    Solving this problem requires the second name node, the SecondaryNameNode

  6, The second name node: SecondaryNameNode

    1) It keeps a backup of the name node's HDFS metadata and reduces the name node's restart time; it typically runs on a separate machine

    2) How the SecondaryNameNode works (summarized in the sketch after this list)

      1. The SecondaryNameNode periodically communicates with the NameNode and asks it to stop using the EditLog file; new updates are temporarily written to a new file, edit.new. The switch completes instantly, and the upper layers writing to the log notice no difference

      2. The SecondaryNameNode fetches the FsImage and EditLog files from the NameNode via HTTP GET and downloads them to a local directory

      3. The SecondaryNameNode loads the downloaded FsImage into memory and then replays the operations in the EditLog one by one, bringing the in-memory FsImage up to date; this merges the EditLog into the FsImage

      4. After the merge finishes, the SecondaryNameNode sends the new FsImage file to the NameNode via HTTP POST

      5. The NameNode replaces its old FsImage file with the new one received from the SecondaryNameNode and replaces the EditLog file with edit.new, so the EditLog becomes small again
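
    The five steps can be condensed into the control flow below. This is a hypothetical sketch: the interface and method names are illustrative, not Hadoop's real classes or API.

      // Hypothetical sketch of one SecondaryNameNode checkpoint cycle.
      interface NameNodeEndpoint {
          void rollEditLog();                    // step 1: switch writes to edit.new
          byte[] download(String name);          // HTTP GET of fsimage / editlog
          void upload(String name, byte[] data); // HTTP POST of the new fsimage
      }

      class CheckpointLoop {
          // Placeholder for "load FsImage, then apply EditLog entries in order".
          static byte[] replay(byte[] image, byte[] edits) {
              byte[] merged = new byte[image.length + edits.length];
              System.arraycopy(image, 0, merged, 0, image.length);
              System.arraycopy(edits, 0, merged, image.length, edits.length);
              return merged;
          }

          static void runOnce(NameNodeEndpoint nn) {
              nn.rollEditLog();                       // 1. new updates go to edit.new
              byte[] image = nn.download("fsimage");  // 2. fetch FsImage...
              byte[] edits = nn.download("editlog");  //    ...and EditLog
              byte[] merged = replay(image, edits);   // 3. merge them in memory
              nn.upload("fsimage", merged);           // 4. send the new FsImage back
              // 5. the NameNode swaps in the new FsImage and renames edit.new
          }
      }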

 

  7, The data node (DataNode)

    Data is distributed across the data nodes, which are responsible for storing and retrieving it. A data node stores and retrieves data under the scheduling of clients or the name node, and periodically sends the list of blocks it stores to the name node
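
    A data node's two periodic duties, heartbeats and block reports, can be pictured as the loop below. This is an illustrative sketch with made-up names, not Hadoop's internal code; the 3-second heartbeat matches the usual default, while the report interval here is arbitrary.

      import java.util.List;

      // Illustrative sketch of a data node's reporting loop; not Hadoop source.
      interface NameNodeProtocolSketch {
          void heartbeat(String dataNodeId);                         // "I am alive"
          void blockReport(String dataNodeId, List<Long> blockIds);  // "I hold these blocks"
      }

      class DataNodeLoop {
          static void run(NameNodeProtocolSketch nn, String id, List<Long> localBlocks)
                  throws InterruptedException {
              long tick = 0;
              while (true) {
                  nn.heartbeat(id);            // frequent liveness signal
                  if (tick % 200 == 0) {       // occasional full block report
                      nn.blockReport(id, localBlocks);
                  }
                  tick++;
                  Thread.sleep(3_000);
              }
          }
      }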

 

III. Architecture

    HDFS adopts a master/slave (M/S) architecture. A name node acts as the central server, managing the file system's namespace and client access to files. The data nodes handle read/write requests from clients and create, delete, and replicate blocks under the scheduling of the name node

  1, Namespace

    The HDFS namespace contains directories, files, and blocks

    HDFS uses a traditional hierarchical file system model, so it can be used like an ordinary file system: creating/deleting directories and files, renaming, moving between directories, and so on

  2, Communication protocols

    The HDFS communication protocols are built on top of TCP/IP

    A client initiates a TCP connection to the name node through a configurable port and interacts with the name node using the client protocol

    Data nodes interact with the name node using the data node protocol

    Interaction between the client and the data nodes is implemented via RPC (Remote Procedure Call). By design, the name node never initiates RPC requests; it only responds to RPC requests from clients and data nodes
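
    The client side of this can be seen in the hedged snippet below: the name node's RPC endpoint is supplied through configuration, and every FileSystem call then travels over the client protocol. The host name and port are placeholders for your cluster (8020 and 9000 are common NameNode RPC ports).

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class Connect {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              URI nameNode = URI.create("hdfs://namenode:8020"); // placeholder endpoint
              try (FileSystem fs = FileSystem.get(nameNode, conf)) {
                  // This call is an RPC that the NameNode answers (it never initiates RPC).
                  System.out.println("Default block size: "
                      + fs.getDefaultBlockSize(new Path("/")));
              }
          }
      }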

  3, The client

    The client is the most common way for users to operate HDFS, and an HDFS deployment provides one

    The HDFS client is a library that exposes the HDFS file system interface and hides most of the complexity of the HDFS implementation

    Strictly speaking, the client is not part of HDFS

    The client supports common operations such as open, read, and write, and provides a shell-like command line for accessing data in HDFS

    HDFS also provides a Java API as the application programming interface through which client applications access the file system
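
    A minimal sketch of that Java API follows, assuming a configured, reachable cluster; the paths are hypothetical. It shows a few of the common operations named above.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class BasicOps {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration(); // reads core-site.xml, hdfs-site.xml
              try (FileSystem fs = FileSystem.get(conf)) {
                  Path dir = new Path("/user/demo");    // hypothetical directory
                  if (!fs.exists(dir)) {
                      fs.mkdirs(dir);                   // create a directory
                  }
                  for (FileStatus s : fs.listStatus(dir)) {
                      System.out.println(s.getPath() + "  " + s.getLen() + " bytes");
                  }
                  fs.delete(new Path("/user/demo/old.txt"), false); // non-recursive delete
              }
          }
      }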

  4, Limitations

    HDFS has only one name node. This simplifies the system design, but it also brings limitations:

    1) Namespace limits: the namespace is kept in memory, so the number of objects (files, blocks) the name node can hold is constrained by its memory size

    2) Performance bottleneck: the overall throughput of the distributed file system is limited by the throughput of the single name node

    3) Isolation problem: because the cluster has only one name node and therefore only one namespace, different applications cannot be isolated from each other

    4) Cluster availability: once the single name node fails, the entire cluster becomes unavailable

IV. Storage Principles

  1, Redundant data storage

    HDFS uses a multi-replica scheme for data redundancy. The advantages of multiple replicas are:

    1) Faster data transfer rates

    2) Easier checking for data errors

    3) Guaranteed data reliability

  2, Data access strategy

    1) Data placement

    First replica: placed on the data node that uploads the file; if the upload comes from outside the cluster, a node is chosen at random among nodes whose disks are not too full and whose CPUs are not too busy

    Second replica: placed on a node in a different rack from the first replica

    Third replica: placed on another node in the same rack as the first replica

    Further replicas: placed on random nodes (see the replication sketch below)
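
    The replication factor behind this placement is a per-file setting and can be changed through the Java API; a hedged example with a hypothetical path:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class SetReplication {
          public static void main(String[] args) throws Exception {
              try (FileSystem fs = FileSystem.get(new Configuration())) {
                  Path file = new Path("/data/example.txt"); // hypothetical file
                  // Ask for 3 replicas; the name node then places them
                  // according to the rack-aware strategy described above.
                  boolean ok = fs.setReplication(file, (short) 3);
                  System.out.println("replication change scheduled: " + ok);
              }
          }
      }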

  3, Data reading

    HDFS provides an API for determining the rack ID a data node belongs to, and a client can call the API to get its own rack ID

    When a client reads data, it obtains from the name node the list of locations of the data block's replicas; the list contains the data nodes holding the replicas. The client looks for a replica on a data node whose rack ID matches its own and preferentially reads that copy; if there is none, it picks a replica at random

  4, Fault tolerance: data errors and recovery

    HDFS has high fault tolerance and is compatible with low-cost hardware: it treats hardware errors as the norm rather than the exception, and it has mechanisms for detecting data errors and recovering automatically. Fault tolerance mainly covers name node errors, data node errors, and data errors

    1) Name node errors

      Recall the two core files: FsImage and EditLog

      If both of these files are corrupted, the entire HDFS instance fails

      HDFS provides the SecondaryNameNode, which backs up these two files so that the FsImage and EditLog data can be recovered when necessary

    2) Data node errors

      Each data node periodically reports its own state to the name node via heartbeat messages. If a data node has a problem, it is marked as down and the data on it is marked unreadable, and the name node no longer sends it any I/O requests

      When some data nodes become unavailable in this way, the number of replicas of some blocks may fall below the redundancy factor. The name node checks for this periodically; once it detects it, it starts replicating the affected data to generate new replicas

      The biggest difference between HDFS and other distributed file systems is that it can adjust the placement of redundant data

    3) Data errors

      Network or disk factors can cause data errors. After reading data, the client verifies it with md5/sha1 checksums to confirm that it read correct data

      When a file is created, the client computes checksum information for each file block and writes this information into a hidden file under the same path

      When the client reads the file, it first reads this checksum information and then uses it to verify each block. If verification fails, the client requests the block from another data node and reports the error to the name node, which then checks the block periodically and re-replicates it (sketched below)
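
    The verify-and-fall-back logic can be sketched as below. This is a toy illustration of the idea using the JDK's MD5, not HDFS's actual checksum code (the real client library handles verification transparently).

      import java.io.IOException;
      import java.security.MessageDigest;
      import java.util.Arrays;
      import java.util.List;

      // Toy sketch of read-time verification with fallback; not HDFS source code.
      class ChecksumRead {
          interface Replica {
              byte[] read();
              String node();
          }

          static byte[] readVerified(List<Replica> replicas, byte[] expectedMd5)
                  throws Exception {
              MessageDigest md5 = MessageDigest.getInstance("MD5");
              for (Replica r : replicas) {
                  byte[] data = r.read();
                  if (Arrays.equals(md5.digest(data), expectedMd5)) {
                      return data; // checksum matches: correct data was read
                  }
                  // Mismatch: report r.node() to the name node (omitted here)
                  // and fall through to try the next replica.
              }
              throw new IOException("all replicas failed checksum verification");
          }
      }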

 

V. Data Read and Write Processes

  1, Reading a file

    The simplified process: open the file → get block information → issue a read request → read data → get the next block's information → read data → close the file
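
    Through the Java API the whole read flow collapses to open/read/close; the block lookups happen behind FSDataInputStream. A minimal sketch with a hypothetical path:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataInputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IOUtils;

      public class ReadFile {
          public static void main(String[] args) throws Exception {
              try (FileSystem fs = FileSystem.get(new Configuration());
                   FSDataInputStream in = fs.open(new Path("/data/example.txt"))) {
                  // open() fetches block locations from the name node; the stream
                  // then reads each block from a (preferably nearby) data node.
                  IOUtils.copyBytes(in, System.out, 4096, false);
              }
          }
      }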

  2, Writing a file

    The simplified process: create the file → create the file's metadata → write data → the data is sent as packets → acknowledgment packets are received → close the file → the write operation is complete
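
    The corresponding write flow through the Java API, again with a hypothetical path; behind FSDataOutputStream the bytes are split into packets and pipelined through the replica data nodes:

      import java.nio.charset.StandardCharsets;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class WriteFile {
          public static void main(String[] args) throws Exception {
              try (FileSystem fs = FileSystem.get(new Configuration());
                   FSDataOutputStream out = fs.create(new Path("/data/out.txt"))) {
                  // create() registers the file's metadata with the name node;
                  // each written packet is acknowledged by the replica pipeline.
                  out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
              }
          }
      }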

 

Full reference: http://dblab.xmu.edu.cn/blog/290-2/

 

Please credit the source when reposting: https://www.cnblogs.com/zhangxingeng/p/11819418.html

 

 
