NameNode metadata management in HDFS (super detailed)

What is metadata

  • In HDFS, metadata mainly refers to file-related metadata, which is managed and maintained by the NameNode. In a broader sense, because the NameNode also has to manage many DataNodes, the location and health information of each DataNode counts as metadata as well.

Metadata management overview

In HDFS, there are two types of file-related metadata:

  • File own attribute information
    file name, permissions, modification time, file size, replication factor, data block size
  • File block location mapping information
    records the mapping between file blocks and DataNodes, that is, which block is located on which node.

By storage form, metadata comes in two types: in-memory metadata and metadata files, stored in memory and on disk respectively.

Memory metadata

  • To ensure efficient, low-latency metadata operations for users, the NameNode stores all metadata in memory; we call this in-memory metadata. The metadata in memory is the most complete, including both file attributes and file block location mappings.
  • But memory has a fatal flaw: its contents are lost on power failure and are never persisted on their own. The NameNode therefore relies on metadata files on disk to guarantee the safety and integrity of the metadata.

Metadata files

There are two types of metadata files: the fsimage memory image file and the edits log.

fsimage memory image file

  • It is a persistent checkpoint of the in-memory metadata. However, fsimage contains only the metadata related to the files' own attributes; it does not contain file block location information. Block location information is kept only in memory: it is obtained from the block reports that DataNodes send when they start and join the cluster, and DataNodes keep reporting their blocks at specified intervals afterwards.
  • Persisting is an I/O process that moves data from memory to disk. It has a certain impact on the NameNode's normal service, so it cannot be done frequently.

edits log (edit log)

To avoid losing the changes made between two persistence points, the edits log file was designed. It records every change operation on HDFS (file creation, deletion, or modification). Change operations performed by file system clients are recorded in the edits file first.


The order in which the NameNode loads metadata files

  • Both the fsimage and edits files are serialized. When the NameNode starts, it loads the contents of the fsimage file into memory and then replays the operations recorded in the edits file, so that the metadata in memory is synchronized with the actual state of the file system. This in-memory metadata serves client read operations and is the most complete copy of the metadata.
  • When a client adds or modifies a file in HDFS, the operation is first recorded in the edits log file. Once the client operation succeeds, the corresponding metadata is updated in the in-memory metadata. Because fsimage files are generally very large (GB scale is very common), appending every update operation to the fsimage file would make the system run very slowly.
  • The design of HDFS therefore focuses on two points: first, metadata updates and queries happen in memory, which greatly shortens response times; second, because in-memory metadata is at high risk of being lost (e.g., on a power outage), the metadata image file (fsimage) plus the edit log (edits) serve as a backup mechanism that keeps the metadata safe.
  • The NameNode maintains the metadata of the entire file system, so accurate metadata management directly affects HDFS's ability to provide file storage services.

Directories and files related to metadata management

  • The namenode metadata storage directory is specified by the parameter: dfs.namenode.name.dir

  • After formatting completes, the following files are created in the $dfs.namenode.name.dir/current directory, as sketched below.
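A hedged sketch of what that directory typically looks like right after formatting and a first start of the NameNode (the /data/hadoop/dfs/name path, transaction IDs, and file names are illustrative, not taken from the original listing):

```
$ ls /data/hadoop/dfs/name/current
VERSION
seen_txid
fsimage_0000000000000000000
fsimage_0000000000000000000.md5
edits_inprogress_0000000000000000001
in_use.lock
```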

  • dfs.namenode.name.dir is configured in the hdfs-site.xml file; its default value (from hdfs-default.xml) is shown below.
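For reference, the default value as it is declared in hdfs-default.xml:

```xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/name</value>
</property>
```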

Metadata related files

VERSION

  • namespaceID / clusterID / blockpoolID
    These are unique identifiers of the HDFS cluster. They prevent DataNodes from accidentally registering with a NameNode of a different cluster. The identifiers are especially important in federated deployments, where multiple NameNodes work independently: each NameNode manages its own unique namespace (namespaceID) and a unique pool of file blocks (blockpoolID), while the clusterID binds the whole cluster together as a single logical unit and is the same on every node in the cluster.
  • storageType
    indicates what kind of process's data this directory stores. On a NameNode it is storageType=NAME_NODE; on a DataNode it is storageType=DATA_NODE.
  • cTime
    the creation time of the NameNode storage system. This attribute is 0 when the file system is first formatted and is updated to a timestamp after a file system upgrade.
  • layoutVersion
    the version of the HDFS metadata format; it is updated when HDFS is upgraded.

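A sketch of a typical NameNode VERSION file; every value below is a made-up placeholder, shown only to illustrate the fields discussed above:

```
#Mon Aug 07 10:00:00 CST 2023
namespaceID=1073741824
clusterID=CID-f2c15e1a-0000-0000-0000-000000000000
cTime=0
storageType=NAME_NODE
blockpoolID=BP-123456789-192.168.1.100-1691373600000
layoutVersion=-65
```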

seen_txid

  • Contains the transaction ID recorded at the last checkpoint; this is not the last transaction ID accepted by the NameNode.
  • The content of seen_txid is not updated on every transaction; it is updated only when a checkpoint happens.
  • When the NameNode starts, it checks the seen_txid file to verify that it can load at least that many transactions. If it cannot, the NameNode aborts startup.
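seen_txid is a plain text file containing a single transaction ID, so it can be inspected directly; a quick illustrative check (the path and value are made up):

```
$ cat /data/hadoop/dfs/name/current/seen_txid
89
```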

Metadata file viewing (OIV, OEV)

  • The fsimage file is a persistent checkpoint of the Hadoop file system metadata. It contains the serialized information of every directory and file inode in the file system. For a file, this includes the modification time, access time, block size, and the blocks that make up the file; for a directory, it mainly includes the modification time, access control permissions, and similar attributes.
  • oiv is the abbreviation of offline image viewer; it dumps the contents of an hdfs fsimage file into a human-readable format.
  • Common usage: hdfs oiv -i fsimage_00000000000050 -p XML -o fsimage.xml (a fragment of the output is sketched below)
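A hedged sketch of one inode entry from the resulting fsimage.xml (the IDs, name, and timestamp are illustrative):

```xml
<inode>
  <id>16386</id>
  <type>DIRECTORY</type>
  <name>user</name>
  <mtime>1691373600000</mtime>
  <permission>root:supergroup:rwxr-xr-x</permission>
</inode>
```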
  • The edits log file stores the log of all update operations performed on the Hadoop file system.
  • All write operations performed by file system clients are first logged to the edits file.
  • oev is the abbreviation of offline edits viewer; this tool does not require the Hadoop cluster to be running.
  • Command: hdfs oev -i edits_0000000000000000001-0000000000000000089 -o edits.xml
  • In the output file, each RECORD records one operation; an example is sketched below:
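A hedged sketch of a single RECORD as oev renders it for a directory-creation operation (the transaction ID, inode ID, and path are illustrative):

```xml
<RECORD>
  <OPCODE>OP_MKDIR</OPCODE>
  <DATA>
    <TXID>11</TXID>
    <LENGTH>0</LENGTH>
    <INODEID>16387</INODEID>
    <PATH>/user/test</PATH>
    <TIMESTAMP>1691373600000</TIMESTAMP>
    <PERMISSION_STATUS>
      <USERNAME>root</USERNAME>
      <GROUPNAME>supergroup</GROUPNAME>
      <MODE>493</MODE>
    </PERMISSION_STATUS>
  </DATA>
</RECORD>
```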

Introduction to SecondaryNameNode

  • The SNN helps keep the edits log file small and produces an up-to-date fsimage file, which also reduces the pressure on the NameNode.

checkpoint mechanism

1. The core of a checkpoint is merging the fsimage and the edits log to generate a new fsimage. When a checkpoint is triggered, the NN first rolls its edit log: it generates a new edits file (edits.new) to record subsequent operations.
2. The SNN copies the old edits log files and the previous fsimage to its own local machine (using HTTP GET).
3. The SNN first loads the fsimage into memory, then executes the operations in the edits files one by one, so that the fsimage in memory is brought up to date. This process is the merge of the edits and fsimage files. After the merge completes, the SNN dumps the in-memory data to generate a new fsimage file.
4. The SNN copies the new fsimage file back to the NN node. This completes one cycle; the SNN then waits for the next checkpoint trigger to start working again, and the cycle repeats.
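Besides the SNN-driven cycle described above, an administrator can also force a checkpoint by hand; a minimal sketch (saveNamespace requires the NameNode to be in safe mode first):

```
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace    # merge the edits log into a new fsimage on the NN
hdfs dfsadmin -safemode leave
```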

SNN checkpoint triggering mechanism

  • hdfs-site.xml
    dfs.namenode.checkpoint.period=3600 // the time interval between two consecutive checkpoints; the default is one hour
    dfs.namenode.checkpoint.txns=1000000 // the maximum number of transactions that may accumulate without a checkpoint; if this count is reached before the checkpoint period elapses, an emergency checkpoint is forced. The default is 1 million transactions
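Expressed as hdfs-site.xml properties (the values shown are the defaults just described):

```xml
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
```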

Metadata file recovery

Multiple NameNode storage directories

  • The namenode metadata storage directory is determined by the parameter: dfs.namenode.name.dir
  • The dfs.namenode.name.dir property can be configured with multiple directories. Each directory stores exactly the same file structure and contents, so each is effectively a backup of the others. The advantage is that when one of the directories is damaged, Hadoop's metadata is not affected; in particular, when one of the directories is on NFS (a network file system), the metadata survives even if the machine itself is destroyed. A sketch of such a configuration follows.
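A sketch of such a configuration with one local directory and one NFS-mounted directory (both paths are illustrative):

```xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/hadoop/dfs/name,file:///mnt/nfs/dfs/name</value>
</property>
```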

Recover from SNN

  • During each checkpoint, the SNN downloads the fsimage and edits log into a storage directory on its own machine, and these copies are not deleted after the checkpoint.
  • So if something goes wrong with the fsimage on the NN, we can still replace it with the fsimage kept on the SNN. It may no longer be the very latest fsimage, but it minimizes the loss. Possible ways to perform the recovery are sketched below.
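Two hedged ways to carry out this recovery (all host names and paths are illustrative; /data/hadoop/dfs/namesecondary is assumed to be the SNN's dfs.namenode.checkpoint.dir):

```
# Option 1: copy the SNN's checkpoint files into the (emptied) NN metadata directory,
# then start the NameNode normally
scp -r snn-host:/data/hadoop/dfs/namesecondary/current/* \
    nn-host:/data/hadoop/dfs/name/current/

# Option 2: start the NameNode with -importCheckpoint; it loads the fsimage from the
# directory given by dfs.namenode.checkpoint.dir (dfs.namenode.name.dir must be empty)
hdfs namenode -importCheckpoint
```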
