The update process of Fsimage in HDFS

Before looking at how Fsimage is updated, it helps to understand why it needs to be updated at all.

HDFS runs in a master-slave mode, and the master, the NameNode, is the focus here. The NameNode manages the namespace of the file system: it maintains the file system tree and all files and directories in that tree. This information is stored persistently on the local disk in two files: the namespace image file Fsimage and the edit log file edits. These two files are the core files of the NameNode. When the NameNode starts, it first reads the Fsimage file and loads the directory tree into memory. The edits file stores log information: after the NameNode has started, every operation that adds, deletes, or modifies part of the directory structure is recorded in edits, and is not synchronized to Fsimage immediately.

When the NameNode is shut down, the Fsimage and edits files are not merged; the merge actually happens during NameNode startup. On startup, the NameNode first loads the Fsimage file, then replays the operations recorded in edits, writes the resulting up-to-date directory tree to a new Fsimage, and then starts using a new edits file.
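The startup merge described above can be sketched with a toy model. This is only an illustration, not Hadoop code: the namespace is modelled as a set of paths, and the edit format and the names `apply_edit` and `start_namenode` are hypothetical.

```python
# Toy model (not Hadoop code): the namespace is a set of paths and
# each edit is a tuple such as ("add", path) or ("rename", old, new).

def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace."""
    op = edit[0]
    if op == "add":
        namespace.add(edit[1])
    elif op == "delete":
        namespace.discard(edit[1])
    elif op == "rename":
        namespace.discard(edit[1])
        namespace.add(edit[2])
    else:
        raise ValueError(f"unknown op: {op}")


def start_namenode(fsimage, edits):
    """Load the image, replay edits in order, return (new_fsimage, new_edits)."""
    namespace = set(fsimage)          # 1. load Fsimage into memory
    for edit in edits:                # 2. replay every logged operation
        apply_edit(namespace, edit)
    return frozenset(namespace), []   # 3. new image plus a fresh, empty edits log


fsimage = {"/", "/a", "/b"}
edits = [("add", "/c"), ("delete", "/a"), ("rename", "/b", "/d")]
new_fsimage, new_edits = start_namenode(fsimage, edits)
print(sorted(new_fsimage))  # ['/', '/c', '/d']
```

The replay loop is linear in the number of logged operations, which is why a very large edits file translates directly into a long startup.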

This merge has a problem, however. If the edits file is extremely large, merging takes a long time, so startup also takes longer and becomes unpredictable; it may take several hours. This is where the SecondaryNameNode comes in. It wakes up according to configured rules and merges the Fsimage and edits files, which keeps the edits file from growing too large. Note that the SecondaryNameNode is not a hot backup of the NameNode and does not provide HA; its only job is to periodically merge edits and Fsimage, which takes that load off the NameNode.

Update of Fsimage

An update of Fsimage is triggered by either of two conditions, both set in the configuration file core-site.xml: fs.checkpoint.period [the interval between two checkpoints of the HDFS image, default one hour] and fs.checkpoint.size [the size of the edits file that triggers a merge even if the period has not elapsed, default 64MB]. These are the Hadoop 1.x property names; later releases use dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns in hdfs-site.xml instead.
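As a minimal sketch of how the two triggers combine, assuming the defaults quoted above: a checkpoint is due as soon as either condition holds. The function name `should_checkpoint` and its parameters are hypothetical and only model the decision; they are not part of Hadoop.

```python
# Defaults taken from the text above; the function itself is illustrative.
CHECKPOINT_PERIOD_SECONDS = 3600            # fs.checkpoint.period (one hour)
CHECKPOINT_SIZE_BYTES = 64 * 1024 * 1024    # fs.checkpoint.size (64 MB)


def should_checkpoint(seconds_since_last, edits_size_bytes,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      size=CHECKPOINT_SIZE_BYTES):
    """Return True when either the time trigger or the size trigger fires."""
    return seconds_since_last >= period or edits_size_bytes >= size


print(should_checkpoint(600, 10 * 1024 * 1024))    # False: neither limit reached
print(should_checkpoint(600, 80 * 1024 * 1024))    # True: edits exceeded 64 MB
print(should_checkpoint(4000, 10 * 1024 * 1024))   # True: more than an hour passed
```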

 

  • SecondaryNameNode periodically checks the size of edits. When edits reaches the configured size, or when the checkpoint period has elapsed, it starts a merge by calling the rollEditLog() method.
  • The NameNode then stops writing to the current edits file and creates a temporary edits.new file; new operation records are written to edits.new from this point on.
  • SecondaryNameNode fetches the edits and fsimage files from the NameNode via HTTP GET.
  • SecondaryNameNode merges the edits and fsimage files and generates an fsimage.ckpt file.
  • SecondaryNameNode then sends fsimage.ckpt to the NameNode via HTTP POST.
  • The NameNode receives fsimage.ckpt and renames it to fsimage, which completes the update of fsimage. After fsimage is updated, it renames edits.new to edits.
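The checkpoint sequence in the list above can be sketched as a small in-memory simulation. This is an illustration under simplifying assumptions, not the Hadoop implementation: files are dictionary entries, the HTTP transfers are plain copies, and the names `ToyNameNode`, `merge`, and `finish_checkpoint` are hypothetical.

```python
class ToyNameNode:
    """Holds the 'files' of a NameNode in a dict instead of on disk."""

    def __init__(self, fsimage, edits):
        self.files = {"fsimage": set(fsimage), "edits": list(edits)}

    def log(self, op, path):
        # While a checkpoint is running, new records go to edits.new.
        target = "edits.new" if "edits.new" in self.files else "edits"
        self.files[target].append((op, path))

    def roll_edit_log(self):
        # Stop writing to edits and open a temporary edits.new.
        self.files["edits.new"] = []

    def finish_checkpoint(self, fsimage_ckpt):
        # Install the merged image, then turn edits.new into edits.
        self.files["fsimage"] = set(fsimage_ckpt)
        self.files["edits"] = self.files.pop("edits.new")


def merge(fsimage, edits):
    """What the SecondaryNameNode does: replay edits on a copy of the image."""
    merged = set(fsimage)
    for op, path in edits:
        if op == "add":
            merged.add(path)
        elif op == "delete":
            merged.discard(path)
    return merged


nn = ToyNameNode({"/", "/a"}, [("add", "/b"), ("delete", "/a")])
nn.roll_edit_log()                                     # roll: edits.new is created
nn.log("add", "/c")                                    # a write during the checkpoint
ckpt = merge(nn.files["fsimage"], nn.files["edits"])   # fetch and merge
nn.finish_checkpoint(ckpt)                             # upload and rename

print(sorted(nn.files["fsimage"]))  # ['/', '/b']
print(nn.files["edits"])            # [('add', '/c')]
```

The operation logged during the checkpoint is not lost: it ends up in the new edits file and will be folded into the image at the next checkpoint.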

 

 

Origin blog.csdn.net/qq_35363507/article/details/112917830