[Hadoop source code] A walkthrough of NameNode startup and FsImage loading

Foreword

NameNode is the HDFS component responsible for metadata management. It holds the metadata of the entire file system and directs and coordinates the DataNodes. Besides keeping this metadata in memory, NameNode periodically persists it (the file directory tree and the file/directory metadata) to a local fsimage file, so that the metadata survives a power failure or an abnormal crash of the NameNode process.
Synchronizing the in-memory metadata to the fsimage file in real time would consume a lot of resources and slow NameNode down. NameNode therefore first records each metadata modification in the editlog file, and then periodically merges the fsimage and editlog files.
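
The snapshot-plus-log idea can be illustrated with a small, self-contained Java sketch. It is not HDFS code: the class CheckpointSketch, the Edit type, and the ADD/DELETE operation names are hypothetical, chosen only to show how edits newer than the last checkpointed transaction id are replayed onto a snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of "namespace = fsimage snapshot + replayed edit log". Not HDFS code.
class CheckpointSketch {

    // One logged namespace change; txid gives its position in the log.
    static final class Edit {
        final long txid;
        final String op;   // "ADD" or "DELETE" (hypothetical operation names)
        final String path;

        Edit(long txid, String op, String path) {
            this.txid = txid;
            this.op = op;
            this.path = path;
        }
    }

    // Applies every edit newer than lastAppliedTxId and returns the new last txid.
    static long replay(Set<String> namespace, long lastAppliedTxId, List<Edit> edits) {
        long last = lastAppliedTxId;
        for (Edit e : edits) {
            if (e.txid <= last) {
                continue; // already reflected in the snapshot
            }
            if ("ADD".equals(e.op)) {
                namespace.add(e.path);
            } else if ("DELETE".equals(e.op)) {
                namespace.remove(e.path);
            } else {
                throw new IllegalArgumentException("Unknown op " + e.op);
            }
            last = e.txid;
        }
        return last;
    }

    public static void main(String[] args) {
        // Snapshot taken at txid 10 contains only /a.
        Set<String> namespace = new TreeSet<>(Arrays.asList("/a"));
        List<Edit> edits = new ArrayList<>();
        edits.add(new Edit(11, "ADD", "/b"));
        edits.add(new Edit(12, "DELETE", "/a"));
        long last = replay(namespace, 10, edits);
        System.out.println(namespace + " at txid " + last);
    }
}
```

Writing the resulting namespace back out as a new snapshot at the returned txid is what a checkpoint amounts to in this toy model; the log entries up to that txid are then no longer needed for recovery.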


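NameNode.main() // entry point
       |——createNameNode(); // instantiates the NameNode via new NameNode()
         |——initialize(); // performs the initialization
           |——startHttpServer(); // starts the HttpServer
           |——loadNamesystem(); // loads the metadata
           |——createRpcServer(); // creates and initializes the RPC server instance
           |——startCommonServices();
             |——namesystem.startCommonServices(); // starts background services and threads such as disk checks and safe mode
               |——new NameNodeResourceChecker(); // instantiates a NameNodeResourceChecker and collects all disk paths to check
               |——checkAvailableResources(); // starts the disk space check
               |——NameNode.getStartupProgress(); // gets the StartupProgress instance, which tracks the startup status of each NameNode task
               |——setBlockTotal(); // sets the total block count, used later to decide whether to enter safe mode
               |——blockManager.activate(); // starts the BlockManager background threads that handle block replicas
             |——rpcServer.start(); // starts the rpcServer
       |——join()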

Start the Namenode component

Startup script

bin/hdfs --daemon start namenode

The above script will call the main method of org.apache.hadoop.hdfs.server.namenode.NameNode

public static void main(String argv[]) throws Exception {
    // ... omitted
    // Create the namenode
    NameNode namenode = createNameNode(argv, null);
    // ... omitted
}

While creating the NameNode, the arguments are parsed first to determine the operation type (format, rollback, bootstrapStandby, and so on), and the corresponding logic is then executed. This chapter focuses on the regular startup process of NameNode.
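
The argument check can be pictured with the following minimal Java sketch. It is a simplified stand-in, not the real parsing code: the class StartupOptionSketch and its Option enum are hypothetical, only a few options are modelled, and the real NameNode handles many more options and sub-arguments.

```java
// Toy version of "parse the arguments, pick an operation type". Not HDFS code.
class StartupOptionSketch {

    // Hypothetical subset of startup options with their command-line flags.
    enum Option {
        FORMAT("-format"),
        ROLLBACK("-rollback"),
        BOOTSTRAPSTANDBY("-bootstrapStandby"),
        REGULAR("-regular");

        final String flag;

        Option(String flag) {
            this.flag = flag;
        }
    }

    // Returns the first recognized option, or REGULAR when none is given.
    static Option parse(String[] argv) {
        for (String arg : argv) {
            for (Option o : Option.values()) {
                if (o.flag.equalsIgnoreCase(arg)) {
                    return o;
                }
            }
        }
        return Option.REGULAR;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"-format"}));
        System.out.println(parse(new String[0]));
    }
}
```

With no arguments the sketch falls back to REGULAR, which corresponds to the plain startup path followed in the rest of this article.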

Reading on, the NameNode constructor calls initialize(getConf());

NameNode.initialize

This method mainly does the following:

  • Configure security-related information (UserGroupInformation)
  • Start the JvmPauseMonitor
  • Start the HTTP server (port 9870)
  • Initialize the core FSNamesystem component and load the image file and edit log into memory
  • Initialize the RPC server component
  • Start the common services

protected void initialize(Configuration conf) throws IOException {
    if (conf.get(HADOOP_USER_GROUP_METRICS_PERCENTILES_INTERVALS) == null) {
      String intervals = conf.get(DFS_METRICS_PERCENTILES_INTERVALS_KEY);
      if (intervals != null) {
        conf.set(HADOOP_USER_GROUP_METRICS_PERCENTILES_INTERVALS,
          intervals);
      }
    }

    UserGroupInformation.setConfiguration(conf);
    loginAsNameNodeUser(conf);

    NameNode.initMetrics(conf, this.getRole());
    StartupProgressMetrics.register(startupProgress);

    pauseMonitor = new JvmPauseMonitor();
    pauseMonitor.init(conf);
    pauseMonitor.start();
    metrics.getJvmMetrics().setPauseMonitor(pauseMonitor);

    if (conf.getBoolean(DFS_NAMENODE_GC_TIME_MONITOR_ENABLE,
        DFS_NAMENODE_GC_TIME_MONITOR_ENABLE_DEFAULT)) {
      long observationWindow = conf.getTimeDuration(
          DFS_NAMENODE_GC_TIME_MONITOR_OBSERVATION_WINDOW_MS,
          DFS_NAMENODE_GC_TIME_MONITOR_OBSERVATION_WINDOW_MS_DEFAULT,
          TimeUnit.MILLISECONDS);
      long sleepInterval = conf.getTimeDuration(
          DFS_NAMENODE_GC_TIME_MONITOR_SLEEP_INTERVAL_MS,
          DFS_NAMENODE_GC_TIME_MONITOR_SLEEP_INTERVAL_MS_DEFAULT,
          TimeUnit.MILLISECONDS);
      gcTimeMonitor = new Builder().observationWindowMs(observationWindow)
          .sleepIntervalMs(sleepInterval).build();
      gcTimeMonitor.start();
      metrics.getJvmMetrics().setGcTimeMonitor(gcTimeMonitor);
    }

    if (NamenodeRole.NAMENODE == role) {
      // Start the HTTP server (port 9870)
      startHttpServer(conf);
    }

    // Initialize the core FSNamesystem component and load the image file and edit log into memory
    loadNamesystem(conf);
    startAliasMapServerIfNecessary(conf);

    // Initialize the RPC server component
    rpcServer = createRpcServer(conf);

    initReconfigurableBackoffKey();

    if (clientNamenodeAddress == null) {
      // This is expected for MiniDFSCluster. Set it now using 
      // the RPC server's bind address.
      clientNamenodeAddress = 
          NetUtils.getHostPortString(getNameNodeAddress());
      LOG.info("Clients are to use " + clientNamenodeAddress + " to access"
          + " this namenode/service.");
    }
    // For the NameNode role, set the NameNode address and the FSImage on the HTTP server
    if (NamenodeRole.NAMENODE == role) {
      httpServer.setNameNodeAddress(getNameNodeAddress());
      httpServer.setFSImage(getFSImage());
      if (levelDBAliasMapServer != null) {
        httpServer.setAliasMap(levelDBAliasMapServer.getAliasMap());
      }
    }

    // Start the common services
    startCommonServices(conf);
    startMetricsLogger(conf);
  }

FSNamesystem.loadFromDisk

loadNamesystem then calls the static method FSNamesystem.loadFromDisk to load the metadata.

static FSNamesystem loadFromDisk(Configuration conf) throws IOException {

    checkConfiguration(conf);
    // Build the FSImage, which will be loaded from disk.
    // An FSImage is a snapshot of the metadata at a point in time, i.e. the metadata itself.
    // FSNamesystem.getNamespaceDirs(conf) returns the metadata directories:
    // file://${hadoop.tmp.dir}/dfs/name, where ${hadoop.tmp.dir} is /tmp/hadoop-${user.name}
    // You can check whether the directory used by a running namenode process matches this.
    FSImage fsImage = new FSImage(conf,
        FSNamesystem.getNamespaceDirs(conf),
            // FSNamesystem.getNamespaceEditsDirs(conf) returns the edit log directories.
            // By default the edit log and the namespace share the same directory; see the configuration.
        FSNamesystem.getNamespaceEditsDirs(conf));
    // Instantiate the FSNamesystem object and place the fsImage object inside it
    FSNamesystem namesystem = new FSNamesystem(conf, fsImage, false);
    StartupOption startOpt = NameNode.getStartupOption(conf);
    if (startOpt == StartupOption.RECOVER) {
      namesystem.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
    }

    long loadStart = monotonicNow();
    try {
      // Here FSNamesystem loads the fsImage and the edit log into memory
      // and merges the two in memory into new fsImage information.
      // (Note: by default a checkpoint periodically merges the old fsImage with the edit log
      // into a new fsImage file; the same merge is naturally needed at startup.)
      // In the end a complete copy of the metadata is held in memory.
      namesystem.loadFSImage(startOpt);
    } catch (IOException ioe) {
      LOG.warn("Encountered exception loading fsimage", ioe);
      fsImage.close();
      throw ioe;
    }
    long timeTakenToLoadFSImage = monotonicNow() - loadStart;
    LOG.info("Finished loading FSImage in " + timeTakenToLoadFSImage + " msecs");
    NameNodeMetrics nnMetrics = NameNode.getNameNodeMetrics();
    if (nnMetrics != null) {
      nnMetrics.setFsImageLoadTime((int) timeTakenToLoadFSImage);
    }
    namesystem.getFSDirectory().createReservedStatuses(namesystem.getCTime());
    return namesystem;
  }

Figure: the in-memory structure.
FSNamesystem acts as a facade here: it does not implement the loading logic itself but delegates the work to FSImage.
Next, look at FSImage.recoverTransitionRead(StartupOption startOpt, FSNamesystem target, MetaRecoveryContext recovery).

FSImage.recoverTransitionRead

It mainly does four things:

  • Check each data directory and determine whether its state is consistent
  • Format any unformatted directories
  • Perform the transitions (such as upgrade or checkpoint import)
  • Actually load the fsImage and edit log files and merge them

/**
   * Analyze storage directories.
   * Recover from previous transitions if required. 
   * Perform fs state transition if necessary depending on the namespace info.
   * Read storage info.
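   * Analyze the storage directories, i.e. the directories holding the fsImage and the edit log.
   * Recover from the previous state.
   * Decide from the namespace info whether to perform an fs state transition.
   * Read the stored information.
   * In short: if an fsImage and an edit log already exist, load them from the files and recover.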
   * 
   * @throws IOException
   * @return true if the image needs to be saved or false otherwise
   */
  boolean recoverTransitionRead(StartupOption startOpt, FSNamesystem target,
      MetaRecoveryContext recovery)
      throws IOException {
    assert startOpt != StartupOption.FORMAT :
      "NameNode formatting should be performed before reading the image";

    // Get the fsImage locations, i.e. the image directories
    Collection<URI> imageDirs = storage.getImageDirectories();

    // Get the edit log directories
    Collection<URI> editsDirs = editLog.getEditURIs();

    // none of the data dirs exist
    if((imageDirs.size() == 0 || editsDirs.size() == 0) 
                             && startOpt != StartupOption.IMPORT)  
      throw new IOException(
          "All specified directories are not accessible or do not exist.");
    
    // 1. For each data directory calculate its state and 
    // check whether all is consistent before transitioning.
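    // Check each data directory and determine whether its state is consistent.
    // This also performs recovery, e.g. of updates, rollbacks, or additions interrupted by an earlier shutdown.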
    Map<StorageDirectory, StorageState> dataDirStates = 
             new HashMap<StorageDirectory, StorageState>();
    boolean isFormatted = recoverStorageDirs(startOpt, storage, dataDirStates);

    if (LOG.isTraceEnabled()) {
      LOG.trace("Data dir states:\n  " +
        Joiner.on("\n  ").withKeyValueSeparator(": ")
        .join(dataDirStates));
    }
    
    if (!isFormatted && startOpt != StartupOption.ROLLBACK 
                     && startOpt != StartupOption.IMPORT) {
      throw new IOException("NameNode is not formatted.");      
    }


    int layoutVersion = storage.getLayoutVersion();
    if (startOpt == StartupOption.METADATAVERSION) {
      System.out.println("HDFS Image Version: " + layoutVersion);
      System.out.println("Software format version: " +
        HdfsServerConstants.NAMENODE_LAYOUT_VERSION);
      return false;
    }

    if (layoutVersion < Storage.LAST_PRE_UPGRADE_LAYOUT_VERSION) {
      NNStorage.checkVersionUpgradable(storage.getLayoutVersion());
    }
    if (startOpt != StartupOption.UPGRADE
        && startOpt != StartupOption.UPGRADEONLY
        && !RollingUpgradeStartupOption.STARTED.matches(startOpt)
        && layoutVersion < Storage.LAST_PRE_UPGRADE_LAYOUT_VERSION
        && layoutVersion != HdfsServerConstants.NAMENODE_LAYOUT_VERSION) {
      throw new IOException(
          "\nFile system image contains an old layout version " 
          + storage.getLayoutVersion() + ".\nAn upgrade to version "
          + HdfsServerConstants.NAMENODE_LAYOUT_VERSION + " is required.\n"
          + "Please restart NameNode with the \""
          + RollingUpgradeStartupOption.STARTED.getOptionString()
          + "\" option if a rolling upgrade is already started;"
          + " or restart NameNode with the \""
          + StartupOption.UPGRADE.getName() + "\" option to start"
          + " a new upgrade.");
    }

    // Process the startup options and any upgrade-related changes
    storage.processStartupOptionsForUpgrade(startOpt, layoutVersion);

    // 2. Format unformatted dirs.
    for (Iterator<StorageDirectory> it = storage.dirIterator(); it.hasNext();) {
      StorageDirectory sd = it.next();
      StorageState curState = dataDirStates.get(sd);
      switch(curState) {
      case NON_EXISTENT:
        throw new IOException(StorageState.NON_EXISTENT + 
                              " state cannot be here");
      case NOT_FORMATTED:
        // Create a dir structure, but not the VERSION file. The presence of
        // VERSION is checked in the inspector's needToSave() method and
        // saveNamespace is triggered if it is absent. This will bring
        // the storage state uptodate along with a new VERSION file.
        // If HA is enabled, NNs start up as standby so saveNamespace is not
        // triggered.
        LOG.info("Storage directory " + sd.getRoot() + " is not formatted.");
        LOG.info("Formatting ...");
        sd.clearDirectory(); // create empty current dir
        // For non-HA, no further action is needed here, as saveNamespace will
        // take care of the rest.
        if (!target.isHaEnabled()) {
          continue;
        }
        // If HA is enabled, save the dirs to create a version file later when
        // a checkpoint image is saved.
        if (newDirs == null) {
          newDirs = new HashSet<StorageDirectory>();
        }
        newDirs.add(sd);
        break;
      default:
        break;
      }
    }

    // 3. Do transitions
    switch(startOpt) {
    case UPGRADE:
    case UPGRADEONLY:
      doUpgrade(target);
      return false; // upgrade saved image already
    case IMPORT:
      doImportCheckpoint(target);
      return false; // import checkpoint saved image already
    case ROLLBACK:
      throw new AssertionError("Rollback is now a standalone command, " +
          "NameNode should not be starting with this option.");
    case REGULAR:
    default:
      // just load the image
    }

    // Actually load the fsImage and edit log files and merge them
    return loadFSImage(target, startOpt, recovery);
  }

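FSImage has many loadFSImage overloads, so we skip straight to the key step.
The following method decides, based on the layout version, whether and from where an MD5 checksum of the image is available (MD5 is used here as a checksum, not as encryption).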

/**
   *
   * We will not trace the file loading in full detail here;
   * loadFSImage() is the method that finally loads the file.
   * @param target
   * @param recovery
   * @param imageFile
   * @param startupOption
   * @throws IOException
   */
  void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery,
      FSImageFile imageFile, StartupOption startupOption) throws IOException {
    LOG.info("Planning to load image: " + imageFile);
    StorageDirectory sdForProperties = imageFile.sd;
    storage.readProperties(sdForProperties, startupOption);

    if (NameNodeLayoutVersion.supports(
        LayoutVersion.Feature.TXID_BASED_LAYOUT, getLayoutVersion())) {
      // For txid-based layout, we should have a .md5 file
      // next to the image file
      boolean isRollingRollback = RollingUpgradeStartupOption.ROLLBACK
          .matches(startupOption);
      loadFSImage(imageFile.getFile(), target, recovery, isRollingRollback);
    } else if (NameNodeLayoutVersion.supports(
        LayoutVersion.Feature.FSIMAGE_CHECKSUM, getLayoutVersion())) {
      // In 0.22, we have the checksum stored in the VERSION file.
      String md5 = storage.getDeprecatedProperty(
          NNStorage.DEPRECATED_MESSAGE_DIGEST_PROPERTY);
      if (md5 == null) {
        throw new InconsistentFSStateException(sdForProperties.getRoot(),
            "Message digest property " +
            NNStorage.DEPRECATED_MESSAGE_DIGEST_PROPERTY +
            " not set for storage directory " + sdForProperties.getRoot());
      }
      loadFSImage(imageFile.getFile(), new MD5Hash(md5), target, recovery,
          false);
    } else {
      // We don't have any record of the md5sum
      loadFSImage(imageFile.getFile(), null, target, recovery, false);
    }
  }

The LoaderDelegator of FSImageFormat is then called to load the fsImage file.

FSImageFormat.load

private void loadFSImage(File curFile, MD5Hash expectedMd5,
      FSNamesystem target, MetaRecoveryContext recovery,
      boolean requireSameLayoutVersion) throws IOException {
    // BlockPoolId is required when the FsImageLoader loads the rolling upgrade
    // information. Make sure the ID is properly set.
    target.setBlockPoolId(this.getBlockPoolID());

    // A loader that holds the FSNamesystem and the conf object
    FSImageFormat.LoaderDelegator loader = FSImageFormat.newLoader(conf, target);
    loader.load(curFile, requireSameLayoutVersion);

    // Check that the image digest we loaded matches up with what
    // we expected
    MD5Hash readImageMd5 = loader.getLoadedImageMd5();
    if (expectedMd5 != null &&
        !expectedMd5.equals(readImageMd5)) {
      throw new IOException("Image file " + curFile +
          " is corrupt with MD5 checksum of " + readImageMd5 +
          " but expecting " + expectedMd5);
    }

    long txId = loader.getLoadedImageTxId();
    LOG.info("Loaded image for txid " + txId + " from " + curFile);
    lastAppliedTxId = txId;
    storage.setMostRecentCheckpointInfo(txId, curFile.lastModified());
  }
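
The digest comparison in the method above can be tried in isolation with plain JDK classes. The following is a minimal sketch, assuming the image is available as a byte array; Md5CheckSketch and its methods are hypothetical names, and it uses java.security.MessageDigest rather than Hadoop's MD5Hash class.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Toy digest check, loosely modelled on the expectedMd5 comparison. Not HDFS code.
class Md5CheckSketch {

    // Returns the MD5 digest of data as a lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Throws if an expected digest is given and does not match; null skips the check.
    static void verify(byte[] image, String expectedMd5) {
        String actual = md5Hex(image);
        if (expectedMd5 != null && !expectedMd5.equals(actual)) {
            throw new IllegalStateException("Image is corrupt with MD5 checksum of "
                + actual + " but expecting " + expectedMd5);
        }
    }

    public static void main(String[] args) {
        byte[] image = "toy image bytes".getBytes(StandardCharsets.UTF_8);
        String recorded = md5Hex(image); // stands in for the digest stored next to the image
        verify(image, recorded);         // digests match, so no exception
        verify(image, null);             // no recorded digest: the check is skipped
        System.out.println("digest ok: " + recorded);
    }
}
```

As in the quoted code, a missing expected digest (null) is not an error; only a recorded digest that disagrees with the loaded bytes aborts the load.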

The fsimage file uses a format defined with protobuf and consists of four parts:

■ MAGIC: The fsimage file header, the binary form of the string "HDFSIMG1". The MAGIC header marks the fsimage file as serialized in protobuf format. When the FSImage class reads an fsimage file, it first checks whether the file starts with the MAGIC header and, if so, deserializes the file using the protobuf format.

■ SECTIONS: The fsimage file stores NameNode metadata of the same type in one section. For example, the file system metadata is stored in the NameSystemSection, all INode information of the file system directory tree in the INodeSection, and the snapshot information in the SnapshotSection. The second part of the fsimage file consists of all the sections for the various types of NameNode metadata, and each section type contains the attributes of the corresponding metadata.

■ FileSummary: FileSummary records the metadata of the fsimage file itself together with information about every section stored in it. Its ondiskVersion field records the version number of the fsimage file (1 in version 3.2.1), the layoutVersion field records the layout version of the current HDFS file system, the codec field records the compression codec of the fsimage file, and the sections field records the metadata of each section: every section in the fsimage file has a corresponding entry in FileSummary that records the section's name, its length in the fsimage file, and its starting offset. When the FSImage class reads an fsimage file, it first reads the FileSummary part and then uses the metadata recorded there to guide deserialization.

■ FileSummaryLength: FileSummaryLength records the length of the FileSummary in the fsimage file. When the FSImage class reads an fsimage file, it first reads FileSummaryLength to obtain the length of the FileSummary part, and then deserializes the FileSummary from the fsimage according to this length.
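
The four-part layout can be mimicked with a few lines of Java. This is only a sketch under simplifying assumptions: the section and summary are opaque byte arrays rather than protobuf messages, the trailing length is a plain 4-byte big-endian integer, and ImageLayoutSketch with its methods is a hypothetical name, not part of Hadoop.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy file layout: MAGIC | sections | summary | 4-byte summary length. Not HDFS code.
class ImageLayoutSketch {

    static final byte[] MAGIC = "HDFSIMG1".getBytes(StandardCharsets.UTF_8);

    // Serializes the four parts in order into one byte array.
    static byte[] write(byte[] sections, byte[] summary) {
        ByteBuffer buf = ByteBuffer.allocate(
            MAGIC.length + sections.length + summary.length + 4);
        buf.put(MAGIC).put(sections).put(summary).putInt(summary.length);
        return buf.array();
    }

    // True if the file starts with the magic header.
    static boolean hasMagic(byte[] file) {
        return file.length >= MAGIC.length
            && Arrays.equals(Arrays.copyOf(file, MAGIC.length), MAGIC);
    }

    // Reads the trailing length first, then slices out the summary in front of it.
    static byte[] readSummary(byte[] file) {
        int summaryLength = ByteBuffer.wrap(file).getInt(file.length - 4);
        int start = file.length - 4 - summaryLength;
        return Arrays.copyOfRange(file, start, start + summaryLength);
    }

    public static void main(String[] args) {
        byte[] file = write(
            "section-bytes".getBytes(StandardCharsets.UTF_8),
            "summary-bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(hasMagic(file));
        System.out.println(new String(readSummary(file), StandardCharsets.UTF_8));
    }
}
```

Placing the summary and its length at the end lets a writer stream the sections first and record their offsets afterwards, while a reader can still locate the summary by seeking to the last four bytes.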

public void load(File file, boolean requireSameLayoutVersion)
        throws IOException {
      Preconditions.checkState(impl == null, "Image already loaded!");

      InputStream is = null;
      try {
        is = Files.newInputStream(file.toPath());
        byte[] magic = new byte[FSImageUtil.MAGIC_HEADER.length];
        IOUtils.readFully(is, magic, 0, magic.length);
        // Check the header to see whether the file is in protobuf format
        if (Arrays.equals(magic, FSImageUtil.MAGIC_HEADER)) {
          FSImageFormatProtobuf.Loader loader = new FSImageFormatProtobuf.Loader(
              conf, fsn, requireSameLayoutVersion);
          impl = loader;
          loader.load(file);
        } else {
          Loader loader = new Loader(conf, fsn);
          impl = loader;
          loader.load(file);
        }
      } finally {
        IOUtils.cleanupWithLogger(LOG, is);
      }
    }

FSImageFormatProtobuf.load

The last and most important step is to deserialize the metadata through FSImageFormatProtobuf and build the data structures in memory. After deserialization completes, the deserialized metadata is stored in the FSNamesystem instance.

SectionName is an enum with 12 values in total, listed here in the order in which the sections are written to the fsimage file.

No.  Name                     Section type                   Description
1    NS_INFO                  NameSystemSection              namespace information
2    ERASURE_CODING           ErasureCodingSection           erasure coding (EC) policies
3    INODE                    INodeSection                   namespace information
4    INODE_DIR                INodeDirectorySection          namespace information
5    FILES_UNDERCONSTRUCTION  FilesUnderConstructionSection  namespace information
6    SNAPSHOT                 SnapshotSection                snapshot information
7    SNAPSHOT_DIFF            SnapshotDiffSection            snapshot diff information
8    INODE_REFERENCE          INodeReferenceSection          inode reference information
9    SECRET_MANAGER           SecretManagerSection           security information
10   CACHE_MANAGER            CacheManagerSection            cache information
11   STRING_TABLE             StringTableSection             namespace information
12   EXTENDED_ACL             -                              permissions (extended ACLs)

Deserialize and load the content of the FsImage file

private void loadInternal(RandomAccessFile raFile, FileInputStream fin)
        throws IOException {
      if (!FSImageUtil.checkFileFormat(raFile)) {
        throw new IOException("Unrecognized file format");
      }
      FileSummary summary = FSImageUtil.loadSummary(raFile);
      if (requireSameLayoutVersion && summary.getLayoutVersion() !=
          HdfsServerConstants.NAMENODE_LAYOUT_VERSION) {
        throw new IOException("Image version " + summary.getLayoutVersion() +
            " is not equal to the software version " +
            HdfsServerConstants.NAMENODE_LAYOUT_VERSION);
      }

      FileChannel channel = fin.getChannel();
      // inode loader
      FSImageFormatPBINode.Loader inodeLoader = new FSImageFormatPBINode.Loader(
          fsn, this);
      FSImageFormatPBSnapshot.Loader snapshotLoader = new FSImageFormatPBSnapshot.Loader(
          fsn, this);

      ArrayList<FileSummary.Section> sections = Lists.newArrayList(summary
          .getSectionsList());
      Collections.sort(sections, new Comparator<FileSummary.Section>() {
        @Override
        public int compare(FileSummary.Section s1, FileSummary.Section s2) {
          SectionName n1 = SectionName.fromString(s1.getName());
          SectionName n2 = SectionName.fromString(s2.getName());
          if (n1 == null) {
            return n2 == null ? 0 : -1;
          } else if (n2 == null) {
            return -1;
          } else {
            return n1.ordinal() - n2.ordinal();
          }
        }
      });

      StartupProgress prog = NameNode.getStartupProgress();
      /**
       * beginStep() and the endStep() calls do not match the boundary of the
       * sections. This is because that the current implementation only allows
       * a particular step to be started for once.
       */
      Step currentStep = null;
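      // A very important parameter: on very large clusters, slow fsimage loading is a
      // chronic problem. This option mitigates it by loading sections in parallel with multiple threads.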
      boolean loadInParallel = enableParallelSaveAndLoad(conf);

      ExecutorService executorService = null;
      ArrayList<FileSummary.Section> subSections =
          getAndRemoveSubSections(sections);
      if (loadInParallel) {
        executorService = getParallelExecutorService();
      }

      for (FileSummary.Section s : sections) {
        channel.position(s.getOffset());
        InputStream in = new BufferedInputStream(new LimitInputStream(fin,
            s.getLength()));

        in = FSImageUtil.wrapInputStreamForCompression(conf,
            summary.getCodec(), in);

        String n = s.getName();
        SectionName sectionName = SectionName.fromString(n);
        if (sectionName == null) {
          throw new IOException("Unrecognized section " + n);
        }

        ArrayList<FileSummary.Section> stageSubSections;
        switch (sectionName) {
        case NS_INFO:
          loadNameSystemSection(in);
          break;
        case STRING_TABLE:
          loadStringTableSection(in);
          break;
        case INODE: {
          currentStep = new Step(StepType.INODES);
          prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
          stageSubSections = getSubSectionsOfName(
              subSections, SectionName.INODE_SUB);
          if (loadInParallel && (stageSubSections.size() > 0)) {
            inodeLoader.loadINodeSectionInParallel(executorService,
                stageSubSections, summary.getCodec(), prog, currentStep);
          } else {
            inodeLoader.loadINodeSection(in, prog, currentStep);
          }
        }
          break;
        case INODE_REFERENCE:
          snapshotLoader.loadINodeReferenceSection(in);
          break;
        case INODE_DIR:
          stageSubSections = getSubSectionsOfName(
              subSections, SectionName.INODE_DIR_SUB);
          if (loadInParallel && stageSubSections.size() > 0) {
            inodeLoader.loadINodeDirectorySectionInParallel(executorService,
                stageSubSections, summary.getCodec());
          } else {
            inodeLoader.loadINodeDirectorySection(in);
          }
          inodeLoader.waitBlocksMapAndNameCacheUpdateFinished();
          break;
        case FILES_UNDERCONSTRUCTION:
          inodeLoader.loadFilesUnderConstructionSection(in);
          break;
        case SNAPSHOT:
          snapshotLoader.loadSnapshotSection(in);
          break;
        case SNAPSHOT_DIFF:
          snapshotLoader.loadSnapshotDiffSection(in);
          break;
        case SECRET_MANAGER: {
          prog.endStep(Phase.LOADING_FSIMAGE, currentStep);
          Step step = new Step(StepType.DELEGATION_TOKENS);
          prog.beginStep(Phase.LOADING_FSIMAGE, step);
          loadSecretManagerSection(in, prog, step);
          prog.endStep(Phase.LOADING_FSIMAGE, step);
        }
          break;
        case CACHE_MANAGER: {
          Step step = new Step(StepType.CACHE_POOLS);
          prog.beginStep(Phase.LOADING_FSIMAGE, step);
          loadCacheManagerSection(in, prog, step);
          prog.endStep(Phase.LOADING_FSIMAGE, step);
        }
          break;
        case ERASURE_CODING:
          Step step = new Step(StepType.ERASURE_CODING_POLICIES);
          prog.beginStep(Phase.LOADING_FSIMAGE, step);
          loadErasureCodingSection(in);
          prog.endStep(Phase.LOADING_FSIMAGE, step);
          break;
        default:
          LOG.warn("Unrecognized section {}", n);
          break;
        }
      }
      if (executorService != null) {
        executorService.shutdown();
      }
    }
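
Note that loadInternal does not load sections in file order: it first sorts them by the ordinal of their SectionName, so the load order follows the enum declaration. The sketch below illustrates that step on plain strings. SectionOrderSketch and its four-value enum are hypothetical, and the enum order is an assumption chosen so that the string table is loaded before the inodes. Unlike the quoted comparator, the sketch returns 1 when only the second name is unknown, which keeps the comparator symmetric.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy version of "sort sections by enum ordinal before loading". Not HDFS code.
class SectionOrderSketch {

    // Hypothetical subset; declaration order defines the load order.
    enum SectionName { NS_INFO, STRING_TABLE, INODE, INODE_DIR }

    // Returns the matching enum value, or null for an unknown section name.
    static SectionName fromString(String name) {
        for (SectionName n : SectionName.values()) {
            if (n.name().equals(name)) {
                return n;
            }
        }
        return null;
    }

    // Returns a copy of names sorted by ordinal, with unknown names first.
    static List<String> sortForLoading(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort((a, b) -> {
            SectionName n1 = fromString(a);
            SectionName n2 = fromString(b);
            if (n1 == null) {
                return n2 == null ? 0 : -1;
            } else if (n2 == null) {
                return 1;
            }
            return Integer.compare(n1.ordinal(), n2.ordinal());
        });
        return sorted;
    }

    public static void main(String[] args) {
        // File order: the string table is written last but must be loaded early.
        List<String> onDisk = Arrays.asList("NS_INFO", "INODE", "INODE_DIR", "STRING_TABLE");
        System.out.println(sortForLoading(onDisk));
    }
}
```

Because each FileSummary entry carries its own offset and length, reordering the entries costs nothing: the loader simply positions the channel at the offset of whichever section comes next.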

The following is the FsImage data structure in memory

FsImage memory data structure

Contents of FileSummary.json (protobuf text format)

ondiskVersion: 1
layoutVersion: 4294967230
sections {
  name: "NS_INFO"
  length: 36
  offset: 8
}
sections {
  name: "ERASURE_CODING"
  length: 31
  offset: 44
}
sections {
  name: "INODE"
  length: 26174
  offset: 75
}
sections {
  name: "INODE_DIR"
  length: 1880
  offset: 26249
}
sections {
  name: "FILES_UNDERCONSTRUCTION"
  length: 0
  offset: 28129
}
sections {
  name: "SNAPSHOT"
  length: 5
  offset: 28129
}
sections {
  name: "INODE_REFERENCE"
  length: 0
  offset: 28134
}
sections {
  name: "SECRET_MANAGER"
  length: 9
  offset: 28134
}
sections {
  name: "CACHE_MANAGER"
  length: 7
  offset: 28143
}
sections {
  name: "STRING_TABLE"
  length: 102
  offset: 28150
}
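
Two things in this dump can be checked by hand: every section starts exactly where the previous one ends (the first offset, 8, is the length of the "HDFSIMG1" header), and the layoutVersion value 4294967230 is the unsigned 32-bit rendering of a negative number. The following Java sketch performs both checks on the numbers above; SummaryCheckSketch and its methods are hypothetical helper names.

```java
// Sanity checks on the offsets and layoutVersion shown in the dump. Not HDFS code.
class SummaryCheckSketch {

    // True if each section begins where the previous one ends.
    static boolean contiguous(long[] offsets, long[] lengths) {
        for (int i = 0; i + 1 < offsets.length; i++) {
            if (offsets[i] + lengths[i] != offsets[i + 1]) {
                return false;
            }
        }
        return true;
    }

    // Reinterprets an unsigned 32-bit value as a signed int.
    static int asSignedInt(long unsigned32) {
        return (int) unsigned32;
    }

    public static void main(String[] args) {
        // Offsets and lengths copied from the dump, in file order.
        long[] offsets = {8, 44, 75, 26249, 28129, 28129, 28134, 28134, 28143, 28150};
        long[] lengths = {36, 31, 26174, 1880, 0, 5, 0, 9, 7, 102};
        System.out.println("contiguous: " + contiguous(offsets, lengths));
        System.out.println("layoutVersion: " + asSignedInt(4294967230L));
    }
}
```

The signed value comes out as -66, which fits the way the quoted code compares layout versions with `<` against other negative constants.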

Figure: the data structure of FileSummary.
Figure: the in-memory data structure of NameSystemSection.
Figure: StringTable, which records the permission information of the HDFS metadata, a very important part of that metadata.

Origin blog.csdn.net/u013412066/article/details/129374286