Hadoop learning record: 2.7.4 documentation, HDFS

  1. web interface
    1. http://namenode-name:50070/ lists the datanodes in the cluster and basic statistics.
  2. shell commands
    1. bin/hdfs dfs
    2. bin/hdfs dfsadmin
  3. secondary namenode – always ready
    1. The namenode stores modifications to the file system as a log appended to the edits file. When the namenode starts, it reads the HDFS state from an image file, fsimage, and then applies the edits from the edits log file. It then writes the new HDFS state to fsimage and starts normal operation with an empty edits file. Since the namenode merges the fsimage and edits files only during startup, the edits log file can grow very large over time on a busy cluster. A side effect of a large edits file is that the next restart of the namenode takes longer.
    2. The secondary namenode periodically merges the fsimage and edits log files and keeps the size of the edits log within a limit. Since its memory requirements are of the same order as those of the primary namenode, the secondary namenode usually runs on a different machine than the primary namenode.
    3. The start of the checkpoint process on the secondary namenode is controlled by two configuration parameters:
      1. dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints.
      2. dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the namenode that will force an urgent checkpoint, even if the checkpoint period has not been reached.
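
The interplay of the two parameters can be sketched as a simple "whichever comes first" rule. This is only an illustrative model, not Hadoop's own code: the function name `should_checkpoint` and its arguments are hypothetical, and the constants just restate the defaults quoted in these notes.

```python
# Illustrative model only; this is not how Hadoop implements the check.
# The constants restate the defaults quoted in the notes above.
CHECKPOINT_PERIOD_SECONDS = 3600   # dfs.namenode.checkpoint.period (1 hour)
CHECKPOINT_TXNS = 1_000_000        # dfs.namenode.checkpoint.txns (1 million)


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      txns=CHECKPOINT_TXNS):
    """Return True when either the time limit or the transaction limit is hit."""
    return seconds_since_last >= period or uncheckpointed_txns >= txns


if __name__ == "__main__":
    print(should_checkpoint(600, 50_000))     # neither limit reached
    print(should_checkpoint(3600, 50_000))    # period has elapsed
    print(should_checkpoint(600, 1_000_000))  # transaction limit reached early
```

The third call is the "urgent checkpoint" case: only ten minutes have passed, but the number of pending transactions alone is enough to trigger a checkpoint.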
  4. checkpoint node—readable
    1. The namenode persists the namespace using two files: fsimage (the most recent checkpoint of the namespace) and edits (the changelog of the namespace after the checkpoint). When the namenode starts, it integrates the fsimage and edits logs to provide an up-to-date view of the filesystem metadata. The namenode then overwrites the fsimage with the new hdfs state and starts a new edits log.
    2. The checkpoint node periodically creates checkpoints of the namespace. It downloads the fsimage and edits from the active namenode, merges them locally, and uploads the new image back to the active namenode. The checkpoint node usually runs on a different machine than the NameNode because its memory requirements are of the same order as the NameNode's. The checkpoint node is started by running bin/hdfs namenode -checkpoint on the node specified in the configuration file.
    3. The location of the checkpoint (or backup) node and its corresponding web interface are configured through the dfs.namenode.backup.address and dfs.namenode.backup.http-address configuration variables.
    4. The checkpoint node stores the latest checkpoints in a folder with the same directory structure as the NameNode. This makes the checkpointed image always readable by the NameNode if needed. See import checkpoint.
    5. Multiple checkpoint nodes can be specified in the cluster configuration file.
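
To make the merge step concrete, here is a toy model of a checkpoint: the fsimage is a plain dictionary, the edits log is a list of operations, and a checkpoint replays the log on top of the image. The data layout and the names `apply_edit` and `checkpoint` are assumptions made for this sketch; the real fsimage and edits files use Hadoop's own on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged change to the namespace dict, in place."""
    op = edit[0]
    if op == "create":
        _, path, kind = edit
        namespace[path] = kind
    elif op == "delete":
        _, path = edit
        namespace.pop(path, None)
    elif op == "rename":
        _, src, dst = edit
        namespace[dst] = namespace.pop(src)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edits):
    """Merge fsimage and edits; return the new image and an empty edits log."""
    namespace = dict(fsimage)   # copy, so the old image is left untouched
    for edit in edits:          # replay the changes in log order
        apply_edit(namespace, edit)
    return namespace, []


fsimage = {"/": "dir", "/tmp": "dir"}
edits = [
    ("create", "/data", "dir"),
    ("create", "/data/a.txt", "file"),
    ("rename", "/data/a.txt", "/data/b.txt"),
    ("delete", "/tmp"),
]
new_fsimage, new_edits = checkpoint(fsimage, edits)
print(new_fsimage)
print(new_edits)
```

The longer the edits list, the longer the replay takes, which is why checkpointing regularly keeps namenode startup short.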
  5. backup node
    1. The backup node provides the same checkpoint functionality as the checkpoint node, and also maintains an in-memory, up-to-date copy of the filesystem namespace that is always synchronized with the state of the active NameNode. Along with accepting a journal stream of filesystem edits from the NameNode and persisting it to disk, the backup node applies those edits to its own in-memory copy of the namespace, thus creating a backup of the namespace.
    2. Unlike the checkpoint node or the secondary namenode, the backup node does not need to download the fsimage and edits files from the active NameNode to create a checkpoint, since it already has an up-to-date namespace state in memory. The backup node's checkpoint process is more efficient because it only needs to save the namespace to the local fsimage file and reset the edits.
    3. Because the backup node maintains a copy of the namespace in memory, its RAM requirements are the same as the NameNode's.
    4. The namenode supports one backup node at a time. No checkpoint nodes may be registered while a backup node is in use. Support for multiple concurrent backup nodes is planned for the future.
    5. The backup node is configured in the same way as the checkpoint node. It is started with bin/hdfs namenode -backup.
    6. The location of the backup or checkpoint node and the corresponding web interface are configured through the dfs.namenode.backup.address and dfs.namenode.backup.http-address configuration variables.
    7. Using a backup node makes it possible to run the namenode with no persistent storage, delegating all responsibility for persisting the namespace state to the backup node. To do this, start the namenode with the -importCheckpoint option and specify no persistent storage directories of type edits (dfs.namenode.edits.dir) in the namenode configuration.
    8. For a full discussion of the motivation behind the backup node and the checkpoint node, see HADOOP-4539.
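
As a rough illustration of the two address settings mentioned above, the snippet below renders them in the `<configuration>`/`<property>` layout that Hadoop configuration files use. The helper `render_properties` is hypothetical, and the host name and port numbers are placeholders chosen for this sketch; substitute the values for your own cluster.

```python
import xml.etree.ElementTree as ET


def render_properties(props):
    """Return a Hadoop-style <configuration> XML string for the given dict."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical host name and placeholder ports, for illustration only.
backup_settings = {
    "dfs.namenode.backup.address": "backup-host.example.com:50100",
    "dfs.namenode.backup.http-address": "backup-host.example.com:50105",
}
xml_text = render_properties(backup_settings)
print(xml_text)
```

The printed properties would go into the HDFS configuration file of the node that runs the backup (or checkpoint) node.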
  6. import checkpoint

    1. If all other copies of the image and the edits files are lost, the latest checkpoint can be imported into the namenode. To do so:
      1. Create an empty directory specified in the dfs.namenode.name.dir configuration variable
      2. Specify the directory location of the checkpoint in the dfs.namenode.checkpoint.dir configuration variable
      3. Start the namenode with the -importCheckpoint option
    2. The namenode loads the checkpoint from the dfs.namenode.checkpoint.dir directory and then saves it to the namenode directory or directories set in dfs.namenode.name.dir. The namenode fails to start if a valid image already exists in dfs.namenode.name.dir. The namenode verifies that the image in dfs.namenode.checkpoint.dir is consistent, but never modifies it.
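
The preconditions for an import can be checked before the namenode is started. The sketch below is a hypothetical pre-flight helper, not part of Hadoop: `import_checkpoint_command` and its arguments are made-up names, and it simplifies the rule by requiring the name directory to be completely empty rather than merely free of a valid image. It only builds the command; it does not run it.

```python
import os
import tempfile


def import_checkpoint_command(name_dir, checkpoint_dir):
    """Check the preconditions and return the start command as a list."""
    if not os.path.isdir(name_dir):
        raise ValueError(f"{name_dir} does not exist; create it first")
    if os.listdir(name_dir):
        # Simplification: any content is rejected, not only a valid image.
        raise ValueError(f"{name_dir} is not empty")
    if not os.path.isdir(checkpoint_dir) or not os.listdir(checkpoint_dir):
        raise ValueError(f"{checkpoint_dir} holds no checkpoint")
    return ["bin/hdfs", "namenode", "-importCheckpoint"]


# Demonstration with temporary directories standing in for
# dfs.namenode.name.dir and dfs.namenode.checkpoint.dir.
with tempfile.TemporaryDirectory() as name_dir, \
        tempfile.TemporaryDirectory() as ckpt_dir:
    # Placeholder file standing in for a real checkpoint image.
    open(os.path.join(ckpt_dir, "fsimage"), "w").close()
    command = import_checkpoint_command(name_dir, ckpt_dir)
    print(" ".join(command))
```

Failing early with a clear message is cheaper than letting the namenode start and abort because the name directory already contains an image.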
  7. balancer

  8. rack awareness
  9. safemode
  10. fsck
  11. fetchdt
  12. recovery mode
  13. upgrade and rollback
  14. DataNode hot swap drive
  15. file permissions and security
  16. scalability
  17. related Documentation

