Hadoop 2.7.2 YARN Chinese Documentation: NodeManager Restart

Introduction
This document gives an overview of NodeManager (NM) restart, a feature that allows the NodeManager to be restarted without losing the active containers running on the node. At a high level, the NM stores any necessary state in a local state-store as it processes container-management requests. When the NM restarts, it recovers by first loading the state for its various subsystems and then letting each of those subsystems perform recovery using the loaded state.
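The save-then-recover flow described above can be sketched as a toy Python model. This is not the NodeManager's implementation: the class names FileStateStore and ContainerTracker, the JSON file format, and the function recover_all are all hypothetical stand-ins for the NM's real state-store and subsystems. What the sketch does show is the ordering: state is persisted while a request is handled, and after a restart all state is loaded before each subsystem recovers from it.

```python
import json
import os
import tempfile


class FileStateStore:
    """Hypothetical local state-store that keeps each subsystem's state in one JSON file."""

    def __init__(self, path):
        self.path = path

    def load_all(self):
        # A missing file means there is nothing to recover yet.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def save(self, subsystem, state):
        data = self.load_all()
        data[subsystem] = state
        # Write to a temporary file, then atomically rename it over the old one.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.replace(tmp, self.path)


class ContainerTracker:
    """Hypothetical subsystem that tracks the containers running on this node."""

    name = "containers"

    def __init__(self, store):
        self.store = store
        self.active = set()

    def start_container(self, container_id):
        self.active.add(container_id)
        # Persist the new state while handling the container-management request.
        self.store.save(self.name, sorted(self.active))

    def recover(self, state):
        self.active = set(state or [])


def recover_all(store, subsystems):
    # Phase 1: load every subsystem's saved state; phase 2: let each subsystem recover.
    loaded = store.load_all()
    for subsystem in subsystems:
        subsystem.recover(loaded.get(subsystem.name))


state_path = os.path.join(tempfile.mkdtemp(), "nm-state.json")
before = ContainerTracker(FileStateStore(state_path))
before.start_container("container_01")
before.start_container("container_02")

# Simulated restart: a fresh tracker that starts with empty in-memory state.
after = ContainerTracker(FileStateStore(state_path))
recover_all(after.store, [after])
print(sorted(after.active))  # ['container_01', 'container_02']
```

Writing to a temporary file and renaming it is one simple way to keep a crash during a write from leaving a half-written store behind; a real state-store would use its own durability mechanism.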
Enable NM Restart
Step 1. To enable NM restart, set the following property in conf/yarn-site.xml to true.
Property                            Value
yarn.nodemanager.recovery.enabled   true (default: false)
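Hadoop site files list each setting as a <property> element with <name> and <value> children inside a <configuration> root. The helper below (yarn_site_xml is a hypothetical name, not a Hadoop tool) is a minimal sketch that renders the Step 1 entry in that layout; in practice you would add the resulting <property> element to your existing conf/yarn-site.xml rather than overwrite the file.

```python
import xml.etree.ElementTree as ET


def yarn_site_xml(props):
    """Render property name/value pairs in the <configuration>/<property> layout of yarn-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


print(yarn_site_xml({"yarn.nodemanager.recovery.enabled": "true"}))
# <configuration><property><name>yarn.nodemanager.recovery.enabled</name><value>true</value></property></configuration>
```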
Step 2. Configure a local filesystem directory where the NodeManager can save its run state.
Property                        Description
yarn.nodemanager.recovery.dir   The local filesystem directory in which the NodeManager stores state when recovery is enabled. Default: ${hadoop.tmp.dir}/yarn-nm-recovery.
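The default for yarn.nodemanager.recovery.dir is built from another property through ${...} substitution. The sketch below is a simplified, hypothetical expander (not Hadoop's own code) that shows how that default resolves. It assumes hadoop.tmp.dir keeps its stock default of /tmp/hadoop-${user.name}, and it puts user.name, which Hadoop normally takes from the Java system properties, into the same dict for illustration. The user name yarn and the override path /var/lib/hadoop-yarn/nm-recovery are made-up examples.

```python
import re

VAR = re.compile(r"\$\{([^}]+)\}")


def expand(value, conf, max_rounds=20):
    """Repeatedly replace ${name} with conf[name]; unknown names are left untouched."""
    for _ in range(max_rounds):
        expanded = VAR.sub(lambda m: conf.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


conf = {
    "user.name": "yarn",  # assumption: normally a Java system property, not a config entry
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "yarn.nodemanager.recovery.dir": "${hadoop.tmp.dir}/yarn-nm-recovery",
}
print(expand(conf["yarn.nodemanager.recovery.dir"], conf))  # /tmp/hadoop-yarn/yarn-nm-recovery

# Setting the property explicitly bypasses the hadoop.tmp.dir-based default.
conf["yarn.nodemanager.recovery.dir"] = "/var/lib/hadoop-yarn/nm-recovery"
print(expand(conf["yarn.nodemanager.recovery.dir"], conf))  # /var/lib/hadoop-yarn/nm-recovery
```

Because the resolved default lands under /tmp, which some systems clean up on reboot, pointing the property at a persistent directory may be the safer choice.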
Step 3. Configure a valid RPC address for the NodeManager.
Property                   Description
yarn.nodemanager.address   The NodeManager's RPC server must not use an ephemeral port: otherwise the NM may bind to a different port after a restart, which breaks the connections of clients that were communicating with it before the restart. Explicitly setting yarn.nodemanager.address to an address with a specific port number (for example, 0.0.0.0:45454) is a precondition for enabling NM restart.
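A quick pre-flight check can catch an ephemeral NodeManager port before recovery is turned on. The function has_fixed_port below is a hypothetical helper, not part of Hadoop. It treats port 0 as ephemeral, which matters because the stock yarn-default.xml value of yarn.nodemanager.address is ${yarn.nodemanager.hostname}:0. It only inspects the string, so an unresolved ${...} host is accepted as long as the port is explicit.

```python
def has_fixed_port(address):
    """Return True if a host:port address names a host and an explicit port in 1..65535.

    Port 0 asks the operating system for an ephemeral port, which may differ after each restart.
    """
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        return False
    return 0 < int(port) <= 65535


for addr in ["0.0.0.0:45454", "${yarn.nodemanager.hostname}:0", "nm-host"]:
    print(addr, has_fixed_port(addr))
# 0.0.0.0:45454 True
# ${yarn.nodemanager.hostname}:0 False
# nm-host False
```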
Step 4. Auxiliary services.
  • NodeManagers in a YARN cluster can be configured to run auxiliary services. For NM restart to work fully, every configured auxiliary service must also support recovery. This usually means (1) avoiding ephemeral ports, so that previously running clients (usually containers) are not disrupted by the restart, and (2) having the auxiliary service itself recover its previous state by reloading it when the NodeManager restarts and reinitializes the service.
  • A simple example of such an auxiliary service is the ShuffleHandler of MapReduce (MR). ShuffleHandler already meets both requirements, so users and administrators do not need to do anything for it to support NM restart: (1) the configuration property mapreduce.shuffle.port controls which port the ShuffleHandler binds to on the NodeManager host, and it defaults to a non-ephemeral port; (2) the ShuffleHandler service already supports recovering its previous state after an NM restart.
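The two requirements above can be turned into a rough audit of a NodeManager's auxiliary-service settings. This is a hypothetical sketch: it assumes the ShuffleHandler is registered under the conventional service name mapreduce_shuffle in yarn.nodemanager.aux-services, that mapreduce.shuffle.port falls back to 13562 (its default in stock Hadoop 2.x) when unset, and my_custom_service is a made-up service name. It cannot inspect other services' code, so it simply flags them for manual review.

```python
SHUFFLE_DEFAULT_PORT = 13562  # assumption: stock default of mapreduce.shuffle.port


def aux_service_warnings(conf):
    """List auxiliary services whose NM-restart support this sketch cannot confirm.

    conf maps property names to string values, as they would appear in the site files.
    """
    raw = conf.get("yarn.nodemanager.aux-services", "")
    services = [s.strip() for s in raw.split(",") if s.strip()]
    warnings = []
    for svc in services:
        if svc == "mapreduce_shuffle":
            # ShuffleHandler recovers its own state; only an ephemeral port would be a problem.
            port = int(conf.get("mapreduce.shuffle.port", SHUFFLE_DEFAULT_PORT))
            if port == 0:
                warnings.append("mapreduce_shuffle: mapreduce.shuffle.port is 0 (ephemeral)")
        else:
            # Recovery support is service-specific, so it must be checked by hand.
            warnings.append(f"{svc}: confirm it uses a fixed port and can recover its state")
    return warnings


conf = {"yarn.nodemanager.aux-services": "mapreduce_shuffle,my_custom_service"}
for warning in aux_service_warnings(conf):
    print(warning)
# my_custom_service: confirm it uses a fixed port and can recover its state
```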
