ResourceManager HA configuration

The Hadoop cluster deployment and HDFS HA configuration have been covered previously. Once ResourceManager HA is configured, the Hadoop cluster setup is complete and is sufficient for a small or medium-sized production cluster. For a very large Hadoop cluster, these articles can only serve as a reference; many other parameters need to be tuned for better performance.

The ResourceManager (RM) is responsible for tracking resource usage in the cluster and scheduling applications (such as MapReduce jobs). Before Hadoop 2.4, the ResourceManager was a single point of failure, and HA had to be implemented by other means. The official HA solution is an Active/Standby ResourceManager redundancy scheme, similar to HDFS HA: the single point of failure is eliminated through redundancy.

HA architecture

The following figure is the architecture diagram of the ResourceManager HA solution:

[Figure: RM HA architecture]

RM failover

ResourceManager HA is implemented with an Active/Standby redundancy architecture. At any point in time, one RM is in the Active state and the other RMs are in the Standby state. A Standby RM waits for the Active RM to fail or be taken down; through an administrator command or automatic failover (which must be enabled in the configuration), a Standby RM then transitions to the Active state and serves requests.

  • Manual switchover and failover: When automatic failover is not enabled, the administrator has to switch manually: first transition the Active RM to the Standby state, then transition a Standby RM to the Active state. These operations are performed with the yarn rmadmin command, as shown in the example after this list.
  • Automatic failover: Based on ZooKeeper, the embedded Active/Standby elector decides which RM should be Active. When the Active RM fails or becomes unresponsive, one of the Standby RMs is elected and transitions to the Active state to take over. Unlike HDFS HA, there is no need for a separate ZKFC daemon to assist with the switchover, because this function is already embedded in the RM.
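
For reference, a manual switchover with yarn rmadmin could look like the following (rm1 and rm2 are the logical RM ids configured later in this article; when automatic failover is enabled, these manual transitions are refused unless explicitly forced):

$ yarn rmadmin -transitionToStandby rm1
$ yarn rmadmin -transitionToActive rm2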

When a failover happens, clients, ApplicationMasters, and NodeManagers poll the configured RMs in a round-robin fashion until they find the one in the Active state. If the Active RM goes down, they poll again until they find the new Active RM. The default failover proxy provider is org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider; you can plug in your own logic by implementing org.apache.hadoop.yarn.client.RMFailoverProxyProvider and setting yarn.client.failover-proxy-provider, as sketched below.
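
For illustration, explicitly setting the default provider in yarn-site.xml would look like this (the value shown is the default class; swap in your own implementation only if you need custom failover logic):

<property>
  <name>yarn.client.failover-proxy-provider</name>
  <!-- Default provider: polls the configured RMs in turn until the Active one is found.
       Replace with a custom org.apache.hadoop.yarn.client.RMFailoverProxyProvider if needed. -->
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>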

Restore RM state

With ResourceManager restart (state recovery) enabled, the newly Active RM loads the state left behind by the previous RM and resumes operation from that state as far as possible; applications can checkpoint periodically to avoid losing work. The state store must be visible to both the Active and the Standby RMs. Currently RMStateStore has two persistence implementations, FileSystemRMStateStore and ZKRMStateStore. ZKRMStateStore implicitly allows write access from only one RM at a time, so no separate fencing mechanism is needed to avoid a split-brain situation, which makes it the recommended state store for an HA cluster. When using ZKRMStateStore, it is advisable not to set the zookeeper.DigestAuthenticationProvider.superDigest configuration on the ZooKeeper cluster, so that the ZooKeeper administrator cannot access YARN application and user credential information. A sketch of the store selection follows.
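
As an illustration, the state store is chosen with yarn.resourcemanager.store.class in yarn-site.xml. A minimal sketch of the two options is shown below; set only one of them. The ZooKeeper-backed store is the one used in the full example later in this article, and the HDFS path for the filesystem store is a hypothetical example:

<!-- Option 1 (recommended for HA): ZooKeeper-backed state store -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

<!-- Option 2: filesystem-backed state store (no implicit single-writer protection) -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <!-- hypothetical HDFS location for the RM state -->
  <value>hdfs://mycluster/yarn/rmstore</value>
</property>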

Deployment

Configuration

Most failover behavior can be tuned through configuration properties. The following list covers the required and most important ones; see yarn-default.xml for the complete set of properties and their default values. For state storage, see the section on restoring RM state above.

Configuration items and descriptions:

  • yarn.resourcemanager.zk-address: Address of the ZooKeeper ensemble, used for state storage and the embedded leader election.
  • yarn.resourcemanager.ha.enabled: Enables RM HA.
  • yarn.resourcemanager.ha.rm-ids: List of logical RM ids, separated by commas, for example: rm1,rm2.
  • yarn.resourcemanager.hostname.[rm-id]: For each rm-id, the hostname or IP address of that RM.
  • yarn.resourcemanager.address.[rm-id]: For each rm-id, the host:port address that clients connect to. Overrides the address derived from yarn.resourcemanager.hostname.[rm-id].
  • yarn.resourcemanager.scheduler.address.[rm-id]: For each rm-id, the host:port address of the Scheduler that ApplicationMasters contact to request resources. Overrides yarn.resourcemanager.hostname.[rm-id].
  • yarn.resourcemanager.resource-tracker.address.[rm-id]: For each rm-id, the host:port address that NodeManagers connect to. Overrides yarn.resourcemanager.hostname.[rm-id].
  • yarn.resourcemanager.admin.address.[rm-id]: For each rm-id, the host:port address for administrative commands. Overrides yarn.resourcemanager.hostname.[rm-id].
  • yarn.resourcemanager.webapp.address.[rm-id]: For each rm-id, the host:port address of the RM web application. Not needed if yarn.http.policy is set to HTTPS_ONLY. Overrides yarn.resourcemanager.hostname.[rm-id].
  • yarn.resourcemanager.webapp.https.address.[rm-id]: For each rm-id, the host:port address of the RM HTTPS web application. Not needed if yarn.http.policy is set to HTTP_ONLY. Overrides yarn.resourcemanager.hostname.[rm-id].
  • yarn.resourcemanager.ha.id: Identifies the RM in the HA ensemble; optional. If set, make sure every RM has its own id.
  • yarn.resourcemanager.ha.automatic-failover.enabled: Enables automatic failover; by default it is enabled only when HA is enabled.
  • yarn.resourcemanager.ha.automatic-failover.embedded: When automatic failover is enabled, use the embedded leader elector to pick the Active RM; by default it is enabled only when HA is enabled.
  • yarn.resourcemanager.cluster-id: Identifies the cluster, ensuring that an RM does not become Active for another cluster.
  • yarn.client.failover-proxy-provider: The class used by clients, ApplicationMasters, and NodeManagers to fail over to the Active RM.
  • yarn.client.failover-max-attempts: The maximum number of failover attempts the FailoverProxyProvider should make.
  • yarn.client.failover-sleep-base-ms: The sleep base (in milliseconds) used to calculate the delay between failovers.
  • yarn.client.failover-sleep-max-ms: The maximum sleep time (in milliseconds) between failovers.
  • yarn.client.failover-retries: The number of retries per attempt to connect to an RM.
  • yarn.client.failover-retries-on-socket-timeouts: The number of retries on socket timeouts per attempt to connect to an RM.

Configuration example (building on the Hadoop cluster deployment (YARN) described earlier, with s107 and s108 as the two RM nodes):

<!--Configurations for the state-store of ResourceManager-->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>10.6.3.109:2181,10.6.3.110:2181,10.6.3.111:2181</value>
  <description>Addresses of the ZooKeeper service, separated by commas</description>
</property>
<!--Configurations for HA of ResourceManager-->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
  <description>Whether to enable HA; the default is false</description>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
  <description>At least two are required</description>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>s107</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>s108</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-ha</value>
  <description>Id of the HA cluster, used to create its node on ZooKeeper and to distinguish different Hadoop clusters sharing the same ZooKeeper ensemble</description>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>s107:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>s108:8088</value>
</property>

Startup

You can start YARN directly on s107 with start-yarn.sh; this starts the ResourceManager on s107 and a NodeManager on each of the other nodes.

Note that the ResourceManager on s108 is not started automatically; it has to be started manually with yarn-daemon.sh start resourcemanager.
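
A minimal startup sequence, assuming the Hadoop sbin directory is on the PATH of both nodes, could look like this:

# On s107: starts the ResourceManager on s107 and NodeManagers on the slave nodes
$ start-yarn.sh

# On s108: start the second ResourceManager by hand
$ yarn-daemon.sh start resourcemanager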

Management commands

As mentioned earlier, YARN HA is managed with the yarn rmadmin command, which can check the health of an RM, switch the Active/Standby state, and so on. It takes the RM ids configured in yarn.resourcemanager.ha.rm-ids as arguments. For example, to view the state of each RM:

$ yarn rmadmin -getServiceState rm1
active

$ yarn rmadmin -getServiceState rm2
standby

Other commands can be listed with yarn rmadmin -help.

Web management page

The web management interface is available at the address configured by yarn.resourcemanager.webapp.address.[rm-id]. If you access a Standby RM's address, the request is automatically redirected to the Active RM, except for the About page; you can open the About page on any RM to see which node is currently Active and which is Standby.
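
If you prefer the command line, the RM REST API also exposes the HA state; a quick check against the Active RM (assuming the default web port 8088 configured above) might be:

$ curl -s http://s107:8088/ws/v1/cluster/info

In recent Hadoop 2.x releases the JSON response includes a haState field (ACTIVE or STANDBY) along with other cluster information.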


Reference article
1. ResourceManager High Availability


Personal Homepage: http://www.howardliu.cn

Personal blog post: ResourceManager HA configuration

CSDN homepage: http://blog.csdn.net/liuxinghao

CSDN blog post: ResourceManager HA deployment
