Hadoop YARN HA cluster installation and deployment detailed graphic tutorial

Table of contents

1. YARN cluster roles and deployment planning

1.1 Cluster roles - an overview

1.2 Cluster role--ResourceManager (RM) 

1.3 Cluster role--NodeManager (NM) 

1.4 HA cluster deployment planning

2. YARN RM restart mechanism

2.1 Overview 

2.2 Demonstration 

2.2.1 The phenomenon that the RM restart mechanism is not enabled 

2.3 Two implementation schemes and their differences 

2.3.1 Non-work-preserving RM restart

2.3.2 Work-preserving RM restart

2.3.3 Storage medium of RM state data 

2.4 ZKRMStateStore 

2.5 Configuration 

2.5.1 yarn-site.xml 

2.6 Demonstration

2.6.1 The phenomenon of enabling the RM restart mechanism

3. YARN HA cluster 

3.1 Background 

3.2 Architecture 

3.3 Failover mechanism 

3.4 Failover principle (based on zk automatic switching) 

3.5 Construction steps 

3.5.1 Modify yarn-site.xml

3.5.2 Synchronize the yarn-site.xml configuration file across the cluster 

3.5.3 Start 

3.5.4 Status View 

3.5.5 Web UI view

3.5.6 Automatic failover 


 

1. YARN cluster roles and deployment planning

1.1  Cluster role - overview

        Apache Hadoop YARN is a standard Master/Slave (master-slave) cluster, in which the ResourceManager (RM) is the Master and the NodeManagers (NM) are the Slaves. The common deployment is one master with multiple slaves, and you can also build an HA high-availability cluster for the RM.

1.2  Cluster role -- ResourceManager ( RM ) 

        The RM is the main role in YARN. It holds the final authority over resource allocation among all applications in the system, i.e. it is the final arbiter. The RM receives job submissions from users and allocates and manages the computing resources on each machine through the NodeManagers. Resources are handed out in the form of Containers.

        In addition, the RM contains a pluggable scheduler component, which is responsible for allocating resources to the various running applications and scheduling them according to policy.

1.3  Cluster role -- NodeManager ( NM ) 

        The NM is the slave role in YARN; one runs on each machine and is responsible for managing the computing resources on that machine. Following the RM's commands, the NM starts Container processes, monitors the containers' resource usage, and reports resource usage back to the RM master role.
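As a quick illustration, once the cluster built below is running you can list the NodeManagers that have registered with the RM and the resources they report (a sketch; the exact output depends on your cluster):

[root@hadoop01 ~]# yarn node -list -all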

1.4 HA  cluster deployment planning

        In theory, a YARN cluster can be deployed on any machines, but in practice the NodeManager and the DataNode are usually deployed on the same machine (where there is data, computation may be needed, and moving the program is cheaper than moving the data).

As part of Apache Hadoop, a YARN cluster is usually built together with an HDFS cluster.

| IP              | Server   | Running roles                                                         |
| --------------- | -------- | --------------------------------------------------------------------- |
| 192.168.170.136 | hadoop01 | namenode, datanode, resourcemanager, nodemanager                       |
| 192.168.170.137 | hadoop02 | namenode, resourcemanager, secondarynamenode, datanode, nodemanager    |
| 192.168.170.138 | hadoop03 | datanode, nodemanager                                                  |
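For reference, this plan corresponds to a workers file roughly like the one below (a sketch, assuming the default etc/hadoop/workers file that start-dfs.sh / start-yarn.sh read when starting the slave daemons):

[root@hadoop01 ~]# cat /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop/workers
hadoop01
hadoop02
hadoop03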

First install the Hadoop cluster: HDFS HA High Availability Cluster Construction Detailed Graphic Tutorial_Stars.Sky's Blog-CSDN Blog

2. YARN RM  restart mechanism

2.1 Overview 

        The ResourceManager is responsible for resource management and application scheduling. It is the core component of YARN and has a single point of failure problem. ResourceManager Restart is a feature that lets the YARN cluster keep working across an RM restart and makes the RM failure invisible to users. The restart mechanism does not restart the RM automatically (the RM still has to be started manually), so by itself it does not solve the single-point-of-failure problem.

2.2 Demonstration 

2.2.1  The phenomenon that the RM  restart mechanism is not enabled  

First execute a program: 

[root@hadoop02 ~]# cd /bigdata/hadoop/server/hadoop-3.2.4/share/hadoop/mapreduce/
[root@hadoop02 /bigdata/hadoop/server/hadoop-3.2.4/share/hadoop/mapreduce]# yarn jar hadoop-mapreduce-examples-3.2.4.jar pi 10 10

Then quickly kill RM:

[root@hadoop01 ~]# jps
6467 Jps
5975 NodeManager
4744 QuorumPeerMain
5432 JournalNode
6440 JobHistoryServer
5833 ResourceManager
5038 NameNode
5182 DataNode
5631 DFSZKFailoverController
[root@hadoop01 ~]# kill -9 5833

# Then manually restart the RM
[root@hadoop01 ~]# yarn --daemon start resourcemanager

The previous state data is gone, and the program also fails to run.

Summary: if the RM fails and is restarted, the previous application information disappears, and jobs that were executing also fail.
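One way to confirm this from the command line is to list the applications known to the restarted RM (a sketch; with the restart mechanism disabled, the pi job submitted earlier no longer appears):

[root@hadoop01 ~]# yarn application -list -appStates ALL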

2.3 Two implementation schemes and their differences 

  • Non-work-preserving RM restart

Non-work-preserving RM restart, introduced in the Hadoop 2.4.0 release.

  • Work-preserving RM restart

Work-preserving RM restart, introduced in the Hadoop 2.6.0 release.

        With non-work-preserving restart, the RM only saves application submission information and the final execution status; it does not save data generated while the application runs. Therefore, after the RM restarts it first kills the tasks that were executing, then resubmits the applications and runs them from scratch.

        With work-preserving restart, the RM saves the runtime state data of applications, so after the RM restarts it does not need to kill the previous tasks; they simply continue from their original execution progress.

2.3.1 Non-work-preserving RM restart

        When a Client submits an application to the RM, the RM stores the application's information. The storage location can be specified in the configuration file: the local file system, HDFS, or ZooKeeper. The RM also saves the application's final status (failed, killed, finished), and when running in a secure environment it saves the related credentials as well.

        When the RM goes down, the NodeManagers (hereinafter NM) and Clients notice that they cannot connect to the RM and keep sending messages to it so that they can detect as soon as the RM recovers. When the RM restarts, it sends a re-sync (re-synchronization) command to all NMs and ApplicationMasters (hereinafter AM). On receiving the re-sync command, an NM kills all of its running containers and re-registers with the RM. From the RM's perspective, each re-registered NM looks like a newly added node.

        An AM kills itself after receiving the re-sync command. The RM then reads the stored application information and resubmits the applications that were still running before the RM shut down.

2.3.2 Work-preserving RM restart

        The difference from the non-work-preserving scheme is that the RM records data about the entire lifecycle of containers, including data related to the application's execution, resource request status, queue resource usage, and so on.

        When the RM restarts, it reads the previously stored data about running applications and sends the re-sync command. Unlike the first scheme, an NM that receives the re-sync command does not kill the application's running containers; it keeps running the tasks in those containers and reports the containers' running status to the RM. The RM then reconstructs the container instances and the related application state from the data it holds. As a result, after the RM restarts, tasks continue executing from the state they were in when the RM went down.

2.3.3  Storage medium of RM  state data 

        If the RM restart mechanism is enabled, the RM that is promoted to the Active state initializes its internal state and recovers the state left by the previous active RM; this relies on the RM restart feature. Jobs that were previously submitted to the RM will start a new attempt. Applications submitted by users can consider performing periodic checkpoints to avoid losing work.

        In essence, the RM restart mechanism writes the RM's internal state information to an external storage medium. The state directory is initialized when the RM starts, and the relevant state is written into it while applications run. If the RM fails over or restarts, it can recover its state from the external storage. RM state storage is implemented by the RMStateStore abstract class, and YARN provides several RMStateStore implementations:

There are five state-store implementations (subclasses of RMStateStore), compared below:

| State store | Description |
| ----------- | ----------- |
| Memory | MemoryRMStateStore is a memory-based state-store implementation that uses an RMState object to hold all of the RM's state. |
| ZooKeeper | ZKRMStateStore is a ZooKeeper-based state-store implementation that supports RM HA. It is the only state store that supports the fencing mechanism, which avoids the split-brain situation in which multiple active RMs edit the state store at the same time. It is the recommended store for YARN HA. |
| FileSystem | FileSystemRMStateStore supports state stores based on HDFS and the local file system. The fencing mechanism is not supported. |
| LevelDB | LeveldbRMStateStore is a LevelDB-based state-store implementation; it is lighter weight than the HDFS- and ZooKeeper-based stores. LevelDB supports better atomic operations, fewer I/O operations per state update, and far fewer total files on the file system. The fencing mechanism is not supported. |
| Null | NullRMStateStore is an empty implementation. |
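For comparison, if you wanted the HDFS-backed store instead of ZooKeeper, the yarn-site.xml entries would look roughly like this (a sketch only; this tutorial uses ZKRMStateStore below, and the hdfs:// URI here is just an illustrative path):

<!-- Use the file-system-based state store (sketch, not used in this tutorial) -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<!-- Where the RM state is kept on the file system (illustrative URI) -->
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://hadoop01:8020/yarn/system/rmstore</value>
</property>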

2.4 ZKRMStateStore 

        All of the RM's state information is stored under /rmstore/ZKRMStateRoot in ZooKeeper. It mainly holds the RM's resource reservation information, application information, application Token information, and the RM version.

| ZNode name | Description |
| ---------- | ----------- |
| ReservationSystemRoot | The RM's resource reservation system; the corresponding implementation is a subclass of the ReservationSystem interface. |
| RMAppRoot | Application information; the corresponding implementation is a subclass of the RMApp interface. |
| AMRMTokenSecretManagerRoot | The Token information of ApplicationAttempts. The RM keeps each Token in local memory until the application finishes, and persists it to ZooKeeper so it survives a restart. The corresponding implementation is the AMRMTokenSecretManager class. |
| EpochNode | The RM restart epoch, incremented each time the RM restarts; it is used to guarantee the uniqueness of ContainerIds. The corresponding implementation is the Epoch abstract class. |
| RMDTSecretManagerRoot | An RM-specific delegation-token secret manager, responsible for generating and accepting the secret for each token. |
| RMVersionNode | RM version information. |
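Once the restart mechanism below is enabled, you can inspect these znodes with the ZooKeeper CLI (a sketch; /rmstore is the default parent path for ZKRMStateStore, configured via yarn.resourcemanager.zk-state-store.parent-path):

[root@hadoop01 ~]# zkCli.sh -server hadoop01:2181
# Inside the zkCli shell:
ls /rmstore
ls /rmstore/ZKRMStateRoot
ls /rmstore/ZKRMStateRoot/RMAppRoot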

2.5 Configuration 

2.5.1 yarn-site.xml 

Configure and enable the RM restart feature, using ZooKeeper to store the RM state data. 

[root@hadoop01 ~]# cd /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop/
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# vim yarn-site.xml 
<!-- ZooKeeper ensemble used by YARN -->
<property>
    <name>hadoop.zk.address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<!-- Enable RM state recovery on restart -->
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<!-- Store the RM state data in ZooKeeper -->
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# scp -r yarn-site.xml root@hadoop02:$PWD
yarn-site.xml                                                                                                         100% 2307     1.1MB/s   00:00    
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# scp -r yarn-site.xml root@hadoop03:$PWD
yarn-site.xml 

[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# stop-yarn.sh 
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# start-yarn.sh 

2.6 Demonstration

2.6.1  The phenomenon of enabling  the RM  restart mechanism

With work-preserving restart enabled, if the RM fails and is restarted, running jobs continue to execute. 
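The earlier experiment can be repeated to see the difference (a sketch; process IDs and output will differ on your machines):

[root@hadoop02 ~]# yarn jar hadoop-mapreduce-examples-3.2.4.jar pi 10 10
# While the job is running, stop and restart the active RM on hadoop01
[root@hadoop01 ~]# yarn --daemon stop resourcemanager
[root@hadoop01 ~]# yarn --daemon start resourcemanager
# The application is still tracked and, with work-preserving restart, keeps running
[root@hadoop01 ~]# yarn application -list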

3. YARN HA  cluster 

3.1 Background 

        The ResourceManager is responsible for resource management and application scheduling. It is the core component of YARN and the main role of the cluster. Before Hadoop 2.4, the ResourceManager was the SPOF (Single Point of Failure) of a YARN cluster. To solve the RM single point of failure, YARN provides a ResourceManager HA architecture in Active/Standby mode.

        At runtime, multiple ResourceManagers exist at the same time to add redundancy and eliminate this single point of failure, but only one ResourceManager is in the Active state; the others are Standby. When the Active node stops working normally, a new Active node is elected from the remaining Standby nodes through competition.

3.2 Architecture 

        The officially recommended solution is to implement YARN HA based on a ZooKeeper cluster. The keys to an HA cluster are synchronizing state data between the active and standby nodes and switching between them smoothly (the failover mechanism). For data synchronization, ZooKeeper can store the cluster's shared state data, since ZooKeeper is essentially a small-file storage system. Active/standby switching can be done manually or automatically based on ZooKeeper. 

3.3 Failover mechanism 

  • Option 1: Manual failover

Administrators use commands to switch the RM state by hand (see the command sketch below).

  • Option 2: Automatic failover

The RM can optionally embed the ZooKeeper-based ActiveStandbyElector to perform automatic failover.

        YARN's automatic failover does not need a separate ZKFC daemon as HDFS does, because the ActiveStandbyElector is a thread embedded in the RM that acts as the failure detector and leader elector, rather than a standalone ZKFC daemon.
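For reference, manual failover uses the yarn rmadmin transition commands (a sketch; rm1/rm2 are the logical RM IDs configured later in this tutorial, and --forcemanual is required when automatic failover is enabled):

[root@hadoop01 ~]# yarn rmadmin -getServiceState rm1
[root@hadoop01 ~]# yarn rmadmin -getServiceState rm2
# Manually demote rm1 and promote rm2
[root@hadoop01 ~]# yarn rmadmin -transitionToStandby --forcemanual rm1
[root@hadoop01 ~]# yarn rmadmin -transitionToActive --forcemanual rm2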

3.4  Failover principle (based on  zk  automatic switching) 

  • Create the lock node: a lock node named ActiveStandbyElectorLock is created on ZooKeeper. When the RMs start, they all compete to create this ephemeral lock node, and ZooKeeper guarantees that only one RM succeeds. The RM that creates it successfully switches to the Active state; the others switch to Standby.
  • Register a Watcher: the Standby RMs register a node-change Watcher on the ActiveStandbyElectorLock node. Because it is an ephemeral node (it disappears automatically when the session ends), they can quickly detect the state of the Active RM.
  • Prepare to switch: when the Active RM fails (for example, it goes down or loses network connectivity), the lock node it created on ZooKeeper is deleted. The Standby RMs are notified by the ZooKeeper server through the Watcher event and then compete to create the lock node again; the one that succeeds becomes Active and the rest remain Standby.
  • Fencing (isolation): in a distributed environment, a machine can appear dead without being dead (commonly due to long GC pauses, network interruption, or high CPU load) and fail to respond in time. If an Active RM appears dead and the other RMs elect a new Active RM, the "dead" RM may come back and still believe it is Active. This is the split-brain phenomenon: multiple RMs in the Active state at once. A fencing mechanism is used to solve this problem.

        YARN's fencing mechanism uses the ACL permission control of ZooKeeper data nodes to isolate the RMs from one another. The root ZNode is created with ZooKeeper ACL information so that it is owned exclusively and other RMs cannot update it. With this mechanism, when the formerly "dead" RM recovers and tries to update the ZooKeeper data, it finds that it no longer has permission to do so and switches itself to the Standby state.
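After the HA cluster below is running, the election nodes can be seen on ZooKeeper (a sketch; /yarn-leader-election is the default yarn.resourcemanager.ha.automatic-failover.zk-base-path, and yarn_cluster is the cluster-id configured in the next section):

[root@hadoop01 ~]# zkCli.sh -server hadoop01:2181
# Inside the zkCli shell:
ls /yarn-leader-election
ls /yarn-leader-election/yarn_cluster
# Expect the ephemeral ActiveStandbyElectorLock node plus the ActiveBreadCrumb node used for fencing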

3.5 Construction steps 

Deployment is based on the HDFS HA high-availability cluster: HDFS HA High Availability Cluster Construction Detailed Graphic Tutorial_Stars.Sky's Blog-CSDN Blog 

3.5.1 Modify  yarn-site.xml

[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# vim yarn-site.xml 
<configuration>
<!-- Enable RM HA -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<!-- Identifier of the RM HA cluster -->
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn_cluster</value>
</property>
<!-- Logical IDs of the RMs in the HA cluster -->
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<!-- Host that runs rm1 -->
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>hadoop01</value>
</property>
<!-- Host that runs rm2 -->
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>hadoop02</value>
</property>
<!-- rm1 Web UI address -->
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>hadoop01:8088</value>
</property>
<!-- rm2 Web UI address -->
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>hadoop02:8088</value>
</property>
<!-- Enable automatic failover -->
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<!-- Auxiliary service run on the NodeManager. Must be set to mapreduce_shuffle to run MapReduce jobs. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!-- Minimum memory allocated per container request, in MB. -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<!-- Maximum memory allocated per container request, in MB. -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
<!-- Ratio of virtual memory to physical memory for containers. -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
<!-- Enable YARN log aggregation so that each container's logs are collected in one central place -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<!-- Retain aggregated logs for one day -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>86400</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://hadoop01:19888/jobhistory/logs</value>
</property>
<!-- ZooKeeper ensemble -->
<property>
  <name>hadoop.zk.address</name>
  <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<!-- Enable the RM state recovery / restart mechanism -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<!-- Store RM state data in the ZooKeeper ensemble -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
</configuration>

3.5.2 Synchronize the yarn-site.xml configuration file across the cluster 

[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# scp -r yarn-site.xml root@hadoop02:$PWD
yarn-site.xml                                                                                                         100% 3292   716.1KB/s   00:00    
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# scp -r yarn-site.xml root@hadoop03:$PWD
yarn-site.xml                                                                                                         100% 3292   867.1KB/s   00:00    

3.5.3 Start 

On hadoop01, start the YARN cluster; you can see that two RM processes have been started:
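A rough outline of the commands (a sketch; your jps output will differ):

[root@hadoop01 ~]# start-yarn.sh
# Verify that a ResourceManager process is running on both RM hosts
[root@hadoop01 ~]# jps | grep ResourceManager
[root@hadoop02 ~]# jps | grep ResourceManager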

3.5.4  Status Check 

View the HA YARN cluster status:

[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# yarn rmadmin -getAllServiceState
hadoop01:8033                                      standby   
hadoop02:8033                                      active

3.5.5 Web UI view

Log in to the Web UI pages of the two machines where the RMs are running:

        If you open hadoop01:8088, the browser is automatically redirected to hadoop02:8088, because hadoop02 is currently in the Active state and only the Active RM serves requests.
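You can also observe this from the command line (a sketch; expect a redirect response whose Location header points at the active RM's address):

[root@hadoop01 ~]# curl -s -o /dev/null -D - http://hadoop01:8088/cluster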

3.5.6  Automatic failover 

        Forcibly kill the RM on the hadoop02 node. Based on ZooKeeper's ActiveStandbyElector automatic failover, the RM on hadoop01 is elected to the Active state, which shows that failover is configured correctly. 

[root@hadoop02 ~]# yarn rmadmin -getAllServiceState
hadoop01:8033                                      standby   
hadoop02:8033                                      active    
[root@hadoop02 ~]# jps
3120 QuorumPeerMain
3267 NameNode
3348 DataNode
8532 Jps
8005 NodeManager
3451 JournalNode
3548 DFSZKFailoverController
7932 ResourceManager
[root@hadoop02 ~]# kill -9 7932
[root@hadoop02 ~]# yarn rmadmin -getAllServiceState
hadoop01:8033                                      active    
2023-09-05 15:38:30,727 INFO ipc.Client: Retrying connect to server: hadoop02/192.168.170.137:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
hadoop02:8033                                      Failed to connect: Call From hadoop02/192.168.170.137 to hadoop02:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

# Restart the RM on hadoop02
[root@hadoop02 ~]# yarn --daemon start resourcemanager
[root@hadoop02 ~]# yarn rmadmin -getAllServiceState
hadoop01:8033                                      active    
hadoop02:8033                                      standby   

Previous article: HDFS HA High Availability Cluster Construction Detailed Graphical Tutorial_Stars.Sky's Blog-CSDN Blog
