Table of contents
1. YARN cluster roles and deployment planning
1.1 Cluster roles: overview
1.2 Cluster role: ResourceManager (RM)
1.3 Cluster role: NodeManager (NM)
1.4 HA cluster deployment planning
2. YARN RM restart mechanism
2.1 Overview
2.2 Demonstration
2.2.1 Behavior with the RM restart mechanism disabled
2.3 Two implementation schemes and their differences
2.3.1 Non-work-preserving RM restart
2.3.2 Work-preserving RM restart
2.3.3 Storage media for RM state data
2.4 ZKRMStateStore
2.5 Configuration
2.5.1 yarn-site.xml
2.6 Demonstration
2.6.1 Behavior with the RM restart mechanism enabled
3. YARN HA cluster
3.1 Background
3.2 Architecture
3.3 Failover mechanisms
3.4 Failover principle (ZooKeeper-based automatic switching)
3.5 Setup steps
3.5.1 Modify yarn-site.xml
3.5.2 Synchronize the yarn-site.xml configuration file across the cluster
3.5.3 Start
3.5.4 Status check
3.5.5 Web UI check
3.5.6 Automatic failover
1. YARN cluster roles and deployment planning
1.1 Cluster roles: overview
Apache Hadoop YARN is a standard master/slave cluster: the ResourceManager (RM) is the master and the NodeManagers (NM) are the slaves. The common deployment is one master with many slaves, and you can also build an HA high-availability cluster of RMs.
1.2 Cluster role: ResourceManager (RM)
The RM is the master role in YARN and the final arbiter of resource allocation among all applications in the system. It accepts job submissions from users and, through the NodeManagers, allocates and manages the computing resources on each machine. Resources are handed out in the form of containers.
The RM also contains a pluggable scheduler component, which is responsible for allocating resources to running applications according to the configured scheduling policy.
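Which scheduler the RM uses is set in yarn-site.xml. A minimal sketch (the property name comes from the stock yarn-default.xml; the CapacityScheduler shown here is the usual default, and this snippet is illustrative rather than part of the cluster built below):

<!-- Choose the scheduler implementation (CapacityScheduler is the usual default) -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>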
1.3 Cluster role: NodeManager (NM)
The NM is the slave role in YARN, one per machine, responsible for managing the computing resources on that machine. Following the RM's commands, the NM launches containers, monitors their resource usage, and reports that usage back to the RM master role.
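Once the cluster is running, you can see the registered NMs and what they report from the YARN CLI (a quick check; yarn node is part of the stock YARN commands):

[root@hadoop01 ~]# yarn node -list -all
# prints one row per registered NodeManager: node id, node state,
# web address, and the number of containers currently running on it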
1.4 HA cluster deployment planning
In theory, a YARN cluster can be deployed on any machines, but in practice the NodeManagers are usually co-located with the HDFS DataNodes: where there is data, computation is likely to follow, and moving programs is cheaper than moving data.
As part of Apache Hadoop, a YARN cluster is usually built together with an HDFS cluster.
| IP | Server | Roles |
| --- | --- | --- |
| 192.168.170.136 | hadoop01 | namenode, datanode, resourcemanager, nodemanager |
| 192.168.170.137 | hadoop02 | namenode, resourcemanager, secondarynamenode, datanode, nodemanager |
| 192.168.170.138 | hadoop03 | datanode, nodemanager |
First install the Hadoop cluster: HDFS HA High Availability Cluster Construction Detailed Graphic Tutorial_Stars.Sky's Blog-CSDN Blog
2. YARN RM restart mechanism
2.1 Overview
ResourceManager is responsible for resource management and application scheduling; it is the core component of YARN and a single point of failure. The ResourceManager restart mechanism is a feature that lets the YARN cluster keep working while the RM restarts, making the RM outage invisible to users. Note that the mechanism does not restart the RM automatically (the RM must be brought back manually), so by itself it does not solve the single-point-of-failure problem.
2.2 Demonstration
2.2.1 Behavior with the RM restart mechanism disabled
First execute a program:
[root@hadoop02 ~]# cd /bigdata/hadoop/server/hadoop-3.2.4/share/hadoop/mapreduce/
[root@hadoop02 /bigdata/hadoop/server/hadoop-3.2.4/share/hadoop/mapreduce]# yarn jar hadoop-mapreduce-examples-3.2.4.jar pi 10 10
Then quickly kill the RM while the job is running:
[root@hadoop01 ~]# jps
6467 Jps
5975 NodeManager
4744 QuorumPeerMain
5432 JournalNode
6440 JobHistoryServer
5833 ResourceManager
5038 NameNode
5182 DataNode
5631 DFSZKFailoverController
[root@hadoop01 ~]# kill -9 5833
# then manually restart the RM
[root@hadoop01 ~]# yarn --daemon start resourcemanager
After the restart, the previous state data is gone, and the program fails to finish.
Summary: if the RM fails and is restarted without the restart mechanism, the previous application information disappears and in-flight jobs fail.
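One way to confirm the loss from the command line (a quick check, assuming the RM was just restarted without recovery enabled):

[root@hadoop01 ~]# yarn application -list -appStates ALL
# the pi job submitted before the kill no longer appears in the output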
2.3 Two implementation schemes and their differences
- Non-work-preserving RM restart: implemented in the Hadoop 2.4.0 release.
- Work-preserving RM restart: implemented in the Hadoop 2.6.0 release.
With a non-work-preserving restart, the RM saves only each application's submission information and final execution status, not any data produced while it runs. After the RM restarts, it therefore kills the tasks that were executing, resubmits the applications, and runs them from scratch.
With a work-preserving restart, the RM saves the applications' running-state data, so after a restart it does not need to kill the previous tasks: they continue from the progress they had already made.
2.3.1 Non-work-preserving RM restart
When a client submits an application to the RM, the RM stores the application's metadata. The storage location is configurable: the local file system, HDFS, or ZooKeeper. The RM also records each application's final status (failed, killed, finished), and when running in a secure environment it additionally stores the related credential files.
While the RM is down, the NodeManagers (NM) and clients keep trying to reconnect so that they notice as soon as the RM is back. When the RM restarts, it sends a re-sync (re-synchronization) command to all NMs and ApplicationMasters (AM). On receiving re-sync, an NM kills all of its running containers and re-registers with the RM; from the RM's point of view, a re-registered NM looks just like a newly added node.
An AM kills itself when it receives the re-sync command. The RM then reads the stored application information and resubmits every application whose status was still running when the RM went down.
2.3.2 Work-preserving RM restart
The difference from the non-work-preserving scheme is that the RM records data covering the container's entire life cycle: application runtime data, the state of resource requests, queue resource usage, and so on.
When the RM restarts, it reads back this stored running state and sends the re-sync command as before. Unlike the first scheme, an NM that receives re-sync does not kill its running containers; it keeps running the tasks inside them and reports the containers' running state to the RM. From that data, the RM reconstructs the container instances and the related application state, so after the restart the tasks simply continue from the execution state they had when the RM went down.
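On top of the basic recovery switch shown later in 2.5.1, work-preserving restart has its own flag. A minimal sketch (property names from the stock yarn-default.xml; in recent Hadoop releases, including 3.2.x, work-preserving recovery defaults to true, so this is usually only needed to change the wait time):

<!-- Enable work-preserving recovery -->
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>
<!-- How long the scheduler waits after a restart before allocating new
     containers, giving the NMs time to re-register (milliseconds) -->
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
  <value>10000</value>
</property>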
2.3.3 Storage media for RM state data
When the restart mechanism is enabled, an RM promoted to the Active state initializes its internal state and restores the state left behind by the previous active RM; this recovery relies on the RM restart feature. Jobs previously submitted to the RM start a new attempt, so user applications may want to checkpoint periodically to avoid losing work.
The restart mechanism essentially writes the RM's internal state to an external storage medium. The state directory is initialized when the RM starts, and the relevant state is written to it while applications run; after a failover or restart, the state can be recovered from the external store. State storage is modeled by the RMStateStore abstract class, and YARN ships several implementations of it.
There are five state storage implementations, compared below:
| State store | Description |
| --- | --- |
| Memory | MemoryRMStateStore is a memory-based implementation that keeps all RM state in an RMState object. |
| ZooKeeper | ZKRMStateStore is a ZooKeeper-based implementation that supports RM HA. It is the only state store with a fencing (isolation) mechanism, which prevents a split-brain situation in which multiple active RMs would edit the state store at the same time. Recommended for YARN HA. |
| FileSystem | FileSystemRMStateStore supports HDFS- and local-FS-based state storage. No fencing mechanism. |
| LevelDB | LeveldbRMStateStore is a LevelDB-based implementation, more lightweight than the HDFS and ZooKeeper stores: better atomic operations, fewer I/O operations per state update, and far fewer files on the filesystem. No fencing mechanism. |
| Null | NullRMStateStore is an empty implementation. |
2.4 ZKRMStateStore
All RM state is stored under /rmstore/ZKRMStateRoot in ZooKeeper. It mainly holds the RM's resource reservation information, application information, application token information, and RM version information.
| ZNode name | Description |
| --- | --- |
| ReservationSystemRoot | The RM's resource reservation system; implemented by subclasses of the ReservationSystem interface. |
| RMAppRoot | Application information; implemented by subclasses of the RMApp interface. |
| AMRMTokenSecretManagerRoot | Token information for each ApplicationAttempt. The RM keeps each token in local memory until the application finishes, and persists it to ZooKeeper so it survives restarts. Implemented by the AMRMTokenSecretManager class. |
| EpochNode | Restart epoch information. The epoch is incremented on every RM restart and is used to keep ContainerIds unique. Implemented by the Epoch abstract class. |
| RMDTSecretManagerRoot | An RM-specific delegation token secret manager, responsible for generating and accepting a secret for each token. |
| RMVersionNode | RM version information. |
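On a live cluster with recovery enabled, these znodes can be inspected directly with the ZooKeeper CLI (a quick check, using the zkCli.sh shipped with ZooKeeper and the default /rmstore root):

[root@hadoop01 ~]# zkCli.sh -server hadoop01:2181
[zk: hadoop01:2181(CONNECTED) 0] ls /rmstore/ZKRMStateRoot
# lists the znodes described above, e.g. RMAppRoot, AMRMTokenSecretManagerRoot,
# EpochNode, RMDTSecretManagerRoot, RMVersionNode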
2.5 Configuration
2.5.1 yarn-site.xml
Enable the RM restart function and use ZooKeeper to store the state data:
[root@hadoop01 ~]# cd /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop/
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# vim yarn-site.xml
<!-- ZooKeeper quorum used to store RM state -->
<property>
  <name>hadoop.zk.address</name>
  <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<!-- Enable RM state recovery on restart -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<!-- Store RM state data in ZooKeeper -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# scp -r yarn-site.xml root@hadoop02:$PWD
yarn-site.xml 100% 2307 1.1MB/s 00:00
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# scp -r yarn-site.xml root@hadoop03:$PWD
yarn-site.xml
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# stop-yarn.sh
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# start-yarn.sh
2.6 Demonstration
2.6.1 Behavior with the RM restart mechanism enabled
Now if the RM fails and is restarted, a work-preserving restart lets the job continue executing.
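To reproduce this, rerun the experiment from 2.2.1 with the new configuration in place (the commands are the same; only the recovery settings differ):

[root@hadoop02 /bigdata/hadoop/server/hadoop-3.2.4/share/hadoop/mapreduce]# yarn jar hadoop-mapreduce-examples-3.2.4.jar pi 10 10
# while the job is running, kill the RM on hadoop01 (find the pid with jps),
# then restart it
[root@hadoop01 ~]# yarn --daemon start resourcemanager
# this time the application is recovered from ZooKeeper and runs to completion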
3. YARN HA cluster
3.1 Background
ResourceManager is responsible for resource management and application scheduling; it is the core component of YARN and the master role of the cluster. Before Hadoop 2.4, the ResourceManager was a SPOF (Single Point of Failure) in a YARN cluster. To solve this single point of failure, YARN provides a ResourceManager HA architecture in Active/Standby mode.
At runtime, multiple ResourceManagers exist simultaneously to add redundancy and eliminate the single point of failure, but only one is in the Active state while the others are Standby. When the Active node stops working properly, the remaining Standby nodes elect a new Active node.
3.2 Architecture
Hadoop's officially recommended solution is to implement YARN HA on top of a ZooKeeper cluster. The keys to an HA cluster are synchronizing state data between the active and standby nodes and switching between them smoothly (the failover mechanism). For data synchronization, ZooKeeper, which is essentially a small-file storage system, can store the cluster's shared state data. The active/standby switch can be performed manually or automatically via ZooKeeper.
3.3 Failover mechanisms
- Option 1: manual failover. Administrators switch the RM state by hand with commands (see the sketch after this list).
- Option 2: automatic failover. The RM can optionally embed the ZooKeeper-based ActiveStandbyElector to handle failover automatically.
Unlike HDFS, YARN's automatic failover does not need a separate ZKFC daemon: the ActiveStandbyElector is a thread embedded in the RM itself that acts as both failure detector and leader elector.
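For option 1, the switch is done with yarn rmadmin and the logical RM ids configured in 3.5.1 (a sketch; when automatic failover is enabled, these commands refuse to run unless --forcemanual is added):

[root@hadoop01 ~]# yarn rmadmin -transitionToStandby rm1
[root@hadoop01 ~]# yarn rmadmin -transitionToActive rm2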
3.4 Failover principle (ZooKeeper-based automatic switching)
- Create a lock node: a lock node named ActiveStandbyElectorLock is created on ZooKeeper. When the RMs start, they compete to create this ephemeral lock node, and ZooKeeper guarantees that only one succeeds. The RM that succeeds switches to the Active state; the others switch to Standby.
- Register a Watcher: each Standby RM registers a node-change Watcher on the ActiveStandbyElectorLock node. Thanks to the ephemeral-node property (the node disappears automatically when its session ends), the Standby RMs can quickly detect whether the Active RM is still alive.
- Prepare to switch: when the Active RM fails (for example, it goes down or loses the network), the lock node it created on ZooKeeper is deleted. The Standby RMs are notified through the Watcher event from the ZooKeeper server and start competing to create the lock node again; the winner becomes Active and the rest remain Standby.
- Fencing (isolation): in a distributed environment, a machine can appear dead while still running (commonly caused by long GC pauses, network interruptions, or high CPU load) and fail to respond in time. If the Active RM hangs this way, the other RMs elect a new Active RM; when the hung RM recovers, it still believes it is Active. That is the split-brain phenomenon in distributed systems: multiple RMs in the Active state at the same time. A fencing mechanism solves this class of problem.
YARN's fencing mechanism uses ZooKeeper's znode ACLs to isolate the RMs from one another: the root ZNode is created with ACL information so that the current Active RM holds it exclusively and other RMs cannot update it. With this mechanism in place, an RM coming back from a hang will try to update its data on ZooKeeper, find that it no longer has permission to modify the node, and switch itself to the Standby state.
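The ACL on the root znode can be checked from the ZooKeeper CLI (a quick check; the exact entries depend on your ZooKeeper ACL configuration):

[zk: hadoop01:2181(CONNECTED) 1] getAcl /rmstore/ZKRMStateRoot
# only the current Active RM holds create/delete permission here, so a fenced
# RM can read the state store but can no longer modify it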
3.5 Setup steps
This builds on the HDFS HA high-availability cluster: HDFS HA High Availability Cluster Construction Detailed Graphic Tutorial_Stars.Sky's Blog-CSDN Blog
3.5.1 Modify yarn-site.xml
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# vim yarn-site.xml
<configuration>
  <!-- Enable RM HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Identifier of the RM HA cluster -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn_cluster</value>
  </property>
  <!-- Logical ids of the RMs in the HA cluster -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Host running rm1 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
  </property>
  <!-- Host running rm2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
  </property>
  <!-- rm1 web UI address -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop01:8088</value>
  </property>
  <!-- rm2 web UI address -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop02:8088</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Auxiliary service run on the NodeManagers; must be mapreduce_shuffle to run MR jobs -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Minimum memory per container request (MB) -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <!-- Maximum memory per container request (MB) -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <!-- Ratio of container virtual memory to physical memory -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
  </property>
  <!-- Enable YARN log aggregation: collect each container's logs in one place -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Keep aggregated logs for one day -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop01:19888/jobhistory/logs</value>
  </property>
  <!-- ZooKeeper quorum -->
  <property>
    <name>hadoop.zk.address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <!-- Enable RM state recovery on restart -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Store RM state data in the ZooKeeper cluster -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>
3.5.2 Synchronize the yarn-site.xml configuration file across the cluster
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# scp -r yarn-site.xml root@hadoop02:$PWD
yarn-site.xml 100% 3292 716.1KB/s 00:00
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# scp -r yarn-site.xml root@hadoop03:$PWD
yarn-site.xml 100% 3292 867.1KB/s 00:00
3.5.3 Start
On hadoop01, start the YARN cluster; you will find that RM processes have been started on both hadoop01 and hadoop02.
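The start command is the same one used in 2.5.1; jps on each host then confirms the ResourceManager process (a quick check):

[root@hadoop01 ~]# start-yarn.sh
[root@hadoop01 ~]# jps | grep ResourceManager
[root@hadoop02 ~]# jps | grep ResourceManager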
3.5.4 Status check
Check the state of the HA YARN cluster:
[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# yarn rmadmin -getAllServiceState
hadoop01:8033 standby
hadoop02:8033 active
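A single RM can also be queried by the logical id configured in yarn.resourcemanager.ha.rm-ids:

[root@hadoop01 /bigdata/hadoop/server/hadoop-3.2.4/etc/hadoop]# yarn rmadmin -getServiceState rm1
standby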
3.5.5 Web UI check
Log in to the web UI of each machine running an RM.
Visiting hadoop01:8088 automatically redirects to hadoop02:8088, because hadoop02 is the Active RM and only the Active RM serves external requests.
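The redirect is also visible from the command line (a quick check; a standby RM answers web requests only to redirect them to the active RM):

[root@hadoop01 ~]# curl -sI http://hadoop01:8088/cluster
# the standby RM on hadoop01 responds with an HTTP redirect whose Location
# header points at http://hadoop02:8088/cluster, the active RM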
3.5.6 Automatic failover
Forcibly kill the RM on the hadoop02 node. Based on ZooKeeper's ActiveStandbyElector automatic failover strategy, the RM on hadoop01 is elected to the Active state, which shows that the failover configuration is correct.
[root@hadoop02 ~]# yarn rmadmin -getAllServiceState
hadoop01:8033 standby
hadoop02:8033 active
[root@hadoop02 ~]# jps
3120 QuorumPeerMain
3267 NameNode
3348 DataNode
8532 Jps
8005 NodeManager
3451 JournalNode
3548 DFSZKFailoverController
7932 ResourceManager
[root@hadoop02 ~]# kill -9 7932
[root@hadoop02 ~]# yarn rmadmin -getAllServiceState
hadoop01:8033 active
2023-09-05 15:38:30,727 INFO ipc.Client: Retrying connect to server: hadoop02/192.168.170.137:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
hadoop02:8033 Failed to connect: Call From hadoop02/192.168.170.137 to hadoop02:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
# restart the RM on hadoop02
[root@hadoop02 ~]# yarn --daemon start resourcemanager
[root@hadoop02 ~]# yarn rmadmin -getAllServiceState
hadoop01:8033 active
hadoop02:8033 standby