Table of contents
1. HDFS cluster rolling upgrade
1.2.1 Non-federated HA cluster
1.2.1.1 Rolling upgrade preparation
1.2.1.2 Upgrading the Active and Standby NNs
1.2.1.4 Complete rolling upgrade
2. HDFS cluster downgrade and rollback
2.1 The difference between downgrade and rollback
2.2 HA cluster downgrade (downgrade)
2.2.2 Downgrade Active NameNode and Standby NameNode
2.2.3 Confirmation of downgrade operation
2.2.4 HA cluster downgrade (downgrade) considerations
2.3 Cluster rollback (rollback) operation
1. HDFS cluster rolling upgrade
1.1 Introduction
In Hadoop v2, HDFS supports NameNode High Availability (HA), which makes it feasible to upgrade HDFS without downtime. Note that rolling upgrades are supported only from Hadoop 2.4.0 onwards. To upgrade an HDFS cluster without downtime, the cluster must be set up with HA.
In an HA cluster, there are two or more NameNodes (NN), many DataNodes (DN), a few JournalNodes (JN), and a few ZooKeeper nodes (ZKN). JNs are relatively stable and, in most cases, do not need to be upgraded when HDFS is upgraded.
The rolling upgrade described here covers only NNs and DNs, not JNs or ZKNs. Upgrading JNs and ZKNs may cause cluster downtime.
1.2 Non-stop rolling upgrade
1.2.1 Non-federated HA cluster
Suppose there are two NameNodes, NN1 and NN2, in the Active and Standby states respectively.
1.2.1.1 Rolling upgrade preparation
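# Create a new fsimage file for rollback
hdfs dfsadmin -rollingUpgrade prepare
# Run the following command repeatedly to check whether the rollback fsimage has been created.
# The message "Proceed with rolling upgrade" indicates that it is ready.
hdfs dfsadmin -rollingUpgrade query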
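Re-running the query by hand can be replaced with a small polling loop. The following is a minimal sketch, not part of the official procedure: the function name `wait_for_rollback_image` and its two arguments are hypothetical, and it assumes that readiness is reported with the text "Proceed with rolling upgrade" in the command output.

```shell
# wait_for_rollback_image [max_attempts] [sleep_seconds]
# Polls "hdfs dfsadmin -rollingUpgrade query" until the rollback image is ready.
# Assumption: readiness is reported as "Proceed with rolling upgrade".
wait_for_rollback_image() {
    wfri_max="${1:-60}"      # give up after this many checks
    wfri_sleep="${2:-10}"    # seconds to wait between checks
    wfri_try=1
    while [ "$wfri_try" -le "$wfri_max" ]; do
        if hdfs dfsadmin -rollingUpgrade query 2>&1 \
                | grep -q "Proceed with rolling upgrade"; then
            echo "rollback image ready after $wfri_try attempt(s)"
            return 0
        fi
        sleep "$wfri_sleep"
        wfri_try=$((wfri_try + 1))
    done
    echo "rollback image not ready after $wfri_max attempts" >&2
    return 1
}
```

For example, `wait_for_rollback_image 30 20` checks up to 30 times at 20-second intervals and returns a non-zero status if the image is still not ready, so it can gate the next step of a script.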
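1.2.1.2 Upgrading the Active and Standby NNs
# Shut down NN2:
hdfs --daemon stop namenode
# Upgrade NN2, then start it with the -rollingUpgrade started option:
hdfs --daemon start namenode -rollingUpgrade started
# Fail over so that NN2 becomes the Active node and NN1 becomes the Standby node.
# Shut down NN1:
hdfs --daemon stop namenode
# Upgrade NN1, then start it with the -rollingUpgrade started option:
hdfs --daemon start namenode -rollingUpgrade started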
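The failover step in the middle of this sequence has no command attached. One way to perform and verify it is sketched below using `hdfs haadmin`. The function name `failover_and_verify` is hypothetical, and the IDs passed to it (for example `nn1` and `nn2`) are assumed to be the NameNode IDs configured under `dfs.ha.namenodes.<nameservice>` in your cluster.

```shell
# failover_and_verify <from_id> <to_id>
# Fails over from one NameNode to the other and confirms the new Active node.
# Assumption: the IDs are the ones configured in dfs.ha.namenodes.<nameservice>.
failover_and_verify() {
    fav_from="$1"
    fav_to="$2"
    hdfs haadmin -failover "$fav_from" "$fav_to" || return 1
    # -getServiceState prints "active" or "standby" for the given NameNode ID.
    fav_state=$(hdfs haadmin -getServiceState "$fav_to")
    if [ "$fav_state" = "active" ]; then
        echo "$fav_to is now active"
    else
        echo "$fav_to is in state '$fav_state', expected active" >&2
        return 1
    fi
}
```

With NN1 Active and the upgraded NN2 on Standby, `failover_and_verify nn1 nn2` would be run before NN1 is shut down, so that an Active node keeps serving clients.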
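1.2.1.3 Upgrading DNs
# Select a small subset of the DataNodes to upgrade (for example, all DataNodes in one rack).
# Shut down a selected DN for upgrade. IPC_PORT is set by the dfs.datanode.ipc.address parameter; the default is 9867.
hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> upgrade
# Check whether the DataNode has stopped serving. If node information is still returned, the node has not actually shut down yet.
hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>
# Upgrade the DN software, then start the DN.
hdfs --daemon start datanode
# Perform the steps above for every selected DN, then repeat with further subsets until all DNs in the cluster have been upgraded.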
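The shutdown and the "is it really down yet?" check can be combined into one helper. This is a sketch under one stated assumption: `hdfs dfsadmin -getDatanodeInfo` exits with a non-zero status once the DataNode no longer answers on its IPC port. The function name `shutdown_datanode_for_upgrade` and its arguments are hypothetical.

```shell
# shutdown_datanode_for_upgrade <host:ipc_port> [max_attempts] [sleep_seconds]
# Asks a DataNode to shut down for upgrade and waits until it stops answering.
# Assumption: -getDatanodeInfo exits non-zero once the DataNode is down.
shutdown_datanode_for_upgrade() {
    sdn_addr="$1"
    sdn_max="${2:-30}"
    sdn_sleep="${3:-5}"
    hdfs dfsadmin -shutdownDatanode "$sdn_addr" upgrade || return 1
    sdn_try=1
    # While node information is still returned, the DataNode is still up.
    while hdfs dfsadmin -getDatanodeInfo "$sdn_addr" >/dev/null 2>&1; do
        if [ "$sdn_try" -ge "$sdn_max" ]; then
            echo "$sdn_addr still running after $sdn_max checks" >&2
            return 1
        fi
        sleep "$sdn_sleep"
        sdn_try=$((sdn_try + 1))
    done
    echo "$sdn_addr is down; upgrade the software and start the DataNode"
}
```

A call such as `shutdown_datanode_for_upgrade dn1.example.com:9867` (the host name is a placeholder) returns only once the node is down, after which the software can be replaced and `hdfs --daemon start datanode` run on that host.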
1.2.1.4 Complete rolling upgrade
# Finalize the rolling upgrade
hdfs dfsadmin -rollingUpgrade finalize
1.2.2 Federation HA Cluster
A federated cluster is a cluster with multiple namespaces. Each namespace is served by a pair of Active and Standby NameNodes. Such a cluster is commonly known as a federation + HA cluster.
The upgrade process for a federated cluster is essentially the same as for a non-federated cluster, except that the upgrade operations must be repeated for each namespace.
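#1. Prepare the upgrade in every namespace
hdfs dfsadmin -rollingUpgrade prepare
#2. Upgrade the Active/Standby nodes of every namespace
#2.1 Shut down NN2:
hdfs --daemon stop namenode
#2.2 Upgrade NN2, then start it with the -rollingUpgrade started option:
hdfs --daemon start namenode -rollingUpgrade started
#2.3 Fail over so that NN2 becomes the Active node and NN1 becomes the Standby node.
#2.4 Shut down NN1:
hdfs --daemon stop namenode
#2.5 Upgrade NN1, then start it with the -rollingUpgrade started option:
hdfs --daemon start namenode -rollingUpgrade started
#3. Upgrade every DataNode
#3.1 Shut down a selected DN for upgrade. IPC_PORT is set by the dfs.datanode.ipc.address parameter; the default is 9867.
hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> upgrade
#3.2 Check whether the DataNode has stopped serving. If node information is still returned, the node has not actually shut down yet.
hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>
#3.3 Upgrade the DN software, then start the DN.
hdfs --daemon start datanode
#4. Once the upgrade is done, run the finalize command in every namespace
hdfs dfsadmin -rollingUpgrade finalize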
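Running the same `-rollingUpgrade` action "in every namespace" can be scripted by iterating over the configured nameservices. The sketch below is one possible approach, not a documented recipe: the function name `for_each_nameservice` is hypothetical, and it assumes that every entry of `dfs.nameservices` can be addressed as `hdfs://<nameservice>` through the generic `-fs` option.

```shell
# for_each_nameservice <action>    action: prepare | query | finalize
# Runs "hdfs dfsadmin -rollingUpgrade <action>" against every nameservice.
# Assumption: each nameservice is reachable as hdfs://<nameservice>.
for_each_nameservice() {
    fen_action="$1"
    # dfs.nameservices is a comma-separated list, for example "ns1,ns2".
    fen_list=$(hdfs getconf -confKey dfs.nameservices) || return 1
    for fen_ns in $(echo "$fen_list" | tr ',' ' '); do
        echo "== $fen_ns: -rollingUpgrade $fen_action"
        hdfs dfsadmin -fs "hdfs://$fen_ns" -rollingUpgrade "$fen_action" || return 1
    done
}
```

`for_each_nameservice prepare` would then cover step 1 and `for_each_nameservice finalize` step 4. The NameNode and DataNode restarts in between still have to be carried out host by host.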
1.3 Downtime upgrade
1.3.1 Non-HA cluster
Upgrading a non-HA cluster inevitably involves a short service outage, because the NameNode must be restarted and no standby node is available during that time. The overall process follows the same four steps as the non-federated HA case, except that step 2 is slightly modified:
#Step 1: Prepare the rolling upgrade
#Step 2: Upgrade the NN and SNN
#1. Shut down the NN
hdfs --daemon stop namenode
#2. Upgrade the NN and start it with the -rollingUpgrade started option
hdfs --daemon start namenode -rollingUpgrade started
#3. Stop the SNN
hdfs --daemon stop secondarynamenode
#4. Upgrade the SNN and start it normally
hdfs --daemon start secondarynamenode
#Step 3: Upgrade the DNs
#Step 4: Finalize the rolling upgrade
hdfs dfsadmin -rollingUpgrade finalize
2. HDFS cluster downgrade and rollback
2.1 The difference between downgrade and rollback
- Common ground:
Both return the software to the version that was running before the upgrade;
Neither is allowed once the upgrade has been finalized.
- Differences:
A downgrade can be performed in a rolling fashion, whereas a rollback requires the service to be stopped for a period of time;
A downgrade restores only the software version to the pre-upgrade one and keeps the user's current data state;
A rollback restores the user data to its pre-upgrade state, and the current data state is not preserved.
Friendly reminder: be cautious about upgrading, and even more cautious about downgrading and rolling back.
In a production environment, research the upgrade thoroughly beforehand and evaluate the compatibility of the new version with existing services. Rehearse the complete upgrade process in a test environment, and back up the pre-upgrade cluster state to guard against an unexpected outage. Do not count on rollback or downgrade to save the cluster when an upgrade fails.
2.2 HA cluster downgrade (downgrade)
If the upgraded version is not wanted or, in some unlikely cases, the upgrade fails (for example, due to a bug in the newer version), administrators can choose either to downgrade HDFS to the pre-upgrade version, or to roll HDFS back to both the pre-upgrade version and the pre-upgrade state.
Note that a downgrade can be done in a rolling fashion, but a rollback cannot. Rollback requires cluster downtime.
Note also that downgrade and rollback are possible only after a rolling upgrade has been started and before the upgrade has been terminated. An upgrade can be terminated by finalizing it, downgrading, or rolling back. Therefore, it may not be possible to roll back after a finalize or a downgrade, or to downgrade after a finalize.
2.2.1 Downgrade DataNodes
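#1. Select a subset of the DataNodes (for example, all DataNodes in one rack)
# Shut down a selected DN for the downgrade. IPC_PORT is set by the dfs.datanode.ipc.address parameter; the default is 9867.
hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> upgrade
# Run this command to check whether the node has stopped completely
hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>
# Downgrade the DN software, then start the DN normally
hdfs --daemon start datanode
# Repeat the steps above on the other DataNodes in the selected subset, then on further subsets until all DataNodes are downgraded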
2.2.2 Downgrade Active NameNode and Standby NameNode
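# Stop and downgrade the Standby NameNode.
# Start the Standby NameNode normally
# Trigger a failover so that the Active and Standby roles are swapped
# Stop and downgrade the previously Active NameNode (now the Standby)
# Start it normally as the Standby node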
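The five steps above are given without commands. As a hedged illustration, the helper below prints them as an ordered checklist, using only commands that already appear in the upgrade sections plus `hdfs haadmin -failover`. It executes nothing. The function name `nn_downgrade_plan` is hypothetical, the NameNode IDs are assumed to be those configured in `dfs.ha.namenodes.<nameservice>`, and the bracketed prefix only indicates where each command would be run.

```shell
# nn_downgrade_plan <active_id> <standby_id>
# Prints the NameNode downgrade steps in order; it does not run them.
# Assumption: the IDs are those configured in dfs.ha.namenodes.<nameservice>.
nn_downgrade_plan() {
    ndp_active="$1"
    ndp_standby="$2"
    printf '%s\n' \
        "[$ndp_standby] hdfs --daemon stop namenode" \
        "[$ndp_standby] (restore the pre-upgrade software)" \
        "[$ndp_standby] hdfs --daemon start namenode" \
        "[any] hdfs haadmin -failover $ndp_active $ndp_standby" \
        "[$ndp_active] hdfs --daemon stop namenode" \
        "[$ndp_active] (restore the pre-upgrade software)" \
        "[$ndp_active] hdfs --daemon start namenode"
}
```

Unlike the upgrade, both NameNodes are started normally here, without the `-rollingUpgrade started` option. `nn_downgrade_plan nn1 nn2` prints the checklist for a cluster where `nn1` is currently Active.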
2.2.3 Confirmation of downgrade operation
# Finalize the downgrade
hdfs dfsadmin -rollingUpgrade finalize
2.2.4 HA cluster downgrade (downgrade) considerations
Downgrading and upgrading have one thing in common in HA mode: NameNode operations start with the Standby node. Once the Standby node has been upgraded or downgraded, a failover is performed so that the other node can be upgraded or downgraded in turn. Throughout the whole process, one Active node is always kept available to serve clients.
The order in which NameNodes and DataNodes are handled during a downgrade is the exact opposite of the order during an upgrade. A new version is generally compatible with the old version in terms of protocol and API, but not the other way around. If the NN were downgraded first, the DNs would be running the new version while the NN ran the old one, and many protocols used by the new DNs might not be supported by the old NN. Therefore, the DNs must be downgraded first, followed by the NN, which acts as the server side. This compatibility constraint is the deeper reason behind what looks like a simple reversal of order.
The downgrade of federated clusters and non-HA clusters mirrors the corresponding upgrade procedure; just substitute the corresponding commands.
2.3 Cluster rollback (rollback) operation
Notes on rollback: a rollback cannot be performed in a rolling fashion. The cluster must stop serving clients for the duration of the operation.
A rollback returns not only the software to the pre-upgrade version, but also the user data to its pre-upgrade state.
Rollback steps:
#1. Stop all NameNodes and DataNodes
#2. Restore the pre-upgrade software version on all machines
#3. Start NN1 with the -rollingUpgrade rollback option, as the Active node
#4. Run -bootstrapStandby on NN2 and start NN2 normally, as the Standby node
#5. Start all DataNodes with the -rollback option
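Steps 3 to 5 differ only in the start command each kind of node needs. The sketch below groups them in one helper, to be run on each host after the daemons have been stopped and the pre-upgrade software restored. The function name `start_after_rollback` and its role names are hypothetical, and the `hdfs --daemon` form is assumed to match the syntax used elsewhere in this article.

```shell
# start_after_rollback <role>    role: nn1 | nn2 | datanode
# Starts the local daemon with the option the rollback procedure calls for.
# Assumption: daemons are stopped and the old software is already restored.
start_after_rollback() {
    case "$1" in
        nn1)
            # NN1 comes back as Active and rolls the namespace back.
            hdfs --daemon start namenode -rollingUpgrade rollback ;;
        nn2)
            # NN2 copies the namespace from NN1, then starts as Standby.
            hdfs namenode -bootstrapStandby && hdfs --daemon start namenode ;;
        datanode)
            hdfs --daemon start datanode -rollback ;;
        *)
            echo "usage: start_after_rollback nn1|nn2|datanode" >&2
            return 2 ;;
    esac
}
```

The order matters: run `start_after_rollback nn1` first, then `start_after_rollback nn2`, and only then `start_after_rollback datanode` on each DataNode host.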