CDH big data node downtime test

1. Current status of the cluster

Cluster component description: Several big-data components are currently installed in the cluster, including storage-related components such as HDFS, HBase, and Hive, and data collection and processing components such as Flume, Spark, and Kafka.
[Screenshot: cluster component list in Cloudera Manager]
Cluster host description: There are currently 5 hosts in the cluster, which are 5 virtual machines running on the same physical host. To ensure normal distribution of HDFS replicas, the racks of hosts cdh1, cdh2, and cdh3 are set to "test1", and the racks of cdh4 and cdh5 are set to "test2".
[Screenshot: cluster host list with rack assignments]
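Once the racks are assigned in Cloudera Manager, the mapping HDFS actually sees can be checked from the command line. A minimal sketch (the rack labels follow the setup above; the exact topology paths shown depend on how CM maps the rack names):

```shell
# Print the network topology as seen by the NameNode.
# Each DataNode should appear under its assigned rack
# (e.g. /test1 for cdh1-cdh3, /test2 for cdh4-cdh5 in this setup).
hdfs dfsadmin -printTopology
```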

Note: Hadoop is designed with both data safety and efficiency in mind. By default, HDFS stores three copies of each data file. The placement strategy is: the
first replica is placed on one node; the
second replica is placed on a different node in the same rack as the first; the
third replica is placed on a node in a different rack.
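The placement described above can be inspected directly with `hdfs fsck`. A sketch, assuming `/user/test` stands in for any HDFS path that contains data:

```shell
# List files and blocks, and show the node and rack of every replica,
# so the three-replica placement policy can be verified block by block.
# /user/test is a placeholder path; substitute your own.
hdfs fsck /user/test -files -blocks -locations -racks
```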

2. Test content and objectives

When a node in the cluster goes completely down, the cluster should be restorable to its original state by adding a new host. This includes, but is not limited to, the following:
1. All data in the cluster remains unchanged and will not be lost.
2. The components in the cluster are guaranteed to operate normally.
3. The newly added host runs normally without exception.
We choose cdh3 as the host to take down. The services currently running on cdh3 are shown below. After cdh3 goes down, we need to install the same services on the new host to restore the cluster.
[Screenshot: roles running on cdh3]

Note: It is best to take a snapshot of all nodes in the cluster before testing. Don't ask me how I know.
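Since the five nodes are virtual machines, snapshots can be scripted on the hypervisor. A sketch assuming KVM/libvirt and that the VM domains are named cdh1 through cdh5 (neither is stated in the original, so adjust to your environment):

```shell
# Snapshot each cluster VM before the downtime test
# (assumes libvirt domains named after the cluster hosts).
for vm in cdh1 cdh2 cdh3 cdh4 cdh5; do
  virsh snapshot-create-as "$vm" "before-downtime-test"
done
```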

3. Node deletion
1. First, we take the cdh3 node "down", that is, we shut down cdh3. The cluster will then be in the following state:
[Screenshot: cluster state with cdh3 down]
2. Next, on the "Hosts" page, delete the downed cdh3 node. Uncheck "Skip management roles", since we need all components on this host to be completely removed.
[Screenshot: host deletion dialog]
3. After the deletion succeeds, the following is displayed:
[Screenshot: host deletion result]
Then we delete the host from the CM management platform, which removes its node completely.
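The same removal can be confirmed through the Cloudera Manager REST API. A sketch; the CM host name, credentials, and API version here are assumptions, not values from the original:

```shell
# List the hosts Cloudera Manager still knows about and confirm
# that cdh3 no longer appears (adjust server, version, credentials).
curl -s -u admin:admin 'http://cm-server:7180/api/v19/hosts' | grep '"hostname"'
```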
[Screenshot: removing the host from Cloudera Manager]
4. After the above operations are completed, only 4 nodes remain in the cluster. Its current status is as follows:
[Screenshot: cluster status with 4 remaining nodes]

Note: Oozie shows as abnormal in the cluster because the Oozie Server role was installed on cdh3. This has little impact on this test; we can add it back later.

4. Data verification
Check the health of the data in HDFS. You can see that the data is normal: no data was lost and no bad blocks appeared because of the downtime of cdh3.
When the cdh3 host was deleted, the cluster automatically rebalanced the data, re-creating on other nodes all the replicas that had been stored on cdh3.
[Screenshot: HDFS health check result]
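The health check can be repeated from the command line with standard HDFS tools (a sketch; what to look for in the output is noted in the comments):

```shell
# Overall HDFS health summary: look for "Status: HEALTHY" and
# zero corrupt, missing, or under-replicated blocks.
hdfs fsck /

# DataNode report: with cdh3 removed, only 4 live DataNodes
# should be listed.
hdfs dfsadmin -report
```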


Origin blog.csdn.net/mrliqifeng/article/details/106340915