Hadoop in Action Basics ---- (XIV) ----- Hadoop Management Tools --- Removing Hosts from CDH

Precautions

CDH (Cloudera Manager) provides a decommission operation on Hadoop cluster hosts for removing a node from the cluster.

Under normal circumstances, decommissioning a node does not cause block loss, but in some special scenarios a small number of blocks may be lost.

For example:

1. Too many nodes are decommissioned at the same time. With a replication factor of 3, it is recommended to decommission at most two DataNodes at once; wait for the decommissioning to complete and verify that the replicas are intact before taking further nodes offline --- that is, always keep at least one DataNode holding each replica.

2. Replicas were already incomplete before the decommission. It is recommended to check replica health before taking a node offline (see the check sketched after this list).

3. Network bandwidth is constrained. A large number of jobs in the cluster can saturate the bandwidth and prevent replicas from being copied to other nodes. If a data-balancing operation is running, stop it first, because the balancer also moves block replicas.
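A minimal pre-decommission health check, assuming shell access to an HDFS client node; the grep patterns match the summary lines that hdfs fsck prints, and the line count is illustrative:

# Summarize replica health before decommissioning
hdfs fsck / 2>&1 | egrep -i 'Under-replicated|Corrupt blocks|Missing'

# Cluster-wide report; the header includes an "Under replicated blocks" count
hdfs dfsadmin -report | head -n 25

If the HDFS Balancer is running (in CDH it is started from Cloudera Manager), stop it before decommissioning, since it moves block replicas concurrently.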

Tuning HDFS before removing a host with the DataNode role, to prevent data loss

To view the host's roles, click the host in the CDH interface -> Roles

[Screenshot: host role list in the CDH interface]
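If you prefer scripting over the UI, the Cloudera Manager REST API also exposes hosts and their role assignments. A sketch, assuming CM on its default port 7180; the API version, hostname, and credentials are placeholders to adjust for your deployment:

# List all hosts known to Cloudera Manager
curl -s -u admin:admin 'http://cm-host.example.com:7180/api/v19/hosts'

# Inspect one host, including its roleRefs (role assignments); <hostId> comes from the call above
curl -s -u admin:admin 'http://cm-host.example.com:7180/api/v19/hosts/<hostId>'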

When we want to remove a DataNode, the NameNode must ensure that every block stored on that DataNode remains available across the entire cluster at the configured replication factor.

This process copies the blocks off the DataNode in small batches. If the DataNode holds thousands of blocks, the removal can take several hours.

Therefore, before decommissioning a DataNode through Cloudera Manager, you should first tune HDFS, which can greatly shorten the removal time.
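The usual adjustment is to raise the NameNode's re-replication throughput so blocks drain off the decommissioning node faster. A sketch of the relevant hdfs-site.xml properties (in CDH these are typically set through a Cloudera Manager safety valve for hdfs-site.xml; the values shown are illustrative, not universal recommendations):

<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>10</value> <!-- default 2: blocks scheduled per live node per heartbeat interval -->
</property>
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>20</value> <!-- default 2: soft limit on concurrent replication streams per node -->
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>40</value> <!-- default 4: hard limit, also bounds highest-priority re-replication -->
</property>

Changing these typically requires a NameNode restart to take effect.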

First, run a replica check before decommissioning each DataNode role:

hdfs fsck / -list-corruptfileblocks -openforwrite
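In practice the full report is easier to work with when captured to a file and searched; a sketch with an illustrative output path:

hdfs fsck / -list-corruptfileblocks -openforwrite -files -blocks -locations 2>&1 > /tmp/hdfs-fsck.txt
grep -i -e 'CORRUPT' -e 'MISSING' /tmp/hdfs-fsck.txt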
