1. Node offline:
The first problem: a large batch of dead blocks (more than 2000).
Reason: when the nodes went offline, 10 DataNodes were decommissioned in a single batch.
Solution: with 3 replicas, decommission at most 2 machines at a time. Watch the decommission progress on the NameNode web UI (port 50070, main page); only remove a node after its under-replicated block count reaches 0.
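The "reach 0 before removing the node" check above can be scripted instead of watched on the web UI. A minimal sketch, assuming the summary of `hdfs dfsadmin -report` on Hadoop 2.x prints a line of the form "Under replicated blocks: N" (verify the exact wording on your version):

```shell
# Extract the under-replicated block count from a `hdfs dfsadmin -report`
# dump read on stdin. Assumes the Hadoop 2.x summary line format
# "Under replicated blocks: N".
under_replicated() {
  awk -F': ' '/^Under replicated blocks/ {print $2; exit}'
}

# Usage sketch: poll once a minute, remove the node only after 0 is reached.
# while [ "$(hdfs dfsadmin -report | under_replicated)" -gt 0 ]; do sleep 60; done
```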
Explanation:
Decommissioning a DataNode removes it from the HDFS cluster.
Because the DataNode stores the actual data, that data must be migrated to other machines when the DataNode is decommissioned. During decommissioning, the DataNode performs the following operations: 1. calculate block information; 2. delete blocks; 3. copy blocks; 4. verify block information.
Operation steps:
1. On the NameNode, add the hostnames of the DataNodes to be decommissioned to the file specified by dfs.hosts.exclude (the property lives in hdfs-site.xml); this tells the NameNode which DataNodes are to be decommissioned.
If dfs.hosts.exclude is not present in hdfs-site.xml, manually add the following property to hdfs-site.xml, then write the machines to be decommissioned into the file /etc/hadoop/conf/dfs.exclude.
<property>
<name>dfs.hosts.exclude</name>
<value>/etc/hadoop/conf/dfs.exclude</value>
</property>
Example dfs.exclude file (one hostname per line):
pslave1
pslave2
pslave3
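Maintaining the exclude file by hand is error-prone; a small helper can append hosts without creating duplicates. A sketch (the function name add_excludes is mine, and the /etc/hadoop/conf/dfs.exclude path is the one configured above):

```shell
# add_excludes <exclude-file> <host>...
# Appends each host to the exclude file (one hostname per line, the
# format dfs.hosts.exclude expects), skipping hosts already listed.
add_excludes() {
  file=$1; shift
  for h in "$@"; do
    grep -qx "$h" "$file" 2>/dev/null || printf '%s\n' "$h" >> "$file"
  done
}

# Usage sketch, run on the NameNode:
# add_excludes /etc/hadoop/conf/dfs.exclude pslave1 pslave2 pslave3
# hdfs dfsadmin -refreshNodes
```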
2. Start Decommission with the following command:
hdfs dfsadmin -refreshNodes
After refreshing, nodes in the "Decommission In Progress" state appear on the HDFS web UI.
You can also check the status with:
hadoop dfsadmin -report
Note: decommissioning finishes faster when fewer blocks have to be copied off the node.
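The two status checks above can also be combined into a wait loop that only shuts the machines down once no node is still mid-decommission. A sketch, assuming the per-node report line reads "Decommission Status : Decommission in progress" as printed by Hadoop 2.x (the exact string may differ between versions):

```shell
# Count DataNodes still decommissioning in a `hdfs dfsadmin -report`
# dump read on stdin. Matches the per-node status line
# "Decommission Status : Decommission in progress" (case-insensitive,
# since the casing varies across Hadoop versions).
in_progress() {
  grep -ci 'decommission in progress'
}

# Usage sketch: wait until every excluded node has finished,
# then it is safe to power the machines off.
# while [ "$(hdfs dfsadmin -report | in_progress)" -gt 0 ]; do sleep 60; done
```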