There are two ways to add and delete nodes in a Hadoop cluster: static and dynamic. The newly added nodes in this article have already been configured with the other basic settings, such as password-free login.
1. Static mode
Static mode requires stopping the NameNode.
- 1. Stop the NameNode
- 2. Modify the slaves file and copy it to every node
- 3. Start the NameNode
- 4. Run the Hadoop balancer (start-balancer.sh). This step rebalances the cluster; if you only want to add nodes, it is not necessary.
2. Dynamic mode
- 1. First configure hdfs-site.xml on the master node to point to the lists of nodes that are allowed and denied to join the cluster (normally the dfs.hosts and dfs.hosts.exclude properties). If the allowed list is empty, every node may connect; if the denied list is empty, no node is refused. The denied list takes priority over the allowed list.
- 2. Modify the slaves file to add the hostname or IP of each new node, and copy it to every node
- 3. On the new node, start the DataNode process. Command: hadoop-daemon.sh start datanode
- 4. Refresh the node list on the master node. Command: hdfs dfsadmin -refreshNodes
- 5. Check that the node has been added through the web interface, or with the command: hdfs dfsadmin -report
- 6. Run the Hadoop balancer (start-balancer.sh). This step rebalances the cluster; if you only want to add nodes, it is not necessary.
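The allow/deny rule from step 1 can be modelled in a few lines of shell. This is only a sketch of the rule as described above, not Hadoop's own implementation; the function name is_admitted and the one-host-per-line file layout are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: models the allow/deny rule described in step 1.
# Usage: is_admitted HOST ALLOW_FILE DENY_FILE  -> prints "allowed" or "denied"
is_admitted() {
    host=$1; allow_file=$2; deny_file=$3

    # A host named in the deny list is always refused (deny wins).
    if [ -s "$deny_file" ] && grep -qxF -e "$host" "$deny_file"; then
        echo denied
        return
    fi

    # An empty or missing allow list means every host may connect.
    if [ ! -s "$allow_file" ]; then
        echo allowed
        return
    fi

    # Otherwise the host must be listed explicitly in the allow list.
    if grep -qxF -e "$host" "$allow_file"; then
        echo allowed
    else
        echo denied
    fi
}
```

Running such a check against the two list files before refreshing the NameNode can catch a host that was accidentally left in the deny list.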
3. Deleting a node
- 1. On the master node, add the name of the node to be deleted to the DataNode deny list so that its connection is refused
vi /home/hadoop/hadoop2.7/con/datanode-denylist
- 2. Refresh on the master node
hdfs dfsadmin -refreshNodes
- 3. Check the status of the node (its status becomes Decommissioned and, after a period of time, Dead)
hdfs dfsadmin -report
- 4. Shut down the processes on the deleted node
yarn-daemon.sh stop nodemanager
hadoop-daemon.sh stop datanode
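Editing the deny list by hand makes it easy to list a host twice. As a rough sketch, the edit in step 1 could be wrapped in a small helper that only appends the host when it is not already present. The function name add_to_deny_list is hypothetical, and the file is assumed to hold one host per line and to end with a newline.

```shell
#!/bin/sh
# Hypothetical helper: add HOST to the deny-list FILE exactly once.
# Usage: add_to_deny_list HOST FILE
add_to_deny_list() {
    host=$1; file=$2

    # Create the file if it does not exist yet.
    touch "$file"

    # -x matches the whole line, -F treats the host as a literal string.
    if grep -qxF -e "$host" "$file"; then
        echo "$host already listed in $file"
    else
        printf '%s\n' "$host" >> "$file"
        echo "added $host to $file"
    fi
}
```

After the file has been updated, the refresh in step 2 (hdfs dfsadmin -refreshNodes) is still needed before the NameNode acts on the change.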
4. Notes on start-balancer.sh
start-balancer.sh accepts the -threshold parameter.
-threshold: default 10, value range 0-100. Meaning: the target used to decide whether the cluster is balanced. The difference between each DataNode's storage utilization and the cluster's overall storage utilization should be less than this threshold. In theory, the smaller the value, the more balanced the cluster will be; in a production environment, however, data is being written and deleted while the balancer runs, so the cluster may never reach the configured value.
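To make the threshold concrete, the following sketch classifies a single DataNode by comparing its utilization with the cluster-wide figure. It only illustrates the arithmetic described above and is not the balancer's actual code; the function name balance_state and its labels are made up for this example, and utilizations are given as percentages.

```shell
#!/bin/sh
# Hypothetical helper: classify one DataNode against the balancer threshold.
# Usage: balance_state NODE_UTIL CLUSTER_UTIL [THRESHOLD]   (all in percent)
balance_state() {
    awk -v n="$1" -v c="$2" -v t="${3:-10}" 'BEGIN {
        d = n - c                      # signed deviation from the cluster average
        if (d < 0) d = -d              # absolute value
        if (d <= t)      print "balanced"
        else if (n > c)  print "over-utilized"
        else             print "under-utilized"
    }'
}
```

With the default threshold of 10, a node at 55% in a cluster at 50% counts as balanced, while a node at 75% does not; lowering the threshold to 3 would flag the 55% node as well.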
dfs.balance.bandwidthPerSec: default 1048576 (1 MB/s). Meaning: the bandwidth the balancer may use while it is running. If the value is set too high, MapReduce jobs may run slowly. (In Hadoop 2.x the property is named dfs.datanode.balance.bandwidthPerSec.)
Note that HDFS has to start a separate Rebalance Server to perform the rebalance operation, so try not to run start-balancer.sh on the NameNode; pick a relatively idle machine instead.
Start: bin/start-balancer.sh -threshold 10
Stop: bin/stop-balancer.sh