Adding and deleting nodes in a Hadoop cluster, and balancing the cluster

There are two ways to add and delete nodes in a Hadoop cluster: static and dynamic. The newly added nodes in this article are assumed to already have the related basic settings in place, such as password-free login.

1. Static mode

Static mode requires stopping the NameNode.

  • 1. Stop the NameNode
  • 2. Modify the slaves file and distribute it to every node
  • 3. Start the NameNode
  • 4. Run the Hadoop balancer (start-balancer.sh). This step rebalances the cluster; if you are only adding nodes, it is not necessary.
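
Step 2 is easy to get wrong when it is repeated by hand, so here is a minimal sketch of a helper that adds a host to the slaves file only if it is not already listed. The function name `add_host` and the host `node4` are hypothetical, and the slaves path in the usage comment assumes the Hadoop 2.x layout under `$HADOOP_HOME/etc/hadoop`.

```shell
# add_host FILE HOST: append HOST to a host-list file (for example the
# slaves file) unless an identical line is already present.
add_host() {
  file=$1
  host=$2
  touch "$file"
  # Nothing to do if the host is already listed.
  # -x matches whole lines; -F treats the host as a literal string, not a regex.
  if grep -qxF -- "$host" "$file"; then
    return 0
  fi
  # If the file is non-empty and lacks a trailing newline, add one first.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  printf '%s\n' "$host" >> "$file"
}

# Usage (assumed path and host name, not run here):
#   add_host "$HADOOP_HOME/etc/hadoop/slaves" node4
```

The same helper works for any one-host-per-line file, so it can also be used for the allow and deny lists described later in this article.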

2. Dynamic mode

  • 1. First configure hdfs-site.xml on the master node
    to add the lists of nodes that are allowed and denied to join the cluster. If the allowed list is empty, all nodes are allowed to connect by default; if the denied list is empty, no node is refused. The denied list takes priority over the allowed list.
  • 2. Modify the slaves file, add the hostname or IP of each node to be added, and distribute it to every node
  • 3. On the new node, start the DataNode. Command: hadoop-daemon.sh start datanode
  • 4. Refresh the node lists on the master node. Command: hdfs dfsadmin -refreshNodes
  • 5. Check that the node has joined through the web interface, or use the command: hdfs dfsadmin -report
  • 6. Run the Hadoop balancer (start-balancer.sh). This step rebalances the cluster; if you are only adding nodes, it is not necessary.
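
The allow and deny lists from step 1 are normally wired into hdfs-site.xml through the `dfs.hosts` and `dfs.hosts.exclude` properties, each pointing at a file with one host per line. The sketch below prints such a fragment for pasting inside `<configuration>`. The function name and the allow-list path are hypothetical; the deny-list path is the one used in the node deletion section of this article.

```shell
# print_host_list_properties ALLOW_FILE DENY_FILE
# Prints the two hdfs-site.xml properties that point the NameNode at the
# allow list (dfs.hosts) and the deny list (dfs.hosts.exclude).
print_host_list_properties() {
  cat <<EOF
<property>
  <name>dfs.hosts</name>
  <value>$1</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>$2</value>
</property>
EOF
}

# The first path is a hypothetical allow-list location; adjust both to your cluster.
print_host_list_properties \
  /home/hadoop/hadoop2.7/con/datanode-allowlist \
  /home/hadoop/hadoop2.7/con/datanode-denylist
```

After editing either list file, `hdfs dfsadmin -refreshNodes` makes the NameNode reread them without a restart, which is what makes this mode dynamic.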

3. Deleting a node

  • 1. On the master node, edit the datanode deny list and add the name of the node to be deleted, so that its connection is refused
    vi /home/hadoop/hadoop2.7/con/datanode-denylist
  • 2. Refresh the node lists on the master node
    hdfs dfsadmin -refreshNodes
  • 3. Check the status of the node (the status first shows Decommission in progress, then becomes Decommissioned, and after a period of time it becomes Dead)
    hdfs dfsadmin -report
  • 4. Stop the processes on the deleted node
    yarn-daemon.sh stop nodemanager
    hadoop-daemon.sh stop datanode
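
Before stopping the processes in step 4 it is worth confirming that the node has actually reached the Decommissioned state. The following is a minimal sketch that extracts one node's status from the report text. It assumes the report prints a `Hostname:` line followed by a `Decommission Status :` line for each DataNode, as Hadoop 2.x does; the function name `decommission_status` and the host `node5` are hypothetical.

```shell
# decommission_status HOST: read `hdfs dfsadmin -report` text on stdin and
# print the decommission status reported for HOST (nothing if HOST is absent).
decommission_status() {
  awk -v host="$1" '
    # Remember which DataNode the following lines belong to.
    /^Hostname:/ { current = $2 }
    # Print the status text for the requested host and stop reading.
    /^Decommission Status/ && current == host {
      sub(/^Decommission Status *: */, "")
      print
      exit
    }
  '
}

# Usage (not run here):
#   hdfs dfsadmin -report | decommission_status node5
```

Only when this prints Decommissioned have the node's blocks been re-replicated elsewhere, so that is the point at which shutting down the DataNode is safe.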

4. start-balancer.sh description

start-balancer.sh accepts the -threshold parameter.
-threshold Default setting: 10. Value range: 0-100. Parameter meaning: the target used to decide whether the cluster is balanced. The difference between each DataNode's storage utilization and the cluster's overall storage utilization, in percentage points, should be less than this threshold. In theory, the smaller the value, the more balanced the whole cluster will be. In a production environment, however, data is being written and deleted while the balancer runs, so the cluster may never reach the configured value.
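
To make the threshold rule concrete, here is a small sketch that applies it to a set of utilization percentages. It is only an illustration of the rule as stated above, not the balancer's own code, and the function name `is_balanced` and the sample numbers are hypothetical.

```shell
# is_balanced THRESHOLD CLUSTER_UTIL NODE_UTIL...
# Succeeds when every node's utilization (in percent) differs from the
# cluster-wide utilization by less than THRESHOLD percentage points.
is_balanced() {
  threshold=$1
  cluster=$2
  shift 2
  for node in "$@"; do
    # awk handles the fractional arithmetic; exit status 0 means "within threshold".
    awk -v t="$threshold" -v c="$cluster" -v n="$node" \
      'BEGIN { d = n - c; if (d < 0) d = -d; exit !(d < t) }' || return 1
  done
  return 0
}

# Cluster at 50% overall, nodes at 45%, 58% and 65%, threshold 10:
if is_balanced 10 50 45 58 65; then
  echo "balanced"
else
  echo "not balanced"   # the 65% node is 15 points above the cluster average
fi
```

With the default threshold of 10, a cluster at 50% utilization is considered balanced once every DataNode sits between 40% and 60%.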
dfs.balance.bandwidthPerSec (named dfs.datanode.balance.bandwidthPerSec in Hadoop 2.x) Default setting: 1048576 (1 MB/s). Parameter meaning: the bandwidth that the balancer may use while it is running. If the value is set too high, MapReduce jobs may run slowly.
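
The value is given in bytes per second, which is easy to mistype. The sketch below converts MB/s to bytes and prints the `hdfs dfsadmin -setBalancerBandwidth` command, which changes the limit at runtime without editing hdfs-site.xml. The helper name `mb_per_sec_to_bytes` and the 10 MB/s figure are hypothetical, and the command is only printed, not executed.

```shell
# mb_per_sec_to_bytes N: convert N MB/s into bytes per second.
mb_per_sec_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Print (rather than run) the command that would set the limit to 10 MB/s.
echo "hdfs dfsadmin -setBalancerBandwidth $(mb_per_sec_to_bytes 10)"
# prints: hdfs dfsadmin -setBalancerBandwidth 10485760
```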
Note that because HDFS needs to start a separate Rebalance Server to perform the rebalance operation, try not to run start-balancer.sh on the NameNode; use a relatively idle machine instead.

Start: sbin/start-balancer.sh -threshold 10
Stop: sbin/stop-balancer.sh

Origin blog.csdn.net/yangbllove/article/details/105546235