Adding and decommissioning data nodes in the cluster

1 Commissioning a new data node

1.1 Requirements

As the business grows, the volume of data keeps increasing, and the capacity of the existing data nodes can no longer meet storage needs, so new data nodes must be added dynamically on top of the existing cluster.

1.2 Environment preparation

Clone a new virtual machine host, delete the data left over from the clone (the ./hadoop-3.1.3/data and logs directories), and then source the configuration file:

[pbh@hadoop105 hadoop-3.1.3]$ source /etc/profile
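
For reference, a minimal sketch of the cleanup step above (rm -rf is destructive, so double-check the working directory first):

[pbh@hadoop105 hadoop-3.1.3]$ rm -rf ./data/ ./logs/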

1.3 Specific steps for commissioning new nodes

  1. Start the DataNode directly; it associates itself with the cluster, and the new node is added
[pbh@hadoop105 hadoop-3.1.3]$ hdfs --daemon start datanode
[pbh@hadoop105 hadoop-3.1.3]$ yarn --daemon start nodemanager
  2. In enterprise development, if jobs are often submitted on hadoop102 and hadoop104 and the replication factor is 2, then because of data locality hadoop102 and hadoop104 will accumulate too much data while hadoop103 stores comparatively little. Enable data balancing:
[pbh@hadoop105 hadoop-3.1.3]$ sbin/start-balancer.sh -threshold 10

The threshold parameter 10 means that the difference in disk space utilization among the cluster's nodes should not exceed 10%; adjust it to the actual situation.

To stop the balancer:

[pbh@hadoop105 hadoop-3.1.3]$ sbin/stop-balancer.sh
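
To confirm that the new node has joined and that disk usage is evening out, the per-node report can be checked (a quick verification sketch; exact output fields vary slightly by Hadoop version):

[pbh@hadoop102 hadoop-3.1.3]$ hdfs dfsadmin -report | grep -E 'Hostname|DFS Used%'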

2 Decommissioning old data nodes

2.1 Add whitelist and blacklist

The whitelist and blacklist are Hadoop's mechanism for managing cluster hosts.

Hosts on the whitelist are allowed to connect to the NameNode; hosts not on the whitelist are forced out. Hosts on the blacklist are not allowed to connect to the NameNode and are removed after their data has been migrated away.

In practice, the whitelist determines which DataNodes may connect to the NameNode, and its contents are generally kept consistent with the workers file; the blacklist is used to decommission DataNodes while the cluster is running.

The specific steps to configure the whitelist and blacklist are as follows:


  1. Create whitelist and blacklist files in the /opt/module/hadoop-3.1.3/etc/hadoop directory on the NameNode host
[pbh@hadoop102 hadoop]$ pwd
/opt/module/hadoop-3.1.3/etc/hadoop
[pbh@hadoop102 hadoop]$ touch whitelist
[pbh@hadoop102 hadoop]$ touch blacklist

Add the following host names to the whitelist, assuming the nodes currently working normally in the cluster are hadoop102, hadoop103, hadoop104 and hadoop105:

hadoop102
hadoop103
hadoop104
hadoop105
  2. Add the dfs.hosts and dfs.hosts.exclude configuration parameters to the hdfs-site.xml configuration file
<!-- Whitelist -->
<property>
	<name>dfs.hosts</name>
	<value>/opt/module/hadoop-3.1.3/etc/hadoop/whitelist</value>
</property>
<!-- Blacklist -->
<property>
	<name>dfs.hosts.exclude</name>
	<value>/opt/module/hadoop-3.1.3/etc/hadoop/blacklist</value>
</property>
  3. Distribute the configuration files whitelist, blacklist and hdfs-site.xml to all nodes (a distribution sketch follows this list)
  4. Restart the cluster
  5. Check the currently live DataNode nodes in the NameNode web UI
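
A minimal sketch of the distribution step, assuming passwordless SSH from hadoop102 to the other hosts (a cluster sync script, if available, works equally well); run it from the /opt/module/hadoop-3.1.3/etc/hadoop directory:

[pbh@hadoop102 hadoop]$ for host in hadoop103 hadoop104 hadoop105; do scp whitelist blacklist hdfs-site.xml $host:/opt/module/hadoop-3.1.3/etc/hadoop/; done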

2.2 Blacklist decommissioning

  1. Edit the blacklist file in the /opt/module/hadoop-3.1.3/etc/hadoop directory and add the host name of the node to be decommissioned (hadoop105 in this example)
  2. Distribute blacklist to all nodes
  3. Refresh the NameNode and the ResourceManager
[pbh@hadoop102 hadoop-3.1.3]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
[pbh@hadoop102 hadoop-3.1.3]$ yarn rmadmin -refreshNodes
17/06/24 14:55:56 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.1.103:8033
  4. Check the web UI: the status of the decommissioned node shows Decommission In Progress, which means the DataNode is copying its blocks to other nodes

[Figure: blacklist decommissioning step 4, node status Decommission In Progress]

  5. Wait until the status of the decommissioned node becomes Decommissioned (all blocks have been copied), then stop the DataNode and NodeManager on that node. Note: if the replication factor is 3 and the number of in-service nodes is less than or equal to 3, decommissioning cannot succeed; reduce the replication factor before decommissioning

[Figure: blacklist decommissioning step 5, node status Decommissioned]

[pbh@hadoop105 hadoop-3.1.3]$ hdfs --daemon stop datanode
stopping datanode
[pbh@hadoop105 hadoop-3.1.3]$ yarn --daemon stop nodemanager
stopping nodemanager
  6. If the data is unbalanced, rebalance the cluster with sbin/start-balancer.sh as described in section 1

Note: The same host name must not appear in both the whitelist and the blacklist. Since the hadoop105 node has now been successfully decommissioned via the blacklist, it must be removed from the whitelist.
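
A minimal sketch of that cleanup, reusing the refresh command from the steps above (host list assumed from this example):

[pbh@hadoop102 hadoop]$ vim whitelist     # delete the hadoop105 line
[pbh@hadoop102 hadoop]$ for host in hadoop103 hadoop104; do scp whitelist $host:/opt/module/hadoop-3.1.3/etc/hadoop/; done
[pbh@hadoop102 hadoop]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful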

Origin: blog.csdn.net/meng_xin_true/article/details/126039457