HDFS Balancer (load balancer)

1. Background

After a Hadoop cluster has been running for a while, the data is not necessarily distributed evenly across the DataNodes. For example, this happens when we add a new DataNode to an existing cluster.

[Figure: DataNode data imbalance]

2. What is balance

Here is my own simple understanding:

"Balanced" means that the difference between each DataNode's utilization and the cluster's utilization does not exceed a given threshold percentage. Balance here refers to balance across DataNodes; the disks within a single DataNode are not balanced against each other.

2.1 Utilization calculation of each DataNode

DataNode utilization = space used by DFS on the DataNode / space allocated to DFS on the DataNode

Note: the space allocated to DFS is not the total space of the disk.

2.2 Cluster Utilization

Cluster utilization = total space used by DFS on all DataNodes / total space allocated to DFS on all DataNodes

2.3 Balance

Assume the balance threshold is 5% and the cluster utilization is 37.5%. A node is then considered balanced when its utilization is between 32.5% and 42.5%. In the extreme case, the utilization of two DataNodes can therefore differ by at most 10 percentage points.
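
The definitions in sections 2.1 to 2.3 can be checked with a few lines of Python. This is only a sketch of the arithmetic described above, not the Balancer's actual implementation; the class `DataNodeUsage`, the function names, the node names, and the capacity figures are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataNodeUsage:
    name: str
    dfs_used: float      # space used by DFS, in any consistent unit (e.g. GB)
    dfs_capacity: float  # space allocated to DFS, in the same unit

    @property
    def utilization(self) -> float:
        """Utilization of this DataNode as a percentage."""
        return 100.0 * self.dfs_used / self.dfs_capacity


def cluster_utilization(nodes):
    """Total DFS space used divided by total DFS space allocated, in percent."""
    total_used = sum(n.dfs_used for n in nodes)
    total_capacity = sum(n.dfs_capacity for n in nodes)
    return 100.0 * total_used / total_capacity


def classify(nodes, threshold):
    """Label each node relative to the cluster utilization +/- threshold."""
    avg = cluster_utilization(nodes)
    result = {}
    for n in nodes:
        if n.utilization > avg + threshold:
            result[n.name] = "over-utilized"
        elif n.utilization < avg - threshold:
            result[n.name] = "under-utilized"
        else:
            result[n.name] = "balanced"
    return result


# Hypothetical numbers chosen so that the cluster utilization is 37.5%.
nodes = [
    DataNodeUsage("dn1", dfs_used=50.0, dfs_capacity=100.0),  # 50%
    DataNodeUsage("dn2", dfs_used=40.0, dfs_capacity=100.0),  # 40%
    DataNodeUsage("dn3", dfs_used=35.0, dfs_capacity=100.0),  # 35%
    DataNodeUsage("dn4", dfs_used=25.0, dfs_capacity=100.0),  # 25%
]

print(cluster_utilization(nodes))
print(classify(nodes, threshold=5.0))
```

With a threshold of 5, only nodes between 32.5% and 42.5% count as balanced here, so dn1 is reported as over-utilized and dn4 as under-utilized.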

3. hdfs balancer syntax

[hadoopdeploy@hadoop01 ~]$ hdfs balancer --help
Usage: hdfs balancer
	[-policy <policy>]	the balancing policy: datanode or blockpool
	[-threshold <threshold>]	Percentage of disk capacity
	[-exclude [-f <hosts-file> | <comma-separated list of hosts>]]	Excludes the specified datanodes.
	[-include [-f <hosts-file> | <comma-separated list of hosts>]]	Includes only the specified datanodes.
	[-source [-f <hosts-file> | <comma-separated list of hosts>]]	Pick only the specified datanodes as source nodes.
	[-blockpools <comma-separated list of blockpool ids>]	The balancer will only run on blockpools included in this list.
	[-idleiterations <idleiterations>]	Number of consecutive idle iterations (-1 for Infinite) before exit.
	[-runDuringUpgrade]	Whether to run the balancer during an ongoing HDFS upgrade.This is usually not desired since it will not affect used space on over-utilized machines.
	[-asService]	Run as a long running service.

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

Parameter           Description
-threshold          Percentage of disk capacity. The default is 10, meaning a DataNode's utilization may deviate from the cluster utilization by up to 10% in either direction.
-policy             Balancing policy.
                      datanode (default): the cluster is balanced when every DataNode is balanced.
                      blockpool: the cluster is balanced when every block pool in each DataNode is balanced.
-exclude            DataNodes that do not take part in balancing.
-include            Only the listed DataNodes take part in balancing.
-source             Pick only the specified DataNodes as source nodes.
-blockpools         The balancer runs only on the listed block pools.
-idleiterations     Number of consecutive idle iterations before exiting (-1 means infinite).
-runDuringUpgrade   Whether to run the balancer during an ongoing HDFS upgrade. This is usually not desired, because it will not affect the used space on over-utilized machines.
-asService          Run as a long-running service.
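
As an illustration of how these options combine, the sketch below assembles an `hdfs balancer` command line from Python. The helper `build_balancer_command` and the host name `hadoop03` are hypothetical; the helper only emits flags that appear in the help output above and does not run anything.

```python
import shlex


def build_balancer_command(threshold=None, policy=None, exclude=None,
                           include=None, idle_iterations=None):
    """Return the argument list for an `hdfs balancer` invocation.

    Hypothetical helper: it only assembles flags listed in the help output.
    """
    cmd = ["hdfs", "balancer"]
    if policy is not None:
        if policy not in ("datanode", "blockpool"):
            raise ValueError("policy must be 'datanode' or 'blockpool'")
        cmd += ["-policy", policy]
    if threshold is not None:
        cmd += ["-threshold", str(threshold)]
    if exclude:
        # Hosts are passed as a comma-separated list.
        cmd += ["-exclude", ",".join(exclude)]
    if include:
        cmd += ["-include", ",".join(include)]
    if idle_iterations is not None:
        cmd += ["-idleiterations", str(idle_iterations)]
    return cmd


args = build_balancer_command(threshold=5, policy="datanode",
                              exclude=["hadoop03"])
print(shlex.join(args))
```

The resulting list could be handed to `subprocess.run` on a machine where the `hdfs` client is installed; that step is left out here so the sketch stays runnable anywhere.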

4. Running a simple balancing example

4.1 Set the balancer's data transfer bandwidth

[hadoopdeploy@hadoop01 ~]$ hdfs dfsadmin  -setBalancerBandwidth 10485760
Balancer bandwidth is set to 10485760
[hadoopdeploy@hadoop01 ~]$

The value is given in bytes per second (10485760 bytes/s = 10 MB/s). Lower this value when the cluster is under heavy load, and raise it when the cluster load is low.
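
Because `-setBalancerBandwidth` takes bytes per second, a small conversion helper avoids arithmetic slips. The sketch below is illustrative only: the function names are made up, and the time estimate is a rough lower bound for a single stream that ignores protocol overhead and concurrency.

```python
def mb_per_sec_to_bytes(mb_per_sec):
    """Convert MB/s (1 MB = 1024 * 1024 bytes) to bytes per second."""
    return int(mb_per_sec * 1024 * 1024)


def min_seconds_to_move(num_bytes, bandwidth_bytes_per_sec):
    """Rough lower bound on the time to move num_bytes at the given bandwidth."""
    return num_bytes / bandwidth_bytes_per_sec


bandwidth = mb_per_sec_to_bytes(10)
print(f"hdfs dfsadmin -setBalancerBandwidth {bandwidth}")

# Moving 1 GB (1024 MB) at 10 MB/s takes at least this many seconds.
one_gb = 1024 * 1024 * 1024
print(min_seconds_to_move(one_gb, bandwidth))
```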

4.2 Run the balancer

[hadoopdeploy@hadoop01 ~]$ hdfs balancer -policy datanode -threshold 5
2023-03-26 14:10:09,785 INFO balancer.Balancer: Using a threshold of 5.0
2023-03-26 14:10:09,786 INFO balancer.Balancer: namenodes  = [hdfs://hadoop01:8020]
2023-03-26 14:10:09,786 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
2023-03-26 14:10:09,786 INFO balancer.Balancer: included nodes = []
2023-03-26 14:10:09,786 INFO balancer.Balancer: excluded nodes = []
2023-03-26 14:10:09,786 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved  NameNode
2023-03-26 14:10:09,787 INFO balancer.NameNodeConnector: getBlocks calls for hdfs://hadoop01:8020 will be rate-limited to 20 per second
2023-03-26 14:10:10,392 INFO balancer.Balancer: dfs.namenode.get-blocks.max-qps = 20 (default=20)
2023-03-26 14:10:10,392 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2023-03-26 14:10:10,392 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
2023-03-26 14:10:10,392 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
2023-03-26 14:10:10,392 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
2023-03-26 14:10:10,392 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
2023-03-26 14:10:10,392 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 100 (default=100)
2023-03-26 14:10:10,392 INFO balancer.Balancer: dfs.datanode.balance.bandwidthPerSec = 104857600 (default=104857600)
2023-03-26 14:10:10,395 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2023-03-26 14:10:10,395 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
2023-03-26 14:10:10,401 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.121.141:9866
2023-03-26 14:10:10,401 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.121.140:9866
2023-03-26 14:10:10,401 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.121.142:9866
2023-03-26 14:10:10,402 INFO balancer.Balancer: 0 over-utilized: []
2023-03-26 14:10:10,402 INFO balancer.Balancer: 0 underutilized: []
2023-3-26 14:10:10                0                  0 B                 0 B                0 B                  0  hdfs://hadoop01:8020
The cluster is balanced. Exiting...
2023-3-26 14:10:10       Balancing took 810.0 milliseconds
[hadoopdeploy@hadoop01 ~]$

5. Reference documents

1. https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
2. https://help.aliyun.com/document_detail/449686.html

Origin blog.csdn.net/fu_huo_1993/article/details/129777880