Instructions for setting some parameters of HDFS

1. dfs.datanode.du.reserved

The amount of space on each disk that is reserved for non-HDFS use. The default value is 0, and the unit is bytes. The storage space actually available to HDFS is therefore: total storage space - dfs.datanode.du.reserved.

With this parameter set to 10G, df -h on the data disk shows the following:

Filesystem Size Used Avail Use% Mounted on
/dev/sda4 243G 200G 31G 87% /data
Note: the total capacity is 243G, 200G is used, and 31G is available. Used + available != total capacity; about 12G is unaccounted for, and this gap is the root of the problem.
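On ext filesystems this missing ~12G is typically the space the filesystem reserves for root: 5% of blocks by default, and 5% of 243G is about 12G. A quick way to confirm this on the node (assuming /data really is mounted on /dev/sda4):

tune2fs -l /dev/sda4 | grep -iE 'reserved block count|block size'

The reserved space in bytes is the reported "Reserved block count" multiplied by "Block size".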

Next, check HDFS's view of the same disk with the command hadoop dfsadmin -report:

Configured Capacity: 249735778304 (232.58 GB)
DFS Used: 208414818078 (194.10 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 41320960226 (38.48 GB)
Configured Capacity is the total capacity of the directory space specified by dfs.data.dir: the 243G disk minus the 10G reserved, about 232.58 GB. DFS Remaining is then 232.58 - 194.10 = 38.48 GB. But df shows that /dev/sda4 actually has only 31G free. In other words, HDFS believes it can still store 38.48G of data, so incoming blocks will keep landing on this node until the disk is physically full.

Solution: set dfs.datanode.du.reserved larger. With it now set to 30G, hadoop dfsadmin -report shows:
Configured Capacity: 228260941824 (212.58 GB)
DFS Used: 208414818078 (194.10 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 19846123746 (18.48 GB)
DFS available space is now 18.48G < 31G, so even when HDFS has used up all of its remaining space, the disk /dev/sda4 still has about 13G free, achieving the desired effect!
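For reference, a minimal sketch of the corresponding hdfs-site.xml entry; the value is in bytes (30G = 30 x 1024^3 = 32212254720), and the DataNode typically has to be restarted for the change to take effect:

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- reserve 30G per disk for non-HDFS use: 30 x 1024^3 bytes -->
  <value>32212254720</value>
</property>

Note that the reservation is applied per volume, i.e. to each directory listed in dfs.data.dir.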

2. dfs.datanode.max.transfer.threads

Specifies the number of threads on the DataNode that handle file read and write (data transfer) operations. If there are many files to process and this value is set too low, some requests cannot be served and an exception is thrown.
In Linux, every such file operation is bound to a socket, which in practice means one thread per transfer. This parameter specifies the maximum number of these transfer threads. The DataNode maintains them in a dedicated thread group, and a daemon thread monitors the group's size; if the number of threads exceeds the configured upper limit, an exception is thrown. The cap exists because an unbounded number of such threads would exhaust system memory.
dfs.datanode.max.xcievers is the older, deprecated name of this same setting: it gives the upper limit on the number of files each DataNode can serve at any one time. This value must not exceed the system's open-file limit, i.e. the nofile value in /etc/security/limits.conf.
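As a sketch, a commonly used setting in hdfs-site.xml (8192 is a typical value for busy clusters, not a number from this article):

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <!-- upper limit on concurrent transfer threads (xceivers) per DataNode -->
  <value>8192</value>
</property>

And the matching nofile limits in /etc/security/limits.conf, assuming the DataNode runs as the user hdfs:

hdfs soft nofile 65536
hdfs hard nofile 65536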

3. dfs.datanode.handler.count

The number of server threads the DataNode uses to handle client requests; the default is 10. If the cluster is large or many jobs run at the same time, it is recommended to increase this value.
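A sketch of raising it in hdfs-site.xml; 20 here is only an illustrative value, to be tuned to the cluster's load:

<property>
  <name>dfs.datanode.handler.count</name>
  <!-- number of DataNode server threads; the default is 10 -->
  <value>20</value>
</property>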


Origin: blog.csdn.net/victory0508/article/details/50754032