(Repost) The problem of setting the block size on HDFS Datanodes

Reprinted from: http://blog.csdn.net/pear_zi/article/details/8082752

In HDFS, the default block size on Datanodes is 64 MB (128 MB in Hadoop 2.x and later; 256 MB is also a common setting).
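Concretely, the block size is a client-side, per-file setting rather than a fixed Datanode property. Below is a minimal sketch of inspecting and overriding it through the standard HDFS Java API; the file path and the 128 MB / 256 MB values are illustrative, not prescriptions from the original post.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster-wide default, normally configured in hdfs-site.xml via dfs.blocksize.
        conf.setLong("dfs.blocksize", 128 * 1024 * 1024L);

        FileSystem fs = FileSystem.get(conf);          // uses fs.defaultFS from the config
        Path file = new Path("/tmp/example.dat");      // illustrative path

        // Default block size the client would use for this path.
        System.out.println("default block size: " + fs.getDefaultBlockSize(file));

        // A file can also be created with an explicit, per-file block size.
        long blockSize = 256 * 1024 * 1024L;           // 256 MB, illustrative
        try (FSDataOutputStream out =
                 fs.create(file, true, 4096, fs.getDefaultReplication(file), blockSize)) {
            out.writeBytes("hello hdfs");
        }
    }
}
```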

 

Question: Why is 64MB (or 128MB or 256MB) the best choice?

1. Why can't it be much smaller than 64 MB (or 128 MB or 256 MB)? (The block size of an ordinary local file system is typically 4 KB.)

a. Reducing hard disk seek time

 HDFS is designed for streaming access to large files, so even ordinary reads and writes involve a relatively large amount of data. If the block size is set too small, the same data spans many more blocks. Because blocks are not stored contiguously on disk, a conventional hard disk must move its head between them, and this random seeking is slow: the more blocks a read touches, the greater the total seek time. When total seek time far exceeds the actual I/O (transfer) time, seeking becomes the system bottleneck. A suitably large block keeps seek time a small fraction of transfer time and improves overall throughput.
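A back-of-the-envelope sketch of this argument, using illustrative drive numbers that are not from the original post (roughly 10 ms average seek time and 100 MB/s sustained transfer rate):

```java
public class SeekOverhead {
    public static void main(String[] args) {
        // Illustrative drive characteristics, not measured values.
        double seekMs = 10.0;            // average seek time per block, ms
        double transferMBs = 100.0;      // sustained transfer rate, MB/s

        for (long blockMB : new long[] {4, 64, 128, 256}) {
            double transferMs = blockMB / transferMBs * 1000.0; // time to read one block
            double seekFraction = seekMs / (seekMs + transferMs);
            System.out.printf("block %4d MB: seek is %.1f%% of total read time%n",
                    blockMB, seekFraction * 100.0);
        }
        // With 4 MB blocks, seeking is about 20% of the read time;
        // at 128 MB it drops below 1%.
    }
}
```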

b. Reduce Namenode memory consumption

 HDFS has only one Namenode, and its memory is extremely limited compared with the aggregate capacity of the Datanodes. Yet the Namenode must record the block information for the Datanodes in its in-memory namespace image (persisted as the FSImage file). If the block size is set too small, the amount of block metadata to maintain becomes enormous, and the Namenode's memory may not be able to hold it all.
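A rough sketch of the effect, assuming the commonly cited ballpark of about 150 bytes of Namenode heap per block record (an estimate for illustration, not a figure from the original post):

```java
public class NamenodeMemoryEstimate {
    public static void main(String[] args) {
        // Ballpark only: roughly 150 bytes of Namenode heap per block record.
        long bytesPerBlockRecord = 150;
        long dataBytes = 1024L * 1024 * 1024 * 1024;   // 1 TB of stored data

        for (long blockBytes : new long[] {4L * 1024, 128L * 1024 * 1024}) {
            long blocks = dataBytes / blockBytes;
            long heapBytes = blocks * bytesPerBlockRecord;
            System.out.printf("block size %10d bytes -> %,d blocks, ~%,d MB of Namenode heap%n",
                    blockBytes, blocks, heapBytes / (1024 * 1024));
        }
        // 1 TB in 4 KB blocks is ~268 million blocks (~38 GB of heap just for block records);
        // in 128 MB blocks it is only 8192 blocks (roughly 1 MB).
    }
}
```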

 

2. Why can't it be much larger than 64 MB (or 128 MB or 256 MB)? This is discussed mainly from the viewpoint of the MapReduce framework running on top of HDFS.

a. Map crash problem:

 When a map task crashes it has to be restarted, and restarting means reloading its input data. The larger the block, the longer the data takes to reload and the longer recovery takes.

b. Supervision timeout problem:

 The master node supervises the worker nodes, and each worker periodically reports back its completed work and status. If a node stays silent for longer than a preset interval, the master marks it as dead and reassigns that node's data to other nodes. This "preset interval" is roughly estimated from the block size. For a 64 MB block one can reasonably say: it should finish within 10 minutes, and no response after 10 minutes means the node is dead. But for blocks of 640 MB or 1 GB, how long should the estimate be? If the estimate is too short, live nodes are misjudged as dead, and in the worst case every node ends up declared dead; if it is too long, failures are detected only after an excessive wait. So for very large blocks this "preset interval" is hard to choose.
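In Hadoop MapReduce this "preset interval" roughly corresponds to the task timeout, mapreduce.task.timeout, whose default is 600000 ms (10 minutes). A minimal sketch of adjusting it through the job configuration; the 20-minute value is purely illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TimeoutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // mapreduce.task.timeout: how long a task may go without reporting
        // progress before the framework kills it. Default is 600000 ms (10 minutes);
        // larger blocks may call for a larger value.
        conf.setLong("mapreduce.task.timeout", 20 * 60 * 1000L);  // 20 minutes, illustrative

        Job job = Job.getInstance(conf, "timeout-example");
        // ... set mapper, reducer, input and output paths as usual ...
    }
}
```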

c. Problem decomposition:

 The amount of data is directly related to the difficulty of solving the problem: for the same algorithm, the more data a single task has to process, the longer it takes to run.

d. Constraints on the Map output:

 In the MapReduce framework, the Map output is sorted before the Reduce step runs. Recall the idea behind merge sort: sort small files first, then merge the sorted small files into larger ones, as in the small sketch below. Since the Map output grows with the input block size, keeping blocks moderately sized keeps these sorted runs manageable.
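To make the merge-sort analogy concrete, here is a small self-contained sketch of merging several already-sorted runs with a priority queue, the same pattern the sort/merge phase follows conceptually (the data and run sizes are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // Merge several individually sorted runs into one sorted list,
    // always pulling the smallest head element via a priority queue.
    static List<Integer> merge(List<List<Integer>> sortedRuns) {
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        List<Iterator<Integer>> iters = new ArrayList<>();
        for (int i = 0; i < sortedRuns.size(); i++) {
            Iterator<Integer> it = sortedRuns.get(i).iterator();
            iters.add(it);
            if (it.hasNext()) heap.add(new int[] {it.next(), i});  // {value, run index}
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            out.add(head[0]);
            Iterator<Integer> it = iters.get(head[1]);
            if (it.hasNext()) heap.add(new int[] {it.next(), head[1]});
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = List.of(
                List.of(1, 4, 9), List.of(2, 3, 8), List.of(5, 6, 7));
        System.out.println(merge(runs));   // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```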
