Hadoop blocks

1. Block (file block): the most basic storage unit. A file is cut into multiple blocks that are stored on DataNodes; one DataNode holds many different blocks, and the same block exists on multiple DataNodes as replicas (a small sketch of how to list a file's blocks and their hosts follows this paragraph).
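As a minimal sketch (assuming a reachable cluster and a hypothetical file /data/example.txt), the Hadoop Java API makes this layout visible: each BlockLocation is one block of the file, and its host list names the DataNodes that hold replicas of that block:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Print each block of a file and the DataNodes holding its replicas.
// Cluster settings are read from core-site.xml / hdfs-site.xml on the classpath.
public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            FileStatus status = fs.getFileStatus(new Path("/data/example.txt")); // hypothetical path
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
            }
        }
    }
}
```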

Concretely, for a file of length size, HDFS starts from offset 0 and divides the content into fixed-size pieces, numbered in sequence; each piece is one block. The default HDFS block size is 128 MB, so a 256 MB file has 256/128 = 2 blocks in total.
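The same arithmetic as a short snippet (ceiling division, since in the general case a final partial block still counts as one block, just a smaller one):

```java
// Block-count arithmetic from the 256 MB example above.
long blockSize = 128L * 1024 * 1024;                     // HDFS default, 128 MB
long fileSize  = 256L * 1024 * 1024;                     // example file
long numBlocks = (fileSize + blockSize - 1) / blockSize; // ceiling division -> 2
```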

2. Unlike an ordinary file system, if a file in HDFS is smaller than a data block, it does not occupy the entire block's storage space. Replication (multiple replicas): by default each block has three replicas (the dfs.replication property in hdfs-site.xml). The default block size is 128 MB in Hadoop 2.0 and 64 MB in 1.0.
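For reference, a minimal hdfs-site.xml excerpt covering both settings; the values shown are the documented defaults (dfs.blocksize is the Hadoop 2.x property name, and its value is in bytes):

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- replicas per block -->
</property>
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value> <!-- 128 MB -->
</property>
```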

When one replica of a block is lost (for example, because a disk fails), the system automatically copies that block from a node that still holds it to another idle node, restoring the replica count.
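The same machinery also serves explicit requests. As a small sketch (the path is hypothetical): raising a file's replication factor with FileSystem.setReplication leaves its blocks under-replicated, and the NameNode schedules the extra copies in the background:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            // Request 4 replicas per block; the NameNode copies the missing
            // replicas asynchronously, just as it does after a disk failure.
            fs.setReplication(new Path("/data/example.txt"), (short) 4);
        }
    }
}
```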

3. Block replica placement strategy (a simplified sketch of the selection logic follows this list):

The first replica: placed on the DataNode from which the file is uploaded. If the write is submitted from outside the cluster, a node is chosen at random among those whose disk is not too full and whose CPU is not too busy.

The second replica: placed on a node in a rack different from the first replica's, because nodes in the same rack share a power supply; if that power supply fails, all of those nodes go down together.

The third replica: placed on a different node in the same rack as the second replica.

Further replicas: random nodes.
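A simplified sketch of that selection logic in plain Java (the Node type is a hypothetical stand-in for HDFS's internal topology classes; the real logic lives in BlockPlacementPolicyDefault and also handles fallbacks when a rack has no eligible node):

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

class Node {
    final String name;
    final String rack;
    Node(String name, String rack) { this.name = name; this.rack = rack; }
}

class PlacementSketch {
    static final Random RAND = new Random();

    // writer == null models a client submitting from outside the cluster.
    static Node[] chooseTargets(List<Node> cluster, Node writer) {
        Node first  = (writer != null) ? writer : pick(cluster, null, null);
        Node second = pick(cluster, first.rack, null);   // a different rack
        Node third  = pick(cluster, null, second);       // same rack as second
        // Any further replicas would be picked at random across the cluster.
        return new Node[] { first, second, third };
    }

    // Random node, optionally excluding a rack, or requiring the rack of
    // (but not identity with) a given node.
    static Node pick(List<Node> cluster, String excludedRack, Node sameRackAs) {
        List<Node> candidates = cluster.stream()
                .filter(n -> excludedRack == null || !n.rack.equals(excludedRack))
                .filter(n -> sameRackAs == null
                        || (n.rack.equals(sameRackAs.rack) && n != sameRackAs))
                .collect(Collectors.toList());
        return candidates.get(RAND.nextInt(candidates.size()));
    }
}
```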
