BigData: The HDFS Block Distribution Mechanism

How HDFS replicates and distributes data blocks:
HDFS Block Replication Strategy
▪ The first copy of the block is placed on the same node as the client
─ If the client is not part of the cluster, the first copy is placed on a random node
─ The system tries to find one that is not too busy
▪ The second copy of the block is placed on a node residing on a different rack
▪ The third copy of the block is placed on a different node in the same rack as the second copy
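The selection steps above can be sketched as a small simulation. This is a minimal illustration, not Hadoop's actual `BlockPlacementPolicyDefault` code; the cluster map and node names are hypothetical, and the "not too busy" check is omitted for brevity.

```python
import random

# Hypothetical cluster topology: node name -> rack id (illustrative only).
CLUSTER = {
    "node1": "rack1", "node2": "rack1",
    "node3": "rack2", "node4": "rack2",
}

def choose_replica_nodes(client_node):
    """Pick target nodes for three replicas following the rules above."""
    # First replica: the client's own node if it is in the cluster,
    # otherwise a random node.
    if client_node in CLUSTER:
        first = client_node
    else:
        first = random.choice(list(CLUSTER))
    # Second replica: any node on a different rack than the first.
    remote_nodes = [n for n, r in CLUSTER.items() if r != CLUSTER[first]]
    second = random.choice(remote_nodes)
    # Third replica: a different node on the same rack as the second.
    same_rack = [n for n, r in CLUSTER.items()
                 if r == CLUSTER[second] and n != second]
    third = random.choice(same_rack)
    return [first, second, third]
```

Running `choose_replica_nodes("node1")` always returns `node1` first, a node on the other rack second, and that rack's remaining node third.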

The default HDFS block placement policy provides a tradeoff between minimizing the write cost and maximizing data reliability, availability, and aggregate read bandwidth. When a new block is created, HDFS places the first replica on the node where the writer is located. The second and third replicas are placed on two different nodes in a different rack. Any further replicas are placed on random nodes, with the restrictions that no more than one replica is placed on any one node and, where possible, no more than two replicas are placed in the same rack. Placing the second and third replicas on a different rack better distributes the block replicas for a single file across the cluster. If the first two replicas were placed on the same rack, then for any file, two-thirds of its block replicas would be on the same rack.
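The per-node and per-rack restrictions in this paragraph can be expressed as a small validity check. This is a sketch under the assumptions stated here (at most one replica per node, at most two per rack); the function name and topology map are made up for illustration.

```python
from collections import Counter

def placement_ok(placements, rack_of):
    """Check a list of target nodes against the placement restrictions:
    at most one replica per node, at most two replicas per rack."""
    node_counts = Counter(placements)          # replicas per node
    rack_counts = Counter(rack_of[n] for n in placements)  # replicas per rack
    return (max(node_counts.values()) <= 1
            and max(rack_counts.values()) <= 2)
```

For example, with three racks, the default placement (one replica local, two on a single remote rack) passes this check, while three replicas crowded onto one rack does not.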

Translation and notes:
The default HDFS block placement policy trades off write cost against data reliability, availability, and aggregate read bandwidth. Writing the first replica on the writer's own node keeps the write cheap, while placing the second and third replicas together on a single remote rack costs only one inter-rack transfer yet still survives the loss of an entire rack. The restrictions on extra replicas (at most one per node and, where possible, at most two per rack) keep a file's replicas spread across the cluster: if the first two replicas shared a rack, two-thirds of every file's block replicas would sit on that one rack.


Reposted from blog.csdn.net/adson1987/article/details/90171638