Big Data: Hadoop (HDFS design ideas, design goals, architecture, replica mechanism, replica placement policy)

One, HDFS design ideas

 1) Core idea

  • Split data into blocks, and store multiple copies (replicas) of each block;

 

 2) What goes wrong if a file is only stored as multiple copies, without being split into blocks?

  • Drawbacks

  1. However large a file is, it must be stored whole on a single node, so it cannot be processed in parallel across nodes; that node becomes a network bottleneck, which makes big data hard to handle;
  2. Storage is hard to balance across nodes, and the storage utilization of individual nodes is low;
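To make these drawbacks concrete, here is a small Python sketch with made-up numbers (a 1024 MB file, 4 nodes, 128 MB blocks). It compares how much data each node holds when the file is kept whole versus split into blocks. Replication is ignored, and the round-robin block assignment is a simplification chosen for illustration, not how HDFS actually places blocks; the function names are hypothetical.

```python
from __future__ import annotations

# Made-up numbers for illustration only.
FILE_MB = 1024   # one 1 GB file
NODES = 4        # DataNodes in the cluster
BLOCK_MB = 128   # blocksize


def whole_file_load(file_mb: int, nodes: int) -> list[int]:
    """MB held by each node when the whole file sits on node 0 (replication ignored)."""
    load = [0] * nodes
    load[0] = file_mb
    return load


def block_load(file_mb: int, nodes: int, block_mb: int) -> list[int]:
    """MB held by each node when the file is split into blocks assigned round-robin."""
    load = [0] * nodes
    offset, i = 0, 0
    while offset < file_mb:
        size = min(block_mb, file_mb - offset)  # the last block may be smaller
        load[i % nodes] += size
        offset += size
        i += 1
    return load


print("whole file:", whole_file_load(FILE_MB, NODES))              # [1024, 0, 0, 0]
print("split into blocks:", block_load(FILE_MB, NODES, BLOCK_MB))  # [256, 256, 256, 256]
```

With the whole file on one node, every reader of the file goes through that node; with blocks, each of the 4 nodes holds 256 MB and can serve its share in parallel.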

 

 

Two, HDFS design goals

  • Hadoop Distributed File System (HDFS): derived from Google's GFS (Google File System) paper;
  • Design goals

  1. Distributed storage: when more capacity is needed, scale out horizontally by adding nodes;
  2. Run on cheap commodity hardware;
  3. Easy to scale, providing users with reliable, well-performing file storage (the failure of a piece of inexpensive hardware does not cause serious harm to users);

 

 

Three, HDFS architecture

  • An HDFS cluster usually consists of one NameNode (NN) and multiple DataNodes (DN); the NameNode and the DataNodes are generally deployed on different machines;
  • NameNode:

  • Manages the file system namespace and regulates client access to files;
  • Responsibilities:

  1. Responsible for responding to client requests;
  2. Responsible for managing metadata (file names, the replication factor, and which DataNodes store each Block);
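The metadata the NameNode keeps can be pictured as two lookup tables: file name to (replication factor, ordered block IDs), and block ID to the DataNodes holding it. The Python sketch below is a hypothetical in-memory model under that assumption; the class and method names are invented for illustration and are not the HDFS API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileMeta:
    replication: int                                  # replication factor of the file
    blocks: list[str] = field(default_factory=list)  # block IDs in file order


class NameNodeMetadata:
    """Toy NameNode metadata store (hypothetical names, not the HDFS API)."""

    def __init__(self) -> None:
        self.files: dict[str, FileMeta] = {}            # file name -> metadata
        self.block_locations: dict[str, set[str]] = {}  # block ID -> DataNodes holding it

    def add_file(self, path: str, replication: int, blocks: dict[str, set[str]]) -> None:
        # dict preserves insertion order, so list(blocks) keeps the block order
        self.files[path] = FileMeta(replication, list(blocks))
        for block_id, nodes in blocks.items():
            self.block_locations[block_id] = set(nodes)

    def locate(self, path: str) -> list[tuple[str, list[str]]]:
        """Answer a client read request: for each block, the DataNodes to read from."""
        meta = self.files[path]
        return [(b, sorted(self.block_locations[b])) for b in meta.blocks]


nn = NameNodeMetadata()
nn.add_file("/logs/a.log", 3, {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn2", "dn3", "dn4"}})
print(nn.locate("/logs/a.log"))
```

A client read then amounts to asking the NameNode for the block locations and reading each block directly from one of the listed DataNodes.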

 

  • DataNodes

  • Handle Block operations. Generally each node runs one DataNode (a cluster with N nodes has N DataNodes; a node can run multiple DataNodes, but this is rarely done), and each DataNode manages the storage attached to the node it runs on;
  • Responsibilities:

  1. Store the data blocks (Blocks) that make up user files;
  2. Periodically report to the NameNode its own health (heartbeat) and information about all the blocks it holds (block report);
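On the NameNode side, these periodic reports can be modeled as a table of last-heartbeat times plus the latest block report from each DataNode. This is a minimal sketch, assuming a hypothetical fixed timeout after which a silent DataNode is treated as dead; the class name and the `timeout_s` parameter are illustrative, and real HDFS computes its dead-node interval from its own configuration settings.

```python
from __future__ import annotations

import time


class HeartbeatMonitor:
    """Toy NameNode-side tracker for DataNode heartbeats and block reports (hypothetical)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s               # assumed dead-node timeout, in seconds
        self.last_seen: dict[str, float] = {}    # DataNode -> time of its last heartbeat
        self.blocks: dict[str, set[str]] = {}    # DataNode -> block IDs it last reported

    def heartbeat(self, node: str, now: float | None = None) -> None:
        self.last_seen[node] = time.monotonic() if now is None else now

    def block_report(self, node: str, block_ids: set[str]) -> None:
        # A full block report replaces whatever the node reported before.
        self.blocks[node] = set(block_ids)

    def dead_nodes(self, now: float | None = None) -> list[str]:
        """DataNodes whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


m = HeartbeatMonitor(timeout_s=30)
m.heartbeat("dn1", now=0)
m.heartbeat("dn2", now=0)
m.heartbeat("dn1", now=25)
print(m.dead_nodes(now=40))  # ['dn2']
```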

 

  • Files are split into Blocks according to blocksize (e.g., with blocksize = 128M, a 130M file ==> 128M + 2M);
  • Namespace operations include opening, closing, and renaming files and directories;
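The blocksize arithmetic in the example above (130M ==> 128M + 2M) can be written as a short Python function. `split_into_blocks` is a hypothetical helper, and 128 MB is assumed as the default blocksize to match the example.

```python
MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> list:
    """Return the sizes (in bytes) of the blocks a file of file_size bytes is cut into."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full, rest = divmod(file_size, block_size)
    # Every block is block_size bytes except possibly the last one.
    return [block_size] * full + ([rest] if rest else [])


print([s // MB for s in split_into_blocks(130 * MB)])  # [128, 2]
```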

 

 

Four, HDFS replica mechanism

  • HDFS supports a hierarchical file organization (folders can contain other folders);
  • Any change to the file system namespace is recorded by the NameNode;
  • All Blocks of a file except the last one are the same size (the configured blocksize);
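Because every namespace change is recorded by the NameNode (HDFS keeps these records in a transaction log called the EditLog), a hierarchical namespace can be sketched as a tree of nested dictionaries plus an append-only list of operations. The `Namespace` class below is hypothetical and in-memory only; it does not persist anything, check permissions, or handle paths that run through a file.

```python
from __future__ import annotations


class Namespace:
    """Toy hierarchical namespace with an append-only edit log (hypothetical)."""

    def __init__(self) -> None:
        self.root: dict = {}              # nested dicts are directories, None is a file
        self.edit_log: list[tuple] = []   # every namespace change is appended here

    def _walk(self, parts: list[str]) -> dict:
        node = self.root
        for p in parts:
            node = node[p]  # raises KeyError if a parent directory is missing
        return node

    def mkdir(self, path: str) -> None:
        node = self.root
        for p in path.strip("/").split("/"):
            node = node.setdefault(p, {})  # create intermediate directories as needed
        self.edit_log.append(("mkdir", path))

    def create(self, path: str) -> None:
        *parents, name = path.strip("/").split("/")
        self._walk(parents)[name] = None
        self.edit_log.append(("create", path))

    def rename(self, src: str, dst: str) -> None:
        *src_parents, src_name = src.strip("/").split("/")
        *dst_parents, dst_name = dst.strip("/").split("/")
        dst_dir = self._walk(dst_parents)  # resolve the target first so a bad dst loses nothing
        dst_dir[dst_name] = self._walk(src_parents).pop(src_name)
        self.edit_log.append(("rename", src, dst))


ns = Namespace()
ns.mkdir("/user/alice")
ns.create("/user/alice/a.txt")
ns.rename("/user/alice/a.txt", "/user/alice/b.txt")
print(ns.root)      # {'user': {'alice': {'b.txt': None}}}
print(ns.edit_log)
```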

 

 

 

 

Five, HDFS replica placement policy

  • By default each block is stored as 3 replicas, for fault tolerance and data safety;
  • The first replica is stored on the current node by default, i.e., the node where the client writing the data runs;
  • The second replica is stored on a node in a different rack from the first;
  • The third replica is stored on a different node in the same rack as the second replica;
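Here is a Python sketch of this three-replica placement: first replica on the writer's node, second on a node in a different rack, third on a different node in the second replica's rack (this matches the default policy described in the Apache HDFS Architecture guide). `place_replicas` and the rack-to-nodes `topology` dictionary are hypothetical; the sketch assumes the writer is itself a DataNode and that at least one other rack has two or more nodes, and it ignores node load and free space.

```python
from __future__ import annotations

import random


def place_replicas(topology: dict[str, list[str]], writer: str,
                   rng: random.Random | None = None) -> list[str]:
    """Choose 3 DataNodes for a block's replicas (hypothetical helper).

    topology maps rack name -> node names in that rack.
    """
    rng = rng or random.Random()
    rack_of = {n: r for r, nodes in topology.items() for n in nodes}
    first = writer  # replica 1: the node where the writer runs
    # Candidate remote racks must hold at least 2 nodes for replicas 2 and 3.
    remote_racks = [r for r in topology if r != rack_of[first] and len(topology[r]) >= 2]
    rack2 = rng.choice(remote_racks)                # replica 2: a different rack
    second, third = rng.sample(topology[rack2], 2)  # replica 3: same rack as 2, another node
    return [first, second, third]


topology = {"r1": ["dn1", "dn2"], "r2": ["dn3", "dn4", "dn5"]}
print(place_replicas(topology, "dn1", random.Random(0)))
```

Putting two of the three replicas in one remote rack limits cross-rack write traffic while still keeping a copy alive if an entire rack fails.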

 


Origin www.cnblogs.com/volcao/p/11444679.html