The Road to Java Big Data -- HDFS Explained (2): Technical Details

HDFS (Hadoop Distributed File System): Technical Details

Table of contents

HDFS (Hadoop Distributed File System): Technical Details

1. HDFS Architecture

2. Block

3. NameNode

4. Replica Placement Strategy

5. Rack Awareness Strategy

6. DataNode

7. SecondaryNameNode


1. HDFS Architecture

  1. When HDFS stores data, it splits the data into chunks; each chunk is called a Block
  2. HDFS itself is a distributed, scalable, and reliable file system
  3. HDFS has three main processes: the NameNode (manages nodes and records metadata), the DataNode (stores data), and the SecondaryNameNode. These processes usually run on different hosts, so a node is customarily referred to by the name of the process running on it
  4. HDFS automatically replicates data. Unless configured otherwise, each Block has 3 replicas by default (the original plus two additional copies); a minimal client sketch follows this list
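To make this concrete, here is a minimal Java client sketch using Hadoop's FileSystem API: the client asks the NameNode for metadata and streams file contents to the DataNodes. The NameNode address hdfs://localhost:9000 and the path /test/a.log are assumptions for illustration; substitute your cluster's fs.defaultFS.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address
        conf.set("dfs.replication", "3");                  // replica count; 3 is the default anyway
        FileSystem fs = FileSystem.get(conf);              // the client contacts the NameNode for metadata
        // Writing streams the data to DataNodes in Block-sized chunks
        try (FSDataOutputStream out = fs.create(new Path("/test/a.log"))) {
            out.writeUTF("hello HDFS");
        }
        fs.close();
    }
}
```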

 

2. Block

  1. In HDFS, data is stored in units of Blocks
  2. By default, a Block is 64 MB in Hadoop 1.0 and 128 MB in Hadoop 2.0; the size is adjusted through the dfs.blocksize property (hdfs-site.xml)
  3. If a file is smaller than one Block, the whole file is stored as a single Block whose size equals the file size
  4. Advantages of splitting into Blocks:
    1. The Blocks of one file can be stored on different nodes, so HDFS can hold files far larger than any single disk
    2. It facilitates replication and fast backup (a large file is split and its Blocks are copied node by node)
  5. Different replicas of the same Block must be on different nodes, while replicas of different Blocks may sit on the same node (the sketch after this list prints a file's Blocks and their locations)
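A sketch of how Blocks surface to a client, assuming a NameNode at hdfs://localhost:9000 and an existing file /test/a.log; it prints the file's Block size and, for each Block, the DataNodes holding its replicas:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfoSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path("/test/a.log")); // assumed existing file
        System.out.println("block size: " + status.getBlockSize());
        // One BlockLocation per Block; getHosts() names the DataNodes holding its replicas
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset=" + loc.getOffset()
                    + " len=" + loc.getLength()
                    + " hosts=" + String.join(",", loc.getHosts()));
        }
        fs.close();
    }
}
```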

3. NameNode

  1. The NameNode is the core node of HDFS; it manages the DataNodes and stores the metadata
  2. The metadata the NameNode maintains includes:
    1. The file's storage path
    2. File permissions
    3. File size and Block size
    4. BlockIDs
    5. The mapping between Blocks and DataNodes (nodes)
    6. The number of replicas
    7. Metadata format reference: FileName replicas block-ids id2host. For example: /test/a.log,3,{b1,b2},[{b1:[h0,h1,h3]},{b2:[h0,h2,h4]}]
  3. Each metadata entry is roughly 150 bytes
  4. The NameNode keeps metadata both in memory and on disk:
    1. In memory for fast queries
    2. On disk for crash recovery
  5. Metadata is persisted to the disk of the NameNode host; the persistence directory is determined by the hadoop.tmp.dir property in core-site.xml. If this property is not configured, it defaults to a directory under /tmp

  6. Within that directory, the metadata is stored under dfs/name/current

  7. The persisted files include:
    1. fsimage: the metadata image file. It stores the NameNode's metadata but is not synchronized with memory in real time; in general the metadata in fsimage lags behind the metadata in memory
    2. edits: the operation log file, which records the write operations the NameNode has accepted
  8. When the NameNode receives a write request, it first records the operation in edits and only then changes the metadata in memory; once the in-memory change succeeds, it returns a success signal to the Client. fsimage is not updated in real time but only when certain conditions are met, so fsimage usually lags behind the in-memory metadata. This design guarantees the reliability of operations: as long as an operation is recorded in edits, it will definitely be executed
  9. As operations accumulate in memory while fsimage stays stale, the gap between the two grows. When fsimage is updated, the operations in the edits file are taken out one by one and replayed against fsimage. At that point edits_inprogress is rolled into an edits file, and a new edits_inprogress is created to record subsequent operations
  10. Conditions that trigger an fsimage update and edits rollover:
    1. Space: when the edits file reaches a specified size (64 MB by default, adjustable via fs.checkpoint.size in core-site.xml), a rollover of the edits file is triggered
    2. Time: when a specified interval has passed since the last rollover (3600 s by default, adjustable via fs.checkpoint.period in core-site.xml), a rollover of the edits file is triggered
    3. Restart: a rollover is triggered when the NameNode restarts
    4. Forced: a rollover can be forced with the hdfs dfsadmin -rollEdits command
  11. DataNodes register with and are managed by the NameNode through an RPC heartbeat mechanism: each DataNode sends heartbeat messages to the NameNode periodically (every 3 seconds)
  12. A heartbeat carries:
    1. The Block information on the current DataNode
    2. The current DataNode's state (pre-service, in service, pre-decommission, or decommissioned; a node can report the first three, but naturally cannot report the decommissioned state itself)
  13. If the NameNode receives no heartbeat from a DataNode for 10 minutes, it considers that DataNode lost and copies the data that was on it (from replicas on other nodes) to other nodes, so that the replica count across the cluster is maintained
  14. After a restart, the NameNode updates the fsimage file and rolls the edits file, loads the contents of fsimage into memory, and waits for heartbeats from the DataNodes. If a DataNode's heartbeat does not arrive within the specified time, that DataNode is considered lost and handled as above. When a heartbeat does arrive, the NameNode checks the DataNode's data against the metadata records. If a check fails, the NameNode tries to restore the data and checks again after recovery. Once the checks succeed, the cluster provides full service; until then it provides read-only service. This whole process is called safe mode
  15. In safe mode, the HDFS cluster only provides read service
  16. Because of this safe-mode verification, the number of replicas must not exceed the number of DataNodes; otherwise two replicas of the same Block would have to share a node, which the placement rule forbids, and the verification could never pass
  17. If the cluster does not exit safe mode automatically within a reasonable time, data has most likely been lost and cannot be recovered
  18. To force an exit from safe mode: hdfs dfsadmin -safemode leave (a sketch that queries safe mode programmatically follows this list)
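Safe mode can also be inspected programmatically through the DistributedFileSystem API (Hadoop 2.x); a minimal sketch, again assuming a NameNode at hdfs://localhost:9000:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class SafeModeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        // SAFEMODE_GET only queries the state; SAFEMODE_LEAVE would force an exit,
        // the programmatic equivalent of `hdfs dfsadmin -safemode leave`
        boolean inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
        System.out.println("NameNode in safe mode: " + inSafeMode);
        dfs.close();
    }
}
```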

4. Replica Placement Strategy

  1. First replica:
    1. If the upload is performed from inside the cluster, the first replica is placed on the node doing the uploading
    2. If the upload comes from outside the cluster, the first replica is placed on a relatively idle node
  2. Second replica:
    1. Before Hadoop 2.7: the second replica is placed on a node in a different rack from the first (to survive the loss of an entire rack)
    2. From Hadoop 2.7 on: the second replica is placed on a node in the same rack as the first (transfers within a rack are fast)
  3. Third replica:
    1. Before Hadoop 2.7: the third replica is placed on a node in the same rack as the second (transfers within a rack are fast)
    2. From Hadoop 2.7 on: the third replica is placed on a node in a different rack from the second (to survive the loss of an entire rack)
  4. Further replicas:
    1. Placed on whichever nodes are idle (a client-side sketch follows this list)
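Placement itself is decided by the NameNode; a client only controls how many replicas a file should have. A small sketch (the NameNode address and file path are assumptions):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);
        // The NameNode decides *where* each replica goes (the strategy above);
        // the client only decides *how many* replicas a file should have
        fs.setReplication(new Path("/test/a.log"), (short) 3); // assumed existing file
        fs.close();
    }
}
```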

5. Rack Awareness Strategy

  1. A rack here is not a physical rack but a logical one; rack awareness is essentially a mapping
  2. Nodes in different physical racks can be mapped to the same logical rack
  3. In practice, nodes in the same physical rack are usually mapped to the same logical rack (a sketch of such a mapping follows this list)
6. DataNode

  1. DataNodes store the Blocks
  2. Each DataNode sends heartbeats to the NameNode
  3. The path where the Blocks are stored is likewise governed by the hadoop.tmp.dir property (a sketch reading the related defaults follows this list)
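A quick way to inspect these DataNode settings is to read them from a Configuration; the property names and defaults below are the standard Hadoop 2.x ones:

```java
import org.apache.hadoop.conf.Configuration;

public class DataNodeConfSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Block storage path: defaults to a dfs/data directory under hadoop.tmp.dir
        System.out.println("data dir: "
                + conf.get("dfs.datanode.data.dir", "file://${hadoop.tmp.dir}/dfs/data"));
        // Heartbeat interval in seconds (default 3)
        System.out.println("heartbeat: " + conf.getLong("dfs.heartbeat.interval", 3L) + "s");
    }
}
```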

7. SecondaryNameNode

  1. The SecondaryNameNode is not a backup of the NameNode; its role is to assist the NameNode in rolling the edits file
  2. In a fully distributed setup, once a SecondaryNameNode is present, the edits rollover takes place on the SecondaryNameNode; without one, the NameNode rolls the edits file itself
  3. An HDFS cluster adopts either a NameNode + SecondaryNameNode structure or a dual-NameNode structure, because the SecondaryNameNode's role can be taken over by the NameNode itself. But a single NameNode as the core node is an easy single point of failure, so in practice the SecondaryNameNode is often discarded in favor of a dual-NameNode structure that forms NameNode HA (high availability); a sketch reading the checkpoint thresholds follows
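The checkpoint thresholds the SecondaryNameNode acts on are the ones from section 3; a sketch that reads them, using the Hadoop 1.x-era property names this article uses (Hadoop 2.x renamed them to dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns):

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointConfSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Rollover by time: default 3600 seconds between checkpoints
        System.out.println("fs.checkpoint.period = "
                + conf.getLong("fs.checkpoint.period", 3600L) + "s");
        // Rollover by space: default 64 MB of accumulated edits
        System.out.println("fs.checkpoint.size = "
                + conf.getLong("fs.checkpoint.size", 64L * 1024 * 1024) + " bytes");
    }
}
```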
