Analysis of HDFS Storage Magic: Data Adventure in the Two-Dimensional World

Copyright Notice

  • The content of this blog is based on my personal study notes from the Dark Horse Programmer course. All copyrights belong to Dark Horse Programmer or the related rights holders. This blog is intended only for personal learning and exchange, not for commercial use.
  • I do my best to ensure accuracy when organizing my study notes, but I cannot guarantee that the content is complete or up to date. The content of this blog may become outdated over time or require updating.
  • If you are Dark Horse Programmer or a related rights holder and believe any content infringes your copyright, please contact me promptly and I will delete it immediately or make the necessary modifications.
  • Other readers should abide by relevant laws, regulations, and ethical principles when reading this blog, treat the content as a reference only, and bear any resulting risks and responsibilities themselves. Some views and opinions in this blog are my own and do not represent the position of Dark Horse Programmer.

0. Origin

  • In the vast two-dimensional world, a magical storage spell is quietly at work: the Hadoop Distributed File System (HDFS). Like a giant magic library, HDFS cuts large files into small pieces and scatters them across different magic bookshelves. Each small piece has backup copies kept by magicians on other bookshelves, so that even if something goes wrong with one bookshelf, the same treasure of knowledge can be retrieved from the others.

  • How is this magical storage magic achieved?

Step One: Magic Cutting

  • In the center of the Magic Library, there is a super mage called NameNode. He is the manager of the entire magical library, and he cuts each large book into smaller pieces of the same size to make them easier to manage. These small pieces are called magic pages, and each page is a fixed size, usually 128MB. In this way, large books can be divided into many small pieces for easy storage and transportation.

Step Two: Magic Copy

  • In order to ensure that the magic book will not be lost due to the collapse of a certain bookshelf, each magic book page will be copied to other different bookshelves. These copies are called magic mirrors. By default, HDFS copies the magical image of each book page to three different bookshelves, so that even if one bookshelf is attacked by an evil force, the other two bookshelves still retain its magical knowledge.

Step Three: Magic Mapping

  • The NameNode also creates a magic map that associates each magic book page with the location of their magic mirror. In this way, no matter where you enter the magic library, the NameNode can help you quickly find the magic knowledge you need.

Step Four: Magic Management

  • Each magic book page has its own guardian spirit, called a data node. They are responsible for keeping magic book pages and images on the bookshelf, and reporting their status to the NameNode regularly. If a magic book page is lost, the data node will restore it through other mirrors or new magic book pages to ensure that magic knowledge will not be damaged.

In this way, in this two-dimensional world full of magic and adventure, HDFS uses its storage magic to protect precious data treasures. Whether it's a large magic book or a small magic page, they are all safely and securely protected in this magical storage system. Join our data adventure team and explore this magical world full of wonders and challenges!

1. Storage principle

  • Distributed storage: data in HDFS is stored in a distributed way, that is, each server node is responsible for a part of the data.
  • Hadoop Distributed File System (HDFS) is a distributed file system in the Hadoop ecosystem that splits large files into multiple chunks and stores them on different nodes in the cluster. Each block is replicated to multiple nodes to provide fault tolerance.
  • Data is divided into Block blocks for storage on HDFS.
  • To handle files of different sizes, the unit of storage is the block; each block has a fixed size, 128MB by default.
  • On HDFS, data blocks can have multiple copies to improve data security.
  • To guard against file loss and improve data safety, the replicas of each Block are placed on different servers; the default number of replicas is 3.
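  • As a quick check of these two properties for a concrete file, the hadoop fs -stat command can print the block size and replication factor (the path /tmp/test.txt below is only a placeholder):
# %o prints the block size and %r the replication factor of the given file
hadoop fs -stat "%o %r %n" /tmp/test.txt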

2. fsck command

2.1 Configuration of the number of replica blocks

2.1.1 Global setting method

  • The data security of the HDFS file system is ensured by keeping multiple copies. To set the number of copies for files uploaded to HDFS, you can configure the following property in hdfs-site.xml:
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
  • This attribute defaults to 3. Under normal circumstances, there is no need to actively configure it (unless you need to set a value other than 3)
  • If you need to customize this attribute, please modify the hdfs-site.xml file of each server and set this attribute
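  • To confirm which value is in effect on a node, the hdfs getconf command can read a single configuration key; a small example:
# Print the effective value of dfs.replication on this node
hdfs getconf -confKey dfs.replication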

2.1.2 Temporary setting method

  • In addition to configuring files, you can also temporarily decide how many copies of the uploaded file to store when uploading the file.
# Temporarily set the replication factor to 2 for this upload
hadoop fs -D dfs.replication=2 -put test.txt /tmp/
  • For files that already exist in HDFS, modifying the dfs.replication property will not take effect. To change the replication of an existing file, use the following command:
# The content at the specified path will be stored with 2 replicas
# The -R option is optional; with -R the change also applies to subdirectories
hadoop fs -setrep [-R] 2 path
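  • A short end-to-end example (using a hypothetical /tmp/test.txt) that lowers the replication of an existing file and then verifies the result:
# Change the existing file to 2 replicas
hadoop fs -setrep 2 /tmp/test.txt
# Verify: %r prints the replication factor of the file
hadoop fs -stat %r /tmp/test.txt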

2.2 Check the number of copies of a file

  • Use the fsck (file system check) command provided by HDFS to check the number of copies of the file
hdfs fsck path [-files [-blocks [-locations]]]
  • fsck: checks whether the specified path is healthy
  • -files: lists the status of the files under the path
  • -files -blocks: additionally outputs the block report of each file (how many blocks, how many replicas)
  • -files -blocks -locations: additionally outputs the location details of each block
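  • For example, to inspect a single file (the path is a placeholder), the options can be combined as described above:
# Show the file status, its block report and the location of every replica
hdfs fsck /tmp/test.txt -files -blocks -locations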

2.3 Block size and replication strategy configuration

  1. Block Size: HDFS divides large files into fixed-size blocks; the default block size is 128MB and can be changed through configuration. The choice of block size affects storage efficiency and performance: smaller blocks increase metadata and management overhead, while larger blocks make the data less granular and can limit parallel processing. A per-upload override is sketched after this list.
    • Configuration item: dfs.blocksize, default value 128MB (134217728 bytes)
    • Example configuration, set to 512MB:
    <property>
    	<name>dfs.blocksize</name>
    	<value>536870912</value>
    </property>
    
  2. Block replication policy (Block Placement Policy): HDFS supports multiple block placement policies for deciding where block replicas are stored. The default policy spreads replicas across nodes and racks (for example, placing the second replica on a different rack from the first) to balance fault tolerance and performance.
    • You can use the default policy, org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault, or a custom one.
    • Configuration item: dfs.block.replicator.classname, default value org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault
    • Example configuration:
<property>
	<name>dfs.block.replicator.classname</name>
	<value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault</value>
</property>
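  • As mentioned in item 1, dfs.blocksize can also be overridden per upload on the client side, in the same way as dfs.replication earlier; a small sketch with a hypothetical file:
# Upload this file with a 512MB block size (536870912 bytes)
hadoop fs -D dfs.blocksize=536870912 -put big.dat /data/
# Verify: %o prints the block size of the stored file
hadoop fs -stat %o /data/big.dat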

3. NameNode metadata

  • In HDFS, files are divided into many blocks. Hadoop uses the NameNode to record and organize the relationship between files and their blocks.
  • The NameNode manages and maintains the metadata of the entire file system based on a batch of edits files and an fsimage file.

3.1 edits file

  • Edit Logs: Edit logs record all file system change operations, such as file creation, deletion, renaming, etc. The edit log retains the complete modification history of the file system and is the operation log of the file system.
  • The edits file is a journal file that records every operation in HDFS and the corresponding blocks of the files affected by this operation.

  • Multiple edits files are merged regularly so that the operation log does not grow without bound; the result of the merge is the fsimage described in the next section.
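  • The contents of a binary edits file can be inspected with the offline edits viewer (hdfs oev); the file name below is a placeholder, the real files live under the NameNode metadata directory:
# Convert a binary edits file into readable XML with the offline edits viewer
hdfs oev -i edits_0000000000000000001-0000000000000000010 -o edits.xml -p XML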

3.2 fsimage file

  • File System Image: A file system image is a snapshot that represents the current state of the file system, including all files, directories, permissions, attributes and other information.
  • The file system image captures the metadata information of the file system at a certain point in time and is static.
  • Merging all edits files (together with any existing fsimage) produces the fsimage file.
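  • Similarly, an fsimage file can be dumped with the offline image viewer (hdfs oiv); again, the file name is a placeholder:
# Dump an fsimage file as XML with the offline image viewer
hdfs oiv -i fsimage_0000000000000000010 -o fsimage.xml -p XML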

3.3 NameNode metadata management and maintenance

The NameNode manages the files of the entire file system through the cooperation of the edits and fsimage files:

  1. Every operation on HDFS is recorded in the edits file
  2. When the current edits file reaches a size/transaction threshold, it is closed and a new edits file is opened for subsequent records.
  3. Regularly merge edits
    • If there is no fsimage file currently, merge all edits into the first fsimage
    • If the fsimage file currently exists, merge all edits with the existing fsimage to form a new fsimage.
  4. Repeat steps 1 to 3.

3.4 Metadata merge control parameters

  • Metadata merging (checkpointing) is a timed process based on:
    • dfs.namenode.checkpoint.period: default 3600 seconds (1 hour)
    • dfs.namenode.checkpoint.txns: default 1000000 (1 million transactions)
      A merge is triggered as soon as either condition is met.
  • Whether the conditions are met is checked every 60 seconds by default, controlled by dfs.namenode.checkpoint.check.period (default 60 seconds).
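  • These thresholds can be tuned in hdfs-site.xml in the same way as the earlier properties; a minimal sketch showing the default values:
<property>
    <name>dfs.namenode.checkpoint.period</name>
    <value>3600</value>
</property>
<property>
    <name>dfs.namenode.checkpoint.txns</name>
    <value>1000000</value>
</property>
<property>
    <name>dfs.namenode.checkpoint.check.period</name>
    <value>60</value>
</property>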

3.5 Checkpoint

  • Checkpoint: In order to prevent the edit log from growing indefinitely and improve the recovery efficiency of metadata, the NameNode will periodically merge the current edit log with the file system image to generate a new file system image. This process is called checkpointing.
  • A checkpoint is produced by merging the previous file system image with all edit log records made since the last checkpoint.

3.6 Recovery and metadata operations

  • Recovery and metadata operations: When the NameNode starts, it loads the most recent checkpoint and associated edit logs.
  • First, it loads the checkpoint into an in-memory file system state. It then applies subsequent edits one by one, restoring the file system state to its latest state. This way, the NameNode ensures that the file system's metadata is consistent after startup.

3.7 The role of SecondaryNameNode


  • The Secondary NameNode is an auxiliary component in the Hadoop Distributed File System (HDFS). Its main function is to help the NameNode maintain metadata and generate checkpoints, reducing the load on the NameNode and improving system availability. Despite the "Secondary" in its name, the Secondary NameNode is not a standby copy of the NameNode.

The functions of Secondary NameNode include the following aspects:

  1. Checkpoint generation: The NameNode in HDFS will continuously record the editing operations (edits) of the file system into the edit log during operation. In order to avoid excessively large edit logs and improve metadata recovery efficiency, the Secondary NameNode regularly copies the file system's edit log (edits) and file system image (fsimage) from the NameNode, and then merges this information to generate a new file system image. This merging process is checkpoint generation.

    • The generated checkpoint contains the most recent file system image plus the edit operations made after it, but not the earlier edit operations that have already been merged. Generating checkpoints reduces the amount of edit log that must be replayed at startup and improves system performance.
  2. Reduce NameNode load: During checkpoint generation, the Secondary NameNode performs metadata operations such as merging the file system image with the edit logs. This reduces the load on the NameNode, because these operations would otherwise have to be performed by the NameNode itself. By delegating them to the Secondary NameNode, the NameNode can focus on serving actual file system requests, improving the system's responsiveness.

  3. Secondary failure recovery: In some cases, if the NameNode fails and needs to be restored from backup, the Secondary NameNode can serve as an aid. Although the Secondary NameNode itself cannot take over the NameNode's operation, it can provide a merged file system image that makes the recovery process more efficient.

  • Note that although the Secondary NameNode reduces the load on the NameNode during checkpoint generation, it is not a standby copy of the NameNode. Since Hadoop 2.x, the High Availability (HA) feature allows running an active NameNode together with one or more standby NameNode instances to provide higher availability and fault tolerance.
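  • Two related maintenance commands can be run against the Secondary NameNode (a hedged sketch; the option names are those documented for the standard hdfs CLI):
# Print the size of the uncheckpointed edit log on the NameNode
hdfs secondarynamenode -geteditsize
# Force a checkpoint regardless of the current edit log size
hdfs secondarynamenode -checkpoint force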

4. HDFS data reading and writing processes

4.1 Data writing process

[Figure: HDFS data writing process]

  1. The client initiates a request to the NameNode
  2. After the NameNode reviews the permissions and remaining space, it allows writing if the conditions are met, and informs the client of the DataNode address to write to.
  3. The client sends a data packet to the specified DataNode
  4. The DataNode that receives the data also takes care of replication, forwarding the data it receives to other DataNodes.
  5. As shown in the figure above, the data is copied from DataNode1 to DataNode2, and then from DataNode2 to DataNode3 and DataNode4.
  6. When the writing is completed, the client notifies the NameNode, and the NameNode does the metadata recording work.

Key information points:

  • NameNode is not responsible for data writing, only metadata recording and permission approval.
  • The client directly writes data to a DataNode. This DataNode is usually the one closest to the client (network distance).
  • The replication of block copies is completed between the DataNodes themselves (they build a pipeline and copy and forward the data in order; in the figure above, DataNode1 forwards to DataNode2, and DataNode2 forwards to DataNode3 and DataNode4)
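  • A small way to observe this in practice (file and path names are placeholders): upload a file, then use fsck from section 2.2 to see on which DataNodes the pipeline placed the replicas:
# Write a file; the client streams it to the first DataNode of the pipeline
hadoop fs -put test.txt /tmp/
# Inspect which DataNodes ended up holding each replica
hdfs fsck /tmp/test.txt -files -blocks -locations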

4.2 Data reading process


  1. The client applies to NameNode to read a file
  2. After NameNode determines the client permissions and other details, it allows reading and returns the block list of this file.
  3. After the client gets the block list, it can find the DataNode and read it by itself.

  • Key points:
    1. Data is likewise never served through the NameNode.
    2. For each block, the NameNode tries to return the replica closest to the client in network distance; since every block has multiple replicas, the client can read from the nearest one.
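  • From the client's point of view, reading looks equally simple (placeholder paths): the client asks the NameNode for the block list and then pulls the blocks directly from the DataNodes:
# Print the file to stdout; the blocks are fetched directly from DataNodes
hadoop fs -cat /tmp/test.txt
# Or copy it back to the local file system
hadoop fs -get /tmp/test.txt ./test.txt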

4.3 Summary of reading and writing process

  1. Whether the client reads or writes HDFS data:
    • The NameNode never handles the data itself; the client communicates directly with the DataNodes.
    • Otherwise, the NameNode would be under too much pressure.
  2. The writing and reading process, in short:
    1. The NameNode makes the authorization decision (whether the client may write or read).
    2. The client connects directly to a DataNode to write (the DataNodes replicate the copies among themselves), and connects directly to DataNodes to read blocks.
    3. For writing, the client is directed to the DataNode closest to itself.
    4. For reading, the block list returned to the client favors the replicas closest in network distance.
  3. Network distance
    • The shortest distance is on the same machine.
    • Next comes the same LAN (same switch).
    • Then across switches.
    • Then across data centers.
    • HDFS has a built-in network distance (rack awareness) mechanism that infers distance from IP addresses and the configured network topology.
