Introduction to HDFS (10 minutes to understand HDFS, NameNode and DataNode)

Overview

 

First of all, let's get to know HDFS. HDFS (Hadoop Distributed File System) is Hadoop's distributed file system. It divides a large file into several blocks and stores them on multiple nodes across different servers, while letting users work with the file over the network as if it were local. To guard against data loss, it keeps multiple copies of each block (three by default), which also enables multi-user sharing of files and storage space across machines.

HDFS features:

    ① Keeps multiple copies and provides a fault-tolerance mechanism: if a copy is lost or a node goes down, the data is recovered automatically. 3 copies are kept by default.

    ② Runs on cheap commodity machines.

    ③ Suitable for big data processing, but not for masses of small files: each file occupies at least one block, so the more small files there are (e.g., 1,000 files of 1 KB), the more blocks exist and the greater the pressure on the NameNode (see the toy illustration after this list).
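As a toy illustration of point ③ (the file counts and sizes here are made-up examples), compare how many block entries the NameNode must track for 1,000 tiny files versus the same data stored as one file:

    public class SmallFilesDemo {
        public static void main(String[] args) {
            long blockSize = 128L * 1024 * 1024;    // 128 MB block size
            // 1,000 files of 1 KB each: every file occupies at least one
            // block, so the NameNode tracks 1,000 block entries in memory.
            long blocksForSmallFiles = 1000;
            // The same 1,000 KB stored as a single file fits in one block.
            long blocksForOneFile = (1000L * 1024 + blockSize - 1) / blockSize;
            System.out.println(blocksForSmallFiles + " block entries vs " + blocksForOneFile);
        }
    }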

 

For example: the storage layout when a large file is divided into three blocks A, B, and C:

 

PS: Principles of data replication:

All blocks in a file except the last block are the same size.

HDFS placement strategy:

Put one replica on a node in the local rack, another on a node in a different (remote) rack, and the last on a different node in that same remote rack.

Properties involved:

Block size: 64 MB by default in Hadoop 1, 128 MB by default in Hadoop 2

Replication factor: the number of copies kept of each file's blocks (3 by default). A configuration sketch follows below.
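As a minimal sketch, both properties can be read or overridden per client through Hadoop's Configuration API. The property names dfs.blocksize and dfs.replication are the Hadoop 2 names, and the values set here are simply the defaults:

    import org.apache.hadoop.conf.Configuration;

    public class HdfsProperties {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Block size: 128 MB (the Hadoop 2 default; Hadoop 1 used 64 MB).
            conf.setLong("dfs.blocksize", 128L * 1024 * 1024);
            // Replication factor: how many copies of each block are kept.
            conf.setInt("dfs.replication", 3);
            System.out.println("block size  = " + conf.getLong("dfs.blocksize", 0));
            System.out.println("replication = " + conf.getInt("dfs.replication", 0));
        }
    }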

 

Basic structure of HDFS

 

 

As shown in the figure above, the basic structure of HDFS consists of the NameNode, the SecondaryNameNode, and the DataNodes.

NameNode : the Master node, somewhat like the root directory in Linux. It manages the mapping of data blocks, handles client read and write requests, applies the replication policy, and manages the HDFS namespace.

SecondaryNameNode : saves part of the NameNode's information (not all of it), which is used to restore data after the NameNode goes down; it is the cold backup of the NameNode. It merges fsimage and edits and sends the result back to the NameNode (a solution that keeps edits from growing too large).

DataNode : responsible for storing the data blocks sent by the client and performing block read and write operations. It is the worker that takes orders from the NameNode.

Hot backup: b is the hot backup of a. If a fails, b immediately takes over and does a's work.

Cold backup: b is the cold backup of a. If a fails, b cannot immediately take over a's work, but some of a's information is stored on b, which reduces the loss when a breaks.

fsimage : the metadata image file (the directory tree of the file system).

edits : the operation log for the metadata (a record of the modification operations made to the file system).

What the NameNode holds in memory = fsimage + edits.

 

NameNode in detail

 

Role:

The NameNode plays the leading role: users access and operate on the data through the NameNode, which in that sense resembles a root directory.

The NameNode holds: the mapping between directories/files and data blocks (implemented by fsimage and edits), and the mapping between data blocks and nodes.

 

The fsimage and edits files are the core files on the NameNode.

The NameNode stores only the directory tree information; the location information for each block is reported up to the NameNode by the individual DataNodes.

The NameNode's directory tree information is physically stored in the fsimage file. When the NameNode starts, it first reads the fsimage file and loads the directory tree into memory.

The edits file stores log records. After the NameNode starts, every operation that adds to, deletes from, or modifies the directory structure is recorded in the edits file; it is not written into fsimage synchronously.

When the NameNode shuts down, fsimage and edits are not merged; this merge actually happens during NameNode startup.

That is, when the NameNode starts, it first loads the fsimage file, then replays the edits file on top of it, finally writes the up-to-date directory tree into a new fsimage file, and then switches to a new edits file.

The whole process works, but it has a small flaw: if the namespace changes a lot after startup, the edits file grows very large, with how large depending on how frequently the NameNode's namespace is updated.

Then, during the next NameNode startup, after the fsimage file is read this enormous edits file must be replayed, making the startup time long and unpredictable; it may take hours to start.

This oversized-edits problem is the main problem the SecondaryNameNode is meant to solve.

The SecondaryNameNode wakes up on a schedule and merges the fsimage file with the edits file, preventing the edits file from growing so large that the NameNode takes too long to start.

 

DataNode in detail

 

The DataNodes are what actually store the data in HDFS.

First, the concept of a block:

  1. When a DataNode stores data, it reads and writes in units of blocks; the block is the basic unit of reading and writing in HDFS.
  2. Suppose a file is 100 GB in size. Starting from byte 0, each 128 MB run of bytes is cut into one block, and so on, yielding many blocks, each 128 MB in size (see the arithmetic sketch after this list).
  3. A block is essentially a logical concept: it does not itself store data, it only partitions files.
  4. Replicas are also stored as blocks. The advantage of replicas is safety; the disadvantage is the space they occupy.
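A quick back-of-the-envelope check of point 2, as a self-contained sketch (the final block of a file can be smaller than the block size, so the count rounds up):

    public class BlockCount {
        public static void main(String[] args) {
            long fileSize  = 100L * 1024 * 1024 * 1024; // 100 GB file
            long blockSize = 128L * 1024 * 1024;        // 128 MB blocks
            // Round up so a final partial block still counts as one block.
            long blocks = (fileSize + blockSize - 1) / blockSize;
            System.out.println(blocks + " blocks"); // prints: 800 blocks
        }
    }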

SecondaryNameNode

 

Execution process: download the metadata files (fsimage and edits) from the NameNode, merge the two to generate a new fsimage, save it locally, push it to the NameNode, and reset the NameNode's edits file.
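The idea behind that flow can be shown with a conceptual sketch. This is not Hadoop's internal code, just the merge-then-reset logic in miniature:

    import java.util.ArrayList;
    import java.util.List;

    /** Conceptual sketch of the SecondaryNameNode checkpoint, not Hadoop internals. */
    class CheckpointSketch {
        List<String> fsimage = new ArrayList<>(); // snapshot of the directory tree
        List<String> edits   = new ArrayList<>(); // log of changes since that snapshot

        // Mutations are appended to the edit log, never applied to fsimage directly.
        void mkdir(String path) {
            edits.add("MKDIR " + path);
        }

        // What the SecondaryNameNode does periodically: replay the edit log into
        // the image to produce a new fsimage, then start a fresh, empty edits file.
        void checkpoint() {
            fsimage.addAll(edits); // merge: new fsimage = old fsimage + edits
            edits.clear();         // reset: the NameNode gets an empty edits file
        }
    }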

 

Working principle (adapted from the "Daniel Notes" blog; the explanation there is clear and was a great help, so it is reproduced here without changes)

 

Write operation:

 

There is a file FileA, 100 MB in size. A client writes FileA to HDFS.

HDFS uses its default configuration.

HDFS is distributed across three racks: Rack1, Rack2, and Rack3.

 

a.  The client divides FileA into 64 MB blocks, producing two blocks, block1 and block2;

b.  The client sends a write request to the NameNode, as shown by the blue dotted line ①------>.

c.  The NameNode records the block information and returns the available DataNodes, as shown by the pink dotted line ②--------->.

    Block1: host2,host1,host3

    Block2: host7,host8,host4

    Principle:

        The NameNode has rack awareness (RackAware), which can be configured.

        If the client is a DataNode, the block placement rule is: replica 1 on the same node as the client; replica 2 on a node in a different rack; replica 3 on another node in the same rack as replica 2; any further replicas on randomly chosen nodes.

        If the client is not a DataNode, the rule is: replica 1 on a randomly chosen node; replica 2 on a node in a different rack from replica 1; replica 3 on another node in the same rack as replica 2; any further replicas on randomly chosen nodes.

d.  The client sends block1 to the DataNodes; the sending is a streaming write.

    The streaming write process:

        1> Divide the 64 MB block1 into 64 KB packages;

        2> Send the first package to host2;

        3> After host2 receives it, it forwards the first package to host1, while the client sends the second package to host2;

        4> After host1 receives the first package, it forwards it to host3 and at the same time receives the second package from host2;

        5> And so on, as shown by the solid red line in the figure, until block1 is fully sent;

        6> host2, host1, and host3 notify the NameNode, and host2 notifies the client, saying "block1 has been sent", as shown by the pink solid line;

        7> After receiving host2's message, the client tells the NameNode that it has finished writing block1; only then is block1 really done, as shown by the thick yellow line in the figure;

        8> After block1 is sent, block2 is sent to host7, host8, and host4, as shown by the blue solid line in the figure;

        9> After block2 is sent, host7, host8, and host4 notify the NameNode, and host7 notifies the client, as shown by the light green solid line;

        10> The client tells the NameNode that it has finished writing, as shown by the thick yellow solid line. That completes the write (see the client-side sketch below).
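From the client's point of view, all of this pipelining hides behind a single output stream. Below is a minimal sketch using the Hadoop FileSystem API; the URI and path are illustrative, not taken from the example above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteFileA {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000"); // illustrative URI
            FileSystem fs = FileSystem.get(conf);
            // create() asks the NameNode for target DataNodes (steps b and c);
            // writes to the stream are split into packages and pipelined (step d).
            try (FSDataOutputStream out = fs.create(new Path("/data/FileA"))) {
                out.write("hello hdfs".getBytes("UTF-8"));
            }
            fs.close();
        }
    }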

From this write process we can see:

     Writing a 1 TB file requires 3 TB of storage and 3 TB of network traffic.

     While reading or writing, the NameNode and the DataNodes communicate via heartbeats to confirm that each DataNode is alive. If a DataNode is found dead, the data it held is re-replicated onto other nodes, and reads are served from those nodes instead.

     So it does not matter if one node goes down: other nodes hold backup copies. It does not even matter if an entire rack goes down: other racks hold backups too.

 

Read operation:

 

The read operation is simpler. As shown in the figure, the client reads FileA from the DataNodes. FileA consists of block1 and block2.

 

Then, the read operation flow is:

a.  The client sends a read request to the NameNode.

b.  The NameNode checks its metadata and returns the locations of FileA's blocks.

    block1:host2,host1,host3

    block2:host7,host8,host4

c.  The block locations come in order: block1 is read first, then block2; block1 is read from host2, then block2 from host7 (a client-side sketch follows below).
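The read side looks just as simple from the client. A sketch along the same lines, again with an illustrative path:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadFileA {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // open() fetches the block locations from the NameNode (steps a and b);
            // reads then go block by block to the chosen DataNodes (step c).
            try (FSDataInputStream in = fs.open(new Path("/data/FileA"));
                 BufferedReader reader =
                         new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            fs.close();
        }
    }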

 

In the example above, the client sits outside the racks. If instead the client is on a DataNode inside a rack, say the client is host6, then the rule followed when reading is:

preferentially read the data on the client's own rack.

 

Computation and storage live on the same servers, so each server can process the data it holds locally.

 

Supplement

 

metadata

Metadata is defined as data about data: descriptive information about data and information resources (similar to the i-node in Linux).

Files whose names start with "blk_" are the blocks that store the data; the naming follows a regular pattern. Alongside each block file there is a file with the ".meta" suffix, which is the block's metadata file and stores some metadata about the block.

 

data replication

The NameNode makes all decisions about block replication. It periodically receives a heartbeat and a Blockreport from each DataNode in the cluster. Receiving a heartbeat means the DataNode is functioning properly; a Blockreport contains a list of all the blocks on that DataNode.
