HDFS

One, HDFS background and definitions

1.1 Background

As data volumes grow, a single machine can no longer store or process everything, so multiple machines must cooperate. That in turn requires a file system that manages files across machines: a distributed file system. HDFS is one such system.

1.2 Definitions

HDFS (Hadoop Distributed File System) is, first, a file system: it stores files and locates them through a directory tree. Second, it is distributed: many servers work together to implement its functionality, and each server in the cluster has its own role.

Typical HDFS usage: it suits write-once, read-many workloads and does not support random modification of files. It is well suited to offline data analysis and poorly suited to interactive web applications.

Two, HDFS advantages and disadvantages

2.1 Advantages

  1. High fault tolerance: data is automatically saved in multiple copies (three by default); if a replica is lost on one node, it is automatically re-replicated from the other nodes.

  2. Suitable for handling big data:

    (1) Data scale: it can process data at the TB or even PB level

    (2) File scale: it can handle more than a million files

  3. It can be built on low-cost machines, improving reliability through the multi-replica mechanism

2.2 Disadvantages

  1. Not suitable for low-latency data access

  2. Not suitable for storing large amounts of small files

    (1) Storing a large number of small files consumes a lot of NameNode memory, because the NameNode keeps metadata and block information for every file, and the metadata size per file is roughly fixed regardless of file size. Note: the disk space a file occupies is independent of the block size; a 1 MB file in a 128 MB block still takes only about 1 MB of space on disk.

    (2) For small files, seek time exceeds read time, which violates the HDFS design goal that seek time be about 1% of transfer time.

  3. It does not support concurrent writes or random modification; a file has a single writer at a time, and writes can only append
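To make the small-file cost in item 2 concrete, here is a back-of-the-envelope estimate. The ~150 bytes of NameNode heap per namespace object (file or block) is a commonly cited rule of thumb, not an exact HDFS constant:

```python
# Rough NameNode heap estimate for file metadata.
# Assumption: ~150 bytes per namespace object (file or block),
# a widely quoted rule of thumb rather than an exact constant.
BYTES_PER_OBJECT = 150

def namenode_memory_bytes(num_files, blocks_per_file=1):
    """Each file plus each of its blocks costs one metadata object."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# Ten million single-block small files cost about 3 GB of heap,
# no matter how tiny the files themselves are.
print(namenode_memory_bytes(10_000_000))  # 3000000000
```

The estimate is the same whether each file is 1 KB or 100 MB, which is exactly why many small files hurt: the heap cost scales with the object count, not with bytes stored.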

Three, HDFS architecture

(Figure: HDFS architecture)

As a file system, HDFS mainly needs to support reading and writing, so its architecture has three main roles, plus one auxiliary role that offloads heavy work (but is nonetheless important).

NameNode :

(1) Manages the HDFS namespace, i.e. stores the metadata (file name, size, block locations, etc.)

(2) Configures the replication strategy

(3) Receives heartbeats from DataNodes and may send commands back to them

(4) Receives the block reports sent by DataNodes

(5) Handles client requests

DataNode :

(1) Stores the actual data blocks (note: the blocks, not the files themselves)

(2) Performs read and write operations on data blocks

Client :

(1) Splits files into blocks and uploads them to HDFS

(2) Interacts with the NN to request file reads and writes and to obtain file location information

(3) Interacts with DNs to actually read and write file data

SecondaryNameNode :

(1) Assists the NN and shares its workload, e.g. periodically merges FsImage and Edits and pushes the result back to the NN. (Note: merging FsImage and Edits is memory-intensive, so the 2NN should not be placed on the same node as the NN, or it would degrade NN performance; this is detailed in the NameNode and SecondaryNameNode section below.)

(2) In an emergency, assists in recovering the NN

Four, HDFS block (Block)

In HDFS, the block is the basic unit of storage and of reading and writing data.

As mentioned above, when the client reads and writes data to DNs, it does so block by block. So how is the block size chosen?

The HDFS design goal is for seek time to be about 1% of transfer time. Typically, seek time is about 10 ms, so the target transfer time is 1000 ms, i.e. 1 s. Most disks today read and write at about 100 MB/s, so the amount of data transferred in 1 s is 1 s × 100 MB/s = 100 MB. The power of two closest to 100 is 128 (2^7), so the default HDFS block size is 128 MB.

Of course, as noted, 100 MB/s is only typical; some high-performance disks are faster, and for them the block size can be set larger.
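The derivation above can be written out as a small helper. The 10 ms seek time and the 1% target come from this article's assumptions, not from any Hadoop configuration:

```python
def suggested_block_size_mb(seek_time_s=0.01, seek_fraction=0.01,
                            disk_mb_per_s=100.0):
    """Pick a block size so seek time is only seek_fraction of transfer time."""
    transfer_time_s = seek_time_s / seek_fraction   # 0.01 s / 1% = 1 s
    raw_mb = transfer_time_s * disk_mb_per_s        # 1 s * 100 MB/s = 100 MB
    # Round up to the next power of two, matching the 128 MB HDFS default.
    power = 1
    while power < raw_mb:
        power *= 2
    return power

print(suggested_block_size_mb())                     # 128
print(suggested_block_size_mb(disk_mb_per_s=200.0))  # 256
```

With a 200 MB/s disk the same reasoning suggests a 256 MB block, which is why faster disks justify raising dfs.blocksize.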

The block size must therefore match the disk's read/write speed: neither too large nor too small, or it violates the HDFS design goal:

(1) If the block is too large, seek time becomes a negligible fraction, but transfer time dominates; programs processing such data will be very slow.

(2) If the block is too small, the NN has to store too much block information, and total seek time increases.

To sum up: the disk's transfer speed determines the appropriate HDFS block size. And since seeks must not dominate, we should not store large numbers of small files in HDFS.

Five, HDFS read and write data flow

5.1 write data flow

(Figure: HDFS write data flow)

1, The Client sends the NN a request to write the file a.txt

2, The NN checks whether a.txt already exists; if not, it notifies the Client that it may write

3, The Client asks the NN where to write the first block

4, The NN finds available DNs (principle of proximity) and returns DN1, DN2, DN3

5, The Client establishes a data channel to DN1, and channels are then established between the DNs (DN1 → DN2 → DN3)

6, Once the whole channel is established, an ack is returned to the Client

7, The Client writes data to DN1, and DN1 forwards the data on to the other DNs

8, The Client notifies the NN that it has finished the first block

(Note: the replicas are written in parallel through the pipeline, not one after another.)
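A toy model of steps 5–7, with invented node names: the client sends each block only to the first DataNode, and every DataNode in the pipeline keeps a copy and forwards the same bytes downstream:

```python
def write_block(block, pipeline):
    """Simulate the write pipeline: the client hands the block to the first
    DataNode; each DataNode stores a replica and forwards the bytes on."""
    stored = {}
    for dn in pipeline:       # DN1 -> DN2 -> DN3
        stored[dn] = block    # store a replica, then forward downstream
    return stored

replicas = write_block(b"block 0 of a.txt", ["dn1", "dn2", "dn3"])
print(sorted(replicas))  # ['dn1', 'dn2', 'dn3']
```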

5.2 read data flow

(Figure: HDFS read data flow)

1, The client sends the NN a request to read a file

2, The NN looks up the file's metadata and returns the DNs where each block is located

3, The client selects a DN (principle of proximity, then random among equals) and requests the data

4, The DN transfers the data to the client

Note: HDFS does not read blocks concurrently; it reads one block at a time, in order.
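A matching sketch of the read side: blocks are fetched one at a time, in index order, and concatenated. The block contents here are invented:

```python
def read_file(block_locations):
    """block_locations: block index -> list of replica payloads.
    Reads one block at a time, in order, picking one replica of each
    (real HDFS picks the nearest replica)."""
    data = b""
    for idx in sorted(block_locations):
        data += block_locations[idx][0]
    return data

blocks = {0: [b"hello "], 1: [b"world"]}
print(read_file(blocks))  # b'hello world'
```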

5.3 Network topology: node distance calculation

The reads and writes above follow a principle of proximity, because nearby nodes have lower network transmission latency. So how does HDFS determine the distance between nodes?

Node distance : the sum of the distances from each of the two nodes to their nearest common ancestor.

Here Insert Picture Description

Example: calculate the distance between node 9 and node 5:

Their nearest common ancestor is node 8. The distance from node 5 to node 8 is 3, and from node 9 to node 8 is 1, so the distance between the two nodes is 3 + 1 = 4.
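The same rule can be computed over rack-path strings of the form /datacenter/rack/node; the paths below are illustrative, not from a real cluster map:

```python
def node_distance(a, b):
    """Hops from each node up to their nearest common ancestor, summed."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)

print(node_distance("/d1/r1/n0", "/d1/r1/n0"))  # 0: same node
print(node_distance("/d1/r1/n0", "/d1/r1/n1"))  # 2: same rack
print(node_distance("/d1/r1/n0", "/d1/r2/n0"))  # 4: same data center
print(node_distance("/d1/r1/n0", "/d2/r3/n0"))  # 6: different data centers
```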

5.4 Replication strategy

The first replica is placed on the node where the Client runs; if the Client is outside the cluster, a node is picked at random.

The second replica is placed on a different node in the same rack as the first replica.

The third replica is placed on a node in a different rack.
(Figure: replica placement across racks)
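The three placement rules can be sketched as follows. The rack and node names are invented, real HDFS uses a pluggable BlockPlacementPolicy, and the exact rack choices for the second and third replicas have varied across Hadoop versions; this sketch follows the description above:

```python
import random

def place_replicas(racks, client_node=None):
    """racks: rack name -> list of node names. Returns three chosen nodes
    following the policy described above (a sketch, not Hadoop's code)."""
    node_rack = {n: r for r, nodes in racks.items() for n in nodes}
    # 1st replica: the client's own node if it is in the cluster, else random.
    first = client_node if client_node in node_rack else random.choice(sorted(node_rack))
    # 2nd replica: a different node on the same rack as the first.
    second = random.choice([n for n in racks[node_rack[first]] if n != first])
    # 3rd replica: any node on a different rack.
    third = random.choice([n for n, r in node_rack.items() if r != node_rack[first]])
    return [first, second, third]

racks = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}
print(place_replicas(racks, "node1")[0])  # node1
```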

Six, NameNode and SecondaryNameNode working mechanism

6.1 Introduction

First, the NN keeps its metadata in memory, because it needs random access; if the metadata lived only on disk, performance would be far too low. But memory brings a problem: if the NN goes down, the metadata is lost. To prevent this loss, the NN maintains an FsImage , a backup of the metadata that is saved to disk.

But if FsImage had to mirror the in-memory metadata exactly, every metadata change would require a random write to disk to stay in sync, which would again be inefficient. So the NN also introduces an Edits log, which records only the operations performed on HDFS, not the actual data. If the NN goes down, the metadata can be recovered by combining FsImage with Edits. Because Edits only appends to disk, there is no random-access cost and it is very fast. (Edits feels a lot like the lineage of an RDD in Spark: both record operations rather than data.)

There is still one problem: if the NN runs for a long time, Edits accumulates a huge number of operations, and after an NN restart, merging Edits into FsImage would take a very long time. So Edits and FsImage need to be merged periodically. But this merge is itself memory-intensive; running it on the NN would significantly reduce NN performance. So the merge work is given to a separate node: the SecondaryNameNode .
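The FsImage/Edits division of labor can be modeled in a few lines. The operation tuples and paths are invented; real edit-log records are binary and much richer:

```python
def replay(fsimage, edits):
    """Rebuild the namespace: start from the FsImage snapshot and
    re-apply every logged operation from Edits, in order."""
    namespace = dict(fsimage)
    for op, path in edits:
        if op == "create":
            namespace[path] = {}
        elif op == "delete":
            namespace.pop(path, None)
    return namespace

def checkpoint(fsimage, edits):
    """What the 2NN does: merge Edits into FsImage, then start a fresh log."""
    return replay(fsimage, edits), []

fsimage = {"/a.txt": {}}
edits = [("create", "/b.txt"), ("delete", "/a.txt")]
new_fsimage, new_edits = checkpoint(fsimage, edits)
print(new_fsimage)  # {'/b.txt': {}}
```

Because recovery is snapshot-plus-log, the NN never has to write the full namespace to disk on every change, only the cheap append to Edits.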

6.2 Detailed workflow

(Figure: NameNode and SecondaryNameNode checkpoint workflow)
NameNode :

1, When the NN starts, it merges FsImage and Edits and loads the metadata into memory

2, A client sends a metadata CRUD request

3, The NN records the operation in Edits. If a checkpoint is needed, the NN rolls the log, creating a new edits_inprogress and writing the operation there; if no checkpoint is needed, it simply keeps appending to the current Edits and the 2NN does nothing

4, The NN executes the client's request

Secondary NameNode :

1, The 2NN asks the NN whether a checkpoint is needed (triggered when Edits is full or on a schedule)

2, If a checkpoint is needed, the 2NN copies the NN's FsImage and Edits (the NN generates a new Edits at this point, and client requests after the checkpoint are recorded in the new Edits)

3, The 2NN loads the copied FsImage and Edits into memory, merges them, generates a new FsImage, and returns it to the NN

4, The NN replaces the old FsImage with the new one.

Seven, DataNode working mechanism

(Figure: DataNode working mechanism)
1, When a DN starts, it registers with the NN

2, The NN confirms that registration succeeded

3, The DN periodically reports all of its block information to the NN

4, The DN sends a heartbeat every 3 seconds to let the NN know it is alive; the heartbeat response carries back any commands from the NN

5, If the NN receives no heartbeat from a DN for 10 minutes, it considers the node dead
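For reference, the 10-minute figure is actually derived from two Hadoop settings; with the defaults (dfs.namenode.heartbeat.recheck-interval = 300000 ms, dfs.heartbeat.interval = 3 s) the exact timeout works out to 10 minutes 30 seconds:

```python
def dead_node_timeout_s(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Hadoop's dead-node timeout: 2 * recheck interval + 10 * heartbeat interval."""
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s

print(dead_node_timeout_s())  # 630.0 seconds, i.e. 10 min 30 s
```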



Origin blog.csdn.net/stable_zl/article/details/104865269