On HDFS (Part 1)

Background and Definitions

HDFS is a distributed file system for storing files. Its defining feature is that it is distributed: many servers work together to provide the service, and each server in the cluster has its own role.

  • As data volumes grow, a single operating system can no longer hold all the data, so the data is spread across the disks managed by many machines. But managing and maintaining files scattered across machines is extremely inconvenient, so a system is urgently needed to manage files on multiple machines. That is a distributed file management system, and HDFS is one of them.
  • HDFS is designed for write-once, read-many scenarios. It does not support modifying a file in place; it only supports appending to the end of a file.
  • HDFS uses streaming data access: like flowing water, the data is processed bit by bit as it arrives rather than all at once. If processing only started after all the data had arrived, the latency would be high and a great deal of memory would be consumed.

Advantages and disadvantages

  1. High fault tolerance
    • Data is automatically saved in multiple copies; fault tolerance is improved by adding replicas
    • If a replica is lost, a new replica can automatically be created on another node
  2. Suitable for large-scale data
    • Data scale: it can handle data on the order of GB, TB, and even PB
    • File scale: it can handle more than a million files, which is quite a large number
  3. It can be built on low-cost machines, improving reliability through the multi-replica mechanism

Component Architecture

  1. namenode (nn): the Master, the manager; it stores the metadata
    • Manages the HDFS namespace
    • Configures the replica policy
    • Manages block mapping information
    • Handles clients' read and write requests
  2. datanode (dn): the Slave, where the files are actually stored
    • Stores the actual data blocks
    • Performs read and write operations on data blocks
  3. secondarynamenode (2nn): not a hot standby for the namenode; when the namenode goes down, it cannot immediately take over and provide service
    • Assists the namenode and shares its workload, for example by periodically merging the Fsimage and Edits files (covered later in this series) and pushing the result to the namenode
    • In an emergency it can help recover the namenode, but only partially, not completely
  4. client: the client
    • Splits files before uploading to HDFS: the client cuts the file into Blocks and uploads them one by one
    • Interacts with the namenode to obtain the locations of a file's blocks on the datanodes
    • Interacts with datanodes to read or write data
    • Provides commands to manage HDFS, such as formatting the namenode
    • Provides commands to access HDFS, such as creating, deleting, reading, and updating files
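The client's first responsibility above, cutting a file into fixed-size blocks, can be sketched in a few lines. This is illustrative only (the real HDFS client streams data and never materializes whole blocks in memory), and the helper name `split_into_blocks` is made up for this example.

```python
# Toy sketch: splitting data into fixed-size blocks before upload.
# 128 MB is the Hadoop 2.x default block size.
BLOCK_SIZE = 128 * 1024 * 1024

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield successive block-sized chunks of data; the last one may be smaller."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]

# A tiny block size keeps the demo readable
blocks = list(split_into_blocks(b"abcdefghij", block_size=4))
print(blocks)  # [b'abcd', b'efgh', b'ij']
```

Note that the final block is only as large as the remaining data: a 129 MB file occupies one full block plus a 1 MB block, not two full blocks of disk.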

File block size

Why does HDFS abstract file storage into Blocks?

  1. Splitting means a single file can be larger than the capacity of any one disk: a file's Blocks can be distributed across the cluster, so in theory a single file can occupy the disks of every machine in the cluster.
  2. The Block abstraction also simplifies the storage subsystem: for a Block there is no need to track permissions, owner, and so on (those attributes are managed at the file level).
  3. The Block serves as the unit of replication for fault tolerance and high availability, i.e. replication is done per Block.

Files in HDFS are physically stored in chunks called Blocks. The default block size is 128 MB in Hadoop 2.x (64 MB in older versions). So why 128 MB?

In fact, the HDFS block size depends mainly on the disk transfer rate, as follows:

  1. Suppose the seek time in HDFS is 10 ms, i.e. it takes 10 ms to locate the target Block
  2. Expert experience suggests the optimal operating condition: seek time should be 1% of transfer time, so the transfer time should be 1 s
  3. Current disk transfer rates are generally around 100 MB/s
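The three numbers above pin down the default block size; the arithmetic can be checked directly:

```python
# Back-of-the-envelope calculation behind the 128 MB default, following
# the rule of thumb stated above: seek time should be ~1% of transfer time.
seek_time_s = 0.010                    # 10 ms to locate the target block
transfer_time_s = seek_time_s / 0.01   # seek = 1% of transfer  ->  1 s
disk_rate_mb_s = 100                   # ~100 MB/s disk transfer rate

ideal_block_mb = transfer_time_s * disk_rate_mb_s
print(ideal_block_mb)  # 100.0 -> rounded up to the nearest power of two: 128 MB
```

100 MB is the ideal size under these assumptions; the default is rounded up to 128 MB, the nearest power of two.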

Why can the block size be set neither too small nor too large?

  1. If the HDFS block size is too small, seek time grows relative to transfer time, and the program may spend most of its time looking for the start of blocks
  2. If it is set too large, the time to transfer the block's data from disk will be significantly longer than the time needed to locate the start of the block, and processing that much data at once will make the program slow

HDFS data flow

HDFS write data flow

  1. The client requests, through the DistributedFileSystem module, that the NameNode upload a file; the NameNode checks whether the target file already exists and whether the parent directory exists.
  2. The NameNode responds whether the file can be uploaded.
  3. The client asks which DataNode servers the first Block should be uploaded to.
  4. The NameNode returns three DataNode nodes, dn1, dn2, and dn3 respectively; if more replicas are configured, it returns the actual replica count, with nodes chosen according to distance and load.
  5. The client requests dn1 to receive data through the FSDataOutputStream module; on receiving the request dn1 calls dn2, and dn2 calls dn3, completing the communication pipeline.
  6. dn1, dn2, and dn3 acknowledge the client step by step.
  7. The client starts uploading the first Block to dn1 (first reading data from disk into a local memory cache), in units of Packets; each Packet dn1 receives it passes to dn2, and dn2 passes to dn3. For every Packet dn1 sends, it places a copy in a reply queue to await acknowledgment.
  8. When one Block finishes transferring, the client again asks the NameNode for servers to upload the second Block to (repeating steps 3-7).
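Steps 5-7 can be modeled as a toy simulation. This is a deliberately simplified sketch, not the real protocol: real datanodes forward and acknowledge asynchronously over the network, while here each "datanode" is just a Python list.

```python
from collections import deque

def send_block(packets, pipeline):
    """Toy write pipeline: forward each packet down the chain of datanodes,
    keep it in an ack queue until the acknowledgment comes back."""
    ack_queue = deque()
    acks_received = []
    for packet in packets:
        # The client sends to the first datanode, which forwards downstream
        # (dn1 -> dn2 -> dn3); here we just append to every node's store.
        for node in pipeline:
            node.append(packet)
        ack_queue.append(packet)          # awaiting acknowledgment
        acks_received.append(ack_queue.popleft())  # ack flows back up the chain
    return acks_received

dn1, dn2, dn3 = [], [], []
acks = send_block(["p1", "p2", "p3"], [dn1, dn2, dn3])
print(acks)  # ['p1', 'p2', 'p3']
print(dn3)   # ['p1', 'p2', 'p3'] -- every replica holds every packet
```

The point of the ack queue in step 7 is that a packet is only considered written once the acknowledgment has come back through the whole pipeline; until then the client keeps it buffered for possible resend.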

Network topology --- node distance calculation

When writing data, the NameNode chooses the DataNode closest to the data being uploaded to receive it. So how is "closest" calculated?

Conclusion: the distance between two nodes is the sum of their distances to their nearest common ancestor.

As shown in FIG:

  • Two processes on the same node: distance 0
  • Two nodes on the same rack: the sum of the two nodes' distances to their common rack r1 is 2
  • Nodes on different racks in the same data center: the sum of the two nodes' distances to their common ancestor d1 is 4
  • Nodes in different data centers: the sum of the two nodes' distances to their common ancestor, the data-center level, is 6
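The rule can be implemented directly by writing each node as a path such as `/d1/r1/n0` (data center / rack / node; these names mirror the figure and are otherwise made up): the nearest common ancestor is the longest common prefix of the two paths.

```python
def node_distance(a: str, b: str) -> int:
    """Distance between two nodes = sum of each node's hops to the
    nearest common ancestor in the topology tree."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    # Depth of the nearest common ancestor = length of the common prefix
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)

print(node_distance("/d1/r1/n0", "/d1/r1/n0"))  # 0: same node
print(node_distance("/d1/r1/n0", "/d1/r1/n1"))  # 2: same rack
print(node_distance("/d1/r1/n0", "/d1/r2/n0"))  # 4: same data center
print(node_distance("/d1/r1/n0", "/d2/r1/n0"))  # 6: different data centers
```

The four printed values match the four bullet points above.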

Rack awareness (choosing nodes for replica storage)

We can set the number of replicas in the configuration file, so how does HDFS choose the nodes to store those replicas on?

As shown above, to improve fault tolerance, with three replicas the placement is as follows:

  • The first replica is placed on the node where the client runs; if the client is outside the cluster, a node is chosen at random
  • The second replica is placed on a random node in the same rack as the first replica
  • The third replica is placed on a random node in a different rack

The aim is to improve fault tolerance.
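The three placement rules above can be sketched as a small function. This is a simplified illustration, not Hadoop's actual placement code (which also weighs load, disk space, and node health); the cluster layout and node names are invented for the example.

```python
import random

def place_replicas(client_node, cluster):
    """Pick 3 replica nodes per the policy above.
    cluster maps rack name -> list of node names."""
    node_rack = {n: r for r, nodes in cluster.items() for n in nodes}
    # 1st replica: the client's own node if it is in the cluster, else random
    if client_node in node_rack:
        first = client_node
    else:
        first = random.choice(list(node_rack))
    first_rack = node_rack[first]
    # 2nd replica: a different node on the same rack as the first
    second = random.choice([n for n in cluster[first_rack] if n != first])
    # 3rd replica: a random node on a different rack
    third = random.choice([n for r, nodes in cluster.items()
                           if r != first_rack for n in nodes])
    return [first, second, third]

cluster = {"r1": ["n1", "n2", "n3"], "r2": ["n4", "n5"]}
print(place_replicas("n1", cluster))  # e.g. ['n1', 'n3', 'n5']
```

Keeping the first two replicas on one rack keeps the write pipeline short (low node distance), while the third replica on another rack survives the loss of an entire rack.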

HDFS read data flow

  1. The client requests, through the DistributedFileSystem module, that the NameNode provide a file for download; the NameNode queries its metadata to find the DataNode addresses holding the file's blocks.
  2. The client picks a DataNode server (nearest first, then randomly) and requests to read the data.
  3. The DataNode starts transmitting data to the client (reading the data from disk into an input stream and sending it in Packet units, with checksums for verification).
  4. The client receives the Packets, first caching them locally and then writing them to the target file.
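The per-packet verification in steps 3-4 can be sketched with CRC32 as a stand-in checksum (HDFS actually checksums fixed-size chunks with CRC32/CRC32C; the packet format here is simplified and the helper names are made up).

```python
import zlib

def make_packets(data: bytes, packet_size: int = 4):
    """Cut data into packets, each carrying a CRC32 checksum."""
    return [(chunk, zlib.crc32(chunk))
            for chunk in (data[i:i + packet_size]
                          for i in range(0, len(data), packet_size))]

def read_and_verify(packets):
    """Receive packets, verify each checksum, cache locally,
    then return the assembled file contents."""
    out = bytearray()
    for chunk, checksum in packets:
        if zlib.crc32(chunk) != checksum:
            raise IOError("corrupt packet")  # real client retries another replica
        out += chunk
    return bytes(out)

packets = make_packets(b"hello hdfs")
print(read_and_verify(packets))  # b'hello hdfs'
```

If verification fails, the real HDFS client reports the bad block to the NameNode and re-reads it from a different replica, which is another payoff of the multi-replica mechanism.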

Origin: www.cnblogs.com/kocdaniel/p/11589382.html