What is the HDFS distributed file system?

Disclaimer: This is an original article by the blogger, licensed under the CC 4.0 BY-SA copyright agreement. Please include the original source link and this statement when reproducing it.
Original link: https://blog.csdn.net/qq_41946557/article/details/102753444

HDFS

Concept:

Hadoop implements a distributed file system, the Hadoop Distributed File System, HDFS for short. Hadoop was developed by Doug Cutting, the creator of the widely used Apache Lucene text search library. It originated in Apache Nutch, an open source web search engine that was itself part of the Lucene project. Apache Hadoop is an open source framework built around the MapReduce algorithm, an important cornerstone on which Google built its empire.

Architecture design:

HDFS uses a master/slave architecture: an HDFS cluster consists of a single NameNode and a number of DataNodes. The NameNode is the master server; it manages the file system namespace and handles clients' access to files. The DataNodes manage the data stored in the cluster.
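
As a small illustration of this split, here is a minimal sketch (not from the original article) using the Hadoop Java FileSystem API: the client only ever names the NameNode in its URI, and metadata calls such as listStatus are answered by the NameNode, while file contents are later served by DataNodes. The host namenode and port 9000 are placeholder assumptions.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsDirectory {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client addresses only the NameNode (the master); "namenode:9000"
        // is a placeholder for the cluster's actual NameNode RPC endpoint.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        // listStatus is a metadata operation, served entirely by the NameNode.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
        fs.close();
    }
}
```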

Deployment:

  1. Pseudo-distributed setup

~ Runs multiple processes (roles) on a single server. This mode is generally used when learning Hadoop.

~ Roles: NameNode - overall control; SecondaryNameNode - persistence; DataNode - data storage

  2. Fully distributed setup

~ The mode that should be used in production: different roles run on different servers.

~ Roles: 1/ NameNode 2/ SecondaryNameNode 3/ DataNode × 3 (two replicas)

  3. HA (High Availability) mode

~ Although the fully distributed mode is the one used in practice, it is not reliable on its own. The reason is simple: the cluster has a single point of failure. If the NameNode fails, the whole cluster becomes unavailable and cannot be accessed externally. Therefore, clusters are usually set up with HA; see the client configuration sketch after the role list below.

~ Roles: 1/ NameNode (active) 2/ NameNode (standby) 3/ DataNode 4/ ZooKeeper (ZK) 5/ JournalNode (JNN) 6/ ZookeeperFailoverController (ZKFC)
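
To make the HA setup concrete on the client side, here is a minimal sketch (not from the original article) of the Hadoop client configuration used with an HA pair of NameNodes: the client addresses a logical nameservice rather than a single host, and a failover proxy resolves which NameNode is currently active. The nameservice name mycluster and the hosts node1/node2 are placeholder assumptions; in practice these properties normally live in core-site.xml and hdfs-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Clients address a logical nameservice, not a single NameNode host.
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        // Placeholder hosts for the active/standby NameNode pair.
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "node1:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "node2:8020");
        // The failover proxy provider finds whichever NameNode is active.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/")));
        fs.close();
    }
}
```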

Usage:

In my view, the core of a distributed file system lies in how it is used, that is, in its reading and writing processes.

Reading process:

To read a file, the client asks the NameNode for the locations of the blocks that make up the file, then streams the block data directly from the DataNodes.
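
A minimal read sketch using the Java FileSystem API, assuming a NameNode at hdfs://namenode:9000 and a file /data/sample.txt (both placeholders):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        // open() consults the NameNode for the file's block locations; the
        // returned stream then reads the block bytes from DataNodes.
        FSDataInputStream in = fs.open(new Path("/data/sample.txt"));
        try {
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
        fs.close();
    }
}
```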

Writing process:

To write a file, the client asks the NameNode to create the file, then writes the data to a pipeline of DataNodes; each block is replicated along the pipeline, and the NameNode records where the replicas are stored.
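
A matching write sketch under the same placeholder assumptions:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        // create() registers the new file with the NameNode; data written to
        // the stream is pushed through a DataNode pipeline and replicated.
        FSDataOutputStream out = fs.create(new Path("/data/output.txt"));
        try {
            out.writeBytes("hello hdfs\n");
        } finally {
            out.close();
        }
        fs.close();
    }
}
```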

Application scenarios:

  1. HDFS is not suitable for storing large numbers of small files;
  2. HDFS is designed for high throughput, not for low-latency access;
  3. HDFS uses a streaming access model: it does not support multiple clients writing to one file at the same time (a file can be written by only one client at a time), nor writing at arbitrary positions (random writes are not supported); it only supports appending to the end of a file or overwriting the whole file (see the append sketch after this list);
  4. HDFS is best suited for write-once, read-many application scenarios.
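
To make point 3 concrete, here is a minimal append sketch, assuming the cluster permits appends and that /data/output.txt already exists (the address and path are placeholders):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        // Random writes are not supported; append() is the only way to add
        // data to an existing file, and only one writer may hold it at a time.
        FSDataOutputStream out = fs.append(new Path("/data/output.txt"));
        try {
            out.writeBytes("one more line\n");
        } finally {
            out.close();
        }
        fs.close();
    }
}
```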
