HDFS and MapReduce Architecture in Hadoop

 

1.  Hadoop concept and development history

Hadoop is an open source distributed computing framework from the Apache open source organization. It is implemented in Java and performs distributed computation over massive data sets on clusters made up of a large number of computers. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage, and MapReduce provides computation and analysis. The data flow in Hadoop can be summarized simply: data is fed into the Hadoop cluster, processed there, and a result is produced. Hadoop is a tool for high-performance processing of very large data sets.

What Hadoop does: it is a platform for distributed storage and computation over large data sets.

 

2.  HDFS and MapReduce architecture

HDFS: the Hadoop Distributed File System. It is a highly fault-tolerant system that is suitable for deployment on low-cost machines. HDFS provides high-throughput data access and is well suited to applications with large data sets.

 

 

HDFS architecture:

Master-slave structure: there is only one master node, the NameNode; there are multiple slave nodes, the DataNodes.

The NameNode is responsible for: receiving users' operation requests; maintaining the directory structure of the file system; managing the mapping between files and blocks, and between blocks and DataNodes.

The DataNodes are responsible for: storing files; each file is split into blocks that are stored on disk; to keep data safe, multiple replicas of each file are kept.
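To make the division of labor concrete, here is a minimal in-memory sketch in Python. It is not the HDFS API: `ToyHdfs`, `BLOCK_SIZE`, `REPLICATION`, and the round-robin placement rule are all assumptions made for illustration. The block size is deliberately tiny so the splitting is visible.

```python
BLOCK_SIZE = 4    # bytes; deliberately tiny so the example is easy to follow
REPLICATION = 3   # assumed number of copies per block for this sketch


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyHdfs:
    """Hypothetical in-memory model: NameNode metadata plus DataNode storage."""

    def __init__(self, datanode_names, replication=REPLICATION):
        self.datanode_names = list(datanode_names)
        self.replication = min(replication, len(self.datanode_names))
        # NameNode side: file -> blocks, and block -> DataNodes holding a copy.
        self.file_to_blocks = {}
        self.block_to_nodes = {}
        # DataNode side: each node keeps its own block_id -> bytes store.
        self.datanodes = {name: {} for name in self.datanode_names}

    def put(self, path, data):
        block_ids = []
        for index, chunk in enumerate(split_into_blocks(data)):
            block_id = f"{path}#blk{index}"
            # Round-robin placement on distinct nodes (an assumption of this
            # sketch, not the real HDFS placement policy).
            targets = [
                self.datanode_names[(index + r) % len(self.datanode_names)]
                for r in range(self.replication)
            ]
            for node in targets:
                self.datanodes[node][block_id] = chunk
            self.block_to_nodes[block_id] = targets
            block_ids.append(block_id)
        self.file_to_blocks[path] = block_ids

    def get(self, path, dead_nodes=()):
        """Reassemble a file, skipping any DataNode listed in dead_nodes."""
        parts = []
        for block_id in self.file_to_blocks[path]:
            live = [n for n in self.block_to_nodes[block_id] if n not in dead_nodes]
            if not live:
                raise IOError(f"all replicas of {block_id} are unavailable")
            parts.append(self.datanodes[live[0]][block_id])
        return b"".join(parts)


fs = ToyHdfs(["dn1", "dn2", "dn3", "dn4"])
fs.put("/demo.txt", b"hello hadoop")
# The file is still readable with two of the four DataNodes down.
restored = fs.get("/demo.txt", dead_nodes={"dn1", "dn2"})
```

The two dictionaries on the NameNode side correspond to the two mappings described above (file to blocks, block to DataNodes). The replicas are what let `get` succeed when some nodes are unavailable.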

 

 

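MapReduce: a programming model for parallel computation over large data sets (larger than 1 TB). MapReduce is divided into two parts: Map (mapping) and Reduce (reduction).

When you submit a computation job to the MapReduce framework, it first splits the job into a number of map tasks and assigns them to different nodes for execution. Each map task processes one portion of the input data. When a map task finishes, it produces intermediate files, and these intermediate files become the input to the reduce tasks. The main goal of a reduce task is to gather the output of the preceding map tasks and write out the combined result.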
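The flow above can be sketched as a single-process word count in Python. This is only a simulation of the programming model, not Hadoop's actual API: the function names (`map_task`, `shuffle`, `reduce_task`, `run_job`) and the way the input is split are assumptions made for illustration, and in-memory lists stand in for the intermediate files.

```python
from collections import defaultdict


def map_task(split):
    """Process one portion of the input and emit intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(intermediate_outputs):
    """Group the intermediate pairs from all map tasks by key."""
    grouped = defaultdict(list)
    for pairs in intermediate_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(lines, num_map_tasks=2):
    # Split the input so that each map task handles only part of it.
    splits = [lines[i::num_map_tasks] for i in range(num_map_tasks)]
    # On a real cluster the map tasks would run on different nodes in parallel.
    intermediate = [map_task(split) for split in splits]
    grouped = shuffle(intermediate)
    return dict(reduce_task(key, values) for key, values in grouped.items())


counts = run_job(["hadoop stores data", "hadoop processes data"])
```

Here `counts` maps each word to its total across both input lines, even though each map task only saw one of them.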

 

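MapReduce architecture:

Master-slave structure: there is only one master node, the JobTracker; there are many slave nodes, the TaskTrackers.

The JobTracker is responsible for: receiving computation jobs submitted by clients; assigning the computation tasks to TaskTrackers for execution; monitoring the execution status of the TaskTrackers.

The TaskTrackers are responsible for: executing the computation tasks assigned by the JobTracker.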
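The split of responsibilities can be illustrated with a small Python sketch. The classes `ToyJobTracker` and `ToyTaskTracker`, the round-robin assignment, and the status strings are hypothetical and chosen for brevity. A real JobTracker schedules across machines and tracks progress through heartbeats, which this sketch does not model.

```python
class ToyTaskTracker:
    """Hypothetical worker: runs whatever task it is handed."""

    def __init__(self, name):
        self.name = name

    def run(self, task):
        # A task is modeled as a zero-argument callable.
        return task()


class ToyJobTracker:
    """Hypothetical master: accepts a job, hands out tasks, records status."""

    def __init__(self, trackers):
        self.trackers = list(trackers)
        self.status = {}  # task id -> (tracker name, state)

    def submit(self, tasks):
        results = {}
        for index, (task_id, task) in enumerate(tasks.items()):
            # Round-robin assignment (an assumption of this sketch).
            tracker = self.trackers[index % len(self.trackers)]
            self.status[task_id] = (tracker.name, "RUNNING")
            results[task_id] = tracker.run(task)
            self.status[task_id] = (tracker.name, "SUCCEEDED")
        return results


job_tracker = ToyJobTracker([ToyTaskTracker("tt1"), ToyTaskTracker("tt2")])
results = job_tracker.submit({
    "m0": lambda: 1 + 1,
    "m1": lambda: 2 * 3,
    "m2": lambda: 10,
})
```

After the call, `job_tracker.status` records which tracker ran each task, which is the monitoring role in miniature.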

 

3.  Hadoop features and cluster characteristics

Physical layout of a Hadoop cluster:

 

 

Physical structure of a single node:

 

 

Hadoop features:

1. Scalability: it can reliably store and process petabytes of data.

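2. Low cost: data can be distributed and processed by server clusters built from ordinary machines.

3. High efficiency: by distributing the data, Hadoop can process it in parallel on the nodes where the data resides, which makes processing very fast.

4. Reliability: Hadoop automatically maintains multiple replicas of the data and automatically redeploys computation tasks after a task fails.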
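The "redeploy after failure" behavior in the last point can be sketched roughly as follows in Python. The helper `run_with_retry`, the node names, and the attempt limit are hypothetical. Hadoop's real rescheduling logic is more involved, so treat this only as an illustration of the idea.

```python
def run_with_retry(task, nodes, max_attempts=3):
    """Try the task on successive nodes until one attempt succeeds."""
    errors = []
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]
        try:
            return node, task(node)
        except RuntimeError as exc:
            # Record the failure and move on to the next node.
            errors.append((node, str(exc)))
    raise RuntimeError(f"task failed after {max_attempts} attempts: {errors}")


def flaky_task(node):
    """Simulated task that always fails on node1 and succeeds elsewhere."""
    if node == "node1":
        raise RuntimeError("simulated node failure")
    return f"done on {node}"


node_used, outcome = run_with_retry(flaky_task, ["node1", "node2"])
```

The first attempt on `node1` fails, so the task is rerun on `node2` without the caller having to intervene.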

 

 


Origin www.cnblogs.com/wendyw/p/11307515.html