Hadoop Map/Reduce Overview

Hadoop MapReduce is a software framework for easily writing applications that process very large amounts of data (multi-terabyte datasets) in parallel on large clusters (up to thousands of nodes) in a reliable, fault-tolerant manner.

A MapReduce job usually splits the input dataset into independent chunks, which the map tasks process completely in parallel. The framework sorts the output of the map tasks, and this sorted output becomes the input to the reduce tasks. Typically, both the input and the output of the job are stored in a file system. The framework takes care of scheduling and monitoring the tasks, and it re-executes any tasks that fail.
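To make this data flow concrete, here is a minimal, single-process Python sketch of a word-count job. It is not Hadoop's API: split_input, word_count_map, word_count_reduce and run_job are hypothetical names, a thread pool stands in for map tasks running on separate nodes, and an in-memory sort stands in for the framework's sort of the map output.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def split_input(lines, chunk_size):
    """Split the input into independent chunks (stand-ins for input splits)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def word_count_map(chunk):
    """Map task: emit a (word, 1) pair for every word in its own chunk."""
    return [(word, 1) for line in chunk for word in line.split()]


def word_count_reduce(key, values):
    """Reduce task: sum all the counts emitted for one key."""
    return key, sum(values)


def run_job(lines, chunk_size=2):
    chunks = split_input(lines, chunk_size)
    # Map tasks run in parallel; each one sees only its own chunk.
    with ThreadPoolExecutor() as pool:
        map_outputs = list(pool.map(word_count_map, chunks))
    # The framework sorts all map output by key before the reduce phase.
    intermediate = sorted(
        (pair for output in map_outputs for pair in output), key=itemgetter(0)
    )
    # Each distinct key, with all of its values, is handed to one reduce call.
    return [
        word_count_reduce(key, (value for _, value in group))
        for key, group in groupby(intermediate, key=itemgetter(0))
    ]
```

For example, run_job(["a b a", "b c", "a"]) splits the three lines into two chunks, maps them in parallel and returns [("a", 3), ("b", 2), ("c", 1)].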

Typically, the compute nodes and the storage nodes are the same; that is, the MapReduce framework and HDFS (the Hadoop Distributed File System) run on the same set of nodes. This configuration lets the framework schedule tasks on the nodes where the data already resides, which makes very efficient use of the cluster's aggregate bandwidth.
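The scheduling preference described above can be sketched as a greedy assignment: place each map task on a node that already holds a replica of its input block, and fall back to a remote node only when no local slot is free. This is a simplified illustration, not the JobTracker's actual algorithm (Hadoop's schedulers also take rack locality into account); the function name assign_map_tasks, the block-to-nodes map and the per-node slot counts are assumptions.

```python
def assign_map_tasks(block_locations, free_slots):
    """Assign one map task per input block, preferring data-local nodes.

    block_locations: {block_id: [nodes holding a replica of that block]}
    free_slots: {node: number of map tasks it can still accept}
    Returns {block_id: (node, is_local)}.
    """
    slots = dict(free_slots)  # work on a copy; don't mutate the caller's dict
    assignment = {}
    pending = []
    # First pass: run each task on a node that already stores its block.
    for block, nodes in block_locations.items():
        local = next((n for n in nodes if slots.get(n, 0) > 0), None)
        if local is not None:
            slots[local] -= 1
            assignment[block] = (local, True)
        else:
            pending.append(block)
    # Second pass: leftover tasks go to any node with a free slot and must
    # read their input block over the network.
    for block in pending:
        remote = next((n for n, free in slots.items() if free > 0), None)
        if remote is None:
            break  # no capacity left; remaining tasks wait for a later round
        slots[remote] -= 1
        assignment[block] = (remote, False)
    return assignment
```

With block b1 on nodes n1 and n2, b2 on n2, b3 on n1, and one free slot on each of n1, n2 and n3, b1 and b2 run data-local on n1 and n2, while b3 has to run on n3 and read its block over the network.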

The MapReduce framework consists of a single master JobTracker plus one slave TaskTracker on each node of the cluster. The master schedules the job's component tasks on the slaves, monitors them, and re-executes any tasks that fail. The slaves simply execute the tasks that the master assigns to them.
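The master's re-execution of failed tasks can be sketched as a retry loop. In this toy version, run_with_retries, the execute callback and the max_attempts limit are hypothetical; a real JobTracker tracks task attempts across many TaskTrackers rather than running them one after another in a single loop.

```python
def run_with_retries(tasks, execute, max_attempts=4):
    """Toy master loop: run each task, re-executing it when an attempt fails.

    execute(task, attempt) returns a result, or raises to signal failure.
    Returns (results, failed, attempts_used):
      results       -- task -> result for tasks that eventually succeeded
      failed        -- tasks for which every attempt failed
      attempts_used -- task -> number of attempts made
    """
    results, failed, attempts_used = {}, [], {}
    for task in tasks:
        for attempt in range(1, max_attempts + 1):
            attempts_used[task] = attempt
            try:
                results[task] = execute(task, attempt)
            except Exception:
                continue  # this attempt failed, so the task is re-executed
            break  # the attempt succeeded; no further attempts are needed
        else:
            failed.append(task)  # loop ended without break: all attempts failed
    return results, failed, attempts_used
```

With an execute function that fails on its first attempt and succeeds on the second, the task ends up in results after two attempts; one that always raises is reported in failed after max_attempts tries.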

At a minimum, an application specifies the input and output locations and supplies map and reduce functions by implementing the appropriate interfaces and/or abstract classes. These, together with other job parameters, make up the job configuration. The Hadoop job client then submits the job (a jar file, executable, etc.) and its configuration to the JobTracker, which takes responsibility for distributing the software and configuration to the slaves and for scheduling and monitoring the tasks.
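The minimal job configuration can be pictured as a small record holding the input and output locations plus the map and reduce functions, with a check that the required pieces are present. JobSpec, its fields and validate() are hypothetical Python stand-ins, not Hadoop's Java JobConf or Job classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class JobSpec:
    """Hypothetical minimal job configuration (not Hadoop's own classes)."""
    input_paths: List[str]
    output_path: str
    mapper: Callable[[str], Iterable[Tuple[str, int]]]
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]]
    params: Dict[str, str] = field(default_factory=dict)  # other job parameters

    def validate(self):
        """Return the problems that would prevent this job from being submitted."""
        problems = []
        if not self.input_paths:
            problems.append("no input path")
        if not self.output_path:
            problems.append("no output path")
        if not callable(self.mapper):
            problems.append("mapper is not callable")
        if not callable(self.reducer):
            problems.append("reducer is not callable")
        return problems
```

For instance, a spec with one input path, an output path and word-count lambdas for mapper and reducer validates with no problems, while a spec with no paths and None for both functions reports all four problems.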

Although the Hadoop framework is implemented in Java™, MapReduce applications do not have to be written in Java.

  • Hadoop Streaming is a utility that lets users create and run jobs with any executable (for example, a shell utility) as the mapper and/or the reducer.
  • Hadoop Pipes is a C++ API for implementing MapReduce applications.
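As an illustration of Hadoop Streaming, here is a word-count mapper and reducer written as one Python script. Streaming feeds records to the executable on standard input and reads its standard output, by default treating the text up to the first tab as the key, and the reducer's input arrives sorted by key. The file name wordcount.py and the map/reduce command-line arguments are assumptions made for this sketch.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map step: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Reduce step: input is sorted by key, so records for a word are adjacent."""
    # Split each record at the first tab into [key, value].
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    # Hypothetical usage: python wordcount.py map   (or: python wordcount.py reduce)
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

Outside Hadoop, the pipeline can be tried locally by piping a text file through the script's map step, the shell's sort command and then its reduce step (cat input.txt | python wordcount.py map | sort | python wordcount.py reduce), with sort standing in for the framework's sort between the two phases.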

Reproduced from: https://www.cnblogs.com/licheng/archive/2011/11/08/2241723.html


Origin: blog.csdn.net/weixin_33795743/article/details/92627523