Hadoop Basics

Hadoop consists of two parts:

1. Hadoop Distributed File System (HDFS)

HDFS is highly fault tolerant and can be deployed on low-cost hardware. It suits applications with large data sets and provides high throughput for reading and writing data. HDFS has a master/slave architecture: in a typical deployment, a single Namenode runs on the master and a Datanode runs on each slave.

HDFS supports a traditional hierarchical file organization, so many operations resemble those of existing file systems: you can create and delete files, move a file from one directory to another, rename files, and so on. The Namenode manages the entire distributed file system; file system operations (such as creating or deleting files and folders) are all controlled by the Namenode.
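
As a concrete sketch of these operations, the snippet below uses Hadoop's Java FileSystem API to create, rename, and delete a file. The directory and file names are hypothetical, and the configuration is assumed to come from a core-site.xml on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileOps {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/user/demo");       // hypothetical directory
        Path file = new Path(dir, "hello.txt");
        Path renamed = new Path(dir, "hello-renamed.txt");

        fs.mkdirs(dir);                          // create a directory

        // Create a file and write a few bytes. Metadata goes through
        // the Namenode; the actual blocks are stored on Datanodes.
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("hello hdfs");
        }

        fs.rename(file, renamed);                // move/rename
        fs.delete(renamed, false);               // delete (non-recursive)

        fs.close();
    }
}
```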

2. MapReduce implementation

Google's MapReduce is an important technique: a programming model for computing over large amounts of data. Such computation is usually done in parallel, yet for many developers, at least at the moment, parallel computing remains a relatively distant topic. MapReduce is a simplified programming model for parallel computation; it lets developers without much parallel-computing experience write parallel applications.
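
To make this concrete, here is a minimal sketch of a MapReduce job using Hadoop's Java API: the classic word count, where the map step emits (word, 1) pairs and the reduce step sums the counts for each word. The class names and the assumption that input/output paths arrive as command-line arguments are illustrative, not from the original post.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every word in the input line.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce: sum all counts emitted for the same word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because each (word, 1) pair is independent, the map tasks can run in parallel across many machines; that independence is exactly the simplification the model provides.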
  The name MapReduce derives from the model's two core operations: Map and Reduce. Simply put, Map applies a one-to-one mapping from one set of data to another, with the mapping rule given by a function; for example, mapping [1, 2, 3, 4] with "multiply by 2" yields [2, 4, 6, 8]. Reduce collapses a set of data into a single value, again according to a rule given by a function; for example, reducing [1, 2, 3, 4] by summation gives 10, and reducing it by multiplication gives 24.
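
The two examples above can be reproduced with plain Java streams, which expose the same map/reduce idea without any distributed machinery; this is an illustrative sketch, not Hadoop API code.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MapReduceIdea {
    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4);

        // Map: apply "multiply by 2" to each element -> [2, 4, 6, 8]
        List<Integer> doubled = data.stream()
                                    .map(x -> x * 2)
                                    .collect(Collectors.toList());

        // Reduce: collapse the set with a binary function.
        int sum = data.stream().reduce(0, Integer::sum);         // 1+2+3+4 = 10
        int product = data.stream().reduce(1, (a, b) -> a * b);  // 1*2*3*4 = 24

        System.out.println(doubled + ", sum=" + sum + ", product=" + product);
    }
}
```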

