HDFS and MapReduce on the Hadoop platform: functions, working principles, and processes

 

 

HDFS common functions

1. Metadata

2. Checkpoint

3. DataNode function

 

How HDFS works

1. HDFS is a distributed file system. It presents the files it manages through a unified directory tree; each file is cut into blocks, and the blocks are stored on a number of DataNode servers.

2. A client locates a file in HDFS by its path in the directory tree. It only needs the file path and does not need to care about the physical location of the file.

3. Every block of every file can be stored as multiple replicas in the HDFS cluster (3 by default). The value of dfs.replication in hdfs-site.xml sets the number of replicas.

4. The key service process in HDFS is the NameNode. It maintains the HDFS directory tree and the mapping between files and the real storage locations of their blocks (the metadata). The DataNode service process is responsible for receiving and managing file blocks. The default block size is 128 MB, configurable through dfs.blocksize (older versions of Hadoop default to 64 MB).
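The block and replica arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: the function names `split_into_blocks` and `raw_storage` are made up for this example, and the two constants simply mirror the defaults for dfs.blocksize and dfs.replication mentioned in the text.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB   # mirrors the dfs.blocksize default named in the text
REPLICATION = 3         # mirrors the dfs.replication default named in the text


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the byte length of each block a file of file_size bytes is cut into.

    Hypothetical helper for illustration; it only models the sizes, not the data.
    """
    if file_size < 0:
        raise ValueError("file_size must not be negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds the bytes that are left over.
        sizes.append(remainder)
    return sizes


def raw_storage(file_size, replication=REPLICATION):
    """Bytes used across the cluster when every block is stored `replication` times."""
    return file_size * replication


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    print([size // MB for size in blocks])   # block sizes in MB
    print(raw_storage(300 * MB) // MB)       # cluster-wide usage in MB
```

Under these assumptions, a 300 MB file is cut into two full 128 MB blocks plus one 44 MB block, and with three replicas it occupies 900 MB of raw storage across the cluster.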

HDFS work process

When a client writes data to HDFS, it first communicates with the NameNode to confirm that the file can be written and to obtain the DataNodes that will receive the file's blocks. The client then sends the file block by block to the corresponding DataNode, and the DataNode that receives a block is responsible for copying it to the other DataNodes.
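The write flow above can be modelled with a small in-memory simulation. This is a rough sketch, not the real HDFS protocol: the classes `NameNode` and `DataNode`, the function `write_file`, and the block id format are hypothetical, and replica targets are chosen round-robin purely for simplicity (this sketch does not model how real HDFS chooses targets).

```python
class DataNode:
    """Hypothetical stand-in for a DataNode: stores blocks and forwards copies."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def receive(self, block_id, data, forward_to):
        self.blocks[block_id] = data
        if forward_to:
            # The receiving node passes a copy on to the next node in the chain.
            forward_to[0].receive(block_id, data, forward_to[1:])


class NameNode:
    """Hypothetical stand-in for the NameNode: keeps metadata only, never file data."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = min(replication, len(datanodes))
        self.metadata = {}  # path -> list of (block id, [datanode names])
        self._next = 0

    def create(self, path):
        # Step 1: confirm that the file can be written.
        if path in self.metadata:
            raise FileExistsError(path)
        self.metadata[path] = []

    def allocate_block(self, path):
        # Step 2: pick the DataNodes that will hold the next block
        # (round-robin here; an assumption made for this sketch).
        blocks = self.metadata[path]
        block_id = f"{path}#blk{len(blocks)}"
        count = len(self.datanodes)
        targets = [self.datanodes[(self._next + i) % count]
                   for i in range(self.replication)]
        self._next = (self._next + 1) % count
        blocks.append((block_id, [dn.name for dn in targets]))
        return block_id, targets


def write_file(namenode, path, data, block_size):
    """Client side: ask the NameNode first, then send the file block by block."""
    namenode.create(path)
    for offset in range(0, len(data), block_size):
        block_id, targets = namenode.allocate_block(path)
        chunk = data[offset:offset + block_size]
        # The client sends only to the first DataNode; that node copies to the rest.
        targets[0].receive(block_id, chunk, targets[1:])


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(4)]
    namenode = NameNode(nodes, replication=3)
    write_file(namenode, "/demo.txt", b"hello hdfs", block_size=4)
    for block_id, holders in namenode.metadata["/demo.txt"]:
        print(block_id, holders)
```

The point of the sketch is the division of labour: the NameNode only records which nodes hold which block, while the data itself travels from the client to one DataNode and from there to the other replicas.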

MapReduce function implementation series:

MapReduce function implementation 1 --- Data conversion between HBase and HDFS

MapReduce function implementation 2 --- Sort

MapReduce function implementation 3 --- Top N

MapReduce function implementation 4 --- Small comprehensive example (read from HBase, compute statistics, and output the Top 3 in descending order to HDFS)

MapReduce function implementation 5 --- Deduplication (Distinct), count (Count)

MapReduce function implementation 6 --- Maximum (Max), sum (Sum), average (Avg)

MapReduce function implementation 7 --- Small comprehensive example (multiple jobs processed serially to calculate the average)

MapReduce function implementation 8 --- Partition

MapReduce function implementation 9 --- PV, UV

MapReduce function implementation 10 --- Inverted index

MapReduce function implementation 11 --- Join

MapReduce job process

1. Map task

2. Reduce task
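The two task types can be illustrated with a single-process word count. This is a minimal sketch that imitates the map, shuffle, and reduce phases in plain Python; it does not use the Hadoop API, and the names `map_task`, `shuffle`, `reduce_task`, and `run_job` are made up for this example.

```python
from collections import defaultdict


def map_task(line):
    """Map phase: turn one input line into (key, value) pairs."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all values by key and sort the keys, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_task(key, values):
    """Reduce phase: collapse all values of one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run the three phases in sequence in one process (no cluster involved)."""
    mapped = [pair for line in lines for pair in map_task(line)]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["a b a", "b c"]))
```

In a real job the map tasks and reduce tasks run on different machines and the grouped data is transferred over the network; here everything stays in memory so that the data flow is easy to follow.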

 


Origin www.cnblogs.com/068zhengda/p/10965966.html