Parallel and Distributed Computing: MapReduce

Assignment source: https://edu.cnblogs.com/campus/gzcc/GZCC-16SE2/homework/3319

1. In your own words, explain the functions, working principles, and processes of HDFS and MapReduce on the Hadoop platform.

  HDFS is short for Hadoop Distributed File System, and its most important role is to serve as the storage layer of the Hadoop ecosystem. Function: it stores massive amounts of data, on the order of tens of millions of files, across large clusters of cheap machines. It is designed for batch processing rather than interactive use: access latency is a secondary concern, while high data throughput is the key goal. Working principle: HDFS uses a master/slave architecture with two kinds of nodes, the NameNode and the DataNodes. The NameNode is the master node; it manages the file system namespace and coordinates client access to files. Each DataNode is mainly responsible for managing the storage attached to the machine it runs on.
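
  To make the NameNode/DataNode division of labor concrete, here is a minimal sketch using Hadoop's Java FileSystem API. The NameNode address `hdfs://namenode-host:9000` and the file path `/demo/hello.txt` are placeholder assumptions for illustration, not values from the original post.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points at the NameNode; host and port are placeholders.
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/demo/hello.txt");  // hypothetical path

        // Write: the client asks the NameNode where to place blocks,
        // then streams the bytes to DataNodes.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeBytes("hello hdfs\n");
        }

        // Read: the client asks the NameNode for block locations,
        // then reads the blocks directly from the DataNodes.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(path)))) {
            System.out.println(reader.readLine());
        }

        fs.close();
    }
}
```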

  MapReduce is a programming model. Function: it is mainly used to process large data sets, for example distributed grep, distributed sort, reversing web-link graphs, computing per-host term vectors, analyzing web access logs, building inverted indexes, document clustering, machine learning, statistical machine translation, and so on. Working principle: because the data is stored in blocks spread over many nodes rather than on a single machine, the core idea is parallel processing. Each block of input data, called a split, is handed to a map task; the map tasks run in parallel, and their results are passed to reduce tasks, which aggregate them and output the final result, as in the sketch below.
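
  The map-then-reduce flow is easiest to see in the classic word-count job, essentially the standard example from the Hadoop MapReduce tutorial; this is a minimal sketch of it, with input and output paths taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each input split is fed to a mapper in parallel;
  // the mapper emits (word, 1) for every word it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: all counts for the same word are shuffled to one
  // reducer, which sums them and writes the final total.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```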

  HDFS and MapReduce are both essential, and neither works without the other: HDFS is the source of the data that MapReduce processes, and it is MapReduce's processing that makes the data stored in HDFS useful.
