MapReduce distributed computing system

MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). Its core concepts, "Map" and "Reduce", are borrowed from functional programming languages, along with features borrowed from vector programming languages. It makes it much easier for programmers with no experience in distributed or parallel programming to run their own programs on a distributed system. In current implementations, the programmer specifies a Map function, which maps a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function, which processes together all of the mapped key-value pairs that share the same key.
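The model can be illustrated with a small single-process sketch in Python. This is only an illustration of the map, group-by-key, and reduce steps, not how Hadoop is implemented; the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_words(_key, line):
    """Map: turn one input pair (offset, line) into a list of (word, 1) pairs."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group all mapped values by key, so each key is reduced exactly once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: collapse every value that shares a key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence inside one process."""
    mapped = []
    for offset, line in enumerate(lines):
        mapped.extend(map_words(offset, line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in sorted(grouped.items()))


counts = word_count(["hello world", "hello hadoop"])
print(counts)  # {'hadoop': 1, 'hello': 2, 'world': 1}
```

In a real cluster the map calls and the reduce calls run on different machines, and the grouping step is carried out by the framework rather than by user code.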

Run the wordcount program

Run cd /opt/module/hadoop-2.7.3/share/hadoop/mapreduce to change to the directory that contains the wordcount example.
Run touch in.txt to create the file in.txt, which serves as the input file.
(If in.txt is an empty file, run vi in.txt and enter the text whose word frequencies are to be counted.)
The output directory /output must not already exist; the program creates it automatically when it runs.
Run wordcount:
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /adir/in.txt /output/
After a successful run, go to the /output directory and open the file part-r-00000 to view the counting results.
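The result file written by wordcount holds one word per line, followed by a tab character and its count. A minimal sketch for reading such output and listing the most frequent words first could look like the following; the helper name parse_wordcount_output and the sample text are hypothetical and do not come from an actual run.

```python
def parse_wordcount_output(text):
    """Parse lines of the form 'word<TAB>count' into a dict of word -> count."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, count = line.rsplit("\t", 1)
        counts[word] = int(count)
    return counts


# Hypothetical sample standing in for the contents of part-r-00000.
sample = "hadoop\t1\nhello\t2\nworld\t1\n"

counts = parse_wordcount_output(sample)
# Sort by descending count, then alphabetically for ties.
top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(top)  # [('hello', 2), ('hadoop', 1), ('world', 1)]
```

To use it on a real result, read the contents of part-r-00000 into a string and pass that string in place of sample.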

Origin www.cnblogs.com/jsg-1262534563/p/10926712.html