Python + Spark 2.0 + Hadoop Study Notes -- Hadoop MapReduce

MapReduce is a programming model that spreads work across a number of servers for parallel processing: the Map phase distributes pieces of the job to the workers, and the Reduce phase collects and summarizes their results.

Here we take WordCount as an example: a program that counts how many times each English word appears in a file.
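For example, given the one-line input "hello world hello", the Map phase emits the pairs (hello, 1), (world, 1), (hello, 1); the framework then groups the pairs by key, and the Reduce phase sums each group to produce hello 2 and world 1.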

1) Create a directory wordcount

mkdir -p ~/wordcount/input

cd ~/wordcount

Use sudo gedit WordCount.java to create and edit the source file.
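The notes do not reproduce the file's contents. The listing below is the standard WordCount example from the Apache Hadoop MapReduce tutorial; it is a reconstruction rather than the author's exact file, but it matches the class names used by the commands that follow.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: split each input line into tokens and emit (word, 1).
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts collected for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The reducer doubles as a combiner, pre-summing counts on the map side.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}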

2) Compile WordCount.java

sudo gedit ~/.bashrc

Then add the required environment variables to the end of the file.
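The notes do not list the entries to add. Assuming the setup from the Apache Hadoop tutorial, compiling with hadoop com.sun.tools.javac.Main requires HADOOP_CLASSPATH to point at the JDK's tools.jar; the JAVA_HOME path below is an assumption, so adjust it to your installation:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed JDK location; adjust as needed
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar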

Make the ~/.bashrc settings take effect:

source ~/.bashrc

Then compile the source and package the classes into a jar:

hadoop com.sun.tools.javac.Main WordCount.java

jar cf wc.jar WordCount*.class

ll
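If the build succeeded, ll should list wc.jar together with WordCount.class and the inner classes WordCount$TokenizerMapper.class and WordCount$IntSumReducer.class (the inner-class names assume the standard tutorial source shown above).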

3) Create a test text file

cp /usr/local/hadoop/LICENSE.txt ~/wordcount/input

ll ~/wordcount/input

Next, start all the virtual servers.

Start the cluster

start-all.sh

Upload the test file to an HDFS directory. First create the directory:

hadoop fs -mkdir -p /user/wordcount/input

Switch to the ~/wordcount/input directory:

cd ~/wordcount/input

Upload the text file to HDFS:

hadoop fs -copyFromLocal LICENSE.txt /user/wordcount/input

List the files in HDFS:

hadoop fs -ls /user/wordcount/input

4) Run WordCount.java

Change directory

cd ~/wordcount

Run the WordCount program:

hadoop jar wc.jar WordCount /user/wordcount/input/LICENSE.txt /user/wordcount/output

5) Check the results

View the HDFS output directory:

hadoop fs -ls /user/wordcount/output

View the contents of the output file in HDFS

hadoop fs -cat /user/wordcount/output/part-r-00000 | more
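Each line of part-r-00000 holds one word and its count, separated by a tab; the actual counts depend on the contents of LICENSE.txt.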

To run the WordCount program again, first delete the output directory (the job will fail if the output directory already exists):

hadoop fs -rm -R /user/wordcount/output

 

Hadoop's MapReduce API is not very convenient to use, so it is only covered briefly here.

 
