Running the official Hadoop WordCount example

Brief introduction

Hadoop ships with a set of official example programs in /hadoop-2.7.2/share/hadoop/mapreduce.

Enter the /hadoop-2.7.2/share/hadoop/mapreduce directory

Create a new file named hello, write some text into it, and press Enter to add a few more lines so there is something to count.

Executing the commands

Prepare test data

[root@zjj102 demo]# vim hello 

skldjlkasjd
zhjj

zjj
sada
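If you prefer not to edit the file interactively in vim, the same test data can be written with a single command. This is only a convenience sketch that reproduces the five lines above, including the empty line; any text works as input.

```shell
# Write the same test data as the vim session above (the third line is empty)
printf 'skldjlkasjd\nzhjj\n\nzjj\nsada\n' > hello

# Show the file to confirm its contents
cat hello
```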

Run the wordcount program


# Create the wc directory on HDFS
[root@zjj102 demo]# hadoop fs -mkdir /wc
# Upload the hello file to the wc directory on HDFS
[root@zjj102 demo]# hadoop fs -put hello  /wc
[root@zjj102 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar  wordcount /wc /wcoutput 

Command explanation
Command: hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /wc /wcoutput

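wordcount is the name of the example program to run; the examples jar maps this name to its WordCount class.
/wc is the input path on HDFS. Every file in the /wc directory is read as input, so you can create a file, write some content in it, and upload it to /wc.
/wcoutput is the output directory on HDFS where the program writes its results. It must not exist before the job runs; if it already exists, the job fails instead of overwriting it.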

In other words, the command runs this jar, selects the wordcount program, counts the words in the files under the HDFS directory /wc, and writes the result to the HDFS directory /wcoutput.

The general pattern is: package a program into a jar, run the jar with hadoop jar, name the program to execute (here wordcount, which the examples jar's built-in driver resolves to the corresponding class), and then pass an input path followed by an output path.
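Conceptually, WordCount splits each input line into words (the map step), brings identical words together (the shuffle and sort), and adds up the occurrences of each word (the reduce step). The pipeline below is only a local analogy of that flow built from standard Unix tools; it is not how Hadoop executes the job. The function name local_wordcount is made up for illustration, and it splits on any whitespace, which may differ slightly from how the Java example tokenizes text.

```shell
# local_wordcount is a hypothetical helper for illustration, not part of Hadoop.
local_wordcount() {
  tr -s '[:space:]' '\n' |               # "map": one word per line
    sed '/^$/d' |                         # drop the empty line left by leading whitespace
    LC_ALL=C sort |                       # "shuffle/sort": bring identical words together
    uniq -c |                             # "reduce": count each group of identical words
    awk '{printf "%s\t%s\n", $2, $1}'     # print as word<TAB>count
}

# Same data as the hello test file
printf 'skldjlkasjd\nzhjj\n\nzjj\nsada\n' | local_wordcount
```

Running `local_wordcount < hello` on the test file gives the four words with a count of 1 each, in the same order as the part-r-00000 output shown further down.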

View execution results

After the job finishes, a new wcoutput directory appears in the HDFS web console. It contains an empty _SUCCESS file, which only signals that the job completed successfully; the actual result is in the part-r-00000 file next to it.

The results can also be viewed by printing this file from the command line.

shell:

[root@zjj102 mapreduce]# hadoop fs -cat /wcoutput/part-r-00000
sada    1
skldjlkasjd     1
zhjj    1
zjj     1
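Each line of part-r-00000 is a word, a tab character, and that word's count, sorted by word. To see the most frequent words first instead, the file can be piped through sort. The helper name rank_words and the sample input below are made up for illustration.

```shell
# rank_words is a hypothetical helper, not part of Hadoop.
# It sorts word<TAB>count lines by count (highest first), then alphabetically by word.
rank_words() {
  LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Made-up sample input in the same word<TAB>count format as part-r-00000
printf 'apple\t2\nbanana\t5\ncherry\t2\n' | rank_words
```

On the cluster, the same helper would be used as `hadoop fs -cat /wcoutput/part-r-00000 | rank_words`.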


Origin blog.csdn.net/qq_41489540/article/details/109118412