Teacher Yu Takes You to Learn Big Data: Spark Fast Big Data Processing, Chapter 3 Section 6, MR Basic Use Case: WordCount

MR basic example

Download the program

1. Switch to the hadoop user.
Command: su - hadoop
2. Check whether the cluster is working properly.
Command: jps
3. Switch to the /tmp directory and create an mr directory. Note: an mr directory was created during the earlier Java example and can be deleted now.
Command: cd /tmp/; mkdir mr
4. Enter the Spark-stack/Hadoop/ directory.
Command: cd Spark-stack/Hadoop/

5. Copy the WordCount program to the /tmp/mr/ directory.
Command: cp -r WordCount/* /tmp/mr/

6. Open the WordCount.java file in the org/apache/hadoop/examples directory.
Command: vi org/apache/hadoop/examples/WordCount.java


7. Because the program will now be compiled directly in the Hadoop cluster environment rather than exported as a jar from IDEA, you need to comment out the package declaration at the top of WordCount.java.


Code explanation


Parse the arguments of the main function: the input paths and the output path.
Create a Job object, which represents the entire MapReduce workflow.

Add one or more input files or paths to the Job.
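In the real driver, FileInputFormat.addInputPath is called once per input path. The argument convention can be sketched outside Hadoop in plain Java (ArgsSketch and its methods are hypothetical names used only for illustration): every argument except the last is an input path, and the last argument is the output path.

```java
import java.util.Arrays;

public class ArgsSketch {
    // WordCount convention: every argument except the last is an input path.
    static String[] inputPaths(String[] args) {
        return Arrays.copyOfRange(args, 0, args.length - 1);
    }

    // The last argument is the output path, which must not exist yet.
    static String outputPath(String[] args) {
        return args[args.length - 1];
    }

    public static void main(String[] args) {
        String[] a = {"/installTest/hadoop/data", "/installTest/hadoop/output3"};
        System.out.println(Arrays.toString(inputPaths(a)) + " -> " + outputPath(a));
    }
}
```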


In the map function, the key is the offset of the line in the input file, the value is the text of that line, and the context gives access to the job's context. The map first converts the value to a string, then uses StringTokenizer to split it into individual words; each word is stored in the word variable and written to the context together with a count of 1.
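As a standalone illustration (plain Java, no Hadoop classes; MapSketch and mapLine are hypothetical names), this is how one input line is tokenized into (word, 1) pairs, mirroring what the map function emits via context.write:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapSketch {
    // Simulates the body of WordCount's map(): split one line into words,
    // emitting a (word, 1) pair for each token.
    static List<String> mapLine(String line) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(itr.nextToken() + "\t1");
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String p : mapLine("hello hadoop hello")) {
            System.out.println(p);
        }
    }
}
```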


Before reduce runs, the framework has already grouped together all values belonging to the same key. The reduce function sums these counts, combining the results of the different map tasks for each key.
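A minimal sketch of the summation step, assuming the framework has already delivered all counts for one key as a list (plain Java, no Hadoop; ReduceSketch is a hypothetical name):

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSketch {
    // Simulates WordCount's reduce(): the framework has already grouped all
    // values for one key; sum them to get the word's total frequency.
    static int reduce(List<Integer> countsForOneKey) {
        int sum = 0;
        for (int v : countsForOneKey) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three map tasks each emitted ("hello", 1) once.
        System.out.println("hello\t" + reduce(Arrays.asList(1, 1, 1)));
    }
}
```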

Compile

1. Set HADOOP_CLASSPATH so that it includes the ${JAVA_HOME}/lib/tools.jar file (make sure this file exists).

Command: export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar

2. Because the Java files are stored in the WordCount/org/apache/hadoop/examples/ directory, enter this directory.

Command: cd org/apache/hadoop/examples/


3. Compile.

Command: hadoop com.sun.tools.javac.Main WordCount.java
Note: this produces several class files.


4. Package the class files into an executable jar.

Command: jar cf WordCount.jar WordCount*.class


Run MapReduce

1. Check the HDFS root directory.

Command: hdfs dfs -ls /
Note: there is an installation path, installTest.


2. Check the installTest directory.

Command: hdfs dfs -ls /installTest


3. Check the hadoop directory.

Command: hdfs dfs -ls /installTest/hadoop
Note: the data directory was created during installation, and output was produced by the MapReduce job used in the Hadoop installation test.


4. Check the data directory.

Command: hdfs dfs -ls /installTest/hadoop/data


5. Submit the MapReduce program.

Command: hadoop jar WordCount.jar WordCount /installTest/hadoop/data /installTest/hadoop/output3
Note: the input is /installTest/hadoop/data and the output is /installTest/hadoop/output3; make sure the output directory does not already exist. The map tasks start first, and the reduce tasks only start after the maps have run.


6. View the results.

Command: hdfs dfs -ls /installTest/hadoop/output3

7. The result is in the part-r-00000 file. View the file.

Command: hdfs dfs -cat /installTest/hadoop/output3/part-r-00000


The string at the start of each line is a word that appeared in the input data, and the number after it is the count of its occurrences.
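To tie the whole flow together, here is a minimal single-process simulation of map, shuffle, and reduce in plain Java (an illustration only, not the course's code; WordCountSim is a hypothetical name), printing the same word-and-count format found in part-r-00000:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSim {
    // Map + shuffle + reduce in one pass: tokenize every line (map),
    // group counts by word (shuffle), and sum them (reduce).
    static Map<String, Integer> wordCount(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like part-r-00000
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] data = {"hello hadoop", "hello spark"};
        for (Map.Entry<String, Integer> e : wordCount(data).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```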

Friendly reminder: for the detailed course content, please watch Spark Fast Big Data Processing by Yu Haifeng: https://edu.csdn.net/course/detail/24205


Origin blog.csdn.net/weixin_45810046/article/details/108823185