MR basic example
Download the program
1. Switch to the hadoop user.
Command: su - hadoop
2. Check whether the cluster is working properly.
Command: jps
3. Switch to the /tmp directory and create an mr directory. Note: if an mr directory already exists from the earlier Java example, delete it first.
Command: cd /tmp/
Command: mkdir mr
4. Enter the Spark-stack/Hadoop/ directory.
Command: cd Spark-stack/Hadoop/
5. Copy the WordCount program to the /tmp/mr/ directory.
Command: cp -r WordCount/* /tmp/mr/
6. Open the WordCount.java file in the org/apache/hadoop/examples directory.
Command: vi org/apache/hadoop/examples/WordCount.java
7. Because the program will now be compiled directly in the Hadoop cluster environment, rather than exported as a jar package from IDEA, you need to comment out the package declaration.
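Assuming the file matches the stock Hadoop WordCount example, the declaration to comment out is the first line of WordCount.java:

```java
// package org.apache.hadoop.examples;
```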
Code explanation
Parse the input arguments: the input path(s) and the output path.
Create a Job object, which represents the entire MapReduce process.
Add one or more input files or paths to the job.
In the map function, the key is the offset of the line in the file, the value is the text of that line, and the context represents the job context. First convert the value to a string, then use StringTokenizer to split it into individual words; each word is stored in word and written to the context together with its count.
In the reduce function, the values for the same key have already been grouped together; sum the counts emitted for that key by the individual maps.
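The map and reduce logic described above can be sketched with plain JDK classes. This is an illustrative simulation only: Hadoop's Text/IntWritable types and the Context object are replaced by String, int, and an ordinary Map, and the class and method names here are invented for the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {

    // Map step: split each line into words with StringTokenizer and
    // emit (word, 1). Reduce step: sum the 1s for each distinct word.
    public static Map<String, Integer> wordCount(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                String word = itr.nextToken();
                // merge() plays the role of the reduce-side summation
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"hello hadoop", "hello mapreduce"};
        System.out.println(wordCount(lines));
    }
}
```

In the real job, the framework performs the grouping between map and reduce (the shuffle); here the HashMap stands in for that step.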
Compile
1. Set HADOOP_CLASSPATH to include ${JAVA_HOME}/lib/tools.jar (first make sure this jar exists).
Command: export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
2. Because the Java files are stored in the WordCount/org/apache/hadoop/examples/ directory, enter this directory.
Command: cd org/apache/hadoop/examples/
3. Compile.
Command: hadoop com.sun.tools.javac.Main WordCount.java
Note: several class files will be generated.
4. Package the class files into an executable jar.
Command: jar cf WordCount.jar WordCount*.class
Run MapReduce
1. Check the HDFS root directory.
Command: hdfs dfs -ls /
Note: there is an installTest directory that was created during installation.
2. Check the installTest directory.
Command: hdfs dfs -ls /installTest
3. Check the hadoop directory.
Command: hdfs dfs -ls /installTest/hadoop
Note: the data directory was created during installation, and output holds the result of the MapReduce job used in the Hadoop installation test.
4. Check the data directory.
Command: hdfs dfs -ls /installTest/hadoop/data
5. Submit the MapReduce program.
Command: hadoop jar WordCount.jar WordCount /installTest/hadoop/data /installTest/hadoop/output3
Note: The input is /installTest/hadoop/data and the output is /installTest/hadoop/output3; make sure the output directory does not already exist. The map tasks are slow to start, and the reduce tasks start only after the maps have run.
6. View the results.
Command: hdfs dfs -ls /installTest/hadoop/output3
7. The result is in the part-r-00000 file. View the file.
Command: hdfs dfs -cat /installTest/hadoop/output3/part-r-00000
In each output line, the string at the beginning is a word that appeared in the input data, and the number after it is that word's count of occurrences.
Friendly reminder: For detailed learning content, please watch Spark's fast big data processing-Yu Haifeng https://edu.csdn.net/course/detail/24205