Big Data in Practice: Writing a Word Count Program on the MapReduce Platform (Ubuntu 20.04.1, Hadoop 2.8.5)

1. Preliminary work

A Hadoop environment has already been built successfully.
For details on how to build a Hadoop environment, click here.

2. Counting words on the minimal server system

2.1 Switch user and check the processes

Purpose: make sure the Hadoop cluster daemons are running on the master node and on every slave node.
su - angel
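The screenshots in the original post show the process check; the standard tool is jps. On a typical Hadoop 2.x cluster you would expect roughly the following daemons (names can vary with your configuration):
jps
# master node: NameNode, SecondaryNameNode, ResourceManager
# slave nodes: DataNode, NodeManager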

2.2 Create the test files

The test files are created under /home/angel.
vim.tiny sw1.txt
vim.tiny sw2.txt
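The screenshots show the file contents, which are just a few lines of English words. For reference, equivalent files could also be created non-interactively; the contents below are made up for illustration:
cd /home/angel
printf 'hello world\nhello hadoop\n' > sw1.txt
printf 'hello mapreduce\nword count\n' > sw2.txt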

2.3 Create a test folder and upload the files to the cluster

List the files currently in the cluster:
hdfs dfs -ls
Create a /test folder in the cluster for the test files:
hdfs dfs -mkdir /test
Upload the test files to the cluster:
hdfs dfs -put sw*.txt /test
Check whether the upload succeeded:
hdfs dfs -ls /test
View the contents of sw1.txt:
hdfs dfs -cat /test/sw1.txt

2.4 Run the word count program that ships with Hadoop to count the words in the test files

2.4.1 Locate the examples jar

cd /app/hadoop-2.8.5/
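In a standard Hadoop 2.8.5 layout the examples jar sits under share/hadoop/mapreduce; a quick way to locate it:
find . -name 'hadoop-mapreduce-examples-*.jar'
# expected: ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar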

2.4.2 Run the wordcount class from the examples jar

From the directory that contains the jar, use the hadoop command to run the wordcount class in the hadoop-mapreduce-examples-2.8.5.jar package, with /test as the input and /out1 as the output:
hadoop jar hadoop-mapreduce-examples-2.8.5.jar wordcount /test /out1

2.4.3 View the output files in the cluster

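The screenshots show the output directory being listed and printed; on a standard setup the commands would be:
hdfs dfs -ls /out1
# a _SUCCESS marker plus a part-r-00000 file indicate the job completed
hdfs dfs -cat /out1/part-r-00000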
At this point the Hadoop cluster is up on the minimal server system, and MapReduce has counted the words successfully.

3. Run Eclipse and write a word count program

3.1 Preliminary work

This article mainly explains the application on the MapReduce platform. Click here to view the preliminary work.
Note: Eclipse can only be run on the desktop version of the system; it cannot be run from the command-prompt (text-only) interface.

3.2 Open Eclipse


3.3 Create the project and configure the Hadoop installation directory

3.3.1 Setting up the project

Execute "File"-"New"-"Other..." in the menu bar and select "Map/Reduce Project", as shown in the figure below.
Insert picture description here

Enter the project name "WordCount" and select "Configure Hadoop install directory..."

3.3.2 Configure Hadoop installation directory

Set the Hadoop installation directory (here /app/hadoop-2.8.5) and confirm.
Click "Next".
Select the project.
At this point, creating the named project and configuring the Hadoop installation directory are complete. Click "Finish".

3.4 Create a new Hadoop location

Enter "master" in the "Location name" text box; change the "Host" of "Map/Reduce(V2) Master" from the default "localhost" to "master"; change the "Port" of "DFS Master" from the default "50040" to "9000"; leave everything else unchanged.

Aside: why 9000?
You can check the configuration file:
cat /app/hadoop-2.8.5/etc/hadoop/core-site.xml
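In Hadoop 2.x the NameNode address is given by the fs.defaultFS property in core-site.xml; for this cluster the entry presumably looks like the following, and the port in its value is what "DFS Master" must use:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>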
Check whether "master" appears in the blue area on the right (the "DFS Locations" view); if it does, the new Hadoop location was created successfully.

3.5 Create the word count class

In "WordCount"-"src", create a new "WordCount" class.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into tokens and emits a (word, 1) pair per token.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // All arguments except the last are input paths; the last one is the output path.
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
      new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}


3.6 Run the class; the program arguments and the main class must be set first

Choose "Run As" - "Run Configurations...", then double-click "Java Application".

3.6.1 Add program parameters

Enter "WordCount" in the "Name" text box.

Select the "Arguments" tab and enter two lines of parameters in the "Program arguments" text box: "hdfs://master:9000/input", "hdfs://master:9000/output"
Insert picture description here
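These two lines become otherArgs in main(): every argument but the last is an input path, and the last is the output path. Packaged as a jar, the equivalent command line would be something like the following (the jar name is assumed for illustration):
hadoop jar WordCount.jar WordCount hdfs://master:9000/input hdfs://master:9000/output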

3.6.2 Set the main class

On the "Main" tab, click "Search" and set "WordCount - (default package)" as the main class of the project.

3.7 Remove the files from the earlier experiment to prevent interference

Delete the output of the earlier run, create an /input folder in the cluster, and upload the test files to it.
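The commands behind these screenshots would presumably be along these lines:
hdfs dfs -rm -r /out1
hdfs dfs -mkdir /input
hdfs dfs -put sw*.txt /input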

3.8 Reconnect to the cluster

In the "DFS Locations" view, reconnect (refresh) the "master" location so that the newly created /input folder becomes visible.

3.9 Run Hadoop

Select the project and run it. The console reports no errors, only warnings.

3.10 View results

You can also view the result in the character interface; it is consistent with what the desktop window shows.
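With the output directory configured above, the command-line check would be as follows (the sample lines assume the illustrative file contents from section 2.2):
hdfs dfs -cat /output/part-r-00000
# hadoop	1
# hello	3
# word	1
# ...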
At this point, writing a word count program on the MapReduce platform has succeeded!

Origin: blog.csdn.net/qq_45059457/article/details/109153827