Hadoop [Lesson 8]: Creating a MapReduce application in Eclipse: WordCount

1. Create a Java Project

Refer to the earlier blog post on installing and configuring Eclipse on Linux.
My new project is named: WordCountZhm

2. Import the Hadoop Jar packages

(1) Create a new User Library

① Windows→Preferences
② Java→Build Path→User Libraries→New
③ Set the name to: HadoopJar
④ Add the Jar packages: Add External JARs

(2) Add the jar packages under the directory /share/hadoop/common

(3) Add the jar packages under the directory /share/hadoop/common/lib

(4) Add the jar packages under the directory /share/hadoop/hdfs

(5) Add the jar packages under the directory /share/hadoop/mapreduce
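
If you want to double-check which jars steps (2) through (5) refer to, you can list them from a terminal (this assumes the Hadoop install path used later in this guide, /usr/local/src/hadoop/hadoop-2.7.7/):

# ls /usr/local/src/hadoop/hadoop-2.7.7/share/hadoop/common/*.jar
# ls /usr/local/src/hadoop/hadoop-2.7.7/share/hadoop/common/lib/*.jar
# ls /usr/local/src/hadoop/hadoop-2.7.7/share/hadoop/hdfs/*.jar
# ls /usr/local/src/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/*.jar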

(6) Configure the Jar package path for the Java project


① Right-click the project→Build Path→Configure Build Path
② Libraries→Add Library
③ Select User Library
④ Select HadoopJar

3. Write the code for WordCount

(1) Create three Java class files: WCMapper.java, WCReduce.java, and wordcount.java

(2) WCMapper.java

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // We need to override the map method
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Receive the input data V1 (one line of text)
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        // Emit each word in a loop
        for (String word : words) {
            // word is a String and 1 is an int; neither is Hadoop-serializable,
            // so wrap them in Text and LongWritable before writing them out.
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
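
To make the mapper's data flow concrete, here is what it emits for one line of the sample input used later in step 5 (an illustration of the <K2, V2> pairs, not literal program output):

input:  "shenyaxin is working there"
output: <shenyaxin, 1> <is, 1> <working, 1> <there, 1>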

(3) WCReduce.java

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReduce extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> v2s, Context context)
            throws IOException, InterruptedException {
        // counter tallies how many times this word occurred
        long counter = 0;
        // v2s actually holds the serialized 1s emitted by the mapper for this key
        for (LongWritable i : v2s) {
            counter += i.get(); // same idea as the familiar counter++
        }
        // Emit <K3, V3>, e.g. <"hello", 5>
        context.write(key, new LongWritable(counter));
    }
}
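
After the shuffle phase, all values for the same key are grouped together, so each reduce call sees one word along with all of its 1s. With the sample input from step 5, for instance, "hello" appears eight times, so the reduce call for that key sees:

input:  <hello, [1, 1, 1, 1, 1, 1, 1, 1]>
output: <hello, 8>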

(4) wordcount.java


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class wordcount {

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis(); // record the start time

        // The Mapper and Reducer are already defined; all that is left is to
        // submit the MapReduce job, which is abstracted as a Job object.
        Job job = Job.getInstance(new Configuration());

        // Note: the class containing the main method must be set here.
        job.setJarByClass(wordcount.class);

        // Set the Mapper-related properties of the Job
        job.setMapperClass(WCMapper.class);             // the Mapper class
        job.setMapOutputKeyClass(Text.class);           // the type of K2
        job.setMapOutputValueClass(LongWritable.class); // the type of V2
        // Tell the program where to read the input. Note that Path refers to a
        // path on HDFS; the address is passed in as a command-line argument.
        FileInputFormat.setInputPaths(job, new Path(args[0]));

        // Set the Reducer-related properties of the Job
        job.setReducerClass(WCReduce.class);            // the Reducer class
        job.setOutputKeyClass(Text.class);              // the type of K3
        job.setOutputValueClass(LongWritable.class);    // the type of V3
        // Tell the program where to write the results. Again, this Path refers
        // to a path on HDFS, passed in as a command-line argument.
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and wait for it to finish; passing true prints
        // progress and details while it runs.
        job.waitForCompletion(true);

        long endTime = System.currentTimeMillis(); // record the end time
        System.out.println("Program running time: " + (endTime - startTime) + "ms");
    }
}
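
An optional tweak that is not in the original listing: since summing 1s is associative, the same WCReduce class can also be registered as a combiner, so partial counts are aggregated on the map side before the shuffle and less data crosses the network. Hadoop supports this directly; add the following line anywhere before waitForCompletion if you want to try it:

job.setCombinerClass(WCReduce.class);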

4. Package WordCount as a Jar


① Right-click the project→Export
② Java→JAR file→Next
③ Remember the export directory you set here; I set it to: /usr/local/src/hadoop/hadoop-2.7.7/
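
To confirm the export worked, you can list the jar's contents with the jar tool that ships with the JDK; the three classes from step 3 (WCMapper, WCReduce, wordcount) should all appear:

# jar tf WordCountZhm.jar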

5. Run and test WordCount

(1) Enter the directory where the Jar package generated in the previous step is located

# cd /usr/local/src/hadoop/hadoop-2.7.7/


(2) Create a text file

# vi wordtest

Enter the text whose words you want to count; I used the following content:

hello nongyuanwei
hello shenyaxin
hello huchen
hello luying
hello wangli
hello danche
hello jiangyunsheng
hello huachenyu
shenyaxin is working there
nongyuanwei is playing there
they are stupid


(3) Upload the file to the HDFS system

# hadoop fs -put wordtest hdfs://localhost:9000/wordtest
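
To confirm the upload, you can print the file straight back out of HDFS:

# hadoop fs -cat hdfs://localhost:9000/wordtest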

(4) View the contents of the HDFS file system

# hadoop fs -ls hdfs://localhost:9000/


(5) Execute the application and check the time cost

# time hadoop jar WordCountZhm.jar wordcount /wordtest /WCOut
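
One caveat that applies to any MapReduce job, not just this one: the output directory (/WCOut here) must not already exist, or the job will fail at submission. If you need to re-run the job, remove it first:

# hadoop fs -rm -r /WCOut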

The execution succeeded, and the time cost is printed at the end of the run.

(6) View the contents of the HDFS file system again

# hadoop fs -ls hdfs://localhost:9000/


(7) View the word-count results

# hadoop fs -cat /WCOut/part-r-00000 

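Because the input is tiny, the expected result can be worked out by hand from the sample text in step (2); the actual part-r-00000 should match this, one tab-separated word and count per line, sorted by key:

are	1
danche	1
hello	8
huachenyu	1
huchen	1
is	2
jiangyunsheng	1
luying	1
nongyuanwei	2
playing	1
shenyaxin	2
stupid	1
there	2
they	1
wangli	1
working	1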

Origin: blog.csdn.net/qq_41315788/article/details/109273101