Hadoop in Action: MapReduce project structure analysis

1. MapReduce project structure analysis

1. Introduction

Before following this example:
1. Make sure that the Hadoop cluster environment is set up.
2. Make sure the Eclipse development environment for Hadoop is installed.
3. This article is a learning analysis of the three kinds of classes in a MapReduce project: Mapper, Reducer, and Job.

2. Hadoop's MapReduce model structure

(1) Hadoop development in Eclipse:
Open Eclipse on the system and create a new MapReduce project:

  1. In Eclipse, choose File -> New -> Other -> Map/Reduce Project -> Next, enter WordCount as the project name, then create a package named cn.edu.gznc, and create three classes in that package: WordCountMapper, WordCountReducer, and WordCountJob.
    Illustration: (screenshots of the Eclipse project structure omitted)
    This is the MapReduce project structure used in the usual Hadoop-in-action demonstrations.
    Next, let's analyze what MapReduce is.

3. Analysis

As we all know, HDFS and MapReduce are the two important cores of Hadoop, and MapReduce is Hadoop's distributed computing model.
A typical MapReduce job is divided into two main steps: the Map step and the Reduce step. To make this easier to understand, here is a story:
Suppose you are asked to count how many books there are in a library. To complete this task, you can assign Xiaoming to count bookshelf 1, Xiaohong to count bookshelf 2, and so on. This assignment process is the Map step. After each person has counted the bookshelves they are responsible for, the individual results are added together. This accumulation process is the Reduce step.

This is a simplified understanding, but it is enough for our hands-on Hadoop learning. If you want to understand MapReduce in more depth, you can look up further material yourself.

Next, we analyze the three necessary classes in a MapReduce project.

  1. xxxMapper.java
    You generally write a map method in the xxxMapper.java class.
    This is the Map step mentioned above:
    to implement the Map step, you write a class that inherits from the Mapper class and overrides its map method.

What is the point of overriding this map method?
Continuing the book-counting example: when Xiaoming is assigned to count bookshelf 1, he can be lazy and skip the books he does not want to count, or he can be very responsible and make his count 100% accurate.
Either way, Xiaoming only has to hand his result to the person in charge of the summary; how he produces it is not that person's concern.
Overriding the map method corresponds to implementing exactly this kind of processing: it is responsible for converting the input key-value pairs (here, one line of text at a time) into intermediate key-value pairs such as <word, 1>.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    /*
     * The map method is called by the map task process: the map task calls our
     * custom map method once for every line of text it reads.
     * Parameters passed when map is called:
     *     key   - the starting byte offset of the line (LongWritable)
     *     value - the text content of the line (Text)
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Get one line of text and convert it to a String
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");

        // Emit <word, 1> for each word
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
  2. xxxReducer.java
    To implement the Reduce step, you write a class that inherits from the Reducer class and overrides its reduce method; so a reduce method is written in xxxReducer.java.

Reminder:
The output of the Map step is in the form <word, 1>. During the shuffle, a merge process is performed: key-value pairs with the same key are merged into one group, such as <word, {1, 1, 1, ...}>, which is what the reduce method receives.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    /*
     * The reduce method is called by the reduce task process.
     *
     * The reduce task aggregates the many kv pairs distributed to it during the
     * shuffle stage; pairs with the same key are aggregated into one group, and
     * the reduce task then calls our custom reduce method once per group.
     * For example, given <hello,1><hello,1><hello,1><tom,1><tom,1><tom,1>,
     * the hello group triggers one reduce call and the tom group another.
     * Parameters passed on each call:
     *     key    - the key shared by the group of kv pairs
     *     values - an iterator over all the values in the group
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Define a counter
        int count = 0;
        // Iterate over all the values in this group and accumulate them
        for (IntWritable value : values) {
            count += value.get();
        }

        // Emit the final count for this word
        context.write(key, new IntWritable(count));
    }
}

  3. xxxJob.java
    This class generally provides the main method entry point; it ties together the jar, the mapper, and the reducer for the job and submits the task to the Hadoop cluster.
In Hadoop, each MapReduce task is treated as a Job. Before the task can be executed, the job must be configured first.
Generally, the following things need to be set in xxxJob.java (a sketch follows the list):
• Set the class to process the job, setJarByClass()
• Set the name of the job, setJobName()
• Set the path where the input data of the job is located
• Set the path where the output results of the job are saved
• Set the class that implements the Map step, setMapperClass()
• Set the class that implements the Reduce step, setReducerClass()
• Set the type of the output result key, setOutputKeyClass()
• Set the type of the output result value, setOutputValueClass()
• Execute the job (submit to hadoop cluster)
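
To make the list above concrete, here is a minimal sketch of what such a driver class might look like for this WordCount project. The body is not taken from the original article: the job name and the use of command-line arguments for the input and output paths are illustrative assumptions.

package cn.edu.gznc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Set the class to process the job and the job name
        job.setJarByClass(WordCountJob.class);
        job.setJobName("word count");  // assumed name, choose your own

        // Set the classes that implement the Map and Reduce steps
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Set the types of the output key and value
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Set where the input data is read from and where results are saved
        // (taken from the command line here; this is an assumption)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job to the Hadoop cluster and wait for completion
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

After exporting the project as a jar (see below), a driver like this is typically launched with a command along the lines of hadoop jar WordCount.jar cn.edu.gznc.WordCountJob /input /output, where the jar name and the paths are placeholders.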

Note that once coding is complete, the program is generally no longer run inside Eclipse; instead, it is packaged and exported as a jar and run on the Hadoop cluster.


The analysis above is a brief walkthrough of the whole MapReduce project structure, mainly to help you understand why analyzing data with Hadoop requires these three Java class files and a few necessary methods.

The next article will record the actual hands-on WordCount word-counting run on Hadoop.


You got a dream, you gotta protect it. — "The Pursuit of Happyness"
