Hadoop 2.6.0: Running a MapReduce program locally with IntelliJ IDEA and Maven (without a Hadoop or HDFS installation)

Environment

  1. JDK 1.8
  2. IntelliJ IDEA 2018.1
  3. Hadoop 2.6.0 (Hadoop is not installed locally)
  4. Maven 3.5.4

 Create the word count project

  1.  Create a new Maven Java project in IDEA (the Maven and JDK configuration is omitted here)

 

Configure the pom dependencies

 

  1.  pom.xml file
    
        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <maven.compiler.source>1.8</maven.compiler.source>
            <maven.compiler.target>1.8</maven.compiler.target>
        </properties>
    
        <dependencies>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>4.11</version>
                <scope>test</scope>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>2.6.0</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
                <version>2.6.0</version>
            </dependency>
    
            <dependency>
                <groupId>commons-cli</groupId>
                <artifactId>commons-cli</artifactId>
                <version>1.2</version>
            </dependency>
    
            <dependency>
                <groupId>log4j</groupId>
                <artifactId>log4j</artifactId>
                <version>1.2.17</version>
            </dependency>
    
        </dependencies>

     

  2. Create the mapper class

    package com.lens.task;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    import java.io.IOException;
    import java.util.StringTokenizer;
    
    /**
     * @author lens
     * @create 2020-02-25 10:24
     */
    public class VoteCountMapper extends Mapper<Object, Text, Text, IntWritable> {
        // Reused Writable instances, so no new objects are allocated per record
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
    
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // key is the byte offset of the line (unused); value is the line itself.
            // Split the line on whitespace and emit (word, 1) for every token.
            StringTokenizer words = new StringTokenizer(value.toString());
            while (words.hasMoreTokens()) {
                word.set(words.nextToken());
                context.write(word, one);
            }
        }
    }
    

     

  3. Create the reducer class

    package com.lens.task;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    
    import java.io.IOException;
    
    /**
     * @author lens
     * @create 2020-02-25 10:24
     */
    public class VoteCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();
    
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Sum all the 1s the mapper emitted for this word
            int count = 0;
            for (IntWritable value : values) {
                count += value.get();
            }
            result.set(count);
            context.write(key, result);
        }
    }
    

     

  4. Create the VoteCount driver class

    package com.lens.task;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    
    
    /**
     * @author lens
     * @create 2020-02-25 10:22
     */
    public class VoteCount extends Configured implements Tool {
        public static void main(String[] args) throws Exception {
            int res = ToolRunner.run(new Configuration(), new VoteCount(), args);
            System.exit(res);
        }
    
        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.println("Incorrect input, expected: [input] [output]");
                return -1;
            }
    
            Configuration conf = this.getConf();
            // Job.getInstance replaces the deprecated new Job(conf, name) constructor
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(VoteCount.class);
    
            job.setMapperClass(VoteCountMapper.class);
            job.setReducerClass(VoteCountReducer.class);
    
            // Read input line by line; write output as "key<TAB>value" text lines
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
    
            // Types emitted by the mapper
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
    
            // Types emitted by the reducer (the final output)
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
    
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
            // waitForCompletion() submits the job itself, so a separate submit() is not needed
            return job.waitForCompletion(true) ? 0 : 1;
        }
    }
    

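    Note: FileInputFormat, FileOutputFormat, TextInputFormat and TextOutputFormat must be imported from the new-API packages under lib (org.apache.hadoop.mapreduce.lib.input and org.apache.hadoop.mapreduce.lib.output), not from the old org.apache.hadoop.mapred package.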

 5. Create the input directory input for the word count job. The job counts the words in the files under this directory and writes the counts to the output directory.

First, set up the input path: create a new folder named input under the project root (at the same level as src) and add one or more text files to it as sample data.

Note: open File -> Project Structure and select Modules in the dialog that pops up; here the input folder is marked as Excluded.
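Before launching the job, the tokenize-and-sum logic of VoteCountMapper and VoteCountReducer can be tried on sample text in plain Java, with no Hadoop classes at all. The sketch below is a hypothetical model, not Hadoop code: `map` and `reduce` mirror the two classes, a `TreeMap` stands in for the shuffle phase that groups the mapper output by key in sorted order, and the input is assumed to be ASCII text (so `String` ordering matches Hadoop's byte-wise `Text` ordering).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical plain-Java model of the word count job (no Hadoop classes used).
class WordCountModel {

    // Mirrors VoteCountMapper.map: emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer words = new StringTokenizer(line);
        while (words.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(words.nextToken(), 1));
        }
        return pairs;
    }

    // Stands in for the shuffle: group values by key, keeping keys in sorted order.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Mirrors VoteCountReducer.reduce: sum all the values for one key.
    static int reduce(Iterable<Integer> values) {
        int count = 0;
        for (int value : values) {
            count += value;
        }
        return count;
    }

    // Runs map -> shuffle -> reduce over a list of input lines.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

Keeping `map` and `reduce` as separate functions makes the intermediate (word, 1) pairs visible, which is exactly what Hadoop sorts and groups before the reducer runs.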

 

 Configure run parameters

Here you configure the main class to run and the input and output paths that VoteCount expects as its two program arguments.

In the IntelliJ menu bar, select Run -> Edit Configurations, click + in the dialog that pops up, and create a new Application configuration. Set Main class to VoteCount (you can click the ... button on the right to pick it) and set Program arguments to input/ output/. The input path is the input folder created above; the output path is output, which does not need to be created beforehand because the job creates it.
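The driver rejects anything other than exactly two arguments, and Hadoop's FileOutputFormat refuses to start a job whose output directory already exists (it throws a FileAlreadyExistsException), so a second run with the same arguments fails until output/ is deleted. The sketch below is a hypothetical pre-flight check in plain Java; the class and method names are illustrative assumptions, not part of Hadoop. It reports both problems before the job is launched.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check for the two program arguments: [input] [output].
class RunArgsCheck {

    // Returns null when the arguments look usable, otherwise a description of the problem.
    static String check(String[] args) {
        if (args.length != 2) {
            return "Incorrect input, expected: [input] [output]";
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        if (!Files.exists(input)) {
            return "Input path not found: " + input.toAbsolutePath();
        }
        // The real job would fail on an existing output directory, so flag it early.
        if (Files.exists(output)) {
            return "Output directory already exists, delete it first: " + output.toAbsolutePath();
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = check(args);
        System.out.println(problem == null ? "Arguments OK" : problem);
    }
}
```

With the run configuration described above, calling `check(new String[] {"input/", "output/"})` returns null on the first run and an "already exists" message on every later run until the output folder is removed.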

 

Run

After the configuration is complete, choose Run -> Run 'VoteCount' from the menu bar to start the MapReduce program. When it finishes, an output folder appears in the project tree on the upper left; the file part-r-00000 inside it contains the result.
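TextOutputFormat writes one record per line, with the key and the value separated by a tab, and the reducer receives keys in sorted order, so part-r-00000 contains lines such as `hello<TAB>2`. To check the result programmatically rather than by eye, a minimal plain-Java reader is sketched below. The class name is a hypothetical illustration, and the sketch assumes the default tab separator and a single reducer (hence a single part-r-00000 file).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for a word count result file written by TextOutputFormat.
class PartFileReader {

    // Parses lines of the form "word<TAB>count", preserving the file's (sorted) order.
    static Map<String, Integer> read(Path partFile) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : Files.readAllLines(partFile, StandardCharsets.UTF_8)) {
            if (line.isEmpty()) {
                continue;
            }
            // Split on the last tab so the count is always the final field
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IOException("Malformed line (no tab): " + line);
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the location used in this walkthrough: output/part-r-00000
        Path part = Paths.get(args.length > 0 ? args[0] : "output/part-r-00000");
        read(part).forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```

A `LinkedHashMap` is used so that iterating the result reproduces the key order Hadoop wrote to the file.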

Input file

Run result

Source: blog.csdn.net/weixin_40983094/article/details/104496282