Configuring a Hadoop development environment in IDEA: compiling and running the WordCount program

For Hadoop and the related Java installation and configuration, see: https://www.cnblogs.com/lxc1910/p/11734477.html

 

1. Create a new Java project:

Select the appropriate JDK, as shown:

 

Name the project WordCount.

 

2. Add the WordCount class file:

Add a new Java class file under src, name the class WordCount, and fill in the following code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    // Map class: splits each input line into words
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable>
    {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // implement the map() function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException
        {
            // break the input line into words
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens())
            {
                word.set(itr.nextToken()); // store the current word in the Text object
                context.write(word, one);  // emit the <key, value> pair <word, 1>
            }
        }
    }

    // Reduce class: aggregates the values belonging to the same key
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        private IntWritable result = new IntWritable();

        // implement the reduce() function
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException
        {
            int sum = 0;
            // iterate over the values collected for the same key
            for (IntWritable val : values) { sum += val.get(); }
            result.set(sum);
            // emit the output pair <word, total count>
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception
    {
        // configuration for the job
        Configuration conf = new Configuration();
        // parse command-line arguments
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2)
        {
            System.err.println("Usage: WordCount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");  // create the user-defined job
        job.setJarByClass(WordCount.class);             // set the jar for the job
        job.setMapperClass(TokenizerMapper.class);      // set the Mapper class
        job.setCombinerClass(IntSumReducer.class);      // set the Combiner class
        job.setReducerClass(IntSumReducer.class);       // set the Reducer class
        job.setOutputKeyClass(Text.class);              // set the job's output key type
        job.setOutputValueClass(IntWritable.class);     // set the job's output value type
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));   // set the input path
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); // set the output path
        // submit the job and wait for it to complete
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

 

3. Add dependent libraries:

Click File -> Project Structure -> Modules, select the Dependencies tab, and click the plus sign to add the required Hadoop jars as dependencies.
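If you are unsure which jars to pick, the classes used by WordCount live in these directories of a typical Hadoop 2.x distribution (a sketch; $HADOOP_HOME and the exact version depend on your installation):

ls $HADOOP_HOME/share/hadoop/common/*.jar
ls $HADOOP_HOME/share/hadoop/common/lib/*.jar
ls $HADOOP_HOME/share/hadoop/hdfs/*.jar
ls $HADOOP_HOME/share/hadoop/mapreduce/*.jar
ls $HADOOP_HOME/share/hadoop/yarn/*.jar

Adding these directories (or the individual jars in them) as module dependencies covers all the imports used above.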

 

 

4. Compile and generate the JAR package:

Click File -> Project Structure -> Artifacts, then click the plus sign -> JAR -> From modules with dependencies.

Select WordCount as the Main Class:

Now compile the JAR package:

Click Build -> Build Artifacts -> Build. After compilation completes, you will find an extra out directory in the project.
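As an optional sanity check, you can confirm that the main class was written into the JAR's manifest (the artifact path below is IDEA's default and may differ in your project):

unzip -p out/artifacts/WordCount_jar/WordCount.jar META-INF/MANIFEST.MF

The output should include a line reading Main-Class: WordCount.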

 

5. Run the JAR package on the Hadoop system:

I previously installed the pseudo-distributed Hadoop system under the hadoop user, so first copy the JAR package into the hadoop user's home directory.

Start the Hadoop services (the script is in the sbin folder of the Hadoop installation directory):

./start-all.sh
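Once the script finishes, you can check that the daemons started with jps (a quick sanity check; the exact process list depends on your Hadoop version and configuration):

jps

In a pseudo-distributed setup you would typically expect NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, plus Jps itself.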

Create a test-in folder in HDFS and put two files, file1.txt and file2.txt, into it:

hadoop fs -mkdir test-in
hadoop fs -put file1.txt file2.txt test-in/
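file1.txt and file2.txt can be any small text files, created locally before the hadoop fs -put command above; for example (hypothetical contents, also used for the sample output in step 6):

echo "hello world hello hadoop" > file1.txt
echo "hello mapreduce world" > file2.txt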

Run the JAR package:

hadoop jar WordCount.jar test-in test-out

Because the main class was set when the JAR package was generated, there is no need to append WordCount after WordCount.jar.
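If the main class had not been set in the manifest, it would have to be passed explicitly on the command line:

hadoop jar WordCount.jar WordCount test-in test-out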

Also note that the test-out folder must not already exist in HDFS before running the JAR package.
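If test-out is left over from an earlier run, remove it first:

hadoop fs -rm -r test-out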

 

6. View the results

You can check the status of the Hadoop system at http://localhost:50070/.

Click Utilities -> Browse the file system to view the HDFS file system:

You can see the output files under the test-out folder; use the command:

hadoop fs -cat test-out/part-r-00000

to view the file output.
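With the hypothetical input files from step 5, the output would look like this (each line is a word, a tab, and its total count, sorted by word):

hadoop	1
hello	3
mapreduce	1
world	2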

 

7. References

https://blog.csdn.net/chaoping315/article/details/78904970

https://blog.csdn.net/napoay/article/details/68491469

https://blog.csdn.net/ouyang111222/article/details/73105086
