For Hadoop and the related Java installation and configuration, see: https://www.cnblogs.com/lxc1910/p/11734477.html
1. Create a new Java project:
Select an appropriate JDK, as shown:
Name the project WordCount.
2. Add the WordCount class file:
Add a new Java class file named WordCount under src, with the following code:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper class: splits each input line into words
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // implements the map() function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // split the line into words
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one); // emit <word, 1>
            }
        }
    }

    // Reducer class: aggregates the values for the same key
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        // implements the reduce() function
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            // iterate over all values for the same key
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result); // emit <word, total>
        }
    }

    public static void main(String[] args) throws Exception {
        // configuration for the job
        Configuration conf = new Configuration();
        // parse command-line arguments
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: WordCount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");  // create the user-defined job
        job.setJarByClass(WordCount.class);             // set the job's jar
        job.setMapperClass(TokenizerMapper.class);      // set the Mapper class
        job.setCombinerClass(IntSumReducer.class);      // set the Combiner class
        job.setReducerClass(IntSumReducer.class);       // set the Reducer class
        job.setOutputKeyClass(Text.class);              // set the job's output key type
        job.setOutputValueClass(IntWritable.class);     // set the job's output value type
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));   // input file path
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); // output file path
        // submit the job and wait for it to complete
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
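To see what the job above computes without a Hadoop cluster, here is a plain-Java sketch of the same word-count logic. The input lines stand in for the contents of file1.txt and file2.txt (hypothetical contents, not from the source); the TreeMap plays the role of the shuffle phase that groups identical keys before the reducer sums them.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    public static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // map(): split each line into words, conceptually emitting <word, 1>
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // reduce(): sum the 1s emitted for the same word
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello hadoop", "hello world"));
        // prints: {hadoop=1, hello=2, world=1}
    }
}
```

The TreeMap keeps keys sorted, mirroring how MapReduce presents keys to the reducer in sorted order.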
3. Add dependency libraries:
Click File -> Project Structure -> Modules, select the Dependencies tab, and click the plus sign to add the following dependencies:
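If you manage dependencies with Maven instead of adding jars by hand in the IDE, roughly equivalent coordinates would be the following (the version shown is an assumption; match it to your installed Hadoop):

```xml
<dependencies>
  <!-- core Hadoop APIs: Configuration, Path, IntWritable, Text, GenericOptionsParser -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.7</version> <!-- assumed version; use your cluster's -->
  </dependency>
  <!-- MapReduce classes: Job, Mapper, Reducer, and the lib.input/lib.output formats -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.7</version>
  </dependency>
</dependencies>
```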
4. Build the JAR package:
Click File -> Project Structure -> Artifacts, then click the plus sign -> JAR -> From modules with dependencies,
and select WordCount as the Main Class:
Now build the JAR package:
Click Build -> Build Artifacts -> Build. After the build completes, you will find a new output directory.
5. Run the JAR package on the Hadoop system:
I installed pseudo-distributed Hadoop earlier under the hadoop user, so first copy the JAR package into the hadoop user's home directory.
Start the Hadoop services (the scripts are in the sbin folder of the Hadoop installation directory):
./start-all.sh
Create a test-in folder in HDFS and put two files, file1.txt and file2.txt, into it:
hadoop fs -mkdir test-in
hadoop fs -put file1.txt file2.txt test-in/
Run the JAR package:
hadoop jar WordCount.jar test-in test-out
Because the main class was set when the JAR package was built, there is no need to append the class name WordCount after WordCount.jar.
Also note that the test-out folder must not already exist in HDFS before running the JAR package.
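The reason the class name can be omitted is that the artifact's Main Class setting writes a Main-Class attribute into the jar's manifest, which hadoop jar consults when no class is given. A small self-contained sketch (the file name and class are illustrative, not from the source) that creates a jar with such a manifest and reads the attribute back:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ManifestDemo {
    // Write a jar whose manifest declares Main-Class, then read the
    // attribute back the way `hadoop jar` / `java -jar` would.
    public static String roundTrip(String mainClass) throws Exception {
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);

        File jar = File.createTempFile("wordcount-demo", ".jar");
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), mf)) {
            // class entries omitted; only the manifest matters for this demo
        }
        try (JarFile jf = new JarFile(jar)) {
            return jf.getManifest().getMainAttributes()
                     .getValue(Attributes.Name.MAIN_CLASS);
        } finally {
            jar.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Main-Class: " + roundTrip("WordCount"));
        // prints: Main-Class: WordCount
    }
}
```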
6. View the results
You can check the Hadoop system status at http://localhost:50070/;
click Utilities -> Browse the file system to browse the HDFS file system:
You can see the output files under the test-out folder; view them with the command:
hadoop fs -cat test-out/part-r-00000
to see the output.
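The reducer's output in part-r-00000 is written as one key/value pair per line, separated by a tab. A minimal plain-Java sketch of reading that format (the sample lines here are hypothetical, not actual job output):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartFileParser {
    // Parse "word<TAB>count" lines, as produced by the default TextOutputFormat.
    public static Map<String, Integer> parse(String fileText) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : fileText.split("\n")) {
            if (line.isEmpty()) continue;
            String[] kv = line.split("\t"); // key and value are tab-separated
            counts.put(kv[0], Integer.parseInt(kv[1]));
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "hadoop\t1\nhello\t2\nworld\t1\n";
        System.out.println(parse(sample));
        // prints: {hadoop=1, hello=2, world=1}
    }
}
```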
7. References
https://blog.csdn.net/chaoping315/article/details/78904970