Two ways to run MapReduce programs on Hadoop

After spending some time learning the theory behind Hadoop, it helps to practice alongside the reading so the skills stick. Without further ado, here is the simplest possible program to run on Hadoop: word count.


First, here is the program's source code for your reference. It is written in three parts: the driver (Run), the map phase, and the reduce phase.


Map:


  
  
  package wordsCount;

  import java.io.IOException;
  import java.util.StringTokenizer;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class WordsMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

      @Override
      protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
          // Split the input line on whitespace and emit (word, 1) for each token.
          String line = value.toString();
          StringTokenizer st = new StringTokenizer(line);
          while (st.hasMoreTokens()) {
              String word = st.nextToken();
              context.write(new Text(word), new IntWritable(1));
          }
      }
  }




Reduce:


  
  
  package wordsCount;

  import java.io.IOException;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  public class WordsReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
          // Sum the counts emitted by the mappers for this word.
          int sum = 0;
          for (IntWritable i : values) {
              sum = sum + i.get();
          }
          context.write(key, new IntWritable(sum));
      }
  }



Run:


  
  
  package wordsCount;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

  public class Run {
      public static void main(String[] args) throws Exception {
          Configuration configuration = new Configuration();
          Job job = new Job(configuration);
          job.setJarByClass(Run.class);
          job.setJobName("words count!");

          // Types of the final (reduce-side) output key/value pairs.
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          job.setInputFormatClass(TextInputFormat.class);
          job.setOutputFormatClass(TextOutputFormat.class);

          job.setMapperClass(WordsMapper.class);
          job.setReducerClass(WordsReduce.class);

          // Adjust these paths to your own HDFS; the output directory must not already exist.
          FileInputFormat.addInputPath(job, new Path("hdfs://192.168.1.111:9000/user/input/wc/"));
          FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.1.111:9000/user/result/"));

          job.waitForCompletion(true);
      }
  }

Change the input and output paths inside Run to match your own cluster.

The program itself needs no explanation; versions of it can be found everywhere.


There are two ways to run this program on Hadoop.


Method one: connect your IDE to Hadoop (I use MyEclipse to link to Hadoop) and run the program directly. A tutorial on connecting MyEclipse to Hadoop is linked at the end of this article for your reference.




If the console reports that the job completed successfully, the run worked, and the results will appear in your output folder. The second file in that folder (part-r-00000; the first is the empty _SUCCESS marker) contains the word counts.
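You can also list the output directory from the command line. A small sketch, run from the Hadoop installation directory and using the paths from the Run class above:

  # List the job's output directory on HDFS
  bin/hadoop fs -ls /user/result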





Method two: package the MapReduce program into a jar file.

Here is a brief description of how to package it. The screenshots showed Eclipse's export wizard: right-click the project, choose Export > Java > JAR file, select the classes to include, and then just click through the remaining steps to finish.
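If you would rather package from the command line instead of the wizard, a minimal sketch, assuming the compiled .class files live under bin/ in the project directory:

  # Bundle the compiled classes into a jar; -C changes into bin/ first
  jar cvf wordscount.jar -C bin .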


Copy the packaged jar to the machine where Hadoop is installed (my Hadoop cluster runs in a Linux virtual machine); I sent the jar over with SSH:
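For example, a sketch of the transfer with scp. The user name is an assumption; the IP comes from the code above and the target directory from the explanation below:

  # Copy the jar to the Hadoop machine over SSH
  scp wordscount.jar xiaohuihui@192.168.1.111:/home/xiaohuihui/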


The hadoop executable lives in the bin directory under your Hadoop installation directory; run the following from there:
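The screenshot of the command is missing, but from the explanation below it was essentially the following (note that hadoop jar expects the main class in dotted package form):

  # Submit the job; the arguments are the jar location and the main class
  bin/hadoop jar /home/xiaohuihui/wordscount.jar wordsCount.Run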


Let me explain the parts of that shell statement:


/home/xiaohuihui/wordscount.jar: the location of the packaged jar file (where it landed on the virtual machine)

wordsCount.Run: the class holding your jar's main function (here the main function is in Run.class); open the jar in an archive viewer if you are not sure.


You could also append the input and output file paths to this statement, but I have already hardcoded them in the program.
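If the program were changed to read its paths from args[0] and args[1] instead, the invocation would look roughly like this; a hypothetical sketch, not what the code above does:

  # Hypothetical form with the paths passed on the command line
  bin/hadoop jar /home/xiaohuihui/wordscount.jar wordsCount.Run /user/input/wc /user/result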

If the shell reports that the job completed after you run the statement above, then congratulations: success!



You can view the results in your Hadoop-connected Eclipse view, or browse the file system through the HDFS web page (localhost:50070).
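You can also print the result file straight from the command line, for example:

  # Dump the reducer output (paths taken from the Run class above)
  bin/hadoop fs -cat /user/result/part-r-00000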


One more very important step: before running, make sure Hadoop has been started. You can check with jps whether your cluster's Hadoop processes are up.
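On the Hadoop 1.x style setup this article appears to use (ports 9000 and 50070, the old Job constructor), jps should list processes along these lines:

  # Run on the Hadoop machine; PIDs will differ.
  # Expect NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker;
  # if any are missing, start the cluster first (e.g. bin/start-all.sh).
  jps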



Connecting Eclipse to Hadoop: http://blog.csdn.net/xjavasunjava/article/details/12320045

