Development environment
System: CentOS release 6.5
JDK: jdk1.7.0_45
Hadoop: 2.5.2

Hadoop cluster construction
Reference: https://blog.csdn.net/soundslow/article/details/80101146

Eclipse plug-in configuration
Because files need to be transferred across platforms between the development machine and the cluster, the Hadoop Eclipse plug-in hadoop-eclipse-plugin-2.5.2.jar is required.
Put it into Eclipse's plugins directory and restart Eclipse; a Map/Reduce project type will then appear among the New project options.
Configure the Map/Reduce location
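In the Map/Reduce Locations view, the plug-in asks for the cluster addresses. A rough sketch, assuming the NameNode runs on node1 (the hostname and ports below are assumptions and must match the cluster's own configuration files):

    Location name:     hadoop-cluster  (any label)
    Map/Reduce Master: host node1, port 9001
    DFS Master:        host node1, port 8020  (must match fs.defaultFS in core-site.xml)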
Write the word-count program
Calculation process:
1. The splitting (map) class: WordCountMapper
package com.sound.mr.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

/**
 * @author 53033 <KEYIN, VALUEIN, KEYOUT, VALUEOUT>
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // called once per input line: split the line on spaces and emit (word, 1)
    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        String[] words = StringUtils.split(value.toString(), ' ');
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
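For example, given the input line "hello world hello", this mapper emits (hello, 1), (world, 1), (hello, 1); the framework then groups the pairs by key before handing them to the reducer.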
2. The summary (reduce) class: WordCountReducer
package com.sound.mr.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // each key arrives exactly once here; add up all of its counts
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable iw : values) {
            sum += iw.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
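Continuing the example above, the reducer receives (hello, [1, 1]) and (world, [1]) and writes out (hello, 2) and (world, 1).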
3. The main job (driver) class: MainJob
package com.sound.mr.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MainJob {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            Job job = Job.getInstance(conf);
            job.setJarByClass(MainJob.class);
            job.setJobName("WordCount");

            // declare the map output types and wire up the mapper and reducer
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);

            FileInputFormat.addInputPath(job, new Path("/usr/input/"));

            // delete the output directory if it already exists, otherwise the job fails
            Path outPath = new Path("/usr/output/wc");
            FileSystem fs = FileSystem.get(conf);
            if (fs.exists(outPath)) {
                fs.delete(outPath, true);
            }
            FileOutputFormat.setOutputPath(job, outPath);

            boolean finished = job.waitForCompletion(true);
            if (finished) {
                System.out.println("finished successfully!");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
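One optional refinement, not used in the code above: since WordCountReducer only adds values, it can also be registered as a combiner so that map output is pre-aggregated locally before the shuffle. A one-line sketch:

    // optional: run the reducer as a combiner on each map node
    job.setCombinerClass(WordCountReducer.class);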
4. In DFS, create the input directory manually (the output directory here is created by the code above).
Creation may fail because of HDFS permissions; in that case the permissions need to be modified. Refer to another blog: https://blog.csdn.net/soundslow/article/details/80111713
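A minimal sketch of the manual step, assuming a local sample file named words.txt (the file name and the blanket chmod are illustrative only):

    hdfs dfs -mkdir -p /usr/input        # create the input directory
    hdfs dfs -put words.txt /usr/input/  # upload a sample file to count
    hdfs dfs -chmod -R 777 /usr          # loosen permissions if creation fails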
Package the program
Since the Windows platform does not have a working Hadoop environment, the packaged program needs to be copied to the CentOS system for execution; node3 is selected here.
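For example (the jar name matches the run command below; the user and target path are assumptions): export the project from Eclipse via Export > JAR file as wc.jar, then copy it to node3:

    scp wc.jar root@node3:/root/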
Hadoop execution and results
hadoop jar wc.jar com.sound.mr.wc.MainJob
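If the job finishes successfully, the word counts are written to /usr/output/wc as part files (typically part-r-00000 for a single reducer) and can be inspected with:

    hdfs dfs -ls /usr/output/wc
    hdfs dfs -cat /usr/output/wc/part-r-00000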