Writing WordCount Locally
Step 1: write the source code
Write your own WordCount code. The final source is shown below.
MyMap
package MyWordCount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text k = new Text();
    private final IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Get one line of input
        String line = value.toString();
        // 2. Split the line into words
        String[] words = line.split(" ");
        // 3. Emit a (word, 1) pair to the ReduceTask for each word
        for (String word : words) {
            k.set(word);
            context.write(k, v);
        }
    }
}
MyReduce
package MyWordCount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // 1. Sum the counts for this word
        int sum = 0;
        for (IntWritable i : values) {
            sum += i.get();
        }
        // 2. Emit the word and its total count
        context.write(key, new IntWritable(sum));
    }
}
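To make the flow between the two classes concrete, here is a hypothetical two-line input and what each phase produces (the framework groups the map output by key before handing it to MyReduce):

hello world
hello hadoop

MyMap emits (hello, 1), (world, 1), (hello, 1), (hadoop, 1); after the shuffle, MyReduce receives hadoop -> [1], hello -> [1, 1], world -> [1] and writes hadoop 1, hello 2, world 1.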
MyDriver
package MyWordCount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get the configuration and create the job
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // 2. Set the jar load path
        job.setJarByClass(MyDriver.class);
        // 3. Set the Mapper and Reducer classes
        job.setMapperClass(MyMap.class);
        job.setReducerClass(MyReduce.class);
        // 4. Set the map output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5. Set the reduce (final) output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7. Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        // 8. Exit with the job's status (optional)
        System.exit(result ? 0 : 1);
    }
}
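To run the driver locally, pass two program arguments: the input directory, and an output directory that does not exist yet (FileOutputFormat refuses to overwrite an existing directory and fails with a FileAlreadyExistsException). The paths below are hypothetical placeholders for a Windows machine:

D:\wordcount\input D:\wordcount\output

For the two-line sample input shown earlier (hello world / hello hadoop), the job writes a part-r-00000 file into the output directory:

hadoop	1
hello	2
world	1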
Step 2: run it locally
This step hit all kinds of errors; below is a record of how I solved them one by one.
Add a log4j.properties file under the project's src directory
The warning looked like this:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
The contents of log4j.properties:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
Switch to an older JRE
Newer versions print warnings, but the program still runs.
I won't paste the warnings here.
Modify the NativeIO.java source
Create an org.apache.hadoop.io.nativeio package under the project's src directory and put the modified NativeIO.java in it; a class in the project's source tree takes precedence on the classpath over the same class inside the Hadoop jar.
JRE 1.8 is required here; newer versions report that the type sun.misc.Cleaner cannot be found (it was removed in JDK 9).
public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
    // Skip the native access check and always report the path as accessible
    return true;
    // return access0(path, desiredAccess.accessRight());
    // This method sits inside the static Windows class
}
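For orientation, here is a minimal sketch of where that stub sits in the copied file, following the Hadoop 2.x source layout; it is not a complete file, and everything outside the stub should stay exactly as copied from your Hadoop version's NativeIO.java:

package org.apache.hadoop.io.nativeio;

import java.io.IOException;

public class NativeIO {
    public static class Windows {
        // ... the AccessRight enum and all other original members, unchanged ...

        // Stubbed so local runs on Windows skip the native access check
        public static boolean access(String path, AccessRight desiredAccess)
                throws IOException {
            return true;
        }
    }
    // ... remainder of the original file unchanged ...
}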
Imported the wrong package
It should be import org.apache.hadoop.io.Text;
and not import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text; (an easy slip, since the IDE's auto-import suggests both). The Jersey class is not a Hadoop Writable, so the asSubclass check during job setup throws a ClassCastException.
The error message:
java.lang.ClassCastException: class com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$Text
    at java.lang.Class.asSubclass(Unknown Source)