Environment: Windows 10
Hadoop version: hadoop-2.6.4
IDE: IntelliJ IDEA
The test code is taken from Hadoop: The Definitive Guide, so I won't explain it in detail. Here it is:
MaxTemperatureMapper.java
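import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Sentinel value the records use for a missing temperature reading
    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs (on Java 6 and earlier)
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        // Emit (year, temperature) only for present readings with an accepted quality code
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}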
MaxTemperatureReducer.java
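import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Keep the largest temperature seen for this year
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}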
MaxTemperature.java
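import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }

        Job job = Job.getInstance();
        job.setJarByClass(MaxTemperature.class);

        // args[0] is the input directory, args[1] is the output directory (must not exist yet)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}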
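To sanity-check the parsing and max logic before involving Hadoop at all, the same steps can be sketched in plain Java. This is only an illustration under stated assumptions: the class LocalMaxTemperatureSketch, its helper methods, and the synthetic records are hypothetical and are not part of the book's code or the real 1901/1902 data. Only the columns the mapper reads are filled in.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch: mimics the mapper's parsing and the
// reducer's max-per-year logic in memory, with no Hadoop dependency.
class LocalMaxTemperatureSketch {
    private static final int MISSING = 9999;

    // Builds a synthetic 93-character line. Only the columns read by the
    // mapper are meaningful; everything else is padded with '0'.
    static String record(String year, String signedTemp, char quality) {
        char[] chars = new char[93];
        Arrays.fill(chars, '0');
        year.getChars(0, 4, chars, 15);        // year in columns 15..18
        signedTemp.getChars(0, 5, chars, 87);  // sign + 4 digits in columns 87..91
        chars[92] = quality;                   // quality code in column 92
        return new String(chars);
    }

    // "Map" step: returns the temperature, or null if the record is filtered out.
    static Integer parseTemperature(String line) {
        int airTemperature;
        if (line.charAt(87) == '+') {
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            return airTemperature;
        }
        return null;
    }

    // "Reduce" step: keeps the maximum temperature per year.
    static Map<String, Integer> maxByYear(Iterable<String> lines) {
        Map<String, Integer> max = new TreeMap<>();
        for (String line : lines) {
            Integer temperature = parseTemperature(line);
            if (temperature != null) {
                max.merge(line.substring(15, 19), temperature, Integer::max);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxByYear(Arrays.asList(
                record("1901", "+0022", '1'),
                record("1901", "-0011", '1'),
                record("1901", "+9999", '1'),    // missing reading, dropped
                record("1902", "+0300", '2'),    // rejected quality code, dropped
                record("1902", "+0244", '1')))); // prints {1901=22, 1902=244}
    }
}
```

The real job should produce output of the same shape: one line per year with that year's maximum reading.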
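1. Download hadoop-2.6.4 to your Windows machine.
2. Configure the environment variables:
HADOOP_HOME D:\installApp\Apache\apache-hadoop-2.6.4
Path: append %HADOOP_HOME%\bin
3. Copy the files from hadoop_dll2.6.0 (see the attachment) into hadoop/bin. You may need to restart the computer afterwards.
4. Put core-site.xml into your project's resources folder as a resource file.
5. Put the test files 1901.gz and 1902.gz into a folder named input.
6. Set the two program arguments (args): the input path input and the output path output. Then run it and you're done!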
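7. You may hit the error java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Download the Hadoop source, copy the NativeIO class out of it, and modify it.
If you'd rather not download the source, you can use the copy of this file in the attachment.
Place it in your project under the package path org.apache.hadoop.io.nativeio so that it overrides the original class.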
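Steps 2 and 3 can be checked before running the job. The sketch below is only a rough helper, and it rests on an assumption: that the hadoop_dll2.6.0 attachment supplies winutils.exe and hadoop.dll, the two native files Hadoop on Windows commonly needs. The post does not list the attachment's contents, so adjust the file names if yours differ. HadoopHomeCheck and missingNativeFiles are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for the Windows setup described in steps 2 and 3.
class HadoopHomeCheck {
    // Assumption: these are the native files the attachment is expected to provide.
    static final String[] EXPECTED_FILES = {"winutils.exe", "hadoop.dll"};

    // Returns the names of expected files that are absent from <hadoopHome>/bin.
    static List<String> missingNativeFiles(File hadoopHome) {
        List<String> missing = new ArrayList<>();
        File bin = new File(hadoopHome, "bin");
        for (String name : EXPECTED_FILES) {
            if (!new File(bin, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        List<String> missing = missingNativeFiles(new File(home));
        if (missing.isEmpty()) {
            System.out.println("All expected native files found under " + home);
        } else {
            System.err.println("Missing from " + home + File.separator + "bin: " + missing);
        }
    }
}
```

If HADOOP_HOME shows as unset right after you configured it, restart IDEA (or the computer, as step 3 suggests) so the new environment variable is picked up.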
Project directory overview