Hadoop Study Notes: Developing WordCount

1. Prepare the file to be processed and upload it to HDFS

hadoop dfs -put book.txt

2. Create a project in Eclipse

Just click Finish.

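One thing to watch out for: if the JDK version of your project is higher than the JDK that Hadoop runs on in the virtual machine, the job will fail at run time. In that case, change the project's JDK to match.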
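To compare the two sides, one option is a tiny class like the one below, compiled and run separately on the development machine and on the virtual machine. This is only a sketch: the class name JdkVersionCheck is made up for illustration and is not part of the project. A class compiled for a newer class-file version than the running JVM supports is rejected with an UnsupportedClassVersionError, so the class version printed on the VM should not be lower than the one your project compiles to.

```java
// Hypothetical helper (not part of the WordCount project): reports the version
// of the JVM it runs on, using two standard system properties.
class JdkVersionCheck {
    // Builds a one-line description of the running JVM's version.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
             + ", java.class.version=" + System.getProperty("java.class.version");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```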

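3. Write the Mapper and Reducer

One note here: to make packaging and deployment easier, we put all the classes in a single file. If you write them as nested classes, they must be declared public static, because Hadoop creates the mapper and reducer by reflection and a non-static inner class cannot be created without an instance of the outer class. If you write them as separate top-level classes, there is no such requirement.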
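The static requirement can be seen without Hadoop at all. The sketch below uses plain Java reflection; the names NestedClassDemo, StaticNested, Inner and canInstantiate are made up for illustration, and the reflection call only resembles what a framework does when it is handed a class, it is not Hadoop's actual code.

```java
import java.lang.reflect.Constructor;

class NestedClassDemo {
    // Static nested class: has a real no-argument constructor.
    public static class StaticNested {}

    // Inner (non-static) class: its constructor implicitly takes the outer instance.
    public class Inner {}

    // Tries to create an instance through a no-argument constructor,
    // the way a framework would when it only knows the class.
    static boolean canInstantiate(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // For Inner this is a NoSuchMethodException: no no-arg constructor exists.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("static nested: " + canInstantiate(StaticNested.class));
        System.out.println("inner:         " + canInstantiate(Inner.class));
    }
}
```

Running main prints true for the static nested class and false for the inner class, which is why a mapper or reducer nested in another class needs the static modifier.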

package mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Run {
	enum Counter
	{
		LINESKIP; // number of lines skipped because of an error
	}
	
	public static class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable>
	{
		Text text=new Text();
		IntWritable one=new IntWritable(1);
		
		/*
		 * Each call handles a single record of the input format, which here is one line of text
		 */
		@Override
		protected void map(LongWritable key, Text value,Context context)throws IOException, InterruptedException {
			try
			{
				String[] words=value.toString().split(" ");
				for(String word:words)
				{
					text.set(word);
					context.write(text, one);
				}
			}
			catch(Exception e)
			{
				context.getCounter(Counter.LINESKIP).increment(1);
			}
		}
	}
	
	public static class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable>
	{
		/*
		 * Receives the grouped data: all values for the same key arrive in one call
		 */
		@Override
		protected void reduce(Text key, Iterable<IntWritable> values,Context context)throws IOException, InterruptedException {
			int total=0;
			for(IntWritable val:values)
			{
				total+=val.get();
			}
			context.write(key, new IntWritable(total));
		}
	}
	
	public static void main(String[] args) throws Exception {
		Configuration conf=new Configuration();
		Job job=new Job(conf, "word_count"); // job name
		
		// set the job's main class (used to locate the jar)
		job.setJarByClass(Run.class);
		
		// set the mapper and reducer
		job.setMapperClass(WordMapper.class);
		job.setReducerClass(WordReducer.class);
		
		// set the output key and value types
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		
		// set the input and output paths
		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		
		// wait for the job to finish; exit with 0 on success, 1 on failure
		System.exit(job.waitForCompletion(true)?0:1);
	}
}

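Pay attention to the imports: the classes above come from org.apache.hadoop.mapreduce and org.apache.hadoop.io, and Eclipse may offer classes with the same names from other packages.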
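Before packaging, it can help to check the counting logic locally. The sketch below is not Hadoop code: LocalWordCount is a made-up class that imitates what WordMapper and WordReducer do together, using only java.util. It splits each line on a single space, as the mapper does, and sums a 1 per token, as the reducer does. The TreeMap only approximates the sorted order of the real reducer output.

```java
import java.util.Map;
import java.util.TreeMap;

class LocalWordCount {
    // Imitates map + reduce in one pass: split each line on a single space
    // (like WordMapper) and add 1 per token to that token's total (like WordReducer).
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> totals = new TreeMap<>(); // keeps keys sorted
        for (String line : lines) {
            for (String word : line.split(" ")) {
                totals.merge(word, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] lines = { "hello world", "hello hadoop" };
        // One "word<TAB>count" pair per line, similar to the job's text output.
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the two sample lines this prints hadoop 1, hello 2 and world 1. Note that splitting on a single space produces an empty token wherever a line has two spaces in a row, and both this sketch and the mapper above count that empty string as a word.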

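4. Package the project

Right-click the project --> Export

Give the jar a name

Select the class that contains the main method

Packaging succeeded!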

5. Copy the jar to the virtual machine; I used WinSCP for this

6. Go to the directory containing the jar and run the command

hadoop jar wc.jar ./data/in/book.txt ./data/out/wc/

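The last two arguments are, in order, the input file location and the output directory location (the output directory must not already exist!)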
