I won't go over the principles of MapReduce here; they were covered in the previous post.
This post walks through writing a WordCount program in Java on the MapReduce model, used to count how many times each word occurs.
The required jar packages are the same as in the previous post.
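To make the data flow concrete, here is a hypothetical example (the words and counts are illustrative, not taken from the actual text.txt): given an input line hello world hello, the mapper emits (hello,1), (world,1), (hello,1); the framework then groups the values by key, and the reducer sums each group, producing hello 2 and world 1.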
Code
TokenizerMapper.java
package com.cwh.mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Get one line of text and convert it to a String
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        for (String word : words) {
            // Emit (word, 1) for every word in the line
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
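As an aside, the WordCount example that ships with Hadoop splits lines with StringTokenizer and reuses a single Text and IntWritable instance across calls, instead of allocating new objects for every word, which reduces garbage-collection pressure. A minimal sketch of that variant (it needs an extra import of java.util.StringTokenizer):

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    // Reused across map() calls to avoid allocating a new object per word
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}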
IntSumReducer.java

package com.cwh.mapreduce;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum all the counts emitted for this word
        Iterator<IntWritable> it = values.iterator();
        int count = 0;
        while (it.hasNext()) {
            count += it.next().get();
        }
        // Emit (word, total count)
        context.write(key, new IntWritable(count));
    }
}
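Because IntSumReducer just sums values, its logic is associative and commutative, so it can double as a map-side combiner that pre-aggregates counts before the shuffle and cuts network traffic. An optional one-line addition to the driver below:

// Optional: run the reducer as a combiner to pre-aggregate counts on the map side
wordCountJob.setCombinerClass(IntSumReducer.class);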
WordCount.java
package com.cwh.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job wordCountJob = Job.getInstance(conf);
        // Important: specify the jar this job lives in
        wordCountJob.setJarByClass(WordCount.class);
        // Set the mapper class for this job
        wordCountJob.setMapperClass(TokenizerMapper.class);
        // Set the reducer class for this job
        wordCountJob.setReducerClass(IntSumReducer.class);
        // Set the key/value types of the map output
        wordCountJob.setMapOutputKeyClass(Text.class);
        wordCountJob.setMapOutputValueClass(IntWritable.class);
        // Set the key/value types of the final output
        wordCountJob.setOutputKeyClass(Text.class);
        wordCountJob.setOutputValueClass(IntWritable.class);
        // Set the paths of the input text and the output directory
        FileInputFormat.setInputPaths(wordCountJob, "hdfs://192.168.27.131:9000/hdfsTest/");
        FileOutputFormat.setOutputPath(wordCountJob, new Path("hdfs://192.168.27.131:9000/hdfsTest/output/"));
        // Submit the job to the Hadoop cluster and wait for it to finish
        boolean flag = wordCountJob.waitForCompletion(true);
        if (flag) {
            System.out.println("Job succeeded!");
        } else {
            System.out.println("Job failed!");
        }
        System.exit(flag ? 0 : 1);
    }
}
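One caveat: FileOutputFormat refuses to start a job whose output directory already exists, so a second run fails with an "output directory already exists" error. A small optional guard you can place before waitForCompletion (it needs an extra import of org.apache.hadoop.fs.FileSystem):

// Optional: remove a leftover output directory so the job can be re-run
Path outputPath = new Path("hdfs://192.168.27.131:9000/hdfsTest/output/");
FileSystem fs = outputPath.getFileSystem(conf);
if (fs.exists(outputPath)) {
    fs.delete(outputPath, true); // true = delete recursively
}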
Run and test
I developed and ran this from Eclipse on Windows, so the following error shows up:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries
We just need to download
https://github.com/srccodes/hadoop-common-2.2.0-bin, extract it, configure the environment variables, and restart the machine:
Add a HADOOP_HOME variable pointing to the extracted directory
Append %HADOOP_HOME%\bin to Path
Add %HADOOP_HOME%\bin\winutils.exe; to classpath
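If you'd rather not change system environment variables (and reboot), a workaround I've seen used is to set the hadoop.home.dir system property in code before the Configuration is created; the path below is hypothetical and should point at your extracted hadoop-common-2.2.0-bin directory:

// Hypothetical path to the extracted hadoop-common-2.2.0-bin directory
System.setProperty("hadoop.home.dir", "D:\\hadoop-common-2.2.0-bin");
Configuration conf = new Configuration();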
The next run then fails with a permission error; I simply turned off HDFS permission checking.
Edit hdfs-site.xml, add the following, and restart Hadoop afterwards:
<configuration>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
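Note that newer Hadoop releases (2.x and later) renamed this property, so if dfs.permissions has no effect on your version, use dfs.permissions.enabled instead; either way, disabling permission checks is a development-only shortcut and should not be done on a shared cluster:

<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>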
The previous post uploaded a file named text.txt to the hdfsTest directory, so we can use it directly here. The content of text.txt is as follows:
After the job runs, the Hadoop web client looks like this:
Two files are generated: a _SUCCESS marker and part-r-00000, which is our result file; download and open it to view the counts:
OK! With that, we have implemented a simple WordCount.