Hadoop Study Notes (4) -- MapReduce (WordCount)


I won't go over how MapReduce works here; that was covered in an earlier post.

In this post we'll write a WordCount program in Java on the MapReduce model, which counts how many times each word appears.

The required jar packages are the same as in the previous post.


Coding


TokenizerMapper.java

package com.cwh.mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Take one line of input text and convert it to a String
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        // Emit a (word, 1) pair for every word on the line
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
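A side note: the class name comes from Hadoop's classic WordCount example, which tokenizes with StringTokenizer rather than String.split, and reuses its Writable objects instead of allocating new ones for every word. A minimal sketch of that variant (optional; the version above works fine for this post):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    // Reused across calls to map() to avoid allocating a new object per word
    private final Text word = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // StringTokenizer splits on any whitespace, not just single spaces
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}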


IntSumReducer.java

package com.cwh.mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the 1s emitted by the mappers for this word
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        // Emit (word, total occurrences)
        context.write(key, new IntWritable(count));
    }
}
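Because IntSumReducer's input and output types are identical (Text, IntWritable) and summation is associative, the same class can also be registered as a combiner to pre-aggregate counts on the map side and shrink the shuffle. If you want that, one optional line in the driver below is enough:

// Optional: run IntSumReducer as a map-side combiner to cut shuffle traffic
wordCountJob.setCombinerClass(IntSumReducer.class);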

WordCount.java

package com.cwh.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job wordCountJob = Job.getInstance(conf);

        // Important: specify the jar this job lives in
        wordCountJob.setJarByClass(WordCount.class);

        // Which class holds the mapper logic for this job
        wordCountJob.setMapperClass(TokenizerMapper.class);
        // Which class holds the reducer logic for this job
        wordCountJob.setReducerClass(IntSumReducer.class);

        // Key/value types of the map output
        wordCountJob.setMapOutputKeyClass(Text.class);
        wordCountJob.setMapOutputValueClass(IntWritable.class);

        // Key/value types of the final output
        wordCountJob.setOutputKeyClass(Text.class);
        wordCountJob.setOutputValueClass(IntWritable.class);

        // Where the input text to process lives
        FileInputFormat.setInputPaths(wordCountJob, "hdfs://192.168.27.131:9000/hdfsTest/");
        FileOutputFormat.setOutputPath(wordCountJob, new Path("hdfs://192.168.27.131:9000/hdfsTest/output/"));

        // Submit the job to the Hadoop cluster and wait for it to finish
        boolean flag = wordCountJob.waitForCompletion(true);
        if (flag) {
            System.out.println("Job succeeded!");
        } else {
            System.out.println("Job failed!");
        }
        // Exit 0 on success, non-zero on failure
        System.exit(flag ? 0 : 1);
    }
}
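The input and output paths above are hardcoded. A common alternative is to take them from the command line with GenericOptionsParser; a sketch, assuming the two remaining arguments are the input and output paths, which would replace the two hardcoded path lines in main():

// Sketch: take the paths from the command line instead of hardcoding them.
// Requires: import org.apache.hadoop.util.GenericOptionsParser;
// Run this before Job.getInstance(conf) so generic -D options land in conf.
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: WordCount <input path> <output path>");
    System.exit(2);
}
FileInputFormat.setInputPaths(wordCountJob, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(wordCountJob, new Path(otherArgs[1]));

Packaged into a jar, the job could then be launched with something like hadoop jar wordcount.jar com.cwh.mapreduce.WordCount /hdfsTest /hdfsTest/output (the jar name here is hypothetical).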

Running and testing


I develop and run this from Eclipse on Windows, so the following error comes up:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries
We just need to download https://github.com/srccodes/hadoop-common-2.2.0-bin, unpack it, configure the environment variables as follows, and reboot the machine.

Add a HADOOP_HOME variable pointing to the unpacked directory

Then append to Path: %HADOOP_HOME%\bin

And add to classpath: %HADOOP_HOME%\bin\winutils.exe;
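If rebooting is inconvenient, a workaround often used instead (an assumption on my part, not part of the original setup) is to set the hadoop.home.dir system property at the top of main(), before the Job is created; the path below is only an example:

// Assumption: the archive was unpacked to D:\hadoop-common-2.2.0-bin,
// so winutils.exe sits at D:\hadoop-common-2.2.0-bin\bin\winutils.exe
System.setProperty("hadoop.home.dir", "D:\\hadoop-common-2.2.0-bin");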


Running it again then fails with a permission error; I simply turned off HDFS permission checking altogether.

Modify hdfs-site.xml and add the following, then restart Hadoop for it to take effect. (Note: on Hadoop 2.x the canonical property name is dfs.permissions.enabled; dfs.permissions still works as a deprecated alias.)

<configuration>
 <property>
   <name>dfs.permissions</name>
   <value>false</value>
 </property>
</configuration>
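To restart HDFS after editing the file, the stock scripts can be used (assuming a Hadoop 2.x layout, where they live in $HADOOP_HOME/sbin):

stop-dfs.sh
start-dfs.sh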


With these fixes in place, running the job writes its results to /hdfsTest/output.
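Besides the web UI, the output can also be checked straight from the command line:

hadoop fs -ls /hdfsTest/output
hadoop fs -cat /hdfsTest/output/part-r-00000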


In the previous post we uploaded a file named text.txt to the hdfsTest directory, so we can use it directly as input here; the contents of text.txt are as follows:


After the run, the Hadoop client shows the following:



You can see two files were generated; part-r-00000 is our result file, which can be downloaded and opened to inspect:
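With the default TextOutputFormat, each line of part-r-00000 is the key, a tab, then the value, sorted by key. For a hypothetical input line "hello hadoop hello hdfs" (not the actual text.txt, which is shown in the screenshot above), the file would read:

hadoop	1
hdfs	1
hello	2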



OK! With that, we've implemented a simple WordCount.
