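Writing MapReduce on Hadoop: Computing Students' Average Scores

Compute each student's average score

First, create a couple of input files on the cluster. Each line holds a student name, a subject, and a score, separated by single spaces.

  • First file: vim score.txt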

[hadoop@master mapreduce]$ cd
[hadoop@master ~]$ ls
hadoop-2.7.7.master.tar.gz hadoop-2.7.7.tar.gz

[hadoop@master ~]$ vim score.txt
linli math 95
linli chinese 90
linli english 100
liming math 78
liming chinese 86
liming english 90
me math 90
me chinese 90
me english 90
  • Second file: vim score1.txt
[hadoop@master ~]$ vim score1.txt
root math 67
root chinese 89
root english 78
hadoop math 90
hadoop chinese 93
hadoop english 89

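Once the files are written, upload them to the distributed file system (HDFS).

[hadoop@master ~]$ hadoop fs -mkdir /score   # as before, I first create a directory to hold the files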

[hadoop@master ~]$ hadoop fs -lsr /   # there are quite a few files, so I copied only the relevant entries
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hadoop supergroup 0 2020-04-19 21:38 /data
-rw-r--r-- 3 hadoop supergroup 51 2020-04-19 21:38 /data/1.txt
-rw-r--r-- 3 hadoop supergroup 53 2020-04-19 21:38 /data/2.txt
drwxr-xr-x - hadoop supergroup 0 2020-04-19 23:45 /out-jar
-rw-r--r-- 3 hadoop supergroup 0 2020-04-19 23:45 /out-jar/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 78 2020-04-19 23:45 /out-jar/part-r-00000
drwxr-xr-x - hadoop supergroup 0 2020-04-19 21:41 /out-word
-rw-r--r-- 3 hadoop supergroup 0 2020-04-19 21:41 /out-word/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 78 2020-04-19 21:41 /out-word/part-r-00000
drwxr-xr-x - hadoop supergroup 0 2020-04-20 00:29 /score

[hadoop@master ~]$ hadoop fs -put score.txt score1.txt /score/
[hadoop@master ~]$ hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hadoop supergroup 0 2020-04-19 21:38 /data
-rw-r--r-- 3 hadoop supergroup 51 2020-04-19 21:38 /data/1.txt
-rw-r--r-- 3 hadoop supergroup 53 2020-04-19 21:38 /data/2.txt
drwxr-xr-x - hadoop supergroup 0 2020-04-19 23:45 /out-jar
-rw-r--r-- 3 hadoop supergroup 0 2020-04-19 23:45 /out-jar/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 78 2020-04-19 23:45 /out-jar/part-r-00000
drwxr-xr-x - hadoop supergroup 0 2020-04-19 21:41 /out-word
-rw-r--r-- 3 hadoop supergroup 0 2020-04-19 21:41 /out-word/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 78 2020-04-19 21:41 /out-word/part-r-00000
drwxr-xr-x - hadoop supergroup 0 2020-04-20 00:30 /score
-rw-r--r-- 3 hadoop supergroup 139 2020-04-20 00:30 /score/score.txt
-rw-r--r-- 3 hadoop supergroup 96 2020-04-20 00:30 /score/score1.txt
[hadoop@master ~]$

Write the Java program Score.java

package com.hadoop.ComputerScore;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;


public class Score {

	// Mapper: emits (student name, score) for every input line
	public static class MyMapper extends Mapper<Object, Text, Text, FloatWritable>
	{
	
		@Override
		protected void map(Object key, Text value, Mapper<Object, Text, Text, FloatWritable>.Context context)
				throws IOException, InterruptedException {
			String val = value.toString();
			// Split on exactly one space. The delimiter must match the input: if you
			// put two spaces here and the text only ever has single spaces, the line
			// is not split at all, the whole line stays in vals[0], and vals[2] below
			// fails with an array-index-out-of-bounds error.
			String [] vals = val.split(" ");
			
			float sc = Float.parseFloat(vals[2]);   // third field is the score
			context.write(new Text(vals[0]), new FloatWritable(sc));
		}
	}

    // Reducer: receives one student's name together with all of that student's
    // scores, e.g. liming {90, 80}, and writes out the average
    public static class MyReducer extends Reducer<Text, FloatWritable, Text, FloatWritable>
    {

        @Override
        protected void reduce(Text key, Iterable<FloatWritable> values,
                Reducer<Text, FloatWritable, Text, FloatWritable>.Context context)
                throws IOException, InterruptedException {
            float sum = 0;   // running total of this student's scores
            int i = 0;       // number of scores seen
            for(FloatWritable value : values)
            {
                sum += value.get();
                i++;
            }
            sum = sum / i;   // average = total / count
            context.write(key, new FloatWritable(sum));
        }

    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException
    {
        Configuration conf = new Configuration();

        // Strip generic Hadoop options (such as -D key=value) and keep the rest
        String []arg = new GenericOptionsParser(conf, args).getRemainingArgs();
        if(arg.length<2)
        {
            System.err.println("Usage: Score <input path> <output path>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "score");     // create the job from the configuration
        job.setJarByClass(Score.class);     // driver class, used to locate the jar

        job.setMapperClass(MyMapper.class);     // set the Mapper class
        job.setReducerClass(MyReducer.class);   // set the Reducer class

        job.setOutputKeyClass(Text.class);      // output key type
        job.setOutputValueClass(FloatWritable.class);   // output value type

        FileInputFormat.addInputPath(job, new Path(arg[0]));    // input path
        FileOutputFormat.setOutputPath(job, new Path(arg[1]));   // output path
        System.exit(job.waitForCompletion(true)?0:1);

    }
}
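
The comment in the mapper about splitting on exactly one space is easy to check outside Hadoop. Below is a minimal sketch in plain Java; the class ScoreLineParser and its fields method are hypothetical names made up for this illustration and are not part of the original Score.java. It shows what split(" ") does when the delimiter and the input disagree, and one possible, more tolerant alternative that splits on any run of whitespace.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the original Score.java.
final class ScoreLineParser {

    /**
     * Splits a line into {name, subject, score} on any run of whitespace.
     * Returns null when the line does not have exactly three fields.
     */
    static String[] fields(String line) {
        String[] parts = line.trim().split("\\s+");
        return parts.length == 3 ? parts : null;
    }

    public static void main(String[] args) {
        // Two-space delimiter on single-spaced input: nothing is split.
        System.out.println("linli math 95".split("  ").length);   // prints 1

        // One-space delimiter on double-spaced input: an empty field appears,
        // so index 2 holds "math" instead of the score.
        System.out.println("linli  math 95".split(" ").length);   // prints 4

        // Splitting on runs of whitespace handles both inputs.
        System.out.println(Arrays.toString(fields("linli  math 95")));   // prints [linli, math, 95]
    }
}
```

Whether to adopt something like this in the mapper is a judgment call: the original split(" ") works as long as every input line uses single spaces, which is the case for score.txt and score1.txt.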

Package it as a jar and copy it to the cluster

[hadoop@master ~]$ ls
??  hadoop-2.7.7.master.tar.gz  hadoop-2.7.7.tar.gz  score1.txt  score.txt
[hadoop@master ~]$ rz
rz waiting to receive.
Starting zmodem transfer.  Press Ctrl+C to cancel.
  100%       8 KB    8 KB/s 00:00:01       0 Errors

[hadoop@master ~]$ ls
??                 hadoop-2.7.7.master.tar.gz  score1.txt
ComputerScore.jar  hadoop-2.7.7.tar.gz         score.txt

The jar was built successfully, so run the job

[hadoop@master mapreduce]$ hadoop jar ComputerScore.jar /score/ /score/out

Check the result

[hadoop@master mapreduce]$ hadoop fs -lsr /score/out/
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 3 hadoop supergroup 0 2020-04-20 03:17 /score/out/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 63 2020-04-20 03:17 /score/out/part-r-00000
[hadoop@master mapreduce]$ hadoop fs -cat /score/out/part-r-00000
hadoop 90.666664
liming 84.666664
linli 95.0
me 90.0
root 78.0
[hadoop@master mapreduce]$
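
The averages above can be cross-checked without a cluster. The following is a rough sketch in plain Java that mimics the map, group-by-key, and reduce steps in memory; the class LocalScoreAverage and its averages method are hypothetical names used only for this check, and no Hadoop classes are involved. It uses float arithmetic like the job does, so the printed values should match the FloatWritable output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local check, not part of the original job.
final class LocalScoreAverage {

    /** Groups scores by student name (sorted by key) and averages each group. */
    static Map<String, Float> averages(List<String> lines) {
        // "Map" and "shuffle": collect every score under its student name.
        Map<String, List<Float>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] vals = line.split(" ");
            grouped.computeIfAbsent(vals[0], k -> new ArrayList<>())
                   .add(Float.parseFloat(vals[2]));
        }
        // "Reduce": sum each student's scores and divide by the count.
        Map<String, Float> result = new TreeMap<>();
        for (Map.Entry<String, List<Float>> e : grouped.entrySet()) {
            float sum = 0;
            for (float s : e.getValue()) {
                sum += s;
            }
            result.put(e.getKey(), sum / e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        // The contents of score1.txt from earlier in the post.
        List<String> lines = Arrays.asList(
            "root math 67", "root chinese 89", "root english 78",
            "hadoop math 90", "hadoop chinese 93", "hadoop english 89");
        averages(lines).forEach((name, avg) -> System.out.println(name + " " + avg));
    }
}
```

For score1.txt this prints hadoop 90.666664 and root 78.0, which agrees with the two corresponding lines of part-r-00000.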

Reposted from blog.csdn.net/SartinL/article/details/105723246