Big Data Study Notes 03: Running Your First MapReduce Program in Eclipse

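Contents

1. Plugin installation

1.1 Basic environment preparation
1.2 Ant environment configuration
1.3 Compiling hadoop-eclipse-plugin-2.6.5.jar

2. Eclipse environment configuration

2.1 Uploading the plugin
2.2 Starting Eclipse
2.3 Setting the Hadoop installation path
2.4 Showing the Hadoop connection configuration window
2.5 Starting the Hadoop cluster and connecting to Hadoop

3. Running the MapReduce program

3.1 Creating a MapReduce project
3.2 Uploading the data
3.3 Creating the classes
3.4 Running the program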

1. Plugin installation

1.1 Basic environment preparation

1.2 Ant environment configuration

Upload the files
Unpack the archives

[root@master ~]# unzip -o -d /root hadoop2x-eclipse-plugin.zip 
[root@master ~]# unzip -o -d /root apache-ant-1.10.7-bin.zip


Configure the environment variables

[root@master ~]# vi .bash_profile

Add the following settings, save and exit, then run [root@master ~]# source .bash_profile to make them take effect.

export ANT_HOME=/root/apache-ant-1.10.7
export PATH=$JAVA_HOME/bin:$ANT_HOME/bin:$PATH


Verify Ant (for example, by running ant -version)

1.3 Compiling hadoop-eclipse-plugin-2.6.5.jar

Change to the following directory:

[root@master ~]# cd hadoop2x-eclipse-plugin/src/contrib/eclipse-plugin/

Run the following command to compile:

[root@master eclipse-plugin]# ant jar -Dversion=2.6.5 -Dhadoop.version=2.6.5 -Declipse.home=/root/eclipse -Dhadoop.home=/root/hadoop-2.6.5

Notes:
-Dhadoop.version is the Hadoop version number
-Declipse.home is the Eclipse installation path
-Dhadoop.home is the Hadoop installation path

The build may hang indefinitely at ivy-resolve-common:.
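Edit build.xml to remove the dependency on ivy:

In the compile target shown below (as it appears before editing), delete depends="init, ivy-retrieve-common", then save and exit.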

 <target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">
    <echo message="contrib: ${name}"/>
    <javac
     encoding="${build.encoding}"
     srcdir="${src.dir}"
     includes="**/*.java"
     destdir="${build.classes}"
     debug="${javac.debug}"
     deprecation="${javac.deprecation}">
     <classpath refid="classpath"/>
    </javac>
  </target>

Run the build again. This time it reports the following message: Warning: Could not find file /root/hadoop-2.6.5/share/hadoop/common/lib/commons-collections-3.2.1.jar to copy.
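Following the warning, change to that directory with [root@master eclipse-plugin]# cd /root/hadoop-2.6.5/share/hadoop/common/lib/ and check which version of the required jar is actually there.

Then change to the ivy directory with [root@master eclipse-plugin]# cd /root/hadoop2x-eclipse-plugin/ivy/ and edit libraries.properties.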

Change commons-collections.version=3.2.1 to commons-collections.version=3.2.2
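
The mismatch behind this warning can also be checked programmatically. The following is a minimal sketch and not part of the original build: it reads commons-collections.version from libraries.properties and reports whether a jar with that version exists in Hadoop's common/lib directory. The class name IvyVersionCheck and its helper methods are hypothetical; the two paths are the ones used in this article, and the jar naming pattern commons-collections-<version>.jar is taken from the warning message.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of hadoop2x-eclipse-plugin.
public class IvyVersionCheck {

  // Build the jar file name that the Ant build will try to copy.
  static String expectedJar(Properties props) {
    String version = props.getProperty("commons-collections.version");
    if (version == null) {
      throw new IllegalArgumentException("commons-collections.version is not set");
    }
    return "commons-collections-" + version.trim() + ".jar";
  }

  // Load a .properties file from disk.
  static Properties load(Path file) throws IOException {
    Properties props = new Properties();
    try (Reader reader = Files.newBufferedReader(file)) {
      props.load(reader);
    }
    return props;
  }

  public static void main(String[] args) throws IOException {
    // Paths as used in this article; adjust them for other installations.
    Path propsFile = Paths.get("/root/hadoop2x-eclipse-plugin/ivy/libraries.properties");
    Path libDir = Paths.get("/root/hadoop-2.6.5/share/hadoop/common/lib");

    if (!Files.isRegularFile(propsFile)) {
      System.err.println("Not found: " + propsFile);
      return;
    }
    String jar = expectedJar(load(propsFile));
    boolean present = Files.isRegularFile(libDir.resolve(jar));
    System.out.println(jar + (present ? " found" : " MISSING") + " in " + libDir);
  }
}
```

If the jar is reported as missing, the version in libraries.properties does not match the jar shipped with the Hadoop installation, which is the situation described above.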
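Return to /root/hadoop2x-eclipse-plugin/src/contrib/eclipse-plugin and compile again; this time the build completes successfully.

Change to the output directory with [root@master eclipse-plugin]# cd /root/hadoop2x-eclipse-plugin/build/contrib/eclipse-plugin/ to check the generated jar.

Compilation succeeded!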

2. Eclipse environment configuration

2.1 Uploading the plugin

Copy the compiled hadoop-eclipse-plugin-2.6.5.jar into Eclipse's plugins directory:

[root@master eclipse-plugin]# cp hadoop-eclipse-plugin-2.6.5.jar /root/eclipse/plugins/


2.2 Starting Eclipse


2.3 Setting the Hadoop installation path

Click Window, then Preferences, then Hadoop Map/Reduce, and set the Hadoop installation path.

2.4 Showing the Hadoop connection configuration window

Click Window -> Open Perspective -> Other -> Map/Reduce

2.5 Starting the Hadoop cluster and connecting to Hadoop

Fill in the connection details and click Finish.

If the connection is established, the Eclipse configuration has succeeded!

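3. Running the MapReduce program

The goal is to find the maximum and minimum temperature within a year (the code listed at the end of this article computes the maximum). Weather dataset download address: ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1901/

3.1 Creating a MapReduce project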


3.2 Uploading the data


3.3 Creating the classes

Create the classes MaxTemperatureWithCombiner, MaxTemperatureMapper, and MaxTemperatureReducer, copy in the source code (see the end of this article), and debug as needed.

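3.4 Running the program

Right-click MaxTemperatureWithCombiner and choose Run As -> Run Configurations. Select Java Application, click the icon in the upper-left corner, then open the Arguments tab and enter the following two paths (input first, output second).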

hdfs://master:9000/user/root/
hdfs://master:9000/user/output/

Then click Run, wait a moment, and refresh to see the job's output.
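
To make the two program arguments more concrete, here is a small sketch in plain Java (no Hadoop classes) that splits such a URI into the NameNode host, the port, and the HDFS path. The class name HdfsArgs and the describe method are hypothetical and serve only as an illustration; master and 9000 come from the arguments used in this article and have to match the cluster's own configuration.

```java
import java.net.URI;

// Hypothetical illustration; not part of the MapReduce job.
public class HdfsArgs {

  // Summarize an hdfs:// URI as "host:port path".
  static String describe(String arg) {
    URI uri = URI.create(arg);
    if (!"hdfs".equals(uri.getScheme())) {
      throw new IllegalArgumentException("Not an hdfs:// URI: " + arg);
    }
    return uri.getHost() + ":" + uri.getPort() + " " + uri.getPath();
  }

  public static void main(String[] args) {
    // The same two arguments that are entered in the Arguments tab.
    String input = "hdfs://master:9000/user/root/";
    String output = "hdfs://master:9000/user/output/";
    System.out.println("input  -> " + describe(input));
    System.out.println("output -> " + describe(output));
  }
}
```

The first argument corresponds to args[0] (input path) and the second to args[1] (output path) in MaxTemperatureWithCombiner. Note that Hadoop refuses to start a job whose output directory already exists, so /user/output has to be removed before the program is run a second time.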

The source code comes from Hadoop: The Definitive Guide, with minor adjustments.

MaxTemperatureWithCombiner.java

package cust.test;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperatureWithCombiner {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperatureWithCombiner <input path> " +
          "<output path>");
      System.exit(-1);
    }
    
    // Job.getInstance() replaces the deprecated new Job() constructor
    Job job = Job.getInstance();
    job.setJarByClass(MaxTemperatureWithCombiner.class);
    job.setJobName("Max temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    job.setMapperClass(MaxTemperatureMapper.class);
    job.setCombinerClass(MaxTemperatureReducer.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

MaxTemperatureMapper.java

package cust.test;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
  extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;
  
  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    
    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
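
The fixed character offsets in the mapper are easier to follow on a concrete record. The sketch below is a stand-alone illustration in plain Java (no Hadoop classes): it builds a synthetic 93-character line in which only the fields read by the mapper are meaningful, then applies the same extraction and filtering logic. The class name RecordParseDemo, both helper methods, and the sample values are made up for this example; real NCDC records contain many more fields.

```java
import java.util.Arrays;

// Hypothetical stand-alone illustration of the mapper's parsing logic.
public class RecordParseDemo {

  private static final int MISSING = 9999;

  // Build a synthetic record; every other position is filled with '0'.
  static String syntheticRecord(String year, String signedTemp, char quality) {
    char[] chars = new char[93];
    Arrays.fill(chars, '0');
    year.getChars(0, 4, chars, 15);        // year: offsets 15..18
    signedTemp.getChars(0, 5, chars, 87);  // sign plus 4 digits: offsets 87..91
    chars[92] = quality;                   // quality code: offset 92
    return new String(chars);
  }

  // Same extraction as MaxTemperatureMapper.map; null means the record is skipped.
  static Integer parseTemperature(String line) {
    int airTemperature;
    if (line.charAt(87) == '+') {
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    boolean valid = airTemperature != MISSING && quality.matches("[01459]");
    return valid ? airTemperature : null;
  }

  public static void main(String[] args) {
    String ok = syntheticRecord("1901", "-0078", '1');
    String missing = syntheticRecord("1901", "+9999", '1');
    String badQuality = syntheticRecord("1901", "+0022", '2');
    System.out.println(ok.substring(15, 19) + " " + parseTemperature(ok));
    System.out.println(parseTemperature(missing));
    System.out.println(parseTemperature(badQuality));
  }
}
```

Only the first record would be emitted by the mapper, as the pair (1901, -78). The other two are dropped, one because the reading equals the MISSING marker and one because its quality code is not in the accepted set.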

MaxTemperatureReducer.java

package cust.test;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
  extends Reducer<Text, IntWritable, Text, IntWritable> {
  
  @Override
  public void reduce(Text key, Iterable<IntWritable> values,
      Context context)
      throws IOException, InterruptedException {
    
    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}
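
The driver registers MaxTemperatureReducer as the combiner as well as the reducer. This is safe because taking a maximum is associative and commutative: the maximum of the per-map-task maxima equals the maximum over all values. The plain-Java sketch below (no Hadoop classes) checks this on a small example; the class name CombinerDemo and the sample temperatures are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of why max can be used as a combiner.
public class CombinerDemo {

  // Same fold as MaxTemperatureReducer.reduce, on plain integers.
  static int max(List<Integer> values) {
    int maxValue = Integer.MIN_VALUE;
    for (int value : values) {
      maxValue = Math.max(maxValue, value);
    }
    return maxValue;
  }

  public static void main(String[] args) {
    // Made-up map outputs for one year, produced by two separate map tasks.
    List<Integer> mapper1 = Arrays.asList(0, 20, 10);
    List<Integer> mapper2 = Arrays.asList(25, 15);

    // Without a combiner the reducer sees all five values.
    int direct = max(Arrays.asList(0, 20, 10, 25, 15));

    // With a combiner each map task sends only its local maximum.
    int combined = max(Arrays.asList(max(mapper1), max(mapper2)));

    System.out.println(direct + " " + combined); // prints "25 25"
  }
}
```

A mean, by contrast, could not reuse the reducer as a combiner in this way, because the mean of partial means is in general not the overall mean.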
