hadoop[11] - Local Run Mode

Packaging and uploading to the server for every debug cycle is very inefficient, so you can simulate the run locally instead. Taking the code from section 9 as an example, set the input text and output directories to local paths:

// Set the path where the input text data is stored
FileInputFormat.setInputPaths(wordCountJob, "d:/wordcount/srcdata");
// Set the path where the final output will be stored
FileOutputFormat.setOutputPath(wordCountJob, new Path("d:/wordcount/output"));

The complete code is as follows:

package com.wange;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJobSubmitter {
    public static void main(String[] args) throws Exception {
        //System.setProperty("hadoop.home.dir", "E:/soft/hadoop-2.4.1");
        Configuration config = new Configuration();
        // Whether the job runs locally essentially comes down to the two parameters below:
        // left unset, the job runs in local simulation mode; set, it is submitted to YARN
        //config.set("mapreduce.framework.name", "yarn");
        //config.set("yarn.resourcemanager.hostname", "hadoop-server-00"); // run on the remote YARN cluster (hostname only; 9000 is the HDFS port, not the ResourceManager's)
        Job wordCountJob = Job.getInstance(config);

        // Specify the jar containing this job, located via this class
        wordCountJob.setJarByClass(WordCountJobSubmitter.class);

        // Set the mapper and reducer logic classes
        wordCountJob.setMapperClass(WordCountMapper.class);
        wordCountJob.setReducerClass(WordCountReducer.class);

        // Set the key/value output types for the map and reduce stages
        wordCountJob.setMapOutputKeyClass(Text.class);
        wordCountJob.setMapOutputValueClass(IntWritable.class);
        wordCountJob.setOutputKeyClass(Text.class);
        wordCountJob.setOutputValueClass(IntWritable.class);

        // Set the path where the input text data is stored
        //FileInputFormat.setInputPaths(wordCountJob, "hdfs://hadoop-server-00:9000/wordcount/srcdata/");
        FileInputFormat.setInputPaths(wordCountJob, "d:/wordcount/srcdata");
        // Set the path where the final output will be stored
        //FileOutputFormat.setOutputPath(wordCountJob, new Path("hdfs://hadoop-server-00:9000/wordcount/output/"));
        FileOutputFormat.setOutputPath(wordCountJob, new Path("d:/wordcount/output"));

        // Submit the job and wait for it to finish; true means print progress information
        wordCountJob.waitForCompletion(true);
    }
}
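For reference, WordCountMapper and WordCountReducer are the classes from section 9 and are not repeated there in full here. A minimal sketch of what they look like (assuming the standard word-count logic from that section, with each class in its own file under com.wange):

// WordCountMapper.java
package com.wange;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split each input line on spaces and emit (word, 1) for every word
        for (String word : value.toString().split(" ")) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

// WordCountReducer.java
package com.wange;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this word
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}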

Then run the main program. You will hit a couple of small pitfalls that prevent it from running.

Pitfall 1: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.

Fix: the following jar needs to be added as a dependency:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.4.1</version>
</dependency>

Pitfall 2: Exception in thread "main" java.lang.NullPointerException

Fix: set System.setProperty("hadoop.home.dir", "E:/soft/hadoop-2.4.1"); and download the library files needed to run Hadoop on Windows: https://pan.baidu.com/s/17lkdxPTcKeWN-puLEqqXKw (extraction code: ds5k). Unzip the downloaded files into the bin directory of the local Hadoop installation, which here is E:\soft\hadoop-2.4.1\bin.
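The property has to take effect before any Hadoop class is loaded, because the NullPointerException is typically thrown from Hadoop's Shell utility when it cannot locate winutils.exe. A minimal sketch, assuming the install path above, is to put it on the first line of main():

public static void main(String[] args) throws Exception {
    // Set before touching any Hadoop class, so Hadoop's Shell utility
    // can find winutils.exe under E:/soft/hadoop-2.4.1/bin
    System.setProperty("hadoop.home.dir", "E:/soft/hadoop-2.4.1");

    Configuration config = new Configuration();
    // ... the rest of the job setup is unchanged
}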

With that, it runs perfectly. When simulating locally, you can also use HDFS paths, for example:

FileInputFormat.setInputPaths(wordCountJob, "hdfs://hadoop-server-00:9000/wordcount/srcdata/");
FileOutputFormat.setOutputPath(wordCountJob, new Path("hdfs://hadoop-server-00:9000/wordcount/output/"));

At run time you will hit a permissions problem; logging in to the HDFS server and setting the directory permissions fixes it. The command is: hadoop fs -chmod 777 /wordcount
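If you would rather not open the directory up with 777, a common client-side workaround (an addition here, not from the original post) is to have the local client identify itself as the user that owns the directory on HDFS, via the HADOOP_USER_NAME property:

// Hypothetical alternative to chmod 777: identify as the HDFS directory owner
// (assumed here to be "root") before the job is submitted
System.setProperty("HADOOP_USER_NAME", "root");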


Reposted from www.cnblogs.com/wange/p/10068307.html