Big Data Learning -- Setting up a Hadoop MapReduce development environment on Windows 7

Preface

In the previous post, "Big Data Learning -- Setting up Hadoop 2.7.3" (《大数据学习之--hadoop2.7.3环境搭建》), I built a simulated Hadoop cluster on a single laptop. This time the goal is to set up a Hadoop MapReduce development environment on Windows 7 with IDEA, so that MapReduce programs run from the IDE connect directly to the Hadoop cluster inside VMware for simulated big-data jobs.

Prerequisites

1. Install IDEA (or Eclipse) and JDK 1.8; the installation steps are not covered here.

2. Download hadoop-2.7.3 and unpack it to a local directory (mine is D:\myself\hadoop-2.7.3).

3. Download hadoop.dll and winutils.exe and put them in D:\myself\hadoop-2.7.3\bin; otherwise running a MapReduce program fails with:

2017-01-24 11:17:25,757 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(397)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
	at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
	at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
	at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2823)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at com.sky.hadoop.dfs.HadoopFileTest.main(HadoopFileTest.java:24)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

4. Set the HADOOP_HOME environment variable to D:\myself\hadoop-2.7.3.

5. Append %HADOOP_HOME%\bin to PATH. (If you prefer not to change environment variables, see the sketch after this list.)
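
As an alternative to steps 4 and 5, Hadoop's org.apache.hadoop.util.Shell also falls back to the hadoop.home.dir system property when the HADOOP_HOME environment variable is absent, so the location can be set from code instead. A minimal sketch, assuming the same unpack directory as above:

public class HadoopHomeSetup {
    public static void main(String[] args) {
        // Must run before the first Hadoop class that shells out is touched
        // (e.g. before FileSystem.get), or winutils.exe will not be found.
        System.setProperty("hadoop.home.dir", "D:\\myself\\hadoop-2.7.3");
        // ... normal Hadoop client code follows ...
    }
}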

Setting up the development environment in IDEA

The basic setup can follow this guide: http://blog.csdn.net/u011672579/article/details/54599056

Here I will focus on the problems I hit while running MapReduce programs.

First, I wrote a small program that lists and reads files on the Hadoop cluster; the code is below:

package com.sky.hadoop.dfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.net.URI;


/**
 * Created by gantianxing on 2017/1/24.
 */
public class HadoopFileTest {

    public static void main(String[] args) throws Exception {
        String uri = "hdfs://192.168.26.128:9000/";
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), config);

        // step 1: list all files and directories under /test on HDFS
        FileStatus[] statuses = fs.listStatus(new Path("/test"));
        for (FileStatus status : statuses) {
            System.out.println("xxxx:"+status);
        }

        // step 2: create a file under /test on HDFS and write one line of text
        FSDataOutputStream os = fs.create(new Path("/test/file1.log"));
        os.write("Hello World!".getBytes());
        os.flush();
        os.close();

        // step 3: print the contents of the file just created under /test
        InputStream is = fs.open(new Path("/test/file1.log"));
        IOUtils.copyBytes(is, System.out, 1024, true);

    }
}

The program is simple, with three steps: step 1 lists what is under /test on HDFS, step 2 writes a file into /test, and step 3 prints the contents of a file on HDFS. Since MapReduce job results also land in HDFS, this program doubles as a way to inspect job output.

Exception 1: permission denied

Running it locally, however, failed with:

Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=gantianxing, access=WRITE, inode="/input/test.log":root:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1728)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1712)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1695)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2515)

This says the user gantianxing (my Windows 7 user name) lacks write permission; the Hadoop user on my Linux VM is hadoop. There are two ways to fix it:

Option 1: add the JVM option -DHADOOP_USER_NAME=hadoop to the run configuration in IDEA.
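
The same effect can also be achieved in code, which avoids per-run-configuration setup. A small sketch, assuming the cluster user is hadoop as above; Hadoop's login module reads HADOOP_USER_NAME from the environment or system properties, and FileSystem.get also has an overload that takes the user directly:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

import java.net.URI;

public class HdfsAsUser {
    public static void main(String[] args) throws Exception {
        String uri = "hdfs://192.168.26.128:9000/";
        Configuration config = new Configuration();

        // Option A: set the property before the first Hadoop call; the
        // HadoopLoginModule falls back to it when no login user exists yet.
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        FileSystem fs = FileSystem.get(URI.create(uri), config);

        // Option B: pass the user explicitly and skip the global property.
        FileSystem fsAsHadoop = FileSystem.get(URI.create(uri), config, "hadoop");
    }
}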

Option 2: on the Hadoop cluster, run ./hadoop-2.7.3/bin/hadoop fs -chmod 777 / (wide-open permissions are fine on a test cluster, but not on anything shared).

For a more thorough explanation, see http://www.huqiwen.com/2013/07/18/hdfs-permission-denied/.

Exception 2: the job hangs

The log stopped at the lines below and the job made no further progress:

[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
[main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
[main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1485352260073_0004
[main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1485352260073_0004
[main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://hadoop1:8088/proxy/application_1485352260073_0004/
[main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_1485352260073_0004

Watching the logs showed that the NodeManager did not have enough memory to start the job. Raising yarn.nodemanager.resource.memory-mb in yarn-site.xml to 2048 fixed it (mine was previously 526); restart YARN afterwards so the change takes effect:

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>

Exception 3: ClassNotFoundException

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.sky.hadoop.mapreduce.WordCount$WcMapper not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
	at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.sky.hadoop.mapreduce.WordCount$WcMapper not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
	... 8 more

This exception only appears when running from IDEA on Windows; packaging a jar and running it on the Hadoop cluster works fine. But re-uploading the jar to the cluster after every code change makes debugging far too slow.

To run a MapReduce program directly from the Windows development environment, the jar path has to be set in the Configuration, like so:

Configuration conf = new Configuration();

// change this to your own jar path; per the deprecation warning in the
// log above, mapreduce.job.jar is the newer name for this key
conf.set("mapred.jar", "D:\\work\\code\\myHadoop\\out\\artifacts\\myHadoop_jar\\myHadoop.jar");

With that in place, you can run MapReduce programs straight from IDEA on Windows. The results can then be inspected with the HadoopFileTest class from earlier, without ever logging in to the Hadoop cluster.
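
For completeness, here is a minimal driver sketch showing where that jar setting sits in a full job. WcMapper and WcReducer are assumed to match the inner classes named in the stack trace above, and the input/output paths are illustrative:

package com.sky.hadoop.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

public class WordCount {

    public static class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the job at the jar built by IDEA so the remote task JVMs can
        // load WcMapper/WcReducer; omitting this causes the
        // ClassNotFoundException shown above.
        conf.set("mapred.jar", "D:\\work\\code\\myHadoop\\out\\artifacts\\myHadoop_jar\\myHadoop.jar");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/test"));
        FileOutputFormat.setOutputPath(job, new Path("/test/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}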

Reposted from moon-walker.iteye.com/blog/2354809