Big Data Learning: Building a Hadoop MapReduce Development Environment on Win7

Foreword

The previous article, "Big Data Learning: Hadoop 2.7.3 Environment Construction", covered building a simulated Hadoop cluster on a laptop. This time I plan to use IDEA to set up a Hadoop MapReduce development environment on Windows 7, so that MapReduce programs run directly from IDEA can connect to the Hadoop cluster in VMware and simulate big data jobs.

 

Preparation

1. Install IDEA (or Eclipse) and JDK 1.8; the installation steps are not covered here.

2. Download Hadoop 2.7.3 and extract it to a local directory (mine is D:\myself\hadoop-2.7.3).

3. Download hadoop.dll and winutils.exe and put them in D:\myself\hadoop-2.7.3\bin; otherwise the following error is reported when a MapReduce program runs:

2017-01-24 11:17:25,757 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(397)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
	at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
	at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
	at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2823)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at com.sky.hadoop.dfs.HadoopFileTest.main(HadoopFileTest.java:24)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

 

4. Configure the environment variable HADOOP_HOME as D:\myself\hadoop-2.7.3 (an in-code alternative is sketched after this list).

5. Add %HADOOP_HOME%\bin to the PATH environment variable.
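
If you prefer not to touch the Windows environment variables, the same thing can be done in code. A minimal sketch, assuming the same local path as above: org.apache.hadoop.util.Shell consults the hadoop.home.dir system property when it looks for winutils.exe, so set it before any Hadoop class loads:

// Set at the very top of main(), before the first Hadoop API call.
// The path is an assumption; point it at your own unpacked Hadoop directory.
System.setProperty("hadoop.home.dir", "D:\\myself\\hadoop-2.7.3");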

 

Setting up the IDEA development environment

For the setup process, please refer to: http://blog.csdn.net/u011672579/article/details/54599056 

Here I mainly cover the problems I ran into while executing MapReduce programs:

First, I wrote a program that lists and reads files in the Hadoop cluster. The code is as follows:

package com.sky.hadoop.dfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.net.URI;


/**
 * Created by gantianxing on 2017/1/24.
 */
public class HadoopFileTest {

    public static void main(String args[]) throws Exception{
        String uri = "hdfs://192.168.26.128:9000/";
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), config);

        // step1: List all files and directories in the /test directory on hdfs
        FileStatus[] statuses = fs.listStatus(new Path("/test"));
        for (FileStatus status : statuses) {
            System.out.println("xxxx:"+status);
        }

        // step2: Create a file in the /test directory of hdfs and write a line of text
        FSDataOutputStream os = fs.create(new Path("/test/file1.log"));
        os.write("Hello World!".getBytes());
        os.flush();
        os.close();

        // step3: Read the file just written under /test and print its contents
        InputStream is = fs.open(new Path("/test/file1.log"));
        IOUtils.copyBytes(is, System.out, 1024, true);

    }
}

The program is very simple and consists of three steps: step 1 lists the contents of the /test directory on HDFS, step 2 writes a file into /test, and step 3 reads that file back. Since the MapReduce results of the Hadoop cluster are also stored on HDFS, this program can be used to inspect the output of MapReduce jobs.
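
For example, once a job has finished, its output lives in files like part-r-00000 under the job's output directory (Hadoop's default naming for the first reducer's output). A small sketch reusing fs from the program above; the directory /test/wc-out is a hypothetical output path:

// Print a finished job's first reducer output (paths are assumptions).
InputStream result = fs.open(new Path("/test/wc-out/part-r-00000"));
IOUtils.copyBytes(result, System.out, 1024, true);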

 

Exception 1: Permission issue

But when I ran it locally, I got the following error:

Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=gantianxing, access=WRITE, inode="/input/test.log":root:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1728)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1712)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1695)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2515)

 

This says that the user gantianxing has no write permission. gantianxing is my Win7 username, while the user running Hadoop on my Linux VM is hadoop. There are two solutions:

Option 1: Add the VM option -DHADOOP_USER_NAME=hadoop to the run configuration in IDEA (the same can be done in code, as the sketch below shows).
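
A minimal in-code sketch of the same idea: Hadoop's UserGroupInformation resolves the current user from the HADOOP_USER_NAME environment variable or system property, so set the property before the FileSystem is created:

// Must execute before FileSystem.get(...), or the Win7 username is used instead.
System.setProperty("HADOOP_USER_NAME", "hadoop");
FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.26.128:9000/"), new Configuration());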



 

Option 2: On the Hadoop cluster, execute ./hadoop-2.7.3/bin/hadoop fs -chmod 777 /

For details, see http://www.huqiwen.com/2013/07/18/hdfs-permission-denied/ , which explains the issue more thoroughly.
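
For completeness, a third route is to disable HDFS permission checking altogether, which is acceptable on a learning cluster but never in production. A sketch of the hdfs-site.xml entry (dfs.permissions.enabled is the Hadoop 2.x property name; HDFS must be restarted afterwards):

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>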

 

Exception 2: Program hangs during execution

The log showed the following, and the program hung there without making further progress:

[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
[main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
[main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1485352260073_0004
[main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1485352260073_0004
[main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://hadoop1:8088/proxy/application_1485352260073_0004/
[main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_1485352260073_0004

 Digging into the logs showed that the memory allocation was too small. Modify yarn.nodemanager.resource.memory-mb in yarn-site.xml to 2048; mine was previously 526.

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
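
Note that the NodeManagers only pick up this change after a restart; on the cluster that means something like (assuming the stock Hadoop 2.7.3 sbin scripts):

./hadoop-2.7.3/sbin/stop-yarn.sh
./hadoop-2.7.3/sbin/start-yarn.sh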

 

 Exception 3: Mapper class not found

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.sky.hadoop.mapreduce.WordCount$WcMapper not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
	at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.sky.hadoop.mapreduce.WordCount$WcMapper not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
	... 8 more

This exception only occurs when the job is launched from IDEA on Windows; uploading the jar to the Hadoop cluster and running it there works fine. But having to upload to the cluster after every code change makes debugging far too cumbersome.

 

To execute a MapReduce program directly from the Windows development environment, you need to specify the jar path in the Configuration, as follows:

Configuration conf = new Configuration();

// Change this to the path of the jar built from your project
conf.set("mapred.jar", "D:\\work\\code\\myHadoop\\out\\artifacts\\myHadoop_jar\\myHadoop.jar");

 

Once this is added, you can run your MapReduce program directly from IDEA on Windows. The results can also be viewed with the HadoopFileTest class written earlier, without logging in to the Hadoop cluster at all.

 

 
