Proficient in HADOOP (6) - Getting to Know Hadoop - Solving Problems/Summary

1.1 Problem solving

If you encounter problems executing the sample programs in this book, the cause is most likely a difference in the execution environment or insufficient storage space on your computer.

In that case, the settings of the following environment variables are important:

JAVA_HOME: This is the installation root path of the JDK. All sample programs assume that the JAVA_HOME environment variable points to the installation root path of JDK 1.6.0_07. It is assumed here that the JDK is installed in /usr/java/jdk1.6.0_07, so JAVA_HOME should be set as follows: export JAVA_HOME=/usr/java/jdk1.6.0_07.

HADOOP_HOME: This is the Hadoop installation root directory. You should unpack the hadoop-0.19.0.tar.gz download into the ~/src directory, so that the hadoop program is located at ~/src/hadoop-0.19.0/bin/hadoop. The HADOOP_HOME environment variable should point to the root directory of the Hadoop installation, which is ~/src/hadoop-0.19.0, so HADOOP_HOME should be set as follows: export HADOOP_HOME=~/src/hadoop-0.19.0.

PATH: The user path should contain ${JAVA_HOME}/bin and ${HADOOP_HOME}/bin, preferably as the first two elements of the PATH. Therefore, the PATH should be set as follows: export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${PATH}.
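Putting the three settings together, the fragment below could be added to a login script such as ~/.bashrc. The two install paths are the ones assumed throughout this chapter; adjust them to match your machine.

```shell
# Environment assumed by the sample programs in this chapter.
# Adjust both install paths to your own installation.
export JAVA_HOME=/usr/java/jdk1.6.0_07      # JDK installation root
export HADOOP_HOME=~/src/hadoop-0.19.0      # unpacked Hadoop tarball
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${PATH}

# Sanity check: the two bin directories should now lead the PATH.
echo "${PATH}" | cut -d: -f1-2
```

After sourcing this, `hadoop` and `java` resolve to the intended installations even if other versions are present elsewhere on the PATH.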

For Windows users, you must add C:\cygwin\bin;C:\cygwin\usr\bin to the system Path environment variable, otherwise the Hadoop core servers will not work properly. You can set environment variables through the system Control Panel. In the System Properties dialog box, click the Advanced tab, and then click the Environment Variables button. In the System Variables section of the Environment Variables dialog, select Path, click the Edit button, and append the following string:

C:\cygwin\bin;C:\cygwin\usr\bin

The semicolon ";" separates the path elements.

In addition, we generally assume that the working directory of the shell session used to execute the Hadoop sample programs is ${HADOOP_HOME}.

If you see errors similar to java.lang.OutOfMemoryError: Java heap space in the output, then your computer may not have enough RAM, or not enough memory is allocated for the Java heap. A PiEstimator run with 2 map tasks and 100 samples should complete on a JVM that provides up to 128 MB (-Xmx128m) of heap. You can set the heap size with the following command:

HADOOP_OPTS="-Xmx128m" hadoop jar hadoop-0.19.0-examples.jar pi 2 100
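Rather than prefixing every command, the heap limit can also be set once in Hadoop's environment file. The sketch below assumes the default conf/ layout of the 0.19.0 tarball; it is a config fragment, not a standalone script.

```shell
# In ${HADOOP_HOME}/conf/hadoop-env.sh:
# cap the JVM heap at 128 MB for hadoop commands launched from this install.
export HADOOP_OPTS="-Xmx128m"
```

With this in place, plain `hadoop jar hadoop-0.19.0-examples.jar pi 2 100` picks up the same heap limit automatically.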

 

1.2 Summary

The Hadoop core provides a robust framework for executing distributed computing tasks on a large number of general-purpose computers. Application developers need to write map and reduce code for their data processing and can use one of the existing input and output formats. The framework provides a rich set of input and output processors, and you can create custom input and output handlers if needed.

It takes some effort to overcome the difficulties encountered during installation. However, as more developers and organizations run into these problems, the installation process keeps improving and becomes easier and easier. Cloudera (http://www.cloudera.com) provides an RPM package that installs Hadoop automatically.

Many features and functions are still experimental. It is a good idea to join the development mailing list now (to join the core mailing list, send an email to [email protected]) and to consult the documentation at http://hadoop.apache.org/core while developing your application. I hope Hadoop brings you joy.

When you start developing your own Hadoop applications, the following chapters are sure to help you solve some of the problems you encounter.
