A detailed walkthrough of installing Hadoop 2.7.1 on Linux and running WordCount

 

1. Introduction

  After finishing the Storm environment setup, I wanted to have a go at installing Hadoop as well. There are plenty of tutorials online, but none of them quite fit my situation, so I still ran into a fair number of problems during the installation and had to keep digging for answers. Everything was solved in the end, which felt great. Without further ado, let's get to the point.

  The configuration environment of this machine is as follows:

    Hadoop(2.7.1)

    Ubuntu Linux (64-bit system)

  The configuration process is explained in detail in the following steps.

2. Install the SSH service

  Open a terminal and check whether the SSH service is already installed; if not, install it with the following command:

    sudo apt-get install ssh openssh-server

  The installation itself is quick and painless.
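
  Before moving on, it is worth confirming that the SSH daemon is actually running; a quick check such as the following should work on Ubuntu (the exact service command varies slightly by release):

    # check that the ssh daemon is up
    sudo service ssh status
    # on systemd-based releases, this works too:
    # systemctl status ssh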

3. Set up passwordless SSH login

  1. Create an SSH key pair; here we use the RSA type, with the following command:

    ssh-keygen -t rsa -P ""

  2. A randomart image is printed; it is just a visualization of the key's fingerprint, so don't worry about it. Then append the public key to the list of authorized keys:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
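
  If the passwordless login in the next step still prompts for a password, it is usually a permissions problem: ssh ignores key files that are too open. Tightening them as follows generally fixes it:

    chmod 700 ~/.ssh                    # ssh refuses a world-accessible .ssh directory
    chmod 600 ~/.ssh/authorized_keys    # and overly permissive key files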

  3. You can now log in without being asked for a password:

    ssh localhost

  On success, ssh logs you in to localhost without a password prompt.

4. Download the Hadoop installation package

  There are two ways to download the Hadoop installation package:

    1. Download it directly in a browser from a mirror: http://mirrors.hust.edu.cn/apache/hadoop/core/stable/hadoop-2.7.1.tar.gz

    2. Use the shell to download, the command is as follows:

      wget http://mirrors.hust.edu.cn/apache/hadoop/core/stable/hadoop-2.7.1.tar.gz

  The second method seems to be faster. After a long wait, the download finally finished.
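
  Optionally, to make sure the archive wasn't corrupted in transit, you can compute a checksum and compare it against the value published alongside the release on the Apache site:

    # compute a checksum of the downloaded archive
    sha256sum hadoop-2.7.1.tar.gz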

5. Unpack the Hadoop installation package

  Unpack the Hadoop installation package with the following command:

    tar -zxvf hadoop-2.7.1.tar.gz

  When extraction finishes, a hadoop-2.7.1 folder appears.
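
  The commands in the remaining sections are all run from inside that folder. Optionally, you can also export HADOOP_HOME and extend PATH so the bin/ and sbin/ tools can be invoked from anywhere; a minimal sketch, assuming the archive was unpacked in your home directory:

    cd hadoop-2.7.1
    # optional convenience settings, e.g. in ~/.bashrc
    export HADOOP_HOME=$HOME/hadoop-2.7.1
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin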

6. Configure Hadoop

  The files that need to be configured are hadoop-env.sh, core-site.xml, mapred-site.xml, and hdfs-site.xml, all located under hadoop-2.7.1/etc/hadoop. The required configuration is as follows:

  1. core-site.xml is configured as follows:

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/leesf/program/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
      </property>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

  The path for hadoop.tmp.dir can be set according to your own preference.
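
  Hadoop does not always create this directory for you, so it can save trouble to create it up front (substitute whatever path you chose above):

    mkdir -p /home/leesf/program/hadoop/tmp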

  2. mapred-site.xml: Hadoop ships only a mapred-site.xml.template for this file, so copy the template to mapred-site.xml first (see the command after the snippet) and then configure it as follows:

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>
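
  The copy can be done from the hadoop-2.7.1 directory like so:

    cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

  Note that mapred.job.tracker is a property inherited from Hadoop 1.x; Hadoop 2.x largely ignores it, but it does no harm in this single-node setup.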

  3. hdfs-site.xml is configured as follows:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/leesf/program/hadoop/tmp/dfs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/leesf/program/hadoop/tmp/dfs/data</value>
      </property>
    </configuration>

  The paths for dfs.namenode.name.dir and dfs.datanode.data.dir can be chosen freely; it is best to place them under the hadoop.tmp.dir directory.

  In addition, if Hadoop complains that it cannot find the JDK when it runs, you can set the JDK path directly in hadoop-env.sh, as follows:

    export JAVA_HOME="/home/leesf/program/java/jdk1.8.0_60"
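
  If you are not sure where your JDK lives, resolving the java binary usually reveals it (the path on your machine will differ from the example above):

    # follow the symlinks to the real java binary
    readlink -f "$(which java)"
    # e.g. .../jdk1.8.0_60/jre/bin/java implies JAVA_HOME=.../jdk1.8.0_60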

7. Run Hadoop

  After the configuration is complete, run Hadoop.

  1. Initialize the HDFS system

    Use the following command in the hadoop-2.7.1 directory:

    bin/hdfs namenode -format

    Partway through, the process asks for confirmation before formatting; type Y to continue.

    When it finishes, the output ends with a shutdown message and an exit status of 0, indicating that initialization is complete.

  2. Start the NameNode and DataNode daemons

    Start them with the following command:

    sbin/start-dfs.sh

  3. View process information

    Use the following command to view the running Java processes:

    jps

    If both DataNode and NameNode appear in the listing, the daemons have started.
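
    For reference, in this pseudo-distributed setup the listing typically looks something like the following (the process IDs will differ):

    $ jps
    8243 NameNode
    8396 DataNode
    8571 SecondaryNameNode
    8689 Jps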

  4. View the web UI

    Open http://localhost:50070 in a browser to view the NameNode status and related information.

  At this point the Hadoop environment is up and running. Let's use it to run a WordCount example.

8. Run the WordCount demo

  1. Create a new input file locally. I created a document named words in the /home/leesf directory; its contents can be anything, as in the sketch below.
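
    For example, a tiny input file could be created like this (the words themselves are arbitrary):

    echo "hello hadoop hello world hadoop wordcount" > /home/leesf/words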

  2. Create a new folder in HDFS to hold the uploaded words document. Enter the following command in the hadoop-2.7.1 directory:

    bin/hdfs dfs -mkdir /test, which creates a test directory under the HDFS root

    Use the following command to view the directory structure under the HDFS root directory

    bin/hdfs dfs -ls /

    The listing confirms that a test directory now exists under the HDFS root.

  3. Upload the local words document to the test directory

    Use the following command to upload:

    bin/hdfs dfs -put /home/leesf/words /test/

    Verify the upload with the following command:

    bin/hdfs dfs -ls /test/

    The listing confirms that the local words document is now in the test directory.

  4. Run wordcount

    Run wordcount with the following command:

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test/words /test/out

    When the job completes, an output directory named out appears under /test. Use the following command to list the files in the /test directory:

    bin/hdfs dfs -ls /test

    The listing shows a directory named out under /test.

    Enter the following command to view the files in the out directory:

    bin/hdfs dfs -ls /test/out

    The directory contains a _SUCCESS marker, showing the job ran successfully, with the results saved in part-r-00000.
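
    One caveat worth knowing: MapReduce refuses to start if the output directory already exists, so when rerunning the job, delete the old output first:

    bin/hdfs dfs -rm -r /test/out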

  5. View the running results

    Use the following command to view the running result:

    bin/hadoop fs -cat /test/out/part-r-00000
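
    Each distinct word is printed together with its count. With the sample words file suggested earlier, the output would look roughly like this (words sorted alphabetically, separated from their counts by a tab):

    hadoop      2
    hello       2
    wordcount   1
    world       1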


  At this point, the whole run is complete.

9. Summary

  I ran into quite a few problems while configuring Hadoop this time; the commands in Hadoop 1.x and 2.x differ considerably, and the issues had to be worked through one by one during setup. The configuration succeeded in the end and I learned a lot along the way, so I am sharing the experience here for the convenience of anyone else who wants to set up a Hadoop environment. Questions about any part of the process are welcome; thanks for reading!

 

  The reference links are as follows:

http://www.linuxidc.com/Linux/2015-02/113487.htm

http://www.cnblogs.com/madyina/p/3708153.html

 
