CentOS7 install stand-alone Hadoop2.7.3

Preliminary preparation

  • Install Java: copy jdk-8u111-linux-x64.rpm, downloaded from the Oracle website, onto the virtual machine and install it:

    rpm -i jdk-8u111-linux-x64.rpm
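
    A quick sanity check that the JDK installed correctly (java -version ships with the JDK):

    $ java -version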
    
  • Set the Java path as an environment variable: edit /etc/profile and add the line:

    export JAVA_HOME=/usr/java/latest
    
  • Apply the JAVA_HOME environment variable with the source command and verify:

    echo $JAVA_HOME  // prints an empty path before sourcing
    source /etc/profile
    echo $JAVA_HOME  // now prints the correct JAVA_HOME
    
  • Copy hadoop-2.7.3.tar.gz to the user path.

  • Unpack the archive:

    tar -xf hadoop-2.7.3.tar.gz
    
  • This produces the hadoop-2.7.3 directory; enter it.

Run the Hadoop program in standalone mode

For details, please refer to: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SingleCluster.html

The first step is to run the Hadoop program in stand-alone mode:

  • Configure the Java path for Hadoop: edit etc/hadoop/hadoop-env.sh under the hadoop-2.7.3 directory and set the java path, as follows:

     export JAVA_HOME=/usr/java/latest
    
  • Run the MapReduce example bundled with Hadoop:

    // Create the input directory under the hadoop-2.7.3 path
    $ mkdir input
    // Copy Hadoop's configuration files into the newly created input directory
    $ cp etc/hadoop/*.xml input
    // Run the MapReduce program from Hadoop's bundled examples on the files under input and write the results to the output directory.
    // 2>>err.txt is appended because Hadoop's console output scrolls by too quickly; this redirects it into err.txt.
    $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input/ output/ 'dfs[a-z.]+' 2>>err.txt
    $ cat output/*
    
  • After execution, err.txt shows an error; the cause is not yet clear:

    EBADF:Bad file descriptor
    
  • Viewing the output directory with the cat command above shows:

    1 dfsadmin
    
  • Note that to run the MapReduce example again, you must first delete the output directory, otherwise an error stating that the output directory already exists is reported. For example:
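
    A one-liner for the local run (the later HDFS run would use bin/hdfs dfs -rm -r instead):

    $ rm -r output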

The second step is to set up pseudo-distributed mode (start HDFS)

  • According to the Apache tutorial, configure core-site.xml and hdfs-site.xml, for example as shown below.
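
    For reference, the minimal single-node configuration from the Apache tutorial (both files live under etc/hadoop; hdfs://localhost:9000 and a replication factor of 1 are the tutorial's defaults):

    <!-- core-site.xml -->
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>

    <!-- hdfs-site.xml -->
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>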

  • Install ssh and set up passwordless login to localhost, for example as shown below.
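
    The passwordless-ssh setup from the Apache tutorial (run as the user that will start Hadoop):

    $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    $ chmod 0600 ~/.ssh/authorized_keys
    # verify: this should now log in without prompting for a password
    $ ssh localhost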

  • Format the distributed file system

    $ bin/hdfs namenode -format

  • Start the distributed file system; you may need to type yes several times to confirm. This starts the namenode, datanode, and secondarynamenode.

    $ sbin/start-dfs.sh
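
    To double-check that the daemons came up, jps (shipped with the JDK) can be used:

    $ jps   # should list NameNode, DataNode and SecondaryNameNode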

  • Visit http://localhost:50070 in a browser; if the HDFS web page appears, HDFS has started successfully.

  • Create an input directory on hdfs

    bin/hdfs dfs -mkdir /input
    
  • Copy all the configuration files from etc/hadoop under the hadoop-2.7.3 installation directory to the /input directory on HDFS:

    bin/hdfs dfs -put etc/hadoop/* /input
    
  • Run the previous MapReduce example on HDFS, again redirecting the log output to err2.txt:

    $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /input /output 'dfs[a-z.]+' 2>>err2.txt
    
  • err2.txt still contains the Bad file descriptor exception; the cause remains to be investigated.

  • View the output of the MapReduce job just run:

    $ bin/hdfs dfs -cat /output/*
    
  • To stop HDFS, use:

    $ sbin/stop-dfs.sh
    

The third step is to start YARN on a single node

  • According to the official Apache tutorial, edit the two configuration files mapred-site.xml and yarn-site.xml, for example as shown below.
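
    For reference, the configuration from the Apache tutorial (both files are under etc/hadoop; mapred-site.xml may need to be created from mapred-site.xml.template first):

    <!-- mapred-site.xml -->
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

    <!-- yarn-site.xml -->
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>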

  • Start YARN with the command:

    sbin/start-yarn.sh
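
    Per the Apache tutorial, the ResourceManager web UI is then reachable at http://localhost:8088/; jps offers another quick check:

    $ jps   # should now also list ResourceManager and NodeManager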
    
  • To stop YARN, type the command:

    sbin/stop-yarn.sh
    

Summary

  • Initially only 2 GB of memory and a single-core CPU were allocated to the virtual machine, and the MapReduce example above would hang. The logs showed two kinds of exceptions, OutOfMemory and TimeOut. After increasing the CPU to 4 cores and the memory to 5 GB, the MapReduce example ran smoothly on YARN.

  • In addition, HDFS operations via the bin/hdfs dfs ... commands always feel slow; whether this can be sped up on a real server remains to be seen.

  • When Hadoop runs its bundled example in stand-alone mode, the background reports an EBADF: Bad file descriptor exception. I still don't know how to eliminate it; readers who understand this are welcome to reply with pointers. Thank you.

Origin blog.csdn.net/killingbow/article/details/54693656