Hadoop environment configuration and testing

In the previous experiments, we prepared and configured the Linux environment and unpacked Hadoop. In this experiment, we will configure and test the Hadoop environment on that basis.

Linux environment installation and configuration before Hadoop environment is built
https://blog.csdn.net/weixin_43640161/article/details/108614907
JDK software installation and configuration under Linux
https://blog.csdn.net/weixin_43640161/article/details/108619802
Master the installation and configuration of Eclipse software under Linux
https://blog.csdn.net/weixin_43640161/article/details/108691921
Familiar with Hadoop download and decompression
https://blog.csdn.net/weixin_43640161/article/details/108697510

There are three ways to install Hadoop: stand-alone mode, pseudo-distributed mode, and distributed mode.
• Stand-alone mode: Hadoop's default mode is non-distributed (local) mode, which runs as a single Java process without any further configuration and is convenient for debugging.
• Pseudo-distributed mode: Hadoop runs on a single node, with each Hadoop daemon running as a separate Java process. The node acts as both NameNode and DataNode, and it reads files from HDFS.
• Distributed mode: multiple nodes form a cluster environment to run Hadoop.
This experiment uses the single-node pseudo-distributed mode for installation.

Important knowledge tips:

  1. Hadoop can run in a pseudo-distributed manner on a single node, with each Hadoop daemon running as a separate Java process. The node acts as both NameNode and DataNode, and it reads files from HDFS.
  2. Hadoop's configuration files are located in hadoop/etc/hadoop/. Pseudo-distributed mode requires modifying five of them: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
  3. The Hadoop configuration files are in XML format; each configuration item is declared by giving the name and value of a property (see the skeleton after this list). The experiment steps are:
  4. Modify the configuration files: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
  5. Initialize the file system: hdfs namenode -format
  6. Start all processes: start-all.sh, or start-dfs.sh and start-yarn.sh
  7. Visit the web interface to view Hadoop information
  8. Run an example job
  9. Stop all processes: stop-all.sh
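
As a reference, every setting in these XML files uses the same property skeleton; the name and value below are placeholders, not real settings:

<configuration>
  <property>
    <name>some.property.name</name>
    <value>some-value</value>
  </property>
</configuration>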

Step 1: Configure the Hadoop environment (the exact edits depend on your JDK and Hadoop versions; here I use jdk1.8.0_181 and hadoop-3.1.1)

1. Configure Hadoop (pseudo-distributed) by modifying the 5 configuration files

  1. Enter the Hadoop etc directory.
    Terminal command: cd /bigdata/hadoop-3.1.1/etc/hadoop

  2. Modify the first configuration file
    Terminal command: sudo vi hadoop-env.sh

Find line 54 and modify JAVA_HOME as follows (remember to remove the # sign in front):

export JAVA_HOME=/opt/java/jdk1.8.0_181

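If your JDK is installed somewhere else, you can locate it first; the printed path minus the trailing /bin/java is what JAVA_HOME should be set to:

readlink -f "$(which java)"
# e.g. /opt/java/jdk1.8.0_181/bin/java -> JAVA_HOME=/opt/java/jdk1.8.0_181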

  3. Modify the second configuration file
    Terminal command: sudo vi core-site.xml

<configuration>
  <!-- Address of the HDFS NameNode (the master) -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>

  <!-- Directory where Hadoop stores the data it generates at runtime (not temporary data) -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/bigdata/hadoop-3.1.1/tmp</value>
  </property>
</configuration>
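
Once the Hadoop bin directory is on your PATH (or from inside /bigdata/hadoop-3.1.1/bin), you can sanity-check that this file is picked up; hdfs getconf prints the effective value of a configuration key:

hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://localhost:9000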
  4. Modify the third configuration file
    Terminal command: sudo vi hdfs-site.xml
<configuration>
  <!-- Number of replicas for data stored in HDFS -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.namenode.http-address</name>
    <value>localhost:50070</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/bigdata/hadoop-3.1.1/tmp/dfs/name</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/bigdata/hadoop-3.1.1/tmp/dfs/data</value>
  </property>
</configuration>

In addition, although pseudo-distributed mode only needs fs.defaultFS and dfs.replication to run (this is what the official tutorial does), if the hadoop.tmp.dir parameter is not configured, the default temporary directory /tmp/hadoop-hadoop is used. That directory may be cleaned up by the system on reboot, which would force you to run format again. So we set it explicitly, and we also specify dfs.namenode.name.dir and dfs.datanode.data.dir; otherwise the following steps may fail.
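
If the installation directory is owned by root (as the sudo commands in this walkthrough suggest), a minimal sketch for pre-creating these directories so the daemons can write to them (assuming you run Hadoop as your current login user) is:

cd /bigdata/hadoop-3.1.1
sudo mkdir -p tmp/dfs/name tmp/dfs/data
sudo chown -R "$USER":"$USER" tmp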

  5. Modify the fourth configuration file
    Terminal command: sudo vi mapred-site.xml
<configuration>
  <!-- Run the MapReduce programming model on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
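
Note: on Hadoop 3.x, MapReduce jobs submitted to YARN sometimes fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster". If that happens, one commonly used fix is to also declare HADOOP_MAPRED_HOME in mapred-site.xml (the value below assumes the install path used in this experiment):

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.1.1</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.1.1</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.1.1</value>
</property>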
 
  6. Modify the fifth configuration file
    Terminal command: sudo vi yarn-site.xml
<configuration>
  <!-- Specify the YARN master (the ResourceManager address) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>

  <!-- How MapReduce obtains data when performing the shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
 
  7. Initialize HDFS (format the file system)
    Terminal commands:
    cd /bigdata/hadoop-3.1.1/bin/
    sudo ./hdfs namenode -format
    (Formatting normally needs to be done only once; running it again later regenerates the NameNode's cluster ID, which can leave existing DataNode directories mismatched.)

  8. If the log ends with a message like "Storage directory ... has been successfully formatted", the formatting was successful.

Step 2: Start and test Hadoop

Terminal command:
cd /bigdata/hadoop-3.1.1/sbin/
ssh localhost
sudo ./start-dfs.sh
sudo ./start-yarn.sh
start-all.sh
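
Note: pseudo-distributed operation requires passwordless ssh to localhost. If the ssh localhost command above asks for a password, the key setup from the official single-node guide is:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys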

If start-dfs.sh aborts with an error like "ERROR: Attempting to operate on hdfs namenode as root ... but there is no HDFS_NAMENODE_USER defined", modify the following 4 files as follows.
Under the /bigdata/hadoop-3.1.1/sbin path, add the following parameters to the top of the start-dfs.sh and stop-dfs.sh files:

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Terminal command: sudo vi start-dfs.sh


Terminal command: sudo vi stop-dfs.sh


Likewise, start-yarn.sh and stop-yarn.sh need the following parameters added at the top:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Terminal command: sudo vi start-yarn.sh

Terminal command: sudo vi stop-yarn.sh

After making these modifications, run ./start-all.sh again; this time it succeeds!
In addition, if a permission-denied error occurs (for example, the daemons fail to write to the logs or tmp directories), solve it in the following way:
Terminal commands:
ssh localhost
cd /bigdata/hadoop-3.1.1/
sudo chmod -R 777 logs
sudo chmod -R 777 tmp
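
chmod -R 777 works, but it is very permissive; a tighter alternative (assuming the daemons run as your current login user) is to take ownership of the two directories instead:

sudo chown -R "$USER":"$USER" /bigdata/hadoop-3.1.1/logs /bigdata/hadoop-3.1.1/tmp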


  1. Use the jps command to check whether the processes exist. There are 5 processes in total (not counting Jps itself); the process ID numbers will differ each time you restart. If you want to shut down, you can use the stop-all.sh command.
    4327 DataNode
    4920 NodeManager
    4218 NameNode
    4474 SecondaryNameNode
    4651 ResourceManager
    5053 Jps

  2. Access the management interface of hdfs
    localhost:50070

  3. Access the YARN management interface
    localhost:8088


  4. If you click on Nodes, you will find that ubuntu:8042 (the NodeManager web interface) is also accessible.

  5. If you want to stop all services, enter sbin/stop-all.sh

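Before shutting everything down, you can also exercise the whole stack by running one of the bundled MapReduce examples (step 8 in the tips above). A minimal sketch, assuming the daemons are running and you execute it from /bigdata/hadoop-3.1.1 (the /input and /output paths are arbitrary, and /output must not exist yet):

bin/hdfs dfs -mkdir -p /input
bin/hdfs dfs -put etc/hadoop/*.xml /input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar grep /input /output 'dfs[a-z.]+'
bin/hdfs dfs -cat /output/*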

The above is the content of Hadoop environment configuration and testing. If you encounter some weird errors, you can leave a message in the comment area.
