The road to Java big data -- Hadoop (2): pseudo-distributed installation

Pseudo-distributed installation

Table of contents

Pseudo-distributed installation

1. Steps

1. Turn off the firewall

2. Configure the host name

3. Configure the hosts file to map the host name and ip address

4. Configure ssh for password-free communication

5. Download files

6. Configure hadoop-env.sh

7. Configure core-site.xml

8. Configure mapred-site.xml

9. Configure yarn-site.xml

10. Configure slaves

11. Configure hadoop environment variables

12. Format namenode: hadoop namenode -format

13. Start hadoop: start-all.sh

2. Matters needing attention

3. Frequently asked questions


1. Steps

1. Turn off the firewall

Temporary shutdown: service iptables stop

Permanent shutdown: chkconfig iptables off
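The whole step as one sequence on CentOS 6, with a status check added at the end to confirm the result (the check is an addition, not part of the original steps):

    # Stop iptables for the current session
    service iptables stop
    # Keep iptables from starting again on boot
    chkconfig iptables off
    # Verify: should report that the firewall is not running
    service iptables status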

2. Configure the host name

Note that hostnames in a Hadoop cluster must not contain an underscore (_). If a hostname contains an underscore, Hadoop cannot resolve that host, so the cluster will fail to start!

Edit the network file: vim /etc/sysconfig/network

Change the HOSTNAME attribute to the specified hostname, for example: HOSTNAME=hadoop01

Reload the network file so the change takes effect: source /etc/sysconfig/network
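The edited file might look like this, using the hadoop01 example above (NETWORKING=yes is normally already present):

    # /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=hadoop01

To apply the name to the running session without waiting for the reboot in step 5, you can also run hostname hadoop01.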

 

3. Configure the hosts file to map the host name and ip address

Edit the hosts file: vim /etc/hosts

Map the host name to the IP address, for example: 10.42.3.8 hadoop01
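The resulting file might look like this; 10.42.3.8 is the example address from above, and the loopback lines are normally already present:

    127.0.0.1   localhost localhost.localdomain
    ::1         localhost localhost.localdomain
    10.42.3.8   hadoop01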

4. Configure ssh for password-free communication

Generate a key pair; the generated public and private keys are stored automatically in the /root/.ssh directory: ssh-keygen

Copy the generated public key to the remote machine, in the format ssh-copy-id [user]@host, for example: ssh-copy-id root@hadoop01
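The full sequence, with a login test added at the end to confirm that no password prompt appears (the test is an addition to the original steps):

    # Generate an RSA key pair under /root/.ssh (accept the defaults)
    ssh-keygen
    # Install the public key on the target host (here, the same machine)
    ssh-copy-id root@hadoop01
    # Verify: this should log in without asking for a password
    ssh hadoop01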

5. Download files

  1. Restart Linux so the host name change takes effect: reboot
  2. Install the JDK
  3. Upload or download the Hadoop installation package to Linux
  4. Extract the installation package: tar -xvf hadoop-2.7.1_64bit.tar.gz
  5. Enter the etc/hadoop subdirectory of the Hadoop installation directory to configure Hadoop: cd hadoop-2.7.1/etc/hadoop
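Items 3-5 as a single sequence, assuming the /home/software layout used in the rest of this guide:

    cd /home/software
    # Extract the Hadoop archive
    tar -xvf hadoop-2.7.1_64bit.tar.gz
    # Enter the configuration directory
    cd hadoop-2.7.1/etc/hadoop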

6. Configure hadoop-env.sh

  1. Edit hadoop-env.sh: vim hadoop-env.sh
  2. Set JAVA_HOME to a concrete path, for example: export JAVA_HOME=/home/software/jdk1.8
  3. Set HADOOP_CONF_DIR to a concrete path, for example: export HADOOP_CONF_DIR=/home/software/hadoop-2.7.1/etc/hadoop
  4. Save and exit the file
  5. Reload the file so the changes take effect: source hadoop-env.sh
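The two edited lines, using the JDK and Hadoop paths from the examples above:

    export JAVA_HOME=/home/software/jdk1.8
    export HADOOP_CONF_DIR=/home/software/hadoop-2.7.1/etc/hadoop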

7. Configure core-site.xml

Edit core-site.xml: vim core-site.xml, and add the following inside the <configuration> element:

		<property>
		    <!-- Specify the HDFS master node - the namenode -->
		    <name>fs.defaultFS</name>
		    <value>hdfs://hadoop01:9000</value>
		</property>
		<property>
		    <!-- Specify the directory where Hadoop stores its runtime data -->
		    <name>hadoop.tmp.dir</name>
		    <value>/home/software/hadoop-2.7.1/tmp</value>
		</property>

8. Configure mapred-site.xml

Copy mapred-site.xml.template to mapred-site.xml: cp mapred-site.xml.template mapred-site.xml

Edit mapred-site.xml: vim mapred-site.xml, and add the following inside the <configuration> element:

		<property>
		    <!-- Run MapReduce on Yarn -->
		    <name>mapreduce.framework.name</name>
		    <value>yarn</value>
		</property>
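Because a pseudo-distributed cluster has only one datanode, hdfs-site.xml is commonly edited as well (vim hdfs-site.xml) to set the replication factor to 1; a minimal sketch using the standard dfs.replication property, added inside the <configuration> element:

		<property>
		    <!-- Only one datanode in pseudo-distributed mode, so keep a single replica -->
		    <name>dfs.replication</name>
		    <value>1</value>
		</property>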

9. Configure yarn-site.xml

Edit yarn-site.xml: vim yarn-site.xml, and add the following inside the <configuration> element:

		<!-- Specify the Yarn master node - the resourcemanager -->
		<property>
		    <name>yarn.resourcemanager.hostname</name>
		    <value>hadoop01</value>
		</property>
		<!-- How the NodeManager fetches data -->
		<property>
		    <name>yarn.nodemanager.aux-services</name>
		    <value>mapreduce_shuffle</value>
		</property>

10. Configure slaves

  1. Edit slaves: vim slaves
  2. Add slave node information, for example: hadoop01
  3. Save and exit
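After the edit, the slaves file contains a single line, since in pseudo-distributed mode this one host acts as its own slave node:

    hadoop01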

11. Configure hadoop environment variables

  1. Edit profile file: vim /etc/profile
  2. Add Hadoop environment variables, for example:
    1. export HADOOP_HOME=/home/software/hadoop-2.7.1      
    2. export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  3. Save and exit
  4. Reload the file so the changes take effect: source /etc/profile
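A quick check that the variables took effect; hadoop version is a standard Hadoop command, and this verification is an addition to the original steps:

    source /etc/profile
    # Should print the release, e.g. "Hadoop 2.7.1"
    hadoop version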

12. Format namenode: hadoop namenode -format

13. Start hadoop: start-all.sh
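After start-all.sh completes, jps should list the five Hadoop processes named in the notes below (the PIDs here are illustrative):

    $ jps
    2385 NameNode
    2506 DataNode
    2672 SecondaryNameNode
    2830 ResourceManager
    2933 NodeManager
    3221 Jps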

2. Matters needing attention

1. If the Hadoop configuration does not take effect, you need to restart Linux

2. During formatting, look for output like: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted (the path follows hadoop.tmp.dir). If this line appears, formatting succeeded

3. If Hadoop started successfully, jps will show 5 Hadoop processes: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager

4. After Hadoop starts successfully, you can access the HDFS page in a browser at: http://<IP address>:50070

5. After Hadoop starts successfully, you can access the Yarn page in a browser at: http://<IP address>:8088

3. Frequently asked questions

1. Running a Hadoop command, such as formatting with hadoop namenode -format, fails with a "command not found" error

Solution: check the Hadoop environment variables in /etc/profile

2. HDFS-related processes are missing, e.g. no NameNode or DataNode

Solution: check the startup log of the missing process in the logs directory under the Hadoop installation directory (see the sketch after the two methods below), then recover with one of the following:

Method 1: ① Stop all HDFS-related processes (stop-dfs.sh, or kill -9 as a last resort) ② Restart HDFS (start-dfs.sh)

Method 2: ① Stop all HDFS-related processes ② Delete the metadata directory ③ Reformat: hadoop namenode -format ④ Start Hadoop: start-all.sh
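To inspect the relevant log, go to the logs directory; daemon logs follow the standard hadoop-<user>-<process>-<hostname>.log naming pattern, so the exact file name on your machine may differ:

    cd /home/software/hadoop-2.7.1/logs
    # Example: inspect the tail of the namenode startup log
    tail -n 100 hadoop-root-namenode-hadoop01.log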

3. If the ResourceManager or NodeManager process is missing, check mapred-site.xml and yarn-site.xml, then restart

4. If a command cannot be found, the configuration in hadoop-env.sh or in /etc/profile is wrong

 
