Pseudo-distributed installation
Table of contents
Pseudo-distributed installation
1. Steps
    1. Turn off the firewall
    2. Configure the host name
    3. Configure the hosts file to map the host name and ip address
    4. Configure ssh for password-free communication
    5. Download files
    6. Configure hadoop-env.sh
    7. Configure core-site.xml
    8. Configure mapred-site.xml
    9. Configure yarn-site.xml
    10. Configure slaves
    11. Configure hadoop environment variables
    12. Format namenode
    13. Start hadoop
2. Matters needing attention
3. Frequently asked questions
1. Steps
1. Turn off the firewall
Temporary shutdown: service iptables stop
Permanent shutdown: chkconfig iptables off
2. Configure the host name
Note that hostnames in a Hadoop cluster cannot contain an underscore (_). If a hostname contains one, Hadoop will be unable to resolve that host, and the cluster will not start!
Edit the network file: vim /etc/sysconfig/network
Change the HOSTNAME attribute to the specified hostname, for example: HOSTNAME=hadoop01
Reload the network file to apply the change: source /etc/sysconfig/network
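Because an underscore in the hostname will silently break the cluster (see the note above), it is worth checking a candidate name before using it. A minimal sketch; hadoop01 is the example value from this guide:

```shell
# Hadoop hostnames must not contain underscores; check a candidate first.
hostname_candidate="hadoop01"   # example value, substitute your own
case "$hostname_candidate" in
  *_*) echo "invalid: underscore in hostname" ;;
  *)   echo "ok" ;;
esac
```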
3. Configure the hosts file to map the host name and ip address
Edit the hosts file: vim /etc/hosts
Match the host name to the ip address, for example: 10.42.3.8 hadoop01
4. Configure ssh for password-free communication
Generate a public/private key pair; the generated keys are automatically stored in the /root/.ssh directory: ssh-keygen
Copy the generated public key to the remote machine in the format: ssh-copy-id [user]@host, for example: ssh-copy-id root@hadoop01
5. Download files
- Restart Linux to make the modification of the host name take effect: reboot
- Install JDK
- Upload or download the Hadoop installation package to Linux
- Unzip the installation package: tar -xvf hadoop-2.7.1_64bit.tar.gz
- Enter the subdirectory etc/hadoop of the Hadoop installation directory to configure Hadoop: cd hadoop-2.7.1/etc/hadoop
6. Configure hadoop-env.sh
- Edit hadoop-env.sh: vim hadoop-env.sh
- Modify the path of JAVA_HOME to a specific path. For example: export JAVA_HOME=/home/software/jdk1.8
- Modify the path of HADOOP_CONF_DIR to a specific path, for example: export HADOOP_CONF_DIR=/home/software/hadoop-2.7.1/etc/hadoop
- Save and exit the file
- Reloading takes effect: source hadoop-env.sh
7. Configure core-site.xml
vim core-site.xml
<property>
<!-- Specify the master node of HDFS - the namenode -->
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:9000</value>
</property>
<property>
<!-- Specify the directory where Hadoop stores its runtime data -->
<name>hadoop.tmp.dir</name>
<value>/home/software/hadoop-2.7.1/tmp</value>
</property>
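In Hadoop's *-site.xml files, the <property> elements must sit inside a single <configuration> root element. The complete core-site.xml built from the two snippets above would look like this (paths and hostname as in the examples):

```xml
<?xml version="1.0"?>
<configuration>
    <property>
        <!-- The master node of HDFS - the namenode -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <property>
        <!-- The directory where Hadoop stores its runtime data -->
        <name>hadoop.tmp.dir</name>
        <value>/home/software/hadoop-2.7.1/tmp</value>
    </property>
</configuration>
```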
8. Configure mapred-site.xml
Copy mapred-site.xml.template to mapred-site.xml: cp mapred-site.xml.template mapred-site.xml
Edit mapred-site.xml: vim mapred-site.xml
<property>
<!-- Specify that MapReduce should run on Yarn -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
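As with the other *-site.xml files, the property above must sit inside a single <configuration> root element, so the resulting file is roughly:

```xml
<?xml version="1.0"?>
<configuration>
    <property>
        <!-- Specify that MapReduce should run on Yarn -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```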
9. Configure yarn-site.xml
Edit yarn-site.xml: vim yarn-site.xml
<!-- Specify the master node of Yarn - the resourcemanager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<!-- How the NodeManager fetches data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
10. Configure slaves
- Edit slaves: vim slaves
- Add slave node information, for example: hadoop01
- save and exit
11. Configure hadoop environment variables
- Edit profile file: vim /etc/profile
- Add Hadoop environment variables, for example:
- export HADOOP_HOME=/home/software/hadoop-2.7.1
- export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- save and exit
- Re-validation: source /etc/profile
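The effect of the profile edits can be checked without logging in again. A minimal sketch, assuming the install path /home/software/hadoop-2.7.1 used throughout this guide:

```shell
# Mirror the two profile lines, then confirm the Hadoop bin directories are on PATH.
export HADOOP_HOME=/home/software/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH ok" ;;
  *)                      echo "PATH is missing $HADOOP_HOME/bin" ;;
esac
```

Once this prints "PATH ok" in a real session, commands such as hadoop version should resolve, provided the archive was actually unpacked to that path.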
12. Format namenode: hadoop namenode -format
13. Start hadoop: start-all.sh
2. Matters needing attention
1. If the Hadoop configuration does not take effect, you need to restart Linux
2. When formatting, the output should include: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted. If this line appears, the formatting succeeded
3. If Hadoop started successfully, jps will show 5 processes: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
4. After Hadoop starts successfully, you can access the HDFS page through the browser, and the access address is: http://IP address:50070
5. After Hadoop starts successfully, you can visit the Yarn page through the browser, and the access address is: http://IP address:8088
3. Frequently asked questions
1. Executing a Hadoop command, such as formatting (hadoop namenode -format), reports a "command not found" error
Solution: check the Hadoop configuration in /etc/profile
2. Missing HDFS-related processes, such as a missing namenode or datanode
Solution: check the startup log file of the corresponding process in the logs directory under the Hadoop installation directory, then:
Method 1: ①Stop all HDFS-related processes first (stop-dfs.sh or kill -9) ②Restart HDFS (start-dfs.sh)
Method 2: ① Stop all HDFS-related processes first ② Delete the metadata directory ③ Re-format: hadoop namenode -format ④ Start Hadoop: start-all.sh
3. If a Yarn process such as the ResourceManager or NodeManager is missing, check mapred-site.xml and yarn-site.xml, then restart
4. If a command cannot be found, the hadoop-env.sh configuration or the /etc/profile configuration is wrong