Hadoop distributed cluster construction
Environment
VMware virtual machines under Windows, using CentOS to build a three-node Hadoop distributed cluster.
Download the installation packages (JDK and Hadoop) in advance.
1. Create a hadoop user
useradd -m hadoop -s /bin/bash # create the new user hadoop
passwd hadoop # set a password for the hadoop user
2. Modify network information (static IP)
Modify the hosts file. If the hostname in /etc/hosts on the machine is inconsistent with the actual hostname, update /etc/hosts to match the current hostname.
sudo vi /etc/sysconfig/network # modify the hostname
sudo vi /etc/hosts # map hostnames to IP addresses
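For reference, a minimal sketch of these files, assuming the three machines are named Master, Slave1 and Slave2 and use example addresses on the 192.168.1.x subnet (adjust to your own network; on CentOS 6 a static IP is usually set in /etc/sysconfig/network-scripts/ifcfg-eth0):
# /etc/sysconfig/network (on the Master; use the matching hostname on each Slave)
NETWORKING=yes
HOSTNAME=Master
# /etc/hosts (identical on every node)
192.168.1.101 Master
192.168.1.102 Slave1
192.168.1.103 Slave2
# /etc/sysconfig/network-scripts/ifcfg-eth0 (example static IP for the Master)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.101
NETMASK=255.255.255.0
GATEWAY=192.168.1.1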
After modifying, save and exit, then restart:
reboot
3. Install SSH and configure SSH passwordless login
Use ssh-keygen to generate a key pair and add the public key to the authorized keys:
ssh-keygen -t rsa # there will be prompts, just press Enter
cd ~/.ssh
cat ./id_rsa.pub >> ./authorized_keys # add the key to the authorized list
chmod 600 ./authorized_keys # restrict the file permissions
Then, on the Master node, transfer the public key to the Slave node:
scp ~/.ssh/id_rsa.pub hadoop@Slave1:/home/hadoop/
Then, on the Slave1 node, append the key:
mkdir ~/.ssh # create the folder if it does not exist; skip if it already exists
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
rm ~/id_rsa.pub # the copied key can be deleted once it has been appended
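To verify that passwordless login works, you can try logging in from the Master (a suggested sanity check, not in the original steps):
ssh Slave1 # should log in without asking for a password
exit # return to the Master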
4. Close the firewall on the CentOS systems
sudo service iptables stop # stop the firewall service
sudo chkconfig iptables off # disable the firewall at boot, so it does not have to be stopped manually
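If the machines run CentOS 7 rather than CentOS 6 (an assumption about your environment; the steps above target CentOS 6), iptables is replaced by firewalld:
sudo systemctl stop firewalld # stop the firewall service
sudo systemctl disable firewalld # disable it at boot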
5. Install the Java environment
sudo tar -zxvf ~/downloads/jdk-7u91-linux-x64.tar.gz -C /usr/local # extract to /usr/local
Configure the JAVA_HOME environment variable:
vi ~/.bashrc
Append the following and save:
export JAVA_HOME=/usr/local/jdk1.7.0_91
export PATH=$JAVA_HOME/bin:$PATH
source ~/.bashrc # make the variable settings take effect
After setting, check that the configuration is correct:
java -version
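As an extra sanity check (a suggested verification, not part of the original steps), the following should print the configured path and the same Java version as above:
echo $JAVA_HOME # should print /usr/local/jdk1.7.0_91
$JAVA_HOME/bin/java -version # should match the output of java -version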
6. Install Hadoop 2 on the Master node
sudo tar -zxvf ~/downloads/hadoop-2.6.1.tar.gz -C /usr/local # extract to /usr/local
cd /usr/local
sudo mv hadoop-2.6.1 hadoop # rename the folder to hadoop
sudo chown -R hadoop:hadoop hadoop # change ownership to the hadoop user
Before running Hadoop, we also need to set the Hadoop environment variables:
vi ~/.bashrc
Add the following at the end of the file:
# Hadoop Environment Variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
source ~/.bashrc # make the settings take effect
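Once the variables are loaded, a quick check (a suggested verification, not in the original text) is that the hadoop command is found on the PATH:
hadoop version # should report Hadoop 2.6.1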
7. Modify and configure hadoop configuration file information
Modify the configuration file core-site.xml (vi ./etc/hadoop/core-site.xml, with paths relative to /usr/local/hadoop)
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
</configuration>
The file hdfs-site.xml: dfs.replication is generally set to 3, but it is set to 2 here to match the number of DataNode (Slave) nodes in this cluster.
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
The file mapred-site.xml (it may need to be created first, since the default file is named mapred-site.xml.template).
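A typical way to create it from the template (run inside /usr/local/hadoop/etc/hadoop; copying rather than renaming is one common choice):
cp mapred-site.xml.template mapred-site.xml
Then modify the configuration as follows: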
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>Master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Master:19888</value>
</property>
</configuration>
File yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
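In addition, the worker nodes are normally listed in etc/hadoop/slaves so that the start scripts know where to launch the DataNode and NodeManager daemons. A minimal sketch, assuming the Slave hostnames are Slave1 and Slave2 (the original text only names Slave1):
# contents of /usr/local/hadoop/etc/hadoop/slaves
Slave1
Slave2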
8. After the configuration is complete, copy the /usr/local/hadoop folder on the Master to each node. Execute the following on the Master node.
If the Master previously ran in pseudo-distributed mode, delete the files that are no longer needed first:
cd /usr/local
sudo rm -r ./hadoop/tmp # delete the Hadoop temporary files
sudo rm -r ./hadoop/logs/* # delete the log files
Pack the folder and send it to the Slave node:
tar -zcf ~/hadoop.master.tar.gz ./hadoop # compress first, then copy
scp ~/hadoop.master.tar.gz Slave1:/home/hadoop
Execute on the Slave1 node:
sudo rm -r /usr/local/hadoop # remove the old installation (if it exists)
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
sudo chown -R hadoop /usr/local/hadoop
For the first startup, you need to format the NameNode on the Master node first:
hdfs namenode -format # initialization is required only for the first run
Start the Hadoop cluster:
/usr/local/hadoop/sbin/start-all.sh
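start-all.sh is marked as deprecated in Hadoop 2.x; an equivalent, more explicit startup (including the JobHistory server configured in mapred-site.xml) would be:
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
/usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver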
Verify the Hadoop cluster with jps.
Master node:
jps
2986 SecondaryNameNode
3143 ResourceManager
2791 NameNode
3212 Jps
Slave node:
jps
1950 Jps
1831 NodeManager
1725 DataNode
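Besides jps, the DataNodes' registration with the NameNode can be checked from the Master (a suggested extra verification):
hdfs dfsadmin -report # the "Live datanodes" count should equal the number of Slave nodes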
Problems encountered and solutions
No NameNode process
- Check whether the IP shown by ifconfig and the IP in /etc/hosts are consistent.
No DataNode process
- Delete all files under hadoop/tmp, reformat HDFS, and then start Hadoop again. This method loses data and is not recommended.
This problem occurs because the clusterID in /usr/local/hadoop/tmp/dfs/data/current/VERSION is inconsistent with the clusterID in /usr/local/hadoop/tmp/dfs/name/current/VERSION; make the clusterID under data match the one under name.
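A sketch of this non-destructive fix, using the paths from this guide (the exact clusterID value is whatever appears in the NameNode's VERSION file):
cat /usr/local/hadoop/tmp/dfs/name/current/VERSION # on the Master, note the clusterID
vi /usr/local/hadoop/tmp/dfs/data/current/VERSION # on the affected Slave, set clusterID to the same value, then restart Hadoop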