Distributed deployment of Hadoop

This is a distributed installation with one master node and one slave node, on a CentOS 6 system. Hadoop is installed under the existing phq user; no dedicated hadoop user is created.


1 Configure clock synchronization

(1) Automatically synchronize time
  • Configure this on both the master and slave nodes. Open the root crontab for editing:
[root@localhost phq]# crontab -e
  • This opens a vi session; press i to enter insert mode and add the following line (sync once a day at 01:00):
0 1 * * * /usr/sbin/ntpdate cn.pool.ntp.org
(2) Manually synchronize time
  • Run the following command directly in the terminal:
[root@localhost phq]# /usr/sbin/ntpdate cn.pool.ntp.org
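The five leading fields of the crontab entry are minute, hour, day-of-month, month, and day-of-week, so the schedule above fires daily at 01:00. A small shell sketch that pulls the entry apart to confirm this:

```shell
# Split the crontab entry into its schedule fields.
# set -f stops the shell from glob-expanding the * characters.
entry='0 1 * * * /usr/sbin/ntpdate cn.pool.ntp.org'
set -f
set -- $entry
echo "minute=$1 hour=$2 day=$3 month=$4 weekday=$5"
# prints: minute=0 hour=1 day=* month=* weekday=*
```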

2 Configure the hostname

  • Use vim to edit the network file; set the hostname of the master node to master and the hostname of the slave node to slave
[root@localhost phq]# vim /etc/sysconfig/network
Modify it as follows:
NETWORKING=yes
HOSTNAME=master
NTPSERVERARGS=iburst

Apply the new hostname immediately:

[root@localhost phq]# hostname master
  • Repeat the same steps on the slave node, using the hostname slave

3 Configure the network

  • Configure static IPs on the master and slave nodes as needed. Recent VMware virtual machines already use a static IP by default, in which case no change is required.

4 Turn off firewall and selinux

  • Turn off the firewall and SELinux on both the master and slave machines
(1) Turn off the firewall
  • Execute the following command in the terminal to disable the firewall; it takes effect after a reboot
[root@master phq]# chkconfig iptables off
(2) Disable SELinux
  • Edit /etc/selinux/config, find the SELINUX line, and change it to SELINUX=disabled
[root@master phq]# vim /etc/selinux/config


# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

5 Configure hosts

  • Edit the hosts file on both the master and slave machines
[root@master phq]# vim /etc/hosts
Add the following lines:
192.168.231.131 master
192.168.231.132 slave
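When scripting this step, it is easy to end up with duplicate entries after repeated runs. A hedged sketch of an idempotent edit, run here against a scratch file (on the real machines the target would be /etc/hosts, edited as root):

```shell
# Append each mapping only if that exact line is not already present.
HOSTS=./hosts.demo                 # stand-in for /etc/hosts
printf '127.0.0.1 localhost\n' > "$HOSTS"
for entry in '192.168.231.131 master' '192.168.231.132 slave'; do
    # -x: match the whole line; -F: treat the entry as a fixed string
    grep -qxF "$entry" "$HOSTS" || echo "$entry" >> "$HOSTS"
done
cat "$HOSTS"
```

Running the loop a second time adds nothing, so the edit is safe to repeat.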

6 Install jdk

  • Install the JDK on both the master and slave nodes using rpm
[root@master software]# rpm -ivh jdk-8u91-linux-x64.rpm 
  • Configure environment variables
[root@master ~]# vim  /etc/profile
Append the following at the end:
#set java environment
export JAVA_HOME=/usr/java/jdk1.8.0_91
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

Also configure them for the normal user

[phq@master ~]$ vim .bash_profile
Append the following at the end:
#set java environment
export JAVA_HOME=/usr/java/jdk1.8.0_91
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
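The ordering in the PATH export matters: $JAVA_HOME/bin is prepended, so the rpm-installed java is found before any other java on the system. A minimal sketch of that ordering:

```shell
# Prepend JAVA_HOME/bin to PATH and confirm it is now the first entry.
JAVA_HOME=/usr/java/jdk1.8.0_91
PATH=$JAVA_HOME/bin:$PATH
echo "$PATH" | cut -d: -f1
# prints: /usr/java/jdk1.8.0_91/bin
```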

7 Configure passwordless SSH login

  • All operations in this part are performed as the phq user.
(1) Master node
  1. To generate a key in the terminal, the command is as follows (press Enter all the way to generate the key):
[phq@master ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/phq/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/phq/.ssh/id_rsa.
Your public key has been saved in /home/phq/.ssh/id_rsa.pub.
The key fingerprint is:
18:66:c2:b9:3e:88:b1:f9:d5:c5:09:e3:63:54:9a:ce phq@master
The key's randomart image is:
+--[ RSA 2048]----+
|        .        |
|   . . +         |
|    + X          |
|     X * .       |
|.   . E S        |
| = o o o         |
|+ . + .          |
| . . .           |
|  .              |
+-----------------+

  2. Append the public key to the authorized_keys file
[phq@master .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  3. Modify the authorized_keys file permissions
[phq@master .ssh]$ chmod 600 ~/.ssh/authorized_keys
  4. Copy the authorized_keys file to the slave node
[phq@master .ssh]$ scp ~/.ssh/authorized_keys phq@slave:~/
(2) Slave node
  1. To generate a key in the terminal, the command is as follows (press Enter all the way to generate the key):
[phq@slave ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/phq/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/phq/.ssh/id_rsa.
Your public key has been saved in /home/phq/.ssh/id_rsa.pub.
The key fingerprint is:
eb:6b:5a:9f:fd:5c:29:a8:4c:79:9d:bb:fb:2d:7e:8d phq@slave
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|                 |
|                 |
|        S        |
|         .. o . .|
|        oo o + oo|
|       o+.oo .E.+|
|      .oo+o .*B+.|
+-----------------+

  2. Move the master node's authorized_keys file to the .ssh directory:
[phq@slave ~]$ mv authorized_keys ~/.ssh/
  3. Modify the permissions of the authorized_keys file
[phq@slave .ssh]$ chmod 600 authorized_keys
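The chmod 600 step matters on both nodes: sshd refuses key authentication when authorized_keys is readable or writable by anyone other than its owner. The resulting mode can be checked like this (run here against a scratch file standing in for ~/.ssh/authorized_keys):

```shell
# Create a stand-in for ~/.ssh/authorized_keys, lock it down,
# and print its octal permission mode (GNU stat).
f=./authorized_keys.demo
touch "$f"
chmod 600 "$f"
stat -c '%a' "$f"
# prints: 600
```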
(3) Verify passwordless login
  • Run ssh on the master node; if a login banner like the following appears without a password prompt, the configuration succeeded
[phq@master .ssh]$ ssh slave
Last login: Thu Sep  1 16:44:16 2016 from 192.168.231.1

8 hadoop configuration deployment

  • The Hadoop configuration is essentially the same on every node: configure it on the master node, then copy it to the slave node.
(1) Extract the installation package
[phq@master ~]$ tar -xvf ~/hadoop-2.5.2.tar.gz
(2) Configure hadoop-env.sh
  • Only the JDK path needs to be configured here. Find the following at the beginning of the file:
# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}

Change it to the following:

export JAVA_HOME=/usr/java/jdk1.8.0_91
(3) Configure yarn-env.sh
  • Only the JDK path needs to be configured here. Find the following at the beginning of the file:
# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/

Change it to the following (remove the #):

# some Java parameters
export JAVA_HOME=/usr/java/jdk1.8.0_91
(4) Configure the core component core-site.xml
  • Replace the content of core-site.xml with the code below
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/phq/hadoopdata</value>
  </property>
</configuration>
(5) Configure the file system settings in hdfs-site.xml
  • Replace the content of hdfs-site.xml with the code below. Replication is set to 1 because this cluster has only a single DataNode:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
(6) Configure yarn-site.xml
  • Replace the content of yarn-site.xml with the code below
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:18141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:18088</value>
  </property>
</configuration>
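After hand-editing several XML files like these, an unclosed tag is the most common mistake. A rough shell sanity check, run here against a sample snippet, is to confirm the <property> tags are balanced (a real XML parser would be the thorough check; this is only a sketch):

```shell
# Count opening vs closing <property> tags in a sample config file.
# The fixed pattern "<property>" does not match "</property>", so the
# two counts must be equal in a well-formed file.
f=./yarn-site.sample
cat > "$f" <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:18040</value>
  </property>
</configuration>
EOF
open=$(grep -c '<property>' "$f")
close=$(grep -c '</property>' "$f")
[ "$open" -eq "$close" ] && echo "balanced: $open"
# prints: balanced: 1
```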
(7) Configure the computing framework mapred-site.xml
  1. Copy mapred-site.xml.template to mapred-site.xml
[phq@master hadoop]$ cp mapred-site.xml.template mapred-site.xml
  2. Replace the content of mapred-site.xml with the following code
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
(8) Configure the slaves file on the master node
  • Replace the content in the slaves file with the following code
slave
(9) Copy to slave node
[phq@master ~]$ scp -r hadoop-2.5.2 phq@slave:~/

9 Start the cluster

(1) Configure environment variables for hadoop startup
  • Configure this on both the master and slave nodes
[phq@master ~]$ vim .bash_profile
Append the following at the end of the file:
#HADOOP
export HADOOP_HOME=/home/phq/hadoop-2.5.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Then reload the file so the changes take effect:

[phq@master ~]$ source ~/.bash_profile
(2) Create a data directory
  • Create it on both the master and slave nodes
[phq@master ~]$ mkdir /home/phq/hadoopdata
(3) Start hadoop cluster
  1. Format the file system
  • Format HDFS on the master node
[phq@master hadoop]$ hdfs namenode -format
  • If no Exception/Error appears in the output, the format succeeded
  2. Start Hadoop
  • Start the cluster with start-all.sh
[phq@master sbin]$ ./start-all.sh
  3. View the processes
  • Run the jps command in the terminal; four processes should be visible on the master node:
[phq@master sbin]$ jps
4226 ResourceManager
4503 Jps
3898 NameNode
4077 SecondaryNameNode

Three processes should be visible on the slave node:

[phq@slave ~]$ jps
3267 Jps
3160 NodeManager
3065 DataNode
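The jps listings reflect the role split in this two-node layout: the HDFS and YARN master daemons run on master, while the worker daemons run on slave. As a quick reference, a sketch that maps the processes shown above to their nodes:

```shell
# Map each Hadoop daemon in this deployment to the node it runs on.
for p in NameNode SecondaryNameNode ResourceManager DataNode NodeManager; do
  case $p in
    NameNode|SecondaryNameNode|ResourceManager) echo "$p runs on master" ;;
    DataNode|NodeManager)                       echo "$p runs on slave"  ;;
  esac
done
```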
