1. Clone the virtual machine to get five copies:
1: NameNode, ResourceManager
2: SecondaryNameNode
3, 4, 5: DataNode
2. Modify the network card configuration on each machine and connect with SecureCRT.
----------------------- root user -----------------------
3. Check the time with the date command
4. Set ntpdate to run at boot
chkconfig --list
chkconfig --level 12345 ntpdate on
5. Synchronize the time with the ntpdate service
service ntpdate restart
6. Check JAVA_HOME
echo $JAVA_HOME
7. Note the files to be edited later; both are owned by root, so editing them requires root (or sudo):
-rw-r--r--. 2 root root 158 Jan 12 2010 hosts
-rw-r--r--. 1 root root 1796 Oct 2 2013 profile
8. Configure the hostname on each machine
[root@zengmg etc]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop1
Running hostname hadoop1 makes the new hostname take effect without rebooting.
9. Configure the hosts file on each machine
(Use SecureCRT's multi-window feature to send the commands to all sessions at once.)
vi /etc/hosts
192.168.18.131 hadoop1
192.168.18.132 hadoop2
192.168.18.133 hadoop3
192.168.18.134 hadoop4
192.168.18.135 hadoop5
// What if the cluster has dozens of machines? Write a shell script to distribute the file.
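As the note above suggests, the hosts file can be pushed out with a small script for larger clusters. A minimal sketch; the node list and root SSH access are assumptions, adjust to your cluster:

```shell
#!/bin/sh
# Push the local /etc/hosts to every other node in the cluster.
# NODES is an assumption: list your cluster's hostnames here.
NODES="hadoop2 hadoop3 hadoop4 hadoop5"
for node in $NODES; do
    scp /etc/hosts root@"$node":/etc/hosts \
        || echo "WARN: copy to $node failed" >&2
done
```

The same loop pattern works for any file that must be identical on every node.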
10. Turn off the firewall
chkconfig iptables off
service iptables stop
11. Create the hadoop user and set its password
adduser hadoop
passwd hadoop
12. Give the hadoop user sudo (root) privileges
vi /etc/sudoers
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
Add this line below it:
hadoop ALL=(ALL) ALL
----------------------- hadoop user -----------------------
1. Passwordless SSH
hadoop1 is the NameNode. The machines do not all need passwordless SSH to each other; it is enough that hadoop1 can reach the other machines without a password.
hadoop1 machine:
ssh-keygen -t rsa
ssh-copy-id hadoop1
ssh-copy-id hadoop2
ssh-copy-id hadoop3
ssh-copy-id hadoop4
ssh-copy-id hadoop5
Passwordless SSH is configured here for the hadoop user only. The root user was not configured, so root still cannot log in between the machines without a password: passwordless SSH takes effect per user.
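The ssh-keygen plus five ssh-copy-id calls above can be wrapped in a small script when the node list grows. A sketch, run as the hadoop user; the node names are the ones from /etc/hosts:

```shell
#!/bin/sh
# Generate a key once (skip if one already exists), then install it on
# every node, including hadoop1 itself so the start scripts can ssh to
# the local machine without a password.
mkdir -p ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for node in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5; do
    ssh-copy-id "$node" || echo "WARN: ssh-copy-id $node failed" >&2
done
```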
1. Upload the JDK and Hadoop archives
2. Create an application folder in the hadoop user's home directory. It cannot be created under / (the root directory); only root can create there:
[hadoop@hadoop1 /]$ mkdir application
mkdir: cannot create directory `application': Permission denied
3. Extract the hadoop and jdk archives into the application folder:
tar -xzvf hadoop-2.7.3.tar.gz -C application
tar -xzvf jdk-8u73-linux-x64.tar.gz -C application
4. Configure the jdk and hadoop environment variables
[hadoop@hadoop1 etc]$ sudo vi /etc/profile
export JAVA_HOME=/home/hadoop/application/jdk1.8.0_73
export HADOOP_HOME=/home/hadoop/application/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
5. Make the variable configuration take effect
source /etc/profile
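A quick sanity check that the variables took effect in the current shell; the expected paths follow from the exports above:

```shell
# Reload the profile so the new exports are visible, then confirm the
# variables point where expected.
. /etc/profile
echo "JAVA_HOME=$JAVA_HOME"      # expected: /home/hadoop/application/jdk1.8.0_73
echo "HADOOP_HOME=$HADOOP_HOME"  # expected: /home/hadoop/application/hadoop-2.7.3
```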
6. Hadoop configuration files
1) hadoop-env.sh: set JAVA_HOME in this shell configuration file
# The java implementation to use.
export JAVA_HOME=/home/hadoop/application/jdk1.8.0_73
2) core-site.xml: configure the NameNode access address and the data storage path
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop_data</value>
</property>
------ The following core-site.xml additions are only needed for beeline access to hive ------
Add to hadoop's core-site.xml:
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>hadoop</value>
</property>
// If the value is <value>*</value>, all users are allowed
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
// Here * allows proxy access from all hosts. To restrict it, list the IP addresses instead
3) hdfs-site.xml: configure the number of data block replicas (optional, since the default is already 3).
This file also configures the SecondaryNameNode:
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop2:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>hadoop2:50091</value>
</property>
4) mapred-site.xml: specify that MR runs on the YARN platform
mv mapred-site.xml.template mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
5) yarn-site.xml: specify the address of the YARN master (the ResourceManager) and the way reducers fetch data
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop1:8088</value>
</property>
This web UI port can be changed here.
6) Configure slaves
Note: slaves is the cluster configuration file that the start-*.sh startup scripts read. If it is not configured, the scripts only start the local machine (the default content is localhost).
hadoop3, 4, and 5 are the DataNodes:
hadoop3
hadoop4
hadoop5
----------------------
Note: this assumes the NameNode has already been formatted (initialized). The NameNode can be started on its own, and each DataNode started afterwards registers itself with the NameNode. (Worth studying: how start-all.sh traverses the slaves file.) To start individual daemons as needed:
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
7. scp the application folder to the other nodes
scp -r application hadoop2:/home/hadoop
.........
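The "........." stands for repeating the copy for each remaining node. A loop sketch (note the -r flag, since application is a directory; the node names are the ones from /etc/hosts):

```shell
#!/bin/sh
# Copy the application folder (jdk + hadoop) from hadoop1 to every
# other node, as the hadoop user.
for node in hadoop2 hadoop3 hadoop4 hadoop5; do
    scp -r /home/hadoop/application "$node":/home/hadoop/ \
        || echo "WARN: copy to $node failed" >&2
done
```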
8. scp /etc/profile to the other nodes
sudo scp /etc/profile hadoop2:/etc/profile
..........
9. Run source /etc/profile on each node
source /etc/profile
10. Format the NameNode (this initializes it)
hdfs namenode -format
A successful format prints:
Storage directory /home/hadoop/hadoop_data/dfs/name has been successfully formatted.
11. Start HDFS
hadoop1:
start-dfs.sh
12. Start YARN
hadoop1:
start-yarn.sh
13. Verify success
jps
Web access:
http://192.168.18.131:50070 (HDFS management interface)
http://192.168.18.131:8088/ (MR management interface)