Linux version: Ubuntu 16.04 Server LTS
1. Install Linux and create the same initial user on every host (the commands below use the user yrf); the hosts are:
Lead1, Lead2, Register1, Register2, Register3, Follower1, Follower2, Follower3, Follower4, Follower5
Lead1 and Lead2 run the HA NameNodes and host the ResourceManagers
Register1, Register2, Register3 run the ZooKeeper cluster service and the qjournal (JournalNode) daemons
Follower1, Follower2, Follower3, Follower4, Follower5 run the DataNode and NodeManager daemons
2. Install the software: OpenJDK 1.8, vim, openssh-server
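On Ubuntu 16.04 these can be installed with apt (package names from the stock repositories):
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk vim openssh-server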
3. Configure passwordless SSH login:
a. ssh-keygen -t rsa
b. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
c. scp ~/.ssh/authorized_keys <hostname>:~/.ssh/authorized_keys, repeating until the authorized_keys file on every host contains the id_rsa.pub of all hosts; a compact alternative using ssh-copy-id is sketched below
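A minimal sketch of the same key distribution using ssh-copy-id, assuming the key from step a and the host list from step 1 (run once on each host; it prompts for each target's password):
for h in Lead1 Lead2 Register1 Register2 Register3 Follower1 Follower2 Follower3 Follower4 Follower5; do
  ssh-copy-id yrf@$h
done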
4. Use ifconfig to find each host's IP address, then add the address-to-hostname mappings for all hosts to /etc/hosts on every host, and remove the entries that map the hostname to 127.0.0.1 (and similar loopback aliases); an example follows below.
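A sample /etc/hosts; the addresses are placeholders and must be replaced with the real ones reported by ifconfig:
# example addresses only - replace with the output of ifconfig
192.168.1.11 Lead1
192.168.1.12 Lead2
192.168.1.21 Register1
192.168.1.22 Register2
192.168.1.23 Register3
192.168.1.31 Follower1
192.168.1.32 Follower2
192.168.1.33 Follower3
192.168.1.34 Follower4
192.168.1.35 Follower5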
5. sudo chmod 777 /opt, applied on all hosts
ZooKeeper Version: 3.4.10
1. Extract zookeeper-3.4.10 to /opt on the three hosts Register1, Register2, and Register3
2. Add /etc/profile.d/zookeeper.sh with the following content (log out and back in, or source the file, for it to take effect):
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.10
export PATH=$ZOOKEEPER_HOME/bin:$PATH
3. Copy zookeeper-3.4.10/conf/zoo_sample.cfg to zoo.cfg and change it as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# set the data directory as needed
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/log
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=Register1:2888:3888
server.2=Register2:2888:3888
server.3=Register3:2888:3888
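In addition to zoo.cfg, each of the three hosts needs a myid file in dataDir whose content matches its server.N id above, otherwise the quorum will not form. On Register1:
mkdir -p /opt/zookeeper/data
echo 1 > /opt/zookeeper/data/myid
On Register2 write 2 instead, on Register3 write 3.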
4. Start ZooKeeper
ssh yrf@Register1 << Function
zkServer.sh start
exit
Function
ssh yrf@Register2 << Function
zkServer.sh start
exit
Function
ssh yrf@Register3 << Function
zkServer.sh start
exit
Function
5. Verify that ZooKeeper started successfully
ssh yrf@Register1 << Function
zkServer.sh status
exit
Function
ssh yrf@Register2 << Function
zkServer.sh status
exit
Function
ssh yrf@Register3 << Function
zkServer.sh status
exit
Function
6. Stop ZooKeeper
ssh yrf@Register1 << Function
zkServer.sh stop
exit
Function
ssh yrf@Register2 << Function
zkServer.sh stop
exit
Function
ssh yrf@Register3 << Function
zkServer.sh stop
exit
Function
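The three per-host blocks in steps 4-6 can also be collapsed into one loop; a minimal sketch (using the full path to zkServer.sh, since a non-login ssh command does not read /etc/profile.d):
for h in Register1 Register2 Register3; do
  ssh yrf@$h /opt/zookeeper-3.4.10/bin/zkServer.sh start
done
Replace start with status or stop as needed.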
Hadoop Version: 2.7.3
1. Move the downloaded hadoop-2.7.3 to the /opt directory (do this on every host)
2. Configure the following files under hadoop-2.7.3/etc/hadoop/:
a. core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://NAMENODE/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/temp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>Register1:2181,Register2:2181,Register3:2181</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
</configuration>
b. hadoop-env.sh
Locate the line that sets JAVA_HOME and change it to the Java installation path; if Java was installed through the system package manager, this is generally export JAVA_HOME=/usr
c. hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>NAMENODE</value>
</property>
<property>
<name>dfs.ha.namenodes.NAMENODE</name>
<value>namenode1,namenode2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.NAMENODE.namenode1</name>
<value>Lead1:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.NAMENODE.namenode2</name>
<value>Lead2:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.NAMENODE.namenode1</name>
<value>Lead1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.NAMENODE.namenode2</name>
<value>Lead2:50070</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://Register1:8485;Register2:8485;Register3:8485/NAMENODE</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop/journal/data</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.NAMENODE</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>10000</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/yrf/.ssh/id_rsa</value>
</property>
</configuration>
d. mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
e. slaves
Follower1
Follower2
Follower3
Follower4
Follower5
f. yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>YARN</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>yarn1,yarn2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.yarn1</name>
<value>Lead1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.yarn2</name>
<value>Lead2</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>Register1:2181,Register2:2181,Register3:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
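The configuration files above must be identical on all hosts; a minimal sketch to push etc/hadoop from the host where it was edited to the rest (host list from step 1 of the Linux section):
for h in Lead2 Register1 Register2 Register3 Follower1 Follower2 Follower3 Follower4 Follower5; do
  scp -r /opt/hadoop-2.7.3/etc/hadoop yrf@$h:/opt/hadoop-2.7.3/etc/
done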
3. Start Hadoop:
a. First start:
ssh yrf@Register1 << Function
hadoop-daemon.sh start journalnode
exit
Function
ssh yrf@Register2 << Function
hadoop-daemon.sh start journalnode
exit
Function
ssh yrf@Register3 << Function
hadoop-daemon.sh start journalnode
exit
Function
ssh yrf@Lead1 << Function
hdfs zkfc -formatZK
hdfs namenode -format
start-dfs.sh
start-yarn.sh
exit
Function
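Note: with HDFS HA, the NameNode on Lead2 cannot start from a format done on Lead1 alone; it is usually initialized with hdfs namenode -bootstrapStandby first. A sketch in the same style, run after Lead1's NameNode is up:
ssh yrf@Lead2 << Function
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
exit
Function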
ssh yrf@Lead2 << Function
yarn-daemon.sh start resourcemanager
exit
Function
b. Regular start:
ssh yrf@Lead1 << Function
start-dfs.sh
start-yarn.sh
yarn-daemon.sh start resourcemanager
exit
Function
ssh yrf@Lead2 << Function
yarn-daemon.sh start resourcemanager
exit
Function
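To verify the HA roles after startup, query the state of each NameNode and ResourceManager (the ids come from hdfs-site.xml and yarn-site.xml above):
hdfs haadmin -getServiceState namenode1
hdfs haadmin -getServiceState namenode2
yarn rmadmin -getServiceState yarn1
yarn rmadmin -getServiceState yarn2
One of each pair should report active and the other standby; jps on each host should also show the expected daemons.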
4. Stop Hadoop:
ssh yrf@Lead1 << Function
stop-yarn.sh
exit
Function
ssh yrf@Lead2 << Function
yarn-daemon.sh stop resourcemanager
exit
Function
ssh yrf@Lead1 << Function
stop-dfs.sh
exit
Function
5. Add /etc/profile.d/hadoop.sh so that the hadoop commands can be used globally (log out and back in, or source the file, for it to take effect):
export HADOOP_HOME=/opt/hadoop-2.7.3
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH