hadoop cluster deployment
1. Prepare three machines, such as `10.8.177.23`, `10.8.177.24`, `10.8.177.25`
2. Modify the host name and configure the `hosts` file (operate under the root user):
```shell
# Run on each machine. Hostnames here start with hd- and the trailing
# number matches the last octet of that machine's IP.
hostnamectl set-hostname hd-23
hostnamectl set-hostname hd-23 --static
# Edit the hosts file
vi /etc/hosts
# Add the name mappings
10.8.177.23 hd-23
10.8.177.24 hd-24
10.8.177.25 hd-25
```
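The three `hosts` entries follow one pattern (hostname suffix = last octet of the IP), so they can be generated instead of typed by hand. A small sketch, assuming the prefix and host numbers from the example above; append its output to `/etc/hosts` as root:

```shell
# Print the /etc/hosts lines for the three machines.
# PREFIX and the octet list come from the example IPs above.
PREFIX=10.8.177
for octet in 23 24 25; do
    echo "$PREFIX.$octet hd-$octet"
done
```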
3. Create a user on each machine, such as hadoop:
```shell
useradd -d /home/hadoop -m hadoop  # create a dedicated user; do not operate directly as root
```
4. Set up password-free SSH login (==as the hadoop user, same below==)
> Only the master needs password-free login to the other two machines
```shell
# 1. Generate an ssh key pair in the home directory on the master
ssh-keygen -t rsa
# 2. Create the .ssh directory in the home directory on the other machines
#    (running the command above there also creates it)
# 3. Send the master's public key to the other two servers
#    (you will be prompted for the password here)
scp id_rsa.pub hadoop@hd-24:/home/hadoop/.ssh/id_rsa.pub.23
scp id_rsa.pub hadoop@hd-25:/home/hadoop/.ssh/id_rsa.pub.23
# 4. In .ssh on each slave, create the authorized_keys file and set its permissions
touch authorized_keys
chmod 644 authorized_keys
# 5. Append the master's public key to the authorization file
cat id_rsa.pub.23 >> authorized_keys

# After this, 23 can reach 24 and 25 without a password; test with:
ssh hd-24
```
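Steps 4 and 5 can be rehearsed locally before touching the real machines. The sketch below uses a throwaway directory and a fake key (both are stand-ins, not part of the real setup) to show the file and permission layout `sshd` expects:

```shell
# Scratch-directory sketch of steps 4-5; replace $DIR with
# /home/hadoop/.ssh on the real slaves. sshd's StrictModes rejects keys
# when .ssh or authorized_keys is group/world writable, hence the chmods.
DIR=$(mktemp -d)
# stand-in for the public key copied over with scp
echo "ssh-rsa AAAAfakekey hadoop@hd-23" > "$DIR/id_rsa.pub.23"
chmod 700 "$DIR"
touch "$DIR/authorized_keys"
chmod 600 "$DIR/authorized_keys"
# cat, not echo: we want the file's contents, not its name
cat "$DIR/id_rsa.pub.23" >> "$DIR/authorized_keys"
grep -c "hd-23" "$DIR/authorized_keys"   # prints 1 when the key landed
```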
5. Download JDK, Hadoop, HBase, and ZooKeeper
> - jdk (you can also download it manually): `wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u91-b14/jdk-8u91-linux-x64.tar.gz`
> - zookeeper-3.4.8.tar.gz: `wget http://mirrors.hust.edu.cn/apache/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz`
> - hbase-1.2.2-bin.tar.gz: `wget http://mirrors.hust.edu.cn/apache/hbase/1.2.2/hbase-1.2.2-bin.tar.gz`
> - hadoop-2.7.2.tar.gz: `wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz`
6. Unzip the above files
7. Configure environment variables
```shell
vi ~/.bashrc

JAVA_HOME=/home/hadoop/jdk1.8.0_77
JRE_HOME=$JAVA_HOME/jre
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
HADOOP_HOME=/home/hadoop/hadoop-2.7.2
HBASE_HOME=/home/hadoop/hbase-1.2.2
PATH=.:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$HBASE_HOME/sbin
export JAVA_HOME JRE_HOME CLASSPATH HADOOP_HOME HBASE_HOME PATH

# After editing, source the file so it takes effect
source ~/.bashrc
# Send it to the other machines
scp .bashrc hadoop@hd-24:/home/hadoop/
```
8. Configure hadoop
The Hadoop configuration files live under `hadoop-2.7.2/etc/hadoop`; you need to configure `core-site.xml`, `hdfs-site.xml`, `yarn-site.xml`, `mapred-site.xml`, `hadoop-env.sh`, and `slaves`.
core-site.xml
```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hd-23:6000</value>
        <final>true</final>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/configsets/hadoop_tmp</value>
    </property>
    <property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
    </property>
    <property>
        <name>fs.checkpoint.size</name>
        <value>67108864</value>
    </property>
</configuration>
```
hdfs-site.xml
```xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/configsets/metadata</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>hd-23:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hd-23:50090</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/configsets/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
```
yarn-site.xml
```xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hd-23</value>
    </property>
    <property>
        <!-- the shuffle service is a NodeManager property, not a ResourceManager one -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log.dir</name>
        <value>/home/hadoop/configsets/yarn_log</value>
    </property>
</configuration>
```
mapred-site.xml (if this file does not exist: `cp mapred-site.xml.template mapred-site.xml`)
```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.cluster.temp.dir</name>
        <value>/home/hadoop/configsets/mr_tmp</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hd-23:6002</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hd-23:6003</value>
    </property>
</configuration>
```
hadoop-env.sh: set JAVA_HOME in it
```shell
# Comment out the line below; in theory ${JAVA_HOME} should be picked up
# from the environment, but in practice it is not, so hard-code the path
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/hadoop/jdk1.8.0_77
```
Add hd-24 and hd-25 to the `slaves` file

```
hd-24
hd-25
```
9. Pack and send to other machines
jdk, hadoop, hbase, and zookeeper can all be packaged and sent this way; zookeeper differs slightly (see below for details)
```shell
tar cf hadoop-2.7.2.tar hadoop-2.7.2
scp hadoop-2.7.2.tar hadoop@hd-24:/home/hadoop
scp hadoop-2.7.2.tar hadoop@hd-25:/home/hadoop
ssh hd-24
tar xf hadoop-2.7.2.tar
exit
ssh hd-25
tar xf hadoop-2.7.2.tar
exit
```
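The same ship-and-unpack sequence repeats once per slave, so it can be driven by a loop. A hedged sketch that only *prints* each command for review (remove the `echo` prefixes to actually run them; `PKG` and `SLAVES` are the names from the example above):

```shell
# Dry run: print the scp/ssh commands for each slave instead of running
# them, so the list can be checked before execution.
PKG=hadoop-2.7.2
SLAVES="hd-24 hd-25"
for host in $SLAVES; do
    echo "scp $PKG.tar hadoop@$host:/home/hadoop"
    echo "ssh $host tar xf /home/hadoop/$PKG.tar -C /home/hadoop"
done
```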
10. Format the name node
```shell
hdfs namenode -format
```
11. Start and stop hadoop cluster
```shell
# start
start-all.sh
# stop
stop-all.sh
```
12. Check the processes with jps
```
[hadoop@hd-23 ~]$ jps
12304 QuorumPeerMain
16208 ResourceManager
24322 Jps
15843 NameNode
16042 SecondaryNameNode

[root@hd-24 home]# jps
12082 QuorumPeerMain
15116 Jps
12924 DataNode
13036 NodeManager

[hadoop@hd-25 ~]$ jps
20130 DataNode
20242 NodeManager
19317 QuorumPeerMain
21755 Jps
```
13. Browser View
```
http://hd-23:50070/
http://hd-23:8088/
```
zookeeper cluster deployment
1. Configuration, the configuration file is located in `/home/hadoop/zookeeper-3.4.8/conf`
```shell
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/zookeeper-3.4.8/data
clientPort=2181
# Pay attention to the server.{id} entries here
server.23=10.8.177.23:2181:3887
server.24=10.8.177.24:2182:3888
server.25=10.8.177.25:2183:3889
```
2. Data directory
> `zoo.cfg` defines `dataDir`; this directory must be created on each server, and inside it a `myid` file holding that server's `server.{id}` value from `zoo.cfg`
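Since each `server.{id}` above equals the last octet of that machine's IP, the `myid` value can be derived rather than remembered. A minimal sketch, with the IP hard-coded for illustration and a temp directory standing in for the real `dataDir`:

```shell
# Derive myid from the last octet of the IP, matching server.{id} in zoo.cfg.
IP=10.8.177.24            # on a real node, read this from the interface instead
ID=${IP##*.}              # strip everything through the last dot -> 24
DATADIR=$(mktemp -d)      # stands in for /home/hadoop/zookeeper-3.4.8/data
echo "$ID" > "$DATADIR/myid"
cat "$DATADIR/myid"       # prints 24
```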
```shell
mkdir /home/hadoop/zookeeper-3.4.8/data
cd /home/hadoop/zookeeper-3.4.8/data
vi myid   # file content: 23

ssh hd-24
mkdir /home/hadoop/zookeeper-3.4.8/data
cd /home/hadoop/zookeeper-3.4.8/data
vi myid   # file content: 24
exit

ssh hd-25
mkdir /home/hadoop/zookeeper-3.4.8/data
cd /home/hadoop/zookeeper-3.4.8/data
vi myid   # file content: 25
exit
```
3. Start and stop
```shell
# Run this on every node; ZooKeeper has no cluster-wide start script
cd /home/hadoop/zookeeper-3.4.8/bin
./zkServer.sh start
```
HBase deployment
1. Configure hbase-site.xml
```xml
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hd-23:6000/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hd-23,hd-24,hd-25</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/hadoop/zookeeper-3.4.8/data</value>
    </property>
</configuration>
```
2. Configure regionservers
```shell
vi regionservers
```
```
hd-23
hd-24
```
3. Send to the other machines with scp
> For details, please refer to Chapter 1, Section 9
4. Start and stop hbase
==Start hdfs before starting hbase==
```shell
start-hbase.sh
stop-hbase.sh
```
5. jps view
```
[hadoop@hd-23 bin]$ jps
12304 QuorumPeerMain
16208 ResourceManager
24592 Jps
22898 HMaster
15843 NameNode
23139 HRegionServer
16042 SecondaryNameNode

[root@hd-24 home]# jps
14512 HRegionServer
12082 QuorumPeerMain
15276 Jps
12924 DataNode
13036 NodeManager
```
6. Browser view
```
http://hd-23:16030/
```
Epilogue
Summary
With the steps above, a Hadoop environment can be built quickly. The only time you have to log in to the other two machines directly is when adding the public key file for password-free SSH; everything else is done from a single SSH client window (and in fact even that step could be made password-free). The Linux distribution used is CentOS 7; on CentOS 6.x, changing the hostname is slightly different (`/etc/sysconfig/network`, `hosts`, then `reboot`).
> Conjecture
> The purpose of building this environment is twofold:
> 1. Provide a hadoop test environment.
> 2. Pre-research for later rapid deployment with Docker. From the build process above we can see that only the content of the `myid` file in ZooKeeper's `dataDir` differs between machines; everything else is identical, and the `myid` value can be derived by reading `zoo.cfg`. So to build a multi-machine Docker cluster, as long as the Docker containers can reach each other (i.e. share a local network), a single image suffices for rapid deployment. Open vSwitch can be used to build that network across hosts, which is the goal of the next experiment.