Quickly build a Hadoop + HBase distributed cluster

Hadoop cluster deployment

1. Prepare three machines, such as `10.8.177.23`, `10.8.177.24`, `10.8.177.25`

2. Modify the host name and configure the `hosts` file (operate under the root user):

# Execute on each machine; hostnames here start with hd- and end with the
# last octet of that machine's IP (run the matching command on each host)
hostnamectl set-hostname hd-23
hostnamectl set-hostname hd-23 --static

# Edit the hosts file
vi /etc/hosts
# Add the hostname-to-IP mappings
10.8.177.23 hd-23
10.8.177.24 hd-24
10.8.177.25 hd-25
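The mappings can be sanity-checked with a small helper; the hostnames are this cluster's, but the function name is made up for illustration:

```shell
# Hypothetical helper: verify every expected hostname appears in a
# hosts-format file; prints the first missing name and fails, else succeeds.
check_hosts() {
  local hosts_file=$1; shift
  local h
  for h in "$@"; do
    grep -qw "$h" "$hosts_file" || { echo "missing: $h"; return 1; }
  done
  echo "all hosts present"
}

# Example: check_hosts /etc/hosts hd-23 hd-24 hd-25
```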

 

3. Create a user on each machine, such as hadoop:

useradd -d /home/hadoop -m hadoop
# Create a dedicated user; don't operate directly as root

 4. Set up passwordless SSH login (==as the hadoop user, same below==)

> Only the master needs passwordless login to the other two machines

# 1. Generate an ssh key pair in the home directory on the master
ssh-keygen -t rsa
# 2. Create the .ssh directory in the home directory on the other machines (running the command above also creates it)
# 3. Send the master's public key to the other two servers (you will be prompted for the password here)
scp id_rsa.pub hadoop@hd-24:/home/hadoop/.ssh/id_rsa.pub.23
scp id_rsa.pub hadoop@hd-25:/home/hadoop/.ssh/id_rsa.pub.23
# 4. On each of those machines, create the authorized_keys file in .ssh and set its permissions
touch authorized_keys
chmod 644 authorized_keys
# 5. Append the master's public key to the authorization file
cat id_rsa.pub.23 >> authorized_keys
# After this, 23 can reach 24 and 25 without a password; test with:
ssh hd-24
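If `ssh-copy-id` is available, steps 3 through 5 collapse into a single command per worker, since it creates the remote `.ssh` directory and `authorized_keys` with correct permissions itself. A sketch for this cluster's hostnames:

```shell
# Sketch: push the master's public key to each worker with ssh-copy-id.
push_master_key() {
  local h
  for h in hd-24 hd-25; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$h"  # prompts for the password once per host
  done
}

# Run on hd-23 as the hadoop user: push_master_key
```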

 5. Download the JDK, Hadoop, HBase, and ZooKeeper

> - jdk (or download it yourself): `wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u91-b14/jdk-8u91-linux-x64.tar.gz`
> - zookeeper-3.4.8.tar.gz: `wget http://mirrors.hust.edu.cn/apache/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz`
> - hbase-1.2.2-bin.tar.gz:`wget http://mirrors.hust.edu.cn/apache/hbase/1.2.2/hbase-1.2.2-bin.tar.gz`
> - hadoop-2.7.2.tar.gz:`wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz`

 6. Unzip the above files

7. Configure environment variables

vi ~/.bashrc
JAVA_HOME=/home/hadoop/jdk1.8.0_77  # adjust to match your unpacked JDK directory
JRE_HOME=$JAVA_HOME/jre
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

HADOOP_HOME=/home/hadoop/hadoop-2.7.2
HBASE_HOME=/home/hadoop/hbase-1.2.2
PATH=.:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$HBASE_HOME/sbin
export JAVA_HOME JRE_HOME CLASSPATH HADOOP_HOME HBASE_HOME
# After editing, source the file so the changes take effect
source ~/.bashrc

# Send to the other machines
scp .bashrc hadoop@hd-24:/home/hadoop/
scp .bashrc hadoop@hd-25:/home/hadoop/
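A quick way to confirm the variables took effect on every node (a sketch; assumes passwordless SSH from step 4 and that the software has been unpacked on each host):

```shell
# Sketch: print the Java and Hadoop versions on each node to confirm
# ~/.bashrc was distributed and sourced correctly.
check_versions() {
  local h
  for h in hd-23 hd-24 hd-25; do
    echo "== $h =="
    ssh "hadoop@$h" 'source ~/.bashrc; java -version; hadoop version' 2>&1 | head -n 4
  done
}
```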

 8. Configure hadoop

The Hadoop configuration files live under `hadoop-2.7.2/etc/hadoop`; you need to configure `core-site.xml`, `hdfs-site.xml`, `yarn-site.xml`, `mapred-site.xml`, `hadoop-env.sh`, and `slaves`.

 core-site.xml

<configuration>
 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://hd-23:6000</value>
  <final>true</final>
 </property>

 <property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/configsets/hadoop_tmp</value>
 </property>

 <property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
 </property>
 
 <property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
 </property>

</configuration>

 hdfs-site.xml 

<configuration>
 <property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hadoop/configsets/metadata</value>
 </property>
 
 <property>
  <name>dfs.http.address</name>
  <value>hd-23:50070</value>
 </property>

 <property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hd-23:50090</value>
 </property>

 <property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/configsets/data</value>
 </property>

 <property>
  <name>dfs.replication</name>
  <value>2</value>
 </property>
</configuration>

  yarn-site.xml

<configuration>

 <property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hd-23</value>
 </property>

 <property>
  <name>yarn.resourcemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>

 <property>
  <name>yarn.log.dir</name>
  <value>/home/hadoop/configsets/yarn_log</value>
 </property>
</configuration>

 mapred-site.xml (if this file does not exist, create it with `cp mapred-site.xml.template mapred-site.xml`)

<configuration>
 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>

 <property>
  <name>mapreduce.cluster.temp.dir</name>
  <value>/home/hadoop/configsets/mr_tmp</value>
  <final>true</final>
 </property>

 <property>
  <name>mapreduce.jobhistory.address</name>
  <value>hd-23:6002</value>
 </property>

 <property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hd-23:6003</value>
 </property>
</configuration>

 hadoop-env.sh: set JAVA_HOME in it

# Comment out the line below; relying on ${JAVA_HOME} from the environment does not work reliably here, so set the path explicitly
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/hadoop/jdk1.8.0_77

 Add hd-24 and hd-25 to the slaves file

hd-24
hd-25

 9. Pack and send to the other machines

The JDK, Hadoop, HBase, and ZooKeeper can all be configured locally and distributed this way; ZooKeeper differs slightly (see below)

tar cf hadoop-2.7.2.tar hadoop-2.7.2
scp hadoop-2.7.2.tar hadoop@hd-24:/home/hadoop
scp hadoop-2.7.2.tar hadoop@hd-25:/home/hadoop
ssh hd-24
tar xf hadoop-2.7.2.tar
exit
ssh hd-25
tar xf hadoop-2.7.2.tar
exit
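The pack-ship-unpack sequence above can be wrapped in one loop that streams the tarball over an SSH pipe without writing it to disk on either side (hostnames and the target directory are this cluster's assumptions):

```shell
# Sketch: tar a directory and unpack it on each worker over an ssh pipe.
ship_dir() {
  local dir=$1; shift
  local h
  for h in "$@"; do
    tar cf - "$dir" | ssh "hadoop@$h" "tar xf - -C /home/hadoop"
  done
}

# ship_dir hadoop-2.7.2 hd-24 hd-25
```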

 10. Format the name node

hdfs namenode -format

 11. Start and stop hadoop cluster

# start up
start-all.sh

#stop
stop-all.sh
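After `start-all.sh`, a minimal HDFS round-trip confirms the datanodes are actually serving (a sketch; the paths and file name are arbitrary):

```shell
# Sketch: write a small file into HDFS, read it back, and clean up.
hdfs_smoke_test() {
  echo "hello hdfs" > /tmp/smoke.txt
  hdfs dfs -mkdir -p /smoke
  hdfs dfs -put -f /tmp/smoke.txt /smoke/
  hdfs dfs -cat /smoke/smoke.txt
  hdfs dfs -rm -r -skipTrash /smoke
}
```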

 12. jps view

[hadoop@hd-23 ~]$ jps
12304 QuorumPeerMain
16208 ResourceManager
24322 Jps
15843 NameNode
16042 SecondaryNameNode

[root@hd-24 home]# jps
12082 QuorumPeerMain
15116 Jps
12924 DataNode
13036 NodeManager

[hadoop@hd-25 ~]$ jps
20130 DataNode
20242 NodeManager
19317 QuorumPeerMain
21755 Jps

 13. Browser View

http://hd-23:50070/
http://hd-23:8088/

 

zookeeper cluster deployment

1. Configuration, the configuration file is located in `/home/hadoop/zookeeper-3.4.8/conf`

 

cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/zookeeper-3.4.8/data
clientPort=2181

# Note the server.{id} form: host:quorumPort:electionPort. The quorum port
# must not clash with clientPort (2181) on the same host, so use 2888
server.23=10.8.177.23:2888:3888
server.24=10.8.177.24:2888:3888
server.25=10.8.177.25:2888:3888
 2. Data directory

 

> zoo.cfg defines dataDir; this directory must be created on every server, and inside it a myid file holding that server's {id} value from the server.{id} entries in zoo.cfg

 

mkdir /home/hadoop/zookeeper-3.4.8/data
cd /home/hadoop/zookeeper-3.4.8/data
vi myid
23

ssh hd-24
mkdir /home/hadoop/zookeeper-3.4.8/data
cd /home/hadoop/zookeeper-3.4.8/data
vi myid
24
exit

ssh hd-25
mkdir /home/hadoop/zookeeper-3.4.8/data
cd /home/hadoop/zookeeper-3.4.8/data
vi myid
25
exit
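The three near-identical blocks above can be factored into one helper where only the id argument changes per host (the function name is illustrative):

```shell
# Create the ZooKeeper data directory and write the server id into myid.
write_myid() {
  local data_dir=$1 id=$2
  mkdir -p "$data_dir"
  echo "$id" > "$data_dir/myid"
}

# e.g. on hd-24: write_myid /home/hadoop/zookeeper-3.4.8/data 24
```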
 3. Start and stop

 

 

cd /home/hadoop/zookeeper-3.4.8/bin
./zkServer.sh start
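zkServer.sh must be started on all three machines. Once it is, each node can be probed with `./zkServer.sh status`, or with the `ruok` four-letter command as sketched below (assumes `nc` is installed; a healthy node answers "imok"):

```shell
# Sketch: ask every ZooKeeper node whether it is healthy via ruok.
check_zk() {
  local h
  for h in hd-23 hd-24 hd-25; do
    printf '%s: %s\n' "$h" "$(echo ruok | nc "$h" 2181)"
  done
}
```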
 

 

HBase deployment

1. Configure hbase-site.xml

 

<configuration>
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://hd-23:6000/hbase</value>
 </property>

 <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
 </property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hd-23,hd-24,hd-25</value>
 </property>

 <property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/hadoop/zookeeper-3.4.8/data</value>
 </property>
</configuration>
 2. Configure regionservers

 

 

vi regionservers
hd-23
hd-24
 3. Send to the other machines with scp

 

> For details, refer to step 9 of the Hadoop section above

4. Start and stop hbase

==Start hdfs before starting hbase==

start-hbase.sh
stop-hbase.sh
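A quick check that the cluster accepts reads and writes, via the HBase shell (the table, family, and row names are arbitrary):

```shell
# Sketch: create a table, write one cell, scan it back, then drop the table.
hbase_smoke_test() {
  hbase shell <<'EOF'
create 'smoke', 'cf'
put 'smoke', 'r1', 'cf:c1', 'v1'
scan 'smoke'
disable 'smoke'
drop 'smoke'
EOF
}
```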

 5. jps view

[hadoop@hd-23 bin]$ jps
12304 QuorumPeerMain
16208 ResourceManager
24592 Jps
22898 HMaster
15843 NameNode
23139 HRegionServer
16042 SecondaryNameNode

[root@hd-24 home]# jps
14512 HRegionServer
12082 QuorumPeerMain
15276 Jps
12924 DataNode
13036 NodeManager

 6. Browser view

 

http://hd-23:16010/

Epilogue

Summary

With the steps above, a Hadoop environment can be stood up quickly. The only point where you need to log in to the other two machines directly is adding the master's public key while setting up passwordless SSH; everything else can be driven from a single SSH client window. The Linux distribution here is CentOS 7; on CentOS 6.x changing the hostname differs slightly (edit `/etc/sysconfig/network` and `hosts`, then `reboot`).

 

> Conjecture

 

> The purpose of building this environment is twofold:

  

> 1. Provide a hadoop test environment.

 

> 2. Pre-research for rapid deployment with Docker later on. The build process shows that, for ZooKeeper, only the content of the myid file under dataDir differs between nodes; everything else is identical, and the myid value can be derived by reading zoo.cfg. So a multi-machine Docker cluster could be deployed rapidly from a single image, as long as the containers can reach one another (i.e. share a LAN). Open vSwitch can be used to build that LAN across hosts, which is the goal of the next experiment.
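The conjecture can be sketched concretely: a container entrypoint could derive its own myid by matching the host's IP against the server.{id} lines of a shared zoo.cfg (the function name and the `hostname -i` lookup are assumptions):

```shell
# Sketch: extract this node's server id from zoo.cfg by matching its IP
# against the server.{id}=ip:port:port entries.
myid_from_cfg() {
  local cfg=$1 ip=$2
  sed -n "s/^server\.\([0-9][0-9]*\)=$ip:.*/\1/p" "$cfg"
}

# e.g. myid_from_cfg /home/hadoop/zookeeper-3.4.8/conf/zoo.cfg "$(hostname -i)"
```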

 

 
