Hadoop distributed deployment and high availability clusters

What is Hadoop?

Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is fault tolerant and designed to be deployed on low-cost hardware; it provides high-throughput access to application data and suits applications with very large data sets. HDFS relaxes some POSIX requirements so that file system data can be accessed as a stream.

Hadoop deployment

Host       IP
server1    172.25.26.11
server2    172.25.26.12
server3    172.25.26.13

1. Create the hadoop user

[root@server1 ~]# useradd -u 800 hadoop
[root@server1 ~]# passwd hadoop

2. Switch to the hadoop user and install the JDK

[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz  jdk-8u181-linux-x64.tar.gz  zookeeper-3.4.9.tar.gz
[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz 
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz  zookeeper-3.4.9.tar.gz
[hadoop@server1 ~]$ ln -s jdk1.8.0_181 java
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz  jdk1.8.0_181                zookeeper-3.4.9.tar.gz
java                 jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ cd java
[hadoop@server1 java]$ ls
bin             jre      README.html                         THIRDPARTYLICENSEREADME.txt
COPYRIGHT       lib      release
include         LICENSE  src.zip
javafx-src.zip  man      THIRDPARTYLICENSEREADME-JAVAFX.txt

Extract the JDK and create a soft link.

[hadoop@server1 ~]$ vim .bash_profile
10 PATH=$PATH:$HOME/bin:/home/hadoop/java/bin
[hadoop@server1 ~]$ jps 
1054 Jps

Modify the environment variable; once it takes effect, jps is found on the PATH.
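For the new PATH to take effect, the profile must be reloaded (or the user must log in again). A quick check, not part of the original transcript:

[hadoop@server1 ~]$ source ~/.bash_profile
[hadoop@server1 ~]$ java -version
java version "1.8.0_181"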
3. Install and configure Hadoop

[hadoop@server1 ~]$ tar zxf hadoop-3.0.3.tar.gz 
[hadoop@server1 ~]$ cd hadoop-3.0.3/etc/hadoop/
[hadoop@server1 hadoop]$ vim hadoop-env.sh 
 54 export JAVA_HOME=/home/hadoop/java


[hadoop@server1 ~]$ cd hadoop-3.0.3
[hadoop@server1 hadoop-3.0.3]$ mkdir input
[hadoop@server1 hadoop-3.0.3]$ cp etc/hadoop/*.xml input/
[hadoop@server1 hadoop-3.0.3]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'


[hadoop@server1 hadoop-3.0.3]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000  _SUCCESS
[hadoop@server1 output]$ cat *
1	dfsadmin

Build a pseudo-distributed cluster

1. Modify the configuration of Hadoop's core attributes

[hadoop@server1 hadoop-3.0.3]$ cd etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml 
 19 <configuration>
 20         <property>
 21                 <name>fs.defaultFS</name>
 22                 <value>hdfs://172.25.26.11:9000</value>
 23         </property>
 24 </configuration>


[hadoop@server1 hadoop]$ vim hdfs-site.xml
 19 <configuration>
 20         <property>
 21                 <name>dfs.replication</name>
 22                 <value>1</value>
 23         </property>
 24 </configuration>
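As a sanity check (an addition here, not in the original transcript), the effective values can be read back with hdfs getconf from the hadoop-3.0.3 directory:

[hadoop@server1 hadoop]$ cd ~/hadoop-3.0.3
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs getconf -confKey fs.defaultFS
hdfs://172.25.26.11:9000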

2. Set up passwordless SSH login

[hadoop@server1 hadoop]$ ssh-keygen


[hadoop@server1 hadoop]$ ssh-copy-id 172.25.26.11

Send the key.
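To confirm the key was installed (a check added here, not in the original), ssh to the node should now work without a password prompt:

[hadoop@server1 hadoop]$ ssh 172.25.26.11 hostname
server1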
3. Format the NameNode and start HDFS

[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop-3.0.3/etc/hadoop
[hadoop@server1 hadoop]$ vim slaves
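The file contents are in the lost screenshot; for this pseudo-distributed setup it presumably lists only the local node:

[hadoop@server1 hadoop]$ cat slaves
172.25.26.11

Note that Hadoop 3.x renamed this file to workers; if entries in slaves are not picked up, edit etc/hadoop/workers instead.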


[hadoop@server1 hadoop-3.0.3]$ pwd
/home/hadoop/hadoop-3.0.3
[hadoop@server1 hadoop-3.0.3]$  bin/hdfs namenode -format


[hadoop@server1 hadoop-3.0.3]$ sbin/start-dfs.sh 
[hadoop@server1 hadoop-3.0.3]$ jps

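The listing is in the lost screenshot; on a pseudo-distributed node, jps would be expected to show something like this (PIDs will differ):

[hadoop@server1 hadoop-3.0.3]$ jps
XXXX NameNode
XXXX DataNode
XXXX SecondaryNameNode
XXXX Jps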
View in the browser at http://172.25.26.11:9870 (the default NameNode web UI port in Hadoop 3.x).

[hadoop@server1 hadoop-3.0.3]$  bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs dfs -ls /user
[hadoop@server1 hadoop-3.0.3]$  bin/hdfs dfs -put input/

Create the directories and upload the contents of the input directory.
In the web UI, click Utilities, then Browse the file system, to view the contents.

[hadoop@server1 hadoop-3.0.3]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'
[hadoop@server1 hadoop-3.0.3]$  bin/hdfs dfs -ls
[hadoop@server1 hadoop-3.0.3]$  bin/hdfs dfs -cat output/*
[hadoop@server1 hadoop-3.0.3]$  bin/hdfs dfs -get output

Hadoop fully distributed cluster deployment

On server1, stop the service and switch back to the superuser.
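The exact commands are in the lost screenshot; presumably something like:

[hadoop@server1 hadoop-3.0.3]$ sbin/stop-dfs.sh
[hadoop@server1 hadoop-3.0.3]$ exit
[root@server1 ~]#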

Install nfs-utils on server1, server2, and server3. On server1, export the hadoop home directory read-write, mapping anonymous access to uid/gid 800 (the hadoop user):

[root@server1 ~]# yum install -y nfs-utils
[root@server1 ~]# vim /etc/exports 
  1 /home/hadoop *(rw,anonuid=800,anongid=800)


[root@server1 ~]# /etc/init.d/rpcbind start
[root@server1 ~]# /etc/init.d/nfs start

Start the service.

[root@server1 ~]# showmount -e 172.25.26.11

Check the exported directory.
On server2 and server3:

[root@server2 ~]# yum install -y nfs-utils
[root@server2 ~]# /etc/init.d/rpcbind start                                      
[root@server2 ~]# /etc/init.d/nfs start
[root@server2 ~]# useradd -u 800 hadoop
[root@server2 ~]# mount 172.25.26.11:/home/hadoop/ /home/hadoop/
[root@server2 ~]# df
[root@server2 ~]# su - hadoop
[hadoop@server2 ~]$ ls

After installing NFS and starting the service, create the hadoop user and mount the share; the files from the hadoop user's home directory on server1 are synchronized over.

Test:

[hadoop@server1 ~]$ ssh 172.25.26.12

Connecting from server1 as the hadoop user no longer requires a password, since the SSH keys live in the shared home directory.

[hadoop@server1 ~]$ ln -s hadoop-3.0.3 hadoop

Create a soft link.

[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim hdfs-site.xml 

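The edited values are in the lost screenshot; presumably dfs.replication was set to 2 to match the two DataNodes, something like:

 19 <configuration>
 20         <property>
 21                 <name>dfs.replication</name>
 22                 <value>2</value>
 23         </property>
 24 </configuration>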

[hadoop@server1 hadoop]$ vim slaves

Configure the slave (worker) nodes.
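The file contents are in the lost screenshot; presumably the two DataNodes:

[hadoop@server1 hadoop]$ cat slaves
172.25.26.12
172.25.26.13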

[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ sbin/start-dfs.sh

Format the NameNode and start the service.

Add Nodes

On server4:

[root@server4 ~]# yum install nfs-utils -y

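The remaining preparation is in the lost screenshot; presumably the same steps used on server2 and server3:

[root@server4 ~]# /etc/init.d/rpcbind start
[root@server4 ~]# /etc/init.d/nfs start
[root@server4 ~]# useradd -u 800 hadoop
[root@server4 ~]# mount 172.25.26.11:/home/hadoop/ /home/hadoop/
[root@server4 ~]# su - hadoop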

[hadoop@server4 ~]$ cd hadoop/etc/hadoop/
[hadoop@server4 hadoop]$ vim slaves 


[hadoop@server4 ~]$ cd hadoop
[hadoop@server4 hadoop]$ sbin/hadoop-daemon.sh start datanode

Start the DataNode.

[hadoop@server4 hadoop]$ bin/hdfs dfsadmin -report

The report shows the newly added node server4.
Join server2 and server3 to the cluster in the same way.

Hadoop + ZooKeeper high availability cluster

HDFS high availability:

Host       IP
server1    172.25.26.11
server2    172.25.26.12
server3    172.25.26.13
server4    172.25.26.14
server5    172.25.26.15

Five virtual machines are needed.

[root@server5 ~]# yum install nfs-utils -y
[root@server5 ~]# /etc/init.d/rpcbind start 
[root@server5 ~]# /etc/init.d/nfs start
[root@server5 ~]# useradd -u 800 hadoop
[root@server5 ~]# mount 172.25.26.11:/home/hadoop/ /home/hadoop/

Add the user and start the services.
Configured like the earlier nodes, server5 can mount the home directory from server1 normally.
Configure on server1:

[root@server1 ~]# su - hadoop 
[hadoop@server1 ~]$ tar zxf zookeeper-3.4.9.tar.gz 
[hadoop@server1 ~]$ cd zookeeper-3.4.9/conf
[hadoop@server1 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@server1 conf]$ vim zoo.cfg
  1 tickTime=2000
  2 initLimit=10
  3 syncLimit=5
  4 dataDir=/tmp/zookeeper
  5 clientPort=2181
  6 server.2=172.25.26.12:2888:3888
  7 server.3=172.25.26.13:2888:3888
  8 server.4=172.25.26.14:2888:3888

Modify the configuration file, writing server2, server3, and server4 into the cluster. Because the five hosts share the home directory over NFS, the configuration is the same on all of them.
On server2, delete the old contents of the /tmp directory, create the zookeeper data directory /tmp/zookeeper, and create a myid file in it containing a number between 1 and 255 that matches the server.N entry in zoo.cfg. Do the same on server3 and server4 (see the sketch below).
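A minimal sketch of the myid step on server2 (the number must match the server.N line in zoo.cfg, so 3 on server3 and 4 on server4):

[hadoop@server2 ~]$ mkdir -p /tmp/zookeeper
[hadoop@server2 ~]$ echo 2 > /tmp/zookeeper/myid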
Start the service on the server2, server3, and server4 nodes:

[hadoop@server4 ~]$ cd zookeeper-3.4.9
[hadoop@server4 zookeeper-3.4.9]$ bin/zkServer.sh start

Do the same on all three hosts; two of them will run in follower mode and one will be the leader.
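The role of each node can be read back with zkServer.sh status (sample output; which host becomes leader varies between runs):

[hadoop@server4 zookeeper-3.4.9]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower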

Then configure on server1:

[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml
19 <configuration>
20         <property>
21                 <name>fs.defaultFS</name>
22                 <value>hdfs://masters</value>
23         </property>
24         <property>
25                 <name>ha.zookeeper.quorum</name>
26                 <value>172.25.26.12:2181,172.25.26.13:2181,172.25.26.14:2181</value>
27         </property>
28 </configuration>
[hadoop@server1 hadoop]$ vim hdfs-site.xml
 19 <configuration>
 20         <property>
 21                 <name>dfs.replication</name>
 22                 <value>3</value>
 23         </property>
 <!-- Set the hdfs nameservice to masters, consistent with the setting in core-site.xml -->
 24         <property>
 25                 <name>dfs.nameservices</name>
 26                 <value>masters</value>
 27         </property>
 <!-- masters contains two namenode nodes, h1 and h2 (the names are arbitrary) -->
 28         <property>
 29                 <name>dfs.ha.namenodes.masters</name>
 30                 <value>h1,h2</value>
 31         </property>
 <!-- RPC address of node h1 -->
 32         <property>
 33                 <name>dfs.namenode.rpc-address.masters.h1</name>
 34                 <value>172.25.26.11:9000</value>
 35         </property>
 <!-- HTTP address of node h1; note the port differs between versions -->
 36         <property>
 37                 <name>dfs.namenode.http-address.masters.h1</name>
 38                 <value>172.25.26.11:9870</value>
 39         </property>
 <!-- RPC address of node h2 -->
 40         <property>
 41                 <name>dfs.namenode.rpc-address.masters.h2</name>
 42                 <value>172.25.26.15:9000</value>
 43         </property>
 <!-- HTTP address of node h2 -->
 44         <property>
 45                 <name>dfs.namenode.http-address.masters.h2</name>
 46                 <value>172.25.26.15:9870</value>
 47         </property>
 <!-- Where the NameNode metadata is stored on the JournalNodes -->
 48         <property>
 49                 <name>dfs.namenode.shared.edits.dir</name>
 50                 <value>qjournal://172.25.26.12:8485;172.25.26.13:8485;172.25.26.14:8485/masters</value>
 51         </property>
 <!-- Where the JournalNode stores its data on local disk -->
 52         <property>
 53                 <name>dfs.journalnode.edits.dir</name>
 54                 <value>/tmp/journaldata</value>
        </property>
 <!-- Enable automatic NameNode failover -->
 55         <property>
 56                 <name>dfs.ha.automatic-failover.enabled</name>
 57                 <value>true</value>
 58         </property>
 <!-- Implementation used for automatic failover -->
 59         <property>
 60                 <name>dfs.client.failover.proxy.provider.masters</name>
 61                 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 62         </property>
 <!-- Fencing methods, one per line -->
 63         <property>
 64                 <name>dfs.ha.fencing.methods</name>
 65                 <value>
 66                 sshfence
 67                 shell(/bin/true)
 68                 </value>
 69         </property>
 <!-- The sshfence method requires passwordless ssh -->
 70         <property>
 71                 <name>dfs.ha.fencing.ssh.private-key-files</name>
 72                 <value>/home/hadoop/.ssh/id_rsa</value>
 73         </property>
 <!-- Timeout for the sshfence method -->
 74         <property>
 75                 <name>dfs.ha.fencing.ssh.connect-timeout</name>
 76                 <value>30000</value>
 77         </property>
 78 </configuration>
[hadoop@server1 hadoop]$ vim slaves

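The file contents are in the lost screenshot; presumably the three DataNodes:

[hadoop@server1 hadoop]$ cat slaves
172.25.26.12
172.25.26.13
172.25.26.14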
Start the hdfs cluster. First start the zookeeper cluster on the three DNs in turn (i.e. server2, server3, server4):

[hadoop@server2 zookeeper-3.4.9]$ bin/zkServer.sh start

Then start the journalnode on the three DNs in turn (when starting hdfs for the first time, the journalnodes must be started first):

[hadoop@server2 ~]$ cd hadoop
[hadoop@server2 hadoop]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@server2 hadoop]$ jps

Format the hdfs cluster on server1:

[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ scp -r /tmp/hadoop-hadoop 172.25.26.15:/tmp

Copy the NameNode metadata to server5's /tmp directory so the standby starts from the same namespace.

Format zookeeper (execute this only on h1):

[hadoop@server1 hadoop]$ bin/hdfs zkfc -formatZK
[hadoop@server1 hadoop]$ sbin/start-dfs.sh 

After formatting, start the service.
View the status of each node:

server1:
[hadoop@server1 hadoop]$ jps
12372 Jps
7050 DFSZKFailoverController
12332 NameNode
server2:
[hadoop@server2 hadoop]$ jps
1664 JournalNode
2325 Jps
1535 QuorumPeerMain
2212 DataNode
server3:
[hadoop@server3 hadoop]$ jps
1918 Jps
1651 JournalNode
1755 DataNode
1543 QuorumPeerMain
server4:
[hadoop@server4 hadoop]$ jps
2050 DataNode
1475 QuorumPeerMain
1600 JournalNode
2152 Jps
server5:
[hadoop@server5 dfs]$ jps
1306 DFSZKFailoverController
1511 Jps
1376 NameNode

Test:

server1 serves as h1 and server5 as h2; here server1 is active and server5 is standby, acting as the backup node.
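To exercise the failover (a sketch, assuming the NameNode PID from the server1 jps output above), kill the active NameNode on server1; ZooKeeper should promote h2 on server5 to active, which can be confirmed with hdfs haadmin:

[hadoop@server1 hadoop]$ kill -9 12332
[hadoop@server1 hadoop]$ bin/hdfs haadmin -getServiceState h2
active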


Origin blog.csdn.net/qq_41961805/article/details/90448553