What is Hadoop?
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is fault tolerant and designed to be deployed on low-cost hardware; it provides high-throughput access to application data, which suits applications with large data sets. HDFS relaxes some POSIX requirements so that file system data can be accessed as a stream.
Hadoop deployment
Host | IP |
---|---|
server1 | 172.25.26.11 |
server2 | 172.25.26.12 |
server3 | 172.25.26.13 |
1. Create the hadoop user
[root@server1 ~]# useradd -u 800 hadoop
[root@server1 ~]# passwd hadoop
2. Switch to the hadoop user and install the JDK
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz jdk-8u181-linux-x64.tar.gz zookeeper-3.4.9.tar.gz
[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz zookeeper-3.4.9.tar.gz
[hadoop@server1 ~]$ ln -s jdk1.8.0_181 java
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz jdk1.8.0_181 zookeeper-3.4.9.tar.gz
java jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ cd java
[hadoop@server1 java]$ ls
bin jre README.html THIRDPARTYLICENSEREADME.txt
COPYRIGHT lib release
include LICENSE src.zip
javafx-src.zip man THIRDPARTYLICENSEREADME-JAVAFX.txt
Unpack the JDK and create a soft link.
[hadoop@server1 ~]$ vim .bash_profile
10 PATH=$PATH:$HOME/bin:/home/hadoop/java/bin
[hadoop@server1 ~]$ jps
1054 Jps
Modify the environment variables, then verify with jps.
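The PATH change above works like any other PATH extension: once a directory is appended, commands in it resolve without a full path. A minimal, self-contained sketch (using a temporary directory and a made-up `myjps` script in place of /home/hadoop/java/bin, purely for illustration):

```shell
# Stand-in bin directory (placeholder for /home/hadoop/java/bin)
demo_bin=$(mktemp -d)
cat > "$demo_bin/myjps" <<'EOF'
#!/bin/sh
echo "demo jps"
EOF
chmod +x "$demo_bin/myjps"

# Append it to PATH, exactly as .bash_profile does for java/bin
PATH=$PATH:$demo_bin

# The command is now found without a full path
myjps
```

In the real setup, re-login (or `source ~/.bash_profile`) so the new PATH takes effect before running jps.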
3. Install and configure Hadoop
[hadoop@server1 ~]$ tar zxf hadoop-3.0.3.tar.gz
[hadoop@server1 ~]$ cd hadoop-3.0.3/etc/hadoop/
[hadoop@server1 hadoop]$ vim hadoop-env.sh
54 export JAVA_HOME=/home/hadoop/java
[hadoop@server1 ~]$ cd hadoop-3.0.3
[hadoop@server1 hadoop-3.0.3]$ mkdir input
[hadoop@server1 hadoop-3.0.3]$ cp etc/hadoop/*.xml input/
[hadoop@server1 hadoop-3.0.3]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'
[hadoop@server1 hadoop-3.0.3]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000 _SUCCESS
[hadoop@server1 output]$ cat *
1 dfsadmin
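The grep example job counts every match of the regular expression `dfs[a-z.]+` across the copied XML files, which is why the output above shows `dfsadmin` with a count. A small Python sketch of the same matching-and-counting logic (the sample lines are made up for illustration, not taken from the real config files):

```python
import re
from collections import Counter

# The same pattern the MapReduce grep example searches for
pattern = re.compile(r"dfs[a-z.]+")

# Made-up stand-ins for lines from the copied etc/hadoop/*.xml files
lines = [
    "<name>dfs.replication</name>",
    "the dfsadmin command reports cluster status",
    "no match on this line",
]

# Count every match, as the grep job's map and reduce phases do
counts = Counter(m for line in lines for m in pattern.findall(line))
for word, n in sorted(counts.items()):
    print(n, word)
```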
Build a pseudo-distributed cluster
1. Modify the configuration files for Hadoop's core attributes
[hadoop@server1 hadoop-3.0.3]$ cd etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml
19 <configuration>
20 <property>
21 <name>fs.defaultFS</name>
22 <value>hdfs://172.25.26.11:9000</value>
23 </property>
24 </configuration>
[hadoop@server1 hadoop]$ vim hdfs-site.xml
19 <configuration>
20 <property>
21 <name>dfs.replication</name>
22 <value>1</value>
23 </property>
24 </configuration>
2. Set up passwordless SSH login
[hadoop@server1 hadoop]$ ssh-keygen
[hadoop@server1 hadoop]$ ssh-copy-id 172.25.26.11
Distribute the public key.
3. Format the namenode and start HDFS
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop-3.0.3/etc/hadoop
[hadoop@server1 hadoop]$ vim slaves
[hadoop@server1 hadoop-3.0.3]$ pwd
/home/hadoop/hadoop-3.0.3
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs namenode -format
[hadoop@server1 hadoop-3.0.3]$ sbin/start-dfs.sh
[hadoop@server1 hadoop-3.0.3]$ jps
View it in the browser:
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs dfs -ls /user
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs dfs -put input/
Create the directories and upload the contents of the input directory.
In the web UI, click Utilities, then Browse the file system, to view the contents.
[hadoop@server1 hadoop-3.0.3]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs dfs -ls
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs dfs -cat output/*
[hadoop@server1 hadoop-3.0.3]$ bin/hdfs dfs -get output
Hadoop fully distributed cluster deployment
On server1, stop the services and switch to the superuser.
On server1, server2, and server3, install nfs-utils:
[root@server1 ~]# yum install -y nfs-utils
[root@server1 ~]# vim /etc/exports
1 /home/hadoop *(rw,anonuid=800,anongid=800)
[root@server1 ~]# /etc/init.d/rpcbind start
[root@server1 ~]# /etc/init.d/nfs start
Start the service.
[root@server1 ~]# showmount -e 172.25.26.11
Check the exported directories.
On server2 and server3:
[root@server2 ~]# yum install -y nfs-utils
[root@server2 ~]# /etc/init.d/rpcbind start
[root@server2 ~]# /etc/init.d/nfs start
[root@server2 ~]# useradd -u 800 hadoop
[root@server2 ~]# mount 172.25.26.11:/home/hadoop/ /home/hadoop/
[root@server2 ~]# df
[root@server2 ~]# su - hadoop
[hadoop@server2 ~]$ ls
After installing NFS, start the services and create the hadoop user; once the mount is in place, the files in server1's hadoop home directory are shared with the other hosts.
Test:
[hadoop@server1 ~]$ ssh 172.25.26.12
The hadoop user on server1 can now connect without a password.
[hadoop@server1 ~]$ ln -s hadoop-3.0.3 hadoop
Create a soft link.
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim hdfs-site.xml
[hadoop@server1 hadoop]$ vim slaves
Set the slave nodes.
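The slaves file simply lists the datanode hosts, one per line. For this cluster it would contain the worker IPs (assuming server2 and server3 are the datanodes at this stage):

```
172.25.26.12
172.25.26.13
```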
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Initialize, then start the service.
Add Nodes
In server4:
[root@server4 ~]# yum install nfs-utils -y
[hadoop@server4 ~]$ cd hadoop/etc/hadoop/
[hadoop@server4 hadoop]$ vim slaves
[hadoop@server4 ~]$ cd hadoop
[hadoop@server4 hadoop]$ sbin/hadoop-daemon.sh start datanode
Start the datanode.
[hadoop@server4 hadoop]$ bin/hdfs dfsadmin -report
Verify that node server4 has been added.
Join server2 and server3 as nodes in the same way.
Hadoop + ZooKeeper high availability cluster
HDFS high availability:
Host | IP |
---|---|
server1 | 172.25.26.11 |
server2 | 172.25.26.12 |
server3 | 172.25.26.13 |
server4 | 172.25.26.14 |
server5 | 172.25.26.15 |
Five virtual machines are needed.
[root@server5 ~]# yum install nfs-utils -y
[root@server5 ~]# /etc/init.d/rpcbind start
[root@server5 ~]# /etc/init.d/nfs start
[root@server5 ~]# useradd -u 800 hadoop
[hadoop@server5 ~]$ mount 172.25.26.11:/home/hadoop/ /home/hadoop/
Add the user and start the services.
Configure server5 as before; it can then mount server1's home directory normally.
Configure on server1:
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ tar zxf zookeeper-3.4.9.tar.gz
[hadoop@server1 ~]$ cd zookeeper-3.4.9/conf
[hadoop@server1 conf]$ cp zoo_sample.cfg zoo.cfg
1 tickTime=2000
2 initLimit=10
3 syncLimit=5
4 dataDir=/tmp/zookeeper
5 clientPort=2181
6 server.2=172.25.26.12:2888:3888
7 server.3=172.25.26.13:2888:3888
8 server.4=172.25.26.14:2888:3888
Modify the configuration file, writing server2, server3, and server4 into the cluster. Because the five hosts share the NFS file system, the configuration is the same on all of them.
On server2, delete the old files in the /tmp directory, create the zookeeper data directory (/tmp/zookeeper), and create a myid file in it containing the host's server number (a number from 0 to 255, matching its server.N line in zoo.cfg). Do the same on server3 and server4.
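The myid file is just a one-line number identifying the host within the ensemble. A minimal sketch of creating it (using a temporary directory in place of /tmp/zookeeper, purely for illustration):

```shell
# Stand-in for the ZooKeeper dataDir (/tmp/zookeeper on the real hosts)
datadir=$(mktemp -d)

# On server2 the id is 2, matching server.2=172.25.26.12:2888:3888 in zoo.cfg
echo 2 > "$datadir/myid"

cat "$datadir/myid"
```

On server3 and server4 the file would contain 3 and 4 respectively, matching their server.N entries.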
Start the service on the server2, server3, and server4 nodes:
[hadoop@server4 ~]$ cd zookeeper-3.4.9
[hadoop@server4 zookeeper-3.4.9]$ bin/zkServer.sh start
Perform the same operation on all three hosts; two of them will be in follower mode and one will be the leader.
Then configure on server1:
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml
19 <configuration>
20 <property>
21 <name>fs.defaultFS</name>
22 <value>hdfs://masters</value>
23 </property>
24 <property>
25 <name>ha.zookeeper.quorum</name>
26 <value>172.25.26.12:2181,172.25.26.13:2181,172.25.26.14:2181</value>
27 </property>
28 </configuration>
[hadoop@server1 hadoop]$ vim hdfs-site.xml
19 <configuration>
20 <property>
21 <name>dfs.replication</name>
22 <value>3</value>
23 </property>
<!-- Specify the nameservice for HDFS as masters, consistent with the setting in core-site.xml -->
24 <property>
25 <name>dfs.nameservices</name>
26 <value>masters</value>
27 </property>
<!-- masters has two namenode nodes, h1 and h2 (the names can be customized) -->
28 <property>
29 <name>dfs.ha.namenodes.masters</name>
30 <value>h1,h2</value>
31 </property>
<!-- Specify the RPC address of node h1 -->
32 <property>
33 <name>dfs.namenode.rpc-address.masters.h1</name>
34 <value>172.25.26.11:9000</value>
35 </property>
<!-- Specify the HTTP address of node h1; note that the port differs between versions -->
36 <property>
37 <name>dfs.namenode.http-address.masters.h1</name>
38 <value>172.25.26.11:9870</value>
39 </property>
<!-- Specify the RPC address of node h2 -->
40 <property>
41 <name>dfs.namenode.rpc-address.masters.h2</name>
42 <value>172.25.26.15:9000</value>
43 </property>
<!-- Specify the HTTP address of node h2 -->
44 <property>
45 <name>dfs.namenode.http-address.masters.h2</name>
46 <value>172.25.26.15:9870</value>
47 </property>
<!-- Specify where the NameNode metadata is stored on the JournalNodes -->
48 <property>
49 <name>dfs.namenode.shared.edits.dir</name>
50 <value>qjournal://172.25.26.12:8485;172.25.26.13:8485;172.25.26.14:8485/masters</value>
51 </property>
<!-- Specify where the JournalNode stores its data on local disk -->
52 <property>
53 <name>dfs.journalnode.edits.dir</name>
54 <value>/tmp/journaldata</value>
</property>
<!-- Enable automatic failover on NameNode failure -->
55 <property>
56 <name>dfs.ha.automatic-failover.enabled</name>
57 <value>true</value>
58 </property>
<!-- Configure the automatic-failover implementation -->
59 <property>
60 <name>dfs.client.failover.proxy.provider.masters</name>
61 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
62 </property>
<!-- Configure the fencing methods; each method occupies one line -->
63 <property>
64 <name>dfs.ha.fencing.methods</name>
65 <value>
66 sshfence
67 shell(/bin/true)
68 </value>
69 </property>
<!-- Passwordless SSH is required when using the sshfence method -->
70 <property>
71 <name>dfs.ha.fencing.ssh.private-key-files</name>
72 <value>/home/hadoop/.ssh/id_rsa</value>
73 </property>
<!-- Configure the sshfence connect timeout -->
74 <property>
75 <name>dfs.ha.fencing.ssh.connect-timeout</name>
76 <value>30000</value>
77 </property>
78 </configuration>
[hadoop@server1 hadoop]$ vim slaves
Start the HDFS cluster:
First start the ZooKeeper cluster on the three DNs (server2, server3, server4):
[hadoop@server2 zookeeper-3.4.9]$ bin/zkServer.sh start
Start the journalnode on the three DNs (when starting HDFS for the first time, the journalnodes must be started first):
[hadoop@server2 ~]$ cd hadoop
[hadoop@server2 hadoop]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@server2 hadoop]$ jps
Format the HDFS cluster on the master node:
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ scp -r /tmp/hadoop-hadoop 172.25.26.15:/tmp
Copy the files to server5's /tmp directory.
Format ZooKeeper (execute only on h1):
[hadoop@server1 hadoop]$ bin/hdfs zkfc -formatZK
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
After formatting, start the service.
View the status of each node:
server1:
[hadoop@server1 hadoop]$ jps
12372 Jps
7050 DFSZKFailoverController
12332 NameNode
server2:
[hadoop@server2 hadoop]$ jps
1664 JournalNode
2325 Jps
1535 QuorumPeerMain
2212 DataNode
server3:
[hadoop@server3 hadoop]$ jps
1918 Jps
1651 JournalNode
1755 DataNode
1543 QuorumPeerMain
server4:
[hadoop@server4 hadoop]$ jps
2050 DataNode
1475 QuorumPeerMain
1600 JournalNode
2152 Jps
server5:
[hadoop@server5 dfs]$ jps
1306 DFSZKFailoverController
1511 Jps
1376 NameNode
Test:
server1 acts as h1 and server5 as h2; server1 is currently active, and server5 is standby, serving as the backup node.
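One way to verify automatic failover is to kill the active namenode and check each namenode's state with `hdfs haadmin`. A sketch of such a session (it assumes the cluster above is running, so the commands are shown for reference rather than as a runnable example; `<NameNode-pid>` is a placeholder for the pid shown by jps):

```
[hadoop@server1 hadoop]$ jps                                  # note the NameNode pid
[hadoop@server1 hadoop]$ kill -9 <NameNode-pid>               # simulate a failure of h1
[hadoop@server1 hadoop]$ bin/hdfs haadmin -getServiceState h2 # h2 should now be active
```

If zkfc is working, h2 (server5) takes over as active, and restarting the namenode on server1 brings h1 back as standby.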