Building HDFS clusters: fully distributed and high availability
I. Building a fully distributed cluster (1.x-style architecture)
| | HDP-01 | HDP-02 | HDP-03 | HDP-04 | HDP-05 |
|---|---|---|---|---|---|
| NameNode | √ | | | | |
| SecondaryNameNode | | √ | | | |
| DataNode | √ | √ | √ | √ | √ |
Step 1: Set up passwordless SSH between hosts
- Generate a key pair on each host
  - ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
- Copy the public key to every node (each host also needs passwordless login to itself)
  - ssh-copy-id -i ~/.ssh/id_rsa.pub root@hdp-01
  - ssh-copy-id -i ~/.ssh/id_rsa.pub root@hdp-02
  - ssh-copy-id -i ~/.ssh/id_rsa.pub root@hdp-03
  - ssh-copy-id -i ~/.ssh/id_rsa.pub root@hdp-04
  - ssh-copy-id -i ~/.ssh/id_rsa.pub root@hdp-05
- Add each host to known_hosts by logging in once
  - ssh root@hdp-01 (type yes at the prompt)
  - ssh root@hdp-02
  - ssh root@hdp-03
  - ssh root@hdp-04
  - ssh root@hdp-05
  - ssh root@localhost
  - ssh to the node's own IP as well (e.g. ssh root@192.168.183.21)
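Once the key pair exists, the five ssh-copy-id invocations above can be generated in a loop. A minimal sketch that only prints the commands for review (hostnames taken from the role table; run the printed lines on each node):

```shell
#!/bin/sh
# Hosts from the role table; adjust for your cluster.
HOSTS="hdp-01 hdp-02 hdp-03 hdp-04 hdp-05"

# Print the distribution commands instead of running them,
# so the loop can be reviewed before execution.
for h in $HOSTS; do
  echo "ssh-copy-id -i ~/.ssh/id_rsa.pub root@$h"
done
```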
Step 2: Configure Hadoop
- Extract hadoop-2.10.0.tar.gz and install it under /opt/hadoop (/opt is the conventional directory for custom software installations)
  - tar -zxvf hadoop-2.10.0.tar.gz -C /opt/hadoop
- Set JAVA_HOME
  - Set JAVA_HOME in hadoop-env.sh, mapred-env.sh, and yarn-env.sh
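Editing the three env scripts by hand works; the same change can also be scripted with sed. A sketch that demonstrates the substitution on a throwaway copy (on a real node, point CONF_DIR at $HADOOP_HOME/etc/hadoop and repeat for all three files):

```shell
#!/bin/sh
# Work on a temporary copy so the sketch is safe to run anywhere.
CONF_DIR=$(mktemp -d)
echo 'export JAVA_HOME=${JAVA_HOME}' > "$CONF_DIR/hadoop-env.sh"

# Pin JAVA_HOME to the JDK path configured later in /etc/profile.
sed -i 's#^export JAVA_HOME=.*#export JAVA_HOME=/opt/jdk/jdk1.8.0_51#' \
  "$CONF_DIR/hadoop-env.sh"

cat "$CONF_DIR/hadoop-env.sh"
```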
- Modify core-site.xml
  - vim core-site.xml

        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://hdp-01:9000</value>
        </property>
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/var/hadoop/full</value>
        </property>

- Modify hdfs-site.xml
  - vim hdfs-site.xml

        <property>
          <name>dfs.namenode.secondary.http-address</name>
          <value>hdp-02:50090</value>
        </property>
        <property>
          <name>dfs.namenode.secondary.https-address</name>
          <value>hdp-02:50091</value>
        </property>
        <property>
          <name>dfs.replication</name>
          <value>2</value>
        </property>

- Modify slaves (one hostname per line)
  - vim slaves

        hdp-01
        hdp-02
        hdp-03
        hdp-04
        hdp-05

- Modify environment variables
  - vim /etc/profile

        export JAVA_HOME=/opt/jdk/jdk1.8.0_51
        export HADOOP_HOME=/opt/hadoop/hadoop-2.10.0
        export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

  - source /etc/profile
- Copy the environment variables to the other nodes
  - [root@hdp-01 ~] scp /etc/profile root@hdp-02:/etc/profile
  - [root@hdp-02 ~] source /etc/profile
  - Repeat for hdp-03, hdp-04, and hdp-05
- Copy Hadoop to the other nodes one by one
  - [root@hdp-02 ~] scp -r root@hdp-01:/opt/hadoop/hadoop-2.10.0 /opt/hadoop
  - Repeat on hdp-03, hdp-04, and hdp-05
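Instead of pulling from hdp-01 on each node in turn, the same copy can be pushed from hdp-01 in a loop. A minimal sketch that only prints the scp commands for review (paths as configured above):

```shell
#!/bin/sh
# Print the copy commands rather than running them, so the loop
# can be checked before execution from hdp-01.
for h in hdp-02 hdp-03 hdp-04 hdp-05; do
  echo "scp -r /opt/hadoop/hadoop-2.10.0 root@$h:/opt/hadoop/"
done
```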
- Format the NameNode (only once, on the NameNode host) and start the cluster
  - [root@hdp-01 ~] hdfs namenode -format
  - [root@hdp-01 ~] start-dfs.sh
- Browse the HDFS web UI
  - http://192.168.183.21:50070/
- Stop the cluster
  - stop-dfs.sh
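After start-dfs.sh, running jps on each node should match the role table at the top of this section. A sketch that encodes that expectation as a hostname-to-daemon mapping (assumed from the table, not from any Hadoop API):

```shell
#!/bin/sh
# Daemons each node should show in jps, per the role table above.
expected_daemons() {
  case "$1" in
    hdp-01) echo "NameNode DataNode" ;;
    hdp-02) echo "SecondaryNameNode DataNode" ;;
    hdp-03|hdp-04|hdp-05) echo "DataNode" ;;
    *) echo "unknown host" >&2; return 1 ;;
  esac
}

for h in hdp-01 hdp-02 hdp-03 hdp-04 hdp-05; do
  echo "$h: $(expected_daemons $h)"
done
```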
II. Building a high-availability cluster (Hadoop HA, 2.x-style architecture)
| | HDP-01 | HDP-02 | HDP-03 | HDP-04 | HDP-05 |
|---|---|---|---|---|---|
| Active NameNode | √ | | | | |
| Standby NameNode | | √ | | | |
| DataNode | √ | √ | √ | √ | √ |
| ZooKeeper | | | √ | √ | √ |
| JournalNode | | | √ | √ | √ |
Build ZooKeeper
- Upload, extract, and copy ZooKeeper
  - [root@hdp-03 ~]# tar -zxvf zookeeper-3.4.6.tar.gz -C /opt/zookeeper
- Modify the configuration file
  - [root@hdp-03 ~]# cd /opt/zookeeper/zookeeper-3.4.6/conf/
  - [root@hdp-03 ~]# cp zoo_sample.cfg zoo.cfg
  - vim zoo.cfg

        # Directory where ZooKeeper stores its data
        dataDir=/var/zookeeper
        # Quorum and leader-election addresses of the zk cluster nodes
        server.1=hdp-03:2888:3888
        server.2=hdp-04:2888:3888
        server.3=hdp-05:2888:3888

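The edits above can also be applied by writing the whole zoo.cfg in one go. A sketch that writes the file to a temporary directory (the tickTime/initLimit/syncLimit/clientPort values are the zoo_sample.cfg defaults, assumed here; on hdp-03 the real target is /opt/zookeeper/zookeeper-3.4.6/conf/zoo.cfg):

```shell
#!/bin/sh
# Write zoo.cfg to a temp dir for demonstration purposes only.
CONF="$(mktemp -d)/zoo.cfg"
cat > "$CONF" <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
# Directory where ZooKeeper stores its data
dataDir=/var/zookeeper
# Quorum and leader-election addresses of the zk nodes
server.1=hdp-03:2888:3888
server.2=hdp-04:2888:3888
server.3=hdp-05:2888:3888
EOF
cat "$CONF"
```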
- Create myid
  - [hdp-03, hdp-04, hdp-05] mkdir -p /var/zookeeper
  - [root@hdp-03 ~] echo 1 > /var/zookeeper/myid
  - [root@hdp-04 ~] echo 2 > /var/zookeeper/myid
  - [root@hdp-05 ~] echo 3 > /var/zookeeper/myid
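Each myid value must equal the N of that host's server.N line in zoo.cfg. A sketch of the mapping as a shell function, demonstrated against a temp dir (on a real node the target is /var/zookeeper/myid):

```shell
#!/bin/sh
# myid must match N from the corresponding server.N line in zoo.cfg.
myid_for() {
  case "$1" in
    hdp-03) echo 1 ;;
    hdp-04) echo 2 ;;
    hdp-05) echo 3 ;;
    *) return 1 ;;
  esac
}

# Demo in a temp dir; replace DATADIR with /var/zookeeper on a real node.
DATADIR=$(mktemp -d)
myid_for hdp-04 > "$DATADIR/myid"
cat "$DATADIR/myid"    # prints 2
```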
- Copy ZooKeeper to the other zk nodes
  - [hdp-04, hdp-05] scp -r root@hdp-03:/opt/zookeeper/zookeeper-3.4.6 /opt/zookeeper/
- Set environment variables
  - Add ZOOKEEPER_HOME (and its bin directory to PATH) in /etc/profile
  - Copy the profile to hdp-04 and hdp-05, then source it on each
- Start the ensemble
  - [hdp-03, hdp-04, hdp-05] zkServer.sh start
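Three ZooKeeper nodes are used here because the ensemble only stays available while a majority of its servers are up; the arithmetic can be sketched as:

```shell
#!/bin/sh
# Quorum size for an ensemble of n servers: floor(n/2) + 1.
majority() { echo $(( $1 / 2 + 1 )); }

n=3   # zk nodes in this cluster: hdp-03, hdp-04, hdp-05
q=$(majority $n)
# prints "ensemble: 3, quorum: 2, tolerated failures: 1"
echo "ensemble: $n, quorum: $q, tolerated failures: $(( n - q ))"
```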
Build Hadoop HA
- Start from the fully distributed cluster built above (the 1.x-style architecture), then modify the following configuration
- Modify core-site.xml
  - vim core-site.xml

        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://myCluster</value>
        </property>
        <property>
          <name>ha.zookeeper.quorum</name>
          <value>hdp-03:2181,hdp-04:2181,hdp-05:2181</value>
        </property>
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/var/hadoop/ha</value>
        </property>

- Modify hdfs-site.xml
  - vim hdfs-site.xml

        <property>
          <name>dfs.nameservices</name>
          <value>myCluster</value>
        </property>
        <property>
          <name>dfs.ha.namenodes.myCluster</name>
          <value>nn1,nn2</value>
        </property>
        <property>
          <name>dfs.namenode.rpc-address.myCluster.nn1</name>
          <value>hdp-01:8020</value>
        </property>
        <property>
          <name>dfs.namenode.rpc-address.myCluster.nn2</name>
          <value>hdp-02:8020</value>
        </property>
        <property>
          <name>dfs.namenode.http-address.myCluster.nn1</name>
          <value>hdp-01:50070</value>
        </property>
        <property>
          <name>dfs.namenode.http-address.myCluster.nn2</name>
          <value>hdp-02:50070</value>
        </property>
        <property>
          <name>dfs.namenode.shared.edits.dir</name>
          <value>qjournal://hdp-03:8485;hdp-04:8485;hdp-05:8485/myCluster</value>
        </property>
        <property>
          <name>dfs.journalnode.edits.dir</name>
          <value>/var/hadoop/ha/jn</value>
        </property>
        <property>
          <name>dfs.client.failover.proxy.provider.myCluster</name>
          <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
          <name>dfs.ha.fencing.methods</name>
          <!-- Try sshfence first; fall back to shell(true) so failover
               still proceeds when the old active host is unreachable.
               Methods go newline-separated inside a single value. -->
          <value>sshfence
        shell(true)</value>
        </property>
        <property>
          <name>dfs.ha.fencing.ssh.private-key-files</name>
          <value>/root/.ssh/id_rsa</value>
        </property>
        <property>
          <name>dfs.ha.automatic-failover.enabled</name>
          <value>true</value>
        </property>

- Modify slaves (one hostname per line)
  - vim slaves

        hdp-01
        hdp-02
        hdp-03
        hdp-04
        hdp-05

- Copy the Hadoop etc directory to the other nodes
- Update the environment variables in /etc/profile
  - Check that ZOOKEEPER_HOME is set
- Start the JournalNode daemons first
  - [hdp-03, hdp-04, hdp-05] hadoop-daemon.sh start journalnode
  - jps (verify the JournalNode process is running)
  - cd /var/hadoop/ha/jn/ (check that the edits directory was created)
- Format the active NameNode
  - [root@hdp-01 ~]# hdfs namenode -format
  - [root@hdp-01 ~]# hadoop-daemon.sh start namenode
- Bootstrap the standby NameNode
  - [root@hdp-02 ~]# hdfs namenode -bootstrapStandby
- Start ZooKeeper on hdp-03 through hdp-05
  - zkServer.sh start
  - zkServer.sh status
- Format ZKFC
  - [hdp-01] hdfs zkfc -formatZK (run once, on one of the NameNode hosts)
- Restart the cluster
  - Start the ZooKeeper ensemble first, then run start-dfs.sh
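The restart order above can be summarized as a checklist. A sketch that prints the sequence (hostnames from the role table):

```shell
#!/bin/sh
# Print the HA restart sequence; ZooKeeper must be up before the
# ZKFCs started by start-dfs.sh try to contact it.
restart_steps() {
  echo "1. zkServer.sh start   # on hdp-03, hdp-04, hdp-05"
  echo "2. start-dfs.sh        # on hdp-01: NameNodes, DataNodes, JournalNodes, ZKFCs"
}
restart_steps
```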