HDFS Distributed Cluster Setup [Advanced]
To make the HDFS cluster highly available, the NameNode is protected with a ZooKeeper cluster, which coordinates automatic failover between an active and a standby NameNode.
Environment Preparation
Same software versions as in the previous article
Building a ZooKeeper cluster
- Download and unpack ZooKeeper
- Create a data folder under the ZooKeeper root directory
- Modify the configuration in the conf folder
3.1 Rename zoo_sample.cfg to zoo.cfg
3.2 Edit zoo.cfg
dataDir=/opt/install/zookeeper-3.4.5/data
server.0=hadoop1.msk.com:2888:3888
server.1=hadoop2.msk.com:2888:3888
server.2=hadoop3.msk.com:2888:3888
- Create a myid file in zookeeper/data
Write 0 in myid on the first node, 1 on the second, and so on (the three machines are 0, 1, and 2 respectively)
- Configure passwordless SSH from the master node to all three machines (including the master node itself)
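The myid values simply mirror the `server.N` lines in zoo.cfg. As a minimal sketch (assuming the hostnames and install path used in this article), the host-to-id mapping can be derived from zoo.cfg itself; the ssh line is only illustrative of where it would be applied on a real cluster:

```shell
# Sketch: derive each node's myid from the server.N entries in zoo.cfg.
# /tmp/zoo.cfg is a stand-in path for this demo; on a real cluster,
# point at /opt/install/zookeeper-3.4.5/conf/zoo.cfg instead.
ZOOCFG=/tmp/zoo.cfg
cat > "$ZOOCFG" <<'EOF'
server.0=hadoop1.msk.com:2888:3888
server.1=hadoop2.msk.com:2888:3888
server.2=hadoop3.msk.com:2888:3888
EOF

# Build a "host id" mapping from the server.N=host:port:port lines.
mapping=$(grep '^server\.' "$ZOOCFG" | while IFS='=' read -r key value; do
  id=${key#server.}        # the N in server.N
  host=${value%%:*}        # the hostname before the first colon
  echo "$host $id"
  # On a real cluster you would then run (illustrative, not executed here):
  # ssh "$host" "echo $id > /opt/install/zookeeper-3.4.5/data/myid"
done)
echo "$mapping"
```

This keeps the ids in one place (zoo.cfg) instead of typing them by hand on each machine.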
ZooKeeper start and stop commands
bin/zkServer.sh start | stop | restart | status
ZooKeeper client commands
- Note: run this on the master node, which is running ZooKeeper
bin/zkCli.sh
HA-HDFS Distributed Cluster Setup
- If you are reusing the ordinary cluster from the previous article, it is recommended to empty data/tmp first; if this is a fresh environment, refer to the previous article to build the base cluster
- Modify the configuration file
core-site.xml
<!-- The nameservice name "ns" is arbitrary; it is just the cluster's entry point -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/install/hadoop-2.5.2/data/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop1.msk.com:2181,hadoop2.msk.com:2181,hadoop3.msk.com:2181</value>
</property>
**hdfs-site.xml**
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<!-- Set the HDFS nameservice to ns; must match the value in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns</value>
</property>
<!-- The ns nameservice has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns.nn1</name>
<value>hadoop1.msk.com:8020</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns.nn1</name>
<value>hadoop1.msk.com:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns.nn2</name>
<value>hadoop2.msk.com:8020</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns.nn2</name>
<value>hadoop2.msk.com:50070</value>
</property>
<!-- Where the NameNode's shared edit log metadata is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop1.msk.com:8485;hadoop2.msk.com:8485;hadoop3.msk.com:8485/ns</value>
</property>
<!-- Where each JournalNode stores its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/install/hadoop-2.5.2/journal</value>
</property>
<!-- Enable automatic failover when the active NameNode fails -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider used by clients to locate the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.ns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method; if SSH uses the default port 22, just write sshfence -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- The sshfence fencing method requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
**hadoop-env.sh**
export JAVA_HOME=/usr/java/jdk1.7.0_71
- First start the ZooKeeper cluster (execute the ZooKeeper start command on all three nodes)
- On the master NameNode node, format the ZKFC state in ZooKeeper
bin/hdfs zkfc -formatZK
- Start the JournalNode on each JournalNode node with the following command
sbin/hadoop-daemon.sh start journalnode
- On the master NameNode node, format the JournalNode directories and the NameNode
bin/hdfs namenode -format ns
- Start the NameNode process on the master NameNode node
sbin/hadoop-daemon.sh start namenode
- On the standby NameNode node, execute the first command below: it formats the standby NameNode's directory by copying the metadata over from the master NameNode, and it does not re-format the JournalNode directories. Then start the standby NameNode process with the second command
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
- On both NameNode nodes, execute the following command to start the ZKFC
sbin/hadoop-daemon.sh start zkfc
- Start the DataNode on every DataNode node with the following command
sbin/hadoop-daemon.sh start datanode
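The first-time startup order above matters: ZooKeeper, then ZKFC format, JournalNodes, NameNode format and start, standby bootstrap, ZKFC, and finally DataNodes. The steps can be sketched as one ordered script with a dry-run mode, so the sequence can be reviewed without a cluster; the per-host comments reflect this article's layout and the `run`/`DRY_RUN` wrapper is an assumption of this sketch, not part of Hadoop:

```shell
# Sketch of the one-time HA startup sequence described above.
# DRY_RUN=1 only prints each command; unset it on a real cluster
# (and run each command on the host named in its comment).
DRY_RUN=1
run() {
  if [ -n "$DRY_RUN" ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

run bin/zkServer.sh start                     # on all three ZooKeeper hosts
run bin/hdfs zkfc -formatZK                   # once, on the master NameNode host
run sbin/hadoop-daemon.sh start journalnode   # on every JournalNode host
run bin/hdfs namenode -format ns              # once, on the master NameNode host
run sbin/hadoop-daemon.sh start namenode      # on the master NameNode host
run bin/hdfs namenode -bootstrapStandby       # once, on the standby NameNode host
run sbin/hadoop-daemon.sh start namenode      # on the standby NameNode host
run sbin/hadoop-daemon.sh start zkfc          # on both NameNode hosts
run sbin/hadoop-daemon.sh start datanode      # on every DataNode host
```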
- Daily start and stop commands
sbin/start-dfs.sh
sbin/stop-dfs.sh
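After start-dfs.sh it is worth confirming which NameNode is currently active. A small sketch using the stock `hdfs haadmin` tool (nn1/nn2 are the NameNode ids configured in hdfs-site.xml above); the `DRY_RUN`/`run` wrapper is only so the sketch prints rather than executes when no cluster is present:

```shell
# Check HA state after startup. DRY_RUN=1 prints instead of executing.
DRY_RUN=1
run() {
  if [ -n "$DRY_RUN" ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

# haadmin reports "active" or "standby" for each configured NameNode id.
run bin/hdfs haadmin -getServiceState nn1
run bin/hdfs haadmin -getServiceState nn2

# To exercise automatic failover, kill the active NameNode process and
# re-run the checks: the standby should be promoted to active by ZKFC.
```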