Configuring Hadoop HA with ZooKeeper

Original address:

https://blog.csdn.net/everl_1/article/details/52303011


ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services to distributed applications, including configuration maintenance, naming, distributed synchronization, and group services.

Non-HA drawbacks

The distributed storage of an HDFS cluster is coordinated by the namenode (the namenode is responsible for responding to client requests). In a non-HA cluster, once the namenode goes down the metadata is not lost, but the entire cluster can no longer provide service to the outside world. The resulting low reliability of the HDFS service is clearly unacceptable in real application scenarios.

HA mechanism

As noted, the low reliability of the service comes from the possibility that the namenode node goes down, so how can we guard against the failure of this single namenode? A straightforward solution is to deploy two namenode nodes in an active/standby arrangement, so that once the active node goes down, the standby node immediately switches to the active state. This is in fact the scheme the HA mechanism adopts. To implement this mechanism, the following problems need to be solved:

1. Why choose the active/standby mode instead of an active/active mode, in which both namenode nodes respond to client requests?

        An obvious prerequisite is that the two namenode nodes need to store consistent metadata.

        We know that the namenode manages this metadata, and the metadata must be updated whenever a write request (such as an upload) is served. If an active/active mode were used, both nodes would be writing metadata, and keeping the two copies synchronized would be a very hard problem. Therefore only one machine can respond to requests: the node in the active state (which can be called the master node). The other namenode only synchronizes the metadata of the active node while the active node is working normally; it is called the standby node (it is in the standby state). So the main problem to be solved is how to synchronize the metadata of the active node.

2. How to synchronize the metadata of the two namenode nodes?

      The active node responds to client requests, so only the active node holds the latest metadata. The metadata consists of two parts: the newly written edit log (edits) and the older, already merged image (fsimage). The HA mechanism solves the synchronization problem by placing the edits newly written by the active node on the zookeeper cluster (the main function of the zookeeper cluster is distributed synchronization management of a small amount of data); as long as the active node works normally, the standby node only needs to pull the edits from that cluster and merge them into its own fsimage.

       For this purpose the hadoop framework provides a distributed application called qjournal for this cluster (implemented with the help of zookeeper), and the nodes that run qjournal are called journalnodes.

3. How to sense whether the active node is down and quickly switch the standby node to the active state?

        The solution is to run a dedicated monitoring process on each namenode node that watches the namenode's state at all times. For the namenode in the active state, if an abnormality is detected the monitor writes some data to the zookeeper cluster. For the namenode in the standby state, the monitoring process reads that data from the zookeeper cluster to sense whether the active node is healthy; if an abnormality is found, the monitoring process is responsible for switching the standby node to the active state. In hadoop this monitoring process is called zkfc (implemented with zookeeper).

4. How to avoid split brain during state switching?

        Split-brain: after the active namenode starts behaving abnormally, its zkfc writes some data into zookeeper marking it as abnormal. The zkfc on the standby namenode then reads this information and switches the standby node to active. However, if the previous active namenode did not really die but was only "playing dead" (it recovers and behaves normally a little later), two namenodes end up working at the same time. This phenomenon is called split brain.

        Solution: the standby namenode does not switch state immediately after sensing that the active node is abnormal. Zkfc first logs in to the active node remotely over SSH and kills its namenode process (kill -9 <pid>). But this alone is not enough: what if the kill command does not execute successfully? If no confirmation of success is received within a certain period, the standby node executes a custom script as a further attempt to ensure that split brain cannot occur. In hadoop this mechanism is called fencing (it covers both sending the kill command over ssh and executing the custom script, as a double guarantee).


Once the above problems are solved, hadoop HA is basically realized.

HA implementation

1. HA cluster planning

Host name      Software                   Processes
sempplsl-02    jdk, hadoop, zookeeper     QuorumPeerMain (zookeeper), journalnode, datanode, nodemanager
sempplsl-03    jdk, hadoop, zookeeper     QuorumPeerMain (zookeeper), journalnode, datanode, nodemanager
sempplsl-04    jdk, hadoop, zookeeper     QuorumPeerMain (zookeeper), journalnode, datanode, nodemanager
sempplsl-05    jdk, hadoop                namenode (active), zkfc
sempplsl-06    jdk, hadoop                namenode (standby), zkfc
sempplsl-07    jdk, hadoop                resourcemanager
sempplsl-08    jdk, hadoop                resourcemanager

(Note: datanode and nodemanager are generally placed together. The journalnode relies on zookeeper, so QuorumPeerMain (zookeeper) and journalnode must be placed together!)
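The zookeeper cluster on sempplsl-02, sempplsl-03 and sempplsl-04 must itself be installed and configured before the hadoop configuration below. A minimal zoo.cfg sketch is shown here; the dataDir path is an assumption, and each node additionally needs a myid file inside dataDir containing 1, 2 or 3 respectively:

# conf/zoo.cfg (minimal sketch; dataDir is an assumed path)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/app/zookeeper/data
clientPort=2181
server.1=sempplsl-02:2888:3888
server.2=sempplsl-03:2888:3888
server.3=sempplsl-04:2888:3888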

2. hadoop HA cluster configuration

core-site.xml   ---->

<property>
<!-- Set the nameservice of hdfs to ns1 -->
<name>fs.defaultFS</name>
<value>hdfs://ns1/</value>
</property>
<!-- Specify the hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/app/hadoop-2.4.1/tmp</value>
</property>
<!-- Specify the zookeeper addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>sempplsl-02:2181,sempplsl-03:2181,sempplsl-04:2181</value>
</property>

hdfs-site.xml  --->

<!-- Set the nameservice of hdfs to ns1; it must be consistent with core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- There are two NameNodes under ns1: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>sempplsl-05:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>sempplsl-05:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>sempplsl-06:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>sempplsl-06:50070</value>
</property>
<!-- Where the NameNode metadata (edits) is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://sempplsl-02:8485;sempplsl-03:8485;sempplsl-04:8485/ns1</value>
</property>
<!-- Where each JournalNode stores its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/app/hadoop-2.4.1/journaldata</value>
</property>
<!-- Enable automatic failover on NameNode failure -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Implementation class used for automatic failover -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; multiple methods are separated by newlines, one method per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- Passwordless ssh login is required when using the sshfence method -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- Timeout for the sshfence method (ms) -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>

yarn-site.xml  --->

<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Cluster id of the RM -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Logical ids of the RMs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostname of each RM -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>sempplsl-07</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>sempplsl-08</value>
</property>
<!-- Addresses of the zookeeper cluster -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>sempplsl-02:2181,sempplsl-03:2181,sempplsl-04:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

mapred-site.xml --->

<!-- Run MapReduce on the yarn framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

3. HA cluster startup

3.1. Modify the slaves file

    The slaves file lists the worker (slave) nodes. To start hdfs from sempplsl-05, the cluster plan requires datanodes on sempplsl-02, sempplsl-03 and sempplsl-04, so edit the slaves file under the hadoop-2.4.1/etc/hadoop folder of the installation on sempplsl-05 and list those hosts (see the example below).

    Likewise, yarn is started on sempplsl-07; according to the cluster plan, nodemanagers are needed on sempplsl-02, sempplsl-03 and sempplsl-04, and the method is the same as above.
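    Under this plan, the slaves file on sempplsl-05 (and the one on sempplsl-07 for yarn) simply lists the three worker hosts:

# hadoop-2.4.1/etc/hadoop/slaves
sempplsl-02
sempplsl-03
sempplsl-04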

3.2. Configure passwordless SSH login

    Configure passwordless SSH login from sempplsl-05 to sempplsl-02, sempplsl-03, sempplsl-04 and sempplsl-06 (ssh-keygen -t rsa, then ssh-copy-id <target host>).

    Configure passwordless SSH login from sempplsl-07 to sempplsl-02, sempplsl-03, sempplsl-04 and sempplsl-08.
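    For example, on sempplsl-05 (assuming the hadoop user is used on every node, inferred from the paths in the configuration above):

# on sempplsl-05, as the hadoop user (the user name is an assumption)
ssh-keygen -t rsa
ssh-copy-id hadoop@sempplsl-02
ssh-copy-id hadoop@sempplsl-03
ssh-copy-id hadoop@sempplsl-04
ssh-copy-id hadoop@sempplsl-06
# repeat on sempplsl-07 for sempplsl-02, sempplsl-03, sempplsl-04 and sempplsl-08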

3.3. Copy the configured hadoop to the other nodes of the cluster

    scp -r 
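    A sketch of the copy, assuming the installation directory from the configuration above and the hadoop user:

# run on sempplsl-05 for each of the other hosts (path and user are assumptions)
scp -r /home/hadoop/app/hadoop-2.4.1/ hadoop@sempplsl-06:/home/hadoop/app/
scp -r /home/hadoop/app/hadoop-2.4.1/ hadoop@sempplsl-02:/home/hadoop/app/
# ... likewise for sempplsl-03, sempplsl-04, sempplsl-07 and sempplsl-08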

3.4. Start the zookeeper cluster

    Run the start command on sempplsl-02, sempplsl-03 and sempplsl-04 respectively: ./zkServer.sh start

    Check the zookeeper status with ./zkServer.sh status; the correct state is one leader and two followers.
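    For example (run in zookeeper's bin directory on each of the three hosts; the exact output wording may differ between zookeeper versions):

./zkServer.sh start
./zkServer.sh status
# expected on one of the nodes:   Mode: leader
# expected on the other two:      Mode: follower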

3.5. Start the journalnodes

    Run the start command on sempplsl-02, sempplsl-03 and sempplsl-04 respectively: sbin/hadoop-daemon.sh start journalnode

    After a successful start, an additional JournalNode process appears on each of these nodes.
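    The new process can be verified with jps:

sbin/hadoop-daemon.sh start journalnode
jps
# should now show JournalNode in addition to QuorumPeerMain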

3.6. Format HDFS

    Run the format command on sempplsl-05: hadoop namenode -format
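    A minimal sketch; note that the journalnodes started in step 3.5 must already be running, since dfs.namenode.shared.edits.dir points at them:

# on sempplsl-05 (the journalnodes on sempplsl-02/03/04 must be up)
hadoop namenode -format
# by default the formatted metadata lands under hadoop.tmp.dir (/home/hadoop/app/hadoop-2.4.1/tmp)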

3.7. Format zkfc

    Run the format command on sempplsl-05: hdfs zkfc -formatZK

    After a successful format, a new znode path is created in the zookeeper cluster (this path stores the information zkfc uses to monitor the namenode nodes).
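    The result can be checked from any zookeeper node with zkCli.sh; by default the znode is created under /hadoop-ha (the path shown here is an assumption based on the Hadoop 2.x defaults):

./zkCli.sh -server sempplsl-02:2181
ls /hadoop-ha
# expected to list [ns1] after a successful format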

3.8. Start HDFS

    Run start-dfs.sh on sempplsl-05.
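    Afterwards, jps on each host should roughly match the cluster plan (a sketch, assuming everything started cleanly):

# sempplsl-05, sempplsl-06:   NameNode, DFSZKFailoverController (zkfc)
# sempplsl-02/03/04:          DataNode, JournalNode, QuorumPeerMain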

3.9. Start yarn

   Run sbin/start-yarn.sh on sempplsl-07.

   Run ./yarn-daemon.sh start resourcemanager on sempplsl-08.
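   The HA state of each ResourceManager can then be checked (one should report active, the other standby); the namenodes can be checked the same way with hdfs haadmin:

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2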

At this point, the HA cluster has been started successfully!

