Hadoop 2.5.2 HA high-availability cluster setup (Hadoop + ZooKeeper)

1. Overview

When reprinting, please credit the source: http://eksliang.iteye.com/blog/2226986

1.1 The single point of failure in Hadoop 1.0

The NameNode is the heart of a Hadoop cluster: it is critical and must never stop working. In the Hadoop 1 era there was only one NameNode, and if its data was lost or it stopped working, the whole cluster could not be recovered. This is the single point of failure in Hadoop 1 and the main reason Hadoop 1 was considered unreliable. The figure below shows the Hadoop 1.0 architecture.

(Figure: Hadoop 1.0 architecture diagram)

1.2 How Hadoop 2.0 solves the single point of failure of Hadoop 1.0

To remove the single point of failure of Hadoop 1, the NameNode in Hadoop 2 is no longer unique: there can be more than one (currently only two are supported). Both have the same function; one is in the active state and the other is in the standby state. While the cluster is running, only the active NameNode serves requests, and the standby NameNode keeps synchronizing the active NameNode's data at all times. Once the active NameNode stops working, the standby NameNode can be switched to the active state, either manually or automatically, and take over the work. This is what provides the high availability.
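As a concrete illustration of the active/standby model, the state of each NameNode can be queried with the hdfs haadmin tool once the cluster described below is up. This is a minimal sketch, assuming the NameNode IDs nn1 and nn2 that are configured later in this post:

# Show which state each NameNode is currently in
hdfs haadmin -getServiceState nn1     # e.g. active
hdfs haadmin -getServiceState nn2     # e.g. standby

Manual transitions also exist (hdfs haadmin -transitionToActive / -transitionToStandby), but with automatic failover enabled, as configured below, they are normally refused and ZooKeeper handles the switching instead.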

 

1.3 Using JournalNodes to share data between the active and standby NameNodes

In Hadoop 2.0 the data of the two NameNodes is actually shared in real time. The new HDFS offers two sharing mechanisms: a Quorum Journal Manager, i.e. a cluster of JournalNodes, or a Network File System (NFS). NFS works at the operating-system level, while JournalNodes work at the Hadoop level. We use a JournalNode cluster for data sharing, which is also the mainstream practice. The figure below shows the JournalNode architecture.

(Figure: JournalNode shared-storage architecture)
For data synchronization, the two NameNodes communicate with each other through a group of independent daemons called JournalNodes. When the active NameNode's namespace is modified, the change is written to a majority of the JournalNode processes. The standby NameNode reads these changes from the JNs, constantly watches the edit log, and applies the changes to its own namespace. In this way the standby keeps its namespace state fully synchronized and can take over in the event of a failure.

 

1.4 Failover between NameNodes

For an HA cluster it is critical that only one NameNode is active at any time; otherwise the data states of the two NameNodes would diverge, data could be lost, or wrong results could be produced. To guarantee this, ZooKeeper is used. Both NameNodes of the HDFS cluster register themselves in ZooKeeper. When the active NameNode fails, ZooKeeper detects the failure and automatically switches the standby NameNode to the active state.
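Concretely, a ZKFC process (DFSZKFailoverController) runs next to each NameNode and holds a lock znode in ZooKeeper; when the active NameNode goes down, its lock is released and the other ZKFC promotes its NameNode to active. A small sketch of how to peek at this from the ZooKeeper CLI, assuming the nameservice name ns used later in this post (the exact znode names can vary between versions):

# Connect to any ZooKeeper server
zkCli.sh -server mast1:2181

# Inside the ZooKeeper shell:
ls /hadoop-ha/ns                      # HA znodes for the nameservice
get /hadoop-ha/ns/ActiveBreadCrumb    # records which NameNode currently holds the active role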

 

2. Building the Hadoop HA cluster

2.1 Configuration Details
Host name   IP                NameNode   DataNode   YARN (ResourceManager)   ZooKeeper   JournalNode
mast1       192.168.177.131   Yes        Yes        No                       Yes         Yes
mast2       192.168.177.132   Yes        Yes        No                       Yes         Yes
mast3       192.168.177.133   No         Yes        Yes                      Yes         Yes

 

2.2 Install the JDK

(Omitted.) Install the JDK and configure the environment variables.
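As a reminder of what this step looks like, here is a minimal sketch of the profile additions, assuming the JDK is unpacked under /usr/local/java/jdk1.7.0_67 (the path used later in this post):

# Append to ~/.bash_profile on every node, then reload with: source ~/.bash_profile
export JAVA_HOME=/usr/local/java/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$PATH

# Verify that the JDK is picked up
java -version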

 

2.3 Passwordless SSH login

(Omitted.) Reference: http://eksliang.iteye.com/blog/2187265
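For completeness, a minimal sketch of the passwordless SSH setup, run as the hadoop user; every node must be able to ssh to every other node (including itself), because both the start scripts and the sshfence mechanism rely on it:

# Generate a key pair (accept the defaults, leave the passphrase empty)
ssh-keygen -t rsa

# Distribute the public key to every node, including the local one
ssh-copy-id hadoop@mast1
ssh-copy-id hadoop@mast2
ssh-copy-id hadoop@mast3

# Test: this should log in without asking for a password
ssh mast2 hostname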

 

2.4 ZooKeeper cluster setup

(Omitted.) For reference see http://eksliang.iteye.com/blog/2107002; that post covers my Solr cluster deployment, which is also coordinated by ZooKeeper, and the ZooKeeper steps there are exactly the same. My final zoo.cfg is shown below.

 

[hadoop@Mast1 conf]$ cat zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit = 10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/home/hadoop/zookeeper/data
dataLogDir=/home/hadoop/zookeeper/datalog
# the port at which the clients will connect
clientPort=2181
server.1=mast1:2888:3888  
server.2=mast2:2888:3888  
server.3=mast3:2888:3888 
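One detail the zoo.cfg above does not show: each node also needs a myid file under the configured dataDir, and its content must match the node's server.N line. A minimal sketch (run the matching command on each host):

# On mast1
echo 1 > /home/hadoop/zookeeper/data/myid
# On mast2
echo 2 > /home/hadoop/zookeeper/data/myid
# On mast3
echo 3 > /home/hadoop/zookeeper/data/myid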

 

 
2.5 Configure the Hadoop configuration files

Configure the machine mast1 first; once it is done, copy the configuration to mast2 and mast3!

Hadoop 2.0 keeps its configuration files under $HADOOP_HOME/etc/hadoop (here /home/hadoop/hadoop-2.5.2/etc/hadoop).

 

  • core-site.xml
<configuration>
 <!-- Specify the nameservice of hdfs as ns -->
 <property>    
      <name>fs.defaultFS</name>    
      <value>hdfs://ns</value>    
 </property>
 <!--Specify the temporary storage directory of hadoop data-->
 <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/hadoop/workspace/hdfs/temp</value>
 </property>   
                          
 <property>    
      <name>io.file.buffer.size</name>    
      <value>4096</value>    
 </property>
 <!--Specify the zookeeper address-->
 <property>
      <name>ha.zookeeper.quorum</name>
      <value>mast1:2181,mast2:2181,mast3:2181</value>
 </property>
 </configuration>

 

 

  • hdfs-site.xml
<configuration>
    <!--Specify the nameservice of hdfs as ns, which needs to be consistent with that in core-site.xml-->    
    <property>    
        <name>dfs.nameservices</name>    
        <value>ns</value>    
    </property>  
    <!-- There are two NameNodes under ns, nn1, nn2 -->
    <property>
       <name>dfs.ha.namenodes.ns</name>
       <value>nn1,nn2</value>
    </property>
    <!-- nn1's RPC address -->
    <property>
       <name>dfs.namenode.rpc-address.ns.nn1</name>
       <value>mast1:9000</value>
    </property>
    <!-- http communication address of nn1-->
    <property>
        <name>dfs.namenode.http-address.ns.nn1</name>
        <value>mast1:50070</value>
    </property>
    <!-- nn2's RPC address -->
    <property>
        <name>dfs.namenode.rpc-address.ns.nn2</name>
        <value>mast2:9000</value>
    </property>
    <!-- http communication address of nn2-->
    <property>
        <name>dfs.namenode.http-address.ns.nn2</name>
        <value>mast2:50070</value>
    </property>
    <!-- Specify the storage location of the NameNode's metadata on the JournalNode-->
    <property>
         <name>dfs.namenode.shared.edits.dir</name>
         <value>qjournal://mast1:8485;mast2:8485;mast3:8485/ns</value>
    </property>
    <!-- Specify the location where JournalNode stores data on the local disk-->
    <property>
          <name>dfs.journalnode.edits.dir</name>
          <value>/home/hadoop/workspace/journal</value>
    </property>
    <!-- Enable automatic switchover when NameNode fails -->
    <property>
          <name>dfs.ha.automatic-failover.enabled</name>
          <value>true</value>
    </property>
    <!-- The failover proxy provider: the class HDFS clients use to locate the active NameNode -->
    <property>
            <name>dfs.client.failover.proxy.provider.ns</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Configure the fencing method used to isolate a failed NameNode -->
    <property>
             <name>dfs.ha.fencing.methods</name>
             <value>sshfence</value>
    </property>
    <!-- The sshfence method requires passwordless SSH; specify the private key to use -->
    <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    
     <!-- Timeout (in milliseconds) for the sshfence SSH connection -->
    <property>
          <name>dfs.ha.fencing.ssh.connect-timeout</name>
          <value>5000</value>
    </property>                  
    <property>    
        <name>dfs.namenode.name.dir</name>    
        <value>file:///home/hadoop/workspace/hdfs/name</value>    
    </property>    
    
    <property>    
        <name>dfs.datanode.data.dir</name>    
        <value>file:///home/hadoop/workspace/hdfs/data</value>    
    </property>    
    
    <property>    
       <name>dfs.replication</name>    
       <value>2</value>    
    </property>   
    <!-- Enable WebHDFS (REST API) function on NN and DN, not required-->                                                                    
    <property>    
       <name>dfs.webhdfs.enabled</name>    
       <value>true</value>    
    </property>    
</configuration>

 

 

  • mapred-site.xml
<configuration>
 <property>    
        <name>mapreduce.framework.name</name>    
        <value>yarn</value>    
 </property>    
</configuration>

 

  • yarn-site.xml
<configuration>
    <!-- The auxiliary service the NodeManager runs so that MapReduce can perform its shuffle -->
    <property>    
            <name>yarn.nodemanager.aux-services</name>    
            <value>mapreduce_shuffle</value>    
     </property>  
     <!-- Specify resourcemanager address -->
     <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>mast3</value>
      </property>
</configuration>

 

  • slaves
[hadoop@Mast1 hadoop]$ cat slaves
mast1
mast2
mast3

 

  • Modify JAVA_HOME

 

Add the JAVA_HOME configuration to the files hadoop-env.sh and yarn-env.sh respectively

#export JAVA_HOME=${JAVA_HOME}    # original line
export JAVA_HOME=/usr/local/java/jdk1.7.0_67

Although ${JAVA_HOME} is already set as an environment variable, Hadoop still complains at startup that it cannot find Java, so there is no way around specifying the absolute path here; this step is necessary.

 

  • Configure the environment variables of hadoop, refer to my configuration
[hadoop@Mast1 hadoop]$ vim ~/.bash_profile  
export HADOOP_HOME="/home/hadoop/hadoop-2.5.2"  
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH  
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native  
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"  

 

  • Copy the configuration to mast2, mast3
scp -r ~/.bash_profile hadoop@mast2:/home/hadoop/  
scp -r ~/.bash_profile hadoop@mast3:/home/hadoop/  
scp -r $HADOOP_HOME/etc/hadoop hadoop@mast2:/home/hadoop/hadoop-2.5.2/etc/  
scp -r $HADOOP_HOME/etc/hadoop hadoop@mast3:/home/hadoop/hadoop-2.5.2/etc/  
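Now that the profile and the configuration directory are on all three machines, a quick sanity check can save trouble later. A minimal sketch (run on any node):

# Reload the profile and confirm the Hadoop binaries are on the PATH
source ~/.bash_profile
hadoop version

# Confirm that the HA-related settings are actually being read
hdfs getconf -confKey dfs.nameservices        # expected: ns
hdfs getconf -confKey dfs.ha.namenodes.ns     # expected: nn1,nn2
hdfs getconf -confKey ha.zookeeper.quorum     # expected: mast1:2181,mast2:2181,mast3:2181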

 

 

At this point the Hadoop configuration is complete; the next step is to start the cluster.

 

3. Startup of the cluster

3.1 Start the zookeeper cluster

Execute the following commands on mast1, mast2, and mast3 to start the zookeeper cluster;

[hadoop@Mast1 bin]$ sh zkServer.sh start

To verify that the ZooKeeper cluster started, execute the following command on mast1, mast2, and mast3. If the cluster came up successfully, there will be two follower nodes and one leader node:

[hadoop@Mast1 bin]$ sh zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/zookeeper/zookeeper-3.3.6/bin/../conf/zoo.cfg
Mode: follower
3.2 Start the JournalNode cluster

Execute the following command on mast1 to start the JournalNode cluster

[hadoop@Mast1 hadoop-2.5.2]$ sbin/hadoop-daemons.sh start journalnode

Run the jps command and you should see a JournalNode process on each node.
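Note that hadoop-daemons.sh (with an "s") runs the given command over SSH on every host listed in the slaves file, which is why a single invocation on mast1 starts all three JournalNodes. A small sketch of a remote check, assuming passwordless SSH is in place and jps is on the PATH of a non-interactive shell:

for h in mast1 mast2 mast3; do
  echo "== $h =="
  ssh $h "jps | grep JournalNode"
done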

 

3.3 Format zkfc to generate ha nodes in zookeeper

Execute the following command on mast1 to complete the format

hdfs zkfc -formatZK

(Note: it is best to type this command by hand. Copying and pasting it can turn the hyphen into a dash, and the command then fails; this caused me a lot of pain when I first deployed.)

  After the format is successful, you can see it in zookeeper

[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha
[ns]
 
3.4 Format hdfs
hdfs namenode -format

(Again, it is best to type this command by hand; copy-pasting can mangle the hyphen.)

 

3.5 Start NameNode

First start the NameNode on mast1 (this one will become the active node); execute the following command on mast1:

[hadoop@Mast1 hadoop-2.5.2]$ sbin/hadoop-daemon.sh start namenode

Then synchronize the NameNode metadata to mast2 and start the standby NameNode there; the commands are as follows:

#Synchronize the data of NameNode to mast2
[hadoop@Mast2 hadoop-2.5.2]$ hdfs namenode -bootstrapStandby
#Start the namenode on mast2 as standby
[hadoop@Mast2 hadoop-2.5.2]$ sbin/hadoop-daemon.sh start namenode

 

3.6 Start the DataNodes

Execute the following command on mast1

[hadoop@Mast1 hadoop-2.5.2]$ sbin/hadoop-daemons.sh start datanode

 

3.7 Start YARN

Start YARN on the machine that acts as the ResourceManager (mast3 in my case); execute the following command:

[hadoop@Mast3 hadoop-2.5.2]$ sbin/start-yarn.sh
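To confirm that the NodeManagers have registered with the ResourceManager, the yarn CLI can be used; a small sketch:

# Should list three NodeManagers in RUNNING state
yarn node -list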

 

3.8 Start ZKFC

Execute the following command on mast1 to complete the startup of ZKFC

[hadoop@Mast1 hadoop-2.5.2]$ sbin/hadoop-daemons.sh start zkfc

After everything has started, run jps on mast1, mast2, and mast3; you should see the following processes:

# Java processes on mast1
[hadoop@Mast1 hadoop-2.5.2]$ jps
2837 NodeManager
3054 DFSZKFailoverController
4309 Jps
2692 DataNode
2173 QuorumPeerMain
2551 NameNode
2288 JournalNode
# Java processes on mast2
[hadoop@Mast2 ~]$ jps
2869 DFSZKFailoverController
2353 DataNode
2235 JournalNode
4522 Jps
2713 NodeManager
2591 NameNode
2168 QuorumPeerMain
# Java processes on mast3
[hadoop@Mast3 ~]$ jps
2167 QuorumPeerMain
2337 JournalNode
3506 Jps
2457 DataNode
2694 NodeManager
2590 ResourceManager
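With all daemons up, a quick smoke test against HDFS confirms that the HA nameservice is usable. A minimal sketch; /tmp/ha-test is just an arbitrary test path:

hdfs dfsadmin -report            # the summary should show three live DataNodes
hdfs dfs -mkdir -p /tmp/ha-test  # write through the nameservice "ns"
hdfs dfs -ls /tmp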

 

4. Testing high availability

After startup, the NameNode web pages of mast1 and mast2 look like this:

(Screenshots: the mast1 NameNode at mast1:50070 shows "active"; the mast2 NameNode at mast2:50070 shows "standby".)

Now execute the following command on mast1 to stop the NameNode on mast1:

[hadoop@Mast1 hadoop-2.5.2]$ sbin/hadoop-daemon.sh stop namenode

Check the NameNode on mast2 again: it has automatically switched to the active state! The evidence is as follows:

(Screenshot: the mast2 NameNode at mast2:50070 now shows "active".)

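If the web UI is not handy, the same check can be made from the command line; a minimal sketch:

# nn1 (mast1) is stopped, so ask nn2 directly
hdfs haadmin -getServiceState nn2     # expected: active

# HDFS is still fully usable through the nameservice "ns"
hdfs dfs -ls /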