Hadoop HA cluster setup (personally tested)

1.hadoop-env.sh
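The original post leaves this file's contents out; typically only JAVA_HOME needs to be set here. A minimal sketch, with an assumed JDK path (adjust it to your own install):

# hadoop-env.sh -- usually only JAVA_HOME needs changing; the path below is an assumed example
export JAVA_HOME=/home/hadoop/app/jdk1.8.0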

 

2.core-site.xml

<configuration>
    <!-- Specify the HDFS nameservice as ns1 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1/</value>
    </property>
    <!-- Specify the Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/hadoop-2.7.2/tmp</value>
    </property>
    <!-- Specify the ZooKeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>spark05:2181,spark06:2181,spark07:2181</value>
    </property>
</configuration>

 

 

 

3.hdfs-site.xml

<configuration>
    <!-- Specify the HDFS nameservice as ns1; this must match core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <!-- ns1 has two NameNodes: nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>spark01:9000</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>spark01:50070</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>spark02:9000</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>spark02:50070</value>
    </property>
    <!-- Where the NameNode metadata (shared edit log) is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://spark05:8485;spark06:8485;spark07:8485/ns1</value>
    </property>
    <!-- Where the JournalNodes store their data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/app/hadoop-2.7.2/journaldata</value>
    </property>
    <!-- Enable automatic failover when a NameNode fails -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Failover proxy provider used by clients -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence
            shell(/bin/true)
        </value>
    </property>
    <!-- The sshfence method requires passwordless SSH login -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <!-- Timeout for the sshfence method -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>

 

4.mapred-site.xml

<configuration>
    <!-- Specify that the MapReduce framework runs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

 

5.yarn-site.xml

<configuration>
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Specify the RM cluster id -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <!-- Specify the ids of the ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- Specify the host of each ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>spark03</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>spark04</value>
    </property>
    <!-- Specify the ZooKeeper cluster addresses -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>spark05:2181,spark06:2181,spark07:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

 

 

6.slaves

spark05 

spark06 

spark07 

 

 

Startup

1. Start the three ZooKeeper instances (on spark05, spark06, spark07)
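A minimal sketch of this step, assuming a standalone ZooKeeper installation with zkServer.sh on the PATH of each node:

# Run on each of spark05, spark06, spark07
zkServer.sh start

# Optionally check the role of each instance (one leader, two followers)
zkServer.sh status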

 

2. Start the three JournalNodes (on spark05, spark06, spark07)

hadoop-daemon.sh start journalnode
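To confirm each JournalNode came up, jps (shipped with the JDK) can be run on each node; this check is an addition to the original steps:

# On each of spark05, spark06, spark07 the output should include a JournalNode process
jps | grep JournalNode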

 

3. Format HDFS

# Execute on spark01:

hdfs namenode -format

# Formatting generates files under the hadoop.tmp.dir configured in core-site.xml (here /home/hadoop/app/hadoop-2.7.2/tmp). Copy that tmp directory to the same path on the second NameNode (spark02):

scp -r tmp/ spark02:/home/hadoop/app/hadoop-2.7.2/

## Alternatively (recommended): skip the copy and run hdfs namenode -bootstrapStandby on spark02.

 

4. Format ZKFC (execute on spark01)

hdfs zkfc -formatZK

 

5. Start HDFS (execute on spark01)

sbin/start-dfs.sh 
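As a rough sanity check (an addition based on the configuration above, not part of the original post), jps should now show the HDFS daemons on each role:

# On spark01 and spark02: NameNode and DFSZKFailoverController
# On spark05, spark06, spark07: DataNode and JournalNode (plus QuorumPeerMain for ZooKeeper)
jps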

 

6. Start YARN (Note: run start-yarn.sh on spark03. The NameNode and ResourceManager are placed on different machines because both consume a lot of resources; keeping them apart avoids performance problems.)

sbin/start-yarn.sh

 

Then manually start the second ResourceManager (on spark04):

yarn-daemon.sh start resourcemanager
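To confirm which ResourceManager is active, the HA state can be queried with yarn rmadmin (an added check, not part of the original post):

# rm1 = spark03, rm2 = spark04; one should report "active", the other "standby"
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2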

 

7. Check the NameNodes in the browser

http://spark01:50070

http://spark02:50070
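The NameNode HA state can also be checked from the command line (an added check):

# nn1 = spark01, nn2 = spark02; one should be "active", the other "standby"
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2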

 

8. Verify HDFS HA 

First, upload a file to HDFS:

hadoop fs -put /etc/profile /profile 

hadoop fs -ls / 

Then kill the active NameNode:

kill -9 <pid of the active NameNode>
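The pid above can be found with jps (an added hint; jps ships with the JDK):

# On the active NameNode host, look up the NameNode pid, then kill it
jps | grep NameNode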

Access the second NameNode via a browser: http://192.168.1.202:50070

NameNode 'hadoop02:9000' (active)

The second (previously standby) NameNode now becomes active.

 

9. Execute the command:

hadoop fs -ls / 

-rw-r--r-- 3 root supergroup 1926 2014-02-06 15:36 /profile

The file uploaded earlier still exists!

Manually restart the NameNode that was killed:

sbin/hadoop-daemon.sh start namenode 

Access the first NameNode via a browser: http://192.168.1.201:50070

NameNode 'hadoop01:9000' (standby)

Verify YARN: 

Run the WordCount demo program that ships with Hadoop:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /profile /out
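Once the job finishes, the result can be inspected (an added check; the part file name follows the standard MapReduce output convention):

# List and view the WordCount output
hadoop fs -ls /out
hadoop fs -cat /out/part-r-00000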

 


https://blog.csdn.net/u013821825/article/details/51377415
