1.hadoop-env.sh
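Usually only JAVA_HOME needs to be set in hadoop-env.sh. A minimal sketch (the JDK path below is an assumption; point it at your own installation):
export JAVA_HOME=/home/hadoop/app/jdk   # hypothetical JDK install path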
2.core-site.xml
<configuration>
<!-- Specify the HDFS nameservice as ns1 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1/</value>
</property>
<!-- Specify the Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/app/hadoop-2.7.2/tmp</value>
</property>
<!-- Specify the ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>spark05:2181,spark06:2181,spark07:2181</value>
</property>
</configuration>
3.hdfs-site.xml
<configuration>
<!-- Specify the HDFS nameservice as ns1; must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- ns1 has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>spark01:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>spark01:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>spark02:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>spark02:50070</value>
</property>
<!-- Specify where the NameNode shared edit log (metadata) is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://spark05:8485;spark06:8485;spark07:8485/ns1</value>
</property>
<!-- Specify where the JournalNodes store their data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/app/hadoop-2.7.2/journaldata</value>
</property>
<!-- Enable automatic failover of a failed NameNode -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider used by clients to find the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; multiple methods are separated by newlines, one per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless SSH login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- Timeout for the sshfence mechanism -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
4.mapred-site.xml
<configuration>
<!-- Specify YARN as the MapReduce framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5.yarn-site.xml
<configuration>
<!-- Enable ResourceManager high availability -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Specify the cluster id of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Specify the logical names of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Specify the host of each ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>spark03</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>spark04</value>
</property>
<!-- Specify the ZooKeeper cluster addresses -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>spark05:2181,spark06:2181,spark07:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
6.slaves
spark05
spark06
spark07
Startup
1. Start the three ZooKeeper servers (on spark05, spark06, spark07)
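For example (assuming zkServer.sh is on the PATH of each ZooKeeper host):
# run on spark05, spark06 and spark07
zkServer.sh start
zkServer.sh status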
2. Start the three JournalNodes (on spark05, spark06, spark07)
hadoop-daemon.sh start journalnode
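You can confirm the JournalNode process is running on each of the three hosts with jps:
jps | grep JournalNode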
3. Format HDFS
# Execute commands on spark01:
hdfs namenode -format
# After formatting, the NameNode metadata is generated under the hadoop.tmp.dir configured in core-site.xml (here /home/hadoop/app/hadoop-2.7.2/tmp). Copy that tmp directory to the same location on the standby NameNode host:
scp -r tmp/ spark02:/home/hadoop/app/hadoop-2.7.2/
## Alternatively (and recommended), run hdfs namenode -bootstrapStandby on the standby NameNode instead of copying the directory.
4. Format ZKFC (execute on spark01)
hdfs zkfc -formatZK
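Optionally, verify that the HA znode was created (zkCli.sh is the ZooKeeper client; /hadoop-ha is the default parent znode):
zkCli.sh -server spark05:2181
ls /hadoop-ha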
5. Start HDFS (execute on spark01)
sbin/start-dfs.sh
6. Start YARN (Note: run start-yarn.sh on spark03. The NameNode and ResourceManager are kept apart for performance reasons, since both consume a lot of resources, so they are started on different machines.)
sbin/start-yarn.sh
Then manually start the second ResourceManager on spark04:
yarn-daemon.sh start resourcemanager
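You can check which ResourceManager is currently active with yarn rmadmin (rm1 and rm2 are the ids configured in yarn-site.xml):
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2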
7. Check the NameNodes in the browser
http://spark01:50070
http://spark02:50070
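The NameNode HA state can also be checked from the command line:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2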
8. Verify HDFS HA
First, upload a file to HDFS:
hadoop fs -put /etc/profile /profile
hadoop fs -ls /
Then kill the active NameNode process:
kill -9 <pid of the active NameNode>
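The process id of the active NameNode can be found with jps on that host, for example:
jps | grep NameNode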
Access through a browser: http://192.168.1.202:50070
NameNode 'hadoop02:9000' (active)
The NameNode on the second host has now become active.
9. Execute the command:
hadoop fs -ls /
-rw-r--r--   3 root supergroup       1926 2014-02-06 15:36 /profile
The file uploaded earlier still exists!
Manually restart the NameNode that was killed:
sbin/hadoop-daemon.sh start namenode
Access through a browser: http://192.168.1.201:50070
NameNode 'hadoop01:9000' (standby)
Verify YARN:
Run the WordCount demo program that ships with Hadoop:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /profile /out
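To inspect the result, list and read the job output in HDFS (the part file name assumes the default single reducer):
hadoop fs -ls /out
hadoop fs -cat /out/part-r-00000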
https://blog.csdn.net/u013821825/article/details/51377415