First, preparation
1. Prepare virtual machines (at least three; this walkthrough sets up seven): give each a usable IP, disable the firewall, and map hostnames to IPs in /etc/hosts.
2. Install the JDK and configure the environment variables.
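The hostname-to-IP mapping for the seven-node layout might look like the fragment below (the subnet is elided as * in the plan that follows; substitute your own), appended to /etc/hosts on every node:

```
# /etc/hosts fragment (replace * with your subnet)
192.168.*.121 hadoop01
192.168.*.122 hadoop02
192.168.*.123 hadoop03
192.168.*.124 hadoop04
192.168.*.125 hadoop05
192.168.*.126 hadoop06
192.168.*.127 hadoop07
```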
Second, the cluster plan
Seven-node cluster plan:
hostname   IP              software installed       processes running
hadoop01   192.168.*.121   JDK, Hadoop              NameNode, DFSZKFailoverController (zkfc)
hadoop02   192.168.*.122   JDK, Hadoop              NameNode, DFSZKFailoverController (zkfc)
hadoop03   192.168.*.123   JDK, Hadoop              ResourceManager
hadoop04   192.168.*.124   JDK, Hadoop              ResourceManager
hadoop05   192.168.*.125   JDK, Hadoop, ZooKeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
hadoop06   192.168.*.126   JDK, Hadoop, ZooKeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
hadoop07   192.168.*.127   JDK, Hadoop, ZooKeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
Three-node cluster plan:
hostname   IP              software installed       processes running
hadoop01   192.168.*.201   JDK, Hadoop, ZooKeeper   NameNode, DFSZKFailoverController (zkfc), JournalNode, QuorumPeerMain (ZooKeeper)
hadoop02   192.168.*.202   JDK, Hadoop, ZooKeeper   NameNode, DFSZKFailoverController (zkfc), JournalNode, QuorumPeerMain (ZooKeeper)
hadoop03   192.168.*.203   JDK, Hadoop, ZooKeeper   DataNode, JournalNode, QuorumPeerMain (ZooKeeper)
Third, the installation steps
1. ZooKeeper cluster configuration (on hadoop05)
1.1 Unpack
tar -zxvf zookeeper.tar.gz -C /hadoop/
1.2 modify the configuration
cd /hadoop/zookeeper/conf/
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
# modify (this must match the directory holding myid below):
dataDir=/hadoop/zookeeper/tmp
# append at the end:
server.1=hadoop05:2888:3888
server.2=hadoop06:2888:3888
server.3=hadoop07:2888:3888
# save and exit
# then create the tmp directory
mkdir /hadoop/zookeeper/tmp
# then create an empty myid file
touch /hadoop/zookeeper/tmp/myid
# finally write this server's ID into the file
echo 1 > /hadoop/zookeeper/tmp/myid
1.3 Copy the configured ZooKeeper to the other nodes (first create the /hadoop directory as root on hadoop06 and hadoop07: mkdir /hadoop)
scp -r /hadoop/zookeeper/ hadoop06:/hadoop/
scp -r /hadoop/zookeeper/ hadoop07:/hadoop/
# Note: update /hadoop/zookeeper/tmp/myid on hadoop06 and hadoop07 accordingly
# hadoop06:
echo 2 > /hadoop/zookeeper/tmp/myid
# hadoop07:
echo 3 > /hadoop/zookeeper/tmp/myid
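The per-host myid writes can also be driven from one loop. A sketch, shown as a dry run (it echoes the ssh commands rather than executing them), assuming the host-to-id pairs above:

```shell
# map each ZooKeeper host to its myid and print the command that would set it
for pair in hadoop05:1 hadoop06:2 hadoop07:3; do
  host=${pair%%:*}   # part before the colon
  id=${pair##*:}     # part after the colon
  echo ssh "$host" "echo $id > /hadoop/zookeeper/tmp/myid"
done
```

Remove the leading echo to actually run the commands (this assumes passwordless ssh, which section 2.2.7 sets up).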
2. Install and configure the Hadoop cluster (on hadoop01; this walkthrough uses Hadoop 3.2.1)
2.1 Unpack
tar -zxvf hadoop-3.2.1.tar.gz -C /hadoop/
2.2 Configure HDFS (since Hadoop 2.0, all configuration files live in the $HADOOP_HOME/etc/hadoop directory)
# add Hadoop to the environment variables
vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.8
export HADOOP_HOME=/hadoop/hadoop-3.2.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

# all Hadoop configuration files are under $HADOOP_HOME/etc/hadoop
cd /home/hadoop/app/hadoop-3.2.1/etc/hadoop
2.2.1 modify hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8
2.2.2 modify core-site.xml
<configuration>
  <!-- set the HDFS nameservice to ns1 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/app/hadoop-3.2.1/tmp</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop05:2181,hadoop06:2181,hadoop07:2181</value>
  </property>
</configuration>
2.2.3 modify hdfs-site.xml
<configuration>
  <!-- set the HDFS nameservice to ns1; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- ns1 has two NameNodes: nn1 and nn2 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>hadoop01:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>hadoop01:9870</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>hadoop02:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>hadoop02:9870</value>
  </property>
  <!-- where the NameNode metadata is stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop05:8485;hadoop06:8485;hadoop07:8485/ns1</value>
  </property>
  <!-- where the JournalNodes store their data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/app/hadoop-3.2.1/journaldata</value>
  </property>
  <!-- enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- failover proxy provider used by clients for automatic failover -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- fencing methods; multiple mechanisms are separated by newlines, one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>
  <!-- sshfence requires passwordless ssh login -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connection timeout -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
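The dfs.namenode.shared.edits.dir value has a regular shape: qjournal:// plus semicolon-separated host:8485 pairs plus /nameservice. A small helper (hypothetical, for illustration only) that builds it from a host list:

```shell
# build a qjournal URI from a nameservice id and JournalNode hosts (default port 8485 assumed)
build_qjournal_uri() {
  local nsid=$1; shift
  local hosts=""
  for h in "$@"; do
    hosts="${hosts:+$hosts;}$h:8485"   # join entries with semicolons
  done
  echo "qjournal://$hosts/$nsid"
}
build_qjournal_uri ns1 hadoop05 hadoop06 hadoop07
# → qjournal://hadoop05:8485;hadoop06:8485;hadoop07:8485/ns1
```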
2.2.4 modify mapred-site.xml
<configuration>
  <!-- run MapReduce on the YARN framework -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
2.2.5 modify yarn-site.xml
<configuration>
  <!-- enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- cluster id of the RM -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- logical names of the RMs -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- addresses of the RMs -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop03</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop04</value>
  </property>
  <!-- ZooKeeper cluster addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop05:2181,hadoop06:2181,hadoop07:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
2.2.6 modify workers (the workers file lists the worker nodes: since HDFS is started on hadoop01 and YARN on hadoop03, the workers file on hadoop01 specifies where the DataNodes run, and the workers file on hadoop03 specifies where the NodeManagers run)
(in Hadoop 2.x this file is named slaves)
hadoop05
hadoop06
hadoop07
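The host list above can be written into the workers file in one step; a minimal sketch, using a temporary path for illustration rather than the real $HADOOP_HOME/etc/hadoop/workers:

```shell
# generate a workers file from the DataNode host list (temporary path for illustration)
workers_file=$(mktemp)
printf '%s\n' hadoop05 hadoop06 hadoop07 > "$workers_file"
cat "$workers_file"
```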
2.2.7 Configure passwordless ssh login
# first configure passwordless login from hadoop01 to hadoop02, hadoop05, hadoop06, hadoop07
# generate a key pair on hadoop01
ssh-keygen -t rsa
# copy the public key to the other nodes, including hadoop01 itself
ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop05
ssh-copy-id hadoop06
ssh-copy-id hadoop07
# configure passwordless login from hadoop03 to hadoop04, hadoop05, hadoop06, hadoop07
# generate a key pair on hadoop03
ssh-keygen -t rsa
# copy the public key to the other nodes
ssh-copy-id hadoop04
ssh-copy-id hadoop05
ssh-copy-id hadoop06
ssh-copy-id hadoop07
# Note: the two NameNodes need passwordless ssh between them,
# so don't forget to configure hadoop02 → hadoop01 as well
# generate a key pair on hadoop02
ssh-keygen -t rsa
ssh-copy-id hadoop01
2.4 Copy the configured Hadoop to the other nodes
scp -r /hadoop-3.2.1/ hadoop02:/home/hadoop/app/
scp -r /hadoop-3.2.1/ hadoop03:/home/hadoop/app/
scp -r /hadoop-3.2.1/ hadoop04:/home/hadoop/app/
scp -r /hadoop-3.2.1/ hadoop05:/home/hadoop/app/
scp -r /hadoop-3.2.1/ hadoop06:/home/hadoop/app/
scp -r /hadoop-3.2.1/ hadoop07:/home/hadoop/app/
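The six scp commands differ only in the target host, so a loop is less error-prone. A sketch, shown as a dry run (echo prints each command instead of running it):

```shell
# copy the Hadoop directory to every other node (dry run: drop 'echo' to execute)
hosts="hadoop02 hadoop03 hadoop04 hadoop05 hadoop06 hadoop07"
for host in $hosts; do
  echo scp -r /hadoop-3.2.1/ "$host:/home/hadoop/app/"
done
```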
### Note: perform the following steps strictly in this order
2.5 Start the ZooKeeper cluster (on hadoop05, hadoop06, and hadoop07 respectively)
cd /home/hadoop/app/zookeeper/bin/
./zkServer.sh start
# check the status: one leader, two followers
./zkServer.sh status
2.6 Start the JournalNodes (on hadoop05, hadoop06, and hadoop07 respectively)
cd /home/hadoop/app/hadoop-3.2.1
bin/hdfs --daemon start journalnode
# run jps to check: hadoop05, hadoop06, and hadoop07 should each show a JournalNode process
2.7 Format HDFS
# execute on hadoop01:
hdfs namenode -format
# formatting generates files under the hadoop.tmp.dir configured in core-site.xml,
# here /home/hadoop/app/hadoop-3.2.1/tmp;
# then copy that tmp directory to the same path on hadoop02:
scp -r tmp/ hadoop02:/home/hadoop/app/hadoop-3.2.1/
## alternatively (recommended), run on hadoop02: hdfs namenode -bootstrapStandby
2.8 Format ZKFC (execute on hadoop01)
hdfs zkfc -formatZK
2.9 start HDFS (executed on hadoop01)
sbin/start-dfs.sh
2.10 Start YARN (##### Note #####: run start-yarn.sh on hadoop03. The NameNode and ResourceManager are placed on different machines for performance, because both consume a lot of resources)
sbin/start-yarn.sh
2.11 Manually start the ResourceManager on hadoop04
sbin/yarn --daemon start resourcemanager
With this, the hadoop-3.2.1 configuration is complete, and you can access it from a browser:
http://192.168.*.201:9870
NameNode 'hadoop01:9000' (active)
http://192.168.*.202:9870
NameNode 'hadoop02:9000' (standby)
Verify HDFS HA:
First upload a file to HDFS:
hadoop fs -put /etc/profile /profile
hadoop fs -ls /
Then kill the active NameNode:
kill -9 <pid of NN>
Access via browser: http://192.168.1.202:9870
NameNode 'hadoop02:9000' (active)
The NameNode on hadoop02 has now become active. Run:
hadoop fs -ls /
-rw-r--r--   3 root supergroup       1926 2014-02-06 15:36 /profile
The file uploaded earlier is still there!
Manually restart the NameNode that was killed:
hdfs --daemon start namenode
Access via browser: http://192.168.1.201:9870
NameNode 'hadoop01:9000' (standby)
Verify YARN: run the WordCount example program that ships with Hadoop:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /profile /out
OK, done!
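HA state can also be checked from the command line with hdfs haadmin -getServiceState instead of the browser. The sketch below stubs out the haadmin call so the flow is visible without a running cluster; on a real node, replace the stub body with the actual command:

```shell
# report the HA state of both NameNode ids (nn1/nn2 as configured in hdfs-site.xml)
ha_state() {
  # on a real cluster: hdfs haadmin -getServiceState "$1"
  # stub for illustration only (hypothetical states):
  case $1 in nn1) echo active ;; nn2) echo standby ;; esac
}
for id in nn1 nn2; do
  echo "$id: $(ha_state "$id")"
done
```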