Building a High-Availability Hadoop Cluster

High-Availability Hadoop Cluster Setup

Building a high-availability cluster involves eight configuration files:

1. myid, in the zkdata directory on each ZooKeeper node
2. zoo.cfg, in the zkdata directory on each ZooKeeper node
3. hadoop-env.sh, in the etc/hadoop directory under the Hadoop installation directory
4. core-site.xml, in the etc/hadoop directory under the Hadoop installation directory
5. hdfs-site.xml, in the etc/hadoop directory under the Hadoop installation directory
6. slaves, in the etc/hadoop directory under the Hadoop installation directory
7. yarn-site.xml, in the etc/hadoop directory under the Hadoop installation directory
8. mapred-site.xml, in the etc/hadoop directory under the Hadoop installation directory

Cluster plan:
zk1, zk2, zk3: ZooKeeper
hadoop4, hadoop5: NameNode
hadoop6, hadoop7: ResourceManager
hadoop8, hadoop9, hadoop10: DataNode, JournalNode, and NodeManager

Cluster setup approach and configuration file details

1. Set up the ZooKeeper cluster

Purpose: to provide automatic failover for the cluster (at least 3 ZooKeeper nodes, an odd number).

①myid

1

Create a zkdata directory on each ZooKeeper node and touch a file named myid in it containing a unique number (as shown above),
so that each ZooKeeper server has its own unique identity.
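
A minimal sketch of the commands on the first ZooKeeper node (assuming the /root/zkdata path used below; the other two nodes would write 2 and 3 respectively):

mkdir -p /root/zkdata
echo 1 > /root/zkdata/myid    # write "2" and "3" on the other two ZooKeeper nodes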

②zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/root/zkdata
clientPort=3001
server.1=<hostname1>:3002:3003
server.2=<hostname2>:4002:4003
server.3=<hostname3>:5002:5003

Create a zoo.cfg file in the zkdata directory with the contents shown above.
dataDir: the data directory
clientPort 3001: the ZooKeeper server's client port (the port client processes connect to)
3002: internal port used for atomic broadcast of data
3003: internal port used for fault-tolerant leader election
Atomic broadcast, as the ZooKeeper website explains it: a write request sent by a client to any one of the servers is synchronized across all servers.

At this point, the ZooKeeper cluster preparation is complete.

ZooKeeper cluster commands:

Run these from the bin directory under the ZooKeeper installation directory:

./zkServer.sh start /root/zkdata/zoo.cfg (start the server)
./zkServer.sh status /root/zkdata/zoo.cfg (check the status)

2. Set up the HDFS cluster

Preliminary work:

Set a static IP address: vim /etc/sysconfig/network-scripts/ifcfg-ens37
Set the hostname: vim /etc/hostname
Map IPs to hostnames: vim /etc/hosts
Configure passwordless SSH login: ssh-keygen -t rsa (generate the key pair), ssh-copy-id <hostname> (distribute the public key); see the sketch below
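
For illustration only, a possible /etc/hosts mapping for the hosts used in this article (the IP addresses are placeholders, not values from the original setup), followed by a loop that distributes the SSH public key:

192.168.1.11 zk1
192.168.1.12 zk2
192.168.1.13 zk3
192.168.1.14 hadoop4
192.168.1.15 hadoop5
192.168.1.16 hadoop6
192.168.1.17 hadoop7
192.168.1.18 hadoop8
192.168.1.19 hadoop9
192.168.1.20 hadoop10

for h in hadoop4 hadoop5 hadoop6 hadoop7 hadoop8 hadoop9 hadoop10; do ssh-copy-id $h; done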

③hadoop-env.sh

Purpose of hard-coding the JDK path: when Hadoop is started over a remote SSH session, the JAVA_HOME environment variable may not be picked up, so setting it here ensures remote invocations work correctly.

export JAVA_HOME=/path/to/your/jdk

④core-site.xml

The Hadoop core configuration file:

<!-- High-availability cluster configuration -->
<!-- File system entry point -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ns</value>
</property>
<!-- File system data directory -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/root/hadoop-2.9.2/data</value>
</property>
<!-- ZooKeeper ensemble hosts and ports -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:3001,zk2:4001,zk3:5001</value>
</property>

Because this is an HA cluster, the file system entry point does not name a single NameNode host; the custom nameservice name ns is used instead.
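
As a quick illustration (not part of the original article), clients then address the file system through the logical nameservice rather than a specific NameNode host, for example:

hdfs dfs -ls hdfs://ns/    # resolves to whichever NameNode is currently active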

⑤hdfs-site.xml

<!-- Set the HDFS nameservice to ns; must match core-site.xml -->
<property>
  <name>dfs.nameservices</name>
  <value>ns</value>
</property>

<!-- The nameservice ns has two NameNodes: nn1 and nn2 -->
<property>
  <name>dfs.ha.namenodes.ns</name>
  <value>nn1,nn2</value>
</property>

<!-- RPC address of nn1 -->
<property>
  <name>dfs.namenode.rpc-address.ns.nn1</name>
  <value>hadoop4:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
  <name>dfs.namenode.http-address.ns.nn1</name>
  <value>hadoop4:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
  <name>dfs.namenode.rpc-address.ns.nn2</name>
  <value>hadoop5:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
  <name>dfs.namenode.http-address.ns.nn2</name>
  <value>hadoop5:50070</value>
</property>

<!-- Where the NameNode's shared edit log is stored on the JournalNodes -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoop8:8485;hadoop9:8485;hadoop10:8485/ns</value>
</property>

<!-- Where each JournalNode stores its data on local disk -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/root/journal</value>
</property>

<!-- Enable automatic failover when the active NameNode fails -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- Failover proxy provider used by clients to find the active NameNode -->
<property>
  <name>dfs.client.failover.proxy.provider.ns</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- Fencing method; if SSH uses the default port 22, sshfence is enough -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<!-- sshfence requires passwordless SSH with this private key -->
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
<!-- Allow write operations from the web UI (disable permission checks) -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

⑥slaves

hadoop8
hadoop9
hadoop10

These three hosts serve both as DataNodes and as NodeManagers.

⑦yarn-site.xml

<!-- Enable ResourceManager HA -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<!-- Cluster id of the ResourceManager pair -->
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yrc</value>
</property>
<!-- Logical names of the two ResourceManagers -->
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<!-- Hostnames of the two ResourceManagers -->
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>hadoop6</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>hadoop7</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>hadoop6:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>hadoop7:8088</value>
</property>
<!-- ZooKeeper ensemble address -->
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1:3001,zk2:4001,zk3:5001</value>
</property>
<!-- Auxiliary service needed for the MapReduce shuffle -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

⑧mapred-site.xml

<!-- Run MapReduce on YARN -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<!-- Job history server RPC port -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop10:10020</value>
</property>
<!-- Job history server web UI port -->
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop10:19888</value>
</property>

Note: the Hadoop distribution does not include a mapred-site.xml file by default, so it has to be copied from the template manually with the following command:

cp hadoop-2.9.2/etc/hadoop/mapred-site.xml.template hadoop-2.9.2/etc/hadoop/mapred-site.xml

This creates the mapred-site.xml file.
At this point, the cluster configuration is complete.

3. Start the high-availability Hadoop cluster

Commands (run in this order):

☆☆☆☆ yum install psmisc -y (run on all nodes; a required dependency for the cluster on CentOS 7.x)
☆☆☆☆ hdfs zkfc -formatZK (run on either NameNode; formats the HA state in ZooKeeper)
☆☆☆☆ hadoop-daemon.sh start journalnode (run on every JournalNode host)

The JournalNodes are started first so that when the NameNode is formatted (its own metadata), the edit log can be synchronized to them.

☆☆☆☆ hdfs namenode -format ns (run on the NameNode chosen to be active)
☆☆☆☆ start-dfs.sh (start the HDFS file system)
☆☆☆☆ hdfs namenode -bootstrapStandby (run on the standby NameNode to copy over the formatted metadata)
☆☆☆☆ hadoop-daemon.sh start namenode (start the NameNode on the standby node)
☆☆☆☆ start-yarn.sh (run on the ResourceManager node chosen to be active)
☆☆☆☆ yarn-daemon.sh start resourcemanager (start the ResourceManager on the standby node)
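
One way to verify the processes (a sketch, not from the original article) is to run jps on each host; given the cluster plan above, you would expect roughly:

jps
# zk1-zk3:           QuorumPeerMain
# hadoop4, hadoop5:  NameNode, DFSZKFailoverController
# hadoop6, hadoop7:  ResourceManager
# hadoop8-hadoop10:  DataNode, JournalNode, NodeManager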

4. Test the cluster

Access:
ip:50070 (the HDFS NameNode web interface)
ip:8088 (the YARN ResourceManager web interface)
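
As a simple smoke test (a sketch, not part of the original article), check which NameNode is active and write a file through the nameservice:

hdfs haadmin -getServiceState nn1    # prints active or standby
hdfs dfs -mkdir -p /test
hdfs dfs -put /etc/hosts /test/
hdfs dfs -cat /test/hosts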

Other commands:

☆☆☆☆ mr-jobhistory-daemon.sh start historyserver (start the job history server)
☆☆☆☆ hadoop jar <jar name> (run a jar package to submit a job)

ip:19888 (the job history server web interface)
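
For example (a hypothetical invocation, assuming the examples jar bundled with Hadoop 2.9.2 sits at its usual location under the installation directory), a word count job can be submitted with the /test/hosts file from the smoke test above as input, and will then show up in the history server UI:

hadoop jar /root/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/hosts /test/wc-out
hdfs dfs -cat /test/wc-out/part-r-00000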
