"Attack on Big Data" series of tutorials on Hadoop to build HA high-availability clusters

Table of Contents

1. HDFS HA cluster planning

2. Modify hdfs-site and core-site configuration

3. Synchronize the namenode EditsLog data

4. Configure automatic failover

5. Kill the active namenode to verify failover

6. The role of ZooKeeper in HA

7. Clean up residual data

8. Configure a YARN (ResourceManager) cluster


1. HDFS HA cluster planning

2. Modify hdfs-site and core-site configuration

(1) Stop the cluster with stop-dfs.sh, then back up the core-site.xml and hdfs-site.xml HDFS configuration files on all three machines (a backup sketch follows below)
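A minimal sketch of this step, assuming Hadoop is installed under ~/bigdata/hadoop-2.7.5 as elsewhere in this tutorial; the .bak file names are only an illustration:

~/bigdata/hadoop-2.7.5/sbin/stop-dfs.sh

# run on each of master, slave1 and slave2
cd ~/bigdata/hadoop-2.7.5/etc/hadoop
cp core-site.xml core-site.xml.bak
cp hdfs-site.xml hdfs-site.xml.bak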

(2) Add the following configuration to hdfs-site.xml on the master node

<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
    <description>Logical name of the HDFS nameservice; can be any value</description>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
    <description>Unique identifier of each namenode in the cluster</description>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master:8020</value>
    <description>RPC address and port of namenode nn1</description>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>slave1:8020</value>
    <description>RPC address and port of namenode nn2</description>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>master:50070</value>
    <description>HTTP (web UI) address and port of namenode nn1</description>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>slave1:50070</value>
    <description>HTTP (web UI) address and port of namenode nn2</description>
</property>

<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
    <description>Identifies the journal node group</description>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop-twq/bigdata/dfs/journal/data</value>
    <description>Directory where the journal nodes store their data</description>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description>Used by Java clients to locate and connect to the active namenode</description>
</property>

<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
    <description>Fencing method that ensures a failed namenode no longer serves requests</description>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop-twq/.ssh/id_dsa</value>
    <description>SSH private key used by the sshfence fencing method</description>
</property>

(3) Configure passwordless SSH and install fuser

On slave1, as the hadoop-twq user, configure passwordless SSH login to master:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

scp ~/.ssh/authorized_keys hadoop-twq@master:~/.ssh/

If ssh master no longer asks for a password, the configuration is successful.

As the root user, install fuser (provided by the psmisc package) on master and slave1, since the sshfence fencing method depends on it:

yum -y install psmisc

(4) Configure core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
    <description>Default (base) HDFS path</description>
</property>

Copy core-site.xml and hdfs-site.xml to slave1 and slave2 

scp  core-site.xml  hdfs-site.xml  hadoop-twq@slave1:~/bigdata/hadoop-2.7.5/etc/hadoop/

scp  core-site.xml  hdfs-site.xml  hadoop-twq@slave2:~/bigdata/hadoop-2.7.5/etc/hadoop/

3. Synchronize the namenode EditsLog data

(1) Execute on master, slave1 and slave2:

~/bigdata/hadoop-2.7.5/sbin/hadoop-daemon.sh  start  journalnode

(2) Execute on master:

scp -r ~/bigdata/dfs/name/ hadoop-twq@slave1:~/bigdata/dfs/

(3) If you are converting an existing non-HA HDFS cluster to an HA cluster, execute:

Execute on slave1:

~/bigdata/hadoop-2.7.5/bin/hdfs namenode -bootstrapStandby  => synchronizes the metadata of the two namenodes

Execute on master:

hdfs namenode -initializeSharedEdits  => initializes the journal node data

If this is a newly built HA HDFS cluster, then execute:

Execute on master:

hdfs namenode -format  => formatting the namenode is all that is needed

(4) Restart the hdfs cluster

start-dfs.sh

4. Configure automatic failover

In a production environment, automatic failover of the active namenode must be configured. This relies on a ZooKeeper cluster; building the ZooKeeper cluster itself is omitted here, and a configuration sketch follows below.
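A minimal sketch of the automatic-failover configuration, assuming a ZooKeeper ensemble is already running on master, slave1 and slave2 on the default port 2181 (these hosts and the port are assumptions, not taken from the original article).

Add to hdfs-site.xml:

<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>Enable automatic failover through the ZKFC processes</description>
</property>

Add to core-site.xml:

<property>
    <name>ha.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
    <description>ZooKeeper ensemble used for failover coordination</description>
</property>

After distributing the configuration, initialize the failover state in ZooKeeper and restart HDFS; with automatic failover enabled, start-dfs.sh also starts the ZKFC processes on the namenode hosts:

stop-dfs.sh
hdfs zkfc -formatZK     # run once, on one of the namenode hosts (e.g. master)
start-dfs.sh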

5. Kill the active namenode to verify failover

Visit http://master:50070 to see that the current node is active

Visit http://slave1:50070 to see that the current node is in standby state

On master, use jps to find the process ID of the namenode, then run kill -9 <process id> to kill it.

Restart the namenode on master (hadoop-daemon.sh start namenode), then visit http://master:50070 again; the master node is now in standby state.

Visit http://slave1:50070 again and observe that slave1 has been automatically elected active by ZooKeeper.
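The same verification can also be done from the command line; a minimal sketch, using the namenode IDs nn1 and nn2 defined in hdfs-site.xml above:

# on master: find the namenode process ID and kill it
jps | grep NameNode
kill -9 <namenode process id>

# query the HA state of each namenode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2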

6. The role of ZooKeeper in HA

7. Clean up residual data

When restoring the HA cluster to a single-namenode, non-HA state, some residual data needs to be cleaned up, as sketched below.
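A minimal cleanup sketch, assuming the journal data directory configured above and the configuration backups made in step 2; the exact files to remove depend on your setup:

stop-dfs.sh

# on each machine: remove the journal node data
rm -rf /home/hadoop-twq/bigdata/dfs/journal/data

# restore the backed-up non-HA configuration
cd ~/bigdata/hadoop-2.7.5/etc/hadoop
cp core-site.xml.bak core-site.xml
cp hdfs-site.xml.bak hdfs-site.xml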

8. Configure a YARN (ResourceManager) cluster

In yarn-site.xml, comment out the original yarn.resourcemanager.hostname and yarn.resourcemanager.address properties and add the ResourceManager HA configuration (a sketch follows below).
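A minimal yarn-site.xml sketch for ResourceManager HA, assuming the two ResourceManagers run on master and slave1 and reuse the same ZooKeeper ensemble; the hostnames, rm IDs and cluster id are assumptions, not from the original article:

<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave1</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
</property>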

Check the status of the YARN ResourceManager:

yarn rmadmin -getServiceState rm1

If it prints active, the rm1 ResourceManager is currently the active (elected) node.

 

 

Origin: blog.csdn.net/qq_31905135/article/details/111386523