Hadoop cluster (4) Hadoop upgrade

The cluster was previously running Hadoop 2.6 and is now being upgraded to version 2.7.

Note that HBase is running on this cluster, so HBase has to be stopped before the upgrade and started again afterwards.

For more installation steps, please refer to:

Hadoop cluster (1) Zookeeper construction

Hadoop cluster (2) HDFS construction

Hadoop cluster (3) Hbase build

The upgrade steps are as follows:

Cluster IP list

Namenode:
192.168.143.46
192.168.143.103
Journalnode:
192.168.143.101
192.168.143.102
192.168.143.103
Datanode & HBase regionserver:
192.168.143.196
192.168.143.231
192.168.143.182
192.168.143.235
192.168.143.41
192.168.143.127
HBase master:
192.168.143.103
192.168.143.101
Zookeeper:
192.168.143.101
192.168.143.102
192.168.143.103

1. First determine the path where Hadoop is installed, distribute the new release to the same path on each node, and unpack it.

# ll /usr/local/hadoop/
total 493244
drwxrwxr-x 9 root root      4096 Mar 21  2017 hadoop-release -> hadoop-2.6.0-EDH-0u1-SNAPSHOT-HA-SECURITY
drwxr-xr-x 9 root root      4096 Oct 11 11:06 hadoop-2.7.1
-rw-r--r-- 1 root root 194690531 Oct  9 10:55 hadoop-2.7.1.tar.gz
drwxrwxr-x 7 root root      4096 May 21  2016 hbase-1.1.3
-rw-r--r-- 1 root root 128975247 Apr 10  2017 hbase-1.1.3.tar.gz
lrwxrwxrwx 1 root root        29 Apr 10  2017 hbase-release -> /usr/local/hadoop/hbase-1.1.3

Since this is an upgrade, the configuration files stay completely unchanged: copy the etc/hadoop directory from the original hadoop-2.6.0 installation into hadoop-2.7.1, replacing the defaults shipped with the new release.
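
As a minimal sketch (assuming the paths shown in the listing above, where hadoop-release still points at the 2.6.0 tree), the configuration copy on one node could look like this:

# run as root on each node
cp -r /usr/local/hadoop/hadoop-release/etc/hadoop/* /usr/local/hadoop/hadoop-2.7.1/etc/hadoop/
# optional: review remaining differences (files that exist only in the 2.7.1 defaults will still show up)
diff -r /usr/local/hadoop/hadoop-release/etc/hadoop /usr/local/hadoop/hadoop-2.7.1/etc/hadoop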

At this point, the preparations before the upgrade have been completed.

 

Now the upgrade itself begins. Every command in the rest of this walkthrough is run from a single transit (jump) host through a shell script, which avoids logging in to each node over ssh by hand.
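
As an illustration only, a hedged sketch of such a wrapper script is shown below; the run_all.sh name and argument layout are hypothetical, not taken from the original setup:

#!/bin/bash
# run_all.sh - run one command on every node of the cluster as a given user
# usage: ./run_all.sh <user> "<command>"
NODES="192.168.143.46 192.168.143.103 192.168.143.101 192.168.143.102 \
       192.168.143.196 192.168.143.231 192.168.143.182 192.168.143.235 \
       192.168.143.41 192.168.143.127"
cmd="${2// /\\ }"    # escape spaces, matching the style of the ssh commands used throughout this post
for h in $NODES; do
    echo "== $h =="
    ssh -t -q "$h" sudo su -l "$1" -c "$cmd"
done

For example, ./run_all.sh hbase "jps" would check the Java processes of the hbase user on every node.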

## Stop HBase (executed as the hbase user)

2. Stop the HBase masters, executed as the hbase user

Status check: confirm which master is active, then stop the standby master first.

http://192.168.143.101:16010/master-status
master:
ssh -t -q 192.168.143.103  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ master"
ssh -t -q 192.168.143.103  sudo su -l hbase -c  "jps"
ssh -t -q 192.168.143.101  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ master"
ssh -t -q 192.168.143.101  sudo su -l hbase -c  "jps"

3. Stop the HBase regionservers, executed as the hbase user

ssh -t -q 192.168.143.196  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.231  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.182  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.235  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.41   sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.127  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"

Check operating status

ssh -t -q 192.168.143.196  sudo su -l hbase -c  "jps" 
ssh -t -q 192.168.143.231  sudo su -l hbase -c  "jps"
ssh -t -q 192.168.143.182  sudo su -l hbase -c  "jps"
ssh -t -q 192.168.143.235  sudo su -l hbase -c  "jps"
ssh -t -q 192.168.143.41   sudo su -l hbase -c  "jps"
ssh -t -q 192.168.143.127  sudo su -l hbase -c  "jps"

## Stop services: HDFS

4. First confirm which NameNode is active (check via the web UI). This NameNode will need to be started first later on.

https://192.168.143.46:50470/dfshealth.html#tab-overview

5. Stop the NameNodes, executed as the hdfs user

NN: stop the standby NameNode first

ssh -t -q 192.168.143.103  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ namenode"
ssh -t -q 192.168.143.46   sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ namenode"
Check status:
ssh -t -q 192.168.143.103  sudo su -l hdfs -c  "jps"
ssh -t -q 192.168.143.46   sudo su -l hdfs -c  "jps"

6. Stop the DataNodes, executed as the hdfs user

ssh -t -q 192.168.143.196  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.231  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.182  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.235  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.41   sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.127  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"

7. Stop the ZKFC processes, executed as the hdfs user

ssh -t -q 192.168.143.46   sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ zkfc"
ssh -t -q 192.168.143.103  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ zkfc"

8. Stop the JournalNodes, executed as the hdfs user

JN:
ssh -t -q 192.168.143.101  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ journalnode"
ssh -t -q 192.168.143.102  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ journalnode"
ssh -t -q 192.168.143.103  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ journalnode"

### Back up the NameNode data. Since this is a production environment, the existing data must be backed up so that a failed upgrade can be rolled back.

9. Back up namenode1

ssh -t -q 192.168.143.46  "cp -r /data1/dfs/name    /data1/dfs/name.bak.20171011-2;ls -al /data1/dfs/;du -sm /data1/dfs/*" 
ssh -t -q 192.168.143.46  "cp -r /data2/dfs/name    /data2/dfs/name.bak.20171011-2;ls -al /data2/dfs/;du -sm /data2/dfs/*"

10. Back up namenode2

ssh -t -q 192.168.143.103  "cp -r /data1/dfs/name    /data1/dfs/name.bak.20171011-2;ls -al /data1/dfs/;du -sm /data1/dfs/*"

11. Back up the JournalNode data

ssh -t -q 192.168.143.101  "cp -r /data1/journalnode   /data1/journalnode.bak.20171011;ls -al /data1/dfs/;du -sm /data1/*"
ssh -t -q 192.168.143.102  "cp -r /data1/journalnode   /data1/journalnode.bak.20171011;ls -al /data1/dfs/;du -sm /data1/*"
ssh -t -q 192.168.143.103  "cp -r /data1/journalnode   /data1/journalnode.bak.20171011;ls -al /data1/dfs/;du -sm /data1/*"

The journal path can be found in hdfs-site.xml:

dfs.journalnode.edits.dir:  
/data1/journalnode
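
Alternatively, the effective value can be read from the client configuration with hdfs getconf (a quick sketch, run as the hdfs user on one of the JournalNode hosts):

# prints the configured JournalNode edits directory, e.g. /data1/journalnode
/usr/local/hadoop/hadoop-release/bin/hdfs getconf -confKey dfs.journalnode.edits.dir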

### Upgrade steps

12. Copy the files (already handled in advance; see step 1)

Switch the symlink to the 2.7.1 release. In the template below, $h stands for each node's IP address:

ssh -t -q $h  "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"

13. Switch the symlink on every node, executed as the root user

ssh -t -q 192.168.143.46    "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.103    "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.101    "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.102    "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.196    "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.231    "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.182    "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.235    "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.41     "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.127    "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"

Confirm the result:

ssh -t -q 192.168.143.46     "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.103    "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.101    "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.102    "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.196    "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.231    "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.182    "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.235    "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.41     "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.127    "cd /usr/local/hadoop; ls -al"

### Start HDFS, executed as the hdfs user

14. Start the JournalNodes

JN:
ssh -t -q 192.168.143.101  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ journalnode"
ssh -t -q 192.168.143.102  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ journalnode"
ssh -t -q 192.168.143.103  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ journalnode"
Check status:
ssh -t -q 192.168.143.101  sudo su -l hdfs -c  "jps"
ssh -t -q 192.168.143.102  sudo su -l hdfs -c  "jps"
ssh -t -q 192.168.143.103  sudo su -l hdfs -c  "jps"

15. Start the first NameNode with the -upgrade option

ssh 192.168.143.46
su - hdfs
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start namenode -upgrade

16. Confirm its status; only after everything looks completely OK can the other NameNode be started

https://192.168.143.46:50470/dfshealth.html#tab-overview
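
Besides the web UI, a hedged alternative is to follow the NameNode log on 192.168.143.46 while the metadata upgrade runs; the path below assumes the default log directory under the Hadoop home and the standard hadoop-<user>-namenode-<host>.log naming:

# on 192.168.143.46, as the hdfs user
tail -f /usr/local/hadoop/hadoop-release/logs/hadoop-hdfs-namenode-$(hostname).log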

17. Start the first ZKFC

ssh 192.168.143.46
su - hdfs
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start zkfc

18. Start the second NameNode

ssh 192.168.143.103
su - hdfs
/usr/local/hadoop/hadoop-release/bin/hdfs namenode -bootstrapStandby
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start namenode

19. Start the second ZKFC

ssh 192.168.143.103
su - hdfs
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start zkfc

20. Start the DataNodes

ssh -t -q 192.168.143.196  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.231  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.182  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.235  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.41   sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.127  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"

Confirm the status:

ssh -t -q 192.168.143.196  sudo su -l hdfs -c  "jps"
ssh -t -q 192.168.143.231  sudo su -l hdfs -c  "jps"
ssh -t -q 192.168.143.182  sudo su -l hdfs -c  "jps"
ssh -t -q 192.168.143.235  sudo su -l hdfs -c  "jps"
ssh -t -q 192.168.143.41   sudo su -l hdfs -c  "jps"
ssh -t -q 192.168.143.127  sudo su -l hdfs -c  "jps"
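
Besides jps on each node, an aggregate check from any node with an HDFS client is dfsadmin -report (a sketch, run as the hdfs user); it should show all six DataNodes as live:

/usr/local/hadoop/hadoop-release/bin/hdfs dfsadmin -report | grep -i "live datanodes"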

21. Once everything is healthy, start HBase, executed as the hbase user

Start the HBase masters; it is best to start the previously active master first.

ssh -t -q 192.168.143.101  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ master"
ssh -t -q 192.168.143.103  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ master"

Start the HBase regionservers

ssh -t -q 192.168.143.196  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.231  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.182  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.235  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.41   sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.127  sudo su -l hbase -c  "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"

22. The HBase region balancer has to be switched on and off manually

Log in to the HBase shell and run the following commands.

Enable:

balance_switch true

Disable:

balance_switch false
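
For reference, a typical session looks like the following (the hbase(main) prompt lines are illustrative); balance_switch prints the previous state of the balancer when it is toggled:

# as the hbase user on one of the master nodes
/usr/local/hadoop/hbase-release/bin/hbase shell
hbase(main):001:0> balance_switch true     # re-enable automatic region balancing
hbase(main):002:0> balance_switch false    # switch it off again if needed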

 

23. Do not finalize the upgrade this time. Let the cluster run for a week to make sure it is stable, and only then run the finalize step.

Note: during this period disk usage may grow quickly. Some of that space is released once the finalize step has been executed.

Finalize upgrade: hdfs dfsadmin -finalizeUpgrade
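
When that week has passed, the finalize step can be run from the transit host in the same style as the other commands (a sketch, assuming the hdfs user and the paths used throughout this post):

ssh -t -q 192.168.143.46  sudo su -l hdfs -c  "/usr/local/hadoop/hadoop-release/bin/hdfs\ dfsadmin\ -finalizeUpgrade"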

Reference: http://blog.51cto.com/hsbxxl/1976472

 
