The cluster was originally running Hadoop 2.6 and is now being upgraded to 2.7. Note that HBase runs on this cluster, so HBase must be stopped before the upgrade and started again afterwards.
For detailed installation steps, see:
Hadoop cluster (1): ZooKeeper setup
Hadoop cluster (2): HDFS setup
Hadoop cluster (3): HBase setup
The upgrade steps are as follows:
Cluster IP list:

```
Namenode:
192.168.143.46
192.168.143.103

Journalnode:
192.168.143.101
192.168.143.102
192.168.143.103

Datanode & HBase regionserver:
192.168.143.196
192.168.143.231
192.168.143.182
192.168.143.235
192.168.143.41
192.168.143.127

HBase master:
192.168.143.103
192.168.143.101

Zookeeper:
192.168.143.101
192.168.143.102
192.168.143.103
```
1. First determine the path where Hadoop runs, distribute the new version of the software to that path on each node, and unpack it.
```shell
# ll /usr/local/hadoop/
total 493244
drwxrwxr-x 9 root root      4096 Mar 21  2017 hadoop-release -> hadoop-2.6.0-EDH-0u1-SNAPSHOT-HA-SECURITY
drwxr-xr-x 9 root root      4096 Oct 11 11:06 hadoop-2.7.1
-rw-r--r-- 1 root root 194690531 Oct  9 10:55 hadoop-2.7.1.tar.gz
drwxrwxr-x 7 root root      4096 May 21  2016 hbase-1.1.3
-rw-r--r-- 1 root root 128975247 Apr 10  2017 hbase-1.1.3.tar.gz
lrwxrwxrwx 1 root root        29 Apr 10  2017 hbase-release -> /usr/local/hadoop/hbase-1.1.3
```
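The distribution step above can be scripted from a transit machine. A minimal sketch, assuming the tarball sits in the current directory and passwordless ssh to the nodes listed earlier; as written it only prints the scp/tar commands (pipe the output to `sh` to actually run them):

```shell
#!/bin/sh
# Dry run: print the distribution commands for every node instead of
# executing them. Pipe the output to sh to execute for real.
NODES="192.168.143.46 192.168.143.103 192.168.143.101 192.168.143.102
192.168.143.196 192.168.143.231 192.168.143.182 192.168.143.235
192.168.143.41 192.168.143.127"
for h in $NODES; do
    echo "scp hadoop-2.7.1.tar.gz $h:/usr/local/hadoop/"
    echo "ssh $h 'cd /usr/local/hadoop && tar -xzf hadoop-2.7.1.tar.gz'"
done
```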
Since this is an upgrade, the configuration files are completely unchanged: copy the etc/hadoop directory from the original hadoop-2.6.0 tree into hadoop-2.7.1, replacing the defaults there.
At this point, the preparation for the upgrade is complete.
Now the upgrade itself begins. The whole process is driven from a transit machine through a shell script, which avoids repeatedly logging in to each node over ssh.
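All of the remote commands below follow the same `ssh -t -q HOST sudo su -l USER -c "..."` pattern, so a small wrapper keeps the transit-machine script readable. A minimal sketch (the `run_as` name is my own; as written it only prints the command it would run):

```shell
#!/bin/sh
# run_as USER HOST CMD...: run CMD on HOST as USER via sudo su.
# For illustration this echoes the full command instead of executing it;
# drop the leading "echo" to run it for real.
run_as() {
    user="$1"; host="$2"; shift 2
    echo ssh -t -q "$host" sudo su -l "$user" -c "\"$*\""
}

run_as hdfs 192.168.143.46 jps
```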
## Stop HBase (run as the hbase user)
2. Stop the HBase masters. Check the status page to confirm which master is active, and stop the standby master first:
```
http://192.168.143.101:16010/master-status
```
```shell
# master:
ssh -t -q 192.168.143.103 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ master"
ssh -t -q 192.168.143.103 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.101 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ master"
ssh -t -q 192.168.143.101 sudo su -l hbase -c "jps"
```
3. Stop the HBase regionservers (run as the hbase user):
```shell
ssh -t -q 192.168.143.196 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.231 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.182 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.235 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.41 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
ssh -t -q 192.168.143.127 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ stop\ regionserver"
```
Check the running status:
```shell
ssh -t -q 192.168.143.196 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.231 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.182 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.235 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.41 sudo su -l hbase -c "jps"
ssh -t -q 192.168.143.127 sudo su -l hbase -c "jps"
```
## Stop HDFS
4. First confirm in the web UI which namenode is active; that namenode must be the first one started later.
```
https://192.168.143.46:50470/dfshealth.html#tab-overview
```
5. Stop the NameNodes (run as the hdfs user). Stop the standby namenode first:
```shell
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ namenode"
ssh -t -q 192.168.143.46 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ namenode"
# Check status:
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.46 sudo su -l hdfs -c "jps"
```
6. Stop the DataNodes (run as the hdfs user):
```shell
ssh -t -q 192.168.143.196 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.231 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.182 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.235 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.41 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
ssh -t -q 192.168.143.127 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ datanode"
```
7. Stop the ZKFC processes (run as the hdfs user):
```shell
ssh -t -q 192.168.143.46 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ zkfc"
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ zkfc"
```
8. Stop the JournalNodes (run as the hdfs user):
```shell
# JN:
ssh -t -q 192.168.143.101 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ journalnode"
ssh -t -q 192.168.143.102 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ journalnode"
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ stop\ journalnode"
```
### Back up the NameNode data
Since this is a production environment, the existing data must be backed up so that a failed upgrade can be rolled back.
9. Back up namenode1:
```shell
ssh -t -q 192.168.143.46 "cp -r /data1/dfs/name /data1/dfs/name.bak.20171011-2; ls -al /data1/dfs/; du -sm /data1/dfs/*"
ssh -t -q 192.168.143.46 "cp -r /data2/dfs/name /data2/dfs/name.bak.20171011-2; ls -al /data2/dfs/; du -sm /data2/dfs/*"
```
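A quick way to sanity-check that a backup copy is complete is to compare the total size of the source and the copy. A generic sketch (the `same_size` helper is mine, not part of the original procedure):

```shell
#!/bin/sh
# same_size DIR1 DIR2: succeed if both directories occupy the same
# total size according to du.
same_size() {
    a=$(du -s "$1" | cut -f1)
    b=$(du -s "$2" | cut -f1)
    [ "$a" = "$b" ]
}

# Example: same_size /data1/dfs/name /data1/dfs/name.bak.20171011-2
```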
10. Back up namenode2:
```shell
ssh -t -q 192.168.143.103 "cp -r /data1/dfs/name /data1/dfs/name.bak.20171011-2; ls -al /data1/dfs/; du -sm /data1/dfs/*"
```
11. Back up the journal data:
```shell
ssh -t -q 192.168.143.101 "cp -r /data1/journalnode /data1/journalnode.bak.20171011; ls -al /data1/; du -sm /data1/*"
ssh -t -q 192.168.143.102 "cp -r /data1/journalnode /data1/journalnode.bak.20171011; ls -al /data1/; du -sm /data1/*"
ssh -t -q 192.168.143.103 "cp -r /data1/journalnode /data1/journalnode.bak.20171011; ls -al /data1/; du -sm /data1/*"
```
The journal path can be found in hdfs-site.xml:
```
dfs.journalnode.edits.dir:
/data1/journalnode
```
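For reference, the corresponding entry in hdfs-site.xml would look like this (the value is taken from this cluster):

```xml
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data1/journalnode</value>
</property>
```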
### Upgrade
12. Copy the files (already done in advance; see step 1) and switch the symlink to the 2.7.1 version.
```shell
# Template, where $h is each node's IP:
ssh -t -q $h "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
```
13. Switch the symlink on every node (run as root):
```shell
ssh -t -q 192.168.143.46 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.103 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.101 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.102 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.196 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.231 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.182 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.235 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.41 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
ssh -t -q 192.168.143.127 "cd /usr/local/hadoop; rm hadoop-release; ln -s hadoop-2.7.1 hadoop-release"
```
Confirm the result:
```shell
ssh -t -q 192.168.143.46 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.103 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.101 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.102 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.196 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.231 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.182 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.235 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.41 "cd /usr/local/hadoop; ls -al"
ssh -t -q 192.168.143.127 "cd /usr/local/hadoop; ls -al"
```
### Start HDFS (run as the hdfs user)
14. Start the JournalNodes:
```shell
# JN:
ssh -t -q 192.168.143.101 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ journalnode"
ssh -t -q 192.168.143.102 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ journalnode"
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ journalnode"
```
Check the status:

```shell
ssh -t -q 192.168.143.101 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.102 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.103 sudo su -l hdfs -c "jps"
```
15. Start the first NameNode (with the -upgrade option):
```shell
ssh 192.168.143.46
su - hdfs
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start namenode -upgrade
```
16. Confirm the status; only when everything is fully OK can the other namenode be started:
```
https://192.168.143.46:50470/dfshealth.html#tab-overview
```
17. Start the first ZKFC:
```shell
# On 192.168.143.46:
su - hdfs
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start zkfc
```
18. Start the second NameNode:
```shell
ssh 192.168.143.103
su - hdfs
/usr/local/hadoop/hadoop-release/bin/hdfs namenode -bootstrapStandby
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start namenode
```
19. Start the second ZKFC:
```shell
ssh 192.168.143.103
su - hdfs
/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh start zkfc
```
20. Start the DataNodes:
```shell
ssh -t -q 192.168.143.196 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.231 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.182 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.235 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.41 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
ssh -t -q 192.168.143.127 sudo su -l hdfs -c "/usr/local/hadoop/hadoop-release/sbin/hadoop-daemon.sh\ start\ datanode"
```
Confirm the status:
```shell
ssh -t -q 192.168.143.196 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.231 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.182 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.235 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.41 sudo su -l hdfs -c "jps"
ssh -t -q 192.168.143.127 sudo su -l hdfs -c "jps"
```
21. Once everything is normal, start HBase (run as the hbase user).
Start the HBase masters; it is best to start the previously active master first.
```shell
ssh -t -q 192.168.143.101 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ master"
ssh -t -q 192.168.143.103 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ master"
```
Start the HBase regionservers:
```shell
ssh -t -q 192.168.143.196 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.231 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.182 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.235 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.41 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
ssh -t -q 192.168.143.127 sudo su -l hbase -c "/usr/local/hadoop/hbase-release/bin/hbase-daemon.sh\ start\ regionserver"
```
22. The HBase region balancer must be enabled and disabled manually. Log in to the HBase shell and run:
Enable: balance_switch true
Disable: balance_switch false
23. Not executed this time: let the system run for a week to make sure it is stable before finalizing.
Note: during this period, disk usage may grow quickly; part of the space is released once the finalize is executed.
Finalize the upgrade: hdfs dfsadmin -finalizeUpgrade
http://blog.51cto.com/hsbxxl/1976472