In this 11gR2 cluster the voting disk and the OCR live in the same disk group, so they are recovered together.
List the existing backups
[grid@rac1 admin]$ ocrconfig -showbackup
rac1 2019/05/08 12:27:51 /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
rac1 2019/05/07 15:32:37 /u01/app/11.2.0/grid/cdata/rac-cluster/backup01.ocr
rac1 2019/05/07 15:32:37 /u01/app/11.2.0/grid/cdata/rac-cluster/day.ocr
rac1 2019/05/07 15:32:37 /u01/app/11.2.0/grid/cdata/rac-cluster/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
[grid@rac1 admin]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 88e9adcf20db4f88bfba7ac8848ff68b (ORCL:VDKBACK) [OCR]
Located 1 voting disk(s).
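The restore later on normally wants the most recent backup. A minimal sketch for picking it from the listing, assuming the four-column layout (node, date, time, path) shown above; `newest_ocr_backup` is a hypothetical helper name, not an Oracle tool:

```shell
#!/usr/bin/env bash
# newest_ocr_backup (hypothetical helper): reads `ocrconfig -showbackup`
# output on stdin and prints the path of the most recent backup.
# Lines that are not "<node> <YYYY/MM/DD> <HH:MM:SS> </path>" (for example
# the PROT-25 message) are skipped.
newest_ocr_backup() {
  awk 'NF == 4 && $2 ~ /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ && $4 ~ /^\// {
         print $2, $3, $4
       }' |
    LC_ALL=C sort -r |   # date and time sort correctly as plain text
    head -n 1 |
    cut -d' ' -f3
}

# Intended use on a cluster node:
#   ocrconfig -showbackup | newest_ocr_backup
```

When two backups carry the same timestamp (as backup01.ocr, day.ocr and week.ocr do above), the sketch simply returns whichever sorts last by path.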
Stop the database
[grid@rac1 admin]$ srvctl stop database -d racdb -o immediate
Stop the cluster
[root@rac1 ~]# crsctl stop cluster -all -f
Find the physical device behind the ASM disk used by the OCR disk group
[grid@rac1 admin]$ oracleasm querydisk -d VDKBACK
Disk "VDKBACK" is a valid ASM disk on device /dev/sdf1[8,81]
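Because the next step overwrites a raw device, it is safer to extract the device name from the querydisk output than to type it by hand. A sketch, assuming the output has exactly the form shown above; `asm_disk_device` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# asm_disk_device (hypothetical helper): reads `oracleasm querydisk -d <DISK>`
# output on stdin and prints the device path that precedes "[major,minor]".
# Prints nothing when the line does not match the expected format.
asm_disk_device() {
  sed -n 's/.* on device \([^[]*\)\[.*/\1/p'
}

# Intended use:
#   oracleasm querydisk -d VDKBACK | asm_disk_device
```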
Simulate the damage by zeroing the first 1 MB of the device
[root@rac1 ~]# dd if=/dev/zero of=/dev/sdf1 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.006116 seconds, 171 MB/s
Start the cluster
crsctl start cluster -all
The command hangs and only exits after a long time
[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac1 ~]# crsctl start cluster -all
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac2'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2674: Start of 'ora.diskmon' on 'rac1' failed
CRS-2679: Attempting to clean 'ora.diskmon' on 'rac1'
CRS-2674: Start of 'ora.diskmon' on 'rac2' failed
CRS-2679: Attempting to clean 'ora.diskmon' on 'rac2'
CRS-2681: Clean of 'ora.diskmon' on 'rac1' succeeded
CRS-2681: Clean of 'ora.diskmon' on 'rac2' succeeded
CRS-4404: The following nodes did not reply within the allotted time:
rac1, rac2
Starting CRS directly with crsctl start crs does not help either
[root@rac1 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
Check the logs
more /var/log/messages shows nothing abnormal
more $ORACLE_HOME/log/rac1/cssd/ocssd.log shows errors about the voting file
[root@rac1 ~]# tail -30 $ORACLE_HOME/log/rac1/cssd/ocssd.log
2019-05-08 16:34:29.389: [ CLSF][1146157376]checksum failed for disk:ORCL:VDKOCR1:
2019-05-08 16:34:29.389: [ CLSF][1146157376]Read ASM header off dev:ORCL:VDKOCR1:0:0
2019-05-08 16:34:29.389: [ SKGFD][1146157376]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x72f7190 for disk :ORCL:VDKOCR1:
2019-05-08 16:34:29.389: [ CLSF][1146157376]Read ASM header off dev:ORCL:VDKOCR2:0:0
2019-05-08 16:34:29.390: [ SKGFD][1146157376]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x72f7b30 for disk :ORCL:VDKOCR2:
2019-05-08 16:34:29.401: [ CLSF][1146157376]Read ASM header off dev:ORCL:VDKVOTE:0:0
2019-05-08 16:34:29.401: [ SKGFD][1146157376]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x72f84d0 for disk :ORCL:VDKVOTE:
2019-05-08 16:34:29.401: [ CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [ CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [ CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [ CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [ CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [ CSSD][1146157376]clssnmvDiskVerify: Successful discovery of 0 disks
2019-05-08 16:34:29.401: [ CSSD][1146157376]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2019-05-08 16:34:29.401: [ CSSD][1146157376]clssnmvFindInitialConfigs: No voting files found
2019-05-08 16:34:29.401: [ CSSD][1146157376]###################################
2019-05-08 16:34:29.401: [ CSSD][1146157376]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread
2019-05-08 16:34:29.401: [ CSSD][1146157376]###################################
2019-05-08 16:34:29.401: [ CSSD][1146157376]
----- Call Stack Trace -----
2019-05-08 16:34:29.401: [ CSSD][1135667520]clssgmClientShutdown: total iocapables 0
2019-05-08 16:34:29.401: [ CSSD][1135667520]clssgmClientShutdown: graceful shutdown completed.
2019-05-08 16:34:29.401: [ CSSD][1146157376]calling call entry argument values in hex
2019-05-08 16:34:29.402: [ CSSD][1146157376]location type point (? means dubious value)
2019-05-08 16:34:29.402: [ CSSD][1146157376]-------------------- -------- -------------------- ----------------------------
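The two messages that matter in this log are "file is not a voting file" (one per disk CSSD rejected) and "No voting files found". A small sketch for pulling them out of a long ocssd.log; both function names are hypothetical and the patterns are taken from the log lines above:

```shell
#!/usr/bin/env bash
# Both helpers read an ocssd.log on stdin.

# count_rejected_disks: number of disks CSSD inspected and rejected as
# voting files during discovery.
count_rejected_disks() {
  grep -c 'clssnmvDiskVerify: file is not a voting file'
}

# voting_file_discovery_failed: succeeds (exit 0) when CSSD reported that
# it found no voting file at all.
voting_file_discovery_failed() {
  grep -q 'clssnmvFindInitialConfigs: No voting files found'
}

# Intended use:
#   tail -200 $ORACLE_HOME/log/rac1/cssd/ocssd.log | count_rejected_disks
```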
[root@rac1 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@rac1 ~]# /etc/init.d/oracleasm listdisks
VDKDATA
VDKOCR1
VDKOCR2
VDKVOTE
The ASM disk VDKBACK, which backs the OCR disk group, is missing from the list
Recreate it
[root@rac1 ~]# /usr/sbin/oracleasm createdisk VDKBACK /dev/sdf1
Writing disk header: done
Instantiating disk: done
Rescan on both nodes
[root@rac2 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@rac2 ~]# /etc/init.d/oracleasm listdisks
VDKBACK
VDKDATA
VDKOCR1
VDKOCR2
VDKVOTE
Stop the cluster stack
[root@rac1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
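Start the stack in exclusive mode on one node, which brings up the ASM instance. The intent was -excl -nocrs (start ASM but not CRSD); the command actually run was plain -excl, and the output below shows ora.crsd being started. As far as I know -nocrs is only accepted from 11.2.0.2 on, and this system is 11.2.0.1.
[root@rac1 ~]# crsctl start crs -excl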
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2679: Attempting to clean 'ora.diskmon' on 'rac1'
CRS-2681: Clean of 'ora.diskmon' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
[root@rac1 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[root@rac1 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
Recreate the disk group that held the OCR and voting disk.
Note: this is done as the grid user
[grid@rac1 admin]$ sqlplus "/as sysasm"
SQL*Plus: Release 11.2.0.1.0 Production on Wed May 8 17:03:35 2019
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> col path for a50
SQL> select path,header_status from v$asm_disk;
PATH HEADER_STATU
-------------------------------------------------- ------------
ORCL:VDKBACK PROVISIONED
ORCL:VDKDATA MEMBER
ORCL:VDKVOTE MEMBER
ORCL:VDKOCR2 MEMBER
ORCL:VDKOCR1 MEMBER
SQL> create diskgroup OCR EXTERNAL REDUNDANCY DISK 'ORCL:VDKBACK' ;
Diskgroup created.
With the OCR disk group created, restore the OCR backup. The restore fails:
[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup01.ocr
PROT-16: Internal Error
[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
PROT-16: Internal Error
[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
PROT-16: Internal Error
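A web search turned up the following: check the disk group compatibility, then recreate the disk group with the compatibility attributes set explicitly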
SQL> select name ,COMPATIBILITY from v$asm_diskgroup;
[grid@rac1 admin]$ sqlplus "/as sysasm"
SQL*Plus: Release 11.2.0.1.0 Production on Wed May 8 17:38:49 2019
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> drop diskgroup OCR;
Diskgroup dropped.
SQL> create diskgroup OCR EXTERNAL REDUNDANCY DISK 'ORCL:VDKBACK' attribute 'compatible.rdbms' = '11.2.0.0.0','compatible.asm' = '11.2.0.0.0';
Diskgroup created.
After that, ocrconfig -restore of backup00.ocr succeeds (run as root):
[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
[root@rac1 ~]#
Pitfall ahead {------------------------------------ note: this caught me twice
[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
PROT-16: Internal Error
The two variants of the disk group creation that I tried:
create diskgroup OCR EXTERNAL REDUNDANCY DISK 'ORCL:VDKBACK' attribute 'compatible.rdbms' = '11.2.0.0.0','compatible.asm' = '11.2.0.0.0';
create diskgroup OCR EXTERNAL REDUNDANCY DISK 'ORCL:VDKBACK' attribute 'compatible.rdbms' = '11.1.0.0.0','compatible.asm' = '11.1.0.0.0';
The PROT-16 error goes away when 'compatible.asm' = '11.1.0.0.0' is set,
but the next step then demands 11.2.0.0.0 again:
[grid@rac1 ~]$ crsctl replace votedisk +OCR
Failed to create voting files on disk group OCR.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.
The ASM alert log, /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log, gives the reason for the failed crsctl replace votedisk +OCR:
NOTE: Voting File refresh pending for group 1/0x667e2acc (OCR)
NOTE: Attempting voting file creation in diskgroup OCR
ERROR: Voting file allocation failed for group OCR
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_29249.trc:
ORA-15221: ASM operation requires compatible.asm of 11.2.0.0.0 or higher
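In the end I had to run ocrconfig -import /u01/ocr.exp to get around the PROT-16: Internal Error that ocrconfig -restore kept raising.
Some write-ups say that Oracle officially supports only the restore method and that recovering from an export is no longer ... (I forget the exact wording). That does not look like an official statement to me; Oracle is fairly cautious, it just has its share of bugs, and those get patched.
Only after that did crsctl replace votedisk +OCR succeed.
--------------------------------------------- end of pitfall description}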
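The ORA-15221 message suggests a check worth making before calling crsctl replace votedisk: compare the disk group's compatible.asm value (the COMPATIBILITY column of v$asm_diskgroup queried earlier) against 11.2.0.0.0. A minimal sketch of such a comparison in plain bash; `version_ge` is a hypothetical helper, and how the value gets from SQL*Plus into the shell is left out:

```shell
#!/usr/bin/env bash
# version_ge A B (hypothetical helper): succeeds when dotted version A is
# greater than or equal to B, comparing numerically field by field.
# Missing trailing fields count as 0.
version_ge() {
  local -a a b
  local i n x y
  IFS=. read -r -a a <<< "$1"
  IFS=. read -r -a b <<< "$2"
  n=$(( ${#a[@]} > ${#b[@]} ? ${#a[@]} : ${#b[@]} ))
  for (( i = 0; i < n; i++ )); do
    x=${a[i]:-0}
    y=${b[i]:-0}
    # 10# forces base 10 so a field such as "08" is not read as octal
    if (( 10#$x > 10#$y )); then return 0; fi
    if (( 10#$x < 10#$y )); then return 1; fi
  done
  return 0
}

# Example: the value that led to ORA-15221 in the alert log above
if version_ge "11.1.0.0.0" "11.2.0.0.0"; then
  echo "compatible.asm is high enough for voting files"
else
  echo "compatible.asm is below 11.2.0.0.0"
fi
```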
Replace the voting disk, this time successfully:
[root@rac1 ~]# crsctl replace votedisk +OCR
Successful addition of voting disk 066b704e38164f4ebf2417cbf2caaa26.
Successfully replaced voting disk group with +OCR.
CRS-4266: Voting file(s) successfully replaced
Once the OCR and voting disk are restored, CRS and the other services come up on their own.
Verify with:
ocrcheck
crsctl query css votedisk
[root@rac1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2712
Available space (kbytes) : 259408
ID : 1749057863
Device/File Name : +OCR
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
[root@rac1 ~]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 066b704e38164f4ebf2417cbf2caaa26 (ORCL:VDKBACK) [OCR]
Located 1 voting disk(s).
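For a scripted check after recovery, the number of ONLINE voting files can be counted from this output. A sketch, assuming the column layout shown above (index, STATE, id, file name, disk group); `online_voting_files` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# online_voting_files (hypothetical helper): reads `crsctl query css votedisk`
# output on stdin and prints how many rows have STATE equal to ONLINE.
online_voting_files() {
  awk '$2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Intended use:
#   crsctl query css votedisk | online_voting_files
```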
Finally, start the cluster on all nodes and check the resources:
crsctl start cluster -all
crs_stat -t
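To recap, the sequence that worked in this walkthrough can be sketched as a dry-run helper that only prints the commands, so they can be reviewed before anything is executed. `recovery_plan` and its three parameters are hypothetical, and the bracketed prefixes only indicate which user ran each command here; the commands themselves are the ones used above:

```shell
#!/usr/bin/env bash
# recovery_plan DISKGROUP ASMDISK BACKUP (hypothetical helper): prints, but
# never runs, the recovery commands from this walkthrough in order.
recovery_plan() {
  local dg=$1 disk=$2 backup=$3
  cat <<EOF
[root] crsctl stop crs -f
[root] crsctl start crs -excl
[grid, sqlplus / as sysasm] create diskgroup ${dg} EXTERNAL REDUNDANCY DISK '${disk}' attribute 'compatible.rdbms' = '11.2.0.0.0','compatible.asm' = '11.2.0.0.0';
[root] ocrconfig -restore ${backup}
[root] crsctl replace votedisk +${dg}
[root] crsctl start cluster -all
EOF
}

recovery_plan OCR ORCL:VDKBACK /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
```

If ocrconfig -restore still fails with PROT-16, the notes above fell back to ocrconfig -import of an earlier export at that step.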