[vbox] 11g RAC: simulating OCR corruption and recovery (two pitfalls to watch for. Pitfall 1: PROT-16; pitfall 2: CRS-4000)

Copyright notice: reposting of these articles is prohibited, but all of them may be used in production to improve efficiency. https://blog.csdn.net/viviliving/article/details/89953170

In this 11gR2 cluster the voting file and the OCR live in the same disk group, so they are recovered together.

Check the existing backups

[grid@rac1 admin]$ ocrconfig -showbackup 

rac1     2019/05/08 12:27:51     /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr

rac1     2019/05/07 15:32:37     /u01/app/11.2.0/grid/cdata/rac-cluster/backup01.ocr

rac1     2019/05/07 15:32:37     /u01/app/11.2.0/grid/cdata/rac-cluster/day.ocr

rac1     2019/05/07 15:32:37     /u01/app/11.2.0/grid/cdata/rac-cluster/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
[grid@rac1 admin]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   88e9adcf20db4f88bfba7ac8848ff68b (ORCL:VDKBACK) [OCR]
Located 1 voting disk(s).
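
The `ocrconfig -showbackup` listing above is what the later restore step chooses a file from. As a minimal sketch, assuming the four-column layout shown here (node, date, time, path) and paths without spaces, the newest backup could be picked like this. The function name `latest_ocr_backup` is hypothetical and not part of any Oracle tool.

```shell
#!/bin/sh
# Hypothetical helper: read `ocrconfig -showbackup` output on stdin and
# print the path of the most recent backup.
# Assumes each backup line has exactly four fields: node, date, time, path.
latest_ocr_backup() {
    # Keep only the backup lines, put the timestamp first,
    # sort (YYYY/MM/DD HH:MM:SS sorts correctly as text), take the last.
    awk 'NF == 4 && $4 ~ /\.ocr$/ { print $2, $3, $4 }' |
        LC_ALL=C sort |
        tail -n 1 |
        awk '{ print $3 }'
}

# Usage on a cluster node (not executed here):
#   ocrconfig -showbackup | latest_ocr_backup
```

Messages such as PROT-25 have a different field count and are ignored by the filter.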

Stop the database
[grid@rac1 admin]$ srvctl stop database -d racdb -o immediate 
Stop the cluster

[root@rac1 ~]# crsctl stop cluster -all -f

Find the physical device behind the ASM disk used by the OCR disk group

[grid@rac1 admin]$ oracleasm querydisk -d VDKBACK
Disk "VDKBACK" is a valid ASM disk on device /dev/sdf1[8,81]
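
Because the next step overwrites a raw device, it is worth extracting the device path mechanically rather than retyping it. The sketch below assumes the `querydisk -d` message format shown above (`... on device /dev/sdf1[8,81]`); other oracleasm versions may print a different message, in which case it prints nothing. The name `asm_disk_device` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `oracleasm querydisk -d <LABEL>` output on stdin
# and print the device path, e.g. /dev/sdf1.
# Assumes the "... on device <path>[major,minor]" message format.
asm_disk_device() {
    # Capture everything between "on device " and the opening "[".
    sed -n 's/.* on device \([^[]*\)\[.*/\1/p'
}

# Usage on a cluster node (not executed here):
#   oracleasm querydisk -d VDKBACK | asm_disk_device
```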

Simulate the corruption

[root@rac1 ~]#  dd if=/dev/zero of=/dev/sdf1  bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.006116 seconds, 171 MB/s
[root@rac1 ~]# 

Start the cluster

crsctl start cluster -all 

The command hangs and only gives up after a long time


[root@rac1 ~]# 
[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac1 ~]# crsctl start cluster -all  
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac2'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2674: Start of 'ora.diskmon' on 'rac1' failed
CRS-2679: Attempting to clean 'ora.diskmon' on 'rac1'
CRS-2674: Start of 'ora.diskmon' on 'rac2' failed
CRS-2679: Attempting to clean 'ora.diskmon' on 'rac2'
CRS-2681: Clean of 'ora.diskmon' on 'rac1' succeeded
CRS-2681: Clean of 'ora.diskmon' on 'rac2' succeeded

CRS-4404: The following nodes did not reply within the allotted time:
rac1, rac2

crsctl start crs

[root@rac1 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

Check the logs

more /var/log/messages shows nothing abnormal

more $ORACLE_HOME/log/rac1/cssd/ocssd.log

shows errors about the voting file

[root@rac1 ~]# tail -30 $ORACLE_HOME/log/rac1/cssd/ocssd.log

2019-05-08 16:34:29.389: [    CLSF][1146157376]checksum failed for disk:ORCL:VDKOCR1:
2019-05-08 16:34:29.389: [    CLSF][1146157376]Read ASM header off dev:ORCL:VDKOCR1:0:0
2019-05-08 16:34:29.389: [   SKGFD][1146157376]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x72f7190 for disk :ORCL:VDKOCR1:

2019-05-08 16:34:29.389: [    CLSF][1146157376]Read ASM header off dev:ORCL:VDKOCR2:0:0
2019-05-08 16:34:29.390: [   SKGFD][1146157376]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x72f7b30 for disk :ORCL:VDKOCR2:

2019-05-08 16:34:29.401: [    CLSF][1146157376]Read ASM header off dev:ORCL:VDKVOTE:0:0
2019-05-08 16:34:29.401: [   SKGFD][1146157376]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x72f84d0 for disk :ORCL:VDKVOTE:

2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: Successful discovery of 0 disks
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvFindInitialConfigs: No voting files found
2019-05-08 16:34:29.401: [    CSSD][1146157376]###################################
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread
2019-05-08 16:34:29.401: [    CSSD][1146157376]###################################
2019-05-08 16:34:29.401: [    CSSD][1146157376]

----- Call Stack Trace -----
2019-05-08 16:34:29.401: [    CSSD][1135667520]clssgmClientShutdown: total iocapables 0
2019-05-08 16:34:29.401: [    CSSD][1135667520]clssgmClientShutdown: graceful shutdown completed.
2019-05-08 16:34:29.401: [    CSSD][1146157376]calling              call     entry                argument values in hex      
2019-05-08 16:34:29.402: [    CSSD][1146157376]location             type     point                (? means dubious value)     
2019-05-08 16:34:29.402: [    CSSD][1146157376]-------------------- -------- -------------------- ----------------------------
[root@rac1 ~]# 
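
The decisive lines in the log above are "Successful discovery of 0 disks" and "No voting files found". A small sketch for spotting them without paging through the whole file follows; the function name `voting_discovery_failed` is hypothetical, and the two message strings are taken verbatim from the log excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: read ocssd.log content on stdin and succeed (exit 0)
# if CSSD reported that it could not find any voting file.
voting_discovery_failed() {
    grep -Eq 'No voting files found|Successful discovery of 0 disks'
}

# Usage on a cluster node (not executed here):
#   tail -200 $ORACLE_HOME/log/rac1/cssd/ocssd.log | voting_discovery_failed \
#       && echo "CSSD found no voting files"
```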

[root@rac1 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]
[root@rac1 ~]# /etc/init.d/oracleasm listdisks
VDKDATA
VDKOCR1
VDKOCR2
VDKVOTE
The ASM disk that held the OCR, VDKBACK, is missing from the list
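
Spotting a missing label by eye works for five disks. As a sketch for larger setups, the expected labels can be compared against the `listdisks` output. The function name `missing_asm_disks` is hypothetical, and the list of expected labels has to be supplied by the administrator.

```shell
#!/bin/sh
# Hypothetical helper: read `oracleasm listdisks` output on stdin and print
# every expected label (given as arguments) that is not in the listing.
missing_asm_disks() {
    actual=$(cat)
    for label in "$@"; do
        # -x: whole-line match, -F: fixed string, -q: quiet
        printf '%s\n' "$actual" | grep -qxF "$label" || printf '%s\n' "$label"
    done
}

# Usage on a cluster node (not executed here):
#   /etc/init.d/oracleasm listdisks | \
#       missing_asm_disks VDKBACK VDKDATA VDKOCR1 VDKOCR2 VDKVOTE
```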

Recreate it

[root@rac1 ~]# /usr/sbin/oracleasm createdisk VDKBACK /dev/sdf1
Writing disk header: done
Instantiating disk: done

Rescan the disks on both nodes

[root@rac2 ~]#  /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]
[root@rac2 ~]# /etc/init.d/oracleasm listdisks
VDKBACK
VDKDATA
VDKOCR1
VDKOCR2
VDKVOTE


[root@rac1 ~]# 

Stop the clusterware

[root@rac1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

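Start the clusterware on this node in exclusive mode (-excl). This brings up the ASM instance without a full cluster start. The -nocrs option, which also keeps CRSD from starting, was only added in 11.2.0.2; this installation is 11.2.0.1, so only -excl is used here.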

[root@rac1 ~]# crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2679: Attempting to clean 'ora.diskmon' on 'rac1'
CRS-2681: Clean of 'ora.diskmon' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
[root@rac1 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[root@rac1 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager

Recreate the disk group that held the OCR and the voting file:
Note: this is done as the grid user

[grid@rac1 admin]$ sqlplus "/as sysasm"

SQL*Plus: Release 11.2.0.1.0 Production on Wed May 8 17:03:35 2019

Copyright (c) 1982, 2009, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> col path for a50
SQL> select path,header_status from v$asm_disk; 

PATH                                               HEADER_STATU
-------------------------------------------------- ------------
ORCL:VDKBACK                                       PROVISIONED
ORCL:VDKDATA                                       MEMBER
ORCL:VDKVOTE                                       MEMBER
ORCL:VDKOCR2                                       MEMBER
ORCL:VDKOCR1                                       MEMBER

SQL> create diskgroup OCR  EXTERNAL REDUNDANCY DISK  'ORCL:VDKBACK' ;

Diskgroup created.

With the OCR disk group created, start restoring the OCR from backup. The restore fails:

[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup01.ocr
PROT-16: Internal Error
[root@rac1 ~]# ocrconfig -restore  /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
PROT-16: Internal Error
[root@rac1 ~]# ocrconfig -restore  /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
PROT-16: Internal Error

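An online search turned up the following fix: check the disk group compatibility, then drop the disk group and recreate it with explicit compatibility attributes.

SQL> select name ,COMPATIBILITY from v$asm_diskgroup;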
[grid@rac1 admin]$ sqlplus "/as sysasm"

SQL*Plus: Release 11.2.0.1.0 Production on Wed May 8 17:38:49 2019

Copyright (c) 1982, 2009, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> drop diskgroup OCR;

Diskgroup dropped.

SQL>  create diskgroup OCR  EXTERNAL REDUNDANCY DISK  'ORCL:VDKBACK' attribute  'compatible.rdbms' = '11.2.0.0.0','compatible.asm' = '11.2.0.0.0';

Diskgroup created.

Now ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr succeeds:

[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
[root@rac1 ~]# 

Pitfall here {------------------------------------ careful, this one caught me twice

[root@rac1 ~]# ocrconfig -restore  /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
PROT-16: Internal Error

create diskgroup OCR  EXTERNAL REDUNDANCY DISK  'ORCL:VDKBACK' attribute  'compatible.rdbms' = '11.2.0.0.0','compatible.asm' = '11.2.0.0.0';

create diskgroup OCR  EXTERNAL REDUNDANCY DISK  'ORCL:VDKBACK' attribute  'compatible.rdbms' = '11.1.0.0.0','compatible.asm' = '11.1.0.0.0';

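The PROT-16 error can be avoided by creating the disk group with 'compatible.asm' = '11.1.0.0.0',

but the next step, placing the voting file in the disk group, then demands 11.2.0.0.0 again: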

[grid@rac1 ~]$ crsctl replace votedisk +OCR
Failed to create voting files on disk group OCR.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.
The ASM alert log, /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log, shows the reason:


NOTE: Voting File refresh pending for group 1/0x667e2acc (OCR)
NOTE: Attempting voting file creation in diskgroup OCR
ERROR: Voting file allocation failed for group OCR
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_29249.trc:
ORA-15221: ASM operation requires compatible.asm of 11.2.0.0.0 or higher
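
Since ORA-15221 asks for compatible.asm of 11.2.0.0.0 or higher, the value reported by v$asm_diskgroup can be checked before running crsctl replace votedisk. The sketch below only compares two dotted version strings; the function name `version_ge` is hypothetical, and fetching the actual value from ASM is left to the SQL query used earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if dotted version $1 >= dotted version $2.
# Missing trailing components are treated as 0, so "11.2" equals "11.2.0.0.0".
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ".")
        m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Usage, with the value copied from v$asm_diskgroup (not executed here):
#   version_ge "11.1.0.0.0" "11.2.0.0.0" || echo "compatible.asm too low for voting files"
```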

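In the end I had to load a logical export with ocrconfig -import /u01/ocr.exp, which got me past the PROT-16: Internal Error that ocrconfig -restore kept raising.

I have seen a document claiming that, according to Oracle, only the restore method is supported and that recovery from an export is no longer xxx (I forget the exact wording). That does not look like an official statement to me. Oracle is fairly cautious; it just has somewhat more bugs, and with patches available that is nothing to fear.

Only after that did crsctl replace votedisk +OCR succeed.

--------------------------------------------- end of pitfall description}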

 

crsctl replace votedisk  +OCR

[root@rac1 ~]# crsctl replace votedisk +OCR
Successful addition of voting disk 066b704e38164f4ebf2417cbf2caaa26.
Successfully replaced voting disk group with +OCR.
CRS-4266: Voting file(s) successfully replaced

Once the OCR and the voting file are restored, CRS and the other services come up automatically

ocrcheck

crsctl query css votedisk

[root@rac1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2712
         Available space (kbytes) :     259408
         ID                       : 1749057863
         Device/File Name         :       +OCR
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

[root@rac1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   066b704e38164f4ebf2417cbf2caaa26 (ORCL:VDKBACK) [OCR]
Located 1 voting disk(s).
[root@rac1 ~]# 
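
The two checks above can also be scripted before the cluster is restarted. This is a minimal sketch that relies only on the message texts visible in the output above; the function names `ocr_integrity_ok` and `voting_disk_count` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ocrcheck` output on stdin and succeed (exit 0)
# if the overall integrity check passed.
ocr_integrity_ok() {
    grep -q 'Cluster registry integrity check succeeded'
}

# Hypothetical helper: read `crsctl query css votedisk` output on stdin and
# print the number from the "Located N voting disk(s)." summary line.
voting_disk_count() {
    sed -n 's/^Located \([0-9][0-9]*\) voting disk(s)\.$/\1/p'
}

# Usage on a cluster node (not executed here):
#   ocrcheck | ocr_integrity_ok && echo "OCR looks fine"
#   crsctl query css votedisk | voting_disk_count
```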

crsctl start cluster -all

crs_stat -t
