A RAC cluster failure caused by a lost disk

Copyright notice: this is an original article by the blogger and may not be reproduced without permission. https://blog.csdn.net/qq_34556414/article/details/83147071

The application team reported that the databases on both hosts had gone down. This cluster keeps its files on file systems, not in ASM.

I logged in and started by checking the state of the cluster stack.

[grid@cxcsdb01 ~]$ crsctl stat res -t -init   -- cssd, crsd, evmd, ctssd and haip are OFFLINE although their TARGET is ONLINE; cssd is stuck in STARTING
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       cxcsdb01
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       cxcsdb01
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       cxcsdb01
ora.gpnpd
      1        ONLINE  ONLINE       cxcsdb01
ora.mdnsd
      1        ONLINE  ONLINE       cxcsdb01
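
Reading this table by eye works for one node, but the check it encodes is simple: which resources have TARGET ONLINE but a STATE that is not ONLINE. Below is a minimal sketch of that check, assuming the 11.2 `-t` layout shown above (an unindented resource name followed by indented `<instance> <TARGET> <STATE> ...` lines). The function names are hypothetical and not part of any Oracle tool.

```python
from dataclasses import dataclass


@dataclass
class InitResource:
    name: str    # e.g. "ora.cssd"
    target: str  # TARGET column
    state: str   # STATE column


def parse_crsctl_init(text: str) -> list[InitResource]:
    """Parse `crsctl stat res -t -init` output (11.2 tabular layout assumed)."""
    resources: list[InitResource] = []
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, dashed separators, the column header and the section title.
        if not line or line.startswith("-") or line.startswith("NAME") or line == "Cluster Resources":
            continue
        if not raw[0].isspace():
            name = line  # an unindented line carries the resource name
            continue
        fields = line.split()
        # Indented lines: instance number, TARGET, STATE, then optional SERVER/STATE_DETAILS.
        if name is not None and len(fields) >= 3 and fields[0].isdigit():
            resources.append(InitResource(name, fields[1], fields[2]))
    return resources


def should_be_up_but_are_not(resources: list[InitResource]) -> list[str]:
    """Names of resources whose TARGET is ONLINE but whose STATE is anything else."""
    return [r.name for r in resources if r.target == "ONLINE" and r.state != "ONLINE"]


SAMPLE = """\
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                               Instance Shutdown
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.gipcd
      1        ONLINE  ONLINE       cxcsdb01
"""

if __name__ == "__main__":
    print(should_be_up_but_are_not(parse_crsctl_init(SAMPLE)))  # ['ora.crsd', 'ora.cssd']
```

Only TARGET and STATE are read, because SERVER is empty for offline resources and a whitespace split cannot tell SERVER from STATE_DETAILS. Fed the full first snapshot, this would flag ora.cluster_interconnect.haip, ora.crsd, ora.cssd, ora.ctssd and ora.evmd; ora.asm and ora.diskmon are skipped because their TARGET is OFFLINE.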

 

[grid@cxcsdb01 ~]$ ps -ef | grep crs   -- only the grep itself matches, so crsd.bin is not running
grid     33095 30418  0 10:26 pts/2    00:00:00 grep --color=auto crs
[grid@cxcsdb01 ~]$ ps -ef | grep css
root     30844     1  0 10:24 ?        00:00:00 /opt/oracle/11.2.0.4/grid/bin/cssdmonitor
root     30856     1  0 10:24 ?        00:00:00 /opt/oracle/11.2.0.4/grid/bin/cssdagent
grid     30868     1  0 10:24 ?        00:00:00 /opt/oracle/11.2.0.4/grid/bin/ocssd.bin
grid     33129 30418  0 10:26 pts/2    00:00:00 grep --color=auto css
[grid@cxcsdb01 ~]$ ps -ef | grep ohasd
root      1513     1  0 Oct17 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple
root      4266     1  0 10:04 ?        00:00:07 /opt/oracle/11.2.0.4/grid/bin/ohasd.bin reboot
grid     33254 30418  0 10:26 pts/2    00:00:00 grep --color=auto ohasd
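
`ps -ef | grep crs` is easy to misread because the grep process matches its own pattern. A minimal sketch of the same check that looks only at the executable in the CMD column, assuming the standard `ps -ef` columns (UID PID PPID C STIME TTY TIME CMD) and using the three daemons checked by hand above as the expected set; the function name and that set are assumptions, not an Oracle-defined list.

```python
import posixpath

# The daemons checked by hand above (assumption: extend as needed).
EXPECTED = ("ohasd.bin", "ocssd.bin", "crsd.bin")


def missing_daemons(ps_ef_output: str, expected: tuple[str, ...] = EXPECTED) -> list[str]:
    """Return the expected daemons that do not appear as a command in `ps -ef` output."""
    running = set()
    for line in ps_ef_output.splitlines():
        fields = line.split()
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD [ARGS...]; the header has no numeric PID.
        if len(fields) >= 8 and fields[1].isdigit():
            running.add(posixpath.basename(fields[7]))
    return [d for d in expected if d not in running]


SAMPLE_PS = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      4266     1  0 10:04 ?        00:00:07 /opt/oracle/11.2.0.4/grid/bin/ohasd.bin reboot
root     30844     1  0 10:24 ?        00:00:00 /opt/oracle/11.2.0.4/grid/bin/cssdmonitor
grid     30868     1  0 10:24 ?        00:00:00 /opt/oracle/11.2.0.4/grid/bin/ocssd.bin
grid     33095 30418  0 10:26 pts/2    00:00:00 grep --color=auto crs
"""

if __name__ == "__main__":
    print(missing_daemons(SAMPLE_PS))  # ['crsd.bin']
```

Directly on the host, `pgrep -x crsd.bin` gives the same answer without the self-match problem.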

 

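ohasd.bin and ocssd.bin are running but crsd.bin is not, and ora.cssd is stuck in STARTING. Because ora.crsd depends on ora.cssd, the next place to look is the CSS log (ocssd.log).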

[grid@cxcsdb01 cssd]$ tail -f ocssd.log   -- CSSD discovers 0 voting files ("No voting files found"): the disk holding them is gone
......
2018-10-18 10:21:56.163: [    CSSD][2202380032]clssnmReadDiscoveryProfile: voting file discovery string(/crsdata/votedisk/votedata01/votedata01,/crsdata/votedisk/votedata02/votedata02,/crsdata/votedisk/votedata03/votedata03)
2018-10-18 10:21:56.163: [    CSSD][2202380032]clssnmvDDiscThread: using discovery string /crsdata/votedisk/votedata01/votedata01,/crsdata/votedisk/votedata02/votedata02,/crsdata/votedisk/votedata03/votedata03 for initial discovery
2018-10-18 10:21:56.163: [   SKGFD][2202380032]Discovery with str:/crsdata/votedisk/votedata01/votedata01,/crsdata/votedisk/votedata02/votedata02,/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.163: [   SKGFD][2202380032]UFS discovery with :/crsdata/votedisk/votedata01/votedata01:
2018-10-18 10:21:56.163: [   SKGFD][2202380032]Execute glob on the string /crsdata/votedisk/votedata01/votedata01
2018-10-18 10:21:56.164: [   SKGFD][2202380032]OSS discovery with :/crsdata/votedisk/votedata01/votedata01:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]Discovery advancing to nxt string :/crsdata/votedisk/votedata02/votedata02:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]UFS discovery with :/crsdata/votedisk/votedata02/votedata02:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]Execute glob on the string /crsdata/votedisk/votedata02/votedata02
2018-10-18 10:21:56.164: [   SKGFD][2202380032]OSS discovery with :/crsdata/votedisk/votedata02/votedata02:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]Discovery advancing to nxt string :/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]UFS discovery with :/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.164: [   SKGFD][2202380032]Execute glob on the string /crsdata/votedisk/votedata03/votedata03
2018-10-18 10:21:56.164: [   SKGFD][2202380032]OSS discovery with :/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.164: [    CSSD][2202380032]clssnmvDiskVerify: Successful discovery of 0 disks
2018-10-18 10:21:56.164: [    CSSD][2202380032]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2018-10-18 10:21:56.164: [    CSSD][2202380032]clssnmvFindInitialConfigs: No voting files found
2018-10-18 10:21:56.164: [    CSSD][2202380032](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2018-10-18 10:21:56.478: [    CSSD][2204923648]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f6278060880) client((nil))
......
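
The decisive lines are `Successful discovery of 0 disks` and `No voting files found`, followed by the CSSNM00070 retry message. A minimal sketch that pulls the voting-file discovery string and those failure lines out of an ocssd.log; the marker strings are copied from the excerpt above (11.2 wording, which may differ in other releases), and the function name is hypothetical.

```python
import re

# Patterns copied from the 11.2 ocssd.log excerpt above.
DISCOVERY_RE = re.compile(r"voting file discovery string\(([^)]*)\)")
FAILURE_MARKERS = (
    "Successful discovery of 0 disks",
    "No voting files found",
    "CSSNM00070",
)


def scan_ocssd_log(lines):
    """Return (voting file paths, lines reporting a voting-file discovery failure)."""
    paths: list[str] = []
    failures: list[str] = []
    for line in lines:
        match = DISCOVERY_RE.search(line)
        if match:
            # The discovery profile lists the voting files as a comma-separated string.
            paths = match.group(1).split(",")
        if any(marker in line for marker in FAILURE_MARKERS):
            failures.append(line.strip())
    return paths, failures


SAMPLE_LOG = """\
2018-10-18 10:21:56.163: [    CSSD][2202380032]clssnmReadDiscoveryProfile: voting file discovery string(/crsdata/votedisk/votedata01/votedata01,/crsdata/votedisk/votedata02/votedata02,/crsdata/votedisk/votedata03/votedata03)
2018-10-18 10:21:56.164: [    CSSD][2202380032]clssnmvDiskVerify: Successful discovery of 0 disks
2018-10-18 10:21:56.164: [    CSSD][2202380032]clssnmvFindInitialConfigs: No voting files found
2018-10-18 10:21:56.164: [    CSSD][2202380032](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
"""

if __name__ == "__main__":
    vf_paths, failed = scan_ocssd_log(SAMPLE_LOG.splitlines())
    print(vf_paths)
    print(len(failed))  # 3
```

A healthy start logs a non-zero count (for example "of 3 disks"), which the marker deliberately does not match.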

 

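The application team confirmed that the /crsdata file system, which holds all three voting files, was indeed no longer mounted on the host. Once they remounted it, the clusterware came back up on its own: ocssd.bin keeps retrying voting file discovery every 15 seconds (the CSSNM00070 message above), so as soon as the files reappeared CSSD finished starting and the rest of the stack followed.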
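
Because all three voting files sit on one file system, losing /crsdata takes out every voting file at once, and a node must be able to access more than half of the voting files for CSSD to come up. A minimal sketch of that check, assuming it runs as the grid user on the node; it tests each file rather than the /crsdata directory, since an empty mount-point directory can remain after the file system is unmounted. The function names are hypothetical, and readability is only a proxy for the I/O CSSD itself performs.

```python
import os

# Discovery string as printed by clssnmReadDiscoveryProfile in the log above.
DISCOVERY = ("/crsdata/votedisk/votedata01/votedata01,"
             "/crsdata/votedisk/votedata02/votedata02,"
             "/crsdata/votedisk/votedata03/votedata03")


def split_discovery(discovery_string: str) -> list[str]:
    """Split a comma-separated discovery string, dropping empty entries."""
    return [p.strip() for p in discovery_string.split(",") if p.strip()]


def inaccessible_voting_files(discovery_string: str) -> list[str]:
    """Paths that are missing, not regular files, or unreadable by the current user."""
    return [p for p in split_discovery(discovery_string)
            if not (os.path.isfile(p) and os.access(p, os.R_OK))]


def has_voting_majority(discovery_string: str) -> bool:
    """True if strictly more than half of the listed voting files are accessible."""
    paths = split_discovery(discovery_string)
    accessible = len(paths) - len(inaccessible_voting_files(discovery_string))
    return accessible > len(paths) / 2


if __name__ == "__main__":
    print("inaccessible:", inaccessible_voting_files(DISCOVERY))
    print("majority available:", has_voting_majority(DISCOVERY))
```

While /crsdata was unmounted, all three paths would be reported and `has_voting_majority` would return False, consistent with "No voting files found" in the log.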

[grid@cxcsdb01 ~]$ df -h   -- /crsdata is mounted again, from the VxVM volume /dev/vx/dsk/crsdg/crsvol
Filesystem                  Size  Used Avail Use% Mounted on
/dev/sda5                   474G   35G  439G   8% /
devtmpfs                    126G     0  126G   0% /dev
tmpfs                       126G     0  126G   0% /dev/shm
tmpfs                       126G   27M  126G   1% /run
tmpfs                       126G     0  126G   0% /sys/fs/cgroup
/dev/sda3                    20G   54M   20G   1% /home
/dev/sda1                   497M  166M  332M  34% /boot
tmpfs                       4.0K     0  4.0K   0% /dev/vx
tmpfs                        26G     0   26G   0% /run/user/50008
tmpfs                        26G     0   26G   0% /run/user/50007
tmpfs                        26G     0   26G   0% /run/user/1000
/dev/vx/dsk/crsdg/crsvol     14G  106M   14G   1% /crsdata
/dev/vx/dsk/archdg/archvol  199G  2.7G  195G   2% /archive
/dev/vx/dsk/oradg/oravol01 1000G  554G  443G  56% /oradata01
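
To confirm the remount from a script instead of by eye, a minimal sketch that maps each mount point to its device from df output. It assumes one file system per line, which is why `df -P` (POSIX output format) is the safer choice in scripts: some df versions wrap a long device name onto a line of its own. The function name is hypothetical.

```python
def mounts_from_df(df_output: str) -> dict[str, str]:
    """Map mount point -> device from `df -P` style output (header line skipped)."""
    mounts: dict[str, str] = {}
    for line in df_output.splitlines()[1:]:
        fields = line.split()
        # Columns: Filesystem Size Used Avail Use% Mounted-on; rejoin in case the path has spaces.
        if len(fields) >= 6:
            mounts[" ".join(fields[5:])] = fields[0]
    return mounts


SAMPLE_DF = """\
Filesystem                  Size  Used Avail Use% Mounted on
/dev/sda5                   474G   35G  439G   8% /
/dev/vx/dsk/crsdg/crsvol     14G  106M   14G   1% /crsdata
/dev/vx/dsk/oradg/oravol01 1000G  554G  443G  56% /oradata01
"""

if __name__ == "__main__":
    mounts = mounts_from_df(SAMPLE_DF)
    print(mounts.get("/crsdata"))  # /dev/vx/dsk/crsdg/crsvol
```

During the outage `/crsdata` would simply be absent from the mapping, so `"/crsdata" in mounts` is the check to alert on.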

 

 

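[grid@cxcsdb01 ~]$ crsctl stat res -t -init   -- every resource with TARGET ONLINE is ONLINE again; ora.asm and ora.diskmon have TARGET OFFLINE, so their OFFLINE state is expected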
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       cxcsdb01
ora.crf
      1        ONLINE  ONLINE       cxcsdb01
ora.crsd
      1        ONLINE  ONLINE       cxcsdb01
ora.cssd
      1        ONLINE  ONLINE       cxcsdb01
ora.cssdmonitor
      1        ONLINE  ONLINE       cxcsdb01
ora.ctssd
      1        ONLINE  ONLINE       cxcsdb01                 OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  ONLINE       cxcsdb01
ora.gipcd
      1        ONLINE  ONLINE       cxcsdb01
ora.gpnpd
      1        ONLINE  ONLINE       cxcsdb01
ora.mdnsd
      1        ONLINE  ONLINE       cxcsdb01

 

 

 
