Oracle Cluster Management: A Case Study in Analyzing and Handling a crsd Startup Failure (1)

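The host of database node 1 was rebooted. After the reboot completed, neither ASM nor the database started normally, so the corresponding agents were checked.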

Troubleshooting
1 Check the HAS status.

[grid@orcldb1 trace]$ ps -ef|grep has
root      60734      1  0 11:21 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
root      70798      1  0 11:22 ?        00:00:07 /u01/app/19c/grid/bin/ohasd.bin reboot
[grid@orcldb1 trace]$ 

Check the number of agents.

[grid@orcldb1 trace]$ ps -ef|grep agent
root      72466      1  0 11:22 ?        00:00:02 /u01/app/19c/grid/bin/orarootagent.bin
grid      73261      1  0 11:22 ?        00:00:01 /u01/app/19c/grid/bin/oraagent.bin
root      75573      1  0 11:22 ?        00:00:00 /u01/app/19c/grid/bin/cssdagent   --- Normally there should be 6 agents: HAS starts 3 and crsd starts 3. Evidently crsd did not start normally.
grid      96525  93759  0 11:43 pts/0    00:00:00 grep --color=auto agent
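
The agent count above can also be taken without counting by eye. The following is a minimal sketch, assuming the agent binaries are the three named in the listing (orarootagent, oraagent, cssdagent); `count_agents` is a hypothetical helper name, and the expected total of 6 is the figure given in the note above, not something the script verifies.

```shell
#!/usr/bin/env bash
# Count Clusterware agent processes in `ps -ef` output read from stdin.
# Assumption: only the three binary names seen in the listing are matched.
count_agents() {
    awk '/\/bin\/(orarootagent|oraagent|cssdagent)/ && !/grep/ { n++ }
         END { print n + 0 }'
}

# Sample input copied from the listing above; prints 3.
count_agents <<'EOF'
root      72466      1  0 11:22 ?        00:00:02 /u01/app/19c/grid/bin/orarootagent.bin
grid      73261      1  0 11:22 ?        00:00:01 /u01/app/19c/grid/bin/oraagent.bin
root      75573      1  0 11:22 ?        00:00:00 /u01/app/19c/grid/bin/cssdagent
grid      96525  93759  0 11:43 pts/0    00:00:00 grep --color=auto agent
EOF
```

On a cluster node the same function would be fed live data with `ps -ef | count_agents`.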


[root@orcldb1 ~]# lsof -p 65527 |grep "trc"
ohasd.bin 65527 root    1u      REG                8,3      6256  201923036 /u01/app/grid/crsdata/orcldb1/output/ohasdOUT.trc
ohasd.bin 65527 root    2u      REG                8,3      6256  201923036 /u01/app/grid/crsdata/orcldb1/output/ohasdOUT.trc
ohasd.bin 65527 root    4u      REG                8,3      6256  201923036 /u01/app/grid/crsdata/orcldb1/output/ohasdOUT.trc
ohasd.bin 65527 root   66w      REG                8,3  20507881  273343167 /u01/app/grid/diag/crs/orcldb1/crs/trace/ohasd.trc
[root@orcldb1 ~]# cd /u01/app/grid/diag/crs/orcldb1/crs/trace

Check alert.log and the status of the cluster resources.

[grid@orcldb1 trace]$ 
[grid@orcldb1 trace]$ crsctl status res -t -init

--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       orcldb1                  STABLE
ora.crf
      1        ONLINE  ONLINE       orcldb1                  STABLE
**ora.crsd
      1        ONLINE  OFFLINE                               STABLE**
ora.cssd
      1        ONLINE  ONLINE       orcldb1                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       orcldb1                  STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       orcldb1                  STABLE
ora.evmd
      1        ONLINE  ONLINE       orcldb1                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       orcldb1                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       orcldb1                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       orcldb1                  STABLE
ora.storage
      1        ONLINE  ONLINE       orcldb1                  STABLE
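
In a longer listing it can help to filter for resources whose Target is ONLINE but whose State is OFFLINE. The following is a minimal sketch that parses the tabular output shown above from standard input; `offline_init_resources` is a hypothetical helper name, and the parsing assumes the two-line layout (resource name, then instance line) seen in this output.

```shell
#!/usr/bin/env bash
# Print resources whose Target is ONLINE but whose State is OFFLINE,
# reading `crsctl status res -t -init` style output from stdin.
offline_init_resources() {
    awk '
        # Resource name line, e.g. "ora.crsd"
        /^ora\./ { name = $1; next }
        # Instance line: <number> <Target> <State> ...
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 == "OFFLINE" { print name }
    '
}

# Sample input taken from the listing above (emphasis markers removed).
offline_init_resources <<'EOF'
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       orcldb1                  STABLE
EOF
```

For this sample the function lists ora.asm, ora.crsd and ora.ctssd; ora.diskmon is skipped because its Target is OFFLINE.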

Try to start the crsd resource manually.


> [grid@orcldb1 trace]$ crsctl start resource "ora.crsd" -init

CRS-2672: Attempting to start 'ora.ctssd' on 'orcldb1'
CRS-2676: Start of 'ora.ctssd' on 'orcldb1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'orcldb1'
CRS-2672: Attempting to start 'ora.crsd' on 'orcldb1'
CRS-2676: Start of 'ora.asm' on 'orcldb1' succeeded
CRS-2676: Start of 'ora.crsd' on 'orcldb1' succeeded


> [grid@orcldb1 trace]$ systemctl status ntpd.service

● ntpd.service - Network Time Service
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-11-13 11:32:15 CST; 14min ago
  Process: 59267 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 59310 (ntpd)
    Tasks: 2
   CGroup: /system.slice/ntpd.service
           ├─59310 /usr/sbin/ntpd -u ntp:ntp -x -u ntp:ntp -p /var/run/ntpd.pid
           └─59386 /usr/sbin/ntpd -u ntp:ntp -x -u ntp:ntp -p /var/run/ntpd.pid
[grid@orcldb1 trace]$ 
After that, both the database and ASM started normally. Could crsd have been disabled?

> [grid@orcldb1 trace]$ crsctl status resource  "ora.crsd" -init -p|grep -i "enable"

ENABLED=1
RESOURCE_USE_ENABLED=1
[grid@orcldb1 trace]$ 

ENABLED=1 shows that crsd was not disabled. The OCTSSD entries in alert.log point to the likely cause: the clock on this node differed from the mean cluster time by more than the permissible offset of 600 seconds, so the Cluster Time Synchronization Service aborted.

2020-11-13 12:09:20.897 [OCSSD(67590)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2020-11-13 12:09:21.027 [OCTSSD(67944)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 67944
2020-11-13 12:09:21.835 [OCTSSD(67944)]CRS-2403: The Cluster Time Synchronization Service on host eomsdb1 is in observer mode.
2020-11-13 12:09:23.093 [OCTSSD(67944)]CRS-2407: The new Cluster Time Synchronization Service reference node is host eomsdb2.
2020-11-13 12:09:23.093 [OCTSSD(67944)]CRS-2401: The Cluster Time Synchronization Service started on host eomsdb1.
**2020-11-13 12:09:23.135 [OCTSSD(67944)]CRS-2419: The clock on host eomsdb1 differs from mean cluster time by 627364947 microseconds. 
The Cluster Time Synchronization Service will not perform time synchronization because the time difference is beyond the permissible offset of 600 seconds. Details in** /u01/app/grid/diag/crs/eomsdb1/crs/trace/octssd.trc.
2020-11-13 12:09:23.831 [OCTSSD(67944)]CRS-2402: The Cluster Time Synchronization Service aborted on host eomsdb1. 
Details at (:ctsselect_msm3:) in /u01/app/grid/diag/crs/eomsdb1/crs/trace/octssd.trc.
2020-11-13 14:15:41.203 [OCTSSD(186370)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 186370
2020-11-13 14:15:42.014 [OCTSSD(186370)]CRS-2403: The Cluster Time Synchronization Service on host eomsdb1 is in observer mode.
2020-11-13 14:15:43.276 [OCTSSD(186370)]CRS-2407: The new Cluster Time Synchronization Service reference node is host eomsdb2.
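
The CRS-2419 message reports the offset in microseconds, which is awkward to compare with a limit given in seconds. The following is a minimal sketch that extracts the value and converts it; `ctss_offset_seconds` is a hypothetical helper name, and both the sample line and the 600-second limit are taken from the log entries above.

```shell
#!/usr/bin/env bash
# Extract the clock offset (whole seconds) from a CRS-2419 alert-log line.
ctss_offset_seconds() {
    awk '/CRS-2419/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^microseconds/) {
                # The number sits in the field just before "microseconds."
                print int($(i - 1) / 1000000)
                exit
            }
    }'
}

line='2020-11-13 12:09:23.135 [OCTSSD(67944)]CRS-2419: The clock on host eomsdb1 differs from mean cluster time by 627364947 microseconds.'
offset=$(printf '%s\n' "$line" | ctss_offset_seconds)
limit=600   # permissible offset quoted in the same log message

if [ "$offset" -gt "$limit" ]; then
    echo "offset ${offset}s exceeds ${limit}s"
fi
```

For the logged value this prints `offset 627s exceeds 600s`: 627364947 microseconds is about 627 seconds, roughly 27 seconds over the limit.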
 

Reposted from blog.csdn.net/oradbm/article/details/110136150