Oracle Cluster Management: crsd Cluster Resource Fails to Start, Case 1

1 Environment

Database version: 11.2.0.4, RAC environment.

Operating system: CentOS 7.

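2 Symptoms

One node of the database cluster was rebooted today. After the reboot, only three Clusterware agents were running (oraagent, orarootagent and cssdagent):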

[grid@rac02 admin]$ ps -ef|grep agent
patrol   10723 10552  0 09:56 ?        00:00:00 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c "env GNOME_SHELL_SESSION_MODE=classic gnome-session --session gnome-classic"
grid     16067     1  0 10:02 ?        00:00:00 /u01/11.2.0/bin/oraagent.bin
root     16099     1  0 10:02 ?        00:00:02 /u01/11.2.0/bin/orarootagent.bin
root     16145     1  0 10:02 ?        00:00:00 /u01/11.2.0/bin/cssdagent
grid     19423 16434  0 10:07 pts/2    00:00:00 grep --color=auto agent

The resource status was as follows:

[grid@rac02 admin]$ crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac02                  Started             
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac02                                      
ora.crf
      1        ONLINE  ONLINE       rac02                                      
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       rac02                                      
ora.cssdmonitor
      1        ONLINE  ONLINE       rac02                                      
ora.ctssd
      1        ONLINE  ONLINE       rac02                  OBSERVER            
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  ONLINE       rac02                                      
ora.gipcd
      1        ONLINE  ONLINE       rac02                                      
ora.gpnpd
      1        ONLINE  ONLINE       rac02                                      
ora.mdnsd
      1        ONLINE  ONLINE       rac02                                      
[grid@rac02 admin]$ 

The ora.crsd resource has TARGET ONLINE but STATE OFFLINE, so crsd did not come up.
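
To pick out such resources without reading the table by eye, the output can be filtered. This is a minimal sketch, not part of the original troubleshooting: the function name `list_down_resources` is made up for illustration, and it assumes the 11.2 layout shown above, where a resource name line starting with `ora.` is followed by an instance line carrying TARGET and STATE in its second and third columns.

```shell
#!/bin/sh
# Hypothetical helper: print resources whose TARGET is ONLINE but whose STATE
# is anything else. Reads `crsctl status res -t -init` output on stdin.
list_down_resources() {
    awk '
        /^ora\./ { name = $1; next }    # remember the most recent resource name
        name != "" && $2 == "ONLINE" && $3 != "ONLINE" {
            print name " target=" $2 " state=" $3
        }
    '
}

# Usage on a cluster node:
#   crsctl status res -t -init | list_down_resources
```

Resources that are intentionally down (TARGET OFFLINE, such as ora.diskmon above) are skipped, so only unexpected states are reported.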

3 Log analysis

The Clusterware alert log showed the following:


[crsd(19237)]CRS-0813:Cluster Ready Service aborted due to failure to initialize the network layer with error [clsclisten failed with ret 3
(File: caa_Socket.cpp, line: 525
]. Details at (:CRSD00133:) in /u01/11.2.0/log/rac02/crsd/crsd.log.
2021-03-10 10:07:17.115: 
[ohasd(15918)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac02'.
2021-03-10 10:07:17.116: 
[ohasd(15918)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.

crsd.log showed the following:


[  OCRMAS][1132443392]th_master: Received group public data event. Incarnation [1]
2021-03-10 10:07:16.527: [  OCRMAS][1132443392]th_master:1': Recvd pubdata event from node [2]
2021-03-10 10:07:16.527: [  OCRMAS][1132443392]th_master:2': Recvd pubdata event for self. Do nothing.
2021-03-10 10:07:16.533: [ CRSMAIN][1468389184] Running path init...
2021-03-10 10:07:16.539: [    CLSE][1468389184]clse_get_auth_loc: Returning default authloc: /u01/11.2.0/auth/crs/rac02
2021-03-10 10:07:16.539: [ CRSMAIN][1468389184] Using Authorizer location: /u01/11.2.0/auth/crs/rac02
2021-03-10 10:07:16.539: [ CRSMAIN][1468389184] Initialing cluclu context...
2021-03-10 10:07:16.551: [  CLSCLU][1468389184]clsclu_init: rc 0
2021-03-10 10:07:16.551: [ CRSMAIN][1468389184] Getting CR Root...
2021-03-10 10:07:16.555: [ CRSMAIN][1468389184] Initializing RTI
2021-03-10 10:07:16.555: [ CRSMAIN][1468389184] Initializing staging area
2021-03-10 10:07:16.571: [    CLSE][1468389184]clse_get_auth_loc: Returning default authloc: /u01/11.2.0/auth/crs/rac02
2021-03-10 10:07:16.571: [ default][1468389184] AuthLoc /u01/11.2.0/auth/crs/rac02
2021-03-10 10:07:16.571: [ default][1468389184] PE active version: 11.2.0.4.0
2021-03-10 10:07:16.571: [ default][1468389184] PE Engine: NEW
2021-03-10 10:07:16.571: [ default][1468389184] Using OCR batch ops : ENABLED
2021-03-10 10:07:16.571: [ CRSMAIN][1468389184] Creating RTI lock info...
2021-03-10 10:07:16.571: [ CRSMAIN][1468389184] Initializing EVMMgr
2021-03-10 10:07:16.576: [ CRSMAIN][1468389184] Getting local nodename...
[   CLWAL][1468389184]clsw_Initialize: OLR initlevel [70000]
2021-03-10 10:07:16.617: [  OCRSRV][1126139648]th_upgrade: Starting upgrade calculation
2021-03-10 10:07:16.630: [  OCRSRV][1126139648]th_upgrade:10.1 AV [186647552]. State [11]. Already upgraded.Updated global data to the crs version group. Return [0]
2021-03-10 10:07:16.835: [ COMMCRS][1096722176]clsclisten: Error listening on: (ADDRESS=(PROTOCOL=tcp)(HOST=10.2.0.76)(PORT=0))

2021-03-10 10:07:16.835: [ COMMCRS][1096722176]clsclisten: op 65 failed, NSerr (12560, 0), transport: (584, 0, 0)

2021-03-10 10:07:16.836: [    CRSD][1468389184] Created alert : (:CRSD00133:) :  Unable to get E2E port, error: IOException : clsclisten failed with ret 3
(File: caa_Socket.cpp, line: 525

2021-03-10 10:07:16.836: [    CRSD][1468389184][PANIC] CRSD exiting: Unable to get E2E port after 2nd attempt
2021-03-10 10:07:16.836: [    CRSD][1468389184] Done.
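
The key line in this excerpt is the `clsclisten: Error listening on` message, which names the address crsd tried to listen on. A minimal sketch for pulling that address out of a crsd.log-style file follows; the function name `listen_error_hosts` is hypothetical, and the pattern assumes the message format seen in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct HOST values from every
# "clsclisten: Error listening on" line in the log file given as $1.
listen_error_hosts() {
    sed -n 's/.*clsclisten: Error listening on:.*(HOST=\([^)]*\)).*/\1/p' "$1" | sort -u
}

# Usage, with the log path reported in the alert log:
#   listen_error_hosts /u01/11.2.0/log/rac02/crsd/crsd.log
```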

The network interface information was as follows:

ens36: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.2.151.86  netmask 255.255.255.224  broadcast 10.228.151.95
        inet6 fe80::250:56ff:fe8d:5908  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:8d:59:08  txqueuelen 1000  (Ethernet)
        RX packets 7289  bytes 646307 (631.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10140  bytes 6909723 (6.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens37: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.2.0.76  netmask 255.255.255.0  broadcast 10.2.0.255
        inet6 fe80::250:56ff:fe8d:13fa  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:8d:13:fa  txqueuelen 1000  (Ethernet)
        RX packets 271  bytes 35397 (34.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 183  bytes 29338 (28.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens37:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 169.254.220.193  netmask 255.255.0.0  broadcast 169.254.255.255
        ether 00:50:56:8d:13:fa  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 4524  bytes 7037484 (6.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4524  bytes 7037484 (6.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:8d:96:71  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0-nic: flags=4098<BROADCAST,MULTICAST>  mtu 1500
        ether 52:54:00:8d:96:71  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[grid@rac02 rac02]$ 

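HAIP had started normally: the link-local address 169.254.220.193 is present on ens37:1, and 10.2.0.76, the address named in the clsclisten error, is configured on ens37.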
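
The same check can be scripted: confirm that the address from the crsd.log error is actually bound to an interface before looking elsewhere. This is a sketch only; `ip_is_configured` is a made-up name, and it assumes the CentOS 7 `ifconfig` output format shown above, where each IPv4 address appears on a line beginning with `inet`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if IPv4 address $1 appears as an `inet`
# address in ifconfig-style output read from stdin.
ip_is_configured() {
    awk -v ip="$1" '$1 == "inet" && $2 == ip { found = 1 } END { exit !found }'
}

# Usage on the node:
#   ifconfig | ip_is_configured 10.2.0.76 && echo "address is bound"
```

If the address is bound, as it was here, the listen failure is more likely to come from configuration than from the interface itself.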

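4 Resolution

The cause later turned out to be a misconfigured sqlnet.ora file under GRID_HOME. With crsd unable to start, the SCAN listener and the regular listener could not start either. The file was removed: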

[grid@rac02 admin]$ rm sqlnet.ora
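
A more cautious variant, which is not what was done here, is to move the file aside rather than delete it, so that its contents remain available for later inspection. A minimal sketch, with `set_aside` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: rename file $1 to $1.bak.<timestamp> and print the
# new name, instead of deleting the file.
set_aside() {
    backup="$1.bak.$(date +%Y%m%d%H%M%S)"
    mv -- "$1" "$backup" && printf '%s\n' "$backup"
}

# Usage, from the directory that holds the file:
#   set_aside sqlnet.ora
```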

Then the resource was started, and the listener processes were checked repeatedly until they appeared:

[grid@rac02 admin]$ crsctl start resource "ora.crsd" -init
CRS-2672: Attempting to start 'ora.crsd' on 'rac02'
CRS-2676: Start of 'ora.crsd' on 'rac02' succeeded
[grid@rac02 admin]$ ps -ef|grep tns
root        19     2  0 09:55 ?        00:00:00 [netns]
grid     21423 20603  0 10:12 pts/2    00:00:00 grep --color=auto tns
[grid@rac02 admin]$ ps -ef|grep tns
root        19     2  0 09:55 ?        00:00:00 [netns]
grid     21493 20603  0 10:12 pts/2    00:00:00 grep --color=auto tns
[grid@rac02 admin]$ ps -ef|grep tns
root        19     2  0 09:55 ?        00:00:00 [netns]
grid     21506 20603  0 10:12 pts/2    00:00:00 grep --color=auto tns
[grid@rac02 admin]$ ps -ef|grep tns
root        19     2  0 09:55 ?        00:00:00 [netns]
grid     21513     1  2 10:12 ?        00:00:00 /u01/11.2.0/bin/tnslsnr LISTENER_SCAN1 -inherit
grid     21525     1  0 10:12 ?        00:00:00 /u01/11.2.0/bin/tnslsnr LISTENER -inherit
grid     21546 20603  0 10:12 pts/2    00:00:00 grep --color=auto tns
[grid@rac02 admin]$ 

The resources started normally: both LISTENER_SCAN1 and LISTENER are running.
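
The repeated `ps -ef|grep tns` calls above poll by hand until the listeners show up. The same wait can be sketched as a small loop; `wait_until` and `listener_running` are names invented for this example, and the timeout and interval values in the usage line are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: run a check command every INTERVAL seconds until it
# succeeds (return 0) or LIMIT seconds have elapsed (return 1).
# Usage: wait_until LIMIT INTERVAL command [args...]
wait_until() {
    limit=$1; interval=$2; shift 2
    elapsed=0
    while ! "$@"; do
        [ "$elapsed" -ge "$limit" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Example check: is any tnslsnr process running?
# The [t] in the pattern keeps grep from matching its own command line.
listener_running() {
    ps -ef | grep -q '[t]nslsnr'
}

# Usage on the node:
#   wait_until 120 5 listener_running && echo "listener is up"
```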


Reposted from blog.csdn.net/oradbm/article/details/114633991