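After an unexpected power loss, the cluster did not start automatically when the server rebooted. Running crsctl start crs manually failed with: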
/u01/app/11.2.0/grid/bin/crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
The clusterware logs contained nothing at all; the only trace was in the host system log:
grid: [ID 702911 user.error] exec /u01/app/grid/product/11.2.0/grid/perl/bin/perl -I/u01/app/grid/product/11.2.0/grid/perl/lib /u01/app/grid/product/11.2.0/grid/bin/crswrapexece.pl /u01/app/grid/product/11.2.0/grid/crs/install/s_crsconfig_racdb01_env.txt /u01/app/grid/product/11.2.0/grid/bin/ohasd.bin "reboot"
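To check another node for the same symptom, the host log can be searched for this wrapper launch. A minimal sketch, assuming the log is /var/adm/messages (the Solaris default; many Linux systems use /var/log/messages instead). The helper name ohasd_wrapper_lines is hypothetical.

```shell
#!/bin/sh
# Sketch: search the host system log for the crswrapexece.pl launch shown above.
# SYSLOG is an assumption; override it, e.g. SYSLOG=/var/log/messages on Linux.
SYSLOG=${SYSLOG:-/var/adm/messages}

# ohasd_wrapper_lines FILE (hypothetical helper)
# Prints every line of FILE that mentions crswrapexece.pl.
# Exit status is 0 if at least one line matched, non-zero otherwise.
ohasd_wrapper_lines() {
    grep 'crswrapexece\.pl' "$1"
}

# Only scan when invoked as: sh check_ohasd_log.sh scan
if [ "${1:-}" = "scan" ]; then
    if [ ! -r "$SYSLOG" ]; then
        echo "cannot read $SYSLOG" >&2
        exit 1
    fi
    if ohasd_wrapper_lines "$SYSLOG"; then
        echo "crswrapexece.pl launch found in $SYSLOG" >&2
    else
        echo "no crswrapexece.pl lines in $SYSLOG" >&2
    fi
fi
```

A match on its own only shows that the wrapper was launched; it is the combination with empty clusterware logs that matches the situation described here.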
Solution:
WORKAROUND:
Clear any sockets under /var/tmp/.oracle or /tmp/.oracle, then open two terminals on the node where the stack is not coming up.
1) In terminal 1, run as root:
crsctl start crs
2) At the same time, open a second terminal on the same node and, once the npohasd socket has been created, run the following as root:
/bin/dd if=/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1
3) Terminal 1 should now show the CRS stack coming up. Check the daemons with:
ps -ef | grep d.bin
4) Once the entire CRS stack is up, press Ctrl+C to stop the dd command in the second terminal.
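The two-terminal procedure above can also be driven from a single shell: start crsctl start crs in the background, wait for npohasd to appear, then start the dd reader. This is a hypothetical sketch, not part of the Oracle note. GRID_HOME defaults to the path used in the crsctl command at the top, SOCK_DIR defaults to /tmp/.oracle to match the dd command in step 2 (set it to /var/tmp/.oracle if that is where npohasd is created on your system), and the helper names clear_sockets, wait_for_path and run_workaround are made up for illustration. It assumes a POSIX shell and utilities (on Solaris, put /usr/xpg4/bin first in PATH).

```shell
#!/bin/sh
# Hypothetical sketch of the two-terminal workaround in one shell.
# Assumed defaults; override them via the environment.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0/grid}
SOCK_DIR=${SOCK_DIR:-/tmp/.oracle}     # or /var/tmp/.oracle
WAIT_SECS=${WAIT_SECS:-120}

# clear_sockets DIR: remove every entry directly under DIR.
# Only do this while the stack on this node is down.
clear_sockets() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue        # glob matched nothing
        rm -f "$f"
    done
}

# wait_for_path PATH SECS: poll once per second until PATH exists.
# Returns 0 as soon as it does, 1 if it is still missing after SECS seconds.
wait_for_path() {
    i=0
    while [ "$i" -lt "$2" ]; do
        [ -e "$1" ] && return 0
        sleep 1
        i=$((i + 1))
    done
    [ -e "$1" ]
}

run_workaround() {
    clear_sockets "$SOCK_DIR"
    "$GRID_HOME/bin/crsctl" start crs &          # step 1 ("terminal 1")
    crs_pid=$!
    if wait_for_path "$SOCK_DIR/npohasd" "$WAIT_SECS"; then
        # step 2 ("terminal 2"): read from npohasd in the background
        /bin/dd if="$SOCK_DIR/npohasd" of=/dev/null bs=1024 count=1 &
        echo "dd started as PID $!; kill it once the whole stack is up (step 4)" >&2
    else
        echo "$SOCK_DIR/npohasd did not appear within ${WAIT_SECS}s" >&2
    fi
    wait "$crs_pid"                              # wait for crsctl itself
}

# Only act when invoked as root with: sh npohasd_workaround.sh run
if [ "${1:-}" = "run" ]; then
    if [ "$(id -u)" -ne 0 ]; then
        echo "run as root" >&2
        exit 1
    fi
    run_workaround
fi
```

Leaving dd running and printing its PID mirrors step 4: the reader is stopped by hand only after the stack has been confirmed up.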
Then verify that all resources are online:
crsctl stat res -t
crsctl stat res -t -init
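The ps check from step 3 can be scripted as well. A hypothetical sketch, assuming the core 11.2 daemon names in EXPECTED_DAEMONS (this list is not taken from the Oracle note; compare it against crsctl stat res -t -init on your system). It reads ps -eo comm rather than ps -ef | grep d.bin, so a hung crswrapexece.pl wrapper, whose command line contains ohasd.bin as an argument, is not counted as a running ohasd.bin. The helper name missing_daemons is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: report which core clusterware daemons are not running.
# The default list is an assumption for 11.2; adjust it for your version.
EXPECTED_DAEMONS=${EXPECTED_DAEMONS:-"ohasd.bin gipcd.bin mdnsd.bin gpnpd.bin ocssd.bin octssd.bin evmd.bin crsd.bin"}

# missing_daemons: read a `ps -eo comm` listing on stdin and print each
# name from EXPECTED_DAEMONS that does not appear as a process name.
missing_daemons() {
    # Keep the first word of each line and strip any directory prefix,
    # then join everything into one space-padded string for matching.
    running=" $(sed 's|^[[:space:]]*||; s|[[:space:]].*$||; s|.*/||' | tr '\n' ' ') "
    for d in $EXPECTED_DAEMONS; do
        case $running in
            *" $d "*) ;;                 # found
            *) printf '%s\n' "$d" ;;     # missing
        esac
    done
}

# Only act when invoked as: sh check_daemons.sh check
if [ "${1:-}" = "check" ]; then
    missing=$(ps -eo comm | missing_daemons)
    if [ -z "$missing" ]; then
        echo "all expected daemons are running"
    else
        echo "not running:" $missing
        exit 1
    fi
fi
```

It exits with status 1 and lists the missing names if any daemon is absent. It complements the crsctl checks rather than replacing them, since crsctl also reports the state of each resource.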
References
Cluster Failed to Start Due to Problem With Socket Pipe npohasd (Doc ID 1612325.1)
Diagnosing Grid Infrastructure Startup Issues (Doc ID 1623340.1)