A question about HA


December 20, 2011
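  I don't know what went wrong today: an error occurred while starting HA, and the socket service on the server is no longer usable. I'm a newbie, so please help me take a look; I would be very grateful. The servers are two 630s configured as a two-node cluster, the OS is 5.2 and HA is 5.1. Node A runs DB2; node B runs WebSphere and the socket service. Both nodes are connected to a single 3750 switch. Everything was normal before the failure. Then a colleague of mine accidentally cut power to the switch and powered it back on without doing anything else, which caused node B to go down. After that I shut down and restarted node A following the normal procedure, and node A is fine (including HA). But node B crashed twice when I started HA on it, and HA only started on the third attempt, after I ran Verify and Synchronize HACMP Configuration. Even so, the socket service on node B is still unavailable, although WebSphere is working. In the tail -f /tmp/hacmp.out window I saw errors, as follows: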
  Jan 31 15:04:06 EVENT START: event_error 1 2_node_down WASServer graceful _2
  :event_error[52] [[ high = high ]]
  :event_error[52] version=1.10
  :event_error[53] :event_error[53] cl_get_path
  HA_DIR=es
  :event_error[55] EXIT_STATUS=1
  :event_error[56] RP_NAME=1 2_node_down WASServer graceful _2
  :event_error[59] [ 2 -ne 2 ]
  :event_error[65] set -u
  :event_error[67] RP_NAME=node_down WASServer graceful _2
  :event_error[68] RP_NAME=node_down WASServer graceful
  :event_error[70] :event_error[70] cllsclstr -c
  :event_error[70] grep -v cname
  :event_error[70] cut -d : -f2
  CLUSTER=tyyc
  :event_error[74] [ -x /usr/lpp/ssp/bin/spget_syspar ]
  :event_error[81] echo WARNING: Cluster tyyc Failed while running node_down WASServer graceful , exit status was 1
  :event_error[81] 1> /dev/console
  :event_error[82] echo WARNING: Cluster tyyc Failed while running node_down WASServer graceful , exit status was 1
  WARNING: Cluster tyyc Failed while running node_down WASServer graceful , exit status was 1
  :event_error[88] [[ node_down WASServer graceful= reconfig_resource* ]]
  Jan 31 15:04:06 EVENT FAILED:-1: event_error 1 2_node_down WASServer graceful _2
  Jan 31 15:09:14 EVENT START: config_too_long 360 /usr/es/sbin/cluster/events/node_down.rp
  :config_too_long[64] [[ high = high ]]
  :config_too_long[64] version=1.11
  :config_too_long[65] :config_too_long[65] cl_get_path
  HA_DIR=es
  :config_too_long[67] NUM_SECS=360
  :config_too_long[68] EVENT=/usr/es/sbin/cluster/events/node_down.rp
  :config_too_long[70] HOUR=3600
  :config_too_long[71] THRESHOLD=5
  :config_too_long[72] SLEEP_INTERVAL=1
  :config_too_long[78] PERIOD=30
  :config_too_long[81] set -u
  :config_too_long[86] LOOPCNT=0
  :config_too_long[87] MESSAGECNT=0
  :config_too_long[88] :config_too_long[88] cllsclstr -c
  :config_too_long[88] grep -v cname
  :config_too_long[88] cut -d : -f2
  CLUSTER=tyyc
  :config_too_long[89] TIME=360
  :config_too_long[90] sleep_cntr=0
  :config_too_long[95] [ -x /usr/lpp/ssp/bin/spget_syspar ]
  WARNING: Cluster tyyc has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 360 seconds. Please check cluster status.
  WARNING: Cluster tyyc has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 390 seconds. Please check cluster status.
  WARNING: Cluster tyyc has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 420 seconds. Please check cluster status.
  WARNING: Cluster tyyc has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 450 seconds. Please check cluster status.
  WARNING: Cluster tyyc has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 480 seconds. Please check cluster status.
  WARNING: Cluster tyyc has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 540 seconds. Please check cluster status.
  WARNING: Cluster tyyc has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 600 seconds. Please check cluster status.
  WARNING: Cluster tyyc has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 660 seconds. Please check cluster status.
  WARNING: Cluster tyyc has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 720 seconds. Please check cluster status.
  Help: could someone please take a look and tell me what this problem is? Thanks.
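  A long hacmp.out trace like the one above is easier to read once the event markers and the config_too_long warnings are pulled out. The following is only a rough sketch in Python, not an HACMP tool: the function names are made up for illustration, and the patterns assume nothing beyond the line shapes visible in the excerpt ("EVENT START:", "EVENT FAILED:", and "has been running recovery program ... for N seconds").

```python
import re

# Event markers as they appear in the excerpt, for example:
# "Jan 31 15:04:06 EVENT FAILED:-1: event_error 1 2_node_down ..."
EVENT_RE = re.compile(
    r"^\s*(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+EVENT\s+(?P<kind>[A-Z]+):(?P<rest>.*)$"
)

# config_too_long warnings; captures the recovery program and elapsed seconds.
TOO_LONG_RE = re.compile(
    r"has been running recovery program '(?P<prog>[^']+)' for (?P<secs>\d+) second"
)


def summarize_hacmp_out(lines):
    """Return (events, longest).

    events  : list of (timestamp, kind, remaining text) for each EVENT line
    longest : dict mapping each recovery program to the largest duration
              (in seconds) reported by a config_too_long warning
    """
    events = []
    longest = {}
    for line in lines:
        event = EVENT_RE.match(line)
        if event:
            events.append(
                (event.group("stamp"), event.group("kind"), event.group("rest").strip())
            )
            continue
        warning = TOO_LONG_RE.search(line)
        if warning:
            prog = warning.group("prog")
            longest[prog] = max(longest.get(prog, 0), int(warning.group("secs")))
    return events, longest


def failed_events(events):
    """Keep only the events whose marker is FAILED."""
    return [event for event in events if event[1] == "FAILED"]
```

  To try it on a node, one could pass it the lines of the log, for instance summarize_hacmp_out(open("/tmp/hacmp.out")), and then look at failed_events(...) first. A recovery program whose reported duration keeps growing, as node_down.rp does here, suggests the event never completed.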
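  _________________________________
  I'm a newbie and need your help, thanks!

  Which machine's hacmp.out did you post? It looks like HACMP ran into a problem while stopping and has not finished stopping yet.

  It is from node B. It simply will not stop; the only way to stop it is the force option in smitty clstop. The errors shown in my earlier post are what appears in hacmp.out. After rebooting the machine, however, the output of lssrc -g cluster and of /usr/sbin/cluster/clstat both look normal, and neither synchronization nor verification reported any errors. Please take a look. Thanks!

  Are there no other errors on the system? Did all the resources come up?
  Try wiping node B's HA configuration completely and then running a synchronization from node A.

  I checked, and the resources are up on both A and B, but I think something is still wrong, because my socket service still cannot be used.
  Wipe node B's HA configuration completely? Wouldn't that mean configuring HA on B all over again?
  My environment: node A runs DB2, node B runs WebSphere and the socket service, and a separate machine runs MQ.
  Thanks for the reply!

  After wiping node B's HA configuration you can synchronize from A!
  Note: wiping node B's HA configuration means deleting your cluster name in smitty hacmp.
  If that does not work, I suggest applying patches, both for AIX and for HACMP!

  Is your socket service started by HA as an application? You could try it this way: first test HA without the application, then test the application without HA.
  Maybe the socket service itself is broken and the problem has nothing to do with HA.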
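  For the "application without HA" half of that test, one simple check is whether the socket service accepts TCP connections at all. Below is a minimal sketch in Python. The thread never says which address or port the socket service listens on, so the host and port are placeholders, and port_is_open is a hypothetical helper name.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    host and port are placeholders: the thread does not state where the
    socket service listens, so substitute the real values.
    """
    try:
        # create_connection resolves the host, connects, and raises OSError
        # (including timeouts and "connection refused") on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

  Running port_is_open against node B's own address with cluster services stopped, and again against the service address once HA has brought the resources up, would help show whether the problem sits in the socket service itself or in the way HA starts it.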
  
  
  

Reposted from ntk006vz.iteye.com/blog/1359454