Troubleshooting notes: corosync+pacemaker+postgresql streaming replication
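Background
While building a corosync+pacemaker cluster, every time pacemaker was started the original primary shut down on its own and the original standby was promoted to primary. The slave virtual IP (vip-slave) did not come up either, and the whole cluster environment broke down.
The cluster status looked like this: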
[root@plat-ecloud01-andfleethe-prd-postgres03 ~]# crm status
Stack: corosync
Current DC: plat-ecloud01-andfleethe-prd-postgres03 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Wed Jan 2 11:22:13 2019 Last change: Wed Jan 2 11:18:30 2019 by root via crm_attribute on plat-ecloud01-andfleethe-prd-postgres04
2 nodes and 8 resources configured
Online: [ plat-ecloud01-andfleethe-prd-postgres03 plat-ecloud01-andfleethe-prd-postgres04 ]
Full list of resources:
fence-cps01 (ocf::heartbeat:fence_check): Started plat-ecloud01-andfleethe-prd-postgres03
fence-cps02 (ocf::heartbeat:fence_check): Started plat-ecloud01-andfleethe-prd-postgres04
Master/Slave Set: msPostgresql [pgsql]
Masters: [ plat-ecloud01-andfleethe-prd-postgres04 ]
Stopped: [ plat-ecloud01-andfleethe-prd-postgres03 ]
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started plat-ecloud01-andfleethe-prd-postgres04
Clone Set: clnPingCheck [pingCheck]
Started: [ plat-ecloud01-andfleethe-prd-postgres03 plat-ecloud01-andfleethe-prd-postgres04 ]
Resource Group: slave-group
vip-slave (ocf::heartbeat:IPaddr2): Stopped
Resolution
Checking the logs revealed the following errors:
Jan 02 14:59:12 [17628] plat-ecloud01-andfleethe-prd-postgres03 lrmd: notice: operation_finished: pingCheck_monitor_10000:10617:stderr [ /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 24: $'\r': command not found ]
Jan 02 14:59:12 [17628] plat-ecloud01-andfleethe-prd-postgres03 lrmd: notice: operation_finished: pingCheck_monitor_10000:10617:stderr [ /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 26: $'\r': command not found ]
Jan 02 14:59:12 [17628] plat-ecloud01-andfleethe-prd-postgres03 lrmd: notice: operation_finished: pingCheck_monitor_10000:10617:stderr [ /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 31: $'\r': command not found ]
Jan 02 14:59:12 [17628] plat-ecloud01-andfleethe-prd-postgres03 lrmd: notice: operation_finished: pingCheck_monitor_10000:10617:stderr [ /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 39: $'\r': command not found ]
Jan 02 14:59:12 [17628] plat-ecloud01-andfleethe-prd-postgres03 lrmd: notice: operation_finished: pingCheck_monitor_10000:10617:stderr [ /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 41: expot: command not found ]
Jan 02 14:59:12 [17628] plat-ecloud01-andfleethe-prd-postgres03 lrmd: notice: operation_finished: pingCheck_monitor_10000:10617:stderr [ /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 42: expot: command not found ]
Jan 02 14:59:12 [17628] plat-ecloud01-andfleethe-prd-postgres03 lrmd: notice: operation_finished: pingCheck_monitor_10000:10617:stderr [ /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 43: $'\r': command not found ]
Jan 02 14:59:12 [17628] plat-ecloud01-andfleethe-prd-postgres03 lrmd: notice: operation_finished: pingCheck_monitor_10000:10617:stderr [ /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 45: $'\r': command not found ]
Jan 02 14:59:12 [17628] plat-ecloud01-andfleethe-prd-postgres03 lrmd: notice: operation_finished: pingCheck_monitor_10000:10617:stderr [ /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 66: syntax error near unexpected token `$'{\r'' ]
Jan 02 14:59:12 [17628] plat-ecloud01-andfleethe-prd-postgres03 lrmd: notice: operation_finished: pingCheck_monitor_10000:10617:stderr [ /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 66: `ocf_is_oot() {^M' ]
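The repeated $'\r': command not found messages are the typical symptom of a shell file saved with Windows (CRLF) line endings. As a hedged sketch, assuming bash and GNU grep, a small helper such as check_shell_file (a hypothetical name, not part of resource-agents or Pacemaker) can flag such files, or files that fail to parse, before an agent sources them:

```shell
#!/usr/bin/env bash
# Sketch only: check_shell_file is a hypothetical helper, not a resource-agents tool.
# Returns non-zero if the file is unreadable, contains carriage returns,
# or fails a bash syntax check.
check_shell_file() {
    local f="$1" cr=$'\r' rc=0 n
    if [ ! -r "$f" ]; then
        echo "not readable: $f"
        return 2
    fi
    # Count lines that contain a carriage return (CRLF line endings).
    n=$(grep -c "$cr" "$f")
    if [ "$n" -gt 0 ]; then
        echo "CRLF line endings: $f ($n line(s))"
        rc=1
    fi
    # bash -n parses the file without executing it, so it catches errors like
    # "syntax error near unexpected token" but not misspelled commands.
    if ! bash -n "$f" 2>/dev/null; then
        echo "bash syntax error: $f"
        rc=1
    fi
    return "$rc"
}
```

For example, check_shell_file /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs. Note that a misspelled command such as expot only shows up when the file actually runs, so a clean result here does not rule out every error.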
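It turned out the link file was at fault. I had already ruled out problems with ocf-shellfuncs, but had not expected the link file on the primary, /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs, to still contain errors. The $'\r': command not found messages mean the file has Windows-style (CRLF) line endings, and expot and ocf_is_oot are misspellings of export and ocf_is_root.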
[root@plat-ecloud01-andfleethe-prd-postgres03 ~]# ls -a /usr/lib/ocf/resource.d/heartbeat/
. CTDB Dummy galera IPaddr2.bak MailTo nfsserver .ocf-shellfuncs pgsql.bak rabbitmq-cluster slapd Xinetd
.. db2 ethmonitor garbd IPsrcaddr mysql nginx .ocf-shellfuncs.bak pgsql.bb redis Squid
apache Delay exportfs iface-vlan iSCSILogicalUnit nagios .ocf-binaries oracle ping Route symlink
clvm dhcpd fence_check IPaddr iSCSITarget named .ocf-directories oralsnr portblock rsyncd tomcat
conntrackd docker Filesystem IPaddr2 LVM nfsnotify .ocf-returncodes pgsql postfix SendArp VirtualDomain
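Because the fault was in the link file, it can help to confirm whether .ocf-shellfuncs (and the other .ocf-* helpers in the listing above) is still a symlink or has been replaced by a plain copy, as the .ocf-shellfuncs.bak file hints may have happened. On a stock resource-agents install these are typically symlinks into /usr/lib/ocf/lib/heartbeat/, but treat that as an assumption and verify it on your own system. A minimal sketch, assuming GNU coreutils readlink; show_ocf_link is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch only: show_ocf_link is a hypothetical helper name.
# Reports whether a path is a symlink or a regular file, and which file is
# actually read when an agent sources it.
show_ocf_link() {
    local p="$1"
    if [ -L "$p" ]; then
        # readlink prints the stored target; readlink -f resolves the full chain.
        echo "symlink: $p -> $(readlink "$p") (resolves to $(readlink -f "$p"))"
    elif [ -f "$p" ]; then
        echo "regular file: $p"
    else
        echo "missing: $p"
        return 1
    fi
}

# Only run the check when the heartbeat agent directory exists on this host.
dir=/usr/lib/ocf/resource.d/heartbeat
if [ -d "$dir" ]; then
    for name in .ocf-shellfuncs .ocf-binaries .ocf-directories .ocf-returncodes; do
        show_ocf_link "$dir/$name"
    done
fi
```

If .ocf-shellfuncs reports as a regular file, diffing it against the packaged ocf-shellfuncs is one way to spot stray edits such as expot.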
After correcting the file, I restarted the whole system, which completed the fix:
[root@plat-ecloud01-andfleethe-prd-postgres03 ~]# crm status
Stack: corosync
Current DC: plat-ecloud01-andfleethe-prd-postgres03 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Wed Jan 2 15:46:46 2019 Last change: Wed Jan 2 15:22:53 2019 by root via crm_attribute on plat-ecloud01-andfleethe-prd-postgres03
2 nodes and 8 resources configured
Online: [ plat-ecloud01-andfleethe-prd-postgres03 plat-ecloud01-andfleethe-prd-postgres04 ]
Full list of resources:
fence-cps01 (ocf::heartbeat:fence_check): Started plat-ecloud01-andfleethe-prd-postgres03
fence-cps02 (ocf::heartbeat:fence_check): Started plat-ecloud01-andfleethe-prd-postgres04
Master/Slave Set: msPostgresql [pgsql]
Masters: [ plat-ecloud01-andfleethe-prd-postgres03 ]
Slaves: [ plat-ecloud01-andfleethe-prd-postgres04 ]
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started plat-ecloud01-andfleethe-prd-postgres03
Clone Set: clnPingCheck [pingCheck]
Started: [ plat-ecloud01-andfleethe-prd-postgres03 plat-ecloud01-andfleethe-prd-postgres04 ]
Resource Group: slave-group
vip-slave (ocf::heartbeat:IPaddr2): Started plat-ecloud01-andfleethe-prd-postgres04
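After a full restart like this, a scripted check that no resource is left in the Stopped state can save a manual read of the status. A sketch, assuming the Pacemaker 1.1 text layout shown above (other versions may format the output differently); report_stopped is a hypothetical helper that reads crm status or crm_mon -1 output on stdin:

```shell
#!/usr/bin/env bash
# Sketch only: report_stopped is a hypothetical helper. It pattern-matches the
# plain-text output of "crm status" / "crm_mon -1" read from stdin.
report_stopped() {
    local stopped
    # Match primitives ("vip-slave (...): Stopped") and clone or master/slave
    # sets ("Stopped: [ node ]"). grep exits 1 when nothing matches, so
    # "|| true" keeps that case from being treated as an error.
    stopped=$(grep -E ': Stopped[[:space:]]*$|Stopped: \[' || true)
    if [ -n "$stopped" ]; then
        printf 'stopped resources found:\n%s\n' "$stopped"
        return 1
    fi
    echo "no stopped resources"
}
```

Usage: crm_mon -1 | report_stopped. Against the first status output in this note it would list the stopped pgsql instance on postgres03 and vip-slave; against the final one it reports nothing stopped.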