Greenplum Cluster Monitoring

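Contents

I. gpstate

1.1 gpstate -s

1.2 gpstate -e

1.3 gpstate -Q

1.4 gpstate -m

1.5 gpstate -f

1.6 gpstate -i

II. System table gp_segment_configuration

2.1 List the segments that are currently down

2.2 List the number of running primary segments on each server in the cluster

III. Segment failure recovery and rebalancing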


I. gpstate

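gpstate is the basic command for checking the state of Greenplum. Whether or not the database is running normally, you can use it to get the current status of the database.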

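If the gpstate command does not work in your terminal, check the following:

❏ Whether you have run source <Greenplum installation directory>/greenplum_path.sh.

❏ Whether the MASTER_DATA_DIRECTORY and PGPORT environment variables are set correctly.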

1.1 gpstate -s

Displays detailed configuration and status information for every instance.

~]$ gpstate -s
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -s
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b'
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06'
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-Gathering data from segments...
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:--Master Configuration & Status
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master host                    = gpmaster
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master postgres process ID     = 25885
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master data directory          = /greenplum/gpdata/master/gpseg-1
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master port                    = 5432
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master current role            = dispatch
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Greenplum initsystem version   = 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Greenplum current version      = PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Postgres version               = 9.4.24
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master standby                 = No master standby configured
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-Segment Instance Status Report
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Segment Info
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Hostname                          = gpseg01
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Address                           = gpseg01
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Datadir                           = /greenplum/gpdata/primary/gpseg0
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Port                              = 55000
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Mirroring Info
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Current role                      = Primary
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Preferred role                    = Primary
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Mirror status                     = Synchronized
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Status
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      PID                               = 24988
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Configuration reports status as   = Up
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Database status                   = Up
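The gpstate -s report is long, so for routine checks it can help to filter it. The sketch below is a hypothetical helper (unsynced_mirrors is not a Greenplum tool) that reads gpstate -s output on stdin and prints every host whose "Mirror status" is not "Synchronized". It assumes the "key = value" layout and the "Synchronized" value shown in the example above; other Greenplum versions may word these fields differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan `gpstate -s` output (stdin) and print
# "<hostname>: <mirror status>" for every Mirror status that is not
# "Synchronized". Returns 1 if any such line was found, else 0.
# Assumes the "key   = value" layout shown in the example output.
unsynced_mirrors() {
  awk -F'= *' '
    # Strip trailing whitespace from a field value
    function trim(s) { sub(/[[:space:]]+$/, "", s); return s }
    # Remember the host of the current Segment Info block
    /Hostname[[:space:]]*=/ { host = trim($NF) }
    /Mirror status[[:space:]]*=/ {
      status = trim($NF)
      if (status != "Synchronized") { print host ": " status; n++ }
    }
    END { exit (n > 0) }
  '
}
```

Typical use: `gpstate -s | unsynced_mirrors || echo "ALERT: mirror not synchronized"`.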

1.2 gpstate -e

Checks the status of all segments for abnormal conditions such as segments that are down, still resynchronizing, or unbalanced.

~]$ gpstate -e
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -e
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b'
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06'
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Gathering data from segments...
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Segment Mirroring Status Report
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-All segments are running normally
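Because a healthy gpstate -e run ends with the line "All segments are running normally" (as in the example above), a simple cron-style check can key off that line. gpstate_e_ok below is a hypothetical helper; it assumes gpstate writes the report to stdout with the wording shown above, treats anything else as an alert, and copies the full report to stderr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gpstate -e` output on stdin and return 0 only if
# it contains the "All segments are running normally" line seen above.
# Otherwise print the whole report to stderr and return 1.
gpstate_e_ok() {
  local report
  report=$(cat)    # capture the full report so it can be re-printed
  if grep -q 'All segments are running normally' <<<"$report"; then
    return 0
  fi
  printf '%s\n' "$report" >&2
  return 1
}
```

Typical use: `gpstate -e | gpstate_e_ok || echo "ALERT: gpstate -e reported a problem"`.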

1.3 gpstate -Q

Quickly checks for and lists segments that are down (offline).

1.4 gpstate -m

Lists the configuration of all Mirror Segments.

1.5 gpstate -f

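Displays information about the Standby Master. (The example output below was captured on a different cluster, running Greenplum 4.3.27.)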

~]$ gpstate -f
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-Starting gpstate with args: -f
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.27.0 build 1'
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.27.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jul 11 2018 19:48:18'
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-Standby master details
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-----------------------
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-   Standby address          = P1QMSMDW02
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-   Standby data directory   = /gpmaster/gpseg-1
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-   Standby port             = 5432
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-   Standby PID              = 25445
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-   Standby status           = Standby host passive
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--------------------------------------------------------------
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--pg_stat_replication
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--------------------------------------------------------------
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--WAL Sender State: streaming
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Sync state: sync
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Sent Location: 6CC/7ABFAA50
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Flush Location: 6CC/7ABFAA50
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Replay Location: 6CC/7ABEDFA8
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--------------------------------------------------------------
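The pg_stat_replication section of the gpstate -f report shows whether the standby master is keeping up. The hypothetical standby_ok helper below extracts the "WAL Sender State" and "Sync state" values and treats only streaming plus sync (the values in the example above) as healthy. That threshold is an assumption, and the labels may differ between Greenplum versions; on a cluster without a standby the section is missing, so the check fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gpstate -f` output on stdin, print the WAL sender
# state and sync state, and return 0 only for "streaming" + "sync".
standby_ok() {
  awk '
    function trim(s) { sub(/[[:space:]]+$/, "", s); return s }
    /WAL Sender State:/ { v = $0; sub(/.*WAL Sender State:[[:space:]]*/, "", v); wal = trim(v) }
    /Sync state:/       { v = $0; sub(/.*Sync state:[[:space:]]*/, "", v); syncst = trim(v) }
    END {
      printf "wal_sender=%s sync=%s\n", wal, syncst
      if (wal == "streaming" && syncst == "sync") exit 0
      exit 1
    }
  '
}
```

Typical use: `gpstate -f | standby_ok || echo "ALERT: standby master is not streaming"`.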

1.6 gpstate -i

Displays Greenplum version information.

~]$ gpstate -i
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -i
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b'
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06'
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-Loading version information
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-   Host       Datadir                            Port    Version                                                                                                                                                                                                
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-   gpmaster   /greenplum/gpdata/master/gpseg-1   5432    PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06

II. System table gp_segment_configuration

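While the Greenplum cluster is running, you can query gp_segment_configuration to get the configuration and running status of every instance. gp_segment_configuration resides in the pg_global tablespace, so it can be queried no matter which database the SQL client is currently connected to.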

postgres=# select * from gp_segment_configuration;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address  |             datadir              
------+---------+------+----------------+------+--------+-------+----------+----------+----------------------------------
    1 |      -1 | p    | p              | n    | u      |  5432 | gpmaster | gpmaster | /greenplum/gpdata/master/gpseg-1
    5 |       3 | p    | p              | s    | u      | 55000 | gpseg02  | gpseg02  | /greenplum/gpdata/primary/gpseg3
   11 |       3 | m    | m              | s    | u      | 56000 | gpseg01  | gpseg01  | /greenplum/gpdata/mirror/gpseg3
    6 |       4 | p    | p              | s    | u      | 55001 | gpseg02  | gpseg02  | /greenplum/gpdata/primary/gpseg4
   12 |       4 | m    | m              | s    | u      | 56001 | gpseg01  | gpseg01  | /greenplum/gpdata/mirror/gpseg4
    7 |       5 | p    | p              | s    | u      | 55002 | gpseg02  | gpseg02  | /greenplum/gpdata/primary/gpseg5
   13 |       5 | m    | m              | s    | u      | 56002 | gpseg01  | gpseg01  | /greenplum/gpdata/mirror/gpseg5
    4 |       2 | p    | p              | s    | u      | 55002 | gpseg01  | gpseg01  | /greenplum/gpdata/primary/gpseg2
   10 |       2 | m    | m              | s    | u      | 56002 | gpseg02  | gpseg02  | /greenplum/gpdata/mirror/gpseg2
    2 |       0 | p    | p              | s    | u      | 55000 | gpseg01  | gpseg01  | /greenplum/gpdata/primary/gpseg0
    8 |       0 | m    | m              | s    | u      | 56000 | gpseg02  | gpseg02  | /greenplum/gpdata/mirror/gpseg0
    3 |       1 | p    | p              | s    | u      | 55001 | gpseg01  | gpseg01  | /greenplum/gpdata/primary/gpseg1
    9 |       1 | m    | m              | s    | u      | 56001 | gpseg02  | gpseg02  | /greenplum/gpdata/mirror/gpseg1

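Each row of the output represents one instance in the Greenplum cluster (the master or a segment). The fields and their meanings are listed in the table below:

 Field          | Meaning
----------------+--------------------------------------------------------------------------------------------
 dbid           | Unique ID of the instance
 content        | Content ID; -1 for the master, and a primary and its mirror always share the same content ID
 role           | Role the instance is currently running as: p = primary, m = mirror
 preferred_role | Role originally assigned at initialization: p = primary, m = mirror
 mode           | Synchronization state with the mirror: s = synchronized, n = not in sync (always n for the master)
 status         | Fault status: u = up, d = down
 port           | TCP port the instance listens on
 hostname       | Hostname of the segment host
 address        | Hostname used to access this instance (may be the same as hostname)
 datadir        | Data directory of the instance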
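Comparing role with preferred_role is a quick way to spot segments whose primary and mirror have swapped after a failover, which is the unbalanced state described in Part III. The sketch assumes the output of psql -At (unaligned, tuples only, fields separated by |) for the query in the comment; list_switched_segments is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Feed it `psql -At` output of:
#   SELECT dbid, content, role, preferred_role, hostname
#   FROM gp_segment_configuration;
# It prints each segment (content >= 0) whose current role differs from its
# preferred role, and returns 1 if there is at least one.
list_switched_segments() {
  awk -F'|' '
    NF >= 5 && $2 >= 0 && $3 != $4 {
      printf "dbid=%s content=%s host=%s role=%s preferred_role=%s\n", $1, $2, $5, $3, $4
      n++
    }
    END { exit (n > 0) }
  '
}

# Example (not run here):
#   psql -d postgres -At -c "SELECT dbid, content, role, preferred_role, hostname FROM gp_segment_configuration" \
#     | list_switched_segments
```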

 

2.1 List the segments that are currently down

The following SQL can serve as the basis for segment status monitoring:

postgres=# SELECT 'segment host: ' || hostname || ', role: ' || role || ', status: ' || status FROM gp_segment_configuration
postgres-# WHERE status <> 'u';
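To turn the query above into an alert, its result can be fed to a small check. The hypothetical report_down_segments helper below reads psql -At output (fields separated by |) of the columns named in the comment and returns a non-zero status if any instance is not up, so it can be called from cron or a monitoring agent. The psql connection options in the example are assumptions for your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Feed it `psql -At` output of:
#   SELECT dbid, hostname, role, status FROM gp_segment_configuration;
# It prints one line per instance whose status is not 'u' (up) and
# returns 1 if there is at least one such instance.
report_down_segments() {
  awk -F'|' '
    NF >= 4 && $4 != "u" {
      printf "DOWN dbid=%s host=%s role=%s status=%s\n", $1, $2, $3, $4
      n++
    }
    END { exit (n > 0) }
  '
}

# Example (not run here):
#   psql -d postgres -At -c "SELECT dbid, hostname, role, status FROM gp_segment_configuration" \
#     | report_down_segments || echo "ALERT: Greenplum segment down"
```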

2.2 List the number of running primary segments on each server in the cluster

postgres=# SELECT hostname, count(distinct dbid) FROM gp_segment_configuration
postgres-# WHERE status = 'u' and role = 'p' group by hostname;
 hostname | count 
----------+-------
 gpmaster |     1
 gpseg01  |     3
 gpseg02  |     3

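Note: when the servers in a cluster run different numbers of primary segments, the cluster is in an unbalanced state, which reduces the throughput of the whole cluster.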
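The balance check can be automated by comparing the per-host counts. The query in 2.2 also counts the master row (content = -1), so the sketch below assumes a variant that adds content >= 0. It also cannot see a host with no running primaries at all, because such a host does not appear in the GROUP BY result. check_primary_balance is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Feed it `psql -At` output (hostname|count) of:
#   SELECT hostname, count(*) FROM gp_segment_configuration
#   WHERE status = 'u' AND role = 'p' AND content >= 0 GROUP BY hostname;
# Prints the host count and the min/max primaries per host; returns 0 only
# when every listed host runs the same number of primaries.
check_primary_balance() {
  awk -F'|' '
    NF >= 2 {
      c = $2 + 0
      hosts++
      if (hosts == 1 || c < min) min = c
      if (hosts == 1 || c > max) max = c
    }
    END {
      printf "hosts=%d min=%d max=%d\n", hosts, min, max
      if (hosts > 0 && min == max) exit 0
      exit 1
    }
  '
}
```

With the output shown above minus the master row, the check reports `hosts=2 min=3 max=3` and succeeds; in the failover scenario of Part III (3 primaries on one host, 1 on the other) it reports `hosts=2 min=1 max=3` and fails.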

III. Segment failure recovery and rebalancing

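The data segments of a Greenplum cluster (Primary or Mirror) can fail. Causes include, but are not limited to, operating system, power, network, or disk failures on some of the hosts in the cluster. A Greenplum cluster configured with Mirror Segments is highly available: when a segment goes offline because of a failure, the cluster keeps its data intact and remains readable and writable. After investigating and fixing any hardware faults, the administrator should use the gprecoverseg command to bring the failed segments back to normal operation as soon as possible.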

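When a segment fails, the Greenplum cluster can be in one of the following situations:

  • In a cluster with only Primaries and no Mirrors, once any Primary fails, data integrity can no longer be guaranteed and the Greenplum cluster becomes unavailable.
  • In a cluster configured with Mirrors, once a Primary Segment fails, its Mirror Segment automatically switches to the Primary role and takes over for the failed segment, keeping the cluster available. The changes made to the database while the segment is down are recorded, so that during recovery the failed segment can be brought up to date. Because a Primary and its Mirror are by default never placed on the same server, this temporary state in which a Mirror stands in for a Primary leaves the workload unbalanced across the servers in the cluster and reduces throughput, so it should be repaired promptly.
  • When only a Mirror Segment fails, the corresponding Primary Segment records the database changes made while the mirror is down. The segments keep their roles, so the impact on cluster performance is small, but the mirror should still be repaired as soon as possible.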

The following example uses the cluster shown in the figure to demonstrate how to use gprecoverseg to recover failed segments.

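1) In the initial segment assignment, the cluster has two segment hosts: host 1 runs the Primary Segments of segments 1 and 2, and host 2 runs the Primary Segments of segments 3 and 4, as shown in Figure 13-1.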

2) Primary Segment 3 on host 2 then fails. Its Mirror Segment on host 1 automatically switches to Primary to keep the data intact and the cluster available, as shown in the figure.

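3) After any hardware or system fault on host 2 has been resolved, the administrator should run the gprecoverseg command on the Master. The command starts the offline segment; once it starts successfully, the primary and mirror immediately begin an incremental synchronization to recover the data changes made while the segment was offline. During recovery and synchronization the cluster remains fully available, but performance is affected. How long the synchronization takes depends on how much data changed in the database while the segment was down, as shown in the figure.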

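4) After synchronization completes, the system is in the state shown in the figure below. Note that although the cluster stayed available throughout the failure and recovery, the number of active Primary Segments per host may now differ: host 1 runs 3 Primary Segments while host 2 runs only 1. Because the Primary Segments are the ones that actually execute queries, host 1 carries about three times the workload of host 2, which severely reduces cluster throughput.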

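This is called the unbalanced state. The administrator needs to run the following command on the Master to rebalance the segments and restore cluster performance: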

gprecoverseg -r

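The rebalance command takes the Primary and Mirror of the affected segments offline at the same time for synchronization. User sessions stay connected, but queries and operations in progress are canceled or rolled back, so the administrator should rebalance at a suitable time to minimize the impact on production, as shown in the figure.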

Once rebalancing is complete, the system returns to the initial segment assignment, which gives the best performance.

You can use gprecoverseg -F to recover segments with a full copy, but it is recommended to try the default incremental recovery first.
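The recovery flow above (recover the failed segments, wait for resynchronization, then rebalance) can be summarized as a decision over gp_segment_configuration. The hypothetical next_recovery_step helper below only prints a suggestion and runs nothing. It assumes Greenplum 6 mode values (s = synchronized, n = not in sync) and that rebalancing should only start once every segment is up and synchronized.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Feed it `psql -At` output of:
#   SELECT content, role, preferred_role, mode, status
#   FROM gp_segment_configuration;
# The master row (content = -1) is ignored. Prints one suggestion:
#   gprecoverseg     - some segment is down (step 2 above)
#   wait-for-sync    - all up, but some pair is still resynchronizing (step 3)
#   gprecoverseg -r  - all up and synced, but roles are switched (step 4)
#   ok               - balanced and synchronized
next_recovery_step() {
  awk -F'|' '
    NF >= 5 && $1 >= 0 {
      if ($5 != "u") down++
      if ($4 != "s") unsynced++
      if ($2 != $3) switched++
    }
    END {
      if (down)          print "gprecoverseg"
      else if (unsynced) print "wait-for-sync"
      else if (switched) print "gprecoverseg -r"
      else               print "ok"
    }
  '
}
```

Running it periodically after a failure walks through the same sequence as steps 2 to 4: it suggests gprecoverseg first, then waiting while mode is n, and finally gprecoverseg -r once all pairs are synchronized but still in switched roles.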

Reference: https://blog.csdn.net/MyySophia/article/details/102812733


Reposted from blog.csdn.net/MyySophia/article/details/113944771