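1. gpstate
gpstate is the basic command for checking the state of Greenplum. It can be used to obtain the current status of the database whether or not the database is running normally.
If typing gpstate in a terminal does not work, check the following:
❏ Whether source <Greenplum installation directory>/greenplum_path.sh has been run.
❏ Whether the MASTER_DATA_DIRECTORY and PGPORT environment variables are set correctly.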
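The two checks above can be scripted. The following is a minimal sketch, assuming a POSIX-style shell; check_gp_env is a hypothetical helper name and is not part of Greenplum.

```shell
# Sketch: report which gpstate prerequisites are missing in the current shell.
# check_gp_env is a hypothetical helper, not a Greenplum utility.
check_gp_env() {
    missing=0
    # gpstate is normally on PATH only after greenplum_path.sh has been sourced.
    if ! command -v gpstate >/dev/null 2>&1; then
        echo "gpstate not found on PATH: source greenplum_path.sh first"
        missing=1
    fi
    # Both variables must be set and non-empty.
    if [ -z "${MASTER_DATA_DIRECTORY:-}" ]; then
        echo "MASTER_DATA_DIRECTORY is not set"
        missing=1
    fi
    if [ -z "${PGPORT:-}" ]; then
        echo "PGPORT is not set"
        missing=1
    fi
    return "$missing"
}
```

Calling check_gp_env before gpstate prints one line per missing prerequisite and returns a non-zero status if anything is missing. It only checks that the variables are non-empty, not that their values are correct.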
1.1 gpstate -s
Shows the detailed configuration and status of each instance.
~]$ gpstate -s
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -s
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b'
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06'
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-Gathering data from segments...
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:--Master Configuration & Status
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Master host = gpmaster
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Master postgres process ID = 25885
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Master data directory = /greenplum/gpdata/master/gpseg-1
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Master port = 5432
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Master current role = dispatch
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Greenplum initsystem version = 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Greenplum current version = PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Postgres version = 9.4.24
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Master standby = No master standby configured
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-Segment Instance Status Report
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Segment Info
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Hostname = gpseg01
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Address = gpseg01
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Datadir = /greenplum/gpdata/primary/gpseg0
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Port = 55000
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Mirroring Info
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Current role = Primary
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Preferred role = Primary
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Mirror status = Synchronized
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Status
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- PID = 24988
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Configuration reports status as = Up
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:- Database status = Up
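1.2 gpstate -e
Checks the status of all Segments for abnormal conditions such as Segments that are down, Segments that are still recovering data, or Segments in an unbalanced state.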
~]$ gpstate -e
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -e
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b'
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06'
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Gathering data from segments...
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Segment Mirroring Status Report
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-All segments are running normally
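For scripted checks, the summary line in the output above can be tested directly. This is a minimal sketch: segments_all_normal is a hypothetical helper, and the message text is taken from the 6.7.0 sample output above, so other versions may word it differently.

```shell
# Sketch: succeed only if `gpstate -e` output (read from stdin) contains the
# "all normal" summary line seen in the sample output above.
# segments_all_normal is a hypothetical helper name.
segments_all_normal() {
    grep -q "All segments are running normally"
}

# Possible usage on the master host (not executed here):
#   gpstate -e | segments_all_normal || echo "segments need attention"
```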
1.3 gpstate -Q
Quickly checks and lists the Segments that are down.
1.4 gpstate -m
Lists the configuration of all Mirror Segments.
1.5 gpstate -f
Shows details of the Standby Master.
~]$ gpstate -f
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-Starting gpstate with args: -f
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.27.0 build 1'
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.27.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jul 11 2018 19:48:18'
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-Standby master details
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-----------------------
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:- Standby address = P1QMSMDW02
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:- Standby data directory = /gpmaster/gpseg-1
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:- Standby port = 5432
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:- Standby PID = 25445
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:- Standby status = Standby host passive
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--------------------------------------------------------------
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--pg_stat_replication
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--------------------------------------------------------------
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--WAL Sender State: streaming
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Sync state: sync
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Sent Location: 6CC/7ABFAA50
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Flush Location: 6CC/7ABFAA50
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Replay Location: 6CC/7ABEDFA8
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--------------------------------------------------------------
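The replication state of the standby can be pulled out of this output for monitoring. A minimal sketch, assuming the "Sync state:" label appears exactly as in the sample above; standby_sync_state is a hypothetical helper name.

```shell
# Sketch: print the value that follows "Sync state:" in `gpstate -f` output
# read from stdin. Prints nothing if the label is absent.
# standby_sync_state is a hypothetical helper name.
standby_sync_state() {
    sed -n 's/^.*Sync state: *//p'
}

# Possible usage on the master host (not executed here):
#   [ "$(gpstate -f | standby_sync_state)" = "sync" ] || echo "standby is not in sync"
```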
1.6 gpstate -i
Shows Greenplum version information.
~]$ gpstate -i
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -i
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b'
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06'
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-Loading version information
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:- Host Datadir Port Version
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:- gpmaster /greenplum/gpdata/master/gpseg-1 5432 PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06
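2. The gp_segment_configuration system table
While the Greenplum cluster is running, you can query gp_segment_configuration to get the configuration and runtime status of every instance. gp_segment_configuration lives in the pg_global tablespace, so it can be queried no matter which database the SQL client is connected to.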
postgres=# select * from gp_segment_configuration;
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
------+---------+------+----------------+------+--------+-------+----------+----------+----------------------------------
1 | -1 | p | p | n | u | 5432 | gpmaster | gpmaster | /greenplum/gpdata/master/gpseg-1
5 | 3 | p | p | s | u | 55000 | gpseg02 | gpseg02 | /greenplum/gpdata/primary/gpseg3
11 | 3 | m | m | s | u | 56000 | gpseg01 | gpseg01 | /greenplum/gpdata/mirror/gpseg3
6 | 4 | p | p | s | u | 55001 | gpseg02 | gpseg02 | /greenplum/gpdata/primary/gpseg4
12 | 4 | m | m | s | u | 56001 | gpseg01 | gpseg01 | /greenplum/gpdata/mirror/gpseg4
7 | 5 | p | p | s | u | 55002 | gpseg02 | gpseg02 | /greenplum/gpdata/primary/gpseg5
13 | 5 | m | m | s | u | 56002 | gpseg01 | gpseg01 | /greenplum/gpdata/mirror/gpseg5
4 | 2 | p | p | s | u | 55002 | gpseg01 | gpseg01 | /greenplum/gpdata/primary/gpseg2
10 | 2 | m | m | s | u | 56002 | gpseg02 | gpseg02 | /greenplum/gpdata/mirror/gpseg2
2 | 0 | p | p | s | u | 55000 | gpseg01 | gpseg01 | /greenplum/gpdata/primary/gpseg0
8 | 0 | m | m | s | u | 56000 | gpseg02 | gpseg02 | /greenplum/gpdata/mirror/gpseg0
3 | 1 | p | p | s | u | 55001 | gpseg01 | gpseg01 | /greenplum/gpdata/primary/gpseg1
9 | 1 | m | m | s | u | 56001 | gpseg02 | gpseg02 | /greenplum/gpdata/mirror/gpseg1
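Each row of the output represents one instance in the Greenplum cluster. The fields and their meanings are listed in the table.
2.1 List the instances that are currently down
This SQL can serve as the basis for instance status monitoring.
SELECT 'segment host: ' || hostname || ', role: ' || role || ', status: ' || status
FROM gp_segment_configuration
WHERE status <> 'u';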
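To use this check from a monitoring script, the rows can be filtered in the shell. The sketch below assumes the rows arrive as hostname|role|status, which is what psql prints in unaligned, tuples-only mode (-At) with its default field separator; report_down_segments is a hypothetical helper name.

```shell
# Sketch: read "hostname|role|status" rows on stdin, print one line for every
# instance whose status is not 'u', and return non-zero if any were found.
# report_down_segments is a hypothetical helper name.
report_down_segments() {
    awk -F'|' '
        NF >= 3 && $3 != "u" {
            printf "segment host %s (role %s) has status %s\n", $1, $2, $3
            bad = 1
        }
        END { exit bad }
    '
}

# Possible usage on the master host (not executed here):
#   psql -At -d postgres \
#        -c "SELECT hostname, role, status FROM gp_segment_configuration" \
#     | report_down_segments || echo "at least one instance is down"
```

Filtering in the shell rather than in the WHERE clause keeps the exit status meaningful: the function returns 0 only when every row reports status u.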
2.2 Count the running instances on each server in the cluster (the query counts instances whose current role is Primary)
=# SELECT hostname,count(distinct dbid) FROM gp_segment_configuration
WHERE status = 'u' and role = 'p' group by hostname;
hostname | count
----------+-------
gpmaster | 1
gpseg01 | 3
gpseg02 | 3
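Note: when the number of running instances differs between the segment servers of a cluster, the whole cluster is in an unbalanced state, which reduces the cluster's overall throughput.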
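Besides comparing counts per host, an unbalanced cluster can be spotted by looking for instances whose current role differs from their preferred role, using the role and preferred_role columns shown in the table output above. A minimal sketch, assuming rows of content|role|preferred_role|hostname as psql -At would print them; list_switched_roles is a hypothetical helper name.

```shell
# Sketch: read "content|role|preferred_role|hostname" rows on stdin and print
# the instances that are not currently running in their preferred role.
# list_switched_roles is a hypothetical helper name.
list_switched_roles() {
    awk -F'|' 'NF >= 4 && $2 != $3 {
        printf "content %s on %s: role %s, preferred %s\n", $1, $4, $2, $3
    }'
}

# Possible usage on the master host (not executed here):
#   psql -At -d postgres \
#        -c "SELECT content, role, preferred_role, hostname FROM gp_segment_configuration" \
#     | list_switched_roles
```

Empty output means every instance is in its preferred role.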
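3. Segment failure recovery and rebalancing
The data instances of a Greenplum cluster (Primary or Mirror) can fail. Causes include, but are not limited to, failures of the operating system, power supply, network, or disks on some hosts in the cluster. A Greenplum cluster configured with Mirror Segments is highly available: when an instance goes offline because of a failure, the cluster still keeps its data intact and remains readable and writable. After investigating and repairing any underlying hardware fault, the administrator should use the gprecoverseg command as soon as possible to bring the failed instance back to normal operation.
When an instance fails, the Greenplum cluster can be in one of the following situations:
- In a cluster with only Primaries and no Mirrors, data integrity can no longer be guaranteed once any Primary fails, and the Greenplum cluster becomes unavailable.
- In a cluster configured with Mirrors, once a Primary Segment fails, the corresponding Mirror Segment automatically switches to the Primary role and takes over from the failed instance so that the cluster stays available. The changes made to the database during the outage are recorded so that the failed instance can be brought up to date during recovery. Because a Primary and its Mirror are by default not placed on the same server, this temporary state in which a Mirror stands in for a Primary leaves the workload unevenly distributed across the servers in the cluster, which reduces throughput, so it should be repaired promptly.
- When only a Mirror Segment fails, the corresponding Primary Segment records the database changes made during the outage. No instance changes role, so the impact on cluster performance is small, but the failure should still be repaired as soon as possible.
The following uses the cluster shown in the figure as an example to demonstrate how gprecoverseg recovers the data of a failed cluster.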
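1) In the initial allocation, the cluster has two Segment hosts. Host 1 runs the Primary Segments of segments 1 and 2, and host 2 runs the Primary Segments of segments 3 and 4, as shown in Figure 13-1.
2) Primary Segment 3 on host 2 now fails. Its Mirror Segment on host 1 automatically switches to Primary Segment so that the data stays intact and the cluster stays available. See figure.
3) Once any hardware or system fault on host 2 has been resolved, the administrator should run the gprecoverseg command on the Master. The command starts the offline instance. As soon as it starts successfully, incremental synchronization begins between the primary and mirror instances to restore the data changes made while the instance was offline. The cluster remains fully available during recovery and synchronization, but performance is affected. How long synchronization takes depends on the amount of data that changed in the database while the instance was down. See figure.
4) After synchronization completes, the system is in the state shown in the figure below. Note that although the cluster stays available throughout the failure and the recovery, the number of active Primary Segments on each host may now differ. Host 1 has 3 Primary Segments, while host 2 has only one. Because Primary Segments are the ones that actually execute query computation, the workload on host 1 is then 3 times that on host 2, which severely reduces the throughput of the cluster.
This state is called the unbalanced state. The administrator needs to run the following command on the Master to rebalance the instances and restore cluster performance: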
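gprecoverseg -r
The rebalance command takes the Primary and Mirror of the affected instances offline at the same time to synchronize them. User sessions stay connected, but queries and operations that are running are canceled or rolled back, so the administrator should rebalance at a suitable time to keep the impact on production to a minimum. See figure.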
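Once rebalancing completes, the system returns to the initial allocation, which gives the best performance.
gprecoverseg -F recovers an instance by full copy instead. It is recommended to try the default incremental synchronization first.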
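The order of operations described in this section can be summarized as a small decision helper. This is a sketch under stated assumptions, not an official procedure: next_recovery_step is a hypothetical function, and its three inputs (the number of down instances, the number of instances not in their preferred role, and whether synchronization has finished) are supplied by the operator, for example from gpstate -e and gp_segment_configuration.

```shell
# Sketch: print the next action in the recover-then-rebalance sequence.
# next_recovery_step is a hypothetical helper; it only prints a suggestion
# and never runs gprecoverseg itself.
#   $1 = number of instances that are down
#   $2 = number of instances not in their preferred role
#   $3 = "yes" if synchronization has finished, anything else otherwise
next_recovery_step() {
    down_count=$1
    switched_count=$2
    in_sync=$3
    if [ "$down_count" -gt 0 ]; then
        # Failed instances come first: default incremental recovery.
        echo "gprecoverseg"
    elif [ "$switched_count" -gt 0 ]; then
        if [ "$in_sync" = "yes" ]; then
            # Everything is up and synchronized, but roles are switched.
            echo "gprecoverseg -r"
        else
            echo "wait for resynchronization to finish"
        fi
    else
        echo "none"
    fi
}
```

For example, next_recovery_step 0 1 yes prints gprecoverseg -r. Because rebalancing cancels running queries, the printed command is meant to be reviewed and run by the administrator at a suitable time rather than executed automatically.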
Reference: https://blog.csdn.net/MyySophia/article/details/102812733