Greenplum cluster monitoring

Table of Contents

1. gpstate

1.1 gpstate -s

1.2 gpstate -e

1.3 gpstate -Q

1.4 gpstate -m

1.5 gpstate -f

1.6 gpstate -i

2. The system table gp_segment_configuration

2.1 List the segments that are currently down

2.2 List the number of primary instances running on each host in the cluster

3. Segment failure recovery and rebalancing


1. gpstate

gpstate is the basic command for checking the status of a Greenplum cluster. You can use it to obtain status information for the database whether or not the cluster is running normally.

If the gpstate command does not work in your terminal, it is recommended to check the following (an example setup is shown after the list):

❏ Whether source <Greenplum installation directory>/greenplum_path.sh has been executed.

❏ Whether the MASTER_DATA_DIRECTORY and PGPORT environment variables are set correctly.
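
For reference, a minimal environment setup might look like the sketch below. The installation path is only the common default and the data directory and port are taken from the example cluster in this article, so adjust all three to your own environment:

~]$ source /usr/local/greenplum-db/greenplum_path.sh      # install path is an assumption; use your own
~]$ export MASTER_DATA_DIRECTORY=/greenplum/gpdata/master/gpseg-1
~]$ export PGPORT=5432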

1.1 gpstate -s

Display detailed configuration and status information of each node

~]$ gpstate -s
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -s
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b'
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06'
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:15:51:055081 gpstate:gpmaster:gpadmin-[INFO]:-Gathering data from segments...
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:--Master Configuration & Status
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master host                    = gpmaster
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master postgres process ID     = 25885
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master data directory          = /greenplum/gpdata/master/gpseg-1
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master port                    = 5432
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master current role            = dispatch
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Greenplum initsystem version   = 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Greenplum current version      = PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Postgres version               = 9.4.24
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Master standby                 = No master standby configured
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-Segment Instance Status Report
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Segment Info
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Hostname                          = gpseg01
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Address                           = gpseg01
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Datadir                           = /greenplum/gpdata/primary/gpseg0
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Port                              = 55000
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Mirroring Info
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Current role                      = Primary
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Preferred role                    = Primary
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Mirror status                     = Synchronized
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-   Status
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      PID                               = 24988
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Configuration reports status as   = Up
20210222:17:15:52:055081 gpstate:gpmaster:gpadmin-[INFO]:-      Database status                   = Up

1.2 gpstate -e

Checks the status of all segments and reports abnormal states such as segments that are down, segments that are still resynchronizing, or segments not running in their preferred role (unbalanced).

~]$ gpstate -e
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -e
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b'
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06'
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Gathering data from segments...
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-Segment Mirroring Status Report
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20210222:17:17:03:055291 gpstate:gpmaster:gpadmin-[INFO]:-All segments are running normally

1.3 gpstate -Q

Quickly checks for and lists segments that are currently down.
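
For example, run it on the master as gpadmin; only the command is shown here, and the report is printed as INFO log lines in the same style as the gpstate examples above:

~]$ gpstate -Q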

1.4 gpstate -m

Lists the configuration information of all mirror segments.
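
Likewise, only the command is shown here; the per-mirror details follow the same log format as the other examples:

~]$ gpstate -m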

1.5 gpstate -f

Display information about the standby master node

~]$ gpstate -f
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-Starting gpstate with args: -f
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.27.0 build 1'
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.27.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jul 11 2018 19:48:18'
20210222:17:20:15:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-Standby master details
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-----------------------
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-   Standby address          = P1QMSMDW02
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-   Standby data directory   = /gpmaster/gpseg-1
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-   Standby port             = 5432
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-   Standby PID              = 25445
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:-   Standby status           = Standby host passive
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--------------------------------------------------------------
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--pg_stat_replication
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--------------------------------------------------------------
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--WAL Sender State: streaming
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Sync state: sync
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Sent Location: 6CC/7ABFAA50
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Flush Location: 6CC/7ABFAA50
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--Replay Location: 6CC/7ABEDFA8
20210222:17:20:16:033853 gpstate:P1QMSMDW01:gpadmin-[INFO]:--------------------------------------------------------------

1.6 gpstate -i

Display Greenplum version information 

 ~]$ gpstate -i
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -i
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b'
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06'
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-Loading version information
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-   Host       Datadir                            Port    Version                                                                                                                                                                                                
20210222:17:20:22:055731 gpstate:gpmaster:gpadmin-[INFO]:-   gpmaster   /greenplum/gpdata/master/gpseg-1   5432    PostgreSQL 9.4.24 (Greenplum Database 6.7.0 build commit:2fbc274bc15a19b5de3c6e44ad5073464cd4f47b) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Apr 16 2020 02:24:06

2. The system table gp_segment_configuration

While the Greenplum cluster is running, you can query gp_segment_configuration to obtain the configuration and running status of all nodes. gp_segment_configuration lives in the pg_global tablespace, so it can be queried no matter which database the SQL client is currently connected to.

postgres=# select * from gp_segment_configuration;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address  |             datadir              
------+---------+------+----------------+------+--------+-------+----------+----------+----------------------------------
    1 |      -1 | p    | p              | n    | u      |  5432 | gpmaster | gpmaster | /greenplum/gpdata/master/gpseg-1
    5 |       3 | p    | p              | s    | u      | 55000 | gpseg02  | gpseg02  | /greenplum/gpdata/primary/gpseg3
   11 |       3 | m    | m              | s    | u      | 56000 | gpseg01  | gpseg01  | /greenplum/gpdata/mirror/gpseg3
    6 |       4 | p    | p              | s    | u      | 55001 | gpseg02  | gpseg02  | /greenplum/gpdata/primary/gpseg4
   12 |       4 | m    | m              | s    | u      | 56001 | gpseg01  | gpseg01  | /greenplum/gpdata/mirror/gpseg4
    7 |       5 | p    | p              | s    | u      | 55002 | gpseg02  | gpseg02  | /greenplum/gpdata/primary/gpseg5
   13 |       5 | m    | m              | s    | u      | 56002 | gpseg01  | gpseg01  | /greenplum/gpdata/mirror/gpseg5
    4 |       2 | p    | p              | s    | u      | 55002 | gpseg01  | gpseg01  | /greenplum/gpdata/primary/gpseg2
   10 |       2 | m    | m              | s    | u      | 56002 | gpseg02  | gpseg02  | /greenplum/gpdata/mirror/gpseg2
    2 |       0 | p    | p              | s    | u      | 55000 | gpseg01  | gpseg01  | /greenplum/gpdata/primary/gpseg0
    8 |       0 | m    | m              | s    | u      | 56000 | gpseg02  | gpseg02  | /greenplum/gpdata/mirror/gpseg0
    3 |       1 | p    | p              | s    | u      | 55001 | gpseg01  | gpseg01  | /greenplum/gpdata/primary/gpseg1
    9 |       1 | m    | m              | s    | u      | 56001 | gpseg02  | gpseg02  | /greenplum/gpdata/mirror/gpseg1

Each row of the output represents one instance (master, standby, primary segment, or mirror segment) in the Greenplum cluster. The main fields and their meanings are:

dbid             unique identifier of the instance
content          content ID; -1 for the master and standby, 0 to N-1 for segments; a primary and its mirror share the same content ID
role             current role: p = primary, m = mirror
preferred_role   role assigned when the cluster was initialized
mode             replication state: s = synchronized, n = not synchronized
status           u = up, d = down
port             port the instance listens on
hostname         host name of the server running the instance
address          network address used to reach the instance
datadir          data directory of the instance

2.1 List the segments that are currently down

You can monitor segment status with the following SQL, which reports every instance whose status is not 'u' (up):

SELECT 'segment host: ' || hostname || ', role: ' || role || ', status: ' || status
FROM gp_segment_configuration
WHERE status <> 'u';

2.2 List the number of primary instances running on each host in the cluster

postgres=# SELECT hostname, count(DISTINCT dbid) FROM gp_segment_configuration
WHERE status = 'u' AND role = 'p' GROUP BY hostname;
 hostname | count 
----------+-------
 gpmaster |     1
 gpseg01  |     3
 gpseg02  |     3

Note that when the number of primary instances running on each host in the cluster is not the same, the cluster is in an unbalanced state, which reduces the throughput of the whole cluster.
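
A related check, sketched here as an extra query rather than one from the original walkthrough, is to look for instances that are not running in their preferred role; such rows are exactly what leaves the cluster unbalanced after a failover:

postgres=# SELECT dbid, content, role, preferred_role, hostname
FROM gp_segment_configuration
WHERE role <> preferred_role;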

3. Segment failure recovery and rebalancing

The data nodes (primary or mirror segments) of a Greenplum cluster may fail. Causes include, but are not limited to, operating system, power, network, and disk failures on some hosts in the cluster. A Greenplum cluster configured with mirror segments is highly available: when a node goes down because of a failure, the cluster still maintains data integrity and remains readable and writable. After any underlying hardware failure has been diagnosed and repaired, the administrator should use the gprecoverseg command to return the failed node to normal working status as soon as possible.
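
A minimal recovery workflow, run as gpadmin on the master, might look like the sketch below. It uses only the commands discussed in this section, and the exact log output depends on your cluster:

~]$ gpstate -e        # confirm which segments are down or not in their preferred role
~]$ gprecoverseg      # incremental recovery: bring the failed segments back and resynchronize them
~]$ gpstate -e        # re-check until no abnormal segments are reported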

When a node fails, the Greenplum cluster may be in one of the following situations:

  • In a cluster that has only primaries and no mirrors, once any primary fails, data integrity can no longer be ensured and the Greenplum cluster becomes unavailable.
  • In a cluster configured with mirrors, once a primary segment fails, the corresponding mirror segment automatically switches to the primary role and takes over, so the cluster remains available. The database changes made during the failure are tracked so that the failed node can be brought up to date during recovery. Because a primary and its mirror do not reside on the same server by default, this temporary state in which a mirror works as a primary makes the workload uneven across the servers in the cluster, which hurts throughput and should be repaired promptly.
  • When only a mirror segment fails, the corresponding primary segment records the database changes made during the failure. No roles change, so the impact on cluster performance is small, but the mirror should still be repaired as soon as possible.

The following uses a small two-host cluster as an example to demonstrate the process of recovering failed segments with gprecoverseg.

1) In the initial state, the cluster has two segment hosts: the primaries of segments 1 and 2 run on host 1, and the primaries of segments 3 and 4 run on host 2.

2) Primary segment 3 on host 2 then fails. Its mirror segment on host 1 automatically switches to the primary role, keeping the data complete and the cluster available.

3) After the hardware or system failure on host 2 has been resolved, the administrator runs the gprecoverseg command on the master. This command starts the offline instances; once they start successfully, incremental synchronization begins and replays the data changes made while they were offline. During this recovery and synchronization period the cluster remains fully available, but performance is affected. The time the synchronization takes depends on how much data changed in the database while the node was down.

4) After the synchronization is complete, it is worth noting that although the cluster remains available throughout the failure and recovery process, the number of primary segments actually active on each host may now differ: host 1 has 3 primary segments while host 2 has only one. Since it is the primary segments that actually execute query processing, the workload on host 1 becomes three times that of host 2, which severely reduces the throughput of the cluster. You can confirm that synchronization has finished, and see this imbalance, with the check shown after this list.
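
To confirm that recovery has finished and to see the imbalance from step 4, you can reuse the system table from section 2. This is a small illustrative check rather than part of the original walkthrough:

postgres=# SELECT content, role, preferred_role, mode, status, hostname
FROM gp_segment_configuration
WHERE content >= 0
ORDER BY content, role;
-- mode = 's' on every row means resynchronization is complete;
-- rows with role <> preferred_role are the segments that still need rebalancing.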

This state is called the unbalanced state. To restore cluster performance, the administrator needs to run the following command on the master to rebalance the segments:

gprecoverseg -r

The rebalance command briefly takes the affected primary and mirror pairs offline while their roles are switched back. User sessions remain connected, but queries and operations that are executing at that moment are cancelled or rolled back, so the administrator should run the rebalance at an appropriate time to minimize the impact on production.

Once the rebalancing is complete, the system will return to the initial node allocation state with the best performance.

You can also use the gprecoverseg -F command to recover a node with a full copy, which rebuilds the failed instance's data directory from its active peer instead of applying incremental changes. It is recommended to try the default incremental recovery first.
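
If incremental recovery is not possible, a full recovery might look like the single command below; note that it copies the entire data directory, so it can take considerably longer than an incremental recovery:

~]$ gprecoverseg -F       # full recovery: rebuild the failed segments from their active peers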

Reference: https://blog.csdn.net/MyySophia/article/details/102812733

 
