Pigsty v2.2 Released: A Major Upgrade of the Monitoring System

Pigsty v2.2 is now released, bringing a major upgrade: the monitoring system has been completely reworked on top of Grafana v10, raising PostgreSQL observability to a new level with a refreshed user experience. Demo: http://demo.pigsty.cc.

In addition, Pigsty v2.2 provides a 42-node production simulation sandbox template, supports Citus 12 and PostgreSQL 16beta2, ships a Vagrant template based on KVM virtual machines, offers a dedicated Pigsty yum repository for scattered or hard-to-reach RPM packages, and supports the domestic Xinchuang operating system Tongxin UOS 20.

Monitoring System Rework: Color Scheme

In Pigsty v2.2, the monitoring panels have been completely reworked, making full use of the new features of Grafana v10 to deliver a refreshed visual experience.

The most intuitive change is color. Pigsty v2.2 adopts a new color scheme. Taking the PGSQL Overview panel as an example, the new scheme lowers saturation, making the overall look more coordinated and pleasing than the old version.

Pigsty v2.0 uses Grafana's default highly saturated color scheme

Pigsty v2.2: failed instances are marked in black; click to jump straight to the fault scene

In the monitoring panels of Pigsty v2.2, brand colors such as PG blue, Nginx green, Redis red, Python yellow, and Grafana orange serve as the baseline palette. The scheme is inspired by a Zhihu article on pairing SCI paper illustrations with the color palette of Makoto Shinkai's Weathering with You: https://zhuanlan.zhihu.com/p/619556088.

Monitoring System Rework: Cluster Navigation

Beyond the color scheme, v2.2 also redesigns content arrangement and layout. For example, many tabular navigation panels have been replaced with Stats color blocks, so problematic services are visible at a glance on the first screen; clicking an abnormal color block jumps straight to the fault scene.

The classic navigation tables, which provide richer information, have not been removed; they have been moved to dedicated Instances / Members sections. Take the most commonly used PGSQL Cluster panel as an example:

The first screen uses color-block navigation to show the liveness of cluster components, service availability, core metrics, load levels, and alert events, and provides quick navigation to the cluster's internal resources: instances, connection pools, load balancers, services, and databases.

Tabular Navigation for PGSQL Cluster

Monitoring System Rework: Instances

PGSQL Instance, which shows the detailed state of a single instance, has also been reworked in v2.2. Its basic design principle: anything that is not blue or green deserves attention. Through this color coding, users can quickly locate the root cause of a database instance failure during incident analysis.

The detailed cluster resource tables come second; together with the metrics and log sections that follow, they present the complete core status of a PostgreSQL database cluster.

Other instance types, such as host nodes, etcd, MinIO, and Redis, use similar designs; for example, here is the first screen of Node Instance.

The metrics section of Node Instance remains mostly the same, but the above-the-fold overview section has been reworked. The same goes for MinIO Overview.

Etcd Overview uses a State Timeline to visualize the availability of the DCS service. The figure below shows a simulated etcd failure: in a 5-node etcd cluster, instances are shut down one by one. The cluster can tolerate two node failures, but a third failure renders the etcd service unavailable as a whole (the yellow bars turn dark blue, meaning the overall etcd service is down).
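For the curious, this failure scene can be reproduced by hand. Below is a minimal sketch, assuming a systemd-managed 5-node etcd cluster; the hostnames, endpoints, and unit name are illustrative assumptions, not Pigsty specifics.

# Stop etcd members one by one and watch quorum degrade
for node in etcd-1 etcd-2 etcd-3; do
  ssh "$node" 'sudo systemctl stop etcd'
  # with two members down the cluster still has quorum (3/5);
  # stopping the third member makes the service unavailable as a whole
  etcdctl --endpoints=etcd-4:2379,etcd-5:2379 endpoint health
  sleep 10
done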

When the DCS fails, PostgreSQL clusters that rely on etcd for high availability enable failsafe mode by default: as long as the primary can confirm that all cluster members are still reachable (i.e., the failure lies in the DCS, not the cluster itself), demotion of the primary is avoided. This is also reflected in the PostgreSQL monitoring panels.
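Pigsty uses Patroni for high availability, and failsafe_mode is the corresponding DCS option in Patroni 3.0 and later. A rough sketch of enabling and observing it with patronictl; the config file path is an assumption for illustration.

# Enable failsafe mode in the cluster's dynamic configuration
patronictl -c /etc/patroni/patroni.yml edit-config pg-test -s failsafe_mode=true --force
# While the DCS is down, members should keep their roles
patronictl -c /etc/patroni/patroni.yml list pg-test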

Monitoring System Rework: Services

Services and proxies have also been redesigned. The Service panel now surfaces a key piece of information: the SLI. Through State Timeline strips, users can see service interruptions at a glance, read service availability metrics, and understand the status of both the load balancer and the real backend database servers.

In this example, the four HAProxy instances of the pg-test cluster are drained or set to maintenance state, and the backend database servers are shut down. Only when all instances of the cluster are offline does the read-only service pg-test-replica become unavailable.
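The drain and maintenance operations here correspond to HAProxy's runtime API. A minimal sketch, assuming an admin socket at /var/run/haproxy.sock and illustrative backend/server names:

# Drain a backend server (finish existing sessions, accept no new ones)
echo "set server pg-test-replica/pg-test-2 state drain" | socat stdio /var/run/haproxy.sock
# Put it into maintenance state
echo "set server pg-test-replica/pg-test-2 state maint" | socat stdio /var/run/haproxy.sock
# Bring it back once maintenance is done
echo "set server pg-test-replica/pg-test-2 state ready" | socat stdio /var/run/haproxy.sock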

Here is the monitoring panel for HAProxy load balancer #1 in the pg-test cluster. Every service it hosts is listed, showing backend server status and computed SLIs. The status and metrics of HAProxy itself live in the Node Haproxy panel.

In the global overview, you can see the overall status timeline and SLI indicators of all database services in Pigsty.

Monitoring System Rework: Database Statistics

Pigsty monitors not only the database server itself but also the logical objects it hosts: databases, tables, queries, and indexes.

PGSQL Databases shows cluster-level database statistics. For example, the pg-test cluster has four database instances and a database named test; here the database-level metrics of the four instances are compared side by side.

Users can drill further down into the statistics of a single database on a single instance via the PGSQL Database panel. It provides key metrics about the database and its connection pool, but most importantly, it indexes the most active and noteworthy tables and queries in the database, the two most important kinds of database objects.

Monitoring System Rework: System Catalog

Besides the metrics collected by pg_exporter, Pigsty can use another optional but important source of supplementary data: the system catalog. This is what the PGCAT series of dashboards does. PGCAT Instance accesses the database system catalog directly (using at most 8 read-only monitoring connections) to fetch and present the required information.

For example, you can inspect the database's current activity; locate and analyze slow queries, unused indexes, and full table scans by various criteria; and check database roles, sessions, replication status, configuration changes, memory usage details, and backup / persistence details.
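The information PGCAT presents comes from standard PostgreSQL system views, so the same questions can be asked by hand. A minimal sketch, assuming a psql connection to the test database:

# Candidate unused indexes, ordered by the space they occupy
psql -d test -c "
  SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
  WHERE idx_scan = 0
  ORDER BY pg_relation_size(indexrelid) DESC;"

# Current long-running activity, the raw material for slow-query analysis
psql -d test -c "
  SELECT pid, state, now() - query_start AS duration, query
  FROM pg_stat_activity
  WHERE state <> 'idle'
  ORDER BY duration DESC NULLS LAST
  LIMIT 10;"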

Where PGCAT Instance focuses on the database server itself, PGCAT Database pays more attention to the object details inside a single database: schemas, tables, indexes, extensions, Top SQL, Top Tables, and so on.

Each schema, table, and index can be clicked to drill down into a more detailed dedicated panel. For example, PGCAT Schema further reveals the details of the objects within a schema.

Queries in the database are also aggregated by execution plan, making it easy for users to find problematic SQL and quickly locate slow queries.

Monitoring System Rework: Tables and Queries

In Pigsty, you can look up every aspect of a table. The PGCAT Table panel allows you to view table metadata, indexes on it, statistics for each column, and related queries.

You can also use the PGSQL Table panel to view a table's key metrics over any historical time range from the metrics perspective. Click the table name to switch easily between the two views.

Correspondingly, you can also get detailed information about a class of SQL (queries sharing the same execution plan).
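Under the hood, such query classes come from the pg_stat_statements extension, which aggregates statistics per normalized query. A quick sketch, assuming the extension is enabled (column names are those of PostgreSQL 13+):

# Top query classes by total execution time
psql -d test -c "
  SELECT queryid, calls, mean_exec_time, total_exec_time, rows, left(query, 60) AS query
  FROM pg_stat_statements
  ORDER BY total_exec_time DESC
  LIMIT 10;"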

Pigsty also includes many dashboards on specific topics; for reasons of space, the tour of the monitoring system ends here. The most intuitive way to experience it is to visit the public demo at http://demo.pigsty.cc and explore for yourself. Although it is just a simple environment of four 1-core virtual machines, it is enough to demonstrate Pigsty's core monitoring capabilities.

Large simulation environment

Pigsty provides a sandbox environment based on Vagrant and VirtualBox that can run on your laptop or Mac. There is a 1-node minimal version and a 4-node full version for demonstration and learning, and v2.2 now adds a 42-node production simulation sandbox.

All the details of the production sandbox are described in prod.yml, a configuration file of fewer than 500 lines. It runs comfortably on an ordinary physical server, and spinning it up works the same as the 4-node version: run make prod install and you're done.

Pigsty v2.2 also provides a libvirt-based Vagrantfile template: adjust the machine list in the configuration above and the required virtual machines are created with one click. Everything runs comfortably on a Dell R730 with 48 cores and 256 GB of RAM, which costs less than 3,000 yuan second-hand. Of course, you can still use the Pigsty Terraform templates to pull up virtual machines on a cloud provider with one click.
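As a rough sketch of the workflow on a KVM host (the make target comes from the text above; the plugin install is a standard Vagrant assumption):

vagrant plugin install vagrant-libvirt   # one-time setup on the KVM host
make prod install                        # create the 42 VMs and install Pigsty per prod.yml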

After installation, the environment looks like this: a two-node monitoring infrastructure (one primary, one standby), a dedicated 5-node etcd cluster, a sample 3-node MinIO cluster providing object storage for PostgreSQL backups, and a dedicated 2-node HAProxy cluster providing unified load balancing for database services.

On top of this run 3 Redis clusters and 10 PostgreSQL clusters of various configurations, including an out-of-the-box 5-shard Citus 12 distributed PostgreSQL cluster.

This configuration serves as a reference example for medium and large enterprises running and managing large-scale database clusters, and it can be brought up with one click on a single physical server in half an hour.

A smoother build process

When downloading the software Pigsty needs directly from the Internet, you may run into the annoyance of the Great Firewall: for example, downloads from the default Grafana/Prometheus yum repositories can be extremely slow. There are also scattered RPM packages that must be fetched from web URLs rather than through repotrack.

Pigsty v2.2 solves this problem by providing an official yum repository, http://get.pigsty.cc, configured as one of the default upstream sources. All the scattered RPMs, and those otherwise blocked by the firewall, are hosted there, which noticeably speeds up online installation and builds.
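If you want to use the repository on an existing machine, a repo file along these lines should work; the exact baseurl layout is an assumption based on http://get.pigsty.cc, not a documented path.

# Hypothetical repo definition; adjust baseurl to the actual layout
sudo tee /etc/yum.repos.d/pigsty.repo <<'EOF'
[pigsty]
name=Pigsty Yum Repo
baseurl=http://get.pigsty.cc/el$releasever.$basearch
gpgcheck=0
enabled=1
EOF
sudo yum makecache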

In addition, v2.2 adds support for the domestic Xinchuang operating system Tongxin UOS 1050e (uel20) to meet the needs of certain customers; Pigsty has recompiled the PostgreSQL-related RPM packages for these systems.

Install

Starting from v2.2, the installation command for Pigsty becomes:

bash -c "$(curl -fsSL http://get.pigsty.cc/latest)"

Pigsty can be fully installed on a fresh machine with a single command. To try the beta version, just replace latest with beta (see the sketch after the list below). For special environments without Internet access, you can also use the links below to download Pigsty, as well as an offline package bundling all the software:

  • http://get.pigsty.cc/v2.2.0/pigsty-v2.2.0.tgz
  • http://get.pigsty.cc/v2.2.0/pigsty-pkg-v2.2.0.el7.x86_64.tgz
  • http://get.pigsty.cc/v2.2.0/pigsty-pkg-v2.2.0.el8.x86_64.tgz
  • http://get.pigsty.cc/v2.2.0/pigsty-pkg-v2.2.0.el9.x86_64.tgz
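As a concrete sketch (the unpack destination below is a convention, not a requirement):

bash -c "$(curl -fsSL http://get.pigsty.cc/beta)"        # beta instead of latest
curl -LO http://get.pigsty.cc/v2.2.0/pigsty-v2.2.0.tgz   # source tarball for offline use
tar -xzf pigsty-v2.2.0.tgz -C "$HOME"                    # unpack to the home directory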

These are the changes in Pigsty v2.2. For more details, please refer to the official Pigsty documentation and the GitHub release notes.
