GaussDB's Application Lossless Transparency (ALT)

1. Background

As an enterprise-level distributed database, GaussDB provides high-availability and disaster recovery capabilities such as "intra-city active-active across AZs, two sites with three data centers, and strong consistency between dual clusters". When a database node cannot serve requests because of a failure, the JDBC driver sends subsequent connection requests to other available nodes so that the database service stays available. However, connections that already have sessions on the faulty node cannot switch to an available node automatically, so the business code using those connections receives errors. If the application lacks connection retry logic or business consistency checks, this can interrupt the application or even leave business data inconsistent, causing serious business losses for users.

Therefore, HUAWEI CLOUD GaussDB provides ALT (Application Lossless Transparent), a solution that transfers client connections when the database fails. The principle is as follows: when a node of the database cluster can no longer serve requests because of a failure and other nodes in the cluster are still available, the sessions on the failed node are automatically migrated to a target node, and the client can continue its database operations without issuing a new connection request. The client application is unaware of the whole process; at most it sees one SQL request that takes slightly longer than usual. This greatly improves the availability of the database service.

2. Technical Architecture

Let's first look at the technical architecture and operating principle of ALT:

Figure 1 - Schematic diagram of ALT architecture

As the figure shows, the GaussDB cluster introduces an independent component, GNS (GaussDB Notification Service), which detects and obtains the real-time status of every database node. The first time an application calls the JDBC interface to connect to any node in the cluster, the JDBC driver establishes a cluster-status subscription link with the GNS service. When GNS detects that the cluster status has changed, it pushes the status-change event to the JDBC driver over this link. When the driver's event-processing thread receives the event, it manages and migrates the affected connections through the connection references held by the cluster connection manager.
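
To make the event flow concrete, here is a minimal Java sketch of the driver-side part. Everything in it is hypothetical: AltConnection and ClusterConnectionManager are illustrative names, not classes of the GaussDB JDBC driver, and "migration" is reduced to rebinding a connection to the first healthy node. It only shows the idea of a manager that holds references to live connections and reacts to pushed node-status events.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an application connection; not a GaussDB JDBC class.
final class AltConnection {
    final String id;
    String node; // node the session is currently bound to

    AltConnection(String id, String node) {
        this.id = id;
        this.node = node;
    }
}

// Hypothetical "cluster connection manager": keeps a reference to every ALT
// connection so that a node-status event can be mapped to the affected sessions.
final class ClusterConnectionManager {
    private final Map<String, Boolean> nodeUp = new LinkedHashMap<>(); // node -> available?
    private final List<AltConnection> connections = new ArrayList<>();

    ClusterConnectionManager(List<String> nodes) {
        for (String n : nodes) {
            nodeUp.put(n, true);
        }
    }

    synchronized void track(AltConnection c) {
        connections.add(c);
    }

    // Called on the event-processing thread for each event pushed by GNS.
    // Returns the number of connections moved off the failed node.
    synchronized int onNodeStatus(String node, boolean up) {
        nodeUp.put(node, up);
        if (up) {
            return 0; // a recovered node needs no migration in this sketch
        }
        String target = null;
        for (Map.Entry<String, Boolean> e : nodeUp.entrySet()) {
            if (e.getValue()) {
                target = e.getKey();
                break;
            }
        }
        if (target == null) {
            return 0; // no healthy node: sessions stay where they are
        }
        int moved = 0;
        for (AltConnection c : connections) {
            if (c.node.equals(node)) {
                c.node = target; // a real driver would reconnect and restore session state here
                moved++;
            }
        }
        return moved;
    }
}
```

In this sketch the first healthy node in registration order is chosen; a real driver would also have to respect each connection's requirements, as described in section 3.2.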

The GNS component uses a peer-to-peer, multi-active deployment across multiple nodes. Every GNS instance holds the full state of the cluster, so the JDBC driver only needs a subscription with any one GNS instance to manage the application's connections on all nodes of the cluster.
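
Because every GNS instance holds the full cluster state, choosing where to subscribe reduces to "first reachable instance wins". The sketch below assumes a hypothetical trySubscribe hook (in a real client it would open the subscription link to a GNS listener) and simply walks the configured addresses in order.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: since any GNS instance can serve the full cluster state,
// one successful subscription is enough.
final class GnsSubscriber {
    static String subscribeToAny(List<String> gnsAddresses, Predicate<String> trySubscribe) {
        for (String addr : gnsAddresses) {
            if (trySubscribe.test(addr)) {
                return addr; // keep this link; the remaining instances are fallbacks
            }
        }
        throw new IllegalStateException("no GNS instance reachable: " + gnsAddresses);
    }
}
```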

3. Key capabilities

Now that we have covered the overall architecture and working principles of ALT, let's look at its key capabilities and the business value they bring to customers.

3.1 Quick App Notification

ALT provides an active notification mechanism for database state changes. The JDBC driver subscribes, through the GNS service, to the status of the database cluster used by the business. When the status of a node in the cluster changes, GNS pushes the change event to the JDBC driver, which then manages and migrates the connections on the affected database according to the latest cluster status.

The JDBC driver also gives applications an interface for registering callback functions for cluster status changes. An application can register a state-change callback with the driver for specific database connections, and the driver calls the registered function whenever the cluster status changes. Callbacks make it easy to implement operations and maintenance tasks on the business side, such as email notification of database status changes or reporting to an alarm platform.
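
The callback interface can be pictured as a listener registry. This is a hypothetical sketch: the source does not document the driver's callback signature, so ClusterStateChange and ClusterStateCallbacks are illustrative names. The registry isolates each callback so that a failing alarm or email hook cannot block the others.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical event payload; the real driver's callback types are not shown in the source.
final class ClusterStateChange {
    final String node;
    final String oldRole;
    final String newRole;

    ClusterStateChange(String node, String oldRole, String newRole) {
        this.node = node;
        this.oldRole = oldRole;
        this.newRole = newRole;
    }
}

// Hypothetical callback registry: applications register listeners and the
// driver's event thread calls each one when a status change arrives.
final class ClusterStateCallbacks {
    // Copy-on-write so that registration from application threads is safe
    // while the event thread is iterating.
    private final List<Consumer<ClusterStateChange>> listeners = new CopyOnWriteArrayList<>();

    void register(Consumer<ClusterStateChange> listener) {
        listeners.add(listener);
    }

    void fire(ClusterStateChange change) {
        for (Consumer<ClusterStateChange> l : listeners) {
            try {
                l.accept(change);
            } catch (RuntimeException ex) {
                // A faulty application callback must not stop the remaining callbacks.
                System.err.println("state-change callback failed: " + ex);
            }
        }
    }
}
```

An application would register, for example, one listener that forwards the event to its alarm platform and another that sends an email.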

3.2 Lossless Connection Migration

When the driver detects that a GaussDB database node has failed or is about to be shut down for maintenance, its event-processing thread analyzes each affected connection to determine whether another database node meets that connection's requirements. If one does, the connection is migrated to that node and its session state is restored. For planned shutdowns for maintenance, users can also configure through parameters how long a connection stays suspended while waiting for an available node to appear, which improves service availability when the whole cluster is under maintenance.
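
The target-selection step, including the configurable suspension time, might look like the following sketch. The role strings, the "any" wildcard and the maxSuspendMillis/pollMillis parameters are assumptions for illustration; the source does not name the actual parameter. requiredRole stands in for a connection requirement such as targetServerType.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical migration planner. Roles are plain strings here ("primary",
// "standby", "down"); requiredRole "any" accepts every live node.
final class MigrationPlanner {
    static Optional<String> findTarget(Map<String, String> roles, String failedNode, String requiredRole) {
        for (Map.Entry<String, String> e : roles.entrySet()) {
            String role = e.getValue();
            if (e.getKey().equals(failedNode) || "down".equals(role)) {
                continue;
            }
            if ("any".equals(requiredRole) || role.equals(requiredRole)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    // Keeps the connection suspended, re-reading the latest cluster view, until a
    // suitable node appears or maxSuspendMillis (a hypothetical setting) elapses.
    static Optional<String> awaitTarget(Supplier<Map<String, String>> clusterView, String failedNode,
                                        String requiredRole, long maxSuspendMillis, long pollMillis) {
        long deadline = System.nanoTime() + maxSuspendMillis * 1_000_000L;
        while (true) {
            Optional<String> target = findTarget(clusterView.get(), failedNode, requiredRole);
            if (target.isPresent() || System.nanoTime() - deadline >= 0) {
                return target;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return Optional.empty();
            }
        }
    }
}
```

With a suspension time of zero this degrades to an immediate check, which matches the unplanned-failure case; a longer window covers a maintenance switchover where the new primary appears a little later.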

3.3 Transaction Breakpoint Resume

Once ALT is enabled on a connection, both the JDBC driver and the GaussDB server track and record the transaction state of the current session. If a failure occurs while the database is processing SQL requests, then after the connection is migrated to a new node, ALT uses the recorded transaction state to restore the session to the point just before the failure, and the transaction continues from where it was interrupted. This avoids both business interruption caused by the database failure and data inconsistency at the application layer.
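
One way to picture the state that has to survive a migration is a journal of session settings plus the statements of the open transaction. The sketch below is a simplified, client-side model only (SessionJournal and the exec hook are hypothetical): it replays that journal on the new node. The source states that the GaussDB server also tracks transaction state, which this sketch does not model, and a naive replay like this is only safe for deterministic statements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical client-side journal of what a migrated session needs back.
final class SessionJournal {
    private final Map<String, String> settings = new LinkedHashMap<>(); // SET name -> value
    private final List<String> txnStatements = new ArrayList<>();       // statements of the open transaction
    private boolean inTransaction;

    void set(String name, String value) { settings.put(name, value); }

    void begin() { inTransaction = true; txnStatements.clear(); }

    void executed(String sql) { if (inTransaction) txnStatements.add(sql); }

    void commit() { inTransaction = false; txnStatements.clear(); }

    // Re-applies the recorded state through `exec`, a hypothetical hook that
    // would run SQL on the connection after it has moved to the new node.
    void restoreOn(Consumer<String> exec) {
        for (Map.Entry<String, String> e : settings.entrySet()) {
            exec.accept("SET " + e.getKey() + " = " + e.getValue());
        }
        if (inTransaction) {
            exec.accept("BEGIN");
            for (String sql : txnStatements) {
                exec.accept(sql); // resume the transaction from the interruption point
            }
        }
    }
}
```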

The value that ALT brings to customers can be summarized as follows:

  • Avoids an excessive RTO caused by the server status not being obtained in time after a database failure;
  • Speeds up establishing JDBC connections to a specified node type (targetServerType);
  • Guarantees business continuity when the cluster is shut down for maintenance;
  • Guarantees business continuity when the database fails;
  • Provides quick application notification during a cluster disaster recovery switchover.

4. ALT Feature Demonstration

Enabling ALT mode in JDBC

Example:

URL=jdbc:opengauss://host1:port1,host2:port2,host3:port3/database?enableALT=true&gns=gns_host1:gns_port1,gns_host2:gns_port2

When an application uses the JDBC driver to access a GaussDB database, it only needs to add the enableALT configuration item and the GNS listener addresses to the connection URL to enable the ALT service. The minimum subscription granularity of the ALT service is a single connection, and the JDBC driver can hold ALT connections and ordinary connections to the same cluster at the same time.
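
The URL can also be assembled programmatically. A minimal sketch, assuming the GaussDB/openGauss JDBC driver is on the classpath: AltUrls and its methods are hypothetical helpers, while the jdbc:opengauss:// prefix, enableALT and gns come from the example URL. Only DriverManager.getConnection is a standard JDBC call.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper that builds the two kinds of URL used in the demo.
final class AltUrls {
    // ALT-enabled URL: multi-host list, then enableALT=true and the GNS listener list.
    static String altUrl(List<String> hosts, String database, List<String> gnsAddrs) {
        return "jdbc:opengauss://" + String.join(",", hosts) + "/" + database
                + "?enableALT=true&gns=" + String.join(",", gnsAddrs);
    }

    // Same hosts without the ALT options: an ordinary connection to the same cluster.
    static String plainUrl(List<String> hosts, String database) {
        return "jdbc:opengauss://" + String.join(",", hosts) + "/" + database;
    }

    // Needs the driver on the classpath and a reachable cluster.
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }
}
```

Because ALT is opted into per URL, one application can open an ALT connection and an ordinary connection side by side, which is exactly what the demonstration below compares.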

Demonstration scenario:

While a GaussDB centralized cluster performs a switchover, observe how SQL requests sent over an ALT connection are executed.

Demo steps:

The application opens two connections to the database primary node, an ordinary JDBC connection and a connection with ALT enabled, runs the following SQL commands on both connections at the same time, and checks whether each connection can still be used after the cluster completes the switchover.

1. The client sends an SQL request to view the database instance it is currently connected to:

SQL> show listen_addresses;

2. The client sends SQL requests to create and use a database object:

SQL> create table alt_test_switchover(mes text);

SQL> insert into alt_test_switchover values('message before switchover');

<-- Cluster operation: switchover -->

3. The client sends SQL requests to use the database object:

SQL> insert into alt_test_switchover values('message after switchover');

SQL> select mes from alt_test_switchover;

4. The client sends an SQL request to view the database instance it is currently connected to:

SQL> show listen_addresses;
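
The demo can be scripted with plain JDBC. The sketch below is a hypothetical helper (SwitchoverDemo is not part of the source) that runs the statements from steps 1-4 on a given Connection and reports failure instead of throwing, so the ordinary and ALT connections can be compared side by side. It uses only standard java.sql calls; the connections themselves would come from DriverManager.getConnection with the ordinary and ALT URLs.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical driver for the switchover demo: steps 1-2 before, steps 3-4 after.
final class SwitchoverDemo {
    static final List<String> BEFORE = List.of(
            "show listen_addresses",
            "create table alt_test_switchover(mes text)",
            "insert into alt_test_switchover values('message before switchover')");
    static final List<String> AFTER = List.of(
            "insert into alt_test_switchover values('message after switchover')",
            "select mes from alt_test_switchover",
            "show listen_addresses");

    // Runs each statement and prints the first column of any result rows.
    // Returns false on the first SQLException, e.g. a connection broken by the switchover.
    static boolean runAll(Connection conn, String label, List<String> sqls) {
        try (Statement st = conn.createStatement()) {
            for (String sql : sqls) {
                if (st.execute(sql)) {
                    try (ResultSet rs = st.getResultSet()) {
                        while (rs.next()) {
                            System.out.println(label + " | " + sql + " -> " + rs.getString(1));
                        }
                    }
                }
            }
            return true;
        } catch (SQLException e) {
            System.out.println(label + " failed: " + e.getMessage());
            return false;
        }
    }
}
```

For each connection, call runAll(conn, label, BEFORE), trigger the switchover, then call runAll(conn, label, AFTER) and compare the two results.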

Comparison of results:

(1) Ordinary JDBC connection: after the cluster switchover, the connection is broken and the application can no longer use it to send SQL requests.

Figure 2 - Ordinary JDBC connection log

(2) Connection with ALT enabled: after the cluster switchover, the connection is automatically migrated to the new primary node, and the application can continue to use it to send SQL requests.

Figure 3 - ALT connection log

As an enterprise-level distributed database, GaussDB's core strengths are summarized as "five highs and two easies": high availability, high security, high performance, high flexibility, high intelligence, easy deployment, and easy migration. To meet the reliability requirements of core financial services, GaussDB and ICBC Lianchuang launched China's first dual-cluster strong-consistency solution, which fully isolates cluster-level faults with RPO=0. The new application lossless transparency solution makes system failures invisible to applications, so that services truly run 24/7 without interruption, giving enterprises an even stronger high-availability experience.

Origin blog.csdn.net/GaussDB/article/details/132403300