GaussDB Technical Interpretation: Application Lossless Transparency (ALT)

This article is shared from the Huawei Cloud Community post "DTCC 2023 Expert Interpretation丨GaussDB Technology Interpretation Series: Application Lossless and Transparency (ALT)", author: GaussDB database.

Recently, at the 14th China Database Technology Conference (DTCC 2023), GaussDB presented its "five highs and two easy" core technologies, giving the world a better choice. Xu Yiliang, a Huawei Cloud database technology expert, delivered the keynote speech "GaussDB High Availability: Application Lossless and Transparent", introducing the latest high-availability achievements of Huawei Cloud GaussDB.

The following is the transcript of the speech:

Good afternoon, everyone! Let me introduce GaussDB's high-availability feature, application lossless transparency.

1. Brief introduction to GaussDB application lossless transparency

When an online business system uses a database, any maintenance operation on the database server, such as a restart or an active/standby switchover, or any failure, causes the application to see its database connection interrupted, and transactions that were in progress are rolled back automatically. Once the database service is restored, the application must re-establish the connection and retry according to its business logic. This increases both the development complexity of the business system and the operational risk to the business.

Moreover, when an exception occurs in the database, the application cannot tell whether the transaction that was executing has been committed. To solve the problems of connection interruption and automatic transaction rollback, Huawei Cloud GaussDB provides application lossless transparency as a high-availability capability. It delivers two functions: connection maintenance, and transaction breakpoint continuation (resuming a transaction from the point where it was interrupted).

GaussDB application lossless transparency automatically determines transaction boundaries inside the database and caches the session information and data of the currently executing transaction, including session locks and user variables. When the database recovers, a consistent snapshot point is built automatically from the cached session and transaction information together with the server's logs. Using this consistent snapshot point, all data as of the moment of the active/standby switchover can be recovered, and the application session is restored to the uncommitted transaction state it had at that moment. Once the session and transaction state are restored, the transaction continues executing from its consistency point.

From the business perspective, with application lossless transparency enabled, the application only perceives that a transaction runs slightly slower during an active/standby switchover. It does not perceive an interruption, and it needs neither to reconnect nor to retry the transaction, which reduces the development complexity of business programs and lowers risk.

2. GaussDB application lossless transparency architecture

Let's take a look at the architecture of GaussDB application lossless transparency. The two features just mentioned are connection maintenance and transaction breakpoint continuation. For connection maintenance, a common solution in the industry is to deploy middleware between the application and the database to forward connections. When an active/standby switchover occurs on the database server, the connection between the middleware and the database is dropped, but the connection between the application and the middleware is not. On the surface, the connection maintenance problem appears to be solved. However, the connection between the middleware and the database is still interrupted, and session-level parameters are lost when it breaks. In addition, when the standby is promoted to primary during the switchover, all uncommitted transactions are rolled back automatically. The middleware is independent of the database, so it cannot solve transaction breakpoint continuation. It also adds an extra hop to the call chain, increases the complexity and risk of the high-availability setup, and requires additional deployment and resources.

GaussDB does not use middleware; it builds the capability directly into the driver layer. When an active/standby switchover or a failure occurs on the database server, the driver does not report the interruption to the application. Instead, it maintains the connection internally and caches session data and transaction data while doing so. When the database server recovers, the driver automatically establishes a new session connection and uses the cached session data to restore the session-level parameters of the original connection. It then restores the original uncommitted transaction state from the cached transaction information. With both the session and the transaction state restored, the transaction continues from the consistent snapshot point at which the switchover occurred.
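The driver-side idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not GaussDB driver code: `FakeServer` and `TransparentConnection` are hypothetical names, a connection is modelled as a plain dictionary, and only session-level parameters are restored (transaction state is left out).

```python
class FakeServer:
    """Hypothetical stand-in for a database node."""

    def __init__(self, name):
        self.name = name
        self.up = True

    def connect(self):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        # A "connection" here is just a dict holding session settings.
        return {"server": self.name, "settings": {}}


class TransparentConnection:
    """Hypothetical driver-side wrapper that caches session parameters
    and replays them on a new connection after a switchover."""

    def __init__(self, servers):
        self.servers = servers
        self.session_cache = {}  # session-level parameters set so far
        self.conn = self._connect()

    def _connect(self):
        # Try each known node in order and use the first one that answers.
        for server in self.servers:
            try:
                return server.connect()
            except ConnectionError:
                continue
        raise ConnectionError("no server available")

    def set_param(self, key, value):
        # Remember the parameter locally as well as applying it.
        self.session_cache[key] = value
        self.conn["settings"][key] = value

    def recover(self):
        """Open a new session and restore the cached session parameters."""
        self.conn = self._connect()
        self.conn["settings"].update(self.session_cache)
        return self.conn


primary, standby = FakeServer("primary"), FakeServer("standby")
conn = TransparentConnection([primary, standby])
conn.set_param("search_path", "sales")

primary.up = False          # simulate the active/standby switchover
restored = conn.recover()   # the application object `conn` stays the same
print(restored)
```

The application keeps using the same `conn` object throughout; only the session underneath it is replaced, which is the effect the driver-layer design aims for.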

As mentioned earlier, when the primary goes down and the standby is promoted to primary, a typical database rolls back uncommitted transactions. With application lossless transparency enabled, GaussDB must not roll back uncommitted transactions automatically. It has to preserve their state and wait for the new connection to arrive before performing transaction recovery and bridging. Here we use logical transaction IDs: the validity of a transaction's state is judged by matching the logical transaction ID against the real transaction ID. If the transaction state is valid, the transaction information is bound to the connection, which restores the uncommitted transaction state after the standby is promoted to primary.

When an active/standby switchover occurs, the connection is interrupted, and the application knows neither what happened to the database nor when it will recover. The usual strategy is to retry the connection at regular intervals to check whether the database is back to normal.

This polling mechanism has two problems. First, while the database service is recovering, the repeated attempts keep failing, which is wasted and redundant work. Second, polling introduces delay: when the database service is restored, the application cannot sense it in real time. GaussDB therefore provides a real-time message notification service (GNS for short). Whenever the state of the database server changes, GNS promptly sends an event notification to the application, which can act on it as soon as it is received. Because GNS actively pushes messages in real time, it has lower latency and consumes fewer resources.
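As a rough illustration of the matching step, the sketch below keeps held transactions in a table keyed by logical transaction ID and binds one to a new connection only when the real transaction ID agrees. All names (`HeldTransaction`, `PromotedPrimary`, `bind`) and the ID formats are assumptions made for this example; the talk does not describe the actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class HeldTransaction:
    real_xid: int               # real transaction ID assigned by the database
    state: str = "uncommitted"


@dataclass
class PromotedPrimary:
    """Hypothetical model of the promoted standby holding uncommitted work."""
    held: dict = field(default_factory=dict)      # logical ID -> HeldTransaction
    bindings: dict = field(default_factory=dict)  # connection ID -> logical ID

    def bind(self, conn_id, logical_id, expected_xid):
        """Bind a held transaction to a new connection if its state is valid."""
        txn = self.held.get(logical_id)
        valid = (
            txn is not None
            and txn.real_xid == expected_xid
            and txn.state == "uncommitted"
        )
        if valid:
            self.bindings[conn_id] = logical_id
        return valid


node = PromotedPrimary(held={"ltx-7": HeldTransaction(real_xid=1042)})
ok = node.bind("conn-2", "ltx-7", expected_xid=1042)    # IDs match
stale = node.bind("conn-3", "ltx-7", expected_xid=999)  # IDs do not match
print(ok, stale, node.bindings)
```

A mismatched or unknown ID is simply refused, so a new connection can never pick up a transaction state that does not belong to it.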
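The difference between polling and push can be made concrete with a toy model. This is a sketch, not the GNS protocol: `NotificationService`, `Application`, the event names `primary_down` and `primary_ready`, and the abstract "ticks" of time are all assumptions for illustration.

```python
class NotificationService:
    """Hypothetical push-style notifier standing in for a service like GNS."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        # Push the event to every subscriber as soon as it happens.
        for callback in self._subscribers:
            callback(event)


class Application:
    def __init__(self):
        self.database_available = True
        self.events = []

    def on_event(self, event):
        self.events.append(event)
        if event == "primary_down":
            self.database_available = False
        elif event == "primary_ready":
            self.database_available = True


def poll_until_ready(recovery_tick, interval):
    """Model fixed-interval polling: return (failed attempts, detection tick)."""
    failed, tick = 0, interval
    while tick < recovery_tick:
        failed += 1        # this attempt hits a database that is still down
        tick += interval
    return failed, tick


service = NotificationService()
app = Application()
service.subscribe(app.on_event)

service.publish("primary_down")
was_down = not app.database_available
service.publish("primary_ready")   # the application learns of recovery at once

failed_attempts, detected_at = poll_until_ready(recovery_tick=7, interval=3)
print(failed_attempts, detected_at)
```

In this model the database recovers at tick 7, but polling every 3 ticks wastes two attempts and only notices at tick 9, whereas the pushed event arrives at the moment of recovery.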

3. How to use GaussDB application lossless transparency

Application lossless transparency is very simple to use: you only need to turn on the feature switch when the application establishes its connection to the database. Taking the JDBC driver as an example, an application that wants to use the feature only needs to configure the IP address and port of the GaussDB message notification service (GNS) in the JDBC URL. One situation deserves mention. A database cluster has many application clients, and some of them want to use the feature while others do not. When an active/standby switchover occurs, clients with the feature enabled get connection maintenance and transaction breakpoint continuation, so the switchover is imperceptible to the business. Clients that are not configured to use it perceive the connection interruption immediately when a switchover or an exception occurs, and their in-flight transactions are rolled back immediately.

GNS is peer-to-peer and multi-active. When there are many clients, they can connect to different GNS instances, which effectively provides load balancing, rather than concentrating all the load on one physical machine as an active/standby deployment would.
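One way to picture clients spreading across peer GNS instances is the sketch below. The talk does not say how a client chooses an instance, so the hash-based choice, the `pick_endpoint` function, and the example addresses (taken from the documentation-only range 192.0.2.0/24) are all assumptions.

```python
import zlib


def pick_endpoint(client_id, endpoints, unavailable=frozenset()):
    """Pick a notification endpoint for a client, skipping unavailable ones.

    Hypothetical policy: hash the client ID to a starting peer, then walk
    the peer list until an available instance is found.
    """
    start = zlib.crc32(client_id.encode("utf-8")) % len(endpoints)
    for offset in range(len(endpoints)):
        candidate = endpoints[(start + offset) % len(endpoints)]
        if candidate not in unavailable:
            return candidate
    raise RuntimeError("no notification endpoint available")


endpoints = ["192.0.2.11:9000", "192.0.2.12:9000", "192.0.2.13:9000"]
first_choice = pick_endpoint("order-service-01", endpoints)
# If that instance goes away, the same client moves to another peer.
fallback = pick_endpoint("order-service-01", endpoints, unavailable={first_choice})
print(first_choice, fallback)
```

Because every instance is a peer, losing one only moves its clients to the others; there is no single active node that carries all the load.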

4. Usage scenarios for GaussDB application lossless transparency

I have just introduced the architecture, principles, and usage. Next are the usage scenarios for GaussDB application lossless transparency, which fall into three categories: planned active/standby switchover, unplanned active/standby switchover, and disaster recovery switchover.

Planned active/standby switchover

Planned active/standby switchover is generally used for routine database maintenance and upgrades and is initiated by the operations administrator. During a planned switchover, GaussDB automatically determines and waits for a safe transaction boundary, caches the session data and transaction data mentioned earlier in the driver layer, and performs the switchover only after the data has been cached.

A safe transaction boundary means that the transaction on the current session has reached a consistency point. There are two cases. In transaction-level draining, the switchover is performed after the transaction has finished. In statement-level draining, the switchover can be performed as soon as the current SQL statement has finished, while the transaction is still uncommitted. Statement-level draining therefore has less impact on the business.

Throughout the switchover, the GaussDB driver maintains the connection. When the database service is restored, the connection is rebuilt first, the transaction is then restored, and execution continues from the transaction consistency point. In a normal planned switchover, application lossless transparency therefore makes the switchover imperceptible to the application, and this kind of switchover uses statement-level draining. In OLTP scenarios, SQL statements execute in milliseconds, so the feature only adds a millisecond-level wait for the current SQL statement to finish. Compared with the switchover itself, which takes several seconds or even tens of seconds, this extra wait is almost negligible.
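The two draining levels can be compared with a small calculation. This is a simplified model with made-up timings: `drain_wait_ms` is a hypothetical helper, and a transaction is represented only by the completion times of its statements, the last of which is the commit.

```python
def drain_wait_ms(statement_end_times_ms, request_time_ms, mode):
    """Return how long a planned switchover waits for a safe boundary.

    statement_end_times_ms: completion time of each statement in one
    transaction, in order; the last entry is the COMMIT.
    """
    if mode == "transaction":
        # Wait until the whole transaction has finished.
        boundary = statement_end_times_ms[-1]
    elif mode == "statement":
        # Wait only until the statement running at request time finishes.
        boundary = next(
            (t for t in statement_end_times_ms if t >= request_time_ms),
            request_time_ms,
        )
    else:
        raise ValueError(f"unknown draining mode: {mode}")
    return max(0, boundary - request_time_ms)


# Illustrative timings only: four statements ending at 4, 9, 15 and 40 ms,
# with the switchover requested at 6 ms.
ends = [4, 9, 15, 40]
statement_wait = drain_wait_ms(ends, request_time_ms=6, mode="statement")
transaction_wait = drain_wait_ms(ends, request_time_ms=6, mode="transaction")
print(statement_wait, transaction_wait)
```

With these numbers, statement-level draining waits 3 ms for the running statement, while transaction-level draining waits 34 ms for the commit, which is why the statement-level boundary has less impact on the business.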

Unplanned active/standby switchover

Let's take a look at the unplanned scenario. An unplanned switchover is sudden and catastrophic, and some of its handling mechanisms differ from the planned case. For unplanned active/standby switchover, GaussDB caches the statements of the whole transaction and replays the transaction at transaction granularity. GaussDB automatically uses transactions as consistency boundaries, and the driver automatically caches the SQL statements already executed in the current transaction. After the database service is restored, the cached SQL statements are re-executed. To avoid executing and committing the same transaction twice, the driver first queries the transaction's status by its logical transaction ID before replaying. If the transaction has already been committed, it is not executed again.

Replay is done at transaction granularity rather than statement granularity because data synchronization between the primary and the standby is transaction-based. When a transaction commits on the primary, its WAL log is synchronized to the standby in real time to ensure that no data is lost. This is a common, basic mechanism of primary/standby data replication, and it fully meets the needs of transaction-granularity replay. Statement-granularity replay would require additional measures, such as using savepoints as real-time synchronization boundaries between primary and standby to save progress in stages. Compared with transaction-granularity replay, statement-granularity replay makes the solution more complex, covers fewer scenarios, and performs worse.
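The replay logic for the unplanned case can be sketched as follows. `FakeDatabase` and `ReplayingDriver` are hypothetical classes written for this illustration, and the logical transaction IDs and SQL text are made up; the real driver's interfaces are not described in the talk.

```python
class FakeDatabase:
    """Hypothetical database node that records what it has executed."""

    def __init__(self, committed=()):
        self.committed = set(committed)  # logical IDs already committed
        self.executed = []

    def is_committed(self, logical_id):
        return logical_id in self.committed

    def execute(self, sql):
        self.executed.append(sql)


class ReplayingDriver:
    """Hypothetical driver that caches a transaction's statements for replay."""

    def __init__(self, db):
        self.db = db
        self.logical_id = None
        self.cached = []

    def begin(self, logical_id):
        self.logical_id, self.cached = logical_id, []

    def execute(self, sql):
        self.db.execute(sql)
        self.cached.append(sql)  # keep the statement for a possible replay

    def recover(self, new_db):
        """Replay the cached transaction unless it was already committed."""
        self.db = new_db
        if new_db.is_committed(self.logical_id):
            self.cached = []
            return "already committed"
        for sql in self.cached:
            new_db.execute(sql)
        return "replayed"


# Case 1: the primary failed before commit, so the statements are replayed.
driver = ReplayingDriver(FakeDatabase())
driver.begin("ltx-1")
driver.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
driver.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")
new_primary = FakeDatabase()
outcome = driver.recover(new_primary)

# Case 2: the commit reached the standby before the failure, so nothing reruns.
other = ReplayingDriver(FakeDatabase())
other.begin("ltx-2")
other.execute("INSERT INTO audit_log VALUES (1)")
promoted = FakeDatabase(committed={"ltx-2"})
outcome_committed = other.recover(promoted)
print(outcome, outcome_committed)
```

Checking the logical transaction ID first is what keeps the replay safe: a transaction that did commit before the failure is never applied a second time.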

Disaster recovery switchover

Let's take a look at the disaster recovery scenario. In remote disaster recovery, the two cities are thousands of kilometers apart and the latency is tens of milliseconds, so cross-region access to the database is impractical. A remote disaster recovery switchover is therefore a process in which the database and the business application switch over together in coordination. Today this faces a dilemma: because the two switchovers are independent of each other, the database cluster may finish switching while the application has not. End to end, the database is restored but the business is not.

To solve this end-to-end problem, GaussDB provides the message notification service, GNS. When a disaster recovery switchover occurs in the database, for example when the production cluster is demoted to a disaster recovery cluster, a message is sent to the application immediately, and on receiving the demotion event the application can start its own switchover right away. When the disaster recovery cluster is promoted to the primary cluster, GNS notifies the application in time, and the application can immediately switch traffic to restore the business. This cooperative mechanism shortens the end-to-end RTO, with no unnecessary waiting or human judgment.

In addition to remote disaster recovery, there is also intra-city disaster recovery. Because both sites are in the same city, the latency is relatively low and the network environment is relatively good. In this scenario, using the application lossless transparency mechanism provided by GaussDB, that is, connection maintenance and transaction breakpoint continuation, an intra-city disaster recovery switchover can guarantee RPO = 0 with no data loss, and the database disaster recovery switchover is also transparent to applications.

That’s all for what I’m sharing today, thank you all.

Origin blog.csdn.net/devcloud/article/details/132608027