GaussDB Technology Interpretation Series: Innovative Practices in Database Migration

Recently, the 14th China Database Technology Conference (DTCC 2023), themed "Digital Intelligence Empowerment to Build the Future", was held in Beijing. At the special session "GaussDB 'Five Highs and Two Easy' Core Technologies: Giving the World a Better Choice", Dou Deming, R&D Director of Huawei Cloud Database Ecosystem Tools, shared GaussDB's innovative practices in database migration.

This article summarizes that talk.

Ease of migration: a key consideration when enterprises select a replacement database

Besides the capabilities of the database itself, a key factor many enterprises weigh when selecting a database is whether they can migrate smoothly from their existing databases to GaussDB. Smooth migration depends on two core factors. The first is the database itself: whether it is highly compatible with the syntax of mainstream databases, so that applications need few or no changes. The second is the tooling around the database: an easy-to-use migration tool that can smoothly migrate the SQL embedded in applications, the objects in the database, and full and incremental data from other databases with almost zero business downtime. These are the two key migration factors enterprises consider when selecting a database.

Structure (UGO) + Data (DRS): a one-stop migration solution

At DTCC 2021, Huawei Cloud released its structure + data one-stop migration solution, built on two core tools. The first, UGO, evaluates and converts the syntax of database structures and applications. For example, it captures the SQL embedded in upper-layer applications and the DDL of objects inside the database, evaluates them, and outputs a report that clearly shows which items are natively compatible with the target database, which can be made compatible through UGO conversion, and which cannot be converted and require manual rework. The second tool, DRS, addresses the many data migration problems that arise when replacing a heterogeneous database. It migrates a customer's existing (full) and incremental data quickly with almost no business downtime, ensures that data is never lost, corrupted, or disordered under any circumstances, and provides flexible data comparison and repair capabilities.

UGO+DRS integrated solution: verified in real projects

The UGO+DRS one-stop migration solution has been verified and applied in many projects over the past two years. Here are a few examples. The first is Huawei's own internal MetaERP project. UGO automatically converted nearly 700 million lines of O database SQL scripts with a conversion success rate close to 100%. At the same time, GaussDB implemented parallel logical decoding with throughput of nearly 300 MB/s, which lets DRS easily handle MetaERP's month-end, quarter-end, and year-end traffic peaks (10 to 20 times normal load) while keeping data synchronization latency below 5 seconds.

The second is a bank's database replacement. This project had relatively high migration complexity: many applications, many database objects, and heavy dependence on stored procedures and packages. So far, with the one-stop migration solution, UGO has automatically converted nearly 130 million lines of SQL scripts (including nearly 80 million lines of stored procedures) with a conversion success rate above 96%. Nearly 300 O database instances have been migrated with DRS, and the O database and GaussDB have run stably in parallel over the long term, with low-latency forward and reverse data synchronization.

New difficulties and challenges encountered during project implementation

While being implemented in a large number of projects, the UGO+DRS one-stop migration solution has also run into some new difficulties and challenges. I believe everyone will encounter them, so I would like to share them here.

Challenge 1

When replacing a heterogeneous database, how do you quickly identify syntax incompatibilities between the databases? How do you identify performance differences when the same SQL runs against the same data on different databases? When upgrading from a lower version to a higher version, will there be incompatibilities or performance degradation? And how do you simulate database behavior under peak business traffic?

Challenge 2

Many enterprise developers and DBAs are not yet familiar with GaussDB, and their SQL skills vary widely. In addition, application development often lacks unified SQL programming standards and an effective SQL review mechanism, so a great deal of poorly written SQL reaches the production environment. This causes many application performance problems that affect production operations and customer experience.

Challenge 3

The character sets of many databases have been extended or customized from the standard character sets. As a result, a character set with the same name may be incompatible during data migration, or there may be no equivalent character set at all. Worse, historical data often already contains all kinds of garbled data. These special scenarios affect the smoothness of migration.

There are of course many other difficulties, but these three can block or slow down database migration. So what exploration and innovation have we done to address them? Let me share that next.

Addressing Challenge 1: Incubating database traffic recording and playback capabilities

Traffic recording and playback is a familiar concept, and some database vendors already provide tools for it. GaussDB faces many business scenarios, and the technique required depends on the scenario. If the source database is a public cloud service that exposes the full SQL stream, the full SQL can be obtained directly and played back. If the source database has audit logging enabled, the audit logs can be downloaded and parsed directly, although enabling audit logging has some impact on database performance. If the source is a self-built database without audit logging, an agent must be deployed to capture network packets and parse all SQL issued by applications according to the database's own communication protocol. Together, these three approaches cover essentially all scenarios. A few points need attention: first, thoroughly study the communication protocols of different databases; second, convert SQL automatically in heterogeneous replacement scenarios; third, provide flow control for SQL playback so that it can be sped up or slowed down. In addition, any exceptions during parsing or playback must be recorded.
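
The playback side, with its speed control and exception recording, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GaussDB tool: `CapturedSql`, `replay`, and the `execute`/`convert` callables are hypothetical names, each captured statement is assumed to carry its offset from the start of the capture, and statements are replayed sequentially (a real tool would also preserve per-session concurrency).

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CapturedSql:
    offset_s: float  # hypothetical: seconds since the start of the capture
    session_id: str
    sql: str


@dataclass
class ReplayResult:
    succeeded: int = 0
    failures: List[Tuple[CapturedSql, str]] = field(default_factory=list)


def replay(events: List[CapturedSql],
           execute: Callable[[str], None],
           speed: float = 1.0,
           convert: Callable[[str], str] = lambda s: s,
           sleep: Callable[[float], None] = time.sleep,
           clock: Callable[[], float] = time.monotonic) -> ReplayResult:
    """Replay captured SQL in capture order, scaling the original timing by `speed`."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    result = ReplayResult()
    start = clock()
    for ev in sorted(events, key=lambda e: e.offset_s):
        # speed=2.0 compresses the original gaps to half; speed=0.5 stretches them.
        due = start + ev.offset_s / speed
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        try:
            # Heterogeneous replacement: rewrite source-dialect SQL before executing it.
            execute(convert(ev.sql))
            result.succeeded += 1
        except Exception as exc:
            # Record the failure and keep going instead of aborting the whole replay.
            result.failures.append((ev, repr(exc)))
    return result
```

Passing `sleep` and `clock` in as parameters keeps the pacing logic testable without waiting in real time.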

Next, the traffic is played back simultaneously against a mirror of the source database and against the GaussDB database, ensuring that the data in the mirror and in the target database is fully consistent and that the replayed SQL is identical. Finally, an analysis report is produced that compares each SQL statement's execution time, resource consumption, and even execution results. This makes it easy to see which SQL runs faster on GaussDB than on the source database, which has degraded, and which performs about the same.
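
The comparison step can be illustrated with a small classifier. This is a sketch only: `Execution`, `classify`, `build_report`, and the 10% tolerance are assumptions made for illustration. It compares elapsed time and result sets, whereas the report described above also covers resource consumption. Result sets are compared as multisets, so row order is ignored.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Execution:
    elapsed_ms: float
    rows: List[tuple]


def classify(source: Execution, target: Execution, tolerance: float = 0.10) -> str:
    """Classify one SQL statement replayed on the source mirror and on GaussDB."""
    # Compare results as multisets: same rows with the same multiplicities, any order.
    if Counter(source.rows) != Counter(target.rows):
        return "result_mismatch"
    if source.elapsed_ms <= 0:
        return "same" if target.elapsed_ms <= 0 else "worse"
    ratio = target.elapsed_ms / source.elapsed_ms
    if ratio < 1 - tolerance:
        return "better"  # GaussDB noticeably faster
    if ratio > 1 + tolerance:
        return "worse"   # GaussDB noticeably slower
    return "same"


def build_report(pairs: Dict[str, Tuple[Execution, Execution]],
                 tolerance: float = 0.10) -> Dict[str, List[str]]:
    """Group SQL ids by outcome; `pairs` maps sql_id -> (source, target) executions."""
    report: Dict[str, List[str]] = {k: [] for k in ("better", "worse", "same", "result_mismatch")}
    for sql_id, (src, tgt) in pairs.items():
        report[classify(src, tgt, tolerance)].append(sql_id)
    return report
```

A relative tolerance avoids labeling a statement as a regression because of ordinary timing noise.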

The GaussDB team is working with a bank on joint innovation in database traffic recording and playback. In practice, capturing traffic packets through agents achieves an SQL capture success rate above 97%, and parsing and playback success rates of 95%. Exceptions such as syntax incompatibilities and semantic incompatibilities can also be identified in the process.

Addressing Challenge 2: Incubating GaussDB database SQL audit capabilities

SQL auditing is familiar to most people, and many large companies have explored and practiced it. GaussDB, however, is an independently developed, innovative distributed database. Many enterprise developers and DBAs are unfamiliar with its SQL syntax, and relatively complete SQL programming standards for it have not yet been established. Many third-party SQL audit tools also cannot audit GaussDB. Given this, we combined UGO's mature SQL parser with SQL tuning practice from multiple projects to incubate SQL audit capabilities for the GaussDB database.

SQL audit accepts many types of input: a code repository, a SQL file, or dynamic SQL obtained through traffic recording. The SQL can be audited as captured (native SQL) or after UGO conversion.

So far, 81 audit rules have been accumulated and applied in two projects within Huawei and at multiple external banks, with results that exceeded expectations.
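
To show how rule-based SQL audit works in principle, here is a toy rule engine. It is a hypothetical sketch, not UGO's implementation: UGO uses a full SQL parser, while this example applies regular expressions to normalized text, so it can be fooled by keywords inside string literals. The three rules (`R001` to `R003`) are illustrative and are not taken from the 81 rules mentioned above.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    rule_id: str
    severity: str
    message: str
    violated: Callable[[str], bool]  # receives normalized SQL; True means a violation


def normalize(sql: str) -> str:
    """Strip comments, collapse whitespace, and upper-case (string literals included)."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)  # block comments
    sql = re.sub(r"--[^\n]*", " ", sql)               # line comments
    return " ".join(sql.split()).upper()


# Hypothetical example rules, for illustration only.
RULES = [
    Rule("R001", "warning", "Avoid SELECT *; list the columns explicitly",
         lambda s: re.search(r"\bSELECT\s+\*", s) is not None),
    Rule("R002", "error", "UPDATE/DELETE without WHERE affects every row",
         lambda s: re.match(r"(UPDATE|DELETE)\b", s) is not None and " WHERE " not in f" {s} "),
    Rule("R003", "warning", "A leading-wildcard LIKE cannot use a B-tree index",
         lambda s: re.search(r"\bLIKE\s+'%", s) is not None),
]


def audit(sql: str, rules: List[Rule] = RULES) -> List[Tuple[str, str, str]]:
    """Return (rule_id, severity, message) for every rule the statement violates."""
    norm = normalize(sql)
    return [(r.rule_id, r.severity, r.message) for r in rules if r.violated(norm)]
```

Keeping each rule as data plus a predicate means new rules can be added without touching the audit loop, which is how a rule set can grow over time.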

Addressing Challenge 3: Incubating character set compatibility analysis and assessment capabilities

In data migration, what people worry about most is that problems during the official cutover cause it to fail. Apart from the capabilities of the migration tool itself, the most common causes are incompatible character sets, garbled data, and rare characters. For example, the O database has extended its GBK character set so that it can store UTF-8 characters, whereas GaussDB's GBK character set strictly follows the standard. When data is migrated from the O database to GaussDB, these UTF-8 characters cannot be written at all, so the migration inevitably fails. Worse, many customers' massive historical datasets contain large amounts of garbled data: nobody can tell when it was written, which application wrote it, or whether it will ever be used again, yet the customer requires that it be migrated. To meet these challenges, we are incubating a character set compatibility analysis and assessment tool to identify such problems in advance.
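
One way to flag the "UTF-8 stored in a GBK column" and garbled-data cases before cutover is to decode each stored byte string strictly with the standard character set first, and then with UTF-8. The sketch below assumes the values can be read as raw bytes; `classify_value`, `scan_column`, and the labels are hypothetical. One caveat: some UTF-8 byte sequences also happen to be valid GBK, so a strict decode alone will miss them, and a production tool would need additional heuristics.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def classify_value(raw: bytes, declared: str = "gbk") -> str:
    """Classify one stored value against the declared, standard character set."""
    try:
        raw.decode(declared)  # strict decoding: only the standard charset is accepted
        return "ok"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf8_in_" + declared  # UTF-8 bytes stored in a column declared as GBK
    except UnicodeDecodeError:
        return "garbled"  # valid in neither encoding


def scan_column(values: Iterable[Tuple[int, bytes]],
                declared: str = "gbk") -> Tuple[Counter, Dict[str, List[int]]]:
    """Count labels for (row_id, raw_bytes) pairs and keep the row ids of problem values."""
    counts: Counter = Counter()
    samples: Dict[str, List[int]] = {}
    for row_id, raw in values:
        label = classify_value(raw, declared)
        counts[label] += 1
        if label != "ok":
            samples.setdefault(label, []).append(row_id)
    return counts, samples
```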

The principle of this tool is simple. First, establish an analyzable character set baseline, such as the GB series and the Unicode series. Second, obtain the source database's metadata, including character sets, index information, and table structure information (column types, column lengths, and so on). Then map the source database's character sets to those of the target database. Finally, scan and analyze the data in the database and output a multi-dimensional analysis and assessment report. The tool is being co-developed with a bank, and early trials show that it does find many problems, such as data mixing the ZHS16GBK and AL32UTF8 character sets, garbled data caused by writing binary content directly, and large numbers of rare characters stored under the ZHS16GBK character set.
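
The mapping and scanning steps can be approximated by checking, for each value, whether it can be encoded in the target character set at all (catching rare characters) and whether its encoded byte length still fits the column (GBK stores a common Chinese character in 2 bytes, UTF-8 in 3). The baseline tables `SOURCE_TO_TARGET` and `TARGET_CODEC` below are illustrative assumptions that use Python codecs to model the target encodings; they are not the tool's actual baseline, and `analyze_column`/`default_target` are hypothetical names.

```python
from typing import Dict, Iterable, Tuple

# Assumed baseline, for illustration only: source charset -> target charset,
# and target charset -> Python codec used to model its byte encoding.
SOURCE_TO_TARGET: Dict[str, str] = {
    "ZHS16GBK": "GBK",
    "AL32UTF8": "UTF8",
    "ZHS32GB18030": "GB18030",
}
TARGET_CODEC: Dict[str, str] = {"GBK": "gbk", "UTF8": "utf-8", "GB18030": "gb18030"}


def default_target(source_charset: str) -> str:
    """Map a source charset to its assumed target equivalent, or fail loudly."""
    if source_charset not in SOURCE_TO_TARGET:
        raise KeyError(f"no equivalent target charset for {source_charset}")
    return SOURCE_TO_TARGET[source_charset]


def analyze_column(values: Iterable[Tuple[int, str]], target_charset: str,
                   max_bytes: int) -> Dict[int, str]:
    """Report rows that the target charset cannot store or that overflow a byte-length column."""
    codec = TARGET_CODEC[target_charset]
    issues: Dict[int, str] = {}
    for row_id, text in values:
        try:
            size = len(text.encode(codec))
        except UnicodeEncodeError:
            issues[row_id] = "unrepresentable"  # e.g. a rare character missing from the charset
            continue
        if size > max_bytes:
            issues[row_id] = f"overflow: {size} bytes > {max_bytes}"
    return issues
```

For instance, a column sized at 4 bytes that holds "中文" fits under GBK but overflows under UTF8, while a CJK Extension B character such as U+20000 cannot be stored under GBK at all.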

Thoughts on the evolution of the UGO+DRS one-stop migration solution

Those are the three major challenges encountered while using the UGO+DRS one-stop migration solution, along with our innovative practices for dealing with them. Migration scenarios for GaussDB are becoming more numerous and more complex, so we will continue to explore and innovate to make the solution more complete and the migration process smoother. For example, traffic playback, SQL audit, and character set compatibility assessment will support more databases. We will also provide a detailed, comprehensive feasibility analysis report covering application, structure, and data migration, and achieve integrated management of SQL capture, conversion, review, and optimization. We also look forward to cooperating with customers, partners, and peers.

Those are some of GaussDB's innovative practices in database migration. Thank you all, and discussion is welcome.

Source: blog.csdn.net/GaussDB/article/details/132969699