Interpretation of GaussDB technology丨Innovative practice of database migration

This article is shared from Huawei Cloud Community "DTCC 2023 Expert Interpretation丨GaussDB Technology Interpretation Series Database Migration Innovation Practice", author: GaussDB database.

Recently, the 14th China Database Technology Conference (DTCC 2023) was held in Beijing with the theme of "Building the Future with Digital Intelligence Empowerment". In the special session "GaussDB 'five highs and two easies' core technologies give the world a better choice", Dou Deming, Director of R&D of Huawei Cloud Database Ecosystem Tools, shared the innovative practice of GaussDB database migration.


The following is the transcript of the speech:

Dear colleagues, I am Dou Deming, Director of R&D of Huawei Cloud Database Ecosystem Tools. What I am sharing is the innovative practice of GaussDB database migration.

Ease of migration is a key consideration in enterprise database replacement selection

Choosing a database depends not only on the capabilities of the database itself but also on whether workloads can be migrated smoothly from other databases to GaussDB. Many companies treat this as a key factor. Smooth migration rests on two core factors. One is the database itself: whether it is compatible enough with the syntax of mainstream databases that applications need few or no changes. The other is whether easy-to-use migration tools exist around the database, tools that can smoothly migrate the SQL embedded in applications, the objects in the database, and the full and incremental data from other databases with almost zero business downtime. These are the two migration factors that enterprises weigh when selecting a database.

Structure (UGO) + data (DRS) one-stop migration solution

At the DTCC conference in 2021, we released the Huawei Cloud structure + data one-stop migration solution, which has two core tools. The first is UGO, which evaluates and converts the syntax of structures and applications. For example, it captures the SQL embedded in the applications on top of the database and evaluates it, evaluates the DDL of the objects inside the database, and outputs a report that clearly shows which statements the target database supports natively, which become compatible after UGO conversion, and which cannot be converted and need manual rework. The second is DRS. As we all know, heterogeneous database replacement raises many data migration problems. DRS has to migrate the customer's existing data and incremental data quickly without business downtime, ensure that data is never lost, wrong, or out of order under any circumstances, and provide flexible and diverse data comparison and repair capabilities.

UGO+DRS integrated solution has been verified in actual projects

The UGO+DRS one-stop migration solution has been verified and applied in many projects over the past two years. Here are two practical examples. The first is our company's own internal MetaERP project. UGO automatically converted nearly 700 million lines of O database SQL scripts with a conversion success rate close to 100%. GaussDB has also implemented parallel logical decoding with a throughput of nearly 300 MB/s, which lets DRS easily cope with MetaERP's traffic peaks of 10 to 20 times at month end, quarter end, and year end, and keeps data synchronization latency below 5 seconds.

The second is the database replacement at a certain bank. The migration complexity of this project is relatively high: there are many applications, many database objects, and a deep dependence on stored procedures and packages. So far, with our one-stop migration solution, UGO has automatically converted nearly 130 million lines of SQL scripts (nearly 80 million of them in stored procedures) with a conversion success rate above 96%, and DRS has migrated nearly 300 O database instances. The O database and GaussDB have run stably in parallel over a long period, with low-latency data synchronization in both the forward and reverse directions.

New difficulties and challenges encountered during project implementation

During the implementation of a large number of projects, our UGO+DRS one-stop migration solution also encountered some new difficulties and challenges. I believe everyone will encounter these challenges, so I would like to share them here.

Challenge 1: When replacing a heterogeneous database, how do we quickly identify the syntax incompatibilities between the two databases, identify performance differences when the same SQL runs against the same data in different databases, and determine whether upgrading from a lower version to a higher version introduces incompatibilities or performance degradation? There is also the question of how to simulate database behavior during business traffic peaks.

Challenge 2: At present, developers and DBAs in many enterprises are not familiar with GaussDB, and SQL writing skills are uneven. Application development also lacks unified SQL programming specifications and an effective SQL audit mechanism. As a result, a lot of bad SQL flows into the production environment and causes a large number of application performance problems that affect production business and customer experience.

Challenge 3: Many databases have extended or customized their character sets beyond the standard ones. As a result, character sets with the same name can be incompatible during data migration, or the target has no equivalent character set at all. Worse still, historical data often already contains all kinds of garbled data. These special scenarios affect how smoothly a migration goes.

Of course, there are many other difficulties and challenges, but these three block or slow down the database migration process. So what explorations and innovations have we made to address them? Let me share them with you.

Addressing Challenge 1: Incubating database traffic recording and playback capabilities

I believe everyone is familiar with the concept of traffic recording and playback, and some database vendors provide corresponding tools. GaussDB faces many business scenarios, so the required technology varies by scenario. If the source database is a public cloud service that provides full SQL collection, the full SQL can be obtained directly and played back. If the source database has audit logs enabled, the audit logs can be downloaded and parsed directly, although enabling audit logs has some impact on database performance. If the source is a self-built database without audit logs enabled, an agent needs to be deployed that captures network packets and uses the database's own communication protocol to parse out all the SQL issued by the application. These three approaches cover basically all scenarios.

There are a few points to pay attention to. First, we need to study the communication protocols of the different databases. Second, in heterogeneous database replacement scenarios, the SQL must be converted automatically. In addition, SQL playback needs flow control so that it can be sped up or slowed down. And of course, any anomalies during parsing or playback must be recorded.
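The flow control and anomaly recording mentioned here can be illustrated with a minimal Python sketch. It is not the actual tool: the names CapturedSql, ReplayReport, and replay are hypothetical, and the execute callable stands in for whatever driver call sends a statement to the target database.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class CapturedSql:
    offset_s: float  # seconds since the start of the recording
    sql: str         # statement text as captured (or after conversion)


@dataclass
class ReplayReport:
    executed: int = 0
    # (statement, error message) for every statement that failed to replay
    failures: List[Tuple[str, str]] = field(default_factory=list)


def replay(
    records: Iterable[CapturedSql],
    execute: Callable[[str], None],
    speed: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> ReplayReport:
    """Replay statements in recorded order, scaling the gaps between them.

    speed > 1 compresses the recorded gaps (faster playback),
    speed < 1 stretches them (slower playback).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    report = ReplayReport()
    previous_offset = None
    for record in sorted(records, key=lambda r: r.offset_s):
        if previous_offset is not None:
            gap = (record.offset_s - previous_offset) / speed
            if gap > 0:
                sleep(gap)
        previous_offset = record.offset_s
        try:
            execute(record.sql)
            report.executed += 1
        except Exception as exc:  # record the anomaly and keep replaying
            report.failures.append((record.sql, str(exc)))
    return report
```

Passing speed=2.0 replays a recording in half the original time, and every failed statement is kept in the report instead of stopping the run. Injecting the sleep function makes the pacing easy to test without real waiting.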


The next step is to replay the traffic against a mirror of the source database and against GaussDB at the same time, making sure that the data in the mirror database and the target database is completely consistent and that the replayed SQL is exactly the same. Finally, an analysis report compares the execution time, resource consumption, and even execution results of each SQL statement, so it is easy to see which statements perform better on GaussDB than on the source database, which have degraded, and which are basically the same.
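As a rough illustration of the comparison step, the following Python sketch classifies each statement by execution time only. The names SqlComparison and compare_timings, the 10% tolerance, and the idea of keying timings by a statement ID are assumptions made for this example; the real report also covers resource consumption and execution results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SqlComparison:
    sql_id: str
    source_ms: float
    target_ms: float
    verdict: str  # "improved", "degraded", or "similar"


def compare_timings(
    source_ms: Dict[str, float],
    target_ms: Dict[str, float],
    tolerance: float = 0.10,
) -> List[SqlComparison]:
    """Compare per-statement execution times from the two replays.

    A statement counts as improved or degraded only when the target time
    differs from the source time by more than the given tolerance.
    """
    rows = []
    # Only statements that were replayed on both sides can be compared.
    for sql_id in sorted(source_ms.keys() & target_ms.keys()):
        src = source_ms[sql_id]
        tgt = target_ms[sql_id]
        if tgt < src * (1 - tolerance):
            verdict = "improved"
        elif tgt > src * (1 + tolerance):
            verdict = "degraded"
        else:
            verdict = "similar"
        rows.append(SqlComparison(sql_id, src, tgt, verdict))
    return rows
```

The tolerance band keeps ordinary run-to-run noise from being reported as a regression; a suitable value depends on how stable the test environment is.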

We are working with a bank on joint innovation in database traffic recording and playback. In actual use, capturing traffic packets through the agent achieves a SQL capture success rate of more than 97% and a parsing and playback success rate of 95%. Syntax and semantic incompatibilities can also be identified during this process.

Addressing Challenge 2: Incubating GaussDB database SQL audit capabilities

Everyone is fairly familiar with SQL auditing, and many large companies have explored and practiced it. But GaussDB is a purely independently developed distributed database, so many enterprise developers and DBAs are not familiar with its SQL syntax and have not developed a reasonably complete SQL programming specification, and many third-party SQL audit tools cannot audit GaussDB. So we combined UGO's mature SQL parser with SQL tuning practices from multiple projects to incubate SQL audit capabilities for GaussDB.


Our SQL audit accepts several kinds of input: a code repository, a SQL file, or dynamic SQL obtained through traffic recording. It can audit the original SQL as obtained, or the SQL after UGO conversion.

So far, we have accumulated 81 audit rules and applied them in two internal company projects and at several external banks, and the results have exceeded expectations.
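To give a feel for what a rule-based SQL audit looks like, here is a small Python sketch. The three rules and their IDs are generic examples invented for this illustration, not any of the 81 rules mentioned above, and simple regular expressions stand in for the SQL parser that a real auditor would use.

```python
import re
from dataclasses import dataclass
from typing import List, Pattern


@dataclass(frozen=True)
class AuditRule:
    rule_id: str
    description: str
    pattern: Pattern[str]


# Hypothetical example rules; regex matching is a simplification.
RULES = [
    AuditRule(
        "R001",
        "Avoid SELECT *; list the needed columns explicitly",
        re.compile(r"\bselect\s+\*", re.IGNORECASE),
    ),
    AuditRule(
        "R002",
        "UPDATE or DELETE without a WHERE clause",
        re.compile(
            r"^\s*(update|delete)\b(?!.*\bwhere\b)",
            re.IGNORECASE | re.DOTALL,
        ),
    ),
    AuditRule(
        "R003",
        "LIKE pattern starts with a wildcard",
        re.compile(r"\blike\s+'%", re.IGNORECASE),
    ),
]


def audit(sql: str) -> List[str]:
    """Return the IDs of all rules that the statement violates."""
    return [rule.rule_id for rule in RULES if rule.pattern.search(sql)]
```

For example, audit("delete from orders") reports R002, while the same statement with a WHERE clause passes. Regex rules like these can be fooled by comments or string literals, which is one reason a parser-based approach is the more robust choice.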

Addressing Challenge 3: Incubating Character Set Compatibility Analysis and Assessment Capabilities

For data migration, what everyone worries about most is that problems during the formal cutover cause the cutover to fail. Apart from the functions of the migration tool itself, the most common problems are incompatible character sets, garbled data, rare characters, and so on. For example, the O database has extended the GBK character set so that it can store UTF-8 characters, while the GBK character set of GaussDB follows the standard strictly. When data is migrated from the O database to GaussDB, these UTF-8 characters cannot be written at all, and the migration inevitably fails. What's more, many customers have a large amount of garbled data in their massive historical data. Nobody can determine when the data was written, which application wrote it, or whether it will be used later, yet the customer requires that it be migrated. To face these challenges, we try to identify such problems in advance by incubating a character set compatibility analysis and evaluation tool.


The principle of this tool is very simple. First, establish a baseline of analyzable character sets, such as the GB series and the Unicode series. Second, obtain the metadata of the source database, including the character set, index information, and table structure information (column type, column length). Then build a mapping between the character sets of the source and target databases. Finally, scan and analyze the database and output a multi-dimensional analysis and evaluation report. This tool is currently being co-created with a certain bank. Judging from the early trial results, it does find many problems, such as mixed use of the ZHS16GBK and AL32UTF8 character sets, garbled data caused by writing directly in binary format, and a large number of rare characters stored in ZHS16GBK.
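The scanning step can be sketched in Python as follows. This is only an illustration under stated assumptions: the raw column values are assumed to be available as bytes, Python's built-in gbk and utf-8 codecs stand in for the character set baseline, and the names classify_bytes and scan_column are hypothetical.

```python
from collections import Counter
from typing import Iterable


def _decodes(raw: bytes, encoding: str) -> bool:
    """Return True if the bytes decode strictly in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def classify_bytes(raw: bytes) -> str:
    """Classify one stored value against the GBK and UTF-8 baselines."""
    if raw.isascii():
        return "ascii"  # identical in both encodings, always safe
    as_gbk = _decodes(raw, "gbk")
    as_utf8 = _decodes(raw, "utf-8")
    if as_gbk and as_utf8:
        return "ambiguous"  # valid in both, needs a closer look
    if as_gbk:
        return "gbk"
    if as_utf8:
        return "utf-8"  # UTF-8 bytes sitting in a GBK column
    return "undecodable"  # likely garbled or raw binary data


def scan_column(values: Iterable[bytes]) -> Counter:
    """Summarize how many values fall into each category."""
    return Counter(classify_bytes(value) for value in values)
```

Running scan_column over a sample of a column declared as GBK gives a quick count of values that are really UTF-8 or cannot be decoded at all, which are the rows most likely to fail when written to a strictly standard GBK target. The "ambiguous" category exists because some byte sequences are valid in both encodings, so decoding alone cannot always tell them apart.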

Thoughts on the evolution of UGO+DRS one-stop migration solution

The above are the three major challenges we encountered while using the UGO+DRS one-stop migration solution, and some innovative practices for dealing with them. GaussDB migration scenarios are becoming more numerous and more complex, so we will continue to explore and innovate to make our solution more complete and the migration process smoother. For example, traffic playback, SQL audit, and character set compatibility evaluation will support more databases; we will launch detailed and comprehensive feasibility analysis reports for application, structure, and data migration; and we will provide integrated management of the whole process of SQL capture, conversion, audit, and optimization. We hope to collaborate on this with customers, partners, and industry peers.

The above are some innovative practices in GaussDB database migration that I wanted to share. Thank you all.


Origin blog.csdn.net/devcloud/article/details/132742098