China Europe Wealth: a history of distributed database adoption and an exploration of TiDB 7.1's new features

Author: Zhang Zhengjun, Head of Database at China Europe Wealth


China Europe Wealth is the sales subsidiary of China Europe Fund Holdings. Its app covers every fund variety in the industry and provides investment tools and services such as fund trading, big-data fund selection, smart automatic investment plans, and financial-planner consultation. China Europe Wealth is committed to providing investors and partners with one-stop Internet wealth-management solutions, and its business has maintained steady growth since its founding in 2015.

This article introduces China Europe Wealth's exploration of distributed databases and its practice of migrating business systems to the TiDB platform. It details the four go-live stages of China Europe Wealth's TiDB adoption, demonstrating TiDB's strengths in coping with data growth, handling large-table DDL, and improving write performance. The article also highlights the new features of the TiDB 7.1 LTS release, including resource control and Partitioned Raft KV, innovations that have significantly improved China Europe Wealth's business efficiency and performance.

The history of our distributed database adoption

China Europe Wealth began investigating distributed databases in 2021, hoping to meet needs that the original MySQL databases could not, and thereby solve technical problems at the business level. During that period we tested TiDB thoroughly and confirmed that it was architecturally compatible and met our performance standards. In 2022 we purchased servers, deployed TiDB clusters, and gradually migrated some peripheral systems to TiDB. This year we have conducted more detailed testing and verification and switched more, and more complex, business systems over to TiDB. Four systems went live in the first half of the year, and six more are planned for the second half.

China Europe Wealth's go-lives on the distributed database can be divided into four stages:

The first stage is in-depth business testing. We build a parallel environment with the same configuration as production and run in-depth tests with production data. Each business system has its own characteristics and scenarios, so before every go-live we must ensure testing is sufficient: functional testing must confirm that every business flow runs correctly, and no performance indicator may fall below its original level on MySQL. We compare the efficiency of real-time business and batch tasks side by side to identify slow SQL and optimize it.

The second stage is data synchronization. We use TiDB's DM tool to replicate data from production MySQL to the TiDB cluster in real time. Once replication is running, all of MySQL's downstream synchronization links in the original architecture (native MySQL replication, Canal, Flink CDC, etc.) are switched to TiDB (output through TiCDC). We then observe the synchronization for two to three weeks and verify data consistency.
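As a rough illustration, a minimal DM task for this stage might look like the following sketch; the task name, source ID, database names, and connection details are all placeholders, not our actual configuration:

```yaml
# Minimal DM task sketch: full dump plus incremental binlog replication
# from one MySQL source into TiDB. All names and addresses are hypothetical.
name: "mysql-to-tidb-business-a"
task-mode: all                  # full dump first, then incremental sync

target-database:
  host: "tidb-haproxy.internal" # TiDB endpoint (here, via HAProxy)
  port: 4000
  user: "dm_user"
  password: "******"

mysql-instances:
  - source-id: "mysql-prod-01"  # source registered beforehand with dmctl
    block-allow-list: "ba-rule"

block-allow-list:
  ba-rule:
    do-dbs: ["business_a"]      # replicate only this business database
```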

The third stage is the application cutover. We generally find a small downtime window, stop DM synchronization, confirm data consistency, and then switch the application to TiDB.

The fourth stage is post-launch assurance: tracking the running state of the online application and the performance of the database. After going live, some businesses may hit problems that never appeared in testing, or special situations such as execution-plan regressions that require manual handling. Essentially, every system we bring online follows these four steps.
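For the post-launch tracking described above, one simple starting point (a sketch, not our exact queries) is TiDB's built-in slow-query table:

```sql
-- Inspect the slowest recent statements after a go-live.
-- The table and columns are TiDB built-ins; the threshold is illustrative.
SELECT time, query_time, query
FROM information_schema.slow_query
WHERE query_time > 1          -- seconds
ORDER BY query_time DESC
LIMIT 10;
```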

The following figure is a schematic diagram of the current architecture of the China Europe Wealth TiDB cluster. The storage layer is configured with three TiKV machines running three instances each, for a total of nine TiKV nodes. There are three TiDB nodes, with TiDB and PD deployed together on the same machines. In addition, two TiDB nodes with high memory configurations are reserved: special, larger, memory-heavy, or slow SQL is routed separately to these two TiDB servers, which achieves a degree of resource isolation (the v6 version used in production does not yet have the resource control feature). Finally, there are three TiFlash nodes and two physical machines running TiCDC.

Let's walk through the go-live steps in detail. Business A and Business B had already been sharded across databases and tables. After sharding, the data needs to be aggregated and then synchronized to a MySQL summary database. Besides the summary database, we also have a big data platform: we use Canal to extract the synchronized data from MySQL, process it, and load it into the big data platform. When switching to TiDB, we first replicate MySQL data to TiDB through DM. TiDB then writes the data to the downstream summary database through TiCDC, while the other TiCDC link outputs to Kafka and feeds the big data platform. After this architecture has run for a while and data synchronization has been verified, the business applications are actually switched to the TiDB cluster. At cutover time, you only need to point the address in the JDBC connection string at the HAProxy address to complete the launch of a business system.
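For reference, the HAProxy frontend that the JDBC URL points at could be configured roughly like this; the bind port and server addresses are placeholders, not our production values:

```
# Sketch of an HAProxy TCP frontend for three TiDB servers.
# All addresses and ports below are hypothetical.
listen tidb-cluster
    bind 0.0.0.0:3390          # address used in the JDBC connection string
    mode tcp                   # the MySQL protocol is proxied at TCP level
    balance leastconn
    server tidb-1 10.0.0.1:4000 check
    server tidb-2 10.0.0.2:4000 check
    server tidb-3 10.0.0.3:4000 check
```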

Currently, China Europe Wealth has launched multiple business systems on the TiDB cluster, including the rate system, fund data system, risk control system, major event system, channel system, and membership system. We plan to launch more in the second half of the year, and TiDB is extending into core scenarios: our new portfolio investment advisory system, marketing system, product system, user system, and even the trading system are all in planning.

Benefits of using distributed databases

When we investigated distributed databases in 2021, it was mainly because our business faced three challenges.

First, data in single tables grows very rapidly. Our development and operations teams frequently had to cooperate on all kinds of database and table sharding, and sharding is very labor-intensive; sometimes a business database simply could not be split any further. Not long after some tables were split, single-table volume quickly climbed past 500 million rows and the data had to be re-sharded. The workload was enormous.

Second is DDL on large tables. I ran into this problem just in the last two weeks: a business scenario changed and fields had to be added. The DDL on one shard table took six hours to run, and there were ten shard tables in total, a huge waste of time. Moreover, DDL cannot run during busy business hours, so the DBA may have to spread a single logical DDL change over several days or even weeks, which imposes a heavy burden on operations.
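By contrast, TiDB performs schema changes online; on a single unsharded table the same change is one statement (the table and column names below are made up for illustration):

```sql
-- Adding a column in TiDB is an online operation that does not block
-- reads or writes; with sharding eliminated, it runs once, not ten times.
ALTER TABLE trade_detail
    ADD COLUMN channel_code VARCHAR(32) NULL DEFAULT NULL;
```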

Third is single-node writing. Under MySQL's traditional one-master, multiple-replica architecture, only the master node can accept writes. During settlement, position adjustment, and batch jobs, this cannot meet the business's write-throughput requirements. TiDB separates storage from compute, scales in and out online, supports high-concurrency OLTP scenarios, and meets financial-grade high-availability requirements.

After our businesses went live on TiDB, all three problems were solved, with substantial savings in both manpower and cost.

Exploring new features of TiDB V7.1

We pay close attention to every major TiDB release, because some of the new features genuinely solve user pain points. By contrast, when we used MySQL we basically never upgraded unless we hit a catastrophic bug: the long-term benefit felt too small to justify the risk. TiDB's release iterations, however, keep introducing features that are genuinely attractive.

For example, in the TiDB 7.1 LTS release we found several of the new features very useful, so we built an environment to explore them. Here are four new capabilities that matter to our business scenarios and that we will use going forward.

First, and most important, is resource control, the multi-tenancy feature. The database cluster is divided into multiple logical units so that several different applications can share one cluster; even if the load of one business application surges, it will not affect the normal operation of the others. In financial scenarios, once resource control is enabled on a consolidated cluster, online transaction workloads are guaranteed not to be affected by batch or analytical workloads.
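In TiDB 7.1 this is expressed with resource groups; the group names, RU quotas, and user accounts below are illustrative only, not our production settings:

```sql
-- Two logical tenants in one cluster: online trading gets a high-priority
-- quota, batch jobs a smaller one. RU (Request Unit) values are examples.
CREATE RESOURCE GROUP IF NOT EXISTS rg_oltp  RU_PER_SEC = 4000 PRIORITY = HIGH;
CREATE RESOURCE GROUP IF NOT EXISTS rg_batch RU_PER_SEC = 500;

-- Map application accounts to their groups; new sessions inherit them.
ALTER USER 'trade_app'  RESOURCE GROUP rg_oltp;
ALTER USER 'batch_user' RESOURCE GROUP rg_batch;
```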

The original MySQL architecture is one master with two replicas. Because write volume is relatively high and semi-synchronous replication is enabled, the MySQL master lags somewhat under heavy load, which makes read-write separation impractical; the replicas are basically used only for disaster recovery, so overall resource utilization is very low. Moreover, different businesses peak at different times: during the day everyone may be trading, or channels may be pushing data, while at night settlement and batch jobs run. Through multi-tenancy, TiDB can smooth these peaks and troughs, improve overall resource utilization, and reduce operations costs.

Resource control can also serve as a rate limiter. Bad SQL or extremely slow SQL is very common in production; when we encounter it, we can throttle it temporarily by combining SQL binding with resource control. Rate limiting is usually done at the proxy layer, but we do not have that capability today. If an emergency hits the database layer, being able to rate-limit a single SQL statement at the SQL level is an excellent feature: no code change or application redeployment is needed, just a simple SQL binding and a resource group on the database side.
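A sketch of that combination, using a hypothetical problem statement: create a small resource group, then bind the bad SQL to it through a hint, all without touching application code:

```sql
-- Throttle one bad statement: a tiny resource group plus a global binding
-- that injects the RESOURCE_GROUP hint. Names and the query are examples.
CREATE RESOURCE GROUP IF NOT EXISTS rg_throttle RU_PER_SEC = 100;

CREATE GLOBAL BINDING FOR
    SELECT * FROM orders WHERE status = 'PENDING'
USING
    SELECT /*+ RESOURCE_GROUP(rg_throttle) */ * FROM orders
    WHERE status = 'PENDING';
```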

The second is Partitioned Raft KV. Each Region's data can be stored in its own independent instance, so each TiKV node can hold more data. The write-performance improvement, which is what we care most about, is very large, and scaling in and out is also significantly faster.
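Partitioned Raft KV is still experimental in 7.1 and, as we understand it, takes effect only on newly created clusters; it is selected through the TiKV configuration file, roughly as sketched here:

```toml
# TiKV configuration sketch: switch the storage engine to Partitioned
# Raft KV (experimental in v7.1; applies to newly created clusters).
[storage]
engine = "partitioned-raft-kv"
```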

The third is load-based replica read. Our current business, including some larger businesses we will launch later, all run into hotspots, and the usual hotspot-scattering approaches do not suit our workload. With load-based replica read, requests can read replicas on other TiKV nodes instead of queuing at the hotspot TiKV node. In hotspot situations, read throughput can increase by 70% to 200%, a very impressive improvement.
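This behavior is controlled by a system variable: when the estimated queue wait on the target TiKV exceeds a threshold, TiDB redirects the read to another replica. The threshold value below is just an example:

```sql
-- If the expected wait on the hotspot TiKV exceeds the threshold,
-- TiDB sends the read to a replica on a less-loaded node instead.
SET GLOBAL tidb_load_based_replica_read_threshold = '1s';
```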

The fourth is the globally monotonic auto-increment column. This is actually a TiDB 6.5 feature, but production is still on 6.1.2, so we have not used it yet. It guarantees IDs that are unique and monotonically increasing, fully consistent with MySQL's auto-increment key. The previous pre-allocation of ID ranges broke the pagination logic of some of our businesses and forced developers to adjust it. With the MySQL-compatible auto-increment column, future go-lives need no pagination changes, further reducing development cost.
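The MySQL-compatible behavior is enabled per table with the AUTO_ID_CACHE = 1 option; the table here is a made-up example:

```sql
-- With AUTO_ID_CACHE 1, TiDB allocates auto-increment IDs centrally,
-- so they are unique AND monotonically increasing, as in MySQL,
-- and offset-based pagination logic keeps working unchanged.
CREATE TABLE user_op_log (
    id      BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    op_type VARCHAR(32)
) AUTO_ID_CACHE 1;
```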

Future outlook

Finally, a few thoughts on TiDB's future.

The first is to launch TiProxy to replace the HAProxy we use today. As things stand, if the three TiDB servers are upgraded one by one, the application may be disconnected three times in a row, which is very unfriendly; TiProxy can make upgrades and restarts lossless. Circuit-breaking and rate-limiting could also be added to TiProxy, making the whole architecture more flexible and reliable. TiProxy could even capture full database traffic and replay it against another environment or a newer TiDB version to test the new cluster's stability, which, especially while database versions iterate rapidly, would let users better evaluate whether a new version is production-ready.

Second, I hope TiDB can consolidate its tool platforms. TiDB provides many of them, such as Dashboard, TiUniManager, and DM web, all as independent platforms. We hope these can be integrated into one centralized management platform, ideally with TiCDC management added, which would be much more convenient for operations staff.

Third, we hope for a built-in inspection feature. After a system goes live, you have to go to the Dashboard or Grafana to check its state; an automated inspection feature would save labor. Combined with today's AI technology, having TiDB produce a cluster operation report with optimization suggestions would be very valuable to users.


Origin blog.csdn.net/TiDB_PingCAP/article/details/132589004