Seamless, imperceptible migration: how do you localize massive, high-concurrency databases?

First, let’s talk about the background of database localization.

1. Background of database localization

On the national-strategy side, as the external environment grows increasingly complex, core technology urgently needs to be independently controllable, safe, reliable, efficient, and open. The other driver is the business itself: as the business grows rapidly, problems arrive one after another. A stand-alone database hits its bottleneck, and business splitting, vertical splitting, horizontal splitting, and so on all consume a great deal of R&D time.

2. Mainstream database architectures

Let me first share the current mainstream database architectures. The industry's mainstream architectures fall roughly into three categories: Shared-Everything, Shared-Nothing, and Shared-Storage.

  • Shared-Everything

Everyone is probably familiar with this very classic architecture: all processes on one host share the CPU, memory, and I/O. Once any piece of hardware hits its limit, the database has hit its limit.

  • Shared-Nothing

This category can be subdivided into two flavors. One is the proxy-based architecture, which evolved from the traditional stand-alone database: when the stand-alone database hits its bottleneck, a Proxy layer is added in front of it, and the Proxy scatters data across different nodes to solve the scalability problem. The other flavor of Shared-Nothing is represented by databases popular in China such as TiDB and OceanBase (OB).

  • Shared-Storage

The best-known example of this architecture in the industry is AWS's Aurora; in China, Alibaba's PolarDB is also fairly popular.

Let's look at the Shared-Everything architecture in more detail. It is the traditional stand-alone database architecture. When such a database reaches its bottleneck, we usually use various techniques to disperse data into separate sets (units), working around the capacity ceiling of a single database.

Many companies take this approach: the application layer, access layer, and gateway layer all apply the same sharding logic, so that each request's data stays inside one set and forms a closed loop within that set. You can do a lot of interesting things with this kind of architecture. For example, full-link online stress testing: because data is already closed within a set, stress testing against a dedicated test set will not pollute real production data. You can also do grayscale traffic steering, grayscale releases, and so on directly in production.
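To make the idea concrete, here is a minimal Python sketch of set-based routing. The set count, the choice of user_id as the routing key, and the function names are assumptions for illustration, not our actual implementation.

```python
# Minimal sketch of set-based (unitized) routing: the same sharding rule is
# reused at the gateway, application, and data-access layers so that one
# request's whole call chain stays inside a single set.
import zlib

SET_COUNT = 4  # number of self-contained sets (gateway + app + DB); assumed

def route_to_set(user_id: str) -> int:
    """Map a user to a set with a stable hash; every layer calls the same rule."""
    return zlib.crc32(user_id.encode("utf-8")) % SET_COUNT

def pick_dsn(user_id: str, dsns: list[str]) -> str:
    """Pick the database DSN of the set the user belongs to."""
    return dsns[route_to_set(user_id)]

# A stress-test or grayscale set can simply be given its own index,
# so synthetic traffic never touches production data.
```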

In the Shared-Storage camp, AWS's Aurora is doing very well abroad, and Alibaba's PolarDB is doing well in China. This approach requires a very large investment in database R&D and depends on a very capable underlying infrastructure, which is why, since its launch, Aurora has only ever supported deployment on the cloud.

The proxy-based Shared-Nothing architecture is already familiar to most people. Once the traditional database reaches its bottleneck and a single server can no longer solve the performance and capacity problems, a Proxy layer is added and all kinds of routing are done on the Proxy. The architecture has slowly evolved from there to what it is today. Many domestic databases have evolved from this architecture, and most of the databases we currently use adopt a similar design.

This architecture can be divided into three main categories of components:

The GTM component is responsible for global coordination: it manages distributed transactions, including generating global IDs, taking snapshots of active GTIDs, and so on.

The Proxy/compute nodes. Initially the Proxy layer may only do data routing, but as the overall architecture evolved, the Proxy layer also took on a certain amount of computation, including distributed-transaction optimization, computation pushdown, SQL parsing, and so on.

The underlying storage nodes are mostly open-source databases with secondary development on top, MySQL and PostgreSQL being the most common. The advantage of this architecture is that it is relatively mature and stable. The disadvantage is that the transformation process is very painful: the very first step is choosing a shard key and then dispersing the data across nodes according to that key, which is enough to drive the business teams crazy.
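As an illustration of why the shard key matters so much, here is a hedged sketch of the routing a proxy node performs. The node list, the choice of order_id as the shard key, and the modulo rule are hypothetical, not a real proxy implementation.

```python
# Sketch of the proxy/compute layer's core job in a proxy-based Shared-Nothing
# design: route a statement to the storage node that owns its shard key.
# Real proxies also parse SQL, push down computation, and coordinate
# distributed transactions.
STORAGE_NODES = ["mysql-node-0:3306", "mysql-node-1:3306",
                 "mysql-node-2:3306", "mysql-node-3:3306"]

def shard_of(order_id: int) -> int:
    """Modulo sharding on the chosen shard key (order_id here)."""
    return order_id % len(STORAGE_NODES)

def route(order_id: int) -> str:
    """Return the backend node that holds this order's data."""
    return STORAGE_NODES[shard_of(order_id)]

# A query that cannot supply the shard key has to be fanned out to every node
# and merged -- which is exactly why choosing the key is so painful.
```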

The other flavor of Shared-Nothing is represented by the OB and TiDB just mentioned. Take TiDB as an example: it is a distributed database developed on the basis of Google's papers, and it is divided into three layers: the scheduling layer PD, the compute nodes (TiDB servers), and the underlying storage nodes (TiKV).

Unlike the proxy-based approach, TiDB does not require you to explicitly specify a shard key: the sharding and splitting of data is handled automatically by the underlying storage layer. So if you adopt this kind of architecture, the transformation cost is likely to be relatively lower.

To summarize the four architectures above in terms of scalability: only the first one has no way to scale out dynamically; once the transformation work is done, the other three can all basically achieve horizontal scaling of the data layer.

In terms of consistency, the first and last types basically rely on semi-synchronous replication to keep nodes consistent, while the two types in the middle generally use distributed consensus protocols, with Paxos and Raft being the most widely used today.

Next, I would like to share some of our practical experience with the challenges and exploration of database localization.

3. Challenges and explorations of database localization

The deep coupling between business logic and the data layer is the most troublesome point in the entire database transformation process, and various migration solutions have emerged as a result. They can be summed up in seven steps: selection, testing, synchronization, transformation, grayscale, go-live, and assurance.

First is the selection stage, where we care most about stability, efficiency, cost, and ecosystem. Selection is a very important stage: a good choice greatly reduces the cost of the localization work that follows. But in practice, beyond technical factors there are many non-technical factors to weigh during selection, so in the end we can only pick the best option within a limited range. Do not expect the selection stage alone to solve every problem in the localization of the whole database.

Then comes the testing stage. Once we have a candidate database, we run all kinds of tests: basic functional testing, availability testing, maintainability testing, and some basic performance testing. But even after all of that, we still cannot guarantee that the database will meet the needs of the business. What these tests mainly do is eliminate the candidates that clearly do not meet our requirements; to further filter out the ones that basically can, we also need to combine them with online traffic recording and traffic replay.

Data synchronization deserves a little expansion. It divides into full data synchronization and incremental data synchronization. For a large migration like this, I strongly recommend not starting by debating whether the full backup should be physical or logical, or which tool is strong and which is weak. The first step should be analyzing your own business and your own tables: what the business logic is, how the data is distributed, which data is historical, which is hot, which is cold, which data must be migrated immediately, and which can be moved later. After this kind of analysis, full and incremental data migration often achieves twice the result with half the effort.
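As a toy illustration of that analysis step, the sketch below classifies tables into "cold, move later" and "hot, needs full plus incremental sync". The metadata fields and the 180-day threshold are assumptions for illustration, not our real rules.

```python
# Rough "analyze before you migrate" sketch: split tables into cold data
# (archive/migrate later) and hot data (full + incremental sync) based on
# simple metadata about each table.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TableInfo:
    name: str
    size_gb: float
    last_write: datetime

def plan_migration(tables: list[TableInfo]) -> dict[str, list[str]]:
    cold_cutoff = datetime.now() - timedelta(days=180)  # assumed threshold
    plan = {"cold_move_later": [], "hot_full_plus_incremental": []}
    for t in tables:
        if t.last_write < cold_cutoff:
            plan["cold_move_later"].append(t.name)            # historical data
        else:
            plan["hot_full_plus_incremental"].append(t.name)  # active data
    return plan
```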

For incremental synchronization, we developed our own incremental data synchronization tool in-house. It listens to the database change log, writes the changes through middleware into message middleware such as Kafka, and the business then subscribes to whatever it needs from there.
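The sketch below shows the general shape of such a tool, assuming a hypothetical change-event reader in place of the real binlog listener and using the kafka-python producer on the publishing side. The topic name and event format are illustrative, not the format of our internal tool.

```python
# Hedged sketch of the incremental-sync idea: tail the database change log
# and publish each change to Kafka for downstream subscribers.
import json
from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],          # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def read_change_events():
    """Placeholder for a real binlog/WAL listener; yields change events
    shaped like this as an assumption for illustration."""
    yield {"table": "orders", "op": "UPDATE", "pk": 42,
           "after": {"status": "PAID"}}

for event in read_change_events():
    # Key by table + primary key so changes to one row stay ordered
    # within a Kafka partition.
    key = f'{event["table"]}:{event["pk"]}'.encode("utf-8")
    producer.send("db-changes", key=key, value=event)

producer.flush()
```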

You can do a lot of interesting things with such a tool. Beyond the data migration just mentioned, you can synchronize heterogeneous data, synchronize from the database into caches, synchronize between OLTP and OLAP systems, and so on. It solves a lot of business pain points.

Business transformation is also a very painful part. The key points split into two areas. On the application side, the focus is on drivers, syntax compatibility, data objects, APIs, SQL, and so on. On the database side, the focus is on data sharding, hot/cold separation, isolating light and heavy workloads, SQL optimization, read-write separation, and so on.

We have accumulated dozens of points that need attention during this adaptation and transformation, so I will not list them all here, only summarize. As mentioned earlier, the biggest pain point in database localization is the deep coupling between business logic and the database, and that is the problem to solve: during the transformation we gradually weakened the DB into a simple storage layer, clarified the boundary between the application and the data, and then carried out the various adaptations and transformations.

Through this series of operations, database localization also doubles as a tidy-up and optimization of the entire architecture. Let me briefly cover the switching schemes. There are two main ones, and we use both: the first is based on data-layer middleware and is relatively simple; the other is based on the application (double-write) and is more complicated, but it offers stronger guarantees where stability and safety requirements are high.

Let me briefly introduce the data-layer (middleware-based) switching. Initially, our applications connect directly to the DB (there are also things like VIPs in between, which I will skip). In the first step we add a middleware layer; since the middleware still points to the same DB, R&D can migrate to it gradually. It does not matter if this takes ten days or half a month, because it has no impact on the business. Once all traffic goes through the middleware, we capture packets at the database level to confirm that every request really does come through the middleware.

Once the application-side transformation is complete, we synchronize all data to the new database, and then we can point read traffic at it. Because the application already goes through the middleware, read-write separation is done on the middleware and is completely transparent to the application.

We send read traffic to the new database to verify whether it can meet the business requirements; of course, this architecture has no way to verify write traffic. During the final switch, the impact on the business is just a momentary disconnection of a few tenths of a second. After the switch we synchronize data back in the other direction, so if anything goes wrong we still have a path to roll back.
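Here is a minimal sketch of the read/write split the middleware performs during this phase. The DSNs and the percentage-based ramp-up switch are hypothetical configuration items, not the actual middleware code.

```python
# Cutover-phase routing sketch: writes keep going to the old database, a
# growing share of reads is sent to the new one so it can be validated with
# real traffic, transparently to the application.
import random

OLD_DB = "mysql://old-cluster"      # assumed DSNs
NEW_DB = "newdb://new-cluster"
READ_TO_NEW_PERCENT = 10            # raised step by step as confidence grows

def choose_backend(sql: str) -> str:
    is_read = sql.lstrip().lower().startswith("select")
    if is_read and random.randint(1, 100) <= READ_TO_NEW_PERCENT:
        return NEW_DB   # validate the new database with read traffic only
    return OLD_DB       # all writes (and remaining reads) stay on the old DB
```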

The second solution is application-level double-write. It is safer than the previous one, but also more complicated. First we prepare the new database, run a full synchronization, and then incremental synchronization to keep the two sides equal. Then we find a time window, on the premise that the previous step, the application's adaptation work, has already been completed, and enable double-write in the application so that every write goes to both sides consistently. At that point we stop the incremental synchronization and start a background script that asynchronously compares the data on both sides to make sure they stay consistent.

Any inconsistency found during reconciliation is logged and then analyzed manually. Once the two sides stay consistent, the observation period begins. At this stage the business still treats the old database as primary and the new one as secondary, but the new database is receiving both reads and writes, so we can observe its metrics from the business perspective, latency, error rate, and so on. If the new database has a problem, it will almost certainly surface at this stage, with no impact on the business. Once it is stable enough, we reverse the double-write, making the new database primary and the old one secondary, and keep running for another period. When we are confident the new database is stable, we remove the double-write entirely.
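The following sketch captures the double-write-plus-reconciliation idea, assuming hypothetical database handles and a configuration switch for which side is primary. It is an illustration of the flow, not our production tool.

```python
# Double-write sketch: a configuration switch decides which database is
# primary; writes go to both sides, and a background job compares rows and
# logs any mismatch for manual analysis.
PRIMARY = "old"   # flipped to "new" once the new database has proven stable

def write(record, old_db, new_db):
    dbs = (old_db, new_db) if PRIMARY == "old" else (new_db, old_db)
    dbs[0].write(record)            # primary write must succeed
    try:
        dbs[1].write(record)        # secondary write; failures are logged,
    except Exception as exc:        # not surfaced to the business
        log_mismatch(record, exc)

def reconcile(old_db, new_db, keys):
    """Asynchronous comparison job: report rows that differ between sides."""
    for k in keys:
        if old_db.read(k) != new_db.read(k):
            log_mismatch(k, "data differs between old and new database")

def log_mismatch(item, reason):
    print(f"[double-write mismatch] {item}: {reason}")
```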

After that comes the assurance stage following go-live. There are two main parts: observability and controllability. Observability ensures that when a fault occurs we can discover and locate it quickly; controllability ensures we can respond and recover quickly after a failure.

Observability is the familiar trio of logging, tracing, and metrics, so I will not go into detail; we have built out all of these, including monitoring of underlying resources, business-layer monitoring, a log collection platform, and a call-chain analysis platform. On the recovery side, past incidents have shown that most of the recovery time is not spent on the moment of actually fixing the fault; far more time goes into communication and division of labor, from the fault occurring, to finding the right people, to synchronizing the relevant information, to deciding what to do.

To address that time distribution, we built a whole emergency-response system. Faults are classified into minor, medium, and major, and managed on a contingency-plan platform. In day-to-day use, O&M only needs to register atomic plans on it, atomic capabilities such as killing SQL or switching a database master to a standby. We integrate these mature atomic capabilities into the platform, compose them into contingency plans for various faults, and, combined with AI, automatically recommend a plan when a fault occurs, with a human making the final judgment.
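A rough sketch of how atomic capabilities might be composed into plans is shown below. The action names and the plan table are invented for illustration, and the action bodies are placeholders rather than real operational tooling.

```python
# Contingency-plan sketch: O&M registers small atomic actions; the platform
# composes them into per-fault plans that can be recommended and executed
# after a human confirms.
ATOMIC_ACTIONS = {
    "kill_slow_sql": lambda ctx: print(f"killing slow SQL on {ctx['instance']}"),
    "switch_master": lambda ctx: print(f"promoting standby of {ctx['instance']}"),
    "degrade_reads": lambda ctx: print(f"degrading non-core reads for {ctx['service']}"),
}

PLANS = {
    "slow_query_storm": ["kill_slow_sql", "degrade_reads"],
    "master_down": ["switch_master"],
}

def execute_plan(fault_type: str, ctx: dict) -> None:
    """Run the pre-arranged plan for a fault once a human has confirmed it."""
    for action in PLANS.get(fault_type, []):
        ATOMIC_ACTIONS[action](ctx)
```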

The recommended plan is pushed to our mobile devices, so all a person may need to do is tap once, and the plan executes automatically. For example, for some of the relatively simple problems we hit during the transformation, we perform degradation: business degradation, request degradation, and degradation of underlying resources. At the technical level we may also do some architectural degradation.

Finally, we always have a last-resort fallback. With either of the switching solutions above, the double-write solution or the middleware-based one, we can guarantee a rollback to the old database within seconds. In the double-write scheme, the primary/secondary choice is a switch stored in a configuration item, so rolling back only requires flipping that switch; in the middleware scheme, we only need to repoint the backend.

Finally, let me summarize our experience with database localization.

4. Summary

First, choose what fits. Most of the time a centralized database is still the optimal solution today. If your data is only a few hundred GB, or even two or three TB, a centralized database is probably still the best choice, because you avoid data sharding and the whole series of problems that come with it, such as costly cross-node synchronization.

Second, there is no silver bullet: do not expect one database to solve every problem. When we used Oracle in the past, Oracle took on far too many things it should not have had to take on. Now, during the transformation, we are gradually fixing that: we use Redis to absorb high-volume traffic, we put ad hoc queries on databases such as ClickHouse, and for large-scale report analysis we synchronize data to our big data platform through the sync tool and run the analysis there.

Third, tear down the walls within operations and break the barriers between technology and the business. Technology only has value when it serves and integrates with the business, and operations can achieve twice the result with half the effort, and deserve to be called real technical operations, only by stepping outside their own frame and looking at problems from a global perspective.
