JD Business System Database Sharding (Sub-Database/Sub-Table) Architecture Design

I was fortunate to participate in the implementation of the entire technical solution and have first-hand knowledge of the architecture design and technical details. Discussion and exchange are welcome!

The Yiyuan Qianbao (One Yuan Treasure Grab) system is an emerging business system of JD Virtual, and its order volume has grown continuously since launch. Two months before 618, the virtual R&D department of JD Mall made an overall assessment of the system: the rapid growth in orders, combined with the approaching 618 promotion, would bring a sharp increase in order volume and inevitably put pressure on database capacity and load. The analysis showed that the database was likely to become the performance bottleneck, so we decided to re-architect the database layer with sharding, to gain the ability to scale the data tier dynamically, meet continuously growing capacity requirements, and improve order-processing efficiency.

1. Business introduction


The figure above shows the Yiyuan Qianbao product details page. As the figure shows, a Yiyuan Qianbao product is an ordinary item, but unlike other JD.com products it carries the concepts of a period, a total number of participants, and a remaining number of participants. Suppose an item has a stock of 100: it is sold over 100 periods, and each period consumes one unit of inventory. When the remaining number of participants reaches 0, the current treasure-grab period ends, the winner is selected by the corresponding algorithm, and the next period begins.

Through this technical transformation we achieved three goals overall: 1. implementation of the underlying routing strategy; 2. migration of historical data; 3. adaptation of the business layer. The process is described in detail below.

2. Database capacity estimation

The first and most important step of sharding is capacity estimation: based on the data volume and business characteristics, estimate the number of machines, databases, and tables, as well as the sharding rules.

Assume 1 million orders per day, which is about 360 million orders per year. Suppose the order table has 10 fields of up to 50 characters each, so one order needs roughly 500 bytes of storage; 360 million orders then require about 170 GB. Assuming each machine provides 200 GB of storage, adding one machine per year meets the capacity requirement. The actual requirement should be determined by stress testing, which also verifies whether other indicators such as QPS and response time meet the demand.
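As a sanity check on the figures above, here is a minimal back-of-the-envelope calculation in Java (the daily volume, field sizes, and per-machine capacity are the assumptions stated above, not measured values):

```java
public class CapacityEstimate {
    public static void main(String[] args) {
        long ordersPerDay = 1_000_000L;           // assumed daily order volume
        long ordersPerYear = ordersPerDay * 360;  // ~360 million orders per year
        int bytesPerOrder = 10 * 50;              // 10 fields x 50 chars ~= 500 bytes
        double gibPerYear = ordersPerYear * (double) bytesPerOrder / (1L << 30);
        // Prints ~167.6, i.e. the "about 170 GB" cited above.
        System.out.printf("Yearly storage: %.1f GiB%n", gibPerYear);
    }
}
```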

3. Underlying routing strategy selection and implementation

The sharding routing strategy is the foundation: it shapes the entire system architecture, and whether later business requirements can be met and whether the system is convenient to use all depend on it. With a reasonable routing design, the upper business layers become very easy to build. In the Yiyuan Qianbao project the routing strategy is implemented in the DAO layer and is transparent to the business layer: callers do not need to care about the specifics, and the routing strategy involves no structural changes, so it does not affect the upper layers.

There are two common sharding strategies:

Hash routing

Advantages: data and hotspots are dispersed evenly.

Disadvantages: adding database nodes changes the routing result, so data must be migrated.

Range routing (incremental interval routing)

Advantages: the strategy supports dynamic expansion and can, in theory, scale out indefinitely.

Disadvantages: it suffers from data hotspots, since the newest table gets the highest read/write frequency, and every query must go through the routing policy table.

Of course, no strategy is perfect; the good strategy is the one that best fits the business scenario. This project uses a combination of the two approaches.

First, the databases are sharded by the hash of the treasure-grab item id, and then the tables are split by treasure-grab period ranges, as shown in the following figure:

The current routing policy table rules are as follows:

Why use this strategy?

The treasure-grab item is the top-level business dimension and can be understood as a commodity; most tables carry this field. Item ids are generated sequentially, so in the long run the data stays balanced after hashing into databases. The treasure-grab period is a dimension under the item: for example, if an item's stock is 100 then, assuming sales never stop, 100 periods will be generated, with only one period on sale at a time. Why choose the period id range as the table routing key rather than, say, the order id? From a pure routing perspective the order id would also work, but in the Yiyuan Qianbao business there is a scenario where order participation records are queried by item id and period id, so orders must be locatable through those two dimensions. In addition, range-based table splitting supports dynamic expansion, and even though every query goes through the routing table, the overhead is negligible because the table is loaded into cache.

Which dimensions can the above strategy route by?

1. Routing by order id: the order number is generated according to rules that embed the database and table information, so the corresponding database and table can be located directly from the order number;

2. Routing by treasure-grab item id and period id: the hash of the item id locates the database, and the period routing policy table locates the table, as the following diagram shows. A minimal code sketch of both routing paths follows.
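To make the two routing paths concrete, here is a minimal sketch. All names (`ShardRouter`, the bit layout of the order id, the in-memory routing table) are hypothetical illustrations of the rules described above, not the project's actual code:

```java
import java.util.TreeMap;

public class ShardRouter {
    private final int dbCount;
    // Start of a period-id range -> table index; loaded from the routing
    // policy table and kept in cache, so lookups are cheap. Assumes the
    // ranges cover every valid period id.
    private final TreeMap<Long, Integer> periodRanges = new TreeMap<>();

    public ShardRouter(int dbCount) { this.dbCount = dbCount; }

    public void addRange(long startPeriodId, int tableIndex) {
        periodRanges.put(startPeriodId, tableIndex);
    }

    // Path 2a: the hash of the treasure-grab item id picks the database.
    public int dbIndex(long itemId) {
        return (Long.hashCode(itemId) & Integer.MAX_VALUE) % dbCount;
    }

    // Path 2b: the period id falls into a range that picks the table.
    public int tableIndex(long periodId) {
        return periodRanges.floorEntry(periodId).getValue();
    }

    // Path 1: the order id embeds db/table info when it is generated
    // (here the low bits hold the indices; the real rule is project-internal).
    public static long newOrderId(long seq, int db, int table) {
        return (seq << 16) | ((long) db << 8) | table;
    }

    public static int dbFromOrderId(long orderId)    { return (int) ((orderId >> 8) & 0xFF); }
    public static int tableFromOrderId(long orderId) { return (int) (orderId & 0xFF); }
}
```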

4. Aggregation queries and aggregated-data synchronization

Sharding brings the problem of aggregation queries. How do we handle it? First look at the following architecture diagram:


The figure above is the data-layer architecture after the transformation. It used to be a single database in master-slave mode; after the transformation there are multiple shard databases plus a base database, and aggregation uses Elasticsearch (hereinafter ES).

Why ES? First, it is simple, convenient, and easy to integrate; second, it supports dynamic expansion and sharding transparently to the business layer. Aggregation queries in the system mainly go through ES. Of course, we also have several degradation schemes, described later. ES cannot serve as the system of record, because it cannot guarantee data integrity 100%, so the data must be backed up: we use an aggregation table that keeps a recent window of data for degradation. Once ES lags or the cluster becomes unavailable, queries are degraded to the aggregation table.

How do we synchronize data into ES? We use canal. Some may ask: why not write to ES directly in the business code at insert time? You could, but there are two problems: first, how to handle synchronization failures and guarantee transactional consistency; second, it couples tightly with the business code, which is, to borrow a phrase, not elegant. With canal the code is decoupled and non-intrusive: canal simulates the database's master-slave replication mechanism and poses as a slave library. When the database binlog changes (we subscribe to a slave library so as not to affect the production master), canal picks up the change, and an analysis service parses and filters the binlog to keep only the logs we need. After parsing, we send an MQ message whose body is just the table name and primary key id, not the whole row. The consumer receives the changed table name and id, queries the latest row from the database in real time, and writes it to ES and the aggregation tables.

Why go through MQ? The same two points apply. First, messages support retry on failure: if a write fails we throw an exception, and the message is processed again later. Second, the systems are decoupled. Careful readers will notice that one message queue is subscribed by multiple consumers (think of it as each consumer having a mirrored queue). This keeps the sinks from interfering with each other: if a single subscriber handled everything and the ES write failed while the two aggregation writes succeeded, throwing an exception would cause the two already-successful aggregation writes to be performed again on the next consumption.
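A minimal sketch of such a consumer follows. The DAO and ES client interfaces are hypothetical; the point is the pattern of re-reading the row by primary key and throwing on failure so that the MQ redelivers the message:

```java
// The MQ message carries only the table name and the primary key id.
public class EsSyncConsumer {
    private final OrderDao orderDao; // reads the latest row from the shard databases
    private final EsClient esClient; // writes the document into Elasticsearch

    public EsSyncConsumer(OrderDao orderDao, EsClient esClient) {
        this.orderDao = orderDao;
        this.esClient = esClient;
    }

    public void onMessage(String table, long id) {
        // Re-read the freshest data instead of trusting a stale message body.
        Order order = orderDao.findById(table, id);
        if (order == null) {
            return; // the row was deleted in the meantime; nothing to index
        }
        try {
            esClient.upsert("order_index", String.valueOf(id), order);
        } catch (Exception e) {
            // Throwing makes the MQ redeliver the message: at-least-once sync.
            throw new RuntimeException("ES sync failed, will retry", e);
        }
    }
}

interface OrderDao { Order findById(String table, long id); }
interface EsClient { void upsert(String index, String docId, Object doc); }
class Order { /* fields omitted */ }
```

The aggregation-table consumer follows the same pattern on its own queue, which is why a failure in one sink never blocks or duplicates writes in the others.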

The above is the design of our aggregation and its synchronization. On the query side, some services first check the cache and query ES only on a miss; the database is queried only under degradation, and normal aggregation queries never reach the database.
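The read path can therefore be summarized as cache first, then ES, then the aggregation table when degraded. A minimal sketch, with all interface names hypothetical:

```java
public class AggQueryService {
    private final Cache cache;
    private final EsQuery es;
    private final AggTableDao aggTable;          // degradation backup in the database
    private volatile boolean esSwitchOn = true;  // turned off when ES is unavailable

    public AggQueryService(Cache cache, EsQuery es, AggTableDao aggTable) {
        this.cache = cache; this.es = es; this.aggTable = aggTable;
    }

    public AggResult query(String key) {
        AggResult r = cache.get(key);
        if (r != null) return r;          // 1. cache first
        if (esSwitchOn) {
            try {
                return es.search(key);    // 2. normal path: ES
            } catch (Exception e) {
                // fall through to the aggregation table
            }
        }
        return aggTable.query(key);       // 3. degraded path: aggregation table
    }

    public void setEsSwitchOn(boolean on) { this.esSwitchOn = on; }
}

interface Cache { AggResult get(String key); }
interface EsQuery { AggResult search(String key); }
interface AggTableDao { AggResult query(String key); }
class AggResult { /* fields omitted */ }
```

The `esSwitchOn` flag is the same kind of switch described in section 6 for the ES-unavailable degradation.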

5. Historical data migration

Since the system went live with a single database and the sharding was a technical transformation carried out a few months after launch, the existing data had to be migrated. The main migration steps are as follows:


In the first half, new code scans the old data and writes it into the shard databases; from canal onward, synchronization into ES and the aggregation tables reuses the logic described above. This design reduced our overall workload and guaranteed the integrity of the migrated data.

The specific migration details are as follows:


As the figure shows, the migration has two phases: before and after the cutover downtime. Before the downtime, historical data is migrated, and repeated migration is supported; after the downtime, only the incremental part is migrated, which greatly shortened our release window, since only a small amount of data had to be moved after the outage.

Migration involves data verification, and the verification logic is relatively simple as a whole:


Three dimensions are compared against the base database; if any of them differ, the data for that day is migrated again.
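A minimal sketch of this day-by-day check; the article does not name the three comparison dimensions, so the ones used here (row count, amount sum, max order id per day) are purely illustrative:

```java
import java.time.LocalDate;

public class MigrationVerifier {
    private final StatsDao base;     // the base (pre-sharding) database
    private final StatsDao sharded;  // the shard databases, aggregated

    public MigrationVerifier(StatsDao base, StatsDao sharded) {
        this.base = base; this.sharded = sharded;
    }

    // Compare one day across the chosen dimensions; re-migrate that day on mismatch.
    public void verifyDay(LocalDate day, Migrator migrator) {
        boolean ok = base.orderCount(day) == sharded.orderCount(day)
                  && base.amountSum(day) == sharded.amountSum(day)
                  && base.maxOrderId(day) == sharded.maxOrderId(day);
        if (!ok) {
            migrator.remigrate(day); // idempotent: repeated migration is supported
        }
    }
}

interface StatsDao {
    long orderCount(LocalDate day);
    long amountSum(LocalDate day);
    long maxOrderId(LocalDate day);
}
interface Migrator { void remigrate(LocalDate day); }
```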

6. Degradation of key system nodes

This part is also very important. Our degradation covers two main points: canal synchronization delay, and ES unavailability. The first works as follows:


If canal synchronization is delayed, or the slave library goes down, we turn on a switch that scans the main database's data from the last few hours and synchronizes it directly into ES and the aggregation tables; this way, even if the slave library dies, business data is unaffected. This is very important, and we have hit exactly this case in production.
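A minimal sketch of that compensating job, assuming a hypothetical update-time-based scan of the master:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

public class CanalDelayFallback {
    private final OrderScanDao masterDb;   // reads from the MASTER, bypassing the slave
    private final SyncSink sink;           // the normal ES/aggregation-table write path
    private volatile boolean switchOn;     // degradation switch

    public CanalDelayFallback(OrderScanDao masterDb, SyncSink sink) {
        this.masterDb = masterDb; this.sink = sink;
    }

    // Scan rows modified in the last few hours and push them through the sync path.
    public void run() {
        if (!switchOn) return;
        Instant since = Instant.now().minus(Duration.ofHours(3)); // "last few hours"
        for (long id : masterDb.idsUpdatedSince("orders", since)) {
            sink.sync("orders", id);
        }
    }

    public void setSwitchOn(boolean on) { this.switchOn = on; }
}

interface OrderScanDao { List<Long> idsUpdatedSince(String table, Instant since); }
interface SyncSink { void sync(String table, long id); }
```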

The second is ES degradation: when ES is unavailable, we turn off the ES switch and query the aggregation table directly.

7. Summary

From the design to the final delivery of a system, everything depends on the whole team: everyone's ideas, the collision of different viewpoints, and their dedication. A reasonable and meticulous early-stage design is especially important, with a detailed plan for each time point, the concrete launch steps, and a rollback plan; in addition, meticulous and in-depth testing, in the test environment and through multiple rounds of online testing and regression, is an important guarantee of a smooth launch.

The above is the main line of thought behind the sharding of the JD Yiyuan Qianbao project. I hope readers with similar ideas will get in touch for in-depth exchanges, so we can improve each other's system architectures.

Source: blog.csdn.net/dalinsi/article/details/131105972