How to use MySQL sub-databases and sub-tables for hundreds of millions of orders: a design approach

1. Background

As the company's business grows, order volume can exceed 10 million per day, which adds up to roughly 1 billion orders in three months (10 million × 90 days = 900 million). A single database with a single table can no longer meet business needs, so the database architecture has to be reworked.

2. How to divide order data

We can divide order data into two types: hot data and cold data.

  • Hot data: orders from the last 3 months, which are queried frequently and need real-time responses;
  • Cold data A: orders from 3 to 12 months ago, which are queried infrequently;
  • Cold data B: orders more than 1 year old, which are almost never queried and only occasionally needed.

You may wonder why cold data is split into two categories. In practice, users almost never look up orders that are more than a year old. Keeping that data in the database would be expensive and hard to maintain. The few users who do need order information from more than a year ago can retrieve it through an offline query.

For the storage of these three types of data, the current planning is as follows:

  • Hot data: stored in MySQL, split into sub-databases and sub-tables;
  • Cold data A: stored in ES (Elasticsearch), whose search-engine features keep queries reasonably fast;
  • Cold data B: stored in Hive, since it is rarely queried.
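The three tiers above can be sketched as a small Python function. This is only an illustration: the function name, the tier labels, and the approximation of "3 months" as 90 days and "1 year" as 365 days are assumptions, not part of the original design.

```python
from datetime import date, timedelta

# Assumed cut-offs: "3 months" is approximated as 90 days, "1 year" as 365 days.
HOT_WINDOW = timedelta(days=90)
COLD_A_WINDOW = timedelta(days=365)


def storage_tier(order_date: date, today: date) -> str:
    """Return the store that should hold an order created on order_date."""
    age = today - order_date
    if age <= HOT_WINDOW:
        return "mysql"  # hot data
    if age <= COLD_A_WINDOW:
        return "es"     # cold data A
    return "hive"       # cold data B


today = date(2024, 1, 1)
print(storage_tier(date(2023, 12, 1), today))  # mysql
print(storage_tier(date(2023, 6, 1), today))   # es
print(storage_tier(date(2022, 6, 1), today))   # hive
```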

3. How to split databases and tables in MySQL

3.1. Split by business

In the early stage of a business, many applications adopt a centralized architecture to launch quickly and iterate fast. As the business system expands, however, it becomes more complex and harder to maintain, development efficiency drops, resource consumption grows, and improving performance through hardware alone becomes more and more expensive.

A typical e-commerce platform includes several major modules such as users, products, and orders. The simplest approach is to create four tables in the same database, as shown in the following figure:

As the business grows, maintaining every service in one database becomes increasingly difficult. We therefore recommend placing different services in different databases, as shown in the following figure:

As the figure shows, placing different services in different databases spreads the load that used to hit one database across several, which improves the throughput of the system.
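A minimal Python sketch of splitting by business: each module is mapped to its own database. The module keys and the database names are hypothetical and only illustrate the idea.

```python
# Hypothetical mapping from business module to its own database.
SERVICE_DATABASES = {
    "user": "user_db",
    "product": "product_db",
    "order": "order_db",
}


def database_for(service: str) -> str:
    """Return the database that holds the tables of the given module."""
    try:
        return SERVICE_DATABASES[service]
    except KeyError:
        raise ValueError(f"unknown service: {service}") from None


print(database_for("order"))  # order_db
```

Each service then connects only to its own database, so a heavy query in one module no longer competes with the others for the same database.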

3.2. Sub-database and sub-table

Every machine has a physical upper limit, no matter how well it is configured. Once our application reaches or exceeds the limit of a single machine, we must either bring in more machines or keep upgrading the hardware. The common solution is to spread the load by adding machines.

We also have to consider whether our machines can keep up with demand through linear scaling as the business continues to grow. Splitting the data into sub-databases and sub-tables can improve system performance right away. The other reasons for this approach are not repeated here; the rest of this section covers the concrete implementation strategies.

(1) Sub-table strategy

Take the order table as an example. The order id in the order table is never duplicated, so it is well suited as the shard key; other tables can be handled similarly. Suppose the fields of the order table are as follows:

create table `order` (
    order_id bigint(11),
    ...

Suppose we estimate that a single database needs 100 tables to meet our business needs. We can then take a simple modulo to find the sub-table that holds an order, for example: order_id % 100.
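The modulo rule can be sketched in Python as follows. The constant name and the "order_<n>" table naming scheme are assumptions made for illustration.

```python
TABLES_PER_DB = 100  # number of sub-tables assumed in the text


def order_table(order_id: int) -> str:
    """Return the sub-table name for an order (the naming scheme is hypothetical)."""
    return f"order_{order_id % TABLES_PER_DB}"


print(order_table(1001))  # order_1
print(order_table(200))   # order_0
```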

You might ask: if the table is split by order_id but I want to query orders by user_id, how do I know which sub-table to look in? You can't. Once the shard key is chosen, only a query that includes the shard key can be routed to a single sub-table. If you really need to query orders by user_id, set the shard key to user_id and change the sub-table rule accordingly: user_id % 100.
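To make the trade-off concrete, here is a rough Python sketch that lists which sub-tables a query has to visit. The function name and the dictionary-based filter are hypothetical; the point is only that a query without the shard key must fan out to every sub-table.

```python
TABLE_COUNT = 100


def tables_for_query(shard_key: str, filters: dict) -> list:
    """Return the sub-table indexes a query must visit."""
    if shard_key in filters:
        # The filter contains the shard key: exactly one sub-table is involved.
        return [filters[shard_key] % TABLE_COUNT]
    # No shard key in the filter: every sub-table has to be scanned.
    return list(range(TABLE_COUNT))


# Sharded by order_id, queried by user_id: all 100 sub-tables.
print(len(tables_for_query("order_id", {"user_id": 42})))  # 100
# Sharded by user_id, queried by user_id: a single sub-table.
print(tables_for_query("user_id", {"user_id": 42}))        # [42]
```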

(2) Sub-database strategy

Splitting a table solves the query efficiency problem when a single table holds too much data, but it does not improve the database's concurrency. All sub-tables still live in one database, so operations are still bound by that database's IO performance limit.

How can the IO load be spread evenly? Splitting the data across multiple databases resolves the performance bottleneck of a single database.

The implementation of the sub-database strategy is very similar to the implementation of the sub-table strategy. The simplest way is to route by modulo.

Take the order table as an example again:

order_id % number of databases

If order_id is not an integer, hash it first and then take the modulo:

hash(order_id) % number of databases
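A hedged Python sketch of this routing rule follows. The database count of 10 is an assumption, and CRC32 from the standard zlib module is just one possible stable hash; the original does not say which hash function to use. Python's built-in hash() is avoided because it is randomized per process for strings, which would route the same id differently after a restart.

```python
import zlib

DB_COUNT = 10  # assumed number of databases


def database_index(order_id) -> int:
    """Return the index of the database that holds the order."""
    if isinstance(order_id, int):
        return order_id % DB_COUNT
    # Non-integer ids: use a hash that is stable across processes.
    return zlib.crc32(str(order_id).encode("utf-8")) % DB_COUNT


print(database_index(1001))  # 1
```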

(3) Combining sub-database and sub-table

Splitting tables solves the query performance problem of a single table with massive data, and splitting databases relieves the concurrent access pressure on a single database. Sometimes both problems have to be addressed at once. In that case a single table is split across both tables and databases, which expands the concurrent processing capacity of the system and improves single-table query performance at the same time.

When sub-databases and sub-tables are combined, a plain modulo on order_id is no longer enough. An intermediate variable is needed to spread the data across the sub-tables of the different databases. The formula is as follows:

intermediate variable = shard key % (number of databases * number of tables in a single database)
database index = floor(intermediate variable / number of tables in a single database)
table index = intermediate variable % number of tables in a single database

For example, with 10 databases, each holding 100 tables, and order_id = 1001, the routing strategy above gives:

intermediate variable = 1001 % (10 * 100) = 1
database index = floor(1 / 100) = 0
table index = 1 % 100 = 1

So order_id = 1001 is routed to the second table of the first database (indexes start at 0, so index 0 is the first).
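The combined routing rule can be sketched in Python. The sketch assumes that the table index is the intermediate variable modulo the number of tables in a single database, which is consistent with the order_id = 1001 result above; the function and parameter names are hypothetical.

```python
from typing import Tuple


def route(shard_key: int, db_count: int, tables_per_db: int) -> Tuple[int, int]:
    """Return (database index, table index), both starting at 0."""
    intermediate = shard_key % (db_count * tables_per_db)
    db_index = intermediate // tables_per_db    # integer division rounds down
    table_index = intermediate % tables_per_db
    return db_index, table_index


print(route(1001, 10, 100))  # (0, 1): first database, second table
print(route(1999, 10, 100))  # (9, 99): last database, last table
```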

4. Overall architecture design

As the figure shows, requests are divided into reads and writes. Write requests are relatively simple: they are written to the db according to the sub-database and sub-table rules.

For a read request, we need to work out whether the query targets hot or cold data. An order_id is typically generated as "area code of the merchant + timestamp + random number", so the timestamp tells us whether the data is hot or cold. (The details depend on the specific business and are not elaborated here.)

The cold data in the architecture diagram refers to data from 3 to 12 months ago. To query data more than one year old, it is recommended to query Hive offline directly.

The figure also contains a scheduled job that migrates order data periodically: cold data is migrated to ES and Hive respectively.
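As a rough illustration of reading the timestamp out of an order_id, the Python sketch below assumes a concrete layout that the original does not specify: a 4-digit area code, followed by a 10-digit Unix timestamp in seconds, followed by a random number. It also approximates the 3-month hot window as 90 days. All names and lengths are assumptions.

```python
from datetime import datetime, timedelta, timezone

AREA_CODE_LEN = 4   # assumed length of the merchant area code
TIMESTAMP_LEN = 10  # assumed: Unix timestamp in seconds
HOT_WINDOW = timedelta(days=90)  # "3 months" approximated as 90 days


def order_created_at(order_id: str) -> datetime:
    """Extract the creation time embedded in the order_id."""
    raw = order_id[AREA_CODE_LEN:AREA_CODE_LEN + TIMESTAMP_LEN]
    return datetime.fromtimestamp(int(raw), tz=timezone.utc)


def is_hot(order_id: str, now: datetime) -> bool:
    """True if the order is within the hot window and should be read from MySQL."""
    return now - order_created_at(order_id) <= HOT_WINDOW


order_id = "0021" + "1700000000" + "1234"  # timestamp is 2023-11-14 UTC
print(is_hot(order_id, datetime(2023, 12, 1, tzinfo=timezone.utc)))  # True
print(is_hot(order_id, datetime(2024, 6, 1, tzinfo=timezone.utc)))   # False
```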
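The selection logic of such a job might look like the Python sketch below. It works on in-memory tuples rather than real MySQL, ES, or Hive clients. It also assumes that orders older than 90 days move from MySQL to ES and that orders older than 365 days move from ES to Hive; the original does not describe the exact flow, and all names are hypothetical.

```python
from datetime import date, timedelta


def migration_cutoffs(today: date) -> dict:
    """Orders created before these dates are due to move to the next store."""
    return {
        "mysql_to_es": today - timedelta(days=90),   # assumed 3-month boundary
        "es_to_hive": today - timedelta(days=365),   # assumed 1-year boundary
    }


def plan_migration(orders, today: date) -> list:
    """orders: iterable of (order_id, created_on, current_store) tuples."""
    cut = migration_cutoffs(today)
    moves = []
    for order_id, created_on, store in orders:
        if store == "mysql" and created_on < cut["mysql_to_es"]:
            moves.append((order_id, "mysql", "es"))
        elif store == "es" and created_on < cut["es_to_hive"]:
            moves.append((order_id, "es", "hive"))
    return moves


orders = [
    (1, date(2023, 12, 1), "mysql"),  # still hot, stays in MySQL
    (2, date(2023, 9, 1), "mysql"),   # older than 90 days
    (3, date(2022, 12, 1), "es"),     # older than 365 days
]
print(plan_migration(orders, date(2024, 1, 1)))
# [(2, 'mysql', 'es'), (3, 'es', 'hive')]
```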


Author: ~ Ou Dad ~
Link: https://juejin.cn/post/6844903683046522887
Source: Juejin
The copyright belongs to the author. For commercial reprints, please contact the author for authorization, and for non-commercial reprints, please indicate the source.
