A Brief Look at Splitting Databases and Tables (Sub-databases and Sub-tables)

Splitting a table into sub-tables addresses the query performance problem caused by a massive amount of data in a single table; splitting a database into sub-databases addresses the concurrent access pressure on a single database.

There are two options for splitting tables:

1. Sub-tables in the same database: all sub-tables live in one database. Because table names within a database must be unique, each sub-table has to be given a different name.

   ● Advantages: because everything is in one database, common tables do not need to be copied, and processing is simple;

   ● Disadvantage: because the data is still concentrated in one database, bottlenecks in CPU, memory, file I/O, and network I/O are not relieved; only the number of records per table is reduced.

   Note: the differing table names also make subsequent processing more complicated.

2. Sub-tables in different databases: because the sub-tables live in different databases, they can all use the same table name.

   ● Advantages: bottlenecks in CPU, memory, file I/O, and network I/O can be effectively relieved; the table names are identical, so processing is relatively simple;

   ● Disadvantage: common tables must be replicated and kept in sync, because they are used alongside every sub-table.

Implementing a table-splitting strategy (a simple implementation based on user ID):

Most database designs and business operations are tied to the user's ID, so a table-splitting strategy can be built on the user ID. (There are many table-splitting strategies, of course; this is just one of the simpler ones.)

Suppose an e-commerce platform has a table named order that stores users' order data, created with the following SQL:

CREATE TABLE `order` (
  `order_id` bigint(32) primary key auto_increment,
  `user_id` bigint(32),
   ...
)

When the amount of data grows large, the table is split. The first step is to decide how many tables the data should be evenly distributed across, that is, the table capacity.

Suppose 100 tables are used. When storing or querying data, we first take the user ID modulo 100 (user_id % 100) and then operate on the table that corresponds to the result.

For example, when user_id = 101, we can fetch the data with the following SQL statement:

select * from order_1 where user_id = 101

Here, order_1 is the sub-table selected by the calculation 101 % 100 = 1.
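
The modulo routing described above can be sketched in Java roughly as follows. This is only an illustration: the class name OrderTableRouter and the constant TABLE_COUNT are made up for this example, and the table count of 100 is taken from the scenario above.

```java
/** Minimal sketch: pick a sub-table for a user by taking user_id modulo the table count. */
final class OrderTableRouter {
    // Assumed table count, matching the 100-table example in the text.
    static final int TABLE_COUNT = 100;

    /** Returns the sub-table number in the range [0, TABLE_COUNT). */
    static int tableNum(long userId) {
        // floorMod keeps the result non-negative even if an ID were negative.
        return (int) Math.floorMod(userId, (long) TABLE_COUNT);
    }

    /** Builds the physical table name, for example order_1. */
    static String tableName(long userId) {
        return "order_" + tableNum(userId);
    }

    public static void main(String[] args) {
        System.out.println(tableName(101L)); // prints order_1
    }
}
```

Note that with this rule the sub-tables are numbered order_0 through order_99, so user IDs that are exact multiples of 100 land in order_0.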

MyBatis support for sub-tables:

Interface definition:

/**
 * Gets the order details for a user.
 * @param tableNum number of the specific sub-table
 * @param userId user ID
 * @return list of orders
 */
public List<Order> getOrder(@Param("tableNum") int tableNum, @Param("userId") int userId);

XML mapper file:

<select id="getOrder" resultMap="BaseResultMap">
    select * from order_${tableNum}
    where user_id = #{userId}
</select>

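The ${tableNum} syntax tells MyBatis to substitute the parameter value directly into the SQL text, which is what allows the table name to vary; #{userId}, by contrast, is bound as a prepared-statement parameter. Because ${} is plain text substitution, it should only ever receive trusted values, such as the integer used here.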
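
A caller would typically compute tableNum before invoking the mapper. The sketch below shows one way this might look. To stay self-contained it does not use MyBatis itself: OrderMapperStub is a hypothetical stand-in for the mapper interface above (returning strings instead of Order objects), and OrderService and TABLE_COUNT are likewise names invented for this example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the MyBatis mapper above; in a real project MyBatis supplies the implementation. */
interface OrderMapperStub {
    List<String> getOrder(int tableNum, int userId);
}

/** Hypothetical service that hides the table number from its callers. */
final class OrderService {
    private static final int TABLE_COUNT = 100; // assumed, as in the 100-table example
    private final OrderMapperStub mapper;

    OrderService(OrderMapperStub mapper) {
        this.mapper = mapper;
    }

    /** Computes the sub-table from the user ID, then delegates to the mapper. */
    List<String> ordersFor(int userId) {
        int tableNum = Math.floorMod(userId, TABLE_COUNT);
        return mapper.getOrder(tableNum, userId);
    }

    public static void main(String[] args) {
        // A fake mapper that only reports which statement it would run.
        OrderMapperStub fake = (tableNum, userId) ->
                Arrays.asList("select * from order_" + tableNum + " where user_id = " + userId);
        System.out.println(new OrderService(fake).ordersFor(101));
    }
}
```

Keeping the modulo calculation in one place like this means the routing rule can be changed without touching every call site.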

Note: In actual development, user IDs are more likely to be generated as UUIDs. In that case, we can first hash the UUID to obtain an integer value and then perform the modulo operation.
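
A minimal sketch of the hash-then-modulo idea for string IDs, assuming String.hashCode() is an acceptable hash for the purpose (a real system might prefer a different hash function). UuidTableRouter and TABLE_COUNT are names invented for this example.

```java
import java.util.UUID;

/** Minimal sketch: route a UUID-style string ID to a sub-table by hashing it first. */
final class UuidTableRouter {
    static final int TABLE_COUNT = 100; // assumed table count

    static int tableNum(String userId) {
        // hashCode() can be negative, so floorMod is used instead of the % operator
        // to guarantee a result in [0, TABLE_COUNT).
        return Math.floorMod(userId.hashCode(), TABLE_COUNT);
    }

    public static void main(String[] args) {
        String userId = UUID.randomUUID().toString();
        System.out.println(userId + " -> order_" + tableNum(userId));
    }
}
```

String.hashCode() is specified by the Java API, so the same ID always maps to the same table across JVMs. If services written in other languages must compute the same route, they would need to use an identical hash function.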

Sub-database strategy:

Splitting tables solves the query efficiency problem when a single table holds a large amount of data, but it does not improve the database's efficiency under concurrent operations, because every operation still runs against one database and is therefore bounded by that database's I/O performance.

To spread the I/O load evenly, the obvious answer is to divide the data across several databases, which relieves the performance limit of a single database.

The sub-database strategy is implemented in much the same way as the sub-table strategy; the simplest method is again to take a modulo.
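
As a rough sketch, choosing a database by modulo might look like the following. The database names are hypothetical placeholders; real code would more likely hold DataSource objects or JDBC URLs, and DatabaseRouter is a name invented for this example.

```java
import java.util.Arrays;
import java.util.List;

/** Minimal sketch: choose one of several databases by taking user_id modulo the database count. */
final class DatabaseRouter {
    // Hypothetical database names standing in for real connections.
    private final List<String> databases;

    DatabaseRouter(List<String> databases) {
        this.databases = databases;
    }

    String databaseFor(long userId) {
        int index = (int) Math.floorMod(userId, (long) databases.size());
        return databases.get(index);
    }

    public static void main(String[] args) {
        DatabaseRouter router = new DatabaseRouter(
                Arrays.asList("order_db_0", "order_db_1", "order_db_2", "order_db_3"));
        System.out.println(router.databaseFor(101L)); // prints order_db_1, since 101 % 4 = 1
    }
}
```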


Combined sub-database and sub-table strategy:

As described above, splitting tables solves the query performance problem of massive data in a single table, and splitting databases solves the concurrent access pressure on a single database.

Sometimes both problems have to be addressed at once. In that case a single logical table is split across both tables and databases, which expands the system's concurrent processing capacity and improves per-table query performance at the same time. This is what combining sub-databases and sub-tables means.

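This combined strategy is more complicated than the previous two. A common routing strategy is as follows:

1. intermediate value = user_id % (number of databases * tables per database);
2. database index = floor(intermediate value / tables per database);
3. table index = intermediate value % tables per database;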

For example, with 256 databases, 1024 tables in each database, and user_id = 262145, the routing strategy above gives:

1. intermediate value = 262145 % (256 * 1024) = 1;
2. database index = floor(1 / 1024) = 0;
3. table index = 1 % 1024 = 1;

So user_id = 262145 is routed to table 1 of database 0.
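
The three routing steps above translate fairly directly into code. The following is a sketch under the same assumptions as the worked example (256 databases, 1024 tables each); ShardRouter and Target are names made up for illustration.

```java
/** Minimal sketch of the three-step routing rule for combined database and table splitting. */
final class ShardRouter {
    /** Result of routing: which database and which table inside it. */
    static final class Target {
        final int dbIndex;
        final int tableIndex;

        Target(int dbIndex, int tableIndex) {
            this.dbIndex = dbIndex;
            this.tableIndex = tableIndex;
        }
    }

    private final int dbCount;
    private final int tablesPerDb;

    ShardRouter(int dbCount, int tablesPerDb) {
        this.dbCount = dbCount;
        this.tablesPerDb = tablesPerDb;
    }

    Target route(long userId) {
        // Step 1: intermediate value = user_id % (databases * tables per database)
        long intermediate = Math.floorMod(userId, (long) dbCount * tablesPerDb);
        // Step 2: database index = floor(intermediate / tables per database)
        int dbIndex = (int) (intermediate / tablesPerDb);
        // Step 3: table index = intermediate % tables per database
        int tableIndex = (int) (intermediate % tablesPerDb);
        return new Target(dbIndex, tableIndex);
    }

    public static void main(String[] args) {
        Target t = new ShardRouter(256, 1024).route(262145L);
        System.out.println("db=" + t.dbIndex + ", table=" + t.tableIndex); // prints db=0, table=1
    }
}
```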


Read-write separation strategy:

For storing and accessing massive amounts of data, processing capacity can also be improved by separating database reads from writes: read operations are offloaded to other databases. This way, at the cost of replicating the data, the processing load is spread across multiple databases, which greatly improves data processing capacity.
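
One simple way to picture read-write separation is a router that sends writes to a primary database and rotates reads across replicas. The sketch below assumes round-robin selection, which is only one of several possible policies; the class name and database names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal sketch: writes go to the primary, reads rotate across replicas. */
final class ReadWriteRouter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = replicas;
    }

    /** All writes must go to the primary so that replication has a single source. */
    String forWrite() {
        return primary;
    }

    /** Reads are spread over the replicas; falls back to the primary if there are none. */
    String forRead() {
        if (replicas.isEmpty()) {
            return primary;
        }
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter("primary", Arrays.asList("replica-1", "replica-2"));
        System.out.println(router.forWrite()); // prints primary
        System.out.println(router.forRead());  // prints replica-1
        System.out.println(router.forRead());  // prints replica-2
    }
}
```

Because replicas receive data through replication, a read issued right after a write may not yet see that write; this is part of the replication cost mentioned above.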

Summary:

There are many possible strategies for splitting databases and tables; the one above, based on the user ID, is among the simpler ones. Other methods include partitioning by number ranges or routing directly by hash. Interested readers can look into them on their own.

As mentioned above, if user IDs are generated as UUIDs, we need to hash them first and then take the modulo. Hashing is in fact a splitting strategy in its own right, and when it is used we should understand its advantages and disadvantages. The main advantage is that data is distributed evenly; the main disadvantage is that data has to be migrated when the number of databases or tables changes.

After splitting databases and tables as described, query performance and concurrency both improve, but several things still need attention. What used to be cross-table transactions become distributed transactions. Because records are spread across different databases and tables, multi-table join queries are difficult, and data cannot be queried without specifying the routing field. If the system later needs to be expanded further (that is, the routing strategy changes), this becomes very inconvenient, because the data must be migrated again.
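
To make the migration cost concrete, the following sketch counts how many user IDs would land in a different table when the table count of a plain modulo rule changes. It assumes sequential integer IDs starting at 0; ReshardCost and movedCount are names invented for this illustration.

```java
/** Minimal sketch: count how many IDs change tables when the modulo divisor changes. */
final class ReshardCost {
    static int movedCount(int oldTables, int newTables, int userCount) {
        int moved = 0;
        for (int userId = 0; userId < userCount; userId++) {
            // A row must be migrated if its table number differs under the new rule.
            if (userId % oldTables != userId % newTables) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println(movedCount(100, 200, 10_000)); // prints 5000
        System.out.println(movedCount(100, 101, 10_000)); // prints 9900
    }
}
```

In this example, doubling the table count moves half of the rows, while adding a single table moves nearly all of them, which is one reason the table capacity is worth choosing carefully up front.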

Finally, note that there are many middleware options for splitting databases and tables, one of the most common being Taobao's Cobar; in addition, Spring can be used to implement database read-write separation.

Note: This article draws on many references and is intended only as a personal summary. If it helps anyone, so much the better.
