The problem of splitting a single table within a database

The focus of this issue: splitting tables within a single database

In this issue, several projects split a table into multiple tables within a single database. The typical approach is:

A table (table A) is expected to hold tens of millions or even hundreds of millions of rows, so it is split into several independent tables within the same database: A_1, A_2, A_3, …, A_n, distinguished by "table name + suffix".
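
To make the cost of this approach concrete, below is a minimal sketch (not any project's actual code) of the routing logic such a split forces into the application. The shard count of 16, the `id` routing key, and the DB-API style connection (e.g. pymysql) are assumptions for illustration only.

```python
SHARD_COUNT = 16  # assumed number of sub-tables A_1 .. A_16

def table_for(key: int) -> str:
    """Map a record key to its sub-table name ("table name + suffix")."""
    return f"A_{key % SHARD_COUNT + 1}"

def get_by_key(conn, key: int):
    # Even a single-row lookup must first resolve the physical table name.
    with conn.cursor() as cur:
        cur.execute(f"SELECT * FROM {table_for(key)} WHERE id = %s", (key,))
        return cur.fetchone()

def query_all(conn, where_sql: str, params: tuple):
    # Queries that do not carry the routing key must fan out to every sub-table
    # and merge (and, if needed, re-sort) the results in the application.
    rows = []
    with conn.cursor() as cur:
        for i in range(1, SHARD_COUNT + 1):
            cur.execute(f"SELECT * FROM A_{i} WHERE {where_sql}", params)
            rows.extend(cur.fetchall())
    return rows
```

Every query path, every future schema change, and every later migration has to go through this extra routing layer.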

 

In the vast majority of cases, this is not an appropriate practice. The following table shows how splitting a table within a single database affects some key indicators:

| Aspect | Splitting within a single database, compared with a single table |
| --- | --- |
| Application development difficulty | Higher: data routing, cross-table queries, sorting, etc. must be handled by the application |
| Difficulty of splitting out a database later | Higher: data must be migrated across multiple tables and the application changed accordingly |
| Difficulty of table structure changes | Higher: besides being time-consuming, every sub-table must be altered |
| Query performance | With indexes there is no obvious difference from a single table; a full table scan, if run concurrently across the sub-tables, is faster than on a single table |
| Write performance | No significant difference from a single table |
| Concurrency | No effect on row locks; reduces table-lock conflicts, but updates by primary key do not take table locks anyway |

 

As the table shows, the benefits of splitting a table within a single database are very small, and they are rarely exercised in our business scenarios, while the split brings many difficulties in development, operations, and maintenance.

Based on these considerations, we do not support splitting tables within a single database.

 

How should the following situations be handled with a single table?

1. Large number of records

In general, we do not recommend letting a single table grow beyond 10 million rows. If the business produces that much data, the first thing to consider is reducing the data volume through archiving.

Archiving effectively separates hot and cold data; at the same time, the warm data can be designed to be served under a lower SLA.
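
As a concrete illustration, here is a minimal archiving sketch. The `orders` and `orders_archive` tables (identical schemas, indexed `created_at` column) and the pymysql connection are assumptions for the example; cold rows are moved in small batches to keep transactions short.

```python
import pymysql  # assumed driver; any DB-API connection using %s placeholders works

# conn = pymysql.connect(host="db-host", user="app", password="***", database="shop")

BATCH_SIZE = 1000

def archive_old_orders(conn, cutoff):
    """Move rows older than `cutoff` from `orders` to `orders_archive` in batches."""
    with conn.cursor() as cur:
        while True:
            # Pick one batch of cold rows by primary key.
            cur.execute(
                "SELECT id FROM orders WHERE created_at < %s ORDER BY id LIMIT %s",
                (cutoff, BATCH_SIZE),
            )
            ids = [row[0] for row in cur.fetchall()]
            if not ids:
                break
            placeholders = ",".join(["%s"] * len(ids))
            # Copy the batch into the archive table, then delete it from the hot table.
            cur.execute(
                f"INSERT INTO orders_archive SELECT * FROM orders WHERE id IN ({placeholders})",
                ids,
            )
            cur.execute(f"DELETE FROM orders WHERE id IN ({placeholders})", ids)
            conn.commit()
```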

 

Data that cannot be archived is mostly master data, such as users and products. Whether to use read-write separation or a horizontal split can be decided based on TPS, QPS, and the ratio between them, the application's data consistency requirements, and the disk space involved.

However, master data is usually shared data with high access volume, and normally its performance should be guaranteed by splitting it out into its own database.

 

2. Slow queries

Slow queries should not be attributed, as a matter of course, to the table simply having too many records; there is no necessary relationship between the record count and slow queries. Common causes of slow queries are:

- The index is not used correctly or has become ineffective, resulting in a full table scan
- The table design is unreasonable and forces multi-table joins
- The SQL itself is poorly written and produces large intermediate result sets
- The result set returned by the SQL is too large

 

Therefore, slow queries should be solved by focusing on table design and on how the SQL is written; splitting the table within the database is not a good solution.
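
For example, the first cause above often comes from a predicate that wraps an indexed column in a function. The sketch below is illustrative only; the `orders` table, its index on `created_at`, and the pymysql-style connection are assumptions.

```python
def explain(conn, sql: str, params: tuple = ()):
    """Run EXPLAIN on a query to check whether MySQL can use an index."""
    with conn.cursor() as cur:
        cur.execute("EXPLAIN " + sql, params)
        return cur.fetchall()

# Wrapping the indexed column in a function defeats the index -> full table scan.
slow_sql = "SELECT * FROM orders WHERE DATE(created_at) = %s"

# Rewriting the predicate as a range keeps it sargable -> index range scan.
fast_sql = "SELECT * FROM orders WHERE created_at >= %s AND created_at < %s"
```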

 

3. High TPS/QPS      

High TPS is the most important basis for splitting out a database. For example, some large tables such as the order and coupon tables hold hundreds of millions of rows; the primary reason for splitting them out is that the TPS can no longer be sustained. Once TPS exceeds roughly 6K, the database starts to show significant delays.

If TPS is not high but QPS is high, and the data consistency requirements are relaxed, a read-write separation scheme can be considered.
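
A minimal sketch of such read-write separation is shown below, assuming one primary and one replica reachable via pymysql; the hostnames and credentials are placeholders. Replication lag means the replica may serve slightly stale data, which is why this only suits reads with relaxed consistency requirements.

```python
import pymysql  # assumed driver; placeholder hosts and credentials below

class ReadWriteRouter:
    def __init__(self):
        self.primary = pymysql.connect(host="db-primary", user="app",
                                       password="***", database="shop")
        self.replica = pymysql.connect(host="db-replica", user="app",
                                       password="***", database="shop")

    def execute(self, sql: str, params: tuple = ()):
        """Send writes to the primary and reads to the replica."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        is_write = verb in ("INSERT", "UPDATE", "DELETE", "REPLACE")
        conn = self.primary if is_write else self.replica
        with conn.cursor() as cur:
            cur.execute(sql, params)
            if is_write:
                conn.commit()
            return cur.fetchall()
```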

 

 

 
