Only sub-database and sub-table are necessary for hundred-million-level data optimization

1. At how many rows does a single table become suitable for sub-database sub-table?

When the number of rows in a single table exceeds 5 million, or the single-table capacity exceeds 2 GB, sub-database sub-table is recommended.

If the project's data volume is not expected to reach this level within three years, do not split the database and tables when creating them.

(This guideline comes from the Alibaba Java Development Specification.)

2. The benefits of sub-database and sub-table

  • Improve query efficiency
  • Reduce database pressure
  • High availability: if one database has a problem, the other businesses are not affected

3. How is the data split in sub-database and sub-table?

  • Vertical split: data is divided into different databases by business, and the tables in each database have different structures
  • Horizontal split: data is divided into different databases, and the tables in each database share the same structure (see the sketch after this list)
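
Below is a minimal sketch of the difference, using hypothetical database and table names (user_db, order_db_0, t_order, and so on): a vertical split routes whole tables to different business databases, while a horizontal split routes rows of the same table by a sharding key.

```java
import java.util.Map;

public class SplitExamples {

    // Vertical split: different tables (businesses) live in different databases.
    private static final Map<String, String> VERTICAL = Map.of(
            "t_user", "user_db",
            "t_order", "order_db",
            "t_product", "product_db");

    public static String verticalRoute(String table) {
        return VERTICAL.get(table) + "." + table;
    }

    // Horizontal split: the same t_order structure exists in order_db_0 and order_db_1;
    // rows are routed by a sharding key such as the user id.
    public static String horizontalRoute(long userId) {
        return "order_db_" + (userId % 2) + ".t_order";
    }

    public static void main(String[] args) {
        System.out.println(verticalRoute("t_order"));  // order_db.t_order
        System.out.println(horizontalRoute(10001L));   // order_db_1.t_order
    }
}
```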

4. Problems caused by sub-database and sub-table, and their solutions

1. Problems caused by vertical splitting: cross-database (associated) queries and distributed transactions

Solutions for cross-database (associated) queries:
① Field redundancy: duplicate the needed fields so that associated queries can be avoided as much as possible.
② Data synchronization: synchronize the table from the other database into this one via MQ, dblink, or ETL (scheduled tasks; low real-time performance).
③ Broadcast table: every database holds an identical copy of the table, and its data is kept consistent across databases.
④ Merge in code: query the tables in each database separately, then combine and sort the results in application memory (see the sketch after this list).
⑤ Where possible, keep the tables that one business needs to join in the same database.
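
A minimal sketch of option ④, assuming two hypothetical DAOs, each bound to a different physical database; the rows are combined and sorted in application memory:

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CrossDbQuery {

    // Minimal stand-ins for this sketch; a real project would use its own entity and DAO.
    public static class Order {
        long id;
        long userId;
        LocalDateTime createTime;
    }

    public interface OrderDao {
        List<Order> findByUserId(long userId); // backed by one physical database
    }

    private final OrderDao orderDao0; // database 0
    private final OrderDao orderDao1; // database 1

    public CrossDbQuery(OrderDao orderDao0, OrderDao orderDao1) {
        this.orderDao0 = orderDao0;
        this.orderDao1 = orderDao1;
    }

    // Query each database separately, then merge and sort in application memory.
    public List<Order> findByUserId(long userId) {
        List<Order> merged = new ArrayList<>();
        merged.addAll(orderDao0.findByUserId(userId));
        merged.addAll(orderDao1.findByUserId(userId));
        merged.sort(Comparator.comparing((Order o) -> o.createTime));
        return merged;
    }
}
```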

Distributed transaction solutions:
Frameworks such as Seata, or patterns such as TCC, solve the distributed transaction problem, but using distributed transactions inevitably reduces efficiency.
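
As a rough sketch, this is how a Seata global transaction is typically declared with the @GlobalTransactional annotation; the service interfaces and method names here are hypothetical:

```java
import io.seata.spring.annotation.GlobalTransactional;

// Hypothetical service interfaces, each writing to its own vertically split database.
interface OrderService { void createOrder(long userId, long productId, int count); }
interface StockService { void deduct(long productId, int count); }

public class PlaceOrderService {

    private final OrderService orderService; // order database
    private final StockService stockService; // stock database

    public PlaceOrderService(OrderService orderService, StockService stockService) {
        this.orderService = orderService;
        this.stockService = stockService;
    }

    // One global transaction spans both databases; if either branch fails,
    // Seata rolls back all of them.
    @GlobalTransactional(rollbackFor = Exception.class)
    public void placeOrder(long userId, long productId, int count) {
        orderService.createOrder(userId, productId, count);
        stockService.deduct(productId, count);
    }
}
```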

Microservices are, in effect, a form of vertical sub-database splitting.

2. Problems caused by horizontal splitting: paging queries, global IDs, and uniform data distribution
Paging query solutions:
① If the data volume is large, use one table per month, or even one per week, and only support queries scoped to a single month (see the sketch after this list).
② Query each shard and sort the data in code.
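
A minimal sketch of option ①, assuming a hypothetical one-table-per-month naming convention (t_order_yyyyMM), so that a query only ever touches the single table for the requested month:

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

public class MonthlyTableRouter {

    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyyMM");

    // Map a month to its physical table, e.g. 2020-06 -> t_order_202006.
    public static String tableFor(YearMonth month) {
        return "t_order_" + month.format(SUFFIX);
    }

    // Only queries scoped to one month are supported, so exactly one table is hit.
    public static String monthlyOrderSql(YearMonth month) {
        return "SELECT * FROM " + tableFor(month) + " WHERE user_id = ?";
    }

    public static void main(String[] args) {
        System.out.println(monthlyOrderSql(YearMonth.of(2020, 6)));
        // SELECT * FROM t_order_202006 WHERE user_id = ?
    }
}
```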

Global ID solutions:
① UUID ② Snowflake algorithm
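
A minimal single-node sketch of the standard Snowflake layout (41-bit timestamp, 10-bit worker id, 12-bit sequence); the custom epoch is an assumption, and production generators also handle clock rollback and worker-id assignment:

```java
public class SnowflakeIdGenerator {

    private static final long EPOCH = 1577836800000L;  // assumed custom epoch: 2020-01-01
    private static final long WORKER_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastTimestamp) {
            // Same millisecond: bump the sequence; on overflow, wait for the next millisecond.
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        // 41-bit timestamp | 10-bit worker id | 12-bit sequence
        return ((now - EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator gen = new SnowflakeIdGenerator(1);
        System.out.println(gen.nextId());
        System.out.println(gen.nextId()); // increasing ids
    }
}
```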

Uniform data distribution (choosing the sharding strategy):
① Hash and modulo: shard index = hash(key) % shard count (see the sketch after this list)
② Random
③ Range: e.g. IDs 0–100 million in one table, 100–200 million in the next, 200–300 million in another
④ Time: per month, per week, and so on
⑤ Region: data generated in the same region lives in the same table
⑥ Compound strategies: range then modulo, or modulo then range
⑦ Enumeration: e.g. male/female
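
A minimal sketch of option ①, assuming a hypothetical layout of 2 databases with 4 tables each and a user id as the sharding key:

```java
public class HashShardRouter {

    private static final int DB_COUNT = 2;     // assumed number of databases
    private static final int TABLE_COUNT = 4;  // assumed tables per database

    // Route a sharding key (e.g. user id) to a physical database and table.
    public static String route(long userId) {
        long hash = Long.hashCode(userId) & 0x7fffffff; // keep non-negative
        long dbIndex = hash % DB_COUNT;
        long tableIndex = (hash / DB_COUNT) % TABLE_COUNT;
        return "order_db_" + dbIndex + ".t_order_" + tableIndex;
    }

    public static void main(String[] args) {
        System.out.println(route(10001L)); // order_db_1.t_order_0
        System.out.println(route(10002L)); // order_db_0.t_order_1
    }
}
```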

Origin: blog.csdn.net/RookiexiaoMu_a/article/details/106630959