MySQL: A summary of the database and table splitting (sharding) schemes commonly used by Internet companies

1. The database bottleneck

  1. IO bottleneck

  2. CPU bottleneck

2. Database and table splitting

  1. Horizontal database splitting

  2. Horizontal table splitting

  3. Vertical database splitting

  4. Vertical table splitting

3. Database and table splitting tools

4. Steps for database and table splitting

5. Problems caused by database and table splitting

  • Non-partition key query problem

  • Non-partition key cross-database, cross-table paging query problem

  • Scaling problem

6. Summary of database and table splitting

7. Example of database and table splitting

 

 

 

1. The database bottleneck

 

Whether the bottleneck is IO or CPU, the end result is the same: the number of active database connections climbs until it approaches or reaches the maximum the database can carry. From the point of view of the business service, few or no database connections are available. The consequences are easy to imagine: concurrency and throughput drop, and the service may crash.

 

1. IO bottleneck

Type 1: disk read IO bottleneck. There is too much hot data to fit in the database cache, so every query generates a large amount of disk IO and slows down -> split the database and split tables vertically.

 

Type 2: network IO bottleneck. Too much data is requested and the network bandwidth is insufficient -> split the database.

 

2. CPU bottleneck

Type 1: SQL problems. Queries that contain join, group by, order by, or conditions on non-indexed fields increase the work the CPU has to do -> optimize the SQL, create appropriate indexes, and move business calculations into the business service layer.

 

Type 2: a single table holds too much data, so queries scan too many rows, SQL efficiency is low, and the CPU becomes the first bottleneck -> split tables horizontally.

 

 

2. Database and table splitting

 

1. Horizontal database splitting

 

Concept: using a field as the basis, the data in one database is split across multiple databases according to some strategy (hash, range, etc.).

 

Result:

  • Every database has the same structure;

  • Every database holds different data, with no overlap;

  • The union of all databases is the full data set.

 

Scenario: the absolute concurrency of the system has risen, table splitting alone cannot fundamentally solve the problem, and there is no obvious business boundary along which to split the database vertically.

 

Analysis: with more databases, the IO and CPU pressure on each one is naturally relieved several-fold, roughly in proportion to the number of databases.
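
As a minimal sketch of the hash strategy mentioned in the concept above: the routing function takes the split field modulo the number of databases. The database count, the `db_N` naming, and the function names are assumptions made for illustration, not something the article prescribes.

```python
import zlib

DB_COUNT = 4  # assumed number of databases after the split


def route_db(user_id: int, db_count: int = DB_COUNT) -> str:
    """Return the (hypothetical) database name that holds this user_id."""
    return f"db_{user_id % db_count}"


def route_db_by_string(key: str, db_count: int = DB_COUNT) -> str:
    """Route a string key. crc32 is used because it is stable across
    processes, unlike Python's built-in hash() for str."""
    return f"db_{zlib.crc32(key.encode('utf-8')) % db_count}"
```

Every key maps to exactly one database, which matches the "no overlap, union is the full data set" result listed above.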

 

2. Horizontal table splitting

 

 

Concept: using a field as the basis, the data in one table is split across multiple tables according to some strategy (hash, range, etc.).

 

Result:

  • Every table has the same structure;

  • Every table holds different data, with no overlap;

  • The union of all tables is the full data set.

 

Scenario: the absolute concurrency of the system has not risen, but a single table holds so much data that SQL efficiency suffers, the CPU load increases, and the table becomes a bottleneck.

 

Analysis: each table holds less data, so a single SQL statement executes more efficiently, which naturally reduces the load on the CPU.
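
The range strategy can be sketched in the same spirit. The rows-per-table limit, the `t_order` base name, and the assumption that `order_id` is a positive, monotonically increasing integer are all illustrative choices.

```python
ROWS_PER_TABLE = 5_000_000  # assumed upper bound of rows per physical table


def route_table(order_id: int,
                base: str = "t_order",
                rows_per_table: int = ROWS_PER_TABLE) -> str:
    """Map an order_id to a physical table name by id range."""
    if order_id < 1:
        raise ValueError("order_id must be a positive integer")
    # ids 1..5,000,000 go to t_order_0, the next 5,000,000 to t_order_1, ...
    return f"{base}_{(order_id - 1) // rows_per_table}"
```

Range splitting makes it easy to add new tables as ids grow, at the cost of the newest table usually receiving most of the writes; hash splitting spreads writes evenly but makes expansion harder.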

 

3. Vertical database splitting

 

 

Concept: using the table as the basis, different tables are moved into different databases according to the business domain they belong to.

 

Result:

  • Every database has a different structure;

  • Every database holds different data, with no overlap;

  • The union of all databases is the full data set.

 

Scenario: the absolute concurrency of the system has risen, and separate business modules can be abstracted out.

 

Analysis: at this point the system can essentially be turned into services. For example, as the business grows, there are more and more shared configuration tables, dictionary tables, and so on; these tables can be moved into a separate database, or even exposed as a service. Likewise, once the business has incubated a self-contained business model, the related tables can be moved into a separate database, or even exposed as a service.

 

4. Vertical table splitting

 

 

Concept: using the field as the basis, the fields of a table are split into different tables (a main table and an extension table) according to how actively each field is used.

 

Result:

  • Every table has a different structure;

  • Every table holds different data. In general, the tables share at least one column, usually the primary key, which is used to associate the data;

  • The union of all tables is the full data set.

 

Scenario: the absolute concurrency of the system has not risen and the table does not have many records, but it has many fields, hot and non-hot data are stored together, and a single row takes up a relatively large amount of storage. As a result, the database can cache fewer rows, and queries generate a large amount of random read IO against the disk, producing an IO bottleneck.

 

Analysis: a list page and a details page are a helpful way to think about this. The principle of vertical table splitting is to put the hot data (fields that are often queried together, possibly stored redundantly) into the main table and the non-hot data into the extension table. More hot data can then be cached, which reduces random read IO. After the split, getting all the data for a record requires fetching from both tables and associating the results.

 

But remember: do not use join. A join not only increases the CPU load but also couples the two tables together (they must live on the same database instance). Do the association in the business service layer instead: fetch the main-table data and the extension-table data separately, then combine them using the shared key.
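
A minimal sketch of associating the two tables in the service layer rather than with a join. The in-memory `sqlite3` connections stand in for two separate MySQL instances, and the `item_main` / `item_ext` tables and their columns are hypothetical.

```python
import sqlite3

# Two independent connections stand in for two database instances.
main_db = sqlite3.connect(":memory:")
ext_db = sqlite3.connect(":memory:")

main_db.execute(
    "CREATE TABLE item_main (id INTEGER PRIMARY KEY, title TEXT, price INTEGER)")
ext_db.execute(
    "CREATE TABLE item_ext (id INTEGER PRIMARY KEY, description TEXT)")
main_db.executemany("INSERT INTO item_main VALUES (?, ?, ?)",
                    [(1, "Keyboard", 4900), (2, "Mouse", 1900)])
ext_db.executemany("INSERT INTO item_ext VALUES (?, ?)",
                   [(1, "Mechanical, 87 keys"), (2, "Wireless, 3 buttons")])


def list_page():
    """The list page only needs hot fields, so it reads the main table only."""
    rows = main_db.execute(
        "SELECT id, title, price FROM item_main ORDER BY id").fetchall()
    return [{"id": r[0], "title": r[1], "price": r[2]} for r in rows]


def detail_page(item_id: int):
    """The details page fetches both tables separately and merges by primary key."""
    main = main_db.execute(
        "SELECT id, title, price FROM item_main WHERE id = ?",
        (item_id,)).fetchone()
    if main is None:
        return None
    ext = ext_db.execute(
        "SELECT description FROM item_ext WHERE id = ?", (item_id,)).fetchone()
    return {"id": main[0], "title": main[1], "price": main[2],
            "description": ext[0] if ext else None}
```

Because the merge happens in application code, the extension table can later move to a different instance without changing any SQL.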

 

 

3. Database and table splitting tools

  • sharding-sphere: jar, formerly sharding-jdbc;

  • TDDL: jar, Taobao Distributed Data Layer;

  • Mycat: middleware.

Note: research the pros and cons of each tool yourself, starting with its official website and community.

 

 

4. Steps for database and table splitting

Estimate the number of databases or tables from capacity (current capacity and growth) -> choose the key (it must distribute data evenly) -> choose the splitting rule (hash, range, etc.) -> execute (generally with double writing) -> plan for expansion (minimize data movement as much as possible).
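
The first step, capacity evaluation, can be sketched as simple arithmetic. The per-table row limit, the linear growth model, and the rounding up to a power of two (which keeps later doubling simple) are assumptions for illustration, not rules stated by the article.

```python
import math


def estimate_table_count(current_rows: int,
                         monthly_growth_rows: int,
                         months: int,
                         max_rows_per_table: int) -> int:
    """Estimate how many tables are needed for the projected data volume."""
    projected = current_rows + monthly_growth_rows * months
    needed = max(1, math.ceil(projected / max_rows_per_table))
    # Round up to the next power of two (assumed convention).
    return 1 << (needed - 1).bit_length()
```

For example, 30 million current rows growing by 2 million per month over 36 months gives 102 million rows; at 10 million rows per table that is 11 tables, rounded up to 16.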

 

 

5. Problems caused by database and table splitting

1. Non-partition key query issues

 

Assume horizontal database and table splitting with the commonly used hash strategy.

Besides the partition key, only one non-partition key is used as a query condition.

 

Mapping method:

 

 

Gene method:

 

 

Note: on write, the gene method generates user_id as shown in the figure. As for the x-bit gene: if there are 8 tables, then 2^3 = 8, so x is 3, that is, a 3-bit gene. When querying by user_id, taking the modulo routes directly to the corresponding database or table.

 

When querying by user_name, first compute user_name_code with the user_name_code generation function, then take the modulo to route to the corresponding database or table. The snowflake algorithm is commonly used for ID generation.
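
A minimal sketch of the gene method described above, under stated assumptions: crc32 stands in for the user_name_code generation function, and a simple counter stands in for a snowflake-style ID generator. The low 3 bits of user_id carry the gene, so both lookups land on the same table.

```python
import itertools
import zlib

GENE_BITS = 3                 # 8 tables -> 2^3 = 8 -> 3-bit gene
TABLE_COUNT = 1 << GENE_BITS  # 8

# Stand-in for a snowflake-style generator (assumption for this sketch).
_sequence = itertools.count(1)


def user_name_code(user_name: str) -> int:
    """Hypothetical user_name_code function: a stable hash of the name."""
    return zlib.crc32(user_name.encode("utf-8"))


def generate_user_id(user_name: str) -> int:
    """Build a user_id whose low GENE_BITS bits are the gene of user_name."""
    gene = user_name_code(user_name) % TABLE_COUNT
    return (next(_sequence) << GENE_BITS) | gene


def table_for_user_id(user_id: int) -> int:
    return user_id % TABLE_COUNT


def table_for_user_name(user_name: str) -> int:
    return user_name_code(user_name) % TABLE_COUNT
```

Because the high bits come from the unique sequence, ids stay unique; because the low bits come from the name, a query by user_name can be routed without a mapping table.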

 

On the client side, besides the partition key, more than one non-partition key is used as a query condition

 

Mapping method:

 

Redundancy method:

Note: queries by order_id or buyer_id are routed to the db_o_buyer database, and queries by seller_id are routed to the db_o_seller database. This feels like putting the cart before the horse. Is there a better way? What about changing the technology stack?
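
The routing rule in the note above can be sketched as follows. Only the database names db_o_buyer and db_o_seller come from the text; the function names and the dict-based query representation are hypothetical.

```python
def route_order_query(conditions: dict) -> str:
    """Pick the redundant copy of the order data that can serve this query."""
    if "order_id" in conditions or "buyer_id" in conditions:
        return "db_o_buyer"
    if "seller_id" in conditions:
        return "db_o_seller"
    raise ValueError("query must filter on order_id, buyer_id or seller_id")


def write_targets() -> list:
    """With redundancy, every order is written to both copies."""
    return ["db_o_buyer", "db_o_seller"]
```

The cost of this approach is visible in `write_targets`: every write is duplicated, and the two copies have to be kept consistent.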

On the back-office (operations) side, besides the partition key, there are also queries on various combinations of non-partition-key conditions

 

NoSQL method:

 

 

Redundancy method:

 

 

2. Non-partition key cross-database cross-table paging query problem

Assume horizontal database and table splitting with the commonly used hash strategy.

Note: solve this with a NoSQL store (ES, i.e. Elasticsearch, etc.).

 

3. Scaling issues

Assume horizontal database and table splitting with the commonly used hash strategy.

 

Horizontally scaling out databases (promote-the-replica method)

Note: capacity is expanded by doubling each time.
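
Why doubling works well with modulo hashing can be checked with a short sketch (the function name and sample keys are illustrative): when the database count goes from N to 2N, every key that lived in database i ends up either in i or in i + N, so a replica of database i already holds all the data its new shard needs.

```python
def targets_after_doubling(old_count: int, sample_keys) -> dict:
    """For each old shard index, collect the new shard indexes its keys map to
    after the shard count is doubled."""
    new_count = old_count * 2
    targets = {}
    for key in sample_keys:
        old_shard = key % old_count
        new_shard = key % new_count
        targets.setdefault(old_shard, set()).add(new_shard)
    return targets
```

For instance, `targets_after_doubling(2, range(16))` returns `{0: {0, 2}, 1: {1, 3}}`. After the routing rule is switched, each side only has to delete the rows that no longer belong to it, and no data has to be copied between databases.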

 

Horizontally scaling out tables (double-write migration method)

  • Step 1: (synchronous double write) modify the application configuration and code to add double writing, then deploy;

  • Step 2: (synchronous double write) copy the old data from the old database to the new database;

  • Step 3: (synchronous double write) verify the old data in the new database against the old database;

  • Step 4: (synchronous double write) modify the application configuration and code to remove double writing, then deploy.

Note: double writing is a general-purpose scheme.
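
The four steps above can be sketched with in-memory dictionaries standing in for the old and new databases. The class name, the mode names, and the backfill and verify helpers are hypothetical; a real migration also has to deal with deletes, failures between the two writes, and ongoing traffic during verification.

```python
class MigratingStore:
    """Toy model of a double-write migration from an old store to a new one."""

    def __init__(self, old: dict):
        self.old = old
        self.new = {}
        self.mode = "old_only"  # old_only -> dual_write -> new_only

    def write(self, key, value):
        if self.mode in ("old_only", "dual_write"):
            self.old[key] = value
        if self.mode in ("dual_write", "new_only"):
            self.new[key] = value

    def backfill(self):
        """Step 2: copy old rows, without overwriting rows that double
        writing has already placed in the new store."""
        for key, value in list(self.old.items()):
            self.new.setdefault(key, value)

    def verify(self) -> list:
        """Step 3: return the keys whose values differ between the stores."""
        return [k for k, v in self.old.items() if self.new.get(k) != v]


store = MigratingStore({"a": 1, "b": 2})
store.mode = "dual_write"   # step 1: deploy with double writing enabled
store.write("b", 20)        # live traffic keeps arriving during migration
store.write("c", 3)
store.backfill()            # step 2: copy the old data
mismatches = store.verify() # step 3: check new against old
store.mode = "new_only"     # step 4: remove double writing
store.write("d", 4)
```

Enabling double writing before the backfill is what makes the copy safe: any row changed during the copy is already up to date in the new store, so the backfill only fills in rows that were never touched.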

 

6. Summary of database and table splitting

  • Before splitting databases or tables, first find out where the bottleneck is; only then can you split sensibly (database or table? horizontal or vertical? into how many?). Do not split just for the sake of splitting.

  • The choice of key is very important. It has to distribute data evenly and also take non-partition-key queries into account.

  • As long as the requirements are met, the simpler the splitting rule, the better.

     

7. Example of database and table splitting

Example GitHub address: https://github.com/littlecharacter4s/study-sharding

Origin blog.csdn.net/qq_31905135/article/details/108488019