Database and table splitting (sharding): the four concepts of vertical database splitting, vertical table splitting, horizontal database splitting, and horizontal table splitting

1. The significance of database and table splitting

As the company's business grows rapidly, the amount of data in the database increases sharply and access slows down, so optimization becomes urgent. Where is the problem? A relational database easily becomes the system bottleneck, because the storage capacity, number of connections, and processing power of a single machine are limited. Once a single table reaches about 10 million rows or 100 GB, and queries span many dimensions, performance still degrades seriously for many operations even after adding replicas (slave databases) and optimizing indexes.
Database and table splitting addresses this performance degradation caused by excessive data volume. The original single database is split into several databases, and a large table is split into several tables, so that each database and each table holds less data, which improves database performance.

2. The idea of vertical table splitting

Problem analysis: When users browse the product list, they open a product's detailed description only if they are interested in that product. The product description field is therefore accessed infrequently, yet it takes up a lot of storage space, so reading a single row costs more I/O time. The product name, picture, price, and similar fields, by contrast, are accessed frequently. Because these two kinds of data have different characteristics, we consider splitting the product information table: the infrequently accessed product description goes into one table, and the frequently accessed basic product information goes into another.
This is vertical table splitting. Definition: a table is divided into multiple tables by field (column), and each table stores a subset of the fields.

Generally speaking, the data items of a business entity are accessed at different frequencies, and some of them may be BLOB or TEXT fields that occupy a lot of storage space, such as the product description in the example above. When a table holds a large amount of data, it can therefore be split by field, with hot (frequently accessed) fields and cold (rarely accessed) fields placed in different tables. The performance gain from vertical table splitting comes mainly from more efficient operations on hot data and from reduced disk contention.
We usually split vertically according to the following principles:

  • Put fields that are rarely used in a separate table;
  • Split out large fields such as TEXT and BLOB and put them in a secondary table;
  • Keep columns that are frequently queried together in the same table.
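
The split described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names product_info and product_description, the column names, and the sample row are assumptions made for the example, not part of the original design.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hot fields that the product list reads all the time.
    CREATE TABLE product_info (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        picture    TEXT,
        price      REAL NOT NULL
    );
    -- Cold, large field kept in its own table, keyed by the same ID.
    CREATE TABLE product_description (
        product_id  INTEGER PRIMARY KEY REFERENCES product_info(product_id),
        description TEXT
    );
""")
conn.execute("INSERT INTO product_info VALUES (1, 'Kettle', 'kettle.jpg', 29.9)")
conn.execute("INSERT INTO product_description VALUES (1, 'A long description ...')")


def list_products(conn):
    """Product list page: reads only the small, hot table."""
    return conn.execute(
        "SELECT product_id, name, price FROM product_info ORDER BY product_id"
    ).fetchall()


def get_description(conn, product_id):
    """Detail page: fetches the large field only when it is needed."""
    row = conn.execute(
        "SELECT description FROM product_description WHERE product_id = ?",
        (product_id,),
    ).fetchone()
    return row[0] if row else None
```

The list query never touches the large description column, which is the point of the split; the description is loaded by primary key only when a user opens the detail page.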

3. The idea of vertical database splitting

Problem analysis: Vertical table splitting improves performance to a certain extent, but it still does not meet the requirements, and disk space is running short. The data is still confined to one server: vertical table splitting inside a database only solves the problem of a single table holding too much data. The tables are not distributed across servers, so every table still competes for the CPU, memory, network I/O, and disk of the same physical machine.
We can therefore split the tables by business. In the example, the store table and the product table are stored in different databases, and the geographic region table, as a dictionary table, is kept redundantly in both databases.
This is vertical database splitting: tables are classified by business and distributed to different databases, and each database can be placed on a different server. Its core idea is a dedicated database for each dedicated purpose.
The improvements it brings are:

  • It removes coupling at the business level and makes business boundaries clear;
  • Data belonging to different businesses can be managed, maintained, monitored, and scaled separately;
  • In high-concurrency scenarios, it increases the available I/O and database connections to a certain extent and eases the bottleneck of single-machine hardware resources;
  • Because the tables are classified by business and distributed to different databases, and these databases can be deployed on different servers, several servers share the load. However, it still does not solve the problem of too much data in a single table.
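
As a rough sketch of this layout, the example below uses two in-memory SQLite databases to stand in for two separate servers. The names store_db and product_db, the table columns, and the sample rows are all assumptions made for illustration. It also shows one consequence of the split: a query that needs both store and product data can no longer be a single SQL join and has to be assembled in the application.

```python
import sqlite3

REGION_DDL = "CREATE TABLE region (region_code TEXT PRIMARY KEY, region_name TEXT)"

# Two in-memory databases stand in for two separate database servers.
store_db = sqlite3.connect(":memory:")
product_db = sqlite3.connect(":memory:")

store_db.execute(
    "CREATE TABLE store (store_id INTEGER PRIMARY KEY, name TEXT, region_code TEXT)"
)
product_db.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, store_id INTEGER, "
    "name TEXT, region_code TEXT)"
)

# The region dictionary table is stored redundantly in both databases,
# so each database can still join against it locally.
for db in (store_db, product_db):
    db.execute(REGION_DDL)
    db.execute("INSERT INTO region VALUES ('110000', 'Beijing')")

store_db.execute("INSERT INTO store VALUES (1, 'Demo Store', '110000')")
product_db.execute("INSERT INTO product VALUES (10, 1, 'Kettle', '110000')")


def products_with_store_name(store_id):
    """Combine data from both databases in application code."""
    store = store_db.execute(
        "SELECT name FROM store WHERE store_id = ?", (store_id,)
    ).fetchone()
    if store is None:
        return []
    rows = product_db.execute(
        "SELECT p.name, r.region_name FROM product p "
        "JOIN region r ON r.region_code = p.region_code "
        "WHERE p.store_id = ? ORDER BY p.product_id",
        (store_id,),
    ).fetchall()
    return [(store[0], product_name, region) for product_name, region in rows]
```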

4. The idea of horizontal database splitting

Problem analysis: Vertical database splitting solves the database performance problem to a certain extent, but as the business grows, the data stored in the single PRODUCT_DB (product database) has exceeded the estimate. A rough estimate: there are currently 80,000 stores, each with an average of 150 products of different specifications, which is about 12 million products (80,000 × 150); allowing for growth, the estimate has to be 15 million+. PRODUCT_DB is also a very frequently accessed resource, and a single server can no longer support it. How can we optimize at this point?
We can try splitting the database horizontally, putting the product information of stores with even IDs and stores with odd IDs into two separate databases.
In other words, to operate on a piece of data, first look at the ID of the store it belongs to. If the store ID is even, route the operation to PRODUCT_DB1 (product database 1); if the store ID is odd, route it to PRODUCT_DB2 (product database 2). The database name for the operation is given by the expression PRODUCT_DB[store ID % 2 + 1].
This is horizontal database splitting: the rows of the same table are split across different databases according to a rule, and each database can be placed on a different server.
The improvements it brings are:

  • It solves the performance bottleneck of a single database under large data volumes and high concurrency;
  • It improves system stability and availability;
  • When an application is hard to split vertically at a finer granularity, or the number of rows is still huge after splitting and a single database hits read, write, and storage bottlenecks, horizontal database splitting is needed. It can often resolve the storage capacity and performance bottlenecks of a single database. However, because the same table is spread across different databases, every data operation needs extra routing work, which greatly increases system complexity.
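
The routing work mentioned above can be sketched as follows. The function implements the expression PRODUCT_DB[store ID % 2 + 1] from the text; everything else (the in-memory SQLite databases standing in for two servers, the product table, and the helper names save_product and products_of_store) is hypothetical and only there to make the sketch runnable.

```python
import sqlite3


def product_db_name(store_id: int) -> str:
    """Routing rule from the text: PRODUCT_DB[store ID % 2 + 1]."""
    return f"PRODUCT_DB{store_id % 2 + 1}"


# Hypothetical stand-ins for two database servers.
databases = {
    name: sqlite3.connect(":memory:") for name in ("PRODUCT_DB1", "PRODUCT_DB2")
}
for db in databases.values():
    db.execute(
        "CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
        "store_id INTEGER, name TEXT)"
    )


def save_product(product_id, store_id, name):
    """Write the row to the database chosen by the store ID."""
    db = databases[product_db_name(store_id)]
    db.execute("INSERT INTO product VALUES (?, ?, ?)", (product_id, store_id, name))


def products_of_store(store_id):
    """Read from the same database the routing rule selects for writes."""
    db = databases[product_db_name(store_id)]
    return db.execute(
        "SELECT product_id, name FROM product WHERE store_id = ? ORDER BY product_id",
        (store_id,),
    ).fetchall()


save_product(1, 4, "Kettle")   # even store ID, stored in PRODUCT_DB1
save_product(2, 7, "Toaster")  # odd store ID, stored in PRODUCT_DB2
```

Every read and write has to pass through the same routing function. A query that is not keyed by store ID, such as a search by product name, would have to visit both databases and merge the results, which is part of the added complexity.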

5. The idea of horizontal table splitting

Following the idea of horizontal database splitting, we can also split the tables inside PRODUCT_DB_X (product database) horizontally. The purpose is again to solve the problem of a single table holding too much data.
The idea is similar to horizontal database splitting, but this time the target is the table: the product information and product description tables are each divided into two sets of tables. If the product ID is even, route the operation to the product information 1 table; if the product ID is odd, route it to the product information 2 table. The table name for the operation is given by the expression product information [product ID % 2 + 1].
This is horizontal table splitting: within the same database, the rows of one table are split into multiple tables according to a rule.
The improvements it brings are:

  • It reduces the performance problems caused by a large amount of data in a single table;
  • It avoids I/O contention and reduces the chance of table locks;
  • Horizontal table splitting inside a database solves the problem of too much data in a single table: each smaller table holds only part of the data, which reduces the data volume per table and improves retrieval performance.
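
A minimal sketch of this table-level routing, again with sqlite3, might look like the following. The physical table names product_info_1 and product_info_2, the columns, and the helper functions are assumptions for illustration; only the rule product ID % 2 + 1 comes from the text.

```python
import sqlite3

TABLE_COUNT = 2  # two product information tables, as in the example


def product_info_table(product_id: int) -> str:
    """Routing rule from the text: product information [product ID % 2 + 1]."""
    return f"product_info_{product_id % TABLE_COUNT + 1}"


# One database holds every split table.
conn = sqlite3.connect(":memory:")
for n in range(1, TABLE_COUNT + 1):
    conn.execute(
        f"CREATE TABLE product_info_{n} "
        "(product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )


def insert_product(product_id, name, price):
    # The table name comes only from the routing function, never from user
    # input, so formatting it into the SQL string is safe here.
    table = product_info_table(product_id)
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (product_id, name, price))


def find_product(product_id):
    """Look up a product in the one table that can contain it."""
    table = product_info_table(product_id)
    return conn.execute(
        f"SELECT product_id, name, price FROM {table} WHERE product_id = ?",
        (product_id,),
    ).fetchone()


insert_product(10, "Kettle", 29.9)   # even product ID, stored in product_info_1
insert_product(11, "Toaster", 49.0)  # odd product ID, stored in product_info_2
```

Because all the tables live in the same database, they still share one server's CPU, memory, and disk; only the size of each individual table is reduced.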

Origin blog.csdn.net/dgfdhgghd/article/details/128426013