Database and table sharding explained in one article

Database and table sharding design in detail

Background

In a traditional stand-alone database architecture, all data is stored in a single database. As the business grows, data volume and concurrency grow with it, which strains the performance and availability of the database. In addition, once a single-machine database reaches its capacity limit, it cannot be expanded further, which limits the scalability of the application.

Database and table sharding emerged to solve these problems. Spreading data across multiple databases or tables reduces the load on any single database and improves the concurrency of the system. Sharding also supports horizontal expansion, so the system can keep growing with the business.

In short, database and table sharding is a commonly used database architecture design that addresses bottlenecks in data volume, concurrency, and scalability, and it is one of the indispensable techniques for large-scale applications.

Why shard databases and tables?

  1. Growth in data volume: As the data volume grows, a single database or table may no longer be able to hold all the data, resulting in degraded performance or even crashes.

    Database and table sharding spreads data across multiple databases or tables. Each one is responsible for only part of the data, which improves the performance and reliability of the system.

  2. Higher concurrency: When many users access the same table at the same time, the database easily becomes a performance bottleneck and responses slow down.

    Sharding spreads data across multiple databases or tables, which increases concurrent access capacity and reduces bottlenecks.

  3. Data isolation: In a multi-tenant scenario, different tenants need separate databases or tables so that their data is isolated from one another. Sharding can place each tenant's data in its own database or table.

  4. Less pressure on a single node: When the system must support reads and writes of a large amount of data, a single database or table may not be able to handle the load. Spreading the data across multiple databases or tables reduces the pressure on any single node.

What is database and table sharding?

  1. Database splitting: split one database into multiple databases.

    1. Horizontal database sharding: divide a large database into multiple smaller databases with the same structure, each holding a different part of the data.

      For example, allocate user data to user database DB1 or user database DB2 according to the range of the user ID.

      This approach improves scalability and performance because the load is spread across multiple databases.

    Features:

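     1. Every database has the same structure and stores data of the **same business type**.
     2. The data in each database is different, with no overlap.
     3. The union of all databases is the full data set.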
    
    2. Vertical database sharding: divide a large database into multiple smaller databases along business boundaries.

      For example, user data is stored in the user database and order data is stored in the order database.

      This approach improves flexibility and maintainability because each database can be optimized and maintained independently as needed.

    Features:

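     1. Every database has a different structure and stores data of **different business types**.
     2. The data in each database is different, with no overlap.
     3. The union of all databases is the full data set.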
    

    Horizontal sharding and vertical sharding are not mutually exclusive. They can be combined to build a more efficient database sharding solution.
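
As a rough illustration of combining the two approaches, the sketch below first chooses a database by business type (vertical split) and then, for user data, chooses DB1 or DB2 by ID range (horizontal split). The database names `user_db1`, `user_db2`, and `order_db`, and the ID boundary of 1,000,000, are hypothetical values picked for this example only.

```python
# Hypothetical layout: the names and the ID boundary are assumptions for illustration.
USER_ID_BOUNDARY = 1_000_000  # IDs below this go to user_db1, the rest to user_db2


def route_database(business: str, record_id: int) -> str:
    """Return the name of the database that should hold the record."""
    if business == "user":
        # Horizontal split: both user databases share one schema, rows are divided by ID range.
        return "user_db1" if record_id < USER_ID_BOUNDARY else "user_db2"
    if business == "order":
        # Vertical split: order data lives in its own database.
        return "order_db"
    raise ValueError(f"unknown business type: {business}")
```

In a real system this routing is usually delegated to middleware rather than written by hand in application code.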

  2. Table splitting: split one data table into multiple data tables.

    1. Horizontal table sharding: distribute the rows of one table across multiple tables according to a rule, usually based on the value of a particular column.

      For example, the user table can be divided into multiple sub-tables by user ID using rules such as range, hash, or modulo, with each sub-table storing one subset of the users.

      Horizontal table sharding can improve query efficiency because a query only needs to run against a small sub-table instead of the entire table.

    Features:

     1. Every table has the same structure and stores data of the **same entity and the same type**.
     2. The data in each table is different, with no overlap.
     3. The union of all tables is the full data set.

    2. Vertical table partitioning: divide a table by columns, storing some columns in one table and the remaining columns in another. The split is usually based on how closely the columns are related; a typical case is a main table plus an extension table.

      For example, a table containing both primary and auxiliary user information can be divided into two tables, one holding the primary information and the other holding the auxiliary information.

      This improves storage efficiency because each table contains only the columns it needs rather than every column of the original table.

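    Features:

     1. Every table has a different structure and stores data of the **same entity but different types**.
     2. The data in each table is different; the tables share at least one column, usually the primary key, which is used to associate the rows.
     3. The union of all tables is the full data set.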
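
A minimal sketch of vertical table partitioning, using an in-memory SQLite database so it can be run as is. The table names `user_main` and `user_ext` and their columns are hypothetical; the point is only that the two tables share the primary key `id`, which is what allows the full record to be reassembled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_main (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE user_ext (
        id INTEGER PRIMARY KEY,   -- same primary key as user_main
        bio TEXT,
        avatar_url TEXT
    );
    """
)
conn.execute("INSERT INTO user_main (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO user_ext (id, bio, avatar_url) VALUES (1, 'hello', 'a.png')")

# Frequent queries touch only the narrow main table.
name = conn.execute("SELECT name FROM user_main WHERE id = 1").fetchone()[0]

# The full record is rebuilt by joining on the shared primary key.
full_row = conn.execute(
    """
    SELECT m.id, m.name, e.bio, e.avatar_url
    FROM user_main AS m
    JOIN user_ext AS e ON e.id = m.id
    WHERE m.id = 1
    """
).fetchone()
```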

Split strategies

  1. Split by range: Split the table into multiple sub-tables based on a range of values. For example, an orders table could be split into one sub-table per month based on time.

    Advantages:

    Horizontal expansion: Splitting by range makes horizontal expansion easy. When the data volume grows, more databases or tables can simply be added. For example, with a threshold of 20 million rows per table, rows beyond 20 million go to a new table, and the existing rows within the first 20 million do not need to be migrated.

    Disadvantages:

    Hotspot problem: If frequently accessed data falls into the same range after the split, a large share of requests concentrates on that one shard and the load is not balanced well.

  2. Split by hash value: Split the table into multiple sub-tables according to a hash of the data. For example, a user table can be split into multiple sub-tables using the hash of the user ID.

    Hash modulo strategy: take the routing key (usually the primary key ID) modulo the total number of sub-tables to decide which table each row goes to.

    Advantages:

    Load balancing: there is no obvious hotspot skew.

    Disadvantages:

    Inflexible expansion: If the data is initially split into 4 tables by hash modulo and the tables later reach their limit again, expanding is harder. Changing the number of tables changes the modulo result, so existing data has to be migrated again.

  3. Split by geographical location: If the data is tied to a geographical location, the table can be split into multiple sub-tables by location. For example, a business table can be split into sub-tables by city.

  4. Split by business type: If the data covers different business types, the table can be split into multiple sub-tables by business type. For example, a product table can be split into sub-tables by product type.

  5. Split by access frequency: If some data is accessed frequently, it can be placed in a separate table for faster access. For example, the last year's order data could be kept in a separate table so that queries for recent orders are faster.
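
The range and hash modulo strategies above can be sketched as follows. The 20 million threshold comes from the range example; the table name prefixes `orders_` and `users_` and the helper `moved_fraction` are hypothetical and exist only to make the trade-off visible. Keys are assumed to be integers, so the "hash" is simply the key itself.

```python
ROWS_PER_TABLE = 20_000_000  # range threshold taken from the range example


def range_table(order_id: int) -> str:
    """Range split: IDs 0..19,999,999 go to orders_0, the next 20 million to orders_1, and so on."""
    return f"orders_{order_id // ROWS_PER_TABLE}"


def hash_table(user_id: int, table_count: int) -> str:
    """Hash modulo split: the routing key modulo the number of sub-tables."""
    return f"users_{user_id % table_count}"


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose sub-table changes when the table count changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_count != k % new_count)
    return moved / len(keys)
```

With range splitting, adding `orders_1` leaves every row in `orders_0` untouched. With hash modulo, `moved_fraction(range(1000), 4, 8)` shows that half of the keys land in a different table after doubling from 4 to 8 tables, and going from 4 to 5 tables moves 80% of them, which is why expansion under this strategy requires data migration.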

Database and table sharding tools

  1. ShardingSphere: ShardingSphere is an open source distributed database middleware solution that supports database and table sharding, read-write splitting, and data encryption.

    Official website: https://shardingsphere.apache.org/index_zh.html

  2. Mycat: Mycat is an open source distributed database middleware based on the MySQL protocol. It supports database and table sharding and read-write splitting.

    Official website: http://www.mycat.org.cn/

  3. TDDL: TDDL is a distributed database middleware open sourced by Alibaba. It supports MySQL, Oracle, and other databases, and provides database and table sharding and read-write splitting.

  4. Cobar: Cobar is a distributed database middleware open sourced by Alibaba. It supports MySQL, PostgreSQL, and other databases, and provides database and table sharding and read-write splitting.

Problems introduced by database and table sharding

  1. Primary key (auto-incrementing ID) uniqueness: Auto-incrementing IDs are often used as the primary key when tables are designed. After a later migration or a database and table sharding operation, primary keys may change or collide across shards, so they are no longer unique. The main solution is:

    1. Adopt a distributed, globally unique ID generation mechanism, such as UUID, the snowflake algorithm, or database number segments.

  2. Distributed transactions: Because data is spread across multiple databases or tables, transaction processing becomes more complex. For example, when a transaction involves multiple databases or tables, guaranteeing the atomicity of its operations requires more careful design.

  3. Join/paging queries: With sharded databases and tables, queries that span multiple databases or tables become more complex. For example, a full-text search over a data set has to run against multiple databases or tables, which may hurt query performance. Common approaches:

    1. Query each shard separately and then assemble the results.

    2. Redundant fields: If each join only fetches a small number of fields, consider storing those fields redundantly in the table itself.

    3. Data synchronization:

      Synchronize data from the individual data sources into a separate data warehouse and perform the join operations there. This solves the cross-database and cross-table join problem, but it incurs extra storage and synchronization costs and may suffer from synchronization delays and consistency issues.

  4. Data migration: With sharded databases and tables, moving data to a new database or table requires deciding how to migrate it and how to keep the data consistent during the migration.

  5. System complexity: Because sharding requires managing multiple databases or tables, the complexity of the system increases. For example, you need to consider how to monitor the status of each database or table, how to handle the failure of one of them, and so on.

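
For the primary key uniqueness problem, the snowflake approach mentioned above can be sketched roughly as follows. This is a simplified illustration, not a production implementation: the bit layout (10 worker bits, 12 sequence bits), the custom epoch, and the class name `SnowflakeGenerator` are assumptions of this sketch, and clock rollback is handled in the simplest possible way.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # custom epoch (2020-01-01 UTC); an assumption of this sketch
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Simplified snowflake-style generator: timestamp | worker ID | sequence."""

    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = time.time_ns() // 1_000_000
            if now < self.last_ms:
                # Clock moved backwards: keep using the last timestamp in this sketch.
                now = self.last_ms
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = time.time_ns() // 1_000_000
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )
```

Each shard or application node would be given a distinct `worker_id`, so IDs generated on different nodes cannot collide even when they are produced in the same millisecond, and IDs from one generator increase over time.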

Origin blog.csdn.net/weixin_40709965/article/details/129636116