Spring Boot integrates ShardingSphere to realize data sharding (2) | Spring Cloud 41

1. Introduction

The previous article, Spring Boot integrates ShardingSphere to realize data sharding (1) | Spring Cloud 40, introduced the background and challenges of data sharding. This chapter describes the advantages and disadvantages of each sharding type in the context of real business scenarios.

2. Types of data sharding

As the previous article explained, data can be split by vertical sharding or by horizontal sharding. In real business scenarios this yields four types: vertical table sharding, vertical database sharding, horizontal table sharding, and horizontal database sharding.

For the sharding algorithms supported by ShardingSphere, see: https://shardingsphere.apache.org/document/current/cn/dev-manual/sharding/

2.1 Vertical table sharding

2.1.1 Basic concepts

The fields of one table are grouped by business purpose and split into several tables, each of which stores a subset of the fields.

2.1.2 Advantages

  • After the split, business boundaries and splitting rules are clear

  • Easy to integrate or extend between systems

  • Avoids I/O contention and reduces the chance of table locks

  • Makes access to hot data more efficient: frequently accessed (hot) fields and rarely accessed (cold) fields are stored separately, for example in a basic personnel information table and a detailed personnel information table. Large fields should be placed in the table of cold fields

    Why is the I/O efficiency of large fields low?

    • The data itself is long, so it takes longer to read;
    • Rows span pages. The page is the basic unit of database storage, and many lookup and positioning operations work on pages. The more rows a single page holds, the better the overall performance of the database. A large field takes up a lot of space, so a single page holds fewer rows and I/O efficiency drops;
    • Data is loaded into memory row by row. If rows are short, memory can hold more of them, which reduces disk I/O and improves database performance;
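
The page argument above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a 16 KiB page (the InnoDB default) and hypothetical row sizes of 200 bytes for a row with only hot fields and 4,000 bytes for a row that also carries a large field. It ignores page headers and fill factor, so the numbers are only illustrative.

```java
// Rough estimate of how many rows fit in one page and how many pages a scan reads.
// All sizes are illustrative assumptions, not measurements.
final class PageDensity {
    static final int PAGE_BYTES = 16 * 1024; // assumed page size (InnoDB default)

    // Rows that fit in one page, ignoring page headers and fill factor.
    static int rowsPerPage(int rowBytes) {
        if (rowBytes <= 0 || rowBytes > PAGE_BYTES) {
            throw new IllegalArgumentException("rowBytes must be in 1.." + PAGE_BYTES);
        }
        return PAGE_BYTES / rowBytes;
    }

    // Pages that must be read to scan the given number of rows.
    static long pagesForRows(long rows, int rowBytes) {
        int perPage = rowsPerPage(rowBytes);
        return (rows + perPage - 1) / perPage; // ceiling division
    }

    public static void main(String[] args) {
        int narrowRow = 200;  // hypothetical row holding only hot fields
        int wideRow = 4000;   // hypothetical row that also carries a large field
        long rows = 1_000_000;
        System.out.println("narrow: " + rowsPerPage(narrowRow) + " rows/page, "
                + pagesForRows(rows, narrowRow) + " pages");
        System.out.println("wide:   " + rowsPerPage(wideRow) + " rows/page, "
                + pagesForRows(rows, wideRow) + " pages");
    }
}
```

Under these assumptions the narrow table needs about 20 times fewer page reads than the wide table for the same scan, which is the effect that moving large fields into a separate table is meant to achieve.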

2.1.3 Disadvantages

  • Unable to solve the problem of excessive data volume in a single table after splitting

2.2 Vertical database sharding

2.2.1 Basic concepts

Vertical table sharding only addresses the problem of a single table being too wide. It does not distribute tables to different servers, so every table still competes for the CPU, memory, network, I/O, and disk of the same host.

Vertical database sharding classifies tables by business and distributes them to different databases, so that the load is spread across those databases. Its core idea is a dedicated database for each business.
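
As a minimal sketch of this idea, the following Java class maps each table to the single database that owns it. The table names (`t_user`, `t_order`, ...) and data source names (`ds_user`, `ds_order`, ...) are hypothetical and only serve as an illustration. It is a hand-written lookup, not how ShardingSphere itself is configured.

```java
import java.util.Map;

// Vertical database sharding: each business table lives in exactly one database.
// Table and data source names are hypothetical.
final class VerticalDbRouter {
    private static final Map<String, String> TABLE_TO_DATA_SOURCE = Map.of(
            "t_user", "ds_user",
            "t_user_detail", "ds_user",
            "t_order", "ds_order",
            "t_order_item", "ds_order",
            "t_product", "ds_product");

    // Returns the data source that owns the table, or fails for unknown tables.
    static String dataSourceFor(String table) {
        String dataSource = TABLE_TO_DATA_SOURCE.get(table);
        if (dataSource == null) {
            throw new IllegalArgumentException("No data source configured for " + table);
        }
        return dataSource;
    }

    // A JOIN can run inside the database only when both tables share a data source.
    static boolean canJoinInDatabase(String left, String right) {
        return dataSourceFor(left).equals(dataSourceFor(right));
    }

    public static void main(String[] args) {
        System.out.println(dataSourceFor("t_order"));                      // ds_order
        System.out.println(canJoinInDatabase("t_order", "t_order_item")); // true
        System.out.println(canJoinInDatabase("t_order", "t_user"));       // false
    }
}
```

The `canJoinInDatabase` check hints at the main cost of this approach: tables that end up in different databases can no longer be joined by the database itself.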

2.2.2 Advantages

  • After the split, the business is clear and the split rules are clear
  • Easy to integrate or expand between systems
  • Data maintenance is simple: the data of different businesses can be managed, maintained, monitored, and scaled separately
  • In high-concurrency scenarios, it improves business throughput to some extent and reduces the impact of the hardware bottleneck of a single machine

2.2.3 Disadvantages

  • Unable to solve the problem of excessive data volume in a single table after splitting
  • Business tables in different databases cannot be combined with JOIN. The data can only be combined through service interfaces, which increases the complexity of the system
  • Cross-database transaction processing is complex
  • Each business is still limited by the capacity of its own single database, so a single-database performance bottleneck remains and it is difficult to scale the data or improve performance further

2.3 Horizontal table sharding

2.3.1 Basic concepts

Horizontal table sharding distributes the rows of one table across multiple tables according to a rule applied to one field (or several fields). Each sharded table stores only part of the data.

2.3.2 Advantages

  • Optimize the performance problems caused by the large amount of data in a single table
  • Avoid I/O contention and reduce the chance of table locks

2.3.3 Disadvantages

  • Low cross-table JOIN/paging/sorting performance
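
To see why paging and sorting become expensive, consider a query such as `ORDER BY id LIMIT offset, limit` over several sharded tables. No single table knows which of its rows belong to the requested page, so each one has to return its first `offset + limit` rows, and the results are then merged and cut. The following is a simplified in-memory sketch of that process. The lists stand in for sharded tables, and the code is an illustration of the general problem, not ShardingSphere's actual merge logic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Simulates "ORDER BY id LIMIT offset, limit" over several sharded tables.
final class ShardedPaging {
    // Each inner list stands in for one sharded table holding order ids.
    static List<Long> page(List<List<Long>> shards, int offset, int limit) {
        List<Long> candidates = new ArrayList<>();
        for (List<Long> shard : shards) {
            // Every shard must return its first (offset + limit) rows,
            // because any of them might belong to the requested page.
            candidates.addAll(shard.stream()
                    .sorted()
                    .limit((long) offset + limit)
                    .collect(Collectors.toList()));
        }
        // Merge all candidates, then cut out the requested page.
        return candidates.stream()
                .sorted(Comparator.naturalOrder())
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Long>> shards = List.of(
                List.of(4L, 8L, 12L),
                List.of(1L, 5L, 9L),
                List.of(2L, 6L, 10L),
                List.of(3L, 7L, 11L));
        System.out.println(page(shards, 4, 3)); // [5, 6, 7]
    }
}
```

The number of candidate rows grows with both the offset and the number of sharded tables, which is why deep paging across shards is slow.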

2.3.4 Common horizontal table sharding methods

  • Hash modulo table sharding

    This is the method most commonly used to split tables. For example, an order table is split into 4 tables according to the result of orderId % 4

    • Advantages
      • Data is distributed relatively evenly across the shards, so it is not prone to hotspots or concurrent-access bottlenecks
    • Disadvantages
      • Queries that do not filter on the sharding field must span shards, which easily leads to complex cross-shard queries
  • Range table sharding

    Split by time interval or by a range of field values

    • Advantages
      • The size of a single table is controllable
      • Easy to expand
      • Effectively avoids cross-shard queries
    • Disadvantages
      • Hot data (for example, the most recent time range) is concentrated in one table and can become a performance bottleneck
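
The two methods above can be sketched as a pair of routing functions. The physical table names (`t_order_0`, `t_order_2023_04`) and the choice of one table per calendar month for the range method are assumptions made for illustration only.

```java
import java.time.LocalDate;
import java.util.Locale;

// Two common ways to pick the physical table for a logical order table.
// The table naming convention is hypothetical.
final class TableRouting {
    static final int TABLE_COUNT = 4;

    // Hash modulo: orderId % 4 selects one of t_order_0 .. t_order_3.
    static String byHashModulo(long orderId) {
        // floorMod keeps the index non-negative even for negative ids.
        long index = Math.floorMod(orderId, (long) TABLE_COUNT);
        return "t_order_" + index;
    }

    // Range by time interval: one table per calendar month (an assumed interval).
    static String byMonth(LocalDate createdOn) {
        return String.format(Locale.ROOT, "t_order_%d_%02d",
                createdOn.getYear(), createdOn.getMonthValue());
    }

    public static void main(String[] args) {
        System.out.println(byHashModulo(10L));                   // t_order_2
        System.out.println(byHashModulo(7L));                    // t_order_3
        System.out.println(byMonth(LocalDate.of(2023, 4, 15)));  // t_order_2023_04
    }
}
```

With hash modulo, consecutive order ids land in different tables, which spreads the load evenly. With the monthly range, all new orders land in the table for the current month, which keeps time-range queries within one table but concentrates writes there.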

2.4 Horizontal database sharding

2.4.1 Basic concepts

Horizontal table sharding only solves the problem of a large amount of data in a single table. It does not distribute the tables to different servers, so every table still competes for the CPU, memory, network, and disk I/O of the same host.

Horizontal database sharding splits the data of the same table into different databases according to a rule, so that the load is spread across those databases.
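
A minimal sketch of such a rule is shown below. It assumes a hypothetical layout of 2 databases with 4 order tables each, choosing the database by `userId % 2` and the table by `orderId % 4`. The field names, counts, and naming convention are illustrative assumptions, and real deployments would usually express the rule in the sharding configuration rather than in hand-written code.

```java
// Routes one order row to a database and to a table inside that database.
// The 2 x 4 layout and all names are hypothetical.
final class DatabaseTableRouting {
    static final int DATABASE_COUNT = 2;
    static final int TABLES_PER_DATABASE = 4;

    static String route(long userId, long orderId) {
        // floorMod keeps both indexes non-negative even for negative ids.
        long database = Math.floorMod(userId, (long) DATABASE_COUNT);
        long table = Math.floorMod(orderId, (long) TABLES_PER_DATABASE);
        return "ds_" + database + ".t_order_" + table;
    }

    public static void main(String[] args) {
        System.out.println(route(7L, 10L)); // ds_1.t_order_2
        System.out.println(route(8L, 13L)); // ds_0.t_order_1
    }
}
```

Because the database is chosen by user, all orders of one user stay in one database, so queries for a single user do not have to cross databases. Queries across users still do.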

2.4.2 Advantages

  • Optimize the performance problems caused by the large amount of data in a single table
  • Avoid I/O contention and reduce the chance of table locks

2.4.3 Disadvantages

  • Low cross-database JOIN/paging/sorting performance
