1. Introduction
In the previous article, "Spring Boot integrated ShardingSphere to realize data sharding (1) | Spring Cloud 40", we introduced the background and challenges of data sharding. This chapter walks through the advantages and disadvantages of each type of data sharding in the context of real business scenarios.
2. Types of data sharding
As covered in the previous article, data can be sharded either vertically or horizontally. In real business scenarios this gives four approaches: vertical table splitting, vertical database splitting, horizontal table splitting, and horizontal database splitting.
For the sharding algorithms supported by ShardingSphere, see: https://shardingsphere.apache.org/document/current/cn/dev-manual/sharding/
2.1 Vertical table splitting
2.1.1 Basic concepts
The fields of one table are grouped by business and split into several tables, each of which stores a subset of the fields.
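To make the split concrete, here is a minimal Java sketch using a hypothetical personnel table (the record and field names are assumptions for illustration, not a real schema). The hot fields and the cold fields become two records that share the same primary key, and rebuilding the full row requires reading both parts.

```java
// Hypothetical vertical table split: one wide personnel row becomes a hot "basic"
// row and a cold "detail" row that share the same primary key (id).
final class VerticalTableSplit {

    // Original wide row before the split.
    record Person(long id, String name, String phone, String address, String resume) {}

    // Hot fields, read on almost every request.
    record PersonBasic(long id, String name, String phone) {}

    // Cold fields, including the large resume text.
    record PersonDetail(long id, String address, String resume) {}

    static PersonBasic basicOf(Person p) {
        return new PersonBasic(p.id(), p.name(), p.phone());
    }

    static PersonDetail detailOf(Person p) {
        return new PersonDetail(p.id(), p.address(), p.resume());
    }

    // Rebuilding the full row means reading both tables by the shared id.
    static Person merge(PersonBasic basic, PersonDetail detail) {
        if (basic.id() != detail.id()) {
            throw new IllegalArgumentException("basic and detail rows have different ids");
        }
        return new Person(basic.id(), basic.name(), basic.phone(), detail.address(), detail.resume());
    }

    public static void main(String[] args) {
        Person p = new Person(1L, "Alice", "555-0100", "1 Main St", "A long resume text ...");
        PersonBasic basic = basicOf(p);
        PersonDetail detail = detailOf(p);
        System.out.println(basic);
        System.out.println(merge(basic, detail).equals(p)); // true
    }
}
```

Queries that only need the name and phone touch only the basic table, so its rows stay short; only screens that show the full profile pay for reading the detail table.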
2.1.2 Advantages
- After the split, business boundaries and split rules are clear
- Systems are easy to integrate and extend
- IO contention is avoided and the chance of locking the table is reduced
- Hot data can be accessed efficiently: frequently used (hot) fields and rarely used (cold) fields are stored separately, for example in a basic personnel information table and a detailed personnel information table, and large fields should always go into the cold-field table
Why is IO on large fields inefficient?
- The value itself is long, so reading it takes more time.
- Page usage: the page is the basic unit of database storage, and many lookup and positioning operations work page by page. The more rows a single page holds, the better the database performs overall. Large fields take up a lot of space, so fewer rows fit into each page and IO efficiency drops.
- Data is loaded into memory row by row. The shorter the rows, the more of them memory can hold, which reduces disk IO and improves database performance.
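The page argument can be checked with simple arithmetic. The sketch below assumes a 16 KB page (InnoDB's default page size), ignores page headers and per-row overhead, and uses 200 bytes and 2,000 bytes as hypothetical row sizes without and with a large field. Real engines such as InnoDB can also move long values to off-page storage, so the numbers only illustrate the trend.

```java
// Back-of-the-envelope estimate of how row size affects the number of pages read.
// Assumptions: a 16 KB page, no page or row overhead, all columns stored inline.
final class RowsPerPage {

    static final int PAGE_SIZE_BYTES = 16 * 1024;

    // How many rows of the given size fit into one page.
    static int rowsPerPage(int rowSizeBytes) {
        if (rowSizeBytes <= 0 || rowSizeBytes > PAGE_SIZE_BYTES) {
            throw new IllegalArgumentException("row size must be between 1 and " + PAGE_SIZE_BYTES + " bytes");
        }
        return PAGE_SIZE_BYTES / rowSizeBytes;
    }

    // Pages that must be read to scan the given number of rows (ceiling division).
    static long pagesToScan(long rows, int rowSizeBytes) {
        long perPage = rowsPerPage(rowSizeBytes);
        return (rows + perPage - 1) / perPage;
    }

    public static void main(String[] args) {
        System.out.println(rowsPerPage(200));             // 81
        System.out.println(rowsPerPage(2000));            // 8
        System.out.println(pagesToScan(1_000_000, 200));  // 12346
        System.out.println(pagesToScan(1_000_000, 2000)); // 125000
    }
}
```

Under these assumptions, adding about 1,800 bytes of large-field data to every row makes a full scan of one million rows read roughly ten times as many pages.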
2.1.3 Disadvantages
- It does not solve the problem of a single table holding too much data: the number of rows in each table is unchanged after the split
2.2 Vertical database splitting
2.2.1 Basic concepts
Vertical table splitting only reduces the size of a single table. It does not move tables to different servers, so all tables still compete for the CPU, memory, network, IO, and disk of the same host.
Vertical database splitting groups tables by business and places each group in a separate database, spreading the load across multiple databases. Its core idea is a dedicated database for each business.
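Conceptually, vertical database splitting is a fixed mapping from each table to the database that owns its business domain. In practice this mapping lives in the data source configuration of a framework such as ShardingSphere; the plain-Java sketch below only shows the idea, and every database and table name in it is hypothetical.

```java
import java.util.Map;

// Hypothetical table-to-database mapping for vertical database splitting.
final class VerticalDatabaseRouter {

    // Each business domain owns one database; its tables all live there.
    private static final Map<String, String> TABLE_TO_DATABASE = Map.of(
            "t_user", "user_db",
            "t_user_address", "user_db",
            "t_order", "order_db",
            "t_order_item", "order_db",
            "t_product", "product_db");

    static String databaseFor(String table) {
        String db = TABLE_TO_DATABASE.get(table);
        if (db == null) {
            throw new IllegalArgumentException("no database configured for table " + table);
        }
        return db;
    }

    public static void main(String[] args) {
        System.out.println(databaseFor("t_order"));        // order_db
        System.out.println(databaseFor("t_user_address")); // user_db
    }
}
```

Every table is still whole inside its database, which is why this approach alone does not help when one table grows too large.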
2.2.2 Advantages
- After the split, business boundaries and split rules are clear
- Systems are easy to integrate and extend
- Data is easy to maintain: the data of each business can be managed, maintained, monitored, and scaled separately
- In high-concurrency scenarios, business throughput can improve to some extent, and the impact of single-machine hardware bottlenecks is reduced
2.2.3 Disadvantages
- It still does not solve the problem of a single table holding too much data
- Business tables in different databases cannot be joined with SQL JOIN; the join has to be done through service interfaces, which increases system complexity
- Cross-database transactions are complex to handle
- A single business can still hit the performance limit of its own database, which makes it hard to scale its data and improve performance further
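Because a SQL JOIN cannot span two databases, the join moves into application code: query one database, collect the keys, fetch the matching rows from the other database in one batch, and merge them in memory. The sketch below assumes hypothetical Order and User records and a hypothetical UserClient interface standing in for the user service or user database; it is not a ShardingSphere API.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Application-side join across two vertically split databases (all names hypothetical).
final class ApplicationSideJoin {

    record Order(long orderId, long userId, int amountCents) {} // lives in order_db
    record User(long userId, String name) {}                    // lives in user_db
    record OrderView(long orderId, String userName, int amountCents) {}

    // Stand-in for a batched query or remote call against the user database.
    interface UserClient {
        List<User> findByIds(Set<Long> userIds);
    }

    static List<OrderView> joinOrdersWithUsers(List<Order> orders, UserClient users) {
        // 1. Collect the foreign keys from the first database's result.
        Set<Long> userIds = orders.stream().map(Order::userId).collect(Collectors.toSet());
        // 2. One batched lookup in the second database instead of a SQL JOIN.
        Map<Long, User> byId = users.findByIds(userIds).stream()
                .collect(Collectors.toMap(User::userId, Function.identity(), (a, b) -> a));
        // 3. Merge in memory; orders whose user is missing get "unknown".
        return orders.stream()
                .map(o -> {
                    User u = byId.get(o.userId());
                    return new OrderView(o.orderId(), u == null ? "unknown" : u.name(), o.amountCents());
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order(1, 10, 500), new Order(2, 11, 750), new Order(3, 10, 120));
        Map<Long, User> userTable = Map.of(10L, new User(10, "Alice"), 11L, new User(11, "Bob"));
        UserClient client = ids -> ids.stream()
                .map(userTable::get)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
        joinOrdersWithUsers(orders, client).forEach(System.out::println);
    }
}
```

The extra round trip, the batching logic, and the missing-row handling are the added complexity the list above refers to; cross-database transactions are not covered by this sketch.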
2.3 Horizontal table splitting
2.3.1 Basic concepts
Horizontal table splitting distributes the rows of one table across multiple tables according to a rule based on one field (or several fields); each shard table stores only part of the rows.
2.3.2 Advantages
- Relieves the performance problems caused by a large amount of data in a single table
- Reduces IO contention and the chance of locking tables
2.3.3 Disadvantages
- Poor performance for cross-table JOIN, paging, and sorting
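To see why paging is expensive, consider ORDER BY ... LIMIT offset, size over several shard tables. Any shard's first offset + size rows might land on the requested page, so each shard has to return that many rows, and the results are then merged and cut in memory. The sketch below models each shard table as a list of sort-key values; it illustrates the idea and is not ShardingSphere's merge engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Cross-shard paging: each shard table is modeled as a list of sort-key values.
final class CrossShardPaging {

    // Returns the global page (offset, size) ordered by the sort key.
    static List<Long> page(List<List<Long>> shards, int offset, int size) {
        long perShardLimit = (long) offset + size;
        List<Long> candidates = new ArrayList<>();
        for (List<Long> shard : shards) {
            // Same effect as "ORDER BY key LIMIT 0, offset + size" on one shard table.
            shard.stream().sorted().limit(perShardLimit).forEach(candidates::add);
        }
        // Merge all candidates, then apply the real offset and size.
        return candidates.stream().sorted().skip(offset).limit(size).collect(Collectors.toList());
    }

    // Total rows the shards must return to serve one page.
    static long rowsFetched(List<List<Long>> shards, int offset, int size) {
        long perShardLimit = (long) offset + size;
        return shards.stream().mapToLong(s -> Math.min(s.size(), perShardLimit)).sum();
    }

    public static void main(String[] args) {
        List<List<Long>> shards = List.of(
                List.of(1L, 4L, 7L, 10L, 13L),
                List.of(2L, 5L, 8L, 11L, 14L),
                List.of(3L, 6L, 9L, 12L, 15L));
        System.out.println(page(shards, 2, 2));        // [3, 4]
        System.out.println(rowsFetched(shards, 2, 2)); // 12
    }
}
```

Two rows are returned, yet twelve are read. The gap grows with both the offset and the number of shards, which is why deep paging across shard tables is slow.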
2.3.4 Common horizontal table splitting methods
- Hash modulo: commonly used for table splitting. For example, an order table is split into 4 tables by the result of orderId % 4.
  - Advantages
    - Data is distributed fairly evenly across the shards, so hot spots and concurrent-access bottlenecks are unlikely
  - Disadvantages
    - Cross-shard queries easily become complex
- Data range: split by time interval or by value range of a field
  - Advantages
    - The size of each table can be controlled
    - Easy to scale out: adding a new range does not require moving existing data
    - Range queries on the sharding field can avoid cross-shard queries
  - Disadvantages
    - Hot data can become a performance bottleneck
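The two routing rules above can be sketched as plain functions from a row's sharding value to a physical table name. The logical table t_order, the shard count of 4, and the one-table-per-year range are assumptions for illustration. In ShardingSphere the same rules are usually declared through built-in algorithms (for example an INLINE expression such as t_order_${order_id % 4}) rather than written by hand; see the algorithm documentation linked earlier.

```java
import java.time.LocalDate;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Two ways to choose the physical table for a row of the logical table t_order.
final class TableShardingStrategies {

    static final int TABLE_COUNT = 4;

    // Hash modulo: t_order_0 .. t_order_3, chosen by orderId % 4.
    static String byHashModulo(long orderId) {
        // floorMod keeps the index non-negative even for negative ids.
        return "t_order_" + Math.floorMod(orderId, TABLE_COUNT);
    }

    // Range: one table per calendar year of the order's creation date.
    static String byYearRange(LocalDate createdOn) {
        return "t_order_" + createdOn.getYear();
    }

    public static void main(String[] args) {
        String tables = LongStream.rangeClosed(1, 8)
                .mapToObj(TableShardingStrategies::byHashModulo)
                .collect(Collectors.joining(" "));
        System.out.println(tables); // t_order_1 t_order_2 t_order_3 t_order_0 t_order_1 t_order_2 t_order_3 t_order_0
        System.out.println(byYearRange(LocalDate.of(2023, 5, 20))); // t_order_2023
    }
}
```

Consecutive order ids rotate through all four tables, which keeps the data even, but a query over a range of ids has to visit every table. The yearly rule keeps one year's orders in one table, so a query for that year touches a single table, while all new writes land in the current year's table, which is the hot-spot risk listed above.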
2.4 Horizontal database splitting
2.4.1 Basic concepts
Horizontal table splitting only reduces the amount of data in a single table. It does not move tables to different servers, so all tables still compete for the CPU, memory, network, IO, and disk of the same host.
Horizontal database splitting distributes the rows of the same table across different databases according to a rule, spreading the load across multiple databases.
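Horizontal database splitting is often combined with table splitting, so a row is routed twice: first to a database, then to a table inside it. The sketch below assumes two hypothetical data sources ds_0 and ds_1 chosen by userId % 2, and four tables per database chosen by orderId % 4; the naming and the choice of sharding columns are illustrative, not prescribed by ShardingSphere.

```java
// Hypothetical two-level route: database by userId % 2, table by orderId % 4.
final class DatabaseAndTableRouter {

    static final int DATABASE_COUNT = 2;
    static final int TABLES_PER_DATABASE = 4;

    record Route(String dataSource, String table) {
        @Override
        public String toString() {
            return dataSource + "." + table;
        }
    }

    static Route route(long userId, long orderId) {
        String ds = "ds_" + Math.floorMod(userId, DATABASE_COUNT);
        String table = "t_order_" + Math.floorMod(orderId, TABLES_PER_DATABASE);
        return new Route(ds, table);
    }

    public static void main(String[] args) {
        System.out.println(route(7, 10)); // ds_1.t_order_2
        System.out.println(route(8, 3));  // ds_0.t_order_3
    }
}
```

Routing the database by userId keeps all of one user's orders in the same database, so per-user queries stay local. A query that filters only on some other column must still be sent to every database and table, which is where the cross-database costs listed under the disadvantages come from.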
2.4.2 Advantages
- Relieves the performance problems caused by a large amount of data in a single table
- Avoids IO contention on a single host and reduces the chance of locking tables
2.4.3 Disadvantages
- Poor performance for cross-database JOIN, paging, and sorting