1. Background
Sharding is the core feature of Apache ShardingSphere and one of its best-known capabilities. In the past, if users needed to shard databases and tables, a typical implementation process (excluding data migration) was as follows:
- Users need to understand the sharding strategy of each table precisely
- They need to know exactly which physical tables exist and which database each one lives in
- They then configure sharding rules based on this information
Take, for example, a scenario that splits the data into 8 databases × 4 tables. The data is then spread across 32 physical tables: t_order_0 through t_order_3 in each of the databases ds_0 through ds_7.
2. Pain points
In the sharding scenario above, a user of the sharding feature must have a clear picture of how the data tables are distributed in order to write a correct actualDataNodes rule. The sharding configuration for the t_order table above is as follows:
```yaml
tables:
  t_order:
    actualDataNodes: ds_${0..7}.t_order_${0..3}
    databaseStrategy:
      standard:
        shardingColumn: order_id
        shardingAlgorithmName: database_inline
    tableStrategy:
      standard:
        shardingColumn: order_id
        shardingAlgorithmName: table_inline
shardingAlgorithms:
  database_inline:
    type: INLINE
    props:
      algorithm-expression: ds_${order_id % 8}
  table_inline:
    type: INLINE
    props:
      algorithm-expression: t_order_${order_id % 4}
```
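To make the actualDataNodes line concrete: each `${start..end}` segment is a range, and an expression containing several segments expands to every combination of them. The Python sketch below imitates that expansion for simple integer ranges only. `expand_inline_ranges` is a hypothetical helper; ShardingSphere itself evaluates inline expressions as Groovy, which supports far more syntax than this.

```python
import itertools
import re

# Matches only ${start..end} integer ranges (a simplification for this sketch).
_RANGE = re.compile(r"\$\{\s*(\d+)\s*\.\.\s*(\d+)\s*\}")


def expand_inline_ranges(expression: str) -> list[str]:
    """Expand each ${a..b} segment and return every combination, in order."""
    # With two capture groups, split yields: text, a, b, text, a, b, ..., text
    parts = _RANGE.split(expression)
    texts = parts[0::3]
    ranges = [range(int(a), int(b) + 1) for a, b in zip(parts[1::3], parts[2::3])]
    results = []
    for combo in itertools.product(*ranges):
        pieces = [texts[0]]
        for value, text in zip(combo, texts[1:]):
            pieces.append(f"{value}{text}")
        results.append("".join(pieces))
    return results


nodes = expand_inline_ranges("ds_${0..7}.t_order_${0..3}")
print(len(nodes))  # 32
print(nodes[:4])   # ['ds_0.t_order_0', 'ds_0.t_order_1', 'ds_0.t_order_2', 'ds_0.t_order_3']
```

Listing the 32 combinations is a quick way to confirm that the declared nodes match the tables that actually exist in each database.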
Although ShardingSphere's configuration rules are standardized and concise, users still run into various problems in practice:
- They do not understand the sharding strategy or the configuration rules and do not know where to start
- The sharding configuration does not match the actual distribution of the data tables
- The configuration expression is malformed
- And so on
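The second problem can be subtle even when every expression is well-formed. The sketch below is a hypothetical check written in Python, not something ShardingSphere ships: it routes a range of order_id values through the two INLINE expressions from the configuration above and compares the result with the declared actualDataNodes. Because 4 divides 8, `order_id % 4` always equals `(order_id % 8) % 4`, so with these particular expressions each database only ever receives one of its four tables.

```python
from collections.abc import Iterable


def declared_nodes() -> set[str]:
    # Same set as actualDataNodes: ds_${0..7}.t_order_${0..3}
    return {f"ds_{db}.t_order_{tb}" for db in range(8) for tb in range(4)}


def route(order_id: int) -> str:
    # database_inline: ds_${order_id % 8}; table_inline: t_order_${order_id % 4}
    return f"ds_{order_id % 8}.t_order_{order_id % 4}"


def check_consistency(order_ids: Iterable[int]) -> tuple[set[str], set[str]]:
    """Return (routed but undeclared, declared but never routed) node sets."""
    declared = declared_nodes()
    routed = {route(order_id) for order_id in order_ids}
    return routed - declared, declared - routed


undeclared, unused = check_consistency(range(10_000))
print(len(undeclared))  # 0: no order is routed to a missing table
print(len(unused))      # 24: only 8 of the 32 declared tables ever receive rows
```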
3. Introducing AutoTable
To help users make better use of sharding, reduce configuration complexity, and improve the user experience, Apache ShardingSphere 5.0.0 introduces a new way to configure sharding: AutoTable.
As the name implies, an AutoTable is a table whose sharding layout ShardingSphere manages automatically. Users only need to specify the number of shards and the data sources to use; they no longer need to care about how the physical tables are distributed.
The configuration format is as follows:
```yaml
autoTables:
  t_order:
    # Data sources to use
    actualDataSources: ds_${0..7}
    shardingStrategy:
      standard:
        shardingColumn: order_id
        shardingAlgorithmName: mod
shardingAlgorithms:
  mod:
    type: MOD
    props:
      # Number of shards
      sharding-count: 32
```
With this configuration, ShardingSphere recognizes that the logical table t_order is to be split into 32 shards across 8 data sources. It then automatically calculates the 8 databases × 4 tables distribution, producing a result equivalent to the traditional configuration.
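The calculation can be approximated as follows. This Python sketch assumes that the physical tables are named t_order_0 through t_order_31 and placed on the data sources round-robin (table i on data source i % 8); that placement rule is an assumption made for illustration, so verify the generated nodes in your own deployment. `auto_table_layout` is a hypothetical helper.

```python
from collections import defaultdict


def auto_table_layout(logic_table: str, data_sources: list[str],
                      sharding_count: int) -> dict[str, list[str]]:
    """Assign <logic_table>_0 .. <logic_table>_{n-1} to data sources round-robin (assumed rule)."""
    layout = defaultdict(list)
    for index in range(sharding_count):
        data_source = data_sources[index % len(data_sources)]
        layout[data_source].append(f"{logic_table}_{index}")
    return dict(layout)


sources = [f"ds_{i}" for i in range(8)]             # actualDataSources: ds_${0..7}
layout = auto_table_layout("t_order", sources, 32)  # sharding-count: 32
print({ds: len(tables) for ds, tables in layout.items()})  # 4 tables per data source
print(layout["ds_0"])  # ['t_order_0', 't_order_8', 't_order_16', 't_order_24']
```

Because the shard count (32) is a multiple of the data source count (8), every data source ends up with exactly 4 tables, which is the 8 databases × 4 tables layout.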
4. Combining with DistSQL
In the article "DistSQL: Using Apache ShardingSphere Like a Database", Meng Haoran introduced the design motivation and syntax of DistSQL, and demonstrated with hands-on examples how SQL can be used to create distributed databases and tables, showing the new interactive experience that Apache ShardingSphere offers.
When DistSQL is used for data management, AutoTable greatly reduces the complexity of sharding configuration. Moreover, unlike traditional file-based configuration, sharding rules configured through DistSQL take effect immediately without a restart, so you need not worry that adjusting the rule for a single table will affect other online services.
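For comparison with the YAML form, the same AutoTable rule can be expressed as a single DistSQL statement. The Python helper below is hypothetical and only assembles the statement text. The keyword layout (`CREATE SHARDING TABLE RULE ... RESOURCES(...), SHARDING_COLUMN=..., TYPE(NAME=..., PROPERTIES(...))`) is assumed to follow the 5.0.0-era DistSQL syntax; later releases renamed some keywords (for example, resources became storage units), so check the DistSQL reference for your version before running the output.

```python
def build_create_sharding_table_rule(table: str, resources: list[str], column: str,
                                     algorithm: str, props: dict[str, object]) -> str:
    """Assemble a CREATE SHARDING TABLE RULE statement (5.0.0-style syntax, assumed)."""
    def render(value: object) -> str:
        # Numbers are written bare; everything else is double-quoted.
        return str(value) if isinstance(value, (int, float)) else f'"{value}"'

    prop_list = ",".join(f'"{key}"={render(value)}' for key, value in props.items())
    return (
        f"CREATE SHARDING TABLE RULE {table} (\n"
        f"RESOURCES({','.join(resources)}),\n"
        f"SHARDING_COLUMN={column},TYPE(NAME={algorithm},PROPERTIES({prop_list}))\n"
        ");"
    )


statement = build_create_sharding_table_rule(
    "t_order", [f"ds_{i}" for i in range(8)], "order_id", "mod", {"sharding-count": 32}
)
print(statement)
```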
5. Sharding algorithms supported by AutoTable
AutoTable supports all of the automatic sharding algorithms, including:
- MOD: modulo sharding algorithm
- HASH_MOD: hash modulo sharding algorithm
- VOLUME_RANGE: range sharding algorithm based on shard capacity
- BOUNDARY_RANGE: range sharding algorithm based on shard boundaries
- AUTO_INTERVAL: automatic time-interval sharding algorithm
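To give a feel for how two of these algorithms pick a shard, here is a rough Python sketch of MOD and BOUNDARY_RANGE for integer sharding values. The function names are hypothetical, and the boundary semantics (shard 0 below the first boundary, the last shard at or above the final boundary) are an assumed reading of the sharding-ranges property rather than ShardingSphere's actual source.

```python
from bisect import bisect_right


def mod_shard(value: int, sharding_count: int) -> int:
    """MOD: the shard index is the sharding value modulo sharding-count."""
    return value % sharding_count


def boundary_range_shard(value: int, boundaries: list[int]) -> int:
    """BOUNDARY_RANGE (assumed semantics): shard 0 is below boundaries[0],
    shard i covers [boundaries[i-1], boundaries[i]), and the last shard is >= boundaries[-1]."""
    # bisect_right counts how many boundaries are <= value, which is the shard index.
    return bisect_right(boundaries, value)


print(mod_shard(13, 32))   # 13 -> t_order_13
boundaries = [10, 20, 30]  # e.g. sharding-ranges: 10,20,30 -> 4 shards
print([boundary_range_shard(v, boundaries) for v in (5, 10, 29, 30, 100)])  # [0, 1, 2, 3, 3]
```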
Details and usage examples for these algorithms will be covered in the next chapter.