Sharding-JDBC in Practice (the most complete guide in history)


Before getting hands-on with Sharding-JDBC database and table sharding, it is necessary to understand some core concepts of sharding.

The background of database and table sharding:

The traditional approach of storing data centrally on a single data node can no longer meet Internet-scale requirements for performance, availability, and operations cost.

As business data grows, all of the data originally lives in one database, so network I/O and file I/O are concentrated on that single database. CPU, memory, file I/O, and network I/O can therefore all become system bottlenecks.

When the data volume of a business system approaches or exceeds the capacity of a single server, and QPS/TPS approaches or exceeds the processing limit of a single database instance,

the data is usually split vertically and horizontally, distributing both data services and data storage across multiple database servers.

Capacity bottleneck:

In terms of performance, most relational databases use B+ tree indexes. Once the data volume exceeds a certain size, the height of the B+ tree index grows, and every extra level adds one more I/O to each index scan.

When the data volume passes this threshold, the deeper index increases the number of disk I/Os per access, and query performance degrades.

What is a typical storage capacity? See the "3-High Architecture" seckill section.

吞吐量瓶颈:

At the same time, high-concurrency access makes a centralized database the biggest bottleneck of the system.

What is a typical throughput? See the "3-High Architecture" seckill section.

Since traditional relational databases cannot meet the needs of Internet scenarios, there have been more and more attempts to store data in NoSQL systems that natively support distribution.

But NoSQL is no panacea, and the position of relational databases remains unshakable.

How should you choose between SQL and NoSQL databases? See the push middle-platform architecture section.

Divide and conquer applied to storage

The divide-and-conquer pattern in storage: data sharding

Data sharding means distributing data that used to sit in a single database across multiple databases or tables along some dimension, in order to break through performance bottlenecks and improve availability.

The effective means of data sharding is to split a relational database into multiple databases and multiple tables.

Database sharding effectively disperses the access load on a single database node;

for the right time to shard databases, see the "3-High Architecture" seckill section.

Table sharding effectively relieves the query bottleneck that appears once data volume exceeds the tolerable threshold, solving MySQL's single-table performance problem;

for the right time to shard tables, see the "3-High Architecture" seckill section.

A multi-master, multi-replica sharding scheme avoids single points of data and thus improves the availability of the data architecture.

Splitting data across databases and tables keeps each table's volume below the threshold, and diverting traffic copes with high access volume; together these make sharding an effective means of handling high-concurrency, massive-data systems.

Data sharding can be done vertically or horizontally.

Problems introduced by sharding

Transaction problems caused by database sharding

That said, flexible (BASE-style) transactions are the norm today, so transaction performance across sharded databases is in practice quite good. For flexible transactions, see the Crazy Maker Circle topic post:

Distributed transaction interview questions (the most complete in history, continuously updated, highly recommended)

Introduction to Sharding-JDBC

Sharding-JDBC is Dangdang's open-source distributed data access library, well suited to microservices. It fully implements database/table sharding, read-write splitting, and distributed primary keys, and provides an initial implementation of flexible transactions.

Open-sourced in 2016, it has built up a solid foundation through several rounds of architectural refinement and stability polishing.

The official website is as follows:

http://shardingsphere.apache.org/index_zh.html

ShardingSphere is an ecosystem of open source distributed database middleware solutions, which consists of three independent products: Sharding-JDBC, Sharding-Proxy and Sharding-Sidecar.

They all provide standardized data sharding, distributed transactions, and database governance, and apply to a variety of scenarios such as homogeneous Java applications, heterogeneous languages, and cloud native.

Apache ShardingSphere is an ecosystem of open-source distributed database middleware solutions. It consists of three independent products, JDBC, Proxy, and Sidecar (under planning), which can also be deployed together. Each provides standardized data sharding, distributed transactions, and database governance for scenarios such as homogeneous Java applications, heterogeneous languages, and cloud native.

Apache ShardingSphere is positioned as relational database middleware: it aims to make full and reasonable use of the computing and storage capabilities of relational databases in distributed scenarios, rather than to implement a brand-new relational database. By focusing on what does not change, it captures the essence of the problem. Relational databases still hold a huge market and remain the cornerstone of every company's core business, and that will be hard to shake; at this stage the project prefers building on that base over overturning it.

The Apache ShardingSphere 5.x version began to work on a pluggable architecture, and the functional components of the project can be flexibly extended in a pluggable manner. At present, functions such as data sharding, read-write separation, data encryption, and shadow database pressure testing, as well as support for SQL and protocols such as MySQL, PostgreSQL, SQLServer, and Oracle, are all woven into the project through plug-ins. Developers can customize their own unique system like building blocks. Apache ShardingSphere currently provides dozens of SPIs as system extension points, and the number is still increasing.

ShardingSphere became a top-level project of the Apache Software Foundation on April 16, 2020.

Advantages of Sharding-JDBC

Sharding-JDBC directly encapsulates the JDBC API, which can be understood as an enhanced version of the JDBC driver, and the cost of old code migration is almost zero:

  • It works with any Java-based ORM framework, such as JPA, Hibernate, MyBatis, Spring JDBC Template, or with JDBC used directly.
  • It works with any third-party database connection pool, such as DBCP, C3P0, BoneCP, Druid, and so on.
  • In theory it supports any database that implements the JDBC specification. Although only MySQL is supported at present, there are plans to support databases such as Oracle and SQL Server.

Sharding-JDBC is positioned as a lightweight Java framework. It uses the client to directly connect to the database and provides services in the form of jar packages. There is no proxy layer, no additional deployment, no other dependencies, and the DBA does not need to change the original operation and maintenance method.

Sharding-JDBC has a flexible sharding strategy, supporting sharding on =, BETWEEN, and IN conditions, as well as multiple shard keys.

Its SQL parsing is comprehensive, supporting aggregation, grouping, sorting, LIMIT, OR, and other queries, as well as binding tables and Cartesian-product table queries.

Comparison with common open-source products

The following table lists only a few projects that are very influential in the field of database sharding:


As the table shows, Cobar (MyCat) is a middle-layer solution that places a proxy between the application and MySQL.

A middle layer sits between the application and the database and adds one hop of forwarding. A JDBC-protocol solution has no extra forwarding: the application connects directly to the database, which gives it a slight performance edge. That does not mean a middle layer is necessarily worse than a direct client connection; beyond performance there are many factors to weigh, and a middle layer makes monitoring, data migration, connection management, and similar functions more convenient.

Cobar-Client, TDDL, and Sharding-JDBC are all client-side direct-connection solutions.

The advantages of this approach are portability, compatibility, performance, and minimal impact on DBAs. Among them, Cobar-Client is implemented on top of an ORM framework (MyBatis), so its compatibility and extensibility fall short of the latter two, which are based on the JDBC protocol.


The two schemes in common use today are Cobar (MyCat) and Sharding-JDBC.

MyCAT

MyCAT is a community-driven secondary development on top of Alibaba's Cobar. It solved several problems Cobar had at the time and added many new features. The MyCAT community is very active,

and some companies are already using MyCAT.

Overall it enjoys strong support and will continue to be maintained. In its current version it is no longer a simple MySQL proxy.

Its backend supports mainstream databases such as MySQL, SQL Server, Oracle, DB2, and PostgreSQL, as well as MongoDB, a newer NoSQL store, with more storage types to come.

MyCAT is a powerful database middleware. Beyond read-write splitting, sharding, and disaster-recovery management, it can be used for multi-tenant application development and as cloud-platform infrastructure, giving your architecture strong adaptability and flexibility.

With the help of MyCAT's soon-to-be-released intelligent optimization module, the system's data-access bottlenecks and hotspots become clear at a glance. Based on those statistics and analyses, you can adjust the back-end storage automatically or manually, mapping different tables to different storage engines, without changing a single line of application code.

MyCAT is a version developed on the basis of Cobar, with two significant improvements:

  • The backend is changed from BIO to NIO, and the concurrency has been greatly improved;

  • Added aggregation functions for Order By, Group By, Limit, etc.

(Although Cobar also supports the ORDER BY, GROUP BY, and LIMIT syntax, the results are not aggregated but simply returned to the front end, so aggregation still has to be done by the business system itself. This suits large enterprises or large teams with dedicated maintenance staff.)

Sharding-JDBC

As described above, Sharding-JDBC is a lightweight Java framework: the client connects directly to the database and the service ships as a jar, with no proxy layer, no extra deployment, no other dependencies, and no change to the DBA's existing operations.

It is therefore well suited to small and medium-sized enterprises and teams.


Sharding-JDBC feature list

  • Database & table sharding
  • Read-write splitting
  • Distributed primary keys

Two major tasks of high-concurrency data sharding

In general, data sharding done from the development perspective mostly follows the horizontal mode (horizontal database sharding and table sharding).

Vertical sharding belongs mainly to the operations dimension, or to deep transformations of the storage layer.

The work of data sharding

To put it simply, the work of data sharding is divided into two major tasks:

The first major task: the splitting of shards

(The principles behind Elasticsearch and Redis Cluster data sharding are covered in their own sections below.)

Table splitting:

Split a large table t_order into several small tables t_order_0, t_order_1, ..., t_order_n with exactly the same table structure.

Each small table stores only part of the data of the large table.

The second major task: routing of shards

When a SQL statement is executed, the data is **routed** to different shards according to the routing strategy.

The problems we face:

  • Choice of shard key

  • Choice of sharding strategy

  • Choice of sharding algorithm

What is data sharding?

Dividing the data into several shards or partitions according to the sharding rules.


Main sharding algorithms

Range sharding

One approach is to divide by range: each shard holds a contiguous span of data, usually split by time range or value range. It is used less often because data skew occurs easily, with a large share of traffic hitting the newest data.

For example, with range sharding, store the numbers 1 to 100 on 3 nodes.

Sharding in order distributes the data evenly over the three nodes:

  • Numbers 1 to 33 are stored on node 1
  • Numbers 34 to 66 are stored on node 2
  • Numbers 67 to 100 are stored on node 3
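The range rule above can be sketched with a sorted-map lookup. This is a minimal sketch: the node names and the `TreeMap`-based lookup are illustrative assumptions, not part of any particular sharding product.

```java
import java.util.TreeMap;

// Minimal sketch of range sharding: keys 1..100 spread over 3 nodes.
public class RangeSharding {
    // Each entry maps the lower bound of a range to the node that owns it.
    private static final TreeMap<Integer, String> RANGES = new TreeMap<>();
    static {
        RANGES.put(1, "node1");   // 1..33
        RANGES.put(34, "node2");  // 34..66
        RANGES.put(67, "node3");  // 67..100
    }

    // floorEntry finds the greatest lower bound <= id, i.e. the owning range.
    public static String route(int id) {
        return RANGES.floorEntry(id).getValue();
    }

    public static void main(String[] args) {
        System.out.println(route(10) + " " + route(50) + " " + route(99));
    }
}
```

The same structure supports time-range sharding by using timestamps as keys; the skew problem mentioned above shows up here as all recent keys landing in the last entry.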


ID modulo sharding

This sharding rule divides the data into n parts (usually the number of data nodes is also n), so that the data is evenly distributed across the tables or nodes.

It is easy to scale.

ID modulo sharding is commonly used in relational database design.

For details, see the billion-row table architecture design in the seckill video.

Hash distribution

Apply a hash function to the key, then shard according to the hash result; this scatters the data and keeps the distribution fairly uniform.

Hash distribution comes in three sharding flavors:

  • Hash modulo sharding
  • Consistent hash sharding
  • Virtual slot sharding

Hash modulo sharding

For example, take the numbers 1 to 100: hash each number, then take the hash result modulo the node count. If the remainder is 1, the value is stored on the first node; if 2, on the second; if 0, on the third. This scatters the data and keeps the distribution fairly uniform.

In other words, with 100 records, each record is hashed, the result is taken modulo the node count, and the record lands on a node according to the remainder.
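The modulo rule can be sketched in a few lines. A minimal sketch; `hashCode` stands in for whatever hash function a real system would choose.

```java
// Minimal sketch of hash modulo sharding: hash the key, then take the
// remainder by the node count to pick a node index.
public class HashModuloSharding {
    public static int nodeIndex(String key, int nodeCount) {
        // floorMod avoids negative results when hashCode is negative
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        for (String k : new String[]{"order:1", "order:2", "order:3"}) {
            System.out.println(k + " -> node " + nodeIndex(k, 3));
        }
    }
}
```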


Hash modulo sharding is a very simple sharding method,

but it has a problem:

when nodes are added or removed, most of the data in the original nodes (often around 80%) maps to a different node, and the data has to be redistributed and migrated.

For hash modulo sharding, scaling by multiples is recommended: for example, if 3 nodes stored the data before, expand to 6 nodes, twice as many, so that only 50% of the data needs to move.

After migration, the first access cannot be served from the cache: the data must first be read from the database and written back to the cache before the migrated data can be read from the cache again.


Advantages of hash modulo sharding:

  • Simple configuration: hash the data and take the remainder

Disadvantages of hash modulo sharding:

  • Scaling the data nodes up or down triggers data migration
  • The amount migrated depends on how many nodes are added; doubling capacity is recommended

Consistent hash sharding

The principle of consistent hashing:

treat the whole hash space as a token ring

whose values range from 0 to 2^32.

Each data node is assigned a token range, and that node is responsible for storing the data whose tokens fall in its range.


Each key is hashed, and from the hash position you move clockwise along the ring to the nearest node; the key is stored on that node.
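The clockwise lookup can be sketched with a sorted map over the ring. A minimal sketch: the node names and the use of `hashCode` as the ring hash are illustrative assumptions (production systems use stronger hashes and virtual nodes).

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring sketch: a key goes to the first node found
// clockwise (i.e. at or after its hash position), wrapping around at the end.
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        ring.put(node.hashCode(), node);
    }

    public String route(String key) {
        // tailMap = all nodes clockwise from the key's position
        SortedMap<Integer, String> tail = ring.tailMap(key.hashCode());
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        ConsistentHashRing r = new ConsistentHashRing();
        r.addNode("n1"); r.addNode("n2"); r.addNode("n3");
        System.out.println(r.route("user:42"));
        // Adding a node only remaps keys between it and its predecessor.
        r.addNode("n5");
        System.out.println(r.route("user:42"));
    }
}
```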


Node expansion for consistent hash sharding

In the figure below:

  • Four keys hash to positions between node n1 and node n2; by the clockwise rule, those four keys are stored on node n2.

  • If a node n5 is added between n1 and n2, then a key that later hashes to a position between n1 and n5 will be stored on n5.

In the example below, after adding node n5:

  • Data migration happens only between n1 and n2
  • Nodes n3 and n4 are unaffected
  • The scope of data migration is greatly reduced

Likewise, with 1,000 nodes, adding one node affects at most 2/1000 of the token ranges. Consistent hashing is therefore generally used when there are many nodes: the more nodes, the smaller the share affected by scaling.


Sharding method: hash + clockwise lookup (an optimization over plain modulo)

Advantages of consistent hash sharding:

  • Consistent hashing solves the data-distribution problem in a distributed setting. In a cache system, for example, cache keys are mapped to nodes by consistent hashing; thanks to virtual nodes, the distribution is usually fairly uniform.
  • When a node scales, only the adjacent node is affected, though some data migration still occurs.

"But there is no silver bullet that can be applied to any scenario. So what are the shortcomings of the consistent hash algorithm in practice, or what scenarios are not applicable?"

Disadvantages of consistent hash sharding:

Consistent hashing balances load well at large data volumes, but with small-scale data a node may sit completely idle over a given period.

Virtual slot sharding (a variant of range sharding)

In its design Redis Cluster does not use consistent hashing; instead it shards data by introducing hash slots.

Virtual slot sharding is the sharding method Redis Cluster adopts.

Virtual slot sharding can be understood as a variant of range sharding, a mix of hash modulo sharding and range sharding: the hash value is taken modulo the slot count, the slots are divided into n contiguous segments, and each node is responsible for one segment.
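The slot-variant idea can be sketched as hash-to-slot plus a contiguous slot-to-node split. A minimal sketch: `hashCode` stands in for the real hash (Redis uses CRC16), and the even contiguous split is an assumption.

```java
// Minimal sketch of virtual slot sharding: hash a key into one of 16384
// slots, then find the node whose contiguous slot range contains that slot.
public class VirtualSlotSharding {
    static final int SLOTS = 16384;

    public static int slot(String key) {
        return Math.floorMod(key.hashCode(), SLOTS); // stand-in for CRC16
    }

    // Even contiguous split of the slot space across nodes.
    public static int nodeForSlot(int slot, int nodeCount) {
        int perNode = (SLOTS + nodeCount - 1) / nodeCount; // ceiling division
        return slot / perNode;
    }

    public static void main(String[] args) {
        int s = slot("user:1001");
        System.out.println("slot=" + s + " node=" + nodeForSlot(s, 3));
    }
}
```

Note how scaling only requires moving slot ranges between nodes, not rehashing every key: the key-to-slot mapping never changes.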


Two major tasks of es data fragmentation

Shards

A shard is a slice of an index. ES can divide a complete index into multiple shards, so that a large index can be split into pieces and distributed across different nodes, forming a distributed search.

The number of shards can only be specified before the index is created, and cannot be changed after the index is created. (why, everyone can think independently!)

Shard configuration suggestions:

Each shard should stay under 30 GB; even with good disks, staying under 100 GB is recommended.

(The official recommendation is 20 GB to 50 GB of data per shard.)

All in all, each shard is a Lucene instance. When a query hits ES, ES forwards the request to every shard, queries each, and finally merges the results.

So, other things being equal, the fewer the shards, the less the overhead.

routing mechanism

How does a piece of data land on the corresponding shard?

When indexing a document, the document is stored in a primary shard.

How does Elasticsearch know which shard a document should be stored in?

img

ES determines the route with the following formula:

shard_num = hash(_routing) % num_primary_shards


Here _routing is a variable value. It defaults to the document's _id (Elasticsearch's document ID, analogous to an auto-increment ID in a relational database), and it can also be set to a custom value.

_routing is run through a hash function to produce a number, which is then divided by num_primary_shards (the number of primary shards) to take the remainder.

This remainder, always between 0 and num_primary_shards - 1, is the position of the shard where the document lives.
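The formula can be sketched directly. A minimal sketch: Elasticsearch actually hashes `_routing` with murmur3, so `hashCode` here is only a stand-in.

```java
// Minimal sketch of shard_num = hash(_routing) % num_primary_shards.
public class EsRouting {
    // hashCode stands in for murmur3, which Elasticsearch really uses.
    public static int shardNum(String routing, int numPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }

    public static void main(String[] args) {
        // The same routing value always lands on the same shard...
        System.out.println(shardNum("doc-1", 5));
        // ...which is why the primary shard count cannot change after index
        // creation: a different divisor sends old routing values elsewhere.
        System.out.println(shardNum("doc-1", 6));
    }
}
```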


This explains why the number of primary shards is fixed at index creation and never changed:

if it changed, all previous routing values would become invalid and documents could never be found again.

Suppose you have an index with 100 shards. What happens when a request is executed on the cluster?

1. The search request is sent to some node.
2. The node that receives the request broadcasts the query to every shard of the index (primary or replica).
3. Each shard executes the query and returns its results.
4. The results are merged and sorted on the coordinating node and returned to the user.


Two major tasks of Redis Cluster data sharding

Virtual slot sharding (a hybrid of hash modulo sharding and range sharding)


In this sharding method:

  • Virtual slots are preset: each slot corresponds to a hash-value range, and each node is responsible for a certain range of slots.
  • Each key maps to a slot by taking its hash value modulo the slot count; each slot holds a subset of the data, and the slot count is generally much larger than the node count.

The range of preset virtual slots in Redis Cluster is 0 to 16383


The virtual-slot sharding result for a 3-node Redis cluster:

[root@localhost redis-cluster]# docker exec -it redis-cluster_redis1_1 redis-cli --cluster check 172.18.8.164:6001
172.18.8.164:6001 (c4cfd72f...) -> 0 keys | 5461 slots | 1 slaves.
172.18.8.164:6002 (c15a7801...) -> 0 keys | 5462 slots | 1 slaves.
172.18.8.164:6003 (3fe7628d...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 172.18.8.164:6001)
M: c4cfd72f7cbc22cd81b701bd4376fabbe3d162bd 172.18.8.164:6001
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: a212e28165b809b4c75f95ddc986033c599f3efb 172.18.8.164:6006
   slots: (0 slots) slave
   replicates 3fe7628d7bda14e4b383e9582b07f3bb7a74b469
M: c15a7801623ee5ebe3cf952989dd5a157918af96 172.18.8.164:6002
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 5e74257b26eb149f25c3d54aef86a4d2b10269ca 172.18.8.164:6004
   slots: (0 slots) slave
   replicates c4cfd72f7cbc22cd81b701bd4376fabbe3d162bd
S: 8fb7f7f904ad1c960714d8ddb9ad9bca2b43be1c 172.18.8.164:6005
   slots: (0 slots) slave
   replicates c15a7801623ee5ebe3cf952989dd5a157918af96
M: 3fe7628d7bda14e4b383e9582b07f3bb7a74b469 172.18.8.164:6003
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Routing mechanism for virtual slot sharding:

1. The 16384 slots are distributed evenly according to the number of nodes, each node managing its share.
2. Each key is hashed with the CRC16 function.
3. The hash result is taken modulo 16384.
4. The request is sent to the Redis node for that slot.
5. The receiving node checks whether the slot number falls within the range it manages:

  • If the slot is within its own range, it saves the data into that slot and returns the result
  • If the slot is outside its range, it forwards the data to the correct node, which saves it into the corresponding slot

Note that Redis Cluster nodes share messages with each other, so every node knows which node is responsible for which range of slots.
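The CRC16-modulo-16384 key-to-slot mapping can be reproduced exactly. The CRC16 variant below is CRC-16/XMODEM, the one Redis Cluster uses; hash tags (the `{...}` syntax in keys) are omitted from this sketch.

```java
// Sketch of Redis Cluster's key-to-slot routing:
// slot = CRC16(key) mod 16384, using the CRC-16/XMODEM variant.
public class RedisSlot {
    public static int crc16(byte[] data) {
        int crc = 0; // XMODEM: init 0x0000, poly 0x1021, no reflection
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    public static int slot(String key) {
        return crc16(key.getBytes()) % 16384;
    }

    public static void main(String[] args) {
        // 0x31C3 is the standard CRC-16/XMODEM check value for "123456789"
        System.out.printf("crc16 check: 0x%04X%n", crc16("123456789".getBytes()));
        System.out.println("slot(foo) = " + slot("foo"));
    }
}
```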

In the virtual slot scheme, each node manages a portion of the slots and the data lives in slots;

when nodes scale out or in, the slots can be redistributed and migrated, and no data is lost.

Two major tasks of data sharding in Sharding-JDBC

The first major task: the splitting of shards

Table splitting:

Split a large table t_order into several small tables t_order_0, t_order_1, ..., t_order_n with exactly the same table structure.

Each small table stores only part of the data of the large table.

Example: data sharding of the user table

Example: data sharding of the order table

The second major task: routing of shards

When a SQL statement is executed, the data is **routed** to different shards according to the routing strategy:

  • Data source routing

  • Table routing

The problems we face:

  • Choice of shard key
  • Choice of sharding strategy
  • Choice of sharding algorithm

Core concepts

Shard key

The shard key is the field used for sharding: the key field by which the database (or table) is split horizontally.

When sharding the data in a table, you must first choose a shard key (Shard Key), the field by which the data is split horizontally.

example:

Shard the order table by taking the modulus of the trailing digits of the order primary key; the order primary key is then the sharding field.

Choosing the table to execute on

After sharding the t_order table, when a SQL statement is executed, the target table is determined by taking the modulus of the order_id field: that decides in which table of which database the statement runs. The order_id field is then the shard key.
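The two-level choice (which database, then which table) can be sketched as two modulo steps. A minimal sketch: the database/table counts and the `ds0`/`t_order_0` naming are illustrative assumptions.

```java
// Sketch of two-level routing on order_id: 2 databases x 4 tables each.
public class OrderRouting {
    static final int DB_COUNT = 2;
    static final int TABLES_PER_DB = 4;

    // Returns a data-node name like "ds1.t_order_3".
    public static String dataNode(long orderId) {
        long db = (orderId / TABLES_PER_DB) % DB_COUNT; // spread across databases
        long table = orderId % TABLES_PER_DB;           // then across tables
        return "ds" + db + ".t_order_" + table;
    }

    public static void main(String[] args) {
        for (long id = 0; id < 8; id++) {
            System.out.println(id + " -> " + dataNode(id));
        }
    }
}
```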


Choosing the database to execute on (data source selection)

In this way, related data for the same order is stored in the same database table, greatly improving data-retrieval performance.

Notes

  • Besides a single field, sharding-jdbc also supports sharding with multiple fields as the shard key.

  • If the SQL contains no shard field, full routing (broadcast to every shard) is performed, with poor performance.

data node

A data node is the smallest indivisible unit of data (a table) in sharding; it consists of a data source name and a table name.

For example, ds1.t_user_0 in the figure above is a data node.

logical table

A logical table is the collective name for a group of tables with the same logic and data structure.

For example, we split the order table t_order into 10 tables, t_order_0 through t_order_9.

After sharding, the t_order table no longer exists in the database; it has been replaced by t_order_0 through t_order_9, yet in code we still write SQL against t_order.

At this point t_order is the logical table for those split tables.

For example, t_user in the figure above is a logical table.

Real table (physical table)

The real tables are the physical tables that actually exist in the database: the t_order_n tables mentioned above.

For example, t_user_0 in the figure above is a real table.

Sharding strategy

A sharding strategy is an abstract concept; the actual sharding is carried out by the sharding algorithm and the shard key.

What actually performs the sharding operation is shard key + sharding algorithm; together they form the sharding strategy.


The design is this way because the sharding algorithm is independent, so it is split out on its own.

ShardingSphere-JDBC favors flexibility: abstracting the sharding algorithm separately makes it easy for developers to extend.

Standard sharding strategy

The standard sharding strategy applies to a single shard key and supports two sharding algorithms: PreciseShardingAlgorithm and RangeShardingAlgorithm.

PreciseShardingAlgorithm is mandatory and handles = and IN sharding.

RangeShardingAlgorithm handles BETWEEN AND, >, <, >=, <= conditional sharding.

RangeShardingAlgorithm is optional; if it is not configured, such conditions in the SQL are routed to every database.


Composite sharding strategy

The composite sharding strategy corresponds to ComplexShardingStrategy.

It likewise supports =, >, <, >=, <=, IN, and BETWEEN AND operations in SQL statements.

The difference is that it supports multiple shard keys, and the details of how shards are assigned are entirely up to the application developer.

ComplexShardingStrategy supports multiple shard keys. Because the relationships among multiple shard keys are complex, it does not over-encapsulate; instead it passes the shard key-value combinations and sharding operators straight through to the sharding algorithm, to be implemented by the application developer, which provides maximum flexibility.

Row expression sharding strategy (inline sharding strategy)

The row expression sharding strategy supports = and IN operations in SQL statements, but only a single shard key.

It is usually used for simple sharding: no custom sharding algorithm is needed, and the rule is written directly in the configuration file.

t_order_$->{t_order_id % 4} means the t_order table is split into 4 tables by taking its t_order_id field modulo 4, the tables being named t_order_0 through t_order_3.
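For reference, a minimal sketch of how such an inline rule might appear in a Spring Boot configuration, assuming the ShardingSphere 4.x starter property names (verify against the version you actually use):

```properties
# Sketch: t_order split into 4 tables on one data source by t_order_id.
spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds0.t_order_$->{0..3}
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=t_order_id
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order_$->{t_order_id % 4}
```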

Forced sharding strategy (Hint sharding strategy)

The Hint sharding strategy shards by a directly specified shard value instead of extracting the shard value from the SQL.

For scenarios where the shard value is not determined by the SQL (it does not come from the shard key, or there is no shard key at all) but by external conditions, the Hint sharding strategy can be used.

The strategies above all parse the SQL statement, extract the shard key and shard value, and shard by the configured algorithm.

The Hint approach instead specifies the shard value manually, outside the SQL.

Example: in an internal system, the database is sharded by the logged-in employee's ID, but that field does not appear in the database.

No sharding strategy

Corresponds to NoneShardingStrategy: no sharding.

Strictly speaking, this is not a sharding strategy at all;

ShardingSphere simply provides the configuration option.

Sharding algorithm

We mentioned modulo-based rule sharding above, but that is only the simplest case.

In real development we also want to use conditions such as >=, <=, >, <, BETWEEN, and IN in sharding rules and to customize the sharding logic; that is where sharding strategies and sharding algorithms come in.

From the perspective of SQL execution, sharding can be seen as a routing mechanism that routes a SQL statement to the desired database or table and fetches the data; the sharding algorithm can be understood as the routing rule.

To recap the relationship: the sharding strategy is just an abstraction composed of a sharding algorithm and a shard key; the sharding algorithm performs the concrete sharding logic.

The sharding strategies for databases and for tables are configured independently and can use different strategies and algorithms. Each strategy can combine multiple sharding algorithms, and each algorithm can make its decision over multiple shard keys.


sharding-jdbc provides several sharding algorithms.

The abstract sharding algorithm is ShardingAlgorithm, which splits by type into precise, range, complex, and Hint sharding algorithms:

  • Precise sharding algorithm: the PreciseShardingAlgorithm class, mainly for = and IN sharding;
  • Range sharding algorithm: the RangeShardingAlgorithm class, mainly for BETWEEN AND, >, <, >=, <= sharding;
  • Complex sharding algorithm: the ComplexKeysShardingAlgorithm class, for sharding with multiple fields as shard keys;
  • Hint sharding algorithm: the HintShardingAlgorithm class, for Hint-based sharding scenarios;

Precise sharding algorithm: PreciseShardingAlgorithm

The precise sharding algorithm (PreciseShardingAlgorithm) is used with a single field as the shard key, for SQL containing = and IN conditions.

It must be used with StandardShardingStrategy.

Range sharding algorithm: RangeShardingAlgorithm

The range sharding algorithm (RangeShardingAlgorithm) is used with a single field as the shard key, for SQL containing BETWEEN AND, >, <, >=, <= conditions. It must be used with StandardShardingStrategy.
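The contract of a precise algorithm can be mimicked without the framework. A minimal sketch: the real PreciseShardingAlgorithm interface also receives the logical table name and is generic over the shard value type, which this sketch omits.

```java
import java.util.Arrays;
import java.util.Collection;

// Sketch of what a precise sharding algorithm does: given the candidate
// target names and one exact shard value (from = or IN), pick the target
// whose suffix matches value % targetCount. Names are illustrative.
public class PreciseRouting {
    public static String doSharding(Collection<String> availableTargetNames,
                                    long shardingValue) {
        long suffix = shardingValue % availableTargetNames.size();
        for (String target : availableTargetNames) {
            if (target.endsWith("_" + suffix)) {
                return target;
            }
        }
        throw new IllegalStateException("no target for value " + shardingValue);
    }

    public static void main(String[] args) {
        Collection<String> tables =
            Arrays.asList("t_order_0", "t_order_1", "t_order_2", "t_order_3");
        System.out.println(doSharding(tables, 7)); // 7 % 4 = 3
    }
}
```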

Composite sharding algorithm ComplexKeysShardingAlgorithm

The composite sharding algorithm (ComplexKeysShardingAlgorithm) handles scenarios where multiple columns together form the sharding key:

the values of all sharding keys are obtained at the same time, and the routing logic is processed across those fields.

The routing logic across multiple sharding keys can be complex, and application developers must handle that complexity themselves.

It needs to be used with the composite sharding strategy (ComplexShardingStrategy).
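As an illustration of multi-key routing, here is a hypothetical sketch (the real interface is ComplexKeysShardingAlgorithm; the helper below is invented for illustration) that shards a table by both user_id and order_type:

```java
import java.util.Map;

// Hypothetical multi-key routing: combine two sharding keys into one table suffix.
public class ComplexKeysDemo {

    // Route by (user_id % 2) and (order_type % 2), giving 4 possible target tables.
    static String doSharding(Map<String, Long> shardingValues) {
        long userPart = shardingValues.get("user_id") % 2;
        long typePart = shardingValues.get("order_type") % 2;
        return "t_order_" + userPart + "_" + typePart;
    }

    public static void main(String[] args) {
        // user_id 10 (even), order_type 3 (odd) -> t_order_0_1
        System.out.println(doSharding(Map.of("user_id", 10L, "order_type", 3L)));
    }
}
```

The point is that all sharding-key values arrive together, and the application code decides how to combine them.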

Hint Sharding Algorithm HintShardingAlgorithm

The Hint sharding algorithm (HintShardingAlgorithm) works a little differently.

The previous algorithms (such as those behind StandardShardingStrategy) parse the SQL statement, extract the sharding value, and route according to the configured sharding algorithm.

The Hint sharding algorithm does not extract the sharding value from SQL; instead, the value is set manually and used as the basis for routing.

It suits scenarios where the sharding value is not decided by the SQL, does not come from the sharding column, or the table has no sharding column at all, with the value determined by external conditions instead.

The sharding value must be specified through the Java API or similar means; this is also called mandatory routing, or forced routing.

Example: an internal system shards its databases by the employee ID used at login, but that column does not exist in the database tables.

SQL Hint supports both the Java API and SQL comments (the latter is yet to be implemented).
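The essence of Hint routing is that the sharding value arrives out-of-band rather than from the SQL text. In ShardingSphere-JDBC this is done through its HintManager Java API; the ThreadLocal-based sketch below only illustrates the idea and is not the real API:

```java
import java.util.Collection;
import java.util.List;

// Conceptual sketch of Hint (forced) routing: the sharding value is supplied
// externally (here via a ThreadLocal), never parsed out of the SQL statement.
public class HintRoutingDemo {
    private static final ThreadLocal<Long> HINT_VALUE = new ThreadLocal<>();

    static void setHint(long value) {
        HINT_VALUE.set(value);
    }

    // Route to the data source whose name ends with hintValue % 2.
    static String route(Collection<String> availableTargetNames) {
        Long hint = HINT_VALUE.get();
        if (hint == null) {
            throw new IllegalStateException("no hint value set");
        }
        String suffix = String.valueOf(hint % 2);
        return availableTargetNames.stream()
                .filter(name -> name.endsWith(suffix))
                .findFirst()
                .orElseThrow();
    }

    public static void main(String[] args) {
        setHint(7L); // e.g. an employee ID taken from the login session, not from SQL
        System.out.println(route(List.of("ds0", "ds1"))); // ds1
        HINT_VALUE.remove();
    }
}
```

In real code the same pattern appears as HintManager.getInstance() plus addDatabaseShardingValue(...), cleared when the scope ends.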

Sharding strategies of ShardingJDBC

The core of ShardingJDBC sub-database and sub-table lies in **configuring the sharding strategy + sharding algorithm**.

Our actual combat so far uses the inline sharding strategy, which defines the sharding algorithm through a sharding key plus a sharding expression.

This approach is simple to configure yet flexible. It is the preferred configuration for sub-database and sub-table, and handles most sharding scenarios with ease.

However, for more complex sharding requirements, such as multiple sharding keys or sharding by range, the inline sharding algorithm falls short.

Therefore, we also need to learn the other sharding strategies provided by ShardingSphere.

ShardingSphere currently provides five sharding strategies in total:

  • StandardShardingStrategy: standard sharding strategy

  • ComplexShardingStrategy: composite sharding strategy

  • HintShardingStrategy: Hint sharding strategy

  • InlineShardingStrategy: inline (row expression) sharding strategy

  • NoneShardingStrategy: no sharding

InlineShardingStrategy

The most commonly used sharding method.

How it works:

Routes according to the configured sharding expression.

Actual combat: using InlineShardingStrategy via the Java API

Inline sharding strategy

Sharding strategies basically correspond to the sharding algorithms above, and include: the standard sharding strategy, the composite sharding strategy, the Hint sharding strategy, the inline sharding strategy, and the no-sharding strategy;

  • Inline sharding strategy:

The corresponding InlineShardingStrategy class takes no sharding algorithm; the routing rules are implemented through expressions;

Inline sharding configuration class

In practice we do not use the strategy classes above directly. ShardingSphere-JDBC provides corresponding strategy configuration classes, including:

  • InlineShardingStrategyConfiguration

Inline sharding in practice

With the basic concepts above in place, let's do a simple hands-on exercise for each sharding strategy.

Prepare the databases and tables first;

For details, please refer to the video and the accompanying source code.

Prepare real data sources

Prepare two databases, ds0 and ds1; each database contains four tables (the sharding rules below route only to t_user_0 and t_user_1):

CREATE TABLE `t_user_0` (`user_id` BIGINT NOT NULL, `name` VARCHAR(45) NULL, PRIMARY KEY (`user_id`));
CREATE TABLE `t_user_1` (`user_id` BIGINT NOT NULL, `name` VARCHAR(45) NULL, PRIMARY KEY (`user_id`));
CREATE TABLE `t_user_2` (`user_id` BIGINT NOT NULL, `name` VARCHAR(45) NULL, PRIMARY KEY (`user_id`));
CREATE TABLE `t_user_3` (`user_id` BIGINT NOT NULL, `name` VARCHAR(45) NULL, PRIMARY KEY (`user_id`));


We have two data sources here, both configured in Java code:

  @Before

    public void buildShardingDataSource() throws SQLException {

        /*
         * 1. Data source map: dataSourceMap
         * 2. Sharding rules: shardingRuleConfig
         */

        DataSource druidDs1 = buildDruidDataSource(
                "jdbc:mysql://cdh1:3306/sharding_db1?useUnicode=true&characterEncoding=utf8&allowMultiQueries=true&useSSL=true&serverTimezone=UTC",
                "root", "123456");

        DataSource druidDs2 = buildDruidDataSource(
                "jdbc:mysql://cdh1:3306/sharding_db2?useUnicode=true&characterEncoding=utf8&allowMultiQueries=true&useSSL=true&serverTimezone=UTC",
                "root", "123456");
        // Configure the real data sources
        Map<String, DataSource> dataSourceMap = new HashMap<String, DataSource>();
        // Register the two data sources ds0 and ds1
        dataSourceMap.put("ds0", druidDs1);
        dataSourceMap.put("ds1", druidDs2);

        /**
         * Building the table rule requires:
         * 1. the logical table name;
         * 2. the actual data nodes;
         * 3. the primary key column;
         * 4. the database and table sharding rules.
         */
        // Configure the sharding rules
        ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();
        // Sharding rule for the user table
        TableRuleConfiguration userShardingRuleConfig = userShardingRuleConfig();
        shardingRuleConfig.getTableRuleConfigs().add(userShardingRuleConfig);
        // With multiple data sources, a default data source must be specified
        // (unnecessary when there is only one data source)
        shardingRuleConfig.setDefaultDataSourceName("ds0");

        Properties p = new Properties();
        // Print the routed SQL statements; disable in production
        p.setProperty("sql.show", Boolean.TRUE.toString());

        dataSource = ShardingDataSourceFactory.createDataSource(
                dataSourceMap, shardingRuleConfig, p);

    }

The two data sources configured here are ordinary data sources; the dataSourceMap is ultimately handed to ShardingDataSourceFactory for management;

Table rule configuration

The table rule configuration class TableRuleConfiguration contains five elements:

the logical table, the real data nodes, the database sharding strategy, the table sharding strategy, and the distributed primary key generation strategy;

  /**
     * Sharding rule for the user table
     */
    protected TableRuleConfiguration userShardingRuleConfig() {

        String logicTable = USER_LOGIC_TB;

        // The actual data nodes (ActualDataNodes)
        String actualDataNodes = "ds$->{0..1}.t_user_$->{0..1}";

        TableRuleConfiguration tableRuleConfig = new TableRuleConfiguration(logicTable, actualDataNodes);

        // Table sharding strategy
        // inline mode
        ShardingStrategyConfiguration tableShardingStrategy =
                new InlineShardingStrategyConfiguration("user_id", "t_user_$->{user_id % 2}");
        // custom mode
//        TableShardingAlgorithm tableShardingAlgorithm = new TableShardingAlgorithm();
//        ShardingStrategyConfiguration tableShardingStrategy = new StandardShardingStrategyConfiguration("user_id", tableShardingAlgorithm);

        tableRuleConfig.setTableShardingStrategyConfig(tableShardingStrategy);

        // Database sharding strategy (db rule via Groovy expression)
        // inline mode
        ShardingStrategyConfiguration dsShardingStrategy = new InlineShardingStrategyConfiguration("user_id", "ds${user_id % 2}");
        // custom mode
//        DsShardingAlgorithm dsShardingAlgorithm = new DsShardingAlgorithm();
//        ShardingStrategyConfiguration dsShardingStrategy = new StandardShardingStrategyConfiguration("user_id", dsShardingAlgorithm);
        tableRuleConfig.setDatabaseShardingStrategyConfig(dsShardingStrategy);
        tableRuleConfig.setKeyGeneratorConfig(new KeyGeneratorConfiguration("SNOWFLAKE", "user_id"));
        return tableRuleConfig;
    }

  • Logical table: The logical table configured here is t_user, and the corresponding physical tables are t_user_0 and t_user_1;

  • Real data nodes: row expressions are used here, which simplifies the configuration; the configuration above is equivalent to:

    ds0
      ├── t_user_0 
      └── t_user_1 
    ds1
      ├── t_user_0 
      └── t_user_1
    
    
  • Database sharding strategy:

    The database sharding strategy here is one of the five types introduced above.

    The InlineShardingStrategy used here is configured with an inline (Groovy) expression:

    
            // Database sharding strategy (db rule via Groovy expression)
            // inline mode
            ShardingStrategyConfiguration dsShardingStrategy =
                    new InlineShardingStrategyConfiguration("user_id", "ds${user_id % 2}");
    

    The sharding value here is the actual value of user_id, taken modulo 2 each time; the available target names are {ds0, ds1}; whichever database name the remainder matches is the one routed to;

  • Data table sharding strategy: configured in the same way; only the **sharding expression** (t_user_$->{user_id % 2}) differs, everything else is the same;

  • Distributed primary key generation strategy: ShardingSphere-JDBC provides a variety of distributed primary key generation strategies, which will be described in detail later, and the snowflake algorithm is used here;
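For intuition about the SNOWFLAKE key generator configured above, here is a minimal sketch of the classic snowflake layout (41-bit timestamp, 10-bit worker ID, 12-bit sequence). ShardingSphere's built-in generator follows this general layout but has its own epoch and implementation details; this sketch is illustrative only:

```java
// Minimal snowflake-style ID generator sketch: time-ordered, unique per worker.
public class SnowflakeSketch {
    private final long workerId;      // 0..1023 (10 bits)
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeSketch(long workerId) {
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;  // 12-bit sequence within one millisecond
            if (sequence == 0) {
                while (ts <= lastTimestamp) {   // sequence exhausted: spin to the next ms
                    ts = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        // [timestamp | workerId | sequence]
        return (ts << 22) | (workerId << 12) | sequence;
    }

    public static void main(String[] args) {
        SnowflakeSketch gen = new SnowflakeSketch(123);
        long a = gen.nextId();
        long b = gen.nextId();
        System.out.println(a + " < " + b + " : " + (a < b)); // IDs are strictly increasing
    }
}
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time, which is why snowflake keys work well as sharded primary keys.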

Groovy syntax description

Row expressions are intuitive to use: simply mark them with ${ expression } or $->{ expression } in the configuration.

They are currently supported when configuring data nodes and sharding algorithms.

The content of a row expression uses Groovy syntax; every operation Groovy supports, row expressions support as well. For example:
${begin..end} denotes a range interval
${[unit1, unit2, unit_x]} denotes an enumeration of values

If a row expression contains multiple consecutive ${ expression } or $->{ expression } parts, the final result of the whole expression is the Cartesian product of the results of each part.
For example, the following row expression:

${['online', 'offline']}_table${1..3}

will eventually resolve to:
online_table1, online_table2, online_table3, offline_table1, offline_table2, offline_table3
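The Cartesian expansion can be reproduced in a few lines of plain Java, as a sketch of what the row-expression engine effectively computes:

```java
import java.util.ArrayList;
import java.util.List;

// Expand ${['online', 'offline']}_table${1..3} by hand: the result is the
// Cartesian product of the enumeration part and the range part.
public class RowExpressionExpansion {

    static List<String> expand(List<String> prefixes, int begin, int end) {
        List<String> result = new ArrayList<>();
        for (String prefix : prefixes) {
            for (int i = begin; i <= end; i++) {
                result.add(prefix + "_table" + i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("online", "offline"), 1, 3));
        // [online_table1, online_table2, online_table3,
        //  offline_table1, offline_table2, offline_table3]
    }
}
```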

When configuring data nodes, for evenly distributed data nodes, if the data structure is as follows:

db0
├── t_order0
└── t_order1
db1
├── t_order0
└── t_order1

In row expressions this can be simplified to:
db${0..1}.t_order${0..1}
or
db$->{0..1}.t_order$->{0..1}
For custom data nodes, if the data structure is as follows:

db0
├── t_order0
└── t_order1
db1
├── t_order2
├── t_order3
└── t_order4

With row expressions this can be simplified to:
db0.t_order${0..1}, db1.t_order${2..4}
or
db0.t_order$->{0..1}, db1.t_order$->{2..4}

Configure sharding rules

The sharding rule configuration class ShardingRuleConfiguration covers the following configuration items:

table rule configuration, binding table configuration, broadcast table configuration, default data source name, default database sharding strategy, default table sharding strategy, default primary key generation strategy, master-slave rule configuration, and encryption rule configuration;

  • Table rule configuration tableRuleConfigs: the database and table sharding strategies configured above; the most commonly used configuration;
  • Binding table configuration bindingTableGroups: main tables and child tables that share the same sharding rules; multi-table joins between binding tables avoid Cartesian-product routing, which greatly improves join query efficiency;
  • Broadcast table configuration broadcastTables: tables that exist in every sharded data source, with identical structure and data in every database; suitable for small tables that need to be joined with the massive sharded tables;
  • Default data source name defaultDataSourceName: tables that are not sharded are routed to the default data source;
  • Default database sharding strategy defaultDatabaseShardingStrategyConfig: each table rule can set its own database sharding strategy; if none is set, this default applies;
  • Default table sharding strategy defaultTableShardingStrategyConfig: each table rule can set its own table sharding strategy; if none is set, this default applies;
  • Default primary key generation strategy defaultKeyGeneratorConfig: each table rule can set its own primary key generation strategy; if none is set, this default applies; UUID and SNOWFLAKE generators are built in;
  • Master-slave rule configuration masterSlaveRuleConfigs: used for read-write splitting; one master with multiple slaves can be configured, along with a load-balancing strategy across the slave databases on the read side;
  • Encryption rule configuration encryptRuleConfig: encrypts designated sensitive data, providing a complete, secure, transparent, and low-cost data encryption integration solution;

Practice: data insertion

With the preparation above complete, we can operate the database; here we perform an insert operation:


    /**
     * Insert test.
     */
    @Test
    public void testInsertUser() throws SQLException {

        /*
         * 1. Obtain the DataSource.
         * 2. Get a Connection from the DataSource.
         * 3. Define a SQL statement.
         * 4. Create a PreparedStatement from the Connection.
         * 5. Execute the SQL statement.
         * 6. Close the connection.
         */

        // 2. Get a Connection from the DataSource
        Connection connection = dataSource.getConnection();
        // 3. Define a SQL statement.
        // Note: the table used in the SQL is the LOGICAL table defined in the code above
        String sql = "insert into t_user(name) values('name-0001')";

        // 4. Create a PreparedStatement from the Connection
        PreparedStatement preparedStatement = connection.prepareStatement(sql);

        // 5. Execute the SQL statement
        preparedStatement.execute();

        sql = "insert into t_user(name) values('name-0002')";
        preparedStatement = connection.prepareStatement(sql);
        preparedStatement.execute();

        // 6. Close the connection
        preparedStatement.close();
        connection.close();
    }

A sharded data source ShardingDataSource is created from the real data sources, sharding rules, and properties configured above;

From here on, sub-databases and sub-tables can be operated just like a single database and table: the logical table is used directly in SQL, and the sharding algorithm routes each statement according to the actual values;

After routing: odd user_id values go to ds1.t_user_1, and even values go to ds0.t_user_0;
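The routing outcome can be sanity-checked with a tiny helper that mirrors the two inline expressions ds$->{user_id % 2} and t_user_$->{user_id % 2}. This is an illustrative stand-in, not part of ShardingSphere:

```java
// Mirror the inline expressions: both database and table are chosen by user_id % 2.
public class InlineRouteCheck {

    static String route(long userId) {
        long n = userId % 2;
        return "ds" + n + ".t_user_" + n;
    }

    public static void main(String[] args) {
        System.out.println(route(10000L)); // even -> ds0.t_user_0
        System.out.println(route(10001L)); // odd  -> ds1.t_user_1
    }
}
```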

Actual combat: data query

With the preparation above complete, here is a query operation:

  /**
     * Query test.
     */
    @Test
    public void testSelectUser() throws SQLException {

        /*
         * 1. Obtain the DataSource.
         * 2. Get a Connection from the DataSource.
         * 3. Define a SQL statement.
         * 4. Create a PreparedStatement from the Connection.
         * 5. Execute the SQL statement.
         * 6. Close the connection.
         */

        // 2. Get a Connection from the DataSource
        Connection connection = dataSource.getConnection();
        // 3. Define a SQL statement.
        // Note: the table used in the SQL is the LOGICAL table defined in the code above
        String sql = "select * from  t_user where user_id=10000";

        // 4. Create a PreparedStatement from the Connection
        PreparedStatement preparedStatement = connection.prepareStatement(sql);

        // 5. Execute the SQL statement
        ResultSet resultSet = preparedStatement.executeQuery();

        // 6. Close the connection
        preparedStatement.close();
        connection.close();
    }

Actual combat: Properties configuration of InlineShardingStrategy

Use InlineShardingStrategy through Properties configuration

Configuration parameters:

inline.sharding-column: the sharding key;

inline.algorithm-expression: the sharding expression

Configuration example

spring.shardingsphere.datasource.names=ds0,ds1
spring.shardingsphere.datasource.ds0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds0.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.ds0.filters=com.alibaba.druid.filter.stat.StatFilter,com.alibaba.druid.wall.WallFilter,com.alibaba.druid.filter.logging.Log4j2Filter
spring.shardingsphere.datasource.ds0.url=jdbc:mysql://cdh1:3306/sharding_db1?useUnicode=true&characterEncoding=utf8&allowMultiQueries=true&useSSL=true&serverTimezone=UTC
spring.shardingsphere.datasource.ds0.password=123456
spring.shardingsphere.datasource.ds0.username=root
spring.shardingsphere.datasource.ds0.maxActive=20
spring.shardingsphere.datasource.ds0.initialSize=1
spring.shardingsphere.datasource.ds0.maxWait=60000
spring.shardingsphere.datasource.ds0.minIdle=1
spring.shardingsphere.datasource.ds0.timeBetweenEvictionRunsMillis=60000
spring.shardingsphere.datasource.ds0.minEvictableIdleTimeMillis=300000
spring.shardingsphere.datasource.ds0.validationQuery=SELECT 1 FROM DUAL
spring.shardingsphere.datasource.ds0.testWhileIdle=true
spring.shardingsphere.datasource.ds0.testOnBorrow=false
spring.shardingsphere.datasource.ds0.testOnReturn=false
spring.shardingsphere.datasource.ds0.poolPreparedStatements=true
spring.shardingsphere.datasource.ds0.maxOpenPreparedStatements=20
spring.shardingsphere.datasource.ds0.connection-properties=druid.stat.mergeSql=true;druid.stat.slowSqlMillis=5000
spring.shardingsphere.datasource.ds1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds1.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.ds1.filters=com.alibaba.druid.filter.stat.StatFilter,com.alibaba.druid.wall.WallFilter,com.alibaba.druid.filter.logging.Log4j2Filter
spring.shardingsphere.datasource.ds1.url=jdbc:mysql://cdh1:3306/sharding_db2?useUnicode=true&characterEncoding=utf8&allowMultiQueries=true&useSSL=true&serverTimezone=UTC
spring.shardingsphere.datasource.ds1.password=123456
spring.shardingsphere.datasource.ds1.username=root
spring.shardingsphere.datasource.ds1.maxActive=20
spring.shardingsphere.datasource.ds1.initialSize=1
spring.shardingsphere.datasource.ds1.maxWait=60000
spring.shardingsphere.datasource.ds1.minIdle=1
spring.shardingsphere.datasource.ds1.timeBetweenEvictionRunsMillis=60000
spring.shardingsphere.datasource.ds1.minEvictableIdleTimeMillis=300000
spring.shardingsphere.datasource.ds1.validationQuery=SELECT 1 FROM DUAL
spring.shardingsphere.datasource.ds1.testWhileIdle=true
spring.shardingsphere.datasource.ds1.testOnBorrow=false
spring.shardingsphere.datasource.ds1.testOnReturn=false
spring.shardingsphere.datasource.ds1.poolPreparedStatements=true
spring.shardingsphere.datasource.ds1.maxOpenPreparedStatements=20
spring.shardingsphere.datasource.ds1.connection-properties=druid.stat.mergeSql=true;druid.stat.slowSqlMillis=5000




spring.shardingsphere.sharding.tables.t_user.actual-data-nodes=ds$->{0..1}.t_user_$->{0..1}
spring.shardingsphere.sharding.tables.t_user.table-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.tables.t_user.table-strategy.inline.algorithm-expression=t_user_$->{user_id % 2}
spring.shardingsphere.sharding.tables.t_user.database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.tables.t_user.database-strategy.inline.algorithm-expression=ds$->{user_id % 2}
spring.shardingsphere.sharding.tables.t_user.key-generator.column=user_id
spring.shardingsphere.sharding.tables.t_user.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_user.key-generator.props.worker.id=123

spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds$->{0..1}.t_order_$->{0..1}
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order_$->{user_id % 2}
spring.shardingsphere.sharding.tables.t_order.database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.tables.t_order.database-strategy.inline.algorithm-expression=ds$->{user_id % 2}
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123


spring.shardingsphere.sharding.binding-tables[0]=t_order,t_user


# Broadcast (common) table configuration
spring.shardingsphere.sharding.broadcast-tables=t_config
spring.shardingsphere.sharding.tables.t_config.key-generator.column=id
spring.shardingsphere.sharding.tables.t_config.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_config.key-generator.props.worker.id=123

Test case for row expression sharding strategy


    @Test
    public void testAddSomeUser() {

        for (int i = 0; i < 10; i++) {
            User dto = new User();

            dto.setName("user_" + i);

            // add a user
            entityService.addUser(dto);
        }


    }

    @Test
    public void testSelectAllUser() {
        // query all users
        List<User> all = entityService.selectAllUser();
        System.out.println(all);

    }


    @Test
    public void testSelectAll() {
        entityService.selectAll();
    }

Problems with the row expression sharding strategy

The row expression sharding strategy (InlineShardingStrategy) uses a Groovy expression in the configuration to support sharding on = and IN operations in SQL statements, and it supports only a single sharding key.

The row expression sharding strategy suits simple sharding algorithms, avoiding custom sharding algorithm classes and tedious code development; it is the simplest of the sharding strategies.

Its configuration is quite concise: the strategy is written as an expression in inline.algorithm-expression.

For example, ds-$->{order_id % 2} means taking order_id modulo 2, with the expression result appended after the ds- prefix, yielding the target databases ds-0 ... ds-n. Overall it is quite simple.

spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds$->{0..1}.t_order_$->{0..1}
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order_$->{user_id % 2}

Advantage:

quite concise

Problems with the row expression sharding strategy:

It does not support range sharding.

Range sharding is needed to handle shards containing BETWEEN AND, >, <, >=, <=.

For a concrete demonstration, please see the video.

Actual combat: using StandardShardingStrategy via the Java API

Usage scenarios of the standard sharding strategy

Usage scenario: the SQL statement contains =, IN, >, <, >=, <=, or BETWEEN AND operators; the standard sharding strategy applies to all of them.

The standard sharding strategy (StandardShardingStrategy) supports database sharding and table sharding based on a single sharding key (column) only.

It provides two sharding algorithms: PreciseShardingAlgorithm (precise sharding) and RangeShardingAlgorithm (range sharding).

The precise sharding algorithm must be implemented; it handles the sharding of SQL containing = and IN.

The range sharding algorithm is optional; it handles shards containing BETWEEN AND, >, <, >=, <=.

If no range sharding algorithm is configured and the SQL uses BETWEEN AND, LIKE, and so on, the SQL is executed against every database and table one by one via full routing, and query performance will be very poor; this deserves special attention.

Actual combat preparation

With the above related basic concepts, let’s do a simple actual combat for each sharding strategy.

Prepare libraries and tables first before actual combat;

For details, please refer to the video and supporting source code

Precise sharding is used to process shards containing = and IN.

Range sharding is used to handle shards containing BETWEEN AND, >, <, >=, <=.

Table rule configuration

The table rule configuration class TableRuleConfiguration contains five elements:

the logical table, the real data nodes, the database sharding strategy, the table sharding strategy, and the distributed primary key generation strategy;


  
    /**
     * Sharding rule for the table
     */
    protected TableRuleConfiguration userShardingRuleConfig() {

        String logicTable = USER_LOGIC_TB;

        // The actual data nodes (ActualDataNodes)
        String actualDataNodes = "ds$->{0..1}.t_user_$->{0..1}";

        // Cartesian product of the two expressions:
        // ds0.t_user_0
        // ds1.t_user_0
        // ds0.t_user_1
        // ds1.t_user_1

        TableRuleConfiguration tableRuleConfig = new TableRuleConfiguration(logicTable, actualDataNodes);

        // Table sharding strategy
        // inline mode:
//        ShardingStrategyConfiguration tableShardingStrategy =
//                new InlineShardingStrategyConfiguration("user_id", "t_user_$->{user_id % 2}");
        // custom mode: a standard strategy with a precise sharding algorithm
        PreciseOrderShardingAlgorithm preciseOrderShardingAlgorithm =
                new PreciseOrderShardingAlgorithm();

        ShardingStrategyConfiguration tableShardingStrategy =
                new StandardShardingStrategyConfiguration("user_id",
                        preciseOrderShardingAlgorithm);
        tableRuleConfig.setTableShardingStrategyConfig(tableShardingStrategy);

        // Database sharding strategy
        // inline mode:
//        ShardingStrategyConfiguration dsShardingStrategy = new InlineShardingStrategyConfiguration("user_id", "ds${user_id % 2}");
        // custom mode: a standard strategy with a precise sharding algorithm
        DsPreciseShardingAlgorithm dsPreciseShardingAlgorithm = new DsPreciseShardingAlgorithm();

        ShardingStrategyConfiguration dsShardingStrategy =
                new StandardShardingStrategyConfiguration("user_id",
                        dsPreciseShardingAlgorithm);

        tableRuleConfig.setDatabaseShardingStrategyConfig(dsShardingStrategy);

        tableRuleConfig.setKeyGeneratorConfig(new KeyGeneratorConfiguration("SNOWFLAKE", "user_id"));
        return tableRuleConfig;
    }


Database sharding strategy StandardShardingStrategyConfiguration

        ShardingStrategyConfiguration dsShardingStrategy =
                new StandardShardingStrategyConfiguration("user_id",
                        dsPreciseShardingAlgorithm);

The sharding value here is the actual value of user_id, taken modulo 2 each time; the available target names are {ds0, ds1}; whichever database name the remainder matches is the one routed to;

  • Data table sharding strategy: configured in the same way; only the **sharding algorithm** (routing to tables instead of databases) differs, everything else is the same;

  • Distributed primary key generation strategy: ShardingSphere-JDBC provides a variety of distributed primary key generation strategies, which will be described in detail later, and the snowflake algorithm is used here;

test case

With the preparation above complete, we can operate the database; here we run a query with an IN condition:

  @Test
    public void testSelectUserIn() throws SQLException {

        /*
         * 1. Obtain the DataSource.
         * 2. Get a Connection from the DataSource.
         * 3. Define a SQL statement.
         * 4. Create a PreparedStatement from the Connection.
         * 5. Execute the SQL statement.
         * 6. Close the connection.
         */

        // 2. Get a Connection from the DataSource
        Connection connection = dataSource.getConnection();
        // 3. Define a SQL statement.
        // Note: the table used in the SQL is the LOGICAL table defined in the code above
        String sql = "select * from  t_user where user_id in (10,11,23)";

        // 4. Create a PreparedStatement from the Connection
        PreparedStatement preparedStatement = connection.prepareStatement(sql);

        // 5. Execute the SQL statement
        ResultSet resultSet = preparedStatement.executeQuery();

        // 6. Close the connection
        preparedStatement.close();
        connection.close();
    }


A sharded data source ShardingDataSource is created from the real data sources, sharding rules, and properties configured above;

From here on, sub-databases and sub-tables can be operated just like a single database and table: the logical table is used directly in SQL, and the sharding algorithm routes according to the actual values;

The example above uses the most common precise sharding algorithm; let's continue with the other sharding algorithms;

Actual combat: using RangeShardingAlgorithm via the Java API, part 1

Sharding algorithms and sharding values

The four sharding algorithms

Recapping the algorithms introduced earlier:

  • Precise sharding algorithm PreciseShardingAlgorithm: a single-column sharding key, for = and IN conditions; used with StandardShardingStrategy.

  • Range sharding algorithm RangeShardingAlgorithm: a single-column sharding key, for BETWEEN AND, >, <, >=, <= conditions; used with StandardShardingStrategy.

  • Composite sharding algorithm ComplexKeysShardingAlgorithm: multiple columns together as the sharding key; the values of all sharding keys are obtained at once, and the application handles the multi-key routing logic itself; used with ComplexShardingStrategy.

  • Hint sharding algorithm HintShardingAlgorithm: the sharding value is specified manually through the Java API (mandatory/forced routing) rather than extracted from the SQL; suited to cases where the sharding value comes from external conditions, e.g. sharding by the employee ID used at login when that column is not in the table.

Four major fragmentation values

SQL hints can be supplied through the Java API and through SQL comments (the latter is still to be implemented).

ShardingSphere-JDBC provides a corresponding sharding value class (ShardingValue) for each sharding algorithm, including:

  • PreciseShardingValue
  • RangeShardingValue
  • ComplexKeysShardingValue
  • HintShardingValue

Range sharding algorithm in practice

The range sharding algorithm is used for interval/range queries, such as the following SQL:

select * from  t_user where user_id between 10 and 20

The two interval endpoints, 10 and 20, are stored directly in a RangeShardingValue; during database routing the algorithm walks the whole range, so both databases will be accessed.

The reference code is as follows (the following code is described in detail in the video):

public final class RangeOrderShardingAlgorithm implements RangeShardingAlgorithm<Integer> {

    @Override
    public Collection<String> doSharding(final Collection<String> availableTargetNames, final RangeShardingValue<Integer> shardingValue) {
        Collection<String> result = new HashSet<>(2);
        // Walk every value in the closed range and collect each target whose name ends with value % 2
        for (int i = shardingValue.getValueRange().lowerEndpoint(); i <= shardingValue.getValueRange().upperEndpoint(); i++) {
            for (String each : availableTargetNames) {
                System.out.println("shardingValue = " + shardingValue.getValueRange() + " target = " + each + "  i % 2 = " + i % 2);
                if (each.endsWith(String.valueOf(i % 2))) {
                    result.add(each);
                }
            }
        }
        return result;
    }
}

Test case:

    @Test
    public void testSelectUserBetween() throws SQLException {

        /*
         * 1. Obtain the DataSource.
         * 2. Get a Connection from the DataSource.
         * 3. Define a SQL statement.
         * 4. Get a PreparedStatement from the Connection.
         * 5. Execute the SQL statement.
         * 6. Close the connection.
         */

        // 2. Get a Connection from the DataSource
        Connection connection = dataSource.getConnection();
        // 3. Define a SQL statement.
        // Note: the table used in the SQL is the logic table defined in the configuration above
        String sql = "select * from  t_user where user_id between 10 and 20 ";

        // 4. Get a PreparedStatement from the Connection.
        PreparedStatement preparedStatement = connection.prepareStatement(sql);

        // 5. Execute the SQL statement.
        ResultSet resultSet = preparedStatement.executeQuery();

        // 6. Close the connection.
        preparedStatement.close();
        connection.close();
    }

In practice: using RangeShardingAlgorithm via the Java API, part 2

Exception: range unbounded on this side

With the algorithm above, executing the following test case throws an exception: range unbounded on this side.

You can run the following case to reproduce the exception.


    @Test
    public void testSelectUserBigThan() throws SQLException {

        /*
         * 1. Obtain the DataSource.
         * 2. Get a Connection from the DataSource.
         * 3. Define a SQL statement.
         * 4. Get a PreparedStatement from the Connection.
         * 5. Execute the SQL statement.
         * 6. Close the connection.
         */

        // 2. Get a Connection from the DataSource
        Connection connection = dataSource.getConnection();
        // 3. Define a SQL statement.
        // Note: the table used in the SQL is the logic table defined in the configuration above
        String sql = "select * from  t_user where user_id > 10000";

        // 4. Get a PreparedStatement from the Connection.
        PreparedStatement preparedStatement = connection.prepareStatement(sql);

        // 5. Execute the SQL statement.
        ResultSet resultSet = preparedStatement.executeQuery();

        // 6. Close the connection.
        preparedStatement.close();
        connection.close();
    }

Cause of the exception

The range in this query has no upper bound; when the algorithm asks RangeShardingValue for the upper endpoint, the underlying range object throws the exception.

Since the range has no bound, the simplest fix is to perform full routing directly.

Full routing for range shards without bounds

For interval/range queries without a bound, such as the following SQL:

select * from  t_user where user_id > 10000

The reference code is as follows (the following code is described in detail in the video):

public final class RouteInfinityRangeShardingAlgorithm implements RangeShardingAlgorithm<Integer> {

    @Override
    public Collection<String> doSharding(final Collection<String> availableTargetNames, final RangeShardingValue<Integer> shardingValue) {
        // The range may be unbounded, so route to all available targets (full routing)
        return new HashSet<>(availableTargetNames);
    }
}
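A more selective variant would fall back to full routing only when the range is actually unbounded. The routing decision itself can be modeled with plain Java; this is a stdlib-only sketch of the suffix-matching logic, independent of the ShardingSphere API (the `route` method and its null-means-unbounded convention are illustrative assumptions, not framework code):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

public class BoundedRangeRouting {

    /**
     * Routes a [lower, upper] range to the targets whose names end with value % 2.
     * A null endpoint means the range is unbounded on that side; in that case
     * we fall back to full routing, mirroring RouteInfinityRangeShardingAlgorithm.
     */
    static Collection<String> route(Integer lower, Integer upper, List<String> targets) {
        if (lower == null || upper == null) {
            return new LinkedHashSet<>(targets);   // full routing: no usable bound
        }
        Collection<String> result = new LinkedHashSet<>();
        for (int i = lower; i <= upper; i++) {
            for (String each : targets) {
                if (each.endsWith(String.valueOf(i % 2))) {
                    result.add(each);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dbs = Arrays.asList("ds0", "ds1");
        // Bounded range 10..20 touches both even and odd suffixes
        System.out.println(route(10, 20, dbs));
        // user_id > 10000 has no upper bound: full routing
        System.out.println(route(10001, null, dbs));
    }
}
```

In a real RangeShardingAlgorithm you would make the same decision with `shardingValue.getValueRange().hasLowerBound()` / `hasUpperBound()` before touching the endpoints, which avoids the "range unbounded on this side" exception entirely.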

In practice: configuring StandardShardingStrategy via properties

Use StandardShardingStrategy through a properties configuration.

Configuration parameters:

  • standard.sharding-column sharding key;

  • standard.precise-algorithm-class-name precise sharding algorithm class name;

  • standard.range-algorithm-class-name range sharding algorithm class name

Parameter standard.precise-algorithm-class-name:

standard.precise-algorithm-class-name points to a Java class that implements the PreciseShardingAlgorithm interface:

 io.shardingsphere.api.algorithm.sharding.standard.PreciseShardingAlgorithm

This implementation class performs precise sharding for = and IN conditions.

Parameter standard.range-algorithm-class-name:

Points to a Java class that implements the io.shardingsphere.api.algorithm.sharding.standard.RangeShardingAlgorithm interface.

This implementation class performs range sharding for BETWEEN conditions.

Example: com.crazymaker.springcloud.message.core.PreciseShardingAlgorithm

A note on the parameters:

Of the two algorithms used by StandardShardingStrategy, the precise sharding algorithm is mandatory, while the range sharding algorithm is optional.

Configuration example


spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds$->{0..1}.t_order_$->{0..1}
#spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=user_id
#spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order_$->{user_id % 2}
spring.shardingsphere.sharding.tables.t_order.table-strategy.standard.sharding-column=user_id
spring.shardingsphere.sharding.tables.t_order.table-strategy.standard.precise-algorithm-class-name=com.crazymaker.springcloud.sharding.jdbc.demo.core.TablePreciseShardingAlgorithmDemo
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.t_order.key-generator.props.worker.id=123
spring.shardingsphere.sharding.tables.t_order.database-strategy.standard.precise-algorithm-class-name=com.crazymaker.springcloud.sharding.jdbc.demo.core.DsPreciseShardingAlgorithmDemo
spring.shardingsphere.sharding.tables.t_order.database-strategy.standard.sharding-column=user_id
#spring.shardingsphere.sharding.tables.t_order.database-strategy.inline.sharding-column=user_id
#spring.shardingsphere.sharding.tables.t_order.database-strategy.inline.algorithm-expression=ds$->{user_id % 2}
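The precise-algorithm classes named in the configuration above are shown only in the video; the core decision of such a class is usually a modulo mapping from one = or IN value to exactly one target. A stdlib-only sketch of that decision (the class and method names here are illustrative, not the ShardingSphere API):

```java
import java.util.Arrays;
import java.util.List;

public class PreciseRouting {

    /**
     * Picks the single target whose name ends with shardingValue % targets.size().
     * This mirrors what a typical PreciseShardingAlgorithm implementation does
     * for = and IN conditions.
     */
    static String route(long shardingValue, List<String> targets) {
        String suffix = String.valueOf(shardingValue % targets.size());
        for (String each : targets) {
            if (each.endsWith(suffix)) {
                return each;
            }
        }
        throw new UnsupportedOperationException("no target for value " + shardingValue);
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("t_order_0", "t_order_1");
        // An odd user_id lands in the table with suffix 1
        System.out.println(route(704733680467685377L, tables));
    }
}
```

Unlike range sharding, precise sharding always resolves to exactly one target per value, which is why ShardingSphere makes the precise algorithm mandatory and the range algorithm optional.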

That's all for the written walkthrough; for more detail, please see the video.

In practice: the ComplexShardingStrategy composite sharding strategy

A limitation of the inline and standard sharding strategies:

only one sharding key can be used.

Question: how can multiple sharding keys participate in shard routing?

The ComplexSharding sharding strategy

The sharding strategies correspond roughly to the sharding algorithms above: standard, composite, hint, inline, and none.

  • Standard sharding strategy: the corresponding StandardShardingStrategy class uses two sharding algorithms, PreciseShardingAlgorithm and RangeShardingAlgorithm; PreciseShardingAlgorithm is mandatory and RangeShardingAlgorithm is optional;

    public final class StandardShardingStrategy implements ShardingStrategy {

        private final String shardingColumn;
        private final PreciseShardingAlgorithm preciseShardingAlgorithm;
        private final RangeShardingAlgorithm rangeShardingAlgorithm;
    }

  • Composite sharding strategy: the corresponding ComplexShardingStrategy class, which uses the ComplexKeysShardingAlgorithm sharding algorithm;

    public final class ComplexShardingStrategy implements ShardingStrategy {

        @Getter
        private final Collection<String> shardingColumns;
        private final ComplexKeysShardingAlgorithm shardingAlgorithm;
    }

    Note that it supports multiple sharding keys: shardingColumns is a collection.

  • Hint sharding strategy: the corresponding HintShardingStrategy class shards by a value specified through a hint rather than a value extracted from SQL; it uses the HintShardingAlgorithm sharding algorithm;

    public final class HintShardingStrategy implements ShardingStrategy {

        @Getter
        private final Collection<String> shardingColumns;
        private final HintShardingAlgorithm shardingAlgorithm;
    }

  • Inline sharding strategy: the corresponding InlineShardingStrategy class provides no sharding algorithm; routing rules are expressed with an inline expression;

  • None sharding strategy: the corresponding NoneShardingStrategy class, which does not shard;

ComplexSharding strategy configuration classes

In use, we do not instantiate the above strategy classes directly. ShardingSphere-JDBC provides a corresponding configuration class for each strategy, including:

  • StandardShardingStrategyConfiguration
  • ComplexShardingStrategyConfiguration
  • HintShardingStrategyConfiguration
  • InlineShardingStrategyConfiguration
  • NoneShardingStrategyConfiguration

/**
 * Complex sharding strategy configuration.
 */
@Getter
public final class ComplexShardingStrategyConfiguration implements ShardingStrategyConfiguration {

    private final String shardingColumns;

    private final ComplexKeysShardingAlgorithm shardingAlgorithm;

    public ComplexShardingStrategyConfiguration(final String shardingColumns,
                                                final ComplexKeysShardingAlgorithm shardingAlgorithm) {
        Preconditions.checkArgument(!Strings.isNullOrEmpty(shardingColumns), "ShardingColumns is required.");
        Preconditions.checkNotNull(shardingAlgorithm, "ShardingAlgorithm is required.");
        this.shardingColumns = shardingColumns;
        this.shardingAlgorithm = shardingAlgorithm;
    }
}

Composite sharding algorithm

The abstract sharding algorithm interface is ShardingAlgorithm, subdivided by type into precise, range, composite, and hint sharding algorithms:

  • Precise sharding algorithm: the PreciseShardingAlgorithm class, mainly used to shard on = and IN conditions;
  • Range sharding algorithm: the RangeShardingAlgorithm class, mainly used to shard on BETWEEN AND, >, <, >= and <= conditions;
  • Composite sharding algorithm: the ComplexKeysShardingAlgorithm class, used when multiple fields together act as the sharding key;
  • Hint sharding algorithm: the HintShardingAlgorithm class, used when sharding by a hint value;

All of the above are interfaces; the concrete implementation is left to the developer.

Customizing a composite sharding algorithm

Question: how can multiple sharding keys participate in shard routing?

For example, user_id and order_id both participate in sharding.

The sharding algorithm is as follows:


public class SimpleComplexKeySharding implements ComplexKeysShardingAlgorithm<Long> {

    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames,
                                         ComplexKeysShardingValue<Long> shardingValue) {
        Map<String, Collection<Long>> map = shardingValue.getColumnNameAndShardingValuesMap();

        Collection<Long> userIds = map.get("user_id");
        Collection<Long> orderIds = map.get("order_id");

        List<String> result = new ArrayList<>();
        // shard the table by the user_id and order_id sharding keys together
        for (Long userId : userIds) {
            for (Long orderId : orderIds) {
                Long innerShardingValue = userId + orderId;
                Long suffix = innerShardingValue % 2;

                for (String each : availableTargetNames) {
                    System.out.println("innerShardingValue = " + innerShardingValue + " target = " + each + " innerShardingValue % 2 = " + suffix);
                    if (each.endsWith(suffix + "")) {
                        result.add(each);
                    }
                }
            }
        }
        return result;
    }
}

Using the composite sharding algorithm in code

Multiple sharding keys can be used at the same time; for example, user_id and order_id can both act as sharding keys:

orderTableRuleConfig.setDatabaseShardingStrategyConfig(
		new ComplexShardingStrategyConfiguration("order_id,user_id", new SimpleComplexKeySharding()));
orderTableRuleConfig.setTableShardingStrategyConfig(
		new ComplexShardingStrategyConfiguration("order_id,user_id", new SimpleComplexKeySharding()));

As above, when configuring the database and table sharding strategies, the two sharding keys are specified as a comma-separated list.
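The effect of the pairing logic in SimpleComplexKeySharding can be sketched with plain Java. This is a stdlib-only model of the (user_id + order_id) % 2 decision, not the ShardingSphere API; the `route` helper is an illustrative assumption:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

public class ComplexKeyRouting {

    /**
     * For every (userId, orderId) pair, route to the target whose name
     * ends with (userId + orderId) % 2 -- the same decision made in
     * SimpleComplexKeySharding above.
     */
    static Collection<String> route(List<Long> userIds, List<Long> orderIds, List<String> targets) {
        Collection<String> result = new LinkedHashSet<>();
        for (long userId : userIds) {
            for (long orderId : orderIds) {
                String suffix = String.valueOf((userId + orderId) % 2);
                for (String each : targets) {
                    if (each.endsWith(suffix)) {
                        result.add(each);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("t_order_0", "t_order_1");
        // One even and one odd sum: both tables are hit
        System.out.println(route(Arrays.asList(1L), Arrays.asList(1L, 2L), tables));
    }
}
```

Note how an IN condition on either key fans out to every (userId, orderId) pair, so the result can cover several targets even though each single pair resolves to exactly one.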

Configure with properties

The complex sharding strategy supports multiple sharding keys.

Configuration parameters:

complex.sharding-columns the sharding keys (multiple, comma-separated);

complex.algorithm-class-name the sharding algorithm implementation class.

shardingColumns specifies multiple sharding columns.

algorithmClassName points to a Java class that implements the org.apache.shardingsphere.api.sharding.complex.ComplexKeysShardingAlgorithm interface and provides an algorithm that shards on multiple columns together.

For details, please see the video

Test Cases and Execution

see video

In practice: the HintShardingStrategy forced (hint) sharding strategy

Question: in some application scenarios, the sharding value does not appear in the SQL but lives in external business logic. What then?

Question 2: how do we shard by an external value?

For example:

I want to shard by month, or by hour.

I want to shard according to my mood.

A simple way to understand it

Simply put, with this sharding strategy the sharding key is no longer tied to the SQL statement; it is specified separately by the program.

For some complex statements, such as select count(*) from (select userid from t_user where userid in (1,3,5,7,9)), it is impossible to specify the sharding key through the SQL statement.

The hint strategy differs from the previous strategies:

  • The previous strategies extract the sharding columns and values from SQL and shard accordingly; this is Apache ShardingSphere's zero-intrusion approach to SQL.

If the SQL statement contains no sharding condition, sharding is impossible and full routing is required.

In some application scenarios, the sharding condition does not exist in SQL but in external business logic.

  • The hint strategy therefore provides a way to specify the sharding value from outside the SQL; in Apache ShardingSphere this mechanism is called Hint.

The hint sharding flow is as follows:

You can programmatically add sharding values through HintManager; they take effect only in the current thread. The hint strategy plus the hint algorithm then shard on those values.

Sharding strategy and sharding algorithm

ShardingSphere-JDBC introduces two separate concepts: the sharding algorithm and the sharding strategy.

The sharding key is of course also a core concept; a simple way to remember it is: sharding strategy = sharding algorithm + sharding key.

As for why it is designed this way: ShardingSphere-JDBC favors flexibility and abstracts the sharding algorithm separately so that developers can extend it easily.

Sharding algorithms

The abstract sharding algorithm interface is ShardingAlgorithm, subdivided by type into precise, range, composite, and hint sharding algorithms:

  • Precise sharding algorithm: the PreciseShardingAlgorithm class, mainly used to shard on = and IN conditions;
  • Range sharding algorithm: the RangeShardingAlgorithm class, mainly used to shard on BETWEEN AND, >, <, >= and <= conditions;
  • Composite sharding algorithm: the ComplexKeysShardingAlgorithm class, used when multiple fields together act as the sharding key;
  • Hint sharding algorithm: the HintShardingAlgorithm class, used when sharding by an externally supplied value;

All of the above are interfaces; the concrete implementation is left to the developer.

Sharding strategies

The sharding strategies correspond roughly to the sharding algorithms above: standard, composite, hint, inline, and none.

  • Hint sharding strategy: the corresponding HintShardingStrategy class shards by a value specified through a hint rather than a value extracted from SQL; it uses the HintShardingAlgorithm sharding algorithm;

    public final class HintShardingStrategy implements ShardingStrategy {

        @Getter
        private final Collection<String> shardingColumns;
        private final HintShardingAlgorithm shardingAlgorithm;
    }

  • Inline sharding strategy: the corresponding InlineShardingStrategy class provides no sharding algorithm; routing rules are expressed with an inline expression;

  • None sharding strategy: the corresponding NoneShardingStrategy class, which does not shard;

Sharding strategy configuration classes

In use, we do not instantiate the above strategy classes directly. ShardingSphere-JDBC provides a corresponding configuration class for each strategy, including:

  • StandardShardingStrategyConfiguration
  • ComplexShardingStrategyConfiguration
  • HintShardingStrategyConfiguration
  • InlineShardingStrategyConfiguration
  • NoneShardingStrategyConfiguration

Customizing a HintShardingAlgorithm sharding algorithm

Question: how do we shard by an external value?

I want to shard by month, or by hour.

I want to shard according to my mood.

The sharding algorithm is as follows:


public class SimpleHintShardingAlgorithmDemo implements HintShardingAlgorithm<Integer> {

    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames,
                                         HintShardingValue<Integer> hintShardingValue) {
        Collection<String> result = new HashSet<>(2);
        Collection<Integer> values = hintShardingValue.getValues();

        for (String each : availableTargetNames) {
            for (int shardingValue : values) {
                System.out.println("shardingValue = " + shardingValue + " target = " + each + " shardingValue % 2 = " + shardingValue % 2);
                if (each.endsWith(String.valueOf(shardingValue % 2))) {
                    result.add(each);
                }
            }
        }
        return result;
    }
}

Configure using code

// configure the database and table sharding strategies
orderTableRuleConfig.setDatabaseShardingStrategyConfig(new HintShardingStrategyConfiguration(new SimpleHintShardingAlgorithmDemo()));
orderTableRuleConfig.setTableShardingStrategyConfig(new HintShardingStrategyConfiguration(new SimpleHintShardingAlgorithmDemo()));

Configure with properties

  • Configuration parameter: hint.algorithm-class-name the sharding algorithm implementation class.

  • How it works:

    algorithmClassName points to a Java class that implements the org.apache.shardingsphere.api.sharding.hint.HintShardingAlgorithm interface. Example: com.roy.shardingDemo.algorithm.MyHintShardingAlgorithm

    This algorithm class still needs sharding values, which are supplied through HintManager.addDatabaseShardingValue (database sharding) and HintManager.addTableShardingValue (table sharding).

    Note that these sharding values are thread-isolated and valid only in the current thread, so it is usually recommended to close the HintManager immediately after use, or to open it in a try-with-resources block.

Using HintManager hints in code

In some application scenarios, the sharding condition does not exist in SQL but in external business logic.

Question: how do we shard by an external value?

I want to shard by month, or by hour.

I want to shard according to my mood.

You can programmatically add a sharding value through HintManager; it takes effect only in the current thread:


    @Test
    public void testAddSomeOrderByMonth() {

        for (int month = 1; month <= 12; month++) {
            final int index = month;
            new Thread(new Runnable() {

                @Override
                public void run() {
                    System.out.println("current month = " + index);
                    HintManager hintManager = HintManager.getInstance();
                    try {
                        hintManager.addTableShardingValue("t_order", index);
                        hintManager.addDatabaseShardingValue("t_order", index);

                        Order dto = new Order();
                        dto.setUserId(704733680467685377L);

                        // add the order
                        entityService.addOrder(dto);
                    } finally {
                        // release the ThreadLocal-backed hint so it cannot pollute other work
                        hintManager.close();
                    }
                }
            }).start();
        }
    }

Test Cases and Execution

see video

Hint implementation mechanism

Apache ShardingSphere uses ThreadLocal to manage sharding values. Sharding conditions can be added programmatically through HintManager, and they are valid only in the current thread.

Besides programmatic forced sharding routes, Apache ShardingSphere can also accept hints through special comments in SQL, so that developers can use this feature in a more transparent way.

SQL with a forced sharding route specified ignores the original sharding logic and is routed directly to the specified real data node.

Remember:

whenever ThreadLocal thread-local variables are involved, clean them up after execution so they do not pollute subsequent work, especially in thread-pool scenarios.

The use of Session is similar.
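The risk described above can be demonstrated with a minimal stdlib model of a ThreadLocal-backed hint holder. This is illustrative only (the class and methods are assumptions for the sketch); HintManager stores its sharding values the same way internally:

```java
import java.util.Optional;

public class ThreadLocalHint {

    // Models how HintManager stores the sharding value: one slot per thread
    private static final ThreadLocal<Integer> SHARDING_VALUE = new ThreadLocal<>();

    static void set(int value) {
        SHARDING_VALUE.set(value);
    }

    static Optional<Integer> get() {
        return Optional.ofNullable(SHARDING_VALUE.get());
    }

    // The equivalent of HintManager.close(): mandatory in pooled threads,
    // otherwise the next task reusing the same thread sees a stale hint
    static void clear() {
        SHARDING_VALUE.remove();
    }

    public static void main(String[] args) {
        set(7);                       // task 1 sets a hint
        System.out.println(get());    // visible on this thread only
        clear();                      // task 1 cleans up before releasing the thread
        System.out.println(get());    // task 2 on the same thread sees no stale hint
    }
}
```

Without the `clear()` call, a thread returned to a pool would carry the old sharding value into the next request, silently routing it to the wrong shard.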

Advantages and disadvantages of the hint sharding strategy

Scenario advantage:

the sharding value can be specified programmatically.

Performance advantage:

the hint sharding strategy does not build its routing entirely from the SQL parse tree; it bypasses part of SQL parsing.

For some complex statements, the hint sharding strategy may therefore perform better. This is only a possibility; confirming it requires analyzing the source code.

Usage restrictions

Hint routing has many restrictions when used:

-- UNION is not supported
SELECT * FROM t_order1 UNION SELECT * FROM t_order2
INSERT INTO tbl_name (col1, col2, ...) SELECT col1, col2, ... FROM tbl_name WHERE col3 = ?

-- multi-level subqueries are not supported
SELECT COUNT(*) FROM (SELECT * FROM t_order o WHERE o.id IN (SELECT id FROM t_order WHERE status = ?))

-- function calculations are not supported: ShardingSphere can only extract sharding values from SQL literals
SELECT * FROM t_order WHERE to_date(create_time, 'yyyy-mm-dd') = '2019-01-01';

It is also clear from this that even with the ShardingSphere framework, support for SQL statements after database and table sharding is still quite fragile.

In practice: the NoneShardingStrategyConfiguration no-sharding strategy

How to configure no sharding at all

Sharding strategies

The sharding strategies correspond roughly to the sharding algorithms above: standard, composite, hint, inline, and none.

  • None sharding strategy: the corresponding NoneShardingStrategy class, which does not shard;

Sharding strategy configuration classes

In use, we do not instantiate the above strategy classes directly; ShardingSphere-JDBC provides the corresponding configuration class:

  • NoneShardingStrategyConfiguration

Configure using code

Just configure NoneShardingStrategyConfiguration:

orderTableRuleConfig.setDatabaseShardingStrategyConfig(new NoneShardingStrategyConfiguration());
orderTableRuleConfig.setTableShardingStrategyConfig(new NoneShardingStrategyConfiguration());

Configure with properties

see video

In this way the data is inserted into every table in every database, which is effectively a broadcast table.

In practice: principles and operation of broadcast tables

What is a broadcast table?

A table that exists in all data sources, with an identical structure and identical data in every database.

It is typically a dictionary or configuration table, such as t_config.

Once a table is configured as a broadcast table, modifying the broadcast table in any one database synchronizes the data to the broadcast table in all data sources.

There is a common situation: the table structure and the data in the table are identical in every database, as with a dictionary table. What should be done then? This is the situation broadcast tables were created for.

Definition: a table that exists in all sharded data sources, with identical structure and data in every database.
Suitable for: tables with a small amount of data that need to be joined with massive sharded tables, such as dictionary tables.

A broadcast table must satisfy the following requirements:
(1) the table exists in every database with the same structure;
(2) on save, the same data is inserted into every database.

Configure using code

Just add the table to the broadcast table list:

        // broadcast table configuration
        shardingRuleConfig.getBroadcastTables().add("t_config");

Configure with properties

spring.shardingsphere.sharding.broadcast-tables=t_config

For a specific demonstration, please see the video

The effect of the broadcast table

The result of the operation is as follows:

When adding a record, the same data is saved in both ds0 and ds1.
When querying, one data source is randomly selected.

Insert


[main] INFO  ShardingSphere-SQL - Logic SQL: insert into t_config (status, id) values (?, ?)
[main] INFO  ShardingSphere-SQL - SQLStatement: InsertStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.InsertStatement@61be6051, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@13c18bba), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@13c18bba, columnNames=[status, id], insertValueContexts=[InsertValueContext(parametersCount=2, valueExpressions=[ParameterMarkerExpressionSegment(startIndex=42, stopIndex=42, parameterMarkerIndex=0), ParameterMarkerExpressionSegment(startIndex=45, stopIndex=45, parameterMarkerIndex=1)], parameters=[UN_KNOWN, 1])], generatedKeyContext=Optional.empty)
[main] INFO  ShardingSphere-SQL - Actual SQL: ds0 ::: insert into t_config (status, id) values (?, ?) ::: [UN_KNOWN, 1]
[main] INFO  ShardingSphere-SQL - Actual SQL: ds1 ::: insert into t_config (status, id) values (?, ?) ::: [UN_KNOWN, 1]

Query

[main] INFO  o.h.h.i.QueryTranslatorFactoryInitiator - HHH000397: Using ASTQueryTranslatorFactory
[main] INFO  ShardingSphere-SQL - Logic SQL: select configenti0_.id as id1_0_, configenti0_.status as status2_0_ from t_config configenti0_ limit ?
[main] INFO  ShardingSphere-SQL - SQLStatement: SelectStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.SelectStatement@784212, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@5ac646b3), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@5ac646b3, projectionsContext=ProjectionsContext(startIndex=7, stopIndex=66, distinctRow=false, projections=[ColumnProjection(owner=configenti0_, name=id, alias=Optional[id1_0_]), ColumnProjection(owner=configenti0_, name=status, alias=Optional[status2_0_])]), groupByContext=org.apache.shardingsphere.sql.parser.binder.segment.select.groupby.GroupByContext@24b38e8f, orderByContext=org.apache.shardingsphere.sql.parser.binder.segment.select.orderby.OrderByContext@5cf072ea, paginationContext=org.apache.shardingsphere.sql.parser.binder.segment.select.pagination.PaginationContext@1edac3b4, containsSubquery=false)
[main] INFO  ShardingSphere-SQL - Actual SQL: ds1 ::: select configenti0_.id as id1_0_, configenti0_.status as status2_0_ from t_config configenti0_ limit ? ::: [3]
[ConfigBean(id=1, status=UN_KNOWN), ConfigBean(id=704836248892059648, status=UN_KNOWN0), ConfigBean(id=704836250150350849, status=UN_KNOWN1)]


In practice: binding tables

Binding tables: a main table and its subordinate tables that share consistent sharding rules.

For example, the t_order order table and the t_order_item order item table are both sharded by the order_id field, so the two tables can be bound to each other.

What is the significance of binding tables?

In business code, tables such as t_order and t_order_item are usually joined, but after database and table sharding each of them is split into N physical tables.

If the binding relationship is not configured, the join degenerates into a Cartesian product across the physical tables, generating the following four SQLs.

Effect without binding tables


[main] INFO  ShardingSphere-SQL - Logic SQL: SELECT a.* FROM `t_order` a left join `t_user` b on a.user_id=b.user_id  where  a.user_id=?
....
[main] INFO  ShardingSphere-SQL - Actual SQL: ds1 ::: SELECT a.* FROM `t_order_1` a left join `t_user_1` b on a.user_id=b.user_id  where  a.user_id=? ::: [704733680467685377]
[main] INFO  ShardingSphere-SQL - Actual SQL: ds1 ::: SELECT a.* FROM `t_order_1` a left join `t_user_0` b on a.user_id=b.user_id  where  a.user_id=? ::: [704733680467685377]
[order_id: 704786564605521921, user_id: 704733680467685377, status: NotPayed, order_id: 704786564697796609, ....]

Effect with binding tables

[main] INFO  ShardingSphere-SQL - Logic SQL: SELECT a.* FROM `t_order` a left join `t_user` b on a.user_id=b.user_id  where  a.user_id=?
[main] INFO  ShardingSphere-SQL - SQLStatement: SelectStatementContext(super=CommonSQLStatementContext(sqlStatement=org.apache.shardingsphere.sql.parser.sql.statement.dml.SelectStatement@4247093b, tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@7074da1d), tablesContext=org.apache.shardingsphere.sql.parser.binder.segment.table.TablesContext@7074da1d, projectionsContext=ProjectionsContext(startIndex=7, stopIndex=9, distinctRow=false, projections=[ShorthandProjection(owner=Optional[a], actualColumns=[ColumnProjection(owner=a, name=order_id, alias=Optional.empty), ColumnProjection(owner=a, name=user_id, alias=Optional.empty), ColumnProjection(owner=a, name=status, alias=Optional.empty)])]), groupByContext=org.apache.shardingsphere.sql.parser.binder.segment.select.groupby.GroupByContext@5bdb6ea8, orderByContext=org.apache.shardingsphere.sql.parser.binder.segment.select.orderby.OrderByContext@3e55eeb9, paginationContext=org.apache.shardingsphere.sql.parser.binder.segment.select.pagination.PaginationContext@44a13699, containsSubquery=false)
[main] INFO  ShardingSphere-SQL - Actual SQL: ds1 ::: SELECT a.* FROM `t_order_1` a left join `t_user_1` b on a.user_id=b.user_id  where  a.user_id=? ::: [704733680467685377]
[order_id: 704786564605521921, user_id: 704733680467685377, status: NotPayed, order_id: 704786564697796609, user_id: 704733680467685377, status: NotPayed, order_id: 704786564790071297, user_id: 704733680467685377, .....]

SQL execution process of Sharding-JDBC

Sharding-JDBC extends the original JDBC interfaces such as DataSource and Connection into ShardingDataSource and ShardingConnection.

The sharding API exposed to the application is fully consistent with the interfaces defined in the JDBC specification. Anyone familiar with JDBC can easily apply Sharding-JDBC to implement sub-database and sub-table.

img

A table is split into multiple sub-tables after being divided into databases and tables, and distributed to different databases.

Without requiring any changes to the original business SQL, Sharding-JDBC must still make some modifications to the SQL behind the scenes for it to execute correctly.

The general execution process consists of six steps: SQL parsing -> query optimization -> SQL routing -> SQL rewriting -> SQL execution -> result merging. Let's take a look at what each step does.

img

SQL parsing

Lexical analysis first splits the SQL into indivisible tokens; syntax analysis then converts them into an abstract syntax tree. By traversing the abstract syntax tree, the context required for sharding is extracted.

The context includes query field information (Field), table information (Table), query conditions (Condition), sorting information (Order By), grouping information (Group By), and paging information (Limit), etc., and marks the positions in the SQL that may need to be rewritten.

For example, the following SQL:

SELECT id, name FROM t_user WHERE status = 'ACTIVE' AND age > 18

SQL parsing engine

Compared to other programming languages, SQL is relatively simple. However, it is still a complete programming language, so parsing SQL syntax is not fundamentally different from parsing other programming languages (such as Java, C, or Go).

function points

• Provide independent SQL parsing function

• It is very convenient to expand and modify the grammar rules (using ANTLR)

• Supports SQL parsing in multiple dialects

Database support status:

  • MySQL: supported, complete
  • PostgreSQL: supported, complete
  • SQLServer: supported
  • Oracle: supported
  • SQL92: supported

History

SQL parsing is the core of sub-database and sub-table products, and its performance and compatibility are the most important metrics. ShardingSphere's SQL parser has undergone three generations of product update iterations.

The first-generation SQL parser, used in versions up to 1.4.x, adopted Druid as the SQL parser in pursuit of performance and fast implementation. In actual tests, its performance far exceeded other parsers.

The second-generation SQL parser started from version 1.5.x, when ShardingSphere adopted a fully self-developed SQL parsing engine. Because its purpose is different, ShardingSphere does not need to convert SQL into a complete abstract syntax tree, nor does it need a second traversal via the visitor pattern. It adopts a "semi-understanding" approach to SQL, extracting only the context that data sharding needs to pay attention to, so the performance and compatibility of SQL parsing were further improved.

The third-generation SQL parser, starting from version 3.0.x, uses ANTLR as the generator of the SQL parsing engine, and uses the visitor pattern to obtain the SQLStatement from the AST. Starting from version 5.0.x, the architecture of the parsing engine was refactored; at the same time, the AST obtained from the first parse is placed into a cache, so the parse result of an identical SQL can be fetched directly the next time, improving parsing efficiency.

Because of this cache, the official recommendation is to use PreparedStatement, the precompiled SQL approach, to improve performance.

abstract syntax tree

The parsing process is divided into lexical parsing and syntax parsing. The lexical analyzer disassembles SQL into indivisible atomic symbols called tokens, which are classified into keywords, expressions, literals, and operators according to the dictionaries of the different database dialects. A syntax parser then converts the token stream into an abstract syntax tree.

SQL routing principle of Sharding-JDBC

SQL routing parses the sharding context, matches the sharding strategy configured by the user, and generates a routing path.

A simple understanding is that according to the sharding strategy we configured, we can calculate which database and which table the SQL should be executed in.
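As an illustration only (not ShardingSphere's actual routing engine), a modulo-based sharding rule in the style of ds${user_id % 2}.t_order_${order_id % 2} boils down to simple arithmetic over the shard values extracted from the SQL. The class and method names below are made up for this sketch:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Minimal sketch of shard routing: map shard-key values to real data nodes.
// The rule mirrors an inline expression like ds${user_id % 2}.t_order_${order_id % 2}.
public class RouteSketch {

    // An '=' condition yields exactly one data node.
    static String routeEquals(long userId, long orderId) {
        return "ds" + (userId % 2) + ".t_order_" + (orderId % 2);
    }

    // An IN (...) condition may fan out to several data nodes.
    static Set<String> routeIn(long userId, long... orderIds) {
        Set<String> nodes = new LinkedHashSet<>();
        for (long orderId : orderIds) {
            nodes.add(routeEquals(userId, orderId));
        }
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(routeEquals(4L, 7L)); // ds0.t_order_1
        System.out.println(routeIn(4L, 1L, 2L)); // [ds0.t_order_1, ds0.t_order_0]
    }
}
```

An = condition yields exactly one target, while an IN (...) condition may fan out to several, which is exactly the distinction the standard route section draws.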

Depending on whether a shard key is present, SQL routing is divided into sharding routing (分片路由) and broadcast routing (广播路由).

img

A route with a shard key is called a shard route, which is subdivided into three types: direct route, standard route, and Cartesian product route.

direct routing (implied routing)

Direct routing is a sharding method that uses the Hint API to route SQL directly to the specified databases and tables. It can be used in scenarios where the sharding key is not in the SQL, and it can also execute arbitrary SQL, including complex cases such as subqueries and custom functions.

For example, if orders are queried only by the t_order_id field, direct routing makes it possible to route correctly without modifying the SQL to add the sharding key user_id as a condition.

Direct routing requires the shard value to be specified through a Hint (using the Hint API to specify the target databases and tables directly).

Since the shard key value does not need to be extracted from the SQL, SQL parsing can be avoided; this applies when only databases are sharded and tables are not.

It therefore has the best compatibility and can execute arbitrary SQL, including complex cases such as subqueries and custom functions. For example, set the database sharding value to 3:

hintManager.setDatabaseShardingValue(3);

If the routing algorithm is value % 2, when a logical library t_order corresponds to two real libraries t_order_0 and t_order_1, SQL will be executed on t_order_1 after routing.

standard route

Standard routing is the most recommended and commonly used sharding method, and its scope of application is SQL that does not include associated queries or only includes associated queries between bound tables.

  • When the operator of the SQL shard key is =, the routing result will fall into a single database (table), and the routing policy returns a single target.

  • When the sharding operator is a range operator such as BETWEEN or IN, the routing result does not necessarily fall into a single database (table), so one logical SQL may eventually be split into multiple real SQLs for execution.

If the data is sharded according to the odd and even numbers of order_id, the SQL for a single table query is as follows:

SELECT * FROM t_order  where t_order_id in (1,2)

After SQL routing processing

SELECT * FROM t_order_0  where t_order_id in (1,2)
SELECT * FROM t_order_1  where t_order_id in (1,2)

The complexity and performance of the associated query of the bound table is equivalent to that of the single table query.

For example, if the SQL of an associated query that includes a bound table is as follows:

 SELECT * FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id WHERE order_id IN (1, 2);

Then the result of the route should be:

SELECT * FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE order_id IN (1, 2);
SELECT * FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE order_id IN (1, 2);

It can be seen that the number of SQL splits is consistent with that of a single table.

Cartesian product routing

Cartesian routing is generated by association queries between non-binding tables, and the query performance is low. Try to avoid this routing mode.

Cartesian routing is the most complicated case. It cannot locate the sharding rules based on the relationship of the bound tables. Therefore, the association query between non-bound tables needs to be disassembled into Cartesian product combinations for execution.

If the SQL in the previous example does not configure the binding table relationship, the result of the route should be:

SELECT * FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE order_id IN (1, 2);
SELECT * FROM t_order_0 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE order_id IN (1, 2);
SELECT * FROM t_order_1 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE order_id IN (1, 2);
SELECT * FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE order_id IN (1, 2);

Cartesian routing query performance is low, so it should be used with caution.

broadcast routing

Routing without a shard key is also called broadcast routing, which can be divided into five types: full-database table routing, full-database routing, full-instance routing, unicast routing, and blocking routing.

Full table routing

Full-database-table routing handles DQL, DML, DDL, and similar operations that carry no sharding condition.

When we execute SQL against the logical table t_order without a sharding condition, it is executed one by one against the corresponding real tables t_order_0 ... t_order_n in all shard databases.

Full Library Routing

Full-database routing mainly targets database-level operations, such as database management commands of the SET type and transaction control (TCL) statements.

For example, after setting the autocommit attribute on the logical library, this command is executed in all the corresponding real libraries.

SET autocommit=0;

Full instance routing

Full-instance routing is for DCL operations on database instances (setting or changing database users or role permissions). For example, creating a user named order: this command will be executed in all real database instances, to ensure that the order user can access every database instance normally.

CREATE USER order@127.0.0.1 identified BY '程序员内点事';

unicast routing

Unicast routing is used to obtain information about a real table, such as the description information of the table:

DESCRIBE t_order; 

The real tables of t_order are t_order_0 ... t_order_n. Their structures are exactly the same, so we only need to execute the statement once on any one real table.

block routing

Used to shield SQL operations on the database, for example:

USE order_db;

This command will not be executed in any real database, because ShardingSphere uses a logical schema (the organization and structure of the database), so there is no need to forward the database-switch command to the real databases.

SQL rewriting

SQL rewriting turns the SQL written against logical tables into statements that can execute correctly in the real databases. For example, to query the order table, the SQL in actual development is written against the logical table t_order.

SELECT * FROM t_order

However, after sub-database and sub-table, the table t_order no longer exists in the real databases; it has been split into multiple sub-tables t_order_n scattered across different databases. Executing the original SQL as-is is obviously not feasible, so the logical table name must be rewritten to the real table name obtained after routing.

SELECT * FROM t_order_n
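As a crude illustration of the idea only (ShardingSphere performs the rewrite on the parsed syntax tree, not with string replacement; the class name here is made up):

```java
// Simplified illustration of SQL rewriting: substitute the logical table name
// with the real table name produced by routing. ShardingSphere does this on the
// parsed AST, not by string replacement; this regex sketch is only for intuition.
public class RewriteSketch {

    static String rewrite(String logicalSql, String logicalTable, String actualTable) {
        // \b word boundaries keep t_order from also matching t_order_item
        return logicalSql.replaceAll("\\b" + logicalTable + "\\b", actualTable);
    }

    public static void main(String[] args) {
        String logic = "SELECT * FROM t_order WHERE order_id = 1";
        System.out.println(rewrite(logic, "t_order", "t_order_1"));
        // SELECT * FROM t_order_1 WHERE order_id = 1
    }
}
```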

SQL execution

The routed and rewritten real SQL is sent safely and efficiently to the underlying data sources for execution. This process is not simply sending SQL to the data sources through JDBC; it balances the cost of creating data source connections against memory usage, automatically trading off resource control against execution efficiency.

Merge results

Merging the multi-data result sets obtained from each data node into a large result set and returning it to the requesting client correctly is called result merging.

The sorting, grouping, paging, and aggregation syntaxes in our SQL are all operated on the merged result set.
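For intuition, ordering across shards can be sketched as a k-way merge over per-shard result sets that are each already sorted by the ORDER BY column (ShardingSphere streams over JDBC ResultSets; plain lists and a made-up class name are used here):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of result merging: each shard returns a result set already sorted by
// the ORDER BY column; a k-way merge yields one globally sorted result.
public class MergeSketch {

    static List<Long> mergeSorted(List<List<Long>> shardResults) {
        // heap entries: {value, shardIndex, offsetInShard}
        PriorityQueue<long[]> heap =
                new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[0]));
        for (int s = 0; s < shardResults.size(); s++) {
            if (!shardResults.get(s).isEmpty()) {
                heap.add(new long[]{shardResults.get(s).get(0), s, 0});
            }
        }
        List<Long> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            long[] top = heap.poll();
            merged.add(top[0]);
            int s = (int) top[1], next = (int) top[2] + 1;
            List<Long> shard = shardResults.get(s);
            if (next < shard.size()) {
                heap.add(new long[]{shard.get(next), s, next});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Long>> shards = List.of(List.of(1L, 3L, 5L), List.of(2L, 4L, 6L));
        System.out.println(mergeSorted(shards)); // [1, 2, 3, 4, 5, 6]
    }
}
```

Grouping, paging, and aggregation are handled with the same idea: operate over the combined stream rather than within any single shard.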

Question: How to solve the join of sub-databases

insert image description here

First look at the kind of join.

JOIN comes in several flavors: left join, right join, inner join, outer join, and natural join.

img

Cartesian Product

JOIN must first understand the Cartesian product.

The Cartesian product forcibly pairs every record in table A with every record in table B. Therefore, if table A has n records and table B has m records, the Cartesian product produces n*m records.
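A tiny sketch confirming the arithmetic: pairing every row of A with every row of B yields n * m rows.

```java
import java.util.ArrayList;
import java.util.List;

// Demonstration that the Cartesian product of n and m records yields n * m rows.
public class CartesianDemo {

    static List<String> cartesian(List<String> a, List<String> b) {
        List<String> rows = new ArrayList<>();
        for (String x : a) {
            for (String y : b) {
                rows.add(x + "-" + y);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> product = cartesian(List.of("a1", "a2", "a3"), List.of("b1", "b2"));
        System.out.println(product.size()); // 6 = 3 * 2
    }
}
```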

Inner join: INNER JOIN

INNER JOIN is the most commonly used connection operation. From a mathematical point of view, it is to find the intersection of two tables, and from a Cartesian product point of view, it is to pick out the records whose ON clause conditions are satisfied from the Cartesian product.

Left join: LEFT JOIN

The meaning of left join LEFT JOIN is to find the intersection of two tables plus the remaining data in the left table.

From the perspective of the Cartesian product, it first picks out the records satisfying the ON clause from the Cartesian product, and then adds the remaining unmatched records from the left table.

Right join: RIGHT JOIN

In the same way, right join RIGHT JOIN is to find the intersection of the two tables plus the remaining data in the right table.

Described from the perspective of the Cartesian product, a right join first picks out the records satisfying the ON clause from the Cartesian product, and then adds the remaining unmatched records from the right table.

Commonly used is the left outer join

    /**
     * Query orders by user id.
     *
     * @return orders of the given user
     */
    @Query(nativeQuery = true,
            value = "SELECT a.* FROM `t_order` a left join `t_user` b on a.user_id=b.user_id  where  a.user_id=?1")
    List<OrderEntity> selectOrderOfUserId(long userId);

    /**
     * Query all orders joined with users.
     *
     * @return all orders
     */
    @Query(nativeQuery = true,
            value = "SELECT a.* FROM `t_order` a left join `t_user` b on a.user_id=b.user_id ")
    List<OrderEntity> selectOrderOfUser();

Answer: How to solve joins across sub-databases:

  • Generally a left outer join is used;

  • both tables are sharded with the same sharding strategy;

  • and the tables are configured as binding tables, which prevents Cartesian product routing within the data source instances.

  • At join time, the data within each shard is joined inside that shard, and then Sharding-JDBC merges the per-shard results to produce the final result.

Follow-up question: after sub-database and sub-table, how do we handle fuzzy-condition queries?

All of the above mentioned are SQL executions with sharding column in the condition.

However, there are always some query conditions that do not include the sharding column. At the same time, we cannot build unlimited redundant shard tables just for these low-volume queries.

So how to deal with the SQL without sharding column in these query conditions?

In the era of mobile Internet, massive users generate massive amounts of data every day, and these massive amounts of data are far from being able to be held by a single table.

for example

  • User table: Alipay has 800 million users and WeChat 1 billion; CITIC has 1.4 million corporate and 87 million retail customers.

  • Order table: Meituan has tens of millions of orders per day, and Taobao has historical orders of tens of billions and hundreds of billions.

At present, the core data of most companies are: mainly RDBMS storage, supplemented by NoSQL/NewSQL storage!

  • RDBMS Internet companies mainly focus on MySQL

  • NoSQL is more representative of MongoDB, es

  • NewSQL is more representative of TiDB.

A single MySQL table can physically hold data at the billion-row level; the specific reasons were analyzed in the previous video.

However, as recognized by the industry, the recommended capacity of a single MySQL table is below 10 million (1KW) rows, so sub-database and sub-table become necessary.

To review, the core steps of sharding are:

SQL parsing, rewriting, routing, execution, and result merging.

Taking sharding-jdbc as an example, for a query without the sharding column, however many shard databases and tables there are, the SQL must be routed concurrently to all of them for execution, and the results merged afterwards.

This is even worse for fuzzy-condition queries, or filters combining many ad-hoc conditions.

Compared with conditional queries that carry the sharding column, the performance of such queries obviously drops a lot.

It is best not to use multiple sharding columns. It is recommended to use a single sharding column + es + HBase index and storage isolation architecture.

Index and storage isolation architecture

For example, queries that carry the sharding column go to the sharded databases and tables; fuzzy queries, or filters on multiple unfixed conditions, go to es; and mass storage is handed to HBase.

insert image description here

HBase features:

The full amount of data in all fields is stored in HBase. The storage capacity of HBase under the Hadoop system is massive.

The rowkey query is fast, as fast as lightning (it can be optimized to 500,000 QPS or even higher).

es features:

The multi-condition retrieval capability of es is very powerful. Fields that may participate in conditional retrieval are indexed into ES.

This solution gives full play to the advantages of es and HBase while avoiding their disadvantages; it can be called a best practice that plays to each system's strengths.

This is the classic ES+HBase combination scheme, that is, the scheme in which the index and data storage are isolated.

The interaction between them is roughly like this:

  • First, according to the conditions entered by the user, go to es to query to obtain the rowkey value that meets the filter conditions.

  • Then use the rowkey values to query HBase for the full records.

The interaction diagram looks like this:

insert image description here
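A minimal in-memory sketch of this two-step lookup, with plain Maps standing in for the ES index and the HBase table (the class and method names are made up; real code would use the Elasticsearch and HBase client APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the index/storage isolation pattern: "ES" holds only the searchable
// fields plus the rowkey; "HBase" holds the full record keyed by rowkey.
// Maps stand in for both stores purely for illustration.
public class IndexStorageSketch {

    // esIndex: status -> rowkeys ; hbase: rowkey -> full row
    static List<String> query(Map<String, List<String>> esIndex,
                              Map<String, String> hbase,
                              String status) {
        List<String> rows = new ArrayList<>();
        // Step 1: filter in the index store, getting back rowkeys only.
        for (String rowkey : esIndex.getOrDefault(status, List.of())) {
            // Step 2: point lookups by rowkey in the mass store.
            rows.add(hbase.get(rowkey));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<String>> es = Map.of("NotPayed", List.of("rk1", "rk3"));
        Map<String, String> hbase =
                Map.of("rk1", "order#1", "rk2", "order#2", "rk3", "order#3");
        System.out.println(query(es, hbase, "NotPayed")); // [order#1, order#3]
    }
}
```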

For massive data and a certain amount of concurrency sub-databases and sub-tables, it is by no means that the introduction of a sub-database sub-table middleware can solve the problem, but a systematic project.

It is necessary to analyze the business related to the entire table and let the appropriate middleware do what it does best.

For example, the query with sharding column goes to sub-database and sub-table,

Some fuzzy queries, or multiple unfixed conditions are filtered by es, and massive storage is handed over to HBase.

Architecture of Binlog Synchronization to Ensure Data Consistency

In many business situations, we will add redis cache to the system for query optimization, and use es for full-text search.

If the database data is updated, you have to write code in the business logic to synchronously update redis.

This kind of data synchronization code mixed into business code is not elegant. Can these data synchronization concerns be extracted into an independent module? The answer is yes.

insert image description here

Hot and cold separation of data

After doing so many things, there will be a lot of work to be done later, such as the consistency of data synchronization,

After running for a period of time, the data volume of some tables gradually reaches the bottleneck of a single table. At this time, cold data migration is required.

Question: Is the broadcast table a public table

insert image description here

It can be understood in this way.

For a broadcast table, an update operation covers all shards, while a query operation only needs to hit a single shard.

distributed primary key

After data fragmentation, it is very difficult to generate global unique primary keys for different data nodes.

The auto-increment keys between different real tables (t_order_n) in the same logical table (t_order) produce duplicate primary keys because they cannot be aware of each other.

Although ID collisions can be avoided by setting the initial value and step size of the auto-increment primary key, this will increase maintenance costs and lack integrity and scalability.

If you need to increase the number of shard tables in the future, you need to modify the step size of the shard tables one by one. The operation and maintenance costs are very high, so this method is not recommended.

In order to make it easier to get started, Apache ShardingSphere has built-in UUID and SNOWFLAKE two distributed primary key generators.

By default, the snowflake algorithm (snowflake) is used to generate 64-bit long integer data.

Not only that, it also extracts the interface of the distributed primary key generator, which is convenient for us to implement a custom self-incrementing primary key generation algorithm.

Implementation motivation

In traditional database software development, automatic primary key generation is a basic requirement, and every database provides support for it, such as MySQL's auto-increment keys and Oracle's sequences.

After the data is fragmented, it is very difficult for different data nodes to generate a global unique primary key.

The auto-increment keys of different actual tables under the same logical table produce duplicate primary keys because they cannot be aware of each other.

Although collisions can be avoided by constraining the initial value and step size of the auto-increment primary key, additional operation and maintenance rules need to be introduced, making the solution lack of integrity and scalability.

There are currently many third-party solutions that can perfectly solve this problem, such as UUID, which rely on specific algorithms to generate unique keys, or introduce primary key generation services.

In order to facilitate the use of users and meet the needs of different usage scenarios of different users, Apache ShardingSphere not only provides a built-in distributed primary key generator,

For example UUID, SNOWFLAKE,

It also extracts the interface of the distributed primary key generator, so that users can implement their own self-defined self-incrementing primary key generator.

Built-in primary key generator

UUID

Use UUID.randomUUID() to generate distributed primary keys.

SNOWFLAKE

In the sharding rule configuration module, you can configure the primary key generation strategy for each table. By default, the snowflake algorithm (snowflake) is used to generate 64bit long integer data.
The Snowflake Algorithm is a distributed primary key generation algorithm announced by Twitter, which can ensure the non-repetition of primary keys of different processes and the ordering of primary keys of the same process.

Realization principle

Within one process, uniqueness is first guaranteed by the timestamp bits; if the timestamps are identical, it is guaranteed by the sequence bits. Since the timestamp bits increase monotonically, and as long as the servers' clocks are roughly synchronized, the generated primary keys can be considered generally ordered in a distributed environment. This ensures efficient insertion into index fields, for example the clustered primary key index of MySQL's InnoDB storage engine.

The binary representation of a primary key generated by the snowflake algorithm contains four parts, from high bits to low bits:

  • 1bit sign bit,
  • 41bit timestamp bit,
  • 10bit working process bits and
  • 12bit serial number.

Sign bit (1bit)

Reserved sign bit, always zero.

Timestamp bit (41bit)

A 41-bit timestamp can hold 2 to the 41st power milliseconds, and the number of milliseconds in a year is 365 * 24 * 60 * 60 * 1000.

A quick calculation:

Math.pow(2, 41) / (365 * 24 * 60 * 60 * 1000L)

gives approximately 69.73 years.

The time epoch of Apache ShardingSphere's snowflake algorithm starts at midnight on November 1, 2016 and can be used until 2086.

I believe it can meet the requirements of most systems.
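The arithmetic can be checked directly:

```java
// Check how many years a 41-bit millisecond timestamp can cover.
public class EpochSpan {
    public static void main(String[] args) {
        double years = Math.pow(2, 41) / (365.0 * 24 * 60 * 60 * 1000);
        System.out.printf("%.2f years%n", years); // about 69.73
    }
}
```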

Work process bit (10bit)

This flag is unique within a Java process. If it is a distributed application deployment, you should ensure that the id of each worker process is different. The value defaults to 0 and can be set through properties.

Serial number bit (12bit)

This sequence is used to generate different IDs within the same millisecond.

If the number generated in this millisecond exceeds 4096 (2 to the power of 12), the generator will wait until the next millisecond to continue generating.
The detailed structure of the snowflake algorithm primary key is shown in the figure below.

insert image description here
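A stripped-down generator following this bit layout makes the structure concrete. This is a teaching sketch, not ShardingSphere's implementation: the worker id is fixed, the epoch value is an assumption, and the clock-rollback tolerance described in the next section is omitted.

```java
// Teaching sketch of the snowflake layout: 1 sign bit | 41-bit timestamp |
// 10-bit worker id | 12-bit sequence. Not ShardingSphere's implementation.
public class MiniSnowflake {

    private static final long EPOCH = 1477958400000L; // assumed epoch (2016-11-01 UTC), ms
    private final long workerId;                      // 0..1023
    private long sequence = 0L;
    private long lastMillis = -1L;

    MiniSnowflake(long workerId) {
        this.workerId = workerId & 0x3FF; // keep 10 bits
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastMillis) {
            sequence = (sequence + 1) & 0xFFF; // 12-bit sequence within one millisecond
            if (sequence == 0) {               // overflowed 4096: wait for the next millisecond
                while ((now = System.currentTimeMillis()) <= lastMillis) { }
            }
        } else {
            sequence = 0L;
        }
        lastMillis = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }

    public static void main(String[] args) {
        MiniSnowflake gen = new MiniSnowflake(1);
        long a = gen.nextId();
        long b = gen.nextId();
        System.out.println(a < b); // ids from one process are strictly increasing
    }
}
```

Shifting the timestamp left by 22 (10 worker bits + 12 sequence bits) is what makes ids from the same process monotonically increasing.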

clock back

A server clock rollback would cause duplicate sequence values, so the default distributed primary key generator provides a maximum tolerated clock-rollback interval in milliseconds.

If the clock rollback exceeds the maximum tolerated millisecond threshold, the program reports an error;

If it is within the tolerable range, the default distributed primary key generator will wait for the clock to be synchronized to the time of the last primary key generation before continuing to work.

The default value of the maximum tolerated clock back in milliseconds is 0, which can be set through properties.

unbalanced step size

Because the sequence bits are usually 0 under low write concurrency, snowflake IDs are not evenly distributed in their low bits; sharding by such IDs can therefore lead to data skew.

Sharding-JDBC SPI and custom primary keys

What is Java SPI

The full name of SPI is Service Provider Interface. It is a set of APIs provided by Java to be implemented or extended by third parties. It can be used to enable framework extensions and replace components.

Each abstraction of system design often has many different implementation schemes.

In object-oriented design, it is generally recommended to program between modules based on interfaces, and not to hard-code implementation classes between modules.

Once the code involves a specific implementation class, it violates the principle of pluggability. If you need to replace an implementation, you need to modify the code.

To avoid hard-coding the implementation at module assembly time and instead allow it to be specified dynamically outside the program, a service discovery mechanism is needed.

Java SPI provides such a mechanism: a mechanism to find a service implementation for an interface.

It is somewhat similar to the idea of IoC: moving the control of assembly out of the program. This mechanism is especially important in modular design.

The overall mechanism diagram is as follows:

insert image description here


Java SPI is essentially a dynamic loading mechanism implemented by combining interface-based programming, the strategy pattern, and configuration files. The core idea of SPI is decoupling.


Java SPI usage scenarios

In a nutshell, it applies when the caller needs to enable, extend, or replace a framework's implementation strategy according to actual needs.

More common examples:

  • Database driver loading interface implements class loading
    JDBC loads drivers for different types of databases
  • Log facade interface implementation class loading
    SLF4J loads log implementation classes from different providers
  • Spring
    uses a lot of SPI in Spring, such as: the implementation of ServletContainerInitializer for servlet3.0 specification, automatic type conversion Type Conversion SPI (Converter SPI, Formatter SPI), etc.
  • Dubbo
    Dubbo also uses SPI extensively to implement framework extensions, but it encapsulates the native SPI provided by Java, allowing users to extend and implement the Filter interface

Java SPI Usage Conventions

To use Java SPI, the following conventions need to be followed:

  • 1. When a service provider supplies a concrete implementation of the interface, create a file named after the interface's fully qualified name in the META-INF/services directory of the jar package; its content is the fully qualified name of the implementation class.
  • 2. The jar package containing the implementation class is placed on the classpath of the main program.
  • 3. The main program dynamically loads the implementation module through java.util.ServiceLoader, which finds the fully qualified name of the implementation class by scanning the configuration files in the META-INF/services directory and loads the class into the JVM.
  • 4. The SPI implementation class must have a no-argument constructor.

Java SPI in practice

First, we need to define an interface, the SPI Service interface

package com.crazymaker.springcloud.sharding.jdbc.demo.generator;

public interface IdGenerator
{

    /**
     * Next id long.
     *
     * @return the nextId
     */
    Long nextId();

}

Then, define one or more implementation classes:

// Standalone ID generator based on an AtomicLong
@Data
public class AtomicLongShardingKeyGeneratorSPIDemo implements IdGenerator {

    private AtomicLong atomicLong = new AtomicLong(0);

    @Override
    public Long nextId() {
        return atomicLong.incrementAndGet();
    }
}

Finally, add a configuration file under the classpath:

  • The file name is the fully qualified class name of the interface
  • The content is the fully qualified class name of the implementing class
  • Multiple implementation classes are separated by newlines.

SPI configuration file location, the file path is as follows:

insert image description here

The content is the fully qualified class name of the implementing class:

 com.crazymaker.springcloud.sharding.jdbc.demo.generator.AtomicLongShardingKeyGeneratorSPIDemo

test

Then we can obtain instances of the implementation classes through ServiceLoader.load or Service.providers.

  • Service.providers lives in sun.misc.Service;
  • ServiceLoader.load lives in java.util.ServiceLoader.

    @Test
    public void testGenIdByProvider() {
        Iterator<IdGenerator> providers = Service.providers(IdGenerator.class);
        while (providers.hasNext()) {
            IdGenerator generator = providers.next();
            for (int i = 0; i < 100; i++) {
                Long id = generator.nextId();
                System.out.println("id = " + id);
            }
        }
    }

    @Test
    public void testGenIdByServiceLoader() {
        ServiceLoader<IdGenerator> serviceLoaders = ServiceLoader.load(IdGenerator.class);
        Iterator<IdGenerator> iterator = serviceLoaders.iterator();
        while (iterator.hasNext()) {
            IdGenerator generator = iterator.next();
            for (int i = 0; i < 100; i++) {
                Long id = generator.nextId();
                System.out.println("id = " + id);
            }
        }
    }

The output results of the two methods are consistent:

Pluggable Architecture

background

In Apache ShardingSphere, many function implementation classes are loaded through SPI (Service Provider Interface) annotations. SPI is an API intended to be implemented or extended by a third party, and it can be used to implement framework extensions or component replacements.

challenge

The pluggable architecture has very high requirements for the program architecture design. It is necessary to make each module independent of each other and not aware of each other, and use a pluggable core to combine various functions in a superimposed manner. Designing an architecture system that completely isolates functional development can not only stimulate the vitality of the open source community to the greatest extent, but also guarantee the quality of the project.
The Apache ShardingSphere 5.x version began to focus on the pluggable architecture, and the functional components of the project can be flexibly extended in a pluggable manner. At present, functions such as data sharding, read-write separation, data encryption, and shadow database pressure testing, as well as support for SQL and protocols such as MySQL, PostgreSQL, SQLServer, and Oracle, are all woven into the project through plug-ins. Apache ShardingSphere currently provides dozens of SPIs as system extension points, and the number is still increasing.

Goal

It is the design goal of Apache ShardingSphere's pluggable architecture to allow developers to customize their own unique systems like building blocks.

The Apache ShardingSphere pluggable architecture provides dozens of SPI-based extension points. For developers, it is very convenient to customize and extend the functions.
This chapter lists all the SPI extension points of Apache ShardingSphere. If there is no special requirement, users can use the built-in implementation provided by Apache ShardingSphere; advanced users can refer to the interface of each functional module for custom implementation.

Type-Based SPI Mechanism

package org.apache.shardingsphere.spi;

import java.util.Properties;

/**
 * Base algorithm SPI.
 */
public interface TypeBasedSPI {
    
    /**
     * Get algorithm type.
     * 
     * @return type
     */
    String getType();
    
    /**
     * Get properties.
     * 
     * @return properties of algorithm
     */
    Properties getProperties();
    
    /**
     * Set properties.
     * 
     * @param properties properties of algorithm
     */
    void setProperties(Properties properties);
}
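To illustrate how a type-based SPI lookup behaves, the sketch below selects an implementation by matching the requested type string against each candidate's getType(). Note this is a standalone sketch: the real implementations are discovered via Java's ServiceLoader, and the two generator classes here are illustrative, not ShardingSphere's.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of a type-based SPI: each implementation declares its type,
// and a lookup picks the first candidate whose type matches the request.
interface TypeBasedSPI {
    String getType();
}

class SnowflakeKeyGenerator implements TypeBasedSPI {
    public String getType() { return "SNOWFLAKE"; }
}

class UUIDKeyGenerator implements TypeBasedSPI {
    public String getType() { return "UUID"; }
}

public class TypeBasedSPIDemo {
    // In ShardingSphere the candidates come from ServiceLoader; here they are listed directly.
    static TypeBasedSPI newService(String type, List<TypeBasedSPI> candidates) {
        for (TypeBasedSPI each : candidates) {
            if (each.getType().equalsIgnoreCase(type)) {
                return each;
            }
        }
        throw new IllegalArgumentException("No implementation for type: " + type);
    }

    public static void main(String[] args) {
        List<TypeBasedSPI> loaded = Arrays.asList(new SnowflakeKeyGenerator(), new UUIDKeyGenerator());
        System.out.println(newService("UUID", loaded).getType()); // UUID
    }
}
```

This is also why configuration only needs a type string such as "SNOWFLAKE": the string is matched against the loaded implementations at runtime.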

Distributed primary key extension point


Custom primary key practice

package com.crazymaker.springcloud.sharding.jdbc.demo.generator;

import lombok.Data;
import org.apache.shardingsphere.spi.keygen.ShardingKeyGenerator;

import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;

// Standalone (single-node) AtomicLong-based ID generator
@Data
public class AtomicLongShardingKeyGenerator implements ShardingKeyGenerator {

    private AtomicLong atomicLong = new AtomicLong(0);
    private Properties properties = new Properties();

    @Override
    public Comparable<?> generateKey() {
        return atomicLong.incrementAndGet();
    }

    @Override
    public String getType() {

        //Declare the generator type name
        return "DemoAtomicLongID";
    }
}
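For the ServiceLoader-based lookup to discover this generator, the implementation class must also be registered in a provider-configuration file named after the SPI interface, per standard Java SPI conventions (the path below assumes a Maven resource layout):

```
# src/main/resources/META-INF/services/org.apache.shardingsphere.spi.keygen.ShardingKeyGenerator
com.crazymaker.springcloud.sharding.jdbc.demo.generator.AtomicLongShardingKeyGenerator
```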

use case



    @Test
    public void testGenIdByShardingServiceLoader() {
        ShardingKeyGeneratorServiceLoader serviceLoader = new ShardingKeyGeneratorServiceLoader();
        ShardingKeyGenerator keyGenerator= serviceLoader.newService("DemoAtomicLongID" ,new Properties());

        for (int i = 0; i < 100; i++) {

            Long id = (Long) keyGenerator.generateKey();

            System.out.println("id = " + id);

        }
    }


Demo and source code introduction:

see video

ShardingSphere SQL usage restrictions

See the official website documentation:

https://shardingsphere.apache.org/document/current/cn/features/sharding/use-norms/sql/

The document lists in detail the SQL types supported and unsupported by the current version of ShardingSphere; these restrictions deserve attention.

Supported SQL

SQL necessary conditions
SELECT * FROM tbl_name
SELECT * FROM tbl_name WHERE (col1 = ? or col2 = ?) and col3 = ?
SELECT * FROM tbl_name WHERE col1 = ? ORDER BY col2 DESC LIMIT ?
SELECT COUNT(*), SUM(col1), MIN(col1), MAX(col1), AVG(col1) FROM tbl_name WHERE col1 = ?
SELECT COUNT(col1) FROM tbl_name WHERE col2 = ? GROUP BY col1 ORDER BY col3 DESC LIMIT ?, ?
INSERT INTO tbl_name (col1, col2,…) VALUES (?, ?, ….)
INSERT INTO tbl_name VALUES (?, ?,….)
INSERT INTO tbl_name (col1, col2, …) VALUES (?, ?, ….), (?, ?, ….)
INSERT INTO tbl_name (col1, col2, …) SELECT col1, col2, … FROM tbl_name WHERE col3 = ? INSERT table and SELECT table must be the same table or bound table
REPLACE INTO tbl_name (col1, col2, …) SELECT col1, col2, … FROM tbl_name WHERE col3 = ? REPLACE table and SELECT table must be the same table or bound table
UPDATE tbl_name SET col1 = ? WHERE col2 = ?
DELETE FROM tbl_name WHERE col1 = ?
CREATE TABLE tbl_name (col1 int, …)
ALTER TABLE tbl_name ADD col1 varchar(10)
DROP TABLE tbl_name
TRUNCATE TABLE tbl_name
CREATE INDEX idx_name ON tbl_name
DROP INDEX idx_name ON tbl_name
DROP INDEX idx_name
SELECT DISTINCT * FROM tbl_name WHERE col1 = ?
SELECT COUNT(DISTINCT col1) FROM tbl_name
SELECT subquery_alias.col1 FROM (select tbl_name.col1 from tbl_name where tbl_name.col2=?) subquery_alias

Unsupported SQL

SQL Reason not supported
INSERT INTO tbl_name (col1, col2, …) VALUES(1+2, ?, …) The VALUES statement does not support arithmetic expressions
INSERT INTO tbl_name (col1, col2, …) SELECT * FROM tbl_name WHERE col3 = ? The SELECT clause does not currently support the abbreviation of * and the built-in distributed primary key generator
REPLACE INTO tbl_name (col1, col2, …) SELECT * FROM tbl_name WHERE col3 = ? The SELECT clause does not currently support the abbreviation of * and the built-in distributed primary key generator
SELECT * FROM tbl_name1 UNION SELECT * FROM tbl_name2 UNION
SELECT * FROM tbl_name1 UNION ALL SELECT * FROM tbl_name2 UNION ALL
SELECT SUM(DISTINCT col1), SUM(col1) FROM tbl_name See DISTINCT support for details
SELECT * FROM tbl_name WHERE to_date(create_time, 'yyyy-mm-dd') = ? Results in full routing
(SELECT * FROM tbl_name) Parenthesized queries are not yet supported
SELECT MAX(tbl_name.col1) FROM tbl_name When the selected column is a function expression, it cannot be qualified with the table name; if the table has an alias, the alias may be used

Details of DISTINCT support

Supported SQL

SQL
SELECT DISTINCT * FROM tbl_name WHERE col1 = ?
SELECT DISTINCT col1 FROM tbl_name
SELECT DISTINCT col1, col2, col3 FROM tbl_name
SELECT DISTINCT col1 FROM tbl_name ORDER BY col1
SELECT DISTINCT col1 FROM tbl_name ORDER BY col2
SELECT DISTINCT(col1) FROM tbl_name
SELECT AVG(DISTINCT col1) FROM tbl_name
SELECT SUM(DISTINCT col1) FROM tbl_name
SELECT COUNT(DISTINCT col1) FROM tbl_name
SELECT COUNT(DISTINCT col1) FROM tbl_name GROUP BY col1
SELECT COUNT(DISTINCT col1 + col2) FROM tbl_name
SELECT COUNT(DISTINCT col1), SUM(DISTINCT col1) FROM tbl_name
SELECT COUNT(DISTINCT col1), col1 FROM tbl_name GROUP BY col1
SELECT col1, COUNT(DISTINCT col1) FROM tbl_name GROUP BY col1

Unsupported SQL

SQL Reason not supported
SELECT SUM(DISTINCT tbl_name.col1), SUM(tbl_name.col1) FROM tbl_name When the selected column is a function expression, it cannot be qualified with the table name; if the table has an alias, the alias may be used

Summary of Sharding-JDBC data sharding development

As developers, Sharding-JDBC shields us from the underlying details,

so that in sharded scenarios we can work as simply as with a single database and a single table;

Sharding strategies and algorithms

For sharding, ShardingSphere-JDBC introduces two separate concepts: the sharding algorithm and the sharding strategy;

the sharding key is also a core concept in the sharding process. Roughly speaking: sharding strategy = sharding algorithm + sharding key;

as for why it is designed this way, ShardingSphere-JDBC presumably wanted more flexibility: abstracting the sharding algorithm out on its own makes it easy for developers to extend;

Sharding algorithms

An abstract sharding algorithm interface is provided: ShardingAlgorithm, divided by type into: precise sharding algorithm, range sharding algorithm, complex sharding algorithm, and Hint sharding algorithm;

  • Precise sharding algorithm: the PreciseShardingAlgorithm class, mainly used to handle sharding for = and IN;
  • Range sharding algorithm: the RangeShardingAlgorithm class, mainly used to handle sharding for BETWEEN AND, >, <, >=, <=;
  • Complex sharding algorithm: the ComplexKeysShardingAlgorithm class, used for sharding scenarios with multiple columns as sharding keys;
  • Hint sharding algorithm: the HintShardingAlgorithm class, used for sharding via Hint;

All of the above algorithm classes are interfaces; the concrete implementations are left to developers;
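To illustrate what a range sharding algorithm has to do, the standalone sketch below routes a `BETWEEN lower AND upper` condition over tables sharded by `value % 2`, returning every matching target. Note this is a sketch with illustrative names, not ShardingSphere's actual RangeShardingAlgorithm interface:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of range routing: walk the value range and collect every target
// table whose suffix matches value % 2, stopping early once all targets match.
public class RangeRoutingSketch {
    static Collection<String> doSharding(Collection<String> availableTargetNames,
                                         long lower, long upper) {
        Set<String> result = new LinkedHashSet<>();
        for (long value = lower; value <= upper && result.size() < availableTargetNames.size(); value++) {
            for (String each : availableTargetNames) {
                if (each.endsWith(String.valueOf(value % 2))) {
                    result.add(each);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> targets = Arrays.asList("t_user_0", "t_user_1");
        // A range spanning both remainders routes to both tables
        System.out.println(doSharding(targets, 3, 10));
        // A single even value routes only to t_user_0
        System.out.println(doSharding(targets, 2, 2));
    }
}
```

This also shows why wide ranges on a modulo-sharded key degenerate into full routing: once the range covers every remainder, every table must be queried.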

Sharding strategies

The sharding strategies broadly correspond to the algorithms above, and include: standard sharding strategy, complex sharding strategy, Hint sharding strategy, inline sharding strategy, and none sharding strategy;

  • Standard sharding strategy: the StandardShardingStrategy class, which takes the two sharding algorithms PreciseShardingAlgorithm and RangeShardingAlgorithm; PreciseShardingAlgorithm is required, RangeShardingAlgorithm is optional;

    public final class StandardShardingStrategy implements ShardingStrategy {

        private final String shardingColumn;
        private final PreciseShardingAlgorithm preciseShardingAlgorithm;
        private final RangeShardingAlgorithm rangeShardingAlgorithm;
    }

  • Complex sharding strategy: the ComplexShardingStrategy class, which takes a ComplexKeysShardingAlgorithm sharding algorithm;

    public final class ComplexShardingStrategy implements ShardingStrategy {

        @Getter
        private final Collection<String> shardingColumns;
        private final ComplexKeysShardingAlgorithm shardingAlgorithm;
    }

    Note that multiple sharding keys are supported;

  • Hint sharding strategy: the HintShardingStrategy class, a strategy that shards by specifying sharding values through Hint rather than extracting them from the SQL; it takes a HintShardingAlgorithm sharding algorithm;

    public final class HintShardingStrategy implements ShardingStrategy {

        @Getter
        private final Collection<String> shardingColumns;
        private final HintShardingAlgorithm shardingAlgorithm;
    }

  • Inline sharding strategy: the InlineShardingStrategy class; no sharding algorithm is provided, and the routing rule is implemented through an expression;

  • None sharding strategy: the NoneShardingStrategy class, i.e. no sharding;

Sharding strategy configuration classes

In practice we do not use the strategy classes above directly; ShardingSphere-JDBC provides a configuration class for each strategy:

  • StandardShardingStrategyConfiguration
  • ComplexShardingStrategyConfiguration
  • HintShardingStrategyConfiguration
  • InlineShardingStrategyConfiguration
  • NoneShardingStrategyConfiguration

Summary of practice steps

With the basic concepts above in place, what follows is a short hands-on exercise for each sharding strategy; first prepare the databases and tables;

Preparation

Prepare two databases, ds0 and ds1; each database then contains two tables, t_order0 and t_order1:

CREATE TABLE `t_order0` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) NOT NULL,
  `order_id` bigint(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Preparing the real data sources

There are two data sources here, both configured in Java code:

    /**
     * Build the sharded data source through ShardingDataSourceFactory
     *
     * @throws SQLException
     */
    @Before
    public void buildShardingDataSource() throws SQLException {

        /*
         * 1. Data source collection: dataSourceMap
         * 2. Sharding rules: shardingRuleConfig
         * 3. Properties: properties
         */

        DataSource druidDs1 = buildDruidDataSource(
                "jdbc:mysql://cdh1:3306/sharding_db1?useUnicode=true&characterEncoding=utf8&allowMultiQueries=true&useSSL=true&serverTimezone=UTC",
                "root", "123456");

        DataSource druidDs2 = buildDruidDataSource(
                "jdbc:mysql://cdh1:3306/sharding_db2?useUnicode=true&characterEncoding=utf8&allowMultiQueries=true&useSSL=true&serverTimezone=UTC",
                "root", "123456");
        // Configure the real data sources
        Map<String, DataSource> dataSourceMap = new HashMap<String, DataSource>();
        // Add the two data sources ds0 and ds1
        dataSourceMap.put("ds0", druidDs1);
        dataSourceMap.put("ds1", druidDs2);

        /**
         * Building the table rule requires:
         * 1. Specifying the logic table.
         * 2. Configuring the actual data nodes.
         * 3. Specifying the primary key column.
         * 4. Configuring the database and table sharding rules.
         */
        // Configure the sharding rule
        ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();

        // step2: table sharding rule
        TableRuleConfiguration userShardingRuleConfig = userShardingRuleConfig();
        shardingRuleConfig.getTableRuleConfigs().add(userShardingRuleConfig);

        // With multiple data sources, a default data source must be specified;
        // with a single data source this is unnecessary
        shardingRuleConfig.setDefaultDataSourceName("ds0");

        Properties properties = new Properties();
        // Print SQL statements; disable in production
        properties.setProperty("sql.show", Boolean.TRUE.toString());

        dataSource = ShardingDataSourceFactory.createDataSource(
                dataSourceMap, shardingRuleConfig, properties);
    }

Both data sources configured here are ordinary data sources; the dataSourceMap is finally handed over to ShardingDataSourceFactory for management;

Table rule configuration

The table rule configuration class TableRuleConfiguration contains five elements: logic table, actual data nodes, database sharding strategy, table sharding strategy, and distributed primary key generation strategy;

    /**
     * Sharding rule for the table
     */
    protected TableRuleConfiguration userShardingRuleConfig() {

        String logicTable = USER_LOGIC_TB;

        // The actual data nodes (ActualDataNodes)
        String actualDataNodes = "ds$->{0..1}.t_user_$->{0..1}";

        // Cartesian product of the two expressions:
        // ds0.t_user_0
        // ds1.t_user_0
        // ds0.t_user_1
        // ds1.t_user_1

        TableRuleConfiguration tableRuleConfig = new TableRuleConfiguration(logicTable, actualDataNodes);

        // Configure the table sharding strategy
        // inline mode
//        ShardingStrategyConfiguration tableShardingStrategy =
//                new InlineShardingStrategyConfiguration("user_id", "t_user_$->{user_id % 2}");
        // custom mode
        TablePreciseShardingAlgorithm tablePreciseShardingAlgorithm =
                new TablePreciseShardingAlgorithm();

        RouteInfinityRangeShardingAlgorithm routeInfinityRangeShardingAlgorithm =
                new RouteInfinityRangeShardingAlgorithm();

        RangeOrderShardingAlgorithm tableRangeShardingAlg =
                new RangeOrderShardingAlgorithm();

        PreciseOrderShardingAlgorithm preciseOrderShardingAlgorithm =
                new PreciseOrderShardingAlgorithm();

        ShardingStrategyConfiguration tableShardingStrategy =
                new StandardShardingStrategyConfiguration("user_id",
                        preciseOrderShardingAlgorithm,
                        routeInfinityRangeShardingAlgorithm);

        tableRuleConfig.setTableShardingStrategyConfig(tableShardingStrategy);

        // Configure the database sharding strategy (db rule via Groovy expression)
        // inline mode
//        ShardingStrategyConfiguration dsShardingStrategy = new InlineShardingStrategyConfiguration("user_id", "ds${user_id % 2}");
        // custom mode
        DsPreciseShardingAlgorithm dsPreciseShardingAlgorithm = new DsPreciseShardingAlgorithm();
        RangeOrderShardingAlgorithm dsRangeShardingAlg =
                new RangeOrderShardingAlgorithm();

        ShardingStrategyConfiguration dsShardingStrategy =
                new StandardShardingStrategyConfiguration("user_id",
                        preciseOrderShardingAlgorithm,
                        routeInfinityRangeShardingAlgorithm);

        tableRuleConfig.setDatabaseShardingStrategyConfig(dsShardingStrategy);

        tableRuleConfig.setKeyGeneratorConfig(new KeyGeneratorConfiguration("SNOWFLAKE", "user_id"));
        return tableRuleConfig;
    }
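The cartesian-product expansion of the row expression noted in the comments above (ds$->{0..1}.t_user_$->{0..1} yielding four nodes) can be reproduced in plain Java; a minimal sketch with illustrative names (the real expansion is done by ShardingSphere's inline-expression engine):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: expand "ds$->{0..dsCount-1}.t_user_$->{0..tableCount-1}" into the
// cartesian product of data source names and table names.
public class ActualDataNodesSketch {
    static List<String> expand(String dsPrefix, int dsCount, String tablePrefix, int tableCount) {
        List<String> nodes = new ArrayList<>();
        for (int ds = 0; ds < dsCount; ds++) {
            for (int t = 0; t < tableCount; t++) {
                nodes.add(dsPrefix + ds + "." + tablePrefix + t);
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(expand("ds", 2, "t_user_", 2));
        // [ds0.t_user_0, ds0.t_user_1, ds1.t_user_0, ds1.t_user_1]
    }
}
```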

  • Logic table: the logic table configured here is t_user; the corresponding physical tables are t_user_0 and t_user_1;

  • Actual data nodes: configured with a row expression, which simplifies the configuration; the configuration above is equivalent to:

    db0
      ├── t_user_0 
      └── t_user_1 
    db1
      ├── t_user_0 
      └── t_user_1
    
    
  • Database sharding strategy: the database sharding strategy is one of the five types introduced above; StandardShardingStrategyConfiguration is used here, which requires a sharding key and sharding algorithms; a precise sharding algorithm is used:

    
    public final class PreciseOrderShardingAlgorithm implements PreciseShardingAlgorithm<Long> {

        @Override
        public String doSharding(final Collection<String> availableTargetNames,
                                 final PreciseShardingValue<Long> shardingValue) {

            for (String each : availableTargetNames) {

                System.out.println("shardingValue = " + shardingValue.getValue() + " target = " + each + "  shardingValue.getValue() % 2 = " + shardingValue.getValue() % 2L);
                if (each.endsWith(String.valueOf(shardingValue.getValue() % 2L))) {

                    return each;
                }
            }
            return null;
        }
    }
    
    
    

    Here shardingValue is the actual value of user_id, taken modulo 2 each time; the availableTargetNames candidates are {ds0, ds1}; whichever target matches the remainder is where the row is routed;

  • Table sharding strategy: the configured sharding key (order_id) differs from the one in the database sharding strategy; everything else is the same;

  • Distributed primary key generation strategy: ShardingSphere-JDBC provides several distributed primary key generation strategies, covered in detail later; the snowflake algorithm is used here;

Configuring the sharding rule

The sharding rule configuration class ShardingRuleConfiguration covers several kinds of rules: table rules, binding table rules, broadcast tables, default data source name, default database sharding strategy, default table sharding strategy, default key generation strategy, master-slave rules, and encryption rules;

  • Table rules tableRuleConfigs: the database and table sharding strategies configured above; the most commonly used configuration;
  • Binding tables bindingTableGroups: primary and child tables whose sharding rules are identical; multi-table joins between binding tables avoid cartesian-product joins, so join query efficiency improves greatly;
  • Broadcast tables broadcastTables: tables that exist in every sharded data source, with identical structure and data in every database; suitable for small tables that must be joined with tables holding massive data;
  • Default data source name defaultDataSourceName: tables without sharding configuration are located through the default data source;
  • Default database sharding strategy defaultDatabaseShardingStrategyConfig: a table rule may set its own database sharding strategy; if it does not, the default configured here applies;
  • Default table sharding strategy defaultTableShardingStrategyConfig: a table rule may set its own table sharding strategy; if it does not, the default configured here applies;
  • Default key generation strategy defaultKeyGeneratorConfig: a table rule may set its own key generation strategy; if it does not, the default configured here applies; UUID and SNOWFLAKE generators are built in;
  • Master-slave rules masterSlaveRuleConfigs: used to implement read-write separation; one master with multiple slaves can be configured, and reads across multiple slaves can use a load-balancing strategy;
  • Encryption rules encryptRuleConfig: encryption for certain sensitive data; a complete, safe, transparent, low-cost data encryption integration solution;

Inserting data

With the above prepared, the database can now be operated on; an insert is executed here:

    /**
     * Insert test.
     */
    @Test
    public void testInsertUser() throws SQLException {

        /*
         * 1. Obtain the DataSource.
         * 2. Get a Connection from the DataSource.
         * 3. Define a SQL statement.
         * 4. Get a PreparedStatement from the Connection.
         * 5. Execute the SQL statement.
         * 6. Close the connection.
         */

        // * 2. Get a Connection from the DataSource
        Connection connection = dataSource.getConnection();
        // * 3. Define a SQL statement.
        // Note: ******* the table used in the SQL is the logic table defined in the code above *******
        String sql = "insert into t_user(name) values('name-0001')";

        // * 4. Get a PreparedStatement from the Connection.
        PreparedStatement preparedStatement = connection.prepareStatement(sql);

        // * 5. Execute the SQL statement.
        preparedStatement.execute();

        sql = "insert into t_user(name) values('name-0002')";
        preparedStatement = connection.prepareStatement(sql);
        preparedStatement.execute();

        // * 6. Close the connection.
        preparedStatement.close();
        connection.close();
    }

The sharded data source ShardingDataSource is created from the real data sources, sharding rules, and properties configured above.

From here on, sharded databases and tables can be operated on just like a single database and table; the SQL can use the logic table directly, and the sharding algorithm routes according to the concrete values;

After routing: odd keys go to ds1.t_user_1, even keys to ds0.t_user_0;

Sharding values

In the precise sharding algorithm introduced above, PreciseShardingValue carries the current sharding key value; ShardingSphere-JDBC provides a corresponding ShardingValue for each kind of sharding algorithm:

  • PreciseShardingValue
  • RangeShardingValue
  • ComplexKeysShardingValue
  • HintShardingValue

Read-write separation

For systems with heavy concurrent reads and relatively few writes at any given moment, splitting the database into a master and slaves, where the master handles transactional inserts, updates, and deletes and the slaves handle queries, effectively avoids row locks caused by data updates and greatly improves the query performance of the whole system.

Building the MySQL master-slave cluster

Note the following before setting up:
1) Ensure network connectivity between the servers being synchronized: they must be able to ping each other and to connect to each other's database with the granted credentials (open port 3306 in the firewall).
2) Disable selinux.
3) Before synchronization, the data to be synchronized must be identical on both sides; once the environment is in place, subsequent updates will synchronize as expected. If the master is a brand-new database, skip this step.

Create directories

mkdir -p /usr/local/docker/mysqlMS
cd /usr/local/docker/mysqlMS

Write docker-compose.yml

version: '3.8'
services:
  mysql-master:
    container_name: mysql-master 
    image: mysql:5.7.31
    restart: always
    ports:
      - 3340:3306 
    privileged: true
    volumes:
      - $PWD/msql-master/volumes/log:/var/log/mysql  
      - $PWD/msql-master/volumes/conf/my.cnf:/etc/mysql/my.cnf
      - $PWD/msql-master/volumes/data:/var/lib/mysql
    environment:
      MYSQL_ROOT_PASSWORD: "123456"
    command: [
        '--character-set-server=utf8mb4',
        '--collation-server=utf8mb4_general_ci',
        '--max_connections=3000'
    ]
    networks:
      - myweb
      
  mysql-slave:
    container_name: mysql-slave 
    image: mysql:5.7.31
    restart: always
    ports:
      - 3341:3306 
    privileged: true
    volumes:
      - $PWD/msql-slave/volumes/log:/var/log/mysql  
      - $PWD/msql-slave/volumes/conf/my.cnf:/etc/mysql/my.cnf
      - $PWD/msql-slave/volumes/data:/var/lib/mysql
    environment:
      MYSQL_ROOT_PASSWORD: "123456"
    command: [
        '--character-set-server=utf8mb4',
        '--collation-server=utf8mb4_general_ci',
        '--max_connections=3000'
    ]
    networks:
      - myweb    

networks:

  myweb:
    driver: bridge

Create the configuration directories

root@haima-PC:/usr/local/docker/mysqlMS# mkdir -p msql-master/volumes/conf
root@haima-PC:/usr/local/docker/mysqlMS# mkdir -p msql-slave/volumes/conf
root@haima-PC:/usr/local/docker/mysqlMS# tree
.
├── docker-compose.yml
├── msql-master
│   └── volumes
│       └── conf
└── msql-slave
    └── volumes
        └── conf

6 directories, 1 file

1. Master configuration file my.cnf

vim msql-master/volumes/conf/my.cnf
[mysqld]
# [Required] Unique server ID; the default is 1, usually taken from the last segment of the IP
server-id=1

# [Required] Enable the binary log
log-bin=mysql-bin

# Replication filter: specify which database is not synchronized (the mysql database usually is not)
binlog-ignore-db=mysql

# Specify the databases to synchronize: binlog_do_db = <database name>;
# for multiple databases, add one such line per database.
# If no database is specified, all databases are synchronized except those ignored via binlog-ignore-db.
# binlog_do_db = test  # synchronize the test database

# Ensure the binlog is synchronized to disk after it is written
sync_binlog = 1

# Skip all errors and continue replication
slave-skip-errors = all

Tip: the most important binary log setting on the master is sync_binlog, which makes MySQL synchronize the binary log contents to disk on every transaction commit, so that events are preserved in the log even if the server crashes.

The sync_binlog parameter is crucial to MySQL: it affects not only the performance cost the binlog imposes but also the integrity of the data. Its settings mean the following:

sync_binlog=0: after a transaction commits, MySQL issues no fsync-style disk synchronization to flush the binlog_cache contents to disk; the filesystem decides when to synchronize, or the cache is flushed once it fills up.

sync_binlog=n: after every n transaction commits, MySQL issues an fsync-style disk synchronization to force the binlog_cache contents to disk.

The system default is sync_binlog=0, i.e. no forced disk flush at all; performance is then best, but the risk is also greatest: once the system crashes, all binlog information in the binlog_cache is lost. A setting of 1 is the safest but carries the largest performance cost: even on a crash, at most the one unfinished transaction in the binlog_cache is lost, with no material impact on the actual data.

Experience and benchmarks suggest that for systems with highly concurrent transactions, the write-performance gap between sync_binlog=0 and sync_binlog=1 can be five-fold or more.

2. Slave configuration file my.cnf

vim msql-slave/volumes/conf/my.cnf
[mysqld]
# [Required] Unique server ID; the default is 1, usually taken from the last segment of the IP
server-id=2

# To build a master -> slave(master) -> slave chain, set:
# log-slave-updates    Only with this option will data replicated from the previous server be replicated on to the next one.

# Specify the databases to synchronize. The master is unrestricted; on the slave, restrict with replicate-do-db = <database name>;
# if no databases are specified, remove this line to synchronize all databases (except the ignored ones).
# replicate-do-db = test;

# Do not synchronize the mysql database; multiple entries are possible, e.g. replicate-ignore-db = mysql,information_schema
replicate-ignore-db=mysql

## Enable the binary log so this slave can in turn act as a master for other slaves
log-bin=mysql-bin
log-bin-index=mysql-bin.index

## relay_log configures the relay log
#relay_log=edu-mysql-relay-bin

## A log retention period can also be set:
#expire_logs_days=14

# Skip all errors and continue replication
slave-skip-errors = all

Start the services

root@haima-PC:/usr/local/docker/mysqlMM# docker-compose up -d
Creating network "mysqlms_myweb" with driver "bridge"
Creating mysql-master ... done
Creating mysql-slave  ... done

Find the service IP addresses

The network created for the services, mysqlms_myweb, appears in the output above:

docker network inspect mysqlms_myweb

The result shows:

mysql-master has IP 192.168.112.3
mysql-slave has IP 192.168.112.2

Enter the master MySQL service

docker exec -it mysql-master bash

mysql -uroot -p123456

#Check whether server_id has taken effect
mysql> show variables like '%server_id%';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| server_id      | 1     |
| server_id_bits | 32    |
+----------------+-------+

#Show master status; the File and Position values are needed on the slave
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000005 |      154 |              | mysql            |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)


#Grant replication privileges
mysql> grant replication slave,replication client on *.* to 'slave'@'%' identified by "123456";
mysql> flush privileges;

Enter the slave MySQL service

docker exec -it mysql-slave bash

mysql -uroot -p123456

#Check whether server_id has taken effect
mysql> show variables like '%server_id%';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| server_id      | 2     |
| server_id_bits | 32    |
+----------------+-------+


# Connect to the master MySQL service; master_log_file and master_log_pos must be the values queried on the master above

change master to master_host='192.168.112.3',master_user='slave',master_password='123456',master_port=3306,master_log_file='mysql-bin.000005', master_log_pos=154,master_connect_retry=30;


#Start the slave
mysql> start slave;

mysql> show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.112.3
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 30
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 617
               Relay_Log_File: 7fee2f1fd5d2-relay-bin.000002
                Relay_Log_Pos: 783
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 617
              Relay_Log_Space: 997
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: 8f6e9f5a-61f4-11eb-ac84-0242c0a86002
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.01 sec)

Parameters for connecting to the master MySQL:

**master_port**: the master's port number, i.e. the container's port

**master_user**: the user used for data synchronization

**master_password**: that user's password

**master_log_file**: the log file from which the slave starts replicating, i.e. the File field mentioned above

**master_log_pos**: the Position from which to start reading, i.e. the Position field mentioned above

**master_connect_retry**: the retry interval in seconds when the connection fails; the default is 60 seconds

The two Yes values above indicate success:

        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Set the slave server to read-only mode

Run on the slave server:

SHOW VARIABLES LIKE '%read_only%'; # check the read-only status

SET GLOBAL super_read_only=1; # read-only for users with super privilege: 1 read-only, 0 writable
SET GLOBAL read_only=1; # read-only for ordinary users: 1 read-only, 0 writable

Setup is now complete; master-slave synchronization can be tested.

Common operations on the slave server

stop slave;
start slave;
show slave status;

Data source preparation

Create the tables in the master database on node cdh1.

The script is as follows:

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for t_user_0
-- ----------------------------
DROP TABLE IF EXISTS `t_user_0`;
CREATE TABLE `t_user_0`  (
  `id` bigint(20) NULL DEFAULT NULL,
  `name` varchar(40) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

DROP TABLE IF EXISTS `t_user_1`;
CREATE TABLE `t_user_1`  (
  `id` bigint(20) NULL DEFAULT NULL,
  `name` varchar(40) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

SET FOREIGN_KEY_CHECKS = 1;

Make sure the slave database on node cdh2 also has the two tables above.

Note: tables created on the master are replicated to the slave automatically.

binlog (archive log)

MySQL broadly has two layers:

  • the Server layer, which handles MySQL's functional concerns; for example, the binlog is the Server layer's own log;

  • and the engine layer, responsible for storage specifics; for example, the redo log is a log specific to the InnoDB engine.

The binlog records all operations that change the MySQL database, excluding operations such as SELECT and SHOW; its main uses are master-slave replication and incremental data recovery.

A mysqldump backup is only a full backup of the data up to a point in time; if the database server then fails after the backup, the binlog is needed to recover the rest.

The binlog has three formats: STATEMENT, ROW, and MIXED.

  • STATEMENT mode: the binlog records the original SQL text. The advantage is that per-row changes need not be recorded, reducing binlog volume, saving IO, and improving performance. The drawback is that in some cases it causes master-slave data inconsistency.
  • ROW mode: the context of each SQL statement is not recorded; only which rows were modified and what they became. This fixes the master-slave inconsistencies of STATEMENT mode, at the cost of large log volumes; ALTER TABLE in particular makes the log balloon.
  • MIXED mode: a mix of the two; ordinary replication uses STATEMENT mode to store the binlog, while operations STATEMENT mode cannot replicate use ROW mode; MySQL chooses the logging format per SQL statement.

redo log

The WAL (Write Ahead Log) technique often mentioned with MySQL means: when a transaction commits, the redo log is written first, and the page is modified afterwards.

That is, when a record needs updating, InnoDB first writes the record to the redo log and updates the page in the Buffer Pool; at that point the update operation counts as complete.

The Buffer Pool is a cache of physical pages; any InnoDB modification happens first on a Buffer Pool page, which is then marked dirty and placed on a dedicated Flush List; a flusher thread later writes these pages to disk in stages.

InnoDB's redo log has a fixed size: for example, it can be configured as a group of 4 files of 1GB each, used cyclically; writing starts at the beginning, and on reaching the end wraps around to the start (sequential writes, saving the IO cost of random disk writes).

(figure: circular write of the redo log file group)

Write Pos is the current write position; it moves forward as writing proceeds, wrapping back to the start of file 0 after reaching the end of file 3.

Check Point is the current erase position, which also moves forward cyclically; before a record is erased it must be flushed to the data file.

The empty space between Write Pos and Check Point is available for new records. If Write Pos catches up with Check Point, no new updates can be executed; the engine must stop and erase some records, advancing the Check Point.

When the database crashes, it need not replay all the logs: pages before the Check Point have already been flushed to disk, so only the redo log after the Check Point needs recovery, which shortens recovery time.
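The Write Pos / Check Point bookkeeping described above boils down to modular arithmetic over a fixed-size ring; a minimal sketch, with sizes and names purely illustrative:

```java
// Sketch of the circular redo log: the span from Check Point to Write Pos
// holds records not yet checkpointed; the remainder is free for new writes.
public class RedoRingSketch {
    static long freeSpace(long writePos, long checkPoint, long totalSize) {
        // Used space, accounting for the write position wrapping past the end
        long used = (writePos - checkPoint + totalSize) % totalSize;
        return totalSize - used;
    }

    public static void main(String[] args) {
        // Write Pos has wrapped around past the end of the ring
        System.out.println(freeSpace(100, 300, 4096)); // 200
        // Write Pos ahead of Check Point, no wrap yet
        System.out.println(freeSpace(300, 100, 4096)); // 3896
    }
}
```

When freeSpace approaches zero, this corresponds to Write Pos catching up with Check Point: new updates must wait while dirty pages are flushed and the Check Point advances.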

When the buffer pool runs short of space, the least recently used pages are evicted per the LRU algorithm; if such a page is dirty, a Check Point is forced and the dirty page is flushed back to disk.

InnoDB first places redo records into the redo log buffer, then flushes them to the redo log file at a certain frequency.

The redo log buffer is flushed to the redo log file in the following three cases:

  • the Master Thread flushes the redo log buffer to the redo log file every second;

  • every transaction commit flushes the redo log buffer to the redo log file;

  • when less than half of the redo log buffer remains free, it is flushed to the redo log file.

Two-phase commit

The redo log write is split into two steps, prepare and commit: this is two-phase commit.

create table T(ID int primary key, c int);
update T set c=c+1 where ID=2;

The internal flow of the executor and the InnoDB engine when running this update statement:

  • the executor asks the engine for the row with ID=2. ID is the primary key, so the engine finds the row directly by tree search. If the page holding the row is already in memory, it is returned to the executor immediately; otherwise it is first read from disk into memory and then returned;
  • the executor takes the row handed over by the engine, adds 1 to the value to produce the new row, and calls the engine interface to write it;
  • the engine updates the new row in memory and records the update in the redo log, which is now in the prepare state; it then tells the executor the work is done and the transaction may be committed at any time;
  • the executor generates the binlog for the operation and writes the binlog to disk;
  • the executor calls the engine's commit interface, and the engine switches the redo log it just wrote to the commit state; the update is complete.

The execution flow of the update statement is shown below; light boxes run inside InnoDB, dark boxes in the executor.

(figure: update statement execution flow)

Differences between the redo log and the binlog

  • the redo log is specific to the InnoDB engine; the binlog is implemented in MySQL's Server layer and available to all engines;
  • the redo log is a physical log, recording what modification was made to which data; the binlog is a logical log, recording the original logic of the statement, e.g. add 1 to the c column of the row with ID=2;
  • the redo log is written cyclically into a fixed space that gets used up; the binlog is append-only: when a binlog file reaches a certain size it switches to the next one and never overwrites earlier logs.

How binlog master-slave replication works

(figure: binlog master-slave replication flow)

Slave B and master A maintain a long-lived connection; master A has an internal thread dedicated to serving this connection. The complete flow of transaction log synchronization:

  • on slave B, the change master command sets master A's IP, port, user name, and password, plus the position from which to request the binlog — a file name and a log offset;
  • on slave B, the start slave command starts two threads, the I/O thread and the SQL thread of the figure; the I/O thread is responsible for connecting to the master;
  • after verifying the user name and password, master A reads the binlog locally from the position sent by slave B and streams it to B;
  • slave B writes the received binlog to a local file, known as the relay log;
  • the SQL thread reads the relay log, parses the commands in it, and executes them.

With the introduction of multi-threaded replication, the SQL thread has evolved into multiple threads.

Master-slave replication is not fully synchronous but asynchronous and near-real-time; there is an execution delay between master and slave, and if the master is under heavy load, the delay can grow large.

Implementing read-write separation with Sharding-JDBC

With Sharding-JDBC configured for read-write separation, the data sources are fully managed by Sharding-JDBC: writes automatically go to the master, reads automatically go to the slave, and programmers need not handle this in application code.

spring.main.allow-bean-definition-overriding=true
spring.shardingsphere.datasource.names=master,slave
spring.shardingsphere.datasource.master.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.master.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.master.url=jdbc:mysql://localhost:3306/db_master?characterEncoding=utf-8
spring.shardingsphere.datasource.master.username=
spring.shardingsphere.datasource.master.password=
spring.shardingsphere.datasource.slave.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.slave.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.slave.url=jdbc:mysql://localhost:3306/db_slave?characterEncoding=utf-8
spring.shardingsphere.datasource.slave.username=
spring.shardingsphere.datasource.slave.password=
spring.shardingsphere.masterslave.load-balance-algorithm-type=round_robin
spring.shardingsphere.masterslave.name=dataSource
spring.shardingsphere.masterslave.master-data-source-name=master
spring.shardingsphere.masterslave.slave-data-source-names=slave
spring.shardingsphere.props.sql.show=true

Parameters explained:

load-balance-algorithm-type configures the slave load-balancing algorithm type; options: ROUND_ROBIN and RANDOM.

props.sql.show=true prints each SQL as it executes, together with the name of the database it runs against.
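The two load-balancing choices can be sketched in a few lines; ROUND_ROBIN cycles through the slaves with a counter, RANDOM picks one at random. The method names here are illustrative, not ShardingSphere's actual API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of slave selection: round-robin cycles via a shared counter,
// random picks uniformly among the slaves.
public class SlaveLoadBalanceSketch {
    private static final AtomicInteger COUNT = new AtomicInteger();
    private static final Random RANDOM = new Random();

    static String roundRobin(List<String> slaves) {
        return slaves.get(Math.abs(COUNT.getAndIncrement() % slaves.size()));
    }

    static String random(List<String> slaves) {
        return slaves.get(RANDOM.nextInt(slaves.size()));
    }

    public static void main(String[] args) {
        List<String> slaves = Arrays.asList("slave0", "slave1");
        System.out.println(roundRobin(slaves)); // slave0
        System.out.println(roundRobin(slaves)); // slave1
        System.out.println(roundRobin(slaves)); // slave0
    }
}
```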

Master-slave configuration via the Java API

Prepare slaves ds01 and ds11 for ds0 and ds1 respectively, and configure master-slave synchronization for each; the read-write separation configuration is:

		
List<String> slaveDataSourceNames1 = new ArrayList<String>();
slaveDataSourceNames1.add("ds11");
MasterSlaveRuleConfiguration masterSlaveRuleConfiguration1 = new MasterSlaveRuleConfiguration("ds1", "ds1",
				slaveDataSourceNames1);
shardingRuleConfig.getMasterSlaveRuleConfigs().add(masterSlaveRuleConfiguration1);

Queries are then automatically routed to the slave, achieving read-write separation;

MasterSlaveRuleConfiguration

The master-slave rules mentioned in the sharding-rule section above exist precisely to implement read-write separation; the core configuration class is MasterSlaveRuleConfiguration:

public final class MasterSlaveRuleConfiguration implements RuleConfiguration {

    private final String name;
    private final String masterDataSourceName;
    private final List<String> slaveDataSourceNames;
    private final LoadBalanceStrategyConfiguration loadBalanceStrategyConfiguration;
}

  • name: the configuration name; in version 4.1.0, used here, it must be the master database's name;
  • masterDataSourceName: the master data source name;
  • slaveDataSourceNames: the list of slave data source names; one master with multiple slaves can be configured;
  • loadBalanceStrategyConfiguration: with multiple slaves, reads choose one through the load-balancing algorithm;

The master-slave load-balancing interface is MasterSlaveLoadBalanceAlgorithm, with random and round-robin implementations:

  • ROUND_ROBIN: implemented by RoundRobinMasterSlaveLoadBalanceAlgorithm
  • RANDOM: implemented by RandomMasterSlaveLoadBalanceAlgorithm

Question: how do we deal with the read-delay problem that frequently appears in read-write separation architectures?

A row is inserted and then read back immediately, and the read may miss it. The root cause is that after the master writes, the data still has to be replicated to the slave; if replication takes long enough, the read on the slave happens before the data arrives, so nothing is found. MySQL 5.7 replication is multi-threaded, which speeds things up, but an immediate read is still not guaranteed. Two ways to handle this:

(1) compromise at the business level: does the read really have to follow the write immediately?

(2) for reads that must immediately follow a write and cannot be compromised on, route those reads straight to the master. Sharding-JDBC anticipates this and lets the user specify whether a read should go to the master; set the following before reading:

    public List<UserInfo> getList() {
        // Force routing to the master
        HintManager.getInstance().setMasterRouteOnly();
        return this.list();
    }

Question: some time after deploying the MySQL master-slave environment, the master and slave are found to be out of sync; how is the data brought back to consistency?

Evasive answer: let the DBA handle it.

Distributed transactions

Using distributed transactions with ShardingSphere-JDBC is no different from using local transactions; distributed transactions are made transparent.

Supported transaction types are local transactions, XA transactions, and flexible (BASE) transactions; the default is local:

public enum TransactionType {

    LOCAL, XA, BASE
}

Dependencies

Depending on whether XA or flexible transactions are used, different modules must be imported:

<dependency>
	<groupId>org.apache.shardingsphere</groupId>
	<artifactId>sharding-transaction-xa-core</artifactId>
</dependency>

<dependency>
	<groupId>org.apache.shardingsphere</groupId>
	<artifactId>shardingsphere-transaction-base-seata-at</artifactId>
</dependency>

Implementation

ShardingSphere-JDBC provides the distributed transaction manager ShardingTransactionManager, with the following implementations:

  • XAShardingTransactionManager: XA-based distributed transaction manager;
  • SeataATShardingTransactionManager: Seata-based distributed transaction manager;

Concrete implementations of the XA distributed transaction manager include Atomikos, Narayana, and Bitronix; the default is Atomikos.

In Practice

The default transaction type is TransactionType.LOCAL. Since ShardingSphere-JDBC is inherently multi-data-source, local mode simply commits the transaction on each data source in a loop, which cannot guarantee consistency across them; that is why distributed transactions are needed. Using them is straightforward:

// switch the transaction type to XA
TransactionTypeHolder.set(TransactionType.XA);
DataSource dataSource = ShardingDataSourceFactory.createDataSource(dataSourceMap, shardingRuleConfig,
        new Properties());
Connection conn = dataSource.getConnection();
try {
    // disable auto-commit
    conn.setAutoCommit(false);

    String sql = "insert into t_order (user_id,order_id) values (?,?)";
    PreparedStatement preparedStatement = conn.prepareStatement(sql);
    for (int i = 1; i <= 5; i++) {
        preparedStatement.setInt(1, i - 1);
        preparedStatement.setInt(2, i - 1);
        preparedStatement.executeUpdate();
    }
    // commit the transaction
    conn.commit();
} catch (Exception e) {
    e.printStackTrace();
    // roll back the transaction on failure
    conn.rollback();
}

As you can see, it is quite simple to use. At commit time, ShardingSphere-JDBC checks the current transaction type and decides whether to perform a plain local commit or to commit through the distributed transaction manager ShardingTransactionManager.

The SnowFlake Clock-Rollback Problem

SnowFlake has a lot going for it: it is distributed, decentralized, and has no third-party dependencies.

But it is not perfect. Because SnowFlake depends heavily on the timestamp, any backward movement of the clock can make the algorithm produce wrong IDs.

Clock rollback: the most common failure is duplicate IDs caused by the clock moving backwards. The SnowFlake algorithm has no effective remedy for this; it simply throws an exception. Rollback can strike in two situations: (1) instance stops → clock rolls back → instance restarts → IDs are generated; (2) instance is running → clock rolls back → IDs are generated.
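The check that triggers the exception is just a comparison of the current timestamp against the last one used. A minimal sketch of that check follows; it is a simplified illustration, not the actual SnowFlake source, and the class name is invented:

```java
// Minimal sketch of the clock-rollback check inside a SnowFlake-style generator.
class RollbackAwareClock {

    private long lastTimestamp = -1L;

    // Returns the timestamp to use for the next ID,
    // or throws if the clock has moved backwards since the last call.
    synchronized long nextTimestamp(long currentMillis) {
        if (currentMillis < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards by "
                    + (lastTimestamp - currentMillis) + " ms, refusing to generate ID");
        }
        lastTimestamp = currentMillis;
        return currentMillis;
    }
}
```

Throwing is safe (no duplicate IDs are ever produced) but turns a clock adjustment into an outage, which is exactly the problem the following sections try to mitigate.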

Manual configuration: another weakness is that workerId (the machine ID) must be configured manually at deployment time, and no two instances may share one. With a handful of instances this is manageable, but once the fleet reaches a certain scale, managing workerIds becomes a complex operation.

Clock rollback caused by NTP

Server clocks are usually kept accurate by an ntp daemon, but the correction itself causes the clock to jump. Among these jumps, a backward one causes the most trouble.


Mitigating clock rollback

Once the ID generator becomes unavailable, every database-related business path that creates new data can go down with it; the impact is too large, so the clock-rollback problem must be solved.

The causes of clock rollback are varied: a leap-second adjustment, NTP synchronization, or a manual change to the server time. In every case the clock has gone back to the past. Different mitigation strategies apply depending on how far back it went.

Common approaches:

  1. Deploy the ID generator instances on a small number of servers, disable the NTP service, and manage those servers strictly. This approach needs no code at all; it is pure operational discipline.
  2. For short rollbacks, such as a leap-second adjustment of only 1 s, the code can simply pause the ID generator for a bounded time. A few seconds of availability are lost, but once the clock catches up, business resumes normally.
if (refusedSeconds <= 5) {
    try {
        // the clock moved back by no more than 5 seconds: wait out twice the offset
        wait(refusedSeconds << 1);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    currentSecond = getCurrentSecond();
} else {
    // the rollback is too large to wait out;
    // fall back to another strategy to repair the clock
}

  3. After the instance starts, generate the time in memory instead of reading the system clock.

This is the approach used by Baidu's open-source UidGenerator: once the instance has started, the time is no longer read from the server, so no matter how the server clock rolls back, SnowFlake execution is unaffected.

In the code below, the lastSecond variable is an AtomicLong that stands in for the system time:

 List<Long> uidList = uidProvider.provide(lastSecond.incrementAndGet());
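The idea behind that fragment can be sketched with an AtomicLong that is seeded from the wall clock once at startup and afterwards only ever incremented. This is a simplified illustration of the UidGenerator approach, and the class name is invented:

```java
import java.util.concurrent.atomic.AtomicLong;

// After startup the generator never reads the system clock again:
// each ID batch simply advances an in-memory second counter.
class MemoryClock {

    private final AtomicLong lastSecond;

    MemoryClock(long startSecond) {
        // seeded once at startup, e.g. from System.currentTimeMillis() / 1000
        this.lastSecond = new AtomicLong(startSecond);
    }

    // Advancing the counter is immune to any later clock rollback on the host.
    long nextSecond() {
        return lastSecond.incrementAndGet();
    }
}
```

The trade-off is that the in-memory time can drift ahead of real time under heavy load, which is acceptable because SnowFlake only needs the timestamp to be monotonically increasing, not accurate.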

Approaches 2 and 3 above both address the running-instance case: instance is running → clock rolls back → IDs are generated.

The other case, instance stops → clock rolls back → instance restarts → IDs are generated, can be handled by having the instance pick a never-before-used workerId at startup.

As long as the workerId differs from every workerId used before, the generated IDs will not repeat even if the timestamp is wrong.

UidGenerator adopts this scheme, but it requires a storage center, whether redis, mysql, or zookeeper: the previously used workerIds must be recorded there so that none is ever handed out twice.

Especially when the ID generators themselves are deployed in a distributed fashion, using a storage center to solve this problem deserves extra attention.
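The allocation logic can be sketched with an in-memory set standing in for the storage center (redis/mysql/zookeeper). This is an illustrative sketch only; the class name is invented and a real deployment would persist the set externally:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of restart-safe workerId allocation: the storage center remembers
// every workerId ever handed out, so a restarted instance always receives
// a fresh one and cannot collide with IDs generated before the restart.
class WorkerIdAllocator {

    // stands in for the external storage center (redis/mysql/zookeeper)
    private final Set<Long> used = new HashSet<>();

    synchronized long allocate() {
        long candidate = 0;
        while (used.contains(candidate)) {
            candidate++;
        }
        // recorded permanently, so this workerId is never reused,
        // even if the clock later rolls back across a restart
        used.add(candidate);
        return candidate;
    }
}
```

The cost of this scheme is that workerIds are consumed on every restart, so the workerId bit width bounds the total number of restarts across the fleet.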

UidGenerator code can be viewed on Github https://github.com/zer0Black/uid-generator

Note: This article is to be continued, and will be updated in the blog garden later



Origin blog.csdn.net/crazymakercircle/article/details/123420859