Sharding-JDBC: database and table sharding middleware

Table of contents

1. Introduction to ShardingSphere

2. Quick start (table sharding)

1. Create a database

2. Create the physical tables

3. Create a Spring Boot project and add dependencies

4. Add configuration to application.properties

5. Create corresponding entity classes and use MyBatis-Plus to quickly build CRUD

6. Main startup class configuration

7. Write tests

3. Try database and table sharding

1. Create the db_device_1 database and create two physical tables in it

2. Adjust the data source configuration

3. Run the test class

4. Queries under database and table sharding

1. Query by device_id

2. Query by device_id range

4. Core knowledge points of database and table sharding

1. Core concepts

2. Sharding keys and sharding strategies

1) Sharding key

2) Sharding algorithm

3) Sharding strategy

3. Implementing sharding strategies

1) Precise sharding with the Standard sharding strategy

2) Range sharding with the Standard sharding strategy

3) Complex sharding strategy

4) Hint forced routing strategy

4. Binding tables

5. Broadcast tables

5. Achieving read/write splitting

1. Build a master-slave replicated database

2. Use Sharding-JDBC to achieve read/write splitting

6. Implementation principle - connection modes

6.1. Connection modes

6.1.1. Memory-limited mode

6.1.2. Connection-limited mode

6.2. Automated execution engine

1. Introduction to ShardingSphere

        Apache ShardingSphere is an ecosystem of open-source distributed database solutions. It consists of three products that can be deployed independently or mixed together: JDBC, Proxy and Sidecar (planned). They all provide standardized horizontal data scaling, distributed transactions and distributed governance, and can be applied to diverse scenarios such as Java-homogeneous applications, heterogeneous languages and cloud native.

        Apache ShardingSphere aims to make full and reasonable use of the computing and storage capabilities of relational databases in distributed scenarios, rather than implementing a new relational database. Relational databases still hold a huge market share and remain the foundation of enterprise core systems, which will be hard to change in the foreseeable future. The project therefore focuses on providing increments on top of the existing foundation rather than replacing it.

Official website: Apache ShardingSphere

2. Quick start (table sharding)

1. Create a database

Create a database named db_device_0.

2. Create the physical tables

Logically, tb_device is a table describing device information. To illustrate table sharding, the tb_device table is split in two. tb_device is therefore the logical table, and tb_device_0 and tb_device_1 are its physical tables.

CREATE TABLE `tb_device_0` (
 `device_id` bigint NOT NULL AUTO_INCREMENT,
 `device_type` int DEFAULT NULL,
 PRIMARY KEY (`device_id`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb3;


CREATE TABLE `tb_device_1` (
 `device_id` bigint NOT NULL AUTO_INCREMENT,
 `device_type` int DEFAULT NULL,
 PRIMARY KEY (`device_id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8mb3;

3. Create a Spring Boot project and add dependencies

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.16</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.1.22</version>
        </dependency>
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
            <version>4.1.1</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.49</version>
        </dependency>
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.0.5</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

4. Add configuration to application.properties

# Configure the real data sources
spring.shardingsphere.datasource.names=ds1
# Configure data source 1
spring.shardingsphere.datasource.ds1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds1.url=jdbc:mysql://localhost:3306/db_device_0?serverTimezone=UTC&characterEncoding=utf-8&useSSL=false
spring.shardingsphere.datasource.ds1.username=db_device_0
spring.shardingsphere.datasource.ds1.password=db_device_0
# Configure the physical tables
spring.shardingsphere.sharding.tables.tb_device.actual-data-nodes=ds1.tb_device_$->{0..1}

# Table sharding strategy: shard by device_id (sharding key + sharding algorithm)

# Use device_id as the sharding column
spring.shardingsphere.sharding.tables.tb_device.table-strategy.inline.sharding-column=device_id
# Use device_id % 2 as the sharding algorithm: odd ids go to tb_device_1, even ids to tb_device_0
spring.shardingsphere.sharding.tables.tb_device.table-strategy.inline.algorithm-expression=tb_device_$->{device_id%2}
# Enable SQL logging
spring.shardingsphere.props.sql.show=true
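The algorithm-expression above is a Groovy row expression. A plain-Java sketch (illustrative only, not ShardingSphere code) of how tb_device_$->{device_id%2} resolves a physical table:

```java
public class InlineRouting {

    // Mirrors the Groovy expression tb_device_$->{device_id % 2}:
    // even ids resolve to tb_device_0, odd ids to tb_device_1.
    static String physicalTable(long deviceId) {
        return "tb_device_" + (deviceId % 2);
    }
}
```

For example, physicalTable(7) yields tb_device_1, so a row with device_id 7 is written to that physical table.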

5. Create corresponding entity classes and use MyBatis-Plus to quickly build CRUD

package com.my.sharding.shperejdbc.demo.entity;

import lombok.Data;

@Data
public class TbDevice {

    private Long deviceId;
    private Integer deviceType;
}

6. Main startup class configuration

package com.my.sharding.shperejdbc.demo;

import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Configure the mapper package for MyBatis to scan
@MapperScan("com.my.sharding.shperejdbc.demo.mapper")
@SpringBootApplication
public class MyShardingShpereJdbcDemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(MyShardingShpereJdbcDemoApplication.class, args);
    }

}

7. Write tests
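The test below autowires a DeviceMapper that this walkthrough has not shown. A minimal sketch, assuming a plain MyBatis-Plus BaseMapper in the scanned package (package names follow this project's convention and are otherwise arbitrary):

```java
package com.my.sharding.shperejdbc.demo.mapper;

import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.my.sharding.shperejdbc.demo.entity.TbDevice;
import org.apache.ibatis.annotations.Mapper;

// Inherits generic CRUD (insert, selectList, ...) from MyBatis-Plus;
// no extra methods are needed for these tests.
@Mapper
public interface DeviceMapper extends BaseMapper<TbDevice> {
}
```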

@SpringBootTest
class MyShardingShpereJdbcDemoApplicationTests {

    @Autowired
    DeviceMapper deviceMapper;

    @Test
    void testInitData(){
        for (int i = 0; i < 10; i++) {
            TbDevice tbDevice = new TbDevice();
            tbDevice.setDeviceId((long) i);
            tbDevice.setDeviceType(i);
            deviceMapper.insert(tbDevice);
        }
    }

}

Run and view the database:

According to the sharding strategy, the rows with odd IDs among these 10 rows are inserted into the tb_device_1 table, and the rows with even IDs into the tb_device_0 table.

3. Try database and table sharding

1. Create the db_device_1 database, and create the same two physical tables in it:

CREATE TABLE `tb_device_0` (
 `device_id` bigint NOT NULL AUTO_INCREMENT,
 `device_type` int DEFAULT NULL,
 PRIMARY KEY (`device_id`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb3;


CREATE TABLE `tb_device_1` (
 `device_id` bigint NOT NULL AUTO_INCREMENT,
 `device_type` int DEFAULT NULL,
 PRIMARY KEY (`device_id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8mb3;

2. Adjust data source configuration

Provide two data sources, using the two previously created MySQL databases, and configure a database sharding strategy.

# Configure the real data sources
spring.shardingsphere.datasource.names=ds0,ds1
# Configure data source 1
spring.shardingsphere.datasource.ds0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds0.url=jdbc:mysql://localhost:3306/db_device_0?serverTimezone=UTC&characterEncoding=utf-8&useSSL=false
spring.shardingsphere.datasource.ds0.username=db_device_0
spring.shardingsphere.datasource.ds0.password=db_device_0

# Configure data source 2
spring.shardingsphere.datasource.ds1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds1.url=jdbc:mysql://localhost:3306/db_device_1?serverTimezone=UTC&characterEncoding=utf-8&useSSL=false
spring.shardingsphere.datasource.ds1.username=db_device_1
spring.shardingsphere.datasource.ds1.password=db_device_1

# Configure the physical tables
spring.shardingsphere.sharding.tables.tb_device.actual-data-nodes=ds$->{0..1}.tb_device_$->{0..1}

# Database sharding strategy: use device_id as the sharding column
spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=device_id
# Inline (row expression) strategy using a Groovy expression
spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=ds$->{device_id%2}

# Table sharding strategy: shard by device_id (sharding key + sharding algorithm)
# Use device_id as the sharding column
spring.shardingsphere.sharding.tables.tb_device.table-strategy.inline.sharding-column=device_id
# Use device_id % 2 as the sharding algorithm: odd ids go to tb_device_1, even ids to tb_device_0
spring.shardingsphere.sharding.tables.tb_device.table-strategy.inline.algorithm-expression=tb_device_$->{device_id%2}
# Enable SQL logging
spring.shardingsphere.props.sql.show=true

Compared with the previous configuration, this adds a sharding strategy across two databases: which database a row is stored in is determined by the parity of device_id. Groovy expressions are used to describe the relationship between databases and tables.

ds$->{0..1}.tb_device_$->{0..1}

Equivalent to:

ds0.tb_device_0
ds0.tb_device_1
ds1.tb_device_0
ds1.tb_device_1
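The same expansion, plus the routing that the two inline strategies then perform, can be modeled in plain Java (an illustration, not ShardingSphere API):

```java
import java.util.ArrayList;
import java.util.List;

public class DataNodes {

    // Expands ds$->{0..1}.tb_device_$->{0..1} into its four data nodes.
    static List<String> expand() {
        List<String> nodes = new ArrayList<>();
        for (int ds = 0; ds <= 1; ds++) {
            for (int table = 0; table <= 1; table++) {
                nodes.add("ds" + ds + ".tb_device_" + table);
            }
        }
        return nodes;
    }

    // With both the database and table strategies keyed on device_id % 2,
    // each row lands on exactly one of the four nodes.
    static String route(long deviceId) {
        return "ds" + (deviceId % 2) + ".tb_device_" + (deviceId % 2);
    }
}
```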

3. Run test class

Result: rows with an odd device_id are stored in the ds1.tb_device_1 table, and rows with an even device_id in the ds0.tb_device_0 table.

4. Queries under database and table sharding

1. Query by device_id

    /**
     * Query by device_id
     */
    @Test
    void testQueryByDeviceId(){
        QueryWrapper<TbDevice> wrapper = new QueryWrapper<>();
        wrapper.eq("device_id",1);
        List<TbDevice> list = deviceMapper.selectList(wrapper);
        list.stream().forEach(e->{
            System.out.println(e);
        });

    }

Result: TbDevice(deviceId=1, deviceType=1)

The query was routed only to the tb_device_1 table in ds1.

2. Query by device_id range

    /**
     * Query by device_id range
     */
    @Test
    void testDeviceByRange(){
        QueryWrapper<TbDevice> wrapper = new QueryWrapper<>();
        wrapper.between("device_id",1,10);
        List<TbDevice> devices = deviceMapper.selectList(wrapper);
        devices.stream().forEach(e->{
            System.out.println(e);
        });
    }

Result:

Error querying database.  Cause: java.lang.IllegalStateException: Inline strategy cannot support this type sharding:RangeRouteValue(columnName=device_id, tableName=tb_device, valueRange=[1‥10])

Reason: the inline sharding strategy does not support range queries.
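The limitation is easy to see if you model what a range condition needs: a modulo row expression maps one value to one table, but a range such as BETWEEN 1 AND 10 covers both parities and therefore needs a set of tables. A plain-Java sketch (hypothetical helper, not part of the framework):

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class RangeRouting {

    // Collects every physical table that ids in [from, to] map to under
    // tb_device_$->{device_id % 2}; a range spanning both parities needs both tables.
    static Set<String> tablesFor(long from, long to) {
        Set<String> tables = new LinkedHashSet<>();
        for (long id = from; id <= to; id++) {
            tables.add("tb_device_" + (id % 2));
        }
        return tables;
    }
}
```

Since the inline strategy can only return a single table per value, ShardingSphere rejects the range query instead of guessing; the standard strategy's range algorithm, shown later, returns such a collection explicitly.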

4. Core knowledge points of database and table sharding

1. Core concepts

        Before looking at sharding strategies, let's first go over the following key concepts: logical table, real table, data node, binding table and broadcast table.

  • logical table

        A collective name for a group of horizontally split tables with the same logic and data structure. Example: order data is split into 10 tables by the last digit of the primary key, namely t_order_0 to t_order_9; their logical table name is t_order.

  • real table

        A physical table that actually exists in the sharded database, i.e. t_order_0 to t_order_9 in the previous example.

  • data node

        The smallest unit of data sharding. It consists of a data source name and a table name, for example ds_0.t_order_0.

  • binding table

        Refers to primary and child tables with consistent sharding rules. For example, the t_order table and the t_order_item table are both sharded by order_id, so the two tables are bound to each other. Multi-table joins between binding tables will not produce a Cartesian product of routes, and join efficiency improves greatly. For example, given the SQL:

SELECT i.* FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id
WHERE o.order_id in (10, 11);

        If the binding-table relationship is not configured, and assuming the sharding key order_id routes the value 10 to shard 0 and the value 11 to shard 1, the query is routed to 4 SQL statements, which form a Cartesian product:

SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON
o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_0 o JOIN t_order_item_1 i ON
o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_0 i ON
o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON
o.order_id=i.order_id WHERE o.order_id in (10, 11);

        After the binding-table relationship is configured, the query routes to 2 SQL statements:

SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON
o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON
o.order_id=i.order_id WHERE o.order_id in (10, 11);

        Here t_order is leftmost in the FROM clause, so ShardingSphere uses it as the primary table of the binding group. All routing calculations use only the primary table's strategy, so the sharding calculation of t_order_item uses t_order's conditions. Therefore, the sharding keys of binding tables must be exactly identical.
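The 4-versus-2 routing described above can be sketched in plain Java (an illustrative model of the route counts, not ShardingSphere code):

```java
import java.util.ArrayList;
import java.util.List;

public class BindingRoutes {

    // Without a binding-table relationship, every t_order shard is joined
    // with every t_order_item shard: a Cartesian product of routes.
    static List<String> unbound(int shards) {
        List<String> routes = new ArrayList<>();
        for (int o = 0; o < shards; o++) {
            for (int i = 0; i < shards; i++) {
                routes.add("t_order_" + o + " JOIN t_order_item_" + i);
            }
        }
        return routes;
    }

    // With binding configured, shards with the same index are joined pairwise.
    static List<String> bound(int shards) {
        List<String> routes = new ArrayList<>();
        for (int s = 0; s < shards; s++) {
            routes.add("t_order_" + s + " JOIN t_order_item_" + s);
        }
        return routes;
    }
}
```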

2. Sharding keys and sharding strategies

1) Sharding key

The database column used for sharding is the key to horizontally splitting databases and tables. Example: if the order table is sharded by the last digit of the order primary key modulo some number, the order primary key is the sharding column. If the SQL contains no sharding column, full routing is performed and performance is poor. Besides single sharding columns, ShardingSphere also supports sharding by multiple columns.

2) Sharding algorithm

        Data is split by a sharding algorithm; splitting by =, >=, <=, >, <, BETWEEN and IN is supported. Because sharding algorithms are closely tied to business logic, ShardingSphere provides no built-in algorithm. Instead, common scenarios are abstracted into sharding strategies, offering a higher-level abstraction with interfaces that application developers implement themselves, which allows very high flexibility. Currently, 4 sharding algorithm interfaces are provided.

  • Precise sharding algorithm

        Corresponds to PreciseShardingAlgorithm; used to shard by = and IN with a single column as the sharding key. Must be used with StandardShardingStrategy.

  • Range sharding algorithm

        Corresponds to RangeShardingAlgorithm; used to shard by BETWEEN AND, >, <, >= and <= with a single column as the sharding key. Must be used with StandardShardingStrategy.

  • Composite sharding algorithm

        Corresponds to ComplexKeysShardingAlgorithm; used to shard by multiple columns as sharding keys. The logic involving multiple sharding keys can be complicated, and application developers must handle that complexity themselves. Must be used with ComplexShardingStrategy.

  • Hint sharding algorithm

        Corresponds to HintShardingAlgorithm; used when sharding values are supplied via Hint. Must be used with HintShardingStrategy.

3) Sharding strategy

A sharding strategy consists of a sharding key and a sharding algorithm; the algorithm is extracted separately because of its independence. What actually performs a sharding operation is the sharding key plus the sharding algorithm, i.e. the sharding strategy. Currently, 5 sharding strategies are provided.

  • Standard sharding strategy

        Corresponds to StandardShardingStrategy. Provides sharding support for =, >, <, >=, <=, IN and BETWEEN AND in SQL statements. If no RangeShardingAlgorithm is configured, BETWEEN AND in SQL is handled by full database routing.

  • Composite sharding strategy

        Corresponds to ComplexShardingStrategy. Provides sharding support for =, >, <, >=, <=, IN and BETWEEN AND in SQL statements. ComplexShardingStrategy supports multiple sharding keys; because the relationships between multiple keys are complex, it does not add much encapsulation, but instead passes the sharding key-value combinations and sharding operators straight through to the sharding algorithm, which is implemented entirely by the application developer, providing maximum flexibility.

  • Row expression sharding strategy

        Corresponds to InlineShardingStrategy. Using Groovy expressions, it supports = and IN sharding in SQL statements, and only supports a single sharding key. Simple sharding algorithms can be used via plain configuration, avoiding tedious Java code; for example, t_user_$->{u_id % 8} means the t_user table is split into 8 tables by u_id modulo 8, named t_user_0 to t_user_7.

  • Hint sharding strategy

        Corresponds to HintShardingStrategy. A strategy that shards by values supplied via Hint rather than extracted from SQL.

  • No sharding strategy

        Corresponds to NoneShardingStrategy. A strategy that does not shard.
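The row expression example above, t_user_$->{u_id % 8}, behaves like this simple function (plain-Java illustration, not ShardingSphere code):

```java
public class UserTableRouting {

    // Mirrors t_user_$->{u_id % 8}: eight physical tables t_user_0 .. t_user_7.
    static String table(long uId) {
        return "t_user_" + (uId % 8);
    }
}
```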

3. Implementing sharding strategies

1) Precise sharding with the Standard sharding strategy

        The Standard sharding strategy can be configured separately for database sharding and table sharding. When configuring it, you specify the sharding column and a precise and/or range sharding algorithm.

  • Configure precise sharding for the database
# Configure the database sharding strategy as the precise sharding of the Standard strategy
#standard
spring.shardingsphere.sharding.default-database-strategy.standard.sharding-column=device_id
spring.shardingsphere.sharding.default-database-strategy.standard.precise-algorithm-class-name=com.my.sharding.shperejdbc.demo.sharding.algorithm.database.MyDataBasePreciseAlgorithm

A class implementing the precise sharding algorithm must be provided; its logic can carry the same meaning as the inline row expression.

import org.apache.shardingsphere.api.sharding.standard.PreciseShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingValue;

import java.util.Collection;

public class MyDataBasePreciseAlgorithm implements PreciseShardingAlgorithm<Long> {
    /**
     * Standard precise sharding for the database.
     *
     * @param collection           the available data source names
     * @param preciseShardingValue the sharding condition
     * @return the target data source name
     */
    @Override
    public String doSharding(Collection<String> collection, PreciseShardingValue<Long> preciseShardingValue) {
        // Logical table name: tb_device
        String logicTableName = preciseShardingValue.getLogicTableName();
        // Sharding column name
        String columnName = preciseShardingValue.getColumnName();
        // Concrete value of the sharding column
        Long value = preciseShardingValue.getValue();
        // Precise sharding according to the strategy ds$->{device_id % 2}
        String shardingKey = "ds" + (value % 2);
        if (!collection.contains(shardingKey)) {
            throw new UnsupportedOperationException("Data source " + shardingKey + " does not exist!");
        }
        return shardingKey;
    }

}
  • Configure precise sharding for the table
# Configure the table sharding strategy as the precise sharding of the Standard strategy
spring.shardingsphere.sharding.tables.tb_device.table-strategy.standard.sharding-column=device_id
spring.shardingsphere.sharding.tables.tb_device.table-strategy.standard.precise-algorithm-class-name=com.my.sharding.shperejdbc.demo.sharding.algorithm.table.MyTablePreciseAlgorithm

A precise sharding algorithm implementation must likewise be provided for table sharding.

import org.apache.shardingsphere.api.sharding.standard.PreciseShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingValue;

import java.util.Collection;

public class MyTablePreciseAlgorithm implements PreciseShardingAlgorithm<Long> {
    @Override
    public String doSharding(Collection<String> collection, PreciseShardingValue<Long> preciseShardingValue) {
        String logicTableName = preciseShardingValue.getLogicTableName(); // logical table name
        Long value = preciseShardingValue.getValue(); // concrete sharding value
        String shardingKey = logicTableName + "_" + (value % 2);
        if (!collection.contains(shardingKey)) {
            throw new UnsupportedOperationException("Table " + shardingKey + " does not exist!");
        }
        return shardingKey;
    }
}

Re-running the earlier query-by-device_id test case has the same effect as before: the query is routed by id to one table in one database.

2) Range sharding with the Standard sharding strategy

  • Configure range sharding for the database
spring.shardingsphere.sharding.default-database-strategy.standard.sharding-column=device_id
spring.shardingsphere.sharding.default-database-strategy.standard.precise-algorithm-class-name=com.my.sharding.shperejdbc.demo.sharding.algorithm.database.MyDataBasePreciseAlgorithm
spring.shardingsphere.sharding.default-database-strategy.standard.range-algorithm-class-name=com.my.sharding.shperejdbc.demo.sharding.algorithm.database.MyDataBaseRangeAlgorithm

Provide an implementation class for the range sharding algorithm.

import org.apache.shardingsphere.api.sharding.standard.RangeShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingValue;

import java.util.Collection;

public class MyDataBaseRangeAlgorithm implements RangeShardingAlgorithm<Long> {
    /**
     * Returns all data sources directly, because a range query must
     * search both tables in both databases.
     *
     * @param collection         the available data source names
     * @param rangeShardingValue the range sharding condition
     * @return the data sources to query
     */
    @Override
    public Collection<String> doSharding(Collection<String> collection, RangeShardingValue<Long> rangeShardingValue) {
        return collection;
    }
}
  • Configure range sharding for the table
spring.shardingsphere.sharding.tables.tb_device.table-strategy.standard.sharding-column=device_id
spring.shardingsphere.sharding.tables.tb_device.table-strategy.standard.precise-algorithm-class-name=com.my.sharding.shperejdbc.demo.sharding.algorithm.table.MyTablePreciseAlgorithm
spring.shardingsphere.sharding.tables.tb_device.table-strategy.standard.range-algorithm-class-name=com.my.sharding.shperejdbc.demo.sharding.algorithm.table.MyTableRangeAlgorithm

Provide an implementation class for the table's range sharding algorithm:

import org.apache.shardingsphere.api.sharding.standard.RangeShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingValue;

import java.util.Collection;

public class MyTableRangeAlgorithm implements RangeShardingAlgorithm<Long> {
    @Override
    public Collection<String> doSharding(Collection<String> collection, RangeShardingValue<Long> rangeShardingValue) {
        return collection;
    }
}

Re-running the range query test case now succeeds.

3) Complex sharding strategy

 @Test
 void queryDeviceByRangeAndDeviceType(){
     QueryWrapper<TbDevice> queryWrapper = new QueryWrapper<>();
     queryWrapper.between("device_id",1,10);
     queryWrapper.eq("device_type", 5);
     List<TbDevice> deviceList = deviceMapper.selectList(queryWrapper);
     System.out.println(deviceList);
 }

The problem with the test code above:

While doing a range query on device_id, we also filter precisely on device_type. The range query still hits both tables in both databases, yet (with our test data, where device_id and device_type always share the same parity) rows with an odd device_type can only be in the odd table of the odd database, so several of those queries are redundant.

To eliminate the redundant queries, the complex sharding strategy can be used.

  • Complex sharding strategy

        Supports sharding strategies with multiple sharding columns.

# Database sharding strategy: complex, with multiple sharding columns
spring.shardingsphere.sharding.default-database-strategy.complex.sharding-columns=device_id,device_type
spring.shardingsphere.sharding.default-database-strategy.complex.algorithm-class-name=com.sharding.algorithm.database.MyDataBaseComplexAlgorithm


# Table sharding strategy: complex, with multiple sharding columns
spring.shardingsphere.sharding.tables.tb_device.table-strategy.complex.sharding-columns=device_id,device_type
spring.shardingsphere.sharding.tables.tb_device.table-strategy.complex.algorithm-class-name=com.sharding.algorithm.table.MyTableComplexAlgorithm
  • Configure the database sharding algorithm implementation class
import org.apache.shardingsphere.api.sharding.complex.ComplexKeysShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.complex.ComplexKeysShardingValue;
import java.util.ArrayList;
import java.util.Collection;

public class MyDataBaseComplexAlgorithm implements ComplexKeysShardingAlgorithm<Integer> {
    /**
     * @param collection               the available data source names
     * @param complexKeysShardingValue the multi-column sharding condition
     * @return the collection of data nodes to query this time
     */
    @Override
    public Collection<String> doSharding(Collection<String> collection, ComplexKeysShardingValue<Integer> complexKeysShardingValue) {
        Collection<Integer> deviceTypeValues = complexKeysShardingValue.getColumnNameAndShardingValuesMap().get("device_type");
        Collection<String> databases = new ArrayList<>();
        for (Integer deviceTypeValue : deviceTypeValues) {
            String databaseName = "ds"+(deviceTypeValue % 2);
            databases.add(databaseName);
        }
        return databases;
    }
}
  • Configure the table sharding algorithm implementation class
import org.apache.shardingsphere.api.sharding.complex.ComplexKeysShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.complex.ComplexKeysShardingValue;
import java.util.ArrayList;
import java.util.Collection;

public class MyTableComplexAlgorithm implements ComplexKeysShardingAlgorithm<Integer> {
    @Override
    public Collection<String> doSharding(Collection<String> collection, ComplexKeysShardingValue<Integer> complexKeysShardingValue) {
        String logicTableName = complexKeysShardingValue.getLogicTableName();
        Collection<Integer> deviceTypeValues = complexKeysShardingValue.getColumnNameAndShardingValuesMap().get("device_type");
        Collection<String> tables = new ArrayList<>();
        for (Integer deviceTypeValue : deviceTypeValues) {
            tables.add(logicTableName+"_"+(deviceTypeValue%2));
        }
        return tables;
    }
}

Test:

The database is queried only once.
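The routing performed by the two complex algorithms can be reproduced in plain Java (a hypothetical helper mirroring MyDataBaseComplexAlgorithm and MyTableComplexAlgorithm, not framework code): device_type alone decides both the database and the table, so the query above touches a single data node.

```java
public class ComplexRouting {

    // device_type % 2 picks both the database and the table,
    // so an equality condition on device_type selects exactly one node.
    static String dataNode(int deviceType) {
        return "ds" + (deviceType % 2) + ".tb_device_" + (deviceType % 2);
    }
}
```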

4) Hint forced routing strategy

Hint can force routing to a given table in a given database, regardless of the characteristics of the SQL statement.

# Database sharding strategy: inline row expression using Groovy
spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=device_id
spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=ds$->{device_id%2}
# Register the hint sharding algorithm for the tb_device table
# (the class path follows this project's package convention)
spring.shardingsphere.sharding.tables.tb_device.table-strategy.hint.algorithm-class-name=com.my.sharding.shperejdbc.demo.sharding.algorithm.table.MyTableHintAlgorithm

Provide the implementation class of the hint algorithm:

import org.apache.shardingsphere.api.sharding.hint.HintShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.hint.HintShardingValue;

import java.util.Arrays;
import java.util.Collection;

public class MyTableHintAlgorithm implements HintShardingAlgorithm<Long> {
    @Override
    public Collection<String> doSharding(Collection<String> collection, HintShardingValue<Long> hintShardingValue) {
        String logicTableName = hintShardingValue.getLogicTableName();

        String tableName = logicTableName + "_" + hintShardingValue.getValues().toArray()[0];
        if (!collection.contains(tableName)) {
            throw new UnsupportedOperationException("Table " + tableName + " does not exist");
        }
        return Arrays.asList(tableName);

    }
}

Test case:

    @Test
    void testHint(){
        HintManager hintManager = HintManager.getInstance();
        hintManager.addTableShardingValue("tb_device", 0); // force queries to the tb_device_0 table only
        List<TbDevice> devices = deviceMapper.selectList(null);
        devices.stream().forEach(System.out::println);
        hintManager.close(); // release the hint's thread-local state
    }

Result:

4. Binding tables

Let's first reproduce the Cartesian product problem.

  • Create the tb_device_info_0 and tb_device_info_1 tables in both databases:
CREATE TABLE `tb_device_info_0` (
 `id` bigint NOT NULL,
 `device_id` bigint DEFAULT NULL,
 `device_intro` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
  • Configure the sharding strategies for the tb_device and tb_device_info tables.
# tb_device sharding strategy
spring.shardingsphere.sharding.tables.tb_device.actual-data-nodes=ds$->{0..1}.tb_device_$->{0..1}
spring.shardingsphere.sharding.tables.tb_device.table-strategy.inline.sharding-column=device_id
spring.shardingsphere.sharding.tables.tb_device.table-strategy.inline.algorithm-expression=tb_device_$->{device_id%2}

# tb_device_info sharding strategy
spring.shardingsphere.sharding.tables.tb_device_info.actual-data-nodes=ds$->{0..1}.tb_device_info_$->{0..1}
spring.shardingsphere.sharding.tables.tb_device_info.table-strategy.inline.sharding-column=device_id
spring.shardingsphere.sharding.tables.tb_device_info.table-strategy.inline.algorithm-expression=tb_device_info_$->{device_id%2}

The sharding key of both tables is device_id.

  • Write test cases and insert data
    @Test
    void testInsertDeviceInfo(){
        for (int i = 0; i < 10; i++) {
            TbDevice tbDevice = new TbDevice();
            tbDevice.setDeviceId((long) i);
            tbDevice.setDeviceType(i);
            deviceMapper.insert(tbDevice);

            TbDeviceInfo tbDeviceInfo = new TbDeviceInfo();
            tbDeviceInfo.setDeviceId((long) i);
            tbDeviceInfo.setDeviceIntro(""+i);
            deviceInfoMapper.insert(tbDeviceInfo);
        }
    }
  • Cartesian product appears in join query
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.my.sharding.shperejdbc.demo.entity.TbDeviceInfo;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;

import java.util.List;
@Mapper
public interface DeviceInfoMapper extends BaseMapper<TbDeviceInfo> {

    @Select("select a.id,a.device_id,b.device_type,a.device_intro from tb_device_info a left join tb_device b on a.device_id = b.device_id")
    public List<TbDeviceInfo> queryDeviceInfo();
}
    @Test
    void testQueryDeviceInfo(){
        List<TbDeviceInfo> tbDeviceInfos = deviceInfoMapper.queryDeviceInfo();
        tbDeviceInfos.stream().forEach(System.out::println);
    }

Result:

As you can see, a Cartesian product was produced and 20 rows came back: each tb_device_info_x table is joined with both tb_device_0 and tb_device_1, so every one of the 10 logical rows appears twice.

  • Configure the binding tables
# Configure binding tables
spring.shardingsphere.sharding.binding-tables[0]=tb_device,tb_device_info

Querying again, the Cartesian product no longer appears:

5. Broadcast tables

 Now consider a scenario: the tb_device_type table, which the device_type column refers to, should not be sharded; both databases should hold the full data of this table.

  • Create tb_device_type tables in both databases
CREATE TABLE `tb_device_type` (
 `type_id` int NOT NULL AUTO_INCREMENT,
 `type_name` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL,
 PRIMARY KEY (`type_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
  • Configure the broadcast table
# Broadcast table configuration
spring.shardingsphere.sharding.broadcast-tables=tb_device_type
spring.shardingsphere.sharding.tables.tb_device_type.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.tb_device_type.key-generator.column=type_id
  • Write test cases
    @Test
    void testInsertDeviceType(){
        TbDeviceType tbDeviceType = new TbDeviceType();
        tbDeviceType.setTypeId(1L);
        tbDeviceType.setTypeName("消防器材"); // "fire-fighting equipment"
        deviceTypeMapper.insert(tbDeviceType);

        TbDeviceType tbDeviceType1 = new TbDeviceType();
        tbDeviceType1.setTypeId(2L);
        tbDeviceType1.setTypeName("健身器材"); // "fitness equipment"
        deviceTypeMapper.insert(tbDeviceType1);
    }

result:

The tb_device_type table in each of the two databases now contains both rows.
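The behavior can be summed up in a short sketch (an illustration only, not ShardingSphere's internals; the data source names `ds0`/`ds1` follow the earlier sharding configuration): a broadcast table routes every write to every data source, which is why both databases end up with the full data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class BroadcastSketch {
    static final List<String> DATA_SOURCES = Arrays.asList("ds0", "ds1");

    // A broadcast table routes one logical INSERT to the same table in every data source.
    static List<String> routeInsert(String table) {
        return DATA_SOURCES.stream()
                .map(ds -> ds + "." + table)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(routeInsert("tb_device_type")); // [ds0.tb_device_type, ds1.tb_device_type]
    }
}
```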

5. Achieve reading and writing separation 

1. Build a master-slave synchronization database

  • Master-slave synchronization principle

        The master writes every change to its binlog. The slave's I/O thread reads the master's binlog and appends it to a local relay log file; at this point the slave is continuously synchronizing with the master, but the data only exists in the relay log and has not yet been applied to the database. A separate SQL thread on the slave then replays the relay log and writes the data into its own database.

  •  Prepare the Master database and Slave database

Create docker-compose.yml under /usr/local/docker/mysql for the master and write:

version: '3.1'
services:
  mysql:
    restart: "always"
    image: mysql:5.7.25
    container_name: mysql-test-master
    ports:
     - 3308:3306
    environment:
      TZ: Asia/Shanghai
      MYSQL_ROOT_PASSWORD: 123456
    command:
      --character-set-server=utf8mb4
      --collation-server=utf8mb4_general_ci
      --explicit_defaults_for_timestamp=true
      --lower_case_table_names=1
      --max_allowed_packet=128M
      --server-id=47
      --log_bin=master-bin
      --log_bin-index=master-bin.index
      --skip-name-resolve
      --sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ERROR_FOR_DIVISION_BY_ZERO"
    volumes:
     - mysql-data:/var/lib/mysql
volumes:
  mysql-data:

Then create a second docker-compose.yml in a separate directory for the slave:

version: '3.1'
services:
  mysql:
    restart: "always"
    image: mysql:5.7.25
    container_name: mysql-test-slave
    ports:
     - 3309:3306
    environment:
      TZ: Asia/Shanghai
      MYSQL_ROOT_PASSWORD: 123456
    command:
      --character-set-server=utf8mb4
      --collation-server=utf8mb4_general_ci
      --explicit_defaults_for_timestamp=true
      --lower_case_table_names=1
      --max_allowed_packet=128M
      --server-id=48
      --relay-log=slave-relay-bin
      --relay-log-index=slave-relay-bin.index
      --log-bin=mysql-bin
      --log-slave-updates=1
      --sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ERROR_FOR_DIVISION_BY_ZERO"
    volumes:
     - mysql-data:/var/lib/mysql
volumes:
  mysql-data:

Start each stack with docker-compose up -d

Pay attention to the configuration:

Master:

Server id: server-id=47

Enable binlog: log_bin=master-bin

Binlog index: log_bin-index=master-bin.index

Slave:

Server id: server-id=48

Relay log file: relay-log=slave-relay-bin

Relay log index: relay-log-index=slave-relay-bin.index

Enter the master container with a bash shell, log in to MySQL, and run show master status to view the current binlog file name and offset.

 Enter the slave container with a bash shell and execute the following commands in sequence:

# Log in to the slave MySQL server
mysql -u root -p
# Point the slave at the master (file name and offset come from `show master status` on the master)
CHANGE MASTER TO
MASTER_HOST='<master host address>',
MASTER_PORT=3306,
MASTER_USER='root',
MASTER_PASSWORD='123456',
MASTER_LOG_FILE='master-bin.000006',
MASTER_LOG_POS=154;
# Start replication
start slave;
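Before moving on, it is worth confirming that replication is actually running. The fields below are standard MySQL 5.7 replication status fields:

```sql
-- On the slave, after `start slave;`
SHOW SLAVE STATUS\G
-- Both of these fields should report Yes:
--   Slave_IO_Running:  Yes   (the I/O thread is pulling the master's binlog)
--   Slave_SQL_Running: Yes   (the SQL thread is replaying the relay log)
-- If either is No, check Last_IO_Error / Last_SQL_Error for the reason.
```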

 At this point, the master-slave replication cluster is ready.

Create the db_device database in the master and create a table in it:

CREATE TABLE `tb_user` (
 `id` bigint(20) NOT NULL AUTO_INCREMENT,
 `name` varchar(255) DEFAULT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4;

After a refresh, the table also appears in the slave database, created by replication.

2. Use sharding-jdbc to achieve read and write separation 

  • Write configuration file
spring.shardingsphere.datasource.names=s0,m0
# Configure the master data source (port 3308 matches the Docker port mapping above)
spring.shardingsphere.datasource.m0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m0.url=jdbc:mysql://localhost:3308/db_device?serverTimezone=UTC&characterEncoding=utf-8&useSSL=false
spring.shardingsphere.datasource.m0.username=root
spring.shardingsphere.datasource.m0.password=123456

# Configure the slave data source (the name must match the one declared in datasource.names)
spring.shardingsphere.datasource.s0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.s0.url=jdbc:mysql://localhost:3309/db_device?serverTimezone=UTC&characterEncoding=utf-8&useSSL=false
spring.shardingsphere.datasource.s0.username=root
spring.shardingsphere.datasource.s0.password=123456

# Assign the read/write rules
spring.shardingsphere.sharding.master-slave-rules.ds0.master-data-source-name=m0
spring.shardingsphere.sharding.master-slave-rules.ds0.slave-data-source-names[0]=s0

# Map the logical table to its actual data node
spring.shardingsphere.sharding.tables.tb_user.actual-data-nodes=ds0.tb_user
# Primary key generation strategy
spring.shardingsphere.sharding.tables.tb_user.key-generator.column=id
spring.shardingsphere.sharding.tables.tb_user.key-generator.type=SNOWFLAKE
# Print the executed SQL statements
spring.shardingsphere.props.sql.show=true
  • Test write data
 @Test
 void testInsertUser(){
     for (int i = 0; i < 10; i++) {
         TbUser user = new TbUser();
         user.setName("" + i);
         userMapper.insert(user);
     }
 }

        At this time, all writes go only to the master database and are then replicated to the slave.

  • Test reading data
 @Test
 void testQueryUser(){
     List<TbUser> tbUsers = userMapper.selectList(null);
     tbUsers.forEach(System.out::println);
 }

        At this time, all data is read from the slave database.
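The core routing idea can be sketched in a few lines (a deliberately simplified illustration with invented names, not ShardingSphere's actual routing code, which is richer: for example, reads inside a transaction or immediately after a write may be forced to the master, and multiple slaves are load-balanced):

```java
public class ReadWriteRouteSketch {
    // Simplified master-slave routing: queries go to the slave, everything else to the master.
    // "m0" and "s0" are the data source names from the configuration above.
    static String route(String sql) {
        String s = sql.trim().toLowerCase();
        return s.startsWith("select") ? "s0" : "m0";
    }

    public static void main(String[] args) {
        System.out.println(route("SELECT * FROM tb_user"));                  // s0 (slave)
        System.out.println(route("INSERT INTO tb_user (name) VALUES ('a')")); // m0 (master)
    }
}
```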

6. Implementation principle - connection mode

        ShardingSphere uses an automated execution engine to safely and efficiently send the routed and rewritten real SQL to the underlying data sources. It does not simply hand SQL to the data sources through JDBC, nor does it directly put execution requests into a thread pool for concurrent execution. Instead, it balances the cost of creating data source connections against memory usage, while making the most reasonable use of concurrency. The goal of the execution engine is to automatically balance resource control and execution efficiency.

6.1. Connection mode

        From the perspective of resource control, the number of connections a business user consumes to access the database should be limited. This effectively prevents one operation from occupying too many resources, exhausting the database's connection resources, and affecting the normal access of other businesses. In particular, when one database instance holds many shard tables, a logical SQL that does not contain the sharding key generates a large number of real SQLs that fall on different tables of the same database. If each real SQL occupies an independent connection, a single query will undoubtedly occupy too many resources. (connection limited mode)

        From the perspective of execution efficiency, maintaining an independent database connection for each shard query makes more effective use of multi-threading: by opening an independent thread for each connection, the I/O cost can be paid in parallel. Holding an independent connection per shard also avoids loading query results into memory prematurely. The connection can hold a cursor into its result set and move the cursor only when the corresponding data is actually needed. (memory limited mode)

        Merging results by moving result set cursors forward is called streaming merge. It does not need to load all result data into memory, which effectively saves memory resources and reduces garbage collection frequency. When it cannot be guaranteed that each shard query holds an independent database connection, the current query result set must be fully loaded into memory before the connection can be reused to fetch the next shard table's result set. Therefore, even when streaming merge would otherwise be possible, it degenerates into in-memory merge in this scenario.
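The idea of streaming merge can be sketched in plain Java (an illustration only, not ShardingSphere's merge engine; here each shard's sorted result set is modeled as an iterator): a priority queue keeps just one "current row" per shard in memory, instead of materializing every shard's full result list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSketch {
    // Streaming merge of k sorted shard "result sets": only one row per shard
    // is held in memory at a time, analogous to holding one open cursor per connection.
    static List<Integer> streamingMerge(List<Iterator<Integer>> shards) {
        // heap entries are [value, shardIndex], ordered by value
        PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        for (int i = 0; i < shards.size(); i++) {
            if (shards.get(i).hasNext()) heap.add(new int[]{shards.get(i).next(), i});
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            Iterator<Integer> it = shards.get(top[1]);
            if (it.hasNext()) heap.add(new int[]{it.next(), top[1]}); // advance that shard's cursor
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> shards = new ArrayList<>();
        shards.add(Arrays.asList(1, 4, 7).iterator());
        shards.add(Arrays.asList(2, 3, 9).iterator());
        System.out.println(streamingMerge(shards)); // [1, 2, 3, 4, 7, 9]
    }
}
```

In-memory merge, by contrast, would first copy every shard's rows into lists and sort them, which is exactly the extra memory cost the text describes.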

         On the one hand the engine must control and protect database connection resources; on the other hand it should adopt the better merge mode to save the middleware's memory. How to balance the two is the problem the ShardingSphere execution engine needs to solve. Concretely, suppose that after being split by ShardingSphere, one SQL statement needs to operate on 200 tables under a certain database instance. Should we create 200 connections and execute in parallel, or create one connection and execute serially? How should we choose between efficiency and resource control?

         For this scenario, ShardingSphere provides a solution: it introduces the concept of connection mode, divided into memory limited mode (MEMORY_STRICTLY) and connection limited mode (CONNECTION_STRICTLY).

6.1.1. Memory limited mode

        The premise of this mode is that ShardingSphere does not limit the number of database connections consumed by one operation. If the actual executed SQL needs to operate on 200 tables in one database instance, a new connection is created for each table and they are processed concurrently by multiple threads, maximizing execution efficiency. When the SQL allows it, streaming merge is preferred to prevent memory overflow and frequent garbage collection.

6.1.2. Connection limited mode

        The premise of this mode is that ShardingSphere strictly controls the number of database connections consumed by one operation. If the actual executed SQL needs to operate on 200 tables in one database instance, only one database connection is created and the 200 tables are processed serially. If the shards of one operation are scattered across different databases, multi-threading is still used across the databases, but each database still gets only one connection. This prevents a single request from occupying too many database connections. This mode always uses in-memory merge.

6.2. Automated execution engine

        ShardingSphere initially left the choice of mode to user configuration, letting developers pick memory limited mode or connection limited mode according to the actual needs of their business scenario.

        To reduce usage cost and make the connection mode dynamic, ShardingSphere later refined this into an automated execution engine and internalized the connection-mode concept. Users no longer need to know what memory limited mode and connection limited mode are; the execution engine automatically selects the optimal execution plan for the current scenario.

        The automated execution engine refines the selection granularity of the connection mode down to each SQL operation. For every SQL request, the engine performs a real-time calculation and trade-off based on its routing result, and autonomously executes it with the appropriate connection mode, achieving the optimal balance between resource control and execution efficiency. Users only need to configure maxConnectionSizePerQuery, which is the maximum number of connections allowed per database for one query.

        Within the limit allowed by maxConnectionSizePerQuery, when one connection must execute more than one request, the connection cannot keep every result set open at once, so in-memory merge must be used; conversely, when each connection executes exactly one request, the connection can hold its result set open and streaming merge can be used. The connection mode is chosen per physical database: within one query, if the SQL routes to more than one database, the connection mode of each database is not necessarily the same, and the two modes may coexist. (In other words, when the number of SQL statements routed to a database is no greater than maxConnectionSizePerQuery, memory limited mode is used; when it is greater, connection limited mode is used.)
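This decision rule can be sketched in a few lines of plain Java (a simplified illustration of the trade-off; class and method names are invented and this is not ShardingSphere's actual implementation):

```java
public class ConnectionModeSketch {
    // Statements each connection must run: ceil(sqlCount / maxConnectionSizePerQuery), at least 1.
    static int statementsPerConnection(int sqlCount, int maxConnectionSizePerQuery) {
        return Math.max((sqlCount + maxConnectionSizePerQuery - 1) / maxConnectionSizePerQuery, 1);
    }

    // More than one statement per connection -> connections are the scarce resource
    // (connection limited mode, in-memory merge); exactly one -> memory limited mode, streaming merge.
    static String decideMode(int sqlCount, int maxConnectionSizePerQuery) {
        return statementsPerConnection(sqlCount, maxConnectionSizePerQuery) > 1
                ? "CONNECTION_STRICTLY"
                : "MEMORY_STRICTLY";
    }

    public static void main(String[] args) {
        System.out.println(decideMode(200, 1));   // CONNECTION_STRICTLY: 1 connection runs 200 statements
        System.out.println(decideMode(200, 200)); // MEMORY_STRICTLY: 1 statement per connection
    }
}
```

The calculation is done independently for each physical database, which is how the two modes can coexist within one query.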

Origin blog.csdn.net/weixin_53922163/article/details/127701570