MySQL from entry to proficiency [Practice]: A detailed guide to database and table sharding with Sharding-JDBC



0. Preface

Sharding-JDBC is an open-source distributed database middleware. Its main goals are to make full use of the database's horizontal scaling capability, avoid changes to existing application code as far as possible, and reduce the difficulty of developing distributed systems. This article uses Sharding-JDBC to implement database and table sharding and verifies several sharding strategies in practice.

Before reading this article, I recommend reading two earlier articles of mine:
"MySQL from entry to proficiency [Advanced] Detailed explanation of MySQL sub-database and table"
"MySQL from entry to proficiency [Advanced] Detailed explanation of MySQL's read-write separation"

Component versions used in this article

> spring-boot 2.7.15
> jdk 1.8
> mybatis-plus-boot-starter 3.5.3.2 (latest version at the time of writing)
> mysql  8.0
> sharding-jdbc 4.0.1

1. Basic Introduction

Sharding databases and tables is a common database scaling strategy: it improves system performance and scalability by spreading data across multiple databases or tables. Implementing it is not simple, however; it involves a series of complex issues such as data routing, SQL rewriting, and result merging. Fortunately, we do not need to reinvent the wheel, because a number of excellent open-source components already cover the basic needs.
You have probably heard of the best-known one, MyCAT. MyCAT provides features including database and table sharding, read-write separation, and high availability. It is developed in Java and can transparently merge multiple MySQL databases into one logical database. In this article we focus on Sharding-JDBC, developed by the ShardingSphere team in China and later donated to the Apache Software Foundation.

Today, Sharding-JDBC is part of the Apache ShardingSphere project. It is a lightweight framework that provides additional services at Java's JDBC layer: it can turn any database into a distributed database and enhance the original database with capabilities such as data sharding, elastic scaling, and encryption. With Sharding-JDBC we can implement MySQL database and table sharding without modifying business code.

In this article we will look in depth at how to use Sharding-JDBC to implement MySQL database and table sharding. We start with the basic concepts and principles of Sharding-JDBC, then walk through its configuration and usage with practical examples and code. Whether you are a beginner or a developer with some database experience, I believe you will find something useful here.

2. Usage and Configuration

The demo project follows a standard Spring Boot layout, with the code organized into config, controller, service, mapper, model, and base packages; you will see these package names in the code below.

Step 1: Introduce dependencies

Add the Sharding-JDBC dependency to pom.xml.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.7.15</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.icepip</groupId>
	<artifactId>springBoot-icepip-sharding-jdbc</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>springBoot-icepip-sharding-jdbc</name>
	<description>springBoot-icepip-sharding-jdbc</description>
	<properties>
		<java.version>1.8</java.version>
		<mysql.version>8.0.13</mysql.version>
		<sharding-sphere.version>4.0.1</sharding-sphere.version>
		<mybatis-plus-boot-starter.version>3.5.3.2</mybatis-plus-boot-starter.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<dependency>
			<groupId>mysql</groupId>
			<artifactId>mysql-connector-java</artifactId>
			<version>${mysql.version}</version>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.shardingsphere</groupId>
			<artifactId>sharding-jdbc-spring-boot-starter</artifactId>
			<version>${sharding-sphere.version}</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>com.baomidou</groupId>
			<artifactId>mybatis-plus-boot-starter</artifactId>
			<version>${mybatis-plus-boot-starter.version}</version>
		</dependency>
		<dependency>
			<groupId>org.projectlombok</groupId>
			<artifactId>lombok</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-actuator</artifactId>
		</dependency>

	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>

</project>

Step 2: Configure the data sources and sharding strategy

Configure data sources and sharding rules in application.properties.

server.port=8098

management.health.db.enabled=false

spring.shardingsphere.datasource.names=ds0,ds1,ds2
# Set root logging to trace so that SQL parsing, route calculation, SQL rewriting and SQL execution can all be observed in the logs
logging.level.root=trace

spring.shardingsphere.datasource.ds0.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds0.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.ds0.jdbc-url=jdbc:mysql://172.20.6.xx:3306/icepip_demo0?serverTimezone=UTC&useUnicode=true&characterEncoding=utf8&characterSetResults=utf8&useSSL=false&verifyServerCertificate=false&autoReconnect=true&autoReconnectForPools=true&allowMultiQueries=true
spring.shardingsphere.datasource.ds0.username=root
spring.shardingsphere.datasource.ds0.password=password

spring.shardingsphere.datasource.ds1.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds1.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.ds1.jdbc-url=jdbc:mysql://172.20.6.xx:3306/icepip_demo1?serverTimezone=UTC&useUnicode=true&characterEncoding=utf8&characterSetResults=utf8&useSSL=false&verifyServerCertificate=false&autoReconnect=true&autoReconnectForPools=true&allowMultiQueries=true
spring.shardingsphere.datasource.ds1.username=root
spring.shardingsphere.datasource.ds1.password=password

spring.shardingsphere.datasource.ds2.type=com.zaxxer.hikari.HikariDataSource
spring.shardingsphere.datasource.ds2.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.ds2.jdbc-url=jdbc:mysql://172.20.6.xx:3306/icepip_demo2?serverTimezone=UTC&useUnicode=true&characterEncoding=utf8&characterSetResults=utf8&useSSL=false&verifyServerCertificate=false&autoReconnect=true&autoReconnectForPools=true&allowMultiQueries=true
spring.shardingsphere.datasource.ds2.username=root
spring.shardingsphere.datasource.ds2.password=password


# Database sharding strategy: inline strategy with user_id as the sharding column
spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id

# Database sharding expression: user_id modulo 3 gives the suffix appended to "ds" to form the data source name.
# The modulus is 3 because there are three databases here; adjust it to the actual number of sharded databases.
spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=ds$->{user_id % 3}

# Actual data nodes of the t_order logical table, composed of data source name plus physical table name
spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds$->{0..2}.t_order$->{0..1}

# Inline table sharding strategy for t_order: order_id is the sharding column
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=order_id

# Key generator column for t_order
spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id

# Key generator type for t_order: snowflake algorithm
spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE

# Table sharding expression for t_order: order_id modulo 2 gives the suffix appended to "t_order" to form the physical table name
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order$->{order_id % 2}

# Default data source name
spring.shardingsphere.sharding.default-data-source-name=ds0

# Print the actual SQL statements
spring.shardingsphere.props.sql.show=true

# Allow bean definition overriding
spring.main.allow-bean-definition-overriding=true


# mybatis-plus
mybatis-plus.mapper-locations=classpath:/mapper/*.xml
mybatis-plus.configuration.jdbc-type-for-null=null
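
Before starting the application, the physical tables must already exist: Sharding-JDBC only routes SQL, it does not create tables, so each of the three databases needs its own t_order0 and t_order1. Below is a minimal bootstrap sketch (not part of the demo project) that creates them over plain JDBC; the host, credentials and column definitions are illustrative assumptions based on the entity shown later.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreatePhysicalTables {

    public static void main(String[] args) throws Exception {
        String[] databases = {"icepip_demo0", "icepip_demo1", "icepip_demo2"};
        // host and credentials are placeholders; use the same values as in application.properties
        String urlTemplate = "jdbc:mysql://localhost:3306/%s?serverTimezone=UTC&useSSL=false";
        String ddlTemplate = "CREATE TABLE IF NOT EXISTS t_order%d ("
                + " order_id BIGINT PRIMARY KEY,"
                + " order_name VARCHAR(64),"
                + " order_status INT,"
                + " user_id BIGINT)";

        for (String db : databases) {
            try (Connection conn = DriverManager.getConnection(String.format(urlTemplate, db), "root", "password");
                 Statement stmt = conn.createStatement()) {
                // two physical tables per database: t_order0 and t_order1
                for (int i = 0; i <= 1; i++) {
                    stmt.executeUpdate(String.format(ddlTemplate, i));
                }
            }
        }
    }
}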



Step 3: Core code

Database operations are performed as usual in the project (here via MyBatis-Plus; JdbcTemplate or JPA would work just as well), and Sharding-JDBC takes care of the sharding automatically.

MybatisPlusConfig core configuration

package com.icepip.shardingjdbc.config;

import com.baomidou.mybatisplus.extension.plugins.MybatisPlusInterceptor;
import com.baomidou.mybatisplus.extension.plugins.inner.BlockAttackInnerInterceptor;
import com.baomidou.mybatisplus.extension.plugins.inner.OptimisticLockerInnerInterceptor;
import com.baomidou.mybatisplus.extension.plugins.inner.PaginationInnerInterceptor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.actuate.jdbc.DataSourceHealthIndicator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.sql.DataSource;


/**
 * Global MyBatis-Plus interceptor configuration.
 * Interceptors with different purposes can be switched on or off via configuration properties.
 *
 * @author 冰点
 * @date 2022-06-23 2:13 PM
 */
@Configuration
@Slf4j
public class MybatisPlusConfig {

    @Value("${icepip.mybatis-plus-ext.db-key.enable:false}")
    private boolean dbKeyEnable;

    @Value("${icepip.datasource.dynamic.primary:ds0}")
    private String primaryCode;

    @Value("${icepip.mybatis-plus-ext.db-key:ds0}")
    private String dbKey;

    @Value("${icepip.mybatis-plus-ext.pagination-interceptor.enable:true}")
    private boolean paginationInterceptorEnable;

    @Value("${icepip.mybatis-plus-ext.optimistic-Locker.enable:true}")
    private boolean optimisticLockerEnable;

    @Value("${icepip.mybatis-plus-ext.sql-attack-interceptor.enable:false}")
    private boolean sqlAttackInterceptorEnable;

    @Value("${icepip.mybatis-plus-ext.tenant-interceptor.enable:true}")
    private boolean tenantEnable;

    @Bean
    public DataSourceHealthIndicator dataSourceHealthIndicator(DataSource dataSource) {
        return new DataSourceHealthIndicator(dataSource, "select 1");
    }

    @Bean
    public MybatisPlusInterceptor mybatisPlusInterceptor() {
        MybatisPlusInterceptor interceptor = new MybatisPlusInterceptor();
        if (optimisticLockerEnable) {
            interceptor.addInnerInterceptor(new OptimisticLockerInnerInterceptor());
            log.info("Optimistic locking enabled [a record should not have been changed by someone else when we update it]; the field annotated with @Version is the lock field and must have a default or initial value");
        }
        if (sqlAttackInterceptorEnable) {
            interceptor.addInnerInterceptor(new BlockAttackInnerInterceptor());
            log.info("Dangerous-SQL blocking enabled [raw SQL execution is allowed, so guard against accidental full-table operations]; see BlockAttackInnerInterceptor for the blocking rules");
        }
        if (tenantEnable) {
            // simplified here
            log.info("Multi-tenant mode enabled; the project id is used as the tenant id by default");
        }
        if (paginationInterceptorEnable) {
            PaginationInnerInterceptor innerInterceptor = new PaginationInnerInterceptor();
            innerInterceptor.setOptimizeJoin(false);
            interceptor.addInnerInterceptor(innerInterceptor);
            log.info("Pagination plugin enabled: if the return type is IPage, the IPage argument must not be null (the returned IPage is the same object as the argument); if the return type is List, the IPage argument may be null (null means no pagination), but then you must call IPage.setRecords(the returned List) yourself");
        }

        return interceptor;
    }

}

OrderService

package com.icepip.shardingjdbc.service;


import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
import com.baomidou.mybatisplus.extension.service.IService;
import com.icepip.shardingjdbc.model.OrderInfo;

import java.io.Serializable;

public interface OrderService extends IService<OrderInfo> {

    @Override
    boolean save(OrderInfo entity);

    @Override
    boolean removeById(Serializable id);

    @Override
    boolean updateById(OrderInfo entity);

    IPage<OrderInfo> page(Page<?> page, OrderInfo orderInfo);

}

OrderServiceImpl

package com.icepip.shardingjdbc.service;


import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.core.metadata.OrderItem;
import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;
import com.icepip.shardingjdbc.mapper.OrderMapper;
import com.icepip.shardingjdbc.model.OrderInfo;
import org.springframework.stereotype.Service;

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

@Service
public class OrderServiceImpl extends ServiceImpl<OrderMapper, OrderInfo> implements OrderService {

    @Override
    public boolean save(OrderInfo entity) {
        return super.save(entity);
    }

    @Override
    public boolean removeById(Serializable id) {
        return super.removeById(id);
    }

    @Override
    public boolean updateById(OrderInfo entity) {
        return super.updateById(entity);
    }

    @Override
    public IPage<OrderInfo> page(Page<?> page, OrderInfo orderInfo) {
        Map<String, Object> map = new HashMap<>();
        map.put("userId", orderInfo.getUserId());
        map.put("orderId", orderInfo.getOrderId());
        map.put("orderName", orderInfo.getOrderName());
        map.put("orderStatus", orderInfo.getOrderStatus());
        page.addOrder(new OrderItem("order_id", false));
        return super.baseMapper.selectAllByCondition(page, map);
    }
}

OrderInfo

package com.icepip.shardingjdbc.model;


import com.baomidou.mybatisplus.annotation.TableField;
import com.baomidou.mybatisplus.annotation.TableId;
import com.baomidou.mybatisplus.annotation.TableName;
import lombok.Data;

@Data
@TableName("t_order")
public class OrderInfo {

    @TableId(value = "order_id")
    private Long orderId;

    @TableField(value = "order_name")
    private String orderName;

    @TableField(value = "order_status")
    private Integer orderStatus;

    @TableField(value = "user_id")
    private Long userId;

}

OrderMapper

package com.icepip.shardingjdbc.mapper;


import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
import com.icepip.shardingjdbc.base.BaseMapper;
import com.icepip.shardingjdbc.model.OrderInfo;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;

import java.util.Map;

@Mapper
public interface OrderMapper extends BaseMapper<OrderInfo> {

    IPage<OrderInfo> selectAllByCondition(Page<?> page, @Param("condition") Map<String, Object> condition);

    int deleteById(@Param("condition")Map<String, Object> condition);

}

OrderController

package com.icepip.shardingjdbc.controller;


import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
import com.icepip.shardingjdbc.model.OrderInfo;
import com.icepip.shardingjdbc.service.OrderService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/v1/order")
public class OrderController {

    @Autowired
    private OrderService orderService;

    @PostMapping("add")
    public boolean save(@RequestBody OrderInfo orderInfo) {
        return orderService.save(orderInfo);
    }

    @DeleteMapping("/{orderId}")
    public boolean deleteById(@PathVariable("orderId") Long id) {
        return orderService.removeById(id);
    }

    @PutMapping("/{orderId}")
    public boolean updateById(@PathVariable("orderId") Long orderId, @RequestBody OrderInfo orderInfo) {
        orderInfo.setOrderId(orderId);
        return orderService.updateById(orderInfo);
    }

    @GetMapping("/page")
    public IPage page(
            @RequestParam(name = "pageNum", required = false, defaultValue = "1") Integer pageNum,
            @RequestParam(name = "pageSize", required = false, defaultValue = "10") Integer pageSize,
            @RequestParam(name = "orderId", required = false) Long orderId,
            @RequestParam(name = "orderStatus", required = false) Integer orderStatus,
            @RequestParam(name = "orderName", required = false) String orderName) {
        Page<OrderInfo> page = new Page<>(pageNum, pageSize);
        OrderInfo orderInfo = new OrderInfo();
        orderInfo.setOrderId(orderId);
        orderInfo.setOrderName(orderName);
        orderInfo.setOrderStatus(orderStatus);
        return orderService.page(page, orderInfo);
    }
}

BaseMapper

package com.icepip.shardingjdbc.base;

import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
import org.apache.ibatis.annotations.Param;

import java.util.Map;

public interface BaseMapper<T> extends com.baomidou.mybatisplus.core.mapper.BaseMapper<T> {

    IPage<T> selectAllByCondition(Page<?> page, @Param("condition")Map<String, Object> condition);

    int deleteById(@Param("condition")Map<String, Object> condition);

}

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd" >
<mapper namespace="com.icepip.shardingjdbc.mapper.OrderMapper">

    <sql id="Base_Column_List">
        order_id, order_name, order_status, user_id
    </sql>

    <select id="selectAllByCondition" resultType="com.icepip.shardingjdbc.model.OrderInfo">
        select
        <include refid="Base_Column_List"/>
        from
        t_order
        <where>
            <if test="condition.userId != null">
                and user_id = #{condition.userId, jdbcType=BIGINT}
            </if>
            <if test="condition.orderId != null">
                and order_id = #{condition.orderId, jdbcType=BIGINT}
            </if>
            <if test="condition.orderName != null and condition.orderName != ''">
                and order_name like concat('%',#{condition.orderName,jdbcType=VARCHAR} ,'%')
            </if>
            <if test="condition.orderStatus != null">
                and order_status = #{condition.orderStatus, jdbcType=INTEGER}
            </if>
        </where>
    </select>

    <delete id="deleteById">
        delete from t_order where order_id = #{condition.orderId, jdbcType=BIGINT} and user_id = #{condition.userId, jdbcType=BIGINT}
    </delete>

</mapper>

3. Data Sharding Configuration

Sharding-JDBC provides powerful data sharding capabilities that spread data across multiple databases or tables according to business rules, improving the system's data processing capacity.

  • Sharding strategy: for example, modulo sharding based on the value of a field.
  • Sharding algorithm: for example, the built-in modulo-based and range-based sharding algorithms.

The data sharding function of Sharding-JDBC makes the management and query of massive data more efficient. By distributing data into multiple databases or tables, system performance degradation caused by excessive data volume can be avoided, and the query and processing speed of the system can be improved.
For example, if the business data is mainly queried based on the user ID, the user ID can be selected as the sharding key, and then the modulo operation is performed based on the value of the user ID to evenly disperse the data into different databases or tables.

In my demo project you can see the corresponding configuration: the database sharding strategy is the inline strategy with user_id as the sharding column, and the algorithm expression takes user_id modulo 3 to get the database suffix, which is appended to "ds" to form the data source name. The modulus is 3 because I use three databases; configure it according to the actual number of sharded databases.

3.1 Table Sharding Strategy

Sharding-JDBC supports various sharding strategies, such as single key sharding, composite key sharding, etc. Developers can choose the appropriate sharding strategy according to business needs.

Here is an analysis of the t_order sharding configuration shown above:

  1. spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds$->{0..2}.t_order$->{0..1}
    Sets the logical table and the actual data nodes for data sharding. The logical table is t_order; the actual data nodes are ds0.t_order0, ds0.t_order1, ds1.t_order0, ds1.t_order1, ds2.t_order0 and ds2.t_order1, where ds denotes the data source.

  2. spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=order_id
    Sets the sharding column: order_id is used as the basis for table sharding.

  3. spring.shardingsphere.sharding.tables.t_order.key-generator.column=order_id and spring.shardingsphere.sharding.tables.t_order.key-generator.type=SNOWFLAKE
    Set the primary key generation strategy: the order_id primary key is generated by the snowflake algorithm (SNOWFLAKE).

  4. spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order$->{order_id % 2}
    Sets the algorithm expression of the table sharding strategy: the value of order_id modulo 2 is the sharding suffix, which is appended to t_order to form the physical table name.
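
To make the routing concrete, the following standalone sketch reproduces the arithmetic implied by the two inline expressions above: user_id % 3 selects the database and order_id % 2 selects the table. It is only an illustration of the expressions, not Sharding-JDBC's internal code, and the sample values are made up.

public class RouteCalculator {

    // mirrors ds$->{user_id % 3} and t_order$->{order_id % 2} from the configuration above
    static String route(long userId, long orderId) {
        String dataSource = "ds" + (userId % 3);   // database selected by user_id
        String table = "t_order" + (orderId % 2);  // physical table selected by order_id
        return dataSource + "." + table;
    }

    public static void main(String[] args) {
        System.out.println(route(7L, 1001L)); // ds1.t_order1
        System.out.println(route(9L, 2000L)); // ds0.t_order0
    }
}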

3.2 Sharding Algorithms

Sharding-JDBC has a variety of built-in sharding algorithms, such as modulo-based, range-based, and hash-based algorithms. The modulo algorithm is simple and commonly used and suits large, relatively evenly distributed data sets; the range algorithm suits data that is unevenly distributed on the sharding key and shards by specifying ranges of the key; the hash algorithm suits particularly large data volumes and spreads data evenly across databases or tables by hashing the sharding key.
ShardingSphere also provides several built-in sharding strategies:

  1. Standard sharding strategy (StandardShardingStrategy): applicable when the sharding key values follow clear single-column sharding logic, for example: even numbers go to one shard, odd numbers go to another.

  2. Complex sharding strategy (ComplexShardingStrategy): suitable for scenarios where complex relationships exist between multiple sharding keys.

  3. Hint sharding strategy (HintShardingStrategy): applicable when the sharding value cannot be obtained from the SQL itself and must be supplied by the application through the Hint API, as shown in the sketch below.

  4. Class-based sharding strategy (ClassBasedShardingStrategy): applicable when the sharding key values need to be sharded by a custom algorithm class.

  5. Range sharding strategy (RangeShardingStrategy): suitable for scenarios where the sharding key values form ranges.
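
Of these, the Hint strategy is the least intuitive because the sharding value never appears in the SQL. Below is a minimal usage sketch, assuming the 4.x Hint API (org.apache.shardingsphere.api.hint.HintManager) and a t_order table configured with a hint sharding algorithm; treat it as an illustration rather than a drop-in snippet for this demo.

import org.apache.shardingsphere.api.hint.HintManager;

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.Statement;

public class HintRoutingExample {

    public void queryWithHint(DataSource shardingDataSource) throws Exception {
        // HintManager is bound to the current thread; try-with-resources clears it afterwards
        try (HintManager hintManager = HintManager.getInstance();
             Connection conn = shardingDataSource.getConnection();
             Statement stmt = conn.createStatement()) {
            hintManager.addDatabaseShardingValue("t_order", 1L); // value fed to the database hint algorithm
            hintManager.addTableShardingValue("t_order", 0L);    // value fed to the table hint algorithm
            stmt.executeQuery("SELECT * FROM t_order");
        }
    }
}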

A range sharding strategy could also meet the requirement above. We define a range sharding algorithm called range_order_id, which divides the value of the order_id field by 1024 and uses the result to determine the shard.

sharding.jdbc.config.sharding.tables.t_order.table-strategy.range.sharding-column=order_id
sharding.jdbc.config.sharding.tables.t_order.table-strategy.range.sharding-algorithm-name=range_order_id
sharding.jdbc.config.sharding.sharding-algorithms.range_order_id.type=RANGE
sharding.jdbc.config.sharding.sharding-algorithms.range_order_id.props.algorithm-expression=ds${order_id / 1024}
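
The exact property names for registering a range algorithm differ between ShardingSphere versions, so the snippet above should be checked against the version you use. The same idea can also be expressed programmatically as a standard-strategy range algorithm; here is a minimal sketch under the 4.x API (RangeShardingAlgorithm / RangeShardingValue), where the "divide by 1024" bucketing and the suffix matching are assumptions mirroring the example.

import com.google.common.collect.Range;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingValue;

import java.util.Collection;
import java.util.LinkedHashSet;

public class OrderIdRangeShardingAlgorithm implements RangeShardingAlgorithm<Long> {

    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames,
                                         RangeShardingValue<Long> shardingValue) {
        Range<Long> range = shardingValue.getValueRange();
        if (!range.hasUpperBound()) {
            // unbounded range condition: every shard may contain matching rows
            return availableTargetNames;
        }
        long lowerBucket = range.hasLowerBound() ? range.lowerEndpoint() / 1024 : 0L;
        long upperBucket = range.upperEndpoint() / 1024;

        Collection<String> result = new LinkedHashSet<>();
        for (long bucket = lowerBucket; bucket <= upperBucket; bucket++) {
            for (String target : availableTargetNames) {
                // pick every target whose suffix falls inside the order_id / 1024 bucket range
                if (target.endsWith(String.valueOf(bucket))) {
                    result.add(target);
                }
            }
        }
        return result;
    }
}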

3.3 Custom Sharding Algorithm

3.3.1 Implementing a custom algorithm class

Suppose we want to shard based on the creation time of an order, choosing to shard by year so that each year's data is stored in its own shard. For this we can create a custom sharding algorithm (the same effect can also be achieved with an expression, as shown in 3.3.2).

package com.icepip.shardingjdbc.config;

import org.apache.shardingsphere.api.sharding.standard.PreciseShardingValue;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingAlgorithm;

import java.util.Calendar;
import java.util.Collection;
import java.util.Date;

/**
 * Custom sharding algorithm
 * @author 冰点
 * @version 1.0.0
 * @date 2023/9/5 14:52
 */
public class OrderTableShardingAlgorithm implements PreciseShardingAlgorithm<Date> {

    @Override
    public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Date> shardingValue) {
        Date orderCreateTime = shardingValue.getValue();
        int year = getYear(orderCreateTime);

        for (String each : availableTargetNames) {
            // pick the target whose suffix matches the order's creation year
            if (each.endsWith(year + "")) {
                return each;
            }
        }

        throw new IllegalArgumentException();
    }

    private int getYear(Date date) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(date);
        return calendar.get(Calendar.YEAR);
    }
}
The algorithm class is then registered in the sharding configuration (YAML form shown here):

sharding:
  tables:
    t_order:
      actual-data-nodes: ds${0..1}.t_order_${0..9}
      table-strategy:
        standard:
          sharding-column: create_time
          precise-algorithm-class-name: com.yourpackage.OrderTableShardingAlgorithm

Here t_order is the logical table name, ds${0..1}.t_order_${0..9} describes the actual data nodes, create_time is the sharding column, and com.yourpackage.OrderTableShardingAlgorithm is your custom sharding algorithm class.

3.3.2 Custom expression implementation

The same effect can be achieved without a custom algorithm class by letting a configured inline expression do the work. This relies on the expression engine being able to evaluate the date conversion on the sharding value; if it cannot, errors may occur and you still need to write a custom sharding algorithm.

The idea is the same as in the algorithm class above: take the order's creation time, extract the year, and match the year against the suffixes of the available target tables to choose the target table.

 
sharding.jdbc.config.sharding.tables.t_order.actual-data-nodes=ds${0..1}.t_order_${2018..2023}
sharding.jdbc.config.sharding.tables.t_order.table-strategy.standard.sharding-column=create_time
sharding.jdbc.config.sharding.tables.t_order.table-strategy.standard.sharding-algorithm-name=orderTableShardingAlgorithm
sharding.jdbc.config.sharding.sharding-algorithms.orderTableShardingAlgorithm.type=INLINE
sharding.jdbc.config.sharding.sharding-algorithms.orderTableShardingAlgorithm.props.algorithm-expression=t_order_${create_time.toInstant().atZone(ZoneId.systemDefault()).toLocalDate().getYear()}


sharding.jdbc.config.sharding.tables.t_order.table-strategy.standard.sharding-algorithm-name defines the name of the sharding algorithm.
sharding.jdbc.config.sharding.sharding-algorithms.orderTableShardingAlgorithm.type defines the type of the sharding algorithm, and sharding.jdbc.config.sharding.sharding-algorithms.orderTableShardingAlgorithm.props.algorithm-expression defines the sharding expression.
When you insert a new order, ShardingSphere will select a shard for storage according to the creation time of the order.

4. Data Routing

When processing business requests, Sharding-JDBC can automatically route to the corresponding database or table according to the sharding rules, without developers needing to pay attention to the details of the underlying data storage.
The process of data routing is mainly divided into four steps: SQL parsing, routing calculation, SQL rewriting and SQL execution.

  1. SQL parsing: Sharding-JDBC first parses the SQL statement, obtains the abstract syntax tree (AST), and extracts the routing conditions.

  2. Routing calculation: By using the sharding key and sharding algorithm, combined with the parsed routing conditions, calculate the target database and table that the SQL statement needs to access.

  3. SQL rewriting: According to the routing result, the original SQL statement is rewritten and the logical table names are replaced with the real physical table names.

  4. SQL execution: Finally, send the rewritten SQL statement to the target database for execution.

These four steps are completely automatically completed by Sharding-JDBC. Developers only need to configure the sharding rules to achieve horizontal data segmentation and distributed access, which greatly simplifies the development work.

We can see the SQL rewriting in the logs. I deliberately added an ORDER BY clause to observe how sharding-jdbc rewrites the statement, and found that it executes the SQL against every table in every database and then sorts the results in memory.


The specific process is as follows:

  1. First, Sharding-JDBC will parse the SQL to extract key information such as table names, column names, conditions, sorting fields, etc.

  2. Sharding-JDBC will perform SQL routing. According to the sharding strategy, it will calculate the specific database and data table that the SQL statement needs to execute.

  3. Then, Sharding-JDBC will rewrite the original SQL statement. In this step, it replaces the logical table name with the actual physical table name, and splits the query condition into multiple subqueries based on the shard key. At the same time, it will also add the corresponding sorting and pagination conditions in each subquery.

  4. Parallel query: The rewritten SQL statement will be sent to the corresponding database for execution in parallel. Each database returns a result set.

  5. Merge results: In memory, Sharding-JDBC will merge result sets from different databases. First, it sorts all results by the sort field. Then, according to the paging conditions, it will take out the required part of the data and return it to the client.

For example, take the raw SQL:

SELECT * FROM t_order ORDER BY order_id DESC LIMIT 10 OFFSET 20

If the t_order table is sharded by the order_id field into two physical tables, t_order_0 and t_order_1 (stored in two databases), the rewritten SQL may be:

SELECT * FROM t_order_0 ORDER BY order_id DESC LIMIT 30
SELECT * FROM t_order_1 ORDER BY order_id DESC LIMIT 30

These two SQL statements are executed in parallel in their respective databases; Sharding-JDBC then merges the two result sets, sorts them by order_id, and returns the 21st to 30th records to the client.

Have you noticed that in the example above the original query asks for the 21st to 30th records after sorting, i.e. only 10 records, yet the rewritten SQL uses LIMIT 30 in each subquery? Why?

This is because with sharded databases and tables the data is scattered across different databases and tables; when we need to sort globally and fetch a page, no single shard knows which of its rows fall into the requested page.

For example, we want to fetch the 21st to 30th records after global sorting. If we applied LIMIT 10 OFFSET 20 directly to each subquery, records that rank 21st to 30th globally might sit within the first 20 rows of a single shard and would be skipped there, so the merged result would be wrong or contain fewer than 10 rows. To avoid this, each subquery has to return all of its first offset + limit rows (here the first 30), so that the global merge has every candidate row available.

This is only a simplified illustration; Sharding-JDBC is more sophisticated and intelligent when it actually handles this situation.
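
The merge step itself is easy to picture with a toy sketch: each shard returns its own first offset + limit rows already sorted, and the client side re-sorts them and applies the original offset and limit. The numbers below are made up and scaled down purely for illustration.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class LimitOffsetMergeSketch {

    public static void main(String[] args) {
        // order_id values already sorted DESC inside each shard
        List<Long> fromTable0 = Arrays.asList(98L, 94L, 90L, 86L, 82L, 78L);
        List<Long> fromTable1 = Arrays.asList(99L, 95L, 91L, 87L, 83L, 79L);

        List<Long> merged = new ArrayList<>();
        merged.addAll(fromTable0);
        merged.addAll(fromTable1);

        // re-sort globally, then apply the original OFFSET 2 LIMIT 3 (scaled-down paging values)
        List<Long> pageRows = merged.stream()
                .sorted(Comparator.reverseOrder())
                .skip(2)
                .limit(3)
                .collect(Collectors.toList());

        System.out.println(pageRows); // [95, 94, 91]
    }
}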

How Sharding-JDBC usually handles complex queries

1. Paging query

As in the example above, paging and sorted queries are relatively complicated in a sharded environment. Because the data is scattered across different databases and tables, the LIMIT and OFFSET keywords cannot be applied directly. Sharding-JDBC's approach is to rewrite the LIMIT and OFFSET values of each subquery (fetching the first offset + limit rows from every shard) and then merge and sort all the results in memory.

2. Aggregation query

Similarly, for aggregation functions such as COUNT(), SUM() and AVG(), Sharding-JDBC performs the aggregation in each subquery and then performs a secondary aggregation on all results in memory. For example, for COUNT() it adds up the results of all subqueries; for AVG() it adds up the SUM() results of all subqueries and divides by the sum of their COUNT() results.
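
As a quick arithmetic sketch of that secondary aggregation: the global average comes from the per-shard SUM and COUNT values, not from averaging the per-shard averages. The numbers below are invented for illustration.

public class DistributedAvgSketch {

    public static void main(String[] args) {
        long[] shardSums = {1200L, 300L, 900L}; // SUM(amount) returned by each shard
        long[] shardCounts = {4L, 1L, 3L};      // COUNT(*) returned by each shard

        long totalSum = 0L;
        long totalCount = 0L;
        for (int i = 0; i < shardSums.length; i++) {
            totalSum += shardSums[i];
            totalCount += shardCounts[i];
        }

        // secondary aggregation: AVG = sum of SUMs / sum of COUNTs = 2400 / 8 = 300.0
        double globalAvg = (double) totalSum / totalCount;
        System.out.println(globalAvg);
    }
}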

3. Subqueries and associated queries

If the SQL involves subqueries or joins (JOIN, IN, etc.), things get more complicated, but Sharding-JDBC can still handle them correctly. It gradually breaks complex SQL containing subqueries or joins down into single-table queries, executes them one by one, and finally merges all the results in memory. Let's use a simple example to understand this.
Suppose an e-commerce system has two sharded tables, users (user table) and orders (order table), both sharded by user ID (user_id).

For example, the following SQL statement contains a subquery. The requirement is to find all users whose total order amount is greater than 1000. Sharding-JDBC processes it as follows:

Raw SQL

SELECT * FROM users WHERE user_id IN (SELECT user_id FROM orders WHERE total_amount > 1000);

  1. Sharding-JDBC parses the SQL, identifies the main query and the subquery, and routes them separately. Because the orders table is sharded by user_id, the subquery may run against multiple tables; the users table in the main query is also sharded by user_id, so its routing result may involve multiple tables as well.

  2. Sharding-JDBC rewrites and executes the main query and the subquery one by one. For the subquery, it runs the query against every orders table to find all users whose total order amount is greater than 1000.
    The resulting user_id values are then used in the main query, which is executed against every rewritten users table to find the corresponding user records.

  3. Result merging: Sharding-JDBC combines the query results from the different tables into a unified result set and returns it.

For this reason, we should try to avoid subqueries and join queries during development.
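
For intuition, here is a simplified sketch of that two-phase execution over plain JDBC, assuming direct access to every physical shard; Sharding-JDBC performs the equivalent work internally, so this is an illustration of the idea, not its actual implementation.

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class SubqueryDecompositionSketch {

    // phase 1: run the subquery against every shard that holds the orders table
    public List<Long> findUserIdsWithBigOrders(List<DataSource> orderShards) throws Exception {
        List<Long> userIds = new ArrayList<>();
        for (DataSource shard : orderShards) {
            try (Connection conn = shard.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT user_id FROM orders WHERE total_amount > ?")) {
                ps.setLong(1, 1000L);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        userIds.add(rs.getLong("user_id"));
                    }
                }
            }
        }
        // phase 2 (not shown): run "SELECT * FROM users WHERE user_id IN (...)" against the
        // users shards with these ids, then merge the returned rows in memory
        return userIds.stream().distinct().collect(Collectors.toList());
    }
}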

4. Cross-library transactions

In a sharded environment, implementing transactions that span multiple databases is a challenging task. The usual approach is to use some kind of distributed transaction protocol, such as XA or TCC. Sharding-JDBC provides distributed transaction support, including XA transactions and integration with the open-source distributed transaction framework Seata; how to integrate them will be covered later.

5. Database connection pool

Sharding-JDBC supports many popular database connection pools, such as HikariCP and Druid, and a separate connection pool can be configured for each physical database.
My example uses the HikariCP connection pool, so I will not go into it in detail here. If you want to use Druid instead, you can configure it as follows.

Using the Druid connection pool

spring.shardingsphere.datasource.names=ds0,ds1

spring.shardingsphere.datasource.ds0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds0.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.ds0.url=jdbc:mysql://localhost:3306/ds0
spring.shardingsphere.datasource.ds0.username=root
spring.shardingsphere.datasource.ds0.password=123456

spring.shardingsphere.datasource.ds1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.ds1.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.ds1.url=jdbc:mysql://localhost:3306/ds1
spring.shardingsphere.datasource.ds1.username=root
spring.shardingsphere.datasource.ds1.password=123456

This assumes the Druid Spring Boot starter (druid-spring-boot-starter) is on the project's classpath. Additional Druid properties, such as the initial connection count, the minimum number of idle connections, and the maximum number of active connections, can also be configured under the corresponding data source prefix, e.g. spring.shardingsphere.datasource.ds0.*.

6. SQL execution engine

Sharding-JDBC parses, rewrites, and executes SQL statements and supports most SQL syntax, including complex SQL such as join queries, subqueries, and paging queries. This has already been explained in detail in the data routing section, so it is not repeated here; just keep in mind that the SQL execution engine is the core of Sharding-JDBC and what its main functions are.

7. Reference documents

This article refers to the following documents

  1. Sharding-JDBC GitHub: https://github.com/apache/shardingsphere

  2. Sharding-JDBC official documents include the installation, configuration and usage of Sharding-JDBC, as well as some advanced topics, such as sharding strategy, SQL support, etc.

  3. The Sharding-JDBC API documentation is a very useful reference resource for developers.

  4. Sharding-JDBC example project: https://github.com/apache/shardingsphere-example

