[SpringBoot Advanced] SpringBoot integrates Sharding-JDBC sub-database and sub-table

Apache ShardingSphere
Sub-library and sub-table
Sharding-JDBC
Summary of sub-database and sub-table

Apache ShardingSphere

Apache ShardingSphere (Incubator) is an ecosystem composed of an open source distributed database middleware solution. It consists of three independent but capable of mixed deployment Composition of products used together. They all provide standardized data sharding, distributed transactions, and database governance functions, and can be applied to various application scenarios such as Java isomorphism, heterogeneous languages, and cloud native.

ShardingSphere is positioned as a relational database middleware, which aims to fully and reasonably utilize the computing and storage capabilities of relational databases in distributed scenarios, rather than implementing a brand new relational database. It captures the essence of things by focusing on the unchanged. Relational databases still occupy a huge market today and are the cornerstone of the core business of various companies, and they will be difficult to shake in the future. At this stage, we pay more attention to the increase on the original basis rather than subversion. The official Apache release starts with version 4.0.0.

Sub-library and sub-table

The amount of data in the database is not necessarily controllable. Without sub-database sub-tables, with the development of time and business, there will be more and more tables in the database, and the amount of data in the tables will also increase. The larger the value, the correspondingly, the overhead of data operation, addition, deletion, modification and query will also increase; in addition, because distributed deployment cannot be performed, the resources (CPU, disk, memory, IO, etc.) of a server are limited In the end, the data volume and data processing capacity that the database can carry will encounter bottlenecks.

Sub-database sub-table is to solve the problem of database performance degradation due to excessive data volume. The original independent database is split into several databases, and the large data table is split into several data tables, so that a single database and a single data The amount of data in the table becomes smaller, so as to achieve the purpose of improving the performance of the database.

The way of sub-database sub-table

The segmentation of the database refers to the distribution of the data we store in the same database into multiple databases (hosts) through certain specific conditions, so as to achieve the effect of spreading the load of a single device, that is, sub-database sub-table .

Data segmentation can be divided into vertical segmentation and horizontal segmentation according to the type of segmentation rules

Vertical segmentation: Split a single table into multiple tables and distribute them to different databases (hosts)
Horizontal segmentation: According to the logical relationship of the data in the table, the data in the table is split into multiple databases according to certain conditions

vertical split

A database consists of multiple tables, and each table corresponds to a different business. Vertical segmentation is to classify the tables according to the business and distribute them to different databases, so that the data is distributed to different databases (special databases) dedicated)

vertical table

Operate a table in the database, save part of the field data in this table into a new table, and then save the other part of the field data in this table into another table

insert image description here

Vertical sub-library

Divide a single database according to business, special database table

insert image description here

The advantages of vertical splitting are as follows:

After the split, the business is clear, and it is easy to integrate or expand between systems.
Prize tables are placed on different machines according to cost, application level, application type, etc., which is easy to manage and data maintenance is simple.

The disadvantages of vertical splitting are as follows:

Some business tables cannot be associated (Join), and can only be solved through interfaces, which increases the complexity of the system.
Due to the different limitations of each business, there is a single-database performance bottleneck, which makes it difficult to expand data and improve performance.
Transaction processing becomes complex.

split horizontally

Compared with vertical segmentation, horizontal segmentation does not classify tables, but disperses them into multiple libraries according to a certain rule of a certain field. Each table contains a part of data, and all tables add up to a full amount. data.

To put it simply, we can understand the horizontal splitting of data as splitting according to data rows, that is, splitting some rows in a table into one database table, and splitting other rows into other database tables.

Horizontal sub-library

insert image description here

level table

insert image description here

Advantages of splitting horizontally:

The data in a single database and single table is kept at a certain level, which helps to improve performance.
The structure of the split tables is the same, and the application layer requires less modification, and only needs to add routing rules.
Improved system stability and load capacity.

The disadvantages of horizontal splitting are as follows:

After splitting, the data is scattered, it is difficult to use the join operation of the database, and the performance of cross-database join is poor.
The consistency of fragmented transactions is difficult to solve, and the difficulty and maintenance of data expansion are extremely large.

Problems caused by sub-database and sub-table

There is a problem of cross-node Join.
There are problems of cross-node merge sorting and paging.
There is a problem with multiple data source management

Sub-database and sub-table middleware

At present, there are many middleware for sub-database and sub-table in domestic use, mainly including:

Apache ShardingSphere
Mycat

Sharding-JDBC

Sharding-JDBC is an open source distributed database middleware developed by Dangdang. Sharding-JDBC has been included in Sharding-Sphere since version 3.0, and then the project entered the Apache incubator, and versions after version 4.0 are Apache versions. maven coordinates

<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
    <version>4.1.1</version>
</dependency>

Sharding-JDBC is the first product of ShardingSphere and the predecessor of ShardingSphere. It is positioned as a lightweight Java framework that provides additional services on top of Java's JDBC layer. It uses the client to directly connect to the database and provides services in the form of jar packages without additional deployment and dependencies. It can be understood as an enhanced version of the JDBC driver and is fully compatible with JDBC and various ORM frameworks.

insert image description here

The core function of Sharding-JDBC is data sharding and separation of reading and writing. Through Sharding-JDBC, applications can transparently use jdbc to access multiple data sources that have been divided into databases and tables, and separated from reading and writing, regardless of the number of data sources and How the data is distributed.

Applicable to any JDBC-based ORM framework, such as: JPA, Hibernate, Mybatis, Spring JDBC Template or use JDBC directly.
Support any third-party database connection pool, such as: DBCP, C3P0, BoneCP, Druid, HikariCP, etc.
Any database that implements the JDBC specification is supported. Currently supports MySQL, Oracle, SQLServer, PostgreSQL and any database that follows the SQL92 standard.

insert image description here

sharding-jdbc implements horizontal table sharding

db script

db: order_db_1
table: t_order_1、t_order_2

CREATE TABLE `t_order_1` (
  `id` int NOT NULL,
  `order_type` int DEFAULT NULL,
  `customer_id` int DEFAULT NULL,
  `amount` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

insert image description here

pom

<dependencies>
   <dependency>
       <groupId>org.springframework.boot</groupId>
       <artifactId>spring-boot-starter-jdbc</artifactId>
   </dependency>

   <dependency>
       <groupId>mysql</groupId>
       <artifactId>mysql-connector-java</artifactId>
   </dependency>
   <dependency>
       <groupId>com.alibaba</groupId>
       <artifactId>druid</artifactId>
       <version>1.1.23</version>
   </dependency>
   <dependency>
       <groupId>org.apache.shardingsphere</groupId>
       <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
       <version>4.1.1</version>
   </dependency>

   <dependency>
       <groupId>org.projectlombok</groupId>
       <artifactId>lombok</artifactId>
   </dependency>

   <dependency>
       <groupId>org.springframework.boot</groupId>
       <artifactId>spring-boot-starter-test</artifactId>
   </dependency>
</dependencies>

<build>
   <plugins>
       <plugin>
           <groupId>org.springframework.boot</groupId>
           <artifactId>spring-boot-maven-plugin</artifactId>
       </plugin>
   </plugins>
</build>

Order

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Order {
    
    
    private Integer id;
    private Integer orderType;
    private Integer customerId;
    private Double amount;
}

application.yml

spring:
  shardingsphere:
    datasource:
      # 配置数据源的名称
      names: ds1
      ds1:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://localhost:3306/order_db_1?useUnicode=true&characterEncoding=utf-8&serverTimezone=UTC
        username: root
        password: root
    # 打开sql输出日志
    props:
      sql:
        show: true
    sharding:
      tables:
        # 逻辑表,SQL语句中写对应的逻辑表
        # 如：insert into t_order(id,order_type,customer_id,amount) values(?,?,?,?)
        # 虽然写的是添加数据到t_order表，实际会根据分表策略路由到t_order_1或t_order_2
        t_order:
          # 指定t_order表的分布情况，配置表在哪个数据库中，表名称是什么
          actual-data-nodes: ds1.t_order_$->{
    
    1..2}
          # 指定t_order表里主键id生成策略
          key-generator:
            column: id
            type: SNOWFLAKE
          # 指定分片策略。根据id的奇偶性来判断插入到哪个表
          table-strategy:
            inline:
              algorithm-expression: t_order_${
    
    id%2+1}
              sharding-column: id

proficiency test

@SpringBootTest(classes = ShardingJdbcApplication.class)
public class OrderTest {
    
    

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Test
    public void testSave() {
    
    
        String saveSQL = "insert into t_order(id,order_type,customer_id,amount) values(?,?,?,?)";

        ArrayList<Order> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
    
    
            list.add(new Order(i, i, i,1000.0*i));
        }
        jdbcTemplate.batchUpdate(saveSQL, new BatchPreparedStatementSetter() {
    
    
            @Override
            public void setValues(PreparedStatement preparedStatement, int i) throws SQLException {
    
    
                Order order = list.get(i);
                preparedStatement.setInt(1,order.getId());
                preparedStatement.setInt(2,order.getOrderType());
                preparedStatement.setInt(3,order.getCustomerId());
                preparedStatement.setDouble(4,order.getAmount());
            }

            @Override
            public int getBatchSize() {
    
    
                return list.size();
            }
        });
    }

    @Test
    public void getOrder() {
    
    
        String querySQL = "select * from t_order";
        List<Order> orders = jdbcTemplate.query(querySQL, new BeanPropertyRowMapper<>(Order.class));
        orders.forEach(System.out::println);

    }
}

1. Execution result of testSave() method
insert image description here
2. Execution result of getOrder method

sharding-jdbc implements horizontal database sharding

db

insert image description here

application.yml

spring:
  shardingsphere:
    datasource:
      # 配置不同的数据源
      names: ds1,ds2
      #配置ds1数据源的基本信息
      ds1:
        driver-class-name: com.mysql.cj.jdbc.Driver
        type: com.alibaba.druid.pool.DruidDataSource
        url: jdbc:mysql://localhost:3306/order_db_1?useUnicode=true&characterEncoding=utf-8&serverTimezone=UTC
        username: root
        password: root
      #配置ds2数据源的基本信息
      ds2:
        driver-class-name: com.mysql.cj.jdbc.Driver
        type: com.alibaba.druid.pool.DruidDataSource
        url: jdbc:mysql://localhost:3306/order_db_2?useUnicode=true&characterEncoding=utf-8&serverTimezone=UTC
        username: root
        password: root
    #打开sql输出日志
    props:
      sql:
        show: true
    sharding:
      tables:
        # 逻辑表,必须配置正确,如：分表t_order_1,t_order_2，则逻辑表为 t_order
        t_order:
          #指定数据库表的分布情况
          actual-data-nodes: ds$->{
    
    1..2}.t_order_$->{
    
    1..2}
           #指定库分片策略，根据customer_id的奇偶性来添加到不同的库中,customer_id为基数路由到ds2,customer_id为偶数路由到ds1
          database-strategy:
            inline:
              sharding-column: customer_id
              algorithm-expression: ds$->{
    
    customer_id%2+1}
          #指定t_order表的主键生成策略
          key-generator:
            column: id
            type: SNOWFLAKE
          #指定表分片策略，根据id的奇偶性来添加到不同的表中，id为基数路由到ds2,id为偶数路由到ds1
          table-strategy:
            inline:
              sharding-column: id
              algorithm-expression: t_order_$->{
    
    id%2+1}

Routing Test Verification

No routing key specified

/**
* 不指定路由键，扫描全库全表查询汇总，效率低
*/
@Test
public void getOrder() {
    
    
   String querySQL = "select * from t_order";
   List<Order> orders = jdbcTemplate.query(querySQL, new BeanPropertyRowMapper<>(Order.class));
   orders.forEach(System.out::println);
}

insert image description here

Specify library routing key

/**
* 指定库路由键
*/
@Test
public void getOrderByCustomerId() {
    
    
   String querySQL = "select * from t_order where customer_id = ?";
   List<Order> orders = jdbcTemplate.query(querySQL, new BeanPropertyRowMapper<>(Order.class), 2);
   orders.forEach(System.out::println);
}

insert image description here

specify table routing key

/**
* 指定表路由键
*/
@Test
public void getOrderById() {
    
    
   String querySQL = "select * from t_order where id = ?";
   List<Order> orders = jdbcTemplate.query(querySQL, new BeanPropertyRowMapper<>(Order.class), 2);
   orders.forEach(System.out::println);
}

insert image description here

Specify library routing key and table routing key

/**
* 指定库路由键和表路由键 效率最高
*/
@Test
public void getOrderByCustomerIdAndId() {
    
    
   String querySQL = "select * from t_order where customer_id = ? and id = ?";
   List<Order> orders = jdbcTemplate.query(querySQL, new BeanPropertyRowMapper<>(Order.class), 2,2);
   orders.forEach(System.out::println);
}

insert image description here

sharding-jdbc implements vertical database sharding

Create the same library order_db_1 on different data nodes.
I have a local database, and the virtual machine uses docker to start a database

db

docker pull mysql:5.7

docker run -d -p 3306:3306 --privileged=true -e MYSQL_ROOT_PASSWORD=123456 --name mysql mysql:5.7 --character-set-server=utf8mb4 --collation-server=utf8mb4_general_ci

insert image description here

application.yml

spring:
  shardingsphere:
    datasource:
      # 配置不同的数据源
      names: ds1,ds2
      #配置ds1数据源的基本信息
      ds1:
        driver-class-name: com.mysql.cj.jdbc.Driver
        type: com.alibaba.druid.pool.DruidDataSource
        url: jdbc:mysql://localhost:3306/order_db_1?useUnicode=true&characterEncoding=utf-8&serverTimezone=UTC
        username: root
        password: root
      #配置ds2数据源的基本信息
      ds2:
        driver-class-name: com.mysql.cj.jdbc.Driver
        type: com.alibaba.druid.pool.DruidDataSource
        url: jdbc:mysql://192.168.56.10:3306/order_db_1?useUnicode=true&characterEncoding=utf-8&serverTimezone=UTC
        username: root
        password: 123456
    #打开sql输出日志
    props:
      sql:
        show: true
    sharding:
      tables:
        # 逻辑表,必须配置正确
        customer:
          #配置customer表所在的数据节点
          actual-data-nodes: ds2.customer
          #指定orders表的主键生成策略
          key-generator:
            column: id
            type: SNOWFLAKE
          #指定表分片策略
          table-strategy:
            inline:
              sharding-column: id
              algorithm-expression: customer
        # 逻辑表,SQL语句中写对应的逻辑表，必须配置正确，否则报表不存在异常
        # 如：insert into t_order(id,order_type,customer_id,amount) values(?,?,?,?)
        # 虽然写的是添加数据到t_order表，实际会根据分表策略路由到t_order_1或t_order_2
        t_order:
          #实际t_order表所在的数据库表
          actual-data-nodes: ds1.t_order_1

Test the vertical sharding strategy

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Customer {
    
    
    private Integer id;
    private String name;
}

/**
 * 测试垂直分库
 */
@Test
public void testSaveCustomer() {
    
    
    String saveCustomerSQL = "insert into customer(id,name) values(?,?)";
    String saveOrderSQL = "insert into t_order(id,order_type,customer_id,amount) values(?,?,?,?)";

    Customer customer = new Customer(10001,"李星云");

    ArrayList<Order> list = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
    
    
        list.add(new Order(i, i, customer.getId(),1000.0*i));
    }
    jdbcTemplate.batchUpdate(saveOrderSQL, new BatchPreparedStatementSetter() {
    
    
        @Override
        public void setValues(PreparedStatement preparedStatement, int i) throws SQLException {
    
    
            Order order = list.get(i);
            preparedStatement.setInt(1,order.getId());
            preparedStatement.setInt(2,order.getOrderType());
            preparedStatement.setInt(3,order.getCustomerId());
            preparedStatement.setDouble(4,order.getAmount());
        }

        @Override
        public int getBatchSize() {
    
    
            return list.size();
        }
    });
    jdbcTemplate.update(saveCustomerSQL, preparedStatement -> {
    
    
        preparedStatement.setInt(1,customer.getId());
        preparedStatement.setString(2, customer.getName());
    });
}

insert image description here

Summary of sub-database and sub-table

1. When using sharding-jdbc for sub-database and sub-table, it is most efficient to specify the corresponding library routing key and table routing key according to the rules, so the routing key can be set as the index of the corresponding table to improve query efficiency. If it is really impossible to set two routing keys during development, try to set one to avoid scanning the entire database and tables. low efficiency

2. Regarding the configuration of spring.shardingsphere.sharding.tables.(logical table).actual-data-nodes=ds1.t_order_1, the
insert image description here
correct logical table routing configuration should be:

[SpringBoot Advanced] SpringBoot integrates Sharding-JDBC sub-database and sub-table

[SpringBoot Advanced] SpringBoot integrates Sharding-JDBC sub-database and sub-table

Apache ShardingSphere

Sub-library and sub-table

The way of sub-database sub-table

vertical split

vertical table

Vertical sub-library

split horizontally

Horizontal sub-library

level table

Problems caused by sub-database and sub-table

Sub-database and sub-table middleware

Sharding-JDBC

sharding-jdbc implements horizontal table sharding

db script

pom

Order

application.yml

proficiency test

sharding-jdbc implements horizontal database sharding

db

application.yml

Routing Test Verification

No routing key specified

Specify library routing key

specify table routing key

Specify library routing key and table routing key

sharding-jdbc implements vertical database sharding

db

application.yml

Test the vertical sharding strategy

Summary of sub-database and sub-table

Guess you like