Application of database and table sharding in an e-commerce system

Background

Why do we need to shard databases and tables?

In an e-commerce system, once a table reaches tens of millions of rows, a single query becomes noticeably slow. A join query against such a table may hang, or even bring down the system. The purpose of sharding databases and tables is to reduce the load on the database, improve its efficiency, and shorten query times. After weighing the pros and cons of several frameworks, we chose Sharding-JDBC for database and table sharding.

Introduction to Sharding-JDBC

Sharding-JDBC is positioned as a lightweight Java framework that provides additional services at the JDBC layer. The client connects directly to the database, and the framework ships as a jar package with no extra deployment or dependencies. It can be understood as an enhanced JDBC driver, fully compatible with JDBC and the common ORM frameworks.

It works with any JDBC-based ORM framework, such as JPA, Hibernate, MyBatis, Spring JDBC Template, or raw JDBC.

It supports any third-party database connection pool, such as DBCP, C3P0, BoneCP, Druid, HikariCP, etc.

Any database that implements the JDBC specification is supported. It currently supports MySQL, Oracle, SQL Server, PostgreSQL, and any database that follows the SQL92 standard.

As the architecture diagram on the official website shows, we only need to introduce Sharding-JDBC into the project.

Using Sharding-JDBC

Next, we take the Javashop b2b2c e-commerce system as an example to illustrate the application of Sharding-JDBC:

1. Prepare the sharding plan

Before modifying the configuration file, we should decide on a sharding strategy, including:

1. How many databases to shard into

2. How many tables to split each table into

In this example, taking the es_order table, the original javashop database is split into two databases, javashop0 and javashop1, and the original es_order table is split into two tables, es_order0 and es_order1.

The table definitions (only the sharding-related columns are shown) are as follows:

CREATE TABLE `es_order0` (
  `order_id` bigint(20) NOT NULL COMMENT 'primary key ID',
  `trade_sn` varchar(20) DEFAULT NULL COMMENT 'trade number',
  -- other columns omitted
);

CREATE TABLE `es_order1` (
  `order_id` bigint(20) NOT NULL COMMENT 'primary key ID',
  `trade_sn` varchar(20) DEFAULT NULL COMMENT 'trade number',
  -- other columns omitted
);

2. Introduce the Maven dependency
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
    <version>4.1.0</version>
</dependency>
3. Configure the sharding strategy

Define the database sharding strategy in application.yaml:

spring:
  profiles:
    include: order
  # sharding configuration
  shardingsphere:
    # whether to print the actual SQL; useful for debugging, disable in production
    props:
      sql:
        show: true
    sharding:
      # define the default data source: ds0
      default-data-source-name: ds0
    # define the sharded data sources
    datasource:
      # the names of all data sources
      names: ds0,ds1
      ds0:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.jdbc.Driver
        url: jdbc:mysql://ip:3306/default_database?useUnicode=true&characterEncoding=utf8&autoReconnect=true
        username: root
        password: 123456
      ds1:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.jdbc.Driver
        url: jdbc:mysql://192.168.2.110:3306/javashop02?useUnicode=true&characterEncoding=utf8&autoReconnect=true
        username: root
        password: 123456

Define the table sharding strategy by modifying application-order.yml:

spring:
  # sharding configuration
  shardingsphere:
    sharding:
      tables:
        # trade table (sharded by trade_sn across databases, by trade_id across tables)
        es_trade:
          actual-data-nodes: ds$->{0..1}.es_trade$->{0..1}
          database-strategy:
            inline:
              sharding-column: trade_sn
              algorithm-expression: ds$->{ new Long(trade_sn) % 2}
          table-strategy:
            inline:
              sharding-column: trade_id
              algorithm-expression: es_trade$->{trade_id % 2}
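To make the inline expressions concrete, here is a small plain-Java sketch (the trade numbers and IDs are made up for illustration) that mirrors how the two Groovy expressions above pick a data node:

```java
public class RoutingDemo {

    // Mirrors ds$->{ new Long(trade_sn) % 2 } and es_trade$->{ trade_id % 2 }
    static String route(String tradeSn, long tradeId) {
        long db = Long.parseLong(tradeSn) % 2; // database chosen by trade_sn
        long table = tradeId % 2;              // table chosen by trade_id
        return "ds" + db + ".es_trade" + table;
    }

    public static void main(String[] args) {
        System.out.println(route("1001", 7)); // lands in ds1.es_trade1
        System.out.println(route("1002", 8)); // lands in ds0.es_trade0
    }
}
```

Because the database key (trade_sn) and the table key (trade_id) are different columns, the four shards each receive a share of the rows.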

With the configuration above, we have sharded the product, member, and order tables across databases and tables.

Problems and Solutions

If you think it is that easy, you are wrong. We ran into the following problems:

The primary key auto-increment problem

Once you shard databases and tables, you can no longer rely on the database's built-in primary key generation. On the one hand, primary key IDs must not collide across shards; on the other hand, you need to know the primary key before inserting, so that rows can be distributed evenly across databases and tables. A sensible primary key generation strategy is therefore required.

Our solution is to generate the primary key with a snowflake ID issuer at insert time. Snowflake is a common ID generation algorithm that composes an ID from a timestamp, a business ID, a machine ID, and a sequence number. In the e-commerce system it is also used to generate order numbers, payment order numbers, and so on. Our issuer mainly solves the problem of keeping the machine ID unique during automatic scaling in a containerized deployment. For a detailed description of snowflake, see the article "java mall system source code sharing-snowflake issuer". The unified insert method is encapsulated as follows:
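For illustration, a minimal snowflake-style generator might look like the following. This is a simplified sketch, not the actual Javashop issuer (which additionally handles machine-id assignment for containers); the custom epoch and bit widths here are assumptions:

```java
// Simplified snowflake-style ID: [timestamp delta | 10-bit machine id | 12-bit sequence]
public class SnowflakeSketch {
    private static final long EPOCH = 1577836800000L; // 2020-01-01 UTC, arbitrary custom epoch
    private final long machineId;    // 0..1023
    private long lastTimestamp = -1L;
    private long sequence = 0L;      // 0..4095 within one millisecond

    public SnowflakeSketch(long machineId) {
        this.machineId = machineId & 0x3FF;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF; // same millisecond: bump the sequence
            if (sequence == 0) {               // sequence exhausted: spin until the next ms
                while (ts <= lastTimestamp) {
                    ts = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        return ((ts - EPOCH) << 22) | (machineId << 12) | sequence;
    }
}
```

IDs generated this way are unique per machine and monotonically increasing, which is what lets them double as order numbers.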

public void insert(String table, Object po) {
    Long id = snCreator.create(getSubCode(table));
    // save the freshly generated primary key into a thread local
    lastIdThreadLocal.set(id);
    Map poMap = new HashedMap();
    ColumnMeta columnMeta = ReflectionUtil.getColumnMeta(po);
    poMap.put(columnMeta.getPrimaryKeyName(), id);
    // the snowflake issuer is used as the primary key at insert time;
    // adapt this to your own business as needed
    // ...remainder omitted
}
Type issues with snowflake IDs

Because snowflake IDs are of type Long, while the ID columns used before sharding were int(11), we had to change all ID columns involved in sharding to bigint(20), and likewise change the corresponding fields in the Java entity classes that receive the data.

Precision loss caused by snowflake IDs in JavaScript

JavaScript numbers can only represent integers of up to about 16 digits precisely, while our snowflake IDs are 17 digits long, so precision was lost on the front end. We solved this by uniformly serializing Long values as String. The code is as follows:

@Configuration
public class JacksonConfig {

    @Bean
    public Jackson2ObjectMapperBuilderCustomizer customJackson() {
        return new Jackson2ObjectMapperBuilderCustomizer() {
            @Override
            public void customize(Jackson2ObjectMapperBuilder jacksonObjectMapperBuilder) {
                // serialize Long JSON values as String,
                // because JavaScript cannot represent Long precision
                jacksonObjectMapperBuilder.serializerByType(Long.class, new JsonSerializer<Object>() {
                    @Override
                    public void serialize(Object value, JsonGenerator gen, SerializerProvider serializers) throws IOException {
                        gen.writeString(String.valueOf(value));
                    }
                });
            }
        };
    }
}
The problem of using the same field for database and table sharding

At first we used the same field for both levels, that is, we sharded databases by primary key ID and then also sharded tables by primary key ID. If we insert 100 orders, the es_order0 table in the javashop0 database gets the 50 odd-ID rows while its es_order1 table gets none, and in the javashop1 database es_order0 gets nothing while es_order1 gets the other 50 rows. Because the same modulo is applied twice, only the shards whose database index and table index match ever receive data. This is clearly not what we want. So we keep taking the ID modulo 2 for the database, but for the table we first shift the ID right by 2 bits and then take modulo 2, which decorrelates the two keys and gives the even distribution we want. The specific configuration is as follows:

spring:
  # sharding configuration
  shardingsphere:
    sharding:
      tables:
        # member table (sharded by member_id for both databases and tables)
        es_member:
          actual-data-nodes: ds$->{0..1}.es_member$->{0..1}
          database-strategy:
            inline:
              sharding-column: member_id
              algorithm-expression: ds$->{member_id % 2}
          table-strategy:
            inline:
              sharding-column: member_id
              # because the same column is used for both database and table sharding,
              # shift right two bits before taking the modulo so data spreads evenly
              algorithm-expression: es_member$->{(member_id >> 2) % 2}
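The effect of the right shift can be checked with a quick simulation (plain Java, hypothetical sequential IDs): using id % 2 for both keys fills only two of the four shards, while shifting the table key spreads rows across all four:

```java
import java.util.HashMap;
import java.util.Map;

public class ShardDistributionDemo {

    // Count rows per "db.table" shard for ids 1..n under a given table-key rule
    static Map<String, Integer> distribute(int n, boolean shiftTableKey) {
        Map<String, Integer> counts = new HashMap<>();
        for (long id = 1; id <= n; id++) {
            long db = id % 2;
            long table = shiftTableKey ? (id >> 2) % 2 : id % 2;
            counts.merge("ds" + db + ".es_member" + table, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // same expression for both keys: only the two "diagonal" shards get data
        System.out.println(distribute(100, false));
        // shifted table key: all four shards get roughly 25 rows each
        System.out.println(distribute(100, true));
    }
}
```

Any bit position other than bit 0 would decorrelate the keys; the configuration above happens to use bit 2.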

SQL not supported by Sharding-JDBC

The official website lists the unsupported SQL as follows:

SQL statement | Reason
INSERT INTO tbl_name (col1, col2, …) VALUES(1+2, ?, …) | arithmetic expressions are not supported in VALUES
INSERT INTO tbl_name (col1, col2, …) SELECT col1, col2, … FROM tbl_name WHERE col3 = ? | INSERT .. SELECT
SELECT COUNT(col1) as count_alias FROM tbl_name GROUP BY col1 HAVING count_alias > ? | HAVING
SELECT * FROM tbl_name1 UNION SELECT * FROM tbl_name2 | UNION
SELECT * FROM tbl_name1 UNION ALL SELECT * FROM tbl_name2 | UNION ALL
SELECT * FROM ds.tbl_name1 | contains schema
SELECT SUM(DISTINCT col1), SUM(col1) FROM tbl_name | mixes plain and DISTINCT aggregate functions
SELECT * FROM tbl_name WHERE to_date(create_time, 'yyyy-mm-dd') = ? | leads to full routing
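As one example of working around these restrictions, the unsupported HAVING filter can be applied in Java after a plain GROUP BY query. The data below is a hypothetical stand-in for a JDBC result set:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HavingWorkaroundDemo {

    // Equivalent of: SELECT col1, COUNT(*) FROM t GROUP BY col1 HAVING COUNT(*) > min
    // The grouping can run as a plain GROUP BY query; the HAVING filter moves into Java.
    static Map<String, Long> groupHaving(List<String> col1Values, long min) {
        return col1Values.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > min)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        // "B" appears only once, so it is filtered out just as HAVING would do
        System.out.println(groupHaving(List.of("A", "A", "B", "C", "C", "C"), 1));
    }
}
```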

For this problem, our solution was to optimize our SQL and implement the unsupported statements with other logic; we will not cover every case here.

Original article by javashop
