Database table sharding in practice

Table of contents

1. Background

2. DB proxy vs. JDBC proxy

3. JDBC proxy solution selection

sharding-jdbc

MyBatis

4. Data migration


1. Background

Several single tables in our PostgreSQL (pg) database have grown past tens of millions of rows, and the largest has reached the 100-million level. With this much data, single-table performance has dropped sharply. The main reason is that once a table exceeds a certain size, the height of its B+Tree index grows, and every additional level costs one more I/O for each index lookup. To improve performance, these tables need to be split into sub-tables; later, some tables will also be split by business function to complete the database-level split.
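The height argument can be made concrete with a back-of-the-envelope estimate: if every index page holds about f entries, a tree of height h can address about f^h rows, so the height is the smallest h with f^h >= N. The sketch below assumes a fanout of 300 purely for illustration (the real fanout depends on key width and page fill); it is a rough model, not PostgreSQL's actual B-tree accounting.

```java
class BTreeHeightEstimate {

    // Smallest h such that fanout^h >= rows, i.e. the height of a B+Tree whose pages are
    // all full with `fanout` entries. Real trees are partially filled, so this is a lower bound.
    static int estimateHeight(long rows, long fanout) {
        if (rows < 1 || fanout < 2) {
            throw new IllegalArgumentException("rows must be >= 1 and fanout >= 2");
        }
        int height = 1;
        long capacity = fanout; // rows addressable by a tree of the current height
        while (capacity < rows) {
            height++;
            if (capacity > rows / fanout) {
                // capacity * fanout already exceeds rows; stop before the multiplication could overflow
                break;
            }
            capacity *= fanout;
        }
        return height;
    }

    public static void main(String[] args) {
        long fanout = 300; // assumed entries per index page, for illustration only
        for (long rows : new long[]{1_000_000L, 10_000_000L, 100_000_000L}) {
            System.out.println(rows + " rows -> height " + estimateHeight(rows, fanout));
        }
        // 1000000 rows -> height 3
        // 10000000 rows -> height 3
        // 100000000 rows -> height 4
    }
}
```

With this assumed fanout, the estimate moves from 3 to 4 levels once the table passes 300^3 = 27 million rows, i.e. every lookup pays one more page read.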

2. DB proxy vs. JDBC proxy

Once a table is split, something has to handle data distribution and routing. This can happen in one of three layers, from bottom to top: the DB layer, the middle layer, and the application layer. Most solutions are implemented in the middle layer, which is further divided into DB proxy and JDBC proxy depending on whether it sits on the database side or the application side.

DB proxy: taking Mycat as an example, a separate middleware service has to be deployed and maintained. The application layer then only deals with business code, while reads, writes, and sharding are handled entirely by Mycat.

  • Advantages: the middleware manages the cluster, so node changes do not have to be pushed to every client; globally unique IDs and transaction management are easy to implement; metadata is managed centrally, and the sharding strategy can be customized flexibly
  • Disadvantages: the call chain is longer, and each extra layer adds response time; the middleware is often a single point of failure, so high availability has to be achieved by other means

JDBC proxy: taking sharding-jdbc as an example, you only need to add its jar to the application. It wraps JDBC and handles reads, writes, and sharding of the database.

  • Advantages: low performance overhead; every application instance runs an identical client, so there is no central single point and availability is high
  • Disadvantages: tied to one language (Java); integration is more cumbersome; global primary key generation, cluster changes, transaction management, etc. require communication between nodes

As you can see, a DB proxy is relatively heavyweight; after weighing the options, we decided to use a JDBC proxy.
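To make the "wrap JDBC inside the application" idea concrete, here is a toy sketch built on the JDK's java.lang.reflect.Proxy: it wraps a java.sql.Connection and rewrites the SQL passed to prepareStatement before the driver sees it. The class name and the naive string-replace rewriter are illustrative assumptions. sharding-jdbc itself ships its own DataSource/Connection implementations with full SQL parsing and routing, which this does not attempt.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.function.UnaryOperator;

class RewritingConnection {

    // Returns a Connection that rewrites the SQL string of every prepareStatement(...) call
    // with `sqlRewriter` and then delegates every call to `target`.
    static Connection wrap(Connection target, UnaryOperator<String> sqlRewriter) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("prepareStatement")
                    && args != null && args.length > 0 && args[0] instanceof String) {
                args[0] = sqlRewriter.apply((String) args[0]);
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the driver's own exception (e.g. SQLException)
            }
        };
        return (Connection) Proxy.newProxyInstance(
                RewritingConnection.class.getClassLoader(), new Class<?>[]{Connection.class}, handler);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a real driver connection: it only prints the SQL it receives
        Connection fake = (Connection) Proxy.newProxyInstance(
                RewritingConnection.class.getClassLoader(), new Class<?>[]{Connection.class},
                (proxy, method, methodArgs) -> {
                    if (method.getName().equals("prepareStatement")) {
                        System.out.println("driver received: " + methodArgs[0]);
                    }
                    return null;
                });
        // Naive rewriter, for illustration only
        Connection conn = wrap(fake, sql -> sql.replace("from user_test ", "from user_test_0 "));
        conn.prepareStatement("select * from user_test where tenant_id = ?");
    }
}
```

Because the rewrite happens inside the application process, there is no extra network hop, which is the low-overhead advantage listed above; the cost is that every application (and every language) needs its own copy of this layer.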

3. JDBC proxy solution selection

The candidates were sharding-jdbc and MyBatis. We only need to shard some of the tables, while sharding-jdbc applies globally, so it may be harder to keep under control.

Our main goal here is to split the user_test table into user_test_0 and user_test_1. First create the two tables:

create table user_test_0
(
	id serial not null
		constraint user_0_pk
			primary key,
	name varchar,
	tenant_id varchar
);

create table user_test_1
(
	id serial not null
		constraint user_1_pk
			primary key,
	name varchar,
	tenant_id varchar
);

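insert into user_test_0(name, tenant_id) values ('王五', 'alibaba');
insert into user_test_0(name, tenant_id) values ('赵六', 'alibaba');
insert into user_test_1(name, tenant_id) values ('张三', 'baidu');
insert into user_test_0(name, tenant_id) values ('李四', 'jd');

For convenience, the test data is inserted directly into the sub-tables, each row going to the sub-table that the hash + mod rule used below (Java String.hashCode() of tenant_id, modulo 2) selects for its tenant. The two candidate solutions are demonstrated next: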

sharding-jdbc

Sharding-jdbc is an open-source client-side proxy middleware originally developed at Dangdang. It provides database and table sharding as well as read-write splitting, is non-intrusive to application code (almost no code changes are needed), and is compatible with mainstream ORM frameworks and database connection pools. It is part of ShardingSphere, which is developed under the Apache Software Foundation.

Official documentation: http://shardingsphere.apache.org/document/legacy/2.x/cn/00-overview/

GitHub: https://github.com/apache/shardingsphere

Code:

The advantage of sharding-jdbc is that it is non-intrusive: we basically don't touch the existing code and only replace the database connection configuration with the sharding configuration.

The original configuration, before the table was split:

spring:
  datasource:
    url: "jdbc:postgresql://xxx:5432/xx?currentSchema=xx"
    username: xx
    password: xx
    driver-class-name: org.postgresql.Driver

Configuration after switching to sharding-jdbc:

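spring:
  shardingsphere:
    datasource:
      # Data source names; separate multiple data sources with commas
      names: maycur-pro
      maycur-pro:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: org.postgresql.Driver
        jdbc-url: jdbc:postgresql://192.168.95.143:5432/maycur-pro?currentSchema=team4
        username: team4
        password: maycur
    sharding:
      tables:
        # Logical table name
        user_test:
          # Inline expression; ${begin..end} denotes a range ($->{...} is the equivalent form that avoids clashing with Spring placeholders)
          actual-data-nodes: maycur-pro.user_test_$->{0..1}
          # Table sharding strategy: shard by tenant_id
          table-strategy:
            standard:
              precise-algorithm-class-name: com.database.subtable.segment.MyPreciseShardingAlgorithm
              sharding-column: tenant_id
#            inline:
#              sharding-column: tenant_id
#              # The sharding expression uses Groovy syntax; % only works on numeric columns,
#              # and tenant_id is varchar, hence the custom algorithm above
#              algorithm-expression: user_test_$->{tenant_id % 2}

#          # Key generation strategy: column is the column name, type is the generator. sharding-jdbc ships
#          # SNOWFLAKE and UUID by default, and other strategies can be implemented. Each sub-table's own
#          # serial sequence starts at 1, so ids collide across sub-tables; SNOWFLAKE yields 64-bit values (id must be bigint)
#          key-generator:
#            column: id
#            type: SNOWFLAKE
    # Properties (optional)
    props:
      # Whether to log the executed SQL; default false
      sql:
        show: true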

Instead of an inline expression, the table strategy above uses a custom algorithm class, MyPreciseShardingAlgorithm, which implements the common hash + mod scheme: hash the value of the sharding column, take it modulo the number of sub-tables to get the sub-table index, and route each request to user_test_0 or user_test_1. The code is as follows:

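import java.util.Collection;

import org.apache.shardingsphere.api.sharding.standard.PreciseShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingValue;

public class MyPreciseShardingAlgorithm implements PreciseShardingAlgorithm<String> {

    @Override
    public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<String> shardingValue) {
        // Hash the column value itself (not the PreciseShardingValue wrapper), then mod by the table count
        int index = Math.abs(shardingValue.getValue().hashCode() % availableTargetNames.size());
        for (String tableName : availableTargetNames) {
            // Match on "_<index>" so that, e.g., user_test_1 is not confused with user_test_11
            if (tableName.endsWith("_" + index)) {
                return tableName;
            }
        }
        throw new IllegalArgumentException("No sub-table for sharding value: " + shardingValue.getValue());
    }
}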
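Since the routing rule is plain Java, it can be checked without ShardingSphere. The sketch below (class and method names are illustrative, not from the original project) applies the same String.hashCode() + mod rule to the sample tenants. Whatever rule is chosen must stay identical everywhere data is routed, including the migration scripts in section 4; otherwise rows land in a table the application never reads.

```java
class ShardRoutingCheck {

    // hash + mod: String.hashCode() of the sharding value, modulo the number of sub-tables.
    // Math.abs is applied after %, so the index is always in [0, tableCount), even for negative hashes.
    static String subTableFor(String logicTable, String shardingValue, int tableCount) {
        return logicTable + "_" + Math.abs(shardingValue.hashCode() % tableCount);
    }

    public static void main(String[] args) {
        for (String tenant : new String[]{"alibaba", "baidu", "jd"}) {
            System.out.println(tenant + " -> " + subTableFor("user_test", tenant, 2));
        }
        // alibaba -> user_test_0
        // baidu -> user_test_1
        // jd -> user_test_0
    }
}
```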

When querying, sharding-jdbc routes the SQL to the sub-tables automatically. However, every SQL statement now goes through it, while our original requirement only concerns certain tables, which may introduce hidden risks in a large project; in addition, sharding-jdbc does not support, or is not fully compatible with, some complex SQL. So it may not suit our case.

MyBatis

MyBatis supports table sharding through its plugin mechanism, most commonly an interceptor. Interceptors can be attached at four points, ParameterHandler, StatementHandler, Executor, and ResultSetHandler, summarized as:

  • Intercepting parameter handling (ParameterHandler)
  • Intercepting SQL statement construction (StatementHandler)
  • Intercepting executor methods (Executor)
  • Intercepting result set handling (ResultSetHandler)

SQL rewriting, for example, belongs to the StatementHandler stage; for table sharding it simply means replacing the original (logical) table name with the name of the target sub-table.
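A plain String.replace or replaceAll of the table name is fragile: it also hits longer names that merely start with the logical name (for example a hypothetical user_test_log table), and replaceAll interprets its arguments as regex syntax. A minimal sketch of a safer rewrite that matches the logical name only as a whole word (the class name is illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TableNameRewriter {

    // Replace the logical table name only where it appears as a whole word.
    // In regex terms '_' is a word character, so "user_test" inside "user_test_log" has no
    // word boundary after it and is left alone. Pattern.quote / Matcher.quoteReplacement make
    // both names literal, so characters like '.' or '$' are not treated as regex syntax.
    static String rewrite(String sql, String logicTable, String physicalTable) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(logicTable) + "\\b");
        return pattern.matcher(sql).replaceAll(Matcher.quoteReplacement(physicalTable));
    }

    public static void main(String[] args) {
        String sql = "select u.* from user_test u join user_test_log l on l.user_id = u.id where u.tenant_id = ?";
        System.out.println(rewrite(sql, "user_test", "user_test_0"));
        // select u.* from user_test_0 u join user_test_log l on l.user_id = u.id where u.tenant_id = ?
    }
}
```

This is still text matching, not SQL parsing: a string literal or alias equal to the table name would be rewritten too, so a real SQL parser is the more robust choice for complex statements.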

Code:

Because MyBatis interceptors are global, a dedicated annotation is needed to distinguish target from non-target objects (database tables). First define the table sharding strategy interface and a concrete implementation:

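import org.apache.ibatis.executor.statement.StatementHandler;

public interface ShardTableStrategy {

    /**
     * Table sharding algorithm
     * @param statementHandler handler of the SQL statement being prepared
     * @return the sub-table name that replaces the logical table name
     */
    String shardAlgorithm(StatementHandler statementHandler);
}
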
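import java.util.Map;

import com.alibaba.fastjson.JSON;
import org.apache.ibatis.executor.statement.StatementHandler;
import org.apache.ibatis.mapping.BoundSql;

public class UserStrategy implements ShardTableStrategy {

    /** Original (logical) table name */
    private static final String USER_ORIGIN_TABLE_NAME = "user_test";
    /** Separator between table name and index */
    private static final String TABLE_LINE = "_";
    /** Number of sub-tables */
    public static final int USER_TABLE_NUM = 2;
    /** Sharding field (mapper parameter name) */
    private static final String USER_TABLE_SUB_FIELD = "tenantId";

    @Override
    public String shardAlgorithm(StatementHandler statementHandler) {
        // A pre-check for whether sharding is needed at all can be added here
        BoundSql boundSql = statementHandler.getBoundSql();
        Object parameterObject = boundSql.getParameterObject();
        // Convert the parameter object (the @Param map or an entity) into a map of parameter values
        Map<?, ?> param2ValueMap = parameterObject == null
                ? null
                : JSON.parseObject(JSON.toJSONString(parameterObject), Map.class);

        // Check for a missing map before reading from it, to avoid a NullPointerException
        Object subFieldValue = param2ValueMap == null ? null : param2ValueMap.get(USER_TABLE_SUB_FIELD);
        if (subFieldValue == null) {
            throw new IllegalArgumentException("user_test is sharded, so the SQL must carry a " + USER_TABLE_SUB_FIELD + " value!");
        }

        return USER_ORIGIN_TABLE_NAME + TABLE_LINE + Math.abs(subFieldValue.hashCode() % USER_TABLE_NUM);
    }
}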

Define the annotation (its attributes are arrays because a single mapper may touch several different tables that need sharding):

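import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
public @interface SegmentTable {

    /**
     * Logical table names
     */
    String[] tableName();

    /**
     * Sharding strategy classes, one per table name, in the same order
     */
    Class<? extends ShardTableStrategy>[] strategyClazz();
}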

Then write the interceptor itself:

import java.sql.Connection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.ibatis.executor.statement.StatementHandler;
import org.apache.ibatis.mapping.MappedStatement;
import org.apache.ibatis.plugin.Interceptor;
import org.apache.ibatis.plugin.Intercepts;
import org.apache.ibatis.plugin.Invocation;
import org.apache.ibatis.plugin.Plugin;
import org.apache.ibatis.plugin.Signature;
import org.apache.ibatis.reflection.MetaObject;
import org.apache.ibatis.reflection.SystemMetaObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Intercepts(@Signature(type = StatementHandler.class, method = "prepare", args = {Connection.class, Integer.class}))
public class ShardTableInterceptor implements Interceptor {

    private static final Logger logger = LoggerFactory.getLogger(ShardTableInterceptor.class);

    private static final String BOUND_SQL_NAME = "delegate.boundSql.sql";

    private static final String MAPPED_STATEMENT_NAME = "delegate.mappedStatement";

    @Override
    public Object intercept(Invocation invocation) throws Throwable {
        StatementHandler statementHandler = (StatementHandler) invocation.getTarget();
        // Reflective accessor for reading/writing nested properties of the (Routing)StatementHandler
        MetaObject metaObject = SystemMetaObject.forObject(statementHandler);
        // @SegmentTable on the mapper interface, if any
        SegmentTable segmentTable = getSegmentTable(metaObject);
        if (segmentTable == null) {
            return invocation.proceed();
        }
        // Validate the annotation: table names and strategies must pair up one-to-one
        Class<? extends ShardTableStrategy>[] classes = segmentTable.strategyClazz();
        String[] tableNames = segmentTable.tableName();
        if (classes.length != tableNames.length) {
            throw new IllegalStateException("SegmentTable annotation's tableName and strategyClazz must have the same length!");
        }

        // Map each table name to its strategy
        Map<String, Class<? extends ShardTableStrategy>> tableName2StrategyClazzMap = buildTableName2StrategyClazzMap(classes, tableNames);
        // Rewrite the SQL
        String sql = handleSql(statementHandler, metaObject, tableName2StrategyClazzMap);
        // Put the rewritten SQL back into the BoundSql
        metaObject.setValue(BOUND_SQL_NAME, sql);
        return invocation.proceed();
    }

    @Override
    public Object plugin(Object target) {
        // Only wrap StatementHandler instances; return anything else unchanged to avoid needless proxies
        if (target instanceof StatementHandler) {
            return Plugin.wrap(target, this);
        }
        return target;
    }

    @Override
    public void setProperties(Properties properties) {
        // No configurable properties
    }

    private SegmentTable getSegmentTable(MetaObject metaObject) {
        MappedStatement mappedStatement = (MappedStatement) metaObject.getValue(MAPPED_STATEMENT_NAME);
        // The statement id is unique within its namespace: "<mapper interface>.<method>"
        String id = mappedStatement.getId();
        String className = id.substring(0, id.lastIndexOf('.'));
        try {
            SegmentTable segmentTable = Class.forName(className).getAnnotation(SegmentTable.class);
            logger.info("ShardTableInterceptor getSegmentTable id={}, segmentTable={}", id, segmentTable);
            return segmentTable;
        } catch (ClassNotFoundException e) {
            // Namespaces without a backing mapper interface (XML-only) are never sharded
            return null;
        }
    }

    private Map<String, Class<? extends ShardTableStrategy>> buildTableName2StrategyClazzMap(
            Class<? extends ShardTableStrategy>[] classes, String[] tableNames) {
        // LinkedHashMap keeps the annotation's order, so rewriting is deterministic
        Map<String, Class<? extends ShardTableStrategy>> tableName2StrategyClazzMap = new LinkedHashMap<>();
        for (int i = 0; i < classes.length; i++) {
            tableName2StrategyClazzMap.put(tableNames[i], classes[i]);
        }
        logger.info("ShardTableInterceptor buildTableName2StrategyClazzMap tableName2StrategyClazzMap={}", tableName2StrategyClazzMap);
        return tableName2StrategyClazzMap;
    }

    private String handleSql(StatementHandler statementHandler, MetaObject metaObject,
                             Map<String, Class<? extends ShardTableStrategy>> tableName2StrategyClazzMap) throws ReflectiveOperationException {
        String sql = (String) metaObject.getValue(BOUND_SQL_NAME);
        logger.info("ShardTableInterceptor original sql={}", sql);
        for (Map.Entry<String, Class<? extends ShardTableStrategy>> entry : tableName2StrategyClazzMap.entrySet()) {
            String tableName = entry.getKey();
            // Skip the algorithm if the SQL does not mention this table
            if (!sql.contains(tableName)) {
                continue;
            }
            // 1. Run the strategy on the sharding value to determine the sub-table name
            ShardTableStrategy strategy = entry.getValue().getDeclaredConstructor().newInstance();
            String replaceTableName = strategy.shardAlgorithm(statementHandler);
            // 2. Replace whole-word occurrences of the table name only (literal match, not a regex)
            sql = Pattern.compile("\\b" + Pattern.quote(tableName) + "\\b")
                    .matcher(sql)
                    .replaceAll(Matcher.quoteReplacement(replaceTableName));
            logger.info("ShardTableInterceptor handleSql sql={}, tableName={}, replaceTableName={}", sql, tableName, replaceTableName);
        }
        return sql;
    }
}
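Resolving @SegmentTable from the statement id happens on every prepared statement. A sketch of the same lookup that caches the result per mapper class and treats namespaces without a mapper interface as unsharded is shown below. It is simplified under stated assumptions: the annotation here only carries tableName, and UserMapper/OrderMapper are hypothetical demo types.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class SegmentTableLookup {

    // One entry per mapper class name, so Class.forName and the annotation lookup run once per mapper
    private static final Map<String, Optional<SegmentTable>> CACHE = new ConcurrentHashMap<>();

    // mappedStatementId looks like "<mapper interface FQN>.<method>", e.g. "com.example.UserMapper.listUsers"
    static Optional<SegmentTable> find(String mappedStatementId) {
        int dot = mappedStatementId.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        return CACHE.computeIfAbsent(mappedStatementId.substring(0, dot), SegmentTableLookup::load);
    }

    private static Optional<SegmentTable> load(String className) {
        try {
            return Optional.ofNullable(Class.forName(className).getAnnotation(SegmentTable.class));
        } catch (ClassNotFoundException e) {
            // Namespaces without a backing interface (XML-only mappers) are treated as "not sharded"
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(find(UserMapper.class.getName() + ".listUsers").isPresent());   // true
        System.out.println(find(OrderMapper.class.getName() + ".listOrders").isPresent()); // false
        System.out.println(find("com.example.xmlonly.select").isPresent());             // false
    }
}

// Simplified stand-in for @SegmentTable: strategyClazz is omitted so the sketch compiles on its own
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@interface SegmentTable {
    String[] tableName();
}

// Hypothetical mapper interfaces, used only for this demo
@SegmentTable(tableName = {"user_test"})
interface UserMapper {
}

interface OrderMapper {
}
```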

Finally, register the plugin in the MyBatis configuration file (mybatis-config.xml) and add the annotation to the mapper interface:

<plugins>
    <plugin interceptor="com.database.subtable.segment.ShardTableInterceptor"/>
</plugins>

import java.util.List;

import org.apache.ibatis.annotations.Param;

@SegmentTable(tableName = {"user_test"}, strategyClazz = {UserStrategy.class})
public interface UserMapper {

    List<User> listUsers(@Param("tenantId") String tenantId);

}

Run the query: the printed log shows that the SQL has been routed to the sub-table.

4. Data migration

Settling on the sharding rules is really only the first step of splitting tables. The troublesome part is data migration, or rather how to migrate the data with the least impact on the business. Since we run on Alibaba Cloud, we chose to migrate the data with DataWorks during a late-night release; the whole process took less than an hour.

Create a custom hash_code(text) function in PostgreSQL that reproduces Java's String.hashCode() algorithm, so that during the DataWorks migration each row is routed to the correct sub-table by the hash of its sharding key. For example, rows satisfying abs(hash_code(tenant_id) % 2) = 0 belong in user_test_0.

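DROP FUNCTION IF EXISTS hash_code(text);
CREATE FUNCTION hash_code(text) RETURNS integer
    LANGUAGE plpgsql
    IMMUTABLE
AS
$$
DECLARE
    h bigint := 0;
BEGIN
    -- Same recurrence as Java's String.hashCode(): h = 31 * h + c, truncated to 32 bits
    -- (the loop variable i is declared implicitly by FOR)
    FOR i IN 1..length($1)
        LOOP
            h := (h * 31 + ascii(substring($1, i, 1))) & 4294967295;
        END LOOP;
    -- Reinterpret the low 32 bits as a signed int4, like Java's int overflow
    RETURN cast(cast(h AS bit(32)) AS int4);
END;
$$;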
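The equivalence with Java can be checked by porting the function's arithmetic step by step. The sketch below (the class name is illustrative) mirrors the 64-bit accumulator, the mask to 32 bits, and the final signed reinterpretation, and compares the result with String.hashCode(). One caveat, stated as an assumption about inputs: ascii() returns the Unicode code point in a UTF8 database, while Java hashes UTF-16 code units, so the two agree for characters in the Basic Multilingual Plane (ASCII tenant ids, common CJK) but not for supplementary characters such as emoji.

```java
class PgHashCodeCheck {

    // Java mirror of the hash_code(text) PL/pgSQL function: a 64-bit accumulator masked to its
    // low 32 bits on every step (the "& 4294967295"), then reinterpreted as a signed 32-bit int
    // (the "cast(cast(h AS bit(32)) AS int4)").
    static int pgStyleHash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h * 31 + s.charAt(i)) & 0xFFFFFFFFL;
        }
        return (int) h;
    }

    public static void main(String[] args) {
        for (String s : new String[]{"alibaba", "baidu", "jd", ""}) {
            // Both values should be identical for every input
            System.out.println("'" + s + "': " + pgStyleHash(s) + " vs String.hashCode() " + s.hashCode());
        }
    }
}
```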


If the application cannot tolerate a period of unavailability, there is another option: after the sharding change goes live, write all newly created data into the sub-tables while historical data stays in the old table, and add a routing check before each data operation. Once enough new data has accumulated (for example, after two or three months), almost all operations hit the sub-tables; then start the data migration, and once it is complete, remove the routing check.
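A minimal sketch of such a routing check, assuming each row carries a creation timestamp and the go-live time of the sharded version is known. Both the timestamp and the cutover value are hypothetical (the sample user_test schema has no such column), and the class name is illustrative. Single-row operations can be routed this way; queries that list all rows of a tenant still have to read both the old table and the sub-table until the migration is finished.

```java
import java.time.Instant;

class MigrationRouter {

    private static final String LEGACY_TABLE = "user_test";
    private final Instant cutover;   // hypothetical go-live time of the sharded version
    private final int tableCount;

    MigrationRouter(Instant cutover, int tableCount) {
        this.cutover = cutover;
        this.tableCount = tableCount;
    }

    // Rows created before the cutover still live in the legacy table; everything newer
    // (including every insert made after go-live) lives in the hash + mod sub-table.
    String tableFor(Instant createdAt, String tenantId) {
        if (createdAt.isBefore(cutover)) {
            return LEGACY_TABLE;
        }
        return LEGACY_TABLE + "_" + Math.abs(tenantId.hashCode() % tableCount);
    }

    public static void main(String[] args) {
        MigrationRouter router = new MigrationRouter(Instant.parse("2021-01-01T00:00:00Z"), 2);
        System.out.println(router.tableFor(Instant.parse("2020-12-15T08:00:00Z"), "baidu")); // user_test
        System.out.println(router.tableFor(Instant.parse("2021-01-02T08:00:00Z"), "baidu")); // user_test_1
    }
}
```

After the migration, the legacy branch is simply deleted and every call goes to the sub-table.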

Original post: blog.csdn.net/m0_38001814/article/details/109427511