Analyzing the SQL execution engine of Sharding-Sphere

1. Introduction

Sharding-JDBC is an excellent framework for database and table sharding. Starting from 3.0, Sharding-JDBC was renamed Sharding-Sphere. In Sharding-JDBC 2, SQL execution against several sharded tables in the same database was serial: only one connection was obtained from a given data source, and access to that connection was synchronized, so the whole execution was completely serial for table shards in the same database. To run the table shards of one database in parallel, multiple connection pools had to be configured for that same database. Sharding-Sphere 3.0 optimizes the execution engine and introduces a memory limit mode and a connection limit mode to control the degree of parallelism dynamically.

This blog mainly analyzes the following two issues:

1. How the memory limit mode and connection limit mode control serial versus parallel execution on the same data source

2. The elegant design of the execution engine

2. The difference between the two modes of Sharding-Sphere

Memory limit mode: for the same data source, if 10 table shards are involved, 10 connections are obtained and the statements run in parallel.

Connection limit mode: for the same data source, if 10 table shards are involved, only 1 connection is obtained and the statements run serially.

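The algorithm for controlling the connection mode is as follows: for each data source, the number of SQL units to execute is compared with maxConnectionsSizePerQuery. If there are more SQL units than maxConnectionsSizePerQuery, connection limit mode is used; otherwise, memory limit mode is used.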

For more design details, please read the Sharding-Sphere official website: http://shardingsphere.io/document/current/cn/features/sharding/principle/execute/

3. Review of JDBC basics

For a huge database and table sharding framework, where should we start reading? For a sharding framework built on the JDBC specification, understanding the JDBC execution process is enough to get the overall picture of the framework. Let's review the JDBC execution process.

1. Load the driver: Class.forName()

2. Get a Connection

3. Create Statement or PreparedStatement by connection

4. Use Statement or PreparedStatement to execute SQL to get the result set

5. Close the resource and the process ends
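The five steps above can be sketched in plain JDBC as follows. This is only a minimal illustration and has nothing to do with Sharding-Sphere itself; the class name JdbcFlowSketch, the method firstColumn and the URL jdbc:hypothetical://localhost/demo are made up for the example, so with no matching driver on the classpath the call simply ends in a SQLException.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class JdbcFlowSketch {

    // Runs a query and returns the first column of every row as a string.
    public static List<String> firstColumn(final String url, final String sql) throws SQLException {
        List<String> result = new ArrayList<>();
        // Steps 2 to 4: get a connection, create a PreparedStatement, execute it.
        // Step 5 is handled by try-with-resources, which closes the resources in reverse order.
        try (Connection connection = DriverManager.getConnection(url);
             PreparedStatement statement = connection.prepareStatement(sql);
             ResultSet resultSet = statement.executeQuery()) {
            while (resultSet.next()) {
                result.add(resultSet.getString(1));
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        // Step 1 (Class.forName) is optional with JDBC 4+ drivers, which register themselves.
        try {
            System.out.println(firstColumn("jdbc:hypothetical://localhost/demo", "SELECT 1"));
        } catch (SQLException ex) {
            // Expected here, because no driver accepts the placeholder URL.
            System.out.println("query failed: " + ex.getMessage());
        }
    }
}
```

A sharding framework built on JDBC has to provide its own Connection, PreparedStatement and ResultSet behind exactly these calls, which is why the statement classes are a natural place to start reading.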

So to understand the SQL execution process of Sharding-Sphere, it is enough to start from Statement or PreparedStatement.

4. Source code analysis

Taking PreparedStatement as the entry point and reading inward, the following classes are the main ones involved:

1. ShardingPreparedStatement implements the PreparedStatement interface

2. PreparedStatementExecutor extends AbstractStatementExecutor and is the executor of SQL

3. SQLExecutePrepareTemplate obtains the sharded execution units and determines the connection mode (memory limit mode or connection limit mode)

4. ShardingExecuteEngine is the execution engine that provides a multi-threaded execution environment. In essence, ShardingExecuteEngine does nothing business-related; it only provides a multi-threaded environment in which the callback functions passed to it are executed (a very clever design)

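The relationship between the classes, in brief: ShardingPreparedStatement delegates to PreparedStatementExecutor, which uses SQLExecutePrepareTemplate to build the execution groups and then hands them, together with a callback, to ShardingExecuteEngine.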

Next, let's look at the executeQuery method of ShardingPreparedStatement. The code is as follows:

    @Override
    public ResultSet executeQuery() throws SQLException {
        ResultSet result;
        try {
            clearPrevious();
            // Route the SQL to the concrete data sources and tables.
            sqlRoute();
            // Build the execution groups; the connection mode is decided here.
            initPreparedStatementExecutor();
            // Execute the routed SQL, then merge the per-shard results.
            MergeEngine mergeEngine = MergeEngineFactory.newInstance(connection.getShardingContext().getShardingRule(),
                    preparedStatementExecutor.executeQuery(), routeResult.getSqlStatement(), connection.getShardingContext().getMetaData().getTable());
            result = new ShardingResultSet(preparedStatementExecutor.getResultSets(), mergeEngine.merge(), this);
        } finally {
            clearBatch();
        }
        currentResultSet = result;
        return result;
    }

Among these calls, initPreparedStatementExecutor initializes preparedStatementExecutor. During initialization, the statement execution units are obtained from the route units:

    public void init(final SQLRouteResult routeResult) throws SQLException {
        setSqlType(routeResult.getSqlStatement().getType());
        getExecuteGroups().addAll(obtainExecuteGroups(routeResult.getRouteUnits()));
        cacheStatements();
    }
    
    private Collection<ShardingExecuteGroup<StatementExecuteUnit>> obtainExecuteGroups(final Collection<RouteUnit> routeUnits) throws SQLException {
        return getSqlExecutePrepareTemplate().getExecuteUnitGroups(routeUnits, new SQLExecutePrepareCallback() {
            
            @Override
            public List<Connection> getConnections(final ConnectionMode connectionMode, final String dataSourceName, final int connectionSize) throws SQLException {
                return PreparedStatementExecutor.super.getConnection().getConnections(connectionMode, dataSourceName, connectionSize);
            }
            
            @Override
            public StatementExecuteUnit createStatementExecuteUnit(final Connection connection, final RouteUnit routeUnit, final ConnectionMode connectionMode) throws SQLException {
                return new StatementExecuteUnit(routeUnit, createPreparedStatement(connection, routeUnit.getSqlUnit().getSql()), connectionMode);
            }
        });
    }

So how is the connection mode determined when the statement execution units are obtained? Follow getSqlExecutePrepareTemplate().getExecuteUnitGroups to see what SQLExecutePrepareTemplate does:

    private List<ShardingExecuteGroup<StatementExecuteUnit>> getSQLExecuteGroups(
            final String dataSourceName, final List<SQLUnit> sqlUnits, final SQLExecutePrepareCallback callback) throws SQLException {
        List<ShardingExecuteGroup<StatementExecuteUnit>> result = new LinkedList<>();
        int desiredPartitionSize = Math.max(sqlUnits.size() / maxConnectionsSizePerQuery, 1);
        List<List<SQLUnit>> sqlUnitGroups = Lists.partition(sqlUnits, desiredPartitionSize);
        ConnectionMode connectionMode = maxConnectionsSizePerQuery < sqlUnits.size() ? ConnectionMode.CONNECTION_STRICTLY : ConnectionMode.MEMORY_STRICTLY;
        List<Connection> connections = callback.getConnections(connectionMode, dataSourceName, sqlUnitGroups.size());
        int count = 0;
        for (List<SQLUnit> each : sqlUnitGroups) {
            result.add(getSQLExecuteGroup(connectionMode, connections.get(count++), dataSourceName, each, callback));
        }
        return result;
    }

The code above implements the formula from the beginning of the article. The connection mode is controlled by maxConnectionsSizePerQuery: when maxConnectionsSizePerQuery is smaller than the number of SQL units to execute on the data source, connection limit mode is selected; otherwise, memory limit mode is selected.
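To see the arithmetic in isolation, here is a small sketch that mirrors the grouping and the mode decision from getSQLExecuteGroups. It is not the Sharding-Sphere code: the class ConnectionModeSketch, its methods and the table names t_order_0 to t_order_9 are assumptions for the example, and partition is a plain-Java stand-in for Guava's Lists.partition.

```java
import java.util.ArrayList;
import java.util.List;

final class ConnectionModeSketch {

    enum ConnectionMode { MEMORY_STRICTLY, CONNECTION_STRICTLY }

    // Same comparison as in getSQLExecuteGroups: more SQL units than allowed connections
    // means connections have to be shared, so connection limit mode is chosen.
    public static ConnectionMode chooseMode(final int sqlUnitCount, final int maxConnectionsSizePerQuery) {
        return maxConnectionsSizePerQuery < sqlUnitCount ? ConnectionMode.CONNECTION_STRICTLY : ConnectionMode.MEMORY_STRICTLY;
    }

    // Stand-in for Lists.partition: consecutive chunks of at most `size` elements.
    public static <T> List<List<T>> partition(final List<T> items, final int size) {
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            result.add(new ArrayList<>(items.subList(from, Math.min(from + size, items.size()))));
        }
        return result;
    }

    // Groups the SQL units of one data source; one connection is requested per group.
    public static <T> List<List<T>> group(final List<T> sqlUnits, final int maxConnectionsSizePerQuery) {
        int desiredPartitionSize = Math.max(sqlUnits.size() / maxConnectionsSizePerQuery, 1);
        return partition(sqlUnits, desiredPartitionSize);
    }

    public static void main(final String[] args) {
        List<String> sqlUnits = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            sqlUnits.add("SELECT * FROM t_order_" + i);
        }
        for (int max : new int[] {1, 5, 10}) {
            System.out.println("max=" + max + " mode=" + chooseMode(sqlUnits.size(), max)
                    + " groups=" + group(sqlUnits, max).size());
        }
    }
}
```

For 10 SQL units, a limit of 1 gives connection limit mode with 1 group, a limit of 5 gives connection limit mode with 5 groups, and a limit of 10 gives memory limit mode with 10 groups. Because the partition size comes from integer division, a limit that does not divide the unit count evenly behaves differently in this sketch: 10 units with a limit of 3 are split into 4 groups.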

Once preparedStatementExecutor is initialized, the query can be executed:

    public List<QueryResult> executeQuery() throws SQLException {
        final boolean isExceptionThrown = ExecutorExceptionHandler.isExceptionThrown();
        SQLExecuteCallback<QueryResult> executeCallback = new SQLExecuteCallback<QueryResult>(getDatabaseType(), getSqlType(), isExceptionThrown) {
            
            @Override
            protected QueryResult executeSQL(final StatementExecuteUnit statementExecuteUnit) throws SQLException {
                return getQueryResult(statementExecuteUnit);
            }
        };
        return executeCallback(executeCallback);
    }

The callback here is a very clever design. executeSQL is the SQL operation to execute, and it can be implemented flexibly as needed (select, update, and so on). executeCallback(executeCallback) is the real executor: it calls executeGroup on sqlExecuteTemplate, which passes the execution groups into the ShardingExecuteEngine execution engine.

    @SuppressWarnings("unchecked")
    protected final <T> List<T> executeCallback(final SQLExecuteCallback<T> executeCallback) throws SQLException {
        return sqlExecuteTemplate.executeGroup((Collection) executeGroups, executeCallback);
    }

public final class SQLExecuteTemplate {
    
    private final ShardingExecuteEngine executeEngine;
    
    /**
     * Execute group.
     *
     * @param sqlExecuteGroups SQL execute groups
     * @param callback SQL execute callback
     * @param <T> class type of return value
     * @return execute result
     * @throws SQLException SQL exception
     */
    public <T> List<T> executeGroup(final Collection<ShardingExecuteGroup<? extends StatementExecuteUnit>> sqlExecuteGroups, final SQLExecuteCallback<T> callback) throws SQLException {
        return executeGroup(sqlExecuteGroups, null, callback);
    }
    
    /**
     * Execute group.
     *
     * @param sqlExecuteGroups SQL execute groups
     * @param firstCallback first SQL execute callback
     * @param callback SQL execute callback
     * @param <T> class type of return value
     * @return execute result
     * @throws SQLException SQL exception
     */
    @SuppressWarnings("unchecked")
    public <T> List<T> executeGroup(final Collection<ShardingExecuteGroup<? extends StatementExecuteUnit>> sqlExecuteGroups,
                                    final SQLExecuteCallback<T> firstCallback, final SQLExecuteCallback<T> callback) throws SQLException {
        try {
            return executeEngine.groupExecute((Collection) sqlExecuteGroups, firstCallback, callback);
        } catch (final SQLException ex) {
            ExecutorExceptionHandler.handleException(ex);
            return Collections.emptyList();
        }
    }
}
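The callback arrangement described above, where the surrounding flow is fixed and only executeSQL changes, can be illustrated with a stripped-down sketch. Everything in it is hypothetical: CallbackSketch and ExecuteCallback are stand-ins invented for the example, the SQL units are plain strings, and the "results" are faked rather than read from a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CallbackSketch {

    // Hypothetical stand-in for SQLExecuteCallback: the loop over units is fixed,
    // and only executeSQL varies between query, update and so on.
    abstract static class ExecuteCallback<T> {

        public final List<T> execute(final List<String> sqlUnits) {
            List<T> result = new ArrayList<>();
            for (String each : sqlUnits) {
                result.add(executeSQL(each));
            }
            return result;
        }

        protected abstract T executeSQL(String sql);
    }

    public static void main(final String[] args) {
        List<String> sqlUnits = Arrays.asList("SELECT 1", "SELECT 22");
        // Query-style callback: one fake result per unit.
        ExecuteCallback<String> query = new ExecuteCallback<String>() {
            @Override
            protected String executeSQL(final String sql) {
                return "rows of [" + sql + "]";
            }
        };
        // Update-style callback: one fake affected-row count per unit (here, the SQL length).
        ExecuteCallback<Integer> update = new ExecuteCallback<Integer>() {
            @Override
            protected Integer executeSQL(final String sql) {
                return sql.length();
            }
        };
        System.out.println(query.execute(sqlUnits));
        System.out.println(update.execute(sqlUnits));
    }
}
```

Supporting a new kind of statement then only requires a new anonymous subclass; the code that walks the units does not change.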

Now for the highlight: what does the execution engine actually do? See the code below.

    public <I, O> List<O> groupExecute(
            final Collection<ShardingExecuteGroup<I>> inputGroups, final ShardingGroupExecuteCallback<I, O> firstCallback, final ShardingGroupExecuteCallback<I, O> callback) throws SQLException {
        if (inputGroups.isEmpty()) {
            return Collections.emptyList();
        }
        Iterator<ShardingExecuteGroup<I>> inputGroupsIterator = inputGroups.iterator();
        ShardingExecuteGroup<I> firstInputs = inputGroupsIterator.next();
        Collection<ListenableFuture<Collection<O>>> restResultFutures = asyncGroupExecute(Lists.newArrayList(inputGroupsIterator), callback);
        return getGroupResults(syncGroupExecute(firstInputs, null == firstCallback ? callback : firstCallback), restResultFutures);
    }
    
    private <I, O> Collection<ListenableFuture<Collection<O>>> asyncGroupExecute(final List<ShardingExecuteGroup<I>> inputGroups, final ShardingGroupExecuteCallback<I, O> callback) {
        Collection<ListenableFuture<Collection<O>>> result = new LinkedList<>();
        for (ShardingExecuteGroup<I> each : inputGroups) {
            result.add(asyncGroupExecute(each, callback));
        }
        return result;
    }
    
    private <I, O> ListenableFuture<Collection<O>> asyncGroupExecute(final ShardingExecuteGroup<I> inputGroup, final ShardingGroupExecuteCallback<I, O> callback) {
        final Map<String, Object> dataMap = ShardingExecuteDataMap.getDataMap();
        return executorService.submit(new Callable<Collection<O>>() {
            
            @Override
            public Collection<O> call() throws SQLException {
                ShardingExecuteDataMap.setDataMap(dataMap);
                return callback.execute(inputGroup.getInputs(), false);
            }
        });
    }
    
    private <I, O> Collection<O> syncGroupExecute(final ShardingExecuteGroup<I> executeGroup, final ShardingGroupExecuteCallback<I, O> callback) throws SQLException {
        return callback.execute(executeGroup.getInputs(), true);
    }

sqlExecuteTemplate calls groupExecute of ShardingExecuteEngine. groupExecute relies on two main methods: asyncGroupExecute, which executes asynchronously, and syncGroupExecute, which executes synchronously. At first glance this is puzzling: isn't the execution multi-threaded, so why is there a synchronous path? The multi-threading here is very clever. The first element, firstInputs, is taken out of the execution groups, and the remaining groups are submitted to the thread pool through asyncGroupExecute. The first group is executed by the current thread, so no thread is wasted on waiting.
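The "first group on the current thread, the rest on the pool" idea can be tried out with a few lines of plain java.util.concurrent. This is a simplified sketch rather than the engine itself: FirstOnCallerSketch is a made-up name, the callback is reduced to a Function, and ordinary Future objects are used in place of Guava's ListenableFuture.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class FirstOnCallerSketch {

    // Runs `task` once per group: the first group on the calling thread,
    // every other group on the pool. Results keep the order of the groups.
    public static <I, O> List<O> groupExecute(final ExecutorService pool, final List<I> groups, final Function<I, O> task)
            throws InterruptedException, ExecutionException {
        List<O> result = new ArrayList<>();
        if (groups.isEmpty()) {
            return result;
        }
        // Submit groups 2..n first so they are already running while the caller handles group 1.
        List<Future<O>> restFutures = new ArrayList<>();
        for (I each : groups.subList(1, groups.size())) {
            Callable<O> job = () -> task.apply(each);
            restFutures.add(pool.submit(job));
        }
        result.add(task.apply(groups.get(0)));
        for (Future<O> each : restFutures) {
            result.add(each.get());
        }
        return result;
    }

    public static void main(final String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<String> lines = groupExecute(pool, Arrays.asList("g0", "g1", "g2"),
                    group -> group + " ran on " + Thread.currentThread().getName());
            lines.forEach(System.out::println);
        } finally {
            pool.shutdown();
        }
    }
}
```

Running main shows g0 on the calling thread and the other groups on pool threads. With n groups only n - 1 pool threads are occupied, and the caller does useful work instead of just blocking on the futures.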

What the execution engine actually executes is the callback passed in. So where does this callback come from? Go back to the executeQuery method of PreparedStatementExecutor, where the callback is created.

    public List<QueryResult> executeQuery() throws SQLException {
        final boolean isExceptionThrown = ExecutorExceptionHandler.isExceptionThrown();
        SQLExecuteCallback<QueryResult> executeCallback = new SQLExecuteCallback<QueryResult>(getDatabaseType(), getSqlType(), isExceptionThrown) {
            
            @Override
            protected QueryResult executeSQL(final StatementExecuteUnit statementExecuteUnit) throws SQLException {
                return getQueryResult(statementExecuteUnit);
            }
        };
        return executeCallback(executeCallback);
    }

The logic flows in one pass, it is easy to extend, and the design is clever. Code this good is rare.

Finally, Sharding-Sphere is a very good database and table sharding framework.

---------------------------------------------------------------------------------------------------------

Happiness comes from sharing.

This blog post is the author's original work. Please credit the source when reprinting.
