【Introduction to sharding-method】

sharding-method takes a new approach to splitting tables and databases: it is a sharding framework that works at the service layer. It is compatible with all SQL and all databases, its ACID behavior is identical to that of the native database, it can provide read-write separation at the RR (repeatable read) isolation level, and it performs better because it does no SQL parsing.

 

At present, the mainstream sharding frameworks in China work at the SQL level. Their main process is as follows:

1) Parse the SQL passed in from the upper layer

2) Using the table-splitting and database-splitting configuration, rewrite the SQL and distribute it to the corresponding standalone databases

3) Collect the results returned by each standalone database and merge them into the result the original SQL was expected to return

 

This approach aims to hide the underlying sharding logic: the upper-layer application sees a single RDB and can access multiple databases transparently.

 

 

However, this is only an attractive goal. For various reasons, SQL-level sharding solutions cannot provide the same guarantees as a native database:

1) The A in ACID cannot be guaranteed

2) The C in ACID may be broken

3) The I in ACID behaves differently from the native database

4) Because SQL parsing is complex and costly, many SQL statements that the database itself accepts are not supported

Because of these differences, the upper-layer application must know exactly how query results and transaction outcomes under such a sharding scheme differ from those of a single database in order to write correct and reliable programs.

 

Therefore, the SQL-based sharding scheme is not transparent to the application layer.

To write correct and reliable code on top of an SQL-layer framework, we need to keep some rules in mind:

1) No transaction (read or write) may span databases

2) Cross-shard queries provide an isolation level that differs from the native one

3) Some aggregate queries are expensive, so use them with caution

These rules are not visible when working with the sharding framework. Their consequences are hidden inside the SQL and are difficult to catch in code review.

Many people do not fully understand what these rules imply at execution time, and so they ignore them.

The most important of these rules is that no transaction (read or write) may span databases. In reasonably designed code, most database access in the business logic does not cross partitions, and the core business code runs within a single partition. So in most cases, all we need is a framework that makes it easy to select the right shard.
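
As a rough illustration of what "selecting the right shard" can mean, here is a minimal sketch that maps a sharding key to one data source by modulo. The class ShardSelector and the shard names are hypothetical and are not part of sharding-method; a real sharding strategy may use a different rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: pick one shard from a sharding key.
// ShardSelector and the shard names are illustrative only.
class ShardSelector {
    private final List<String> shards;

    ShardSelector(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = new ArrayList<>(shards);
    }

    // Math.floorMod keeps the index non-negative even for negative keys.
    String select(long shardingKey) {
        int index = (int) Math.floorMod(shardingKey, (long) shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        ShardSelector selector =
                new ShardSelector(Arrays.asList("user_db_0", "user_db_1", "user_db_2"));
        System.out.println(selector.select(10L)); // user_db_1
        System.out.println(selector.select(-1L)); // user_db_2
    }
}
```

Because the whole method then works against one database, everything inside it keeps the transaction semantics of that database.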

 

 

How to use basic annotations

@ShardingContext declares the sharding context of the current Service. When an operation such as selecting a data source, mapping to each database, reducing results, or generating an ID does not specify some parameter, the value is taken from the configuration in this ShardingContext.

 

@SelectDataSource selects a shard data source, according to the sharding strategy, for the SQL executed in the method. The data source cannot be changed until the method returns.

 

@GenerateId generates an ID and assigns it to the specified position in the parameters.

 

The logic for @GenerateId runs first, then @SelectDataSource, then @Transactional.
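
The ordering can be pictured as nested wrappers around the business method. The sketch below only records the order of the steps; it assumes nothing about how sharding-method implements its interceptors, and every name in it (InterceptorOrderSketch, before, run) is hypothetical. One likely reason for this order is that a generated ID may serve as the sharding key, and the transaction has to be opened on the data source that was selected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the call order only; none of these names are
// sharding-method APIs, and the real framework uses its own interceptors.
class InterceptorOrderSketch {
    // Wraps `next` so that `step` is recorded just before `next` runs.
    static <T> Supplier<T> before(List<String> log, String step, Supplier<T> next) {
        return () -> {
            log.add(step);
            return next.get();
        };
    }

    // Builds the chain and returns the order in which the steps ran.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Supplier<String> call = () -> {
            log.add("business method");
            return "saved";
        };
        // The wrapper added last is the outermost one, so it runs first.
        call = before(log, "open transaction", call);
        call = before(log, "select data source", call);
        call = before(log, "generate id", call);
        call.get();
        return log;
    }

    public static void main(String[] args) {
        // [generate id, select data source, open transaction, business method]
        System.out.println(run());
    }
}
```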

 

@Transactional(readOnly=true) marks the transaction as read-only, so the framework automatically selects the read database (if one exists) based on the readOnly flag.
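
A minimal sketch of routing by the read-only flag might look like the following. ReadWriteRouter and the database names are hypothetical; the sketch assumes a shard has one primary and zero or more read replicas, and it always picks the first replica, whereas a real router might balance across them.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: one shard with a primary and optional read replicas.
class ReadWriteRouter {
    private final String primary;
    private final List<String> replicas;

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = replicas;
    }

    // The decision is made once, before the transaction starts, so every
    // statement in that transaction goes to the same database.
    String route(boolean readOnly) {
        if (readOnly && !replicas.isEmpty()) {
            return replicas.get(0);
        }
        return primary;
    }

    public static void main(String[] args) {
        ReadWriteRouter shard0 =
                new ReadWriteRouter("db0-primary", Arrays.asList("db0-replica"));
        System.out.println(shard0.route(true));  // db0-replica
        System.out.println(shard0.route(false)); // db0-primary

        ReadWriteRouter noReplica =
                new ReadWriteRouter("db1-primary", Collections.<String>emptyList());
        System.out.println(noReplica.route(true)); // db1-primary (fallback)
    }
}
```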

 

The calcUserAvgAge method shows that, with the lambda expressions and Stream API of JDK 8, analyzing and processing collection data in Java becomes very simple, which greatly reduces the complexity of handling sharded data ourselves.
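
As a rough illustration of the kind of Stream code this refers to, here is a minimal sketch that combines per-shard partial results into an overall average age. It is not the calcUserAvgAge method itself; AvgAgeSketch and ShardAgeStat are hypothetical names invented for this example, and the sketch assumes each shard returns a sum and a count.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; this is not the original calcUserAvgAge listing.
class AvgAgeSketch {
    // Per-shard partial result: the sum of ages and the number of users.
    static final class ShardAgeStat {
        final long ageSum;
        final long userCount;

        ShardAgeStat(long ageSum, long userCount) {
            this.ageSum = ageSum;
            this.userCount = userCount;
        }
    }

    // Sums the partial results first, then divides once.
    static double averageAge(List<ShardAgeStat> stats) {
        long totalAge = stats.stream().mapToLong(s -> s.ageSum).sum();
        long totalUsers = stats.stream().mapToLong(s -> s.userCount).sum();
        return totalUsers == 0 ? 0.0 : (double) totalAge / totalUsers;
    }

    public static void main(String[] args) {
        List<ShardAgeStat> stats = Arrays.asList(
                new ShardAgeStat(50, 2),  // shard 0: ages 20 and 30
                new ShardAgeStat(40, 1)); // shard 1: age 40
        System.out.println(averageAge(stats)); // 30.0
    }
}
```

Carrying the sum and the count separately matters: averaging the two per-shard averages (25 and 40) would give 32.5 instead of 30.0.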

 

@MapReduce means that the method is executed once on each data shard and the results are aggregated before returning. For methods whose return type is the same before and after aggregation, the caller gets the aggregated result directly from the return value. For methods whose types differ before and after aggregation, the caller passes in a ReduceResultHolder object and, after the call completes, reads the aggregated result from it.

 

By default, the framework provides a general Reduce strategy: numbers are summed, Collections (and their subclasses) are merged, and Maps are merged. If this strategy does not fit, users can design and specify their own Reduce strategy.
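
The default behavior described here can be sketched in a few lines. This is an illustration under assumptions, not the framework's implementation: DefaultReduceSketch is a hypothetical name, numbers are treated as integers and summed as long, and when two shards return the same Map key the later shard's value wins (the text does not say how such collisions are handled).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a general reduce step over per-shard results.
class DefaultReduceSketch {
    static Object reduce(List<?> shardResults) {
        if (shardResults.isEmpty()) {
            throw new IllegalArgumentException("no shard results to reduce");
        }
        Object first = shardResults.get(0);
        if (first instanceof Number) {
            long total = 0; // assumption: integral values only
            for (Object r : shardResults) {
                total += ((Number) r).longValue();
            }
            return total;
        }
        if (first instanceof Collection) {
            List<Object> merged = new ArrayList<>();
            for (Object r : shardResults) {
                merged.addAll((Collection<?>) r);
            }
            return merged;
        }
        if (first instanceof Map) {
            Map<Object, Object> merged = new LinkedHashMap<>();
            for (Object r : shardResults) {
                merged.putAll((Map<?, ?>) r); // assumption: later shard wins on equal keys
            }
            return merged;
        }
        throw new IllegalArgumentException("unsupported result type: " + first.getClass());
    }

    public static void main(String[] args) {
        System.out.println(reduce(Arrays.asList(1, 2, 3))); // 6
        System.out.println(reduce(Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c")))); // [a, b, c]
        System.out.println(reduce(Arrays.asList(
                Collections.singletonMap("x", 1),
                Collections.singletonMap("y", 2)))); // {x=1, y=2}
    }
}
```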

 

Here @Transactional means that the SQL executed on each shard runs in its own transaction; it does not mean that the whole aggregation is a single transaction. MapReduce methods should therefore avoid update operations (considering that the framework level restricts MapReduce to only allow ReadOnly transactions).

 

The @MapReduce logic runs before @Transactional.

 

Comparison of advantages and disadvantages

 

That is the main way the framework is used. From this implementation we can see that sharding at the service layer has the following benefits:

Full database and full SQL compatibility

   SQL-layer sharding cannot offer this

Clean read-write separation

    When SQL-layer sharding introduces read-write separation, the upper-layer Service sees inconsistent isolation levels within a transaction. At best it can achieve RC-level read-write separation (unless auxiliary code is added in the Service layer). Service-layer sharding knows that a transaction is a read transaction before the service method starts, so the entire read transaction runs on one read database and its isolation level is the same as the database's.

No additional burden of maintaining DBProxy availability

Simple implementation compared with complex SQL parsing. I believe you can read all the code in a day and know the whole framework inside out.

No SQL parsing cost, so higher performance

Isolation level, transaction atomicity and similar properties are identical to those of the underlying database, so there is nothing extra to learn and it is easy to write correct programs

   The framework restricts every transaction to a single database

   Even without read-write separation, SQL-based sharding provides inconsistent isolation levels because it has to merge results from multiple databases, and it does not make this difference explicit to the programmer.

 

Of course, there are also disadvantages:

Cross-database queries must aggregate their results themselves

  This is both a disadvantage and an advantage

  Disadvantage: extra aggregation code has to be written

  Advantage: the aggregation can be tuned more precisely. With JDK 8's Stream API and lambda expressions, the collection processing involved is about as simple as writing SQL.

Cross-database transactions must be guaranteed by the application

  This is both a disadvantage and an advantage

  Disadvantage: you have to implement cross-database transactions yourself

  Advantage: at present, the cross-database transactions implemented by sharding frameworks all have defects or limitations. For example, the cross-database transactions provided by Sharding-JDBC, Mycat and others are not strictly ACID: A may be broken, and I differs from its original definition. Programmers who are not familiar with these details can easily write unreliable code. It may therefore be a better choice to control distributed transactions yourself with explicit transaction control. You can refer to EasyTransaction, another framework I wrote.

Single-database table splitting is not supported

   Splitting tables within a single database is actually unnecessary. The database's native table partitioning achieves the same thing with the same performance and is more convenient to use.
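
To make the point about hand-written aggregation concrete, here is a minimal sketch of merging the per-shard results of a GROUP BY style count with the Stream API. CrossShardGroupBy, mergeCounts and the sample data are hypothetical; the sketch assumes each shard has already returned a map from group key to count.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: merge "SELECT city, COUNT(*) ... GROUP BY city"
// results that were computed separately on each shard.
class CrossShardGroupBy {
    static Map<String, Long> mergeCounts(List<Map<String, Long>> perShard) {
        return perShard.stream()
                .flatMap(m -> m.entrySet().stream())
                // Group by key across shards and add the partial counts.
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.summingLong(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        Map<String, Long> shard0 = new HashMap<>();
        shard0.put("beijing", 2L);
        shard0.put("shanghai", 1L);
        Map<String, Long> shard1 = Collections.singletonMap("beijing", 3L);

        // {beijing=5, shanghai=1}
        System.out.println(mergeCounts(Arrays.asList(shard0, shard1)));
    }
}
```

The merge step is explicit in the code, so its cost and its correctness can be checked in review instead of being hidden inside a rewritten SQL statement.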
