Learning Sharding-JDBC: do you understand these basic concepts?

Preface

Before understanding the implementation principle of Sharding-JDBC, you need to understand the following concepts:

Logical table

The general name for a group of horizontally split data tables. Example: the order table is split into 10 tables by the last digit of the primary key, named t_order_0, t_order_1 through t_order_9; their logical table name is t_order.

Real table

A physical table that actually exists in a sharded database, that is, t_order_0 to t_order_9 in the previous example.

Data node

The smallest physical unit of data sharding. It consists of a data source name and a data table name, for example: ds_0.t_order_0.
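To make the relationship between logical tables, real tables and data nodes concrete, here is a minimal Python sketch. It is not Sharding-JDBC code; the function name and the layout (every data source holds every table suffix) are assumptions made only for illustration.

```python
def data_nodes(data_sources, logical_table, tables_per_source):
    """List every data node as '<data source>.<real table>'.

    Assumes (for illustration only) that each data source holds the
    same set of real tables, suffixed 0..tables_per_source-1.
    """
    return [
        f"{ds}.{logical_table}_{i}"
        for ds in data_sources
        for i in range(tables_per_source)
    ]


nodes = data_nodes(["ds_0", "ds_1"], "t_order", 2)
print(nodes)
```

Each string in the returned list is one data node: a data source name joined to a real table name.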

Binding table

Refers to a main table and its sub-tables that share the same sharding rules. For example, the t_order table and the t_order_item table are both sharded by order_id, so their sharding keys are identical and the two tables are bound to each other. Join queries between bound tables are not routed as a Cartesian product, so join efficiency improves greatly.

For example, if the SQL is:

SELECT i.* FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id WHERE o.order_id in (10,11);

When the binding table relationship is not configured, and assuming the sharding key order_id routes the value 10 to shard 0 and the value 11 to shard 1, routing produces 4 SQL statements, which form a Cartesian product:

SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in(10, 11);
SELECT i.* FROM t_order_0 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in(10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in(10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in(10, 11);

After the binding table relationship is configured, routing produces only 2 SQL statements:

SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
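The difference between the two routing results above can be reproduced with a small Python sketch. It is a toy string-based router, not the Sharding-JDBC implementation; the function name route_join and its parameters are hypothetical.

```python
from itertools import product

LOGICAL_SQL = (
    "SELECT i.* FROM t_order o JOIN t_order_item i "
    "ON o.order_id=i.order_id WHERE o.order_id in (10, 11);"
)


def route_join(sql, shards, bound):
    """Return the real SQL statements for the two-table join.

    shards: shard suffixes hit by the sharding condition, e.g. [0, 1].
    bound:  True if t_order and t_order_item are configured as binding tables.
    """
    if bound:
        pairs = [(s, s) for s in shards]         # same suffix on both sides
    else:
        pairs = list(product(shards, repeat=2))  # Cartesian product of suffixes
    routed = []
    for order_shard, item_shard in pairs:
        # Replace the longer table name first so "t_order " cannot clash with it.
        real = sql.replace("t_order_item ", f"t_order_item_{item_shard} ")
        real = real.replace("t_order ", f"t_order_{order_shard} ")
        routed.append(real)
    return routed


unbound_sql = route_join(LOGICAL_SQL, [0, 1], bound=False)
bound_sql = route_join(LOGICAL_SQL, [0, 1], bound=True)
print(len(unbound_sql), len(bound_sql))
```

With n shards hit, the unbound case grows as n squared while the bound case grows as n, which is where the efficiency gain comes from.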

Broadcast table

Refers to tables that exist in every sharded data source, with identical structure and data in each database. Broadcast tables suit small tables that must be joined with very large tables, such as dictionary tables.

Shard key

The database field used for sharding, that is, the key field by which a database (table) is split horizontally. Example: if the order table is sharded by taking the modulus of the order primary key, the order primary key is the sharding field. If the SQL contains no sharding field, it is routed to all shards and performs poorly. Besides single sharding fields, Sharding-JDBC also supports sharding on multiple fields.

Sharding algorithm

Data is sharded by a sharding algorithm; sharding by =, >=, <=, >, <, BETWEEN and IN is supported. Sharding algorithms must be implemented by application developers, which allows a great deal of flexibility.

Four kinds of sharding algorithm are currently provided. Because sharding algorithms are tightly coupled to business logic, no built-in sharding algorithm is provided. Instead, sharding strategies abstract the various scenarios at a higher level and expose interfaces through which application developers implement their own sharding algorithms.

  • Precise sharding algorithm

Corresponds to PreciseShardingAlgorithm. It handles = and IN sharding when a single key is used as the sharding key. It must be used together with StandardShardingStrategy.

  • Range sharding algorithm

Corresponds to RangeShardingAlgorithm. It handles BETWEEN AND, >, <, >=, <= sharding when a single key is used as the sharding key. It must be used together with StandardShardingStrategy.

  • Complex sharding algorithm

Corresponds to ComplexKeysShardingAlgorithm. It handles sharding when multiple keys are used as sharding keys. The logic across multiple sharding keys is more complicated, and the application developer has to deal with that complexity. It must be used together with ComplexShardingStrategy.

  • Hint sharding algorithm

Corresponds to HintShardingAlgorithm. It handles sharding that is driven by a Hint. It must be used together with HintShardingStrategy.
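As a rough illustration of the division of labour between the precise and range algorithms, the following Python sketch implements both for a modulo rule. The function names and signatures are hypothetical and only mirror the roles described above; they are not the Java interfaces of Sharding-JDBC.

```python
TABLES = [f"t_order_{i}" for i in range(4)]


def precise_sharding(available_targets, value):
    """Handle = and IN: map one sharding value to exactly one target."""
    suffix = f"_{value % len(available_targets)}"
    for target in available_targets:
        if target.endswith(suffix):
            return target
    raise ValueError(f"no target for sharding value {value}")


def range_sharding(available_targets, lower, upper):
    """Handle BETWEEN lower AND upper (inclusive): return every target hit."""
    if upper - lower + 1 >= len(available_targets):
        # The range spans every possible remainder, so all targets are hit.
        return list(available_targets)
    hit = {precise_sharding(available_targets, v) for v in range(lower, upper + 1)}
    return sorted(hit)


print(precise_sharding(TABLES, 10))
print(range_sharding(TABLES, 5, 6))
```

The precise function always returns a single target, while the range function may return several, which matches the way the two algorithms are described above.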

Sharding strategy

Contains the sharding key and the sharding algorithm. Because sharding algorithms are independent, they are extracted separately. What can actually be used for sharding is the sharding key plus the sharding algorithm, that is, the sharding strategy. Five sharding strategies are currently provided.

  • Standard sharding strategy

Corresponds to StandardShardingStrategy. It supports sharding for =, >, <, >=, <=, IN and BETWEEN AND in SQL statements. StandardShardingStrategy supports only a single sharding key and provides two sharding algorithms: PreciseShardingAlgorithm and RangeShardingAlgorithm.

PreciseShardingAlgorithm is required and handles = and IN sharding.

RangeShardingAlgorithm is optional and handles BETWEEN AND, >, <, >=, <= sharding. If RangeShardingAlgorithm is not configured, BETWEEN AND in SQL is handled by routing to all databases.

  • Complex sharding strategy

Corresponds to ComplexShardingStrategy. It supports sharding for =, >, <, >=, <=, IN and BETWEEN AND in SQL statements. ComplexShardingStrategy supports multiple sharding keys. Because the relationship between multiple sharding keys is complicated, it does little encapsulation. Instead, the combination of sharding key values and the sharding operators are passed straight to the sharding algorithm, which is implemented entirely by the application developer, giving maximum flexibility.

  • Row expression sharding strategy

Corresponds to InlineShardingStrategy. It uses Groovy expressions to support sharding for = and IN in SQL statements, and supports only a single sharding key. Simple sharding algorithms can be expressed through simple configuration, avoiding tedious Java code. For example, t_user_$->{u_id % 8} means that the t_user table is split into 8 tables by u_id modulo 8, named t_user_0 to t_user_7.

  • Hint sharding strategy

Corresponds to HintShardingStrategy. It shards by a sharding value specified through a Hint instead of a value extracted from the SQL.

  • None sharding strategy

Corresponds to NoneShardingStrategy. The strategy for tables that are not sharded.
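The idea that a strategy is a sharding key plus algorithms, with the range algorithm optional, can be sketched as follows. This is a Python illustration only: the class StandardStrategy and the function inline_t_user are hypothetical stand-ins, and the modulo-8 rule is written as a plain function rather than evaluated as a Groovy row expression.

```python
class StandardStrategy:
    """Single sharding key plus a required precise and an optional range algorithm."""

    def __init__(self, sharding_column, precise, range_algorithm=None):
        self.sharding_column = sharding_column
        self.precise = precise
        self.range_algorithm = range_algorithm

    def shard_in(self, targets, values):
        # = and IN: run the precise algorithm once per value.
        return sorted({self.precise(targets, v) for v in values})

    def shard_between(self, targets, lower, upper):
        # BETWEEN: without a range algorithm, fall back to every target.
        if self.range_algorithm is None:
            return list(targets)
        return self.range_algorithm(targets, lower, upper)


def inline_t_user(targets, u_id):
    """Plain-Python equivalent of the modulo-8 row expression example."""
    return f"t_user_{u_id % 8}"


targets = [f"t_user_{i}" for i in range(8)]
strategy = StandardStrategy("u_id", precise=inline_t_user)
print(strategy.shard_in(targets, [8, 17]))
print(strategy.shard_between(targets, 1, 2))
```

Because no range algorithm is supplied here, the BETWEEN query is routed to all eight tables, which is the fallback behaviour described for the standard strategy.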

Self-incrementing primary key generation strategy

The database's native auto-increment primary key is replaced by a primary key generated on the client, which ensures that primary keys are not duplicated across the distributed databases.
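The article does not say which generation algorithm is used. One common way to produce such client-side keys is a Snowflake-style layout (timestamp, worker id, per-millisecond sequence), and the Python sketch below assumes that layout purely for illustration. The class name, bit widths and injected clock are all assumptions, and the sketch does not handle a clock that moves backwards.

```python
import time


class ClientKeyGenerator:
    """Hypothetical client-side key generator with a Snowflake-style layout.

    key = (milliseconds << 22) | (worker_id << 12) | sequence
    """

    def __init__(self, worker_id, clock):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock      # callable returning the current time in milliseconds
        self.last_ms = -1
        self.sequence = 0

    def next_key(self):
        now = self.clock()
        if now == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
            if self.sequence == 0:
                # Sequence exhausted in this millisecond: wait for the next one.
                while now <= self.last_ms:
                    now = self.clock()
        else:
            self.sequence = 0
        self.last_ms = now
        return (now << 22) | (self.worker_id << 12) | self.sequence


generator = ClientKeyGenerator(worker_id=1, clock=lambda: int(time.time() * 1000))
print(generator.next_key())
```

Two clients with different worker ids never produce the same key, even in the same millisecond, because the worker id occupies its own bits.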

SQL parsing

When Sharding-JDBC receives a SQL statement, it performs SQL parsing => query optimization => SQL routing => SQL rewriting => SQL execution => result merging in turn, and finally returns the execution result.

SQL parsing is divided into lexical analysis and syntactic analysis. The lexical parser splits the SQL into indivisible atomic symbols called tokens and, using the dictionaries provided by the different database dialects, classifies them as keywords, expressions, literals or operators. The syntax parser then converts the SQL into an abstract syntax tree.

For example, the following SQL:

SELECT id, name FROM t_user WHERE status = 'ACTIVE' AND age > 18

After parsing, the abstract syntax tree looks like this:

[Figure: abstract syntax tree of the SQL above]

To aid understanding, keyword tokens in the abstract syntax tree are shown in green, variable tokens in red, and gray marks nodes that need to be split further.

The abstract syntax tree is then traversed to extract the context needed for sharding and to mark the positions where SQL rewriting (described later) may be required. The parsing context for sharding includes select items, table information, sharding conditions, auto-increment primary key information, sorting information (ORDER BY), grouping information (GROUP BY) and paging information (LIMIT, ROWNUM, TOP).
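The lexical analysis step can be illustrated with a tiny regular-expression tokenizer in Python. It covers only what the example SQL needs, and the keyword set and token categories are assumptions of this sketch rather than the dictionaries Sharding-JDBC ships for each dialect.

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND"}

TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<literal>'[^']*'|\d+)"
    r"|(?P<word>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<operator>>=|<=|=|>|<)"
    r"|(?P<symbol>,))"
)


def tokenize(sql):
    """Split SQL into (type, text) tokens; raise on characters it does not know."""
    tokens, pos = [], 0
    sql = sql.strip()
    while pos < len(sql):
        match = TOKEN_PATTERN.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "word":
            # Words are keywords if the (toy) dictionary lists them, else identifiers.
            kind = "keyword" if text.upper() in KEYWORDS else "identifier"
        tokens.append((kind, text))
        pos = match.end()
    return tokens


tokens = tokenize("SELECT id, name FROM t_user WHERE status = 'ACTIVE' AND age > 18")
for kind, text in tokens:
    print(kind, text)
```

A syntax parser would consume this token list to build the abstract syntax tree; that step is left out here.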

SQL routing

SQL routing is the process of mapping data operations on logical tables to operations on data nodes.

The sharding strategies of the database and the table are matched against the parsing context to generate the routing path. SQL that carries a sharding key can be routed by single-shard routing (the sharding key operator is the equal sign), multi-shard routing (the operator is IN) or range routing (the operator is BETWEEN); SQL that carries no sharding key uses broadcast routing.

Scenarios for routing based on sharding keys can be divided into direct routing, standard routing, and Cartesian routing.

Standard routing

Standard routing is the sharding method that Sharding-JDBC recommends most. It applies to SQL that contains no join queries, or only join queries between bound tables.

When the sharding operator is an equal sign, the routing result falls into a single database (table). When the sharding operator is BETWEEN or IN, the routing result does not necessarily fall into a single database (table), so one logical SQL statement may eventually be split into several real SQL statements for execution.

For example, if data is sharded by odd and even order_id, a query against the logical table looks like this:

SELECT * FROM t_order WHERE order_id IN (1, 2);

Then the result of routing should be:

SELECT * FROM t_order_0 WHERE order_id IN (1, 2);
SELECT * FROM t_order_1 WHERE order_id IN (1, 2);
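How the operator on the sharding key determines the routing result can be sketched in Python for the odd/even rule of the example above. The route function and its arguments are hypothetical and far simpler than the real routing engine.

```python
SHARD_COUNT = 2  # odd/even sharding on order_id, as in the example above


def route(operator=None, operand=None):
    """Return the sorted real tables that a t_order query must touch."""
    if operator == "=":
        return [f"t_order_{operand % SHARD_COUNT}"]
    if operator == "IN":
        return sorted({f"t_order_{v % SHARD_COUNT}" for v in operand})
    if operator == "BETWEEN":
        lower, upper = operand
        # SHARD_COUNT consecutive values already cover every remainder.
        last = min(upper, lower + SHARD_COUNT - 1)
        return sorted({f"t_order_{v % SHARD_COUNT}" for v in range(lower, last + 1)})
    # No sharding key in the SQL: every real table has to be queried.
    return [f"t_order_{i}" for i in range(SHARD_COUNT)]


print(route("=", 1))
print(route("IN", [1, 2]))
print(route("BETWEEN", (3, 3)))
print(route())
```

Note that IN does not always fan out: IN (2, 4) touches only t_order_0 because both values are even.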

A join query between bound tables has the same complexity and performance as a single-table query. For example, consider the following join query that involves bound tables:

SELECT * FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id WHERE o.order_id IN (1, 2);

Then the result of routing should be:

SELECT * FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id IN (1, 2);
SELECT * FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id IN (1, 2);

As you can see, the number of SQL splits is consistent with the single table.

Cartesian routing

Cartesian routing is the most complicated case. The sharding rules cannot be located through a binding table relationship, so join queries between unbound tables must be split into Cartesian-product combinations for execution.

If the SQL in the previous example does not configure the binding table relationship, the result of the routing should be:

SELECT * FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id IN (1, 2);
SELECT * FROM t_order_0 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id IN (1, 2);
SELECT * FROM t_order_1 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id IN (1, 2);
SELECT * FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id IN (1, 2);

Cartesian routing queries perform poorly, so use them with caution.

Full database table routing

For SQL that carries no sharding key, broadcast routing is used. Depending on the SQL type, it is divided into five kinds: full database table routing, full database routing, full instance routing, unicast routing and blocking routing. Full database table routing handles operations on all real tables in the database that correspond to a logical table.

It mainly covers DQL (data query) and DML (data manipulation) statements without a sharding key, as well as DDL (data definition) statements. For example:

SELECT * FROM t_order WHERE good_prority IN (1, 10);

All tables in all databases are traversed, each real table name is matched against the logical table, and the SQL is executed on every match. After routing, the SQL becomes:

SELECT * FROM t_order_0 WHERE good_prority IN (1, 10);
SELECT * FROM t_order_1 WHERE good_prority IN (1, 10);
SELECT * FROM t_order_2 WHERE good_prority IN (1, 10);
SELECT * FROM t_order_3 WHERE good_prority IN (1, 10);

SQL rewriting

SQL written for logical tables cannot be executed directly in the real database. SQL rewriting turns the logical SQL into SQL that executes correctly in the real database.

As a simple example, if the logical SQL is:

SELECT order_id FROM t_order WHERE order_id=1;

Assuming the sharding key is order_id, order_id=1 is routed to sharded table 1. The rewritten SQL is then:

SELECT order_id FROM t_order_1 WHERE order_id=1;

As another example, Sharding-JDBC may need data during result merging that the query SQL does not return. This mainly concerns GROUP BY and ORDER BY. Result merging groups and sorts by the GROUP BY and ORDER BY fields, so if the select items of the original SQL do not include the grouping or sorting items, the original SQL must be rewritten.

Let's first look at a case where the original SQL already carries the information required for result merging:

SELECT order_id, user_id FROM t_order ORDER BY user_id;

Since user_id is used for sorting, result merging needs the user_id data. The SQL above already returns user_id, so no column needs to be added. If the select items do not contain a column required for result merging, the column must be added, as in the following SQL:

SELECT order_id FROM t_order ORDER BY user_id;

Since the original SQL does not contain the user_id that needs to be obtained in the result merging, the SQL needs to be rewritten. The SQL after adding the column is:

SELECT order_id, user_id AS ORDER_BY_DERIVED_0 FROM t_order ORDER BY user_id;
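Both rewrites described in this section, replacing the logical table name and adding a derived column, can be mimicked with a short Python sketch. It works on strings and takes the select items and ORDER BY columns as arguments, standing in for the parsing context; the function and parameter names are hypothetical, and the real rewriter works on marked positions from the parsed SQL instead.

```python
import re


def rewrite(sql, logical_table, real_table, select_items, order_by):
    """Rename the table and append ORDER BY columns missing from the select list.

    Assumes the select list in `sql` is written exactly as ", ".join(select_items).
    """
    missing = [column for column in order_by if column not in select_items]
    derived = [
        f"{column} AS ORDER_BY_DERIVED_{index}"
        for index, column in enumerate(missing)
    ]
    # \b keeps t_order from matching inside longer names such as t_order_item.
    sql = re.sub(rf"\b{re.escape(logical_table)}\b", real_table, sql)
    if derived:
        old_items = ", ".join(select_items)
        new_items = ", ".join(list(select_items) + derived)
        sql = sql.replace(f"SELECT {old_items} FROM", f"SELECT {new_items} FROM", 1)
    return sql


print(rewrite(
    "SELECT order_id FROM t_order ORDER BY user_id;",
    "t_order", "t_order_1", ["order_id"], ["user_id"],
))
```

When the sorting column is already selected, the function only renames the table and leaves the select list untouched.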

SQL execution

Sharding-JDBC uses an automated execution engine that is responsible for safely and efficiently sending the real SQL, after routing and rewriting, to the underlying data sources for execution. It does not simply send SQL straight to the data source through JDBC, nor does it simply put execution requests into a thread pool for concurrent execution.

Result merging

Combining the multiple result sets obtained from the data nodes into one result set and returning it correctly to the requesting client is called result merging.

The result merging supported by Sharding-JDBC falls into five types: traversal, sorting, grouping, paging and aggregation. They can be combined and are not mutually exclusive.

The overall structure of the merge engine is divided as follows:

[Figure: overall structure of the merge engine]

In terms of structure, result merging is divided into stream merging, memory merging and decorator merging. Stream merging and memory merging are mutually exclusive; decorator merging can apply further processing on top of either.
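The contrast between stream and memory merging can be sketched in Python for a sorting merge. This is an illustration, not the merge engine itself: the function names are hypothetical, each shard's rows are assumed to arrive already sorted, and using pagination as the decorating step is an assumption of the sketch.

```python
import heapq
from itertools import islice


def stream_merge(shard_results, sort_key):
    """Lazily merge per-shard rows that are each already sorted by sort_key."""
    return heapq.merge(*shard_results, key=sort_key)


def memory_merge(shard_results, sort_key):
    """Load every row, then sort; simpler, but holds all rows in memory."""
    rows = [row for result in shard_results for row in result]
    return sorted(rows, key=sort_key)


def paginate(rows, offset, limit):
    """Decorator-style step that works on top of either merge."""
    return list(islice(rows, offset, offset + limit))


def by_user(row):
    return row["user_id"]


shard_0 = [{"order_id": 2, "user_id": 1}, {"order_id": 4, "user_id": 5}]
shard_1 = [{"order_id": 1, "user_id": 3}, {"order_id": 3, "user_id": 7}]

page = paginate(stream_merge([shard_0, shard_1], by_user), offset=1, limit=2)
print(page)
```

The stream version pulls one row at a time from each shard, so pagination can stop early without reading the remaining rows, while the memory version must read everything first.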

Summary

Today, Lao Gu introduced the basic concepts, core functions and implementation principles of Sharding-JDBC.

Basic concepts: logical tables, real tables, data nodes, binding tables, broadcast tables, sharding keys, sharding algorithms, sharding strategies, primary key generation strategies

Execution process: SQL parsing => query optimization => SQL routing => SQL rewriting => SQL execution => result merging

In the next article, Lao Gu will introduce the practical use of Sharding-JDBC. Thank you!

Origin blog.csdn.net/EnjoyEDU/article/details/109211050