Mycat study notes (4): sharded JOIN and sharding rules

http://blog.csdn.net/convict_eva/article/details/51992635

Mycat supports cross-shard JOIN in four main ways:
1. Global table
A dictionary table (changes infrequently, total data volume stays roughly stable, and the table is small, rarely exceeding 100,000 records) can be used as a global table.
Characteristics:
1) Inserts and updates on a global table are executed on all nodes in real time, which keeps the data on every shard consistent. Global tables are expected not to have heavy update traffic.
2) A query on a global table is served from a single node.

3) A global table can be JOINed with any other table.

4) Multiple threads must not update the same record: concurrent UPDATEs to the same record of a global table can deadlock. Batch INSERTs are fine.
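
The first two characteristics can be pictured with a small toy model. This is only a sketch of the idea, not Mycat's implementation: the `GlobalTable` class and its method names are hypothetical, and each data node is modelled as an in-memory dictionary.

```python
import random


class GlobalTable:
    """Toy model of a global table replicated on every data node (hypothetical)."""

    def __init__(self, node_names):
        # One independent copy of the table (primary key -> row) per node.
        self.nodes = {name: {} for name in node_names}

    def upsert(self, key, row):
        # Writes are broadcast: every node receives the same change.
        for table in self.nodes.values():
            table[key] = dict(row)

    def get(self, key, rng=random):
        # Reads are served by a single node; every copy holds the same data.
        node = rng.choice(sorted(self.nodes))
        return node, self.nodes[node].get(key)


company = GlobalTable(["dn1", "dn2", "dn3"])
company.upsert(1, {"ID": 1, "NAME": "acme"})
node, row = company.get(1)
```

Because every node holds a full copy, a JOIN between a sharded table and the global table can be evaluated locally on each shard.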

Configuration:
<table name="company" primaryKey="ID" type="global" dataNode="dn1,dn2,dn3" />
No sharding rule is needed, but DDL statements must be executed on all nodes.
2. ER sharding
Borrowing a design idea from FoundationDB, the storage location of a child table is made to depend on its parent table, and the two are stored physically adjacent, which removes the efficiency and performance problems of JOIN between them. Following this idea, Mycat proposes a data sharding strategy based on ER relationships: child table records are stored on the same data shard as the parent table records they reference.
Some business data, such as orders (order) and order details (order_detail), has a parent-child relationship: the detail table depends on the order table. Tables like these are a good fit for ER sharding, because keeping child rows on the same shard as their parent rows avoids cross-database JOINs.
schema.xml configuration:
Taking order and order_detail as an example, the following sharding configuration is defined in schema.xml. Both tables are sharded by order_id, so rows with the same order_id are assigned to the same shard. On insert, Mycat first determines the shard that holds the order, then inserts the order_detail row into that same shard.

<table name="order" dataNode="dn$1-32" rule="mod-long">
<childTable name="order_detail" primaryKey="id" joinKey="order_id" parentKey="order_id" />
</table>
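
The routing this configuration describes can be sketched as follows. This is an illustration under stated assumptions, not Mycat's code: it assumes the mod-long rule computes "sharding value modulo number of nodes" and that index 0 maps to dn1. The function names are hypothetical.

```python
SHARD_COUNT = 32        # matches dataNode="dn$1-32" in the configuration above
JOIN_KEY = "order_id"   # joinKey of the childTable element


def mod_long(value, count=SHARD_COUNT):
    # Assumed behaviour of the "mod-long" rule: shard index = value mod count.
    return value % count


def data_node(index):
    # Assumption: data nodes are named dn1..dn32, so index 0 maps to dn1.
    return f"dn{index + 1}"


def route_order(order):
    # The parent table is routed by its own sharding rule.
    return data_node(mod_long(order["order_id"]))


def route_order_detail(detail):
    # The child row has no rule of its own: it follows its parent via joinKey.
    return data_node(mod_long(detail[JOIN_KEY]))


order = {"order_id": 33, "amount": 120}
detail = {"id": 7001, "order_id": 33, "sku": "A-1"}
```

Here both `route_order(order)` and `route_order_detail(detail)` give `dn2` (33 mod 32 = 1), so a JOIN on order_id never has to leave the shard.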

3. Catlet ("artificial intelligence", i.e. hand-written code)
Mycat provides an API so that specific SQL JOIN logic in a business system that must cross shards can be handled programmatically.

4. ShareJoin
ShareJoin is a simple cross-shard JOIN implemented on top of HBT. It currently supports joining two tables. It works by parsing the SQL statement, splitting it into single-table SQL statements that are executed separately, and then aggregating the data returned by each node.
ShareJoin is still under development; the first three methods are supported as of 1.3.0.1.
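
The "split into single-table queries, then aggregate" principle can be sketched like this. It is a simplified illustration, not ShareJoin's actual implementation: shards are modelled as in-memory dictionaries, and the helper names `query_all_shards` and `share_join` are hypothetical.

```python
def query_all_shards(shards, table, predicate):
    # Run the same single-table query on every shard and merge the rows.
    rows = []
    for shard in shards:
        rows.extend(r for r in shard.get(table, []) if predicate(r))
    return rows


def share_join(shards, left, right, left_key, right_key):
    # Step 1: single-table query for the left side.
    left_rows = query_all_shards(shards, left, lambda r: True)
    # Step 2: single-table query for the right side, limited to the keys found.
    keys = {r[left_key] for r in left_rows}
    right_rows = query_all_shards(shards, right, lambda r: r[right_key] in keys)
    # Step 3: combine the two result sets in memory (inner join).
    by_key = {}
    for r in right_rows:
        by_key.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left_rows for r in by_key.get(l[left_key], [])]


shards = [
    {"customer": [{"cid": 1, "name": "ann"}], "orders": [{"oid": 10, "cid": 2}]},
    {"customer": [{"cid": 2, "name": "bob"}],
     "orders": [{"oid": 11, "cid": 1}, {"oid": 12, "cid": 2}]},
]
result = share_join(shards, "customer", "orders", "cid", "cid")
```

In this example the customer "bob" and one of his orders live on different shards, yet the joined result still contains the pair because the matching happens after the per-shard rows are collected.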




Sharding rules:

1. Global table


2. ER sharding


3. Many-to-many association

Some business data has the form "main table A + relation table + main table B", for example: merchant + order + member.
Members want to query the orders they placed, and merchants want to query the orders they sold, so how should the relation table be sharded? The general principle at present is to decide from the business point of view which main table the relation table is closer to, that is, whether it is an "A relation" or a "B relation", and to store the relation table on that side.
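
The trade-off can be made concrete with a small sketch. It assumes a modulo rule and a hypothetical shard count of 4; the function name is illustrative only.

```python
SHARDS = 4  # hypothetical shard count


def shards_for_query(shard_column, filter_column, value, shards=SHARDS):
    # A filter on the sharding column can be routed to exactly one shard;
    # a filter on any other column has to be sent to every shard.
    if filter_column == shard_column:
        return [value % shards]
    return list(range(shards))


# Orders stored on the member side (sharded by member_id):
member_query = shards_for_query("member_id", "member_id", 7)
merchant_query = shards_for_query("member_id", "merchant_id", 7)
```

With this choice a member's order query touches one shard while a merchant's query fans out to all of them, which is why the decision should follow the side whose queries matter more.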



4. Primary key and non-primary key sharding
If no field is suitable as a sharding field, sharding by primary key is the only choice. Its advantage is that queries by primary key are fast, and it can also spread the data evenly across the nodes.
If there is a suitable business field, it is recommended to shard on that field. The criteria for choosing a sharding field are:
1. It distributes the data as evenly as possible across the nodes;
2. It is the most frequent or most important query condition.


When you choose a business field as the sharding field, there is no need to worry about "sacrificing primary-key query performance": Mycat provides a "primary key to shard" caching mechanism, so queries by primary key do not suffer a performance penalty once the mapping is cached.


<table name="t_user" primaryKey="user_id" dataNode="dn$1-32" rule="mod-long">
<childTable name="t_user_detail" primaryKey="id" joinKey="user_id" parentKey="user_id" />
</table>


For a table that is not sharded by its primary key, set the primaryKey attribute. Mycat then analyzes the result of the first execution of a query by primary key, determines which shard a given primary key is on, and caches the "primary key --> shard" mapping. Later queries first check the cache for this mapping; if it is present, the query goes directly to that shard, which improves primary-key query performance on tables sharded by a non-primary-key field.
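
The caching behaviour described above can be sketched as follows. This is a minimal model, not Mycat's code: the `PrimaryKeyCache` class is hypothetical, shards are in-memory dictionaries, and cache invalidation is left out.

```python
class PrimaryKeyCache:
    """Toy "primary key -> shard" cache for a table sharded by another field."""

    def __init__(self, shards):
        self.shards = shards   # shard name -> {primary key: row}
        self.cache = {}        # primary key -> shard name
        self.broadcasts = 0    # how many lookups had to ask every shard

    def find(self, pk):
        shard = self.cache.get(pk)
        if shard is not None:
            # Cache hit: go straight to the shard that holds the row.
            return self.shards[shard].get(pk)
        # Cache miss: ask every shard, then remember where the row was found.
        self.broadcasts += 1
        for name, rows in self.shards.items():
            if pk in rows:
                self.cache[pk] = name
                return rows[pk]
        return None


table = PrimaryKeyCache({
    "dn1": {101: {"user_id": 101, "city": "bj"}},
    "dn2": {102: {"user_id": 102, "city": "sh"}},
})
first = table.find(102)
second = table.find(102)
```

Only the first lookup of key 102 is broadcast; the second is answered from the cached mapping, so `table.broadcasts` stays at 1.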


Mycat performance suggestions
1. Prefer INNER JOIN and avoid LEFT JOIN and RIGHT JOIN where possible.
2. With LEFT JOIN and RIGHT JOIN, the ON condition is evaluated first and the WHERE condition afterwards. Where possible, put conditions in the ON clause to reduce the work left for WHERE. For outer joins the two placements are not always equivalent, so check that the result set is unchanged.
3. Use fewer subqueries; use JOIN instead.
