Mycat database middleware

Source: http://blog.csdn.net/nxw_tsp/article/details/56277430

What is Mycat

During my internship, the project manager asked us to migrate the project's existing MySQL data connections to Mycat. That made me wonder: what is Mycat, and why use it?

1. What is Mycat?
Mycat is an open-source distributed database system and a server that implements the MySQL protocol. Front-end users can treat it as a database proxy and access it with MySQL client tools and the command line. On the backend, it can talk to multiple MySQL servers over the native MySQL protocol, or to most mainstream database servers over JDBC. Its core function is sharding: splitting a large table horizontally into N smaller tables stored in back-end MySQL servers or other databases.

In its current version, Mycat is no longer a pure MySQL proxy: its backend supports mainstream databases such as MySQL, SQL Server, Oracle, DB2, and PostgreSQL, as well as the NoSQL store MongoDB, and more storage types are being added. To the end user, whatever the underlying storage, everything in Mycat appears as a traditional database table that can be operated on with standard SQL. For front-end business systems, this greatly reduces development difficulty and speeds up development.



2. So why use Mycat?

Consider an analogy. An operating system is an abstraction over many kinds of computer hardware. When do we need such an abstraction? If there were only one kind of hardware, would we need to develop an operating system at all?
Another example: a project that one person can finish needs no leader, but a project that takes dozens of people needs a manager to communicate and coordinate, and to that manager's superiors, the manager is the abstraction of the whole project team.
Similarly, when an application needs only one database server, it does not need Mycat. But once you need to split databases, or even tables, the application has to deal with many databases, and it becomes necessary to abstract the database layer to manage them. The upper-level application then only deals with a single database-layer abstraction, the database middleware. This is Mycat's core role.
So you can think of it this way: a database is an abstraction of the underlying storage files, and Mycat is an abstraction of databases.

 

Source: http://blog.csdn.net/u013467442/article/details/56955846

Mycat Getting Started Tutorial

Introduction to Mycat

  • The official website has a more detailed introduction; rather than copy and paste it here, please read it there.
  • Official website link

Prerequisites

This tutorial runs on Windows; for production, running on Linux is recommended.
Prerequisites (install these yourself; if you don't know how, please learn the basics first):

  • JDK: 1.7 or later is recommended.
  • MySQL: 5.5 or later is required.

Topology

[Figure: topology diagram]

  • Two tables, users and item, and three databases, db01, db02, and db03 (all three on one MySQL instance).
  • The users table is stored only in db01.
  • The item table is split across db02 and db03.
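
Create db01 and its users table with the following SQL script:

create database db01;
use db01;

CREATE TABLE users (
    id INT NOT NULL AUTO_INCREMENT,
    name varchar(50) NOT NULL default '',
    -- no '0000-00-00 00:00:00' default: zero dates are rejected under MySQL 5.7+ strict mode
    indate DATETIME NOT NULL,
    PRIMARY KEY (id)
) AUTO_INCREMENT=1 ENGINE=InnoDB DEFAULT CHARSET=utf8;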

Create an item table in each of db02 and db03 with the following SQL script:

create database db02;
use db02;

CREATE TABLE item (
    id INT NOT NULL AUTO_INCREMENT,
    value INT NOT NULL default 0,
    -- no zero-date default (rejected under MySQL 5.7+ strict mode)
    indate DATETIME NOT NULL,
    PRIMARY KEY (id)
) AUTO_INCREMENT=1 ENGINE=InnoDB DEFAULT CHARSET=utf8;

create database db03;
use db03;

CREATE TABLE item (
    id INT NOT NULL AUTO_INCREMENT,
    value INT NOT NULL default 0,
    -- no zero-date default (rejected under MySQL 5.7+ strict mode)
    indate DATETIME NOT NULL,
    PRIMARY KEY (id)
) AUTO_INCREMENT=1 ENGINE=InnoDB DEFAULT CHARSET=utf8;

Getting started

  • First, download the installation package from the Mycat official website, as shown below:
    [Figure: Mycat download page]
  • Then edit the three files server.xml, rule.xml, and schema.xml in the conf directory.
  • server.xml mainly configures the Mycat service itself, such as the port numbers and the usernames, passwords, and logical databases used to log in to Mycat.
  • rule.xml mainly configures routing: the sharding column and the partitioning algorithm (modulo, range-based, etc.).
  • schema.xml mainly configures database information: the logical database name, the physical data sources, the mapping between tables and data sources, and the routing rule each table uses.
  • The configuration looks like this:

  • server.xml

<?xml version="1.0" encoding="UTF-8"?>  
<!DOCTYPE mycat:server SYSTEM "server.dtd">  
<mycat:server xmlns:mycat="http://io.mycat/">  
        <system>  
            <!--   
                <property name="processors">32</property>  
                <property name="processorExecutor">32</property>   
                <property name="bindIp">0.0.0.0</property>   
                <property name="frontWriteQueueSize">4096</property>  
                <property name="idleTimeout">300000</property>  
                <property name="mutiNodePatchSize">100</property>  
            -->  
                <property name="defaultSqlParser">druidparser</property>  
                <property name="mutiNodeLimitType">1</property>  
                <property name="serverPort">8066</property>  
                <property name="managerPort">9066</property>   
        </system>  
        <!-- Username, password, and logical database for logging in to Mycat (choose any values) -->
        <user name="test">  
                <property name="password">test</property>  
                <property name="schemas">TESTDB</property>  
        </user>  

        <user name="user">  
                <property name="password">user</property>  
                <property name="schemas">TESTDB</property>  
                <property name="readOnly">true</property>  
        </user>  
        <!--   
        <quarantine>   
           <whitehost>  
              <host host="127.0.0.1" user="mycat"/>  
              <host host="127.0.0.2" user="mycat"/>  
           </whitehost>  
       <blacklist check="false"></blacklist>  
        </quarantine>  
        -->  
</mycat:server>  
  • rule.xml: the routing rule uses id mod 2. Together with schema.xml below, it splits the item table across db02 and db03 by id modulo 2, while the users table is routed directly to db01.
<?xml version="1.0" encoding="UTF-8"?>
<!-- - - Licensed under the Apache License, Version 2.0 (the "License"); 
    - you may not use this file except in compliance with the License. - You 
    may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 
    - - Unless required by applicable law or agreed to in writing, software - 
    distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT 
    WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the 
    License for the specific language governing permissions and - limitations 
    under the License. -->
<!DOCTYPE mycat:rule SYSTEM "rule.dtd">
<mycat:rule xmlns:mycat="http://io.mycat/">

    <tableRule name="role1">
        <rule>
            <columns>id</columns>
            <algorithm>mod-long</algorithm>
        </rule>
    </tableRule>

    <function name="mod-long" class="io.mycat.route.function.PartitionByMod">
        <!-- how many data nodes -->
        <property name="count">2</property>
    </function>
</mycat:rule>
  • schema.xml: fill in the database url, user, and password according to your actual environment.
<?xml version="1.0"?>  
<!DOCTYPE mycat:schema SYSTEM "schema.dtd">  
<mycat:schema xmlns:mycat="http://io.mycat/">  

    <!-- How tables are stored. schema name="TESTDB" must match TESTDB in server.xml -->
    <schema name="TESTDB" checkSQLschema="false" sqlMaxLimit="100">  
        <table name="users" primaryKey="id"  dataNode="node_db01" />  
        <table name="item" primaryKey="id" dataNode="node_db02,node_db03" rule="role1" />  

    </schema>  

    <!-- dataNode: the physical database behind each node, and the dataHost Mycat connects to -->
    <dataNode name="node_db01" dataHost="dataHost01" database="db01" />  
    <dataNode name="node_db02" dataHost="dataHost01" database="db02" />  
    <dataNode name="node_db03" dataHost="dataHost01" database="db03" />  

    <!-- The physical host behind Mycat's logical dataHost, including the MySQL login credentials -->
    <dataHost name="dataHost01" maxCon="1000" minCon="10" balance="0" writeType="0" dbType="mysql" dbDriver="native">  
            <heartbeat>select user()</heartbeat>  
            <writeHost host="server1" url="127.0.0.1:3306" user="root" password="123456"/>  
    </dataHost>  
</mycat:schema>
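To make the routing concrete, here is a minimal Python sketch of how the configuration above maps rows to physical databases. It assumes that the mod-long function (PartitionByMod) computes id mod count and uses the result as an index into the table's dataNode list in the order declared in schema.xml. The table, node, and database names come from the XML; the `route` function and the dictionaries are hypothetical illustrations, not Mycat's API.

```python
# Hypothetical model of the schema.xml + rule.xml configuration above.
# Assumption: mod-long returns id % count, which is used as an index into
# the table's dataNode list in the order it is declared in schema.xml.
TABLES = {
    "users": {"data_nodes": ["node_db01"], "rule": None},
    "item": {"data_nodes": ["node_db02", "node_db03"], "rule": "mod-long"},
}
NODE_DATABASE = {"node_db01": "db01", "node_db02": "db02", "node_db03": "db03"}
MOD_COUNT = 2  # <property name="count">2</property>


def route(table: str, row_id: int) -> str:
    """Return the physical database a row of `table` with this id lands in."""
    conf = TABLES[table]
    if conf["rule"] is None:
        # Unsharded table: every row goes to its single data node.
        node = conf["data_nodes"][0]
    else:
        node = conf["data_nodes"][row_id % MOD_COUNT]
    return NODE_DATABASE[node]


if __name__ == "__main__":
    for table, row_id in [("users", 1), ("users", 2), ("item", 1), ("item", 2)]:
        print(table, row_id, "->", route(table, row_id))
```

Under this assumption, both users rows stay in db01, while the item rows with id 1 and id 2 land in db03 and db02 respectively.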

Starting Mycat

  • On the command line, switch to the bin directory and run the startup command shown below:
    [Figure: startup command]
    After a successful startup, the following output is displayed:
    [Figure: startup output]
    This means the service started successfully.

Verifying the results

  • Connect to the Mycat logical database from the command line:

mysql -utest -ptest -h127.0.0.1 -P8066 -DTESTDB

  • Listing the databases and tables through Mycat shows only the logical database TESTDB, not db01, db02, and db03, and the tables appear together under it rather than spread across separate physical databases, as shown below:
    [Figure: databases and tables as seen through Mycat]

  • Next, insert data through Mycat to check whether it is split according to the routing rules configured above.
  • Execute the following SQL statements to insert data:
insert into users(name,indate) values('kk',now());
insert into users(name,indate) values('ss',now());
insert into item(id,value,indate) values(1,100,now());
insert into item(id,value,indate) values(2,100,now());
  • Then query the data through Mycat; the figure below shows that the inserts succeeded.
    [Figure: query results through Mycat]
  • Next, log in to the physical databases to check whether the table was split; the figure below shows that it was.
    [Figure: query results in the physical databases]
    All users rows are in db01, while the item rows are evenly distributed across db02 and db03 by id modulo 2, so the table has been split according to the configured routing rule.
  • Test complete: the data has been split across databases and tables!

Source: http://blog.csdn.net/u013235478/article/details/53178657

Mycat: From Getting Started to Giving Up

 

1. Queries on non-sharding fields

In Mycat, the routing result is determined by the sharding field and the sharding method. Consider the following Mycat database-sharding scheme:

  • The tt_waybill table is sharded on its id field.
  • The sharding method is id modulo 3; the remainder determines which of DB1, DB2, and DB3 holds the row.

[Figure: non-sharding field query]

If the query condition includes the id field, the query lands on one specific shard. For example:

mysql>select * from tt_waybill where id = 12330;

Mycat computes the routing result:

12330 % 3 = 0 –> DB1

It then routes the request to DB1 for execution.


If the query condition contains no sharding field, for example:

mysql>select * from tt_waybill where waybill_no =88661;

Mycat cannot compute a route, so the query is sent to every node for execution:

DB1 –> select * from tt_waybill where waybill_no =88661;
DB2 –> select * from tt_waybill where waybill_no =88661;
DB3 –> select * from tt_waybill where waybill_no =88661;

If the non-sharding field is highly selective and is also a common query dimension for the business, usually only one or a very few DB nodes actually hit (return a result set). This example has only 3 DB nodes, but real deployments have far more. With 50 DB nodes, a single front-end query becomes 50 queries against the MySQL databases, consuming a great deal of Mycat and MySQL resources.
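The routing decision described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the query condition is passed as a dict of equality conditions (real Mycat parses the SQL WHERE clause), and `route`, `NODES`, and `SHARD_COLUMN` are hypothetical names. It shows why a query without the sharding column fans out to every node.

```python
# Toy model of routing for tt_waybill, sharded by id % 3 (as in the example).
# Hypothetical: conditions are given as {column: value} equality checks.
NODES = ["DB1", "DB2", "DB3"]
SHARD_COLUMN = "id"


def route(conditions: dict) -> list:
    """Return the nodes a SELECT with these equality conditions is sent to."""
    if SHARD_COLUMN in conditions:
        # Sharding column present: exactly one node can hold the row.
        return [NODES[conditions[SHARD_COLUMN] % len(NODES)]]
    # No sharding column: the query must be broadcast to every node.
    return list(NODES)


if __name__ == "__main__":
    print(route({"id": 12330}))          # routed to a single node
    print(route({"waybill_no": 88661}))  # broadcast to all nodes
```

With 50 entries in `NODES`, the second call would return all 50 nodes, which is the 50-fold query amplification described above.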

If your design relies on querying Mycat by non-sharding fields, consider giving up!

2. Paging with sorting

Let's look at how Mycat handles paging. Suppose the following Mycat database-sharding scheme:
a table's 30 rows are distributed across 3 sharded DBs as follows:

DB1:[0,1,2,3,4,10,11,12,13,14]
DB2:[5,6,7,8,9,16,17,18,19]
DB3:[20,21,22,23,24,25,26,27,28,29]

(This example has no query conditions, so the query covers all shards; the table's sharding field and sharding method are not specified.)

[Figure: pagination]

When the application executes the following paging query:

mysql>select * from table limit 2;

Mycat sends the SQL to every DB node and receives each node's results:

DB1: [0,1]
DB2: [5,6]
DB3: [20,21]

However, the result set Mycat returns to the application depends on which DB node answers first. If Mycat receives DB1's result set first, the application gets [0,1]; if it receives DB2's first, the application gets [5,6]. In other words, the same SQL can return different results on Mycat under identical conditions.
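The arrival-order effect can be reproduced with a small Python sketch. It assumes, for illustration only, that the middleware appends shard results in the order they arrive and then applies the limit; `SHARDS` is the example distribution above, and `limit_without_order` is a hypothetical helper, not part of Mycat.

```python
# Example data distribution from the text (shard name -> ids it stores).
SHARDS = {
    "DB1": [0, 1, 2, 3, 4, 10, 11, 12, 13, 14],
    "DB2": [5, 6, 7, 8, 9, 16, 17, 18, 19],
    "DB3": [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
}


def limit_without_order(limit: int, arrival_order: list) -> list:
    """Simulate `select * from table limit <limit>` with no ORDER BY.

    Assumption: each shard returns its first `limit` stored rows, the
    middleware appends result sets in arrival order, then truncates.
    """
    merged = []
    for shard in arrival_order:
        merged.extend(SHARDS[shard][:limit])
    return merged[:limit]


if __name__ == "__main__":
    print(limit_without_order(2, ["DB1", "DB2", "DB3"]))  # DB1 answered first
    print(limit_without_order(2, ["DB2", "DB3", "DB1"]))  # DB2 answered first
```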

When paging in Mycat, the sort condition must be specified explicitly to guarantee correct results. Let's look at how Mycat handles sorted paging.
Add a sort condition to the previous paging query (assuming the table's column is named id):

mysql>select * from table order by id limit 2;

The processing logic of Mycat is as follows:

[Figure: sorted pagination]

With a sort condition, Mycat receives each DB node's results, merges them with a min-heap, picks the two smallest records overall, [0,1], and returns them to the application.
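A minimal Python sketch of this sorted merge, under the assumption that each shard returns its own sorted top rows. Python's `heapq.merge` performs a k-way merge of sorted inputs using a min-heap, which stands in for Mycat's heap step here; `sorted_limit` is a hypothetical helper.

```python
import heapq
from itertools import islice

# Example data distribution from the text (shard name -> ids it stores).
SHARDS = {
    "DB1": [0, 1, 2, 3, 4, 10, 11, 12, 13, 14],
    "DB2": [5, 6, 7, 8, 9, 16, 17, 18, 19],
    "DB3": [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
}


def sorted_limit(limit: int) -> list:
    """Simulate `select * from table order by id limit <limit>` across shards."""
    # Each shard sorts locally and returns only its `limit` smallest ids.
    per_shard = [sorted(ids)[:limit] for ids in SHARDS.values()]
    # k-way merge of the sorted partial results with a min-heap;
    # the first `limit` merged items are the global smallest ids.
    return list(islice(heapq.merge(*per_shard), limit))


if __name__ == "__main__":
    print(sorted_limit(2))
```

Because every shard's partial result is already sorted, the merge never needs to hold more than one candidate per shard in the heap at a time.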

However, with an offset the processing is different. Suppose the application's query is:

mysql>select * from table order by id limit 5,2;

Processing it with the sorted-paging logic above would give the following result:

[Figure: sorted pagination with offset]

Each DB node returns [10,11], [16,17], and [25,26] respectively, and after the min-heap merge Mycat returns [10,11] to the application. But from the application's point of view, the table's data clearly runs from 0 to 29, so limit 5,2 should return [5,6]; returning [10,11] is wrong.

So for sorted paging with an offset, Mycat uses different logic: it rewrites the SQL, as shown below:

[Figure: correct sorted pagination with offset]

When dispatching a statement with limit m,n, Mycat rewrites it as limit 0,m+n to guarantee a logically correct result. The SQL Mycat sends to each backend DB is therefore:

mysql>select * from table order by id limit 0,7;

The result set returned by each DB to Mycat is

DB1: [0,1,2,3,4,10,11]
DB2: [5,6,7,8,9,16,17]
DB3: [20,21,22,23,24,25,26]

After the min-heap merge, Mycat obtains the smallest sequence [0,1,2,3,4,5,6], skips the first 5 rows, and returns the next two, [5,6].
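The difference between pushing limit m,n down unchanged and rewriting it to limit 0,m+n can be checked with a short Python sketch. It uses the example data above; `shard_query`, `naive_offset_limit`, and `rewritten_offset_limit` are hypothetical helpers modeling the behavior described in the text, not Mycat code.

```python
import heapq
from itertools import islice

# Example data distribution from the text (shard name -> ids it stores).
SHARDS = {
    "DB1": [0, 1, 2, 3, 4, 10, 11, 12, 13, 14],
    "DB2": [5, 6, 7, 8, 9, 16, 17, 18, 19],
    "DB3": [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
}


def shard_query(ids: list, offset: int, count: int) -> list:
    """What one shard returns for `order by id limit offset,count`."""
    return sorted(ids)[offset:offset + count]


def naive_offset_limit(m: int, n: int) -> list:
    """Incorrect: send `limit m,n` unchanged to every shard, then merge."""
    partial = [shard_query(ids, m, n) for ids in SHARDS.values()]
    return list(islice(heapq.merge(*partial), n))


def rewritten_offset_limit(m: int, n: int) -> tuple:
    """Rewrite to `limit 0,m+n` per shard, merge, then skip the first m rows.

    Also returns how many rows the middleware handled: (m+n)*t when every
    one of the t shards holds at least m+n rows.
    """
    partial = [shard_query(ids, 0, m + n) for ids in SHARDS.values()]
    rows_handled = sum(len(rows) for rows in partial)
    merged = list(islice(heapq.merge(*partial), m + n))
    return merged[m:], rows_handled


if __name__ == "__main__":
    print(naive_offset_limit(5, 2))
    print(rewritten_offset_limit(5, 2))
```

For limit 5,2 the naive version returns [10,11], while the rewritten version returns [5,6] after handling 21 rows, matching (m+n)*t = 7*3.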

Although Mycat returns the correct result, a closer look shows that this processing is extremely wasteful. The application needs 2 rows, but Mycat has to process 21. In general, for a full-shard limit m,n over t DB nodes, Mycat processes (m+n)*t rows. With 50 DB nodes in a real deployment, a limit 1000,10 query makes Mycat process 50,500 rows to return 10. The larger the offset, the more memory and CPU this consumes, growing many times over.

If your design relies on sorted paging in Mycat, consider giving up!

3. JOINs across arbitrary tables

First, consider a JOIN within a single database. Suppose it holds two tables, player and team, where the team_id field of player references the id field of team. The scenario is as follows:

[Figure: JOIN in a single DB]

The SQL for the JOIN is:

mysql>select p_name,t_name from player p, team t where p.no = 3 and p.team_id = t.id;

The query returns:

p_name  t_name
Wade    Heat


If both tables are sharded, the related rows may sit on different DB nodes, as shown below:

[Figure: JOIN across shards]

Run on each shard separately, this SQL finds no match, which means Mycat cannot return the correct result set.

When designing with Mycat, if you need JOINs between tables, make sure the join fields of the two tables have the same data distribution; otherwise, consider giving up!
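The following Python sketch illustrates why the data distribution of the join fields matters. It models a JOIN that is simply pushed down to each shard, with results concatenated. The row values other than p.no = 3, "Wade", and "Heat" (namely team_id = 2 and id = 2), the two-shard layout, and all helper names are hypothetical.

```python
# Hypothetical rows: only p.no = 3, "Wade" and "Heat" come from the text;
# team_id = 2 / id = 2 are made-up values for illustration.
PLAYERS = [{"no": 3, "p_name": "Wade", "team_id": 2}]
TEAMS = [{"id": 2, "t_name": "Heat"}]
SHARD_COUNT = 2


def shard_by(rows: list, column: str) -> list:
    """Split rows into SHARD_COUNT shards using `column % SHARD_COUNT`."""
    shards = [[] for _ in range(SHARD_COUNT)]
    for row in rows:
        shards[row[column] % SHARD_COUNT].append(row)
    return shards


def join_on_each_shard(player_shards: list, team_shards: list) -> list:
    """Run the JOIN locally on every shard and concatenate the results
    (a JOIN pushed down to each shard, with no cross-shard data movement)."""
    result = []
    for players, teams in zip(player_shards, team_shards):
        for p in players:
            for t in teams:
                if p["no"] == 3 and p["team_id"] == t["id"]:
                    result.append((p["p_name"], t["t_name"]))
    return result


# player sharded by no, team by id: Wade (shard 1) and Heat (shard 0) are split.
split = join_on_each_shard(shard_by(PLAYERS, "no"), shard_by(TEAMS, "id"))
# player sharded by team_id with the same rule as team.id: rows are co-located.
co_located = join_on_each_shard(shard_by(PLAYERS, "team_id"), shard_by(TEAMS, "id"))

if __name__ == "__main__":
    print(split)
    print(co_located)
```

When both tables are distributed by the join key with the same rule, the matching rows end up on the same shard and the local JOIN finds them; otherwise it silently returns nothing.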

4. Distributed Transactions

Mycat does not implement XA transactions according to the two-phase commit protocol. It implements only a weak form of XA that guarantees data consistency during the prepare phase. The process is as follows:

When the application opens a transaction, for example by running the following on the front end, Mycat marks the connection as non-autocommit:

mysql>begin;

Mycat does not forward the command to the DB nodes immediately. When SQL is issued later, Mycat takes non-autocommit connections from the connection pool to execute it.

[Figure: weak XA transaction, step 1]

Mycat waits for the result from each node. If every node succeeds, Mycat marks the connection as Prepare Ready; if any node fails, it marks the connection as Rollback.

[Figure: weak XA transaction, step 2]

After execution, Mycat waits for the front end to send commit or rollback. On commit, Mycat checks whether the current connection is in the Prepare Ready state and, if so, sends commit to every DB node.

[Figure: weak XA transaction, step 3]

However, consistency cannot be guaranteed at this stage. If one DB node fails during commit while the others commit successfully, Mycat keeps waiting for the failed node to respond, because it only returns a success packet to the front end after every DB node reports success. Mycat can only wait until TIMEOUT, and transaction consistency is broken.
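The weak XA flow can be modeled as a toy state machine in Python to show where consistency breaks. This is a sketch of the behavior described in this section, not Mycat's implementation; the class name, the `failing_nodes` and `crash_on_commit` parameters, and the rule that a connection in the Rollback state discards its pending writes on commit are all hypothetical.

```python
from enum import Enum


class State(Enum):
    PREPARE_READY = "Prepare Ready"
    ROLLBACK = "Rollback"


class WeakXACoordinator:
    """Toy model of the weak XA flow described above (not Mycat's code)."""

    def __init__(self, nodes: dict):
        self.nodes = nodes   # node name -> list of committed rows
        self.pending = {}    # node name -> rows written in this transaction
        self.state = None

    def execute(self, writes: list, failing_nodes=()) -> None:
        """Run statements on each node's non-autocommit connection."""
        ok = True
        for node, row in writes:
            if node in failing_nodes:
                ok = False  # this node reported an execution error
            else:
                self.pending.setdefault(node, []).append(row)
        self.state = State.PREPARE_READY if ok else State.ROLLBACK

    def commit(self, crash_on_commit=()) -> str:
        """Send COMMIT to every node; there is no recovery if one fails."""
        if self.state is not State.PREPARE_READY:
            self.pending.clear()  # hypothetical: discard writes instead
            return "rolled back"
        all_acked = True
        for node, rows in self.pending.items():
            if node in crash_on_commit:
                all_acked = False  # Mycat keeps waiting until TIMEOUT
                continue
            self.nodes[node].extend(rows)
        self.pending.clear()
        return "ok" if all_acked else "timeout"


if __name__ == "__main__":
    xa = WeakXACoordinator({"DB1": [], "DB2": []})
    xa.execute([("DB1", "a"), ("DB2", "b")])
    print(xa.commit(crash_on_commit={"DB2"}), xa.nodes)
```

In the usage example, DB1 has committed its row while DB2 has not, which is exactly the partial commit that breaks transaction consistency.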

When designing with Mycat, if your workload involves distributed transactions, first check whether they must be strongly consistent; if they must, consider giving up!


Origin http://43.154.161.224:23101/article/api/json?id=326180940&siteId=291194637