Database and Table Sharding
1. Overview
1.1 Reasons for sharding
Limitations of a single database
- IO bottleneck: too much hot data and an undersized database cache cause heavy disk IO and poor efficiency; large request volumes can also exhaust the bandwidth, a network IO bottleneck.
- CPU bottleneck: SQL for sorting, grouping, joins, aggregate statistics, and the like consumes a lot of CPU; with too many requests the CPU becomes the bottleneck.
To solve these problems, we shard the database: data is stored in a distributed fashion, reducing the volume held by any single database or table. This relieves the performance pressure on a single database and improves database performance overall.
1.2 Split strategy
Sharding takes two main forms: vertical splitting and horizontal splitting .
The granularity of a split is either the database or the table .
Splitting databases : data stored in one database is distributed across multiple databases
Splitting tables : data stored in one table structure is now distributed across multiple table structures
1.2.1 Vertical Split
Vertical database split : based on tables, different tables are split into different databases according to business.
Features
- The table structure in each database is different
- The data in each database is different
- The union of all databases is the full data set.
Vertical table split : based on fields, different fields are split into different tables according to field attributes. Data that was stored in one table is now scattered across several table structures, which may sit on different servers; the resulting tables are associated through a primary key or foreign key.
Features
- The structure of each table is different
- The data in each table is also different, generally associated through one column (primary key/foreign key).
- The union of all tables is the full data set.
1.2.2 Horizontal split
Horizontal database split : based on a field and according to a certain strategy, the data of one database is split across multiple databases .
Features :
- The table structure of each library is the same.
- The data in each library is different.
- The union of all libraries is the full amount of data.
Horizontal table split : based on a field and according to a certain strategy, the data of one table is split across multiple tables .
Features :
- The table structure of each table is the same.
- The data in each table is different.
- The union of all tables is the full amount of data.
1.3 Implementation Technology
After the database is split, the application needs to access multiple databases.
For every operation the application must decide which database to hit based on the current business context, which makes the code harder to write and tedious to maintain.
Several technologies have emerged to solve the sharding problem:
- ShardingJDBC: based on AOP, it intercepts locally executed SQL inside the application, then parses, rewrites, and routes it. It requires coding and configuration, supports only Java, and offers high performance.
- MyCat: database sharding middleware that enables sharding without code changes and supports multiple languages, though its performance is lower than ShardingJDBC's.
With MyCat there is no need to work out which database to connect to and operate on for each request (the application accesses MyCat directly), no third-party dependency to integrate into the application, and no extra coding or configuration.
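To make the pain point concrete, here is a rough sketch of the routing logic an application must hand-roll without middleware. All names here (the DSN list, the `pick_shard` helper, the modulo strategy) are hypothetical and purely illustrative:

```python
# Hypothetical sketch of application-side routing without middleware;
# the DSNs and the modulo strategy are illustrative only.

SHARD_DSNS = [
    "mysql://192.168.200.210:3306/db01",
    "mysql://192.168.200.213:3306/db01",
    "mysql://192.168.200.214:3306/db01",
]

def pick_shard(order_id: int) -> str:
    """Choose the shard DSN holding this order (simple modulo strategy)."""
    return SHARD_DSNS[order_id % len(SHARD_DSNS)]

# Every data access now needs this extra routing step, scattered
# throughout the code base -- the tedium the middleware removes.
dsn = pick_shard(10000001)
```

Middleware such as MyCat moves this decision out of the application entirely.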
2. Install MyCat
2.1 Introduction
MyCat is an open-source, actively maintained, Java-based MySQL database middleware.
You can use MyCat just like MySQL; developers do not feel its presence at all.
Developers only connect to MyCat; they need not care how many databases sit underneath or what data each database server stores. The sharding strategy is configured entirely in MyCat.
Protocol disguise : MyCat implements the MySQL wire protocol, so we can treat MyCat as if it were MySQL.
From the application's point of view it makes no difference whether it talks to MyCat or MySQL: the application only swaps its MySQL connection for a MyCat connection, and the driver stays the same.
Advantages
- Reliable and stable performance
- Strong technical team
- Mature ecosystem
- Active community
MyCat's overall architecture is divided into two parts: an upper logical structure and a lower physical structure.
Logical library : a logical database stores no actual data; the data lives in the physical structure.
A logical library can contain several logical tables, and a logical table is associated with several shard nodes, also called data nodes.
Shard node : also called a data node. The data of tableA is scattered across the 3 shard nodes; which rows of tableA go to the first shard node and which to the second is determined by the sharding rules.
The three shard nodes are associated with three databases. The underlying physical structure consists of the real databases, and the host where each database resides is called the node host .
MyCat itself stores no data; it only performs logical sharding, aggregation, and similar operations. The actual data still lives in the underlying MySQL databases.
2.2 Installation
Download address : https://github.com/MyCATApache/Mycat-Server/releases
Unzip the archive
Configure the environment variables
Add the bin directory to PATH
Then double-click startup_nowrap.bat in the bin directory; if it starts successfully, output confirming startup is displayed
3. Getting Started with MyCat
Because the tb_order table holds a large amount of data, disk IO and capacity have hit a bottleneck. We now shard the data in tb_order across three nodes, with each node host on a different server.
The table structure stored in the three databases is the same, but the data differs.
3.1 Environment preparation
Four servers: the MyCat middleware server, plus three database servers with the database db01 created on each.
Do not create tables or run inserts, deletes, or updates on the three databases directly. From now on all operations go through MyCat, where we configure the sharding strategy for the tb_order table.
3.2 Sharding configuration
3.2.1 schema.xml
In conf/schema.xml we configure the corresponding <schema> , i.e. the logical library, and inside it the logical table (here tb_order), the data nodes, and the node hosts the data nodes map to.
Sharding rules : how the data in this table is split is determined by the rule attribute.
<schema> : configures the logical library
<table> : configures the logical table; the dataNode attribute lists the data nodes this logical table spans
<dataNode> : the database attribute names the associated physical database
<dataHost> : the dbDriver attribute takes one of two values, native or jdbc. Here we choose jdbc, since native support for MySQL 8.0 is not yet complete
<writeHost> : holds the connection information for the associated database
<?xml version="1.0"?>
<!DOCTYPE mycat:schema SYSTEM "schema.dtd">
<mycat:schema xmlns:mycat="http://io.mycat/">
<schema name="DB01" checkSQLschema="true" sqlMaxLimit="100" randomDataNode="dn1">
<table name="tb_order" dataNode="dn1,dn2,dn3" rule="auto-sharding-long" splitTableNames="true"/>
</schema>
<dataNode name="dn1" dataHost="dhost1" database="db01" />
<dataNode name="dn2" dataHost="dhost2" database="db01" />
<dataNode name="dn3" dataHost="dhost3" database="db01" />
<dataHost name="dhost1" maxCon="1000" minCon="10" balance="0"
writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1" slaveThreshold="100">
<heartbeat>select user()</heartbeat>
<writeHost host="hostM1"
url="jdbc:mysql://192.168.200.210:3306?serverTimezone=Asia/Shanghai&amp;useUnicode=true&amp;characterEncoding=utf-8&amp;zeroDateTimeBehavior=convertToNull&amp;useSSL=false&amp;allowPublicKeyRetrieval=true"
user="root"
password="1234">
</writeHost>
</dataHost>
<dataHost name="dhost2" maxCon="1000" minCon="10" balance="0"
writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1" slaveThreshold="100">
<heartbeat>select user()</heartbeat>
<writeHost host="hostM1"
url="jdbc:mysql://192.168.200.213:3306?serverTimezone=Asia/Shanghai&amp;useUnicode=true&amp;characterEncoding=utf-8&amp;zeroDateTimeBehavior=convertToNull&amp;useSSL=false&amp;allowPublicKeyRetrieval=true"
user="root"
password="1234">
</writeHost>
</dataHost>
<dataHost name="dhost3" maxCon="1000" minCon="10" balance="0"
writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1" slaveThreshold="100">
<heartbeat>select user()</heartbeat>
<writeHost host="hostM1"
url="jdbc:mysql://192.168.200.214:3306?serverTimezone=Asia/Shanghai&amp;useUnicode=true&amp;characterEncoding=utf-8&amp;zeroDateTimeBehavior=convertToNull&amp;useSSL=false&amp;allowPublicKeyRetrieval=true"
user="root"
password="1234">
</writeHost>
</dataHost>
</mycat:schema>
3.2.2 server.xml
This file configures which users may access MyCat, and which logical libraries and tables in MyCat they may access.
For example, below, the user logs in as root/123456. The template grants access to the logical library TESTDB, but the logical library we access is DB01, so that value must be changed to DB01.
<!-- read-only: the user may read but not write -->
<property name="readOnly">true</property>
You need to configure the user name, password, and user access rights information in server.xml. The specific configuration is as follows:
<user name="root" defaultAccount="true">
<property name="password">123456</property>
<property name="schemas">DB01</property>
<property name="defaultSchema">DB01</property>
<!-- The schema tried before reporting "No MyCAT Database selected"; if unset it is null and an error is raised -->
<!-- Table-level DML privilege settings -->
<!--
<privileges check="false">
<schema name="TESTDB" dml="0110" >
<table name="tb01" dml="0000"></table>
<table name="tb02" dml="1111"></table>
</schema>
</privileges>
-->
</user>
<user name="user">
<property name="password">123456</property>
<property name="schemas">DB01</property>
<property name="readOnly">true</property>
<property name="defaultSchema">DB01</property>
</user>
3.3 Start the test
Switch to the installation directory of MyCat and execute the following command to start MyCat:
MyCat listens on port 8066
# start
bin/mycat start
# stop
bin/mycat stop
Connect and log into MyCat
That is, we connect to MyCat with ordinary MySQL commands, because MyCat emulates the MySQL protocol at the bottom layer.
mysql -h 192.168.200.210 -P 8066 -uroot -p123456
In schema.xml we configured a tb_order logical table under the DB01 logical library. After switching with use DB01, running show tables; lists a tb_order table, but this table exists only logically in MyCat; it does not yet exist in any physical database.
show tables;
First create the table structure through MyCat
Copy the following statements and execute them on the MyCat command line
CREATE TABLE TB_ORDER (
id BIGINT(20) NOT NULL,
title VARCHAR(100) NOT NULL ,
PRIMARY KEY (id)
) ENGINE=INNODB DEFAULT CHARSET=utf8 ;
INSERT INTO TB_ORDER(id,title) VALUES(1,'goods1');
INSERT INTO TB_ORDER(id,title) VALUES(2,'goods2');
INSERT INTO TB_ORDER(id,title) VALUES(3,'goods3');
INSERT INTO TB_ORDER(id,title) VALUES(5000000,'goods5000000');
INSERT INTO TB_ORDER(id,title) VALUES(10000000,'goods10000000');
INSERT INTO TB_ORDER(id,title) VALUES(10000001,'goods10000001');
INSERT INTO TB_ORDER(id,title) VALUES(15000000,'goods15000000');
INSERT INTO TB_ORDER(id,title) VALUES(15000001,'goods15000001');
How is the data we inserted above distributed among the three databases?
This is determined by the rule attribute configured on the logical table in schema.xml (auto-sharding-long).
auto-sharding-long refers to one of the sharding rules MyCat ships with, defined in the configuration file rule.xml.
In this sharding rule, sharding is based on the id column.
The algorithm element of the rule, rang-long, is itself a reference to a function defined further down in rule.xml.
That function has a <property> tag whose name attribute is "mapFile"; mapFile names a mapping file, the physical file autopartition-long.txt, which specifies:
- If the value of id is between 1 and 5,000,000, the data is stored in the first shard database.
- If the value of id is between 5,000,000 and 10,000,000, the data is stored in the second shard database.
- If the value of id is between 10,000,000 and 15,000,000, the data is stored in the third shard database.
- If the value of id exceeds 15,000,000, an error is reported when inserting the data.
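The range rule above can be sketched as follows: parse ranges in the style of autopartition-long.txt (MyCat's default file uses K=1000 and M=10000) and route an id to a shard index. This illustrates the idea, not MyCat's exact implementation:

```python
# Sketch of the auto-sharding-long idea; illustrative only.

MAP_FILE = """\
0-500M=0
500M-1000M=1
1000M-1500M=2
"""

UNITS = {"K": 1_000, "M": 10_000}  # note: in this file M means 10,000, not a million

def parse_bound(text: str) -> int:
    """Expand a bound like '500M' into a plain integer."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

def route(record_id: int) -> int:
    """Return the shard index for record_id, or raise if out of range."""
    for line in MAP_FILE.strip().splitlines():
        bounds, node = line.split("=")
        lo, hi = (parse_bound(p) for p in bounds.split("-"))
        if lo <= record_id <= hi:
            return int(node)
    # MyCat likewise reports an error when the id matches no range
    raise ValueError(f"id {record_id} is outside every configured range")
```

For example, `route(10000001)` returns 2 (the third shard), and an id above 15,000,000 raises, mirroring the insert error described above.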
4. MyCat Configuration
4.1 schema.xml configuration file
It covers the configuration of MyCat's logical libraries, logical tables, sharding rules, shard nodes, and data sources.
The main tags involved
- schema tag
- dataNode tag
- dataHost tag
4.1.1 schema tags
The schema tag defines a logical library in a MyCat instance . One MyCat instance can hold several logical libraries (one schema element each); different logical libraries are separated by their schema tags.
The logical library in MyCat is equivalent to the database concept in MySQL. To operate on a table under a given logical library, you likewise switch to it first (use xxx, e.g. use DB01 switches to the DB01 logical library; note this is case-sensitive ).
Core attributes
- name : the custom logical library name
- checkSQLschema : whether a database-name prefix written in a SQL statement is automatically removed during execution; true: removed automatically, false: not removed. For example, with true the DB01. prefix in SELECT * FROM DB01.tb_order is stripped before routing
- sqlMaxLimit : when a query specifies no limit, the maximum number of records returned in list-query mode, since querying a full table is too costly
4.1.1.1 table label
The table tag defines a logical table under a logical library (schema) in MyCat ; every table that needs to be split must be declared in a table tag.
Multiple logical tables can be configured under one schema.
Attributes
- name : the logical table name, unique within the logical library
- dataNode : the data nodes the logical table belongs to, matching the name attributes of dataNode tags; multiple data nodes are separated by commas
- rule : the name of the sharding rule, defined in rule.xml
- primaryKey : the primary key of the real table backing the logical table
- type : the logical table type. Currently there are only global tables and ordinary tables; if not configured, the table is ordinary, and a global table is configured as global
4.1.2 dataNode label
The dataNode tag defines the data nodes in MyCat , which are what we usually call data shards .
One dataNode tag is one independent data shard.
Core attributes
- name : Define the data node name
- dataHost : the host name of the database instance, referenced from the name attribute in the dataHost tag
- database : Define the database to which the shard belongs
4.1.3 dataHost tag
This tag sits at the bottom of the MyCat logical structure; it defines the concrete database instances, read/write splitting, and heartbeat statements.
Core attributes
- name : unique identifier, used by the upper label
- maxCon/minCon : maximum number of connections/minimum number of connections
- balance : load balancing strategy, value 0,1,2,3
- writeType : write operation distribution method (0: write operation is forwarded to the first writeHost, if the first one hangs up, switch to the second one; 1: write operation is randomly distributed to the configured writeHost)
- dbDriver : database driver, supports native, jdbc (if we are using MySQL8 or later, this place can be replaced with jdbc)
4.2 rule.xml
All table-splitting rules are defined in rule.xml . Sharding algorithms can be applied flexibly, and the same algorithm can be used with different parameters, which makes the sharding process configurable.
The rule attribute configured on the logical table in schema.xml ultimately references a rule defined in rule.xml.
4.3 server.xml
The server.xml configuration file contains the system configuration information of MyCat. There are two important tags: system and user
4.3.1 system label
Configures MyCat's system settings (runtime environment information). The configuration items and their meanings are listed below.
Attribute | Value | Meaning |
---|---|---|
charset | utf8 | Set the character set of Mycat, the character set needs to be consistent with the character set of MySQL |
nonePasswordLogin | 0,1 | 0 means a password is required to log in, 1 means a password is not required to log in, the default is 0, if it is set to 1, a default account needs to be specified |
useHandshakeV10 | 0,1 | The main purpose of using this option is to be compatible with higher versions of the jdbc driver, whether to use HandshakeV10Packet to communicate with the client, 1: yes, 0: no |
useSqlStat | 0,1 | Enable real-time SQL statistics, 1 enabled, 0 disabled; once enabled, MyCat automatically tracks SQL execution; connect with mysql -h 127.0.0.1 -P 9066 -u root -p to view the SQL MyCat has executed, slow SQL, overall execution counts, read/write ratio, etc.; show @@sql ; show @@sql.slow ; show @@sql.sum ; |
useGlobleTableCheck | 0,1 | Whether to enable the consistency check of the global table. 1 is on, 0 is off. |
sequnceHandlerType | 0,1,2 | It is used to specify the Mycat global sequence type, 0 is a local file, 1 is a database method, and 2 is a timestamp column method. By default, the local file method is used. The file method is mainly used for testing |
sequenceHandlerPattern | regular expression | Statements must carry MYCATSEQ_ or mycatseq_ to enter the sequence-matching flow; beware the case where MYCATSEQ_ is followed by a space |
subqueryRelationshipCheck | true,false | If there is an associated query in the subquery, check whether there is a fragment field in the associated field. The default is false |
useCompression | 0,1 | Enable mysql compression protocol, 0: off, 1: on |
fakeMySQLVersion | 5.5,5.6 | Set the simulated MySQL version number |
defaultSqlParser | druidparser,fdbparser | The earliest MyCat versions used FoundationDB's SQL parser; the Druid parser was added in MyCat 1.3, so defaultSqlParser specifies which parser is the default. Since MyCat 1.4 the default is druidparser, and fdbparser has been abolished |
processors | 1,2… | The number of threads available to the system. The default is CPU cores x threads per core; processors affects the processorBufferPool, processorBufferLocalPercent, and processorExecutor properties, so its value can be adjusted appropriately during performance tuning |
processorBufferChunk | 4096 | The size in bytes of each Socket Direct Buffer allocation, default 4096; it also affects the BufferPool length. If a single read fetches more bytes than the buffer holds, a warning appears and the value can be increased |
processorExecutor | Specifies the size of the shared businessExecutor fixed thread pool on NIOProcessor; MyCat hands asynchronous tasks to the businessExecutor thread pool. In the new version of MyCat, this connection pool is not used frequently, so the value can be appropriately reduced | |
packetHeaderSize | 4 | The length of the header in the MySQL protocol, default 4 bytes |
maxPacketSize | 16M | The maximum payload size the MySQL protocol can carry, default 16M |
idleTimeout | 30 | Specify the timeout length of the idle time of the connection; if it times out, the resource will be closed and recycled, the default is 30 minutes |
txIsolation | 1,2,3,4 | The initial transaction isolation level for front-end connections, default REPEATABLE_READ (3); READ_UNCOMMITTED=1; READ_COMMITTED=2; REPEATABLE_READ=3; SERIALIZABLE=4 |
sqlExecuteTimeout | 300 | The timeout period for executing SQL, if the SQL statement execution times out, the connection will be closed; the default is 300 seconds; |
serverPort | 8066 | Define the port used by MyCat, the default is 8066 |
managerPort | 9066 | Define the management port of MyCat, the default is 9066 |
4.3.2 user label
Configure users and their permission information
If no privileges are configured, the user may perform any operation on all logical tables in its logical libraries.
To restrict this, configure permissions in the privileges tag:
dml="0000" means no permissions
dml="1111" grants insert, update, select, and delete
dml="1110" grants insert, update, and select
If the permissions configured for the logical library and a logical table differ, the table-level (table) permission takes precedence.
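The four-bit dml string can be decoded as sketched below, reading the bits in the order insert, update, select, delete (the add/modify/check/delete order used above); the helper name is our own, for illustration:

```python
# Sketch of decoding MyCat's four-bit dml privilege string.
# Bit order: insert, update, select, delete.

def parse_dml(dml: str) -> dict:
    ops = ("insert", "update", "select", "delete")
    return {op: bit == "1" for op, bit in zip(ops, dml)}

# dml="0110": update and select allowed; insert and delete denied
perms = parse_dml("0110")
```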
5. MyCat Sharding
Vertical table split : splitting one table structure into several, with the resulting tables associated through a primary key or foreign key. Vertical table splitting must be handled in the business program itself.
So we will not demonstrate the vertical table split; let's look at the vertical database split.
5.1 Vertical sub-library
5.1.1 Scenarios
The business system involves the table structure below. Since users and orders generate a large amount of data every day, and the data storage and processing capacity of a single server is limited, the database tables can be split.
Now consider the vertical database split: commodity-related tables go to one database server, the order tables to a second, and the user and province/city tables to a third.
5.1.2 Server
Three servers, with the database shopping created on 192.168.200.210, 192.168.200.213, and 192.168.200.214.
5.1.3 schema.xml configuration
We need to configure the corresponding logical library and logical tables, then specify which shard each logical table falls on, i.e. which data node it is associated with.
primaryKey : the primary key of the real table backing the logical table
We do not specify a rule below; rules are only needed when splitting tables. This is a vertical database split, so no sharding rule is required.
<schema name="SHOPPING" checkSQLschema="true" sqlMaxLimit="100">
<table name="tb_goods_base" dataNode="dn1" primaryKey="id" />
<table name="tb_goods_brand" dataNode="dn1" primaryKey="id" />
<table name="tb_goods_cat" dataNode="dn1" primaryKey="id" />
<table name="tb_goods_desc" dataNode="dn1" primaryKey="goods_id" />
<table name="tb_goods_item" dataNode="dn1" primaryKey="id" />
<table name="tb_order_item" dataNode="dn2" primaryKey="id" />
<table name="tb_order_master" dataNode="dn2" primaryKey="order_id" />
<table name="tb_order_pay_log" dataNode="dn2" primaryKey="out_trade_no" />
<table name="tb_user" dataNode="dn3" primaryKey="id" />
<table name="tb_user_address" dataNode="dn3" primaryKey="id" />
<table name="tb_areas_provinces" dataNode="dn3" primaryKey="id"/>
<table name="tb_areas_city" dataNode="dn3" primaryKey="id"/>
<table name="tb_areas_region" dataNode="dn3" primaryKey="id"/>
</schema>
The data nodes/shard nodes and the node hosts they map to, each associated with the shopping database
<dataNode name="dn1" dataHost="dhost1" database="shopping" />
<dataNode name="dn2" dataHost="dhost2" database="shopping" />
<dataNode name="dn3" dataHost="dhost3" database="shopping" />
Database instance information
<dataHost name="dhost1" maxCon="1000" minCon="10" balance="0"
writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1"
slaveThreshold="100">
<heartbeat>select user()</heartbeat>
<writeHost host="master" url="jdbc:mysql://192.168.200.210:3306?useSSL=false&amp;serverTimezone=Asia/Shanghai&amp;characterEncoding=utf8"
user="root" password="1234" />
</dataHost>
<dataHost name="dhost2" maxCon="1000" minCon="10" balance="0"
writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1"
slaveThreshold="100">
<heartbeat>select user()</heartbeat>
<writeHost host="master" url="jdbc:mysql://192.168.200.213:3306?useSSL=false&amp;serverTimezone=Asia/Shanghai&amp;characterEncoding=utf8"
user="root" password="1234" />
</dataHost>
<dataHost name="dhost3" maxCon="1000" minCon="10" balance="0"
writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1"
slaveThreshold="100">
<heartbeat>select user()</heartbeat>
<writeHost host="master" url="jdbc:mysql://192.168.200.214:3306?useSSL=false&amp;serverTimezone=Asia/Shanghai&amp;characterEncoding=utf8"
user="root" password="1234" />
</dataHost>
5.1.4 server.xml configuration
<user name="root" defaultAccount="true">
<property name="password">123456</property>
<property name="schemas">SHOPPING</property>
<!-- Table-level DML privilege settings -->
<!--
<privileges check="true">
<schema name="DB01" dml="0110" >
<table name="TB_ORDER" dml="1110"></table>
</schema>
</privileges>
-->
</user>
<user name="user">
<property name="password">123456</property>
<property name="schemas">SHOPPING</property>
<property name="readOnly">true</property>
</user>
5.2 Vertical sub-database test
The tables are only defined in schema.xml; they exist logically but not yet in the databases, so we need to create the table structures and load the data:
source /root/shopping-table.sql
source /root/shopping-insert.sql
If a query touches tables on 192.168.200.213 and 192.168.200.214 at the same time, it fails without further configuration: MyCat must route each SQL statement to a specific database server, and currently no single server contains both the order tables and the province/city tables, so the statement cannot be executed and an error is reported.
The province/city/district tables are dictionary tables in the business system; they hold little data and it rarely changes. Tables like these can be configured as global tables, which simplifies business operations.
Global table: the table exists on every data shard and the data is kept consistent across shards.
5.2.1 Configuring global tables
The province (tb_areas_provinces), city (tb_areas_city), and district (tb_areas_region) tables are data-dictionary tables used by several business modules, so setting them as global tables simplifies business operations.
Simply add type="global" when defining the logical table:
<table name="tb_areas_provinces" dataNode="dn1,dn2,dn3" primaryKey="id" type="global"/>
<table name="tb_areas_city" dataNode="dn1,dn2,dn3" primaryKey="id" type="global"/>
<table name="tb_areas_region" dataNode="dn1,dn2,dn3" primaryKey="id" type="global"/>
5.3 Horizontal table split
5.3.1 Scenarios
The business system has a log table that accumulates a large amount of data every day; since the storage and processing capacity of a single server is limited, the table can be split.
We create a logical table tb_log whose data is distributed across three nodes. The table structure in the three databases is identical, but the data differs.
5.3.2 Server
5.3.3 schema.xml configuration
We are splitting a table here, so we declare the sharding rule rule="mod-long"
mod-long takes the primary key id modulo the node count (3 by default here): remainder 0 falls on the first node, 1 on the second, and 2 on the third
We do not need to define the dataHost tags again; they were defined earlier
<schema name="ITCAST" checkSQLschema="true" sqlMaxLimit="100">
<table name="tb_log" dataNode="dn4,dn5,dn6" primaryKey="id" rule="mod-long" />
</schema>
<dataNode name="dn4" dataHost="dhost1" database="itcast" />
<dataNode name="dn5" dataHost="dhost2" database="itcast" />
<dataNode name="dn6" dataHost="dhost3" database="itcast" />
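The mod-long routing described above can be sketched in a few lines (illustrative only, not MyCat's implementation):

```python
# Sketch of the mod-long rule with the three data nodes dn4, dn5, dn6:
# the primary key modulo the node count picks the shard.

DATA_NODES = ["dn4", "dn5", "dn6"]

def mod_long(record_id: int) -> str:
    """Remainder 0 -> dn4 (first node), 1 -> dn5, 2 -> dn6."""
    return DATA_NODES[record_id % len(DATA_NODES)]
```

So log ids 1 through 6 land on dn5, dn6, dn4, dn5, dn6, dn4 in turn.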
5.3.4 server.xml configuration
Configure the root user to access both the SHOPPING logic library and the ITCAST logic library
<user name="root" defaultAccount="true">
<property name="password">123456</property>
<property name="schemas">SHOPPING,ITCAST</property>
<!-- Table-level DML privilege settings -->
<!--
<privileges check="true">
<schema name="DB01" dml="0110" >
<table name="TB_ORDER" dml="1110"></table>
</schema>
</privileges>
-->
</user>
5.4 Horizontal table split test
CREATE TABLE tb_log (
id bigint(20) NOT NULL COMMENT 'ID',
model_name varchar(200) DEFAULT NULL COMMENT 'module name',
model_value varchar(200) DEFAULT NULL COMMENT 'module value',
return_value varchar(200) DEFAULT NULL COMMENT 'return value',
return_class varchar(200) DEFAULT NULL COMMENT 'return value type',
operate_user varchar(20) DEFAULT NULL COMMENT 'operating user',
operate_time varchar(20) DEFAULT NULL COMMENT 'operation time',
param_and_value varchar(500) DEFAULT NULL COMMENT 'request parameter names and values',
operate_class varchar(200) DEFAULT NULL COMMENT 'operation class',
operate_method varchar(200) DEFAULT NULL COMMENT 'operation method',
cost_time bigint(20) DEFAULT NULL COMMENT 'method execution time, in ms',
source int(1) DEFAULT NULL COMMENT 'source: 1 PC, 2 Android, 3 IOS',
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('1','user','insert','success','java.lang.String','10001','2022-01-06 18:12:28','{\"age\":\"20\",\"name\":\"Tom\",\"gender\":\"1\"}','cn.itcast.controller.UserController','insert','10',1);
INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('2','user','insert','success','java.lang.String','10001','2022-01-06 18:12:27','{\"age\":\"20\",\"name\":\"Tom\",\"gender\":\"1\"}','cn.itcast.controller.UserController','insert','23',1);
INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('3','user','update','success','java.lang.String','10001','2022-01-06 18:16:45','{\"age\":\"20\",\"name\":\"Tom\",\"gender\":\"1\"}','cn.itcast.controller.UserController','update','34',1);
INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('4','user','update','success','java.lang.String','10001','2022-01-06 18:16:45','{\"age\":\"20\",\"name\":\"Tom\",\"gender\":\"1\"}','cn.itcast.controller.UserController','update','13',2);
INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('5','user','insert','success','java.lang.String','10001','2022-01-06 18:30:31','{\"age\":\"200\",\"name\":\"TomCat\",\"gender\":\"0\"}','cn.itcast.controller.UserController','insert','29',3);
INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('6','user','find','success','java.lang.String','10001','2022-01-06 18:30:31','{\"age\":\"200\",\"name\":\"TomCat\",\"gender\":\"0\"}','cn.itcast.controller.UserController','find','29',2);
6. Sharding Rules
6.1 Range Sharding
For range sharding, we set the `columns` attribute to the `id` field and the `algorithm` attribute to `rang-long`; a row is routed to the shard whose configured range contains its `id` value.
To customize the ranges, we only need to modify the autopartition-long.txt file.
Attribute | Description |
---|---|
columns | The table field to shard on |
algorithm | Maps the rule to a sharding function (the `function` element) |
class | The class implementing the sharding algorithm |
mapFile | The associated external configuration file |
type | Defaults to 0; 0 means Integer, 1 means String |
defaultNode | Default node. When a value does not match any configured range/enum entry, it is routed to the default node; if no default node is set, such values cause an error |
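Putting the attributes above together, a range-sharding rule in rule.xml might look like the following minimal sketch (rule and node names are illustrative; `AutoPartitionByLong` is the class MyCat ships for this rule):

```xml
<!-- Sketch of range sharding in rule.xml; names and ranges are illustrative. -->
<tableRule name="sharding-by-range">
    <rule>
        <columns>id</columns>              <!-- field to shard on -->
        <algorithm>rang-long</algorithm>   <!-- references the function below -->
    </rule>
</tableRule>

<function name="rang-long" class="io.mycat.route.function.AutoPartitionByLong">
    <!-- external file holding the id range assigned to each node -->
    <property name="mapFile">autopartition-long.txt</property>
</function>

<!-- autopartition-long.txt (MyCat reads K as 1000 and M as 10000), e.g.:
     0-500M=0          ids 0..5,000,000          -> node index 0
     500M-1000M=1      ids 5,000,001..10,000,000 -> node index 1
     1000M-1500M=2     ids up to 15,000,000      -> node index 2
-->
```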
6.2 Modular Sharding
A modulo operation is performed on the value of the specified field against the number of nodes; the result determines which shard the row belongs to.
Attribute | Description |
---|---|
columns | The table field to shard on |
algorithm | Maps the rule to a sharding function (the `function` element) |
class | The class implementing the sharding algorithm |
count | Number of data nodes |
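As a sketch, a modulo rule for three data nodes could be configured like this (assuming MyCat's stock `PartitionByMod` class; the node count of 3 matches the dn4/dn5/dn6 setup used in this chapter):

```xml
<tableRule name="mod-long">
    <rule>
        <columns>id</columns>
        <algorithm>mod-long</algorithm>
    </rule>
</tableRule>

<function name="mod-long" class="io.mycat.route.function.PartitionByMod">
    <!-- shard index = id % 3 -->
    <property name="count">3</property>
</function>
```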
6.3 Consistent hash
During sharding, the hash value of the specified field is computed, and that hash determines which data node the record lands on.
If we later add a node, the same id still produces the same hash, so rows sharing an id stay together and only a small portion of existing keys are remapped; we need not worry that the same data scatters across shards when shards are added.
Attribute | Description |
---|---|
columns | The table field to shard on |
algorithm | Maps the rule to a sharding function (the `function` element) |
class | The class implementing the sharding algorithm |
seed | Seed for creating the murmur_hash object; default 0 |
count | The number of database nodes to shard across; must be specified, otherwise sharding is impossible |
virtualBucketTimes | Each physical node is mapped to this many virtual nodes; the default is 160, i.e. the number of virtual nodes is 160 times the number of physical nodes. virtualBucketTimes * count gives the total number of virtual nodes |
weightMapFile | Node weights; nodes with no weight specified default to 1. Written in properties-file format, keyed by node index (an integer from 0 to count-1), with the node's weight as the value. All weights must be positive integers, otherwise 1 is used |
bucketMapPath | Used during testing to observe how virtual nodes are distributed over physical nodes. If set, the murmur hash value of each virtual node and its physical-node mapping are written line by line to this file. There is no default; if unset, nothing is written |
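A consistent-hash rule covering the attributes above might be sketched as follows (assuming MyCat's `PartitionByMurmurHash`; values mirror the defaults described in the table):

```xml
<tableRule name="sharding-by-murmur">
    <rule>
        <columns>id</columns>
        <algorithm>murmur</algorithm>
    </rule>
</tableRule>

<function name="murmur" class="io.mycat.route.function.PartitionByMurmurHash">
    <property name="seed">0</property>                 <!-- murmur hash seed -->
    <property name="count">3</property>                <!-- physical node count -->
    <property name="virtualBucketTimes">160</property> <!-- 3 * 160 = 480 virtual nodes -->
</function>
```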
6.4 Enum Sharding
Possible enum values are listed in the configuration file, and each value is assigned to a data node. This rule suits data split by province, gender, status, and similar business fields.
If the enum value is 1, the row lands on the first data node; if 2, on the second; if 3, on the third.
Default node: if an incoming value is outside the configured enum values, it is placed on the default node (the third data node in our configuration).
The mapFile attribute references an external file, which holds the mapping from each enum value to its shard node.
Attribute | Description |
---|---|
columns | The table field to shard on |
algorithm | Maps the rule to a sharding function (the `function` element) |
class | The class implementing the sharding algorithm |
mapFile | The associated external configuration file |
type | Defaults to 0; 0 means Integer, 1 means String |
defaultNode | Default node; a value less than 0 means no default node, a value >= 0 sets the default node index. Purpose: when enum sharding meets an unrecognized value, it is routed to the default node; if no default is set, unrecognized values cause an error |
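The enum rule and its mapFile might look like this sketch (assuming MyCat's `PartitionByFileMap`; the mapping follows the 1/2/3 example above, with the third node, index 2, as the default):

```xml
<tableRule name="sharding-by-intfile">
    <rule>
        <columns>status</columns>
        <algorithm>hash-int</algorithm>
    </rule>
</tableRule>

<function name="hash-int" class="io.mycat.route.function.PartitionByFileMap">
    <property name="mapFile">partition-hash-int.txt</property>
    <property name="type">0</property>         <!-- 0 = Integer keys -->
    <property name="defaultNode">2</property>  <!-- unknown values go to the third node -->
</function>

<!-- partition-hash-int.txt: enum value = node index
     1=0
     2=1
     3=2
-->
```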
6.5 Application-Specified Sharding
At runtime the application itself decides which shard a row is routed to: the shard number is computed directly from a substring of the field value (the substring must be numeric).
For example, a substring of 0 lands on the first shard, 1 on the second, and 2 on the third.
Attribute | Description |
---|---|
columns | The table field to shard on |
algorithm | Maps the rule to a sharding function (the `function` element) |
class | The class implementing the sharding algorithm |
startIndex | Start index of the substring |
size | Substring length |
partitionCount | Number of partitions (shards) |
defaultPartition | Default shard (used when the shard number derived from the substring is outside the configured partition count) |
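The attribute table above maps onto a rule like the following sketch (assuming MyCat's `PartitionDirectBySubString`; the startIndex/size values match the example configuration discussed next):

```xml
<tableRule name="sharding-by-substring">
    <rule>
        <columns>id</columns>
        <algorithm>sharding-by-substring</algorithm>
    </rule>
</tableRule>

<function name="sharding-by-substring" class="io.mycat.route.function.PartitionDirectBySubString">
    <property name="startIndex">0</property>       <!-- substring starts at index 0 -->
    <property name="size">2</property>             <!-- take two characters -->
    <property name="partitionCount">3</property>   <!-- shards dn4/dn5/dn6 -->
    <property name="defaultPartition">0</property> <!-- fallback shard -->
</function>
```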
Insert the data below.
With the configuration above (startIndex=0, size=2), the first two characters of the id determine the data node:
'0000001' routes to the first shard; '0100001' routes to the second shard.
CREATE TABLE tb_app (
id varchar(10) NOT NULL COMMENT 'ID',
name varchar(200) DEFAULT NULL COMMENT 'name',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
insert into tb_app (id,name) values('0000001','Testx00001');
insert into tb_app (id,name) values('0100001','Test100001');
insert into tb_app (id,name) values('0100002','Test200001');
insert into tb_app (id,name) values('0200001','Test300001');
insert into tb_app (id,name) values('0200002','TesT400001');
6.6 Fixed Hash Sharding
This algorithm resembles decimal modulo, but it operates on binary bits.
For example, take the low 10 binary bits of the id and AND them with 1111111111. The minimum result of the bitwise AND is 0000000000 and the maximum is 1111111111, i.e. between 0 and 1023 in decimal.
Features
- With plain modulo, consecutive values are spread across different shards; this algorithm can keep consecutive values on the same shard, reducing the difficulty of transaction processing.
- Distribution can be uniform or non-uniform.
- The sharding field must be a numeric type.
The two properties below define, respectively, the number of shards and the length of each shard.
partitionCount 2,1 means two shard nodes of one length followed by one shard node of another; partitionLength 256,512 means the first two shard nodes each cover 256 slots and the last covers 512: 2*256 + 1*512 = 1024.
<property name="partitionCount">2,1</property>
<property name="partitionLength">256,512</property>
That is, the first two shard nodes dn4 and dn5 each have length 256, and the dn6 shard node has length 512.
For example, to insert a row with id 515: compute 515 & 1023 in binary, then convert the result back to decimal and find which slot range it falls into. 515 & 1023 = 515, which lies in the 512-1023 range, so the row lands on the third data shard (dn6).
Attribute | Description |
---|---|
columns | The table field to shard on |
algorithm | Maps the rule to a sharding function (the `function` element) |
class | The class implementing the sharding algorithm |
partitionCount | List of shard counts |
partitionLength | List of shard range lengths |
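Wrapping the two properties shown earlier into a full rule gives a sketch like this (assuming MyCat's `PartitionByLong`; the rule and function names are illustrative):

```xml
<tableRule name="sharding-by-long-hash">
    <rule>
        <columns>id</columns>
        <algorithm>func1</algorithm>
    </rule>
</tableRule>

<function name="func1" class="io.mycat.route.function.PartitionByLong">
    <!-- 2 shards of length 256 (slots 0-255 and 256-511)
         + 1 shard of length 512 (slots 512-1023) -->
    <property name="partitionCount">2,1</property>
    <property name="partitionLength">256,512</property>
</function>
<!-- id 515: 515 & 1023 = 515; slot 515 is in 512-1023, i.e. the third shard -->
```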
6.7 String Hash Sharding
A substring at a specified position is taken from the string, run through a hash algorithm, and the shard is computed from the hash.
<property name="hashSlice">0:2</property> plainly means: which part of the string is taken, starting from subscript 0 and running up to index 2 (inclusive).
Attribute | Description |
---|---|
columns | The table field to shard on |
algorithm | Maps the rule to a sharding function (the `function` element) |
class | The class implementing the sharding algorithm |
partitionLength | Modulo base for the hash; length * count = 1024 (for performance reasons) |
partitionCount | Number of partitions |
hashSlice | The hash range, i.e. which substring is hashed; 0 represents str.length(), -1 represents str.length()-1, and a value greater than 0 represents the number itself. It can be read as substring(start, end); a start of 0 simply means 0 |
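A string-hash rule satisfying the length * count = 1024 constraint could be sketched as follows (assuming MyCat's `PartitionByString`; the 512 * 2 split and the column name are illustrative):

```xml
<tableRule name="sharding-by-stringhash">
    <rule>
        <columns>name</columns>
        <algorithm>sharding-by-stringhash</algorithm>
    </rule>
</tableRule>

<function name="sharding-by-stringhash" class="io.mycat.route.function.PartitionByString">
    <property name="partitionLength">512</property> <!-- 512 * 2 = 1024 -->
    <property name="partitionCount">2</property>
    <property name="hashSlice">0:2</property>       <!-- hash the leading substring -->
</function>
```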
6.8 Sharding by Day (Date)
Shard by date using a fixed time period.
sPartionDay: 10 means sharding on a 10-day cycle.
Note:
The logical table here is configured with three shards, dn4, dn5, and dn6, so the sharding rule must also yield exactly three shards.
2022-01-01 to 2022-01-30 in 10-day periods is exactly three shards.
Attribute | Description |
---|---|
columns | The table field to shard on |
algorithm | Maps the rule to a sharding function (the `function` element) |
class | The class implementing the sharding algorithm |
dateFormat | Date format |
sBeginDate | Start date |
sEndDate | End date; if configured, once data reaches the shard for this date, insertion wraps around and starts again from the first shard |
sPartionDay | Partition length in days; default 10; counting from the start date, every 10 days forms one partition |
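Using the dates from the note above, a by-day rule might be sketched as follows (assuming MyCat's `PartitionByDate`; the column name create_time is a placeholder, and the three 10-day periods map onto the three shards dn4/dn5/dn6):

```xml
<tableRule name="sharding-by-date">
    <rule>
        <columns>create_time</columns> <!-- placeholder column name -->
        <algorithm>sharding-by-date</algorithm>
    </rule>
</tableRule>

<function name="sharding-by-date" class="io.mycat.route.function.PartitionByDate">
    <property name="dateFormat">yyyy-MM-dd</property>
    <property name="sBeginDate">2022-01-01</property>
    <property name="sEndDate">2022-01-30</property> <!-- later dates wrap to shard 0 -->
    <property name="sPartionDay">10</property>      <!-- one shard per 10 days -->
</function>
```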
6.9 Sharding by Calendar Month
This is used to shard by month: each calendar month is one shard.
If a date exceeds the end date, shard calculation wraps around to the beginning; for example, if the window ends in March, data from April would land on the first node.
Attribute | Description |
---|---|
columns | The table field to shard on |
algorithm | Maps the rule to a sharding function (the `function` element) |
class | The class implementing the sharding algorithm |
dateFormat | Date format |
sBeginDate | Start date |
sEndDate | End date; if configured, once data reaches the shard for this date, insertion wraps around and starts again from the first shard |
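A calendar-month rule for a three-month window might be sketched as follows (assuming MyCat's `PartitionByMonth`; the date window and column name are illustrative):

```xml
<tableRule name="sharding-by-month">
    <rule>
        <columns>create_time</columns> <!-- placeholder column name -->
        <algorithm>partbymonth</algorithm>
    </rule>
</tableRule>

<function name="partbymonth" class="io.mycat.route.function.PartitionByMonth">
    <property name="dateFormat">yyyy-MM-dd</property>
    <property name="sBeginDate">2022-01-01</property>
    <property name="sEndDate">2022-03-31</property> <!-- April data wraps to the first node -->
</function>
```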
7. MyCat Administration and Monitoring
MyCat is middleware for database sharding. With it in place, the application no longer connects directly to the underlying databases; it connects only to MyCat, and MyCat in turn connects to the underlying databases, carrying out the database and table sharding.
7.1 How It Works
To split databases and table structures through MyCat, we configure the corresponding logical databases and logical tables in MyCat's configuration files.
When configuring a logical table, we usually specify which data nodes hold its data and which sharding rule applies.
For example, here the sharding rule shards by the enum values of the status field.
As MyCat receives client requests and parses and executes the SQL, it collects statistics: which SQL statements were executed and how often, which databases and tables they touched, how long each statement took, which statements ran slowly, and the CPU, memory, and disk usage of the MyCat server. All of this is monitored.
What steps does an insert statement go through when it is sent to MyCat?
- Parse the SQL
- Shard analysis
Sharding here follows the rule configured in the configuration file, i.e. sharding by status
- Routing analysis
Routing here is based on the value of status
- Read/write splitting analysis
Suppose we execute select * from tb_user. Which shard does it go to?
It still goes through SQL parsing, shard analysis, routing analysis, read/write splitting analysis, and so on.
Shard analysis and routing analysis both rely on the sharding field status configured in the rule, so we check whether the query contains a status condition. Clearly it does not, so MyCat routes this SQL to all shard nodes, executes it on every server at the same time, and the results are returned to MyCat.
If the query did have a status condition, the value of status would determine which data shard the row maps to and where the query is routed.
Without a status condition, the SQL is routed to each of the shard nodes.
MyCat then needs to post-process the result set: result merging, aggregation, sorting, pagination, and so on.
How is the statement select * from tb_user where status in(1,3) order by id executed?
It goes through SQL parsing, shard analysis, routing analysis, and read/write splitting analysis.
During shard analysis, the query condition status in(1,3) is examined. status is exactly the field in our sharding rule: 1 maps to the first node and 3 maps to the third node.
So this SQL is routed to the first and third nodes, and never to the second node.
After the first and third nodes execute the SQL, they return their results.
MyCat then merges, aggregates, sorts, and paginates the combined result set.
7.2 Management Tools
Mycat opens two ports by default; both can be changed in server.xml.
- 8066: data access port, for DML and DDL operations
- 9066: management port, exposing MyCat's service management and control functions, used to manage the state of the whole MyCat cluster
To connect to MyCat locally there is no need to add -h 192.168.200.210; for a remote server, add -h followed by the server address. Note that the management port must be given with uppercase -P (lowercase -p is the password flag):
mysql -h 192.168.200.210 -P 9066 -uroot -p123456
Management commands
Command | Meaning |
---|---|
show @@help | Show the help documentation for the Mycat management tool |
show @@version | Show the Mycat version |
reload @@config | Reload Mycat's configuration files |
show @@datasource | Show Mycat's data source information |
show @@datanode | Show MyCat's current shard node information |
show @@threadpool | Show Mycat's thread pool information |
show @@sql | Show the executed SQL |
show @@sql.sum | Show statistics on the executed SQL |
The command-line output above is not very intuitive; we can use a management tool for a clearer view.
Mycat-web (Mycat-eye) provides monitoring for mycat-server, though its functionality is not limited to mycat-server. It monitors MyCat and MySQL over JDBC, and monitors the CPU, memory, network, and disk of remote servers (currently Linux only).
Mycat-eye depends on Zookeeper at runtime, so Zookeeper must be installed first.
7.2.1 Installation
zookeeper installation
Mycat-web installation
Video: MyCat Monitoring 1
7.3 MyCat monitoring
Configure MyCat information