MySQL: Database and Table Sharding, MyCat Configuration, and Sharding Rules

Database and table sharding

1. Overview

1.1 Why shard databases and tables

Drawbacks of a single database

  • IO bottleneck: when there is too much hot data for the database cache to hold, queries fall through to disk, which causes heavy disk IO and low efficiency. When too much data is requested and bandwidth is insufficient, network IO becomes the bottleneck.
  • CPU bottleneck: SQL that sorts, groups, joins, or aggregates consumes a lot of CPU. Under a high request volume, the CPU becomes the bottleneck.

image-20230531143318439

To solve these problems, we shard the database and its tables.

Storing the data in a distributed way reduces the volume held by any single database or table, which relieves the single-database bottleneck and improves database performance.

image-20230531143353386

1.2 Split strategy

Sharding takes two main forms: vertical splitting and horizontal splitting.

The split can be applied at two granularities: the database and the table.

Database split: the data held in one database is spread across multiple databases.

Table split: the data originally held in one table is spread across multiple tables.

image-20230531152127778

1.2.1 Vertical Split

Vertical database split: tables are the unit of splitting; different tables are moved to different databases according to the business they serve.

Features

  • Each database has a different table structure.
  • Each database holds different data.
  • The union of all databases is the full data set.

image-20230531154929305

Vertical table split: fields are the unit of splitting; a table's fields are divided among several tables according to their attributes.

The data that used to live in one table is now spread over two tables, which are associated through a primary key or foreign key and may be located on different servers.

Features

  • Each table has a different structure.
  • Each table holds different data; the tables are generally associated through a column (primary key/foreign key).
  • The union of all tables is the full data set.
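
As a rough illustration of a vertical table split, the sketch below divides each row's fields between two tables that share the primary key, then rejoins them. The function names, the sample user rows, and the choice of which fields stay in the base table are assumptions made for this example only.

```python
def split_vertically(rows, key, base_fields):
    """Split each row into a base row (key + base_fields) and an extra row (key + the rest)."""
    base_table, extra_table = [], []
    for row in rows:
        base = {key: row[key]}
        extra = {key: row[key]}
        for field, value in row.items():
            if field == key:
                continue
            # Each non-key field goes to exactly one of the two tables.
            (base if field in base_fields else extra)[field] = value
        base_table.append(base)
        extra_table.append(extra)
    return base_table, extra_table


def rejoin(base_table, extra_table, key):
    """Rebuild the original rows by matching the two tables on the shared key."""
    extra_by_key = {row[key]: row for row in extra_table}
    return [{**base, **extra_by_key[base[key]]} for base in base_table]


# Hypothetical sample data.
users = [
    {"id": 1, "name": "Tom", "phone": "10001", "profile": "long text 1"},
    {"id": 2, "name": "Ann", "phone": "10002", "profile": "long text 2"},
]
base_table, extra_table = split_vertically(users, "id", {"name", "phone"})
```

The two resulting tables have different structures and different data, and joining them on id gives back the full rows, which matches the three features listed above.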

image-20230531155131097

1.2.2 Horizontal split

Horizontal database split: based on a field and a chosen strategy, the data of one database is split across multiple databases.

Features:

  • Every database has the same table structure.
  • Every database holds different data.
  • The union of all databases is the full data set.

image-20230531160656676

Horizontal table split: based on a field and a chosen strategy, the data of one table is split across multiple tables.

Features:

  • Every table has the same structure.
  • Every table holds different data.
  • The union of all tables is the full data set.
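
A horizontal split can be sketched in a few lines, assuming the strategy is passed in as a function that maps a row to a shard index. The names split_horizontally and shard_of, and the sample order rows, are hypothetical and only serve the illustration.

```python
def split_horizontally(rows, shard_of, shard_count):
    """Distribute whole rows over shard_count tables that share one structure."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_of(row)].append(row)
    return shards


# Hypothetical sample data: six orders, sharded by id modulo 3.
orders = [{"id": i, "title": f"goods{i}"} for i in range(1, 7)]
shards = split_horizontally(orders, lambda row: row["id"] % 3, 3)

# Every row lands in exactly one shard, so the union is the full data set.
all_rows = sorted((row for shard in shards for row in shard), key=lambda r: r["id"])
```

Unlike the vertical case, every shard keeps all the columns; only the set of rows differs.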

1.3 Implementation technology

After the database is split, the application has to access multiple databases.

The application must then decide, for each business operation, which database to use. This makes the application code harder to write and cumbersome to maintain.

Several technologies have emerged to handle database and table sharding:

  • ShardingJDBC

    Based on the AOP principle, it intercepts, parses, rewrites, and routes locally executed SQL inside the application. You have to code and configure it yourself, and it supports only Java, but its performance is high.

  • MyCat

    A database and table sharding middleware. It implements sharding without changes to application code and supports multiple languages, but its performance is lower than ShardingJDBC's.

    The application connects directly to MyCat, so it never has to decide which database to connect to or operate on. It needs no third-party dependencies and no extra coding or configuration.

image-20230531162054020

2. Install Mycat

2.1 Introduction

MyCat is an open-source, actively maintained, Java-based MySQL database middleware.

You can use MyCat just as you would use MySQL; developers do not notice that MyCat is there at all.

Developers only connect to MyCat. They do not need to know how many databases sit underneath or what data each database server holds. The sharding strategy is configured entirely in MyCat.

Protocol emulation: MyCat speaks the MySQL protocol, so it can be treated as a MySQL server.

The application does not care whether it talks to MyCat or MySQL. It only has to replace the MySQL connection with the MyCat connection; the driver stays the same.

image-20230531162054020

Advantages

  • Reliable and stable performance
  • Strong technical team
  • Complete ecosystem
  • Active community

MyCat's overall structure has two parts: the logical structure on top and the physical structure underneath.

Logical database: a logical database stores no data itself; the data lives in the physical structure.

A logical database can contain several logical tables, and each logical table is associated with several shard nodes, also called data nodes.

Shard node: also called a data node. For example, the data of tableA is spread over three shard nodes; the sharding rule decides which rows of tableA go to the first shard node, which go to the second, and so on.

The three shard nodes are associated with three databases. The underlying physical structure consists of the real databases, and the host on which each database runs is called a node host.

MyCat stores no data. It only performs logical sharding and result aggregation; the actual data stays in the underlying MySQL databases.

image-20230531191315290

2.2 Installation

Download address : https://github.com/MyCATApache/Mycat-Server/releases

Unzip the downloaded archive

image-20230531171355528

Configure environment variables

image-20230531171347737

Configure PATH

image-20230531171502934

Then double-click startup_nowrap.bat in the bin directory. If MyCat starts successfully, the output looks like this:

image-20230531173035148

3. Getting Started with MyCat

The tb_order table holds so much data that disk IO and capacity have hit a bottleneck. We now shard the data of tb_order across three data nodes, each on a node host located on a different server.

The three databases store the same table structure but different data.

image-20230531192631215

3.1 Environment preparation

Four servers are needed: one for the MyCat middleware and three database servers behind it. Create the database db01 on each of the three database servers.

Do not create tables or run any inserts, updates, or deletes directly on the three databases. From now on all operations go through MyCat, where we configure the sharding strategy for tb_order.

image-20230531193201570

3.2 Sharding configuration

3.2.1 schema.xml

In conf/schema.xml we configure the <schema> (the logical database) with its logical table (here tb_order), the data nodes the table uses, and the node host behind each data node.

Sharding rule: the rule attribute of the table determines how the data in the table is split.

<schema> : configures the logical database

<table> : configures a logical table; the dataNode attribute lists the data nodes the table is spread across

<dataNode> : the dataHost attribute names the node host, and the database attribute names the database on that host

<dataHost> : the dbDriver attribute takes one of two values, native or jdbc. We choose jdbc here, because native support for MySQL 8.0 is not yet complete

<writeHost> : holds the connection information of the associated database

image-20230531194402899

<?xml version="1.0"?>
<!DOCTYPE mycat:schema SYSTEM "schema.dtd">
<mycat:schema xmlns:mycat="http://io.mycat/">

	<schema name="DB01" checkSQLschema="true" sqlMaxLimit="100" randomDataNode="dn1">
			<table name="tb_order" dataNode="dn1,dn2,dn3" rule="auto-sharding-long" splitTableNames ="true"/>
	</schema>

	<dataNode name="dn1" dataHost="dhost1" database="db01" />
	<dataNode name="dn2" dataHost="dhost2" database="db01" />
	<dataNode name="dn3" dataHost="dhost3" database="db01" />

	<dataHost name="dhost1" maxCon="1000" minCon="10" balance="0"
			  writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1"  slaveThreshold="100">
		<heartbeat>select user()</heartbeat>

		<writeHost host="hostM1" 
		           url="jdbc:mysql://192.168.200.210:3306?serverTimezone=Asia/Shanghai&amp;useUnicode=true&amp;characterEncoding=utf-8&amp;zeroDateTimeBehavior=convertToNull&amp;useSSL=false&amp;allowPublicKeyRetrieval=true"
		           user="root"
				   password="1234">
		</writeHost>
	</dataHost>
	
	
	<dataHost name="dhost2" maxCon="1000" minCon="10" balance="0"
			  writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1"  slaveThreshold="100">
		<heartbeat>select user()</heartbeat>

		<writeHost host="hostM1" 
		           url="jdbc:mysql://192.168.200.213:3306?serverTimezone=Asia/Shanghai&amp;useUnicode=true&amp;characterEncoding=utf-8&amp;zeroDateTimeBehavior=convertToNull&amp;useSSL=false&amp;allowPublicKeyRetrieval=true"
		           user="root"
				   password="1234">
		</writeHost>
	</dataHost>
	
	
	<dataHost name="dhost3" maxCon="1000" minCon="10" balance="0"
			  writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1"  slaveThreshold="100">
		<heartbeat>select user()</heartbeat>

		<writeHost host="hostM1" 
		           url="jdbc:mysql://192.168.200.214:3306?serverTimezone=Asia/Shanghai&amp;useUnicode=true&amp;characterEncoding=utf-8&amp;zeroDateTimeBehavior=convertToNull&amp;useSSL=false&amp;allowPublicKeyRetrieval=true"
		           user="root"
				   password="1234">
		</writeHost>
	</dataHost>
	

</mycat:schema>

3.2.2 server.xml

This file configures which users can access MyCat and which logical databases and tables each user may access.

For example, in the default configuration the user root logs in with the password 123456 and may access the logical database TESTDB. Our logical database is DB01, so TESTDB must be changed to DB01.

 <!-- read-only: the user can read but not write -->
<property name="readOnly">true</property>

image-20230531201229430

You need to configure the user name, password, and user access rights information in server.xml. The specific configuration is as follows:

	<user name="root" defaultAccount="true">
		<property name="password">123456</property>
		<property name="schemas">DB01</property>
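		<property name="defaultSchema">DB01</property>
		<!-- Before raising the "No MyCAT Database selected" error, MyCat tries to use this schema as the default; if it is not set, the default is null and the error is raised -->

		<!-- Table-level DML privilege settings -->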
		<!-- 		
		<privileges check="false">
			<schema name="TESTDB" dml="0110" >
				<table name="tb01" dml="0000"></table>
				<table name="tb02" dml="1111"></table>
			</schema>
		</privileges>		
		 -->
	</user>

	<user name="user">
		<property name="password">123456</property>
		<property name="schemas">DB01</property>
		<property name="readOnly">true</property>
		<property name="defaultSchema">DB01</property>
	</user>

3.3 Start the test

Switch to the MyCat installation directory and run the following commands to start or stop MyCat. Once started, MyCat listens on port 8066.

# start
bin/mycat start
# stop
bin/mycat stop

Connect and log in to MyCat

Connect to MyCat with the ordinary mysql client, because MyCat emulates the MySQL protocol.

mysql -h 192.168.200.210 -P 8066 -uroot -p123456

In schema.xml we configured the logical table tb_order under the logical database DB01. After running use DB01; and then show tables;, tb_order is listed, but at this point the table exists only logically in MyCat and not in the underlying databases.

show tables;

First create the table structure through MyCat.

Copy the following statements and run them in the MyCat session:

CREATE TABLE TB_ORDER (
  id BIGINT(20) NOT NULL,
  title VARCHAR(100) NOT NULL ,
  PRIMARY KEY (id)
) ENGINE=INNODB DEFAULT CHARSET=utf8 ;

INSERT INTO TB_ORDER(id,title) VALUES(1,'goods1');
INSERT INTO TB_ORDER(id,title) VALUES(2,'goods2');
INSERT INTO TB_ORDER(id,title) VALUES(3,'goods3');
INSERT INTO TB_ORDER(id,title) VALUES(5000000,'goods5000000');
INSERT INTO TB_ORDER(id,title) VALUES(10000000,'goods10000000');
INSERT INTO TB_ORDER(id,title) VALUES(10000001,'goods10000001');
INSERT INTO TB_ORDER(id,title) VALUES(15000000,'goods15000000');
-- id 15000001 exceeds the last configured range (15,000,000), so this insert reports an error
INSERT INTO TB_ORDER(id,title) VALUES(15000001,'goods15000001');

How is the inserted data distributed among the three databases?

That is determined by the rule attribute of the logical table, shown in the figure below.

image-20230531213149312

The value auto-sharding-long refers to a sharding rule defined in rule.xml, the file in which MyCat provides its predefined sharding rules.

This rule shards on the id column.

The rang-long in its <algorithm> tag is in turn a reference.

image-20230531213440337

rang-long refers to a function defined further down in rule.xml.

In this function, the <property> tag with name="mapFile" names a mapping file, which is the physical file autopartition-long.txt.

image-20230531213714615

autopartition-long.txt defines the following ranges (w stands for 10,000):

  • If id is between 1 and 500w (5 million), the row is stored in the first shard database.
  • If id is between 500w and 1000w (5 million to 10 million), the row is stored in the second shard database.
  • If id is between 1000w and 1500w (10 million to 15 million), the row is stored in the third shard database.
  • If id exceeds 1500w (15 million), inserting the row reports an error.
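
The range lookup described above can be approximated with the following sketch. It is not MyCat's implementation: the RANGES table mirrors the three ranges listed above, and treating both boundaries as inclusive with the first matching range winning is an assumption made for this example.

```python
# (start, end, node index); boundaries assumed inclusive, first match wins.
RANGES = [
    (0, 5_000_000, 0),
    (5_000_000, 10_000_000, 1),
    (10_000_000, 15_000_000, 2),
]


def route_by_range(order_id):
    """Return the index of the data node whose range contains order_id."""
    for start, end, node_index in RANGES:
        if start <= order_id <= end:
            return node_index
    # No range matches: the row cannot be routed, so the insert fails.
    raise ValueError(f"id {order_id} is outside every configured range")


# Hypothetical usage with the ids from the INSERT statements.
placement = {i: route_by_range(i) for i in (1, 5_000_000, 10_000_000, 10_000_001, 15_000_000)}
```

Under these assumptions, 5000000 stays on the first node and 10000001 is the first id that reaches the third node, while 15000001 raises an error.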

image-20230531214058463

4. MyCat Configuration

4.1 schema.xml configuration file

schema.xml covers the configuration of MyCat's logical databases, logical tables, sharding rules, shard nodes, and data sources.

The main tags involved

  • schema tag
  • dataNode tag
  • dataHost tag

image-20230531215440424

4.1.1 schema tag

The schema tag defines a logical database in the MyCat instance. A MyCat instance can have multiple logical databases, each configured with its own schema tag.

A logical database in MyCat is equivalent to a database in MySQL. To operate on a table in a logical database, you first switch to it with use xxx, for example use DB01. The name is case-sensitive.

image-20230601093520972

Core attributes

  • name : the custom name of the logical database

  • checkSQLschema : whether a database name written in an SQL statement is removed automatically before execution; true: removed automatically, false: not removed

    For example, the DB01 prefix shown below is removed automatically when the value is true

image-20230601094603889

  • sqlMaxLimit

    The maximum number of records returned by a query that specifies no limit

    Querying the full table would cost too much performance
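
The effect of checkSQLschema and sqlMaxLimit can be pictured with the string-level sketch below. MyCat works on parsed SQL rather than regular expressions, so this is only an approximation; both function names are hypothetical, and matching the schema name case-sensitively is an assumption.

```python
import re


def strip_schema_prefix(sql, schema):
    """Remove a 'schema.' prefix from table references, as checkSQLschema=true is described to do."""
    return re.sub(r"\b" + re.escape(schema) + r"\.", "", sql)


def apply_default_limit(sql, sql_max_limit):
    """Append LIMIT sql_max_limit to a SELECT that has no limit; the trailing ';' is dropped."""
    statement = sql.strip().rstrip(";")
    is_select = statement.lower().startswith("select")
    has_limit = re.search(r"\blimit\b", statement, re.IGNORECASE) is not None
    if is_select and not has_limit:
        return f"{statement} LIMIT {sql_max_limit}"
    return statement


# Hypothetical usage with the values from the schema.xml shown earlier.
rewritten = apply_default_limit(strip_schema_prefix("SELECT * FROM DB01.tb_order;", "DB01"), 100)
```

A query that already carries its own limit is left unchanged by apply_default_limit.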

4.1.1.1 table tag

The table tag defines a logical table under a logical database (schema) in MyCat. Every table that needs to be split must be defined with a table tag.

Multiple logical tables can be configured under one schema tag.

image-20230601095712654

Attributes

  • name : the logical table name, which must be unique within the logical database

  • dataNode : the data nodes the logical table belongs to. Each value must match the name of a dataNode tag; multiple data nodes are separated by commas

  • rule : the name of the sharding rule, which is defined in rule.xml

  • primaryKey : the primary key of the real table that the logical table corresponds to

  • type : the type of the logical table. There are currently only global tables and ordinary tables. If the attribute is not set, the table is an ordinary table; for a global table, set it to global

4.1.2 dataNode tag

The dataNode tag defines a data node in MyCat, which is what we usually call a data shard.

Each dataNode tag is one independent data shard.

image-20230601100811483

Core attributes

  • name : the data node name
  • dataHost : the name of the database instance host, which refers to the name attribute of a dataHost tag
  • database : the database on that host to which the shard belongs
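
To make the chain of references concrete (table, then dataNode, then dataHost), the sketch below parses a trimmed schema.xml fragment with Python's standard xml.etree.ElementTree and resolves a logical table to its databases and JDBC URLs. The fragment is a shortened version of the getting-started configuration, and resolve_table is a hypothetical helper, not a MyCat API.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment: most attributes are omitted, and "&" is escaped as "&amp;" as XML requires.
SCHEMA_XML = """\
<mycat:schema xmlns:mycat="http://io.mycat/">
  <schema name="DB01" checkSQLschema="true" sqlMaxLimit="100">
    <table name="tb_order" dataNode="dn1,dn2,dn3" rule="auto-sharding-long"/>
  </schema>
  <dataNode name="dn1" dataHost="dhost1" database="db01"/>
  <dataNode name="dn2" dataHost="dhost2" database="db01"/>
  <dataNode name="dn3" dataHost="dhost3" database="db01"/>
  <dataHost name="dhost1">
    <writeHost host="hostM1" url="jdbc:mysql://192.168.200.210:3306?useSSL=false&amp;serverTimezone=Asia/Shanghai"/>
  </dataHost>
  <dataHost name="dhost2">
    <writeHost host="hostM1" url="jdbc:mysql://192.168.200.213:3306?useSSL=false&amp;serverTimezone=Asia/Shanghai"/>
  </dataHost>
  <dataHost name="dhost3">
    <writeHost host="hostM1" url="jdbc:mysql://192.168.200.214:3306?useSSL=false&amp;serverTimezone=Asia/Shanghai"/>
  </dataHost>
</mycat:schema>
"""


def resolve_table(xml_text, schema_name, table_name):
    """Return (data node, database, write-host URL) for each node of a logical table."""
    root = ET.fromstring(xml_text)
    nodes = {n.get("name"): n for n in root.findall("dataNode")}
    hosts = {h.get("name"): h for h in root.findall("dataHost")}
    for schema in root.findall("schema"):
        if schema.get("name") != schema_name:
            continue
        for table in schema.findall("table"):
            if table.get("name") != table_name:
                continue
            resolved = []
            for node_name in table.get("dataNode").split(","):
                node = nodes[node_name.strip()]
                host = hosts[node.get("dataHost")]
                url = host.find("writeHost").get("url")
                resolved.append((node_name.strip(), node.get("database"), url))
            return resolved
    raise KeyError(f"{schema_name}.{table_name} is not defined")


targets = resolve_table(SCHEMA_XML, "DB01", "tb_order")
```

Each entry in targets names one shard of tb_order; a lookup for a table that is not declared raises KeyError.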

4.1.3 dataHost tag

This is the lowest-level tag in the MyCat logical database configuration. It directly defines the concrete database instances, read-write splitting, and the heartbeat statement.

image-20230601101510991

Core attributes

  • name : unique identifier, referenced by the upper-level dataNode tag
  • maxCon/minCon : maximum/minimum number of connections
  • balance : load-balancing strategy; possible values are 0, 1, 2, 3
  • writeType : how write operations are distributed (0: writes go to the first writeHost, and if it fails they switch to the second; 1: writes are distributed randomly among the configured writeHosts)
  • dbDriver : database driver; supports native and jdbc (with MySQL 8 or later, use jdbc)

4.2 rule.xml

rule.xml defines all table-splitting rules. A sharding algorithm can be used flexibly, and the same algorithm can be given different parameters, which makes the sharding process configurable.

As shown below, the rule attribute of a table tag in schema.xml ultimately refers to a rule defined in rule.xml.

image-20230601103404845

The figure below shows the content of rule.xml.

image-20230601103815524

4.3 server.xml

The server.xml configuration file holds MyCat's system configuration. It has two important tags: system and user.

4.3.1 system tag

The system tag configures MyCat's system settings (its operating environment). The configuration items and their meanings are listed below.

image-20230601104321163

Attribute | Value | Meaning
charset | utf8 | The character set of MyCat; it must match the character set of MySQL
nonePasswordLogin | 0,1 | 0: a password is required to log in; 1: no password is required. The default is 0. If set to 1, a default account must be specified
useHandshakeV10 | 0,1 | Whether to use HandshakeV10Packet to communicate with the client, mainly for compatibility with newer versions of the JDBC driver; 1: yes, 0: no
useSqlStat | 0,1 | Enables real-time SQL statistics; 1: enabled, 0: disabled. When enabled, MyCat automatically collects statistics on executed SQL. Connect with mysql -h 127.0.0.1 -P 9066 -u root -p to view the SQL MyCat has executed, slow SQL, overall SQL execution, the read/write ratio, and so on: show @@sql; show @@sql.slow; show @@sql.sum;
useGlobleTableCheck | 0,1 | Whether to enable the global table consistency check; 1: on, 0: off
sequnceHandlerType | 0,1,2 | The MyCat global sequence type; 0: local file, 1: database, 2: timestamp. The default is the local file, which is mainly used for testing
sequenceHandlerPattern | regular expression | A statement must contain MYCATSEQ or mycatseq to enter the sequence-matching process; watch for the case where MYCATSEQ_ contains a space
subqueryRelationshipCheck | true,false | When a subquery contains a join, check whether the join fields include the sharding field. The default is false
useCompression | 0,1 | Enables the MySQL compression protocol; 0: off, 1: on
fakeMySQLVersion | 5.5,5.6 | The MySQL version number that MyCat reports
defaultSqlParser | | The default SQL parser. Early versions of MyCat used FoundationDB's SQL parser, and the Druid parser was added after MyCat 1.3, so this property selects between the two parsers, druidparser and fdbparser. Since MyCat 1.4 the default is druidparser, and fdbparser has been removed
processors | 1,2… | The number of threads available to the system. The default is the number of CPU cores multiplied by the number of threads per core. processors affects the processorBufferPool, processorBufferLocalPercent, and processorExecutor properties, so it can be adjusted during performance tuning
processorBufferChunk | | The size of the Socket Direct Buffer allocated each time; the default is 4096 bytes. It also affects the length of the BufferPool. If a single read fetches more bytes than the buffer can hold, a warning appears, and the value can be increased
processorExecutor | | The size of the fixed businessExecutor thread pool shared by the NIOProcessors. MyCat hands asynchronous tasks to this thread pool. Newer versions of MyCat use the pool infrequently, so the value can be reduced
packetHeaderSize | | The length of the packet header in the MySQL protocol; the default is 4 bytes
maxPacketSize | | The maximum size of data that a MySQL protocol packet can carry; the default is 16M
idleTimeout | 30 | The idle timeout of a connection; when it expires, the connection is closed and its resources are reclaimed. The default is 30 minutes
txIsolation | 1,2,3,4 | The initial transaction isolation level of front-end connections. The default is REPEATED_READ (3). READ_UNCOMMITTED=1; READ_COMMITTED=2; REPEATED_READ=3; SERIALIZABLE=4
sqlExecuteTimeout | 300 | The timeout for SQL execution, in seconds. If a statement times out, the connection is closed. The default is 300 seconds
serverPort | 8066 | The port MyCat serves on; the default is 8066
managerPort | 9066 | The management port of MyCat; the default is 9066

4.3.2 user tag

The user tag configures users and their permissions.

If no privileges are configured for a user, the user can perform any operation on every logical table in the logical database.

To restrict access, configure the privileges tag. Each of the four digits of the dml attribute enables (1) or disables (0) one operation, in the order insert, update, select, delete.

dml="0000" means no permissions

dml="1111" grants insert, update, select, and delete

dml="1110" grants insert, update, and select

If the permissions configured on the logical database and on a logical table differ, the logical table's permissions take precedence.

image-20230601105309216
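
The dml masks can be decoded mechanically. The sketch below assumes the digit order insert, update, select, delete, which is what the 1111 and 1110 examples imply, and models the rule that a table-level mask overrides the schema-level one. Both helper names are hypothetical.

```python
DML_OPERATIONS = ("insert", "update", "select", "delete")


def decode_dml(mask):
    """Translate a four-digit dml mask such as "0110" into the allowed operations."""
    if len(mask) != 4 or set(mask) - {"0", "1"}:
        raise ValueError(f"invalid dml mask: {mask!r}")
    return [op for op, bit in zip(DML_OPERATIONS, mask) if bit == "1"]


def effective_dml(schema_mask, table_mask=None):
    """A mask set on the table takes precedence over the mask set on its schema."""
    return decode_dml(table_mask if table_mask is not None else schema_mask)


# Values from the commented-out privileges example: schema 0110, table tb01 0000.
schema_level = effective_dml("0110")
table_level = effective_dml("0110", "0000")
```

With these values, a table without its own mask allows update and select, while tb01 allows nothing.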

5. MyCat Sharding

Vertical table split: a table structure is split into multiple table structures, which are associated through primary keys or foreign keys. A vertical table split has to be handled in the business program.

We will not demonstrate the vertical table split here; instead, let's look at the vertical database split.

5.1 Vertical database split

5.1.1 Scenario

The business system uses the table structure below. Because users and orders generate a large amount of data every day and a single server's storage and processing capacity is limited, the tables are split across databases. The original tables are as follows.

image-20230601113538302

We split the database vertically: the product-related tables go to one database server, the order tables to a second, and the user and province/city/district tables to a third. The final structure is as follows:

image-20230601113712645

5.1.2 Server

Prepare three database servers, 192.168.200.210, 192.168.200.213, and 192.168.200.214, and create the database shopping on each of them.

image-20230601114308082

5.1.3 schema.xml configuration

Configure the logical database and its logical tables, and specify for each logical table the data node (shard) it belongs to.

primaryKey : the primary key of the real table that the logical table corresponds to

No rule attribute is specified below. A rule is needed only when a table is split; this is a vertical database split, so no sharding rule is required.

<schema name="SHOPPING" checkSQLschema="true" sqlMaxLimit="100">
  <table name="tb_goods_base" dataNode="dn1" primaryKey="id" />
  <table name="tb_goods_brand" dataNode="dn1" primaryKey="id" />
  <table name="tb_goods_cat" dataNode="dn1" primaryKey="id" />
  <table name="tb_goods_desc" dataNode="dn1" primaryKey="goods_id" />
  <table name="tb_goods_item" dataNode="dn1" primaryKey="id" />
  <table name="tb_order_item" dataNode="dn2" primaryKey="id" />
  <table name="tb_order_master" dataNode="dn2" primaryKey="order_id" />
  <table name="tb_order_pay_log" dataNode="dn2" primaryKey="out_trade_no" />
  <table name="tb_user" dataNode="dn3" primaryKey="id" />
  <table name="tb_user_address" dataNode="dn3" primaryKey="id" />
  <table name="tb_areas_provinces" dataNode="dn3" primaryKey="id"/>
  <table name="tb_areas_city" dataNode="dn3" primaryKey="id"/>
  <table name="tb_areas_region" dataNode="dn3" primaryKey="id"/>
</schema>

Each data node (shard node) refers to its node host and to the shopping database on that host:

<dataNode name="dn1" dataHost="dhost1" database="shopping" />
<dataNode name="dn2" dataHost="dhost2" database="shopping" />
<dataNode name="dn3" dataHost="dhost3" database="shopping" />

Node host (dataHost) connection information

<dataHost name="dhost1" maxCon="1000" minCon="10" balance="0"
          writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1"
          slaveThreshold="100">
   <heartbeat>select user()</heartbeat>
   <writeHost host="master" url="jdbc:mysql://192.168.200.210:3306?useSSL=false&amp;serverTimezone=Asia/Shanghai&amp;characterEncoding=utf8"
       user="root" password="1234" />
</dataHost>


<dataHost name="dhost2" maxCon="1000" minCon="10" balance="0"
          writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1"
          slaveThreshold="100">
    <heartbeat>select user()</heartbeat>
    <writeHost host="master" url="jdbc:mysql://192.168.200.213:3306?useSSL=false&amp;serverTimezone=Asia/Shanghai&amp;characterEncoding=utf8"
      user="root" password="1234" />
</dataHost>

<dataHost name="dhost3" maxCon="1000" minCon="10" balance="0"
          writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1"
          slaveThreshold="100">
     <heartbeat>select user()</heartbeat>
     <writeHost host="master" url="jdbc:mysql://192.168.200.214:3306?useSSL=false&amp;serverTimezone=Asia/Shanghai&amp;characterEncoding=utf8"
       user="root" password="1234" />
</dataHost>


5.1.4 server.xml configuration

<user name="root" defaultAccount="true">
   <property name="password">123456</property>
   <property name="schemas">SHOPPING</property>
   <!-- Table-level DML privilege settings -->
   <!--
   <privileges check="true">
      <schema name="DB01" dml="0110" >
         <table name="TB_ORDER" dml="1110"></table>
       </schema>
    </privileges>
    -->
</user>
<user name="user">
    <property name="password">123456</property>
    <property name="schemas">SHOPPING</property>
    <property name="readOnly">true</property>
</user>

5.2 Vertical database split test

The tables are defined only in schema.xml, so they exist only logically and not yet in the databases. Create the table structures and load the data through MyCat:

source /root/shopping-table.sql
source /root/shopping-insert.sql

A query that touches tables on both 192.168.200.213 and 192.168.200.214, for example a join of orders with provinces and cities, reports an error without further configuration. MyCat has to route each SQL statement to one specific database server, and no single server currently holds both the order tables and the province/city tables.

The province, city, and district tables are dictionary tables in the business system. They hold little data, and that data does not change, so we can make them global tables, which simplifies business operations.

Global table: a table that exists on every data shard with identical data.
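
Why a global table fixes the cross-server join can be shown with a simplified routing model: a statement is routable only if at least one data node holds every table it touches. This is a sketch of the idea, not MyCat's routing algorithm, and routable_nodes is a hypothetical helper.

```python
def routable_nodes(tables, table_nodes):
    """Return the data nodes that hold every table in the statement (empty set if none)."""
    candidates = None
    for table in tables:
        nodes = table_nodes[table]
        candidates = set(nodes) if candidates is None else candidates & nodes
    return candidates or set()


# Placement from the vertical split: orders on dn2, provinces on dn3.
before = {"tb_order_master": {"dn2"}, "tb_areas_provinces": {"dn3"}}
# After declaring the dictionary table global, it exists on every data node.
after = dict(before, tb_areas_provinces={"dn1", "dn2", "dn3"})

join_tables = ["tb_order_master", "tb_areas_provinces"]
nodes_before = routable_nodes(join_tables, before)
nodes_after = routable_nodes(join_tables, after)
```

Before the change no node holds both tables, so the join cannot be routed; afterwards the join can run entirely on dn2.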

5.2.1 Configuring global tables

The province, city, and district/county tables tb_areas_provinces, tb_areas_city, and tb_areas_region are data dictionary tables that several business modules may need. Making them global tables simplifies business operations.

To do so, add the attribute type="global" to the table tag of each logical table and list every data node in dataNode:

<table name="tb_areas_provinces" dataNode="dn1,dn2,dn3" primaryKey="id" type="global"/>

<table name="tb_areas_city" dataNode="dn1,dn2,dn3" primaryKey="id" type="global"/>

<table name="tb_areas_region" dataNode="dn1,dn2,dn3" primaryKey="id" type="global"/>

image-20230601144810255

5.3 Horizontal table split

5.3.1 Scenario

The business system has a log table and generates a large amount of log data every day. A single server's storage and processing capacity is limited, so the table is split.

Create the logical table tb_log and spread its data over three nodes. The three databases then store the same table structure but different data.

image-20230601150140929

5.3.2 Server

image-20230601150343707

5.3.3 schema.xml configuration

This is a table split, so we declare a sharding rule on the table: rule="mod-long".

With rule="mod-long", MyCat takes the primary key id modulo the number of nodes (3 by default). A result of 0 routes the row to the first node, 1 to the second node, and 2 to the third node.
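
The modulo rule is easy to check by hand. The sketch below assumes three data nodes named as in this section's schema.xml (dn4, dn5, dn6) and shows where the ids 1 to 6 from the test data would land; route_by_mod is a hypothetical helper, not part of MyCat.

```python
DATA_NODES = ["dn4", "dn5", "dn6"]


def route_by_mod(row_id, nodes=DATA_NODES):
    """Pick the data node whose position equals row_id modulo the node count."""
    return nodes[row_id % len(nodes)]


# Group the six test ids by the node they are routed to.
placement = {}
for row_id in range(1, 7):
    placement.setdefault(route_by_mod(row_id), []).append(row_id)
```

Consecutive ids rotate across the nodes, so the six test rows end up two per node.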

The dataHost tags do not need to be defined again; they were defined earlier.

<schema name="ITCAST" checkSQLschema="true" sqlMaxLimit="100">
    <table name="tb_log" dataNode="dn4,dn5,dn6" primaryKey="id" rule="mod-long" />
</schema>

<dataNode name="dn4" dataHost="dhost1" database="itcast" />

<dataNode name="dn5" dataHost="dhost2" database="itcast" />

<dataNode name="dn6" dataHost="dhost3" database="itcast" />

5.3.4 server.xml configuration

Configure the root user so that it can access both the SHOPPING and the ITCAST logical databases

<user name="root" defaultAccount="true">
   <property name="password">123456</property>
   <property name="schemas">SHOPPING,ITCAST</property>
   <!-- Table-level DML privilege settings -->
   <!--
    <privileges check="true">
     <schema name="DB01" dml="0110" >
       <table name="TB_ORDER" dml="1110"></table>
     </schema>
    </privileges>
   -->
</user>

5.4 Horizontal table split test

CREATE TABLE tb_log (
id bigint(20) NOT NULL COMMENT 'ID',
model_name varchar(200) DEFAULT NULL COMMENT 'module name',
model_value varchar(200) DEFAULT NULL COMMENT 'module value',
return_value varchar(200) DEFAULT NULL COMMENT 'return value',
return_class varchar(200) DEFAULT NULL COMMENT 'return value type',
operate_user varchar(20) DEFAULT NULL COMMENT 'operating user',
operate_time varchar(20) DEFAULT NULL COMMENT 'operation time',
param_and_value varchar(500) DEFAULT NULL COMMENT 'request parameter names and values',
operate_class varchar(200) DEFAULT NULL COMMENT 'operation class',
operate_method varchar(200) DEFAULT NULL COMMENT 'operation method',
cost_time bigint(20) DEFAULT NULL COMMENT 'method execution time, in ms',
source int(1) DEFAULT NULL COMMENT 'source: 1 PC, 2 Android, 3 iOS',
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('1','user','insert','success','java.lang.String','10001','2022-01-06 18:12:28','{\"age\":\"20\",\"name\":\"Tom\",\"gender\":\"1\"}','cn.itcast.controller.UserController','insert','10',1);

INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('2','user','insert','success','java.lang.String','10001','2022-01-06 18:12:27','{\"age\":\"20\",\"name\":\"Tom\",\"gender\":\"1\"}','cn.itcast.controller.UserController','insert','23',1);

INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('3','user','update','success','java.lang.String','10001','2022-01-06 18:16:45','{\"age\":\"20\",\"name\":\"Tom\",\"gender\":\"1\"}','cn.itcast.controller.UserController','update','34',1);

INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('4','user','update','success','java.lang.String','10001','2022-01-06 18:16:45','{\"age\":\"20\",\"name\":\"Tom\",\"gender\":\"1\"}','cn.itcast.controller.UserController','update','13',2);

INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('5','user','insert','success','java.lang.String','10001','2022-01-06 18:30:31','{\"age\":\"200\",\"name\":\"TomCat\",\"gender\":\"0\"}','cn.itcast.controller.UserController','insert','29',3);

INSERT INTO tb_log (id, model_name, model_value, return_value, return_class, operate_user, operate_time, param_and_value, operate_class, operate_method, cost_time,source) VALUES('6','user','find','success','java.lang.String','10001','2022-01-06 18:30:31','{\"age\":\"200\",\"name\":\"TomCat\",\"gender\":\"0\"}','cn.itcast.controller.UserController','find','29',2);

6. Sharding rules

6.1 Range Sharding

image-20230601161053930

This rule shards on the id field (the columns property) and uses the rang-long algorithm (the algorithm property).

To customize the ranges, we only need to edit the autopartition-long.txt file.

image-20230601161324848

Attribute     Description
columns       The table field used for sharding
algorithm     The name of the sharding function, linking the rule to its function definition
class         The class that implements the sharding algorithm
mapFile       The corresponding external configuration file
type          Default 0; 0 means Integer, 1 means String
defaultNode   The default node. A value that sharding does not recognize is routed to the default node; if no default node is set, an unrecognized value causes an error
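
As a rough illustration of range sharding, the sketch below looks up the shard whose range contains the id. The boundaries in RANGES are hypothetical values chosen for the example, not the contents of the real autopartition-long.txt, and range_shard is an illustrative name rather than a MyCat API.

```python
from typing import Optional

# Hypothetical ranges: (inclusive start, inclusive end, data node index).
RANGES = [
    (0, 5_000_000, 0),
    (5_000_001, 10_000_000, 1),
    (10_000_001, 15_000_000, 2),
]


def range_shard(value: int, default_node: Optional[int] = None) -> int:
    """Return the index of the data node whose range contains value."""
    for start, end, node in RANGES:
        if start <= value <= end:
            return node
    # No range matched: fall back to the default node if one is configured.
    if default_node is None:
        raise ValueError(f"no range configured for value {value}")
    return default_node
```

With these assumed ranges, an id of 1 resolves to node index 0, and an id outside every range is either sent to the default node or rejected.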

6.2 Modular Sharding

Perform a modulo operation based on the specified field value and the number of nodes, and determine which shard the data belongs to according to the operation result

image-20230601161641127

image-20230601163403292

Attribute     Description
columns       The table field used for sharding
algorithm     The name of the sharding function, linking the rule to its function definition
class         The class that implements the sharding algorithm
count         Number of data nodes
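
A minimal sketch of the modulo rule, assuming that shard index 0 corresponds to the first data node listed for the table (dn4 in the tb_log example). mod_shard is an illustrative helper, not a MyCat function.

```python
def mod_shard(value: int, count: int) -> int:
    """Return the data node index for value when there are count data nodes."""
    if count <= 0:
        raise ValueError("count must be positive")
    return value % count


# Assumed order: index 0 -> dn4, index 1 -> dn5, index 2 -> dn6.
DATA_NODES = ["dn4", "dn5", "dn6"]

for row_id in range(1, 7):
    print(row_id, "->", DATA_NODES[mod_shard(row_id, len(DATA_NODES))])
```

Under that assumption, consecutive ids rotate through the three data nodes, which is why modulo sharding spreads sequential inserts evenly.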

6.3 Consistent hash

​ During the sharding operation, the hash value of the field we specified will be calculated, and then according to the hash value of the field, it will be determined which data node the current record should fall on.

Records with the same id always hash to the same shard. After a node is added, consistent hashing only relocates a portion of the keys, so most existing data stays on the shard it was already on.

image-20230601165003999

image-20230601165035831

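Attribute            Description
columns              The table field used for sharding
algorithm            The name of the sharding function, linking the rule to its function definition
class                The class that implements the sharding algorithm
seed                 Seed used to create the murmur_hash object; default 0
count                Number of database nodes to shard across; it must be specified, otherwise sharding cannot work
virtualBucketTimes   Number of virtual nodes that each physical database node is mapped to; the default is 160, i.e. there are 160 times as many virtual nodes as physical nodes, and virtualBucketTimes*count is the total number of virtual nodes
weightMapFile        Node weights; a node without a specified weight defaults to 1. Written in properties-file format, with the node index (an integer from 0 to count-1) as the key and the node weight as the value. All weights must be positive integers, otherwise 1 is used instead
bucketMapPath        Used during testing to observe how virtual nodes are distributed over physical nodes. If set, the mapping from each virtual node's murmur hash value to its physical node is written to this file line by line. There is no default; if it is not specified, nothing is output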
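
The idea of a hash ring with virtual nodes can be sketched as follows. This is not MyCat's implementation: MD5 stands in for murmur hash, and HashRing, node_for, and the virtual-node naming scheme are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stand-in hash: MD5 truncated to 64 bits (MyCat itself uses murmur hash).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, count: int, virtual_bucket_times: int = 160):
        # Place every physical node on the ring virtual_bucket_times times.
        points = []
        for node in range(count):
            for i in range(virtual_bucket_times):
                points.append((_hash(f"node-{node}-vn-{i}"), node))
        points.sort()
        self._points = points
        self._hashes = [h for h, _ in points]

    def node_for(self, key) -> int:
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._hashes, _hash(str(key)))
        return self._points[idx % len(self._points)][1]


ring3 = HashRing(count=3)
ring4 = HashRing(count=4)
moved = [k for k in range(1000) if ring3.node_for(k) != ring4.node_for(k)]
print(f"{len(moved)} of 1000 keys moved after adding a fourth node")
```

In this sketch, every key that changes place after the fourth node is added moves to the new node; keys never shuffle between the three original nodes.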

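6.4 Enumeration sharding

The possible enumeration values are listed in a configuration file, and each value directs data to a specific data node. This rule suits businesses that split data by province, gender, status, and similar fields.

If the enumeration value is 1, the row lands on the first data node; if it is 2, on the second data node; if it is 3, on the third data node.

image-20230601170245423

Default node: if the incoming value is not one of the configured enumeration values, the row is placed on the third data node by default.

The mapFile property points to an external file, which configures the enumeration values and their corresponding shard nodes.

image-20230601170456532

Attribute     Description
columns       The table field used for sharding
algorithm     The name of the sharding function, linking the rule to its function definition
class         The class that implements the sharding algorithm
mapFile       The corresponding external configuration file
type          Default 0; 0 means Integer, 1 means String
defaultNode   The default node; a value less than 0 means no default node is set, and a value greater than or equal to 0 sets the default node. During enumeration sharding, an unrecognized enumeration value is routed to the default node; if there is no default node, an unrecognized value causes an error.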
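
A minimal sketch of enumeration routing. ENUM_MAP plays the role of the external mapping file; its contents and the enum_shard helper are hypothetical and only mirror the behavior described in this section.

```python
from typing import Dict, Hashable

# Hypothetical mapping file content: enumeration value -> data node index.
ENUM_MAP: Dict[Hashable, int] = {1: 0, 2: 1, 3: 2}


def enum_shard(value: Hashable, default_node: int = -1) -> int:
    """Route value by lookup; a negative default_node means no default."""
    if value in ENUM_MAP:
        return ENUM_MAP[value]
    if default_node >= 0:
        return default_node
    raise ValueError(f"unrecognized enumeration value: {value!r}")
```

Calling enum_shard(9, default_node=2) sends the unknown value to the third data node, while enum_shard(9) with no default raises an error.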

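6.5 Application-specified algorithm

At run time the application itself decides which shard a row is routed to: the shard number is computed directly from a substring of the field value (the substring must be numeric).

For example, 0 lands on the first shard, 1 on the second shard, and 2 on the third shard.

image-20230601194700626

image-20230601194851622

Attribute          Description
columns            The table field used for sharding
algorithm          The name of the sharding function, linking the rule to its function definition
class              The class that implements the sharding algorithm
startIndex         Start index of the substring
size               Length of the substring
partitionCount     Number of partitions (shards)
defaultPartition   Default shard (used when the shard number given by the substring is not within the configured number of shards)

Insert the following data

With the configuration above, the first two characters of the id decide which data node a row goes to.

startIndex=0, size=2

0000001 is routed to the first shard, and 0100001 is routed to the second shard.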

CREATE TABLE tb_app (
  id varchar(10) NOT NULL COMMENT 'ID',
  name varchar(200) DEFAULT NULL COMMENT 'name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

insert into tb_app (id,name) values('0000001','Testx00001');
insert into tb_app (id,name) values('0100001','Test100001');
insert into tb_app (id,name) values('0100002','Test200001');
insert into tb_app (id,name) values('0200001','Test300001');
insert into tb_app (id,name) values('0200002','TesT400001');
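
How the rows above are routed can be sketched like this. startIndex=0 and size=2 come from the text; partition_count=3 and default_partition=0 are assumptions for the example, and substring_shard is an illustrative helper rather than MyCat code.

```python
def substring_shard(
    value: str,
    start_index: int = 0,
    size: int = 2,
    partition_count: int = 3,   # assumed: one partition per data node
    default_partition: int = 0, # assumed fallback partition
) -> int:
    """Use the numeric substring value[start_index:start_index+size] as the shard number."""
    shard = int(value[start_index:start_index + size])
    if 0 <= shard < partition_count:
        return shard
    # The substring names a shard that does not exist: use the default.
    return default_partition


for row_id in ["0000001", "0100001", "0100002", "0200001", "0200002"]:
    print(row_id, "-> shard index", substring_shard(row_id))
```

Under these assumptions, ids starting with 00 go to the first shard, 01 to the second, and 02 to the third.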

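6.6 Fixed hash algorithm

This algorithm is similar to decimal modulo, but it works on binary values.

For example, it takes the low 10 bits of the id by ANDing the id with 1111111111. The result of the bitwise AND ranges from 0000000000 to 1111111111, which is 0 to 1023 in decimal.

image-20230601200534450

Features

  • With modulo, consecutive values are spread across different shards; this algorithm may instead place consecutive values on the same shard, which makes transaction handling easier.
  • The distribution can be uniform or non-uniform.
  • The sharding field must be a numeric type.

The two properties below give the number of shards and the length of each shard.

In partitionCount, 2 means two shard nodes and 1 means one shard node.

The first two shard nodes each have length 256, and the last shard node has length 512.

256 × 2 + 512 × 1 = 1024

<property name="partitionCount">2,1</property>

<property name="partitionLength">256,512</property>

That is, the first two shard nodes, dn4 and dn5, each have length 256.

The dn6 shard node has length 512.

image-20230601201020315

For example, to insert a row with id 515, compute 515 & 1023 in binary and convert the result back to decimal. The result, 515, falls in the range 512-1023, so the row goes to the third data shard, as shown in the figure below.

image-20230601201839950

Attribute         Description
columns           The name of the table field used for sharding
algorithm         The name of the sharding function, linking the rule to its function definition
class             The class that implements the sharding algorithm
partitionCount    List of shard counts
partitionLength   List of shard range lengths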
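
The slot arithmetic can be checked with a small sketch. It builds a 1024-entry table from partitionCount=2,1 and partitionLength=256,512 and looks up id & 1023 in it; build_slots and fixed_hash_shard are illustrative names, not MyCat APIs.

```python
from typing import List


def build_slots(partition_count: List[int], partition_length: List[int]) -> List[int]:
    """Expand the count/length lists into a slot -> shard index table."""
    slots: List[int] = []
    shard = 0
    for count, length in zip(partition_count, partition_length):
        for _ in range(count):
            slots.extend([shard] * length)
            shard += 1
    if len(slots) != 1024:
        raise ValueError("the shard lengths must add up to 1024")
    return slots


# 2 shards of length 256, then 1 shard of length 512.
SLOTS = build_slots([2, 1], [256, 512])


def fixed_hash_shard(value: int) -> int:
    # Keep only the low 10 bits, giving a slot between 0 and 1023.
    return SLOTS[value & 1023]


print(fixed_hash_shard(515))  # slot 515 lies in 512-1023, the third shard (index 2)
```

Slots 0-255 map to the first shard, 256-511 to the second, and 512-1023 to the third, so neighbouring ids usually stay on the same shard.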

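6.7 String hash parsing

Extract the substring at the specified position of the string, hash it, and compute the shard from the hash.

image-20230601202913197

<property name="hashSlice">0:2</property> specifies which part of the string is extracted: starting at index 0 and ending at index 2 (inclusive).

image-20230601203116473

image-20230601203754656

Attribute         Description
columns           The table field used for sharding
algorithm         The name of the sharding function, linking the rule to its function definition
class             The class that implements the sharding algorithm
partitionLength   Hash modulo base; length*count=1024 (for performance reasons)
partitionCount    Number of partitions
hashSlice         The hash slice, i.e. the substring that is hashed; 0 stands for str.length(), -1 stands for str.length()-1, and a value greater than 0 stands for the number itself; it can be understood as substring(start,end), where a start of 0 simply means 0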

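6.8 Sharding by day (date)

Shard by date and a corresponding time period.

sPartionDay = 10: shards in periods of 10 days

image-20230601204533559

Note:

When the logical table was configured, three shards were specified (dn4, dn5, dn6), so the number of shards computed by the sharding rule must also be three.

From 2022-01-01 to 2022-01-30, with one period every 10 days, there are exactly three shards.

image-20230601210212215

Attribute     Description
columns       The table field used for sharding
algorithm     The name of the sharding function, linking the rule to its function definition
class         The class that implements the sharding algorithm
dateFormat    Date format
sBeginDate    Start date
sEndDate      End date; if an end date is configured, then once data passes the shard for this date, insertion wraps around and starts again from the first shard
sPartionDay   Days per partition, default 10; counting from the start date, every 10 days form one partition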
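
The date arithmetic can be sketched as below, using the 2022-01-01 to 2022-01-30 window and the 10-day period from the text. day_shard is an illustrative helper; the wrap-around for dates after the end date follows the behavior described in the table rather than MyCat's source code.

```python
from datetime import date
from typing import Optional


def day_shard(
    value: date,
    begin: date,
    end: Optional[date] = None,
    partition_days: int = 10,
) -> int:
    """Return the shard index for value, counting periods from begin."""
    if value < begin:
        raise ValueError("value is before the begin date")
    index = (value - begin).days // partition_days
    if end is not None and value > end:
        # Past the end date: wrap around and start again from the first shard.
        shard_count = (end - begin).days // partition_days + 1
        index %= shard_count
    return index


BEGIN, END = date(2022, 1, 1), date(2022, 1, 30)
for d in [date(2022, 1, 1), date(2022, 1, 11), date(2022, 1, 21), date(2022, 1, 31)]:
    print(d, "-> shard index", day_shard(d, BEGIN, END))
```

With this window the rule yields exactly three shards, and 2022-01-31 wraps back to the first one.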

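6.9 Sharding by natural month

Use this when sharding by month: each natural (calendar) month is one shard.

If a date falls after the end date, the shard is computed again from the beginning; for example, data for April lands on the first node.

image-20230601211545329

image-20230601212019007

Attribute    Description
columns      The table field used for sharding
algorithm    The name of the sharding function, linking the rule to its function definition
class        The class that implements the sharding algorithm
dateFormat   Date format
sBeginDate   Start date
sEndDate     End date; if an end date is configured, then once data passes the shard for this date, insertion wraps around and starts again from the first shard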
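
A small sketch of month-based sharding with wrap-around. The begin and end dates (2022-01-01 and 2022-03-31) are assumptions chosen so that April wraps to the first node as in the example above; month_shard is an illustrative helper.

```python
from datetime import date
from typing import Optional


def month_shard(value: date, begin: date, end: Optional[date] = None) -> int:
    """Return the shard index: one shard per calendar month since begin."""
    index = (value.year - begin.year) * 12 + (value.month - begin.month)
    if end is not None:
        # Months covered by [begin, end]; later months wrap around.
        shard_count = (end.year - begin.year) * 12 + (end.month - begin.month) + 1
        index %= shard_count
    return index


# Assumed window: January to March 2022, i.e. three shards.
BEGIN, END = date(2022, 1, 1), date(2022, 3, 31)
print(month_shard(date(2022, 4, 15), BEGIN, END))  # April wraps to shard index 0
```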

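7. MyCat management and monitoring

MyCat is middleware for database and table sharding. With it in place, the application no longer connects directly to the underlying databases; it connects only to MyCat, and MyCat connects to the underlying databases, thereby carrying out the sharding.

7.1 Principle

To split databases and table structures with MyCat, we need to configure the corresponding logical databases and logical tables in MyCat's configuration files.

When configuring a logical table, we usually specify which data nodes hold its data and which sharding rule applies.

For example, the sharding rule specified here shards on the enumeration value of the status field.

image-20230602095403646

When MyCat receives client requests and parses and executes their SQL statements, it collects statistics on them: which SQL statements were executed, how often, which databases and tables they touched, how long they took, which ones are inefficient, and the CPU, memory, and disk usage of the MyCat server.

All of the above is monitored.

What steps does an insert statement go through when we send it to MyCat?

  • Parse the SQL

  • Shard analysis

    Sharding here follows the rule in the configuration file, i.e. sharding on status

  • Route analysis

    Routing here is based on the value of status

  • Read/write splitting analysis

Suppose we execute select * from tb_user; which shard does it go to?

The steps are the same: parse the SQL, shard analysis, route analysis, read/write splitting analysis, and so on.

Shard analysis and route analysis both rely on the status field from the sharding rule in the configuration file, so we check whether the query contains the status field. Here it clearly does not, so the SQL is routed to all shard nodes, executed on all the servers at the same time, and the results are returned to MyCat.

If the status field is present, its value determines which data shard the statement belongs to and where it is routed.

If the status field is absent, the SQL is routed to every shard node.

MyCat then has to process the result sets again, mainly result merging, aggregation, sorting, and pagination.

How is select * from tb_user where status in(1,3) order by id; executed?

Parse the SQL, shard analysis, route analysis, read/write splitting analysis.

During shard analysis, however, MyCat examines the query condition, status in(1,3). status is exactly the field in our sharding rule, so 1 corresponds to the first node and 3 corresponds to the third node.

So this SQL statement is routed to the first and third nodes, and is not routed to the second node.

After the first and third nodes execute the SQL statement, they return the execution results.

In MyCat, the result sets then need to be merged, aggregated, sorted, paginated, and so on.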
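
The routing decision and the final merge step can be sketched as follows. This is a simplified model, not MyCat's internals: the node names dn1-dn3, the STATUS_TO_NODE mapping, and both helper functions are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical enumeration rule on the status column: status value -> data node.
STATUS_TO_NODE: Dict[int, str] = {1: "dn1", 2: "dn2", 3: "dn3"}
ALL_NODES: List[str] = ["dn1", "dn2", "dn3"]


def route(status_values: Optional[Iterable[int]]) -> List[str]:
    """Pick target nodes; no condition on the shard column means all nodes."""
    if status_values is None:
        return list(ALL_NODES)
    targets = {STATUS_TO_NODE[s] for s in status_values}
    return [node for node in ALL_NODES if node in targets]


def merge_sorted_by_id(per_node_rows: Dict[str, List[dict]]) -> List[dict]:
    """Merge the per-node result sets and apply the ORDER BY id step."""
    merged = [row for rows in per_node_rows.values() for row in rows]
    return sorted(merged, key=lambda row: row["id"])


print(route(None))    # select * from tb_user: every node
print(route([1, 3]))  # where status in (1,3): first and third node only
```

The merge helper shows why the middleware still has work to do after routing: each node returns its own ordered rows, and the combined result must be sorted again.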

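7.2 Management tools

MyCat opens two ports by default, which can be changed in server.xml.

  • 8066: data access port, used for DML and DDL operations
  • 9066: database management port, i.e. the MyCat service management and control function, used to manage the state of the whole MyCat cluster

To connect to MyCat locally, -h 192.168.200.210 can be omitted; to connect remotely, add -h followed by the server's IP address. The port is passed with an uppercase -P.

mysql -h 192.168.200.210 -P 9066 -uroot -p123456

Management commands

Command             Meaning
show @@help         View the help documentation of the Mycat management tool
show @@version      View the Mycat version
reload @@config     Reload the Mycat configuration files
show @@datasource   View Mycat data source information
show @@datanode     View MyCat's existing shard node information
show @@threadpool   View Mycat's thread pool information
show @@sql          View the executed SQL
show @@sql.sum      View statistics on the executed SQL

The above is command-line based and not very intuitive; a management tool gives a clearer view.

Mycat-web (Mycat-eye) provides monitoring for mycat-server, and its functions are not limited to mycat-server. It monitors Mycat and MySQL over JDBC connections, and monitors the CPU, memory, network, and disk of remote servers (currently Linux only).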

​ Mycat-eye needs to rely on zookeeper during operation, so zookeeper needs to be installed first.

7.2.1 Installation

zookeeper installation

Mycat-web installation

Video: MyCat Surveillance 1

7.3 MyCat monitoring

Configure MyCat information

image-20230602114132193

Origin blog.csdn.net/weixin_51351637/article/details/131006392