Read-write separation for MySQL with MyCat

1. Data segmentation

1.1 Reason

  In traditional business scenarios, data volumes are small and concurrency is low, so a single-machine database can generally meet business needs. In the Internet age, both data volume and concurrency have grown dramatically, which poses many challenges for a single-machine database. To remove the bottleneck of the stand-alone database, we need to segment it, turning one large database into several smaller ones.

1.2 Data segmentation

  Data segmentation distributes data that was originally stored in a single database across multiple databases (generally on multiple hosts) according to some rule, so as to reduce the load on any single database. It is generally divided into vertical segmentation and horizontal segmentation.

1.2.1 Vertical segmentation

  Vertical segmentation places different tables or schemas in different databases. Its rules are simple and easy to implement, and the split can follow business modules, so the businesses are loosely coupled and have little influence on each other.
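
As a rough illustration of vertical segmentation, the sketch below maps each table to the database of the business module that owns it. The table and database names are hypothetical and are not taken from the original setup; this is only meant to show the idea.

```python
from typing import Iterable

# Hypothetical mapping from table name to the database that owns it.
# All names here are illustrative assumptions.
TABLE_TO_DATABASE = {
    "users": "user_db",
    "user_addresses": "user_db",
    "orders": "order_db",
    "order_items": "order_db",
    "products": "product_db",
}


def database_for_table(table: str) -> str:
    """Return the database that holds `table` under vertical segmentation."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None


def needs_cross_database_access(tables: Iterable[str]) -> bool:
    """True when the tables live in more than one database, so a plain JOIN is not possible."""
    return len({database_for_table(t) for t in tables}) > 1
```

A query that touches only orders and order_items stays inside order_db, while one that touches users and orders spans two databases and has to be assembled by the application.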

Advantages of vertical segmentation:

  • After the split, business boundaries and split rules are clear;
  • Systems are easy to extend and integrate;
  • Data maintenance is simple.

Disadvantages of vertical segmentation:

  • Tables in different databases cannot be joined directly; the data has to be combined through service interfaces, which increases system complexity;
  • Cross-database transactions have to be handled;
  • A large table is still a single-database performance bottleneck after vertical segmentation.

1.2.2 Horizontal segmentation

  Horizontal segmentation splits the rows of a single table across tables in different databases according to some rule. It is more complicated than vertical segmentation.
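
A minimal sketch of one possible splitting rule, assuming rows are sharded by a numeric user_id with a simple modulo. The column name and the shard count are assumptions for illustration only; real rules are usually chosen per business table.

```python
SHARD_COUNT = 3  # assumed number of shard databases


def shard_for(user_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Modulo rule: rows with the same user_id always land on the same shard."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return user_id % shard_count


def split_rows(rows, shard_count=SHARD_COUNT):
    """Group rows (dicts with a 'user_id' key) by their target shard index."""
    shards = {i: [] for i in range(shard_count)}
    for row in rows:
        shards[shard_for(row["user_id"], shard_count)].append(row)
    return shards
```

With this rule, changing the shard count changes where most rows belong (for example, user_id 5 maps to shard 2 with three shards but to shard 1 with four), which is one reason later expansion requires data migration.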

Advantages of horizontal segmentation:

  • It removes the single-database bottleneck caused by large data volumes and high concurrency;
  • The splitting rules are well encapsulated and almost transparent to the application, so developers do not need to care about the details of the split;
  • It improves the stability and load capacity of the system.

Disadvantages of horizontal segmentation:

  • Splitting rules are difficult to abstract;
  • Data in different databases cannot be queried together directly;
  • Transaction consistency across shards is difficult to guarantee;
  • When the system has to be expanded again later, data migration and maintenance are difficult.

2. Read and write separation

  Most Internet services read far more than they write, so read operations are usually the first database bottleneck. If we want to scale read performance roughly linearly, eliminate read-write lock contention, and improve write performance, we can adopt a read-write separation architecture.

  Read-write separation is generally built on the database's master-slave hot-standby (replication) feature: the first database server (the master) handles inserts, deletes, and updates, while the second (the slave) mainly serves reads.

  Moving the bulk of read operations to a dedicated read database greatly eases the load on the write database and improves read response times. But is read-write separation a "silver bullet" for database performance bottlenecks? What if synchronization between master and slave is delayed, or stops altogether? This is its drawback: when master-slave synchronization fails or lags significantly, the data in the write database and the read database become inconsistent. For systems with strict real-time requirements, users may not be able to accept this inconsistency.
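
The trade-off above can be sketched as a small routing function. This is a generic illustration, not MyCat's implementation: the class name, the lag threshold, and the naive way of detecting a read statement are all assumptions.

```python
class ReadWriteRouter:
    """Route writes to the master and reads to a replica unless it lags too far."""

    def __init__(self, master, replica, max_lag_seconds=1.0):
        self.master = master
        self.replica = replica
        self.max_lag_seconds = max_lag_seconds  # assumed tolerance for stale reads

    def route(self, sql, replica_lag_seconds, require_fresh=False):
        """Return the server that should execute `sql`.

        replica_lag_seconds is None when replication is broken.
        """
        # Naive classification: only SELECT/SHOW statements count as reads.
        is_read = sql.lstrip().lower().startswith(("select", "show"))
        if not is_read:
            return self.master
        replica_unusable = (
            replica_lag_seconds is None
            or replica_lag_seconds > self.max_lag_seconds
        )
        if require_fresh or replica_unusable:
            return self.master  # fall back to the master rather than return stale data
        return self.replica
```

Reads that must see the latest write (for example, right after an update by the same user) can pass require_fresh=True, at the cost of putting that read load back on the master.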

3. Overview of MyCat

  What is MyCat? By definition and classification, it is an open-source distributed database system. To front-end users it looks like a database proxy that can be accessed with MySQL clients and command-line tools, while its back end communicates with multiple MySQL servers over the native MySQL protocol. MyCat's core function is database and table sharding: a large table is split horizontally into N small tables, which are stored in the back-end MySQL databases.

  MyCat is no longer purely a MySQL proxy. Its back end supports mainstream databases such as MySQL, Oracle, SQL Server, and DB2, as well as NoSQL stores such as MongoDB. Whatever storage is used in the back end, front-end users see a traditional database in MyCat that supports standard SQL, which reduces development difficulty and speeds up development.

  • For DBAs, MyCat can be understood like this:
    MyCat is MySQL, and the MySQL servers connected behind MyCat play the role of storage engines such as MyISAM or InnoDB. MyCat itself therefore stores no data; the data lives on the MySQL servers behind it, and MySQL guarantees its reliability and transactions.
  • For developers, MyCat can be understood like this:
    MyCat is a database service that is approximately equal to MySQL. You connect to MyCat the same way you connect to MySQL, and in most cases you can keep using common ORM frameworks. For sharded tables, however, standard SQL statements are recommended for the best performance.
  • For architects, MyCat can be understood like this:
    MyCat is a powerful database middleware. It can be used not only for read-write separation and for database and table sharding, but also for disaster recovery and backup, cloud platform construction, and so on, which makes the architecture highly adaptable and flexible.

The above content comes from the teaching notes of the Java Architect course on Moke.com.

Basic concepts in MyCat
  • Logical database (schema)
    Database middleware can be regarded as a logical database made up of one or more database clusters. Developers do not need to know that the middleware exists; they only need the concept of a database.
  • Logical table (table)
    For the application, the table it reads from and writes to is the logical table. The data in a logical table can be split horizontally and distributed across different shard databases. Developers only operate on the logical table; the sharding details behind it are transparent to them.
  • Sharded table
    A table whose data volume is so large that it has to be split horizontally is called a sharded table. A table that is not split is called a non-sharded table.
  • Global table
    Real systems contain many dictionary tables. To avoid cross-database queries, these tables, which do not need sharding, are copied to every shard database through data redundancy. Such a table is called a global table.
  • Shard node (dataNode)
    After the data is split, one large table is spread over different shard databases. The database that holds each shard is called a shard node.
  • Node host (dataHost)
    After the data is split, each shard node does not necessarily occupy its own physical host; several shard nodes may live on the same host. The host on which shard nodes reside is called the node host. To avoid being limited by the concurrency of a single host, place shard nodes with heavy read and write load on different node hosts where possible.
  • Sharding rule (rule)
    Splitting a large table into shards requires a rule: data is assigned to a particular shard according to some business logic. This rule is called the sharding rule.
  • Global sequence number (sequence), i.e. a distributed global ID
    After data is split horizontally, the ID of every record must still be unique across all shards. The auto-increment column of each individual table can no longer guarantee this, so an external mechanism is needed to provide unique identifiers. That mechanism is called the global sequence number.
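
To make the global sequence idea concrete, here is a minimal Snowflake-style sketch: each ID combines a millisecond timestamp, a node ID, and a per-millisecond counter. This is a generic technique shown under assumed bit widths, not MyCat's own sequence implementation; the class name and parameters are hypothetical.

```python
import threading
import time


class GlobalIdGenerator:
    """Snowflake-style IDs: millisecond timestamp | node id | per-millisecond counter."""

    NODE_BITS = 10  # assumed: up to 1024 generator nodes
    SEQ_BITS = 12   # assumed: up to 4096 IDs per node per millisecond

    def __init__(self, node_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= node_id < (1 << self.NODE_BITS):
            raise ValueError("node_id out of range")
        self.node_id = node_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & ((1 << self.SEQ_BITS) - 1)
                if self._seq == 0:
                    # Counter exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            shift = self.NODE_BITS + self.SEQ_BITS
            return (now << shift) | (self.node_id << self.SEQ_BITS) | self._seq
```

Because the node ID is part of every value, two generators with different node IDs never produce the same ID, so no coordination between shards is needed at insert time.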

4. Read-write separation with MyCat

4.1 Environment preparation

  To implement read-write separation with MyCat, you first need a MySQL master-slave environment. For how to build one, refer to "MySql5.7 database installation and master-slave synchronization configuration".

4.2 Download MyCat

MyCat download link: http://dl.mycat.org.cn/1.6.7.4/Mycat-server-1.6.7.4-release/

Or download it with wget:

wget http://dl.mycat.org.cn/1.6.7.4/Mycat-server-1.6.7.4-release/Mycat-server-1.6.7.4-release-20200105164103-linux.tar.gz

4.3 Extract

tar -zxvf Mycat-server-1.6.7.4-release-20200105164103-linux.tar.gz

[Figure: contents of the extracted MyCat directory]

4.4 Configure server.xml

  This file mainly configures the MyCat accounts, their passwords, and the logical database (schema) names they may access.

<mycat:server xmlns:mycat="http://io.mycat/">
	<system>
		<!-- 0 = password required to log in, 1 = no password required; default is 0. If set to 1, a default account must be specified. -->
		<property name="nonePasswordLogin">0</property>
		<!-- Other properties omitted; see the full server.xml for details. -->
	</system>
	
	<!-- Global SQL firewall settings -->
	<!-- The whitelist accepts the wildcards % or * -->
	<!-- e.g. <host host="127.0.0.*" user="root"/> -->
	<!-- e.g. <host host="127.0.*" user="root"/> -->
	<!-- e.g. <host host="127.*" user="root"/> -->
	<!-- e.g. <host host="1*7.*" user="root"/> -->
	<!-- With any of these entries, 127.0.0.1 can log in as root -->
	<firewall>
	   <whitehost>
	      <host host="1*7.0.0.*" user="root"/>
		  <host host="*" user="root" />
	   </whitehost>
       <blacklist check="false">
       </blacklist>
	</firewall>
	<!-- User for write operations: name is the user name, password is the password, schemas is the logical database name -->
	<user name="root" defaultAccount="true">
		<property name="password">123456</property>
		<property name="schemas">db_test</property>
		<property name="defaultSchema">db_test</property>
		<!-- defaultSchema is tried before a "No MyCAT Database selected" error is raised; if it is not set, it is null and the error is reported -->
		
		<!-- Table-level DML privilege settings -->
		<!-- 		
		<privileges check="false">
			<schema name="TESTDB" dml="0110" >
				<table name="tb01" dml="0000"></table>
				<table name="tb02" dml="1111"></table>
			</schema>
		</privileges>		
		 -->
	</user>
	<!-- The "user" account; readOnly=true marks it as an account for read operations only -->
	<user name="user">
		<property name="password">123456</property>
		<property name="schemas">db_test</property>
		<property name="readOnly">true</property>
		<property name="defaultSchema">db_test</property>
	</user>
</mycat:server>

4.5 Configure schema.xml

  The schema.xml file mainly configures the mapping between MyCat logical databases and the real MySQL databases. The following elements are involved:

  • schema: a MyCat logical database. Its name corresponds to the schemas value in server.xml, and its dataNode attribute points to the data node that it uses.
  • dataNode: a data node, which ties a dataHost (the real database server configuration) to a real database name. name is the node name referenced by schema, dataHost names the dataHost element that holds the real connection settings, and database is the name of the real database.
  • dataHost: the configuration of the real database servers, including the read and write strategy attributes. It contains the heartbeat element, the writeHost entries, and so on.
  • writeHost: the write database. A dataHost may contain several writeHost elements, and each writeHost may contain readHost elements.
  • readHost: a read database.

  The dataHost element has the attributes balance, writeType, and switchType, with the following meanings:

balance attribute:

  • balance="0": read-write separation is off; all reads go to the currently available writeHost.
  • balance="1": all readHosts and standby writeHosts take part in load balancing SELECT statements. For example, in a dual-master, dual-slave setup (M1->S1, M2->S2, with M1 and M2 as master and standby for each other), M2, S1, and S2 normally all take part in load balancing SELECT statements.
  • balance="2": all reads are distributed randomly across the writeHosts and readHosts.
  • balance="3": all reads are distributed randomly to the readHosts under the writeHost; the writeHost takes no read load. Note that balance="3" is available only in version 1.4 and later, not in 1.3.
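
The four balance values can be summarized as a function that returns the hosts eligible to serve a SELECT. This is only a model of the descriptions above; the function and parameter names are hypothetical, and fallback behavior when no candidate is alive is deliberately ignored.

```python
def read_candidates(balance, active_write_host, standby_write_hosts, read_hosts):
    """Hosts eligible to serve SELECT statements for a given balance value."""
    if balance == 0:
        # Read-write separation off: reads go to the active writeHost.
        return [active_write_host]
    if balance == 1:
        # All readHosts plus standby writeHosts; the active writeHost is excluded.
        return list(read_hosts) + list(standby_write_hosts)
    if balance == 2:
        # Every writeHost and readHost takes reads.
        return [active_write_host] + list(standby_write_hosts) + list(read_hosts)
    if balance == 3:
        # Only readHosts; writeHosts carry no read load.
        return list(read_hosts)
    raise ValueError(f"unsupported balance value: {balance}")
```

For the dual-master, dual-slave example, read_candidates(1, "M1", ["M2"], ["S1", "S2"]) yields S1, S2, and M2, matching the description of balance="1".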

writeType attribute:

  • writeType="0": all writes go to the first configured writeHost. If it fails, MyCat switches to the second writeHost that is still alive; after a restart, the host that was switched to remains in use. The switch is recorded in the file dnindex.properties.
  • writeType="1": all writes are sent randomly to the configured writeHosts.
  • writeType="2": not implemented.
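
The writeType="0" behavior described above can be modeled roughly as follows. This is a sketch of the description, not MyCat's code: the function name is hypothetical, and current_index stands in for the value the text says is persisted in dnindex.properties.

```python
def choose_write_host(write_hosts, alive, current_index=0):
    """Return the index of the writeHost to use under writeType="0".

    write_hosts: host names in configuration order.
    alive: set of host names that currently pass the heartbeat check.
    current_index: index in use so far (persisted across restarts).
    """
    if write_hosts[current_index] in alive:
        return current_index  # keep using the current host, even after a restart
    for offset in range(1, len(write_hosts)):
        candidate = (current_index + offset) % len(write_hosts)
        if write_hosts[candidate] in alive:
            return candidate  # switch to the next host that is still alive
    raise RuntimeError("no writeHost is alive")
```

Note that once the index has moved to the second host, it stays there even when the first host comes back, which mirrors "the host that was switched to remains in use".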

switchType attribute:

  • switchType="-1": no automatic switching.
  • switchType="1": the default; automatic switching.
  • switchType="2": whether to switch is decided by the state of MySQL master-slave synchronization.

  Our main purpose here is to achieve read-write separation, so the final configuration is as follows:

<mycat:schema xmlns:mycat="http://io.mycat/">
	<!-- name must match the schema name in server.xml; the dataNode attribute must also be set -->
	<schema name="db_test" checkSQLschema="true" sqlMaxLimit="100" dataNode="dn1"></schema>
	<!-- name can be chosen freely and is referenced from the schema element; dataHost refers to the name of a dataHost element; database is the real (write) database to operate on -->
	<dataNode name="dn1" dataHost="db_host" database="test" />
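	<!-- Set balance="3" so that all read requests are distributed randomly to the readHosts under the writeHost and the writeHost takes no read load; otherwise the write database might serve reads, which would interfere with the test later -->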
	<dataHost name="db_host" maxCon="1000" minCon="10" balance="3"
			  writeType="0" dbType="mysql" dbDriver="native" switchType="1"  slaveThreshold="100">
		<heartbeat>select user()</heartbeat>
		<!-- Write database -->
		<writeHost host="hostM1" url="192.168.1.8:3306" user="root" password="123456">
			<!-- Read database -->
			 <readHost host="hostS2" url="192.168.1.9:3306" user="root" password="123456" />
		</writeHost>
	</dataHost>
</mycat:schema>
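
Before starting MyCat, it can help to check that the references in schema.xml resolve. The sketch below uses Python's standard xml.etree.ElementTree to follow schema -> dataNode -> dataHost and list the write and read endpoints. The summarize helper is hypothetical, and the XML is embedded as a string (without the Chinese comments) purely so the example is self-contained.

```python
import xml.etree.ElementTree as ET

SCHEMA_XML = """<mycat:schema xmlns:mycat="http://io.mycat/">
    <schema name="db_test" checkSQLschema="true" sqlMaxLimit="100" dataNode="dn1"></schema>
    <dataNode name="dn1" dataHost="db_host" database="test" />
    <dataHost name="db_host" maxCon="1000" minCon="10" balance="3"
              writeType="0" dbType="mysql" dbDriver="native" switchType="1" slaveThreshold="100">
        <heartbeat>select user()</heartbeat>
        <writeHost host="hostM1" url="192.168.1.8:3306" user="root" password="123456">
            <readHost host="hostS2" url="192.168.1.9:3306" user="root" password="123456" />
        </writeHost>
    </dataHost>
</mycat:schema>"""


def summarize(schema_xml):
    """Resolve each logical schema to its real database and its write/read endpoints."""
    root = ET.fromstring(schema_xml)
    nodes = {n.get("name"): n for n in root.findall("dataNode")}
    hosts = {h.get("name"): h for h in root.findall("dataHost")}
    summary = {}
    for schema in root.findall("schema"):
        node = nodes[schema.get("dataNode")]  # KeyError here means a dangling dataNode reference
        host = hosts[node.get("dataHost")]    # KeyError here means a dangling dataHost reference
        summary[schema.get("name")] = {
            "database": node.get("database"),
            "balance": host.get("balance"),
            "write": [w.get("url") for w in host.findall("writeHost")],
            "read": [r.get("url") for r in host.findall("writeHost/readHost")],
        }
    return summary
```

Calling summarize(SCHEMA_XML) should report db_test mapped to the real database test, with 192.168.1.8:3306 as the write endpoint and 192.168.1.9:3306 as the read endpoint.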

4.6 Start MyCat

  In the bin directory of mycat, execute the startup command:

./mycat start

4.7 Test

  After MyCat has started successfully, connect to it with Navicat in the same way as to MySQL, except that MyCat's default port is 8066. You also need to configure the whitelist in server.xml; otherwise the login will be rejected.

  Establish one connection as root and one as user (both accounts were configured in server.xml). When we add data on the master database 192.168.1.8, both connections see the same data. If we instead add data directly on the slave database 192.168.1.9, only the user connection can see that data (this mainly verifies that reads are served by the read database; remember to set balance="3" for this test).

5. Summary

  In this post we covered some basic MyCat concepts and implemented read-write separation with MyCat, which gives us a first understanding of how MyCat is used. In follow-up posts we will keep experimenting and recording the results, including database and table sharding.

Origin blog.csdn.net/hou_ge/article/details/112919428