[Introduction to Open Source Distributed Database Middleware MyCat]

1. Why do you need MyCat?

Even in the cloud computing era, traditional relational databases have inherent drawbacks, yet NoSQL databases cannot fully replace them. If a traditional database could be scaled out easily and its data split across machines, the performance limits of a single machine (a single database) could be avoided. The goal of MyCat is to migrate existing stand-alone databases and applications to the "cloud" smoothly and at low cost, relieving the data bottleneck that appears when stored data and business scale grow rapidly.



 

 

2. What is data segmentation?

Simply put, data segmentation (also called sharding or splitting) means that data we would otherwise store in one database is distributed across multiple databases (hosts) according to certain conditions, so that the load on any single device is spread out.

Depending on the type of rule, data can be split in two ways. One is to place different tables (or schemas) in different databases (hosts), which is called vertical splitting. The other is to split the rows of a single table across multiple databases (hosts) according to some condition based on the logical relationships in the data, which is called horizontal splitting.

The biggest feature of vertical segmentation is that the rules are simple and the implementation is more convenient. It is especially suitable for systems with very low coupling between various services, little mutual influence, and very clear business logic. In such a system, it is easy to split tables used by different business modules into different databases. Splitting according to different tables has less impact on the application, and the splitting rules will be simpler and clearer.

Horizontal segmentation is somewhat more complicated than vertical segmentation. Because different rows of the same table must go to different databases, the splitting rules are more complicated for the application than splitting by table name, and later data maintenance is also more complicated.
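
To make the difference concrete, here is a minimal sketch of the two routing styles. The table names, node names, and the modulo rule are all hypothetical and are not taken from MyCat's configuration; they only illustrate that vertical routing looks at the table name while horizontal routing looks at a value inside the row.

```python
# Hypothetical mappings, for illustration only.
VERTICAL_RULES = {"orders": "db_orders", "users": "db_users"}
HORIZONTAL_NODES = ["dn1", "dn2", "dn3"]


def route_vertical(table: str) -> str:
    """Vertical split: the table name alone decides the database."""
    return VERTICAL_RULES[table]


def route_horizontal(user_id: int) -> str:
    """Horizontal split: a value inside the row decides the database."""
    return HORIZONTAL_NODES[user_id % len(HORIZONTAL_NODES)]
```

With the vertical rule, every query on a given table goes to one place. With the horizontal rule, the application or middleware has to know the sharding column for each row, which is where the extra complexity comes from.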

[The biggest difficulty in a distributed setup: suppose each query fetches the first 10 rows. With multiple DataNodes, if the central control node does not merge results properly, two consecutive runs of the same query may return different data, much like a random query, and the distribution loses its point. Added by gaojignsong]
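
One way a coordinating node can keep a "first N rows" query stable is to ask every shard for its own first N rows under the same ordering and then merge them. The sketch below assumes rows are plain comparable values and that shards are in-memory lists; the function name `top_n` is hypothetical, and this is not a description of how MyCat itself merges results.

```python
import heapq
from itertools import islice


def top_n(shards, n):
    """Merge per-shard results into one globally ordered first page."""
    # Each shard contributes at most n rows, sorted by the same key.
    per_shard = [sorted(rows)[:n] for rows in shards]
    # heapq.merge walks the sorted inputs in global order.
    return list(islice(heapq.merge(*per_shard), n))
```

Because the result depends only on the ordering key and not on which shard answers first, repeating the query returns the same page as long as the data has not changed.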



 

3. What is MyCat?

A completely open source, large database cluster for enterprise application development

An enhanced database that supports transactions, ACID, and can replace MySQL

An enterprise-grade database that can be thought of as a MySQL cluster to replace an expensive Oracle cluster

A new kind of SQL server that integrates memory caching technology, NoSQL technology, and HDFS big data

A new generation of enterprise-level database products combining traditional databases and new distributed data warehouses

A novel database middleware product

MyCat was developed on the basis of Alibaba's open source product Cobar. Cobar's stability, reliability, excellent architecture and performance, and many mature use cases gave MYCAT a strong starting point from the very beginning. Standing on the shoulders of giants, we can see further. Excellent open source projects and innovative ideas from across the industry have been woven into MYCAT's genes, putting MYCAT ahead of similar open source projects in many respects and even ahead of some commercial products.

By definition and classification, it is an open source distributed database system and a server that implements the MySQL protocol. Front-end users can treat it as a database proxy and access it with MySQL client tools and the command line. On the back end, it can talk to multiple MySQL servers over the native MySQL protocol, or to most mainstream database servers over JDBC. Its core function is sharding: a large table is split horizontally into smaller tables that are stored in back-end MySQL servers or other databases.

As of the current version, MyCat is no longer purely a MySQL proxy. Its backend supports mainstream databases such as MySQL, SQL Server, Oracle, DB2, and PostgreSQL, as well as MongoDB, a newer NoSQL store, with support for more types of storage to come.

 

4. The development history of MyCat

In 2013, community users of Alibaba's Cobar ran into some serious problems and limitations. After an initial round of improvements, the first-generation improved version, Mycat, was born. Once Mycat was open sourced, some Cobar users joined its development, and Mycat eventually grew into community-based open source software maintained by strong architects and senior developers from many software companies.

The predecessor of Mycat was OpencloudDB, which was intended for developing MycloudOA, a cloud platform for SaaS enterprise office software. Within half a year the group attracted a large crowd of IT people: a big volunteer team with more than 10 holding the title of "consultant", more than ten "architects", and more than 20 in "R&D". Yet fewer than 3 people ever submitted documentation or a small amount of code. Everyone else talked very professionally about requirements, frameworks, and the market, and in the end they all became seasoned bystanders, so MycloudOA died before it got anywhere.

OpencloudDB was renamed Mycat. One reason is that the name is easy to remember; another is the plan to join Apache in the future. Apache Tomcat is also a cat, and going by age Tomcat can be regarded as Mycat's older cousin. As for looks and figure, Tomcat's cousin is surely the cutest girl in the East, even though the Mycat logo designed by Rainbow heroes currently looks 100% like a tomboy.

 

 

5. Where Cobar is not satisfactory

First secret: Cobar will play dead (hang)

select sleep(500) from company; this SQL will execute and wait for 500 seconds

 

Second secret: the high availability trap

Behind the secret of Cobar's hangs lies an even more "powerful" secret: after a hang, Cobar is prone to frequent master-slave switching.

 

 

Third secret: automatic switching that looks beautiful

A very attractive feature of Cobar is high availability. It works like this: a data node (DataNode) is configured with references to two DataSources and runs heartbeat checks against them. When the heartbeat check on the first DataSource fails, Cobar automatically switches to the second; if the second then fails, it automatically switches back to the first. Everything looks beautiful: unattended, with almost no downtime.
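
The switching rule described above can be sketched in a few lines. This is a simplified model, not Cobar's code: the class `DataNode`, its `heartbeat` method, and the `is_alive` callback are hypothetical names chosen for the illustration.

```python
class DataNode:
    """A data node that points at two data sources and uses one at a time."""

    def __init__(self, sources):
        self.sources = list(sources)
        self.active = 0    # index of the data source currently in use
        self.switches = 0  # how many times the node has switched

    def heartbeat(self, is_alive) -> str:
        """Switch to the other source when the active one fails its check."""
        if not is_alive(self.sources[self.active]):
            self.active = (self.active + 1) % len(self.sources)
            self.switches += 1
        return self.sources[self.active]
```

The sketch also shows the trap: if heartbeat checks fail on both sources, for instance because they time out rather than because the servers are really down, the node keeps flipping back and forth on every check.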

 

Fourth secret: only half of NIO is implemented

Using NIO is standard practice for Java server programming, and nobody in the industry questions it. A Java programmer who has never heard of NIO would be embarrassed to claim to be a Java developer. So it is no surprise that Cobar adopted NIO; the surprise is that it uses only half of it.

Cobar is essentially a "database router". The client connects to Cobar and sends an SQL statement. Cobar forwards the statement over the socket that connects it to the back-end MySQL, then writes the result back to the client's socket.

 

 

Fifth secret: blocking, and more blocking

Cobar is essentially similar to a switch: data returned by the back-end MySQL is processed, written to the front-end connection, and returned to the client. Both front-end and back-end connections therefore have a "write queue" for buffering, and data returned by the back-end is placed on the FrontConnection's write queue to wait to be sent. Normally the back-end writes faster than the front-end consumes, and with cross-shard queries the gap is even more obvious, so the writing thread blocks here as well.

There are two solutions. The first is to increase the length of each front-end connection's "write queue" to reduce how often blocking occurs, but this just pushes the problem onto the user. That would be acceptable if users could learn that the default write queue is too small and tune it by hand for their situation, but Cobar's code never surfaces the problem, for example by writing a warning log saying that the queue is full and recommending a larger queue size.
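
The missing warning is easy to picture. The following sketch uses Python's standard `queue.Queue` as a stand-in for a front-end write queue; the function name `offer` and the logger name are hypothetical, and the code only illustrates the idea of reporting a full queue instead of blocking silently.

```python
import logging
import queue

log = logging.getLogger("write_queue")


def offer(write_queue: queue.Queue, packet: bytes) -> bool:
    """Try to enqueue without blocking; warn when the queue is full."""
    try:
        write_queue.put_nowait(packet)
        return True
    except queue.Full:
        # Surface the condition so an operator knows the queue is too small.
        log.warning(
            "write queue full (maxsize=%d); consider increasing it",
            write_queue.maxsize,
        )
        return False
```

A caller can then decide whether to retry, block, or apply back-pressure, and the log at least tells the operator which setting to look at.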

 

 

 

Sixth secret: the love-hate SQL batch mode

Just as the two faces of a coin cannot be separated, and a magnet has a north and a south pole no matter how it is cut, so it is with love: love and hate are always entangled and cannot be sorted out. Cobar's SQL batch mode is a command with exactly this love-hate character.

Executing a batch over multiple concurrent connections makes it faster, which is a pleasant surprise. But running on separate database connections concurrently brings an unexpected side effect: the transaction spans connections, so one part may commit successfully while another part fails, leaving dirty data.
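
A toy model shows why independent commits on separate connections can leave dirty data. The `Connection` class below is a hypothetical stand-in for a back-end connection with its own local transaction; it is not a real driver API, and the failure is simulated with a flag.

```python
class Connection:
    """Stand-in for one back-end connection with its own local transaction."""

    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.committed = False

    def commit(self):
        if self.fail_on_commit:
            raise RuntimeError(f"commit failed on {self.name}")
        self.committed = True


def commit_all(connections):
    """Commit each connection independently; there is no global rollback."""
    failed = []
    for conn in connections:
        try:
            conn.commit()
        except RuntimeError:
            failed.append(conn.name)
    return failed
```

If `commit_all` returns a non-empty list, the connections that already committed cannot be undone, which is exactly the partial-commit state described above.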

 

 

Seventh secret: a deep courtyard locks in the clear autumn

Deadlocks seem to belong to distant memory for most of us. We have only met them in textbooks, where we read about their causes and how to resolve them; perhaps only DBAs occasionally run into a database deadlock. Yet many people who used Cobar later kept hitting a strange problem: an SQL statement would get no response for a long time. Puzzled and out of options, they turned to a DBA and found that the database was deadlocked, and that it happened fairly often.

To understand why Cobar raises the probability of database deadlocks, we have to analyze the source code. When one SQL statement has to be split into several statements that run on several shards, they are executed concurrently: N statements run on N shards at the same time. Abstracted into the textbook transaction model, this is a thread that must lock N resources and operate on them before it can end the transaction. When the order in which those N resources are locked is random, deadlock comes easily, and Cobar happens not to guarantee the locking order of the N resources.
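
The textbook remedy for this model is to acquire the N resources in one fixed global order. The sketch below is an analogy that uses in-process `threading.Lock` objects in place of database locks; the shard names and the function `run_on_shards` are hypothetical, and this is not taken from Cobar or Mycat.

```python
import threading

# One lock per shard; the names are illustrative only.
locks = {shard: threading.Lock() for shard in ("dn1", "dn2", "dn3")}


def run_on_shards(shards, work):
    """Acquire shard locks in sorted order, run work, then release them."""
    ordered = sorted(set(shards))
    for shard in ordered:
        locks[shard].acquire()
    try:
        return work(ordered)
    finally:
        # Release in reverse order of acquisition.
        for shard in reversed(ordered):
            locks[shard].release()
```

Because every caller locks `dn1` before `dn2` before `dn3`, two callers can never each hold a lock the other one is waiting for.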

 

 

 

Eighth secret: unexpected connection pooling

The database connection pool is probably the most relied-upon "resource pool" after the thread pool. Its importance is self-evident, and the industry has produced many well-known open source connection pools. For one MySQL server, the maximum number of connections is usually between 1000 and 3000, which is enough for common applications, since each application usually has its connections to itself. Here is the problem: Cobar manages its connection pools to the back-end MySQL per shard (per Database) rather than sharing one pool across the whole MySQL server. Take a table with 100 shards as an example. If 50 of the shards are on Server1, the connections to Server1 are divided into 50 pools of about 20 connections each, and these pools cannot share connections with one another. With sharded tables, our concurrency is therefore severely weakened: other pools clearly have connections to spare, yet you can only wait on the pool that has run dry.
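
The arithmetic behind the example is simple. The sketch assumes a server limit of 1000 connections, which is the low end of the range quoted above and is the value implied by 50 pools of about 20 connections; the function name is hypothetical.

```python
def per_shard_pool_size(server_max_connections: int, shards_on_server: int) -> int:
    """Connections available to one shard when the server limit is split evenly."""
    return server_max_connections // shards_on_server


# 1000 connections split across 50 shard-level pools.
pool_size = per_shard_pool_size(1000, 50)
```

Under these assumptions a burst of queries against a single shard can use at most 20 connections, whereas a pool shared across the whole server would let the same burst draw on all 1000.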

 

 

Ninth secret: helpless hot loading

One of Cobar's advantages is that its configuration files can be hot loaded without restarting the system. There are several problems, though, and one of them annoys many people: every reload disconnects the back-end database connections once, which interrupts business. Yet most of the time people change the configuration only to modify sharding table definitions or rules, or to add a sharding table or shard definition, without touching the database configuration at all.

 

 

Tenth secret: no read-write separation support

Cobar does not support read-write separation. People familiar with this kind of middleware may react with surprise, because the most basic job of a MySQL proxy is to provide read-write separation to raise the system's query throughput and performance. But it is true: Cobar does not support it, and with Cobar's configuration file format it is very troublesome to achieve. Some may argue that read-write separation is not important, because replication latency cannot be guaranteed and there is no way to be sure that data just written can be read back. In practice, however, almost every Mycat user uses read-write separation, and volunteers later added the ability to force a query to run on the master (write) database to solve exactly that problem.
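
A minimal sketch of read-write separation with a "force master" escape hatch might look like the following. The class `ReadWriteRouter`, the `force_master` flag, and the node names are hypothetical; this does not reproduce Mycat's actual configuration or hint syntax, and the check for reads is deliberately naive.

```python
import itertools


class ReadWriteRouter:
    """Send reads to replicas in round-robin order and everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str, force_master: bool = False) -> str:
        # Naive classification: anything starting with SELECT counts as a read.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and not force_master:
            return next(self._replicas)
        return self.master
```

Passing `force_master=True` for a read that must see a row written a moment ago sidesteps replication lag, at the cost of putting that query back on the write node.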

 

 

 

Eleventh secret: uncontrollable master-slave switching

Cobar provides MySQL master-slave switching, which is practical and convenient, but you cannot turn it on or off. Sometimes we do not want it to switch automatically because, so far, there is no good way to confirm that the standby node had fully caught up on replication at the moment the MySQL write node went down, so there is a risk of data inconsistency. Deciding more reliably whether it is safe to switch is a complex problem, and Mycat has been working hard to improve this feature.

 



 
