Implementing a Database Cluster System Based on MySQL

Reprinted from: http://www.ibm.com/developerworks/cn/linux/database/mysql-ha/index.html

Is your web application running on a MySQL database? Do your customers keep complaining that pages load slowly? Is the load on your MySQL server constantly high? This article presents a method for sharing the load of a MySQL system, along with MySQL-HA-Proxy, a development project derived from it. With this method you can make your MySQL system run efficiently while changing very little of your source code.

Section 1: Current Status of Database Cluster Technology

The most successful and most widely deployed database cluster systems today are Oracle's Oracle9 and IBM's DB2. Oracle9 uses shared-storage technology and DB2 uses shared-nothing technology; each approach has its own strengths and weaknesses.

The theoretical basis of the newest database cluster systems is distributed computing: data is distributed across the nodes, all computing nodes process their share of the data in parallel, and the results are then combined. This is undoubtedly the ideal approach, but it still cannot implement every database function.


For details on shared-storage and shared-nothing technologies, please refer to the relevant documentation on the Oracle and IBM websites.

Section 2: Current Status of Database Applications

Current database applications fall roughly into two categories. In the first category the data volume is under 100 GB, but the database is accessed frequently and requests are intensive. These are mainly web applications such as websites and forums. Their database access pattern is: frequent access, with the database receiving thousands of queries per second; frequent insertion of new data; and high demands on response speed. The second category consists of applications for scientific computing and for storing historical data, where the data volume often reaches hundreds of gigabytes. Their access pattern is: mostly queries, with data loaded into the database in timed, centralized batches.

Section 3: Problems Exposed

For the first type of application, the web servers are usually arranged as a load-balanced cluster to support more traffic. The database, however, cannot be clustered in the same way, so the number of requests per second it must handle keeps growing. As the server load rises, each individual request is answered more and more slowly. If the table files are large, write operations hold table locks for a long time, which further reduces access efficiency.

For the second type of application, the main problem is that the data files are too large. Every pass over the data takes a long time, and if a statement is written incorrectly, rerunning the query can take several hours.

Section 4: How to Solve These Problems

First, optimize the hardware, software, application code, indexes, and SQL statements. If the problem still cannot be solved, consider clustering (parallel processing) the database system.

For the first type of application, as long as the database server runs normally and its load is not high, the application is satisfied with the database system. When the load becomes too high, however, requests take longer to complete and the system no longer meets its response-time requirements. Since the load is caused by too many requests, we share the requests: some of them are sent to another server, which lowers the load on each individual server and solves the problem.

For the second type of application, a distributed computing system is required; an ordinary database system cannot solve the problem.

Section 5: Solving the First Type of Problem in "Linux+Apache+PHP+MySQL" Applications

A practical case:

I ran into exactly this problem at work. Our web servers were a cluster of three Linux+Apache+PHP machines, and MySQL ran on a SUN 450 with 2 GB of memory. During peak hours web traffic was close to full capacity, and LoadAvg (the average number of processes in the running state over one minute) stayed between 10 and 20. This showed that many requests were stalled while accessing the database: before one request finished, the next one arrived, creating a vicious circle in which LoadAvg could shoot above 800 in an instant. The database side was even worse. Its LoadAvg exceeded 300, MySQL was running a huge number of threads, and the CPU was busy switching thread states; at that point nothing helped short of restarting MySQL. Optimizing the SQL statements did not solve the problem well enough, so we added a second database server and kept the data on the two databases synchronized with MySQL's replication mechanism. We then modified some of the PHP programs so that they connected to the second database, which offloaded part of the load and initially solved the problem. Later, as the business grew, we added more servers and modified many programs to separate their read operations and send them to different servers.

Section 6: Proposing the MySQL-HA-Proxy Scheme

Separating the system load by modifying programs is very painful. The amount of work is huge and no mistakes are allowed: only the master server may write and modify data, while the other servers update their data solely through synchronization, so writing to any of those databases would have disastrous results.

Suppose we had a program that classified SQL statements by type (read or write), sent them to different servers, and returned the results. With an approach similar to an HTTP proxy, we could share the load without modifying the source programs at all. And if the proxy could decide which server should receive each request based on the servers' load or the state of the tables (available or locked), the result would be better than anything we could achieve by modifying the source programs.
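The read/write classification described above can be sketched in a few lines. This is a minimal Python sketch, not the MySQL-HA-Proxy code: the keyword list, `is_read_only`, and `ReadWriteRouter` are hypothetical names, and it assumes a statement's first keyword is enough to tell a read from a write. Writes always go to the master; reads rotate over the replicas.

```python
import itertools
import re

# Hypothetical list of statement keywords treated as read-only; everything
# else (INSERT, UPDATE, DELETE, LOCK, ...) is routed to the master.
READ_ONLY_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "DESC", "EXPLAIN"}

# Leading whitespace and /* ... */ comments before the first keyword.
_LEADING_COMMENTS = re.compile(r"^(\s*/\*.*?\*/)*\s*", re.DOTALL)


def is_read_only(sql: str) -> bool:
    """Return True if the statement's first keyword marks it as a read."""
    body = _LEADING_COMMENTS.sub("", sql, count=1)
    match = re.match(r"[A-Za-z]+", body)
    if match is None:
        return False  # unrecognized statement: safest to send it to the master
    return match.group(0).upper() in READ_ONLY_KEYWORDS


class ReadWriteRouter:
    """Send writes to the master and spread reads over the replicas in turn."""

    def __init__(self, master: str, replicas: list[str]):
        self.master = master
        # itertools.cycle hands out the replicas in round-robin order forever.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, sql: str) -> str:
        if self._replicas is not None and is_read_only(sql):
            return next(self._replicas)
        return self.master


router = ReadWriteRouter("master", ["replica1", "replica2"])
for statement in ["SELECT * FROM posts", "UPDATE posts SET views = views + 1", "select 1"]:
    print(router.pick(statement), "<-", statement)
```

A real proxy would need more care than the first keyword: `SELECT ... FOR UPDATE`, statements inside an explicit transaction, and reads that must see a write made a moment earlier (replication lag) all belong on the master.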

Section 7: How the MySQL Client and Server Communicate

I searched around but found no article describing the MySQL communication protocol; it seemed the only way was to analyze the MySQL source code. So I took the source of MySQL 3.23.49 and started a packet sniffer. The MySQL protocol has probably changed many times; in version 3.23.49 the protocol version number is 10.

I briefly analyzed the protocol and have organized my findings below. Some parts are incomplete because I did not have much time to study the MySQL code carefully; this is all I know so far.


[Table: definition of CMD Code and Message when FLAG = 0 or 2]

[Table: format of the data submitted by the Client to the Server]

[Table: Command ID and Command Data description]
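The framing shared by every packet can be sketched as follows. This is a minimal Python sketch based on the packet header documented for later MySQL versions (a 3-byte little-endian payload length followed by a 1-byte sequence number), assuming 3.23.49 frames packets the same way. The function names are hypothetical, and the command codes (0x01 quit, 0x02 select database, 0x03 query) come from later protocol documentation rather than from the author's analysis.

```python
import struct

# Command codes as documented for later MySQL versions (assumed unchanged in 3.23.49).
COM_QUIT = 0x01
COM_INIT_DB = 0x02
COM_QUERY = 0x03

MAX_PAYLOAD = 0xFFFFFF  # a length of 0xFFFFFF signals a continued packet; not handled here


def encode_packet(seq: int, payload: bytes) -> bytes:
    """Prefix a payload with the 4-byte header: 3-byte little-endian length + sequence id."""
    if len(payload) >= MAX_PAYLOAD:
        raise ValueError("payloads of 16 MB or more must be split; not handled here")
    return struct.pack("<I", len(payload))[:3] + bytes([seq & 0xFF]) + payload


def decode_packet(buf: bytes):
    """Split one packet off the front of buf.

    Returns (seq, payload, remaining_bytes), or None if buf does not yet hold a full packet.
    """
    if len(buf) < 4:
        return None
    length = int.from_bytes(buf[:3], "little")
    if len(buf) < 4 + length:
        return None
    return buf[3], buf[4:4 + length], buf[4 + length:]


def command_packet(command: int, argument: bytes) -> bytes:
    """A client command starts a new exchange, so its sequence id is 0."""
    return encode_packet(0, bytes([command]) + argument)


query = command_packet(COM_QUERY, b"SELECT 1")
seq, payload, rest = decode_packet(query)
print(seq, payload[0] == COM_QUERY, payload[1:])
```

For a proxy, the useful part is that a `COM_QUERY` payload is just the command byte followed by the SQL text, so the statement can be read straight out of `payload[1:]` before deciding where to forward it.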

Section 8: How the Client Passes the Server's User Authentication

With the protocol analyzed, I tried to put it to work, but ran into trouble with authentication. When a client connects, the MySQL server first sends it a packet containing the protocol version, the server version information, a SessionID, and an 8-byte key; this key is the crux of the problem. The client uses the key to encrypt the password and then sends the user name, the encrypted password, the database to open, and other information to the server, which completes authentication. I did not know how the client used the key to encrypt the password, so I decided to skip the password: I rebuilt the client's packet with the password information removed, and it worked. The catch is that the MySQL users on the cluster servers must have no passwords. This is a security weakness, but these servers sit behind the HA layer and have no external IP addresses, so it should not be a real problem, although it is still somewhat unsatisfying.
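Reading the fields mentioned above out of the server's first packet can be sketched like this. It is a minimal Python sketch that assumes the field order of the "HandshakeV10" greeting in later MySQL protocol documentation (protocol version byte, NUL-terminated version string, 4-byte little-endian thread id, 8-byte scramble); `parse_greeting` is a hypothetical helper, the article's "SessionID" is taken to be the thread id, and the fields after the scramble are not parsed.

```python
import struct


def parse_greeting(payload: bytes) -> dict:
    """Read the leading fields of the server's first (greeting) packet payload."""
    protocol_version = payload[0]  # 10 for MySQL 3.23.49
    end = payload.index(b"\x00", 1)  # server version is a NUL-terminated string
    server_version = payload[1:end].decode("ascii")
    pos = end + 1
    (thread_id,) = struct.unpack_from("<I", payload, pos)  # the "SessionID"
    pos += 4
    scramble = payload[pos:pos + 8]  # the 8-byte key used to encrypt the password
    if len(scramble) != 8:
        raise ValueError("greeting ends before the 8-byte scramble")
    return {
        "protocol_version": protocol_version,
        "server_version": server_version,
        "thread_id": thread_id,
        "scramble": scramble,
    }


# A made-up greeting payload for illustration (fields after the scramble omitted).
sample = bytes([10]) + b"3.23.49\x00" + struct.pack("<I", 42) + b"abcdefgh\x00"
print(parse_greeting(sample))
```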

Still, the proxy has to know whether the user's password is correct. How? By using a dedicated MySQL server for password authentication. Install a MySQL server with minimal resources to act as MysqlAuth (a dedicated authentication server). When a client connects, MysqlAuth's first packet is returned to the client; that packet, of course, contains the key. The client encrypts the password with this key and sends its authentication information back. The MysqlHA system forwards this information to MysqlAuth and keeps a copy for itself. If authentication succeeds, it rebuilds the retained copy without the password information and uses the rebuilt authentication information to connect to the servers in the cluster.
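The "keep a copy, strip the password if MysqlAuth accepts it" step can be sketched as below. This is a minimal Python sketch, not the author's code: it assumes the client's login packet follows the pre-4.1 layout documented as "HandshakeResponse320" (2-byte capability flags, 3-byte max packet size, NUL-terminated user, then the scrambled password and, when the `CLIENT_CONNECT_WITH_DB` flag is set, a NUL-terminated database name), and that MysqlAuth's reply starts with 0x00 on success. All function names are hypothetical.

```python
import struct

CLIENT_CONNECT_WITH_DB = 0x0008  # capability flag: a database name follows the password
OK_MARKER = 0x00  # first payload byte of an OK packet (error packets start with 0xFF)


def _read_cstr(buf: bytes, pos: int):
    """Return the NUL-terminated string at pos and the position just after it."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end], end + 1


def parse_auth_320(payload: bytes):
    """Split a pre-4.1 client authentication packet into its fields."""
    (flags,) = struct.unpack_from("<H", payload, 0)
    max_packet = int.from_bytes(payload[2:5], "little")
    user, pos = _read_cstr(payload, 5)
    if flags & CLIENT_CONNECT_WITH_DB:
        password, pos = _read_cstr(payload, pos)
        database, _ = _read_cstr(payload, pos)
    else:
        password, database = payload[pos:], b""  # password runs to the end of the packet
    return flags, max_packet, user, password, database


def strip_password(payload: bytes) -> bytes:
    """Rebuild the retained copy with an empty password for the RealServers."""
    flags, max_packet, user, _password, database = parse_auth_320(payload)
    out = struct.pack("<H", flags) + max_packet.to_bytes(3, "little") + user + b"\x00"
    if flags & CLIENT_CONNECT_WITH_DB:
        out += b"\x00" + database + b"\x00"
    return out


def login_for_realservers(mysqlauth_reply: bytes, client_auth: bytes):
    """Return the password-free login payload if MysqlAuth accepted the client, else None."""
    if mysqlauth_reply[:1] == bytes([OK_MARKER]):
        return strip_password(client_auth)
    return None  # on failure the proxy just relays MysqlAuth's error packet to the client
```

As the article notes, this only works because the accounts on the RealServers have no passwords; the rebuilt packet carries an empty password field and relies on the cluster being unreachable from outside.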

Section 9: System Structure and Flow

[Figure: system structure chart]

In the figure, HA is a high-availability system built with Heartbeat (for the implementation, see http://www.linuxvirtualserver.org/). Proxy is the MySQL-Proxy system, and MysqlAuth is the dedicated authentication server. The red RealServer is the master server: it performs data updates and synchronizes the data to the other RealServers.

[Figure: the client authentication process]

[Figure: establishing connections with the RealServers after authentication fails and after it succeeds]

[Figure: how the system processes SQL query requests once the connection is established]

Section 10: Conclusion

I have now essentially finished developing the mysql-proxy program, but it is still in testing. The latest version is 0.0.4, and the next version is under revision. Since version 0.0.3, mysql-proxy can run the complete sql-bench suite that ships with MySQL, but sql-bench only measures single-server performance and offers no tests for a clustered MySQL system.

The system includes a program that dynamically collects LoadAvg on each RealServer and reports it back to MySQL Proxy. Because I have not tested that part yet, the earlier tests used round-robin (polling) allocation: when two RealServers have the same load, the system automatically rotates requests between them.
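The allocation rule described above (prefer the least-loaded RealServer, rotate among servers with equal load) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the class name, the `report` hook through which collected LoadAvg values arrive, and the optional `tolerance` margin are all hypothetical, not part of mysql-proxy.

```python
class LoadAwareBalancer:
    """Pick the RealServer with the lowest reported LoadAvg; rotate among ties."""

    def __init__(self, servers: list[str], tolerance: float = 0.0):
        # Every server starts at load 0.0, so until reports arrive this is plain round-robin.
        self.loads = {name: 0.0 for name in servers}
        self.tolerance = tolerance  # loads within this margin of the minimum count as tied
        self._turn = 0

    def report(self, server: str, loadavg: float) -> None:
        """Record the latest LoadAvg sent back by a RealServer's collector."""
        self.loads[server] = loadavg

    def pick(self) -> str:
        lowest = min(self.loads.values())
        tied = [s for s, load in self.loads.items() if load <= lowest + self.tolerance]
        # A shared counter rotates through whichever servers are currently tied.
        choice = tied[self._turn % len(tied)]
        self._turn += 1
        return choice


balancer = LoadAwareBalancer(["rs1", "rs2", "rs3"])
balancer.report("rs1", 2.5)
print([balancer.pick() for _ in range(4)])
```

With no load reports, or with identical loads everywhere, this reduces to the polling behavior used in the author's tests; a small nonzero `tolerance` keeps tiny LoadAvg differences from sending all traffic to a single server.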

You can download the Mysql-proxy source code from the Mysql-HA-Cluster project on my website: http://netsock.org/bbs/. I will also publish some test data there.

How should the system be tested?

Since this cluster is designed specifically for Linux+Apache+PHP+MySQL systems, you should run a real application on it and then simulate a large number of visits for testing.

A forum system such as VBB (vBulletin), which is widely used and increasingly popular, would be a good choice. The simulated traffic can be generated with ab, the benchmarking tool that ships with Apache.
The minimum test environment consists of five machines:
