MySQL architecture design: how to make good use of third-party cache solutions

The biggest advantage of using a mature third-party solution is that it saves our own R&D costs, and plenty of documentation is available online to help us solve the problems we run into day to day.

The most popular third-party cache solutions are mainly two: Memcached, a distributed in-memory object cache, and Berkeley DB, an embedded database programming library. Below I analyze each of them and discuss how it fits into an architecture.

1. Distributed memory Cache software Memcached

I believe many readers are already familiar with Memcached; its popularity today is not far behind MySQL's. Memcached is so popular mainly for the following reasons:

  • The communication protocol is simple, and the API is clear;
  • The cache algorithm is efficient, and the event handling mechanism is based on libevent, giving excellent performance;
  • Its object-based model is very friendly to application developers;
  • All data is stored in memory, so data access is efficient;
  • The software is open source, released under the BSD license.

I will not go into much detail about Memcached itself; that is not the focus of this article. Here we focus on how Memcached can help improve the scalability of our data service (calling it just "the database" would no longer be accurate here).

To integrate Memcached well into the system architecture, we must first decide exactly what role it plays in the application system. Is it just a cache tool to improve the performance of the data service, or should it be integrated more closely with the MySQL database to form a more efficient and ideal data service layer?

①As a Cache tool to improve system performance

If we use Memcached only as cache software to improve system performance, the application itself has to keep the data in Memcached synchronized with updates to the data in the database. In this case Memcached can be understood as a cache layer that sits in front of the MySQL database.

If we use Memcached as the application system's data cache service, the MySQL database needs essentially no changes; the application alone maintains and updates the cache. The biggest advantage is that the database-related architecture is left untouched. The drawback is that if many kinds of data objects need caching, the amount of code the application must add grows considerably, and system complexity and maintenance costs rise steeply along with it.

Below is a diagram of a simple architecture that uses Memcached as a cache service layer.

As the figure shows, all data is written to the MySQL Master, including the first write of new data (INSERT) as well as UPDATE and DELETE of existing data. For data that already exists, when the application issues an UPDATE or DELETE against MySQL it must also delete the corresponding data from Memcached, which keeps the data consistent overall. All read requests are sent to Memcached first. If the data is found there, it is returned directly; if not, the data is read from a MySQL Slave and then written into Memcached to be cached.
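The read and write paths just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain dictionaries stand in for the MySQL Master, a MySQL Slave, and Memcached, replication is simulated as an immediate copy, and the class and method names are hypothetical.

```python
class CacheAsideStore:
    """Reads try the cache first; writes go to the master and invalidate the cache."""

    def __init__(self):
        self.master = {}  # stand-in for the MySQL Master
        self.slave = {}   # stand-in for a MySQL Slave
        self.cache = {}   # stand-in for Memcached

    def _replicate(self):
        # Assumption: replication is instantaneous. Real MySQL replication is
        # asynchronous, so a slave may briefly return older data.
        self.slave = dict(self.master)

    def insert(self, key, value):
        # A first write (INSERT) only touches the master.
        self.master[key] = value
        self._replicate()

    def update(self, key, value):
        # UPDATE the master, then delete the cached copy.
        self.master[key] = value
        self._replicate()
        self.cache.pop(key, None)

    def delete(self, key):
        # DELETE from the master, then delete the cached copy.
        self.master.pop(key, None)
        self._replicate()
        self.cache.pop(key, None)

    def read(self, key):
        # Cache hit: return directly.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the slave and populate the cache.
        value = self.slave.get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Note that the application owns every step here; the database is unaware that a cache exists, which is exactly why the amount of application code grows with the number of cached object types.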

This approach generally suits environments where there are few types of objects to cache but the volume of data to cache is relatively large; it is a quick and effective way to solve performance problems. Since this architecture has little to do with the MySQL database itself, there are not many technical details to cover here.

②Integrate with MySQL as a data service layer

Besides using Memcached as a tool to quickly improve efficiency, we can also use it to improve the scalability of the data service layer, either by integrating it with our database into a single whole or by using it as a buffer for the database.

Let us first look at how to integrate Memcached and the MySQL database into a whole that provides data services to the outside. Generally there are two ways to do this. One is to use Memcached's memory directly as a second-level cache for the MySQL database, increasing the effective cache size of the MySQL Server. The other is to have MySQL communicate with Memcached through a UDF, which maintains and updates the data in Memcached, while the application reads data directly from Memcached.

The first method is mainly for special business requirements where the data is hard to partition and it is hard to modify the application to use a cache outside the database.

Of course, MySQL cannot do this on its own under normal circumstances; for now we must rely on outside help. The open source project Waffle Grid is that outside help.

Waffle Grid grew out of an idea that several DBAs abroad had in their spare time: since PC Servers are so attractively priced but their Scale Up capacity is hard to push much further, why not use the now very popular Memcached to break through the memory limit of a single PC Server? Driven by this idea, they started the Waffle Grid open source project. Taking advantage of the fact that both MySQL and Memcached are open source, and of Memcached's simple communication protocol, they succeeded in making Memcached an external "second-level cache" for the MySQL host. Currently only InnoDB's Buffer Pool is supported.

Waffle Grid's implementation principle is not complicated. When InnoDB does not find the data it needs in its local Buffer Pool (let's call it the Local Buffer Pool), then before reading from the on-disk data files it first tries, through Memcached's API, to read the corresponding cached data from Memcached (we call it the Remote Buffer Pool). Only when the Remote Buffer Pool does not hold the required data does InnoDB access the disk to read the data file. Only data on the LRU List of the InnoDB Buffer Pool is sent to the Remote Buffer Pool. Once such data is modified, InnoDB moves it to the FLUSH List, and Waffle Grid at the same time removes the data entering the FLUSH List from the Remote Buffer Pool. As a result, the Remote Buffer Pool never contains Dirty Pages, which also ensures that a failure of the Remote Buffer Pool does not cause data loss. The following figure is a simplified diagram of the architecture when using the Waffle Grid project:
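To make the lookup order concrete, here is a toy Python model of the idea; it is not Waffle Grid's actual code. Dictionaries stand in for the data files and for Memcached, and all names are hypothetical. Two points are assumptions that the description above does not specify: clean pages are pushed to the remote pool at the moment they are evicted from the local LRU list, and dirty pages are flushed to disk when evicted.

```python
from collections import OrderedDict


class TwoLevelBufferPool:
    """Toy model: Local Buffer Pool -> Remote Buffer Pool -> disk."""

    def __init__(self, disk, local_capacity):
        self.disk = disk            # page_id -> contents (stand-in for data files)
        self.local = OrderedDict()  # Local Buffer Pool, least recently used first
        self.remote = {}            # Remote Buffer Pool (stand-in for Memcached)
        self.dirty = set()          # page ids currently on the FLUSH List
        self.local_capacity = local_capacity  # assumed to be at least 1
        self.disk_reads = 0

    def read_page(self, page_id):
        if page_id in self.local:            # 1. local hit
            self.local.move_to_end(page_id)
            return self.local[page_id]
        if page_id in self.remote:           # 2. remote hit, no disk access
            page = self.remote[page_id]
        else:                                # 3. physical read from disk
            page = self.disk[page_id]
            self.disk_reads += 1
        self._admit(page_id, page)
        return page

    def write_page(self, page_id, contents):
        self._admit(page_id, contents)
        self.dirty.add(page_id)              # the page moves to the FLUSH List
        self.remote.pop(page_id, None)       # and is dropped from the remote pool

    def _admit(self, page_id, page):
        self.local[page_id] = page
        self.local.move_to_end(page_id)
        while len(self.local) > self.local_capacity:
            victim_id, victim = self.local.popitem(last=False)
            if victim_id in self.dirty:
                # Assumption: dirty pages are flushed to disk on eviction and
                # are never sent to the remote pool.
                self.disk[victim_id] = victim
                self.dirty.discard(victim_id)
            else:
                # Assumption: clean pages are sent to the remote pool on eviction.
                self.remote[victim_id] = victim
```

Because a page is removed from the remote pool as soon as it becomes dirty, losing the remote pool in this model costs only extra disk reads, never data.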


As the architecture diagram shows, we first apply the Waffle Grid Patch to the MySQL database, and through it MySQL communicates with the Memcached servers. To ensure network performance, MySQL and Memcached should be connected by a high-bandwidth private network wherever possible.

In addition, the architecture diagram does not distinguish between Master and Slave databases. This does not mean they cannot be distinguished; the diagram is only schematic. In practice, most of the time it is enough to apply Waffle Grid on the Slaves only, since the Master itself does not need that much memory.

After reading how Waffle Grid works, some readers may have questions. Won't the performance of every Query that needs a physical read be directly affected? Every read from the Remote Buffer Pool goes over the network; is its performance high enough? To address these concerns, here is the measured data from the Waffle Grid authors:


Judging from the comparison data obtained with DBT2, I don't think there is much to worry about in terms of performance. Whether Waffle Grid suits your application scenario is something readers will have to evaluate for themselves.

Now let us look at the other way to integrate Memcached and MySQL: using the UDF facility that MySQL provides and writing our own routines so that MySQL can communicate with Memcached and update its data.

Unlike Waffle Grid, in this approach the data in Memcached is not maintained entirely by MySQL; the application and MySQL maintain it together. Each time the application reads from Memcached and does not find the data it needs, it reads the data from the database and then writes it into Memcached. MySQL handles the cleanup of stale data in Memcached: whenever data in the database is updated or deleted, MySQL calls Memcached's API through a user-written UDF to tell Memcached that the data is no longer valid and delete it.

Based on the above implementation principles, we can design a data service layer architecture as follows:


As the figure shows, the biggest difference between this architecture and the one above, where Memcached serves purely as a conventional read cache server for MySQL, is that updates to the data in Memcached are driven by the MySQL database rather than by the application. The application first writes data to the MySQL database, which fires the relevant user-written UDF on the MySQL side; the UDF then calls Memcached's communication interface to write the data into Memcached. When data in MySQL is updated or deleted, the relevant MySQL UDF likewise updates or deletes the data in Memcached. Of course, we can also have MySQL do less: when data is updated or deleted, the UDF only deletes the data from Memcached, and writes into Memcached are still done by the application, as in the previous architecture.
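The lighter variant, in which the database only deletes stale entries and the application fills the cache on a miss, can be imitated with Python's built-in sqlite3 module. This is an analogy, not MySQL code: SQLite stands in for MySQL, a Python function registered with `create_function` stands in for the user-written UDF (a real MySQL UDF is compiled native code loaded into the server), and a dictionary stands in for Memcached. The table, trigger, function names, and the `user:<id>` key format are all hypothetical.

```python
import sqlite3

cache = {}  # stand-in for Memcached


def cache_delete(key):
    """Plays the role of the UDF: remove a stale entry from the cache."""
    cache.pop(key, None)
    return 0


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements and triggers can call it.
conn.create_function("cache_delete", 1, cache_delete)
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);

-- The database, not the application, invalidates the cache.
CREATE TRIGGER users_after_update AFTER UPDATE ON users
BEGIN
    SELECT cache_delete('user:' || OLD.id);
END;

CREATE TRIGGER users_after_delete AFTER DELETE ON users
BEGIN
    SELECT cache_delete('user:' || OLD.id);
END;
""")


def read_user(user_id):
    """Application read path: try the cache, fall back to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    cache[key] = row[0]  # the application still populates the cache on a miss
    return row[0]
```

With this split, application code that writes to the `users` table contains no cache logic at all; an `UPDATE` or `DELETE` issued from any client removes the cached entry as a side effect.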

Because Memcached accesses data as objects and retrieves it by hash, every piece of data stored in Memcached needs a Key that we assign to identify it, and all data access goes through that Key. In other words, unlike a MySQL Query, which can read a result set containing many rows using one or more key conditions, Memcached is only suitable for reading a single piece of data at a time by its unique key.
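A small Python sketch of what key-only access means in practice. A dictionary stands in for Memcached, and the `user:<id>` key scheme and function names are hypothetical choices for illustration.

```python
cache = {}  # stand-in for Memcached


def user_key(user_id):
    """Build the unique key under which one user object is stored."""
    return f"user:{user_id}"


def put_user(user):
    cache[user_key(user["id"])] = user


def get_user(user_id):
    """Single-object read by unique key: the access pattern that fits."""
    return cache.get(user_key(user_id))


def get_users(user_ids):
    """Reading several objects means one key lookup per object, and the
    ids must already be known; there is no equivalent of a WHERE clause."""
    found = {}
    for user_id in user_ids:
        user = get_user(user_id)
        if user is not None:
            found[user_id] = user
    return found
```

A question such as "all users in a given city" therefore has to be answered by the database, or the application has to maintain its own key for that result.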

2. The embedded database programming library Berkeley DB

To be honest, "database programming library" is a somewhat awkward name, but I cannot find a more suitable term for Berkeley DB, so let's use the name most commonly used for it online.

Memcached implements an in-memory cache. If our performance requirements are not that high and the budget is not that generous, we can also choose database-style cache software such as Berkeley DB. Many readers may be puzzled: we already use the MySQL database, so why also use a "database" like Berkeley DB? In fact, Berkeley DB used to be one of MySQL's storage engines, but it was later removed from the supported storage engines for reasons I am not sure of (probably related to acquisitions and commercial competition). We use a database-style cache like Berkeley DB alongside the database because it lets us play to the strengths of both: we keep a conventional general-purpose database, and we supplement it with Berkeley DB's efficient key-value storage and fast retrieval, obtaining better scalability of the data service layer and higher overall performance.

Berkeley DB's own architecture can be divided into five functional modules. These modules are relatively independent within the system, and individual modules can be enabled or disabled, so it may be more appropriate to call them five subsystems. The five subsystems are briefly introduced below:

  • Data access
    The data access subsystem is responsible for the most important and basic work: storing and retrieving data. Berkeley DB supports four ways of organizing stored data: Hash, B-Tree, Fixed Length and Dynamic Length. These four methods correspond to four actual storage formats for the data files. The data access subsystem can be used entirely on its own, and it is also the one subsystem that must always be enabled.
  • Transaction management
    The transaction management subsystem serves data processing that has transactional requirements and provides full ACID transaction properties. When it is enabled, then in addition to the basic data access subsystem, at least the lock management subsystem and the log system must also be enabled to help guarantee the consistency and integrity of transactions.
  • Lock management
    The lock management subsystem mainly ensures data consistency and controls access to shared data. It supports row-level and page-level locking and also serves the transaction management subsystem.
  • Shared memory
    As the name suggests, the shared memory subsystem manages and maintains the shared Cache and Buffer, providing data caching to improve system performance.
  • Log system
    The log system mainly serves the transaction management subsystem. To guarantee transaction consistency, Berkeley DB also writes the log first and the data afterwards. It is generally enabled and disabled together with the transaction management subsystem.

Given Berkeley DB's characteristics, it is hard to bind it to the MySQL database as tightly as we can with Memcached. Data maintenance and updates mostly have to be done by the application. In general, the main reason to use Berkeley DB alongside MySQL is to improve the performance and scalability of the system. So most of the time we use the Hash and B-Tree storage formats, and Hash in particular is the most widely used because it offers the most efficient access.

In the application, for each request we first look up the data in Berkeley DB by a Key defined in advance. If the data exists, it is returned directly; if not, we read it from the database, store it in Berkeley DB under the predefined Key, and then return it to the client. When data is modified, the application must delete the corresponding data from Berkeley DB after modifying it in MySQL. Of course, you can also modify the data in Berkeley DB directly if you prefer, but that may introduce more risk and make data consistency more complex to maintain.
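The request flow above can be sketched as follows. To keep the example runnable with only the Python standard library, the `dbm.dumb` module is used as a stand-in for Berkeley DB (both are on-disk key-value stores, but this is not the Berkeley DB API), and a dictionary stands in for MySQL. The class and method names are hypothetical.

```python
import dbm.dumb


class DiskCachedReader:
    """On-disk key-value cache in front of a database (both simulated)."""

    def __init__(self, cache_path, database):
        # Stand-in for a Berkeley DB file; "c" creates it if it is missing.
        self.cache = dbm.dumb.open(cache_path, "c")
        # Stand-in for MySQL: a dict mapping str keys to str values.
        self.database = database

    def read(self, key):
        raw_key = key.encode("utf-8")
        # 1. Look up the predefined key in the on-disk cache.
        if raw_key in self.cache:
            return self.cache[raw_key].decode("utf-8")
        # 2. Miss: read from the database and store the result in the cache.
        value = self.database.get(key)
        if value is not None:
            self.cache[raw_key] = value.encode("utf-8")
        return value

    def modify(self, key, value):
        # Modify the database first, then delete the cached copy.
        self.database[key] = value
        raw_key = key.encode("utf-8")
        if raw_key in self.cache:
            del self.cache[raw_key]

    def close(self):
        self.cache.close()
```

Deleting the cached entry instead of rewriting it in `modify` follows the safer of the two options mentioned above: the next read repopulates the cache from the database.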

In principle, using Berkeley DB this way differs little from using Memcached as a pure cache, so why not just use Memcached? There are two main reasons. One is that Memcached stores data purely in memory while Berkeley DB can use the physical disk, so the two differ considerably in cost. The other is that, besides the Hash format that Memcached uses, Berkeley DB also supports other storage formats such as B-Tree.
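Why a second storage format matters can be shown with a rough Python comparison. A dictionary models hash-style access, and a sorted key list searched with `bisect` models the ordered access that a B-Tree provides; neither is Berkeley DB code, and the class names are hypothetical. The point of the sketch is that ordered storage can answer range lookups, which hash storage cannot do without scanning every key.

```python
import bisect


class HashIndex:
    """Hash-style access: exact-match lookups only."""

    def __init__(self, items):
        self._data = dict(items)

    def get(self, key):
        return self._data.get(key)


class OrderedIndex:
    """Sorted keys as a rough stand-in for a B-Tree's ordered access."""

    def __init__(self, items):
        pairs = sorted(items)
        self._keys = [key for key, _ in pairs]
        self._values = [value for _, value in pairs]

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, low, high):
        """Return (key, value) pairs with low <= key < high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return list(zip(self._keys[start:end], self._values[start:end]))
```

If the cached data is only ever fetched by exact key, the Hash format is the natural choice; the ordered format earns its cost only when the application needs range or prefix reads.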

Because the basic usage differs little from that of Memcached, no separate architecture diagram is drawn here.


Origin blog.csdn.net/Java_Caiyo/article/details/115014999