The biggest advantage of using a mature third-party solution is that, besides saving our own R&D costs, we can find plenty of documentation on the Internet to help us solve the problems we run into day to day.
At present, the two most popular third-party Cache solutions are Memcached, a distributed in-memory object Cache, and Berkeley DB, an embedded database programming library. Below I analyze both solutions and discuss the architectures built on them.
1. Distributed memory Cache software Memcached
I believe most readers are already familiar with Memcached; its popularity today is probably not far behind that of MySQL. Memcached is so popular mainly for the following reasons:
- The communication protocol is simple, and the API is clear;
- An efficient Cache algorithm and an event processing mechanism based on libevent give it excellent performance;
- Its object-based design is very friendly to application developers;
- All data is stored in memory, so data access is efficient;
- The software is open source, released under the BSD license.
I will not go into much detail about Memcached itself, since that is not the focus of this article. Here we concentrate on how Memcached can help us improve the scalability of our data service (calling it the database itself would not be accurate here).
To integrate Memcached well into the system architecture, we must first decide exactly what role it plays in the application system: is it just a Cache tool that improves the performance of the data service, or should it be integrated more tightly with the MySQL database to form a more efficient and complete data service layer?
① As a Cache tool to improve system performance
If we use Memcached purely as Cache software to improve system performance, the application is largely responsible for keeping the data in Memcached synchronized with updates to the data in the database. In this role, Memcached can be thought of as a Cache layer that sits in front of the MySQL database.
If we use Memcached as the application system's data Cache service, the MySQL database needs essentially no changes; the application alone maintains and updates the Cache. The biggest advantage is that the database architecture is left untouched. The drawback is that when many kinds of data objects need caching, the amount of application code grows considerably, and system complexity and maintenance costs rise along with it.
Below is a diagram of a simple architecture in which Memcached acts as the Cache service layer.
As the figure shows, all writes go to the MySQL Master, including the INSERT that first writes a piece of data as well as UPDATE and DELETE operations on existing data. For existing data, however, the application must delete the corresponding data from Memcached at the same time as it runs the UPDATE or DELETE in MySQL, which keeps the data consistent overall. All read requests are sent to Memcached first. If the data is found there, it is returned directly; if not, it is read from the MySQL Slaves and then written into Memcached to be cached.
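The read and write paths described above can be illustrated with a minimal Python sketch. Plain dictionaries stand in for Memcached, the MySQL Master and the MySQL Slaves, replication is treated as instantaneous, and the function names and the `user:<id>` key format are assumptions made only for this illustration; none of them are real client APIs.

```python
# Stand-ins for the real services (assumptions, not real client APIs).
memcached = {}               # plays the role of the Memcached cluster
mysql_master = {}            # receives every INSERT / UPDATE / DELETE
mysql_slave = mysql_master   # replication is treated as instantaneous here


def read_user(user_id):
    """Read path: Memcached first, then a MySQL Slave on a miss."""
    key = f"user:{user_id}"
    if key in memcached:
        return memcached[key]          # cache hit: return directly
    row = mysql_slave.get(user_id)     # cache miss: read from a Slave
    if row is not None:
        memcached[key] = row           # write the result into the Cache
    return row


def insert_user(user_id, row):
    """First write of a piece of data: only the Master is touched."""
    mysql_master[user_id] = row


def update_user(user_id, row):
    """Update the Master, then delete the cached copy."""
    mysql_master[user_id] = row
    memcached.pop(f"user:{user_id}", None)


def delete_user(user_id):
    """Delete from the Master, then delete the cached copy."""
    mysql_master.pop(user_id, None)
    memcached.pop(f"user:{user_id}", None)
```

Note that every cached object type needs its own set of functions like these, which is exactly the growth in application code mentioned earlier.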
This approach generally suits environments where few types of objects need caching but the volume of cached data is relatively large; there it is a fast and effective way to solve performance problems. Since this architecture has little to do with the MySQL database itself, I will not go into more technical detail here.
② Integrate with MySQL as a data service layer
Besides serving as a tool for quickly improving performance, Memcached can also be used to improve the scalability of the data service layer, either by integrating it with our database into a single whole or by using it as a buffer for the database.
Let us first look at how Memcached and the MySQL database can be integrated into a whole that serves data to the outside. Generally speaking, there are two ways to do this. One is to use Memcached's memory capacity directly as a second-level cache for the MySQL database, increasing the effective cache size of MySQL Server. The other is to let MySQL communicate with Memcached through UDFs that maintain and update the data in Memcached, while the application reads data directly from Memcached.
The first method is mainly for special business requirements where the data is hard to partition and where it is hard to modify the application to use a Cache outside the database.
Of course, MySQL cannot do this on its own under normal circumstances; for now we must rely on outside help, and the open source project Waffle Grid provides it.
Waffle Grid grew out of an idea that several DBAs abroad came up with in their spare time: since PC Servers are so attractively priced but hard to Scale Up past a certain point, why not use the very popular Memcached to break through the memory limit of a single PC Server? Driven by this idea, they started the Waffle Grid open source project. Taking advantage of the fact that both MySQL and Memcached are open source, and of Memcached's simple communication protocol, they successfully turned Memcached into an external "second-level cache" for the MySQL host. At present it supports only the Buffer Pool of Innodb.
The principle behind Waffle Grid is not complicated. When Innodb fails to find data in its local Buffer Pool (let's call it the Local Buffer Pool), then before reading the data from the disk data file it first tries, through Memcached's API, to read the corresponding cached data from Memcached (we call this the Remote Buffer). Only when the required data is not in the Remote Buffer does Innodb access the disk and read the data file. Only data on the LRU List of the Innodb Buffer Pool is sent to the Remote Buffer Pool. Once such data is modified, Innodb moves it to the FLUSH List, and Waffle Grid at the same time removes the data entering the FLUSH List from the Remote Buffer Pool. As a result, the Remote Buffer Pool never contains Dirty Pages, which also ensures that a failure of the Remote Buffer Pool does not cause data loss. The following figure is a simplified diagram of the architecture when using the Waffle Grid project:
As the architecture in the figure shows, we first apply the Waffle Grid Patch on the MySQL database side, and MySQL then communicates with the Memcached servers through the patch. To ensure good network performance, MySQL and Memcached should be connected over a high-bandwidth private network whenever possible.
Note that the architecture diagram does not distinguish between Master and Slave. That does not mean they cannot be distinguished; the figure is only a schematic. In practice, it is usually enough to apply Waffle Grid on the Slaves, since the Master itself does not need that much memory.
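To make the Waffle Grid lookup order and the "no Dirty Pages in the Remote Buffer" rule concrete, here is a small Python simulation. It is only a sketch of the principle, not the actual patch: the class and method names are hypothetical, dictionaries stand in for Memcached and the data files, and pushing a page to the Remote Buffer at the moment it is evicted from the local LRU is an assumption of this sketch.

```python
from collections import OrderedDict


class BufferPoolSketch:
    """Toy model of a Local Buffer Pool backed by a Remote Buffer."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.local = OrderedDict()   # Local Buffer Pool, oldest page first
        self.dirty = set()           # page ids currently on the FLUSH List
        self.remote = {}             # Remote Buffer (Memcached stand-in)
        self.disk = disk             # data files (page id -> page data)
        self.disk_reads = 0

    def read(self, page_id):
        if page_id in self.local:            # 1. Local Buffer Pool
            self.local.move_to_end(page_id)
            return self.local[page_id]
        if page_id in self.remote:           # 2. Remote Buffer
            data = self.remote[page_id]
        else:                                # 3. disk, only as a last resort
            data = self.disk[page_id]
            self.disk_reads += 1
        self._admit(page_id, data)
        return data

    def write(self, page_id, data):
        # A modified page moves to the FLUSH List and leaves the Remote
        # Buffer, so the Remote Buffer never holds a dirty page.
        self.dirty.add(page_id)
        self.remote.pop(page_id, None)
        self._admit(page_id, data)

    def flush(self):
        """Write all dirty pages back to disk; they become clean again."""
        for page_id in self.dirty:
            self.disk[page_id] = self.local[page_id]
        self.dirty.clear()

    def _admit(self, page_id, data):
        self.local[page_id] = data
        self.local.move_to_end(page_id)
        while len(self.local) > self.capacity:
            # Evict the least recently used clean page to the Remote Buffer.
            victim = next((p for p in self.local if p not in self.dirty), None)
            if victim is None:
                self.flush()                 # every page is dirty: flush first
                continue
            self.remote[victim] = self.local.pop(victim)
```

Because only clean pages ever reach `remote`, losing the whole Remote Buffer in this model costs extra disk reads but never loses a modification.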
After reading how Waffle Grid works, some readers may wonder: won't the performance of every Query that requires a physical read be directly affected? Every read from the Remote Buffer has to cross the network, so is its performance good enough? To address these concerns, I refer to the measurement data published by the authors of Waffle Grid:
Judging from the comparison data obtained with DBT2, I don't think there is much to worry about in terms of performance. Whether Waffle Grid suits your application scenario is something each reader has to evaluate.
Now let us look at the other way of integrating Memcached with MySQL: using the UDF facility that MySQL provides, we write our own functions so that MySQL can communicate with Memcached and update its data.
Unlike with Waffle Grid, the data in Memcached here is not maintained entirely under MySQL's control; the application and MySQL maintain it together. Each time the application reads from Memcached and cannot find the data it needs, it turns to the database, reads the data there, and then writes it into Memcached. MySQL, for its part, is responsible for cleaning up invalid data in Memcached: whenever data in the database is updated or deleted, MySQL calls the Memcached API through the user-written UDF to notify Memcached that the data is no longer valid and to delete it.
Based on the above implementation principles, we can design a data service layer architecture as follows:
As the figure shows, the biggest difference between this architecture and the one above, where Memcached serves purely as an ordinary read Cache server separate from MySQL, is that updates to the data in Memcached are maintained by the MySQL database rather than by the application. The application first writes data to the MySQL database; this fires the user-written UDF in MySQL, and the UDF calls the Memcached communication interface to write the data into Memcached. When data in MySQL is updated or deleted, the corresponding UDF likewise updates or deletes the data in Memcached. Of course, we can also let MySQL do less: the UDF only deletes the data from Memcached when data is updated or deleted, and writing data into Memcached is left to the application, as in the previous architecture.
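The lighter variant, in which MySQL only deletes stale data and the application fills the Cache on a read miss, can be sketched in Python as follows. This is a simulation under stated assumptions: `TableWithTriggers` is a hypothetical stand-in for a MySQL table whose triggers call a user-written UDF, `invalidate_udf` stands in for that UDF, and a dictionary stands in for Memcached. No real MySQL or Memcached API is used.

```python
class TableWithTriggers:
    """Stand-in for a MySQL table whose triggers call a user-written UDF."""

    def __init__(self, on_change):
        self.rows = {}
        self.on_change = on_change    # plays the role of the UDF

    def insert(self, pk, row):
        self.rows[pk] = row           # nothing is cached yet, nothing to clean

    def update(self, pk, row):
        self.rows[pk] = row
        self.on_change(pk)            # the database side invalidates the Cache

    def delete(self, pk):
        self.rows.pop(pk, None)
        self.on_change(pk)


memcached = {}                        # stand-in for the Memcached servers


def invalidate_udf(pk):
    """What the UDF does in this sketch: delete the stale cached entry."""
    memcached.pop(f"item:{pk}", None)


items = TableWithTriggers(on_change=invalidate_udf)


def read_item(pk):
    """Application read path: Memcached first, database on a miss."""
    key = f"item:{pk}"
    if key not in memcached:
        row = items.rows.get(pk)
        if row is None:
            return None
        memcached[key] = row          # the application populates the Cache
    return memcached[key]
```

The point of the design is visible in what is missing: the application's write code calls only `items.update` or `items.delete` and contains no Cache maintenance at all.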
Because Memcached provides object-based data access and retrieves data by Hash, every piece of data stored in Memcached needs a Key that we assign to identify it, and all access goes through that Key. In other words, unlike a MySQL Query statement, you cannot read a result set containing multiple rows by one (or more) key conditions; Memcached only suits reading a single piece of data through a unique key.
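A short Python sketch shows what this restriction means in practice. A dictionary stands in for Memcached, and the `table:primary_key` key format and the function names are assumptions chosen for the example.

```python
cache = {}  # stand-in for Memcached: a flat map from Key to value


def cache_key(table, pk):
    """Build the unique Key under which one object is stored."""
    return f"{table}:{pk}"


cache[cache_key("orders", 1001)] = {"id": 1001, "user_id": 42, "total": 30}
cache[cache_key("orders", 1002)] = {"id": 1002, "user_id": 42, "total": 15}


def get_order(pk):
    """The only supported access: one object, fetched by its unique Key."""
    return cache.get(cache_key("orders", pk))


def get_orders(pks):
    """Several objects can be fetched only if every Key is known up front."""
    found = {pk: get_order(pk) for pk in pks}
    return {pk: row for pk, row in found.items() if row is not None}
```

There is no counterpart to a condition such as `WHERE user_id = 42` here: without knowing the keys 1001 and 1002 in advance, the application would have to go back to MySQL to find them.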
2. The embedded database programming library Berkeley DB
To be honest, calling it a database programming library is somewhat awkward, but I really cannot find a better term for Berkeley DB, so let's use the name most commonly used for it online.
Memcached implements a purely in-memory Cache. If our performance requirements are not that high and our budget is not that generous, we can instead choose database-style Cache software such as Berkeley DB. Many readers may be puzzled: we already use the MySQL database, so why use another "database" like Berkeley DB? In fact, Berkeley DB used to be one of MySQL's storage engines, but it was later removed from the list of supported storage engines for reasons I do not know (probably related to acquisitions and commercial competition). The reason to use a database-style Cache such as Berkeley DB alongside the database is to play to the strengths of both: we keep a conventional general-purpose database, and we supplement it with the efficient key-value storage and retrieval of Berkeley DB, gaining better scalability of the data service layer and higher overall performance.
Berkeley DB's own architecture can be divided into five functional modules. These modules are relatively independent within the system, and individual modules can be enabled or disabled, so it may be more appropriate to call them five subsystems. The five subsystems are briefly introduced below:
- Data access: The data access subsystem is responsible for the most important and most basic work, storing and retrieving data. Berkeley DB supports four data storage structures: Hash, B-Tree, Fixed Length and Dynamic Length. These four correspond to four actual storage formats for the data files. The data access subsystem can be used entirely on its own, and it is the one subsystem that must always be enabled.
- Transaction management: The transaction management subsystem serves data processing that has transactional requirements, providing full ACID transaction properties. Enabling it requires, in addition to the basic data access subsystem, at least the lock management subsystem and the log system, which help guarantee transaction consistency and integrity.
- Lock management: The lock management subsystem mainly ensures data consistency by controlling access to shared data. It supports row-level and page-level locking and also serves the transaction management subsystem.
- Shared memory: As the name suggests, the shared memory subsystem manages and maintains the shared Cache and Buffer, providing data caching services that improve system performance.
- Log system: The log system mainly serves the transaction management subsystem. To ensure transaction consistency, Berkeley DB writes the log first and the data afterwards. The log system is generally enabled and disabled together with the transaction management subsystem.
Given these properties, Berkeley DB is hard to bind to the MySQL database as tightly as Memcached can be; data maintenance and updates mainly have to be done by the application. In general, the main reason to use Berkeley DB alongside MySQL is to improve the performance and scalability of the system. So most of the time only the Hash and B-Tree storage formats are used, and Hash in particular is the most widely used because it gives the most efficient access.
In the application, for each request we first look up the data in Berkeley DB using a Key defined in advance. If the data exists, it is returned; if it is not found, we read it from the database, store it in Berkeley DB under the predefined Key, and then return it to the client. When data is modified, the application must delete the corresponding data from Berkeley DB after modifying the data in MySQL. If you prefer, you can instead modify the data in Berkeley DB directly, but this may introduce more risk and make data consistency more complex to maintain.
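The lookup and invalidation flow just described can be sketched in Python. Since Berkeley DB bindings are not part of the Python standard library, this sketch substitutes the standard `dbm.dumb` module as a disk-backed key-value store; that substitution, the dictionary standing in for MySQL, the JSON serialization and the function names are all assumptions of the example, not the Berkeley DB API.

```python
import dbm.dumb
import json
import os
import tempfile

mysql = {}  # stand-in for the MySQL database

# Disk-backed key-value store used here in place of Berkeley DB.
cache_path = os.path.join(tempfile.mkdtemp(), "bdb_cache")
disk_cache = dbm.dumb.open(cache_path, "c")


def read(key):
    """Look in the disk Cache first; on a miss, read MySQL and store it."""
    if key in disk_cache:
        return json.loads(disk_cache[key])
    row = mysql.get(key)
    if row is not None:
        disk_cache[key] = json.dumps(row)   # values are stored as bytes
    return row


def modify(key, row):
    """Modify MySQL first, then delete the cached copy."""
    mysql[key] = row
    if key in disk_cache:
        del disk_cache[key]
```

Deleting the cached entry rather than rewriting it keeps MySQL as the single source of truth, which is the lower-risk of the two options mentioned above.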
In principle, using Berkeley DB differs little from using Memcached as a pure Cache, so why not just use Memcached? There are two main reasons. First, Memcached stores data purely in memory, whereas Berkeley DB can use physical disk, and the two differ considerably in cost. Second, besides the Hash format that Memcached uses, Berkeley DB also supports other storage formats such as B-Tree.
Because the basic usage differs little from that of Memcached, no separate schematic is drawn here.