A Few Things About Cache Architecture Design

This article mainly discusses the following issues:

(1) Where the "cache plus database" requirement comes from

(2) "Eliminate the cache" or "update the cache"?

(3) The operation ordering of cache and database

(4) A brief analysis of cache-and-database architecture



1. Origin of the Requirement

Scenario Introduction

Caching is a common technique for improving system read performance. For read-heavy, write-light scenarios, we often use a cache for optimization.

For example, for the user's balance information table account(uid, money), the business requirements are:

(1) Query the user's balance, SELECT money FROM account WHERE uid=XXX, accounting for 99% of the requests

(2) Change the user's balance, UPDATE account SET money=XXX WHERE uid=XXX, accounting for 1% of requests



Since most of the requests are queries, we create a key-value pair from uid to money in the cache, which can greatly reduce the pressure on the database.

Read operation process

After the data is stored in both the database and the cache (uid->money), whenever money needs to be read, the flow is generally:

(1) Read the cache to check whether the related data uid->money exists

(2) If money is in the cache, return it directly [this is the so-called cache "hit"]

(3) If money is not in the cache, read it from the database [this is the so-called cache "miss"], put it into the cache as uid->money, and then return it

Cache hit rate = number of requests that hit the cache / total number of cache accesses = hit / (hit + miss)

In the balance scenario above, with 99% reads and 1% writes, the hit rate of this cache is very high, typically above 95%.
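To make the read flow above concrete, here is a minimal sketch in Python. It assumes a Redis cache and a query_money_from_db helper; the client, helper, and key scheme are illustrative assumptions, not part of the original example. The numbered comments map to steps (1) to (3) above.

```python
import redis

# Hypothetical cache client; any key-value cache would do.
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def query_money_from_db(uid):
    """Placeholder for: SELECT money FROM account WHERE uid = %s"""
    return 0.0  # a real implementation would query the account table

def get_money(uid):
    key = f"money:{uid}"
    cached = cache.get(key)            # (1) look for uid->money in the cache
    if cached is not None:             # (2) hit: return the cached value
        return float(cached)
    money = query_money_from_db(uid)   # (3) miss: read from the database,
    cache.set(key, money)              #     put it into the cache,
    return money                       #     then return it
```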



Then the question arises. When the data money changes:

(1) Should the data in the cache be updated, or should the data in the cache be eliminated?

(2) Should we operate on the database first and then the cache, or on the cache first and then the database?

(3) Is there any room for optimization in the architecture of cache and database operations?

These are the three core issues this article focuses on.



2. Update the Cache vs. Eliminate the Cache

What is updating the cache: the data is written to the database and also written to the cache

What is eliminating the cache: the data is only written to the database, not to the cache; the cached entry is simply deleted



Advantage of updating the cache: the cache does not take an extra miss, so the hit rate stays high.

Advantage of eliminating the cache: it is simple (I know, updating the cache also looks simple, and this answer feels like a cop-out).



So whether to update the cache or eliminate the cache mainly depends on the complexity of updating the cache.

For example, in the above scenario, the balance money is simply set to a value, then:

(1) The operation of eliminating the cache is deleteCache(uid)

(2) The operation of updating the cache is setCache(uid, money)

The cost of updating the cache is very small, so in this case we should lean toward updating the cache to keep the hit rate high.
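For this simple balance case, the difference between the two choices is a single cache call. Below is a minimal sketch reusing the hypothetical cache client and key scheme from the read example; update_db is a placeholder, and the ordering of the cache and database steps is set aside here since it is the topic of section 3.

```python
def update_db(uid, money):
    """Placeholder for: UPDATE account SET money = %s WHERE uid = %s"""
    pass

def write_with_cache_update(uid, money):
    """'Update the cache': refresh the cached value alongside the DB write."""
    update_db(uid, money)
    cache.set(f"money:{uid}", money)   # setCache(uid, money)

def write_with_cache_elimination(uid, money):
    """'Eliminate the cache': just delete the cached value alongside the DB write."""
    update_db(uid, money)
    cache.delete(f"money:{uid}")       # deleteCache(uid)
```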



If the balance is calculated from more complex data, for example, in addition to the account table the business also has a product table product and a discount table discount:

account (uid, money)

product(pid, type, price, pinfo)

discount(type, zhekou)

The business scenario: a user buys a product whose price is price and whose category is type, and products of this category get a promotional discount zhekou. After the purchase, calculating the new balance is complicated. You need to:

(1) First take out the category and price of the product: SELECT type, price FROM product WHERE pid=XXX

(2) Then take out the discount of this category: SELECT zhekou FROM discount WHERE type=XXX

(3) Then query the original balance from the cache: money = getCache(uid)

(4) Then write the new balance back into the cache: setCache(uid, money - price*zhekou)

Updating the cache is expensive here, so we should lean toward eliminating the cache.
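To show what "expensive" means here, a rough sketch of the four steps above, reusing the cache client from the read sketch; the two query helpers are hypothetical placeholders for the SQL in steps (1) and (2).

```python
def query_product_from_db(pid):
    """Placeholder for: SELECT type, price FROM product WHERE pid = %s"""
    return "book", 100.0

def query_discount_from_db(ptype):
    """Placeholder for: SELECT zhekou FROM discount WHERE type = %s"""
    return 0.9

def update_cached_balance_after_purchase(uid, pid):
    ptype, price = query_product_from_db(pid)           # step (1)
    zhekou = query_discount_from_db(ptype)              # step (2)
    money = float(cache.get(f"money:{uid}") or 0.0)     # step (3): money = getCache(uid)
    cache.set(f"money:{uid}", money - price * zhekou)   # step (4): setCache(uid, money - price*zhekou)
```

Compared with a single deleteCache(uid), this path duplicates business logic just to keep the cache fresh, which is why elimination wins here.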



On the whole, eliminating the cache is simple to do, and its only side effect is one extra cache miss, so it is recommended as the general approach.



3. Operate the Database First vs. Operate the Cache First

OK. When a write operation occurs, assuming we eliminate the cache as the common way of handling it, there are two choices:

(1) Write the database first, then eliminate the cache

(2) Eliminate the cache first, then write the database

Which order should be used?



Remember the conclusion about "write the forward table first or the reverse table first" from the article "How to Ensure Data Consistency with Redundant Tables"?

For two operations that cannot be wrapped in a single transaction, there is always the question of which one to do first and which later. The guiding principle is:

if inconsistency can occur, the operation whose going first causes less damage to the business should be executed first.



Since writing the database and eliminating the cache cannot be made atomic, their ordering should follow the same principle.




Assume we write the database first and then eliminate the cache: if the first step (writing the database) succeeds and the second step (eliminating the cache) fails, the DB holds new data while the cache holds old data, and the two are inconsistent.






Assume we eliminate the cache first and then write the database: if the first step (eliminating the cache) succeeds and the second step (writing the database) fails, the only cost is one extra cache miss.



Conclusion: the ordering of database and cache operations is clear: eliminate the cache first, then write the database.
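A minimal sketch of the write path under this conclusion, again reusing the hypothetical cache client and the update_db placeholder from the earlier sketches:

```python
def set_money(uid, money):
    # Step 1: eliminate the cache first. If this step fails, the database has
    # not been touched and the cache still holds consistent (old) data.
    cache.delete(f"money:{uid}")
    # Step 2: then write the database. If this step fails, the worst case is
    # one extra cache miss on the next read, not inconsistent data.
    update_db(uid, money)
```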



4. Cache Architecture Optimization




The cache architecture above has a drawback: the business side has to pay attention to both the cache and the DB. Is there room for further optimization? There are two common schemes, one mainstream and one non-mainstream (just my personal take, go easy on me).






The mainstream optimization is service-orientation: add a service layer that offers a clean data access interface to upstream callers and shields them from the details of the underlying data storage, so the business line no longer needs to care whether the data comes from the cache or the DB.
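As a rough illustration only (class and method names invented), the service layer can wrap the read and write paths sketched earlier, so the business line sees a single interface and never touches the cache or the DB directly:

```python
class AccountService:
    """Data-access service that hides whether data comes from the cache or the DB."""

    def get_balance(self, uid):
        return get_money(uid)        # cache-aside read, invisible to the caller

    def set_balance(self, uid, money):
        set_money(uid, money)        # eliminate-then-write, invisible to the caller
```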




The non-mainstream scheme is asynchronous cache update: all writes from the business line go to the database, all reads go only to the cache, and an asynchronous tool keeps the cache in sync with the database. The details are:

(1) There is an init-cache process that writes all the data that needs to be cached into the cache

(2) Whenever there is a write to the DB, an asynchronous updater reads the binlog and updates the cache

With (1) and (2) working together, the cache always holds all the data, so:

(a) When the business line reads the cache, it always hits (though for a short window the data may be stale); it does not need to care about the database

(b) When the business line writes the DB, the cache is updated asynchronously; it does not need to care about the cache

This greatly simplifies the calling logic on the business side. The drawback is that if the business logic behind the cached data is complex, the async-update logic itself may become complicated.
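A very rough sketch of this scheme, reusing the cache client from the earlier sketches; read_binlog_events is a hypothetical stand-in for a real binlog subscription (something in the spirit of canal or Maxwell), and the event format is invented for illustration:

```python
def read_binlog_events():
    """Hypothetical binlog subscription; yields (uid, money) changes for the account table."""
    yield from []  # a real implementation would tail the MySQL binlog

def init_cache(all_accounts):
    # (1) the init-cache process: preload everything that should be cached
    for uid, money in all_accounts:
        cache.set(f"money:{uid}", money)

def async_updater():
    # (2) whenever the DB is written, replay the change into the cache
    for uid, money in read_binlog_events():
        cache.set(f"money:{uid}", money)
```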



5. Other Open Issues

This article has only discussed a few details that deserve attention in cache architecture design. If the database uses a one-master-multiple-slaves architecture with read/write separation, then under particular timings the database and the cache can still become inconsistent. How to optimize away that inconsistency will be discussed in a follow-up article.



6. Conclusions Worth Emphasizing

(1) Eliminating the cache is a common cache processing method

(2) Eliminating the cache first and then writing the database is the right order, without question

(3) Service-orientation is a general way to shield the business side from the complexity of the underlying database and cache


