Redis project practice - merchant query cache

Why use Redis to implement merchant query caching?

  • A cache is a buffer layer for data exchange: a staging area for frequently accessed data that demands high read/write performance. Redis reads and writes in memory, which is exactly what this requires.
  • Using Redis as a cache reduces back-end load and improves read/write efficiency.
  • Adding a cache also adds costs: data consistency, code maintenance, operations, and so on. The right Redis features and mechanisms must therefore be chosen for the business scenario.

The basic idea of using Redis to implement merchant query caching?


  • Without a cache, every query a client sends for a piece of data goes straight to the database, which puts a lot of pressure on the database.
  • A Redis cache acts as an interception layer between the client and the database. A query request reaches Redis first; on a hit, the cached result is returned directly and the request never touches the database, reducing its load.
  • On a Redis miss, the request continues to the database, and the database's result is returned to the client.
  • If clients repeatedly request data that is not in Redis, every request still hits the database, which is not what we want. So after the database returns the result to the client, the same result should also be written into the Redis cache (a code sketch follows).
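Below is a minimal sketch of this read-through flow in Java with Spring Data Redis and Jackson. `Shop`, `ShopMapper`, and the key/TTL constants are hypothetical stand-ins for the project's own entity, DAO, and configuration:

```java
import java.time.Duration;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class ShopQueryService {

    private static final String CACHE_SHOP_KEY = "cache:shop:";
    private static final Duration CACHE_SHOP_TTL = Duration.ofMinutes(30);

    private final StringRedisTemplate redis;
    private final ShopMapper shopMapper; // hypothetical DAO for the shop table
    private final ObjectMapper json = new ObjectMapper();

    public ShopQueryService(StringRedisTemplate redis, ShopMapper shopMapper) {
        this.redis = redis;
        this.shopMapper = shopMapper;
    }

    public Shop queryById(Long id) throws Exception {
        String key = CACHE_SHOP_KEY + id;
        // 1. Try the cache first
        String cached = redis.opsForValue().get(key);
        if (cached != null) {
            return json.readValue(cached, Shop.class); // cache hit: no DB access
        }
        // 2. Cache miss: fall back to the database
        Shop shop = shopMapper.selectById(id);
        if (shop == null) {
            return null; // penetration handling is discussed later
        }
        // 3. Write the result back so the next query hits the cache
        redis.opsForValue().set(key, json.writeValueAsString(shop), CACHE_SHOP_TTL);
        return shop;
    }
}
```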

Problems and solutions when using Redis cache?

1. How to maintain the consistency of database data and Redis cache data?

  • Low consistency requirements: the data changes infrequently, so the memory eviction mechanism is enough.
  • High consistency requirements: the data changes frequently, so use active updates, with timeout expiration as a safety net.

1 Memory eviction mechanism

  • To cope with insufficient memory, Redis has a built-in mechanism that automatically evicts some data when memory runs low. The cache is rebuilt the next time the data is queried, which maintains consistency to a certain extent.
  • Advantages: maintenance cost is almost zero; Redis manages the whole process itself.
  • Disadvantages: you cannot control which data gets evicted or when, so data consistency is relatively poor. The configuration snippet below shows how eviction is set up.
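For reference, eviction is controlled by two redis.conf directives; the values below are illustrative, and `allkeys-lru` is only one of the several policies Redis offers:

```
maxmemory 512mb              # cap on Redis memory usage
maxmemory-policy allkeys-lru # once the cap is hit, evict least-recently-used keys
```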

2 Timeout expiration mechanism

  • Attach a TTL to cached data; the entry is deleted automatically when it expires and rebuilt on the next query.
  • Advantages: low maintenance cost; just add an expiration time to the existing data.
  • Disadvantages: although you now control which data expires and when, the cache can serve stale data until the TTL fires, so consistency is better than with memory eviction but still not strong.

3 Active update mechanism (the winner)

  • Update the cache whenever the corresponding data is modified in the database.
  • Advantages: the best data consistency of the three.
  • Disadvantages: high maintenance cost; you must write a lot of the business logic yourself, which is hard to get right.

How to implement the active update mechanism?

  • First solution: write the code yourself and update the cache whenever you update the database (the most common approach in real business).
  • Second solution: wrap the cache and database into a single service; callers just call the service, and data consistency is handled internally. However, such a service is hard to maintain and hard to find off the shelf.
  • Third solution: callers never touch the database and do all reads and writes against the cache, which always holds the latest data; a dedicated thread asynchronously flushes cached data to the database, guaranteeing eventual consistency. The asynchronous design can greatly improve efficiency: if a cached record is updated n times, the background thread only writes the latest value once, and since a database write is far more expensive than a cache write, the n-1 redundant database updates are saved. The drawbacks are the cost of maintaining the async thread, data loss if Redis goes down, and complete inconsistency in the window before the async thread writes to the database. A sketch of this write-behind pattern follows the list.
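A deliberately simplified sketch of that third, write-behind plan, reusing the hypothetical `ShopMapper` DAO plus a made-up `updateFromJson` persistence helper. A production version would still have to survive the Redis-crash and pre-flush-inconsistency problems noted above:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.springframework.data.redis.core.StringRedisTemplate;

public class WriteBehindShopCache {

    private final StringRedisTemplate redis;
    private final ShopMapper shopMapper; // hypothetical DAO
    // Keys whose latest value has not yet been flushed to the database
    private final Set<String> dirtyKeys = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();

    public WriteBehindShopCache(StringRedisTemplate redis, ShopMapper shopMapper) {
        this.redis = redis;
        this.shopMapper = shopMapper;
        // n updates to the same key within 5 seconds collapse into one DB write
        flusher.scheduleWithFixedDelay(this::flush, 5, 5, TimeUnit.SECONDS);
    }

    /** Callers write to the cache only; the database catches up later. */
    public void updateShop(String key, String shopJson) {
        redis.opsForValue().set(key, shopJson);
        dirtyKeys.add(key);
    }

    private void flush() {
        for (String key : dirtyKeys) {
            dirtyKeys.remove(key); // clear the flag before reading the latest value
            String latest = redis.opsForValue().get(key);
            if (latest != null) {
                shopMapper.updateFromJson(key, latest); // hypothetical persistence call
            }
        }
    }
}
```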

When operating the cache and database, should I update the cache or delete it?

  • Update the cache: every database update must also update the cache, producing many useless cache writes for values that may never be read before they are overwritten.
  • Delete the cache: each database update just deletes the cached entry, so n updates cost only one effective delete. The cache is repopulated only when a client actually reads the data: on a Redis miss, the latest value is fetched from the database and written back to the cache.

How to ensure the atomicity of cache and database operations?

  • This means that when the database is updated, the cache deletion must also succeed: either both operations succeed or both fail.
  • For a monolithic system: put the cache and database operations in one transaction.
  • For a distributed system: use a distributed transaction solution such as TCC.

Should I delete the cache first or update the database first?

  • In a multi-threaded scenario, the two orders fail in different ways.
  • Delete the cache first: suppose thread 1 deletes the cache and then updates the database. If thread 2 arrives after the delete but before the database update, it misses the cache, queries the database, gets the old value, and writes that old value back into the cache per the normal flow. Thread 1 then updates the database, leaving cache and database inconsistent.
  • Update the database first: suppose thread 1 updates the database and then deletes the cache. If thread 2 arrives while the cache entry happens to be missing (say, it expired), it reads the old value from the database and returns it to the client; thread 1 then completes its update and deletes the (already absent) cache entry; finally thread 2 writes the old value it read back into the cache, again causing inconsistency. This interleaving requires a much narrower timing window, though.
  • Because updating the database first makes inconsistency far less likely, and a TTL on the cache bounds how long any inconsistency can last, the rule is: update the database first, then delete the cache. A code sketch follows.
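In code, the conclusions of the last three subsections combine into one small method. A sketch for the monolithic case, reusing the hypothetical `ShopMapper` and key prefix from earlier:

```java
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ShopUpdateService {

    private final StringRedisTemplate redis;
    private final ShopMapper shopMapper; // hypothetical DAO

    public ShopUpdateService(StringRedisTemplate redis, ShopMapper shopMapper) {
        this.redis = redis;
        this.shopMapper = shopMapper;
    }

    // Active update: database first, then delete (not update) the cache.
    // @Transactional covers the monolithic atomicity requirement: if the
    // cache delete throws a runtime exception, the DB update rolls back too.
    @Transactional
    public void updateShop(Shop shop) {
        shopMapper.updateById(shop);                // 1. update the database
        redis.delete("cache:shop:" + shop.getId()); // 2. then invalidate the cache
    }
}
```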

2. Cache penetration?

The client requests data that exists in neither the cache nor the database, so no cache entry can ever be built. If such non-existent data is requested repeatedly, every request falls through to the database, putting pressure on it.

Solution 1 Cache empty objects

  • Simple and brute-force: when the database also misses, cache an empty value for that key. The next request for it hits the empty value in Redis and never reaches the database.
  • Advantages: simple to implement and easy to maintain.
  • Disadvantage 1: if many different non-existent keys are requested, the cache fills up with empty values, costing extra memory. Solution: give the empty values a short TTL so they do not occupy cache space for long.
  • Disadvantage 2: until the empty value expires, requests keep seeing "not found" even if the database has since gained the data, causing short-term inconsistency. Solution: use the active update mechanism to overwrite the empty value when the database is updated. A code sketch follows.
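A sketch of the read path with empty-object caching, extending the earlier hypothetical `ShopQueryService` (the 2-minute empty-value TTL is an arbitrary illustrative choice):

```java
public Shop queryWithPassThrough(Long id) throws Exception {
    String key = CACHE_SHOP_KEY + id;
    String cached = redis.opsForValue().get(key);
    if (cached != null) {
        // An empty string is our cached "this id does not exist" marker
        return cached.isEmpty() ? null : json.readValue(cached, Shop.class);
    }
    Shop shop = shopMapper.selectById(id);
    if (shop == null) {
        // Cache the empty object with a short TTL to limit the memory cost
        redis.opsForValue().set(key, "", Duration.ofMinutes(2));
        return null;
    }
    redis.opsForValue().set(key, json.writeValueAsString(shop), CACHE_SHOP_TTL);
    return shop;
}
```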

Solution 2 Bloom filter

  • Add a Bloom filter between the client and Redis. When a request arrives, the Bloom filter first judges whether the data can exist: if it may exist, the request is released into the normal flow; if it definitely does not, the request is rejected, so non-existent keys never reach the cache or the database.
  • A Bloom filter is a bit array in which every bit is either 0 or 1; it can judge very quickly whether an element exists.
  • Storing data: feed the value through k different hash functions to get k positions in the bit array, and set those k bits to 1.
  • Querying data: hash the value with the same k functions. If any of the k bits is 0, the data definitely does not exist and the request is intercepted. If all k bits are 1, the data may exist (hashes can collide, so false positives happen) and the request is allowed through.
  • Advantages: fast, and takes up very little space.
  • Disadvantages: more complex to implement, and false positives exist. A short demo follows.

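A quick illustration using Guava's `BloomFilter` (one well-known implementation; RedisBloom is another option). The ids and sizing numbers are made up:

```java
import java.nio.charset.StandardCharsets;

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class ShopBloomFilterDemo {
    public static void main(String[] args) {
        // Sized for ~1M shop ids with a 1% false-positive rate
        BloomFilter<String> filter = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

        // Pre-load every existing id, e.g. from the database at startup
        filter.put("shop:1");
        filter.put("shop:2");

        // "false" means definitely absent: reject before touching Redis or the DB
        System.out.println(filter.mightContain("shop:1"));   // true
        System.out.println(filter.mightContain("shop:999")); // false (with high probability)
    }
}
```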

3. Cache avalanche?

A large number of keys in the Redis cache become invalid or expire at the same moment, or the Redis server goes down, so a flood of requests reaches the database directly and puts enormous pressure on it.

Solution 1 Random TTL value

If a large batch of keys is written with the same TTL, they all expire at the same time. Instead, add a random, evenly distributed offset to each key's TTL, as in the snippet below.
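A fragment reusing the `redis` template from the earlier sketches; the 30-minute base and 10-minute jitter range are arbitrary:

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Base TTL plus a uniform random offset, so keys cached in the same
// batch do not all expire at the same moment
Duration ttl = Duration.ofMinutes(30)
        .plusSeconds(ThreadLocalRandom.current().nextLong(0, 600));
redis.opsForValue().set(key, value, ttl);
```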

Solution 2 Redis cluster

If one Redis node goes down, replicas and other nodes are still there to serve the cache.

Solution 3 Circuit breaking and service degradation

  • When Redis goes down, suspend the cache-dependent service (circuit breaking) to keep the avalanche from spreading.
  • If Redis is completely unavailable, the service can only be degraded: drop the affected requests and return an error directly.

Solution 4 Multi-level caching

Establish caches at multiple levels (browser, application-local cache, Redis, the database's own cache, etc.), giving the first-level and second-level caches different expiration times; if the first level fails, the second level still answers. A two-level sketch follows.
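A sketch of a two-level read path using Caffeine as the in-process L1 in front of Redis as L2 (Caffeine is an assumed choice here, not something mandated by the original notes):

```java
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.data.redis.core.StringRedisTemplate;

public class TwoLevelCache {

    // L1: small in-process cache with a short TTL
    private final Cache<String, String> local = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(30))
            .build();

    private final StringRedisTemplate redis; // L2: shared Redis cache, longer TTL

    public TwoLevelCache(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public String get(String key) {
        // Check L1; on a miss, fall back to L2 and repopulate L1.
        // Keys already in L1 keep being served even during a brief Redis outage.
        return local.get(key, k -> redis.opsForValue().get(k));
    }
}
```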

4. Cache breakdown (the hotspot key problem)?

A hotspot key is one that is accessed with high concurrency and whose cache entry is expensive to rebuild. When such a key becomes invalid or expires, the first missing request queries the database and writes the result back to the cache; but if a large number of requests for the same key flood in during that rebuild window, they all pour into the database, causing huge pressure.

Solution 1 Mutex lock

  • When thread 1 misses the cache, it tries to acquire a lock. Only after acquiring it may it query the database, and it releases the lock after writing the result to the cache. Meanwhile, other requests for the same key that miss the cache also try to acquire the lock; those that fail wait a short while, then retry the cache query and the lock.
  • Advantages: no extra memory consumption, good data consistency, simple to implement.
  • Disadvantages: other threads must wait until the lock holder writes the cache and releases the lock, and there is a risk of deadlock. A sketch using SETNX as the lock follows.
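A sketch of the mutex variant, again extending the hypothetical `ShopQueryService`. SETNX with a TTL (`setIfAbsent`) serves as the lock, and the TTL is the guard against the deadlock risk mentioned above:

```java
private static final String LOCK_SHOP_KEY = "lock:shop:";

public Shop queryWithMutex(Long id) throws Exception {
    String key = CACHE_SHOP_KEY + id;
    while (true) {
        String cached = redis.opsForValue().get(key);
        if (cached != null) {
            return json.readValue(cached, Shop.class); // cache hit
        }
        // SET lock NX EX 10: the TTL limits the damage if the holder crashes
        String lockKey = LOCK_SHOP_KEY + id;
        Boolean locked = redis.opsForValue()
                .setIfAbsent(lockKey, "1", Duration.ofSeconds(10));
        if (Boolean.TRUE.equals(locked)) {
            try {
                // Double-check: another thread may have rebuilt the cache already
                cached = redis.opsForValue().get(key);
                if (cached != null) {
                    return json.readValue(cached, Shop.class);
                }
                Shop shop = shopMapper.selectById(id); // the expensive rebuild
                redis.opsForValue()
                        .set(key, json.writeValueAsString(shop), CACHE_SHOP_TTL);
                return shop;
            } finally {
                redis.delete(lockKey); // always release the lock
            }
        }
        Thread.sleep(50); // lost the race: back off, then retry the cache
    }
}
```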

Solution 2 Logical expiration

  • Instead of setting a TTL, store a logical expiration time with the value. A TTL is a countdown enforced by Redis; a logical expiration time is a concrete timestamp checked by the application, so the key itself never disappears.
  • When thread 1 reads the cache and sees that the logical time has expired, it tries to grab the mutex and hands the rebuild to a new thread 2, which queries the database and rewrites the cache. Thread 1 does not wait for thread 2 to finish; it returns the expired data immediately.
  • If thread 3 reads the cache right after thread 1 and also sees the logical time expired, it too tries to acquire the mutex; the lock is still held for thread 2's rebuild, so acquisition fails and thread 3 simply gives up and returns the expired data.
  • Only requests that arrive after the rebuild completes and the lock is released (thread 4) hit fresh data in the cache.
  • Advantages: threads never block, so performance is better.
  • Disadvantages: weaker data consistency, extra memory for the wrapper, more complex implementation. A sketch follows.
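A sketch of the logical-expiration read path. `RedisData` is a hypothetical wrapper class (an epoch-millisecond timestamp sidesteps JSON date handling), and `Shop`/`ShopMapper` remain the hypothetical entity and DAO from earlier:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.data.redis.core.StringRedisTemplate;

// Hypothetical wrapper: the value plus its logical expiry timestamp
class RedisData {
    public long expireAtEpochMilli;
    public Object data;
}

public class LogicalExpireQueryService {

    private static final String CACHE_SHOP_KEY = "cache:shop:";
    private static final String LOCK_SHOP_KEY = "lock:shop:";
    private static final ExecutorService REBUILD_POOL = Executors.newFixedThreadPool(10);

    private final StringRedisTemplate redis;
    private final ShopMapper shopMapper; // hypothetical DAO
    private final ObjectMapper json = new ObjectMapper();

    public LogicalExpireQueryService(StringRedisTemplate redis, ShopMapper shopMapper) {
        this.redis = redis;
        this.shopMapper = shopMapper;
    }

    public Shop queryWithLogicalExpire(Long id) throws Exception {
        String key = CACHE_SHOP_KEY + id;
        String cached = redis.opsForValue().get(key);
        if (cached == null) {
            return null; // hot keys are pre-loaded, so a miss means "not a hot key"
        }
        RedisData holder = json.readValue(cached, RedisData.class);
        Shop shop = json.convertValue(holder.data, Shop.class);
        if (holder.expireAtEpochMilli > System.currentTimeMillis()) {
            return shop; // logically still fresh
        }
        // Logically expired: try the rebuild lock, but never wait on it
        String lockKey = LOCK_SHOP_KEY + id;
        Boolean locked = redis.opsForValue()
                .setIfAbsent(lockKey, "1", Duration.ofSeconds(10));
        if (Boolean.TRUE.equals(locked)) {
            REBUILD_POOL.submit(() -> {
                try {
                    RedisData fresh = new RedisData();
                    fresh.data = shopMapper.selectById(id);
                    fresh.expireAtEpochMilli =
                            System.currentTimeMillis() + Duration.ofMinutes(30).toMillis();
                    redis.opsForValue().set(key, json.writeValueAsString(fresh));
                } catch (Exception ignored) {
                    // a failed rebuild just leaves the stale value in place
                } finally {
                    redis.delete(lockKey);
                }
            });
        }
        return shop; // everyone returns the (possibly stale) value immediately
    }
}
```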


Origin blog.csdn.net/weixin_46838605/article/details/132562299