Architectural Thinking Growth Series Tutorials (3) - How "Cache" Copes with Traffic Peaks of Hundreds of Millions

Background

Many large-scale Internet systems, such as e-commerce, social networking, and news apps or websites, often have tens of millions or even hundreds of millions of daily users, with peak traffic of hundreds of thousands of requests per minute or more. How do we handle such traffic peaks at the architecture level? One answer is to take pressure off the system with "caching" techniques.

Content

Cache Usage

The main harm a traffic peak does to the system is that it instantly triggers a large number of disk reads and lookups. The data source is usually a database or file system, and as the number of data accesses grows, the excessive disk reads can become the performance bottleneck of the entire system, or even overwhelm the database, leading to serious consequences such as the system freezing and the service becoming unavailable.

In a conventional application system, we usually query the database whenever data is needed, so the general structure of the system looks like this:

Conventional application system

When the volume of requests is high, the database's disk reads and writes must be reduced, so a caching layer is usually added between the business system and the database to relieve the access pressure on the database, as shown in the following figure:

Application system with added caching layer

Applying caching in real scenarios is not that simple, however. Let's walk through several classic caching scenarios and the problems they raise.

Common Caching Problems

1. Data consistency between cache and database

The mechanisms commonly used for cache processing are as follows:

  • Cache Aside mode.

In this mode, the application usually queries the cache first; if the cache does not hit, it then looks the data up in the database.

There are three situations that can happen here:

  1. Cache hit: the query finds the data in the cache and returns it directly from the cache.
  2. Cache miss or expiry: there is no data in the cache, so the source data is read from the database and then written back to the cache.
  3. Cache update: when a write operation modifies data in the database, the corresponding entry in the cache must be invalidated after the write completes so that the cache stays in sync.

Cache Aside is usually the most commonly used pattern in practical application development, but that does not mean its handling of the cache is perfect.

The pattern still has flaws. For example, suppose a read operation misses the cache and goes to the database for the data. Before it writes that data back, a write operation updates the database and invalidates the cache. The earlier read operation then puts its now-stale data into the cache, producing dirty data.

It is extremely difficult to completely guarantee data consistency in a distributed environment. We can only reduce the occurrence of this data inconsistency as much as possible.
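To make the pattern concrete, here is a minimal Cache Aside sketch in Python. It assumes a local Redis instance accessed through the redis-py client; `query_user_from_db` and `update_user_in_db` are hypothetical stand-ins for the real data-access layer, and the key names and TTL are illustrative rather than taken from the article.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)  # assumed local Redis
CACHE_TTL = 300  # seconds; illustrative value


def query_user_from_db(user_id):
    """Hypothetical stand-in for the real database read."""
    ...


def update_user_in_db(user_id, fields):
    """Hypothetical stand-in for the real database write."""
    ...


def get_user(user_id):
    """Read path: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:                              # cache hit
        return json.loads(cached)
    user = query_user_from_db(user_id)                  # cache miss: read the source of truth
    if user is not None:
        r.set(key, json.dumps(user), ex=CACHE_TTL)      # write back with an expiration
    return user


def update_user(user_id, fields):
    """Write path: update the database first, then invalidate (not update) the cache."""
    update_user_in_db(user_id, fields)
    r.delete(f"user:{user_id}")                         # the next read repopulates the cache
```

Invalidating (deleting) rather than rewriting the cache entry on the write path keeps the stale-data window small, although, as noted above, it cannot eliminate inconsistency entirely.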

  • Read Through mode.

It means that the application always requests data from the cache. If the cache has no data, the cache itself is responsible for retrieving it from the database through an underlying provider plugin; once the data is retrieved, the cache updates itself and returns the data to the calling application.

One advantage of Read Through is that we always retrieve data from the cache by key. The calling application knows nothing about the database, and the cache layer handles its own loading, which makes the code more readable and clearer.

The corresponding drawback is that developers need to write the related loader plugins, which increases development effort.
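Here is a deliberately small, in-process illustration of the Read Through idea, not any real cache product's plugin API: callers only ever talk to `ReadThroughCache.get`, and the loader function registered at construction time plays the role of the "provider plugin". All names and the TTL are illustrative.

```python
import time


class ReadThroughCache:
    """Toy read-through cache: callers never touch the database directly."""

    def __init__(self, loader, ttl_seconds=300):
        self._loader = loader          # the "provider plugin" that knows the database
        self._ttl = ttl_seconds
        self._store = {}               # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                              # cache hit
        value = self._loader(key)                        # the cache loads the data itself
        self._store[key] = (value, time.time() + self._ttl)
        return value


# Usage: register a loader once, then only ever call get().
# cache = ReadThroughCache(loader=lambda user_id: query_user_from_db(user_id))
# user = cache.get(42)
```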

  • Write Through mode.

Similar to the Read Through mode, but on the write path: when data is updated and the write hits the cache, the cache is updated first and the cache side then synchronously updates the database; if the write does not hit the cache, the database is updated directly.
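A corresponding Write Through sketch, again a toy in-process illustration rather than a real product's API: writes go through `put`, which persists to the database synchronously before refreshing the cached copy. The `loader` and `writer` hooks are hypothetical stand-ins for the real data layer.

```python
import time


class WriteThroughCache:
    """Toy write-through cache: writes go to the cache, which persists synchronously."""

    def __init__(self, loader, writer, ttl_seconds=300):
        self._loader = loader      # reads the database on a miss (as in Read Through)
        self._writer = writer      # persists a value to the database, e.g. save_to_db
        self._ttl = ttl_seconds
        self._store = {}           # key -> (value, expires_at)

    def put(self, key, value):
        self._writer(key, value)                             # synchronous database write first
        self._store[key] = (value, time.time() + self._ttl)  # then keep the cache consistent

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]
        value = self._loader(key)
        self._store[key] = (value, time.time() + self._ttl)
        return value
```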

  • Write Behind Caching mode.

In this mode, data is usually written into the cache first, and then asynchronously written into the database for data synchronization.

Such a design not only reduces direct access to the data in the database, lowering the pressure on it, but also allows multiple modifications to be merged into a single database write, which greatly improves the system's carrying capacity.

However, this mode of processing cached data has certain risks. For example, when the cache machine is down, data may be lost.
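A minimal Write Behind sketch, assuming an in-process buffer and a background thread that periodically flushes merged changes to the database; in production this is usually the cache product's own job, and, as noted above, anything still sitting in the buffer is lost if the process dies. The `flush_to_db` hook and the interval are illustrative.

```python
import threading
import time


class WriteBehindCache:
    """Writes land in memory immediately; a background thread flushes them later."""

    def __init__(self, flush_to_db, interval_seconds=1.0):
        self._flush_to_db = flush_to_db     # e.g. lambda dirty: bulk_update_db(dirty)
        self._interval = interval_seconds
        self._store = {}                    # current cached values
        self._dirty = {}                    # pending writes, merged per key
        self._lock = threading.Lock()
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def put(self, key, value):
        with self._lock:
            self._store[key] = value
            self._dirty[key] = value        # repeated writes to one key are merged

    def get(self, key):
        with self._lock:
            return self._store.get(key)

    def _flush_loop(self):
        while True:
            time.sleep(self._interval)
            with self._lock:
                pending, self._dirty = self._dirty, {}
            if pending:
                self._flush_to_db(pending)  # one batched database write
```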

2. Cache concurrency problem

After a cache entry expires, the system goes back to the backend database for the data, which by itself is a perfectly reasonable process.

However, in a high-concurrency scenario, multiple requests may concurrently obtain data from the database, causing a great impact on the back-end database, and even causing an "avalanche" phenomenon.

In addition, when a cache key is being updated, it may also be obtained by a large number of requests, which will also cause consistency problems.

So how do we avoid this kind of problem? The natural answer is a "lock"-like mechanism: when the cache needs updating or has expired, a request first tries to acquire the lock, loads the data from the database, refreshes the cache, and then releases the lock. Other requests only sacrifice a short wait and then continue fetching the data directly from the cache.
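A common way to implement this lock is a short-lived mutex key in Redis: only the request that wins the lock rebuilds the cache, and the rest retry against the cache after a brief wait. Below is a hedged sketch using redis-py, where `set(..., nx=True, ex=...)` is an atomic set-if-absent with expiry; `load_from_db`, the key names, and the timing values are illustrative assumptions.

```python
import json
import time
import redis

r = redis.Redis(host="localhost", port=6379, db=0)


def get_with_mutex(key, load_from_db, ttl=300, lock_ttl=10, retries=50):
    """Only one request rebuilds an expired entry; others wait briefly and re-read."""
    for _ in range(retries):
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        # Try to become the single rebuilder (atomic set-if-not-exists with expiry).
        if r.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
            try:
                value = load_from_db()
                r.set(key, json.dumps(value), ex=ttl)
                return value
            finally:
                r.delete(f"lock:{key}")
        time.sleep(0.05)    # lost the race: wait a little, then read the cache again
    return load_from_db()   # fallback so callers are never stuck forever
```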

3. Cache penetration problem

Cache penetration is also sometimes called "breakdown". Many people understand cache penetration as a large number of requests penetrating to the backend database because the cache has failed or expired, putting huge pressure on the database.

This is actually a misunderstanding. The real cache penetration should be like this:

In a high-concurrency scenario, when a certain key is requested heavily but never hits the cache, each miss falls through to the backend database for fault-tolerance reasons. If the data for that key is itself empty, nothing ever gets written to the cache, so a large number of unnecessary queries are executed against the database concurrently, creating enormous impact and pressure.

The cache penetration problem can be avoided in the following common ways:

1. Cache empty objects

Cache objects even when the query result is empty. For a collection, cache an empty (non-null) collection; for a single object, mark it with a field that identifies it as empty. This prevents requests from penetrating through to the backend database. At the same time, the cached data must be given an appropriate expiry so it stays timely.

This method is less costly to implement and is more suitable for data that does not have a high hit rate but may be updated frequently.
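A sketch of the "cache empty objects" idea, reusing the Cache Aside read path from earlier: a miss in the database is still written to the cache as an explicit empty marker with a short TTL, so repeated lookups of a nonexistent key stop reaching the database. The sentinel value, key names, and TTLs are illustrative; the loader is passed in as a hypothetical hook.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
EMPTY_SENTINEL = "__EMPTY__"   # marks "we asked the database and there was nothing"


def get_user(user_id, query_user_from_db, ttl=300, empty_ttl=60):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        if cached == EMPTY_SENTINEL.encode():
            return None                          # known-empty result, no database query
        return json.loads(cached)
    user = query_user_from_db(user_id)
    if user is None:
        # Cache the emptiness itself, with a shorter lifetime so it stays timely.
        r.set(key, EMPTY_SENTINEL, ex=empty_ttl)
        return None
    r.set(key, json.dumps(user), ex=ttl)
    return user
```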

2. Use a separate filter

Store, in one place, all the keys whose data may be empty, and intercept requests against that store before they go any further, so that they cannot penetrate through to the backend database.

This method is relatively complicated to implement, and is more suitable for data with low hits but infrequent updates.
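One common realization of such a filter is a Bloom filter holding every key that actually exists; requests for keys the filter has never seen are rejected before they reach the cache or database. Below is a deliberately tiny, self-contained Bloom-filter-style sketch (a real system would use an established library or Redis bitmaps); the sizes and hashing choices are illustrative.

```python
import hashlib


class TinyBloomFilter:
    """Minimal Bloom-filter-style set: may give false positives, never false negatives."""

    def __init__(self, size_bits=1_000_000, hashes=3):
        self._size = size_bits
        self._hashes = hashes
        self._bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        for i in range(self._hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self._size

    def add(self, key):
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# At startup, add every key that really exists (e.g. all valid user ids) to the filter.
# On each request: if not bloom.might_contain(key): return None  # reject before the database
```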

4. Cache thrashing problem

This can be seen as a milder failure than an "avalanche", but it still jolts the system and degrades performance for a period of time. It is generally caused by a cache node failure.

The approach generally recommended in the industry is to solve this with a consistent hashing algorithm.
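A compact consistent-hash ring sketch: each cache node is mapped to many virtual points on a ring, and a key is served by the first node clockwise from the key's hash, so losing one node only remaps that node's share of keys instead of reshuffling everything. The node names and replica count below are illustrative.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, nodes, virtual_replicas=100):
        self._ring = []                     # sorted list of (hash_value, node)
        for node in nodes:
            for i in range(virtual_replicas):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_node(self, key):
        """Return the first node clockwise from the key's position on the ring."""
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]


# ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
# ring.get_node("user:42")   # removing one node only remaps that node's slice of keys
```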

5. Cache avalanche problem

It means that, because the cache fails to absorb the traffic, a large number of requests reach the backend database, the database collapses, and the whole system goes down with it, with disastrous results.

There are many reasons for this phenomenon. The problems mentioned above such as "cache concurrency", "cache penetration", and "cache thrashing" may actually lead to cache avalanche.

These issues can also be exploited by malicious attackers. In another case, the cache entries preloaded by the system all expire at the same point in time, which can likewise trigger an avalanche. To avoid this kind of concentrated, periodic expiry, set different expiration times so that cache entries expire at staggered moments rather than all at once.
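Staggering expiration is usually just a matter of adding random jitter to the TTL when entries are preloaded, so they do not all expire in the same instant. A small sketch using redis-py; the base TTL and jitter range are illustrative.

```python
import json
import random
import redis

r = redis.Redis(host="localhost", port=6379, db=0)


def warm_cache(items, base_ttl=3600, jitter=600):
    """Preload entries with slightly different lifetimes to avoid a mass expiry."""
    for key, value in items.items():
        ttl = base_ttl + random.randint(0, jitter)   # e.g. 3600..4200 seconds
        r.set(key, json.dumps(value), ex=ttl)
```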

From the perspective of the application architecture, we can reduce the impact through rate limiting, service degradation, circuit breaking, and similar means, and we can also use multi-level caching to avoid such disasters.

In addition, from the perspective of the whole R&D process, stress testing should be strengthened and real scenarios simulated as closely as possible, so that problems are exposed early and can be prevented.

Finally

Caching is a technique commonly used in large-scale Internet system architectures.

When designing a cache architecture, the design must be tailored to the business scenario, while avoiding problems such as cache latency, dirty data, and cache avalanches, so as to improve the high availability and robustness of the system.

 

Previous Chapter Tutorial

Architectural Thinking Growth Series Tutorials (2) - Application of CAP Theory in Large Internet Systems

The series of tutorials

Architectural Thinking Growth Series Tutorials

My column

 

 

That concludes this introduction.

 

 

-------------------------------

-------------------------------

 

My CSDN homepage

About me (personal domain name, more information about me)

My open source project collection Github

 

I look forward to learning, growing, and encouraging one another together with everyone, O(∩_∩)O. Thank you!

Questions and discussion are welcome; you can add my personal QQ 469580884,

or join my group 751925591 to discuss and exchange questions together.

Don't talk about falsehood, just be a doer

Talk is cheap, show me the code
