The 6 most common high-concurrency caching problems in job interviews: how many do you know?

Preface

Generally speaking, the overall flow of a typical Internet website or app looks like the figure below: a user request starts at the interface layer (a browser or an app), goes through network forwarding to the application services, and finally reaches storage (a database or file system); the result then returns along the same path to be presented at the interface.
[Figure: overall request flow from the browser/app through network forwarding and application services to storage]
With the popularization of the Internet, content and information have become more and more complex, and users and traffic keep growing, so our applications need to support ever more concurrency. At the same time, the computation done by our application servers and database servers keeps increasing. Yet server resources are limited and technology change is slow, so the number of requests a server can accept per second is limited, as is its file read/write capacity.

How can we use limited resources to provide the largest possible throughput? An effective way is to introduce caching to break the standard flow in the figure: at each link, a request can fetch the target data directly from the cache and return it, reducing computation downstream, improving response speed, and letting limited resources serve more users. As the figure below shows, a cache can actually appear at every link from 1 to 4.

[Figure: caching can be applied at each link, 1 through 4, of the request flow]

1 Cache consistency problem

When data freshness requirements are high, the data in the cache must be kept consistent with the data in the database, and the data on a cache node must be kept consistent with its replicas; no divergence should be visible.

This depends largely on the cache expiration and update strategy. Generally, when the data changes, we either actively update the cached data or remove the corresponding cache entry.
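
A minimal sketch of this cache-aside, invalidate-on-write pattern follows; the Database class here is a hypothetical in-memory stand-in for a real data access layer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside with invalidate-on-write: writes go to the database first,
// then the stale cache entry is removed so the next read reloads it.
public class CacheAsideDemo {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Database db = new Database();

    public String read(String key) {
        String value = cache.get(key);
        if (value == null) {            // miss: load from the database, then populate
            value = db.query(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    public void write(String key, String value) {
        db.update(key, value);          // 1. persist the change
        cache.remove(key);              // 2. invalidate the now-stale cache entry
    }

    // Hypothetical in-memory "database", present only to make the sketch runnable.
    static class Database {
        private final Map<String, String> table = new ConcurrentHashMap<>();
        String query(String key) { return table.get(key); }
        void update(String key, String value) { table.put(key, value); }
    }
}
```

Removing the entry instead of rewriting it means a concurrent writer cannot leave a stale value behind; the next read simply reloads from the database and repopulates the cache.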


2 Cache concurrency issues

After a cache entry expires, a request tries to fetch the data from the back-end database, which seems perfectly reasonable. In a high-concurrency scenario, however, many requests may fetch the same data from the database concurrently, putting great pressure on the back-end database and possibly even leading to an "avalanche".

In addition, while a cache key is being updated it may be read by a large number of requests at the same time, which also causes consistency problems. How can we avoid problems like these?

The natural answer is a "lock"-like mechanism: when a cache entry is being updated or has expired, a request first tries to acquire a lock and releases it once the update (or the load from the database) completes. Other requests only sacrifice a short wait and then continue to read the data directly from the cache.
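
A minimal sketch of this idea, assuming a plain in-memory map as the cache and a hypothetical loadFromDb() standing in for the slow database query:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Rebuilding an expired entry under a per-key lock: only one thread
// reloads from the database, the rest wait briefly and re-read the cache.
public class LockedCacheLoader {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // One lock per key; a production version would evict unused locks.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;                           // fast path: cache hit
        }
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            value = cache.get(key);                 // double-check: the winner may have reloaded
            if (value == null) {
                value = loadFromDb(key);            // only one thread reaches the database
                cache.put(key, value);
            }
            return value;
        } finally {
            lock.unlock();
        }
    }

    private String loadFromDb(String key) {         // hypothetical slow database query
        return "value-for-" + key;
    }
}
```

Only the thread that wins the lock touches the database; the others block briefly on the same lock, re-check the cache, and return the freshly loaded value.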


3 Cache penetration problem

Cache penetration is also called "breakdown" in some places. Many people understand cache penetration as a large number of requests reaching the back-end database because the cache failed or expired, hitting the database hard.

This is actually a misunderstanding. Real cache penetration looks like this:

In a high-concurrency scenario, when a key that is being accessed heavily misses the cache, requests fall back to the back-end database for fault tolerance, so a large number of requests reach the database. If the data for that key is itself empty, the database ends up executing a large number of unnecessary queries concurrently, which creates enormous pressure and impact.

There are several common ways to avoid cache penetration:

(1) Cache empty objects

Cache objects even when the query result is empty. If the result is a collection, cache an empty collection (not null); if it is a single object, mark it with an identifying field so the empty placeholder can be told apart from real data. This prevents requests from penetrating through to the back-end database.

At the same time, the cached data must still expire in a timely fashion. This method is cheap to implement and suits data whose hit rate is low but which may be updated frequently.
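
A minimal sketch of caching empty results, using Optional.empty() as the cached placeholder and a hypothetical queryDb() call:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Caching empty results: Optional.empty() is stored for keys with no data,
// so repeated lookups for nonexistent keys never reach the database.
public class NullCachingDemo {
    // Optional distinguishes "cached as empty" from "not cached at all".
    private final Map<String, Optional<String>> cache = new ConcurrentHashMap<>();

    public Optional<String> get(String key) {
        Optional<String> cached = cache.get(key);
        if (cached != null) {
            return cached;                      // hit, including cached-empty hits
        }
        Optional<String> value = Optional.ofNullable(queryDb(key));
        cache.put(key, value);                  // cache the result even when it is empty;
                                                // in production, give empty entries a short TTL
        return value;
    }

    private String queryDb(String key) {        // hypothetical database query
        return null;                            // simulates a key with no backing data
    }
}
```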

(2) Separate filtering

Store all keys that may correspond to empty data in one place (for example, a Bloom filter) and intercept requests against it before they go any further, so they never penetrate through to the back-end database. This method is more complicated to implement and suits data whose hit rate is low and which is updated infrequently.
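
A minimal sketch of the interception idea, using a plain in-memory set of known keys where a real deployment would typically use a Bloom filter to keep the memory footprint small:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Separate filtering: reject keys that are known not to exist before
// they touch the cache or the database.
public class KeyFilterDemo {
    private final Set<String> knownKeys = ConcurrentHashMap.newKeySet();

    // Called whenever data is written, e.g. from the data access layer.
    public void registerKey(String key) {
        knownKeys.add(key);
    }

    public String get(String key) {
        if (!knownKeys.contains(key)) {
            return null;                 // intercepted: never reaches cache or database
        }
        return lookupCacheThenDb(key);   // normal cache-aside path (omitted here)
    }

    private String lookupCacheThenDb(String key) {
        return "value-for-" + key;       // placeholder for the real lookup
    }
}
```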


4 Cache thrashing problem

Cache thrashing, sometimes called "cache jitter", can be seen as a milder failure than an "avalanche", but it still causes turbulence and performance impact on the system for a period of time. It is generally caused by a cache node failing; the method recommended in the industry is to solve it with consistent hashing.
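
A minimal consistent-hash ring sketch follows; the virtual node count and the CRC32 hash are arbitrary choices made only to keep it dependency-free.

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// A consistent-hash ring: nodes are placed on the ring via virtual
// replicas, and a key maps to the first node clockwise from its hash.
// When one node fails, only the keys in its arc move to a neighbor.
public class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 100;
    private final SortedMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    public String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no cache nodes");
        // First virtual node at or after the key's hash, wrapping to the start.
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private long hash(String s) {
        CRC32 crc = new CRC32();                   // weak but dependency-free hash
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

With plain hash-modulo placement, removing one node remaps almost every key and the hit rate collapses cluster-wide; with the ring, only the keys in the failed node's arc move to a neighboring node, which is what damps the thrashing.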

5 Cache avalanche

A cache avalanche means that, for cache-related reasons, a large number of requests arrive at the back-end database, causing the database to collapse, the entire system to crash, and a disaster to occur.

There are many causes of this phenomenon. The issues discussed above, such as cache concurrency, cache penetration, and cache thrashing, can all lead to a cache avalanche, and they may also be exploited by malicious attackers.

There is another situation: for example, cache entries that the system preloads on a schedule all expire at the same point in time, which can also cause an avalanche. To avoid this kind of periodic, concentrated invalidation, we can set different expiration times so that expiration is staggered.
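
A minimal sketch of staggered expiration: random jitter is added to each entry's base TTL so that entries loaded together do not all expire at the same instant. cachePut() is a hypothetical stand-in for your cache client's set-with-TTL call.

```java
import java.util.concurrent.ThreadLocalRandom;

// Staggered expiration: each preloaded entry gets its base TTL plus a
// random jitter, so entries loaded together do not expire together.
public class JitteredTtlDemo {
    private static final long BASE_TTL_SECONDS = 3600;   // one hour
    private static final long MAX_JITTER_SECONDS = 300;  // up to five extra minutes

    public void preload(String key, String value) {
        long jitter = ThreadLocalRandom.current().nextLong(MAX_JITTER_SECONDS + 1);
        cachePut(key, value, BASE_TTL_SECONDS + jitter);
    }

    // Hypothetical stand-in for the cache client's set-with-TTL call
    // (e.g. the Redis SETEX command).
    private void cachePut(String key, String value, long ttlSeconds) {
        System.out.printf("SET %s (ttl=%ds)%n", key, ttlSeconds);
    }
}
```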

From the perspective of application architecture, we can reduce the impact through rate limiting, degradation, circuit breaking, and so on, and we can also use multi-level caching to avoid such a disaster.
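
As an illustration of the multi-level caching idea, here is a minimal two-level lookup in which two in-memory maps stand in for a real local cache (such as Caffeine) and a real shared remote cache (such as Redis); queryDb() is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Multi-level lookup: a small per-process cache in front of a shared
// remote cache, with the database only as the last resort.
public class MultiLevelCacheDemo {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();   // L1: per-JVM
    private final Map<String, String> remoteCache = new ConcurrentHashMap<>();  // L2: shared

    public String get(String key) {
        String value = localCache.get(key);
        if (value != null) return value;            // L1 hit: no network round trip

        value = remoteCache.get(key);
        if (value != null) {
            localCache.put(key, value);             // promote to L1
            return value;
        }

        value = queryDb(key);                       // last resort: the database
        if (value != null) {
            remoteCache.put(key, value);
            localCache.put(key, value);
        }
        return value;
    }

    private String queryDb(String key) {            // hypothetical database query
        return "value-for-" + key;
    }
}
```

Even when the shared layer is degraded, the local layer keeps absorbing the hottest reads, which softens the blow on the database.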

In addition, from the perspective of the whole R&D process, we should strengthen stress testing and simulate real scenarios as closely as possible, exposing problems early so they can be prevented in advance.


6 Cache bottomless pit phenomenon

This problem was raised by Facebook's engineers. Around 2010, Facebook had 3,000 memcached nodes caching thousands of gigabytes of content.

They found a problem: memcached connection frequency and efficiency had degraded, so they added more memcached nodes. After adding them, they found that the connection-frequency problem still existed and had not improved. This is called the "bottomless pit" phenomenon.

Today, mainstream databases, caches, NoSQL stores, search middleware, and other parts of the technology stack all support "sharding" to meet the requirements of high performance, high concurrency, high availability, and scalability.

Some map keys to different instances on the client side with hash modulo (or consistent hashing); some map by value range on the client side; and some, of course, do the mapping on the server side.

However, each operation may require network communication with several different nodes to complete. The more instance nodes there are, the greater the overhead and the greater the impact on performance.

It can be avoided and optimized from the following aspects:

(1) Data distribution method

Some business data is suitable for hash distribution, while other data is suitable for range distribution; choosing appropriately can avoid network IO overhead to a certain extent, as the sketch below illustrates.
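
A minimal sketch contrasting the two distribution methods; the node names and range boundaries are invented for illustration.

```java
// Client-side data distribution: hash modulo versus range mapping.
public class DataDistributionDemo {
    private static final String[] NODES = {"cache-0", "cache-1", "cache-2"};

    // Hash distribution: spreads keys evenly, but related keys scatter
    // across nodes, so multi-key operations touch many nodes.
    static String byHash(String key) {
        int slot = Math.floorMod(key.hashCode(), NODES.length);
        return NODES[slot];
    }

    // Range distribution: keeps adjacent keys together, so a scan over
    // user IDs 0..9999, say, stays on a single node.
    static String byRange(long userId) {
        if (userId < 10_000) return NODES[0];
        if (userId < 20_000) return NODES[1];
        return NODES[2];
    }

    public static void main(String[] args) {
        System.out.println(byHash("user:42"));   // some node, evenly spread
        System.out.println(byRange(42));         // cache-0: contiguous range
    }
}
```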

(2) IO optimization

You can make full use of connection pooling, NIO, and other techniques to reduce connection overhead as much as possible and improve concurrent connection capacity.
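
For example, here is a minimal connection-pool sketch using the Jedis client for Redis; the host, port, and pool sizes are illustrative.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

// Pooled connections avoid paying the TCP setup cost on every request.
public class PooledCacheClient {
    public static void main(String[] args) {
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxTotal(64);        // upper bound on concurrent connections
        config.setMaxIdle(16);         // connections kept warm between bursts

        try (JedisPool pool = new JedisPool(config, "127.0.0.1", 6379)) {
            try (Jedis jedis = pool.getResource()) {   // borrow; close() returns it to the pool
                jedis.set("greeting", "hello");
                System.out.println(jedis.get("greeting"));
            }
        }
    }
}
```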

(3) Data access method

Fetching a large data set in one operation costs far less network IO than fetching many small data sets one at a time.
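
A minimal sketch of this trade-off with the Jedis client: one MGET round trip replaces three separate GETs (host and keys are illustrative).

```java
import java.util.List;
import redis.clients.jedis.Jedis;

// One MGET round trip fetches what would otherwise take N GET round trips.
public class BatchGetDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            // Three separate GETs: three network round trips.
            String a = jedis.get("user:1");
            String b = jedis.get("user:2");
            String c = jedis.get("user:3");

            // One MGET: the same data in a single round trip.
            List<String> values = jedis.mget("user:1", "user:2", "user:3");
            System.out.println(values);
        }
    }
}
```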

Of course, the cache bottomless pit phenomenon is not common, and most companies may never encounter it at all.

Summary

I hope this article is helpful to everyone!
