Three cache problems you must know for Redis interviews!

Author: _BKing Address: cnblogs.com/xiaowei123/p/13211403.html

I haven't touched Redis in a while, so now I'm back to review it. Let's start from the three major cache problems of Redis and see just how deep the water runs ( ^▽^ )

Time to fire up the knowledge engine and show some real technology! (Beep beep beep...)

Buddies, for the sake of Erha's hard work, if you find this article helpful, please move your little hand and give it a like ❥(^_-)

Next, let's begin our journey through the three major cache problems of Redis. Hop aboard Erha's little spaceship and take a tour of the summit.

There are three must-know concepts in Redis cache: cache penetration, cache breakdown and cache avalanche.

Cache penetration

So what is cache penetration? It refers to a user querying a piece of data that exists in neither the cache nor the database. Since the cache never gets a hit, every such request falls through to the database; since the database can't find the data either, nothing is ever written back to the cache, so repeated requests keep hammering the database and put it under heavy access pressure.

For example, suppose a user queries product information with id = -1. Database ids generally increase from 1, so this record obviously doesn't exist. Since no information is ever returned or cached, every such query goes straight to the database, creating a lot of access pressure.

At this point we have to ask: how do we solve this problem? o(╥﹏╥)o

Generally, we can start from the cache: for a key that does not exist in the database, cache an empty object for it and return that empty object to the user.

^_^ Yes, this is one solution, what we often call caching empty objects (the code is simple to maintain, but the effect is not great).

Redis also offers us another solution: the Bloom filter (the code is more complicated to maintain, but the effect is quite good).

Next, Erha will explain these two solutions:

Cache empty objects

So what is caching an empty object? Don't worry, Erha will explain! Caching an empty object means: when a request comes in and the data exists in neither the cache nor the database, the database returns an empty result, and we store that empty object in the cache against the request's key. The next time the same request arrives, the cache gets a hit and the empty object is returned directly from the cache. This reduces the pressure on the database and improves its access performance. We can look at the following process~
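The flow above can be sketched in a few lines of Java. This is a minimal illustration, not a real Redis client: the `HashMap`s stand in for Redis and the database, and the class and field names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "cache empty objects" idea. The two HashMaps stand in for
// Redis and the database; all names here are illustrative, not a real client.
public class CacheEmptyObjectDemo {
    static final String EMPTY = "";                      // sentinel for "not in DB"
    static Map<String, String> cache = new HashMap<>();  // stands in for Redis
    static Map<String, String> db = new HashMap<>();     // stands in for the database
    static int dbHits = 0;                               // counts real DB queries

    static String query(String id) {
        String v = cache.get(id);
        if (v != null) {
            return v.equals(EMPTY) ? null : v;  // cache hit (possibly the empty object)
        }
        dbHits++;                               // cache miss: go to the database
        v = db.get(id);
        // Cache the empty object too, so repeated misses stop reaching the DB.
        cache.put(id, v == null ? EMPTY : v);
        return v;
    }

    public static void main(String[] args) {
        query("-1"); query("-1"); query("-1");  // three requests for a bad id
        System.out.println(dbHits);             // only the first one reached the DB
    }
}
```

After the first miss, the empty object absorbs all later requests for the same bad id.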

At this point you'll ask: if a large number of requests for non-existent keys come in, won't the cache end up holding a huge number of empty objects?

That's right! This is exactly the problem caching empty objects causes: over time the cache accumulates a large number of empty objects, which not only occupies a lot of memory but also wastes a lot of resources! Is there a solution? Let's think about it: can we clean up these objects after a while?

Hmm, that's right! Remember that Redis provides us with an expiration-time command ( ^▽^ ), so when we cache an empty object we can set an expiration time on it at the same time, and the problem is solved!


setex key seconds value: set the key-value pair and its expiration time (in seconds) in one command

In Java, just call the corresponding API directly:

redisCache.put(Integer.toString(id), null, 60); // expiration time: 60s

Bloom filter

Is the Bloom filter a filter? It filters things! Oops, you're too smart. Yes, it is used to filter things. It is a probability-based data structure, mainly used to determine whether an element is in a set, and it runs fast. We can also simply think of it as an inexact set structure (a set has a deduplication effect).

But there is one small problem: when you use its contains method to check whether an element exists, it may misjudge. In other words, the Bloom filter is not perfectly accurate, but as long as the parameters are set reasonably, its accuracy can be controlled fairly well, leaving only a small probability of misjudgment (which is acceptable~). When the Bloom filter says a value exists, the value may not actually exist; when it says a value does not exist, it definitely does not exist.

Here is a typical example from Qian Da:

For example, when it says it doesn't know you, it definitely doesn't know you; when it says it has seen you before, it may never actually have met you. It just thinks so because your face resembles some combination of the faces it does know, so it believes it has seen you.

In a content-recommendation scenario like this, the Bloom filter can accurately filter out content the user has already viewed. Among the new, unviewed content, it will also mistakenly filter out a very small portion (misjudgment), but it can accurately recognize most of the new content. This fully guarantees that the content recommended to users is not repeated.

After all this talk, what are the characteristics of the Bloom filter?

Features? Let me walk you through them one by one (until you start to doubt life (≧∇≦)ノ)

  1. A very large binary bit array (the array contains only 0s and 1s)

  2. Several different hash functions (Hash Functions)

  3. Very high space efficiency and query efficiency

  4. No delete operation is provided, which makes the code harder to maintain

In Redis, each Bloom filter corresponds to a large bit array plus several different unbiased hash functions. "Unbiased" means the hash function spreads the hash values of elements fairly uniformly. For details, please refer to this article: What is the use of Bloom filters?

When adding a key to the Bloom filter, each hash function hashes the key to an integer, which is then taken modulo the bit-array length to get a position; each hash function computes a different position. Setting all of these positions in the bit array to 1 completes the add operation. (In other words, each key is mapped by several hash functions onto a huge bit array, and after the mapping the corresponding positions are flipped to 1.)
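The add and contains operations just described can be sketched directly. This is a minimal, illustrative implementation, not Redis's actual one; the double-hashing trick (h1 + i * h2) is a common way to derive k hash functions from two, and all names here are made up.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: one big bit array plus k hash positions per
// key, derived via double hashing. Illustrative only, not Redis's internals.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int m;  // bit-array length
    private final int k;  // number of hash functions

    public SimpleBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;        // second hash, forced odd
        return Math.floorMod(h1 + i * h2, m);  // modulo the bit-array length
    }

    public void add(String key) {
        for (int i = 0; i < k; i++) bits.set(position(key, i));  // set k bits to 1
    }

    public boolean mightContain(String key) {
        for (int i = 0; i < k; i++)
            if (!bits.get(position(key, i))) return false;  // any 0 bit: definitely absent
        return true;  // all bits 1: probably present (may be a false positive)
    }
}
```

A "no" answer is always certain; a "yes" answer is only probably right, which is exactly the behavior described above.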

So why does the Bloom filter have a false positive rate?

Misjudgment? Well, in life nobody avoids the occasional stumble; as long as you keep swinging your hoe, you can still dig. (Cough cough, wrong script...)

In fact, the misjudgment happens like this:

Suppose key1 and key2 have been added, so all the positions they map to in the bit array are 1. Now a key3 comes along that was never added, and you query whether it is in the set. If the positions key3 maps to happen to all have been set to 1 by key1 and key2, the Bloom filter will think key3 exists, and a misjudgment occurs (because key3 is clearly not there).

O(∩_∩)O haha~, at this point you'll ask: how do we improve the accuracy of the Bloom filter?

To improve the accuracy of the Bloom filter , we must talk about three important factors that affect it:

  1. The quality of the hash function

  2. Storage space size

  3. Number of hash functions

The design of the hash function is also a very important issue. A good hash function can greatly reduce the false positive rate of the Bloom filter.

(This is like a good accessory that runs so smoothly because of its proper internal design.)

At the same time, the larger a Bloom filter's bit array, the sparser and less crowded the positions that the hash functions map each key to, which helps improve the Bloom filter's accuracy.

Likewise, if each key is mapped through more hash functions, more positions on the bit array are marked per key, so a query has to find every one of those bits set before it answers "yes", and the misjudgment rate drops accordingly (up to a point: too many hash functions fill up the array and push the rate back up).
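The three factors above combine into the classic estimate: with m bits, k hash functions, and n inserted keys, the false positive rate is roughly (1 - e^(-k*n/m))^k, minimized at k = (m/n) * ln 2. A small Java helper (hypothetical names) makes the trade-off concrete:

```java
// Classic Bloom filter math: p ≈ (1 - e^(-k*n/m))^k, optimal k = (m/n)*ln 2.
// BloomMath is an illustrative helper, not part of any Redis client.
public class BloomMath {
    public static double falsePositiveRate(long m, int k, long n) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static int optimalK(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000;          // expected number of keys
        long m = 10 * n;             // 10 bits per key
        int k = optimalK(m, n);      // about 7 hash functions
        System.out.println(k + " hashes, p = " + falsePositiveRate(m, k, n));
    }
}
```

With about 10 bits per key and 7 hash functions, the misjudgment rate lands around 1%, which is the "acceptable small probability" mentioned earlier.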

If you are interested in the internal principles, take a look at the mathematics behind the Bloom filter, which covers its design algorithm and analysis. (It's actually quite simple~)

Cache breakdown

Cache breakdown concerns a particular key: usually a hot key that is queried very frequently and gets special care from users (users love it dearly ( ^▽^ ), like a "regular customer"), or sometimes an unpopular key that is rarely accessed.

The problem occurs when that key expires from the cache (or an unpopular key is suddenly in demand) and a large number of requests for this one key arrive at the same moment. These concurrent requests punch straight through the cache to the database, and the database's access pressure spikes instantly.

To sum up, cache breakdown has two causes:

(1) An "unpopular" key is suddenly accessed by a large number of users.

(2) A "hot" key expires in the cache at the exact moment a large number of users are accessing it.

Our common solution to cache breakdown is locking. When the key has expired and a query needs to go to the database, acquire a lock first: only the first request is allowed to query the database, and the value it retrieves is written back into the cache; all the other requests for the same key then get the value directly from the cache.

In a stand-alone environment, ordinary locks (such as Lock or synchronized) are enough. In a distributed environment, we can use a distributed lock, for example one based on a database, Redis, or ZooKeeper.
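For the stand-alone case, the lock-then-recheck pattern just described looks roughly like this. The Map stands in for Redis and loadFromDb for the real database query; every name here is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the "lock on rebuild" fix for cache breakdown. The Map stands in
// for Redis and loadFromDb for the database; all names are illustrative.
public class HotKeyLoader {
    static Map<String, String> cache = new ConcurrentHashMap<>();
    static int dbQueries = 0;                 // how many calls actually hit the DB
    static final Object lock = new Object();

    static String loadFromDb(String key) {
        dbQueries++;
        return "value-of-" + key;             // pretend database lookup
    }

    static String get(String key) {
        String v = cache.get(key);
        if (v != null) return v;              // fast path: cache hit, no lock needed
        synchronized (lock) {                 // only one thread rebuilds the key
            v = cache.get(key);               // re-check: another thread may have filled it
            if (v == null) {
                v = loadFromDb(key);
                cache.put(key, v);            // with real Redis you would SETEX a TTL here
            }
        }
        return v;
    }
}
```

The re-check inside the lock is the key detail: the waiting threads wake up, find the value already cached, and never touch the database.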

Cache avalanche

Cache avalanche means that a large portion of the cache expires within the same time window. If a flood of requests arrives during that window and the amount of data being queried is huge, every request reaches the storage layer; the call volume on the storage layer spikes, putting the database under excessive pressure or even bringing it down.

The causes:

  1. Redis goes down suddenly

  2. A large portion of the cached data expires at the same time

An example makes it easier to understand:

Most of us have experienced a shopping carnival. Suppose a merchant runs a steep "bone-breaking" discount promotion from 23:00 to 24:00. When designing the system, the programmer loads the discounted products into the cache at 23:00 and, using Redis's expire, sets their expiration time to 1 hour.

During that hour, many users view and purchase these products. But at 24:00, while plenty of users are still browsing, all the cached product keys expire at once. Access to these products now falls on the database, which has to withstand enormous pressure; one misstep and the database goes down outright (game over).

This is what it looks like while the products have not yet expired:

And when the cache has expired (GG), it looks like this:

There are the following solutions for cache avalanche:

(1) Redis high availability

Redis can go down, so add more Redis instances (one master with multiple slaves, or multiple masters with multiple slaves); then if one instance dies, the others keep working. This is, in effect, building a cluster.

(2) Rate limiting and degradation

After a cache entry expires, control the number of threads that read the database and write the cache by locking or queueing: for a given key, allow only one thread to query the data and write the cache while the other threads wait.

(3) Data warm-up

Data warm-up means that before the system formally goes live, we access the likely-hot data in advance, so that the data expected to be accessed heavily gets loaded into the cache ahead of time. Before a burst of concurrent traffic is expected, manually trigger the loading of the relevant keys into the cache.
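A warm-up job can be as simple as walking the list of promotion product ids and pushing each one into the cache before the rush. Again, the HashMap stands in for Redis and the names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical warm-up sketch: preload the promotion products into the cache
// before 23:00 traffic arrives. The HashMap stands in for Redis.
public class CacheWarmer {
    static Map<String, String> cache = new HashMap<>();

    static String queryProductFromDb(String id) {   // stands in for the database
        return "product-" + id;
    }

    static void warmUp(List<String> hotIds) {
        for (String id : hotIds) {
            cache.put(id, queryProductFromDb(id));  // preload before traffic arrives
        }
    }
}
```

Run the warm-up from a scheduled task or an admin endpoint shortly before the promotion starts.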

(4) Staggered expiration times

Set different expiration times for different keys so that the moments of cache invalidation are spread out as evenly as possible.
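A common way to stagger expirations is to add random jitter to the base TTL, so the promotion keys from the example above don't all die at 24:00 together. A tiny sketch (the helper name is made up):

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative TTL-jitter helper: base TTL plus up to `jitterSeconds` extra,
// e.g. 3600 + [0, 600) seconds, so keys expire spread over ten minutes.
public class TtlJitter {
    public static int ttlWithJitter(int baseSeconds, int jitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextInt(jitterSeconds);
    }
}
```

You would then pass the jittered value as the seconds argument of setex when caching each product.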

(Interested friends, please help me with a recommendation (*  ̄3)(ε ̄) Thank you, (づ ̄3 ̄)づ╭❤~ I love you)


Origin blog.csdn.net/youanyyou/article/details/108488194