Distributed caching: cache penetration

Before talking about cache penetration, let's recall the logic of loading data from the cache: check the cache first; on a miss, query the storage layer and write the result back into the cache.
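That cache-aside read path can be sketched in a few lines. Below is a minimal, self-contained illustration in which plain HashMaps stand in for Redis and for the database (all names here are illustrative, not from any library):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheAside {
    // stand-ins for Redis and for the backing database
    static Map<String, String> cache = new HashMap<>();
    static Map<String, String> db = new HashMap<>();

    static String get(String key) {
        String value = cache.get(key);       // 1. try the cache first
        if (value == null) {
            value = db.get(key);             // 2. on a miss, go to the database
            if (value != null) {
                cache.put(key, value);       // 3. backfill the cache for next time
            }
        }
        return value;
    }

    public static void main(String[] args) {
        db.put("user:1", "Alice");
        System.out.println(get("user:1"));               // prints "Alice" (loaded from db)
        System.out.println(cache.containsKey("user:1")); // prints "true" (now cached)
        // a key with no backing data is never cached, so EVERY call falls through to db
        System.out.println(get("user:999"));             // prints "null"
    }
}
```

Note the last call: a key that exists nowhere never gets cached, which is exactly the weakness the attack below exploits.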

Therefore, if an attacker deliberately queries data that does not exist, every such request passes through the cache to the storage layer, the cache loses its purpose, and the database may go down under heavy traffic. This is cache penetration.

A normal user who logs in to the home page looks up data by userID, which exists and is served from the cache. An attacker, however, whose goal is to bring down your system, can generate a pile of random userIDs and fire those requests at your server. Since none of them exist in the cache, every request passes straight through to the database, exhausting its connections.

Solutions

Three solutions are given below; pick whichever fits your system.

Before going through them, let's recall Redis's SETNX command:

SETNX key value

Set the value of key to value if and only if key does not exist.

If the given key already exists, SETNX does nothing.

SETNX is shorthand for "SET if Not eXists".

Available Versions : >= 1.0.0

Time Complexity:  O(1)

Return value:  1 if the key was set; 0 if the key was not set.

The effect is as follows

redis> EXISTS job                # job does not exist
(integer) 0
redis> SETNX job "programmer"    # job set successfully
(integer) 1
redis> SETNX job "code-farmer"   # attempt to overwrite job fails
(integer) 0
redis> GET job                   # value was not overwritten
"programmer"

1. Use a mutex

This is the most common approach: when the value obtained for a key is empty, acquire a lock first, load the value from the database, and release the lock once loading completes. A thread that fails to acquire the lock sleeps for 50 ms and then retries.

As for the lock itself: a single-machine deployment can use a JDK lock from java.util.concurrent (e.g. ReentrantLock), while a cluster needs a distributed lock, such as Redis's SETNX.

The Redis-based code for the cluster case is as follows:

String get(String key) throws InterruptedException {
    String value = redis.get(key);
    if (value == null) {
        String key_mutex = "mutex:" + key;
        if (redis.setnx(key_mutex, "1")) {
            // expire the mutex after 3 minutes so a crashed holder cannot deadlock everyone
            redis.expire(key_mutex, 3 * 60);
            value = db.get(key);
            redis.set(key, value);
            redis.delete(key_mutex);
        } else {
            // another thread is loading: sleep 50 ms, then retry
            Thread.sleep(50);
            return get(key);
        }
    }
    return value;
}
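One caveat: SETNX followed by a separate EXPIRE is not atomic, so a crash between the two calls leaves a mutex that never expires. Redis 2.6.12 and later offer `SET key value NX EX seconds`, which sets the key and its TTL in one atomic command. The essential idea of a lock that cannot outlive its TTL can be sketched in plain Java (a toy single-JVM stand-in, not a Redis client; names are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

public class MutexWithExpiry {
    // maps lock key -> expiry timestamp in ms; a toy stand-in for "SET key 1 NX EX n"
    static ConcurrentHashMap<String, Long> locks = new ConcurrentHashMap<>();

    static boolean tryLock(String key, long ttlMillis) {
        long now = System.currentTimeMillis();
        boolean[] acquired = {false};
        // compute() runs atomically per key, so take-or-steal cannot race
        locks.compute(key, (k, expiry) -> {
            if (expiry == null || expiry <= now) {
                acquired[0] = true;
                return now + ttlMillis;   // take the lock, or reclaim an expired one
            }
            return expiry;                // someone else still holds it
        });
        return acquired[0];
    }

    static void unlock(String key) {
        locks.remove(key);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(tryLock("mutex:user:1", 100)); // prints "true"
        System.out.println(tryLock("mutex:user:1", 100)); // prints "false": already held
        Thread.sleep(150);                                // let the TTL lapse
        System.out.println(tryLock("mutex:user:1", 100)); // prints "true": expired lock reclaimed
    }
}
```

The third call succeeds without anyone unlocking, which is exactly the crash-safety property the Redis TTL provides.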

Advantages:

  1. Simple idea

  2. Preserves consistency between cache and database

Disadvantages:

  1. Increased code complexity

  2. Risk of deadlock if the mutex is never released

2. Asynchronous build cache

This scheme builds the cache asynchronously: threads are taken from a thread pool to rebuild the cache in the background, so requests never pile up on the database. Redis stores a logical timeout alongside each value; when that timeout is earlier than System.currentTimeMillis(), the cache is rebuilt asynchronously, and the (possibly stale) value is returned immediately either way.
The Redis code for the cluster environment is as follows:

String get(final String key) {
    V v = redis.get(key);    // V wraps the value together with a logical timeout
    String value = v.getValue();
    long timeout = v.getTimeout();
    if (timeout <= System.currentTimeMillis()) {
        // logically expired: rebuild in the background, return the old value now
        threadPool.execute(new Runnable() {
            public void run() {
                String keyMutex = "mutex:" + key;
                if (redis.setnx(keyMutex, "1")) {
                    // expire the mutex after 3 minutes so a crashed holder cannot block rebuilds
                    redis.expire(keyMutex, 3 * 60);
                    String dbValue = db.get(key);
                    redis.set(key, dbValue);
                    redis.delete(keyMutex);
                }
            }
        });
    }
    return value;
}
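The "logical timeout stored alongside the value" idea can be demonstrated without Redis. The self-contained sketch below (all names are illustrative; the mutex is omitted for brevity) returns the stale value immediately while a pool thread refreshes it:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncRebuild {
    // value plus a logical expiry timestamp, like the V type in the pseudocode
    static class Entry {
        final String value;
        final long timeout;
        Entry(String value, long timeout) { this.value = value; this.timeout = timeout; }
    }

    static ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    static ExecutorService threadPool = Executors.newFixedThreadPool(2);
    static AtomicInteger dbCalls = new AtomicInteger();

    static String loadFromDb(String key) {   // pretend database
        dbCalls.incrementAndGet();
        return "fresh:" + key;
    }

    static String get(String key) {
        Entry e = cache.get(key);
        if (e.timeout <= System.currentTimeMillis()) {
            // stale: kick off an async rebuild, but answer with the old value now
            threadPool.execute(() -> {
                String dbValue = loadFromDb(key);
                cache.put(key, new Entry(dbValue, System.currentTimeMillis() + 1000));
            });
        }
        return e.value;
    }

    public static void main(String[] args) throws Exception {
        cache.put("user:1", new Entry("stale:user:1", 0));  // already logically expired
        System.out.println(get("user:1"));  // prints "stale:user:1": no waiting
        threadPool.shutdown();
        threadPool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(get("user:1"));  // prints "fresh:user:1": rebuilt in background
    }
}
```

The first call never blocks on the database, which is the whole point of the scheme, and also why a stale value can be observed until the rebuild lands.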

Advantages:

  1. Users never wait: requests return immediately

Disadvantages:

  1. Cache consistency is not guaranteed; stale values may be served until the rebuild finishes

3. Bloom filter

1. Principle

The great strength of a Bloom filter is that it can quickly determine whether an element is in a set. It therefore has three classic use cases:

  1. Web crawlers deduplicate URLs to avoid crawling the same address twice

  2. Anti-spam: checking whether an address appears in a blacklist of billions of spam mailboxes (likewise for spam text messages)

  3. Cache penetration: record every key that has backing data in a Bloom filter, so requests for non-existent keys are rejected immediately instead of hanging the cache and the DB

OK, let's talk about how the Bloom filter works.
It maintains a bit array initialized to all zeros. To add an element, the element is hashed by several independent hash functions and the corresponding bits are set to 1; to query, the same bits are checked, and if any of them is 0 the element is definitely absent. Because different elements can set overlapping bits, the filter has a false positive rate: the lower the desired false positive rate, the longer the bit array must be and the more space it occupies; the higher the rate, the shorter the array and the less space needed.

2. Performance test

The code is as follows:

(1) Create a new maven project and introduce the guava package

<dependencies>  
        <dependency>  
            <groupId>com.google.guava</groupId>  
            <artifactId>guava</artifactId>  
            <version>22.0</version>  
        </dependency>  
    </dependencies>  

(2) The time required to test whether an element belongs to a million-element set

package bloomfilter;

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class Test {
    private static int size = 1000000;
    private static BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), size);

    public static void main(String[] args) {
        for (int i = 0; i < size; i++) {
            bloomFilter.put(i);
        }
        long startTime = System.nanoTime(); // start timing
        // does the filter contain 29999, one of the million inserted numbers?
        if (bloomFilter.mightContain(29999)) {
            System.out.println("hit");
        }
        long endTime = System.nanoTime();   // stop timing
        System.out.println("Program runtime: " + (endTime - startTime) + " nanoseconds");
    }
}

The output looks like this

hit
Program runtime: 219386 nanoseconds

That is to say, determining whether a number belongs to a million-element set took only about 0.219 ms; the performance is excellent.

(3) Some concepts of false positive rate

First, let's run a test without setting the false positive rate explicitly. The code is as follows:

package bloomfilter;

import java.util.ArrayList;
import java.util.List;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class Test {
    private static int size = 1000000;
    private static BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), size);

    public static void main(String[] args) {
        for (int i = 0; i < size; i++) {
            bloomFilter.put(i);
        }
        List<Integer> list = new ArrayList<Integer>(1000);
        // deliberately query 10000 values that were never inserted and count
        // how many the filter wrongly claims to contain
        for (int i = size + 10000; i < size + 20000; i++) {
            if (bloomFilter.mightContain(i)) {
                list.add(i);
            }
        }
        System.out.println("Number of false positives: " + list.size());
    }
}
The output is as follows:

Number of false positives: 330

The code deliberately queries 10000 values that are not in the filter, yet 330 of them are reported as present, a false positive rate of roughly 0.03. In other words, with no explicit setting, Guava's default false positive rate is 0.03: per Guava's documentation, the two-argument create overload simply delegates to the three-argument one with an fpp of 3%.

Next, consider the length of the bit array the filter maintains when the false positive rate is 0.03.

Change the BloomFilter construction to:

private static BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), size, 0.01);

That sets the false positive rate to 0.01, and the bit array maintained by the underlying layer grows correspondingly longer.


It can be seen that the lower the false positive rate, the longer the underlying bit array and the more space it occupies. The rate you choose in practice should therefore be derived from the load your servers can actually bear, not picked blindly.
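This tradeoff follows from the standard Bloom filter sizing formula m = -n·ln(p) / (ln 2)², which, per its source, is also how Guava sizes the array. A quick calculation for the two rates above (the helper name is illustrative):

```java
public class BloomSizing {
    // optimal number of bits m for n insertions at false positive rate p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1000000;
        System.out.println("p=0.03: " + optimalBits(n, 0.03) + " bits");
        System.out.println("p=0.01: " + optimalBits(n, 0.01) + " bits");
    }
}
```

For one million insertions this comes out to roughly 7.3 million bits at p = 0.03 versus roughly 9.6 million bits at p = 0.01, so tightening the rate from 3% to 1% costs about 30% more memory.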

3. Actual use

The Redis pseudocode is as follows:

String get(String key) {
    String value = redis.get(key);
    if (value == null) {
        // the filter never yields false negatives: if it says the key is
        // absent, there is definitely no backing data, so skip the database
        if (!bloomfilter.mightContain(key)) {
            return null;
        } else {
            value = db.get(key);
            redis.set(key, value);
        }
    }
    return value;
}

Advantages:

  1. Simple idea

  2. Preserves consistency

  3. Excellent performance

Disadvantages:

  1. Increased code complexity

  2. An extra set of all keys that have backing data must be maintained

  3. A standard Bloom filter does not support deletion (a counting Bloom filter does, at the cost of extra space)

4. Summary

To sum up: the mutex is simple and keeps the cache consistent, but adds deadlock risk; asynchronous rebuilding never makes users wait, but may serve stale data; the Bloom filter rejects requests for non-existent keys outright, making it the most direct defense against this attack.
