"Crushing the Interviewer" (《吊打面试官》) series: cache avalanche, breakdown, and penetration

The more you know, the more you realize you do not know.

Like first, then read. Make it a habit.

Foreword

Redis is so widely used as a storage technology across the Internet industry that almost every back-end interviewer will grill candidates from every angle on how to use Redis and how it works. As an interview champion who collected offers from top-tier Internet companies (please allow me a bit of rhetorical exaggeration), I beat countless competitors, and each time I could only watch their lonely figures leave in disappointment. Feeling somewhat guilty, one lonely night I reflected on this and decided to start writing the "Crushing the Interviewer" (《吊打面试官》) series. I hope it helps you, dear readers, sail through your interviews, counterattack the interviewer from every angle, leave your fellow candidates stunned, and harvest offers from the big companies!

A few personal feelings

I had already saved this draft in my official account, but in the shower I thought about last night's match and decided to turn the computer back on and write something. It has nothing to do with the content of this article; it is just a personal feeling. I do not know how many of you watched yesterday's SKT vs. G2 match, or how many of you still remember the scene of Faker's hands shaking.

You may not know how it felt to watch that. When I saw it, my heart trembled too. In the World Championship I support the LPL teams, but I like Faker as a person: that single-minded focus on winning, that persistence over so many years, wanting only to win despite so many temptations. How could I not love someone like that? I think a lot of people feel the same.

Many friends say this is a hero in his twilight, but I think he has just hit a small setback. It is like how many people say programming is a job you can only do while you are young. If you stick to what you believe in and keep building real depth of knowledge, I think in the end you will get what you deserve.

All right, enough sentimentality. Let's start talking about technology.

Main text

In the previous installment of this series we covered the basics of Redis. If you have not read it yet, you can go back and take a look:

"Crushing the Interviewer" series: Redis basics

When it comes to Redis, I believe cache avalanche (雪崩), penetration (穿透), and breakdown (击穿) are familiar from interviews or from real development work. Even if you have never run into them, you have surely heard of them. So what exactly is the difference between the three, and how should we prevent them from happening? Let's bring in our next victim.

The interview begins

A paunchy middle-aged man in a plaid shirt walks toward you, holding a Mac covered in scratches. Looking at his rapidly thinning hair, you think: this has to be a top architect! But we have real depth of knowledge to carry us, so there is nothing to be nervous about.

Young man, I see you put Redis on your resume, so let's get straight to the point and go through a few of the common big problems. Do you understand the Redis cache avalanche?

Hello, handsome and charming interviewer. Yes, I do. E-commerce home pages and hot data are usually cached these days. The cache is generally refreshed by a scheduled task, or updated after a lookup misses, and refreshing by scheduled task has a problem.

Here is a simple example. Suppose every key for the home page has an expiration time of 12 hours and is refreshed at 12:00 noon. At midnight I run a flash sale and a large number of users flood in. Assume 6,000 requests per second. Normally the cache could absorb 5,000 of those requests per second, but at that moment every key in the cache has expired. Now all 6,000 requests per second land on the database, which inevitably cannot cope. It will raise alarms, and in reality it may go down before the DBA even has time to react. At this point, if there is no special plan for handling the failure, the DBA anxiously restarts the database, but the restarted database is immediately killed again by the new traffic. This is my understanding of a cache avalanche.

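I deliberately looked back at the projects I have worked on, and even the most impressive of them would not allow that much QPS to hit the DB directly. With no slow SQL, plus database sharding and splitting of large tables, the DB might just about hold up, but the gap compared with using Redis is still very large.

When keys expire over a large area at the same time, for that instant it is as if Redis were not there at all, and requests of this magnitude hitting the database directly are close to catastrophic. Think about it: if the database that goes down belongs to the user service, almost every interface of the services that depend on it will return errors. Without strategies such as circuit breaking, a whole swath of services goes down in an instant. However you restart, user traffic knocks you over again, and by the time you can finally restart, the users have long since gone to bed and lost confidence in your product: what a garbage product.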

The interviewer strokes his hair. Hmm, not bad. So what do you do in this situation? How do you handle it?

Handling a cache avalanche is simple. When writing data to Redis in bulk, just add a random value to each key's expiration time. This ensures that the data does not expire over a large area at the same moment, and I believe Redis can still cope with that amount of traffic.

setRedis(Key,value,time + Math.random() * 10000);

If Redis is deployed as a cluster, distributing hot data evenly across different Redis instances also avoids the problem of everything expiring at once. But when I, humble as I am, operated clusters in production, each individual service had its own Redis shard for ease of data management, and that setup can still suffer from simultaneous expiry, so a random expiration time is a good strategy.
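
The random expiration offset described above can be sketched roughly as follows. This is only an illustration under my own assumptions: `TtlJitter` and `withJitter` are hypothetical names, not part of any Redis client, and the 10-minute jitter window is an arbitrary choice.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: computes a jittered TTL; not part of any Redis client.
final class TtlJitter {
    private TtlJitter() {}

    /** Returns baseTtlMillis plus a random offset in [0, maxJitterMillis). */
    static long withJitter(long baseTtlMillis, long maxJitterMillis) {
        if (baseTtlMillis <= 0 || maxJitterMillis <= 0) {
            throw new IllegalArgumentException("TTL and jitter must be positive");
        }
        return baseTtlMillis + ThreadLocalRandom.current().nextLong(maxJitterMillis);
    }

    public static void main(String[] args) {
        long twelveHours = 12L * 60 * 60 * 1000;
        for (int i = 0; i < 3; i++) {
            // Each key gets its own expiry, spread over an assumed 10-minute window.
            System.out.println(withJitter(twelveHours, 10L * 60 * 1000));
        }
    }
}
```

The resulting value would be passed as the expiration time when each key is written, so keys loaded in the same batch no longer expire in the same instant.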

Alternatively, set the hot data to never expire and refresh the cache whenever there is an update operation (for example, when the operations staff update the home page products, you just refresh the cache and set no expiration time). E-commerce home page data can use this approach too, as insurance.
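
A minimal sketch of the "never expire, refresh on update" idea, assuming two in-memory maps as stand-ins for the database and for Redis. `HotDataStore` and the key `home:banner` are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: two in-memory maps stand in for the database and for Redis.
final class HotDataStore {
    private final Map<String, String> database = new ConcurrentHashMap<>();
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // entries carry no TTL

    /** Writes to the database first, then refreshes the cached copy. */
    void update(String key, String value) {
        database.put(key, value);
        cache.put(key, value);
    }

    /** Reads are served from the cache; the key never expires, so there is no mass reload. */
    String read(String key) {
        return cache.get(key);
    }

    public static void main(String[] args) {
        HotDataStore store = new HotDataStore();
        store.update("home:banner", "v1");
        store.update("home:banner", "v2");
        System.out.println(store.read("home:banner")); // prints v2
    }
}
```

The trade-off is that every write path has to remember to refresh the cache, otherwise readers keep seeing stale data indefinitely.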

So do you understand cache penetration and cache breakdown? Can you explain how they differ from an avalanche?

Yes, I do. Let me start with cache penetration. Cache penetration refers to requests for data that exists in neither the cache nor the database, which a user keeps sending. Our database ids auto-increment starting from 1, so a request for id -1, or for an extremely large id that does not exist, is an example. In this case the user is likely an attacker, and the attack puts excessive pressure on the database and can bring it down.

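A small single-machine system can basically be killed with Postman alone, for example the Alibaba Cloud server I bought for myself.

Consider what happens if you do not validate parameters. Database ids are all greater than 0, and I keep sending requests with ids less than 0. Every request bypasses Redis and hits the database, the database finds nothing, and this repeats every single time. With slightly higher concurrency, it easily falls over.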

As for cache breakdown, it is somewhat like a cache avalanche, but there is a difference. A cache avalanche is a large-area cache failure that crushes the DB. Cache breakdown refers to a single key that is extremely hot and constantly carries heavy concurrency, with a large number of concurrent requests concentrated on that one point. At the moment the key expires, the sustained heavy concurrency breaks through the cache and goes straight to the database, like drilling a hole in an otherwise intact bucket.

The interviewer gives a look of approval. So how do you solve each of them?

For cache penetration, I add checks at the interface layer, such as user authentication and parameter validation, and return an error code directly for illegal parameters. For example, do a basic check on the id and intercept id <= 0 right away.

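The point I want to make here is that when we develop software we should always keep a "distrustful" heart: do not trust any caller. For example, if you expose an API with several parameters, then as the callee you should consider and validate every possible parameter combination, because you cannot trust the people calling you and you do not know what parameters they will pass. A simple example: your interface does paged queries, but you put no limit on the page size. What if a caller asks for Integer.MAX_VALUE rows in one go? A single request costs you several seconds, and with a few of them running concurrently, aren't you down? If the caller is a colleague, fine; at worst you discover it and fix it. But what if it is a hacker or a competitor? I do not need to spell out what happens if they call this interface on the day of Double Eleven. My former leader told me this, and I think everyone should be aware of it.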
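
The two checks mentioned above (rejecting id <= 0 and capping the page size) might look like this in the interface layer. `RequestValidator` is a hypothetical class, and the cap of 100 rows per page is an assumed value, not one taken from any real project.

```java
// Hypothetical request validator; the limit of 100 rows per page is an assumed value.
final class RequestValidator {
    static final int MAX_PAGE_SIZE = 100;

    private RequestValidator() {}

    /** Database ids start at 1, so anything <= 0 can never match a row. */
    static boolean isValidId(long id) {
        return id > 0;
    }

    /** Clamps the requested page size into [1, MAX_PAGE_SIZE]. */
    static int clampPageSize(int requested) {
        return Math.max(1, Math.min(requested, MAX_PAGE_SIZE));
    }

    public static void main(String[] args) {
        System.out.println(isValidId(-1));                    // false
        System.out.println(clampPageSize(Integer.MAX_VALUE)); // 100
    }
}
```

Requests that fail `isValidId` can be rejected before they ever touch Redis or the database.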

If the data is found neither in the cache nor in the database, you can also write the key into the cache with a value of null, or with a message such as "position error, try again later". Which value to use is something to ask the product team about, or depends on the specific scenario. The cache validity period can be set short, such as 30 seconds (setting it too long means the key stays unusable even after the situation returns to normal).
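
Here is a rough sketch of caching the "not found" result with a short lifetime. It assumes in-memory maps instead of Redis and a real database; `NullCachingLookup`, the `<null>` marker, and the 10-minute lifetime for real data are all my own placeholders, while the 30 seconds comes from the text above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: in-memory maps stand in for Redis and the database.
final class NullCachingLookup {
    private static final String NULL_MARKER = "<null>";   // assumed sentinel value
    private static final long NULL_TTL_MILLIS = 30_000L;  // short TTL for missing keys
    private static final long DATA_TTL_MILLIS = 600_000L; // assumed TTL for real data

    private static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database = new ConcurrentHashMap<>();
    final AtomicInteger databaseHits = new AtomicInteger();

    void insertRow(String key, String value) {
        database.put(key, value);
    }

    Optional<String> get(String key) {
        long now = System.currentTimeMillis();
        Entry cached = cache.get(key);
        if (cached != null && cached.expiresAt > now) {
            // A cached marker means "known to be missing": skip the database.
            return NULL_MARKER.equals(cached.value) ? Optional.empty() : Optional.of(cached.value);
        }
        databaseHits.incrementAndGet();
        String fromDb = database.get(key);
        if (fromDb == null) {
            cache.put(key, new Entry(NULL_MARKER, now + NULL_TTL_MILLIS));
            return Optional.empty();
        }
        cache.put(key, new Entry(fromDb, now + DATA_TTL_MILLIS));
        return Optional.of(fromDb);
    }

    public static void main(String[] args) {
        NullCachingLookup lookup = new NullCachingLookup();
        lookup.get("-1");
        lookup.get("-1");
        System.out.println(lookup.databaseHits.get()); // prints 1
    }
}
```

The second request for the same missing key is answered from the cached marker, so only the first one reaches the database.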

This prevents an attacker from repeatedly hammering the same id. But we should also recognize that a normal user does not send that many requests within a single second. As far as I, humble as I am, remember, Nginx at the gateway layer has a configuration item that lets the operations team blacklist any single IP whose number of requests per second exceeds a threshold.
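
To make the per-IP threshold idea concrete, here is a plain-Java sketch of a fixed-window counter. It is not Nginx configuration and not how Nginx implements its limit; `IpRateLimiter` and its parameters are hypothetical, and the caller passes in the current time so the behavior is easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a fixed-window counter per IP; threshold and window are assumed values.
final class IpRateLimiter {
    private static final class Window {
        final long startMillis;
        final long count;
        Window(long startMillis, long count) {
            this.startMillis = startMillis;
            this.count = count;
        }
    }

    private final int maxRequestsPerWindow;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    IpRateLimiter(int maxRequestsPerWindow, long windowMillis) {
        this.maxRequestsPerWindow = maxRequestsPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is allowed, false if this IP exceeded the threshold. */
    boolean allow(String ip, long nowMillis) {
        Window updated = windows.compute(ip, (key, w) -> {
            if (w == null || nowMillis - w.startMillis >= windowMillis) {
                return new Window(nowMillis, 1); // start a fresh window for this IP
            }
            return new Window(w.startMillis, w.count + 1);
        });
        return updated.count <= maxRequestsPerWindow;
    }

    public static void main(String[] args) {
        IpRateLimiter limiter = new IpRateLimiter(3, 1000L);
        for (int i = 0; i < 4; i++) {
            System.out.println(limiter.allow("10.0.0.1", 0L)); // true, true, true, false
        }
        System.out.println(limiter.allow("10.0.0.1", 1000L)); // true: a new window has started
    }
}
```

A real gateway would also need to evict idle IPs from the map, which this sketch leaves out.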

Do you have any other approaches?

I also remember that Redis has an advanced usage, the Bloom filter (布隆过滤器), which is very good at preventing cache penetration. Its principle is simple: it uses an efficient data structure and algorithm to quickly determine whether your key exists in the database. If the filter says the key does not exist, you simply return; if it says the key may exist, you query the DB, refresh the KV cache, and then return.
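
To show the principle, here is a minimal in-memory Bloom filter sketch. It is not the Redis-side implementation; `SimpleBloomFilter`, its sizes, and its hashing scheme (String.hashCode combined with an FNV-1a hash) are illustrative assumptions only.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal Bloom filter sketch; sizes and hash scheme are illustrative assumptions.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    /** false means "definitely absent"; true only means "possibly present". */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: combine two base hashes to derive the i-th bit position.
    private int index(String key, int i) {
        return Math.floorMod(key.hashCode() + i * fnv1a(key), size);
    }

    private static int fnv1a(String key) {
        int hash = 0x811C9DC5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 16777619;
        }
        return hash;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
        filter.add("42");
        System.out.println(filter.mightContain("42")); // prints true
    }
}
```

Note the asymmetry: a key that was added is always reported as possibly present, while a key that was never added is usually, but not always, reported as absent. That is why a "may exist" answer still has to be confirmed against the DB.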

Some readers may ask: what if a hacker attacks from many IPs at the same time? I have never fully figured that one out myself. But an average hacker does not control that many compromised machines, and a normal Redis cluster can withstand that level of access. I doubt they are interested in small companies anyway. With a well-built high-availability system, the cluster can still hold.

For cache breakdown, set the hot data to never expire, or add a mutex lock. As a considerate guy, I have of course prepared the code for you:

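    // Requires: import java.util.concurrent.locks.ReentrantLock;
    // Shared lock guarding cache rebuilds (works within a single JVM only)
    private static final ReentrantLock reenLock = new ReentrantLock();

    /**
     * Get data
     * @param Key    query parameter
     * @return data
     * @throws InterruptedException if the thread is interrupted while waiting to retry
     * @author 敖丙
     */
    public static String getData(String Key) throws InterruptedException {
        // Query Redis first
        String result = getDataByKV(Key);
        // Cache miss
        if (StringUtils.isBlank(result)) {
            // Try to acquire the lock
            if (reenLock.tryLock()) {
                try {
                    // Query the database
                    result = getDataByDB(Key);
                    if (StringUtils.isNotBlank(result)) {
                        // Put the value into the cache
                        setDataToKV(Key, result);
                    }
                } finally {
                    // Always release the lock, even if the query throws
                    reenLock.unlock();
                }
            } else {
                // Sleep briefly, then try again
                Thread.sleep(100L);
                result = getData(Key);
            }
        }
        return result;
    }
    // The lock here only works on a single machine; a distributed lock still has to rely on something like a Lua script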
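
As a variation on the mutex idea, the sketch below uses one lock per key and re-checks the cache after acquiring the lock, so concurrent requests for the same expired key trigger a single database load. It is still a single-JVM illustration, not a distributed lock; `SingleFlightCache` and `loadFromDatabase` are hypothetical, and in-memory maps stand in for Redis and the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Single-JVM sketch: maps stand in for Redis and the database.
final class SingleFlightCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // One lock per key; this sketch never evicts locks for keys that go cold.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    final AtomicInteger databaseLoads = new AtomicInteger();

    // Hypothetical slow database query.
    private String loadFromDatabase(String key) {
        databaseLoads.incrementAndGet();
        return "value-of-" + key;
    }

    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;
        }
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            // Re-check: another thread may have rebuilt the entry while we waited.
            value = cache.get(key);
            if (value == null) {
                value = loadFromDatabase(key);
                cache.put(key, value);
            }
            return value;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SingleFlightCache hotCache = new SingleFlightCache();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> hotCache.get("hot"));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(hotCache.databaseLoads.get()); // prints 1
    }
}
```

Locking per key means a rebuild of one hot key does not block readers of other keys, which a single shared lock would do.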

End of the interview

Mmm, good. You answered all three points very well. It is getting late, so let's end today's interview here. Come back tomorrow for the second round, and I will continue asking you about Redis cluster high availability, master-slave replication, Sentinel, and related topics.

Wow, there is actually another round of interviews! (A forced teaser for the next issue, ha ha.) But I still have to butter him up for the offer: mm-hm, certainly, handsome interviewer.

The answers were so thorough and detailed that you cannot help but tap Like. (Hint: hit Like. You read every time without liking; are you trying to freeload off me? You are so naughty, but I like it ⁄(⁄ ⁄•⁄ω⁄•⁄ ⁄)⁄)

Summary

We can joke around all we like, but do not treat the interview as a joke.

This article briefly introduced Redis cache avalanche, breakdown, and penetration. The three are similar but have some differences. In an interview, anyone who asks about caching will ask about these, so do not confuse them. Cache avalanche, penetration, and breakdown are the biggest problems a cache can have: they may never appear, but once they do the result is fatal, which is why interviewers ask about them.

We must understand how they happen, how to avoid them, and how to rescue the system after they occur. You do not have to know every detail in depth, but you cannot skip thinking about it. Sometimes an interview does not test your knowledge so much as your attitude. If your thinking is clear, and you know not only what happens but also why, and you also know how to prevent it, that makes a good impression.

Finally, as your considerate author, I will continue with a short technical summary:

In general, we analyze how to avoid the situations above across three time periods:

  • Before the event: make Redis highly available with master-slave replication + Sentinel, or with Redis Cluster, to avoid a total collapse.

  • During the event: local ehcache cache + Hystrix rate limiting and degradation, to avoid MySQL being killed.

  • After the event: Redis persistence with RDB + AOF, so that once Redis restarts it automatically loads data from disk and quickly recovers the cached data.

I will cover all of the points above in this series of articles, and this month I should get through most of the Redis material. As for the rate-limiting component: you can configure how many requests per second are allowed through it. What about the remaining requests that do not get through? They are degraded! You can return a default value, a friendly tip, or a blank value.

Benefits:

The database will never die, because the rate-limiting component ensures that only a set number of requests per second can pass. As long as the database is alive, then from the user's point of view, say, 3 out of 5 requests can be processed. As long as 3 out of 5 requests can be processed, your system is not dead. For users, a page may fail to load on a few clicks, but after a few more tries it loads.
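
The "limit, then degrade" behavior can be sketched like this. It is not the Hystrix API, just a hand-rolled per-second counter under my own assumptions: `DegradingLimiter` is a hypothetical class, the caller supplies the current time, and the limit of 3 per second is chosen only to reproduce the 3-out-of-5 example above.

```java
import java.util.function.Supplier;

// Sketch of rate limiting plus degradation; this is not the Hystrix API.
final class DegradingLimiter {
    private final int maxPerSecond;
    private long windowStartMillis;
    private int usedInWindow;

    DegradingLimiter(int maxPerSecond) {
        this.maxPerSecond = maxPerSecond;
    }

    /** Returns true while the current one-second window still has capacity. */
    synchronized boolean tryPass(long nowMillis) {
        if (nowMillis - windowStartMillis >= 1000L) {
            windowStartMillis = nowMillis; // start a new one-second window
            usedInWindow = 0;
        }
        if (usedInWindow < maxPerSecond) {
            usedInWindow++;
            return true;
        }
        return false;
    }

    /** Runs the database call if allowed; otherwise degrades to the fallback value. */
    <T> T call(Supplier<T> databaseCall, T fallback, long nowMillis) {
        return tryPass(nowMillis) ? databaseCall.get() : fallback;
    }

    public static void main(String[] args) {
        DegradingLimiter limiter = new DegradingLimiter(3);
        for (int i = 0; i < 5; i++) {
            // Five requests in the same second: three are served, two are degraded.
            System.out.println(limiter.call(() -> "real page", "please retry", 10_000L));
        }
    }
}
```

Run as written, the first three lines print "real page" and the last two print "please retry", which is the 3-out-of-5 experience described above.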

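This is the most common practice at the mainstream big Internet companies today. Have you ever wondered why, when some celebrity scandal breaks, Weibo shows you a blank page no matter how often you refresh, yet other people get straight in, and after a few more refreshes it loads for you too? Now you know: that is degradation, sacrificing part of the user experience in exchange for server safety. Not bad, right?

Well, everyone, that is all for this article. I will keep publishing several articles a week in the "Crushing the Interviewer" series and on the Java technology stack. If there is something you want to know, leave me a message and I will write about it when I have time. Let's make progress together.

Thank you very much for reading this far. If you think this article is well written, please like, follow, share, and comment (it is really useful to me). Your support and recognition are my greatest motivation to create. See you in the next article!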

Ao Bing (敖丙) | Text [original]


The "Crushing the Interviewer" series will continue to be updated weekly; follow my official account to read new posts first. Friends at first-tier companies (ByteDance, Alibaba, NetEase, PDD, Mogujie, and others) will occasionally offer referral opportunities. If you have any questions about job hunting or work, message me directly. I am a rookie, but that does not stop us from making progress together. As a scoundrel, I cannot give you a job, but can I not at least give you some warmth?

Origin juejin.im/post/5dbef8306fb9a0203f6fa3e2