Cache interview preparation

1. Overview

When it comes to caching, the first thing most people think of is Redis, currently the most widely used distributed cache in the industry. Internet companies are certain to ask about it in interviews. Saying you have never used a message queue, or never touched a search engine, is excusable; saying you don't understand Redis is usually the end of the interview.

2. Applying caching

1. Interview questions

How is the cache used in the project? What are the consequences if the cache is not used properly?

2. The interviewer's psychological analysis

As soon as caching comes up, the interviewer's first questions will be: where does your project use a cache? Why do you use it? What would happen without it? What undesirable consequences can it introduce?

This tests whether you have actually thought about your own use of caching. If you used it blindly and cannot give the interviewer a reasonable answer, they will conclude that you think too little and only know how to follow instructions.

3. Analysis of interview questions

  • How is the cache used in the project?

Answer this in the context of your own project's business. If you have used caching, great; if not, you will need to prepare a plausible scenario.

  • Why use cache in the project?

Caching is mainly used for two purposes: high performance and high concurrency.

High performance

Suppose a request comes in, you run a pile of complicated queries against MySQL, and it takes 600 ms to produce a result. But that result may not change for the next few hours, and it does not need to be perfectly fresh for users. What do you do?

Cache it. Spend the 600 ms once, put the result into the cache as a key-value pair, and every subsequent read fetches the value by key from the cache in about 2 ms instead of grinding through MySQL again. That is a 300x improvement -- this is what "high performance" means here.

In short: when a complex, time-consuming operation produces a result that will not change soon but will be read many times, cache the result and serve the subsequent reads directly from the cache.
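As a rough illustration of this cache-aside pattern, here is a minimal Python sketch; the plain dict stands in for Redis, and `load_from_mysql` is a hypothetical stand-in for the slow 600 ms query:

```python
cache = {}  # stands in for Redis: key -> cached result

def load_from_mysql(key):
    """Hypothetical stand-in for the slow (e.g. 600 ms) SQL query."""
    load_from_mysql.calls += 1
    return f"result-for-{key}"
load_from_mysql.calls = 0

def get_product(key):
    # 1. Try the cache first -- a ~2 ms key lookup in real Redis.
    if key in cache:
        return cache[key]
    # 2. Cache miss: run the expensive query once...
    value = load_from_mysql(key)
    # 3. ...and store the result so later reads skip MySQL entirely.
    cache[key] = value
    return value
```

The first call pays the MySQL cost; every repeat read for the same key is served from the cache and never touches the database again.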

High concurrency

A heavyweight database like MySQL was never designed for high concurrency. It can be pushed, but it does not handle the load naturally; a single MySQL instance tends to start raising alarms around 2000 QPS.

So if your system peaks at 10,000 requests per second, a single MySQL server will simply fall over. This is where the cache comes in: put the hot data in the cache instead of hammering MySQL. Cache operations are simple -- essentially key-value lookups -- and a single cache instance easily handles tens of thousands of requests per second, dozens of times the concurrency of a single MySQL instance.

  • Combining these two scenarios: why does your project use caching?

Maybe your project has no high-concurrency scenario; that's fine -- lean on the high-performance angle. Think about whether there is a complex query whose result you could cache to dramatically improve response time and user experience. If there truly isn't one, you will have to prepare an example anyway.

  • What are the bad consequences after using the cache?

If you have never considered this question, you will be caught flat-footed, and the interviewer will conclude that you only ever use tools superficially. Don't just use a technology blindly; think about what can go wrong behind it. There are three classic cache problems (there are more, but these three come up constantly):

1) Cache and database double write are inconsistent

2) Cache avalanche

3) Cache penetration

These three are standard interview questions, discussed later in this article. At a minimum you should be able to describe each one and its corresponding solution.

3. Redis thread model

1. Interview questions

What is the difference between Redis and Memcached? What is Redis's threading model? Why is single-threaded Redis so much more efficient than multi-threaded Memcached -- in other words, why can Redis support high concurrency despite being single-threaded?

2. The interviewer's psychological analysis

This is the most basic Redis question. One of Redis's fundamental internal characteristics is that it uses a single-threaded working model. If you don't know this, you will be lost when anything goes wrong with Redis later.

3. Analysis of interview questions

1. What is the difference between Redis and Memcached?

You could enumerate any number of differences, but the main ones are these:

  • Redis supports server-side data operations: compared with Memcached, Redis has richer data structures and supports richer operations on them. With Memcached you typically have to fetch the data to the client, modify it, and set it back, which multiplies network round-trips and payload size. In Redis these complex operations are usually about as efficient as a plain GET/SET. If your cache needs to support complex structures and operations, Redis is the better choice.
  • Memory efficiency: for simple key-value storage, Memcached has higher memory utilization; but if Redis stores values in hash structures, its compact combined encoding can push its memory utilization above Memcached's.
  • Performance: because Redis uses only a single core while Memcached can use multiple cores, Redis on average outperforms Memcached per core when storing small values. For values above roughly 100 KB, Memcached performs better. Redis has since optimized its handling of large values, but it still lags slightly.
  • Cluster mode: Memcached has no native cluster mode and relies on the client to shard data across nodes; Redis natively supports clustering via the official Redis Cluster, which makes it the stronger option on this front.

2. Redis thread model

  • File event handler

Redis developed its network event handler, called the file event handler, based on the Reactor pattern. This file event handler is single-threaded, which is why Redis is described as a single-threaded model. It uses an IO multiplexing mechanism to monitor many sockets at once and dispatches each socket's events to the appropriate event handler.

When a monitored socket becomes ready for an accept, read, write, or close operation, the corresponding file event is generated, and the file event handler invokes the event handler previously associated with that socket to process it.

The file event handler runs on a single thread, but by monitoring many sockets through IO multiplexing it achieves a high-performance network model while keeping its interactions with Redis's other single-threaded internals simple.

The file event handler consists of four parts: multiple sockets, the IO multiplexing program, the file event dispatcher, and the event handlers (command request handler, command reply handler, connection response handler, etc.).

Multiple sockets may generate different operations concurrently, each corresponding to a different file event. The IO multiplexing program monitors all of them, queues the sockets that have events, and hands them to the file event dispatcher one at a time; the dispatcher then passes each socket to the event handler that matches its current event. Once one socket's event has been handled, the IO multiplexing program sends the next socket in the queue to the dispatcher.

  • File event

When a socket becomes readable (for example, the client writes to it or closes it), or when a new connectable socket appears (a client performs a connect to Redis), the socket generates an AE_READABLE event.

When a socket becomes writable (the client is ready to read a reply from Redis), the socket generates an AE_WRITABLE event.

The IO multiplexing program can monitor AE_READABLE and AE_WRITABLE events simultaneously. If a socket generates both at once, the file event dispatcher processes the AE_READABLE event first and the AE_WRITABLE event second.

When a client connects to Redis, the connection response handler is associated with the socket; when a client writes data to Redis, the command request handler is associated with the socket; when a client reads data from Redis, the command reply handler is associated with the socket.

  • One round of communication between the client and Redis

When Redis starts up, it associates the connection response handler with the AE_READABLE event of its listening socket. When a client initiates a connection, an AE_READABLE event is generated; the connection response handler establishes the connection, creates a socket for that client, and associates that socket's AE_READABLE event with the command request handler.

When the client sends a request (read or write, the flow is the same), an AE_READABLE event is generated on the socket, and the command request handler reads the request data from the socket, then executes the command.

After Redis has prepared the response, it associates the socket's AE_WRITABLE event with the command reply handler. When the client is ready to read the response, an AE_WRITABLE event is generated on the socket, and the command reply handler writes the prepared response into the socket for the client to read.

Once the reply has been fully written, the association between the socket's AE_WRITABLE event and the command reply handler is removed.
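The single-threaded, multiplexed loop described above can be illustrated (very loosely) with Python's `selectors` module; this is not Redis's implementation, just the same shape -- one thread, one selector watching sockets, and a handler bound to each socket at registration time:

```python
import selectors
import socket

# A toy reactor: one thread, one selector watching multiple sockets,
# dispatching each readable socket to its registered handler -- the same
# shape as Redis's file event handler (socket -> event -> handler).
sel = selectors.DefaultSelector()
received = []

def on_readable(conn):
    data = conn.recv(1024)          # analogous to handling AE_READABLE
    received.append(data)

# socketpair() gives us two connected sockets without a real network.
a, b = socket.socketpair()
sel.register(b, selectors.EVENT_READ, on_readable)

a.sendall(b"PING")                  # simulate a client request

# One iteration of the event loop: wait for ready sockets, then dispatch.
for key, _ in sel.select(timeout=1):
    key.data(key.fileobj)           # call the handler bound at register time

sel.unregister(b)
a.close()
b.close()
```

A real event loop would run `select()` forever and register write interest to send replies, but the dispatch structure is the point here.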

3. Why is Redis's single-threaded model so efficient?

  • Pure in-memory operations
  • A core built on non-blocking IO multiplexing
  • A single thread avoids the frequent context switches of a multi-threaded model

4. Redis data types

1. Interview questions

What data types does Redis have? In which scenarios is each of them appropriate?

2. The interviewer's psychological analysis

This question tends to come up when your resume suggests you are relatively junior or haven't gone deep technically. There are really two reasons for asking it:

  • First, to see whether you have a full picture of what Redis offers and roughly which feature fits which scenario -- the worry being that you only know the simplest KV operations
  • Second, to see how you have actually used Redis in real projects

If you answer poorly -- can't name several data types, can't name a scenario for each -- the interviewer will come away with a bad impression and conclude that all you ever do is simple set and get.

3. Analysis of interview questions

1. string

This is the most basic type: nothing fancy, just plain set and get for simple KV caching.

2. hash

This is a map-like structure, suited to caching structured data such as an object (provided the object doesn't nest other objects). Each read or write of the cache can then touch just one field of the hash instead of the whole object.

value = {
  "id": 150,
  "name": "zhangsan",
  "age": 21
}
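As a sketch of why hashes help, here is a toy Python stand-in where `hset`/`hget` mimic the semantics of Redis's HSET/HGET (these helper names are hypothetical; a real client such as redis-py exposes similar calls):

```python
# A dict-of-dicts stand-in for Redis hashes: one HSET updates a single
# field in place, instead of GET-whole-object / modify / SET-back.
store = {}

def hset(key, field, value):
    store.setdefault(key, {})[field] = value

def hget(key, field):
    return store.get(key, {}).get(field)

hset("user:150", "id", 150)
hset("user:150", "name", "zhangsan")
hset("user:150", "age", 21)

# Update just the age -- no need to round-trip the whole object.
hset("user:150", "age", 22)
```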

3. list

An ordered list, which supports many tricks. For example, the follower list of a Weibo big V can be cached in Redis as a list.

key = some big V
value = [zhangsan, lisi, wangwu]

Lists are a natural fit for list-shaped data: follower lists, the comment list of an article, and so on.

You can use the lrange command to read a range of elements starting from a given position, which makes list-based paging trivial. This is a great feature: simple, high-performance paging on top of Redis, including the endless pull-down-to-load-more style of paging used by Weibo -- just fetch page after page.

You can also build a simple message queue: push at the head of the list and pop from the tail.
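These list tricks can be sketched with a Python `deque` standing in for the Redis list; `lpush`, `rpop`, and `lrange` below are toy stand-ins that mirror the command semantics, including LRANGE's inclusive ends:

```python
from collections import deque

# A deque stand-in for a Redis list: LPUSH at the head, RPOP at the tail
# gives a simple FIFO queue; LRANGE-style slicing gives paging.
fans = deque()

def lpush(lst, value):
    lst.appendleft(value)

def rpop(lst):
    return lst.pop()

def lrange(lst, start, stop):
    # Redis LRANGE is inclusive of both ends.
    return list(lst)[start:stop + 1]

for name in ["zhangsan", "lisi", "wangwu", "zhaoliu"]:
    lpush(fans, name)

page1 = lrange(fans, 0, 1)   # first page of 2: most recently pushed first
page2 = lrange(fans, 2, 3)   # second page of 2
```

Used as a queue, `lpush` enqueues and `rpop` dequeues in first-in, first-out order.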

4. set

An unordered collection that deduplicates automatically.

Throw data that needs deduplication into a set and it is deduplicated automatically. For fast global deduplication you could of course use a HashSet in JVM memory, but what if your system is deployed across multiple machines? Then you need Redis-based global set deduplication.

Sets also support intersection, union, and difference. Take intersection: to find the mutual followers of two people, put each big V's followers in a set and intersect the two sets.
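Python's built-in sets mirror this behavior, so the mutual-followers example can be sketched directly, with `&`, `|`, and `-` playing the roles of SINTER, SUNION, and SDIFF:

```python
# Python sets behave like Redis sets: adding deduplicates automatically,
# and & / | / - mirror SINTER / SUNION / SDIFF.
fans_of_a = {"zhangsan", "lisi", "wangwu"}
fans_of_b = {"lisi", "wangwu", "zhaoliu"}

mutual = fans_of_a & fans_of_b      # SINTER: followed by both big Vs
either = fans_of_a | fans_of_b      # SUNION: followed by at least one
only_a = fans_of_a - fans_of_b      # SDIFF: follows A but not B
```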

5. sorted set

A sorted set deduplicates like a set but also keeps order: each element is written with a score, and the set sorts itself by score. You can do a lot with this; the defining feature is that the score lets you define the ordering.

For example, to keep data sorted by time, use a timestamp as the score when writing, and the set stays in chronological order automatically.

Leaderboards: write each user and their score into the set with zadd board score username; then zrevrange board 0 99 returns the top 100 users, and zrevrank board username returns a user's 0-based position in the ranking, highest score first.

zadd board 85 zhangsan
zadd board 72 wangwu
zadd board 96 lisi
zadd board 62 zhaoliu

For example, to get the top 3 users (ranges are inclusive of both ends, so the top 3 are indexes 0 through 2):

zrevrange board 0 2
lisi
zhangsan
wangwu

And to get zhaoliu's position in the leaderboard (0-based, highest score first), use zrevrank -- zrank would count from the lowest score instead:

zrevrank board zhaoliu
3
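The leaderboard can be sketched with a plain dict re-sorted on demand; `zadd`, `zrevrange`, and `zrevrank` below are toy stand-ins for the commands (a real Redis sorted set uses a skip list, so it never re-sorts like this):

```python
# A minimal leaderboard stand-in for a Redis sorted set: a dict of
# member -> score, with ZREVRANGE / ZREVRANK emulated by sorting on demand.
board = {}

def zadd(scores, score, member):
    scores[member] = score          # re-adding a member just updates its score

def zrevrange(scores, start, stop):
    # Highest score first; Redis ranges are inclusive of both ends.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[start:stop + 1]

def zrevrank(scores, member):
    return zrevrange(scores, 0, len(scores) - 1).index(member)

zadd(board, 85, "zhangsan")
zadd(board, 72, "wangwu")
zadd(board, 96, "lisi")
zadd(board, 62, "zhaoliu")
```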

5. Redis expiration strategies

1. Interview questions

What are Redis's expiration strategies? What memory eviction policies are there? Can you write an LRU cache by hand?

2. The interviewer's psychological analysis

1) Why did the data I wrote to Redis disappear?

What is a cache? Memory used as a cache. And memory is not unlimited: it is precious and scarce, while disk is cheap and plentiful. A machine might have a few dozen GB of memory but several TB of disk. Redis gets its high-performance, high-concurrency reads and writes precisely because it works in memory.

Since memory is limited -- say Redis may use only 10 GB -- what happens if you write 20 GB of data into it? 10 GB gets evicted and 10 GB is kept. Which data goes and which stays? Naturally, the infrequently used data is evicted and the frequently used data is kept.

So the most basic fact about a cache is that data expires: either you set an expiration time yourself, or Redis evicts it for you.

2) My data has clearly expired -- why is it still occupying memory?

The follow-up: if you set an expiration time, do you know how Redis actually expires keys for you, and when they get deleted? Why is memory usage still high when lots of data should have expired? Because you don't know how Redis deletes expired keys.

Say Redis has 10 GB of memory in total and you write 5 GB of data into it, all with a 1-hour expiration. You come back an hour later and memory usage is still 50%: the 5 GB has expired -- querying it through Redis returns nothing -- yet the expired data still occupies Redis memory.

If you can't answer this, then when you write production code you will take it for granted that anything written to Redis is guaranteed to still be there, and that assumption will breed all kinds of vulnerabilities and bugs. Who is responsible then?

3. Analysis of interview questions

1. Setting an expiration time

When we set a key we can attach an expire time, after which the key is gone -- say the key may live for only 1 hour, or 10 minutes. This is very useful: it lets us declare exactly when a cached value becomes invalid.

Suppose you set a batch of keys to live for only 1 hour. How does Redis delete them once the hour is up? The answer: periodic deletion + lazy deletion.

Periodic deletion means that by default, every 100 ms, Redis randomly samples some of the keys that have an expiration time set, checks whether they have expired, and deletes the expired ones. Note that it does not traverse every key with a TTL every 100 ms -- with, say, 100,000 such keys that would be a performance disaster: the CPU would be consumed entirely by expiration checks and Redis would essentially die. Instead, Redis samples a few keys at random each round.

The problem is that periodic deletion can leave many expired keys undeleted past their time. That is where lazy deletion comes in: when you access a key, Redis checks whether it has an expiration time and whether that time has passed; if the key has expired, Redis deletes it on the spot and returns nothing.

So a key is not necessarily deleted the moment it expires; Redis checks lazily when you query it.

Combining the two mechanisms ensures expired keys eventually get removed.

Put simply: an expired key that the periodic sampler missed stays in memory, occupying it, until some access triggers the lazy check and Redis deletes it.

But even this is not airtight. What if periodic deletion misses many expired keys, and they are never accessed, so lazy deletion never fires? Expired keys pile up until Redis memory is exhausted. What then? The answer: the memory eviction mechanism.
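The lazy half of this scheme is easy to sketch in Python: each entry stores its absolute expiry, and reads check it on access. The `now` parameter is injected only to keep the example deterministic (no real sleeping); these helper names are illustrative, not Redis APIs:

```python
# A sketch of lazy deletion: each value carries an absolute expiry time,
# and GET checks it on access -- expired entries die the first time
# somebody asks for them, not the moment the clock passes the deadline.
store = {}

def set_ex(key, value, ttl, now):
    store[key] = (value, now + ttl)

def get(key, now):
    entry = store.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if now >= expires_at:           # expired: delete on access, return nothing
        del store[key]
        return None
    return value

set_ex("session:1", "alice", ttl=3600, now=0)
alive = get("session:1", now=1800)   # still within the hour
gone = get("session:1", now=4000)    # past the deadline -> lazily deleted
```

Note how, between `now=3600` and the first access, the dead entry still sits in `store` -- exactly the "expired but still occupying memory" situation described above.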

2. Memory eviction

If Redis uses too much memory, eviction kicks in. To see the idea, suppose Redis is full at 10 keys and needs to delete 5 -- which ones? Consider three of the keys:

key A: queried 100 times in the last minute
key B: queried 50 times in the last 10 minutes
key C: queried once in the last hour

An LRU-style policy keeps A and B and evicts C first. The available policies are:
  • noeviction: when memory cannot hold newly written data, new writes simply return an error. Almost nobody uses this; it is genuinely unpleasant in practice.
  • allkeys-lru: when memory is insufficient, remove the least recently used key from the whole key space (this is the most commonly used policy)
  • allkeys-random: when memory is insufficient, remove a random key from the whole key space. Rarely used -- why evict at random when you can evict the least recently used?
  • volatile-lru: when memory is insufficient, remove the least recently used key among the keys that have an expiration time set (generally not the best fit)
  • volatile-random: when memory is insufficient, remove a random key among the keys that have an expiration time set
  • volatile-ttl: when memory is insufficient, among the keys that have an expiration time set, remove the key closest to expiring first
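Since the interview question above also asks you to write LRU by hand, here is one common Python version built on `OrderedDict`. This is the policy behind allkeys-lru, not Redis's actual implementation (Redis uses an approximate, sampling-based LRU):

```python
from collections import OrderedDict

class LRUCache:
    """Hand-written LRU: evict the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # iteration order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # drop the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" is now the most recently used
cache.put("c", 3)       # capacity exceeded: "b" is evicted, not "a"
```

Both `get` and `put` are O(1); in an interview you may also be asked to build the same thing from a hash map plus a hand-rolled doubly linked list.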

6. Redis high availability

1. Interview questions

How do you ensure Redis's high concurrency and high availability? Can you explain the principle of Redis master-slave replication? Can you explain how Redis Sentinel works?

2. The interviewer's psychological analysis

This question mainly probes: how much concurrency can a single Redis instance carry? If one machine can't handle the load, how do you scale out? And since Redis can go down, how do you keep it highly available?

These are issues you must consider in any real project; if you haven't, you simply haven't thought enough about production systems.

3. Analysis of interview questions

In other words, if you adopt Redis as your caching technology, you must think about how to add machines so Redis sustains high concurrency, and how to keep the service alive when a Redis instance dies -- that is, high availability.

There is a great deal to Redis high availability: persistence, master-slave replication, Sentinel, and Redis Cluster. All of these matter, and the details run deep; refer to the Redis series of articles for a full treatment:

"Redis series-enterprise-level persistence solution"

"Redis series-master-slave replication"

"Redis Series-Detailed Explanation of Redis Cluster in Production Environment"

7. Cache avalanche and cache penetration

1. Interview questions

Do you understand what cache avalanche and cache penetration are? What happens after Redis crashes? How should the system respond? And how do you handle cache penetration?

2. The interviewer's psychological analysis

This is a must-ask for caching, because avalanche and penetration are the cache's two biggest problems: either they never occur, or when they do, they are fatal. So the interviewer will definitely raise them.

3. Analysis of interview questions

To solve the problem, you first have to know what a cache avalanche is: Redis goes down, so every request that would have been served from the cache lands directly on MySQL, and the database is crushed by the sudden load.

Once you understand the avalanche, solving it becomes easier. The solutions divide roughly into three phases: before, during, and after the event:

  • Beforehand: make Redis highly available -- master-slave + Sentinel, or Redis Cluster -- to avoid a total crash
  • During the event: a local ehcache cache plus hystrix-style rate limiting and degradation, so MySQL is not killed
  • Afterwards: Redis persistence, so the cached data can be restored quickly on restart
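The "during the event" step leans on rate limiting. As a loose illustration (this is not hystrix itself), here is a minimal fixed-window limiter in Python; the 1-second window and the `limit` value are arbitrary choices, and `now` is passed in so the example is deterministic:

```python
class Limiter:
    """A minimal fixed-window limiter: allow at most `limit` requests per
    one-second window; excess requests are rejected (degraded) instead of
    being forwarded to the database."""

    def __init__(self, limit):
        self.limit = limit
        self.window = None
        self.count = 0

    def allow(self, now):
        window = int(now)           # 1-second fixed windows
        if window != self.window:   # new window: reset the counter
            self.window = window
            self.count = 0
        self.count += 1
        return self.count <= self.limit

limiter = Limiter(limit=3)
results = [limiter.allow(now=10.0) for _ in range(5)]  # 5 hits, same second
later = limiter.allow(now=11.0)                        # next window: allowed
```

Requests that return False would be served a degraded response (or the local ehcache copy) rather than a MySQL query, so the database only ever sees a bounded trickle of traffic.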

The second problem is cache penetration: requests for data that exists neither in the cache nor in MySQL. Each such request passes straight through every cache layer and lands on MySQL, which executes queries that come back empty every time. Left unhandled, a flood of high-concurrency requests like this will reach MySQL and kill it.

The solution is actually simple: whenever a database query comes back empty -- meaning the data does not exist at all -- don't skip the cache write. Instead, write an empty or default value for that key into Redis, so subsequent requests for the same nonexistent key hit the cache instead of the database.
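A minimal sketch of caching the emptiness, with a dict standing in for Redis and a hypothetical `query_db` standing in for MySQL; in real Redis you would also give the placeholder a short TTL so a key created later can eventually be seen:

```python
MISSING = object()   # sentinel cached for keys absent from the database
cache = {}

def query_db(key):
    query_db.calls += 1
    return None      # hypothetical: this key exists nowhere in MySQL
query_db.calls = 0

def get(key):
    if key in cache:
        value = cache[key]
        return None if value is MISSING else value
    value = query_db(key)
    # Cache the emptiness too, so repeated lookups for a nonexistent
    # key stop hitting MySQL after the first miss.
    cache[key] = MISSING if value is None else value
    return value

get("no-such-id")
get("no-such-id")
get("no-such-id")
```

Three lookups, one database query: the second and third requests are absorbed by the cached sentinel instead of penetrating to MySQL.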

8. Deployment architecture of redis cluster in production environment

1. Interview questions

How is redis deployed in the production environment?

2. The interviewer's psychological analysis

This checks whether you understand your company's production Redis deployment. If you don't, that is a real oversight. Is your Redis a master-slave architecture or a cluster? Which cluster solution did you use? Is high availability guaranteed? Is there a persistence mechanism so data can be recovered? How many GB of memory does online Redis get, and with what parameters? What QPS did your Redis cluster sustain under load testing?

3. Analysis of interview questions

Example: a Redis cluster on 10 machines -- 5 hosting master instances and 5 hosting their slaves, so each master has one slave. The 5 master nodes serve reads and writes, each peaking around 50,000 read/write QPS, so the 5 machines handle roughly 250,000 read/write requests per second in total.

Machine spec: 32 GB memory, 8-core CPU, 1 TB disk, with 10 GB allocated to the Redis process. As a rule, in online production environments a Redis process's memory should stay at or below 10 GB; going beyond that tends to cause problems.

Five machines serve reads and writes, 50 GB of memory in total. Because every master has a slave, the setup is highly available: if any master goes down, failover happens automatically -- the slave is promoted to master and keeps serving reads and writes.

What data goes into memory, and how big is each record? Product data, about 10 KB per record: 100 records are 1 MB, and 100,000 records are 1 GB. Two million product records are resident, occupying 20 GB of memory -- under 50% of the total. Peak traffic is currently around 3,500 requests per second.

Origin blog.csdn.net/qq_22172133/article/details/104456775