Advanced Redis interview questions you should master before interviewing at a large tech company (answers included)

Foreword:

Redis is an open-source, in-memory, non-relational data store that supports persistence. In real projects, Redis is commonly used as a cache or a message broker, and it is one of the most widely used non-relational databases on the Internet.

The interview questions for this article are as follows:

  • Redis persistence mechanism
  • Cache avalanche, cache penetration, cache warm-up, cache update, cache degradation, etc.
  • What are hot data and cold data
  • What are the differences between Memcache and Redis?
  • Why is single-threaded redis so fast
  • Redis data types, usage scenarios of each data type, Redis internal structure
  • Redis's expiration strategy and memory eviction mechanism
  • Why Redis is single-threaded, advantages
  • How to solve redis's concurrent competition key problem
  • What should the Redis cluster solution do? What are the options?
  • Have you tried to deploy redis on multiple machines? How to ensure that the data is consistent?
  • How to deal with a large number of requests
  • Common Redis performance problems and solutions?
  • Explain the Redis threading model
  • Why is Redis's operation atomic and how to ensure atomicity?
  • Redis transaction
  • Redis implements distributed locks

Redis persistence mechanism

Redis is an in-memory database that supports persistence: the in-memory data is synchronized to files on disk, and when Redis restarts it reloads those files into memory to recover the data.
RDB implementation: Redis calls fork() to create a child process that shares the parent's dataset via copy-on-write. The child writes the dataset to a temporary file; when the snapshot is complete, the temporary file replaces the previous snapshot file, and the child exits, releasing its memory.

RDB is Redis's default persistence mode. On a configurable schedule, it saves the in-memory data as a snapshot to a binary file on disk. The data file is dump.rdb, and the snapshot interval is defined by the save directives in the configuration file.
AOF: Redis appends every write command it receives to the end of a file, much like the MySQL binlog. On restart, Redis re-executes the commands saved in the file to rebuild the dataset in memory.
When both are enabled, Redis prefers the AOF file for recovery, since it is usually more complete.
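For reference, a minimal redis.conf fragment enabling both mechanisms might look like this (the save thresholds shown are the long-standing defaults; tune them for your workload):

```conf
# RDB: snapshot if >=1 key changed in 900s, >=10 in 300s, >=10000 in 60s
save 900 1
save 300 10
save 60 10000
dbfilename dump.rdb

# AOF: append every write command, fsync once per second
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
```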

Cache avalanche, cache penetration, cache warm-up, cache update, cache degradation, etc.

1. Cache avalanche

A cache avalanche can be understood simply as: the old cache entries expire before new ones are in place (for example, many keys were written with the same expiration time, so a large swath of the cache expires at the same moment). Every request that would have hit the cache now falls through to the database, putting enormous CPU and memory pressure on it; in severe cases the database goes down, triggering a chain reaction that brings down the whole system.
Solution:
Most system designers use locking (the most common solution) or queuing to guarantee that only a limited number of threads read from and write to the database at once, so that a cache failure does not translate into a flood of concurrent requests on the backing store. An even simpler remedy is to spread out the cache expiration times.
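A minimal sketch of the "spread out the expiration times" fix: add random jitter to the base TTL when each key is written, so co-written keys no longer expire together (names and values here are illustrative):

```python
import random

BASE_TTL = 3600  # one hour, in seconds

def ttl_with_jitter(base=BASE_TTL, spread=300):
    """Spread expirations over [base, base + spread] so keys cached
    together do not all expire in the same instant."""
    return base + random.randint(0, spread)

# e.g. client.setex(key, ttl_with_jitter(), value) instead of a fixed TTL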

2. Cache penetration

Cache penetration refers to queries for data that does not exist in the database, and therefore cannot exist in the cache either. Every such request misses the cache and then misses the database, which returns empty (two useless lookups). The requests effectively bypass the cache and hit the database directly, which is why this is often discussed as a cache hit-rate problem.

Solution:

The most common defense is a Bloom filter: hash all keys that could possibly exist into a sufficiently large bitmap. A key that definitely does not exist is rejected by the bitmap, sparing the underlying storage system the query pressure.
There is also a simpler, cruder method: if a query returns empty (whether because the data does not exist or because of a fault), cache the empty result anyway, but with a very short expiration time, no more than five minutes at most. The next request for the same key is then answered from the cache without touching the database again.
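A sketch of the "cache the empty result" idea, with a plain dict standing in for Redis and a sentinel marking known-missing keys (all names here are illustrative):

```python
import time

cache = {}  # key -> (value, expires_at); stand-in for a real Redis client

NULL_SENTINEL = "__NULL__"   # marks "the database has no row for this key"

def get_with_null_caching(key, db_lookup, ttl=300, null_ttl=60):
    entry = cache.get(key)
    if entry and entry[1] > time.time():          # fresh cache hit
        value = entry[0]
        return None if value == NULL_SENTINEL else value
    value = db_lookup(key)                        # miss: hit the database once
    if value is None:
        # cache the emptiness too, with a short TTL, so repeated
        # lookups for a nonexistent key stop reaching the database
        cache[key] = (NULL_SENTINEL, time.time() + null_ttl)
        return None
    cache[key] = (value, time.time() + ttl)
    return value
```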
A related follow-up question: a 5 TB disk is full of data; write an algorithm to sort it. What if the items are 32-bit values? What about 64-bit?

The structures that push space usage to the extreme are the Bitmap and the Bloom filter.
Bitmap: essentially a hash table that records only 1 bit of information per element. The disadvantage is exactly that: if you need anything beyond membership, you must spend extra space and time to get it.

Bloom filter (recommended):
introduce k (k > 1) mutually independent hash functions so that membership can be judged within a given space budget and false-positive rate.
Its advantage is space efficiency and query time far beyond ordinary algorithms; its disadvantages are a certain false-positive rate and the difficulty of deletion.
The core idea of the Bloom filter is to use multiple different hash functions to resolve "conflicts".
A single hash has collision problems: two different URLs can hash to the same value. To reduce collisions, we introduce several more hash functions. If any one of the hash values says an element is not in the set, the element is definitely not in the set; only when all the hash functions say it is present can we conclude the element (probably) is in the set. That is the basic idea of the Bloom filter.
Bloom filters are typically used to decide whether an element exists in a very large data set.
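A minimal Bloom filter sketch along those lines, with k hash functions derived from salted SHA-256 (sizes are illustrative; production code would use a tuned library):

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 16, k=4):
        self.size = size_bits
        self.k = k                           # number of hash functions
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # derive k independent-ish positions by salting the hash input
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False => definitely absent; True => present with some FP rate
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```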

3. Cache warm-up

Cache warm-up is a fairly common concept that most readers will grasp easily: load the relevant data into the cache system right after the system comes online. This avoids the pattern where the first user request has to query the database and then populate the cache; users query pre-warmed cache data directly.
Solutions:

  • Build an admin page that refreshes the cache, and trigger it manually at launch;
  • If the data volume is small, load it automatically when the application starts;
  • Refresh the cache on a schedule.

4. Cache update

Besides the eviction policies built into the cache server (Redis offers six maxmemory policies by default), we can customize cache eviction to the business. Two common strategies:
(1) periodically clean up expired entries;
(2) when a user request arrives, check whether the cache entry it needs has expired; if so, fetch fresh data from the underlying system and update the cache.
Each has trade-offs: the downside of the first is that maintaining a large number of cache keys is cumbersome; the downside of the second is that every request carries an expiry check and the logic is relatively complex. Weigh them against your own application scenario.
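Strategy (2) can be sketched as a read-through cache that checks the deadline on every get (illustrative names; a dict stands in for the cache server):

```python
import time

class TTLCache:
    """Strategy (2): each read checks the entry's deadline and refreshes
    it from the underlying store only when it has expired."""

    def __init__(self, loader, ttl=60.0):
        self.loader = loader     # function that fetches from the source
        self.ttl = ttl
        self.store = {}          # key -> (value, deadline)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or entry[1] <= time.time():   # missing or stale
            value = self.loader(key)                   # go to the source
            self.store[key] = (value, time.time() + self.ttl)
            return value
        return entry[0]
```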

5. Cache degradation

When traffic spikes, when a service has problems (slow or unresponsive), or when non-core services threaten the performance of the core flow, the system must remain available even if the service quality suffers. Degradation can be triggered automatically based on key metrics, or toggled manually with a configured switch.
The ultimate goal of degradation is to keep core services available, even in a lossy form. Some operations can never be degraded (for example, add-to-cart and checkout).
A plan modeled on log levels:
(1) Normal: a service occasionally times out because of network jitter or a deployment; degrade automatically.
(2) Warning: a service's success rate fluctuates within a window (say between 95% and 100%); degrade automatically or manually, and raise an alert.
(3) Error: availability drops below 90%, the database connection pool is exhausted, or traffic suddenly spikes to the maximum threshold the system can withstand; degrade automatically or manually depending on the situation.
(4) Critical: the data is wrong for some exceptional reason; degrade manually and urgently.

The purpose of degradation here is to prevent a Redis failure from avalanching onto the database. For unimportant cached data, a degradation strategy can be applied: a common practice is that when Redis is unavailable, the service does not fall back to querying the database but directly returns a default value to the user.
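A common shape for the "return a default instead of failing" degradation is a fallback wrapper (an illustrative sketch; the raised exception simulates a Redis outage):

```python
def with_degradation(default):
    """Return a decorator: if the wrapped call (e.g. a Redis read) fails,
    serve a lossy default instead of propagating the error downstream."""
    def deco(fn):
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                return default      # degraded but still available
        return wrapper
    return deco

@with_degradation(default=[])
def load_recommendations(user_id):
    # stand-in for a Redis read; here it always fails to show the fallback
    raise ConnectionError("redis is down")
```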

What are hot data and cold data

Caching is valuable for hot data. For cold data, most entries are evicted from memory before they are ever accessed again: they occupy memory while providing little value. For frequently modified data, whether to cache depends on the situation.
Examples of hot data: in one of our IM products, the birthday-greetings module caches the day's birthday list, which may be read hundreds of thousands of times; in a navigation product, we cache the navigation information, which may be read millions of times. Both examples share one trait: the data is modified rarely but read very frequently.
Data should be read at least twice before it is next updated for caching to pay off. That is the most basic rule; if an entry expires before it is ever hit, it was of little value.
Can data that changes frequently still be worth caching? Yes. When a read endpoint puts heavy pressure on the database and the data is hot, caching is still warranted. In our assistant product, like counts, favorite counts, and share counts are very typical hot data that changes constantly; in that case we synchronize the data to the Redis cache to relieve the database.

What are the differences between Memcache and Redis?

1) Storage: Memcached keeps all data in memory; after a power failure it is gone, and the dataset cannot exceed memory. Redis can keep part of its data on disk and can persist it.
2) Data types: in Memcached all values are plain strings. Redis supports much richer types, providing storage for structures such as list, set, zset (sorted set), and hash.
3) Underlying model: their internal implementations and the protocols they use to talk to clients differ. Redis builds its own mechanisms (it even once implemented its own VM layer) rather than relying on general system calls, avoiding some of the time wasted moving and requesting data.
4) Value size: a Redis string value can be up to 512 MB; Memcached values are limited to 1 MB.
5) Redis is generally faster than Memcached.
6) Redis supports data backup, i.e. master-slave replication.

Why is single-threaded redis so fast

(1) Pure memory operation
(2) Single-threaded operation, avoiding frequent context switching
(3) Using non-blocking I/O multiplexing mechanism

Redis data types and usage scenarios for each data type

There are five in total.

  • String
    The most common type: plain set/get operations, where the value can be a string or a number. Typically used for caching counters and similar simple values.

  • Hash
    The value here is a structured object, which makes it convenient to manipulate individual fields. When I built single sign-on, I used this structure to store user information, with the cookieId as the key and a 30-minute expiration, which simulates a session.
  • List
    With the list structure you can build a simple message queue. You can also use the lrange command for Redis-backed pagination, which performs well and gives a good user experience. Another scenario I use it for is taking in market data: a classic producer/consumer setup, where the list preserves first-in, first-out ordering.
  • Set
    A set is a collection of unique values, so it can implement global deduplication. Why not use the JVM's built-in Set? Because our systems are usually deployed as clusters, using an in-process set is cumbersome, and standing up a dedicated service just for global deduplication is hardly worth it.
    In addition, intersection, union, and difference operations let you compute common preferences, all preferences, and each user's unique preferences.
  • Sorted set
    A sorted set adds a score to each member and keeps the members ordered by score. It is a natural fit for leaderboards and top-N queries.
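The sorted-set leaderboard idea can be illustrated in plain Python (a toy stand-in, not the Redis implementation; real code would issue ZADD/ZINCRBY/ZREVRANGE through a Redis client):

```python
class MiniZSet:
    """Toy version of a Redis sorted set: member -> score, ranked by score."""

    def __init__(self):
        self.scores = {}

    def zadd(self, member, score):
        self.scores[member] = score

    def zincrby(self, member, delta=1):
        self.scores[member] = self.scores.get(member, 0) + delta

    def top(self, n):
        """Like ZREVRANGE 0 n-1: the n highest-scoring members."""
        return sorted(self.scores, key=self.scores.get, reverse=True)[:n]
```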

Redis internal structure

dict: essentially a data structure that solves the searching problem, maintaining the mapping between keys and values, similar to a Map or dictionary in many languages.
sds: roughly equivalent to a char * that can hold arbitrary binary data. Because it cannot use the '\0' character to mark the end of a string the way C strings do, it must carry a length field.
skiplist: a multi-level linked list with extra forward pointers. Its search efficiency is comparable to an optimized balanced binary tree, while being simpler to implement than a balanced tree.
quicklist: a linked list whose nodes are ziplists (the list encoding in modern Redis).
ziplist: a compact, specially encoded sequential data structure stored in a series of contiguous memory blocks.

Redis expiration strategy and memory elimination mechanism

Redis uses periodic deletion plus lazy deletion.
Why not timed deletion?
Timed deletion uses a timer per key and deletes the key the moment it expires. Memory is freed promptly, but it consumes a lot of CPU: under heavy concurrency, the CPU should be serving requests, not deleting keys, so this strategy is not used.
How do periodic deletion and lazy deletion work together?
Periodic deletion: by default, every 100 ms Redis checks for expired keys and deletes any it finds. Note that it does not scan all keys every 100 ms (that would stall Redis) but samples keys at random. As a result, periodic deletion alone leaves many expired keys undeleted.
That is where lazy deletion comes in: whenever you read a key, Redis checks whether the key has an expiration time and whether it has passed; if it has expired, the key is deleted.
Does periodic + lazy deletion still leave a problem?
Yes. If periodic deletion misses a key and you never request that key again, lazy deletion never fires either, so Redis memory keeps growing. That is where the memory eviction mechanism comes in.
There is a line of configuration in redis.conf

maxmemory-policy volatile-lru

This line configures the memory eviction policy (what, you haven't set it? go check your config):

  • volatile-lru: evict the least recently used key among those with an expiration set (server.db[i].expires)
  • volatile-ttl: evict the key closest to expiring among those with an expiration set (server.db[i].expires)
  • volatile-random: evict a random key among those with an expiration set (server.db[i].expires)
  • allkeys-lru: evict the least recently used key from the whole keyspace (server.db[i].dict)
  • allkeys-random: evict a random key from the whole keyspace (server.db[i].dict)
  • noeviction: never evict; new writes that would need memory return an error

ps: if no key has an expiration set, the precondition for the volatile-* policies is never met, so volatile-lru, volatile-random, and volatile-ttl behave essentially like noeviction (nothing is deleted).
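The idea behind allkeys-lru can be sketched with an OrderedDict (note that real Redis uses an approximate, sampled LRU rather than an exact one; this is just the concept):

```python
from collections import OrderedDict

class LRUCache:
    """Sketch of the allkeys-lru idea: when the size limit is hit,
    the least recently used key is evicted."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()   # insertion order tracks recency

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)  # evict the LRU entry
```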

Why Redis is single-threaded

The official FAQ explains that because Redis is memory-based, the CPU is not its bottleneck; the bottleneck is most likely machine memory or network bandwidth. Since single-threading is easy to implement and the CPU will not become a bottleneck anyway, adopting a single-threaded design is the logical choice (after all, multi-threading brings plenty of trouble!). Redis uses queuing to turn concurrent access into serial access.
1) Most requests are pure memory operations (very fast).
2) A single thread avoids unnecessary context switches and race conditions.
3) Non-blocking I/O.
Advantages:

  • Fast, because the data is stored in memory, similar to HashMap, the advantage of HashMap is that the time complexity of search and operation is O(1)
  • Support rich data types, support string, list, set, sorted set, hash
  • Support transactions, operations are all atomic. The so-called atomicity means that all data changes are executed or not executed at all
  • Rich features: caching, messaging, setting an expiration per key with automatic deletion after expiry.

How to solve the problem of concurrent key competition in Redis

Multiple subsystems set the same key at the same time — what should you watch out for? The Redis transaction mechanism is not recommended here. Production environments are usually Redis clusters with data sharding, and when a transaction touches multiple keys, those keys will not necessarily live on the same redis-server, which makes Redis transactions nearly useless in that setting.
(1) If operations on the key do not require a particular order: use a distributed lock; whoever grabs the lock performs the set.
(2) If operations must be ordered: distributed lock + timestamp. Suppose system B grabs the lock first and sets key1 to {valueB 3:05}. System A then grabs the lock, finds that its valueA carries an older timestamp than the one in the cache, and skips the set. And so on.
(3) Use a queue to turn the set operations into serial access; this also helps Redis under high concurrency.
As for the consistency of reads and writes to a single key: individual Redis operations are atomic and thread-safe, so for single-command operations you do not need to worry about concurrency — Redis handles that internally.
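Approach (2) above, the timestamp guard, can be sketched like this (a plain dict stands in for the cache; the lock acquisition itself is omitted, and all names are illustrative):

```python
def set_if_newer(store, key, value, ts):
    """Distributed-lock + timestamp idea: a writer only overwrites the
    cached value if its timestamp is newer than the one already stored."""
    current = store.get(key)
    if current is None or ts > current[1]:
        store[key] = (value, ts)
        return True       # our write won
    return False          # a newer write is already in place; skip the set
```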

What should the Redis cluster solution do? What are the options?

1. Twemproxy: roughly a proxy pattern. Instead of connecting to Redis directly, clients connect to twemproxy, which receives the request, uses consistent hashing to route it to a specific Redis instance, and relays the result back.
Drawbacks: twemproxy itself is a single-instance pressure point, and with consistent hashing, changing the number of Redis nodes changes the computed placements, and data is not automatically migrated to new nodes.

2. Codis: currently the most widely used cluster solution, with essentially the same role as twemproxy, but it supports migrating the old nodes' data to the new hash nodes when the node count changes.

3. Redis Cluster, built in since Redis 3.0: its distribution algorithm is not consistent hashing but the concept of hash slots, and it supports attaching slave (replica) nodes. See the official documentation for details.
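Redis Cluster's hash-slot mapping can be sketched as follows. The checksum it uses is the CRC16/XMODEM variant (polynomial 0x1021, initial value 0), taken modulo 16384; hash tags ({...}) are omitted here for brevity:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM, the checksum Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: bytes) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots."""
    return crc16_xmodem(key) % 16384
```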

Have you tried to deploy redis on multiple machines? How to ensure that the data is consistent?

Master-slave replication with read/write separation.
One role is the master database and the other the slave database. The master accepts both reads and writes; when a write occurs, the data is automatically synchronized to the slaves. Slaves are generally read-only and receive the data synchronized from the master. One master can have multiple slaves, but a slave has exactly one master.

How to deal with a large number of requests

Redis executes commands on a single thread, which means it processes one client command at a time;
it handles many clients at once through I/O multiplexing (select, epoll, kqueue, etc., with different implementations on different platforms).
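The same idea can be seen with Python's selectors module: one selector watches many sockets, and the loop only touches sockets that are ready (a socketpair stands in for a client connection):

```python
import selectors
import socket

# One selector watches many sockets; the loop handles whichever is ready.
sel = selectors.DefaultSelector()
client, server_side = socket.socketpair()   # stand-in for a client connection
client.setblocking(False)
server_side.setblocking(False)
sel.register(server_side, selectors.EVENT_READ)

client.send(b"PING")                 # the "client" writes a command

events = sel.select(timeout=1)       # wakes only when some socket is ready
data = b""
for key, _mask in events:
    data = key.fileobj.recv(64)      # serve this ready socket, then loop on
ready = [key.fileobj for key, _mask in events]

sel.close()
client.close()
server_side.close()
```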

Common Redis performance problems and solutions?

(1) Ideally the master should not do persistence work such as RDB memory snapshots or AOF log files.
(2) If the data is important, have one slave turn on AOF to back up the data, with the policy set to fsync once per second.
(3) For replication speed and connection stability, master and slaves are best kept on the same LAN.
(4) Avoid adding slaves to a master that is already under pressure.
(5) Do not use a graph-like structure for master-slave replication; a single chain is more stable, i.e.: Master <- Slave1 <- Slave2 <- Slave3...

Explain the Redis threading model

The file event handler consists of sockets, an I/O multiplexing program, a file event dispatcher, and event handlers. The I/O multiplexing program monitors multiple sockets at once and associates each socket with different event handlers according to the task the socket is currently performing. When a monitored socket becomes ready for an operation such as accept, read, write, or close, the corresponding file event is generated, and the file event handler calls the handler associated with that socket to process it.
How it works:

  • The I/O multiplexing program monitors multiple sockets and delivers those that have generated events to the file event dispatcher.
    Although several file events may occur concurrently, the multiplexing program always places all ready sockets into a queue and passes them to the dispatcher one at a time, in order and synchronously: only after the event from the previous socket has been fully processed (its associated handler has finished running) does the multiplexing program send the next socket along. If a socket is both readable and writable, the server reads from it first and writes to it afterwards.

Why is Redis's operation atomic and how to ensure atomicity?

For Redis, the atomicity of commands refers to: an operation cannot be subdivided, and the operation is either executed or not executed.
Redis operations are atomic because Redis is single-threaded.
All APIs provided by Redis itself are atomic operations, and transactions in Redis are actually to ensure the atomicity of batch operations.
Are multiple commands still atomic under concurrency?
Not necessarily. Either turn the get-then-set into a single command such as INCR, or use Redis transactions, or use Redis + Lua scripting.
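A deterministic sketch of why a read-modify-write composed of separate get and set steps is not atomic, while a single-command increment is (plain Python standing in for Redis; the interleaving is written out by hand):

```python
# Two clients interleave a read-modify-write: the classic lost update.
counter = {"n": 0}

a = counter["n"]          # client A reads 0
b = counter["n"]          # client B reads 0 before A writes back
counter["n"] = a + 1      # A writes 1
counter["n"] = b + 1      # B overwrites with 1 -- one increment is lost

lost_update_result = counter["n"]   # 1, not 2

# An INCR-style atomic operation leaves no window between read and write:
counter["n"] = 0
def incr(store, key):
    store[key] += 1       # executed as one step by the single-threaded server
incr(counter, "n")
incr(counter, "n")
atomic_result = counter["n"]        # 2
```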

Redis transaction

Redis transactions are implemented with four primitives: MULTI, EXEC, DISCARD, and WATCH. Redis serializes all the commands in a transaction and then executes them in order.
1. Redis does not support rollback: "Redis does not roll back when a command in a transaction fails, but continues executing the remaining commands", which keeps the internals of Redis simple and fast.
2. If a command fails while being queued (for example, a syntax error), none of the commands are executed;
3. If a command fails during execution, the other, correct commands are still executed.

1) The MULTI command is used to start a transaction, and it always returns OK. After MULTI is executed, the client can continue to send any number of commands to the server. These commands will not be executed immediately, but will be placed in a queue. When the EXEC command is called, all the commands in the queue will be executed.
2) EXEC: executes all commands in the transaction block and returns their results, arranged in the order the commands were issued. If the transaction was aborted, it returns the null value nil.
3) By calling DISCARD, the client can clear the transaction queue and give up executing the transaction, and the client will exit from the transaction state.
4) The WATCH command provides check-and-set (CAS) behavior for Redis transactions. One or more keys can be watched; if any of them is modified (or deleted) before EXEC, the transaction will not execute. Monitoring lasts until the EXEC command.
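The WATCH/EXEC interaction can be sketched with version counters (a simulation of the optimistic check, not the real wire protocol; all names are illustrative):

```python
class MiniStore:
    """Simulates WATCH + MULTI/EXEC: EXEC aborts if a watched key changed."""

    def __init__(self):
        self.data = {}
        self.version = {}          # key -> how many times it has been set

    def set(self, key, value):
        self.data[key] = value
        self.version[key] = self.version.get(key, 0) + 1

    def watch(self, key):
        return self.version.get(key, 0)      # remember the version we saw

    def exec_if_unchanged(self, key, seen, commands):
        if self.version.get(key, 0) != seen:  # someone touched the key
            return None                       # EXEC returns nil: aborted
        for k, v in commands:                 # run the queued commands
            self.set(k, v)
        return "OK"
```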

Redis implements distributed locks

Redis runs in a single process with a single thread for command execution, using a queue to turn concurrent access into serial access, so multiple client connections do not race against each other inside Redis. A distributed lock can be implemented with the SETNX command:
SETNX sets the key to the value if and only if the key does not exist; if the given key already exists, SETNX does nothing.
Unlock: release the lock with the DEL command.
Avoiding deadlock:
1) Give the lock a maximum holding time with EXPIRE; if the holder exceeds it, Redis releases the lock for us.
2) Alternatively, combine setnx key "current time + lock timeout" with getset key "current time + lock timeout" so that a stale lock can be detected and taken over.
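A sketch of the lock flow, with an in-memory dict standing in for Redis (a real implementation would use SET key token NX EX ttl in one command, and a Lua script for the safe release; the helper names here are made up for illustration):

```python
import time
import uuid

locks = {}  # name -> (token, expires_at); stand-in for keys in Redis

def acquire(name, ttl=10.0):
    """Like SET name token NX EX ttl: succeed only if absent or expired."""
    now = time.time()
    held = locks.get(name)
    if held and held[1] > now:
        return None                       # someone else holds the lock
    token = str(uuid.uuid4())             # unique owner token
    locks[name] = (token, now + ttl)      # the TTL prevents deadlock on crash
    return token

def release(name, token):
    """Delete only if we still own it (real Redis does this in a Lua script
    so the check and the delete happen atomically)."""
    held = locks.get(name)
    if held and held[0] == token:
        del locks[name]
        return True
    return False
```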

That is the whole content of this article. I hope it helps with your study and interview preparation.


Origin blog.csdn.net/SpringBoot_/article/details/109210842