A Thorough Guide to Redis Interview Questions

40 classic Redis interview questions — more than enough to prepare with!

What is Redis?

Redis (Remote Dictionary Server) is a high-performance, non-relational key-value database written in C. Unlike traditional databases, Redis stores its data in memory, so reads and writes are very fast, which is why it is widely used as a cache. Redis can also write data to disk, which keeps the data safe from loss, and Redis operations are atomic.

Redis advantages and disadvantages?

Advantages :

  1. Based on in-memory operations ; memory reads and writes are fast.
  2. Supports multiple data types , including String, Hash, List, Set, ZSet, etc.
  3. Supports persistence . Redis offers two persistence mechanisms, RDB and AOF, which effectively guard against data loss.
  4. Supports transactions . Individual Redis operations are atomic, and Redis can also group several commands together and execute them in sequence.
  5. Supports master-slave replication . The master node automatically synchronizes data to slave nodes, enabling read/write splitting.
  6. Command processing is single-threaded . Redis 6.0 introduced multi-threading, but note that the extra threads handle network I/O and protocol parsing ; command execution is still single-threaded.

Disadvantages :

  1. Support for structured queries is relatively weak.
  2. Database capacity is limited by physical memory, so Redis is not suitable for high-performance reads and writes of massive data; its suitable scenarios are mainly operations on smaller data sets.
  3. Online scaling is difficult: once cluster capacity reaches its upper limit, expanding online becomes very complicated.

Why is Redis so fast?

  • Memory-based : Redis stores data in memory, avoiding the overhead of disk I/O, so reads and writes are fast.
  • I/O multiplexing model : Redis uses I/O multiplexing. A single thread polls descriptors and turns socket activity into events, so little time is wasted on network I/O.
  • Efficient data structures : the underlying encoding of each data type is optimized in pursuit of speed.
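
The single-threaded event-loop idea behind the I/O multiplexing point can be sketched in a few lines with Python's `selectors` module. This is a toy echo handler for illustration only, not Redis source code:

```python
import selectors
import socket

# One selector watches several sockets; a single thread polls it and
# dispatches whatever handler was registered for each ready socket.
sel = selectors.DefaultSelector()

def handle_readable(conn: socket.socket) -> None:
    data = conn.recv(1024)          # the socket is ready, so this won't block
    if data:
        conn.sendall(data.upper())  # stand-in for "execute the command"

# socketpair() gives two connected sockets so the demo is self-contained
client, server = socket.socketpair()
sel.register(server, selectors.EVENT_READ, handle_readable)

client.sendall(b"ping")
for key, _ in sel.select(timeout=1):  # poll descriptors for events
    key.data(key.fileobj)             # dispatch to the registered handler

reply = client.recv(1024)             # b'PING'
```

In real Redis the same loop additionally handles accepts, writes, and timers, but the dispatch structure is the same.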


Since Redis is so fast, why not use it as the main database and only use it as a cache?

Although Redis is very fast, it has some limitations and cannot completely replace the main database. There are the following reasons:

**Transaction processing:** Redis only supports simple transactions and cannot handle complex transactional logic (for example, it has no rollback).

**Data persistence:** Redis is an in-memory database; data lives in memory, so if the server crashes or loses power, data may be lost. Redis does provide persistence mechanisms, but they have limitations.

**Data processing:** Redis only supports relatively simple data structures, such as strings, lists, and hash tables. If you need to work with complex structures, such as the tables of a relational database, Redis may not be a good choice.

**Data security:** Redis does not provide security mechanisms as rich as a primary database's, such as fine-grained user authentication and access control.

So while Redis is very fast, these limitations mean it cannot completely replace a primary database. Using Redis as a cache, however, is a great way to improve application performance and reduce database load.

Tell me about the threading model of Redis?

Redis built a network event processor on top of the Reactor pattern, called the file event handler. It consists of four parts: multiple sockets, the I/O multiplexing program, the file event dispatcher, and the event handlers. Because the file event dispatcher's queue is consumed by a single thread, Redis is described as a single-threaded model.

  • The file event handler uses the I/O multiplexing program to listen on multiple sockets at once, and associates different event handlers with each socket according to the task the socket is currently performing.
  • When a monitored socket becomes ready for an accept, read, write, or close operation, the corresponding file event is generated, and the file event dispatcher invokes the handlers previously associated with that socket to process the event.

Although the file event handler runs on a single thread, by using the I/O multiplexing program to listen on many sockets it achieves a high-performance network communication model, while interfacing cleanly with the other modules in the Redis server that also run single-threaded. This preserves the simplicity of Redis's single-threaded internal design.

What are the application scenarios of Redis?

  1. Cache hot data to relieve the pressure on the database.
  2. Using the atomic self-increment operation of Redis, the counter function can be realized, such as counting the number of user likes and user visits.
  3. Distributed locks . In a distributed scenario, single-machine locks cannot synchronize processes across multiple nodes. Redis's SETNX command (or SET with the NX option) can implement a distributed lock; there is also the officially described Redlock algorithm.
  4. Simple message queue , you can use Redis's own publish/subscribe mode or List to implement simple message queue and realize asynchronous operation.
  5. Rate limiter , which can restrict how frequently a user accesses a given interface, for example in flash-sale (seckill) scenarios, to prevent rapid repeated clicks from creating unnecessary load.
  6. Friend relationship , use some commands of the set, such as intersection, union, difference, etc., to realize functions such as mutual friends and common hobbies.
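
The lock pattern from item 3 can be sketched with a toy in-memory stand-in for `SET key value NX PX`. This is only a sketch of the semantics: a real deployment would use a Redis client (and typically a Lua script for safe release), and the key and variable names here are invented:

```python
import time
import uuid

# Toy stand-in for Redis: key -> (value, expire_at). Illustrative only.
store = {}

def set_nx_px(key, value, px_ms):
    """Set key only if absent or expired (mimics SET ... NX PX)."""
    now = time.monotonic()
    cur = store.get(key)
    if cur and cur[1] > now:        # key exists and has not expired
        return False
    store[key] = (value, now + px_ms / 1000)
    return True

def release(key, token):
    """Release only if we still own the lock (compare-then-delete)."""
    cur = store.get(key)
    if cur and cur[0] == token:
        del store[key]
        return True
    return False

token = uuid.uuid4().hex                            # unique owner token
acquired = set_nx_px("lock:order", token, 30_000)   # first caller wins
blocked = set_nx_px("lock:order", "other", 30_000)  # second caller fails
released = release("lock:order", token)             # owner releases
```

Against a real Redis, the compare-then-delete in `release` must be a single Lua script, because two separate commands would not be atomic.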

The difference between Memcached and Redis?

  1. MemCached has a single data structure and is only used to cache data, while Redis supports multiple data types .
  2. MemCached does not support data persistence, and the data will disappear after restarting. Redis supports data persistence .
  3. Redis provides master-slave synchronization mechanism and cluster cluster deployment capability , which can provide high availability services. Memcached does not provide a native cluster mode, and needs to rely on the client to write data into cluster fragments.
  4. Redis is much faster than Memcached.
  5. Redis uses a single-threaded multi-channel IO multiplexing model , and Memcached uses a multi-threaded non-blocking IO model. (Redis 6.0 introduces multi-threaded IO to handle network data reading and writing and protocol analysis , but command execution is still single-threaded)
  6. The maximum value size differs: a single Redis value can be up to 512 MB, while Memcached values are limited to 1 MB.


Why use Redis instead of map/guava for caching?

A local cache built with the JVM's own map or Guava is lightweight and fast, and its lifecycle ends when the JVM is destroyed. With multiple application instances, however, each instance must keep its own copy of the cache, and the copies are not consistent with each other.

Using Redis or Memcached is called a distributed cache : with multiple instances, all instances share one copy of the cached data, so the cache stays consistent.

What are the Redis data types?

Basic data types :

1. String : The most commonly used data type. The value of the String type can be a string, number or binary, but the maximum value cannot exceed 512MB.

2. Hash : Hash is a collection of key-value pairs.

3. Set : An unordered and deduplicated collection. Set provides methods such as intersection and union, which are especially convenient for realizing functions such as mutual friends and mutual attention.

4. List : An ordered collection that allows duplicates, historically backed by a doubly linked list (modern Redis uses a quicklist).

5. SortedSet : An ordered Set. Ordering is achieved by maintaining a score for each member internally. Suitable for scenarios such as leaderboards and weighted message queues.

Special data types :

1. Bitmap : A bitmap can be considered as an array in units of bits. Each unit in the array can only store 0 or 1. The subscript of the array is called an offset in Bitmap. The length of the Bitmap has nothing to do with the number of elements in the collection, but with the upper limit of the cardinality.

2. Hyperloglog . HyperLogLog is an algorithm for cardinality statistics. Its advantage is that when the number or volume of input elements is very, very large, the space required to calculate the cardinality is always fixed and small. A typical usage scenario is to count unique visitors.

3. Geospatial : It is mainly used to store geographical location information and operate on the stored information, applicable scenarios such as positioning, nearby people, etc.
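
The Bitmap semantics described above (SETBIT/BITCOUNT on a growable byte array) can be sketched in a few lines. This is illustrative only, not the Redis implementation, and the sign-in scenario is made up:

```python
def setbit(buf: bytearray, offset: int, value: int) -> None:
    """Set the bit at `offset` to 0 or 1, growing the buffer as needed."""
    byte, bit = divmod(offset, 8)
    if byte >= len(buf):
        buf.extend(b"\x00" * (byte - len(buf) + 1))  # grow like Redis does
    mask = 1 << (7 - bit)            # Redis numbers bits from the MSB
    if value:
        buf[byte] |= mask
    else:
        buf[byte] &= ~mask

def bitcount(buf: bytearray) -> int:
    """Count set bits (what BITCOUNT does)."""
    return sum(bin(b).count("1") for b in buf)

# Track a user's sign-in days: offset = day number, 1 bit per day.
signins = bytearray()
for day in (0, 2, 6):
    setbit(signins, day, 1)
days_active = bitcount(signins)   # 3
```

Note how the space used depends only on the highest offset (here one byte covers a week), not on how many bits are set — exactly the "cardinality upper limit" point above.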

Similarities and differences between SortedSet and List?

Similarities :

  1. Both are ordered;
  2. Both can retrieve all elements within a given range.

Differences :

  1. A list is implemented as a linked list: access at either end is fast, but access to middle elements is slow;
  2. A sorted set is implemented with a hash table plus a skip list, so accessing a middle element takes O(log N);
  3. A list cannot easily move an element to a new position, while a sorted set can (by changing the element's score);
  4. Sorted sets consume more memory.

What happens when Redis runs out of memory?

If the set upper limit is reached, the Redis write command will return an error message (but the read command can still return normally).

You can also configure a memory eviction policy: when Redis reaches the memory limit, it evicts old data according to that policy.

How does Redis do memory optimization?

Make good use of collection types such as Hash, List, Sorted Set, and Set, because many small key-value pairs can usually be stored together more compactly. Use hashes wherever possible: a small hash (one with few fields) uses very little memory, so model your data as hashes when you can. For example, for a user object in a web system, don't create a separate key for the user's name, surname, email, and password; store all of the user's fields in a single hash.
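
For example, rather than one key per field:

127.0.0.1:6379> set user:1:name dabin
OK
127.0.0.1:6379> set user:1:email dabin@example.com
OK

store the whole object in one hash (a sketch — HSET with multiple field-value pairs requires Redis 4.0+, and the key and field names here are invented):

127.0.0.1:6379> hset user:1 name dabin email dabin@example.com
(integer) 2

While the hash stays small, Redis keeps it in a compact encoding (ziplist/listpack), which is where the memory savings come from.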

Problems with the keys command?

Redis command processing is single-threaded: the KEYS command blocks the thread for as long as it runs, and the server cannot serve other requests until it finishes. SCAN solves the blocking problem that KEYS can cause by traversing progressively: each SCAN call does only a small, bounded amount of work, but reproducing everything KEYS returns requires multiple SCAN calls.

Drawbacks of SCAN: if keys change (are added, deleted, or modified) during the scan, the traversal may miss newly added keys and may return duplicate keys. In other words, SCAN does not guarantee a perfectly complete traversal of all keys.
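
The call pattern can be sketched with a toy in-memory stand-in. This mimics only the cursor loop, not Redis's real reverse-binary cursor or its traversal guarantees; the key names are made up:

```python
def scan(db: dict, cursor: int, count: int = 2):
    """Return at most `count` keys plus the next cursor (toy SCAN)."""
    keys = sorted(db)                 # stable order stands in for the hash table
    batch = keys[cursor:cursor + count]
    next_cursor = cursor + len(batch)
    if next_cursor >= len(keys):
        next_cursor = 0               # cursor 0 means the traversal is complete
    return next_cursor, batch

db = {f"user:{i}": i for i in range(5)}

cursor, seen = 0, []
while True:
    cursor, batch = scan(db, cursor)  # each call does a small bounded amount of work
    seen.extend(batch)
    if cursor == 0:
        break
```

Each round costs O(count) instead of O(total keys), which is why the server stays responsive between calls.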

Redis transaction

The principle of a transaction is to send several commands within a transaction range to Redis, and then let Redis execute these commands in turn.

Transaction life cycle:

  1. A transaction is opened with MULTI.

  2. While the transaction is open, each command is pushed onto a queue rather than executed immediately.

  3. The EXEC command commits the transaction.

An error in one command within the transaction does not affect the execution of the other commands, and atomicity is not guaranteed:

127.0.0.1:6379> multi
OK
127.0.0.1:6379> set a 1
QUEUED
127.0.0.1:6379> set b 1 2
QUEUED
127.0.0.1:6379> set c 3
QUEUED
127.0.0.1:6379> exec
1) OK
2) (error) ERR syntax error
3) OK

WATCH command

The WATCH command can monitor one or more keys. Once any of them is modified, the subsequent transaction will not be executed (similar to optimistic locking). After the EXEC command runs, the monitoring is cancelled automatically.

127.0.0.1:6379> watch name
OK
127.0.0.1:6379> set name 1
OK
127.0.0.1:6379> multi
OK
127.0.0.1:6379> set name 2
QUEUED
127.0.0.1:6379> set gender 1
QUEUED
127.0.0.1:6379> exec
(nil)
127.0.0.1:6379> get gender
(nil)

For example, in the transcript above:

  1. watch name starts monitoring the key name;
  2. the value of name is then modified;
  3. a transaction is opened;
  4. the values of name and gender are set inside the transaction;
  5. the EXEC command commits the transaction;
  6. get gender shows the key does not exist, i.e. the transaction was not executed.

UNWATCH cancels the monitoring that WATCH placed on keys; all monitor locks are released.

Does Redis transaction support isolation?

Redis is a single-process program, and it guarantees that a transaction is not interrupted while executing: the transaction runs until every command in its queue has been executed. Redis transactions therefore always run in isolation.

Does Redis transaction guarantee atomicity and support rollback?

A single Redis command is executed atomically, but transactions do not guarantee atomicity and there is no rollback. If any command in the transaction fails to execute, the rest of the commands will still be executed.

Persistence mechanism

Persistence is to write the data in the memory to the disk to prevent the loss of memory data caused by service downtime.

Redis supports two persistence methods: RDB and AOF . The former periodically stores the in-memory data on disk according to configured rules , while the latter logs each write command after it is executed . The two are usually used in combination.

RDB way

RDB is Redis's default persistence scheme. When RDB persistence runs, the in-memory data is written to disk, generating a dump.rdb file in the configured directory. On restart, Redis loads dump.rdb to restore the data.

bgsave is the mainstream way to trigger RDB persistence. Its execution flow is as follows:

(Figure: RDB persistence workflow)

  • The BGSAVE command is executed.
  • The Redis parent process checks whether a child process is already running; if so, BGSAVE returns immediately.
  • The parent process forks a child process ; the parent blocks while the fork operation runs.
  • Once the fork completes, the parent resumes receiving and handling client requests , while the child starts writing the in-memory data to a temporary file on disk ;
  • When the child has written all the data, it replaces the old RDB file with the temporary file .

When Redis starts, it reads the RDB snapshot file and loads the data from disk into memory. With RDB persistence alone, if Redis exits abnormally, all changes made after the most recent snapshot are lost.

The way to trigger RDB persistence:

  1. Manual trigger : the user runs the SAVE or BGSAVE command. SAVE blocks all client requests while the snapshot is taken, so you should avoid it in production. BGSAVE takes the snapshot asynchronously in the background, so the server keeps responding to clients; it is the recommended way to take a snapshot manually.

  2. Passive trigger :

    • Automatic snapshots according to configured rules, e.g. save 100 10: if at least 10 keys are modified within 100 seconds, a snapshot is taken.
    • When a slave node performs a full copy, the master automatically runs BGSAVE to generate an RDB file and sends it to the slave.
    • By default, when the SHUTDOWN command runs and AOF persistence is not enabled, BGSAVE is executed automatically.

Advantages :

  1. Redis loads RDB to restore data much faster than AOF .
  2. Use a separate child process for persistence, and the main process will not perform any IO operations, ensuring the high performance of Redis .

Disadvantages :

  1. RDB cannot persist data in real time , because every BGSAVE must fork a child process — a heavyweight operation that is too costly to run frequently.
  2. RDB files are saved in a version-specific binary format. Across Redis upgrades there are multiple RDB format versions, and an older Redis cannot read an RDB file produced by a newer version .

AOF method

AOF (append only file) persistence: Record each write command in an independent log, and when Redis restarts, it will re-execute the commands in the AOF file to restore data. The main function of AOF is to solve the real-time nature of data persistence . AOF is the mainstream way of Redis persistence.

By default, Redis does not enable AOF persistence; it can be enabled with the appendonly parameter: appendonly yes. Once AOF persistence is enabled, every time a write command executes, Redis writes the command into the aof_buf buffer, and the AOF buffer is synced to disk according to the configured policy.

By default, the operating system performs a sync roughly every 30 seconds . To avoid losing buffered data, Redis can explicitly ask the system to sync the buffer to disk after writing to the AOF file. The sync timing is controlled by the appendfsync parameter.

appendfsync always   # fsync after every write to the AOF file: safest but slowest, not recommended
appendfsync everysec # balances safety and performance, recommended
appendfsync no       # let the operating system decide when to sync

Next, look at the AOF persistence execution process:

(Figure: AOF persistence workflow)

  1. All write commands are appended to the AOF buffer.
  2. The AOF buffer is synchronized to the hard disk according to the corresponding strategy.
  3. As the AOF file becomes larger and larger, it is necessary to periodically rewrite the AOF file to achieve the purpose of compressing the file size. AOF file rewriting is the process of converting the data in the Redis process into write commands and synchronizing them to the new AOF file.
  4. When the Redis server is restarted, the AOF file can be loaded for data recovery.

Advantages :

  1. AOF protects data better against loss. You can configure AOF to perform an fsync once per second , so if the Redis process dies, you lose at most one second of data.
  2. AOF files are written in append-only mode, so there is no disk-seek overhead and write performance is very high.

Disadvantages :

  1. For the same dataset, the AOF file is larger than the RDB data snapshot.
  2. Data recovery is relatively slow.

How to choose RDB and AOF?

Generally speaking, two persistence schemes should be used at the same time to ensure data security.

  • Persistence can be turned off if the data is not sensitive and can be regenerated from elsewhere.
  • If the data is important and can withstand a few minutes of data loss, such as caching, you only need to use RDB.
  • If Redis is used as an in-memory database and persistence is needed, it is recommended to enable both RDB and AOF.
  • If only AOF is used, the configuration option of everysec is preferred because it strikes a balance between reliability and performance.

When both RDB and AOF are enabled, Redis will give priority to using AOF to restore data, because the files saved by AOF are more complete than RDB files.

What are the deployment options for Redis?

Stand-alone : deployed on a single machine. A single-machine Redis can typically handle on the order of tens of thousands of QPS. This deployment method is rarely used. Problems: 1. memory capacity is limited; 2. processing power is limited; 3. no high availability.

Master-slave mode : one master and multiple slaves. The master handles writes and replicates the data to the slave nodes; the slaves handle reads, and all read requests go to them. This makes horizontal scaling easy and supports high read concurrency. However, when the master goes down, a new master must be designated manually, so availability is poor and this mode is rarely used on its own.

Sentinel mode : master-slave replication cannot fail over automatically and therefore cannot provide high availability; Sentinel mode solves these problems by switching master and slave nodes automatically. When the master goes down, the sentinel processes elect a new master, giving high availability. However, every node stores the same data, which wastes memory. It suits deployments where the data volume and cluster size are not large but automatic fault tolerance and disaster recovery are required.

Redis Cluster : server-side sharding, officially available since version 3.0. Redis Cluster does not use consistent hashing; instead it uses the concept of slots, 16384 in total. A request can be sent to any node, and the receiving node routes it to the node that actually owns the key. It targets scenarios with massive data, high concurrency, and high availability: if you have a large amount of data, Redis Cluster is the recommended deployment. The sum of all master nodes' capacity is the total data capacity the cluster can cache.

master-slave architecture

A single-machine Redis can carry roughly tens of thousands of QPS. A cache is generally used to support high read concurrency, so the architecture is made master-slave: one master and many slaves, with the master handling writes and replicating data to the slave nodes, and the slaves handling reads. All read requests go to the slave nodes, which makes horizontal scaling easy and supports high read concurrency.

The replication function of Redis supports data synchronization between multiple databases. The master database can perform read and write operations, and when the data in the master database changes, it will automatically synchronize the data to the slave database. The slave database is generally read-only, and it will receive data synchronized from the master database. A master database can have multiple slave databases, and a slave database can only have one master database.

The principle of master-slave replication?

  1. When a slave node starts, it sends a PSYNC command to the master node;
  2. If this is the slave's first connection to the master, a full resynchronization is triggered: the master starts a background process to generate an RDB snapshot file;
  3. Meanwhile, all write commands newly received from clients are buffered in memory. Once the RDB file is generated, the master sends it to the slave; the slave first writes the RDB file to local disk and then loads it from disk into memory ;
  4. The master then sends the write commands buffered in memory to the slave, and the slave applies them to synchronize the data;
  5. If the network between slave and master fails and the connection drops, the slave reconnects automatically; after reconnecting, the master synchronizes only the missing portion of the data to the slave.

Sentinel

Master-slave replication cannot fail over automatically and therefore cannot achieve high availability. Sentinel mode solves these problems: the sentinel mechanism switches master and slave nodes automatically.

When the client connects to Redis, it first connects to the sentinel, and the sentinel will tell the client the address of the Redis master node, and then the client connects to Redis and performs subsequent operations. When the master node is down, Sentinel detects that the master node is down, and will re-elect a slave node with good performance to become the new master node, and then notify other slave servers through the publish-subscribe mode to let them switch hosts.

working principle

  • Each Sentinel sends a PING command once per second to every Master, Slave, and other Sentinel instance it knows about.
  • If the time since an instance last gave a valid reply to PING exceeds the configured threshold, the Sentinel marks that instance as subjectively down.
  • If a Master is marked as subjectively down, all Sentinels monitoring that Master confirm once per second whether it has really entered the subjectively-down state.
  • When a sufficient number of Sentinels (at least the quorum specified in the configuration file) confirm within the specified time window that the Master is indeed down, the Master is marked as objectively down. If not enough Sentinels agree, the objectively-down status is lifted; and if the Master returns valid replies to a Sentinel's PING again, its subjectively-down status is removed.
  • The Sentinel nodes elect a Sentinel leader to carry out the failover.
  • The Sentinel leader selects a well-behaved slave node to become the new master node, and then notifies the other slave nodes to update their master information.

Redis cluster

Sentinel mode solves the problem that the master-slave replication cannot automatically failover and cannot achieve high availability, but there is still the problem that the writing ability and capacity of the master node are limited by the stand-alone configuration. The cluster mode realizes the distributed storage of Redis, and each node stores different content, which solves the problem that the writing ability and capacity of the master node are limited by the single-machine configuration.

A Redis Cluster requires at least 6 nodes (3 masters and 3 slaves). The master nodes serve read and write operations; the slave nodes act as backups, serve no requests, and are used only for failover.

Redis cluster uses virtual slot partitioning , all keys are mapped to 0 to 16383 integer slots according to the hash function, and each node is responsible for maintaining a part of the slots and the key-value data mapped to the slots.

working principle:

  1. Data is sharded by hashing: each node stores the data for a range of hash slots, 16384 slots in total by default.
  2. Each data shard is stored on a group of master-slave nodes.
  3. Data is written to the master node first and then synchronized to the slave nodes (blocking synchronization can be configured).
  4. Data across the nodes of the same shard is not guaranteed to be consistent (replication is asynchronous).
  5. On reads, if the key the client operates on is not allocated to the contacted node, Redis returns a redirection (MOVED/ASK) pointing to the correct node.
  6. During scale-out, part of the data must be migrated from old nodes to the new node.

Under the Redis Cluster architecture, each Redis instance exposes two ports: the client port, e.g. 6379, and that port plus 10000, e.g. 16379.

Port 16379 is used for inter-node communication, i.e. the cluster bus, which handles failure detection, configuration updates, and failover authorization. The cluster bus uses a separate binary protocol (gossip), designed for efficient data exchange between nodes with less network bandwidth and processing time.

Advantages:

  • Centerless architecture that supports dynamic scaling ;
  • Data is distributed across nodes by slot , nodes share data, and the data distribution can be adjusted dynamically ;
  • High availability : the cluster remains usable when some nodes fail. Cluster mode supports automatic failover: nodes exchange state via the gossip protocol, and a voting mechanism promotes a Slave to Master.

Disadvantages:

  • Batch operations (such as pipelines) are poorly supported.
  • Data is replicated asynchronously, so strong data consistency is not guaranteed.
  • Transaction support is limited: only transactions whose keys all reside on the same node work; when keys are spread across nodes, the transaction feature cannot be used.
  • The key is the minimum granularity of data partitioning, so a single large key-value object such as a hash or list cannot be split across nodes.
  • Multiple database spaces are not supported : a standalone Redis supports up to 16 databases, while cluster mode can only use database 0.

What are the hash partition algorithms?

Modulo partitioning. Use a specific piece of data, such as a Redis key or user ID, and take the remainder against the number of nodes N: hash(key) % N determines which node the data maps to. Its advantage is simplicity. When scaling out, capacity is usually doubled so that not all data mappings are disrupted, which would otherwise force a full migration.

Consistent hash partitioning. Each node in the system is assigned a token, generally in the range 0 to 2^32, and the tokens form a hash ring. To locate the node for a read or write, first compute the hash of the key, then walk clockwise to the first node whose token is greater than or equal to that hash. Compared with modulo partitioning, the biggest advantage is that adding or removing a node only affects its neighbors on the ring and has no effect on the other nodes.

Virtual slot partitioning. All keys are mapped to integer slots 0 to 16383 by a hash function, using the formula slot = CRC16(key) & 16383. Each node maintains a portion of the slots and the key-value data mapped to them. Redis Cluster uses virtual slot partitioning.
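
The virtual-slot formula can be reproduced directly. The sketch below implements CRC16-CCITT (XMODEM), the variant the Redis Cluster specification uses; it omits the hash-tag handling ({...} extraction) that real Redis also applies, and the example key name is made up:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0x0000."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 cluster slots."""
    return crc16(key) & 16383   # equivalent to % 16384 since 16384 is a power of 2

check = crc16(b"123456789")     # 0x31C3, the spec's reference value
slot = key_slot(b"user:1000")
```

Every client and node computes the same slot for the same key, which is what lets any node route (or redirect) a request to the owner of that slot.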

Deletion strategy for expired keys?

1. Passive deletion . When a key is accessed and found to have expired, it is deleted.

2. Active deletion . Keys are cleaned up periodically. Each cleanup pass iterates over the DBs in turn, randomly samples 20 keys from a db, and deletes the expired ones. If 5 or more of the sampled keys were expired, the same db is scanned again; otherwise the pass moves on to the next db.

3. Cleanup when memory is insufficient . Redis has a maximum memory limit, configurable via the maxmemory parameter. When used memory exceeds this limit, memory must be released, and it is cleaned up according to the configured eviction strategy.
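The active-deletion sampling loop in point 2 can be sketched as follows, with an in-memory map of key -> expiry timestamp standing in for one Redis db (the names and fixed random seed are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Simplified sketch of one periodic expiration pass over a db: sample up to
// 20 keys that carry a TTL, delete the expired ones, and rescan the same db
// while 5 or more of the sample (i.e. >= 25%) were expired.
public class ExpireCycle {
    public static void cleanDb(Map<String, Long> expiresAt, long now) {
        Random rnd = new Random(42); // fixed seed, for illustration only
        while (true) {
            List<String> keys = new ArrayList<>(expiresAt.keySet());
            if (keys.isEmpty()) return;
            Collections.shuffle(keys, rnd);
            int expired = 0;
            for (String k : keys.subList(0, Math.min(20, keys.size()))) {
                if (expiresAt.get(k) <= now) {
                    expiresAt.remove(k);
                    expired++;
                }
            }
            if (expired < 5) return; // fewer than 5 of 20 expired: stop scanning this db
        }
    }
}
```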

What are the memory elimination strategies?

When the memory of Redis exceeds the maximum allowed memory, Redis will trigger the memory elimination strategy to delete some infrequently used data to ensure the normal operation of the Redis server.

Before Redis 4.0, 6 data eviction strategies were provided :

  • volatile-lru : LRU (Least Recently Used). Uses the LRU algorithm to remove the least recently used keys from those with an expiration time set
  • allkeys-lru : removes the least recently used keys from the whole dataset when memory is insufficient to hold newly written data
  • volatile-ttl : from the keys with an expiration time set, evicts those that are about to expire soonest
  • volatile-random : randomly evicts keys from those with an expiration time set
  • allkeys-random : randomly evicts keys from the whole dataset
  • noeviction : deleting data is forbidden; when memory cannot accommodate newly written data, new write operations report an error

After Redis 4.0, the following two were added :

  • volatile-lfu : LFU (Least Frequently Used). Evicts the least frequently used keys from those with an expiration time set.
  • allkeys-lfu : evicts the least frequently used keys from the whole dataset when memory is insufficient to hold newly written data.

The eviction policy can be modified through the configuration file ; the corresponding option is maxmemory-policy, and the default is noeviction.
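For example, in redis.conf (the 2gb limit here is purely illustrative):

```
maxmemory 2gb
maxmemory-policy allkeys-lru
```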

How to ensure the data consistency between the cache and the database when double-writing?

1. Delete the cache first and then update the database

When performing an update, delete the cache first, then update the database. When a subsequent request reads the key again, it loads from the database and writes the new data back to the cache.

Problem: after the cache is deleted but before the database is updated, a new read request arriving in this window reads the old value from the database and writes it back to the cache, reintroducing the inconsistency; all subsequent reads then return the old data.

2. Update the database first and then delete the cache

When performing an update, update MySQL first; after it succeeds, delete the cache. Subsequent read requests load from the database and write the new data back to the cache.

Problem: between updating MySQL and deleting the cache, read requests still get the old cached data. Once the database update completes and the cache is deleted, consistency is restored, so the impact is relatively small.

3. Asynchronously update the cache

After the database update completes, the cache is not operated on directly. Instead, the operation command is encapsulated in a message and placed on a message queue, and Redis is updated asynchronously by consuming those messages. The message queue guarantees the ordering of the data operations, keeping the cached data correct.

None of the above solutions is perfect. Evaluate which solution has the least impact for the business at hand, and choose accordingly.
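Solution 2 (update the database first, then delete the cache) can be sketched with in-memory maps standing in for MySQL and Redis (illustrative only; a real implementation talks to actual stores):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside write path: update the source of truth first, then invalidate
// the cached copy so the next read repopulates it with fresh data.
public class CacheAside {
    final Map<String, String> db = new ConcurrentHashMap<>();    // stand-in for MySQL
    final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for Redis

    public void update(String key, String value) {
        db.put(key, value);  // 1. update the database
        cache.remove(key);   // 2. delete the cached copy
    }

    public String read(String key) {
        String v = cache.get(key);
        if (v == null) {           // cache miss: load from DB and write back
            v = db.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }
}
```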

Cache FAQ

cache penetration

Cache penetration refers to querying data that does not exist . Since the cache is only written passively on a miss, and data that cannot be found in the DB is never written to the cache, every request for the nonexistent data goes to the DB for a query, and the cache loses its purpose. Under heavy traffic, the DB may go down.

How to deal with it?

  1. Cache empty values , so the database is not queried again for the same missing key.
  2. Use a Bloom filter: hash all data that could possibly exist into a sufficiently large bitmap; a query for data that does not exist is intercepted by this bitmap, avoiding query pressure on the DB.

The principle of the Bloom filter: when an element is added to the set, it is mapped by K hash functions to K positions in a bit array, and those positions are set to 1. When querying, the element is mapped through the same hash functions to obtain K positions. If any of these positions is 0, the element definitely does not exist and the query returns immediately; if all are 1, the element probably exists, and the query proceeds to Redis and the database.

Bloom filters are generally used to determine whether an element exists in a collection of large amounts of data.
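A minimal version of this structure can be sketched as follows (the class name, sizes, and the double-hashing scheme are illustrative; production filters use tuned m/k values and stronger hashes such as murmur):

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: an m-bit array with k bit positions per element.
public class BloomFilter {
    private final BitSet bits;
    private final int m;
    private final int k;

    public BloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Derive k positions from two base hashes (double hashing).
    private int position(String value, int i) {
        int h1 = value.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(String value) {
        for (int i = 0; i < k; i++) bits.set(position(value, i));
    }

    // false => definitely absent; true => probably present (false positives possible)
    public boolean mightContain(String value) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(value, i))) return false;
        }
        return true;
    }
}
```

Note that elements cannot be removed from a plain Bloom filter, since clearing a bit might also "remove" other elements that share it.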

cache avalanche

A cache avalanche occurs when many cache entries are given the same expiration time and therefore all become invalid at the same moment ; every request is then forwarded to the DB, which goes down under the instantaneous pressure.

Solution:

  1. Add a random value to the original expiration time so expirations are more dispersed. This reduces how often cache entries share the same expiry, making collective invalidation unlikely.
  2. Lock and queue. This buffers the load and prevents a large number of requests from hitting the database at once, but it increases the system's response time , reduces its throughput , and sacrifices part of the user experience. On a cache miss, the requested key is locked so that only one thread queries the database while the other threads wait in line.
  3. Set up a second-level cache, i.e. a layer of cache in addition to Redis itself . When Redis fails, the second-level cache is queried first; for example, a local cache can be queried instead of the database when the Redis cache is unavailable.
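The expiration jitter in point 1 can be sketched as (names and values are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

// Spread cache expirations: base TTL plus a random jitter, so entries
// written at the same moment do not all expire together.
public class TtlJitter {
    public static int ttlWithJitter(int baseSeconds, int maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextInt(maxJitterSeconds + 1);
    }
}
```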

cache breakdown

Cache breakdown: a large number of requests query the same key at the moment that key expires, so they all fall through to the database. Cache breakdown queries a key that has just become invalid in the cache, whereas cache penetration queries a key that does not exist at all.

Solution:

1. Add a mutex . Among concurrent requests, only the first thread acquires the lock and queries the database; the others block until they can read the value the first thread has written into the cache. This can be implemented with a Redis distributed lock; the code is as follows:

public String get(String key) {
    String value = redis.get(key);
    if (value == null) {
        // cached value has expired
        String unique_key = systemId + ":" + key;
        // lock with a 30s timeout
        if (redis.set(unique_key, 1, 'NX', 'PX', 30000) == 1) {
            // lock acquired: load from the DB and write back to the cache
            value = db.get(key);
            redis.set(key, value, expire_secs);
            redis.del(unique_key);
            return value;
        } else {
            // another thread is already loading from the DB and writing back
            // to the cache; sleep briefly, then retry reading the cache
            sleep(50);
            return get(key);  // retry
        }
    } else {
        return value;
    }
}

2. Hotspot data does not expire . Set the cache to never expire, and use a scheduled task to load data asynchronously and refresh the cache. This approach suits extreme scenarios, such as especially heavy traffic. When using it, consider how long the business can tolerate inconsistent data, and handle abnormal cases to ensure the cache is still refreshed on schedule.

Cache Warming

Cache warming means loading relevant cache data into the cache system right after the system goes live. This avoids the pattern of querying the database first and only then caching the data when a user makes a request: users directly hit the cache data that was warmed up in advance.

solution:

  1. Write a cache refresh page directly, and manually operate it when going online;
  2. The amount of data is not large, and it can be loaded automatically when the project starts;
  3. Regularly refresh the cache;

cache downgrade

When the traffic increases sharply, the service has problems (such as slow response time or no response), or non-core services affect the performance of the core process, it is still necessary to ensure that the service is still available, even if the service is damaged. The system can perform automatic downgrade based on some key data, or configure switches to achieve manual downgrade.

The ultimate goal of cache downgrading is to keep core services available, even if in a degraded form. Note that some services cannot be downgraded (such as adding to the shopping cart, or checkout).

Before downgrading, sort through the system to determine whether it can afford to shed load, identifying which services must be protected at all costs and which can be downgraded. For example, referring to the log-level scheme:

  1. General: For example, if some services occasionally time out due to network jitter or the service is going online, they can be automatically downgraded;
  2. Warning: For some services, the success rate fluctuates within a period of time (for example, between 95% and 100%), which can be downgraded automatically or manually, and an alarm will be sent;
  3. Error: For example, if the availability rate is lower than 90%, or the database connection pool is blown up, or the traffic suddenly increases to the maximum threshold that the system can bear, it can be downgraded automatically or manually according to the situation;
  4. Serious error: For example, the data is wrong due to special reasons, and an emergency manual downgrade is required at this time.

The purpose of service downgrade is to prevent the failure of Redis service, which will cause the avalanche problem of the database. Therefore, for unimportant cached data, a service downgrade strategy can be adopted. For example, a common practice is that when Redis encounters a problem, instead of querying the database, it directly returns the default value to the user.

How does Redis implement message queue?

Use the list type to store messages: rpush produces messages and lpop consumes them. When lpop returns nothing, sleep for a while and then check again for messages. To avoid sleeping, use blpop instead: when there is no message, it blocks until one arrives.

BLPOP queue 0  // 0 means wait indefinitely

The BLPOP and LPOP commands are similar, the only difference is that when the list has no elements, the BLPOP command will block the connection until a new element is added.

Redis can implement one producer with multiple consumers through the pub/sub topic subscription mode . It has a notable shortcoming: messages produced while a consumer is offline are lost.

PUBLISH channel1 hi
SUBSCRIBE channel1
UNSUBSCRIBE channel1 // unsubscribe from a channel subscribed to via the SUBSCRIBE command

PSUBSCRIBE channel?* subscribes according to a pattern.
PUNSUBSCRIBE channel?* unsubscribes from channels subscribed to via the PSUBSCRIBE command. Note that unsubscription requires an exact string match on the pattern: PUNSUBSCRIBE * cannot unsubscribe channel?*.

How Redis implements a delayed queue

Use a sorted set with the timestamp as the score and the message content as the member. Call zadd to produce a message; the consumer polls with the zrangebyscore command for messages whose score is already due (i.e., whose timestamp has passed) and processes them.
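For example (the queue name, member, and timestamps below are purely illustrative):

```
ZADD delay-queue 1700000300 "send-email:42"   // score = delivery timestamp
ZRANGEBYSCORE delay-queue 0 1700000400        // fetch all messages due by "now"
```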

The role of the pipeline?

The Redis client executes a command in four steps: sending the command, queuing the command, executing the command, and returning the result. A pipeline can be used to send requests in batches and return results in batches, which is faster than executing commands one by one.

The number of commands assembled into a pipeline should not be too large; otherwise the data volume becomes too big, increasing the client's wait time and possibly causing network congestion. A large batch of commands can be split into several smaller pipelines.

Comparison of native batch commands (mset and mget) with a pipeline:

  1. Native batch commands are atomic, while a pipeline is non-atomic . If a pipeline exits abnormally partway through, the commands that already executed successfully are not rolled back .

  2. A native batch command is a single command operating on multiple keys, whereas a pipeline supports multiple different commands .

LUA script

Redis builds atomic commands through Lua scripts: while a Lua script is running, no other script or Redis command is executed, so a combination of commands is executed atomically.

There are two ways to execute Lua scripts in Redis: eval and evalsha. The eval command evaluates Lua scripts using the built-in Lua interpreter.

// the first argument is the Lua script, the second is the number of key-name arguments, and the rest are the key names and additional arguments
> eval "return {KEYS[1],KEYS[2],ARGV[1],ARGV[2]}" 2 key1 key2 first second
1) "key1"
2) "key2"
3) "first"
4) "second"

Lua script function

1. Lua scripts are executed atomically in Redis, and no other commands will be inserted during the execution process.

2. Lua scripts can package multiple commands at one time, effectively reducing network overhead.

Application Scenario

Example: Limit interface access frequency.

A key-value pair recording interface access counts is maintained in Redis, where the key is the interface name and the value is the access count. Each time an interface is accessed, the following actions are performed:

  • Intercept the interface's requests with AOP and count them. Each time a request comes in, increment the corresponding interface's access count count by 1 and store it in Redis.
  • If it is the first request, set count=1 and set an expiration time. Because the combined set() and expire() operation here is not atomic, a Lua script is introduced to make it atomic and avoid concurrent-access problems.
  • If the maximum number of accesses is exceeded within the given time window, an exception is thrown.

private String buildLuaScript() {
    return "local c" +
        "\nc = redis.call('get',KEYS[1])" +
        "\nif c and tonumber(c) > tonumber(ARGV[1]) then" +
        "\nreturn c;" +
        "\nend" +
        "\nc = redis.call('incr',KEYS[1])" +
        "\nif tonumber(c) == 1 then" +
        "\nredis.call('expire',KEYS[1],ARGV[2])" +
        "\nend" +
        "\nreturn c;";
}

String luaScript = buildLuaScript();
RedisScript<Number> redisScript = new DefaultRedisScript<>(luaScript, Number.class);
Number count = redisTemplate.execute(redisScript, keys, limit.count(), limit.period());

PS: This kind of interface rate limiting is relatively simple and has many problems, so it is generally not used in practice; the token bucket algorithm and the leaky bucket algorithm are mostly used for interface rate limiting instead.
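A minimal token-bucket sketch, to show the idea (the class name is illustrative, and time is passed in explicitly for testability; a real limiter would read the clock itself):

```java
// Token bucket: up to `capacity` tokens, refilled at a fixed rate; each
// request consumes one token, and requests without a token are rejected.
public class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefill;

    public TokenBucket(long capacity, double tokensPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity; // start full
        this.lastRefill = nowNanos;
    }

    // Returns true if a token was available for this request.
    public synchronized boolean tryAcquire(long nowNanos) {
        tokens = Math.min(capacity, tokens + (nowNanos - lastRefill) * refillPerNano);
        lastRefill = nowNanos;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

Unlike the fixed-window counter above, the bucket smooths bursts: short spikes are absorbed up to the capacity, while the long-run rate stays bounded by the refill rate.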

What is RedLock?

The Redis official website proposes an authoritative way to implement distributed locks based on Redis, called Redlock , which is safer than the original single-node approach. It guarantees the following properties:

  1. Safety: mutually exclusive access, i.e. at any moment only one client can hold the lock
  2. Deadlock freedom: eventually a client can always acquire the lock, and no deadlock occurs even if the client that originally locked a resource crashes
  3. Fault tolerance: as long as the majority of the Redis nodes are alive, the service can be provided normally

How to deal with Redis big key?

Usually, a key whose value is large, or whose collection holds a large number of members, is called a big key.

The following is a description of the large key of each data type:

  • value is STRING type, its value exceeds 5MB
  • When value is a collection type such as ZSET, Hash, List, Set, etc., its number of members exceeds 1w

The above definition is not absolute, it is mainly determined according to the number and size of value members, and the standard is determined according to the business scenario.

How to deal with:

  1. When the value is a string, serialization and compression algorithms can keep the size within a reasonable range, though serialization and deserialization add time cost. Alternatively, split the key: divide one big key into parts, record the key of each part, and use operations such as mget to read them together.
  2. When the value is a collection type such as a list or set, shard it according to the estimated data size, assigning different elements to different shards after calculation.

Redis common performance problems and solutions?

  1. Master is best not to do any persistence work, including memory snapshots and AOF log files, especially not to enable memory snapshots for persistence.
  2. If the data is critical, a Slave starts AOF backup data, and the strategy is to synchronize once per second.
  3. For the speed of master-slave replication and the stability of the connection, it is better for Slave and Master to be in the same local area network.
  4. Try to avoid adding slave libraries on the main library with high pressure
  5. The Master calls BGREWRITEAOF to rewrite the AOF file. AOF will occupy a large amount of CPU and memory resources during rewriting, resulting in high service load and short-term service suspension.
  6. For the stability of the Master, do not use a graph structure for master-slave replication. A singly linked chain is more stable, i.e. the master-slave relationship is Master<–Slave1<–Slave2<–Slave3… This structure also makes single-point failure easier to handle: if the Master goes down, Slave1 can immediately be promoted to Master while everything else stays unchanged.

Tell me why Redis has expired and why the memory is not released?

In the first case, the previous key may be overwritten, causing the key expiration time to change.

When a key already exists in Redis, but due to some misoperations, the key expiration time has changed, so that the key does not expire within the time it should expire, resulting in memory occupation.

The second case is that the memory is not released due to the Redis expired key processing strategy.

Generally, Redis has two processing strategies for expired keys: lazy deletion and regular deletion.

Let me talk about the case of lazy deletion

When a key is set to expire in xx seconds and is not modified in the meantime, it does indeed expire after xx seconds, but the lazy deletion strategy does not delete it at that moment. Only the next read or write of the key checks whether it has expired, and deletes it if so. In other words, under lazy deletion, even an expired key's memory is not released immediately; the key is deleted only the next time it is read or written.

Periodic deletion actively evicts some of the expired data at intervals, by default every 100ms. Each pass eliminates only a portion of the expired keys rather than all of them, because if Redis holds a lot of data, deleting everything at once would put too much pressure on the server. Since only one batch is selected per pass, some expired keys may not be cleaned up in time, so their memory is not released immediately.

Redis suddenly slows down, what are the reasons?

  1. Bigkey exists . If the bigkey is stored in the Redis instance, it will take a long time to eliminate and delete the bigkey to release the memory. You should avoid storing bigkeys to reduce the time-consuming to release memory.

  2. If the Redis instance sets the memory upper limit maxmemory , it may cause Redis to slow down. When the Redis memory reaches maxmemory, before writing new data each time, Redis must first kick out some data from the instance, so that the memory of the entire instance remains below maxmemory, and then new data can be written in.

  3. Huge pages are enabled . When Redis is executing background RDB and AOF rewrite, it uses fork sub-process to handle it. However, after the main process forks the child process, the main process can still receive write requests at this time, and the incoming write requests will use the Copy On Write (copy-on-write) method to operate memory data.

    What is copy-on-write?

    Copy-on-write means that after the fork, the parent and child processes share the same memory pages; only when the parent modifies a page is a private copy made for it to write to. The advantage is that any write operation by the parent process does not affect the data the child process is persisting.

    However, copying memory data in the main process involves requesting new memory. If the operating system has memory huge pages enabled at that point, then even if the client modifies only 10B of data in this period, Redis applies for memory from the operating system in 2MB units, which takes longer and increases the latency of each write request, hurting Redis performance.

    The solution is to turn off the memory huge page mechanism.

  4. Swap is used . To alleviate the impact of insufficient memory on applications, the operating system allows some in-memory data to be swapped to disk; the disk area these pages go to is Swap. Once data has been swapped to disk, accessing it again requires reading from disk, which is hundreds of times slower than accessing memory. For a database like Redis, with extremely high performance requirements and extreme sensitivity to latency, this operation delay is unacceptable. The solution is to increase the machine's memory so Redis has enough to use, or to tidy the memory space to free up enough for Redis.

  5. The network bandwidth is overloaded . When network bandwidth is saturated, the server experiences packet delay and packet loss at the TCP and network layers. Besides memory operations, Redis's high performance lies in network IO; a network IO bottleneck also seriously hurts Redis performance. Solutions: (1) confirm promptly whether the Redis instance is saturating the network bandwidth; if this is normal business traffic, scale out or migrate the instance in time so its traffic does not affect other instances on the machine; (2) at the operations level, monitor the Redis machine's metrics, including network traffic, and alert in advance when traffic reaches a threshold so it can be confirmed and capacity expanded in time.

  6. Frequent short connections . Frequent short connections will cause Redis to spend a lot of time on connection establishment and release, and TCP's three-way handshake and four-way handshake will also increase access delay. Applications should use long connections to operate Redis to avoid frequent short connections.

Why is the maximum slot number of Redis cluster 16384?

Redis Cluster adopts the data fragmentation mechanism and defines 16384 Slot slots. Each Redis instance in the cluster is responsible for maintaining a part of the slots and the key-value data mapped to the slots.

Each Redis node periodically sends ping/pong messages (heartbeat packets contain data from other nodes) to exchange data information.

The nodes of the Redis cluster will send ping messages according to the following rules:

  • (1) 5 nodes are randomly selected every second, and the node that has not communicated for the longest time is found to send a ping message
  • (2) The local node list will be scanned every 100 milliseconds, and if it is found that the last time the node received a pong message is greater than cluster-node-timeout/2, a ping message will be sent immediately

There is a char array of myslots in the message header of the heartbeat packet, which is a bitmap, and each bit represents a slot. If the bit is 1, it means that the slot belongs to this node.

Next, answer why the maximum number of slots in the Redis cluster is 16384 instead of 65536.

1. If 16384 slots are used, the header of the heartbeat packet occupies 2KB (16384/8); if 65536 slots are used, the header of the heartbeat packet occupies 8KB (65536/8). It can be seen that 65536 slots are used, and the header of sending heartbeat information reaches 8k, which is a waste of bandwidth .

2. Generally, a Redis cluster will not have more than 1000 master nodes , too many may cause network congestion.

3. The hash slot is saved in the form of a bitmap, and the bitmap is compressed during transmission. The lower the fill rate of the bitmap, the higher the compression rate . Where bitmap filling rate = slots / N (N represents the number of nodes). So, the lower the slot count, the lower the fill ratio and the higher the compression ratio.

Origin blog.csdn.net/Tyson0314/article/details/130334398