Summary of Java Interview Questions 8: Redis Module (continuously updated)

Redis


Redis thread model

Redis internally uses a file event handler. This file event handler is single-threaded, which is why Redis is called a single-threaded model. It uses an IO multiplexing mechanism to listen on multiple sockets at the same time, pushes the sockets that generate events into an in-memory queue, and the event dispatcher selects the corresponding event handler based on the event type on each socket.

The structure of the file event handler contains 4 parts:

  • multiple sockets
  • IO multiplexer
  • File event dispatcher
  • Event handler (connection response handler, command request handler, command reply handler)

Multiple sockets may concurrently generate different operations, and each operation corresponds to a different file event. The IO multiplexer listens on all of these sockets and puts those that generate events into the queue. The event dispatcher removes one socket from the queue at a time and hands it to the corresponding event handler according to the socket's event type.

Let’s look at a communication process between the client and Redis:

[Figure: Redis single-thread model]

Keep in mind that this communication is done through sockets; readers unfamiliar with them can first take a look at socket network programming.

First, when the Redis server process initializes, it associates the AE_READABLE event of the server socket with the connection response handler.

Client socket01 requests a connection from the server socket of the Redis process, and the server socket generates an AE_READABLE event. After the IO multiplexer detects the event, it pushes the server socket into the queue. The file event dispatcher takes the socket from the queue and hands it to the connection response handler, which creates a socket01 that can communicate with the client and associates socket01's AE_READABLE event with the command request handler.

Suppose the client then sends a set key value request. socket01 in Redis generates an AE_READABLE event, and the IO multiplexer pushes socket01 into the queue. The event dispatcher takes the AE_READABLE event generated by socket01 from the queue; since socket01's AE_READABLE event is already associated with the command request handler, the dispatcher hands the event to that handler. The command request handler reads key value from socket01 and completes the set in its own memory. After the operation finishes, it associates socket01's AE_WRITABLE event with the command reply handler.

If the client is ready to receive the result, socket01 in Redis generates an AE_WRITABLE event, which is also pushed into the queue. The event dispatcher finds the associated command reply handler, which writes the result of the operation (for example, ok) to socket01, and then disassociates socket01's AE_WRITABLE event from the command reply handler.
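A minimal sketch of this single-threaded dispatch loop, using hypothetical names (FileEventLoop, register, fire, dispatchAll) rather than the real Redis internals: handlers are registered per socket and event type, the multiplexer queues ready sockets, and one thread drains the queue.

```java
import java.util.*;
import java.util.function.Consumer;

// Toy single-threaded file event dispatcher (names are illustrative).
class FileEventLoop {
    enum EventType { AE_READABLE, AE_WRITABLE }

    // socket id -> (event type -> handler), e.g. AE_READABLE -> command request handler
    private final Map<String, Map<EventType, Consumer<String>>> handlers = new HashMap<>();
    // sockets that produced events, queued by the IO multiplexer
    private final Deque<String[]> queue = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    void register(String socket, EventType type, Consumer<String> handler) {
        handlers.computeIfAbsent(socket, s -> new EnumMap<>(EventType.class)).put(type, handler);
    }

    void fire(String socket, EventType type) {   // the multiplexer pushes into the queue
        queue.add(new String[]{socket, type.name()});
    }

    void dispatchAll() {                          // a single thread drains the queue
        while (!queue.isEmpty()) {
            String[] ev = queue.poll();
            handlers.get(ev[0]).get(EventType.valueOf(ev[1])).accept(ev[0]);
        }
    }
}
```

Because one thread handles dispatch, no handler ever runs concurrently with another, which is the property the walkthrough above relies on.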

The difference between Redis and MySQL

Summary of the differences between Redis and MySQL

Type: MySQL is a relational database; Redis is a non-relational database.

Data storage: MySQL persists data to disk, which is relatively slow; Redis keeps data in memory, so reads are relatively fast.

Business scenario: whether to use MySQL or Redis depends on the specific business; generally they are used together. Because of its fast reads and writes, Redis is suitable for storing hot data such as rankings, counters, and message queue pushes.

What is the difference between Redis and traditional relational databases?

Redis is a NoSQL database based on key-value pairs, where the values are built from a variety of data structures and algorithms. Redis stores data in memory, so it is remarkably fast, with read and write performance reaching 100,000 operations per second, far exceeding relational databases.

A relational database stores data in two-dimensional tables. Its data format is stricter and it supports relational queries. Relational database data is stored on disk, so it can hold massive amounts of data, but its performance is far below that of Redis.

Common data structures in Redis

String: the underlying implementation is a simple dynamic string (SDS), which supports expansion when storing strings.

Hash: the hash type stores field-value pairs; the underlying data structures are ziplist and hashtable.

List: stores linearly ordered, repeatable elements; the underlying data structures are a doubly linked list and ziplist (compressed list).

Set: stores non-repeating elements, generally used for intersections, differences, etc.; the underlying data structures are hashtable and integer array (intset).

Zset: stores ordered, non-repeating elements; zset attaches a score attribute to each element as the sorting basis. The underlying data structures are ziplist and skiplist.

Others: geospatial (geographic location), HyperLogLog (cardinality estimation), Bitmaps (bit storage).

zset data structure

There are two encoding schemes for ordered collection objects. When the following conditions are met at the same time, the collection object uses ziplist encoding, otherwise it uses skiplist encoding:

  • The number of elements stored in an ordered set does not exceed 128;
  • All elements stored in an ordered set have member lengths less than 64 bytes.

The underlying data structures of the zset object include: ziplist (compressed list), dict (dictionary), and skiplist (jump table).
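The two conditions above can be expressed as a small predicate. The thresholds correspond to the defaults of the zset-max-ziplist-entries (128) and zset-max-ziplist-value (64) configuration options; the method name is illustrative.

```java
// Which encoding a sorted set object would use, per the two conditions above.
class ZsetEncoding {
    static final int ZSET_MAX_ZIPLIST_ENTRIES = 128; // default zset-max-ziplist-entries
    static final int ZSET_MAX_ZIPLIST_VALUE   = 64;  // default zset-max-ziplist-value

    // ziplist only while BOTH conditions hold; otherwise skiplist
    static String encodingFor(int elementCount, int longestMemberBytes) {
        return (elementCount <= ZSET_MAX_ZIPLIST_ENTRIES
                && longestMemberBytes < ZSET_MAX_ZIPLIST_VALUE) ? "ziplist" : "skiplist";
    }
}
```

For example, a 10-element zset whose longest member is 100 bytes already uses skiplist encoding, even though the element count is small.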

rehash process in Redis

The implementation of Redis dictionary mainly involves three structures: dictionary, hash table, and hash table node. Among them, each hash table node saves a key-value pair, each hash table is composed of multiple hash table nodes, and the dictionary is a further encapsulation of the hash table. The relationship between these three structures is shown in the figure below:

[Figure: relationship between dict, dictht, and dictEntry]

dict represents the dictionary, dictht represents the hash table, and dictEntry represents a hash table node.

A dictionary contains two hash tables. In Redis, expanding and shrinking a hash table is implemented through rehash. The general steps of a rehash are as follows:

  1. Allocate memory space for the ht[1] hash table of the dictionary

    If an expansion is being performed, the size of ht[1] is the first power of 2 (2^n) greater than or equal to ht[0].used * 2. If a shrink is being performed, the size of ht[1] is the first power of 2 greater than or equal to ht[0].used.

  2. Migrate data stored in ht[0] to ht[1]

    Recalculate the hash value and index value of the key, and then place the key-value pair at the specified position in the ht[1] hash table.

  3. Promote the dictionary's ht[1] hash table to the default hash table

    After the migration is completed, clear ht[0], then swap ht[0] and ht[1] to prepare for the next rehash.

The program automatically starts expanding the hash table when any of the following conditions are met:

  1. The server is not currently executing the bgsave or bgrewriteaof command, and the load factor of the hash table is greater than or equal to 1;
  2. The server is currently executing the bgsave or bgrewriteaof command, and the load factor of the hash table is greater than or equal to 5.
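The sizing rule and the expansion triggers can be sketched together; nextPow2, expandSize, and shouldExpand are illustrative names, and the load factor is computed as used / size.

```java
// Sizing of ht[1] and the expansion trigger, per the rules above.
class RehashSizing {
    // first power of 2 that is >= target
    static long nextPow2(long target) {
        long size = 1;
        while (size < target) size <<= 1;
        return size;
    }

    static long expandSize(long used) { return nextPow2(used * 2); } // expansion
    static long shrinkSize(long used) { return nextPow2(used); }     // shrink

    // load factor = used / size; threshold is 5 while bgsave/bgrewriteaof runs, else 1
    static boolean shouldExpand(long used, long size, boolean bgsaveRunning) {
        double loadFactor = (double) used / size;
        return loadFactor >= (bgsaveRunning ? 5 : 1);
    }
}
```

For example, expanding a table with 5 used entries allocates ht[1] with 16 buckets (the first power of 2 at or above 10), while shrinking it would allocate 8.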

To avoid the impact of rehash on server performance, the rehash operation is not completed in one go; it is completed gradually over multiple steps. The detailed process of progressive rehash is as follows:

  1. Allocate space for ht[1], so that the dictionary holds two hash tables ht[0] and ht[1] at the same time;
  2. The index counter rehashidx in the dictionary is set to 0, indicating that the rehash operation has officially begun;
  3. During the rehash, every time the dictionary is added to, deleted from, modified, or searched, in addition to performing the requested operation, the program also migrates all key-value pairs in bucket rehashidx of ht[0] to ht[1], then increments rehashidx by 1;
  4. As the dictionary continues to be accessed, at some point all key-value pairs in ht[0] will have been migrated to ht[1]. The program then sets rehashidx to -1, indicating that the rehash operation is complete.

During a rehash, the dictionary holds two hash tables at the same time, and access is handled according to the following principles:

  1. Newly added key-value pairs will be saved in ht[1];
  2. Other operations such as deletion, modification, and lookup are performed on both hash tables: the program first tries to find the target data in ht[0], and if it is not there, looks in ht[1], then processes the data it finds accordingly.
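A toy version of this progressive migration, using illustrative names (IncrementalDict, stepRehash): table sizes are powers of 2 so a bitmask picks the bucket, and each put/get first migrates one bucket of ht[0] before doing its own work, just as the steps above describe.

```java
import java.util.*;

// Toy incremental-rehash dictionary: two bucket arrays, one bucket of
// ht[0] migrated per operation. Table sizes must be powers of 2.
class IncrementalDict {
    List<List<Map.Entry<String, String>>> ht0, ht1;
    int rehashidx = -1;  // -1 means no rehash in progress

    IncrementalDict(int size) { ht0 = newTable(size); }

    static List<List<Map.Entry<String, String>>> newTable(int n) {
        List<List<Map.Entry<String, String>>> t = new ArrayList<>();
        for (int i = 0; i < n; i++) t.add(new ArrayList<>());
        return t;
    }

    void startRehash(int newSize) { ht1 = newTable(newSize); rehashidx = 0; }

    private void stepRehash() {  // migrate bucket rehashidx of ht[0], then advance
        if (rehashidx < 0) return;
        for (Map.Entry<String, String> e : ht0.get(rehashidx))
            ht1.get(e.getKey().hashCode() & (ht1.size() - 1)).add(e);
        ht0.get(rehashidx).clear();
        if (++rehashidx == ht0.size()) { ht0 = ht1; ht1 = null; rehashidx = -1; }
    }

    void put(String k, String v) {
        stepRehash();
        // during a rehash, new key-value pairs go to ht[1]
        List<List<Map.Entry<String, String>>> t = (rehashidx >= 0) ? ht1 : ht0;
        t.get(k.hashCode() & (t.size() - 1)).add(Map.entry(k, v));
    }

    String get(String k) {
        stepRehash();
        // look in ht[0] first, then ht[1]
        for (List<List<Map.Entry<String, String>>> t
                : rehashidx >= 0 ? List.of(ht0, ht1) : List.of(ht0))
            for (Map.Entry<String, String> e : t.get(k.hashCode() & (t.size() - 1)))
                if (e.getKey().equals(k)) return e.getValue();
        return null;
    }

    boolean rehashing() { return rehashidx >= 0; }
}
```

After enough operations touch the dictionary, every bucket of ht[0] has been drained and ht[1] is promoted, exactly as in step 4 above.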

Why doesn’t redis consider thread safety?

Redis is a single-threaded program, so it is thread-safe. Although Redis 6.0 added a multi-threading model, the extra threads are only used to handle network IO events; command execution is still handled by the main thread, so there is no situation in which multiple threads execute commands concurrently.

Why is Redis single-threaded so fast?

  • Redis is based on memory, and the memory read and write speed is very fast;
  • Redis is single-threaded, avoiding unnecessary context switches and race conditions;
  • Redis uses multiplexing technology to handle concurrent connections. Its non-blocking IO is implemented with epoll, plus a simple event framework built on top of epoll. Reads, writes, closes, and connects are all converted into events, and epoll's multiplexing means no time is wasted blocking on IO.

Why is Redis single-threaded?

1. Official answer

Because Redis is a memory-based operation, the CPU is not the bottleneck of Redis. The bottleneck of Redis is most likely the size of the machine memory or the network bandwidth. Since single-threading is easy to implement and the CPU will not become a bottleneck, it is logical to adopt a single-threaded solution.

2. Performance indicators

Regarding Redis performance, the official site publishes benchmarks: an ordinary laptop can easily handle hundreds of thousands of requests per second.

3. Detailed reasons

1) No performance consumption of various locks is required

Redis data structures are not all simple key-value pairs; there are also complex structures such as list and hash, which may undergo very fine-grained operations, such as appending an element to a long list, or adding or deleting an object in a hash. Such operations might require many locks, greatly increasing synchronization overhead.

In short, in the case of a single thread, there is no need to consider various lock issues. There is no locking and releasing lock operations, and there is no performance consumption caused by possible deadlocks.

2) Single-threaded multi-process cluster solution

Single-threaded execution is actually very capable, and its per-core efficiency is very high. Multi-threading naturally has a higher performance ceiling than single-threading, but in today's computing environment even the ceiling of single-machine multi-threading often cannot meet demand; what must be explored instead are multi-server cluster solutions, and multi-threading does not help there.

Therefore, a single-threaded, multi-process cluster is a popular solution.

3)CPU consumption

Using a single thread avoids unnecessary context switching and race conditions, and there is no switching caused by multiple processes or threads that consumes the CPU.

But what if the CPU becomes the bottleneck of Redis, or you don’t want other CPU cores of the server to be idle?

You can consider starting several more Redis processes. Redis is a key-value database, not a relational database, and there are no constraints between data. As long as the client distinguishes which keys are placed in which Redis process, it will be fine.

redis big key deletion

A big key (bigkey) is a key whose value is huge. For example, a Hash, Sorted Set, List, or Set can grow over time to tens or hundreds of MB, or even GB.

If you delete such a big key directly with the del command, it will cause long-lasting blocking or even a crash.

This is because when the del command deletes collection-type data, the time complexity is O(M), where M is the number of elements in the collection.

Redis is single-threaded. If a single command takes too long to execute, it will block other commands and easily cause an avalanche.

solution

  • Progressive deletion

    Delete in batches: use a scan-family command to traverse the big key, obtaining a small number of elements each time, deleting them, then fetching and deleting the next batch.

    Hash key: use the hscan command to obtain 500 fields at a time, then delete them with the hdel command;

    Set key: use the sscan command to scan 500 elements at a time, then delete them with the srem command;

    List key: large List keys are deleted without a scan command; remove a small number of elements at a time with the ltrim command;

    Sorted set key: deleting a large sorted set key is similar to List; use sorted set's own zremrangebyrank command to delete, for example, the top 100 elements each time.

  • UNLINK (version 4.0 and later).

    Redis 4.0 introduced an important command, UNLINK, to solve the dilemma of deleting big keys with del.

    UNLINK working ideas:

    (1) Remove the key from the keyspace and return immediately, without blocking.

    (2) The background thread performs the real operation of releasing space.

    UNLINK can basically replace del, but del is still needed in some scenarios. For example, when space usage accumulates very quickly, UNLINK is not suitable because it does not release the space immediately.
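The batch-deletion idea above can be sketched without a server: a plain HashMap stands in for the Redis hash, and the batch loop mimics pairing hscan with hdel (the class and method names are illustrative).

```java
import java.util.*;

// Progressive big-hash deletion: fetch a small batch of fields, delete
// them, repeat — instead of one blocking DEL. With a real server you
// would pair HSCAN (cursor) with HDEL in the same loop shape.
class ProgressiveDelete {
    static int deleteInBatches(Map<String, String> bigHash, int batchSize) {
        int rounds = 0;
        while (!bigHash.isEmpty()) {
            // "hscan": take up to batchSize field names
            List<String> batch = new ArrayList<>();
            for (String field : bigHash.keySet()) {
                batch.add(field);
                if (batch.size() == batchSize) break;
            }
            // "hdel" the batch; other commands can run between rounds
            batch.forEach(bigHash::remove);
            rounds++;
        }
        return rounds;
    }
}
```

Each round does only O(batchSize) work, so no single step blocks the server for long, which is the whole point of progressive deletion.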

High availability and high concurrency of Redis

High concurrency in Redis mainly relies on the master-slave architecture: one master, multiple slaves. Generally speaking this is enough for many projects: a single master writes data (a single machine handles tens of thousands of QPS), and multiple slaves serve queries (multiple slave instances can provide on the order of 100,000 QPS).

If you want to achieve high concurrency while accommodating a large amount of data, you need a redis cluster. After using the redis cluster, you can provide hundreds of thousands of read and write concurrency per second.

For high availability, if Redis is deployed in a master-slave architecture, just add sentinels; then if any instance goes down, a master-slave switchover can be performed.

Redis master-slave replication

  • Redis copies data to the slave node asynchronously; starting from Redis 2.8, the slave node periodically acknowledges how much data it has replicated;
  • A master node can be configured with multiple slave nodes;
  • Slave nodes can also connect to other slave nodes;
  • When the slave node replicates, it will not block the normal work of the master node;
  • While replicating, the slave node does not block its own query operations; it serves them with the old data set. However, when replication completes, it must delete the old data set and load the new one, and during that moment its external service is suspended;
  • The slave node is mainly used for horizontal expansion and separation of reading and writing. The expanded slave node can improve the read throughput .

Note that if a master-slave architecture is adopted, it is recommended that persistence be enabled on the master node. It is not recommended to rely on a slave node as the master's hot data backup, because in that case, if the master's persistence is turned off, the master's data will be empty when it crashes and restarts, and the slave's data may then be wiped out as soon as it replicates.

In addition, various backup plans for the master are also needed. In case all local files are lost, select an RDB from backup to restore the master, ensuring there is data at startup. Even with the high-availability mechanism explained later, where a slave node can automatically take over the master, it is possible that the master restarts automatically before sentinel detects the failure, which could still wipe all the slave nodes' data.

The core principle of Redis master-slave replication

When a slave node starts, it sends a PSYNC command to the master node.

If this is the first time the slave connects to this master, a full resynchronization (full replication) is triggered. The master forks a background child process (bgsave) to generate an RDB snapshot file, while also caching in memory all new write commands received from clients. Once the RDB file is generated, the master sends it to the slave; the slave writes it to local disk first, then loads it from disk into memory. The master then sends the write commands cached in memory to the slave, and the slave replays them to catch up. If the slave later loses its network connection to the master, it reconnects automatically, and after reconnecting the master copies only the missing portion of the data to the slave.

[Figure: Redis master-slave replication]

Master-slave replication breakpoint resume

Starting from Redis 2.8, breakpoint resumption of master-slave replication is supported: if the network connection drops during replication, copying can continue from where it left off instead of starting over from the beginning.

The master node maintains a backlog in memory. Both master and slave keep a replica offset and the master's run id, and the offset is stored in the backlog. If the network connection between master and slave drops, the slave asks the master to continue replicating from its last replica offset. If the corresponding offset is no longer in the backlog, a full resynchronization is performed.

Locating the master by host+ip is unreliable; if the master restarts or its data changes, slaves should distinguish it by its run id.
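The backlog and offset bookkeeping can be sketched as a ring buffer, using illustrative names (ReplicationBacklog, feed, partialSync); a null result stands for "the offset fell out of the backlog, do a full resync".

```java
// Sketch of the replication backlog: a fixed-size ring buffer of recent
// write bytes plus a global offset. A reconnecting slave sends its
// offset; if the missing range is still in the backlog the master sends
// just that slice (partial resync), otherwise full resync is needed.
class ReplicationBacklog {
    private final byte[] buf;
    private long masterOffset = 0;   // total bytes ever written

    ReplicationBacklog(int size) { buf = new byte[size]; }

    void feed(byte[] cmdBytes) {     // master appends every write command's bytes
        for (byte b : cmdBytes) buf[(int) (masterOffset++ % buf.length)] = b;
    }

    // null => the slave's offset is no longer covered; full resync required
    byte[] partialSync(long slaveOffset) {
        long missing = masterOffset - slaveOffset;
        if (missing < 0 || missing > buf.length) return null;
        byte[] out = new byte[(int) missing];
        for (int i = 0; i < out.length; i++)
            out[i] = buf[(int) ((slaveOffset + i) % buf.length)];
        return out;
    }
}
```

The default backlog is only 1MB, so a slave that stays disconnected too long (or a master under heavy write load) quickly overruns it and falls back to full resynchronization.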

Expired key processing

The slave will not expire the key, but will only wait for the master to expire the key. If the master expires a key or eliminates a key through LRU, a del command will be simulated and sent to the slave.

The complete process of copying

When the slave node starts, it saves the master node's information locally, including the master's host and ip, but the replication process has not yet started.

A scheduled task inside the slave node checks every second whether there is a new master node to connect to and replicate from. If one is found, the slave establishes a socket connection with the master, then sends a ping command. If the master has requirepass set, the slave must send the masterauth password for authentication. The master performs full replication the first time, sending all data to the slave; afterwards, the master continuously replicates write commands to the slave asynchronously.

[Figure: Redis master-slave replication in detail]

Full replication

  • The master executes bgsave to generate an RDB snapshot file locally.
  • The master node sends the rdb snapshot file to the slave node. If the transfer takes more than 60 seconds (repl-timeout), the slave node considers the replication failed; you can increase this parameter appropriately (a machine with a gigabit network card generally transfers about 100MB per second, so a 6G file may well exceed 60s)
  • While generating the rdb, the master node caches all new write commands in memory; after the slave node has saved the rdb, the master sends those new write commands to the slave node.
  • If during copying, the memory buffer continues to consume more than 64MB, or exceeds 256MB at one time, copying will stop and copying will fail.
    client-output-buffer-limit slave 256mb 64mb 60
  • After the slave node receives the rdb, it clears its old data and then reloads the rdb into its own memory. Note that before clearing the old data, the slave node will still provide external services based on the old data version .
  • If the slave node has AOF enabled, BGREWRITEAOF will be executed immediately to rewrite the AOF.

Incremental replication

  • If the master-slave network connection is disconnected during full replication, incremental replication will be triggered when the slave reconnects to the master.
  • The master directly obtains part of the lost data from its own backlog and sends it to the slave node. The default backlog is 1MB.
  • The master obtains data from the backlog based on the offset in the psync sent by the slave.

heartbeat

The master and slave nodes will send heartbeat information to each other.

The master sends a heartbeat every 10 seconds by default, and the slave node sends a heartbeat every 1 second.

Asynchronous replication

Each time the master receives a write command, it first writes the data internally and then sends it asynchronously to the slave node.

Things to note when using redis

  • Set expiration times for keys, and for keys belonging to different businesses, try to spread the expiration times out as much as possible

    • It is generally recommended to use expire to set the expiration time.
    • If a large number of keys expire at a certain point in time, Redis may be stuck or even have a cache avalanche at the time of expiration . Therefore, generally, the expiration times of keys of different businesses should be dispersed. Sometimes, if you are in the same business, you can also add a random value to the time to spread the expiration time.
  • It is recommended to use batch operations to improve efficiency

    • Executing a command from a Redis client involves 4 steps: 1. send command -> 2. command queuing -> 3. command execution -> 4. return result. Steps 1 and 4 together make up the RTT (command round-trip time). Redis provides batch commands, such as mget and mset, which effectively save RTT, but most commands have no batch form — hgetall exists, yet there is no mhgetall. Pipelining solves this problem.
  • Pay attention to using the del command

    • If you delete a String-type key, the time complexity is O(1); you can delete it directly.
    • If you delete a List/Hash/Set/ZSet-type key, the complexity is O(n), where n is the number of elements.
    • For a List, you can execute lpop or rpop repeatedly until all elements are deleted.
    • For a Hash/Set/ZSet, you can first query with hscan/sscan/scan, then delete each batch of elements with hdel/srem/zrem.

    21 points you must know when using Redis - Zhihu (zhihu.com)
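The RTT saving from batching can be sketched with a toy pipeline that buffers commands and ships them in one round trip; the "server" here is just a function, and the names (Pipeline, queue, sync) are illustrative, not the Jedis API.

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

// Toy pipeline: buffer N commands locally and ship them in ONE round
// trip, so N commands cost 1 RTT instead of N.
class Pipeline {
    private final List<String> buffered = new ArrayList<>();
    private final Function<List<String>, List<String>> server; // stand-in for the network
    int roundTrips = 0;

    Pipeline(Function<List<String>, List<String>> server) { this.server = server; }

    void queue(String command) { buffered.add(command); }  // no network traffic yet

    List<String> sync() {   // one round trip for the whole batch
        roundTrips++;
        List<String> replies = server.apply(new ArrayList<>(buffered));
        buffered.clear();
        return replies;
    }
}
```

With command execution taking microseconds and a round trip taking hundreds of microseconds or more, cutting N round trips to 1 is where almost all of the speedup comes from.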

Redis cluster has several modes

  • master-slave mode

    • effect
      • Read-write separation: the master handles writes and the slaves handle reads, improving the server's read and write load capacity
      • Load balancing: based on the master-slave structure and combined with read-write separation, slaves share the master's load; the number of slaves can be changed with demand, and spreading reads across multiple nodes greatly improves the Redis server's concurrency and data throughput
      • Fault recovery: When a problem occurs with the master, the slave provides services to achieve rapid fault recovery.
      • Data redundancy: Implementing hot data backup is a data redundancy method other than persistence.
      • The cornerstone of high availability: Based on master-slave replication, build sentinel mode and cluster to implement Redis high availability solution
  • Sentry mode

    • Redis Sentinel is a distributed architecture containing several sentinel nodes and data nodes. Each sentinel node monitors the data nodes and the other sentinel nodes, and marks any node it finds unreachable as offline. If the node marked offline is the master, the sentinel negotiates with the other sentinels; once a majority of sentinels consider the master unreachable, they elect one sentinel node to perform the automatic failover and notify the application side of the change in real time. The whole process is automatic, with no manual intervention, effectively solving Redis's high-availability problem.

      A group of sentinels can monitor one master node or multiple master nodes at the same time. The topology of the two cases is as follows:

      [Figure: sentinel topologies monitoring one master vs. multiple masters]

      The role of the sentinel:

      • Sentinels regularly monitor the data nodes and check whether the other sentinel nodes are reachable.

      • When the sentinels discover that the master is down, they automatically promote a slave to master, and then use the publish-subscribe mechanism to have the other slave servers update their configuration and switch hosts. This happens only when enough sentinel nodes consider the master unreachable.

        Advantages:

        1. The master and slave can be switched, and it can be used even if one fails. The system has good availability.
        2. Composed of multiple sentinel nodes, the system is more robust

        Disadvantages:

        • Expansion is troublesome, configuration is also troublesome
  • Cluster mode

    • redis cluster was designed with decentralization in mind: every node in the cluster has an equal, peer-to-peer relationship. Each node stores its own data and the state of the entire cluster, and each node is connected to all other nodes, with these connections kept alive. This ensures that we only need to connect to any one node in the cluster to reach data on the other nodes.

    • So how does redis allocate these nodes and data reasonably?

      Redis cluster does not use traditional consistent hashing to distribute data, but a different scheme called hash slots. A Redis cluster has 16384 slots by default. When we set a key, the CRC16 algorithm is applied to the key and the result is taken modulo 16384 to get the slot, and the key goes to whichever node owns that slot's range. The exact formula is: CRC16(key) % 16384.

      In the redis-cluster architecture, redis-master nodes generally receive reads and writes, while redis-slave nodes are generally used only for backup; each slave holds the same slot set as its master. If a redis-master fails unexpectedly, its corresponding slave is promoted to become the new redis-master.
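The slot formula can be implemented directly; Redis Cluster uses the CCITT/XModem variant of CRC16. This sketch ignores hash tags (the {...} syntax in keys), which the real cluster honors.

```java
// CRC16 (CCITT / XModem variant, the one Redis Cluster uses) and the
// slot formula: CRC16(key) % 16384.
class HashSlot {
    static int crc16(byte[] data) {
        int crc = 0;                          // init 0x0000, polynomial 0x1021
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++)
                crc = ((crc & 0x8000) != 0)
                        ? ((crc << 1) ^ 0x1021) & 0xFFFF
                        : (crc << 1) & 0xFFFF;
        }
        return crc;
    }

    static int slot(String key) { return crc16(key.getBytes()) % 16384; }
}
```

The standard check value for this CRC variant is CRC16("123456789") = 0x31C3, a handy way to verify an implementation; every key maps to a slot in [0, 16383].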

Three ways of redis cluster — Notail 0's blog (CSDN)

Let’s talk about Redis’ persistence strategy

RDB, AOF, RDB-AOF hybrid

RDB: Use snapshots to persist data to the hard disk. Redis uses snapshots to save a copy of the data in memory at a certain time node. It is the default persistence mechanism of redis.

AOF: records every command to modify data in an independent log. In order to take into account data security and performance, you can choose to synchronize the AOF file every second.

Advantages of RDB: the generated .rdb file is small and persistence is fast. Disadvantages: poor real-time guarantees, so data is easily lost.

Advantages of AOF: better real-time guarantees than RDB; with the once-per-second sync policy, at most one second of data is lost. Disadvantages: large file size and slow recovery.

RDB-AOF hybrid: combines the advantages of RDB and AOF. During a rewrite, the RDB snapshot is written directly at the beginning of the AOF file, and Redis commands issued after the rewrite are appended to the end of the AOF file.

Through RDB or AOF, the data in Redis memory can be persisted to the disk, and then the data can be backed up to other places, such as Alibaba Cloud and other cloud services.

If Redis dies and the server's memory and disk data are lost, you can copy the earlier backup from the cloud service into the specified directory and restart Redis. Redis will automatically load the persisted data file to restore the data in memory and continue serving requests.

If both RDB and AOF persistence mechanisms are used at the same time, AOF will be used to reconstruct the data when Redis restarts , because the data in AOF is more complete .

Advantages and Disadvantages of AOF and RDB

RDB pros and cons

  • RDB generates multiple data files, each representing the Redis data at a certain moment. This multi-file approach is very suitable for cold backup: the complete data files can be sent to remote secure storage, such as Amazon's S3 cloud service or Alibaba Cloud's ODPS distributed storage in China, with a predetermined backup strategy to regularly back up Redis data.
  • RDB has very little impact on the external read and write services provided by Redis, allowing Redis to maintain high performance because the Redis main process only needs to fork a child process and let the child process perform disk IO operations for RDB persistence.
  • Compared with the AOF persistence mechanism, it is faster to restart and restore the Redis process directly based on RDB data files.
  • If you want to lose as little data as possible when Redis fails, then RDB is not as good as AOF. Generally speaking, RDB data snapshot files are generated every 5 minutes or more. At this time, you have to accept that once the Redis process goes down, the data of the last 5 minutes (or even longer) will be lost.
  • Every time RDB forks a child process to generate an RDB snapshot data file, if the data file is particularly large, it may cause the service provided by the client to be suspended for several milliseconds, or even seconds.

AOF advantages and disadvantages

  • AOF better protects against data loss. Generally, AOF performs an fsync operation via a background thread every 1 second, losing at most 1 second of data.
  • AOF log files are written in append-only mode, so there is no disk-seek overhead, write performance is very high, and the file is not easily damaged; even if the tail of the file is corrupted, it is easy to repair.
  • Even when the AOF log file grows too large, the background rewrite operation does not affect client reads and writes. During a rewrite, the commands are compacted into the minimal log needed to rebuild the data. While the new log file is being created, the old one continues to be written as usual; once the new, merged log file is ready, the old and new files are simply swapped.
  • AOF log files record commands in a highly readable way, which is very suitable for emergency recovery from catastrophic accidental deletions. For example, if someone accidentally wipes all data with flushall, then as long as no background rewrite has happened yet, you can immediately copy the AOF file, delete the final flushall command, put the AOF file back, and use the recovery mechanism to restore all the data automatically.
  • For the same data set, AOF log files are usually larger than RDB snapshot files.
  • With AOF enabled, the supported write QPS is lower than with RDB alone, because AOF is generally configured to fsync the log file once per second. Of course, with fsync once per second, performance is still very high. (If every write is fsynced in real time, QPS drops significantly and Redis performance is greatly reduced.)
  • AOF once had a bug in which data restored by replaying AOF logs was not exactly identical to the original. A command-log replay mechanism like AOF is more fragile and bug-prone than RDB's approach of persisting a complete snapshot each time. However, to avoid bugs in the rewrite process, each AOF rewrite does not merge the old command log but rebuilds the commands from the data in memory at that moment, which makes it much more robust.

How to choose between RDB and AOF

  • Don't use RDB alone, because that can cause you to lose a lot of data;
  • Don't use AOF alone either, for two reasons: first, cold backup with RDB recovers faster than relying on AOF alone; second, RDB's simple, coarse-grained data snapshots are more robust and avoid the bugs that a complex backup-and-recovery mechanism like AOF can have;
  • Redis supports enabling both persistence methods at the same time. We can combine AOF and RDB: use AOF to ensure data is not lost, as the first choice for data recovery; use RDB for cold backups of varying depth, so that when AOF files are lost, damaged, or otherwise unavailable, RDB can still be used for fast data recovery.
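As a sketch, enabling both mechanisms together in redis.conf might look like this (the directive names are standard; the values are illustrative):

```conf
appendonly yes            # turn on AOF alongside RDB snapshots
appendfsync everysec      # fsync once per second: at most ~1 s of data lost
save 300 10               # keep an RDB save point for cold backup
```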

Redis cluster has several modes

  • master-slave mode

    • effect
      • Read/write separation: the master handles writes and the slaves handle reads, improving the server's read/write load capacity
      • Load balancing: on top of the master-slave structure, combined with read/write separation, the slaves share the master's load; the number of slaves can be adjusted as demand changes, and spreading the read load across multiple nodes greatly improves the Redis server's concurrency and data throughput
      • Fault recovery: When a problem occurs with the master, the slave provides services to achieve rapid fault recovery.
      • Data redundancy: Implementing hot data backup is a data redundancy method other than persistence.
      • The cornerstone of high availability: Based on master-slave replication, build sentinel mode and cluster to implement Redis high availability solution
  • Sentry mode

    • Redis Sentinel is a distributed architecture containing several sentinel nodes and data nodes. Each sentinel node monitors the data nodes and the other sentinel nodes, and marks any node it finds unreachable as offline. If the unreachable node is the master node, the sentinel negotiates with the other sentinel nodes; once a majority of sentinel nodes agree that the master is unreachable, they elect one sentinel node to carry out the automatic failover and notify the application side of the change in real time. The whole process is automatic and needs no manual intervention, effectively solving Redis's high-availability problem!

      A group of sentinels can monitor one master node or multiple master nodes at the same time. The topology of the two cases is as follows:

      (figure redis-7.png: sentinel topology when monitoring a single master vs. multiple masters)

      ​The role of sentry:

      • Sentinels regularly monitor the data nodes and check whether the other sentinel nodes are reachable.

      • When the sentinels discover that the master is down, they automatically promote a slave to master, then use the publish-subscribe mechanism to have the other slave servers update their configuration and switch to the new host. This is done only when enough nodes agree that the master is unreachable.

        Advantages:

        1. The master and slave can be switched; the system remains usable even if one node fails, so availability is good.
        2. Composed of multiple sentinel nodes, so the system is more robust.

        Disadvantages:

        • Scaling out is troublesome, and so is configuration
  • Cluster mode

    • When redis cluster was designed, decentralization and the removal of middleware were taken into consideration. That is to say, every node in the cluster has an equal, peer-to-peer relationship with the others. Each node stores its own data and the state of the entire cluster, and each node is connected to all other nodes. These connections stay alive, which guarantees that we only need to connect to any one node in the cluster to reach the data on the other nodes.
      So how does redis allocate these nodes and data reasonably?
      Redis cluster does not use traditional consistent hashing to distribute data, but another scheme called hash slots . A Redis cluster is allocated 16384 slots by default. When we set a key, the CRC16 checksum of the key is taken modulo 16384 to obtain the corresponding slot, and the key is placed on the node that owns that slot's range. The exact formula is: CRC16(key) % 16384.
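The slot mapping can be sketched in Java. The CRC16 variant below is CRC16-CCITT (XModem), the one Redis Cluster uses; the class name is just for illustration, and the sketch ignores hash tags ({...} in key names), which real Redis honors:

```java
public class RedisSlot {
    // CRC16-CCITT (XModem): init 0x0000, polynomial 0x1021, no reflection;
    // this is the CRC16 variant Redis Cluster uses for key-to-slot mapping
    static int crc16(byte[] bytes) {
        int crc = 0x0000;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // the formula from the text: slot = CRC16(key) % 16384
    static int slot(String key) {
        return crc16(key.getBytes(java.nio.charset.StandardCharsets.UTF_8)) % 16384;
    }
}
```

The standard XModem check value, crc16("123456789") = 0x31C3, gives slot 12739 and makes the sketch easy to verify against a real cluster's CLUSTER KEYSLOT output.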

      In the redis-cluster architecture, the redis-master nodes generally receive reads and writes, while the redis-slave nodes are generally used only for backup, each holding the same slot set as its master. If a redis-master fails unexpectedly, its corresponding slave is promoted to serve as the new redis-master.

Redis Sentinel Cluster achieves high availability

Introduction to Sentinel

Sentinel (Chinese name: 哨兵) is a very important component in the Redis cluster architecture. It mainly has the following functions:

  • Cluster monitoring: Responsible for monitoring whether the Redis master and slave processes are working properly.
  • Message notification: If a Redis instance fails, Sentinel is responsible for sending messages as alarm notifications to the administrator.
  • Failover: If the master node hangs, it will automatically be transferred to the slave node.
  • Configuration center: If failover occurs, notify the client of the new master address.

Sentinel is used to achieve high availability of a Redis cluster. It is itself distributed, running as a cluster of sentinel processes that work together.

  • During failover, determining whether a master node is down requires the consent of most sentinels, which involves the issue of distributed election.
  • Even if some sentinel nodes fail, the sentinel cluster can still work normally, because a failover system that is an important part of the high-availability mechanism must not itself be a single point of failure.

Sentinel master-slave switching causes data inconsistency

Two situations

The process of active/standby switchover may result in data loss:

  • Data loss caused by asynchronous replication

Because master->slave replication is asynchronous, some data may not yet have been copied to the slave when the master crashes, and that part of the data is lost.


  • Data loss due to split brain

Split-brain, that is to say, the machine where a certain master is located suddenly leaves the normal network and cannot connect to other slave machines, but in fact the master is still running. At this time, the sentry may think that the master is down, and then start the election and switch other slaves to the master. At this time, there will be two masters in the cluster, which is the so-called split brain .

Although a slave has been switched to master at this point, the client may not yet have switched over and may continue writing data to the old master. When the old master recovers, it is attached to the new master as a slave, its own data is cleared, and it copies the data from the new master again . Since the new master does not have the data the client wrote in the meantime, that part of the data is lost.


solution

Configure as follows:

min-slaves-to-write 1
min-slaves-max-lag 10

Indicates that at least 1 slave is required, and the delay in data replication and synchronization cannot exceed 10 seconds.

Once the replication lag of every slave exceeds 10 seconds, the master stops accepting any write requests.

  • Reduce the loss of asynchronously replicated data

With the min-slaves-max-lag configuration, once a slave's replication ack lags too far behind, the master concludes that too much data would be lost if it went down, and rejects write requests. This keeps the data loss caused by data not yet being synchronized to the slave within a controllable range when the master goes down.

  • Reduce split-brain data loss

If a master suffers a split brain and loses its connection to the other slaves, the two configurations above ensure that when it cannot continue sending data to the specified number of slaves, and a slave has not sent it an ack for more than 10 seconds, it rejects client write requests directly. In a split-brain scenario, therefore, at most 10 seconds of data are lost.

What is the difference between cache penetration, cache breakdown, and cache avalanche, and how to solve them?

Reference answer

Cache penetration:

Problem Description:

A malicious request for data that does not exist misses the cache, and the subsequent database query returns no result, so nothing is written back to the cache. Every such request therefore falls through to the database; this is cache penetration.

solution:

  1. Cache empty objects: after the storage layer misses, still store the null value in the cache layer; when the client requests the data again, the cache layer returns the null value directly.
  2. Bloom filter: record the existing keys in a Bloom filter and check it before accessing the cache. If the filter says the requested key does not exist, return a null value directly.
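A minimal Bloom filter sketch in Java (class name and hash derivation are hypothetical illustrations, not a production filter): a false answer is definitive, so the request can be rejected before touching the cache or database.

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;       // number of bits
    private final int hashCount;  // number of hash functions

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // derive the i-th bit position from two base hashes (Kirsch-Mitzenmacher scheme);
    // the second hash here is a simple rotation, chosen only for illustration
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | (h1 << 16);
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) bits.set(position(key, i));
    }

    // false = definitely absent (reject before cache/DB); true = possibly present
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }
}
```

False positives are possible by design, so a true answer still goes on to the cache and database as usual.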

Cache breakdown:

Problem Description:

A piece of hot data receives a very large number of accesses. At the moment its cache entry expires, a large number of requests go directly to the storage layer, which can bring the service down.

solution:

  1. Never expire: do not set an expiration time on hotspot data, so the problem above cannot occur; this is "physical" never-expiring. Alternatively, set a logical expiration time on each entry and use a separate thread to rebuild the cache when the data is found to be logically expired.
  2. Add a mutex lock: acquire a mutex when loading the data, so that while one thread queries the storage layer, other threads can only wait. After that thread rebuilds the cached value, the other threads read the value directly from the cache.
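The mutex approach can be sketched as follows (names are hypothetical; across multiple servers a distributed lock would be needed, while this sketch shows the single-process idea with a local lock):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class MutexCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ReentrantLock lock = new ReentrantLock();
    static int dbHits = 0;  // counts how often the storage layer is actually queried

    String get(String key, Supplier<String> loadFromDb) {
        String value = cache.get(key);
        if (value != null) return value;      // cache hit: no lock needed
        lock.lock();                           // miss: only one thread may rebuild
        try {
            value = cache.get(key);            // re-check after acquiring the lock
            if (value == null) {
                dbHits++;
                value = loadFromDb.get();      // the expensive storage-layer query
                cache.put(key, value);
            }
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```

The re-check after acquiring the lock is the key point: threads that were waiting find the value already rebuilt and never reach the storage layer.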

Cache avalanche:

Problem Description:

A large portion of the cache becomes invalid at the same time, so subsequent requests all fall on the database, which collapses under the huge number of requests arriving in a short period. This may happen because a large amount of cached data expires simultaneously, or because a Redis node fails and a large number of requests cannot be served.

solution:

  1. Avoid simultaneous expiration: when setting expiration times, add a random offset so that large numbers of keys do not expire at the same moment.
  2. Enable downgrade and circuit-breaker measures: when an avalanche occurs, requests for anything other than core data directly return predefined information, null values, or error messages. Alternatively, when an avalanche occurs, the client does not send cache requests to Redis at all, but returns immediately.
  3. Build a highly available Redis service: Use sentinel or cluster mode to deploy multiple Redis instances. Even if individual nodes go down, the overall availability of the service can still be maintained.
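Point 1 above (randomized expiration) is essentially a one-liner; a sketch with hypothetical names:

```java
import java.util.concurrent.ThreadLocalRandom;

public class JitterTtl {
    // base TTL plus a random jitter so keys written together do not expire together
    static long ttlSeconds(long baseSeconds, long maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(maxJitterSeconds + 1);
    }
}
```

For example, giving each key a base TTL of 3600 s plus up to 300 s of jitter spreads a batch's expirations over a five-minute window instead of one instant.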

How to ensure double-write consistency between cache and database?

The difference between local cache and redis cache

  1. In terms of read/write speed, ignoring concurrency issues, the local cache is naturally the fastest. But if the local cache is not locked, what happens under concurrency? So we compare again in locking mode.
  2. In this scenario, the same data taken from the database is put into redis only once, but it must be put into the local cache n times, once per node in the cluster.
  3. A local cache cannot guard against repeated submissions, because repeated requests may be distributed across multiple servers; it only prevents repeats on the local machine. Redis can prevent them, provided the interval between requests exceeds redis's read-write latency.
  4. Redis memory can be expanded many times over, while expanding local heap memory is very costly.
  5. A local cache must implement the expiration function itself, and a poor implementation can have extremely serious consequences. Redis, by contrast, has been validated under heavy traffic, and most pitfalls have already been ironed out, so it is safe.
  6. Local cache cannot provide rich data structures, while Redis has many convenient ones, such as hash, set, list, zset, and so on.
  7. Redis can write to disk and persist its data, so the cached data can still be used after the program restarts .
  8. Developers' skill levels vary greatly; using a local cache can easily cause serious thread-safety issues, and concurrency must be considered very carefully.
  9. Adding a local cache sharply increases code complexity, and it is hard for later developers to grasp the original design at a glance, which indirectly increases maintenance costs.
  10. In practice, the few milliseconds saved between a local map and redis may amount to nothing in the messy code we write, so sometimes there is really no need to fight over those few milliseconds!

Please introduce the expiration strategy of Redis

Redis supports the following two expiration strategies:

Lazy deletion: When the client accesses a key, Redis will first check its expiration time and delete the key immediately if it is found to be expired.

Regular deletion: Redis will put the key with an expiration time set into a separate dictionary, and perform expiration scans on the dictionary 10 times per second.

**Expiration scanning does not traverse all keys in the dictionary, but uses a simple greedy strategy.** The deletion logic of this policy is as follows:

  1. Randomly select 20 keys from the expired dictionary;
  2. Delete the expired keys among these 20 keys;
  3. If the proportion of expired keys exceeds 25%, repeat step 1.
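The three steps above can be simulated in plain Java (a toy model with hypothetical names, operating on a map of key to expiry timestamp rather than real Redis internals):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExpireScan {
    // One run of the greedy strategy over a map of key -> expiry timestamp (ms).
    // Repeats the 20-key sampling round while more than 25% of the sample is expired.
    static void scan(Map<String, Long> expires, long now) {
        while (true) {
            List<String> keys = new ArrayList<>(expires.keySet());
            if (keys.isEmpty()) return;
            Collections.shuffle(keys);                       // step 1: random sample
            List<String> sample = keys.subList(0, Math.min(20, keys.size()));
            int expired = 0;
            for (String k : sample) {
                if (expires.get(k) <= now) {                 // step 2: delete expired keys
                    expires.remove(k);
                    expired++;
                }
            }
            if (expired * 4 <= sample.size()) return;        // step 3: stop at <= 25%
        }
    }

    // helper for experimenting: build a dictionary of totalKeys entries that all
    // expire at `expiry`, run one scan, and report how many keys survive
    static int remainingAfterScan(int totalKeys, long expiry, long now) {
        Map<String, Long> expires = new HashMap<>();
        for (int i = 0; i < totalKeys; i++) expires.put("key" + i, expiry);
        scan(expires, now);
        return expires.size();
    }
}
```

Note how the loop keeps sampling until the expired ratio drops, which is why a dictionary full of expired keys is drained completely while a fresh one is left untouched.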

In addition, Redis has a configurable memory eviction policy, with six main options:

  • volatile-lru : Select the least recently used data from the data set (server.db[i].expires) with an expiration time set for elimination.
  • volatile-ttl : Select the data that will expire from the data set (server.db[i].expires) that has set expiration time and eliminate it.
  • volatile-random : arbitrarily select data for elimination from the data set (server.db[i].expires) with an expiration time set
  • allkeys-lru : Select the least recently used data from the data set (server.db[i].dict) to eliminate
  • allkeys-random : arbitrarily select data for elimination from the data set (server.db[i].dict)
  • noeviction : evicting data is forbidden. When memory exceeds maxmemory, new write operations report an error, but delete and read requests can continue.
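The eviction policy is chosen in redis.conf together with the memory limit; a sketch with illustrative values:

```conf
maxmemory 2gb                  # start evicting once memory use exceeds this limit
maxmemory-policy allkeys-lru   # one of the six policies listed above
```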


redis watch command

Many times, it is necessary to ensure that the data in the transaction has not been modified by other clients before executing the transaction. Redis provides the watch command to solve this type of problem, which is an optimistic locking mechanism. The client requests the server to monitor one or more keys through the watch command. If these keys change before the client executes the transaction, the server will refuse to execute the transaction submitted by the client and return a null value to it.
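The watch mechanism is essentially optimistic locking by version. A toy in-memory simulation in Java (this models the semantics only; it is not the Redis or Jedis API, and all names are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WatchSimulation {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final Map<String, Long> versions = new ConcurrentHashMap<>();

    void set(String key, String value) {
        store.put(key, value);
        versions.merge(key, 1L, Long::sum);  // every write bumps the key's version
    }

    String get(String key) { return store.get(key); }

    // WATCH: remember the version of the key as seen at watch time
    long watch(String key) { return versions.getOrDefault(key, 0L); }

    // EXEC: the queued write succeeds only if no one touched the key in between
    synchronized boolean execSet(String key, long watchedVersion, String newValue) {
        if (versions.getOrDefault(key, 0L) != watchedVersion) return false; // aborted
        set(key, newValue);
        return true;
    }
}
```

A transaction that finds its watched key modified simply fails, exactly as the server refuses the client's transaction in the description above; the client then retries from the watch step.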

Common operations of redis list

  • lpush/rpush: add data from the left/right side of the list;
  • lrange: Specify the index range and return data within this range;
  • lindex: Returns the data at the specified index;
  • lpop/rpop: Pop a data from the left/right side of the list;
  • blpop/brpop: Pop a data from the left/right side of the list. If the list is empty, it will enter the blocking state.
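As an in-process analogy, a Java Deque mirrors the semantics of these commands (a local illustration only, not Redis itself; method and class names are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ListOpsDemo {
    static final Deque<String> list = new ArrayDeque<>();

    static void lpush(String v) { list.addFirst(v); }   // LPUSH: add on the left
    static void rpush(String v) { list.addLast(v); }    // RPUSH: add on the right
    static String lpop() { return list.pollFirst(); }   // LPOP: pop from the left
    static String rpop() { return list.pollLast(); }    // RPOP: pop from the right

    static String lindex(int index) {                   // LINDEX: value at an index
        int i = 0;
        for (String v : list) {
            if (i++ == index) return v;
        }
        return null;
    }
}
```

The blocking behaviour of blpop/brpop has no Deque analogue here; in the JDK, a BlockingDeque's takeFirst/takeLast is the closest match.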


Origin blog.csdn.net/qq_43167873/article/details/130448265