6. Redis

6.1 Introduction to Redis

Simply put, Redis is a database, but unlike traditional databases, Redis keeps its data in memory, so reads and writes are very fast; this is why Redis is widely used as a cache. Redis is also often used for distributed locks. It provides a variety of data types to support different business scenarios, and it supports transactions, persistence, Lua scripts, LRU-driven eviction, and multiple clustering solutions.


6.2 Why use Redis / Why use cache

This problem is mainly viewed from two points: "high performance" and "high concurrency".

1. High performance

Suppose a user accesses some data in the database for the first time. This is slow because the data is read from disk. If we store the data the user accessed in the cache, the next access can be served directly from the cache. Operating on the cache means operating directly on memory, so it is quite fast. When the corresponding data in the database changes, the cached copy can be updated synchronously!
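The read-then-fill flow above is the cache-aside pattern. A minimal sketch, using plain dicts as stand-ins for Redis and the database (not a real client):

```python
# Cache-aside pattern sketched with a plain dict standing in for Redis
# and another dict standing in for the database.

cache = {}                      # stands in for Redis (memory)
database = {"user:1": "Alice"}  # stands in for MySQL (disk)

def read(key):
    """Return the value, filling the cache on a miss."""
    if key in cache:            # cache hit: no database access
        return cache[key]
    value = database.get(key)   # cache miss: fall back to the database
    if value is not None:
        cache[key] = value      # populate the cache for the next read
    return value

def update(key, value):
    """Write to the database and keep the cache in sync."""
    database[key] = value
    cache[key] = value          # alternatively: cache.pop(key, None) to invalidate
```

The second read of the same key never touches the "database" dict, which is exactly the speedup the paragraph describes.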


2. High concurrency

The cache can withstand far more requests than direct access to the database, so we can consider moving part of the data from the database into the cache; then part of the users' requests will hit the cache directly without touching the database.



6.3 Why use Redis instead of map/guava for caching?

Depending on whether the cache lives in the same process as the application, caches can be divided into local caches and distributed caches. A local cache stores data in the memory space of the application process itself, so reads and writes complete within that process. A distributed cache is a separately deployed process, generally on a different machine from the application, so its reads and writes require transmitting data over the network.

Disadvantages of local cache:

  • Fast access, but cannot hold large data volumes
    The advantage of a local cache over a distributed cache is that no network transmission is needed, so performance is better. However, it occupies the memory space of the application process (for example, the JVM heap of a Java process), so it cannot store large volumes of data.

  • Cluster data update problem
    A local cache can generally only be accessed by its own application process, not by other processes. Therefore, when the application is deployed as a cluster and the underlying database data is updated, the local caches on all deployment nodes must be updated in sync to keep the data consistent, which is relatively complex and error-prone. For example, a Redis-based publish/subscribe mechanism can be used to propagate updates to each deployment node.

  • Data is lost when the application process restarts
    Because locally cached data lives in the memory space of the application process, it is lost whenever the process restarts. Data that needs to be persisted must therefore be saved in time, otherwise it may be lost.

Usage scenarios:

  • A local cache is generally suitable for read-only data, such as statistics, or for data that is independent per deployment node, such as a persistent-connection service: each node maintains different connections, the data of each connection is independent, and it is deleted when the connection closes.

  • If data needs to be shared and kept consistent across the deployment nodes of a cluster, a distributed cache should be used for unified storage, so that every application process in the cluster can access the same data.

Taking Java as an example, using the built-in Map or Guava to implement a local cache is lightweight and fast, and its life cycle ends when the JVM is destroyed; with multiple instances, each instance must keep its own copy of the cache, and those copies are not consistent with each other.

Using Redis or Memcached is called a distributed cache: with multiple instances, all instances share one copy of the cached data, so the cache is consistent. The disadvantage is that the Redis or Memcached service must be kept highly available, and the overall architecture is more complex.


6.4 The difference between Redis and Memcached

  • Redis supports richer data types (and thus more complex application scenarios) : besides simple k/v data, Redis provides list, set, zset, hash and other data structures; Memcached only supports the simple String type.

  • Redis supports data persistence : it can save in-memory data to disk and reload it after a restart; Memcached does not support persistence and keeps all data in memory.

  • Cluster mode : Memcached has no native cluster mode and relies on the client to shard data across nodes; Redis supports cluster mode natively.

  • Memcached uses a multi-threaded, non-blocking IO multiplexing network model; Redis uses a single-threaded IO multiplexing model.


Supplement: relational and non-relational databases

  • Relational database : a database that organizes data with the relational model, which is, simply put, a two-dimensional table model. Main representatives: SQL Server, Oracle, MySQL, PostgreSQL.

    • Advantages : (1) Two-dimensional tables are easy to understand; (2) standard SQL is easy to use; (3) the database has ACID properties and is easy to maintain;
    • Disadvantages : (1) low efficiency under high concurrency; (2) horizontal scaling is relatively difficult;
  • Non-relational database : mainly refers to non-relational, distributed data storage systems that generally do not guarantee ACID. Main representatives: MongoDB, Redis, CouchDB.

    • Advantages :
      (1) Key-value databases for high-performance concurrent reads and writes. Their main feature is extremely high concurrent read/write performance; examples: Redis, Tokyo Cabinet.
      (2) Document-oriented databases for massive data access, characterized by fast queries over massive data sets; examples: MongoDB, CouchDB;
      (3) Scalable distributed databases, which mainly address the scalability shortcomings of traditional databases.
    • Disadvantages : Because NoSQL imposes fewer constraints, it cannot offer field-level WHERE queries the way SQL does, so it is suitable for storing relatively simple data. Some NoSQL stores cannot persist data, so they need to be combined with relational databases.

6.5 Redis common data structure and usage scenario analysis

1. String

Common commands: set, get, decr, incr, mget, etc.

The String data structure is a simple key-value type; the value can be not only a string but also a number. Typical uses: conventional key-value caching; conventional counting, such as the number of Weibo posts or the number of followers.

2. Hash

Common commands: hget, hset, hgetall, etc.

Hash is a mapping table from string fields to values. Hash is especially suitable for storing objects: in later operations you can directly modify just one field's value. For example, user information or product information can be stored in a Hash data structure.
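A behavioral sketch of the hash type, using a dict of dicts as a stand-in (hset/hget/hgetall mirror the HSET/HGET/HGETALL commands, not a real client):

```python
# A Redis hash maps one key to many field/value pairs.
# Sketched with a dict of dicts; not a real Redis client.

store = {}

def hset(key, field, value):
    store.setdefault(key, {})[field] = value

def hget(key, field):
    return store.get(key, {}).get(field)

def hgetall(key):
    return dict(store.get(key, {}))

# Store a user object, then modify a single field without
# rewriting the whole object.
hset("user:1", "name", "Alice")
hset("user:1", "age", "30")
hset("user:1", "age", "31")   # update one field only
```

Updating one field touches only that field, which is the advantage over serializing the whole object into a String value.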


3. List

Common commands: lpush, rpush, lpop, rpop, lrange, etc.

List is a linked list. Redis lists have many application scenarios and are one of Redis's most important data structures. Features such as Weibo following lists, follower lists, and message lists can all be implemented with the Redis list structure.

The Redis list is implemented as a doubly linked list, which supports lookup and traversal in both directions and is convenient to operate, at the cost of some extra memory overhead.

In addition, the lrange command reads a range of elements starting from a given index, so paging queries can be implemented on top of a list. This is a great feature: Redis can provide simple, high-performance paging, including continuous pull-down pagination like Weibo's (loading one more page at a time).
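The index arithmetic behind list paging can be sketched like this (a Python list stands in for the Redis list; note that LRANGE's stop index is inclusive):

```python
# Paging on top of a list: LRANGE key start stop returns elements
# by index range, with stop inclusive.

feed = [f"post:{i}" for i in range(10)]   # newest-first timeline (stand-in)

def lrange(lst, start, stop):
    """Redis-style LRANGE: stop is inclusive."""
    return lst[start:stop + 1]

def page(lst, page_no, page_size):
    """Page n maps to the index range [n*size, n*size + size - 1]."""
    start = page_no * page_size
    return lrange(lst, start, start + page_size - 1)
```

Each pull-down simply requests the next page number; the last partial page returns whatever elements remain.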

4. Set

Common commands: sadd, spop, smembers, sunion, etc.

The set type provides functionality similar to list; the special feature is that a set automatically removes duplicates.

When you need to store a list of data without duplicates, set is a good choice. Set also provides an important operation for testing whether a member belongs to a collection, which list does not offer. Intersection, union, and difference operations can easily be implemented with sets.

For example, in a Weibo application, all the people a user follows can be stored in one set, and all of the user's followers in another. Redis can then easily implement features such as mutual follows, mutual followers, and shared interests; this is just computing an intersection. The specific command is:
sinterstore key1 key2 key3     # store the intersection of key2 and key3 in key1
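The "common follows" computation can be sketched with Python's own set type, whose `&` operator mirrors SINTER, with storing the result mirroring SINTERSTORE (the dicts are stand-ins, not a real client):

```python
# Common follows = intersection of two follow sets.

follows = {
    "user:A": {"alice", "bob", "carol"},
    "user:B": {"bob", "carol", "dave"},
}

def sinterstore(dest, key1, key2, db):
    """Store the intersection of key1 and key2 in dest; return its size,
    as the real SINTERSTORE does."""
    db[dest] = db[key1] & db[key2]
    return len(db[dest])
```

Union (SUNION) and difference (SDIFF) follow the same shape with `|` and `-`.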

5. Sorted Set

Common commands: zadd, zrange, zrem, zcard, etc.

Compared with set, sorted set adds a weight parameter score, so that the elements in the set can be sorted according to score.

For example, in a live-streaming system, real-time ranking information such as the online user list of a room, various gift leaderboards, and barrage messages (which can be viewed as a message ranking by message dimension) is well suited to storage in the SortedSet structure in Redis.
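A behavioral sketch of a gift leaderboard: zadd/zincrby/zrevrange mirror the ZADD/ZINCRBY/ZREVRANGE commands. (A real sorted set keeps members ordered via a skip list; sorting on read here is only a stand-in for the behavior.)

```python
# A ranking sketched as member -> score, sorted on read.

scores = {}

def zadd(member, score):
    scores[member] = score

def zincrby(member, delta):
    scores[member] = scores.get(member, 0) + delta

def zrevrange(start, stop):
    """Members ordered by score, highest first; stop is inclusive."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[start:stop + 1]

zadd("streamer:a", 50)
zadd("streamer:b", 80)
zincrby("streamer:a", 40)   # a gift bumps streamer:a's score to 90
```

Reading the top N of the leaderboard is then just `zrevrange(0, N - 1)`.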


6.6 Redis set expiration time

Redis can set an expiration time on a stored value, which is very practical for a cache database. For example, tokens, login information, and especially SMS verification codes in typical projects are all time-limited. With a traditional database you generally have to check expiration yourself, which undoubtedly hurts the project's performance.

When setting a key, we can give an expire time, i.e. an expiration time, which specifies how long the key may live.

Suppose you set a batch of keys that can only live for 1 hour. How does Redis delete this batch of keys once the hour is up?

Periodic deletion + lazy deletion.

  • Periodic deletion : By default, every 100ms Redis randomly samples some keys that have an expiration time set, checks whether they have expired, and deletes the expired ones. Note the random sampling. Why random? If Redis stored hundreds of thousands of keys and traversed every key with an expiration time every 100ms, it would put a heavy load on the CPU!

  • Lazy deletion : Periodic deletion can leave many expired keys undeleted past their time, so lazy deletion exists as well. If an expired key escapes periodic deletion, it stays in memory until your system actually accesses that key, at which point Redis deletes it. That is the so-called lazy deletion: lazy indeed!
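Both strategies can be sketched together. The dict of `key -> (value, expire_at)` is a hypothetical stand-in; real Redis runs the periodic pass inside its server cron and the lazy check on every key access.

```python
import random

# Periodic + lazy expiration over a dict of key -> (value, expire_at).

store = {}

def set_with_ttl(key, value, ttl, now):
    store[key] = (value, now + ttl)

def periodic_delete(now, sample_size=20):
    """Periodic deletion: check a random SAMPLE of keys (not all of
    them, to bound CPU cost) and delete the expired ones."""
    keys = random.sample(list(store), min(sample_size, len(store)))
    for k in keys:
        if store[k][1] <= now:
            del store[k]

def get(key, now):
    """Lazy deletion: an expired key is removed when it is accessed."""
    entry = store.get(key)
    if entry is None:
        return None
    value, expire_at = entry
    if expire_at <= now:
        del store[key]          # expired: delete on access
        return None
    return value
```

A key missed by the random sample lingers in memory until a `get` touches it, which is exactly the gap the next paragraph worries about.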

But just setting expiration times still has a problem. Think about it: what if periodic deletion misses many expired keys, and you never access them, so lazy deletion never kicks in either? If a large number of expired keys accumulate in memory, Redis memory will be exhausted. How do we solve this? This is where the Redis memory eviction mechanism comes in.


6.7 Redis memory elimination mechanism (There are 2000w data in MySQL, and only 20w data in Redis. How to ensure that the data in Redis are all hot data?)

Redis provides 6 data eviction policies:

  1. volatile-lru : from the data set with an expiration time set (server.db[i].expires), evict the least recently used data
  2. volatile-ttl : from the data set with an expiration time set (server.db[i].expires), evict the data that is closest to expiring
  3. volatile-random : from the data set with an expiration time set (server.db[i].expires), evict data at random
  4. allkeys-lru : when memory is insufficient for newly written data, evict the least recently used key from the whole key space (this is the most commonly used)
  5. allkeys-random : evict data at random from the whole data set (server.db[i].dict)
  6. no-eviction : never evict; when memory cannot hold newly written data, new writes report an error. Hardly anyone should use this!
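The allkeys-lru idea (policy 4) can be sketched with an OrderedDict: touch a key and it moves to the end; when capacity is exceeded, evict from the front. (Real Redis approximates LRU by sampling keys rather than maintaining an exact recency list.)

```python
from collections import OrderedDict

# Exact LRU sketch of the allkeys-lru policy.

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # front = least recently used

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
```

This is also the answer to the interview question in the heading: with allkeys-lru, the 20w keys that survive in Redis are precisely the most recently used ones, i.e. the hot data.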

6.8 Redis persistence mechanism (how to ensure that the data can be restored after Redis is hung up and restarted)

1. Snapshot (snapshotting) persistence (RDB)

The RDB method persists the Redis data at a moment in time to disk; it is a snapshot-style persistence method. During persistence, Redis first writes the data to a temporary file, and when the process finishes, the temporary file replaces the previous persisted file. This is what lets us back up at any time: the snapshot file is always complete.

For RDB, Redis forks a separate child process to do the persistence, and the main process performs no disk IO for it, which preserves Redis's high performance.

Snapshot persistence is the persistence method Redis adopts by default, configured in the redis.conf configuration file.
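For reference, the classic snapshot settings in redis.conf look like the following (exact values vary by Redis version; treat these as illustrative defaults):

```
# save <seconds> <changes>: take a snapshot if at least <changes>
# keys changed within <seconds>.
save 900 1        # after 900 s if at least 1 key changed
save 300 10       # after 300 s if at least 10 keys changed
save 60 10000     # after 60 s if at least 10000 keys changed
dbfilename dump.rdb
```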

2. AOF (append-only file) persistence

AOF persistence records every write operation in a log (the appendonly.aof file): all write commands executed by Redis are recorded (read operations are not), and the file may only be appended to, never overwritten. When Redis starts, it reads this file to rebuild the data. In other words, after a restart, Redis replays the write commands from front to back according to the log file to complete data recovery.

The default AOF persistence strategy is to fsync once per second (fsync means flushing the buffered write commands to disk). In this mode Redis still maintains good processing performance, and even if Redis fails, only the last 1 second of data is lost at most.

Because of the append-only approach, the AOF file would grow without bound if nothing were done. Redis therefore provides an AOF rewrite mechanism: when the AOF file size exceeds a set threshold, Redis compacts the file's contents, keeping only the minimal set of commands that can rebuild the data. For a vivid example, if we call INCR 100 times, the AOF file stores 100 commands, which is obviously inefficient; these 100 commands can be merged into a single SET command. That is the principle of the rewrite mechanism.
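The INCR-collapsing idea can be sketched as "replay the log into final state, then emit the minimal commands that rebuild it". This is only a toy model: the real rewrite works from the in-memory dataset, not by re-reading the old file.

```python
# AOF rewrite sketch over a log of ("SET", key, value) / ("INCR", key).

def replay(commands):
    """Apply a command log to an empty dict and return the final state."""
    state = {}
    for cmd in commands:
        if cmd[0] == "SET":
            state[cmd[1]] = cmd[2]
        elif cmd[0] == "INCR":
            state[cmd[1]] = int(state.get(cmd[1], 0)) + 1
    return state

def rewrite(commands):
    """Emit the minimal command set: one SET per key with its final value.
    100 INCRs collapse into a single SET."""
    return [("SET", k, v) for k, v in replay(commands).items()]

log = [("INCR", "counter")] * 100
```

The rewritten log is equivalent: replaying it produces exactly the same state as replaying the original 100 commands.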

3. The principle of AOF rewrite mechanism

  1. When the rewrite is about to begin, Redis forks a "rewrite child process". This child first reads the existing AOF file, then analyzes and compacts the commands it contains and writes them to a temporary file.

  2. Meanwhile, the main process accumulates newly received write commands in a memory buffer while continuing to append them to the original AOF file, so that the original AOF file remains usable if anything goes wrong during the rewrite.

  3. When the "rewrite child process" completes the rewriting work, it will send a signal to the parent process, and the parent process will append the write instructions cached in the memory to the new AOF file after receiving the signal.

  4. When the append finishes, Redis replaces the old AOF file with the new one, and any subsequent write commands are appended to the new AOF file.

4. Summary of RDB and AOF

RDB:

  • Advantages : For large-scale data recovery where perfect data integrity is not required, RDB is more efficient.
  • Disadvantages : RDB persists periodically, so the data written between two snapshots may be lost, and the loss can be large (for example, all data from the last 5 minutes).

AOF:

  • Advantages : AOF has better real-time behavior; if something goes wrong, at most 1s of data is lost;
  • Disadvantages : For the same data set, the AOF file is larger than the RDB file, and recovery from AOF is also slower than from RDB.

5. Hybrid persistence of RDB and AOF

Redis 4.0 began to support hybrid persistence of RDB and AOF.

With hybrid persistence turned on, when the AOF is rewritten, the first half of the new AOF file holds full data in RDB format and the second half holds incremental data in AOF format. This combines the advantages of RDB and AOF: fast loading while avoiding excessive data loss.

During data recovery, Redis still loads the AOF file first at startup. There are two cases when loading it:

  • If the aof file begins with the RDB format, load the RDB content first and then the remaining aof portion.

  • If the aof file does not begin with the RDB format, load the entire file directly in aof format.


6.9 Redis transaction

Redis implements transactions through the MULTI, EXEC, and WATCH commands. A transaction packages multiple command requests and then executes them all sequentially in one go. While a transaction is executing, the server will not interrupt it to serve other clients' command requests; it finishes all the commands in the transaction first, and only then processes other clients' requests.
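The queue-then-execute behavior of MULTI/EXEC can be sketched like this. It is a toy model (a real client pipelines the queued commands to the server; the class and method names here are illustrative):

```python
# MULTI/EXEC sketch: commands issued inside a transaction are queued,
# not run; EXEC runs the whole queue back to back, nothing interleaved.

class Tx:
    def __init__(self, store):
        self.store = store
        self.queue = None        # None = not inside a transaction

    def multi(self):
        self.queue = []          # start queueing

    def set(self, key, value):
        if self.queue is not None:
            self.queue.append(("SET", key, value))
            return "QUEUED"      # real Redis replies QUEUED inside MULTI
        self.store[key] = value
        return "OK"

    def execute(self):
        """EXEC: run every queued command in order, then clear the queue."""
        results = []
        for _, key, value in self.queue:
            self.store[key] = value
            results.append("OK")
        self.queue = None
        return results
```

Note that nothing takes effect between MULTI and EXEC; the store is only modified when `execute()` replays the queue.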

In traditional relational databases, ACID properties are often used to test the reliability and security of transaction functions. In Redis, transactions always have Atomicity, Consistency, and Isolation, and when Redis runs in a specific persistence mode, transactions also have Durability.


6.10 Cache avalanche and cache penetration problem solutions

1. Cache avalanche

A cache avalanche happens when a large number of cached keys expire at the same moment, so a flood of requests goes straight to the database, which crashes under the sudden load.

Solution:

  • 1) In advance

    • Spread out expirations : set different expiration times so that cache invalidation is spread as evenly as possible, avoiding an avalanche caused by many keys expiring at once and a burst of database access.
    • Multi-level cache : when the first-level cache misses, fall back to the second-level cache; each cache level has a different invalidation time.
    • Never expire hot data.
    • Ensure the high availability of the Redis cache to prevent a Redis outage from causing an avalanche; master-slave + sentinel or Redis Cluster can be used to avoid a Redis crash.
  • 2) In the event

    • Mutex lock : after a cache entry expires, control the number of threads that read the database and rebuild the cache with a mutex or queue. For example, allow only one thread per key to query the database and write the cache while the other threads wait. This blocks the other threads, so system throughput drops.
    • Circuit breaking and rate limiting : when traffic reaches a certain threshold, directly return a message such as "system busy" to stop excess requests from hammering and destroying the database. At least some users remain able to use the system normally, and the others can get results after a few refreshes.
  • 3) Afterwards

    • Turn on the Redis persistence mechanism so cached data can be restored as soon as possible: once Redis restarts, it automatically loads the data from disk into memory.


2. Cache penetration

Cache penetration means the data a user requests exists neither in the cache (no hit) nor in the database, so every such request ends up querying the database. If a malicious attacker keeps requesting data that does not exist in the system, a large number of requests will land on the database in a short time, overloading it and possibly crashing it.


Solution:

  • 1) Store the invalid key in Redis :
    When neither Redis nor the database can find the data, we still save the key in Redis with value="null" and an extremely short expiration time. Later requests for this key return null directly, with no need to query the database. This approach has a flaw, though: if the non-existent keys are random on every request, storing them in Redis is pointless.

  • 2) Use a bloom filter :
    If a bloom filter says a key is absent, it is definitely absent; if it says a key is present, it is probably present (there is some false-positive rate). So we can put a bloom filter in front of the cache: load all keys in the database into the bloom filter, and before querying Redis, check whether the key exists in the filter. If it does not, return directly and never touch the database, avoiding query pressure on the underlying storage system.
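Solution 1) can be sketched with a sentinel "null" marker and a short TTL; the dicts stand in for Redis and the database, and the TTL values are illustrative:

```python
import time

# Cache a null marker briefly for keys missing from the database.

NULL = object()                 # sentinel standing in for value="null"
cache = {}                      # key -> (value, expire_at)
database = {"user:1": "Alice"}

def get(key, null_ttl=60, ttl=3600):
    now = time.time()
    entry = cache.get(key)
    if entry and entry[1] > now:
        value = entry[0]
        return None if value is NULL else value   # null marker short-circuits
    value = database.get(key)
    if value is None:
        cache[key] = (NULL, now + null_ttl)       # remember the miss, briefly
        return None
    cache[key] = (value, now + ttl)
    return value
```

The extremely short `null_ttl` keeps junk markers from lingering, but as the text notes, random never-repeated keys defeat this scheme.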
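For solution 2), here is a minimal Bloom filter: k hash functions set k bits per key, and a key whose bits are not all set was definitely never added. The parameters m and k below are toy values; real deployments size them from the expected key count and acceptable false-positive rate.

```python
import hashlib

# Minimal Bloom filter using a Python int as the bit array.

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0            # m-bit array packed into an int

    def _positions(self, key):
        """Derive k bit positions for a key from salted SHA-256 hashes."""
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        """False = definitely absent; True = probably present."""
        return all(self.bits >> pos & 1 for pos in self._positions(key))
```

Placed in front of the cache, a `might_contain(...) == False` result lets the request be rejected before it ever reaches Redis or the database.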


6.11 How to Solve the Key Problem of Concurrent Competition in Redis

The so-called concurrent key competition problem in Redis is that multiple systems operate on the same key at the same time, but the actual execution order differs from the order we expect, leading to a wrong result!

A recommended solution: distributed locks (both ZooKeeper and Redis can implement them). (If there is no concurrent key competition problem, do not use a distributed lock, as it hurts performance.)

A distributed lock can be implemented with ZooKeeper ephemeral sequential nodes. The general idea: when a client locks a method, it creates a unique ephemeral sequential node under the directory of the node designated for that method on ZooKeeper. Determining whether the lock has been acquired is simple: just check whether your node has the smallest sequence number among the sequential nodes. Releasing the lock only requires deleting that ephemeral node; this also avoids deadlocks caused by a service crashing before it can release a lock. After the business logic completes, delete the corresponding child node to release the lock.
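The Redis-based alternative mentioned above usually relies on `SET key value NX PX ttl`. A behavioral sketch with a dict standing in for Redis (function names are illustrative): acquire succeeds only if the key is absent, and release deletes only if the caller's random token still matches.

```python
import uuid

# Redis-style lock sketch: NX acquire + token-checked release.

store = {}

def acquire(lock_name):
    """SET lock_name token NX: succeed only if the key does not exist.
    Returns the token on success, None on failure."""
    token = str(uuid.uuid4())        # unique per holder
    if lock_name in store:           # NX: fail if the key already exists
        return None
    store[lock_name] = token
    return token

def release(lock_name, token):
    """Delete only if we still hold the lock (compare-then-delete),
    so one client cannot release another client's lock."""
    if store.get(lock_name) == token:
        del store[lock_name]
        return True
    return False
```

In real Redis the compare-then-delete in `release` must be done atomically (typically with a small Lua script), and the PX expiry guards against a crashed holder leaving the lock stuck forever.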

In practice, reliability comes first, so ZooKeeper is usually the first choice.


6.12 How to ensure data consistency between the cache and the database when it is double-written?

Juejin: https://juejin.cn/post/6850418121754050567
Zhihu: https://zhuanlan.zhihu.com/p/59167071
