[Microservices] Microservices Common Interview Questions

1. Microservices

1.1. What are the common components of Spring Cloud?

Description of the problem: examines basic understanding of the common Spring Cloud components

Sample answer:

Spring Cloud contains many components, and several of them overlap in functionality. The most commonly used include:

• Registry components: Eureka, Nacos, etc.

• Load balancing component: Ribbon

• Remote call component: OpenFeign

• Gateway components: Zuul, Gateway

• Service protection components: Hystrix, Sentinel

• Service configuration management components: Spring Cloud Config, Nacos

• Distributed transaction component: Seata

1.2. What is the service registry structure of Nacos?

Description of the problem: examines understanding of the Nacos data hierarchy and familiarity with the Nacos source code

Sample answer:

Nacos adopts a hierarchical data storage model. The outermost layer is the Namespace, used to isolate environments. Next is the Group, used to group services. Then comes the Service itself; a service may have instances in different data centers, so a Service contains multiple Clusters, and each Cluster contains the individual Instances.

In the Java code, Nacos represents this with nested Maps. The structure is Map<String, Map<String, Service>>: the key of the outer Map is the namespaceId and the value is another Map, whose key is the group name concatenated with the serviceName and whose value is the Service object. Inside the Service object is another Map whose key is the cluster name and whose value is the Cluster object; the Cluster object in turn maintains the collection of Instances.
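
A minimal sketch of this hierarchy in Java (the class and field names here are simplified for illustration; the actual Nacos source is more elaborate):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class NacosRegistrySketch {
    // namespaceId -> (groupName + serviceName -> Service)
    Map<String, Map<String, Service>> serviceMap = new ConcurrentHashMap<>();

    static class Service {
        // cluster name -> Cluster
        Map<String, Cluster> clusterMap = new HashMap<>();
    }

    static class Cluster {
        // instances registered under this cluster
        Set<Instance> instances = new HashSet<>();
    }

    static class Instance {
        String ip;
        int port;
        boolean healthy;
    }
}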


1.3. How does Nacos support the pressure of hundreds of thousands of service registrations within Ali?

Description of the problem: examines familiarity with the Nacos source code

Sample answer:

When Nacos receives a registration request, it does not write the data immediately. Instead, it puts the registration task into a blocking queue and responds to the client right away. A thread pool then reads tasks from the blocking queue and completes the instance updates asynchronously, which greatly improves concurrent write throughput.
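
A rough sketch of this pattern in Java (hypothetical names, not the actual Nacos code):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncRegistry {
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    AsyncRegistry() {
        // background worker drains the queue and applies updates asynchronously
        worker.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    tasks.take().run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // called by the registration endpoint: enqueue the task and return immediately
    void register(Runnable updateTask) {
        tasks.offer(updateTask);
    }
}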

1.4. How does Nacos avoid concurrent read and write conflicts?

Description of the problem: examines familiarity with the Nacos source code

Sample answer:

When Nacos updates an instance list, it uses a CopyOnWrite technique: it first copies the old instance list, updates the copy, and then replaces the old list with the updated copy.

In this way, requests that read the instance list are not affected during the update, and no dirty reads occur.
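
A minimal illustration of the idea (simplified, not the actual Nacos code):

import java.util.ArrayList;
import java.util.List;

class InstanceHolder {
    // readers always see a complete, consistent list
    private volatile List<String> instances = new ArrayList<>();

    List<String> read() {
        return instances; // no locking needed for readers
    }

    synchronized void update(String newInstance) {
        List<String> copy = new ArrayList<>(instances); // copy the old list
        copy.add(newInstance);                          // modify the copy
        instances = copy;                               // swap in with one volatile write
    }
}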

1.5. What are the differences between Nacos and Eureka?

Description of the problem: examines understanding of the underlying implementations of Nacos and Eureka

Sample answer:

Nacos and Eureka have both similarities and differences, which can be described from the following points:

  • Interface style: both Nacos and Eureka expose REST-style APIs for service registration and discovery
  • Instance types: Nacos distinguishes permanent and temporary instances; Eureka only supports temporary instances
  • Health detection: Nacos uses heartbeat detection for temporary instances and actively probes permanent instances; Eureka only supports heartbeats
  • Service discovery: Nacos supports both periodic pull and subscription-based push; Eureka only supports periodic pull

1.6. What is the difference between Sentinel's rate limiting and Gateway's rate limiting?

Description of the problem: examines mastery of rate limiting algorithms

Sample answer:

There are three common rate limiting algorithms: sliding time window, token bucket, and leaky bucket. Gateway uses a Redis-based token bucket algorithm.

However, Sentinel is more sophisticated internally:

  • The default rate limiting mode is based on the sliding time window algorithm
  • The queuing (uniform throttling) mode is based on the leaky bucket algorithm
  • Hot-parameter rate limiting is based on the token bucket algorithm
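
For reference, a bare-bones token bucket sketch in Java (illustrative only; Gateway's real implementation is a Redis Lua script):

class TokenBucket {
    private final long capacity;       // max tokens the bucket can hold
    private final double refillPerMs;  // tokens added per millisecond
    private double tokens;
    private long lastRefill = System.currentTimeMillis();

    TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerMs = refillPerSecond / 1000.0;
        this.tokens = capacity;
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // top up the bucket in proportion to the time elapsed, capped at capacity
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMs);
        lastRefill = now;
        if (tokens >= 1) { // a request consumes one token
            tokens -= 1;
            return true;
        }
        return false;      // bucket empty: reject (rate limited)
    }
}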

1.7. What is the difference between Sentinel's thread isolation and Hystrix's thread isolation?

Description of the problem: examines mastery of thread isolation schemes

Sample answer:

By default, Hystrix implements thread isolation with thread pools: each isolated business needs its own thread pool, and too many threads bring extra CPU overhead. Performance is mediocre, but the isolation is stronger.

Sentinel implements thread isolation with a semaphore (counter). It does not need to create thread pools, so it performs better, but the isolation is weaker.
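
A minimal illustration of semaphore-based isolation (a sketch of the idea, not Sentinel's actual code):

import java.util.concurrent.Semaphore;

class SemaphoreIsolation {
    // at most 10 concurrent calls into the protected resource
    private final Semaphore permits = new Semaphore(10);

    String call() {
        if (!permits.tryAcquire()) {
            return "rejected: too many concurrent calls"; // fast-fail instead of queueing
        }
        try {
            return doBusinessCall();
        } finally {
            permits.release();
        }
    }

    private String doBusinessCall() { return "ok"; }
}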

2. MQ

2.1. Why did you choose RabbitMQ instead of other MQs?


Sample answer:

Kafka is famous for its high throughput, but its reliability is average, and it only guarantees message order within a partition, not globally. We use it for log collection, while RabbitMQ is used in the business modules.

Alibaba's RocketMQ builds on Kafka's design, makes up for some of Kafka's shortcomings, and inherits its high-throughput advantage. Its clients are currently mainly Java. However, we were worried about the stability of Alibaba's open source products, so we did not choose it.

RabbitMQ is built on Erlang, a language designed for concurrency. Its throughput is not as high as Kafka's, but it is sufficient for us. Its message reliability is good, message latency is extremely low, and clusters are easy to set up. It supports multiple protocols and has clients in many languages, which makes it flexible. Spring's support for RabbitMQ is also good, making it convenient to use and a better fit for our needs.

Considering our requirements for concurrency and stability, we chose RabbitMQ.

2.2. How does RabbitMQ ensure that messages are not lost?

Sample answer:

RabbitMQ provides targeted solutions for each stage of message delivery where problems may occur:

  • When the producer sends a message, it may fail to reach the exchange due to network problems:
    • RabbitMQ provides the publisher confirm mechanism (a configuration sketch follows this list)
      • After sending a message, the producer can register a ConfirmCallback function
      • When the message successfully reaches the exchange, RabbitMQ calls the ConfirmCallback to notify the sender and returns an ACK
      • If the message does not reach the exchange, RabbitMQ also calls the ConfirmCallback and returns a NACK
      • An exception is also thrown if the message is not sent successfully before the timeout
  • After the message reaches the exchange, it may still fail to reach a queue and be lost:
    • RabbitMQ provides the publisher return mechanism
      • The producer can register a ReturnCallback function
      • When a message arrives at the exchange but cannot be routed to a queue, RabbitMQ calls the ReturnCallback to notify the sender of the failure reason
  • After the message arrives in a queue, MQ downtime may also cause it to be lost:
    • RabbitMQ provides persistence and cluster master-slave backup
      • Message persistence: RabbitMQ persists exchanges, queues, and messages to disk, so messages can be recovered after a restart
      • Both mirrored clusters and quorum queues provide master-slave backup; when the master node goes down, a slave node is automatically promoted to master, and the data is still available
  • After the message is delivered to a consumer, improper handling by the consumer may also lose it:
    • On top of RabbitMQ, Spring AMQP provides a consumer confirmation mechanism, a consumer retry mechanism, and consumer failure handling strategies:
      • Consumer confirmation mechanism:
        • If the consumer processes the message successfully and no exception occurs, Spring returns an ACK to RabbitMQ and the message is removed
        • If the consumer fails to process the message (throws an exception or crashes), Spring returns a NACK or nothing at all, and the message is not removed
      • Consumer retry mechanism:
        • By default, when a consumer fails to process a message, the message goes back into the MQ queue and is delivered to another consumer. With Spring's consumer retry mechanism, no NACK is returned on failure; instead, the message is retried locally on the consumer, and only after several failed retries is it handled according to the failure strategy. This avoids the extra pressure of messages repeatedly re-entering the queue.
      • Consumer failure strategies:
        • When a consumer exhausts its local retries, the message is discarded by default.
        • Spring also provides a republish strategy: after the retries are exhausted, the message is redelivered to a designated error exchange, carrying the exception stack trace to help locate the problem.
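
As an illustration of the publisher confirm and return mechanisms, here is a minimal Spring AMQP configuration sketch (assumes Spring AMQP 2.3+, with spring.rabbitmq.publisher-confirm-type=correlated and spring.rabbitmq.publisher-returns=true set in the application properties):

import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PublisherConfirmConfig {

    public PublisherConfirmConfig(RabbitTemplate rabbitTemplate) {
        // invoked when the broker reports whether the message reached an exchange
        rabbitTemplate.setConfirmCallback((correlationData, ack, cause) -> {
            if (!ack) {
                // NACK: the message never reached the exchange; log and compensate
                System.err.println("Message not confirmed: " + cause);
            }
        });
        // invoked when the message reached the exchange but could not be routed to any queue
        rabbitTemplate.setReturnsCallback(returned ->
                System.err.println("Message returned: " + returned.getReplyText()));
    }
}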

2.3. How does RabbitMQ avoid message accumulation?

Sample answer:

Message accumulation happens when messages are produced faster than consumers can process them. The solutions therefore come down to three points:

  • Improve consumer processing speed
  • Add more consumers
  • Increase the queue's message storage limit

1) Improve consumer processing speed

The processing speed of consumers is determined by the business code, so what we can do includes:

  • Optimize the business code as much as possible to improve performance
  • After receiving a message, use a thread pool to process multiple messages concurrently

Advantages: low cost, only code changes are needed

Disadvantages: a thread pool brings extra performance overhead, so this is not suitable for high-frequency, low-latency tasks; it is recommended for tasks with a long execution time.

2) Add more consumers

A queue binds multiple consumers to compete for tasks together, which can naturally increase the speed of message processing.

Advantages: simple and crude; a problem that can be solved by spending money is not really a problem

Disadvantages: the problem is when there is no money; the cost is high

3) Increase the upper limit of queue message storage

Since version 3.6, RabbitMQ has offered a new queue mode: the Lazy Queue

This kind of queue does not keep messages in memory; it writes them to disk as soon as they are received, so there is theoretically no storage limit. This solves the message accumulation problem.

Advantages: safer disk-based storage; practically unlimited capacity; avoids the page-out problems of in-memory storage, giving more stable performance

Disadvantages: limited by disk IO performance, so message latency is higher than in-memory mode, though the impact is usually not significant.
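
For example, a lazy queue can be declared in Spring AMQP by setting the x-queue-mode argument (a sketch; the queue name is arbitrary):

import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.QueueBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LazyQueueConfig {

    @Bean
    public Queue lazyQueue() {
        // "x-queue-mode" = "lazy" makes RabbitMQ write messages straight to disk
        return QueueBuilder.durable("lazy.queue")
                .withArgument("x-queue-mode", "lazy")
                .build();
    }
}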

2.4. How does RabbitMQ guarantee the order of messages?

Sample answer:

In fact, RabbitMQ stores messages in queues, which are naturally first-in, first-out. As long as messages are sent in order, they are theoretically received in order. However, when multiple consumers are bound to one queue, messages are dispatched to the consumers round-robin, and the processing order across consumers can no longer be guaranteed.

Therefore, to ensure the order of messages, the following points need to be done:

  • Guarantee the order of message sending
  • Ensure that a set of ordered messages are sent to the same queue
  • Ensure that a queue contains only one consumer

2.5. How to prevent repeated consumption of MQ messages?

Sample answer:

The causes of duplicate message consumption are various and unavoidable, so we can only address it on the consumer side: as long as message processing is idempotent, duplicate consumption does no harm.

There are several ways to guarantee idempotence (a sketch of the first approach follows this list):

  • Give each message a unique id, record processed messages in a local message table together with a status field, and use the table's unique id to decide whether a message has already been handled
  • Similarly record a message table, but use the message status field with optimistic locking to guarantee idempotence
  • Rely on the idempotence of the business itself. For example, deletion by id and query operations are naturally idempotent; insert and update operations can rely on database unique constraints or an optimistic locking mechanism. The essence is similar to the message table approach.
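
A minimal sketch of the unique-id approach (the table name msg_record and its unique msg_id primary key are hypothetical):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class IdempotentConsumer {

    // returns true only the first time this msgId is recorded
    boolean markProcessed(Connection conn, String msgId) {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO msg_record (msg_id, status) VALUES (?, 'DONE')")) {
            ps.setString(1, msgId);
            ps.executeUpdate();
            return true;                 // first insert succeeds
        } catch (SQLException duplicateKey) {
            return false;                // unique-key violation: already processed
        }
    }

    void onMessage(Connection conn, String msgId, String payload) {
        if (markProcessed(conn, msgId)) {
            // run the business logic exactly once for this message
        }
    }
}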

2.6. How to ensure the high availability of RabbitMQ?

Sample answer:

Achieving high availability for RabbitMQ comes down to two points:

  • Properly persist exchanges, queues, and messages
  • Build a mirrored RabbitMQ cluster with master-slave backup; alternatively, use quorum queues instead of mirrored queues

2.7. What problems can be solved by using MQ?

Sample answer:

RabbitMQ can solve many problems, such as:

  • Decoupling: changing synchronous calls between related microservices into MQ-based asynchronous notifications decouples the services from each other and also improves performance
  • Traffic peak shaving: sudden bursts of requests are buffered in MQ; the backend pulls messages at its own pace and processes them one by one, so the load curve becomes much smoother
  • Delayed queue: based on RabbitMQ's dead-letter queue or the DelayExchange plugin, a message can be consumed a set time after it is sent

3. Redis

3.1. What is the difference between Redis and Memcache?

  • Richer data types (supporting more complex application scenarios): Redis supports not only simple key/value data but also data structures such as list, set, zset, and hash; Memcached only supports simple strings.
  • Data persistence: Redis can persist in-memory data to disk and reload it on restart, while Memcached keeps all data in memory only.
  • Cluster mode: Memcached has no native cluster mode and relies on the client to shard data across nodes; Redis supports cluster mode natively.
  • Threading model: Memcached uses a multi-threaded, non-blocking IO multiplexing network model; Redis uses a single-threaded IO multiplexing model.


3.2. Redis single thread problem

Question: Redis is single-threaded; how does it support high concurrency?

Sample answer:

The main reasons why Redis is fast are:

  1. It is completely memory-based
  2. Its data structures are simple, and so are the operations on them
  3. It uses an I/O multiplexing model to make full use of CPU resources

Question: what are the benefits of single threading?

Sample answer:

The advantages of single threading are as follows:

  • The code is clearer and the processing logic simpler
  • No locks are needed, so there is no lock acquisition or release and no performance cost from locking
  • No CPU context-switching overhead from multiple processes or threads

3.3. What are the persistence schemes of Redis?

Relevant information:

1) RDB persistence

RDB persistence can be triggered with save or bgsave. To avoid blocking the main process, bgsave is generally used. The process:

  • The Redis process will fork a child process (consistent with the memory data of the parent process).
  • The parent process continues to process client request commands
  • Write all data in memory to a temporary RDB file by the child process.
  • After the write operation is complete, the old RDB file will be replaced by the new RDB file.

The following are some configurations related to RDB persistence:

  • save 60 10000: If 10,000 keys change within 60 seconds, perform RDB persistence.
  • stop-writes-on-bgsave-error yes: If Redis fails to perform RDB persistence (commonly due to insufficient memory in the operating system), Redis will no longer accept requests from clients to write data.
  • rdbcompression yes: When generating RDB files, compress them at the same time.
  • dbfilename dump.rdb: Name the RDB file dump.rdb.
  • dir /var/lib/redis: Save the RDB file in the /var/lib/redis directory.

Of course, in practice we usually set stop-writes-on-bgsave-error to no, and let the monitoring system raise an alarm when Redis fails to perform RDB persistence, so that the problem can be handled manually instead of bluntly rejecting client write requests.

Advantages of RDB persistence:

  • RDB persistent files are small, and Redis data recovery is fast
  • The child process does not affect the parent process, and the parent process can continue to process client commands
  • The copy-on-write method is adopted when the child process is forked. In most cases, there is not much memory consumption and the efficiency is relatively good.

Disadvantages of RDB persistence:

  • The fork uses copy-on-write, so if Redis receives many writes during the snapshot, extra memory may be consumed, possibly even causing memory overflow.
  • Compressing the RDB file reduces its size but costs extra CPU.
  • If the business scenario values data durability, RDB persistence should not be used. For example, if Redis performs RDB persistence every 5 minutes and crashes unexpectedly, up to 5 minutes of data will be lost.

2) AOF persistence

You can enable AOF persistence with the appendonly yes configuration item. With AOF on, Redis appends every write command it receives to the end of the AOF file, so the database can be restored simply by replaying the commands in the AOF file.
  Compared with RDB, an obvious advantage of AOF is better data durability: in AOF mode, every time Redis receives a write command from a client, it appends the command to the AOF file with write().
  However, on Linux, after write() hands data to a file, the data is not flushed to disk immediately; it sits in the OS file system buffer until the OS flushes it at an appropriate time (an explicit flush requires fsync() or fdatasync()).
  The appendfsync configuration item controls how often Redis syncs commands to disk:

  • always: every command is written to the AOF file with write() and immediately flushed with fsync(). This gives the best data durability but imposes significant overhead on the system.
  • no: Redis only write()s commands to the AOF file and lets the OS decide when to flush them to disk.
  • everysec: commands are written to the AOF file with write(), and Redis calls fsync() once per second. This is the recommended setting in practice: it guarantees durability to a large extent without significantly reducing Redis performance.

However, AOF persistence is not without disadvantages: Redis will continue to append the received write commands to the AOF file, causing the AOF file to become larger and larger. Large AOF files consume disk space and cause Redis to restart more slowly. In order to solve this problem, under appropriate circumstances, Redis will rewrite the AOF file to remove redundant commands in the file to reduce the size of the AOF file. During the rewriting of the AOF file, Redis will start a sub-process, and the sub-process is responsible for rewriting the AOF file.
  You can control the frequency of Redis rewriting AOF files through the following two configuration items:

  • auto-aof-rewrite-min-size 64mb
  • auto-aof-rewrite-percentage 100

The effect of the above two configurations: When the size of the AOF file is greater than 64MB, and the size of the AOF file is at least twice the size after the last rewrite, then Redis will perform AOF rewrite.

Advantages:

  • High persistence frequency and high data reliability
  • No additional memory or CPU consumption

Disadvantages:

  • large file size
  • Large files lead to low efficiency in service data recovery

Sample answer:

Redis provides two data persistence methods, one is RDB and the other is AOF. By default, Redis uses RDB persistence.

RDB files are small, but the save frequency is generally low, so reliability is poorer and data is easier to lose. In addition, RDB forks the main process when writing data, which may consume extra memory, and file compression costs extra CPU.

AOF persistence can sync as often as once per second, so reliability is high. However, the persisted file is large, so reading it during data recovery takes longer and is slightly less efficient.

3.4. What are the clustering methods of Redis?

Sample answer:

Redis clusters can be divided into master-slave clusters and sharded clusters.

Master-slave clusters generally have one master and multiple slaves. The master handles writes and the slaves handle reads. Combined with Sentinel, a new master can be elected when the master goes down; the purpose is to ensure the high availability of Redis.

Sharded clusters shard the data. Multiple Redis nodes form a cluster, and the 16384 hash slots are allocated across the nodes. When storing data, the key is hashed to obtain a slot, and the data is stored on the node owning that slot. Because storage is oriented to slots rather than to the nodes themselves, the cluster can scale dynamically; the purpose is to let Redis store more data.

1) Master-slave cluster

The master-slave cluster is also a read-write separation cluster. It is generally one master and many slaves.

Redis's replication (replication) function allows users to create any number of replicas of the server based on a Redis server, where the replicated server is the master server (master), and the server replica created by replication is the slave server ( slave).

As long as the network connection between the master and slave servers is normal, the master and slave servers will have the same data, and the master server will always synchronize the data updates that happen to itself to the slave server, thus ensuring that the data of the master and slave servers are the same.

  • Writing data can only be done through the master node
  • Reading data can be done from any node
  • If Sentinel nodes are configured, the sentinels elect a new master from the slave nodes when the master goes down

There are two forms of master-slave cluster: a plain master-slave cluster, and a master-slave cluster guarded by Sentinel.

2) Fragmentation cluster

In a master-slave cluster, every node must hold all the data, which creates a bucket effect: capacity is limited by the smallest node. And when the data volume is large, a single machine cannot meet the demand. That is when a sharded cluster is needed.


Cluster characteristics:

  • Each node holds different data

  • All redis nodes are interconnected (PING-PONG mechanism), using a binary protocol internally to optimize transmission speed and bandwidth.

  • A node is marked as failed only when more than half of the nodes in the cluster detect the failure.

  • Clients connect directly to any available redis node, without an intermediate proxy layer, to access the data.

  • redis-cluster maps all physical nodes to the [0-16383] slots to achieve dynamic scaling.

To ensure the high availability of each node, we can also create a replica (slave node) for each master node. When a failure occurs, the master and slave can switch over promptly.

3.5. What are the common data types of Redis?

Redis supports multiple data structures; the main difference lies in the format in which the value is stored:

  • string: The most basic data type, a binary safe string, up to 512M.

  • list: A list of strings that maintain order in the order they were added.

  • set: An unordered collection of strings with no duplicate elements.

  • sorted set: A sorted collection of strings.

  • hash: a map of field-value pairs

3.6. Talk about the Redis transaction mechanism

Relevant information:

Reference: http://redisdoc.com/topic/transaction.html

The Redis transaction feature is implemented through four primitives: MULTI, EXEC, DISCARD, and WATCH. Redis serializes all the commands in a transaction and executes them sequentially. However, Redis transactions do not support rollback: after one command fails, the remaining commands still execute (a short example follows the list below).

  • MULTI: starts a transaction; it always returns OK. After MULTI, the client can send any number of commands to the server; they are not executed immediately but placed in a queue.
  • EXEC: executes all commands in the queue sequentially and returns all of their return values. While a transaction executes, Redis does not execute commands from other clients.
  • DISCARD: clears the command queue, abandons the transaction, and takes the client out of the transaction state.
  • WATCH: Redis's optimistic locking mechanism, based on compare-and-set (CAS); it can monitor one or more keys, and once one of them is modified, the subsequent transaction will not execute.
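
A minimal example of these primitives using the Jedis client (a sketch; assumes a local Redis instance, and the key names are illustrative):

import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class TxDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.watch("stock");               // WATCH: abort the transaction if "stock" changes
            Transaction tx = jedis.multi();     // MULTI: start queuing commands
            tx.decr("stock");                   // queued, not executed yet
            tx.set("lastBuyer", "user-42");     // queued
            List<Object> results = tx.exec();   // EXEC: runs the queue; null if WATCH aborted it
            if (results == null) {
                System.out.println("transaction aborted, retry");
            }
        }
    }
}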

When using transactions, you may encounter the following two types of errors:

  • A queued command may fail before EXEC is executed. For example, a command may contain a syntax error (wrong number of arguments, wrong command name, etc.), or there may be a more serious condition such as insufficient memory (if the server has a maximum memory limit set via maxmemory).
    • Since Redis 2.6.5, the server records failures that occur while queuing commands; when the client calls EXEC, it refuses to execute and automatically discards the transaction.
  • A command may fail after EXEC is called. For example, a command in the transaction may operate on a key of the wrong type, such as using a list command on a string key.
    • Even if some commands in the transaction fail during execution, the other commands still execute; nothing is rolled back.

Why does Redis not support rollback (roll back)?

Here are the advantages of this approach:

  • Redis commands can only fail because of a syntax error (which cannot be detected at queuing time) or because a command is used on a key of the wrong type. That is, from a practical point of view, failing commands are caused by programming errors, and these errors should be caught during development, not in production.
  • Since there is no need to support rollbacks, the internals of Redis can be kept simple and fast.

Since there is no mechanism that can protect against the programmer's own errors, and such errors usually do not appear in production, Redis chose the simpler and faster approach: transactions without rollback.

Sample answer:

Redis transactions actually put a series of Redis commands into the queue, and then execute them in batches without interruption by other transactions during execution. However, unlike relational database transactions, Redis transactions do not support rollback operations. If a command fails to execute in a transaction, other commands will still be executed.

In order to make up for the problem of not being able to roll back, Redis will check the command when the transaction is enqueued, and if the command is abnormal, the entire transaction will be abandoned.

Therefore, as long as the programmer's programming is correct, in theory Redis will execute all transactions correctly without rolling back.

Interviewer: What if Redis crashes halfway through the execution of the transaction?

Redis has a persistence mechanism. For reliability we generally use AOF persistence, so all of the transaction's commands are also written to the AOF file. If Redis goes down before the EXEC command completes, the transaction recorded in the AOF file will be incomplete. The redis-check-aof tool can strip the incomplete transaction from the AOF file, so that the server can start up normally.

3.7. Redis key expiration strategy

References:

Why do you need memory reclamation?

  • 1. In Redis, the set command can specify the expiration time of the key. When the expiration time is reached, the key will become invalid;
  • 2. Redis is based on memory operations. All data is stored in memory. The memory of a machine is limited and very precious.

Based on the above two points, in order to ensure that Redis can continue to provide reliable services, Redis needs a mechanism to clean up infrequently used, invalid, and redundant data. The invalid data needs to be cleaned up in time, which requires memory recycling.

Redis memory recovery is mainly divided into two parts: expired deletion strategy and memory elimination strategy.

Expired-key deletion policy

Delete keys that have reached their expiration time.

  • 1) Timed deletion

A timer will be created for each key with an expiration time set, and it will be deleted immediately once the expiration time is reached. This strategy can immediately clear expired data, which is more memory-friendly, but the disadvantage is that it takes up a lot of CPU resources to process expired data, which will affect the throughput and response time of Redis.

  • 2) Lazy deletion

When a key is accessed, it is judged whether the key expires, and it is deleted when it expires. This strategy can save CPU resources to the greatest extent, but it is very unfriendly to memory. In an extreme case, a large number of expired keys may not be accessed again, so they will not be cleared, resulting in a large amount of memory.

In computer science, lazy deletion (English: lazy deletion) refers to a method of deleting elements from a hash table (also known as a hash table). In this method, deletion just refers to marking an element to be deleted, rather than clearing it entirely. Deleted positions are treated as empty elements when inserted and occupied when searched.

  • 3) Periodic deletion

Every so often, Redis scans the expired-key dictionary and clears a batch of expired keys. This strategy is a compromise between the former two: by tuning the scan interval and the time limit of each scan, it can achieve a good balance of CPU and memory usage under different circumstances.

Redis uses periodic deletion and lazy deletion together. However, periodic deletion only samples keys randomly, so it cannot guarantee that 100% of expired keys are removed.

Combining periodic and lazy deletion handles expired-data cleanup fairly well, but some problems remain. If there are many expired keys, some will be missed by periodic deletion and never accessed again, so lazy deletion never triggers either; a large number of expired keys then accumulate until redis memory is exhausted. When memory is exhausted and new keys keep arriving, what happens? Are writes simply rejected, or are other measures taken? Is there a way to accept more keys?

Memory eviction policy

Redis's memory eviction policy means that when memory reaches the maxmemory limit, an algorithm is used to decide which data to evict, so that new data can be stored.

Redis's eviction mechanisms include:

  • noeviction: when memory cannot hold newly written data, new write operations report an error.
  • allkeys-lru: when memory is insufficient, remove the least recently used key from the whole key space (server.db[i].dict). This is the most commonly used policy.
  • allkeys-random: when memory is insufficient, randomly remove a key from the whole key space (server.db[i].dict).
  • volatile-lru: when memory is insufficient, remove the least recently used key from the key space with an expiration set (server.db[i].expires).
  • volatile-random: when memory is insufficient, randomly remove a key from the key space with an expiration set (server.db[i].expires).
  • volatile-ttl: when memory is insufficient, remove the key with the nearest expiration time from the key space with an expiration set (server.db[i].expires).

Which eviction policy to use is configured with maxmemory-policy in the configuration file.

When does eviction happen?

Every time Redis processes a command, it checks whether memory has reached the maxmemory limit (the processCommand function calls freeMemoryIfNeeded). If the limit has been reached, it uses the configured algorithm to evict keys.

When evicting keys, Redis defaults to the LRU algorithm (Least Recently Used). Redis stores each key's last access time in the lru field of its redisObject and reads that field directly when applying the LRU algorithm.

In the implementation, Redis traverses each db, randomly samples a small batch of keys from it (5 by default in current Redis versions, controlled by maxmemory-samples), and evicts the least recently used key among the samples.

Sample answer:

The Redis expiration strategy includes two parts: regular deletion and lazy deletion. Periodic deletion is a scheduled task inside Redis, which will periodically delete some expired keys. Lazy deletion means that when a user queries a key, it will check whether the key has expired, return it to the user if it has not expired, and delete it if it expires.

However, neither strategy can guarantee that every expired key is deleted, and more and more keys slip through the net, which may eventually lead to memory exhaustion. When an out-of-memory problem occurs, Redis also performs memory eviction. Eviction uses the LRU policy, i.e. least recently used: Redis records the last access time of each key, and when reclaiming memory it randomly samples some keys, compares their access times, and deletes the oldest ones.

The logic of Redis is: what has been used recently is likely to be used again

3.8. Where is Redis used in the project?

(1) Shared session

In a distributed system, services are deployed on different Tomcat instances, so their sessions cannot be shared, and data stored in a session is not visible to the other instances. Redis can be used in place of the session to share data across the distributed system.

(2) Data cache

Redis uses memory storage, which has high read and write efficiency. We can store hot data with high access frequency of the database in redis, so that users can read from redis first when requesting, reducing database pressure and improving concurrency.

(3) Asynchronous queue

A major advantage of Redis as an in-memory storage engine is its list and set operations, which make it usable as a message queue platform. Redis also has a dedicated pub/sub structure for 1-to-N messaging.

(4) Distributed lock

The mutual exclusion provided by Redis (e.g. SETNX) can be used to implement distributed locks and solve concurrency-safety problems across the processes of a distributed system.

3.9. Redis cache breakdown, cache avalanche, cache penetration

1) Cache penetration

References:

  • What is cache penetration

    • Under normal circumstances, the data we query exists. Cache penetration means querying data that does not exist at all: neither the cache nor the database contains it, yet every such request still hits the database. This phenomenon of querying non-existent data is called cache penetration.
  • Why penetration is a problem

    • Just imagine, if a hacker attacks your system and uses a non-existent id to query data, a large number of requests will be generated to the database for query. It may cause your database to crash due to excessive pressure.
  • Solution

    • Cache empty values: penetration happens because the cache has no entries for these keys, so every query goes to the database. We can cache a null value for such keys; later queries for the key then return null directly from the cache instead of reaching the database. Don't forget to set an expiration time on these entries.
    • BloomFilter (Bloom filter): hash all possibly-existing data into a sufficiently large bitmap; a query for data that definitely does not exist is intercepted by the bitmap, which avoids pressure on the underlying storage. Add a BloomFilter layer in front of the cache: on each query, first ask the BloomFilter whether the key exists; if not, return directly; if yes, check the cache and then the DB.

Sample answer:

There are two solutions to cache penetration. One is to cache a null value for keys that do not exist. The other is to use a Bloom filter: before querying the cache, ask the Bloom filter whether the key exists at all, and only query the cache (and DB) if it might (see the sketch below).

Caching null values can be exploited maliciously: an attacker can flood the system with a large number of distinct non-existent keys, so solution one would cache a large amount of useless data. In that case we can also define a format template for keys and validate a non-existent key against it with a regular expression; if it does not match at all, we return an error directly instead of storing a null value in redis.
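
A sketch of the Bloom filter approach using Guava (assumes all existing keys can be preloaded at startup; the names and sizes are illustrative):

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class BloomGuard {
    // sized for 1M keys with a 1% false-positive rate
    private final BloomFilter<String> filter =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    public void preload(Iterable<String> allKeys) {
        allKeys.forEach(filter::put);       // load every existing key at startup
    }

    public String query(String key) {
        if (!filter.mightContain(key)) {
            return null;                    // definitely not in the DB: short-circuit
        }
        // possibly exists: check the cache, then the DB
        return lookupCacheThenDb(key);
    }

    private String lookupCacheThenDb(String key) { return "..."; }
}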

2) Cache breakdown

Relevant information:

  • What is cache breakdown?

A "hot" key may be accessed with very high concurrency at a certain moment. The problem to consider here is the cache being "broken down".

At the instant such a key expires, all the concurrent requests miss the cache and go straight to the database, like punching a hole through a barrier.

  • Solutions:
    • Use a mutex (mutex key): when the cache misses (the value read back is empty), do not load from the DB immediately; first try to set a mutex key with Redis's SETNX. If that succeeds, load from the DB and rebuild the cache; otherwise, retry the whole get-from-cache method. SETNX ("SET if Not eXists") only sets the key when it does not exist, so it provides mutual exclusion.
    • Soft expiration: i.e. logical expiration. Instead of using Redis's expiration, store the expiration time inside the cached data and let the business code judge whether it has expired. If the data is about to expire, extend the cached deadline and dispatch a thread to fetch the latest data from the database; other threads see the extended deadline and keep using the old data until the dispatched thread refreshes the cache.

A mutex is recommended, because soft expiration intrudes into the business logic and adds extra checks.

Sample answer:

Cache breakdown is mainly about a hot key expiring, which causes a sudden burst of concurrent database access while the cache is being rebuilt. We can therefore use a mutex when rebuilding the cache, allowing only one thread to rebuild it while other threads wait and re-read the cache. Redis's setnx command provides the mutual exclusion, as in the sketch below.
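
A sketch of the mutex approach with Spring Data Redis (key names and timeouts are illustrative):

import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;

public class CacheBreakdownGuard {
    private final StringRedisTemplate redis;

    public CacheBreakdownGuard(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public String get(String key) throws InterruptedException {
        String value = redis.opsForValue().get(key);
        if (value != null) {
            return value;
        }
        String lockKey = "lock:" + key;
        // SET lockKey "1" NX EX 10 -- only one thread wins the right to rebuild
        Boolean locked = redis.opsForValue().setIfAbsent(lockKey, "1", Duration.ofSeconds(10));
        if (Boolean.TRUE.equals(locked)) {
            try {
                value = loadFromDb(key);                                   // rebuild from the DB
                redis.opsForValue().set(key, value, Duration.ofMinutes(30));
            } finally {
                redis.delete(lockKey);                                     // release the mutex
            }
            return value;
        }
        Thread.sleep(50);   // lost the race: wait briefly and re-read the cache
        return get(key);
    }

    private String loadFromDb(String key) { return "..."; }
}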

3) Cache Avalanche

Relevant information:

Cache avalanche means that within a certain period a large batch of cached keys expires at the same time, so the queries for all of that data fall on the database, creating periodic pressure spikes on it.

solution:

  • Classify data and cache different categories for different periods
  • For data of the same category, set a fixed base duration plus a random offset
  • Cache hot data for longer and unpopular data for shorter periods
  • To avoid avalanches caused by a redis node going down, build a master-slave cluster to ensure high availability

Sample answer:

The key to solving the cache avalanche problem is to spread out the expiration times of cache keys. We can classify data by business and set different expiration times: for keys of the same business type, use a fixed base duration plus a random offset, so that each key expires at a different moment (see the sketch below).

In addition, Redis downtime may also cause a cache avalanche, so we need to build a Redis master-slave cluster and sentinel monitoring to ensure the high availability of Redis.
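
A small sketch of the fixed-plus-random TTL idea (the durations are illustrative):

import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import org.springframework.data.redis.core.StringRedisTemplate;

public class AvalancheSafeCache {
    private final StringRedisTemplate redis;

    public AvalancheSafeCache(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public void put(String key, String value) {
        // 30 minutes base TTL plus up to 5 random minutes, so keys don't expire together
        long ttlSeconds = Duration.ofMinutes(30).toSeconds()
                + ThreadLocalRandom.current().nextLong(Duration.ofMinutes(5).toSeconds());
        redis.opsForValue().set(key, value, Duration.ofSeconds(ttlSeconds));
    }
}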

3.10. Cache hot and cold data separation

Background information:

Redis uses memory storage, and when massive data storage is required, the cost is very high.

Research shows that the unit cost gap between mainstream DDR3 memory and mainstream SATA SSDs is about 20x. To optimize the overall machine cost of redis, we consider tiered data storage based on heat statistics, dynamically exchanging data between RAM and FLASH, thereby greatly reducing cost while striking a good balance between performance and cost.

Basic idea: identify hot data with a heat statistic based on key access counts (LFU); keep hot data in redis and move data with little or no access to the SSD. If a key on the SSD becomes hot again, it is reloaded into redis memory.

Currently popular high-performance disk storage solutions that follow the Redis protocol include:

  • SSDB:http://ssdb.io/zh_cn/
  • RocksDB:https://rocksdb.org.cn/

Therefore, we introduce a proxy between the application and the cache service to route requests between Redis and the SSD store.

Alibaba Cloud provides such a proxy solution; there are also open source options, such as: https://github.com/JingchengLi/swapdb

3.11. Redis implements distributed locks

Conditions to be met by distributed locks:

  • Multi-process mutual exclusion: at the same time, only one process can acquire the lock
  • Guarantee that the lock can be released: when the task ends or an exception occurs, the lock must be released to avoid deadlock
  • Blocking lock (optional): whether to retry when acquiring a lock fails
  • Reentrant lock (optional): When the code to acquire the lock is called recursively, the lock can still be acquired

1) The most basic distributed lock:

Use Redis's setnx command: if it is executed multiple times, only the first execution succeeds, which gives mutual exclusion. To make sure the lock is released even if the service goes down, also set a validity period on the lock with the expire command:

setnx lock thread-01 # try to acquire the lock
expire lock 10 # set the validity period

Question 1: what if the service goes down after setnx but before expire?

The setnx and expire commands must execute atomically. Redis's set command supports this:

set key value [NX] [EX time] 

Two options are needed, NX and EX:

  • NX: like setnx, only succeeds when the key does not exist
  • EX: sets the expiration time (in seconds)

Question 2: when releasing the lock, if your own lock has already expired (and another process may now hold the lock), there is a safety hole. How to solve it?

Store an identifier of the current process and thread in the lock value. When releasing the lock, check this identifier: delete the key only if it is your own; otherwise do nothing.

However, these two steps (check and delete) must execute atomically, which requires a Lua script (a Java usage sketch follows the script):

if redis.call("get",KEYS[1]) == ARGV[1] then
    redis.call("del",KEYS[1])
end
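
Putting this together in Java with Spring Data Redis (a sketch; the check-and-delete release runs atomically inside the Lua script):

import java.time.Duration;
import java.util.List;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;

public class SimpleRedisLock {
    private static final DefaultRedisScript<Long> UNLOCK = new DefaultRedisScript<>(
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "return redis.call('del', KEYS[1]) end return 0", Long.class);

    private final StringRedisTemplate redis;

    public SimpleRedisLock(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public boolean tryLock(String key, String threadId, Duration ttl) {
        // SET key threadId NX EX ttl -- acquire and set the expiry atomically
        return Boolean.TRUE.equals(
                redis.opsForValue().setIfAbsent(key, threadId, ttl));
    }

    public void unlock(String key, String threadId) {
        // only deletes the key if it still holds our own threadId
        redis.execute(UNLOCK, List.of(key), threadId);
    }
}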

2) Reentrant distributed lock

If reentrancy is needed, then in addition to recording the thread identifier in the lock, the number of reentries must also be recorded, so a hash structure is used.

Below we assume the lock key is "lock", the hashKey is the current thread's id "threadId", and the automatic release time of the lock is 20 seconds.

Steps to acquire the lock:

  • 1. Check whether the lock exists: EXISTS lock
    • It exists, meaning someone holds the lock; next, check whether it is our own lock
      • Check whether the current thread id exists as a hashKey: HEXISTS lock threadId
        • It does not exist: the lock exists but was acquired by someone else; acquisition fails, end
        • It exists: the lock is our own; increment the reentry count: HINCRBY lock threadId 1, then go to step 3
    • 2. It does not exist, meaning the lock can be acquired: HSET lock threadId 1
    • 3. Set the automatic release time of the lock: EXPIRE lock 20

Steps to release the lock:

  • 1. Check whether the current thread id exists as a hashKey: HEXISTS lock threadId
    • It does not exist: the lock has already expired; nothing more to do
    • It exists: the lock is still ours; decrement the reentry count: HINCRBY lock threadId -1, and read the new count
  • 2. Check whether the reentry count is 0:
    • If it is 0, the lock is fully released; delete the key: DEL lock
    • If it is greater than 0, the lock is still in use; reset the validity period: EXPIRE lock 20

The corresponding Lua script is as follows:

The first is to acquire the lock:

local key = KEYS[1]; -- the lock key
local threadId = ARGV[1]; -- unique identifier of the thread
local releaseTime = ARGV[2]; -- automatic release time of the lock

if(redis.call('exists', key) == 0) then -- check whether the lock exists
	redis.call('hset', key, threadId, '1'); -- it does not exist: acquire the lock
	redis.call('expire', key, releaseTime); -- set the validity period
	return 1; -- return success
end;

if(redis.call('hexists', key, threadId) == 1) then -- the lock exists: check whether threadId is our own
	redis.call('hincrby', key, threadId, '1'); -- it is ours: reentry count +1
	redis.call('expire', key, releaseTime); -- reset the validity period
	return 1; -- return success
end;
return 0; -- reaching here means the lock is held by someone else: acquisition failed

Then release the lock:

local key = KEYS[1]; -- the lock key
local threadId = ARGV[1]; -- unique identifier of the thread
local releaseTime = ARGV[2]; -- automatic release time of the lock

if (redis.call('HEXISTS', key, threadId) == 0) then -- check whether the lock is still held by us
    return nil; -- it is not ours: return immediately
end;
local count = redis.call('HINCRBY', key, threadId, -1); -- it is ours: reentry count -1

if (count > 0) then -- check whether the reentry count has reached 0
    redis.call('EXPIRE', key, releaseTime); -- still greater than 0: cannot release yet, reset the validity period
    return nil;
else
    redis.call('DEL', key); -- reached 0: the lock can be released, delete the key
    return nil;
end;

3) Highly available locks

Question: a Redis distributed lock depends on Redis itself. If Redis crashes, the lock fails. How can this be solved?

At this time, most students will answer: build a master-slave cluster and do data backup.

This answer walks right into a trap, because the interviewer's next question follows:

Question: suppose that in a master-slave cluster used for data backup, process A acquires the lock, but the master goes down before replicating the data to the slave, and the slave is promoted to master. The original lock is now lost, so other processes can also acquire the lock, which creates a safety problem. How can this be solved?

Regarding this problem, the Redis official website gives a solution, which can be solved by using the RedLock idea:

In a distributed Redis environment, assume there are N Redis masters that are completely independent of each other, with no master-slave replication or other cluster coordination mechanism. We have already described how to safely acquire and release a lock on a single Redis instance; the client uses that same method on each of the N instances. In this example we assume 5 Redis master nodes (a reasonable setting), and we run these instances on 5 machines or 5 virtual machines so that they will not all go down at the same time.

In order to acquire the lock, the client should do the following:

  1. Get the current time in milliseconds (Unix time).
  2. Try to acquire the lock on the N instances sequentially, using the same key and a random value on each. In this step, the client should use a network connection and response timeout that is much smaller than the lock's expiration time; for example, if the lock auto-expires after 10 seconds, the timeout might be 5-50 milliseconds. This prevents the client from waiting for a response from a Redis server that has already died: if an instance does not respond within the timeout, the client should move on to the next instance as soon as possible.
  3. The client computes the time spent acquiring the lock by subtracting the start time (recorded in step 1) from the current time. If and only if the lock was obtained on a majority of the instances (here, 3 nodes) and the total time spent is less than the lock's expiration time, the lock is considered acquired.
  4. If the lock was acquired, its real validity time equals the original validity time minus the time spent acquiring it (computed in step 3).
  5. If for some reason the acquisition failed (the lock was not obtained on at least N/2+1 instances, or the acquisition time exceeded the validity time), the client should unlock all the instances (even those where it did not manage to set the lock).
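
In practice, rather than implementing these steps by hand, a client library is usually used. Redisson, for example, provides ready-made distributed locks (and a RedissonRedLock that combines locks from multiple independent nodes); a minimal sketch:

import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonLockDemo {
    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://localhost:6379");
        RedissonClient redisson = Redisson.create(config);

        RLock lock = redisson.getLock("order:lock");
        lock.lock();                    // reentrant, with watchdog-based auto renewal
        try {
            // critical section
        } finally {
            lock.unlock();
        }
        redisson.shutdown();
    }
}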

3.12. How to keep the database and the cached data consistent?

Sample answer:

The implementation options are as follows:

  • Local cache synchronization: when the cached data comes from the current microservice's own database, the Redis update logic can be added directly wherever the database is modified, which guarantees consistency.
  • Cross-service cache synchronization: service A calls service B and caches the query result; when service B's database is modified, it notifies service A through MQ, and service A updates its Redis cache.
  • General solution: use the Canal framework, which disguises itself as a MySQL slave node, listens to MySQL's binlog changes, and then updates the Redis cache accordingly.

If anything is missing, feedback and corrections are welcome.
To be continued; updates will follow!
Let's make progress together!

Origin blog.csdn.net/qq_40440961/article/details/128894251