SpringCloud Microservice Technology Stack: Dark Horse Follow-up (Interview)

Today's goal


1. Microservices

1.1. What are the common components of Spring Cloud?

Description of the problem: mainly tests basic understanding of Spring Cloud components

Difficulty: easy

Suggested answer:

Spring Cloud contains many components, and several of them overlap in functionality. The most commonly used include:

• Registry components: Eureka, Nacos, etc.

• Load balancing component: Ribbon

• Remote call component: OpenFeign

• Gateway components: Zuul, Gateway

• Service protection components: Hystrix, Sentinel

• Service configuration management components: Spring Cloud Config, Nacos
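To illustrate how these components typically fit together, here is a minimal sketch (the names are hypothetical, not from the original text) of an OpenFeign client: the service name is resolved through the registry, and Ribbon load-balances calls across its instances.

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

// Hypothetical client for a "userservice" registered in Eureka/Nacos.
// The service name is resolved through the registry, and Ribbon
// load-balances the calls across its instances.
@FeignClient("userservice")
interface UserClient {

    @GetMapping("/user/{id}")
    User findById(@PathVariable("id") Long id);
}

// Minimal DTO for the example.
class User {
    public Long id;
    public String username;
}

The application would also need @EnableFeignClients on its startup class so the client interface is scanned.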

1.2. What is the service registry structure of Nacos?

Description of the problem: tests understanding of Nacos's hierarchical data model and familiarity with the Nacos source code

Difficulty: normal

Suggested answer:

Nacos uses a hierarchical data storage model. The outermost layer is the Namespace, which isolates environments. Next is the Group, which groups services. Then comes the Service itself; one service may have instances in different machine rooms, so a Service contains multiple Clusters, and under each Cluster are the individual Instances.

In the Java code, Nacos represents this with nested Maps. The structure is Map<String, Map<String, Service>>: the outer Map's key is the namespaceId and its value is another Map, whose key is the group name concatenated with the service name and whose value is the Service object. Inside the Service object is another Map keyed by cluster name, whose value is the Cluster object; each Cluster in turn maintains a Set of Instances.

A simple implementation of this structure in code:

package com.nacos;

import org.junit.Test;

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NacosStructure {

    @Test
    public void testNacosStructure() {
        // Instances: the concrete service nodes
        Instance personInfo = new Instance("personInfo");
        Instance finance = new Instance("finance");

        // Cluster: a machine room in one region, holding several instances
        Cluster sz = new Cluster("SZ");
        sz.addInstance(personInfo);
        sz.addInstance(finance);

        // Service: one service may span several clusters
        Service personInfoService = new Service("personInfo");
        personInfoService.putCluster(sz);

        // Namespace: isolates environments; services are registered by group
        NameSpace dev01 = new NameSpace("dev01");
        dev01.putService("DEFAULT_GROUP@@personInfo", personInfoService);

        // The outermost registry map: namespaceId -> (group@@serviceName -> Service)
        Map<String, Map<String, Service>> registry = new HashMap<>();
        registry.put(dev01.getNameSpaceId(), dev01.getGroupMap());
        System.out.println(registry);
    }
}

class NameSpace {

    private String nameSpaceId;
    // key: groupName@@serviceName, value: the service registered under that group
    private Map<String, Service> groupMap = new HashMap<>();

    public NameSpace(String nameSpaceId) {
        this.nameSpaceId = nameSpaceId;
    }

    public String getNameSpaceId() {
        return nameSpaceId;
    }

    public Map<String, Service> getGroupMap() {
        return groupMap;
    }

    public void putService(String groupServiceName, Service service) {
        this.groupMap.put(groupServiceName, service);
    }

    public Service getService(String groupServiceName) {
        return this.groupMap.get(groupServiceName);
    }

    @Override
    public String toString() {
        return "NameSpace{" +
                "nameSpaceId='" + nameSpaceId + '\'' +
                ", groupMap=" + groupMap +
                '}';
    }
}

class Service {

    private String name;
    // key: cluster name, value: the cluster itself
    private Map<String, Cluster> clusterMap = new HashMap<>();

    public Service(String name) {
        this.name = name;
    }

    /**
     * Add a cluster to this service.
     */
    public void putCluster(Cluster c) {
        this.clusterMap.put(c.getName(), c);
    }

    /**
     * Remove a cluster from this service.
     */
    public void removeCluster(Cluster c) {
        this.clusterMap.remove(c.getName());
    }

    @Override
    public String toString() {
        return "Service{" +
                "name='" + name + '\'' +
                ", clusterMap=" + clusterMap +
                '}';
    }
}

class Cluster {

    private String name;
    private Set<Instance> instances = new HashSet<>();

    public Cluster(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    /**
     * Add an instance to this cluster.
     */
    public void addInstance(Instance in) {
        this.instances.add(in);
    }

    /**
     * Remove an instance from this cluster.
     */
    public void removeInstance(Instance in) {
        this.instances.remove(in);
    }

    @Override
    public String toString() {
        return "Cluster{" +
                "name='" + name + '\'' +
                ", instances=" + instances +
                '}';
    }
}

class Instance {

    private String name;

    public Instance(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "Instance{" +
                "name='" + name + '\'' +
                '}';
    }
}

1. Download the Nacos source code and run it

To study the Nacos source code, you naturally cannot run the pre-packaged Nacos server jar; you need to download the source code and compile it yourself.

1.1. Download Nacos source code

Nacos GitHub address: https://github.com/alibaba/nacos

The Nacos 1.4.2 source code is already provided in the pre-class materials.

If you want to study a different version, you can download it yourself: open the releases page at https://github.com/alibaba/nacos/tags, find the 1.4.2 version, click into it, and download Source code (zip).

1.2. Import Demo project

The pre-class materials provide a microservice demo that includes service registration and discovery. After importing the project, view its structure:

Structure description:

  • cloud-source-demo: the project's parent directory
    • cloud-demo: the parent project of the microservices, managing their dependencies
      • order-service: the order microservice; its business needs to call user-service, so it is a service consumer
      • user-service: the user microservice; it exposes an interface for querying users by id, and is a service provider

1.3. Import Nacos source code

Unzip the previously downloaded Nacos source code into the cloud-source-demo project directory.

Then import it into IDEA as a module:

1) Open the Project Structure options.
2) Click Import Module.
3) In the pop-up window, select the nacos source code directory.
4) Select the Maven module and click Finish.
5) Finally, click OK and check the imported project structure.

1.4. proto compilation

Nacos's underlying data communication serializes and deserializes data based on protobuf, and the corresponding proto files are defined in the consistency submodule.

We first need to compile the proto files into the corresponding Java code.

1.4.1. What is protobuf

The full name of protobuf is Protocol Buffers, a data serialization protocol from Google. Google's official definition:

Protocol Buffers is a lightweight, efficient format for storing structured data. It can serialize structured data and is well suited to data storage and RPC data exchange. It is a language-neutral, platform-neutral, extensible serialization format usable in communication protocols, data storage, and many other fields.

It can be understood simply as a cross-language, cross-platform data transmission format. Its role is similar to JSON's, but it beats JSON in both performance and data size.

Protobuf is cross-language because data structures are defined in the .proto format, which protoc compiles into the corresponding language.

1.4.2. Install protoc

GitHub address of Protobuf: https://github.com/protocolbuffers/protobuf/releases

We can download the Windows version to use. The pre-class materials also provide a downloaded installation package. Unzip it to any directory whose path contains no Chinese characters; the protoc.exe in the bin directory does the compiling. Then add this bin directory to the PATH environment variable, in the same way you would configure the JDK.

1.4.3. Compile proto

Enter the src/main directory under the consistency module of nacos-1.4.2.

Then open a cmd window and run the following two commands:

protoc --java_out=./java ./proto/consistency.proto
protoc --java_out=./java ./proto/Data.proto

The generated Java code lands in the consistency module of Nacos.

1.5. Run

The entry point of the Nacos server is the Nacos class in the console module.

We need to start it in standalone mode.

Create a new Spring Boot run configuration and fill in the application information:

Main class: com.alibaba.nacos.Nacos
VM options: -Dnacos.standalone=true

Then run the main method of Nacos.

After starting the order-service and user-service services, you can see them in the Nacos console.

2. Service registration

After a service registers with Nacos, it is saved in a local registry with the following structure:

First, the outermost layer is a Map with the structure Map<String, Map<String, Service>>:

  • key: the namespace_id, which provides environment isolation. A namespace can contain multiple groups
  • value: another Map<String, Service>, representing the groups and the services within them. A group can contain multiple services
    • key: the group and service name concatenated, in the format group_name@@service_name
    • value: a service under that group, such as userservice, the user service. Its type is Service, which internally holds a Map<String, Cluster>; one service can span multiple clusters
      • key: the cluster name
      • value: the Cluster object, holding that cluster's details. A cluster may contain multiple instances, i.e. concrete node information, so it holds a Set<Instance>, the collection of instances under the cluster
        • Instance: instance information, including the instance IP, port, health status, weight, and so on

When a service registers with Nacos, its information is organized and stored in this Map.

2.1. Service registration interface

Nacos provides an API for service registration; a client only needs to send a request to this endpoint to register.

**Interface description:** registers an instance with Nacos.

Request type: POST

Request path: /nacos/v1/ns/instance

Request parameters:

| name | type | required | description |
| --- | --- | --- | --- |
| ip | string | yes | service instance IP |
| port | int | yes | service instance port |
| namespaceId | string | no | namespace ID |
| weight | double | no | weight |
| enabled | boolean | no | whether the instance is online |
| healthy | boolean | no | whether the instance is healthy |
| metadata | string | no | extended information |
| clusterName | string | no | cluster name |
| serviceName | string | yes | service name |
| groupName | string | no | group name |
| ephemeral | boolean | no | whether this is a temporary (ephemeral) instance |

Error codes:

| code | description | semantics |
| --- | --- | --- |
| 400 | Bad Request | syntax error in the client request |
| 403 | Forbidden | permission denied |
| 404 | Not Found | resource not found |
| 500 | Internal Server Error | internal server error |
| 200 | OK | normal |
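For example, a client can register an instance by POSTing to this endpoint. A minimal sketch with Java's built-in HttpClient (the address and parameter values are illustrative):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Register an instance through the Nacos open API (values illustrative).
public class RegisterInstanceExample {

    public static void main(String[] args) throws Exception {
        String params = "serviceName=userservice&ip=127.0.0.1&port=8081&ephemeral=true";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8848/nacos/v1/ns/instance?" + params))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body()); // expect 200 "ok"
    }
}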

2.2. Client

First, we need to find the entry point for service registration.

The path Nacos exposes for instance registration is /nacos/v1/ns/instance, so we look for a controller mapped to that path and find it at src/main/java/com/alibaba/nacos/naming/controllers/InstanceController.java. Since the request type is POST, we look for the @PostMapping method; its register method is the entry point of the registration center.

1.3. How does Nacos support the pressure of hundreds of thousands of service registrations inside Alibaba?

Description of the problem: tests familiarity with the Nacos source code

Difficulty: hard

Suggested answer:

When Nacos receives a registration request, it does not write the data immediately; instead it puts the registration task into a blocking queue and responds to the client right away. A thread pool then reads tasks from the blocking queue and completes the instance update asynchronously, which improves concurrent write throughput. Note that this applies to temporary (ephemeral) instances.
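The pattern can be sketched as follows; this is a simplified illustration of the idea, not Nacos's actual code:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Simplified illustration of Nacos's write pattern: the registration request
// is queued and acknowledged immediately; a worker thread applies it later.
public class AsyncRegistry {

    private final BlockingQueue<Runnable> tasks = new ArrayBlockingQueue<>(1024);

    public AsyncRegistry() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    tasks.take().run();   // drain the queue, apply updates one by one
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Called by the request handler: enqueue and respond to the client at once.
    public void register(String serviceName, String instance) {
        tasks.offer(() -> updateInstanceList(serviceName, instance));
    }

    private void updateInstanceList(String serviceName, String instance) {
        // the actual instance-list update happens here, asynchronously
    }
}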

1.4. How does Nacos avoid concurrent read and write conflicts?

Description of the problem: tests familiarity with the Nacos source code

Difficulty: hard

Suggested answer:

When Nacos updates an instance list, it uses a CopyOnWrite technique: it first copies the old instance list, updates the copy, and then replaces the old list with the updated copy.

This way, requests that read the instance list are unaffected during the update, and no dirty reads occur: concurrent writers modify the new list (the copy of the old one), while readers still see the old list, so the two do not interfere.

In the source code, updates to the instances of the same service are guarded by a synchronized lock and executed serially, and instance registration tasks are applied asynchronously by a single thread; together these guarantee write safety.
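A minimal sketch of this copy-on-write update (an illustration of the technique, not the Nacos source):

import java.util.ArrayList;
import java.util.List;

// Copy-on-write update: writers synchronize and swap in a new list;
// readers keep using the old one and never block.
public class InstanceHolder {

    private volatile List<String> instances = new ArrayList<>();

    // Writes are serialized with synchronized to keep updates safe.
    public synchronized void addInstance(String instance) {
        List<String> copy = new ArrayList<>(instances); // 1. copy the old list
        copy.add(instance);                             // 2. modify the copy
        instances = copy;                               // 3. swap the reference
    }

    // Readers see either the old or the new list, never a half-updated one,
    // so no dirty reads occur.
    public List<String> getInstances() {
        return instances;
    }
}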

1.5. What are the differences between Nacos and Eureka?

Description of the problem: tests understanding of the underlying implementations of Nacos and Eureka

Difficulty: hard

Suggested answer:

Nacos and Eureka have both similarities and differences, which can be described from the following points:

  • Interface style: both Nacos and Eureka expose REST-style APIs for service registration, discovery, and related functions
  • Instance types: Nacos divides instances into permanent and temporary (ephemeral) ones; Eureka only supports temporary instances
  • Health checks: Nacos uses heartbeat checks for temporary instances and actively probes permanent instances; Eureka only supports heartbeats
  • Service discovery: Nacos supports both periodic pull and subscription-based push; Eureka only supports periodic pull

1.6. What is the difference between Sentinel's rate limiting and Gateway's rate limiting?

Description of the problem: tests mastery of rate limiting algorithms

Difficulty: hard

Suggested answer:

Rate limiting restricts the requests reaching an application server, so that too many requests cannot overload or even crash the server.

There are three common rate limiting algorithms:
1. Sliding time window
2. Token bucket
3. Leaky bucket

Gateway uses a Redis-based token bucket algorithm.

Sentinel is more sophisticated internally:

  • The default rate limiting mode is based on the sliding time window algorithm
  • The queue-and-wait rate limiting mode is based on the leaky bucket algorithm
  • Hot-parameter rate limiting is based on the token bucket algorithm

Fixed window counter algorithm

The fixed window counter algorithm works as follows:
  • Time is divided into windows; a window's time span is called the Interval, 1000 ms in this example
  • Rate limiting means setting a counter threshold, 3 in this example
  • Once the counter exceeds the threshold, all requests beyond it are discarded
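A minimal fixed-window counter in code, using the 1000 ms interval and threshold of 3 from the example; this only illustrates the idea and is not Sentinel's implementation:

// Fixed-window counter: reset the counter when a new window begins,
// reject requests once the counter exceeds the threshold.
class FixedWindowCounter {
    private final long intervalMillis = 1000; // window span (Interval)
    private final int threshold = 3;          // max requests per window
    private long windowStart = System.currentTimeMillis();
    private int counter = 0;

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= intervalMillis) { // a new window begins
            windowStart = now;
            counter = 0;
        }
        if (counter < threshold) {
            counter++;
            return true;  // within the threshold: allow
        }
        return false;     // over the threshold: discard the request
    }
}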

Sliding window counter algorithm

The sliding window counter algorithm divides a window into n smaller slots. For example:
  • The window time span (Interval) is 1 second and the number of slots is n = 2, so each small slot spans 500 ms
  • The threshold is still 3; when the requests within the time window (1 second) exceed the threshold, the excess requests are limited
  • The window moves with the current request time (currentTime): it starts at the first slot after (currentTime - Interval) and ends at the slot containing currentTime
Token bucket algorithm

The token bucket algorithm works as follows:
  • Tokens are generated at a fixed rate and stored in a token bucket; when the bucket is full, excess tokens are discarded
  • An incoming request must first take a token from the bucket, and only after obtaining one can it be processed
  • If the bucket holds no tokens, the request waits or is discarded
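A minimal token bucket sketch; the capacity and refill rate are illustrative, not Sentinel's actual implementation:

// Token bucket: tokens accumulate at a fixed rate up to the capacity;
// a request is admitted only if it can take a token.
class TokenBucketLimiter {
    private final long capacity = 10;        // max tokens the bucket can hold
    private final long refillPerSecond = 5;  // tokens generated per second
    private double tokens = capacity;
    private long lastRefillTime = System.nanoTime();

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // top up the bucket according to the elapsed time, capped at capacity
        tokens = Math.min(capacity,
                tokens + (now - lastRefillTime) / 1_000_000_000.0 * refillPerSecond);
        lastRefillTime = now;
        if (tokens >= 1) {        // a token is available: consume it and admit
            tokens -= 1;
            return true;
        }
        return false;             // no token: reject (or the caller may wait)
    }
}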

Leaky bucket algorithm

The leaky bucket algorithm works as follows:
  • Each incoming request is treated as a "drop of water" stored in a leaky bucket
  • The bucket "leaks" requests out at a fixed rate for execution; when the bucket is empty, the leaking stops
  • If the bucket is full, excess "drops" are discarded directly

It can be understood as requests queueing in the bucket and waiting.

When Sentinel implements the leaky bucket, it adopts a queue-and-wait mode: all requests enter a queue and are executed one by one at the interval the threshold allows. Concurrent requests must wait, where

expected wait time = expected wait time of the latest request + allowed interval.

If a request's expected wait time exceeds the maximum timeout, it is rejected.

For example: QPS = 5 means one queued request is processed every 200 ms; timeout = 2000 means requests expected to wait longer than 2000 ms are rejected and an exception is thrown.
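A minimal sketch of this queue-and-wait logic, using the QPS = 5 and timeout = 2000 ms figures above (an illustration of the idea, not Sentinel's actual code):

// Queue-and-wait: each request books the next free 200 ms slot;
// if the wait to that slot exceeds the timeout, the request is rejected.
class QueueWaitLimiter {
    private final long intervalMillis = 200;     // 1000 ms / QPS(5)
    private final long maxQueueingMillis = 2000; // reject beyond this wait
    private long latestPassedTime = System.currentTimeMillis();

    public boolean tryAcquire() throws InterruptedException {
        long sleep;
        synchronized (this) {
            long now = System.currentTimeMillis();
            // expected time = latest request's expected time + allowed interval
            long expected = latestPassedTime + intervalMillis;
            if (expected - now > maxQueueingMillis) {
                return false;                    // expected wait too long: reject
            }
            latestPassedTime = Math.max(expected, now);
            sleep = latestPassedTime - now;
        }
        if (sleep > 0) {
            Thread.sleep(sleep);                 // wait for our time slot
        }
        return true;
    }
}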
Comparison of rate limiting algorithms

1.7. What is the difference between Sentinel's thread isolation and Hystrix's thread isolation?

Description of the problem: tests mastery of thread isolation schemes

Difficulty: normal

Suggested answer:

By default, Hystrix implements thread isolation with thread pools: each isolated business needs its own thread pool, and too many threads bring extra CPU overhead. Its performance is average, but the isolation is stronger.

Sentinel implements thread isolation with a semaphore (counter). It needs no thread pools and performs better, but the isolation is weaker.
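A minimal sketch of the semaphore-based approach; the concurrency limit of 10 and the method names are illustrative:

import java.util.concurrent.Semaphore;

// Semaphore (counter) isolation: cap concurrent calls without a thread pool.
class SemaphoreIsolation {
    private final Semaphore permits = new Semaphore(10);

    public String callProtectedService() {
        if (!permits.tryAcquire()) {
            // over the concurrency limit: fail fast instead of queueing
            return "fallback";
        }
        try {
            return doRemoteCall();
        } finally {
            permits.release();
        }
    }

    private String doRemoteCall() {
        return "ok"; // placeholder for the real remote invocation
    }
}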

2. MQ

2.1. Why did you choose RabbitMQ instead of other MQs?

Suggested answer:

Kafka is famous for its high throughput, but its data stability is average and it cannot guarantee message ordering. Our company uses it for log collection, while the business modules use RabbitMQ.

Alibaba's RocketMQ builds on Kafka's design, making up for some of Kafka's shortcomings while inheriting its high throughput; its clients are currently mainly Java. But we were worried about the stability of Alibaba's open-source products, so we did not use it.

RabbitMQ is developed in Erlang, a language designed for concurrency. Its throughput is not as high as Kafka's, but it is enough for our company. Message reliability is good, message latency is extremely low, and clusters are easy to build. It supports multiple protocols and has clients in many languages, which makes it flexible. Spring's support for RabbitMQ is also good, so it is convenient to use and better matches our needs.

Considering our company's concurrency and stability requirements, we chose RabbitMQ.

2.2. How does RabbitMQ ensure that messages are not lost?

Suggested answer:

RabbitMQ provides targeted solutions for each place where a message can be lost during delivery:

  • When the producer sends a message, it may fail to reach the exchange because of network problems:
    • RabbitMQ provides a publisher confirm mechanism
      • After sending a message, the producer can register a ConfirmCallback function
      • When the message successfully reaches the exchange, RabbitMQ calls the ConfirmCallback to notify the sender, returning an ACK
      • If the message does not reach the exchange, RabbitMQ also calls the ConfirmCallback to notify the sender, returning a NACK
      • An exception is also thrown if the message is not sent successfully before the timeout
  • After the message reaches the exchange, it can still be lost if it fails to reach a queue:
    • RabbitMQ provides a publisher return mechanism
      • The producer can define a ReturnCallback function
      • When a message arrives at the exchange but not at a queue, RabbitMQ calls the ReturnCallback to tell the sender the reason for the failure
  • After the message arrives in the queue, an MQ outage can also lose it:
    • RabbitMQ provides persistence and cluster master-slave backup
      • Message persistence: RabbitMQ persists exchanges, queues, and messages to disk, so messages can be recovered after a restart
      • Both mirrored clusters and quorum queues provide master-slave backup; when the master node goes down, a slave is automatically promoted and the data is preserved
  • After the message is delivered to the consumer, improper handling by the consumer can also lose it:
    • On top of RabbitMQ, Spring AMQP provides a consumer acknowledgement mechanism, a consumer retry mechanism, and consumer failure strategies:
      • Consumer acknowledgement:
        • If the consumer processes the message successfully without throwing an exception, Spring returns an ACK to RabbitMQ and the message is removed
        • If the consumer fails to process the message, throws an exception, or crashes, Spring returns a NACK or nothing at all, and the message is not removed
      • Consumer retry:
        • By default, when a consumer fails, the message returns to the MQ queue and is redelivered to a consumer. With Spring's retry mechanism, no NACK is returned on failure; instead the consumer retries locally. After several failed retries, the message is handled according to the consumer failure strategy. This avoids the extra pressure of messages repeatedly re-entering the queue.
      • Consumer failure strategies:
        • When a consumer's local retries are exhausted, the message is discarded by default
        • Spring also provides a republish strategy (RepublishMessageRecoverer): after retries are exhausted, the message is redelivered to a designated exception exchange, carrying the exception stack information to help locate the problem
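As an illustration of the publisher confirm mechanism described above, here is a minimal Spring AMQP sketch. It assumes publisher confirms are enabled in the configuration (e.g. spring.rabbitmq.publisher-confirm-type=correlated); the class and variable names are ours:

import org.springframework.amqp.rabbit.connection.CorrelationData;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

// Register a ConfirmCallback so the sender learns whether each message
// reached the exchange (ACK) or not (NACK).
public class PublisherConfirmConfig {

    public void configure(RabbitTemplate rabbitTemplate) {
        rabbitTemplate.setConfirmCallback((CorrelationData correlationData, boolean ack, String cause) -> {
            if (ack) {
                // the message reached the exchange
                System.out.println("message confirmed: " + correlationData);
            } else {
                // the message did not reach the exchange; log, retry, or alert
                System.out.println("message not delivered to exchange: " + cause);
            }
        });
    }
}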

2.3. How does RabbitMQ avoid message accumulation?

Suggested answer:

Message accumulation usually happens because messages are produced faster than the consumer can process them. The solutions therefore come down to three points:

  • Improve the consumer's processing speed
  • Add more consumers
  • Raise the queue's message storage limit

1) Improve consumer processing speed

The consumer's processing speed is determined by the business code, so what we can do includes:

  • Optimize the business code as much as possible to improve performance
  • After receiving a message, hand it to a thread pool so that multiple messages are processed concurrently

Advantages: low cost, only a code change

Disadvantages: a thread pool adds performance overhead of its own, so this is not suited to high-frequency, low-latency tasks; it is recommended for tasks that take a long time to execute.

2) Add more consumers

Binding multiple consumers to one queue lets them compete for messages, which naturally increases processing speed.

Advantages: a problem you can solve with money is not a problem; simple and crude to implement

Disadvantages: the problem is having no money; the cost is too high

3) Increase the upper limit of queue message storage

RabbitMQ 3.6 added a new queue mode: the Lazy Queue.

This kind of queue does not keep messages in memory; it writes them straight to disk on arrival, so in theory there is no storage limit. This solves the message accumulation problem.

Advantages: disk storage is safer; storage is effectively unlimited; it avoids the Page Out problem of memory storage, so performance is more stable.

Disadvantages: disk storage is bounded by IO performance, and message timeliness is worse than in memory mode, though the impact is usually small.
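A minimal sketch of declaring a lazy queue with Spring AMQP; the queue name is illustrative, and the x-queue-mode argument is what switches the queue to lazy mode:

import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.QueueBuilder;

// Declare a durable queue whose messages go straight to disk.
public class LazyQueueConfig {

    public Queue lazyQueue() {
        return QueueBuilder.durable("lazy.queue")
                .withArgument("x-queue-mode", "lazy")
                .build();
    }
}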

2.4. How does RabbitMQ guarantee the order of messages?

Suggested answer:

RabbitMQ storage is queue-based and naturally first-in, first-out, so as long as messages are sent in order, in theory they are received in order. However, when multiple consumers are bound to one queue, messages may be delivered to them in round-robin fashion, and the overall processing order can no longer be guaranteed.

Therefore, to ensure the order of messages, the following points need to be done:

  • Guarantee the order of message sending
  • Ensure that a set of ordered messages are sent to the same queue
  • Ensure that a queue contains only one consumer

2.5. How to prevent repeated consumption of MQ messages?

Suggested answer:

Messages end up being consumed repeatedly for all sorts of unavoidable reasons, so we can only work on the consumer side: as long as message processing is idempotent, repeated delivery does no harm.

There are many ways to guarantee idempotency, for example (a sketch of the first approach follows this list):

  • Give each message a unique id, record the message and its status in a local message table, and use the table's unique id to decide whether the message has already been handled
  • Likewise record a message table, and use the status field with optimistic locking to guarantee idempotency
  • Rely on the idempotency of the business itself: deleting by id and query operations are naturally idempotent; insert and update operations can rely on the database's unique constraints or an optimistic locking mechanism. In essence this is similar to the message-table scheme.
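A minimal sketch of the unique-id approach; the table and class names are hypothetical, and it assumes a table such as: CREATE TABLE mq_message (id VARCHAR(64) PRIMARY KEY, status INT).

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Deduplicate by claiming the message id; the primary key rejects duplicates.
public class IdempotentConsumer {

    private final Connection connection;

    public IdempotentConsumer(Connection connection) {
        this.connection = connection;
    }

    public void onMessage(String messageId, String payload) throws SQLException {
        // try to claim the message id; a duplicate insert fails on the primary key
        try (PreparedStatement ps = connection.prepareStatement(
                "INSERT INTO mq_message (id, status) VALUES (?, 0)")) {
            ps.setString(1, messageId);
            ps.executeUpdate();
        } catch (SQLException duplicate) {
            return; // already processed (or in progress): skip the duplicate
        }
        handleBusiness(payload); // the business logic now runs at most once
    }

    private void handleBusiness(String payload) {
        // placeholder for the real business processing
    }
}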

2.6. How to ensure the high availability of RabbitMQ?

Suggested answer:

Making RabbitMQ highly available comes down to two points:

  • Persist exchanges, queues, and messages properly
  • Build a mirrored RabbitMQ cluster with master-slave backup; alternatively, use quorum queues instead of a mirrored cluster

2.7. What problems can be solved by using MQ?

Suggested answer:

RabbitMQ can solve many problems, such as:

  • Decoupling: replacing synchronous calls between business-related microservices with MQ-based asynchronous notifications decouples the services and also improves business performance.
  • Traffic peak shaving: sudden bursts of requests go into MQ as a buffer; the downstream service pulls messages at its own pace and processes tasks one by one, so the traffic curve becomes much smoother.
  • Delayed queue: based on RabbitMQ's dead-letter queues or the DelayExchange plugin, a message can be received some time after it is sent.

3. Redis

3.1. What is the difference between Redis and Memcache?

  • Richer data types (supporting more complex scenarios): Redis supports not only simple k/v data but also list, set, zset, and hash structures; Memcached supports only the simple string type.
  • Persistence: Redis can save in-memory data to disk and reload it after a restart, while Memcached keeps all data only in memory.
  • Cluster mode: Memcached has no native cluster mode and relies on the client to shard data across nodes; Redis supports cluster mode natively.
  • Threading model: Memcached uses a multi-threaded, non-blocking IO multiplexing network model; Redis uses a single-threaded IO multiplexing model.

3.2. Redis single thread problem

Interviewer: Redis uses a single thread; how does it ensure high concurrency?

Interview answer:

The main reasons why Redis is fast are:

  1. It is completely memory based
  2. Its data structures are simple, and operations on them are simple
  3. It uses I/O multiplexing to make full use of CPU resources

Interviewer: What are the benefits of doing this?

Interview answer:

The advantages of single threading are as follows:

  • The code is clearer and the processing logic is simpler
  • There is no need to think about locks: no acquiring or releasing locks, and no performance cost from locking
  • There is no CPU context switching caused by multiple processes or threads

3.3. What are the persistence schemes of Redis?

Relevant information:

1) RDB persistence

RDB persistence can be triggered with save or bgsave; bgsave is generally used so the main process's business is not blocked. The process:

  • The Redis process forks a child process (which shares the parent's memory data at that moment).
  • The parent process continues handling client request commands.
  • The child process writes all the data in memory to a temporary RDB file.
  • When the write completes, the new RDB file replaces the old one.

The following are some configurations related to RDB persistence:

  • save 60 10000: if 10,000 keys change within 60 seconds, perform RDB persistence.
  • stop-writes-on-bgsave-error yes: if Redis fails to perform RDB persistence (commonly because the operating system is short of memory), stop accepting client write requests.
  • rdbcompression yes: compress RDB files as they are generated.
  • dbfilename dump.rdb: name the RDB file dump.rdb.
  • dir /var/lib/redis: save the RDB file in the /var/lib/redis directory.

In practice, though, we usually set stop-writes-on-bgsave-error to no, and let the monitoring system raise an alarm when Redis fails RDB persistence, so the problem can be handled by a human instead of bluntly rejecting client write requests.

Advantages of RDB persistence:

  • RDB persistent files are small, and Redis data recovery is fast
  • The child process does not affect the parent process, and the parent process can continue to process client commands
  • The fork uses copy-on-write, so in most cases memory consumption is modest and efficiency is good.

Disadvantages of RDB persistence:

  • The fork uses copy-on-write, so if Redis is handling many writes at the time, extra memory may be consumed, possibly even overflowing memory.
  • Compressing the RDB file reduces its size but costs extra CPU.
  • If the business values data durability, RDB persistence is unsuitable. For example, with RDB persistence every 5 minutes, an unexpected Redis crash loses up to 5 minutes of data.

2) AOF persistence

AOF persistence is enabled with the appendonly yes configuration item. With AOF, Redis appends every write command it receives to the end of the AOF file, so Redis can restore the database to its previous state simply by replaying the commands in that file.
  Compared with RDB, an obvious advantage of AOF is better data durability, because in AOF mode Redis uses write() to append each write command it receives from a client to the end of the AOF file.
  However, on Linux, after write() hands data to a file, the data is not flushed to disk immediately; it sits in the OS file system buffer until the OS flushes it at a suitable moment (to force a flush, call fsync() or fdatasync()).
  The appendfsync configuration item controls how often Redis synchronizes commands to disk:

  • always: every time Redis writes a command to the AOF file with write(), it also calls fsync() to flush the command to disk. This gives the best durability but imposes significant overhead on the system.
  • no: Redis only writes commands to the AOF file with write(), letting the OS decide when to flush them to disk.
  • everysec: besides writing commands to the AOF file with write(), Redis calls fsync() once per second. This is the recommended setting in practice: it provides reasonable durability without noticeably hurting Redis performance.

However, AOF persistence has a drawback: Redis keeps appending write commands to the AOF file, so the file grows ever larger. A large AOF file wastes disk space and makes Redis restart more slowly. To solve this, Redis rewrites the AOF file when appropriate, removing redundant commands to shrink it. During an AOF rewrite, Redis starts a child process that performs the rewrite.
  Two configuration items control how often Redis rewrites the AOF file:

  • auto-aof-rewrite-min-size 64mb
  • auto-aof-rewrite-percentage 100

Together these mean: when the AOF file is larger than 64 MB and at least twice its size after the last rewrite, Redis performs an AOF rewrite.

Advantages:

  • High persistence frequency and high data reliability
  • No extra memory or CPU consumption

Disadvantages:

  • Large file size
  • Large files make data recovery slower

Interview answer:

Redis provides two persistence mechanisms: RDB and AOF. By default, Redis uses RDB persistence.

RDB files are small, but saves are generally infrequent, so reliability is lower and data can be lost. RDB also forks the main process when writing data, which can consume extra memory, and file compression costs extra CPU.

AOF persistence can sync as often as once per second, so reliability is high. But the persisted file is large, so reading it during data recovery takes longer and is slightly less efficient.

3.4. What are the clustering methods of Redis?

Interview answer:

Redis clusters can be divided into master-slave clusters and sharded clusters.

A master-slave cluster generally has one master and several slaves: the master handles writes and the slaves handle reads. Combined with Sentinel, a new master can be elected when the master goes down; the purpose is to keep Redis highly available.

A sharded cluster shards the data: multiple Redis nodes form a cluster, and the 16384 hash slots are allocated across the nodes. When storing data, the key is hashed to a slot value and the data is stored on the node owning that slot. Because storage is oriented to slots rather than to the nodes themselves, the cluster can scale dynamically; the purpose is to let Redis store more data.

1) Master-slave cluster

The master-slave cluster is also a read-write splitting cluster, generally one master with many slaves.

Redis's replication feature lets users create any number of replicas of a server: the replicated server is the master, and the replicas created through replication are the slaves.

As long as the network between master and slaves is normal, they will hold the same data: the master constantly synchronizes its own data updates to the slaves.

  • Writes can only go through the master node
  • Reads can go to any node
  • If sentinel nodes are configured, the sentinels elect a new master from the slaves when the master goes down

There are two forms of master-slave cluster: a plain master-slave cluster, and a cluster with sentinels.

2) Sharded cluster

In a master-slave cluster, every node must hold all the data, which easily creates a bucket effect; and when the data volume is large, a single machine cannot meet the demand. That is when we use a sharded cluster.

Cluster characteristics:

  • Each node holds different data

  • All Redis nodes are interconnected (PING-PONG mechanism), using a binary protocol internally to optimize transmission speed and bandwidth

  • A node is considered failed only when more than half of the nodes in the cluster detect the failure

  • Clients connect directly to Redis nodes without any intermediate proxy layer; connecting to any available node in the cluster gives access to the data

  • redis-cluster maps all physical nodes to the slots [0-16383] to achieve dynamic scaling

To keep each node in the cluster highly available, we can also create a replica (slave node) for each master node. When a failure occurs, the master and slave switch over promptly.

3.5. What are the common data types of Redis?

Redis supports multiple data structures; the main difference between them is the storage format of the value:

  • string: the most basic type, a binary-safe string of up to 512 MB.

  • list: a list of strings that keeps elements in insertion order.

  • set: an unordered collection of strings with no duplicate elements.

  • sorted set: a sorted collection of strings.

  • hash: a collection of key-value (field-value) pairs.

3.6. Talk about the Redis transaction mechanism

Relevant information:

Reference: http://redisdoc.com/topic/transaction.html

Redis's transaction feature is implemented through four primitives: MULTI, EXEC, DISCARD, and WATCH. Redis serializes all the commands in a transaction and executes them in order. However, Redis transactions do not support rollback: after one command fails, the remaining commands still execute.

  • MULTI: starts a transaction; it always returns OK. After MULTI, the client can keep sending any number of commands to the server; they are not executed immediately but placed in a queue of pending commands.
  • EXEC: executes all commands in the queue in order and returns all of their results. While a transaction executes, Redis does not run commands from other clients in between.
  • DISCARD: clears the command queue, abandons the transaction, and takes the client out of the transaction state.
  • WATCH: Redis's optimistic locking mechanism, based on check-and-set (CAS); it can watch one or more keys, and once any watched key is modified, the subsequent transaction will not execute.
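As an illustration, a minimal transaction with the Jedis client (the connection details and key names are illustrative):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

// MULTI/EXEC with an optimistic lock via WATCH.
public class RedisTxExample {

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.watch("stock");              // WATCH: optimistic lock on a key
            Transaction tx = jedis.multi();    // MULTI: start the transaction
            tx.set("order:1", "created");      // commands are queued, not executed
            tx.decr("stock");
            // EXEC: run the queued commands in order; returns null if "stock"
            // was modified by another client after WATCH
            System.out.println(tx.exec());
        }
    }
}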

When using transactions, you may encounter two kinds of errors:

  • A queued command may be wrong before EXEC executes. For example, the command may have a syntax error (wrong number of arguments, wrong command name, etc.), or there may be a more serious condition such as insufficient memory (if the server has a memory limit configured with maxmemory).
    • Starting from Redis 2.6.5, the server records failures to enqueue a command, and when the client calls EXEC it refuses to execute and automatically discards the transaction.
  • A command may fail after EXEC is called. For example, a command in the transaction may operate on a key of the wrong type, such as using a list command on a string key.
    • Even if some commands fail during execution, the other commands in the transaction still execute; nothing is rolled back.

Why does Redis not support rollback?

The advantages of this approach:

  • Redis commands can fail only because of a syntax error (which cannot be caught at enqueue time) or because a command is used on a key of the wrong type. Practically speaking, failing commands are the result of programming errors, which should be caught during development and should not appear in production.
  • Because rollback does not need to be supported, Redis's internals stay simple and fast.

Since no mechanism can prevent the programmer's own mistakes anyway, and such errors rarely reach production, Redis chooses the simpler, faster approach of transactions without rollback.

Interview answer:

Redis transactions actually put a series of Redis commands into the queue, and then execute them in batches without interruption by other transactions during execution. However, unlike relational database transactions, Redis transactions do not support rollback operations. If a command fails to execute in a transaction, other commands will still be executed.

To compensate for the lack of rollback, Redis checks commands as the transaction enqueues them, and abandons the whole transaction if a command is malformed.

So as long as the code is written correctly, Redis will in theory execute the whole transaction correctly; rollback is unnecessary.

Interviewer: What if Redis crashes halfway through the execution of the transaction?

Redis has persistence. For reliability we generally use AOF persistence, so a transaction's commands are also written to the AOF file. If Redis goes down before the EXEC command completes, the transaction in the AOF file will be incomplete. Use the redis-check-aof tool to remove the incomplete transaction from the AOF file, and the server can then start normally.

Origin: https://blog.csdn.net/sinat_38316216/article/details/129883049