Experience with Redis as a distributed cache

1. What the cache is used for in the system



1. Small-volume data storage with high-speed read and write access. Keeping all data in memory ensures high-speed access, and persistence to disk is provided at the same time. This is in fact the main application scenario of Redis.



2. Mass data storage, distributed system support, data consistency guarantees, and convenient addition and removal of cluster nodes. Clustering has been supported since Redis 3.0, with semi-automatic data sharding, but it requires a smart client.



2. Redis in detail from different perspectives



Network IO model: Redis uses a single-threaded IO multiplexing model and wraps it in a simple event-processing framework (AeEvent) with backends for epoll, kqueue and select. For pure IO operations, a single thread makes the most of the speed advantage. However, Redis also provides some simple computing functions, such as sorting and aggregation, and for these operations the single-threaded model can seriously hurt overall throughput: while the CPU is busy computing, all IO scheduling is blocked.
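
The single-threaded multiplexing idea can be sketched with Python's standard selectors module, which also picks epoll, kqueue or select depending on the platform. This is only an illustration of the pattern, not Redis's actual event framework; the names reply_pong, serve and demo are made up for this example.

```python
import selectors
import socket


def reply_pong(conn):
    """Handler for one readable socket: read the request, write a reply."""
    conn.recv(1024)
    conn.sendall(b"+PONG\r\n")


def serve(sel, max_events):
    """Single-threaded loop: wait for ready sockets, run handlers one by one."""
    handled = 0
    while handled < max_events:
        ready = sel.select(timeout=1.0)
        if not ready:
            break  # nothing became readable within the timeout
        for key, _mask in ready:
            # Each handler runs to completion before the next one starts,
            # so a CPU-heavy handler would stall every other connection.
            key.data(key.fileobj)
            handled += 1
    return handled


def demo():
    """Send one request through a local socket pair and return the reply."""
    client, server = socket.socketpair()
    sel = selectors.DefaultSelector()  # epoll, kqueue or select
    sel.register(server, selectors.EVENT_READ, reply_pong)
    client.sendall(b"PING\r\n")
    serve(sel, max_events=1)
    reply = client.recv(1024)
    sel.close()
    client.close()
    server.close()
    return reply


if __name__ == "__main__":
    print(demo())
```

Because every handler runs on the same thread, replacing reply_pong with a long computation would delay all other registered sockets, which is the throughput problem described above.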



Memory management: Redis allocates memory on demand when it stores data and rarely uses free lists or similar techniques to optimize allocation, so some memory fragmentation occurs. Depending on the parameters of the storage command, Redis keeps data that has an expiration time separately and calls it temporary data. Non-temporary data is never evicted: even when physical memory runs short and swapping starts, no non-temporary data is evicted (although Redis will try to evict some temporary data). In this respect Redis is better suited to being a store than a cache.



Data consistency: In terms of consistency, I personally feel that Redis is not as good as memcached. Memcached provides the cas command, which ensures the consistency of concurrent operations on the same data. Redis has no cas command as such; the closest equivalent is optimistic locking with WATCH, which aborts a transaction if a watched key was changed in the meantime. Redis transactions (MULTI/EXEC) ensure that a series of commands is executed as a unit and is not interrupted by commands from other clients.



Supported key types: In addition to plain key/value strings, Redis supports data structures such as list, set, sorted set and hash. It provides KEYS for enumeration, but KEYS should not be used on a production instance, because it blocks the server while it walks the whole keyspace. If you need to enumerate production data, Redis provides tools that scan its dump file directly and enumerate all data. Redis also provides persistence and replication.



Client support: Redis has rich client support, with clients for most programming languages. For this test I chose Jedis, the officially recommended Java client. It provides a rich set of interfaces and methods, so developers do not need to care about data sharding, routing of reads and so on; a simple call is enough, which is very convenient.

Data replication: Starting from 2.8, the Slave periodically (once per second) sends an ACK to confirm how much of the replication stream it has processed. Redis replication works as follows:

1. Once a Slave is configured, it issues a SYNC command whether it is connecting to the Master for the first time or reconnecting;

2. When the Master receives the SYNC command, it does two things:

a) The Master executes BGSAVE: it writes the data to disk in the background (RDB snapshot);

b) At the same time, the Master stores newly received commands that write or modify the data set (non-query commands) in a buffer;

3. When the Master has finished saving the snapshot file in the background, it transfers the file to the Slave; the Slave clears its memory and then loads the file into memory;

4. The Master also forwards the commands collected in the buffer to the Slave using the Redis command protocol, and the Slave executes these commands to catch up with the Master;

5. From then on, the Master keeps sending commands to the Slave asynchronously, so that the data eventually becomes consistent;

6. Note that before 2.8, any reconnection between the Master and the Slave triggered a full synchronization. Since 2.8, a reconnection may result in a partial synchronization instead.



Starting from 2.8, when the connection between Master and Slave is lost and then restored, they can continue the replication stream from where it stopped instead of performing a full synchronization.

The Master maintains an in-memory backlog of the replication stream that records the most recently sent commands. In addition, the Master and the Slave both keep track of a replication offset and the ID of the current Master (master run ID).



When the network is disconnected and the Slave tries to reconnect:

a. If the master run ID is the same (that is, it is still the same Master as before the disconnection) and the commands issued between the disconnection and the present are still in the Master's in-memory backlog, the Master sends all the commands for the missing period to the Slave for execution, and replication then continues;

b. Otherwise, the full replication operation is still required.
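
The reconnection rule above (same master run ID and the missing commands still in the backlog) can be sketched as a small in-memory model. The Master class, its fields and the backlog size are assumptions made for this illustration and do not mirror Redis's real data structures, which track byte offsets rather than command counts.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy model of a master's replication state (illustrative names)."""
    run_id: str
    backlog_size: int = 4   # how many recent commands the backlog keeps
    offset: int = 0         # replication offset: commands sent so far
    backlog: deque = field(default_factory=deque)

    def write(self, command):
        """Record a write command in the backlog and advance the offset."""
        self.offset += 1
        self.backlog.append((self.offset, command))
        while len(self.backlog) > self.backlog_size:
            self.backlog.popleft()  # oldest commands fall out of the backlog

    def resync(self, slave_run_id, slave_offset):
        """Return ("partial", missing_commands) or ("full", None)."""
        if slave_run_id != self.run_id or slave_offset > self.offset:
            return "full", None     # the slave followed a different master
        if slave_offset == self.offset:
            return "partial", []    # nothing was missed
        oldest = self.backlog[0][0] if self.backlog else self.offset + 1
        if slave_offset + 1 < oldest:
            return "full", None     # the gap is no longer in the backlog
        missing = [cmd for off, cmd in self.backlog if off > slave_offset]
        return "partial", missing


if __name__ == "__main__":
    master = Master(run_id="master-1")
    for i in range(1, 7):
        master.write(f"SET k{i} v{i}")
    print(master.resync("master-1", 4))  # short gap, still in the backlog
    print(master.resync("master-1", 1))  # gap older than the backlog
```

A larger backlog lets slaves survive longer disconnections without a full synchronization, at the cost of more memory on the Master.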

Read-write separation: Redis supports read-write separation, and it is easy to use. You only need to configure the Redis read servers and write servers in the configuration file, with multiple servers separated by commas.

Horizontal dynamic scaling: After a three-year wait, the long-expected Redis 3.0 finally arrived. The new version mainly implements the Cluster function, with which data is migrated online when cluster nodes are added or removed. Central to online reconfiguration of Redis Cluster is the ability to move slots from one node to another. Because a hash slot is actually a collection of keys, all Redis Cluster really does when rehashing is move some keys from one node to another.



Data eviction strategy: When the in-memory data set grows to the configured memory limit, Redis applies a data eviction strategy. Redis provides 6 eviction strategies:



volatile-lru: evict the least recently used data among the keys that have an expiration time set (server.db[i].expires)

volatile-ttl: evict the data that is closest to expiring among the keys that have an expiration time set (server.db[i].expires)

volatile-random: evict arbitrary data among the keys that have an expiration time set (server.db[i].expires)

allkeys-lru: evict the least recently used data among all keys (server.db[i].dict)

allkeys-random: evict arbitrary data among all keys (server.db[i].dict)

noeviction: never evict data
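
The difference between the six strategies comes down to which key set is searched and which rule picks the victim. The sketch below makes that explicit; pick_victim and its arguments are hypothetical names, and it selects the exact least recently used key, whereas Redis itself only approximates LRU by sampling a few keys.

```python
import random

POLICIES = {
    "volatile-lru", "volatile-ttl", "volatile-random",
    "allkeys-lru", "allkeys-random", "noeviction",
}


def pick_victim(policy, data, expires, last_used, rng=random):
    """Pick one key to evict, or None when nothing may be evicted.

    data:      key -> value for every key (stands in for server.db[i].dict)
    expires:   key -> expiry timestamp, only for keys that have a TTL
               (stands in for server.db[i].expires)
    last_used: key -> timestamp of the most recent access
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "noeviction":
        return None
    scope, _, rule = policy.partition("-")
    candidates = list(expires) if scope == "volatile" else list(data)
    if not candidates:
        return None  # e.g. a volatile-* policy when no key has a TTL
    if rule == "lru":
        return min(candidates, key=lambda k: last_used[k])
    if rule == "ttl":
        return min(candidates, key=lambda k: expires[k])
    return rng.choice(candidates)  # rule == "random"


if __name__ == "__main__":
    data = {"a": 1, "b": 2, "c": 3}
    expires = {"b": 50, "c": 20}          # "a" has no expiration time
    last_used = {"a": 1, "b": 5, "c": 9}
    for policy in sorted(POLICIES):
        print(policy, pick_victim(policy, data, expires, last_used))
```

Note that with a volatile-* strategy nothing can be evicted when no key carries an expiration time, which matches the earlier remark that non-temporary data is never evicted.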



3. Cluster (i.e. distributed Redis)



Let's look at the Redis cluster function in detail. It has been supported since version 3.0 and makes Redis truly distributed.



Redis cluster is a distributed, fault-tolerant Redis implementation. The functions that can be used by the cluster are a subset of the functions that ordinary stand-alone Redis can use.



There are no central nodes or proxy nodes in Redis Cluster, and one of the main design goals of the cluster is to achieve linear scalability.



Redis Cluster sacrifices some fault tolerance for consistency: the system keeps data as consistent as possible while maintaining limited resistance to network splits and node failures.



Cluster Features:



(1) All redis nodes are interconnected with each other (PING-PONG mechanism), and the internal binary protocol is used to optimize the transmission speed and bandwidth.



(2) A node is considered failed only when more than half of the nodes in the cluster detect the failure.



(3) The client is directly connected to the redis node, and no intermediate proxy layer is required. The client does not need to connect to all nodes in the cluster, just connect to any available node in the cluster.



(4) redis-cluster maps all physical nodes onto the slots [0-16383], and the cluster is responsible for maintaining the node<->slot<->value mapping.

The subset of functions implemented by Redis Cluster:



Redis Cluster implements all the single-machine Redis commands that operate on a single key. Complex operations over multiple keys, such as set union and intersection, are not implemented, and in general no command that could in theory need keys stored on several nodes is implemented. In the future, users may be able to run read-only multi-key operations on computation nodes of the cluster by using the MIGRATE COPY command, but the cluster itself will not implement complex multi-key commands that would require moving keys around between nodes.



Redis Cluster does not support multiple databases as single-machine Redis does; the cluster only uses the default database 0, and the SELECT command cannot be used.

Clients and Servers in Redis Cluster Protocol:

Nodes in Redis Cluster have the following responsibilities:

1. Hold key-value pair data.

2. Record the state of the cluster, including the mapping from keys to the right nodes.

3. Automatically discover other nodes, identify nodes that are not working properly, and elect a new master node from the slave nodes when necessary.



To perform the tasks listed above, each node in the cluster establishes a "cluster bus" with other nodes, which is a TCP connection that communicates using a binary protocol.

The Gossip protocol is used between nodes to do the following:

1. Propagate information about the cluster to discover new nodes.

2. Send PING packets to other nodes to check if the target node is functioning properly.

3. Send cluster information when certain events occur.

4. In addition, cluster connections are also used to publish or subscribe to messages in the cluster.

Because cluster nodes cannot proxy command requests, clients must resend a command request to another node themselves when a node returns a -MOVED or -ASK redirection error. A client is free to send command requests to any node in the cluster and can resend a command to the correct node when necessary, based on the information in the redirection error, so in theory the client does not need to store any cluster state. However, if the client caches the mapping between keys and nodes, the number of redirections can be reduced considerably, which improves the efficiency of command execution.

Key distribution model:

The key space of Redis cluster is divided into 16384 slots, and the maximum number of nodes in the cluster is also 16384.

The recommended maximum number of nodes is around 1000. Each master node is responsible for processing a portion of the 16384 hash slots.

When we say that a cluster is in a "stable" state, we mean that the cluster is not performing reconfiguration operations and each hash slot is processed by only one node. Reconfiguration refers to moving one or more slots from one node to another. A master node can have any number of slave nodes, and these slave nodes are used to replace the master node when it is disconnected from the network or fails.
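
The text does not spell out how a key is mapped to one of the 16384 slots. The Redis Cluster specification defines the slot as CRC16(key) mod 16384, using the XMODEM variant of CRC16, which Python's binascii.crc_hqx computes when started from 0. The sketch below ignores hash tags, and build_slot_table with its even split across masters is an assumption chosen for illustration; a real cluster can assign slots to masters arbitrarily.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: bytes) -> int:
    """Map a key to its hash slot: CRC16 (XMODEM) of the key, mod 16384."""
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


def build_slot_table(masters):
    """Split the slots into contiguous, nearly equal ranges, one per master."""
    per_node = SLOT_COUNT // len(masters)
    table = {}
    for slot in range(SLOT_COUNT):
        index = min(slot // per_node, len(masters) - 1)
        table[slot] = masters[index]
    return table


if __name__ == "__main__":
    table = build_slot_table(["A", "B", "C"])
    for key in (b"foo", b"user:1000"):
        slot = key_slot(key)
        print(key, "-> slot", slot, "-> master", table[slot])
```

In the stable state described above, every slot appears exactly once in such a table, so each key has exactly one responsible master.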

Cluster node properties:



Each node has a unique ID in the cluster, which is a 160-bit random number in hexadecimal, generated by /dev/urandom when the node is first started.

The node will save its ID to the configuration file, and the node will continue to use this ID as long as the configuration file is not deleted. The node ID is used to identify each node in the cluster. A node can change its IP and port number without changing the node ID. The cluster can automatically identify the change of IP/port number and broadcast this information to other nodes through the Gossip protocol.

The following is the associated information that each node has and that nodes send to other nodes:

1. The IP address and TCP port number used by the node.

2. The flags of the node.

3. The hash slot that the node is responsible for processing.

4. The last time the node sent a PING packet using the cluster connection.

5. The last time the node received a PONG packet in the reply.

6. The time when the cluster marked the node as offline.

7. The number of slave nodes for this node.

8. If the node is a slave node, it records the node ID of its master node. If the node is a master node, the master node ID column has the value 0000000.



Some of the above information can be obtained by sending the CLUSTER NODES command to any node in the cluster (either master or slave).



Node handshake:



The node always answers (accept) the connection request from the cluster connection port, and replies to the received PING packet, even if the PING packet is from an untrusted node. However, with the exception of PING, the node will reject all other packets that are not from cluster nodes. There are only two ways for a node to recognize that another node belongs to the same cluster:



1. A node can send a MEET message to another node to force the receiving node to recognize the sender as a member of the cluster. A node only sends a MEET message to another node when the administrator explicitly sends it the CLUSTER MEET ip port command.



2. If a trusted node propagates the information of a third-party node to another node, the node that receives the information will also recognize the third-party node as a member of the cluster. That is, if A knows B, B knows C, and B spreads information about C to A, then A also recognizes C as part of the cluster and tries to connect to C.



This means that if we add one or more new nodes to a cluster, the new nodes will eventually be connected to all other nodes already in the cluster.

This means that as long as the administrator explicitly specifies a trust relationship using the CLUSTER MEET command, the cluster can automatically discover other nodes. This node identification mechanism makes clusters more robust by preventing unexpected mixes of different Redis clusters due to IP address changes or other network events. When a node's network connection goes down, it actively connects to other known nodes.
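
The two recognition rules above (an explicit MEET, and trust spread through gossip) can be sketched as operations on a table of known peers. The functions meet, gossip_round and fully_connected are hypothetical helpers for this illustration, not cluster commands.

```python
def meet(known, a, b):
    """CLUSTER MEET between a and b: both recognize each other as members."""
    known.setdefault(a, set()).add(b)
    known.setdefault(b, set()).add(a)


def gossip_round(known):
    """Every node learns the nodes that its trusted peers already know."""
    updated = {node: set(peers) for node, peers in known.items()}
    for node, peers in known.items():
        for peer in peers:
            updated[node] |= known.get(peer, set()) - {node}
    return updated


def fully_connected(known):
    """True when every node knows every other node."""
    nodes = set(known)
    return all(peers == nodes - {node} for node, peers in known.items())


if __name__ == "__main__":
    known = {}
    meet(known, "A", "B")   # the administrator introduces B to A
    meet(known, "B", "C")   # and C to B
    print(fully_connected(known))
    known = gossip_round(known)   # B tells A about C and C about A
    print(known["A"], fully_connected(known))
```

Here A never received a MEET for C, yet after one gossip round A recognizes C because the information came from the trusted node B.
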

MOVED redirection:

A Redis client can send command requests to any node in the cluster (including slave nodes). The node will analyze the command request, and if the command is a command that the cluster can execute, then the node will look for the slot where the key to be processed by the command is located. If the hash slot to be searched happens to be processed by the node that received the command, then the node executes the command directly. On the other hand, if the searched slot is not handled by the node, the node will look at its own internally stored hash slot to node ID mapping record and return a MOVED error to the client.

Suppose a node answers a GET command with a MOVED error saying that slot 3999 is handled by node 127.0.0.1:6381. Even if the client waits so long before resending the GET command that the cluster has changed its configuration again and node 127.0.0.1:6381 no longer handles slot 3999, nothing breaks: when the client sends the GET command to node 127.0.0.1:6381, that node again returns a MOVED error, indicating the node that is now responsible for slot 3999.

Although nodes in the cluster are identified by IDs, the MOVED error carries the IP and port number of the target node rather than its ID, in order to keep the client's redirection logic as simple as possible. A client should still remember that "slot 3999 is handled by node 127.0.0.1:6381", so that the next command for slot 3999 goes to the correct node straight away.

Note that when the cluster is in a stable state, all clients eventually hold a map of hash slots to nodes, which makes the cluster very efficient: clients send command requests directly to the correct nodes, without redirections, proxies, or any other entity that could be a single point of failure.

In addition to MOVED redirection errors, a client should also be able to handle the ASK redirection errors described later.
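
A MOVED error carries the slot number followed by the address of the responsible node, for example "-MOVED 3999 127.0.0.1:6381". A client-side slot cache along the lines described above might look like the following minimal sketch; parse_redirect and SlotCache are names invented for this example and not part of any client library.

```python
def parse_redirect(error: str):
    """Parse '-MOVED 3999 127.0.0.1:6381' into (kind, slot, host, port)."""
    kind, slot, address = error.lstrip("-").split()
    host, _, port = address.rpartition(":")
    return kind, int(slot), host, int(port)


class SlotCache:
    """Remembers which node handles each slot, learned from MOVED errors."""

    def __init__(self, default_node):
        self.default_node = default_node  # any known node, used for unknown slots
        self.slots = {}                   # slot -> (host, port)

    def node_for(self, slot):
        return self.slots.get(slot, self.default_node)

    def handle_error(self, error):
        """Return the node to retry on; only MOVED updates the cache."""
        kind, slot, host, port = parse_redirect(error)
        if kind == "MOVED":
            self.slots[slot] = (host, port)
        return host, port


if __name__ == "__main__":
    cache = SlotCache(default_node=("127.0.0.1", 6379))
    print(cache.node_for(3999))
    print(cache.handle_error("-MOVED 3999 127.0.0.1:6381"))
    print(cache.node_for(3999))
```

After the first MOVED error, later commands for slot 3999 go straight to 127.0.0.1:6381 without another redirection.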

Cluster online reconfiguration:

Redis Cluster supports adding or removing nodes while the cluster is running. In fact, node addition and node removal can be abstracted into the same operation, namely moving hash slots from one node to another: adding a new node to the cluster is equivalent to moving slots from existing nodes to a blank new node, and removing a node from the cluster is equivalent to moving all of its slots to other nodes in the cluster.

Therefore, the heart of online reconfiguration of Redis Cluster is the ability to move slots from one node to another. Because a hash slot is actually a collection of keys, what Redis Cluster really does when rehashing is move some keys from one node to another.

To understand how Redis Cluster moves slots from one node to another, we need to introduce the various subcommands of the CLUSTER command, which are responsible for managing the slot translation table of cluster nodes.

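The following are the subcommands available for the CLUSTER command:

CLUSTER ADDSLOTS slot1 [slot2] ... [slotN]

CLUSTER DELSLOTS slot1 [slot2] ... [slotN]

CLUSTER SETSLOT slot NODE node

CLUSTER SETSLOT slot MIGRATING node

CLUSTER SETSLOT slot IMPORTING node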

The first two commands, ADDSLOTS and DELSLOTS, assign slots to a node and remove slots from a node, respectively. When slots are assigned or removed, the node spreads this information to the entire cluster through the Gossip protocol. The ADDSLOTS command is typically used when a new cluster is created, as a quick way of assigning slots to the individual nodes.

The CLUSTER SETSLOT slot NODE node subcommand assigns the specified slot to the given node.

As for the CLUSTER SETSLOT slot MIGRATING node command and the CLUSTER SETSLOT slot IMPORTING node command, the former marks the slot as being migrated out of the current node to the given node, while the latter marks the slot as being imported into the current node from the given node:

When a slot is set to the MIGRATING state, the node that originally held the slot will still continue to accept command requests for this slot, but only when the key processed by the command still exists in the node, the node will process the command request.

If the key used by the command does not exist for the node, the node will return an -ASK redirection error to the client, telling the client to send the command request to the slot's migration target node.

When a slot is set to the IMPORTING state, the node will only accept command requests for this slot after receiving the ASKING command.

If the client does not send an ASKING command to the node, the node will use the -MOVED redirect error to redirect the command request to the node that is actually responsible for handling the slot.

The above description of MIGRATING and IMPORTING is a bit confusing, so let's illustrate it with a practical example.

Suppose we have two nodes, A and B, and we want to move slot 8 from node A to node B. We then:

Send the command CLUSTER SETSLOT 8 IMPORTING A to node B

Send the command CLUSTER SETSLOT 8 MIGRATING B to node A

Whenever a client sends a command request about hash slot 8 to any other node, that node returns a redirection to node A:

If the key to be processed by the command already exists in slot 8 on node A, the command is processed by node A.

If the key to be processed by the command does not exist in slot 8 on node A (for example, a new key is to be added to the slot), the command is processed by node B.

This mechanism keeps node A from creating any new keys for slot 8.

At the same time, redis-trib, a special client that serves as the Redis Cluster configuration utility, moves the keys of slot 8 from node A to node B.

The key move operation is performed by the following two commands:

CLUSTER GETKEYSINSLOT slot count

The above command makes the node return up to count keys from the given slot. For each key returned, redis-trib sends a MIGRATE command to node A, which atomically moves the specified key from node A to node B (both nodes are blocked while the key is being moved, to avoid race conditions).

The MIGRATE command works as follows:

MIGRATE target_host target_port key target_database_id timeout

The node that executes the MIGRATE command connects to the target node and sends it the serialized key data. Once the target returns OK, the node deletes the key from its own database.

From an external client's perspective, at any point in time a key exists either on node A or on node B, but never on both.

Because Redis Cluster only uses database number 0, when the MIGRATE command is used to perform cluster operations, the value of target_database is always 0.
The target_database parameter exists to make the MIGRATE command a generic command that can act on other functions outside the cluster.

We have optimized the MIGRATE command so that it remains efficient even when transferring complex data such as list keys with multiple elements.

However, although MIGRATE is very efficient, reconfiguring a cluster that holds a large number of keys with large values still takes a long time, which may make the cluster unsuitable for applications with strict response-time requirements.
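
The key-moving procedure described above (fetch a batch of keys from the slot, then migrate them one by one) can be sketched with two dictionaries standing in for nodes A and B. Everything here is a stand-in: get_keys_in_slot and migrate only imitate the behaviour of CLUSTER GETKEYSINSLOT and MIGRATE, pickle plays the role of Redis's serialization format, and slot_of is a caller-supplied function that maps a key to its slot.

```python
import pickle


def get_keys_in_slot(node, slot, count, slot_of):
    """Return up to `count` keys stored on `node` that belong to `slot`."""
    return [key for key in node if slot_of(key) == slot][:count]


def migrate(source, target, key):
    """Copy the serialized value to the target, then delete it at the source."""
    payload = pickle.dumps(source[key])   # serialize on the source node
    target[key] = pickle.loads(payload)   # the target restores it and says OK
    del source[key]                       # only then does the source delete it


def move_slot(source, target, slot, slot_of, batch=2):
    """Move every key of `slot` in batches; return the number of keys moved."""
    moved = 0
    while True:
        keys = get_keys_in_slot(source, slot, batch, slot_of)
        if not keys:
            return moved
        for key in keys:
            migrate(source, target, key)
            moved += 1


if __name__ == "__main__":
    def toy_slot(key):
        return len(key) % 4   # toy slot function, for the demo only

    node_a = {"ab": 1, "abcdef": [1, 2], "abc": 3}
    node_b = {}
    print(move_slot(node_a, node_b, slot=2, slot_of=toy_slot))
    print(node_a, node_b)
```

Because each key is deleted at the source only after the target has stored it, and the two steps happen inside one blocking call, an observer never sees the key on both nodes or on neither.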

ASK redirection:

When we introduced MOVED redirection, we said that there is also an ASK redirection. When a node needs a client to permanently send command requests for a slot to another node, the node returns a MOVED redirection to the client. When the node needs the client to go to another node for the next command request only, it returns an ASK redirection.

For example, in the slot 8 example in the previous section, the keys of slot 8 are scattered across node A and node B, so when the client does not find a key on node A, it should turn to node B to look for it. This redirection should affect only that one command, however, rather than making the client go directly to node B every time: until all the keys of slot 8 held by node A have been migrated to node B, the client should visit node A first and node B second. Because this redirection only concerns one of the 16384 slots, the performance penalty for the cluster is acceptable.

For these reasons, when the client goes on to node B after node A, it should send an ASKING command before sending the command request to node B; otherwise node B will refuse to execute the request, because the slot is in the IMPORTING state. A node that receives an ASKING command sets a one-time flag for the client, which allows the client to execute one command request for a slot in the IMPORTING state. From the client's point of view, the complete semantics of an ASK redirection are as follows:

1. If the client receives an ASK redirection, send the command request to the node specified by the redirection.

2. Send an ASKING command before sending the actual command request.

3. Do not update the client's recorded mapping from slot 8 to a node: slot 8 should still be mapped to node A, not node B.

Once node A has finished migrating slot 8, it answers the next command request for slot 8 with MOVED, which permanently redirects command requests for slot 8 to node B.

Note that even if a buggy client maps slot 8 to node B prematurely, it will get a MOVED error when it sends a command request, as long as it does not send the ASKING command, and will be redirected back to node A.
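
The interplay of MIGRATING, IMPORTING, ASK and MOVED can be checked with a small simulation of the slot 8 example. The Node class and client_get function below are illustrative only: the asking argument stands in for the one-time flag set by the ASKING command, and the shared owners table stands in for the cluster's agreed slot assignment.

```python
class Node:
    """Toy cluster node; the names are illustrative, not Redis internals."""

    def __init__(self, name, owners):
        self.name = name
        self.owners = owners   # shared table: slot -> Node responsible for it
        self.data = {}         # key -> value
        self.migrating = {}    # slot -> target Node
        self.importing = {}    # slot -> source Node

    def get(self, slot, key, asking=False):
        """Return ("OK", value), ("ASK", node) or ("MOVED", node)."""
        if self.owners[slot] is self:
            if key in self.data:
                return "OK", self.data[key]
            if slot in self.migrating:
                return "ASK", self.migrating[slot]  # key may already be moved
            return "OK", None
        if asking and slot in self.importing:
            return "OK", self.data.get(key)         # one-time permission
        return "MOVED", self.owners[slot]


def client_get(slot_map, slot, key, max_redirects=5):
    """Follow MOVED and ASK redirections; only MOVED updates slot_map."""
    node, asking = slot_map[slot], False
    for _ in range(max_redirects):
        status, payload = node.get(slot, key, asking=asking)
        if status == "OK":
            return payload
        if status == "MOVED":
            slot_map[slot] = payload
        node, asking = payload, status == "ASK"
    raise RuntimeError("too many redirections")


if __name__ == "__main__":
    owners = {}
    a, b = Node("A", owners), Node("B", owners)
    owners[8] = a
    a.data.update(x="1", y="2")
    b.importing[8], a.migrating[8] = a, b   # start moving slot 8 from A to B
    b.data["x"] = a.data.pop("x")           # key x has already been migrated
    slot_map = {8: a}
    print(client_get(slot_map, 8, "x"), slot_map[8].name)
```

While the migration is in progress the client reaches key x through an ASK redirection, yet its slot map still points at node A, as rule 3 above requires.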

Fault tolerance:

Node failure detection is implemented as follows:

1. When a node sends a PING to another node and the target node fails to reply within the given time limit, the sending node marks the target node as PFAIL (possible failure). The time to wait for the PING reply is called the "node timeout" and is a per-node setting.

2. Every time a node sends a PING to other nodes, it also gossips information about three randomly chosen nodes it knows, including a flag that indicates whether each of them has been marked PFAIL or FAIL.

When a node receives messages from other nodes, it records those nodes that were marked as failed by other nodes. This is called a failure report.

3. If a node has marked another node as PFAIL and, according to the failure reports it has received, the majority of the other master nodes in the cluster also believe that node has failed, then the node changes the status of the failed node to FAIL.

4. Once a node is marked as FAIL, the information about the failure of this node will be broadcast to the entire cluster, and all nodes that receive this information will mark the failed node as FAIL.

Simply put, before a node can mark another node as failed, it must ask other nodes for their opinions and get the agreement of the majority of master nodes. Because expired failure reports are removed, a node can only base the FAIL decision on recently received failure reports.
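
The PFAIL to FAIL promotion described above can be sketched as a vote count over recent failure reports. FailureView and its method names are assumptions for this example, as is the choice to count the observing node's own opinion when it is a master; the sketch follows the description in this article rather than the Redis source.

```python
class FailureView:
    """One node's view of failure reports about its peers (toy model)."""

    def __init__(self, me, masters, report_validity):
        self.me = me
        self.masters = set(masters)
        self.report_validity = report_validity  # how long a report counts
        self.pfail = set()                      # nodes this node suspects
        self.reports = {}                       # target -> {reporter: time}

    def ping_timed_out(self, target):
        """No reply within the node timeout: mark the target PFAIL."""
        self.pfail.add(target)

    def receive_report(self, reporter, target, now):
        """Record that a master told us it considers the target failed."""
        if reporter in self.masters and reporter != target:
            self.reports.setdefault(target, {})[reporter] = now

    def should_mark_fail(self, target, now):
        """True when this node suspects the target and a majority agrees."""
        if target not in self.pfail:
            return False
        fresh = {
            reporter
            for reporter, seen in self.reports.get(target, {}).items()
            if now - seen <= self.report_validity and reporter != self.me
        }
        votes = len(fresh) + (1 if self.me in self.masters else 0)
        return votes >= len(self.masters) // 2 + 1


if __name__ == "__main__":
    view = FailureView("A", ["A", "B", "C", "D", "E"], report_validity=5)
    view.ping_timed_out("E")
    view.receive_report("B", "E", now=10)
    view.receive_report("C", "E", now=12)
    print(view.should_mark_fail("E", now=13))  # A, B and C agree
    print(view.should_mark_fail("E", now=16))  # B's report has expired
```

The second call shows why expiry matters: once B's report is too old, only two masters still vouch for the failure, which is no longer a majority of five.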

Slave node election: Once a master node enters the FAIL state, if it has one or more slave nodes, one of them is promoted to be the new master node, and the other slave nodes start replicating from the new master node.

The new master node is elected by all the slave nodes under the offline master node. The following are the election conditions:

1. This node is the slave node of the offline master node.

2. The set of slots handled by the offline master node is not empty.

3. The data of the slave node is considered reliable, that is, the replication link between the master and the slave has not been disconnected for longer than the node timeout multiplied by the REDIS_CLUSTER_SLAVE_VALIDITY_MULT constant.

If a slave node satisfies all the above conditions, then the slave node will send an authorization request to other master nodes in the cluster, asking them whether to allow itself (slave node) to be promoted to the new master node.

If the slave node sending the authorization request satisfies the following conditions, a master node returns the FAILOVER_AUTH_GRANTED authorization to it, agreeing to its promotion:

1. It is a slave node that sends the authorization request, and the master node to which it belongs is in the FAIL state.

2. Among all the slave nodes of the offline master node, the node ID of this slave node is the smallest in the sorting.

3. The slave node is in normal operating state: it is not marked as FAIL state, nor is it marked as PFAIL state.

Once a slave node is authorized by the majority of the master nodes within the given time limit, it will start to perform the following failover operations:

1. Inform the other nodes through a PONG packet that this node is now the master node.

2. Inform other nodes through PONG packets that this node is a promoted slave.

3. Claim all hash slots handled by the offline master node.

4. Explicitly broadcast a PONG packet to all nodes to speed up the progress of other nodes in identifying this node, instead of waiting for timed PING/PONG packets.
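
The election rules listed above can be written down as three small predicates. This is a sketch of the procedure as this article describes it, not of the Redis implementation; the Slave class, the function names and the numeric values in the demo are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    node_id: str
    master_id: str
    link_down_seconds: float   # how long the replication link has been down
    state: str = "OK"          # "OK", "PFAIL" or "FAIL"


def eligible(slave, failed_master_id, master_slot_count,
             node_timeout, validity_mult):
    """Conditions a slave checks before asking the masters for authorization."""
    return (
        slave.master_id == failed_master_id
        and master_slot_count > 0
        and slave.link_down_seconds <= node_timeout * validity_mult
    )


def grants_vote(candidate, slaves, failed_masters):
    """Conditions a master checks before granting FAILOVER_AUTH_GRANTED."""
    siblings = [s for s in slaves if s.master_id == candidate.master_id]
    smallest_id = min(s.node_id for s in siblings + [candidate])
    return (
        candidate.master_id in failed_masters
        and candidate.node_id == smallest_id
        and candidate.state == "OK"
    )


def wins_election(votes, master_count):
    """A slave is promoted once a majority of the masters has authorized it."""
    return votes > master_count // 2


if __name__ == "__main__":
    s1 = Slave("a1", "m1", link_down_seconds=3)
    s2 = Slave("b2", "m1", link_down_seconds=3)
    print(eligible(s1, "m1", 5461, node_timeout=15, validity_mult=10))
    print(grants_vote(s1, [s1, s2], {"m1"}), grants_vote(s2, [s1, s2], {"m1"}))
    print(wins_election(votes=2, master_count=3))
```

Since only the slave with the smallest node ID can be granted authorization, at most one slave of a failed master can collect a majority.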

All other nodes update their configuration according to the new master:

All slots taken over by the new master are updated.

All slave nodes of the offline master node notice the PROMOTED flag and start replicating from the new master node.

If the offline master node comes back online, it will perceive the PROMOTED flag and adjust itself as a slave node of the current master node.

During the life of the cluster, if a master node with the PROMOTED flag becomes a slave node for some reason, the node will lose the PROMOTED flag it carried.
