[Technology selection] Redis clustering schemes and a comparison of their advantages and disadvantages

Background

In service development, a single machine is a single point of failure: if the service is deployed on one server and that server goes down, the service becomes unavailable. To make a service highly available, it is deployed in distributed fashion across multiple machines; even if several servers go down, the service remains available as long as at least one is still up.

The same is true for Redis. To address single-machine failures, the master-slave mode was introduced, but it has a problem: after the master node fails, a slave must be manually switched to master before service is restored. To solve this, Redis introduced the sentinel mode, which automatically promotes a slave to master after the master fails, restoring service without manual intervention. However, neither the master-slave mode nor the sentinel mode achieves real sharded data storage: every Redis instance stores the full data set. Redis Cluster was created to provide true sharded storage, but because it was released relatively late (the official version only arrived in 2015), major companies could not wait and developed their own sharded Redis cluster solutions, such as Twemproxy and Codis.

1. Master-slave mode

Although a single Redis node can persist data to disk through the RDB and AOF mechanisms, the data still lives on one server. If that server suffers a problem such as a disk failure, the data becomes unavailable and cannot be read or written. There is also no read-write separation: reads and writes all hit the same server, so I/O bottlenecks appear when the request volume is large.

To avoid a single point of failure and the lack of read-write separation, Redis provides replication: after data in the master database is updated, the changes are automatically synchronized to the slave databases.

[Figure: Redis master-slave replication topology]

Characteristics of the Redis master-slave structure shown above: a master can have multiple slave nodes, and a slave node can itself have slave nodes, forming a cascading structure.
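As a minimal sketch of how replication is set up (the addresses below are placeholders), a replica can be pointed at its master either in redis.conf or at runtime; REPLICAOF is the modern command and SLAVEOF its legacy alias:

    # in the replica's redis.conf (hypothetical master address)
    replicaof 192.168.1.10 6379

    # or at runtime, via redis-cli against the replica
    redis-cli -h 192.168.1.11 -p 6379 REPLICAOF 192.168.1.10 6379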

  • Advantages and disadvantages of master-slave mode

    1. Advantages: read-write separation, improved efficiency, data backup, and multiple copies of the data.
    2. Disadvantages: the biggest drawback is that the master-slave mode has no automatic fault tolerance or recovery. If the master node fails, writes can no longer be served and availability is low; manual intervention is required to promote a slave to master.

# In plain master-slave mode, when the master database crashes, a slave must be switched to master manually:
1. On the slave, run the SLAVEOF NO ONE command to promote it to master so it can continue serving.
2. Restart the crashed master, then use the SLAVEOF command to make it a slave of the new master and synchronize the data.
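A minimal sketch of this manual switchover with redis-cli (host addresses are placeholders):

    # 1. promote the surviving replica (assumed to be 192.168.1.11) to master
    redis-cli -h 192.168.1.11 -p 6379 SLAVEOF NO ONE

    # 2. once the old master (192.168.1.10) is back, demote it to a replica of the new master
    redis-cli -h 192.168.1.10 -p 6379 SLAVEOF 192.168.1.11 6379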

For more, see my earlier article: Redis master-slave replication.

2. Sentinel mode

In the master-slave replication mode above, when the master server goes down, a slave server must be manually switched to master. This manual intervention is laborious and error-prone and leaves the service unavailable for a period of time. That is where sentinel mode comes in. Sentinel mode was introduced in Redis 2.6, but that early version was unstable; it only stabilized in Redis 2.8.

The core of sentinel mode is still master-slave replication, but compared with the plain master-slave mode it adds an election mechanism: when the master node goes down and can no longer accept writes, a new master is elected from the slave nodes. The election mechanism relies on sentinel processes running in the system.

[Figure: a single sentinel monitoring the master and its slaves]

As shown in the figure above, a single sentinel is itself a single point of failure, so in a one-master multi-slave Redis system, multiple sentinels are used for monitoring. The sentinels monitor not only the master and slave databases but also each other. Each sentinel is an independent process and runs on its own.

[Figure: multiple sentinels monitoring the master, the slaves, and each other]

  1. The role of sentinel mode:

    • Monitoring: the sentinels periodically send commands and check the replies to verify that the master server, the slave servers, and the other sentinels are all running normally; the sentinels also monitor each other.

    • Failover: when a sentinel detects that the master is down, it automatically switches a slave to master and then, via the publish-subscribe mechanism, notifies the other slaves to update their configuration and replicate from the new master. The failed old master also becomes a slave of the new master; even after it recovers, it does not regain its original master role.

  2. Sentinel implementation principle
    When a sentinel process starts, it reads its configuration file and finds the master database to monitor from the following directive:

    sentinel monitor master-name ip port quorum
    # master-name is the name of the master database
    # ip and port are the address and port of the current master database
    # quorum is the number of sentinel nodes that must agree before a failover is performed
    

    Only the master node needs to be configured here, because the sentinel uses the master's INFO command output to discover the slave nodes and establish connections with them; newly added slaves are likewise learned from the master's INFO output.

    A single sentinel node can monitor multiple master nodes, but this is not recommended, because if that sentinel crashes, failover for multiple clusters breaks at the same time. After starting, a sentinel establishes two connections to the master database:

    1. A subscription connection to the master's __sentinel__:hello channel, to learn about other sentinels monitoring the same database.
    2. A command connection that periodically sends the INFO command to the master to obtain information about the master itself.
    

    After the connections to the master are established, the sentinel performs the following three operations periodically:

    • Every 10 s, send the INFO command to the master and slaves to obtain current information about each database. For example, when a new slave is discovered, a connection is established to it and it is added to the monitoring list; when a master or slave changes role, the recorded information is updated.

    • Every 2 s, publish its own information to the __sentinel__:hello channel of the master and slave databases, to share its monitoring view with the other sentinels. Each sentinel subscribes to the databases' __sentinel__:hello channel; when a sentinel receives such a message, it checks whether the sender is a new sentinel and, if so, adds it to its sentinel list and establishes a connection to it.

    • Every 1 s, send a PING command to all master and slave nodes and to all other sentinel nodes to check whether they are alive.

  3. Subjective offline and objective offline
    When a sentinel's PING to a node goes unanswered for a certain time (down-after-milliseconds), the sentinel considers that node subjectively down, meaning this sentinel alone believes the node has failed. If the node is the master database, the sentinel then decides whether a failover is needed: it sends the SENTINEL is-master-down-by-addr command to ask the other sentinels whether they also consider the master subjectively down. When the configured number of sentinels (quorum) agree, the master is considered objectively down.
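    A minimal sentinel.conf sketch tying these settings together (the master name mymaster, the address, and the timeout values below are placeholder assumptions):

    # monitor the master named "mymaster"; 2 sentinels must agree before failover (quorum)
    sentinel monitor mymaster 192.168.1.10 6379 2
    # mark "mymaster" subjectively down if it does not answer for 30 seconds
    sentinel down-after-milliseconds mymaster 30000
    # resynchronize at most 1 replica with the new master at a time during failover
    sentinel parallel-syncs mymaster 1
    # abort a failover attempt that takes longer than 3 minutes
    sentinel failover-timeout mymaster 180000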

    When the master node objectively goes offline, a master-slave switchover is required. The steps of the master-slave switchover are:

    (1) Elect the lead sentinel, which will perform the failover.
    (2) The lead sentinel selects, from all slaves of the failed master, the slave database with the highest priority. The priority can be set via the slave-priority option.
    (3) If priorities are equal, the slave with the greater replication offset (i.e., the one that has copied more data and therefore has newer data) takes precedence.
    (4) If the above are still equal, the slave database with the smaller run ID is chosen.
    Once a slave database is selected, the sentinel sends it the SLAVEOF NO ONE command to promote it to master, and sends SLAVEOF commands to the other slaves so that they replicate from the new master.
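    From the client's point of view, the benefit is that an application asks the sentinels for the current master instead of hard-coding its address, so it keeps working after a failover. A minimal sketch with the redis-py client (the sentinel addresses and the master name "mymaster" are assumptions):

    from redis.sentinel import Sentinel

    # sentinel addresses are placeholders; "mymaster" is the monitored master's name
    sentinel = Sentinel([("192.168.1.21", 26379), ("192.168.1.22", 26379)],
                        socket_timeout=0.5)

    # always resolves to whichever node is currently the master of "mymaster",
    # even after a failover has promoted a different node
    master = sentinel.master_for("mymaster", socket_timeout=0.5)
    master.set("greeting", "hello")

    # reads can go to a replica for read-write separation
    replica = sentinel.slave_for("mymaster", socket_timeout=0.5)
    print(replica.get("greeting"))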

  4. Advantages and disadvantages of sentinel mode

# 1. Advantages
    Sentinel mode is built on top of the master-slave mode; it solves the problem that, in plain master-slave mode, a master failure cannot trigger an automatic failover.
# 2. Disadvantages
(1) It is still a centralized scheme: only one Redis master ever receives and processes write requests, so writes are limited by the single-machine bottleneck.
(2) Every node in the cluster stores the full data set, which wastes memory and does not achieve true distributed storage. When the data volume is large, master-slave synchronization seriously degrades the master's performance.
(3) After the Redis master goes down, sentinel mode holds an election; until the election finishes, no one knows which node is the master and which are slaves, and Redis enables a protection mechanism that rejects writes until a new master has been elected.

In both master-slave mode and sentinel mode, every node stores the full data set. When the data volume becomes too large, the data has to be partitioned and stored across multiple Redis instances; this is where Redis sharding comes in.

For more, see my earlier article: Redis Sentinel mode.

3. Redis Cluster

Note: Cluster pronunciation /ˈklʌstə(r)/

Although the sentinel mode of Redis can achieve high availability and read-write separation, there are several deficiencies:

  • In sentinel mode, every Redis server stores the same data, which wastes memory; when the data volume is large, master-slave synchronization seriously affects the master's performance.

  • Sentinel mode is a centralized scheme: every slave is tightly coupled to the master, and the service is unavailable from the moment the master goes down until a slave has been elected as the new master.

  • Sentinel mode always has only one Redis host to receive and process write requests. Write operations are still affected by the bottleneck of a single machine, and a true distributed architecture has not been implemented.

Redis 3.0 added Cluster mode, which implements distributed storage: each Redis node stores different data. To overcome the capacity limit of a single machine, Cluster mode distributes data across multiple machines according to fixed rules, so memory capacity and QPS are no longer bounded by a single machine and the deployment gains the high scalability of a distributed cluster. Redis Cluster is a server-side sharding technology (sharding and routing are implemented on the server side) with multiple masters and multiple slaves: each partition consists of one Redis master and several slaves, and the partitions operate in parallel. Redis Cluster adopts a P2P model and is completely decentralized.

[Figure: Redis Cluster topology with three masters and three slaves]

As shown in the figure above, the official recommendation is at least 3 master nodes for a cluster deployment, ideally six nodes in a 3-master, 3-slave layout. Redis Cluster has the following characteristics:

  • The cluster is completely decentralized, with multiple masters and multiple slaves; all Redis nodes are interconnected (PING-PONG mechanism) and use a binary protocol internally to optimize transmission speed and bandwidth.

  • The client is directly connected to the Redis node without an intermediate proxy layer. The client does not need to connect to all the nodes in the cluster, but can connect to any available node in the cluster.

  • Each partition consists of one Redis master and multiple slaves, and the shards run in parallel with one another.

  • Each master node is responsible for a portion of the slots and for the key-value data mapped to those slots; every node in the cluster holds the full slot table, so any node knows which node stores a given piece of data.

Redis Cluster is aimed at scenarios combining massive data, high concurrency, and high availability. If you have a large amount of data, Redis Cluster is recommended; when the data volume is not large, sentinel mode is enough. Redis Cluster offers better performance and higher availability than sentinel mode.
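A minimal connection sketch with the redis-py cluster client (assuming redis-py 4.x or newer; the node address is a placeholder, and any reachable cluster node works as the entry point):

    from redis.cluster import RedisCluster

    # connecting to one node is enough; the client fetches the slot map from it
    rc = RedisCluster(host="192.168.1.30", port=7000)

    # each key is routed to the node that owns its hash slot
    rc.set("user:1001", "alice")
    print(rc.get("user:1001"))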

Redis Cluster adopts virtual hash slot partitioning rather than consistent hashing: 16384 slots are pre-allocated, every key is mapped to one of these slots by a hash function, and the master node of each partition is responsible for a subset of the slots and the key-value data mapped to them. For the detailed implementation, see: Redis Cluster data sharding implementation principle.
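A minimal sketch of that key-to-slot mapping in Python, assuming the documented formula slot = CRC16(key) mod 16384 and ignoring hash tags (the {...} part of a key) for simplicity:

    def crc16(data: bytes) -> int:
        """CRC16-CCITT (XModem), the variant Redis Cluster uses for key hashing."""
        crc = 0
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
                crc &= 0xFFFF
        return crc

    def key_slot(key: str) -> int:
        # every key falls into exactly one of the 16384 pre-allocated slots
        return crc16(key.encode()) % 16384

    print(key_slot("user:1001"))  # a value in the range 0..16383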

For more, see my earlier article: Talking about Redis Cluster.

4. Redis cluster solutions from major vendors

Redis supported only single-instance mode before version 3.0. Although Redis developer Antirez had proposed on his blog, early on, to add a cluster feature in Redis 3.0, version 3.0 was not released until 2015. Major companies could not wait: before 3.0 came out, to get around Redis's storage bottleneck, they rolled out their own Redis cluster solutions one after another. The core idea of these solutions is the same: shard the data across multiple Redis instances, with each shard being a Redis instance.

Client sharding

Client-side sharding puts the sharding logic in the Redis client (for example, Jedis already supports Redis Sharding through ShardedJedis). Using routing rules pre-defined in the client (typically consistent hashing), accesses to a key are forwarded to different Redis instances, and the returned results are gathered on the client when querying data. The architecture of this scheme is shown in the figure.

[Figure: client-side sharding architecture]

Pros and cons of client-side sharding:

  • Advantages: with client-side sharding based on consistent hashing, all of the logic is under your control and does not depend on third-party distributed middleware. The server-side Redis instances are independent of one another and unaware of each other; each instance runs like a standalone server, so linear scaling is easy and the system is very flexible. Developers know exactly how the sharding and routing rules are implemented, so there are fewer hidden pitfalls.

    1. Consistent hashing:
      A hashing scheme commonly used in distributed systems. Suppose a distributed storage system maps data to nodes with a plain hash such as hash(key) mod d, where key is the data's key and d is the number of machine nodes; if a machine joins or leaves the cluster, almost all existing data mappings become invalid.
      Consistent hashing solves the poor scalability of this plain modulo scheme and ensures that, when a server goes online or offline, as many requests as possible still hit the server they were originally routed to.

    2. Implementation: consistent hashing is typically built on hash functions such as MurmurHash (MURMUR_HASH) or the Ketama hash.
      For example, the Redis Sharding support in Jedis uses consistent hashing: it hashes both the keys and the node names onto the same ring and then matches keys to nodes; the hash function used is MURMUR_HASH.
      The main reason for using consistent hashing rather than simple modulo mapping is that adding or removing a node does not force a global rehash; only the keys near the affected node on the ring are redistributed, so the impact is small (see the sketch after this list).

  • Disadvantages:

    • It is a static sharding scheme: whenever the number of Redis instances is increased or decreased, the sharding code must be adjusted manually.

    • Operation and maintenance costs are relatively high. Any problem with the cluster data requires operations staff and developers to work together, which slows down troubleshooting and adds cross-department communication cost.

    • Maintaining the same routing and sharding logic across different client programs is expensive. For example, if a Java project and a PHP project share one Redis cluster, the same routing and sharding logic has to be written twice and maintained as two codebases going forward.
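A minimal sketch of a consistent hash ring with virtual nodes, as used by client-side sharding. The node names, the number of virtual nodes, and the use of MD5 are illustrative assumptions, not the exact algorithm Jedis uses:

    import bisect
    import hashlib

    class ConsistentHashRing:
        def __init__(self, nodes, vnodes=160):
            # each physical node is mapped to many virtual points on the ring
            self.ring = {}      # ring point -> physical node
            self.points = []    # sorted ring points
            for node in nodes:
                for i in range(vnodes):
                    point = self._hash(f"{node}#{i}")
                    self.ring[point] = node
                    bisect.insort(self.points, point)

        @staticmethod
        def _hash(value: str) -> int:
            return int(hashlib.md5(value.encode()).hexdigest(), 16)

        def get_node(self, key: str) -> str:
            # walk clockwise to the first ring point at or after the key's hash
            point = self._hash(key)
            idx = bisect.bisect(self.points, point) % len(self.points)
            return self.ring[self.points[idx]]

    ring = ConsistentHashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
    print(ring.get_node("user:1001"))  # adding/removing a node only remaps nearby keys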

One of the biggest problems with client sharding is that whenever the topology of the server-side Redis instances changes, every client has to be updated and adjusted. If the sharding logic is pulled out of the client into a separate module (middleware) that acts as a bridge between clients and servers, this problem goes away; that is exactly what proxy sharding does.

Proxy sharding

Twemproxy, an open-source Redis proxy from Twitter, is the most widely used Redis proxy-sharding solution. Its basic principle: acting as middleware, it receives requests from Redis clients, forwards each request to the correct Redis instance according to its routing rules, and finally aggregates the results and returns them to the client.

Twemproxy introduces a proxy layer that manages multiple Redis instances in a unified way, so the Redis client only needs to talk to Twemproxy and does not need to care how many Redis instances sit behind it; this effectively forms a Redis cluster.

[Figure: Twemproxy proxy-sharding architecture]

  • Advantages of Twemproxy:

    • The client connects to Twemproxy like a Redis instance, without changing any code logic.

    • It supports automatically removing failed Redis instances.

    • Twemproxy maintains persistent connections to the Redis instances, reducing the number of connections between the clients and the Redis instances.

  • Disadvantages of Twemproxy:

    • Since each request from the Redis client goes through the Twemproxy proxy to reach the Redis server, there will be a performance loss in this process.

    • There is no friendly monitoring and management background interface, which is not conducive to operation and maintenance monitoring.

    • Twemproxy's biggest pain point is that it cannot scale out or shrink smoothly; when business growth requires adding Redis instances, the operations workload is very heavy.

As the most battle-tested and stable Redis proxy, Twemproxy is widely used in the industry.

Codis

Twemproxy's inability to smoothly add Redis instances was a major inconvenience, so Pea Pod (Wandoujia) independently developed Codis, a Redis proxy that supports smoothly adding Redis instances. It is written in Go and C and was open-sourced in November 2014; the Codis source code is available on GitHub.

[Figure: Codis architecture]

In the Codis architecture, Codis introduces the Redis Server Group: a group consists of one master CodisRedis and one or more slave CodisRedis instances, which gives the Redis cluster high availability. When a master CodisRedis fails, Codis does not automatically promote a slave CodisRedis to master, because that involves data-consistency issues (Redis's own replication is asynchronous master-slave replication, so when a write succeeds on the master CodisRedis there is no guarantee that the slaves have received it); the administrator must promote a slave CodisRedis to master manually in the management console.

If manual handling is too troublesome, Pea Pod also provides a tool, Codis-ha, which takes a master CodisRedis offline when it detects that it is down and promotes one of its slave CodisRedis instances to master.

Codis uses pre-sharding: when it starts, it creates 1024 slots. A slot is like a box with a fixed number, and the slots are used to hold keys. Which box a key goes into is decided by the formula crc32(key) % 1024, which always yields a number in the range 0 to 1023; the key is placed in the slot with that number. For example, if a key hashes to 5 under crc32(key) % 1024, it is placed in slot (box) number 5. A slot can belong to only one Redis Server Group; a slot cannot be spread across multiple Redis Server Groups. A Redis Server Group can hold at least 1 and at most 1024 slots, so Codis supports at most 1024 Redis Server Groups.
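A minimal sketch of the Codis slot calculation (using Python's standard zlib.crc32; whether this matches the exact CRC-32 variant Codis uses is an assumption):

    import zlib

    def codis_slot(key: str) -> int:
        # crc32(key) % 1024 -> a slot number in the range 0..1023
        return zlib.crc32(key.encode()) % 1024

    print(codis_slot("user:1001"))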

Codis's biggest advantage is that it supports smoothly adding (or removing) Redis Server Groups (Redis instances) and can migrate data safely and transparently. This is where Codis differs from static sharded Redis solutions such as Twemproxy. Adding a Redis Server Group involves migrating slots. For example, suppose the system has two Redis Server Groups with the following correspondence between groups and slots:

Redis Server Group    Slots
1                     0~499
2                     500~1023

When a Redis Server Group is added, the slots are reassigned. Codis offers two ways to allocate slots:

  • The first method: redistribute manually through the Codis management tool Codisconfig, specifying the slot range for each Redis Server Group. For example, the new correspondence between Redis Server Groups and slots could be specified as follows.
Redis Server Group    Slots
1                     0~499
2                     500~699
3                     700~1023
  • The second method: use the rebalance function of the Codis management tool Codisconfig, which automatically migrates slots according to the memory of each Redis Server Group to balance the data.

5. Improving the Redis cluster solution

In order to solve some problems in the Redis cluster solution, we can take the following measures:

Automate deployment and management: use automated tools to deploy and manage the Redis cluster, such as Redis-trib, provided officially by Redis, or third-party tools such as Redis Sentinel Manager.
Data backup and recovery: take backup and recovery measures to ensure data reliability, such as Redis's built-in RDB and AOF mechanisms, or third-party backup tools such as Redis Backup and Redis Cloud Backup.
Improve performance: use techniques such as client-side sharding, adding nodes, optimizing the hash algorithm, and persistent storage to improve the performance of the Redis cluster.
Use other solutions: if the Redis cluster solution cannot meet business needs, consider alternatives such as Redis Cluster Proxy, Twemproxy, or Codis.

6. The principle of Redis cluster scheme

The Redis cluster solution uses sharding to spread data across multiple nodes: several nodes form a cluster, each node holds part of the data, and clients can reach the whole cluster through any node. Data is distributed by hashing. Concretely, Redis Cluster uses the concept of virtual slots: a key's hash value maps it to one slot, a node owns one or more slot ranges, and each node is responsible for storing the keys that fall into the slots it owns.
To ensure high availability, the cluster also uses master-slave replication: each master node can have one or more slave nodes, and when a master goes down one of its slaves is automatically promoted to master and continues to serve. Failure detection and automatic failover are handled by the cluster nodes monitoring each other's status; when a node is judged to have failed, it is removed from service and a replacement is promoted, so the cluster as a whole keeps operating normally.

Summary

The Redis cluster solution is highly available and scalable: through sharding and master-slave replication it achieves distributed data storage and failover. It also has shortcomings, however, and needs further optimization and improvement in deployment and management, data reliability, performance, and transaction support.


Origin: blog.csdn.net/u011397981/article/details/131389861