One, Redis officially recommended cluster solution: Redis Cluster
Applicable to Redis 3.0 and later.
Data sharding in Redis Cluster
· Redis Cluster does not use consistent hashing, but a different form of sharding, in which every key is conceptually part of what is called a hash slot.
There are 16384 hash slots in a Redis cluster; to compute the hash slot of a given key, we simply take the CRC16 of the key modulo 16384.
Every node in a Redis cluster is responsible for a subset of the hash slots. For example, a cluster with three nodes might be laid out as follows:
- Node A contains hash slots 0 to 5500.
- Node B contains hash slots 5501 to 11000.
- Node C contains hash slots 11001 to 16383.
This makes it easy to add and remove nodes in the cluster. For example, to add a new node D, some hash slots are moved from nodes A, B, and C to D. Similarly, to remove node A from the cluster, the hash slots served by A are simply moved to B and C; once node A is empty, it can be removed from the cluster entirely.
Because moving hash slots from one node to another does not require stopping operations, adding and removing nodes, or changing the percentage of hash slots a node holds, requires no downtime.
Redis Cluster supports multiple-key operations as long as all the keys involved in a single command (or the whole transaction, or Lua script execution) belong to the same hash slot. The user can force multiple keys to be part of the same hash slot by using a concept called hash tags.
Hash tags are documented in the Redis Cluster specification; the gist is that if a key contains a substring between { and } brackets, only the content inside the braces is hashed. For example, the keys this{foo}key and another{foo}key are guaranteed to be in the same hash slot and can be used together in a command taking multiple keys as arguments.
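The slot computation described above can be sketched in pure Python. This is an illustrative sketch, not the Redis source: it assumes the CRC16 variant named in the Redis Cluster specification (CRC16-CCITT/XMODEM, polynomial 0x1021) and the hash-tag rule of hashing only the substring inside the first non-empty pair of braces.

```python
def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0x0000,
    # no reflection, no final XOR.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    # Hash-tag rule: if the key contains a non-empty {...} section,
    # only the substring inside the first such pair of braces is hashed.
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

With this, key_hash_slot("this{foo}key") and key_hash_slot("another{foo}key") both reduce to the slot of "foo", so the two keys always land on the same node.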
Redis Cluster master-slave model
In order to remain available when a subset of master nodes fail or cannot communicate with the majority of nodes, Redis Cluster uses a master-slave model in which every hash slot has from 1 (the master itself) to N replicas (N-1 additional slave nodes). When the cluster is created, a slave node is added to each master node, so the final cluster is composed of master nodes A, B, C and slave nodes A1, B1, C1. If node B fails, the system can continue to run: node B1 replicates B, so when B fails the cluster promotes B1 to be the new master node and continues to operate correctly.
Note, however, that if nodes B and B1 fail at the same time, Redis Cluster cannot continue to run.
Consistency guarantees of Redis Cluster
Redis Cluster does not guarantee strong consistency. Under certain conditions, Redis Cluster may lose writes that the system has already acknowledged to the client.
The first reason Redis Cluster can lose writes is that it uses asynchronous replication. This means the following happens during a write:
- The client writes to master B.
- Master B replies OK to the client.
- Master B propagates the write to its slaves B1, B2, and B3.
(1) B does not wait for acknowledgments from B1, B2, and B3 before replying to the client, because that would impose a prohibitive latency on Redis. So if a client writes something, B acknowledges the write, but if B crashes before the write can be transmitted to its slaves and one of those slaves (which never received the write) is promoted to master, the write is lost forever.
This is very similar to what happens with most databases configured to flush data to disk every second. Likewise, you can improve consistency by forcing the database to flush data to disk before replying to the client, but this usually results in very low performance. In Redis Cluster, that would be the equivalent of synchronous replication.
Basically, there is a trade-off to be made between performance and consistency.
Redis Cluster supports synchronous writes when absolutely needed, implemented via the WAIT command. This makes losing a write far less likely, but note that even with synchronous replication Redis Cluster does not achieve strong consistency: under more complex failure scenarios it is still possible for a slave that never received the write to be elected master.
(2) There is another notable scenario in which Redis Cluster will lose writes: a network partition in which a client is isolated with a minority of instances, including at least one master. For example:
Take a cluster of 6 nodes composed of A, B, C, A1, B1, C1, with 3 masters and 3 slaves. There is also a client, which we call Z1.
After a partition occurs, it is possible that A, C, A1, B1, C1 end up on one side of the partition, with B and Z1 on the other side.
Z1 can still write to B, and B will accept its writes. If the partition heals in a very short time, the cluster continues to operate normally. However, if the partition lasts long enough for B1 to be promoted to master on the majority side of the partition, the writes Z1 sent to B in the meantime will be lost.
Note that there is a maximum window for the amount of writes Z1 can send to B: once the majority side of the partition has had enough time to elect a slave as master, every master node on the minority side stops accepting writes.
This amount of time is a very important configuration directive of Redis Cluster, called the node timeout.
After the node timeout has elapsed, a master node is considered failed and can be replaced by one of its replicas. Similarly, once the node timeout has elapsed without a master node being able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.
2): If the time since an instance's last valid reply to a PING command exceeds the value specified by the down-after-milliseconds option, that instance is marked by the Sentinel as subjectively offline.
3): If a master is marked as subjectively offline, then all Sentinels monitoring that master confirm, once per second, whether the master really has entered the subjectively offline state.
4): When a sufficient number of Sentinels (no less than the value specified in the configuration file) confirm within the specified time range that the master has entered the subjectively offline state, the master is marked as objectively offline.
5): In general, every Sentinel sends an INFO command to all masters and slaves it knows of at a frequency of once every 10 seconds.
6): When a master is marked as objectively offline by the Sentinels, the frequency at which the Sentinels send INFO commands to all slaves of the offline master changes from once every 10 seconds to once per second.
7): If there are not enough Sentinels agreeing that the master is offline, the master's objectively offline status is removed.
If the master once again returns valid replies to the Sentinels' PING commands, the master's subjectively offline status is removed.
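The two key decisions in the steps above can be sketched as two small predicates. This is a hypothetical illustration of the logic, not the Sentinel source code; the names (last_valid_reply, down_after_ms, quorum) are chosen for this sketch.

```python
def is_subjectively_offline(last_valid_reply: float, now: float,
                            down_after_ms: int) -> bool:
    # Step 2): no valid PING reply for longer than down-after-milliseconds.
    return (now - last_valid_reply) * 1000 > down_after_ms

def is_objectively_offline(sdown_reports: int, quorum: int) -> bool:
    # Step 4): enough Sentinels agree the master is subjectively offline.
    return sdown_reports >= quorum
```

The quorum corresponds to the value configured per monitored master; only once it is reached does failover proceed.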
Master-slave replication: the master node is responsible for writing data and the slave nodes are responsible for reading data; the master node periodically synchronizes its data to the slave nodes to ensure data consistency.
Note: master-slave replication and the Sentinel mechanism both need to be configured manually.
Three, problems and solutions when using Redis as a cache:
1) Cache penetration
- Validate at the control layer: all query parameters can be stored in hashed form and checked at the control layer, with non-conforming requests discarded. The most common approach is to use a Bloom filter: hash all possibly existing data into a sufficiently large bitmap, so that a query for data that definitely does not exist is intercepted by the bitmap, avoiding the query pressure on the underlying storage system.
- A simpler and cruder approach can also be used: if a query returns empty data (whether because the data does not exist or because of a system failure), still cache the empty result, but give it a very short expiration time, no longer than five minutes at most.
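The "cache empty results" strategy can be sketched with an in-memory stand-in for Redis. This is a minimal illustration under stated assumptions: the dict plays the role of the cache and load_from_db is a hypothetical loader; in production these would be a Redis client and a real database query.

```python
import time

NULL_TTL = 300       # empty results live at most 5 minutes
NORMAL_TTL = 3600    # real results can live much longer
_cache = {}          # key -> (value, expires_at); stands in for Redis

def get_with_null_caching(key, load_from_db):
    entry = _cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                       # cache hit (possibly None)
    value = load_from_db(key)                 # miss: go to the database
    ttl = NORMAL_TTL if value is not None else NULL_TTL
    _cache[key] = (value, time.time() + ttl)  # cache even a missing row
    return value
```

Because the missing row is cached too, repeated queries for a nonexistent key stop hitting the database until the short TTL expires.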
2) Cache avalanche
- After a cache miss, control the number of threads that read the database and write the cache, by locking or queueing. For example, for a given key, allow only one thread to query the data and write the cache while the other threads wait.
- Use a cache-reload mechanism to update the cache in advance, then manually trigger loading of the cache before a large burst of concurrent access is expected.
- Set different expiration times for different keys, so that the points in time at which cache entries expire are spread as uniformly as possible. For example, add a random value to the base expiration time, such as a random 1-5 minutes; the less the expiration times repeat, the harder it is to trigger a collective-expiry event.
- Use a second-level cache, or a double-cache policy: A1 is the original cache and A2 is the copy cache. When A1 expires, A2 is accessed; A1's expiration time is set to short-term and A2's to long-term.
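The TTL-jitter idea above fits in one line of code. A small sketch, assuming a base TTL in seconds; the function name is illustrative, not a Redis API.

```python
import random

def jittered_ttl(base_seconds: int) -> int:
    # Add a random 1-5 minutes so keys written together
    # do not all expire at the same instant.
    return base_seconds + random.randint(60, 300)
```

The jittered value would then be passed as the expiration when setting the key, e.g. as the EX argument of a SET.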
3) Cache breakdown
4) Cache warming
- Write a cache-refresh page and trigger it manually when going online;
- If the amount of data is small, load it automatically when the project starts;
- Refresh the cache on a timer;
5) Cache update
- Clean up expired cache entries on a timer;
- When a user request comes in, check whether the cache entry the request uses has expired; if it has, go to the underlying system for fresh data and update the cache.
6) Cache downgrade
- General: for example, some services occasionally time out because of network jitter or because the service is going online; they can be downgraded automatically;
- Warning: some services whose success rate fluctuates over a period of time (for example, between 95% and 100%) can be downgraded automatically or manually, with an alarm sent;
- Error: for example, availability drops below 90%, the database connection pool is exhausted, or traffic suddenly surges past the maximum the system can withstand; in these cases the service can be downgraded automatically or manually as the situation requires;
- Fatal error: for example, the data is wrong for some special reason; an emergency manual downgrade is needed.
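One way to read the four levels above is as a mapping from a service's recent success rate (plus a data-corruption flag) to a downgrade level. This is a hypothetical sketch of that mapping, using the thresholds from the text (95-100% as warning territory, below 90% as error); the function and its names are illustrative.

```python
def downgrade_level(success_rate: float, data_corrupted: bool = False) -> str:
    if data_corrupted:
        return "fatal"      # emergency manual downgrade
    if success_rate < 0.90:
        return "error"      # downgrade automatically or manually
    if success_rate < 1.00:
        return "warning"    # downgrade and send an alarm
    return "ok"
```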
Four, Redis as a distributed lock solution (best performance)
A distributed lock is a way to control synchronized access to shared resources between systems in a distributed environment.
Implementation idea:
- Use the SETNX command to acquire the lock: if the key has no value yet, the set succeeds and the lock is acquired;
- Set an expire time, so that the lock is released automatically after a timeout (use a Lua script to make setnx and expire one atomic operation);
- To release the lock, use the DEL command to delete the lock key.
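The acquire/release semantics above can be sketched with an in-memory stand-in for Redis. This is an illustration, not a real client: acquire mimics SETNX plus an expire applied atomically, and release mimics the check-token-then-DEL that the Lua script would perform, so a client never deletes a lock that another client has since acquired after the first one's timeout.

```python
import time
import uuid

_locks = {}  # key -> (owner_token, expires_at); stands in for Redis

def acquire(key, ttl):
    """Try to take the lock; return an owner token, or None if held."""
    now = time.time()
    entry = _locks.get(key)
    if entry is not None and entry[1] > now:
        return None                      # lock held and not yet expired
    token = uuid.uuid4().hex             # unique owner token as the value
    _locks[key] = (token, now + ttl)     # SETNX + EXPIRE as one step
    return token

def release(key, token):
    """Delete the lock only if we still own it (the Lua-script check)."""
    entry = _locks.get(key)
    if entry is not None and entry[0] == token:
        del _locks[key]
        return True
    return False
```

With a real Redis, the same effect is usually achieved with SET key token NX PX timeout for acquisition and a short Lua script comparing the stored token before calling DEL.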