Redis in Theory and Practical Application --- (2) --- Redis Cluster Principles

One, the officially recommended Redis cluster solution: Redis Cluster

      Applies to Redis 3.0 and later.

       Redis Cluster is the official distributed solution for Redis, introduced with version 3.0. It effectively addresses the need for a distributed Redis: when one Redis node goes down, the cluster can quickly switch over to another node.
  Architectural details:
  (1) All Redis nodes are interconnected with each other (PING-PONG mechanism) and internally use a binary protocol to optimize transmission speed and bandwidth.
  (2) A node is only marked as failed (fail) when more than half of the nodes in the cluster detect the failure.
  (3) Clients connect directly to a Redis node, with no intermediate proxy layer. A client does not need to connect to every node in the cluster; connecting to any one available node is enough.
  (4) redis-cluster maps all physical nodes onto the slots [0-16383]; the cluster maintains the node <-> slot <-> value mapping.
 
      redis-cluster election: fault tolerance
  (1) The election process involves all masters in the cluster. If more than half of the master nodes fail to communicate with a given master for longer than the timeout (cluster-node-timeout), that master is considered down.
  (2) When the whole cluster is unavailable (cluster_state:fail), all operations on the cluster become unavailable and clients receive the error ((error) CLUSTERDOWN The cluster is down). The cluster enters the fail state when:
      a: any master goes down and that master has no slave; equivalently, the cluster enters the fail state whenever the slot mapping [0-16383] is no longer complete.
      b: more than half of the masters go down, regardless of whether they have slaves.
 
 
      Related concepts:

  Redis Cluster data sharding

· Redis Cluster does not use consistent hashing, but a different form of sharding in which every key is conceptually part of what is called a hash slot.

   There are 16384 hash slots in a Redis Cluster. To compute the hash slot of a given key, we simply take the CRC16 of the key modulo 16384.
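For illustration, the slot arithmetic can be reproduced outside of Redis. The sketch below is a minimal Python example, assuming Redis Cluster's CRC16 variant (CRC-16/XMODEM) matches Python's binascii.crc_hqx with an initial value of 0; the authoritative answer always comes from the CLUSTER KEYSLOT command.

```python
import binascii

HASH_SLOTS = 16384

def hash_slot(key: str) -> int:
    # CRC16 (XMODEM variant, as used by Redis Cluster) of the key, modulo 16384.
    # Note: this ignores hash tags, which are covered further below.
    return binascii.crc_hqx(key.encode("utf-8"), 0) % HASH_SLOTS

# Every key lands in exactly one of the 16384 slots.
for k in ("user:1000", "user:1001", "session:abc"):
    print(k, "->", hash_slot(k))
```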

   Each node in a Redis Cluster is responsible for a subset of the hash slots. For example, in a cluster with three nodes:

  •  Node A contains hash slots from 0 to 5500.
  •  Node B contains hash slots from 5501 to 11000.
  •  Node C contains hash slots from 11001 to 16383.

  This makes it easy to add and remove nodes in the cluster. For example, if I want to add a new node D, I need to move some hash slots from nodes A, B and C to D. Similarly, if I want to remove node A from the cluster, I just move the hash slots served by A to B and C. Once node A is empty, I can remove it from the cluster completely.

Because moving hash slots from one node to another does not require stopping operations, adding and removing nodes, or changing the percentage of hash slots a node holds, requires no downtime at all.

Redis Cluster supports multiple-key operations as long as all of the keys involved in a single command (or in a whole transaction, or a Lua script execution) belong to the same hash slot. The user can force multiple keys into the same hash slot by using a concept called hash tags.

Hash tags are documented in the Redis Cluster specification, but the gist is that if a key contains a substring between { and } braces, only the content inside the braces is hashed. For example, this{foo}key and another{foo}key are guaranteed to be in the same hash slot, so they can be used together in a command that takes multiple keys as arguments.
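To make the hash-tag rule concrete, here is a small sketch (same Python CRC16 assumption as above) that applies the rule as described in the specification: if the key contains a '{' followed by a later '}' with at least one character in between, only that substring is hashed.

```python
import binascii

def key_hash_slot(key: str) -> int:
    # Apply the hash-tag rule before hashing.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:           # a non-empty tag was found
            key = key[start + 1:end]  # hash only the tag
    return binascii.crc_hqx(key.encode("utf-8"), 0) % 16384

# Both keys share the tag {foo}, so they map to the same slot and can be
# used together in a multi-key command such as MGET.
print(key_hash_slot("this{foo}key") == key_hash_slot("another{foo}key"))  # True
```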

  Redis Cluster master-slave model

   In order to remain available when a subset of master nodes fail or cannot communicate with the majority of nodes, Redis Cluster uses a master-slave model in which every hash slot has from 1 (the master itself) to N replicas (N-1 additional slave nodes). When the cluster is created, a slave node is added for each master, so the final cluster consists of masters A, B, C and slaves A1, B1, C1. If node B fails, the system can keep running: node B1 replicates B, so when B fails the cluster promotes B1 to be the new master and continues to operate correctly.

         Note, however, that if nodes B and B1 fail at the same time, Redis Cluster cannot continue to run.

  Redis Cluster consistency guarantees

  Redis Cluster cannot guarantee strong consistency. Under certain conditions, Redis Cluster may lose writes that the system has already acknowledged to the client.

  (1) The first reason Redis Cluster can lose writes is that it uses asynchronous replication. This means the following happens during a write:

  • The client writes to master B.
  • Master B replies OK to the client.
  • Master B propagates the write to its slaves B1, B2 and B3.

  B does not wait for acknowledgments from B1, B2 and B3 before replying to the client, because that would impose an excessive latency on Redis. So if the client writes something, B acknowledges the write but crashes before the write can be transmitted to its slaves; one of the slaves (which never received the write) is then promoted to master, and the write is lost forever.

  This is very similar to what happens with most databases configured to flush data to disk every second. Likewise, you can improve consistency by forcing the database to flush data to disk before replying to the client, but this usually comes at the cost of much lower performance. In Redis Cluster, that would be the equivalent of synchronous replication.

       In the end, this is a trade-off between performance and consistency.

      Redis Cluster supports synchronous writes when absolutely needed, implemented via the WAIT command. This makes losing a write much less likely; however, even with synchronous replication, Redis Cluster does not achieve strong consistency: under more complex failure scenarios it is still possible for a slave that never received the write to be elected master.
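For illustration, a write followed by WAIT might look like the sketch below (Python with the redis-py client; the address, key, and the replica count of 1 are placeholders). WAIT blocks until the requested number of replicas have acknowledged the connection's previous writes, or the timeout in milliseconds expires, and returns how many replicas acknowledged.

```python
import redis

r = redis.Redis(host="127.0.0.1", port=6379)  # placeholder address

r.set("balance:42", 100)
# Block until at least 1 replica acknowledges the write, or 1000 ms elapse.
acked = r.execute_command("WAIT", 1, 1000)
print("replicas that acknowledged:", acked)
```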

      (2) There is another noteworthy scenario in which Redis Cluster will lose writes: a network partition where a client ends up isolated with a minority of instances, including at least one master. For example,

Take a 6-node cluster consisting of A, B, C, A1, B1, C1, i.e. 3 masters and 3 slaves. There is also a client, which we will call Z1.

After a partition occurs, it is possible that A, C, A1, B1, C1 end up on one side of the partition, while B and Z1 are on the other side.

Z1 can still write to B, and B will accept its writes. If the partition heals within a very short time, the cluster continues normally. However, if the partition lasts long enough for B1 to be promoted to master on the majority side, the writes Z1 sent to B in the meantime will be lost.

Note that there is a maximum window for the amount of writes Z1 can send to B: once the majority side of the partition has had enough time to elect a slave as master, every master node on the minority side stops accepting writes.

This amount of time is a very important configuration directive of Redis Cluster, known as the node timeout.

After the node timeout has elapsed, a master is considered to have failed and can be replaced by one of its replicas. Similarly, once the node timeout has elapsed without a master being able to sense the majority of the other masters, it enters an error state and stops accepting writes.
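For reference, the directive lives in redis.conf on every cluster node; the value below (15000 ms, the usual default) is illustrative, not a recommendation.

```
# Illustrative cluster-related fragment of redis.conf
cluster-enabled yes
cluster-node-timeout 15000   # milliseconds before a node is considered failing
```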

 
 
 
Two, extension:
    Cluster-related concepts before Redis 3.0:
   (1) The Sentinel mechanism
    Sentinel is a high-availability solution for Redis: a Sentinel system made up of one or more Sentinel instances can monitor any number of master servers, together with all the slaves under those masters. When a monitored master enters the objectively-down state, Sentinel automatically promotes one of the slaves of the offline master to be the new master.
    How Redis Sentinel works (a client-side connection sketch follows the list):
   1) Each Sentinel sends a PING command once per second to every Master, Slave and other Sentinel instance it knows about.
   2) If an instance takes longer than the value of the down-after-milliseconds option to reply to its last valid PING, that Sentinel marks the instance as subjectively down.
   3) If a Master is marked as subjectively down, all Sentinels monitoring that Master confirm, once per second, that the Master has indeed entered the subjectively down state.
   4) When a sufficient number of Sentinels (no fewer than the value specified in the configuration file) confirm within the specified time that the Master is subjectively down, the Master is marked as objectively down.
   5) Under normal circumstances, each Sentinel sends an INFO command to every Master and Slave it knows about once every 10 seconds.
   6) When a Master is marked as objectively down, the frequency at which Sentinels send INFO commands to all slaves of the offline Master changes from once every 10 seconds to once per second.
   7) If not enough Sentinels agree that the Master is down, the Master's objectively-down status is removed.
   If the Master starts returning valid replies to the Sentinel's PING commands again, the Master's subjectively-down status is removed.
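As a client-side illustration of this setup, the sketch below connects through Sentinel using the redis-py library; the Sentinel addresses and the master name "mymaster" are placeholders for whatever the actual deployment uses.

```python
from redis.sentinel import Sentinel

# Placeholder Sentinel addresses and master name.
sentinel = Sentinel([("127.0.0.1", 26379), ("127.0.0.1", 26380)], socket_timeout=0.5)

master = sentinel.master_for("mymaster", socket_timeout=0.5)   # writes go to the current master
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)   # reads can go to a replica

master.set("greeting", "hello")
print(replica.get("greeting"))
```

Because master_for resolves the current master through the Sentinels, the same code keeps working after a failover promotes a different node.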
 
     (2) Redis master-slave replication

  Master-slave replication: the master node is responsible for writes and the slave nodes for reads; the master periodically synchronizes data to the slaves to keep the data consistent.

  Note: both master-slave replication and the Sentinel mechanism have to be configured manually.

 

Three, problems and solutions when using Redis as a cache:

     1) Cache penetration

      Cache penetration means querying for data that does not exist at all. Because the cache is only populated from the database on a miss, and data that cannot be found is never written into the cache, every request for such non-existent data has to go to the database, as if the cache were not there. This is cache penetration.
         Solutions:
  1. Validate query parameters at the controller layer and discard requests that do not pass validation. The most common technique is a Bloom filter: hash all data that could possibly exist into a sufficiently large bitmap, so that requests for data that definitely does not exist are blocked by the bitmap and never reach the underlying storage system.
  2. A simpler, cruder approach: if a query returns empty data (whether because the data does not exist or because of a system failure), still cache the empty result, but with a very short expiration time, no longer than five minutes, as in the sketch below.
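A minimal sketch of the empty-result caching idea in option 2, assuming the redis-py client; the key scheme, the __NULL__ marker, and load_from_db are hypothetical stand-ins for real application code.

```python
import json
import redis

r = redis.Redis()

NULL_MARKER = "__NULL__"   # hypothetical sentinel meaning "the row does not exist"
NULL_TTL = 300             # cache a miss for at most 5 minutes
DATA_TTL = 3600

def get_user(user_id, load_from_db):
    key = f"user:{user_id}"                   # hypothetical key scheme
    cached = r.get(key)
    if cached is not None:
        # A cached NULL_MARKER means the database already told us the row is missing.
        return None if cached == NULL_MARKER.encode() else json.loads(cached)
    row = load_from_db(user_id)               # stand-in for the real database query
    if row is None:
        r.set(key, NULL_MARKER, ex=NULL_TTL)  # remember the miss, but only briefly
        return None
    r.set(key, json.dumps(row), ex=DATA_TTL)
    return row
```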

     2) Cache avalanche

    If a large portion of the cache expires within the same short period of time, a flood of cache misses occurs and all the queries fall on the database, causing a cache avalanche.
         Solutions:
  1. After a cache miss, control the number of threads that read from the database and write to the cache, using a lock or a queue. For example, allow only one thread per key to query the data and write the cache while the other threads wait.
  2. Refresh the cache in advance through a reload mechanism, and manually trigger cache loading before an expected burst of concurrent traffic.
  3. Give different keys different expiration times so that cache expiration points are spread out as evenly as possible. For example, add a random value to the base expiration time, such as a random 1-5 minutes; the expiration times then rarely coincide and a collective expiration event becomes hard to trigger (see the sketch after this list).
  4. Use a second-level cache, or a double-cache strategy: A1 is the primary cache and A2 the backup copy. When A1 expires, A2 is consulted; A1 is given a short expiration time and A2 a long one.
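A minimal sketch of the expiration-jitter idea in option 3, assuming redis-py; the base TTL and jitter range are illustrative.

```python
import random
import redis

r = redis.Redis()

BASE_TTL = 3600  # illustrative base expiration of one hour

def cache_set(key, value):
    # Add 1-5 minutes of random jitter so keys written together
    # do not all expire at the same moment.
    ttl = BASE_TTL + random.randint(60, 300)
    r.set(key, value, ex=ttl)
```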

     3) Cache breakdown

   Cache breakdown is similar to a cache avalanche; the difference is that a breakdown concerns a single cached key (typically a hot key whose expiration sends a burst of requests to the database), whereas an avalanche concerns many keys. A mutex-based rebuild sketch follows.
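One common way to handle breakdown is to rebuild the hot key under a short-lived mutex so that only one caller hits the database. This is a sketch assuming redis-py; rebuild() and the lock-key naming are hypothetical stand-ins.

```python
import time
import redis

r = redis.Redis()

def get_hot_value(key, rebuild, data_ttl=3600, lock_ttl=10):
    """Return a hot key's value; on a miss, only one caller rebuilds it."""
    value = r.get(key)
    if value is not None:
        return value
    lock_key = f"lock:{key}"                       # hypothetical lock-key naming
    if r.set(lock_key, "1", nx=True, ex=lock_ttl): # only one caller wins the mutex
        try:
            value = rebuild()                      # stand-in for the database query
            r.set(key, value, ex=data_ttl)
        finally:
            r.delete(lock_key)
        return value
    time.sleep(0.05)                               # someone else is rebuilding; retry shortly
    return get_hot_value(key, rebuild, data_ttl, lock_ttl)
```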

     4) Cache warming

  Cache warming means loading the relevant data into the cache ahead of time, right after the system goes online. It avoids the problem of having to query the database first and then cache the data when a user request arrives: users directly hit data that has already been preheated into the cache.
        Cache warming solutions (a preload sketch follows the list):
  1. Write a cache-refresh admin page and trigger it manually when going online;
  2. Load the data automatically at application startup, if the data volume is small;
  3. Refresh the cache on a schedule;
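A minimal preload sketch, assuming redis-py; load_all_hot_rows is a hypothetical stand-in for whatever query produces the data to warm.

```python
import json
import redis

r = redis.Redis()

def warm_cache(load_all_hot_rows):
    # Run once at startup: push known-hot rows into Redis before user traffic arrives.
    for row_id, row in load_all_hot_rows():        # stand-in for a database scan
        r.set(f"item:{row_id}", json.dumps(row), ex=3600)
```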

     5) Cache update

  We know a key's expiration time can be set via EXPIRE, but how should stale data be handled beyond that? Besides the cache server's built-in invalidation strategies (Redis ships with 6 eviction policies to choose from), we can also evict cache entries according to specific business needs. Two strategies are common:
  1. Clean up expired cache entries on a schedule;
  2. When a user request arrives, check whether the cache entry used by that request has expired; if it has, go to the underlying system for fresh data and update the cache (see the sketch below).
  Both have pros and cons: the drawback of the first is that maintaining a large number of cached keys becomes cumbersome; the drawback of the second is that every user request has to check for cache expiration, which complicates the logic. Which scheme to use should be weighed against your own scenario.
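A sketch of strategy 2 using a logical expiration timestamp stored next to the value, assuming redis-py; load_from_db and the entry layout are hypothetical.

```python
import json
import time
import redis

r = redis.Redis()

def get_with_lazy_refresh(key, load_from_db, logical_ttl=600):
    # On each request, refresh the value only if its logical expiry has passed.
    cached = r.get(key)
    if cached is not None:
        entry = json.loads(cached)
        if entry["expire_at"] > time.time():
            return entry["value"]                  # still fresh
    # Missing or logically expired: fetch fresh data and rewrite the cache.
    value = load_from_db()                         # stand-in for the real query
    entry = {"value": value, "expire_at": time.time() + logical_ttl}
    r.set(key, json.dumps(entry))
    return value
```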

     6) Cache degradation

  When traffic surges, when a service has problems (such as slow or unresponsive response times), or when non-core services would otherwise hurt the performance of the core flow, the service must still be kept available, even in a degraded (lossy) form. The system can degrade automatically based on key metrics, or degradation can be triggered manually via a switch. The ultimate goal of degradation is to keep core services available, even if lossy, and some services cannot be degraded at all (such as shopping-cart checkout). Before degrading, sort through the system to decide whether you can "sacrifice a pawn to save the king": work out what must be protected at all costs and what can be degraded. Taking log levels as a reference, a plan could be:
  1. General: for example, a service occasionally times out because of network jitter or because it is being brought online; it can be degraded automatically;
  2. Warning: a service's success rate fluctuates over a period of time (for example, between 95% and 100%); it can be degraded automatically or manually, and an alarm should be sent;
  3. Error: for example, availability drops below 90%, the database connection pool is exhausted, or traffic suddenly surges beyond the maximum threshold the system can withstand; degradation can then be triggered automatically or manually depending on the situation;
  4. Fatal error: for example, data is corrupted for some exceptional reason; an emergency manual degradation is required.

Four, Redis as a distributed lock solution (best performance)

        A distributed lock is a mechanism for controlling synchronized access to shared resources across a distributed system.

         Implementation idea:

  1. Use the SETNX command to acquire the lock; if the key did not exist and was set, the lock has been acquired successfully;
  2. Set an expiration time with EXPIRE so the lock is released automatically after a timeout (use a Lua script, or SET with the NX and EX options, so that SETNX and EXPIRE become a single atomic operation);
  3. To release the lock, delete the lock key with the DEL command.
        Alternatively, use Redisson, which the Redis documentation recommends for distributed locking. A minimal sketch of the SET NX EX approach follows.
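A minimal sketch of steps 1-3, assuming redis-py. SET with the NX and EX options makes acquiring the lock and setting its expiry one atomic step, and the Lua script releases the lock only when it still holds our token, so a client never deletes a lock that expired and was re-acquired by someone else.

```python
import uuid
import redis

r = redis.Redis()

RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def acquire_lock(name, ttl_seconds=10):
    token = str(uuid.uuid4())
    # SET key value NX EX ttl: value and expiration are set in one atomic step.
    if r.set(name, token, nx=True, ex=ttl_seconds):
        return token
    return None

def release_lock(name, token):
    return r.eval(RELEASE_SCRIPT, 1, name, token) == 1

# Usage sketch
token = acquire_lock("lock:order:42")
if token:
    try:
        pass  # ... critical section ...
    finally:
        release_lock("lock:order:42", token)
```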

 

Origin: www.cnblogs.com/huyangshu-fs/p/11256007.html