First, the internal communication mechanism between nodes
1, basic communication principles
(1) redis cluster nodes communicate with each other using the gossip protocol
This differs from the centralized approach: cluster metadata (node information, failures, etc.) is not stored centrally on one node; instead, the nodes keep communicating with each other so that every node holds a complete copy of the cluster metadata
There are two ways to maintain cluster metadata: one is centralized, the other is called gossip
Centralized: the benefit is that metadata updates and reads are very timely; once metadata changes, it is immediately written to the centralized store, and other nodes see the change as soon as they read it; the drawback is that all metadata update pressure is concentrated in one place, which may put the metadata store under pressure
gossip: the benefit is that metadata updates are spread out rather than concentrated in one place; update requests reach all nodes one after another with some delay, which reduces the pressure; the drawback is that metadata updates are delayed, which may cause some cluster operations to lag
For example, if we perform another operation right after a reshard, we may see a configuration error, because the nodes have not yet reached agreement
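The delay that gossip introduces can be illustrated with a small simulation. This is only a sketch under simplified assumptions, not how redis implements it: the function name gossip_rounds, the fanout of 2 and the fixed random seed are all made up for illustration.

```python
import random

def gossip_rounds(node_count, fanout=2, seed=0):
    """Count rounds until an update that starts on node 0 reaches every node."""
    rng = random.Random(seed)  # fixed seed so repeated runs behave the same
    informed = {0}             # only node 0 knows about the update at first
    rounds = 0
    while len(informed) < node_count:
        rounds += 1
        # Every node that already has the update pushes it to `fanout` random peers.
        for node in list(informed):
            peers = [n for n in range(node_count) if n != node]
            informed.update(rng.sample(peers, min(fanout, len(peers))))
    return rounds

print(gossip_rounds(6))
```

With a centralized store the update would be visible after a single write; here it takes several rounds, which is the lag described above.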
(2) the 10000 port offset
Each node has a dedicated port for inter-node communication, which is its own service port + 10000; for example, if the service port is 7001, the port used for inter-node communication is 17001
At regular intervals, each node sends ping messages to several other nodes, and the nodes that receive a ping reply with pong
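The port rule above is simple arithmetic. A minimal sketch, where the helper name cluster_bus_port is hypothetical and the range check is an added assumption (a TCP port cannot exceed 65535):

```python
CLUSTER_BUS_OFFSET = 10000  # offset described in the notes above

def cluster_bus_port(service_port):
    """Return the inter-node communication port for a given service port."""
    bus_port = service_port + CLUSTER_BUS_OFFSET
    if not 0 < service_port < bus_port <= 65535:
        raise ValueError("service port leaves no room for the +10000 offset")
    return bus_port

print(cluster_bus_port(7001))  # 17001
```

Both ports have to be reachable between nodes, so a firewall rule that opens only 7001 is not enough.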
(3) information exchanged
Failure information, node additions and removals, hash slot information, etc.
2, gossip protocol
The gossip protocol comprises several message types, including ping, pong, meet, fail, etc.
meet: a node sends meet to a newly added node so that the new node joins the cluster; the new node then begins communicating with the other nodes
redis-trib.rb add-node
Internally, this command sends a gossip meet message to the new node, notifying that node to join the cluster
ping: each node frequently sends ping to other nodes; a ping contains the sender's own state and the cluster metadata it maintains, and nodes exchange metadata with each other through ping
Because every node in the cluster pings the others so frequently, the nodes keep exchanging information and updating each other's metadata
pong: the reply to ping and meet; it contains the sender's own status and other information, and can also be used to broadcast information and updates
fail: once a node determines that another node has failed, it sends fail to the other nodes to notify them that the specified node is down
3, ping messages in depth
ping is sent very frequently and also carries some metadata, so it may add to the network load
Each node performs ping 10 times per second, each time choosing the 5 other nodes it has gone the longest without communicating with
Of course, if the communication delay with a node reaches cluster_node_timeout / 2, a ping is sent to it immediately, to avoid the data exchange lagging too far behind
For example, if two nodes have not exchanged data for 10 minutes, the cluster's metadata is seriously inconsistent, and problems will arise
So cluster_node_timeout can be tuned; if it is set relatively large, the ping sending frequency is reduced
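The two selection rules above (the nodes silent for the longest time, plus any node whose silence has reached cluster_node_timeout / 2) can be sketched as follows. The function name choose_ping_targets and the last_contact dictionary are hypothetical, and the real redis logic differs in detail.

```python
def choose_ping_targets(last_contact, now, node_timeout, sample_size=5):
    """Pick the nodes to ping in this round.

    last_contact maps node name -> time (in seconds) of the last communication.
    """
    # Nodes we have gone the longest without talking to come first.
    by_age = sorted(last_contact, key=lambda name: last_contact[name])
    routine = by_age[:sample_size]
    # Any node silent for at least node_timeout / 2 is pinged immediately.
    overdue = [n for n in by_age if now - last_contact[n] >= node_timeout / 2]
    # Merge both lists, dropping duplicates but keeping the order.
    return list(dict.fromkeys(overdue + routine))

contacts = {"node-a": 0, "node-b": 9, "node-c": 5}
print(choose_ping_targets(contacts, now=10, node_timeout=8, sample_size=1))
```

A larger node_timeout makes the overdue list shorter, which is why raising cluster_node_timeout reduces how often pings are sent.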
Each ping carries the sending node's own information, plus information about 1/10 of the other nodes, which is sent out for data exchange
It contains information about at least 3 other nodes, and at most (total number of nodes - 2) other nodes
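The bounds above can be written as a small calculation. A sketch only, with a hypothetical function name; it assumes the 1/10 share is rounded down and that the upper bound wins when the cluster is very small:

```python
def gossip_entry_count(total_nodes):
    """How many other nodes' information one ping carries."""
    wanted = max(3, total_nodes // 10)    # 1/10 of the nodes, but at least 3
    upper = max(0, total_nodes - 2)       # at most total nodes - 2
    return min(wanted, upper)

for n in (4, 20, 100):
    print(n, gossip_entry_count(n))
```

For a 100-node cluster each ping carries 10 entries, so the size of a ping grows with the cluster, which is part of the network burden mentioned above.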
-------------------------------------------------------------------------------------------------------
Second, the internal implementation principles of jedis for cluster
For java client development against redis we use jedis; for redis cluster, we use the jedis cluster api
Some basic principles of how the jedis cluster api interacts with a redis cluster
1, client-based redirection
redis-cli -c, automatic redirection
(1) request redirection
The client may send a command to any redis instance; on receiving a command, each redis instance calculates the hash slot of the key
If the slot is local, the command is processed locally; otherwise the instance returns moved to the client, and the client redirects the request
cluster keyslot mykey shows which hash slot a key maps to
When using redis-cli, you can add the -c parameter to support automatic request redirection; after receiving moved, redis-cli automatically redirects to the corresponding node and executes the command there
(2) hash slot calculation
The hash slot is calculated by taking the CRC16 of the key and then taking it modulo 16384
A hash tag can be used to manually control which slot a key maps to; keys with the same hash tag land in the same hash slot, such as set mykey1:{100} and set mykey2:{100}
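The calculation can be sketched in a few lines of Python. This assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0), which Python's binascii.crc_hqx computes when started from 0; the function name key_hash_slot is hypothetical.

```python
import binascii

SLOT_COUNT = 16384

def key_hash_slot(key):
    """Return the hash slot for a key, honouring a {hash tag} if one is present."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed.
        if end > start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT

# Both keys share the tag {100}, so they map to the same slot.
print(key_hash_slot("mykey1:{100}") == key_hash_slot("mykey2:{100}"))  # True
```

The result for a given key can be compared against cluster keyslot on a real node to check the sketch.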
(3) hash slot lookup
Nodes exchange data through the gossip protocol, so each node knows which node holds each hash slot
2, smart jedis
(1) What is smart jedis
Client-based redirection is expensive in network IO, because in most cases a request is likely to need one redirection before it finds the correct node
So most clients, such as the java redis client jedis, are smart
They maintain a local hashslot -> node mapping table as a cache; in most cases the node for a hashslot can be found directly in the local cache, with no need for a moved redirection from a node
(2) how JedisCluster works
When JedisCluster is initialized, it randomly selects a node, initializes the hashslot -> node mapping table, and creates a JedisPool connection pool for each node
For every operation, JedisCluster first calculates the hashslot of the key locally, and then looks up the corresponding node in the local mapping table
If that node still holds the hashslot, then ok; if an operation such as reshard has been carried out, the hashslot may no longer be on that node, and the node returns moved
If the JedisCluster API finds that the node returned moved, it uses that node's metadata to update the local hashslot -> node mapping table cache
The above steps are repeated until the corresponding node is found; if more than 5 retries are needed, an error is reported: JedisClusterMaxRedirectionsException
In older versions of jedis, when a cluster node had failed and automatic failover had not yet completed, the client could update hash slots and ping nodes to check liveness very frequently, resulting in a large amount of network IO overhead
The latest versions of jedis have optimized these excessive hash slot updates and pings to avoid such problems
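The lookup-and-retry loop described above can be sketched with an in-memory stand-in for the cluster. This is not the jedis implementation: FakeCluster, SmartClient and MovedError are hypothetical names, and the sketch updates only the single slot named in the moved reply.

```python
class MovedError(Exception):
    """Stand-in for a moved reply: the slot now lives on another node."""
    def __init__(self, slot, node):
        super().__init__(f"MOVED {slot} {node}")
        self.slot, self.node = slot, node

class FakeCluster:
    """Pretend cluster that knows the true owner of each slot."""
    def __init__(self, owners):
        self.owners = owners  # slot -> node name

    def execute(self, node, slot):
        owner = self.owners[slot]
        if owner != node:
            raise MovedError(slot, owner)
        return f"OK from {node}"

class SmartClient:
    MAX_REDIRECTIONS = 5  # retry limit from the notes above

    def __init__(self, cluster, slot_cache):
        self.cluster = cluster
        self.slot_cache = dict(slot_cache)  # local hashslot -> node cache

    def run(self, slot):
        node = self.slot_cache[slot]
        for _ in range(self.MAX_REDIRECTIONS + 1):
            try:
                return self.cluster.execute(node, slot)
            except MovedError as moved:
                # Refresh the local cache, then retry on the new node.
                self.slot_cache[moved.slot] = moved.node
                node = moved.node
        raise RuntimeError("too many redirections")

client = SmartClient(FakeCluster({100: "node-b"}), {100: "node-a"})
print(client.run(100), client.slot_cache)
```

The first call pays for one redirection; later calls for slot 100 go straight to node-b because the cache was corrected.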
(3) hashslot migration and ask redirection
If a hash slot is being migrated, the node returns an ask redirection to jedis
After receiving the ask redirection, jedis redirects to the target node to execute the command; but because ask occurs while the hash slot is still being migrated, the JedisCluster API does not update the local hashslot cache when it receives ask
moved, on the other hand, means the hashslot has definitely finished migrating, so on moved the local hashslot -> node mapping table cache is updated
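The difference between the two redirections can be sketched as a small handler. It assumes the reply text has the form "MOVED <slot> <host:port>" or "ASK <slot> <host:port>" with the leading "-" of the error reply already stripped; handle_redirect is a hypothetical name, not a jedis method.

```python
def handle_redirect(error_line, slot_cache):
    """Return (target address, needs_asking) and update the cache only on MOVED."""
    kind, slot_text, address = error_line.split()
    slot = int(slot_text)
    if kind == "MOVED":
        # Migration is finished: remember the new owner permanently.
        slot_cache[slot] = address
        return address, False
    if kind == "ASK":
        # Migration still in progress: go to the target for this one request only,
        # sending ASKING before the command, and leave the cache untouched.
        return address, True
    raise ValueError(f"not a redirection: {error_line}")

cache = {3999: "127.0.0.1:6380"}
print(handle_redirect("ASK 3999 127.0.0.1:6381", cache), cache)
print(handle_redirect("MOVED 3999 127.0.0.1:6381", cache), cache)
```

After the ask line the cache still points at 6380; only the moved line changes it to 6381.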
-------------------------------------------------------------------------------------------------------
Third, the principles of high availability and master-slave failover
The high availability principle of redis cluster is almost the same as that of Sentinel
1, determining node downtime
If one node considers another node to be down, that is pfail, subjective downtime
If multiple nodes consider another node to be down, that is fail, objective downtime; this is almost the same principle as sdown and odown in Sentinel
If a node has not returned pong within cluster-node-timeout, it is considered pfail
If a node considers another node pfail, it gossips this in its ping messages to other nodes; if more than half of the nodes consider it pfail, it becomes fail
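The two judgments above can be sketched as two small functions. The names is_pfail and is_fail are hypothetical, times are plain seconds, and the "more than half of the nodes" rule is taken directly from the notes rather than from the redis source.

```python
def is_pfail(last_pong_time, now, node_timeout):
    """Subjective downtime: no pong received within the node timeout."""
    return now - last_pong_time > node_timeout

def is_fail(pfail_reporters, total_nodes):
    """Objective downtime: more than half of the nodes report pfail."""
    return len(set(pfail_reporters)) > total_nodes // 2

print(is_pfail(last_pong_time=0, now=20, node_timeout=15))
print(is_fail(["node-a", "node-b", "node-c"], total_nodes=5))
```

Using a set means a node that repeats its pfail report in several pings is still counted once.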
2, slave node filtering
When a master node goes down, one of its slave nodes is selected to be switched to master
Each slave node's time disconnected from the master is checked; if it exceeds cluster-node-timeout * cluster-slave-validity-factor, the slave is not eligible to become master
This is also the same as in Sentinel, which has a slave node timeout filtering step
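The filtering rule above amounts to one comparison per slave. A minimal sketch using the formula exactly as stated in the notes; the function name eligible_slaves and the sample numbers are made up for illustration.

```python
def eligible_slaves(disconnect_seconds, node_timeout, validity_factor):
    """Keep slaves whose disconnection from the master is within the limit.

    disconnect_seconds maps slave name -> seconds since it lost the master.
    """
    limit = node_timeout * validity_factor
    return [name for name, secs in disconnect_seconds.items() if secs <= limit]

# With a 15 s timeout and a factor of 10, the limit is 150 s.
print(eligible_slaves({"slave-1": 10, "slave-2": 200}, 15, 10))  # ['slave-1']
```

A slave that has been cut off for too long probably holds stale data, which is why it is excluded before the election starts.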
3, slave node election
Sentinel: sorts all slave nodes by slave priority, offset, and run id
Each slave node sets an election time according to its offset of data copied from the master; the larger the offset (the more data copied), the earlier the slave node's election time and the higher its election priority
All master nodes then vote in the slave election; if a majority of master nodes (N / 2 + 1) vote for a given slave node, the election passes and that slave node can be switched to master
The slave node then performs the master-slave switch and becomes the master node
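The election order and the majority rule above can be sketched as follows. The function names are hypothetical, and the base_ms and step_ms values are illustrative assumptions, not values taken from the notes.

```python
def election_delays_ms(offsets, base_ms=500, step_ms=1000):
    """Give each slave a start delay: the larger its offset, the earlier it starts."""
    ranked = sorted(offsets, key=lambda name: offsets[name], reverse=True)
    return {name: base_ms + rank * step_ms for rank, name in enumerate(ranked)}

def wins_election(votes, master_count):
    """A slave wins once a majority of masters (N / 2 + 1) has voted for it."""
    return votes >= master_count // 2 + 1

print(election_delays_ms({"slave-1": 100, "slave-2": 300}))
print(wins_election(votes=2, master_count=3))
```

Because slave-2 has copied more data, it asks for votes first and is the most likely to collect the majority.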
4, comparison with Sentinel
The whole process is very similar to Sentinel, so redis cluster can be described as a powerful mode that directly integrates the functionality of replication and sentinel
There is no way to go deep into the underlying core principles and design details of redis here; that would need a separate course devoted to in-depth analysis of redis internals and the redis source code
For this lesson the main concern is architecture rather than that level of detail; for architecture, it is enough to sort out the basic ideas of the core principles clearly