Interview Series 18: redis cluster principles

First, the internal communication mechanism between nodes

1, basic communication principles

(1) redis cluster nodes communicate using the gossip protocol

Unlike the centralized approach, cluster metadata (node information, failures, and so on) is not stored centrally on one node; instead, the nodes communicate with each other continuously so that every node holds a complete copy of the cluster metadata

There are two ways to maintain cluster metadata: centralized, and the one called gossip

Centralized: the benefit is that metadata updates and reads are very timely; once metadata changes, it is immediately written to the centralized store, and other nodes see the change as soon as they read it. The drawback is that all metadata update pressure lands in one place, which can put the metadata store under pressure

gossip: the benefit is that metadata updates are spread out rather than concentrated in one place; update requests reach the nodes one after another, with some delay, which reduces the pressure. The drawback is that metadata updates are delayed, which can cause some cluster operations to lag

For example, if we perform another operation right after a reshard, it may report a configuration error because the nodes have not yet reached agreement

(2) the 10000 port offset

Each node has a dedicated port for node-to-node communication, which is its own service port + 10000; for example, if the service port is 7001, the inter-node communication port is 17001
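
The port rule can be written down as a small helper. This is only an illustrative sketch: the function name cluster_bus_port and the range check are assumptions added here, not part of redis itself.

```python
CLUSTER_BUS_PORT_OFFSET = 10000  # the +10000 offset described in the text


def cluster_bus_port(service_port: int) -> int:
    """Return the node-to-node communication port for a given service port."""
    bus_port = service_port + CLUSTER_BUS_PORT_OFFSET
    # Assumption: reject results that are not valid TCP ports.
    if not 0 < bus_port <= 65535:
        raise ValueError(f"bus port {bus_port} is outside the valid TCP range")
    return bus_port


print(cluster_bus_port(7001))  # 17001
```

In practice this means both ports, for example 7001 and 17001, have to be reachable between nodes.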

Each node sends ping messages to several other nodes at regular intervals, and the other nodes return pong after receiving ping

(3) information exchanged

Failure information, node additions and removals, hash slot information, and so on

2, gossip protocol

The gossip protocol includes several message types, including ping, pong, meet, fail, and so on

meet: a node sends meet to a newly added node so that the new node joins the cluster; the new node then begins communicating with the other nodes

redis-trib.rb add-node

Internally, this sends a gossip meet message to the new node, notifying that node to join the cluster

ping: each node frequently sends ping to other nodes; a ping contains the sender's own state as well as the cluster metadata it maintains, and nodes exchange metadata with each other through ping

Nodes ping each other frequently, so they keep exchanging data and updating their metadata

pong: the reply to ping and meet; it contains the sender's own state and other information, and can also be used to broadcast information and updates

fail: after a node determines that another node has failed, it sends fail to the other nodes to notify them that the specified node is down

3, ping messages in depth

ping is very frequent and also carries some metadata, so it can add to the network load

Each node runs its cluster timer 10 times per second; roughly once per second it samples five other nodes at random and pings the one it has gone longest without hearing from

In addition, if the time since the last communication with a node reaches cluster_node_timeout / 2, a ping is sent to that node immediately, to avoid the data exchange lagging too far behind

For example, if two nodes have not exchanged data for 10 minutes, the cluster metadata is seriously inconsistent and problems will follow

So cluster_node_timeout can be tuned; setting it to a larger value reduces how often these pings are sent
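
The selection of ping targets can be sketched as follows. This is a simplified model, not redis source code: it assumes that in one round a node samples up to five peers and pings the one it has heard from least recently, and also pings every peer whose silence has reached cluster_node_timeout / 2. The function name and the last_pong_ms table are hypothetical.

```python
import random


def choose_ping_targets(last_pong_ms, now_ms, node_timeout_ms,
                        sample_size=5, rng=random):
    """Pick the peers to ping in one round.

    last_pong_ms maps a peer name to the time (ms) its last pong arrived.
    """
    targets = set()
    peers = list(last_pong_ms)
    if peers:
        # Sample a few peers and ping the one heard from least recently.
        sample = rng.sample(peers, min(sample_size, len(peers)))
        targets.add(min(sample, key=lambda p: last_pong_ms[p]))
    # Also ping every peer whose silence has reached half the node timeout.
    for peer, seen in last_pong_ms.items():
        if now_ms - seen >= node_timeout_ms / 2:
            targets.add(peer)
    return targets


pongs = {"node-b": 9_000, "node-c": 1_000, "node-d": 9_500}
print(sorted(choose_ping_targets(pongs, now_ms=10_000, node_timeout_ms=15_000)))
# ['node-c']
```

With a larger node_timeout_ms, fewer peers cross the half-timeout threshold, which is why raising cluster_node_timeout lowers the ping traffic.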

Every ping carries the sender's own information plus information about 1/10 of the other nodes, sent out for data exchange

It includes information about at least 3 other nodes, and at most (total number of nodes - 2) other nodes
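
The bounds above amount to a clamp. A minimal sketch, assuming integer division for the 1/10 share and that the upper bound wins when the cluster is very small; the function name is hypothetical:

```python
def gossip_entries_per_ping(total_nodes: int) -> int:
    """Number of other nodes described in one ping, following the rule above."""
    max_entries = max(total_nodes - 2, 0)   # upper bound: total nodes - 2
    wanted = max(total_nodes // 10, 3)      # one tenth of the nodes, at least 3
    return min(wanted, max_entries)


for n in (4, 6, 100):
    print(n, gossip_entries_per_ping(n))
# 4 2
# 6 3
# 100 10
```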

-------------------------------------------------------------------------------------------------------

Second, the internal implementation principles of cluster-oriented jedis

For development we use a java redis client, jedis; for redis cluster, that means the jedis cluster api

Some basic principles of how the jedis cluster api interacts with a redis cluster

1, client-based redirection

redis-cli -c, automatic redirection

(1) request redirection

The client may send a command to any redis instance; on receiving a command, each redis instance computes the hash slot for the key

If the slot is local, the instance handles the command locally; otherwise it returns moved to the client, and the client redirects

cluster keyslot mykey shows which hash slot a key maps to

When using redis-cli, you can add the -c parameter to support automatic request redirection; after receiving moved, redis-cli automatically redirects to the corresponding node and runs the command there

(2) computing the hash slot

The hash slot algorithm computes the CRC16 value of the key and then takes it modulo 16384 to get the corresponding hash slot

A hash tag lets you manually control which slot a key maps to; keys with the same hash tag land in the same hash slot, for example set mykey1:{100} and set mykey2:{100}
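
The slot computation, including hash tags, can be sketched in a few lines. This assumes the CRC16 variant is the CCITT/XMODEM one, which Python exposes as binascii.crc_hqx with an initial value of 0, and that only a non-empty tag between the first "{" and the next "}" is hashed. The function name key_slot is hypothetical.

```python
import binascii

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """Return the hash slot for a key: CRC16(key) mod 16384."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with initial value 0 is the CRC16-CCITT (XMODEM) variant.
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


print(key_slot("mykey1:{100}") == key_slot("mykey2:{100}"))  # True
```

Both keys hash only the tag 100, so they land in the same slot; the result for any key can be compared against cluster keyslot on a real node.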

(3) hash slot lookup

Nodes exchange data through the gossip protocol, so each node knows which node every hash slot is on

2, smart jedis

(1) what is smart jedis

Client-based redirection is costly in network IO, because in most cases a request may need one redirection before it finds the correct node

So most clients, such as the java redis client jedis, are smart

They maintain a local hashslot -> node mapping table as a cache; in most cases the hashslot -> node lookup is served directly from the local cache, with no need for a moved redirection from a node

(2) how JedisCluster works

When JedisCluster initializes, it picks a node at random, initializes the hashslot -> node mapping table, and creates a JedisPool connection pool for each node

For every operation, JedisCluster first computes the key's hashslot locally and then looks up the corresponding node in the local mapping table

If that node still holds the hashslot, all is well; if an operation such as reshard has been carried out, the hashslot may no longer be on that node, and the node returns moved

If the JedisCluster API finds that the node returned moved, it uses that node's metadata to update the local hashslot -> node mapping table cache

These steps repeat until the correct node is found; if more than 5 retries are needed, an error is raised: JedisClusterMaxRedirectionsException
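
The lookup-and-retry loop can be illustrated with a toy simulation. This is not jedis code: FakeNode, FakeCluster and SmartClient are hypothetical stand-ins, the cluster lives in memory, and on moved only the one affected slot is refreshed, a simplification of what a real client does.

```python
import binascii

HASH_SLOTS = 16384
MAX_ATTEMPTS = 5  # retry limit mentioned in the text


class FakeNode:
    """Stand-in for a redis node: owns some slots and stores keys."""

    def __init__(self, name, slots):
        self.name, self.slots, self.data = name, set(slots), {}


class FakeCluster:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def owner_of(self, slot):
        return next(n.name for n in self.nodes.values() if slot in n.slots)

    def execute(self, node_name, slot, key, value):
        node = self.nodes[node_name]
        if slot not in node.slots:
            return ("MOVED", self.owner_of(slot))  # wrong node: redirect
        node.data[key] = value
        return ("OK", node_name)


class SmartClient:
    def __init__(self, cluster, slot_map):
        self.cluster = cluster
        self.slot_cache = dict(slot_map)  # local hashslot -> node table

    def set(self, key, value):
        slot = binascii.crc_hqx(key.encode(), 0) % HASH_SLOTS
        for _ in range(MAX_ATTEMPTS):
            node = self.slot_cache[slot]
            status, info = self.cluster.execute(node, slot, key, value)
            if status == "OK":
                return info
            self.slot_cache[slot] = info  # moved: refresh the cached owner
        raise RuntimeError("too many redirections")


a = FakeNode("a", range(0, 8192))
b = FakeNode("b", range(8192, 16384))
cluster = FakeCluster([a, b])
# Start with a stale cache that believes every slot lives on node "a".
client = SmartClient(cluster, {slot: "a" for slot in range(HASH_SLOTS)})
print(client.set("foo", "bar"))  # b
```

The first attempt goes to node a, which answers moved; the cache entry is corrected and the second attempt succeeds on node b. Later requests for the same slot go straight to b.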

In older versions of jedis, when a cluster node failed and the automatic failover had not yet completed, the client could refresh the hash slot table and ping nodes to check liveness very frequently, causing heavy network IO overhead

The latest versions of jedis optimize away this excessive hash slot refreshing and pinging to avoid such problems

(3) hashslot migration and ask redirection

If a hash slot is being migrated, the node returns an ask redirection to jedis

After receiving ask, jedis redirects the request to the target node; but because ask occurs while the hash slot is still migrating, the JedisCluster API does not update its local hashslot cache on ask

moved, on the other hand, means the hashslot has definitely been migrated, so moved does update the local hashslot -> node mapping table cache
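
The difference between the two redirections comes down to whether the local cache is touched. A minimal sketch, assuming redirection replies of the form "MOVED <slot> <host:port>" and "ASK <slot> <host:port>"; the function name apply_redirect and the dictionary cache are hypothetical, not the jedis implementation:

```python
def apply_redirect(slot_cache, error_reply):
    """Return the address to retry on; update the cache only for MOVED."""
    kind, slot, address = error_reply.split()
    if kind not in ("MOVED", "ASK"):
        raise ValueError(f"not a redirection: {error_reply}")
    if kind == "MOVED":
        slot_cache[int(slot)] = address  # ownership has changed for good
    return address                       # ASK: one-off retry, cache untouched


cache = {3999: "127.0.0.1:6380"}
print(apply_redirect(cache, "ASK 3999 127.0.0.1:6381"), cache[3999])
# 127.0.0.1:6381 127.0.0.1:6380
print(apply_redirect(cache, "MOVED 3999 127.0.0.1:6381"), cache[3999])
# 127.0.0.1:6381 127.0.0.1:6381
```

On a real cluster the retry after ask must also be preceded by an ASKING command on the target node, which this sketch leaves out.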

-------------------------------------------------------------------------------------------------------

Third, high availability and master-slave switchover principles

The high availability principle of redis cluster is almost the same as Sentinel's

1, determining that a node is down

If one node considers another node to be down, that is pfail, subjective downtime

If multiple nodes consider another node to be down, that is fail, objective downtime; this is almost the same as Sentinel's sdown and odown

If a node has not returned pong within cluster-node-timeout, it is considered pfail

If a node considers another node pfail, it spreads that in its gossip ping messages to other nodes; if more than half of the master nodes consider it pfail, it becomes fail
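
The promotion from pfail to fail is a majority count. A minimal sketch, assuming the set of voters is known up front (in redis cluster the votes that count come from master nodes); the function name node_state and its arguments are hypothetical:

```python
def node_state(target, pfail_reports, voters):
    """Classify a node from the pfail reports gathered through gossip.

    pfail_reports: names of nodes that currently flag target as pfail.
    voters: names of the nodes whose reports count toward the majority.
    """
    reporters = {n for n in pfail_reports if n in voters and n != target}
    if len(reporters) > len(voters) // 2:
        return "fail"    # more than half agree: objective downtime
    if reporters:
        return "pfail"   # only some nodes suspect it: subjective downtime
    return "ok"


voters = {"m1", "m2", "m3"}
print(node_state("m3", {"m1"}, voters))        # pfail
print(node_state("m3", {"m1", "m2"}, voters))  # fail
```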

2, slave node filtering

For the master node that is down, one of its slave nodes is chosen to be switched to master

Each slave node's time disconnected from the master is checked; if it exceeds cluster-node-timeout * cluster-slave-validity-factor, that slave is not eligible to become master

This is the same as Sentinel's step of filtering out slave nodes by timeout
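
The filtering rule can be sketched as a simple threshold check. It follows the simplified formula given above (redis itself also factors in its replication ping period); the function name and the sample figures are illustrative assumptions.

```python
def eligible_slaves(disconnect_ms, node_timeout_ms, validity_factor):
    """Keep the slaves whose time disconnected from the master is within the limit.

    disconnect_ms maps a slave name to how long (ms) it has been disconnected.
    """
    limit = node_timeout_ms * validity_factor
    return [name for name, gone in disconnect_ms.items() if gone <= limit]


slaves = {"s1": 20_000, "s2": 200_000}
print(eligible_slaves(slaves, node_timeout_ms=15_000, validity_factor=10))
# ['s1']
```

Here the limit is 150000 ms, so s2, which has been disconnected for 200000 ms, is filtered out because its data is considered too stale.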

3, slave node election

Sentinel: sorts all slave nodes by slave priority, offset, and run id

Each slave node sets an election time based on the offset of the data it has replicated from the master; the larger the offset (the more data replicated), the earlier its election time and the higher its priority in the election

All master nodes then vote in the slave election; if a majority of master nodes (N / 2 + 1) vote for a slave node, the election passes and that slave node can be switched to master

The slave node then performs the master-slave switchover and becomes the master node
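
The election timing and the vote count can be sketched as below. The delay formula (500 ms, plus a random 0 to 500 ms, plus rank * 1000 ms, where rank is the number of slaves with a larger offset) is the one described in the redis cluster specification; the function names and sample offsets are hypothetical.

```python
import random


def election_delays_ms(offsets, rng=random):
    """Map each slave to its election delay; larger offset means smaller delay."""
    delays = {}
    for name, offset in offsets.items():
        # Rank 0 is the most up-to-date slave.
        rank = sum(1 for other in offsets.values() if other > offset)
        delays[name] = 500 + rng.randint(0, 500) + rank * 1000
    return delays


def wins_election(votes, master_count):
    """A slave wins once a majority (N / 2 + 1) of masters voted for it."""
    return votes >= master_count // 2 + 1


delays = election_delays_ms({"s1": 1200, "s2": 900})
print(delays["s1"] < delays["s2"])  # True
print(wins_election(2, 3))          # True
```

Because each rank adds a full second while the random part is at most half a second, the slave with the most replicated data always asks for votes first.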

4, comparison with Sentinel

The whole process is very similar to Sentinel's, so you can say redis cluster is powerful in that it directly integrates the functions of replication and sentinel


There is no way here to explain the underlying design details and core principles of redis in depth; that would take a separate course on in-depth analysis of the underlying principles and source code of redis

For a course like ours, the main concern is architecture rather than that level of detail; for architecture, it is enough to sort out the basic ideas of the core principles clearly

Origin www.cnblogs.com/xiufengchen/p/11259127.html