Distributed Addressing Algorithms

Redis series: https://www.cnblogs.com/hello-shf/category/1615909.html

SpringBoot source code reading series: https://www.cnblogs.com/hello-shf/category/1456313.html

Elasticsearch series: https://www.cnblogs.com/hello-shf/category/1550315.html

Data structures series: https://www.cnblogs.com/hello-shf/category/1519192.html

I. Distributed addressing algorithms

Addressing is a very important part of distributed algorithms. Without understanding these algorithms, you cannot fully understand how the various distributed middleware products work. So what does this fancy-sounding "addressing" actually mean? In Elasticsearch, for example, an index is split into multiple shards, each shard stores different data, and together the shards make up the complete data set. When we search for a document by _id, how does Elasticsearch know which shard that _id lives on? Likewise, when we look up a key in a Redis Cluster, how does Redis know which node holds that key? This is the problem that addressing algorithms solve.
This article briefly introduces three distributed addressing algorithms:

1. hash algorithm
2. consistent hash algorithm
3. hash slot

The hash algorithm suits distributed architectures whose partitions or cluster nodes are fixed. In Elasticsearch, for example, the number of primary shards is fixed and cannot be changed, so plain hashing is a good choice, and that is indeed what ES does. If you are interested, see my other post on ES: https://www.cnblogs.com/hello-shf/p/11543480.html
shard = hash(routing) % number_of_primary_shards (routing defaults to _id)

 

 

The consistent hash algorithm is better suited to distributed architectures that need dynamic scaling, as well as to dynamic load balancing and distributed middleware such as RPC middleware.
Redis Cluster implements addressing with hash slots, an approach similar to consistent hashing.

II. Hash algorithm

For example, suppose an Elasticsearch index has three primary shards:

shard = hash(_id) % 3;

When a document is inserted, the formula above tells us which shard stores it. A query by _id uses the same formula to find the shard the document resides on.
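The routing formula can be sketched in a few lines of Python. This is only an illustration: the function name shard_for and the sample ids are hypothetical, and zlib.crc32 is a stand-in chosen because it is deterministic, not the hash function Elasticsearch uses internally.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Return the shard index for doc_id as hash(_id) % number_of_primary_shards."""
    # Assumption: CRC32 stands in for the real routing hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Writes and reads use the same formula, so a lookup by _id lands on
# the shard that stored the document.
for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "-> shard", shard_for(doc_id, 3))
```

Because the result depends only on the id and the shard count, no lookup table is needed as long as the shard count never changes.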
The algorithm above looks perfect. However...
What if the existing primary shards run out of capacity and we need to add another one? (This is purely hypothetical, since Elasticsearch primary shards are immutable.) The hash formula then becomes:

shard = hash(_id) % 4;

Won't addressing now go wrong?
It will: a document that was written under % 3 may no longer be found under % 4. Adding a partition therefore means moving the existing data on every partition to the partition given by shard = hash(_id) % 4. What do we do when the data volume is huge? Honestly, that is very hard to handle. The same problem arises when a shard goes down and disaster recovery has to be fast.
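To get a feel for how much data would have to move, here is a small Python experiment. It is a sketch under assumptions: the key names are made up, and MD5 is used only as a convenient deterministic hash, not because any of the systems discussed here use it.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash derived from MD5 (illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the modulus changes."""
    moved = sum(
        1 for k in keys
        if stable_hash(k) % old_shards != stable_hash(k) % new_shards
    )
    return moved / len(keys)


keys = [f"doc-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 3, 4)
print(f"{fraction:.1%} of keys change shard when going from 3 to 4 shards")
```

With a uniform hash, a key keeps its shard only when hash % 12 is 0, 1 or 2, so roughly three quarters of all keys have to be relocated.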

III. Consistent hashing

Consistent hashing was born to solve exactly the scale-out and scale-in problem described above. A distributed architecture that supports neither dynamic scaling nor disaster recovery is distributed in name only, and that is no exaggeration.
Consistent hashing sounds impressive, but there is nothing deep about it: it is just a slightly more sophisticated hash modulo operation.

 

 

As the figure above shows, the hash ring is an ordinary hash modulo operation, node = hash(key) % n, where n is 2^32, so the positions 0 to 2^32 - 1 form a ring. Addressing walks clockwise from the key's position to the nearest node.

node = hash(key) % n

 

 

 

The four nodes are placed on the ring by hashing each node's IP, i.e. node = hash(IP) % n, which gives the positions shown in the figure above. When a request arrives, its position is computed with node = hash(key) % n; searching clockwise from that position, we find that the request hits node 2. Addressing is as simple as that.
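The clockwise lookup can be sketched with a sorted list and a binary search. This is a minimal illustration, not a production implementation: the class name HashRing, the IP addresses and the MD5-based ring_position helper are all assumptions made for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # n in node = hash(key) % n


def ring_position(value: str) -> int:
    """Map a string (node IP or key) to a position on the ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


class HashRing:
    def __init__(self, nodes):
        # Keep nodes sorted by ring position so lookups can use binary search.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the first node at or clockwise after the key's position."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        # Past the last node: wrap around to the first node on the ring.
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
print("some-key is served by", ring.lookup("some-key"))
```

The modulo on the index is what closes the ring: a key positioned after the last node is served by the first one.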

Scaling out:
Add a node 5 to the original four nodes. Its position on the hash ring is still determined from its IP by node = hash(IP) % n, as shown in the figure below.

 

 

As you can see, the request that originally hit node 2 now hits node 5, so data still has to be migrated, but only part of it: only the data between node 1 and node 2 is affected. Compared with plain hash modulo, consistent hashing greatly reduces the data migration caused by scaling out. Disaster recovery works the same way.
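The effect of adding a node can be checked with a short experiment. As before, this is a sketch under assumptions: node names, key names and the MD5-based position function are made up for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32


def ring_position(value: str) -> int:
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


def build_ring(nodes):
    """Return (positions, names), both sorted by ring position."""
    ring = sorted((ring_position(n), n) for n in nodes)
    return [p for p, _ in ring], [n for _, n in ring]


def owner(ring, key: str) -> str:
    """First node clockwise from the key, wrapping at the end of the ring."""
    positions, names = ring
    idx = bisect.bisect_left(positions, ring_position(key)) % len(names)
    return names[idx]


old_ring = build_ring(["node1", "node2", "node3", "node4"])
new_ring = build_ring(["node1", "node2", "node3", "node4", "node5"])

keys = [f"key-{i}" for i in range(10_000)]
moved = [k for k in keys if owner(old_ring, k) != owner(new_ring, k)]

print(f"{len(moved) / len(keys):.1%} of keys moved")
print("moved keys now live on:", {owner(new_ring, k) for k in moved})
print("moved keys came from:", {owner(old_ring, k) for k in moved})
```

Every moved key ends up on the new node and comes from a single old node, the new node's clockwise successor; all other keys stay where they were.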

The weakness of consistent hashing is just as obvious: the nodes rarely land evenly on the hash ring, so load can be unbalanced. Even so, it effectively reduces the data migration caused by adding and removing nodes.
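One way to see the unevenness is to compute how much of the ring each node owns. The sketch below assumes hypothetical node names and an MD5-based position function; the exact shares depend entirely on those choices.

```python
import hashlib

RING_SIZE = 2 ** 32


def ring_position(value: str) -> int:
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


def arc_shares(nodes):
    """Fraction of the ring owned by each node (the arc that ends at the node)."""
    ring = sorted((ring_position(n), n) for n in nodes)
    shares = {}
    for i, (pos, node) in enumerate(ring):
        prev_pos = ring[i - 1][0]  # for i == 0 this wraps to the last node
        shares[node] = ((pos - prev_pos) % RING_SIZE) / RING_SIZE
    return shares


shares = arc_shares(["node1", "node2", "node3", "node4"])
for node, share in sorted(shares.items()):
    print(f"{node}: {share:.1%} of the ring")
```

A perfectly balanced ring would give each of the four nodes 25%; hashed positions usually deviate from that noticeably.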

IV. Hash slot

Hash slots are the addressing algorithm that Redis Cluster uses, so let's take Redis Cluster as the example.
A Redis Cluster has a fixed set of 16384 hash slots.

hash slot = CRC16(key)%16384;

 

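 # CRC16 can simply be understood as a kind of hash algorithm; search online for the details.
This gives us the hash slot for a key. As I understand it, hash slots simply add a layer of mapping between addressing and nodes. When the set of nodes changes, only the hash slot ==> node mapping needs to change, and then only the designated slots have to be migrated to the newly added node. This avoids the full data migration that plain hash addressing causes, and compared with consistent hashing it also balances load more evenly.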
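The slot formula can be tried out with a hand-written CRC16. The sketch below uses the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0), which is the variant named in the Redis Cluster specification. The function names and sample keys are hypothetical, and the sketch deliberately ignores Redis hash tags.

```python
NUM_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0x0000
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """hash slot = CRC16(key) % 16384 (hash tags are not handled here)."""
    return crc16_xmodem(key.encode("utf-8")) % NUM_SLOTS


for key in ["user:1000", "order:42"]:
    print(key, "-> slot", hash_slot(key))
```

The slot of a key never changes; only the slot ==> node mapping does, which is what keeps resharding cheap.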

 

 


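As the figure above shows, suppose we have three nodes. When the Redis Cluster is initialized, the 16384 slots are divided evenly among the nodes.
When a node 4 is added, only the data in some of the slots on node1 to node3 needs to be migrated to node 4. The migration proceeds slot by slot while the cluster keeps serving requests, so the impact on performance is very limited.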
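The slot ==> node mapping and its rebalancing can be modeled with a plain dictionary. This is a toy model under assumptions: the helper names assign_evenly and add_node and the choice of which slots to hand over are made up for illustration, and it does not reproduce Redis Cluster's actual resharding procedure.

```python
from collections import Counter

NUM_SLOTS = 16384


def assign_evenly(nodes):
    """Give each node one contiguous, roughly equal range of slots."""
    return {slot: nodes[slot * len(nodes) // NUM_SLOTS] for slot in range(NUM_SLOTS)}


def add_node(slot_owner, new_node):
    """Hand each old node's surplus slots to new_node; return the moved slots."""
    old_nodes = sorted(set(slot_owner.values()))
    target = NUM_SLOTS // (len(old_nodes) + 1)  # slots each old node keeps
    moved = []
    for node in old_nodes:
        owned = [s for s, o in slot_owner.items() if o == node]
        for slot in owned[target:]:  # everything beyond the target is surplus
            slot_owner[slot] = new_node
            moved.append(slot)
    return moved


slot_owner = assign_evenly(["node1", "node2", "node3"])
print("before:", dict(Counter(slot_owner.values())))

moved = add_node(slot_owner, "node4")
print("after: ", dict(Counter(slot_owner.values())))
print("slots migrated:", len(moved))
```

Only the slots handed to node4 carry data that has to move; every key still hashes to the same slot as before.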

 

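  If you spot any mistakes, please leave a comment to point them out.
  Original content takes effort; if you repost, please cite the original address: https://www.cnblogs.com/hello-shf/p/12079986.html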

 

  A few recommended posts for reference:
  https://www.cnblogs.com/myseries/p/10959050.html
  https://www.cnblogs.com/abc-begin/p/8203613.html


Origin www.cnblogs.com/hello-shf/p/12079986.html