Consistent hashing for database/table sharding and load balancing

First, what is consistent hashing? The following blog post is a good introduction:
http://blog.csdn.net/cywosp/article/details/23397179/

In a distributed application, scaling out and scaling in is a semi-automated process during which the application must remain basically available, so a sudden, large-scale reshuffle of data or traffic cannot be tolerated. To minimize the potential impact, a consistent hashing algorithm is used for load balancing and for sharding databases and tables. The hash routing algorithm plays an extremely important role in distributed scenarios.

The consistent hashing algorithm was proposed as early as 1997 in the paper "Consistent Hashing and Random Trees", and it is now widely used in cache systems.

1 Basic scenario

Suppose you have N cache servers (hereinafter simply "caches"). How do you map an object onto one of the N caches? You would probably compute the object's hash value with a general method like the following and then map it uniformly onto the N caches:

hash(object) % N

Everything works fine at first. Now consider the following two cases:

  • 1 Cache server m goes down (this must be considered in practice), so all objects mapped to cache m become unavailable. Cache m has to be removed from the pool, leaving N-1 caches, and the mapping formula becomes hash(object) % (N-1);

  • 2 Traffic grows and a cache has to be added, giving N+1 caches, and the mapping formula becomes hash(object) % (N+1);

What do cases 1 and 2 mean? They mean that almost all cached entries suddenly become invalid. For the servers this is a disaster: the flood of requests goes straight to the back-end servers.
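To make the effect of cases 1 and 2 concrete, here is a minimal Python sketch that counts how many keys change their cache index when the cache count changes. The helper names (stable_hash, moved_fraction), the choice of MD5 and the sample of 10,000 keys are assumptions made for illustration only; they do not come from any particular cache system.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic 32-bit hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")

def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose cache index changes when the cache count goes from old_n to new_n."""
    moved = sum(1 for k in keys if stable_hash(k) % old_n != stable_hash(k) % new_n)
    return moved / len(keys)

# Hypothetical workload: 10,000 object names spread over 10 caches.
keys = [f"object{i}" for i in range(10_000)]
shrink = moved_fraction(keys, 10, 9)   # case 1: one cache removed
grow = moved_fraction(keys, 10, 11)    # case 2: one cache added
print(f"10 -> 9 caches: {shrink:.1%} of keys remapped")
print(f"10 -> 11 caches: {grow:.1%} of keys remapped")
```

With a uniform hash, a key keeps its index only when both remainders happen to coincide, so around nine keys in ten move in either case.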

Now consider a third problem: hardware keeps getting more powerful, so you may want newly added nodes to take on more of the work. The hash algorithm above clearly cannot do that.

Is there a way to change this situation? There is: consistent hashing.

2 Hash algorithms and monotonicity

One measure of a hash algorithm is monotonicity, defined as follows:

  Monotonicity means that if some content has already been assigned to caches by the hash and new caches are then added to the system, the hash result should guarantee that previously assigned content is either left where it is or mapped to a new cache, and is never remapped to a different cache in the old set.

It is easy to see that the simple algorithm above, hash(object) % N, can hardly satisfy the monotonicity requirement.

3 Principle of the consistent hashing algorithm

Consistent hashing is a hash algorithm. Put simply, when a cache is removed or added, it changes the existing key-to-cache mapping as little as possible and so satisfies the monotonicity requirement as far as it can.

The basic principle of consistent hashing is described in the following five steps.

3.1 The circular hash space

A hash algorithm is usually considered to map a value to a 32-bit key, that is, into the numeric space from 0 to 2^32-1. We can picture this space as a ring whose head (0) joins its tail (2^32-1), as shown in Figure 1 below.
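The ring can be sketched in a few lines of Python. This is only an illustration: taking the first 4 bytes of an MD5 digest as the 32-bit key is an assumption, and the names RING_SIZE, ring_position and clockwise_distance are hypothetical.

```python
import hashlib

RING_SIZE = 2 ** 32  # positions run from 0 to 2^32 - 1

def ring_position(key: str) -> int:
    """Map a string to a position on the ring (first 4 bytes of its MD5 digest)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def clockwise_distance(start: int, end: int) -> int:
    """Steps needed to walk clockwise from start to end, wrapping from 2^32 - 1 back to 0."""
    return (end - start) % RING_SIZE

# The tail of the space is adjacent to its head, which is what makes it a ring.
print(clockwise_distance(RING_SIZE - 1, 0))  # 1
```

The modulo in clockwise_distance is what closes the space into a ring: walking one step past 2^32-1 lands on 0.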

Figure 1: The circular hash space

3.2 Mapping objects into the hash space

Next, consider four objects, object1 to object4. The hash function computes a key for each of them, and the keys are distributed on the ring as shown in Figure 2.

hash(object1) = key1;

… …

hash(object4) = key4;

Figure 2: Key distribution of the 4 objects

3.3 Mapping caches into the hash space

The basic idea of consistent hashing is to map both objects and caches into the same hash space using the same hash algorithm.

Suppose there are currently three caches, A, B and C. The mapping result is shown in Figure 3: they are arranged in the hash space by their hash values.

hash(cache A) = key A;

… …

hash(cache C) = key C;

Figure 3: Key distribution of the caches and objects

A note on computing a cache's hash: a common approach is to use the cache machine's IP address or machine name as the hash input.
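As a rough sketch of this step, the caches can be hashed by IP address and kept in a list sorted by ring position. The IP addresses below are made up for the example, and ring_position (MD5 truncated to 32 bits) is an assumed hash function, not one prescribed by the algorithm.

```python
import hashlib

def ring_position(key: str) -> int:
    """32-bit ring position derived from the first 4 bytes of the MD5 digest."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")

# Hypothetical IP addresses for caches A, B and C.
caches = {"cache A": "10.0.0.1", "cache B": "10.0.0.2", "cache C": "10.0.0.3"}

# The ring is represented as (position, cache name) pairs sorted by position.
ring = sorted((ring_position(ip), name) for name, ip in caches.items())

for position, name in ring:
    print(f"{name} sits at ring position {position}")
```

Keeping the pairs sorted by position is a design choice that makes the clockwise search for the next cache a simple binary search.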

3.4 Mapping objects to caches

Now that caches and objects have been mapped into the same hash space by the same hash algorithm, the next question is how to map objects onto caches.

On the ring, start from an object's key and walk clockwise until a cache is met; the object is stored on that cache. Because the hash values of the object and of the caches are fixed, the cache found this way is unique and deterministic. That is exactly the object-to-cache mapping we were looking for.

Continuing the example above (see Figure 3), this method stores object1 on cache A, object2 and object3 on cache C, and object4 on cache B.
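The clockwise walk can be sketched as a binary search over the sorted cache positions. The ring positions below are invented small numbers, chosen only so that the result reproduces the mapping described in the text; real positions would come from the hash function.

```python
from bisect import bisect_left

# Hypothetical ring positions, sorted clockwise (illustrative values only).
ring = [(20, "cache A"), (40, "cache B"), (70, "cache C")]
positions = [position for position, _ in ring]

def lookup(key_position: int) -> str:
    """Return the first cache at or after key_position, wrapping to the start of the ring."""
    index = bisect_left(positions, key_position)
    if index == len(ring):  # walked past the last cache: wrap around to the first one
        index = 0
    return ring[index][1]

objects = {"object1": 10, "object4": 30, "object2": 50, "object3": 60}
mapping = {name: lookup(position) for name, position in objects.items()}
print(mapping)
# {'object1': 'cache A', 'object4': 'cache B', 'object2': 'cache C', 'object3': 'cache C'}
```

A key that lies past the last cache position wraps around to the first cache, which is the ring behaviour described in 3.1.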

3.5 Examining changes to the caches

As discussed earlier, the biggest problem with the hash-then-modulo method is that it is not monotonic: when the set of caches changes, cached entries become invalid, which hits the back-end servers hard. Let us now analyze how consistent hashing behaves.

3.5.1 Removing a cache

Suppose cache B goes down. Under the mapping method above, the only objects affected are those lying between cache B and the next cache reached by traversing counterclockwise from it (cache A), that is, the objects that were mapped to cache B.

So only object4 has to move, and it is remapped to cache C; see Figure 4.

Figure 4: Mapping after cache B is removed

3.5.2 Adding a cache

Now suppose a new cache D is added and that it lands on the ring between object2 and object3. The only objects affected are those lying between cache D and the next cache reached by traversing counterclockwise from it (cache B); these were previously part of the objects mapped to cache C, and they simply need to be remapped to cache D.

So only object2 has to move, and it is remapped to cache D; see Figure 5.

Figure 5: Mapping after cache D is added
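Both changes can be checked with a small Python sketch. The ring positions are invented values that reproduce the ordering of the example (object1, cache A, object4, cache B, object2, cache D, object3, cache C), and build_mapping and changed are hypothetical helper names.

```python
from bisect import bisect_left

def build_mapping(caches: dict, objects: dict) -> dict:
    """Assign each object to the first cache found clockwise from its position."""
    ring = sorted((position, name) for name, position in caches.items())
    positions = [position for position, _ in ring]
    mapping = {}
    for obj, position in objects.items():
        index = bisect_left(positions, position) % len(ring)  # % wraps around the ring
        mapping[obj] = ring[index][1]
    return mapping

def changed(before: dict, after: dict) -> list:
    """Objects whose cache differs between two mappings."""
    return sorted(obj for obj in before if before[obj] != after[obj])

# Illustrative positions only; real ones would come from the hash function.
objects = {"object1": 10, "object4": 30, "object2": 50, "object3": 60}
caches = {"cache A": 20, "cache B": 40, "cache C": 70}
original = build_mapping(caches, objects)

without_b = build_mapping({k: v for k, v in caches.items() if k != "cache B"}, objects)
with_d = build_mapping({**caches, "cache D": 55}, objects)

print(changed(original, without_b))  # ['object4']
print(changed(original, with_d))     # ['object2']
```

In both cases a single object moves and every other assignment is untouched, in contrast to the modulo scheme where most keys are remapped.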

4 Virtual nodes

Another criterion for a hash algorithm is balance, defined as follows:

Balance

  Balance means that the hash results are spread across all caches as evenly as possible, so that the space of every cache is used.

A hash algorithm does not guarantee balance. When there are few caches, objects cannot be mapped onto them evenly. In the example above, if only cache A and cache C are deployed, then of the four objects cache A stores only object1 while cache C stores object2, object3 and object4: the distribution is very uneven.

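To address this, consistent hashing introduces the concept of the "virtual node", defined as follows:

A "virtual node" is a replica of a real node in the hash space. One real node corresponds to several virtual nodes, and this number is called the "replica count". Virtual nodes are arranged in the hash space by their hash values.

Take again the case where only cache A and cache C are deployed. As Figure 4 showed, the caches are not evenly distributed. Now introduce virtual nodes with a replica count of 2, which means there are 4 virtual nodes in total: cache A1 and cache A2 represent cache A, and cache C1 and cache C2 represent cache C. Assume a fairly ideal case; see Figure 6.

Figure 6: Mapping after virtual nodes are introduced

The object-to-virtual-node mapping is now:

object1->cache A2; object2->cache A1; object3->cache C1; object4->cache C2;

So object1 and object2 are both mapped to cache A, while object3 and object4 are mapped to cache C; balance improves considerably.

With virtual nodes, the mapping changes from { object -> node } to { object -> virtual node }. The mapping used to find the cache that holds an object is shown in Figure 7.

Figure 7: Finding the cache that holds an object

The hash of a virtual node can be computed from the real node's IP address plus a numeric suffix. For example, suppose the IP address of cache A is 202.168.14.241.

Before virtual nodes are introduced, the hash of cache A is computed as:

Hash("202.168.14.241");

After virtual nodes are introduced, the hashes of virtual nodes cache A1 and cache A2 are computed as:

Hash("202.168.14.241#1"); // cache A1

Hash("202.168.14.241#2"); // cache A2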
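Putting the pieces together, a ring with virtual nodes might look like the following Python sketch. It uses the "IP#suffix" naming from the text, but the HashRing class, the MD5-based ring_position function, the IP address given to cache C and the sample keys are all assumptions for illustration, not the implementation of any specific product.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

def ring_position(key: str) -> int:
    """32-bit ring position derived from the first 4 bytes of the MD5 digest."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")

class HashRing:
    def __init__(self, nodes: dict, replicas: int):
        # One ring entry per virtual node, hashed as "<ip>#<i>" and owned by the real node.
        entries = sorted(
            (ring_position(f"{ip}#{i}"), name)
            for name, ip in nodes.items()
            for i in range(1, replicas + 1)
        )
        self.positions = [position for position, _ in entries]
        self.owners = [name for _, name in entries]

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key to the next virtual node and return its real node."""
        index = bisect_left(self.positions, ring_position(key)) % len(self.positions)
        return self.owners[index]

# cache A uses the IP from the text; the IP for cache C is made up for this example.
nodes = {"cache A": "202.168.14.241", "cache C": "202.168.14.243"}
keys = [f"object{i}" for i in range(10_000)]

for replicas in (1, 200):
    ring = HashRing(nodes, replicas)
    counts = Counter(ring.lookup(key) for key in keys)
    print(f"replica count {replicas}: {dict(counts)}")
```

The lookup returns the real node that owns the virtual node, which is the { object -> virtual node } -> node chain described above. Larger replica counts generally give a more even split, at the cost of a longer position list to search.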

Applications that use consistent hashing

Couchbase

Apache Cassandra

Akka

References:

https://en.wikipedia.org/wiki/Consistent_hashing

https://community.oracle.com/blogs/tomwhite/2007/11/27/consistent-hashing

https://xing393939.github.io/tech/2016/02/26/distributed_solution.html

https://nullcc.github.io/2017/11/23/Web%E5%90%8E%E7%AB%AF%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E6%BC%AB%E8%B0%88(2)%E2%80%94%E2%80%94%E4%B8%80%E8%87%B4%E6%80%A7hash%E7%AE%97%E6%B3%95/

Origin www.cnblogs.com/hirampeng/p/11286294.html