[Plain-language explanation] The principle of consistent hashing in simple terms

0x00. Summary

Consistent hashing is an algorithm commonly used in distributed systems. But I believe many readers know what it is without knowing why it works. This article tries to introduce the principle of consistent hashing in an easy-to-understand way, and to deepen the understanding of the concept through a concrete application scenario.

0x01. Concepts & principles

Hashing converts an input of arbitrary length (also called the pre-image) into a fixed-length output by means of a hash algorithm. That output is the hash value.
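As a small illustration of "arbitrary length in, fixed length out", the sketch below uses SHA-256 from Python's standard hashlib module. SHA-256 is an arbitrary choice for this example (the article does not name a specific hash algorithm), and digest_hex is a hypothetical helper name.

```python
import hashlib

def digest_hex(data: bytes) -> str:
    """Return the SHA-256 hash value of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_input = digest_hex(b"a")            # 1-byte input
long_input = digest_hex(b"a" * 10_000)    # 10,000-byte input

# Both hash values have the same length: 64 hex characters (256 bits).
print(len(short_input), len(long_input))
```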

The consistent hashing algorithm was proposed in 1997 by Karger and others at the Massachusetts Institute of Technology while working on distributed caching, mainly to solve the hot spot problem on the Internet. The idea has since spread to other areas and has been developed considerably in practice.

1. Comparison with the classical hash method

  • Classical hashing: always assumes that the number of storage locations is known and fixed. Because the hash mapping depends on the number of nodes / storage locations, the hash of every key has to be recalculated whenever the cluster changes. Changing the size of the hash table (the number of servers) in effect disturbs all mappings.

  • Consistent hashing: a kind of virtual ring structure. The number of positions is not fixed; the ring can be thought of as having an infinite number of points, and server nodes can be placed at arbitrary positions on it. A change in the size of the hash table (the number of servers) affects only part of the requests (in proportion to how the ring is divided), namely those tied to the part of the ring that changed.

2. A plain-language understanding of the key point of consistent hashing:

Put in technical (and somewhat tongue-twisting) terms, the key point of consistent hashing is this: with a commonly used hash algorithm, the key is hashed into a space of 2^32 buckets, i.e. the number space from 0 to 2^32 - 1. We can join these numbers end to end and imagine them as a closed ring.
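A minimal sketch of how a key could be mapped to a position in that number space. Taking the first 4 bytes of a SHA-256 digest is an assumption made for this illustration, not something the article prescribes, and ring_position is a hypothetical helper name.

```python
import hashlib

RING_SIZE = 2 ** 32  # positions run from 0 to 2^32 - 1

def ring_position(key: str) -> int:
    """Map a key to a position on the ring (hash choice is illustrative)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # The first 4 bytes form a 32-bit unsigned integer, so the result
    # always lies in the range 0 .. 2^32 - 1.
    return int.from_bytes(digest[:4], "big")

for name in ("server-a", "server-b", "request-42"):
    print(name, ring_position(name))
```

Servers and requests are hashed into the same space, which is what allows a request to be compared with server positions on the ring.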

Put colloquially, the key point is this: when the servers are deployed, the servers' number space is already configured as a fixed, very large range, 0 to 2^32 - 1. A server can be assigned any number in that range. This fixes most of the rules of the cluster's algorithm (because the number space is an important parameter of the algorithm), so that in the face of expansion and other changes only part of the rules need adjusting. The concrete example later in this article explains this in detail.

3. How consistent hashing handles requests

How is it decided which server node handles which request?

In theory, each server node "owns" an interval of the hash ring, and every request that falls into that interval is handled by that same server node.

Assume the positions on the ring are in increasing order, so that traversing the ring clockwise corresponds to increasing addresses. Each request is then handled by the first server node encountered when walking clockwise from the request's position. In other words, the first server node whose address is higher than the request's address handles the request. If the request's address is above that of the highest-addressed node, the request is handled by the server node with the smallest address, because the ring is traversed circularly.
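The clockwise lookup can be sketched with a sorted list and a binary search. The node positions below are made-up numbers, find_node is a hypothetical helper, and treating a request that lands exactly on a node's position as belonging to that node is an assumption of this sketch.

```python
import bisect

def find_node(ring, request_pos):
    """Return the position of the node that handles request_pos.

    ring is a sorted list of node positions. The first node at or above
    the request position wins; past the last node we wrap to the first.
    """
    i = bisect.bisect_left(ring, request_pos)
    return ring[i % len(ring)]  # i == len(ring) wraps around to index 0

ring = [100, 2000, 50000]      # hypothetical node positions, sorted
print(find_node(ring, 150))    # 2000
print(find_node(ring, 60000))  # 100 (above the highest node, wraps around)
```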

4. Exception handling / dealing with changes

If a server node fails, the interval of the next server node widens, and any request that falls into the failed node's interval goes to that next node. How should these affected requests be dealt with?

This is where the advantage of consistent hashing shows: only this one interval (the one belonging to the failed server node) needs to be reassigned; the rest of the ring and its request/node assignments remain unaffected.
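A small sketch of the failure case, under the same assumptions as the lookup rule described earlier (sorted node positions, first node at or above the request wins). The positions and request values are made up for illustration.

```python
import bisect

def find_node(ring, request_pos):
    """First node at or above request_pos on a sorted ring, with wrap-around."""
    i = bisect.bisect_left(ring, request_pos)
    return ring[i % len(ring)]

before = [100, 2000, 50000]  # hypothetical node positions
after = [100, 50000]         # the node at position 2000 has failed

requests = [50, 150, 1999, 2001, 49999, 60000]
moved = [r for r in requests if find_node(before, r) != find_node(after, r)]

# Only requests from the failed node's interval (101..2000) change owner;
# they now go to the next node clockwise, at position 50000.
print(moved)  # [150, 1999]
```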

0x02. A concrete application scenario (illustrated with the classic Water Margin)

As everyone knows, Liangshan Marsh has four hotels on the mountain: the East Mountain Hotel, the West Mountain Hotel, the South Mountain Hotel, and the North Mountain Hotel.

So how should guests be allocated among these four hotels? A hash algorithm can be used here, and the example shows the benefits of consistent hashing.

1. Classical algorithm:

The four Liangshan hotels are numbered 1, 2, 3, 4 in order.

Hash function: divide the number of strokes in the guest's name by 4 and take the remainder; guests are assigned to the four hotels according to this remainder.

If one hotel is removed, the hash function becomes: number of strokes in the guest's name divided by 3, and guests are assigned to hotels according to the new remainder. The allocation of all guests has to be redone.

If one hotel is added, the hash function becomes: number of strokes in the guest's name divided by 5, and guests are assigned to hotels according to the new remainder. The allocation of all guests has to be redone.

As can be seen, whenever capacity changes, both the hash function and the allocation rule have to change completely, which disrupts the whole system.
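To get a feel for how disruptive this is, the sketch below counts how many guests end up with a different remainder when the divisor changes from 4 to 3. The guest population (one guest for each stroke count from 1 to 100) is invented for the illustration, and the remainder is used directly as a hotel slot index, which is an assumption since the article does not say how remainder 0 maps to a hotel number.

```python
def classical_hotel(strokes: int, hotel_count: int) -> int:
    """Classical rule: the remainder of strokes / hotel_count picks the slot."""
    return strokes % hotel_count

guests = range(1, 101)  # hypothetical guests with 1..100 name strokes

moved = sum(
    1 for s in guests
    if classical_hotel(s, 4) != classical_hotel(s, 3)
)
print(moved, "of", len(guests), "guests change hotels")  # 74 of 100
```

Every guest's slot has to be recomputed, and roughly three quarters of them actually move, even though only one hotel was removed.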

2. Consistent hashing algorithm:

Decide on the server number space in advance, covering both now and the future, for example 100 buckets. For the foreseeable future 100 is certainly enough (no matter how much Liangshan expands, even if it recruits 10,000 chieftains, the mountain could not possibly open 100 hotels).

Hash function (this is fixed):

Number of strokes in the guest's name / 100, taking the remainder. This is fixed! Because the number space of 100 is fixed, the hash function and the allocation rule are essentially fixed as well.

The hotel / guest allocation rule is as follows (this is what gets fine-tuned when capacity changes):

  • Hotel 1 is responsible for hash(x) -> 1 ~ 20, i.e. guests whose name-stroke count / 100 leaves a remainder between 1 and 20.

  • Hotel 2 is responsible for hash(x) -> 21 ~ 40, i.e. guests whose name-stroke count / 100 leaves a remainder between 21 and 40.

  • Hotel 3 is responsible for hash(x) -> 41 ~ 60, i.e. guests whose name-stroke count / 100 leaves a remainder between 41 and 60.

  • Hotel 4 is responsible for hash(x) -> 61 ~ 100, i.e. guests whose name-stroke count / 100 leaves a remainder between 61 and 100.

The rules for where guests stay are as follows (these are fixed):

  • Divide the number of strokes in the guest's name by 100 and take the remainder. The guest stays at the hotel corresponding to that remainder. For example, remainder 3 stays at hotel 1, remainder 22 stays at hotel 2, and so on.

  • If the corresponding hotel is closed, the guest goes to the smallest-numbered hotel among all the "hotels larger than the remainder", and so on. For example, if hotel 1 is down, go to hotel 2; if hotel 2 is down, go to hotel 3.

  • If the largest hotel is closed as well, wrap around to the smallest hotel. That is, if hotel 4 is down, go to hotel 1.
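The rules above can be sketched as follows. The names HOTEL_UPPER_BOUNDS, bucket_of and assign_hotel are hypothetical, and treating a remainder of 0 as bucket 100 is an assumption made so that the result matches the 1 ~ 100 ranges in the rules.

```python
# Upper end of each hotel's range in the fixed 1..100 number space.
HOTEL_UPPER_BOUNDS = {1: 20, 2: 40, 3: 60, 4: 100}

def bucket_of(strokes: int) -> int:
    """Remainder of strokes / 100, with remainder 0 treated as 100 (assumption)."""
    return (strokes - 1) % 100 + 1

def assign_hotel(bucket: int, open_hotels) -> int:
    """Pick the first open hotel whose range reaches the bucket."""
    for hotel in sorted(open_hotels):
        if bucket <= HOTEL_UPPER_BOUNDS[hotel]:
            return hotel
    # No open hotel above the bucket: wrap around to the smallest hotel.
    return min(open_hotels)

print(assign_hotel(bucket_of(3), {1, 2, 3, 4}))   # 1
print(assign_hotel(bucket_of(22), {1, 2, 3, 4}))  # 2
print(assign_hotel(bucket_of(3), {2, 3, 4}))      # 2 (hotel 1 is down)
print(assign_hotel(bucket_of(70), {1, 2, 3}))     # 1 (hotel 4 is down, wrap)
```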

Exception handling (scaling out or in):

  • Removing a hotel. If hotel 3 goes down, the guests who would have gone to hotel 3 go to hotel 4, and the guests originally assigned to hotel 4 still go to hotel 4. Only hotel 4 is affected; the guests of hotels 1 and 2 do not have to move.

  • Adding a hotel. If a hotel 5 is added, the hotel / guest allocation rule has to change: let hotel 4 be responsible for 61 to 80 and hotel 5 for 81 to 100. Only part of hotel 4's original guests then need to migrate to hotel 5.
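Both cases can be checked with a short sketch that compares the assignment of every bucket from 1 to 100 before and after the change. The ranges are taken from the rules above; the function names and the dictionary layout (hotel number mapped to the upper end of its range) are assumptions of this illustration.

```python
def assign_hotel(bucket, upper_bounds):
    """Return the hotel whose range is the first to reach the bucket."""
    for hotel in sorted(upper_bounds, key=upper_bounds.get):
        if bucket <= upper_bounds[hotel]:
            return hotel
    # Wrap around to the hotel with the lowest range.
    return min(upper_bounds, key=upper_bounds.get)

def moved_buckets(old, new):
    """Buckets (1..100) whose hotel differs between two rule sets."""
    return [b for b in range(1, 101)
            if assign_hotel(b, old) != assign_hotel(b, new)]

before = {1: 20, 2: 40, 3: 60, 4: 100}
after_remove = {1: 20, 2: 40, 4: 100}           # hotel 3 is down
after_add = {1: 20, 2: 40, 3: 60, 4: 80, 5: 100}  # hotel 5 is added

removed = moved_buckets(before, after_remove)
added = moved_buckets(before, after_add)
print(removed[0], removed[-1], len(removed))  # 41 60 20
print(added[0], added[-1], len(added))        # 81 100 20
```

In both cases only 20 of the 100 buckets change hotels, and the hash function itself is untouched.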

Key point:

As can be seen, the key is that the server number space is fixed in advance as a large number, here 100, that will not be modified. Of course, that is for Liangshan; in other situations 2^32 may be the appropriate size. This way the hash function can stay the same (because the number space is an important parameter of the algorithm), and only the "allocation rule" needs appropriate fine-tuning according to the actual system capacity. The impact on the overall system is therefore small.

Of course, the specific hotel allocation rule can also be folded into the hash itself, in which case the hotels' own numbers would be something like 21, 41, 61, ...

Origin www.cnblogs.com/rossiXYZ/p/12147226.html