The Basic Principle of the Consistent Hashing Algorithm

Engineers often use server clusters to implement data caching. The following is a common strategy:

1. For any add, delete, or query operation, the data's id is first converted into a hash value by a hash function; this value is referred to as the key.

2. If there are currently N machines, key % N is calculated; the result is the number of the machine the data belongs to. Whether the operation is an add, a delete, or a query, it is performed only on that machine.
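A minimal sketch of this strategy in Python (the `machine_for` helper, the md5 hash choice, and the example id are illustrative assumptions, not from the original):

```python
import hashlib

def machine_for(data_id: str, n_machines: int) -> int:
    """Route a data id to a machine number using ordinary modulo hashing."""
    # Hash the id to a stable integer key (md5 chosen here for illustration).
    key = int(hashlib.md5(data_id.encode()).hexdigest(), 16)
    return key % n_machines

# All add/delete/query operations for "user:42" go to this one machine.
print(machine_for("user:42", 3))
```

Because the hash is deterministic, every operation on the same id always lands on the same machine.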

Let us analyze the problems this cache strategy may bring, and then propose an improved scheme.


Ordinary Hash Algorithm

[Figure: ordinary hashing, where hash(id) % N routes each piece of data to one of N machines]

The potential problem with this cache strategy is that adding or removing machines (changing N) is very costly: every data id's hash value must be taken modulo the new machine count, so nearly all data has to be recalculated and migrated on a large scale. The consistent hashing algorithm was introduced to address this. Assume the hash function converts a data id into a value in the range 0 to 2^32 - 1, i.e. a digital space of size 2^32.
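The cost of changing N can be measured directly. In this sketch (the id set and md5 hash choice are illustrative assumptions), going from 3 machines to 4 relocates roughly three quarters of the keys, since a key stays put only when key % 3 == key % 4:

```python
import hashlib

def machine_for(data_id: str, n: int) -> int:
    # Ordinary modulo hashing, as in the strategy above.
    key = int(hashlib.md5(data_id.encode()).hexdigest(), 16)
    return key % n

# Count how many of 10,000 keys change machines when N goes from 3 to 4.
ids = [f"data-{i}" for i in range(10_000)]
moved = sum(machine_for(i, 3) != machine_for(i, 4) for i in ids)
print(f"{moved / len(ids):.0%} of keys must migrate")  # roughly 75%
```

A key survives only when key % 12 is 0, 1, or 2 (the residues where mod 3 and mod 4 agree), which is why about 3/4 of the data must move.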

Imagine these numbers connected end to end to form a closed ring. A data id, after its hash value is calculated, corresponds to one position on the ring. Next, imagine three machines placed on the same ring, with each machine's position determined by the hash value of its machine id. How, then, is the machine a piece of data belongs to determined? First, the data's id is hashed and mapped to the corresponding position on the ring; then, moving clockwise from that position, the nearest machine found is the one the data belongs to. For example, in the figure there is a piece of data m whose calculated hash value maps to a ring position such that its home is machine 2.
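The clockwise lookup can be sketched with a sorted list of machine positions and a binary search (the machine names and the md5-based ring hash are assumptions for illustration):

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32

def h(s: str) -> int:
    """Map a string to a position on the 0 .. 2^32 - 1 ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

# Three machines placed on the ring by the hash of their ids.
machines = {name: h(name) for name in ("machine-1", "machine-2", "machine-3")}
positions = sorted(machines.values())
by_pos = {pos: name for name, pos in machines.items()}

def owner(data_id: str) -> str:
    """Walk clockwise from the data's ring position to the nearest machine."""
    pos = h(data_id)
    idx = bisect.bisect_right(positions, pos) % len(positions)  # wrap past 2^32
    return by_pos[positions[idx]]

print(owner("m"))  # one of the three machines
```

`bisect_right` finds the first machine position clockwise of the data, and the modulo wraps the search back to the start of the sorted list when the data hashes past the last machine.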

[Figure: hash ring with three machines; data m is assigned clockwise to machine 2]

The biggest flaw of the ordinary modulo-based hash algorithm is that after a machine is added or removed, a large number of objects' storage locations become invalid, so the scheme badly violates monotonicity. The following analyzes how the consistent hashing algorithm handles this.

Consistent Hashing Algorithm
1. Node (machine) removal
In the distributed example above, if Node2 (machine 2) fails and is removed, then by the clockwise-migration rule all data whose hash values fall in the red segment are moved to Node3 (machine 3); only the mappings in the red segment change, and no other objects are affected. As shown below:

[Figure: Node2 removed; data in the red segment migrates clockwise to Node3]


2. Node (machine) addition
If a new node, Node4, is added to the cluster, its hash value KEY4 is obtained by the hash algorithm and mapped onto the ring, as shown below:

[Figure: Node4 inserted on the ring at position KEY4]

Following the clockwise-migration rule, data whose hash values fall in the red segment are moved to Node4, while all other objects keep their original storage locations. From this analysis of node additions and removals, the consistent hashing algorithm keeps data migration to a minimum while maintaining monotonicity. Such an algorithm is very well suited to distributed clusters: it avoids large-scale data migration and reduces the pressure on the servers.
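The add/remove behavior can be demonstrated with a small ring class (the node names, key set, and md5 hash are illustrative assumptions). After adding Node4, every relocated key lands on Node4 and nothing else moves:

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Ring hash: map a string into the 0 .. 2^32 - 1 space.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

class HashRing:
    def __init__(self, nodes):
        self.nodes = {}      # ring position -> node name
        self.positions = []  # sorted ring positions
        for n in nodes:
            self.add(n)

    def add(self, node: str) -> None:
        pos = h(node)
        self.nodes[pos] = node
        bisect.insort(self.positions, pos)

    def remove(self, node: str) -> None:
        pos = h(node)
        del self.nodes[pos]
        self.positions.remove(pos)

    def owner(self, key: str) -> str:
        # Clockwise search with wrap-around.
        idx = bisect.bisect_right(self.positions, h(key)) % len(self.positions)
        return self.nodes[self.positions[idx]]

ids = [f"data-{i}" for i in range(10_000)]
ring = HashRing(["Node1", "Node2", "Node3"])
before = {i: ring.owner(i) for i in ids}
ring.add("Node4")  # only keys in Node4's new segment change owner
moved = sum(before[i] != ring.owner(i) for i in ids)
print(f"{moved / len(ids):.0%} of keys moved")
```

The fraction moved is exactly the share of the ring that Node4 takes over from its clockwise successor; all other keys keep their original owners.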

Consistent Hashing Algorithm Optimization
The consistent hashing scheme above actually still has a big problem. We say that when a hash function's inputs form a large sample, its outputs are evenly distributed over the output range; but with only three inputs it is hard to guarantee a uniform distribution, so a layout like the one in the figure can arise, leading to very uneven load.

[Figure: three nodes clustered on one side of the ring, leaving the load unbalanced]

To optimize consistent hashing further, a virtual node mechanism is introduced: multiple ring nodes, called virtual nodes, are generated for each machine. In practice this can be done by appending a sequence number or port number to the machine's ip or host name. Suppose each machine has 1000 virtual nodes; then three machines have 3000 nodes in total, and 3000 nodes mapped onto the hash domain are distributed relatively uniformly.
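A sketch of the virtual-node mechanism, appending a sequence number to each machine name. The count of 1000 per machine follows the text; the machine names and md5 hash choice are assumptions:

```python
import bisect
import hashlib
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

V = 1000  # virtual nodes per machine, as in the text
machines = ["machine-1", "machine-2", "machine-3"]

# Each machine contributes V ring positions: "machine-1#0" .. "machine-1#999".
ring = {h(f"{m}#{i}"): m for m in machines for i in range(V)}
positions = sorted(ring)

def owner(key: str) -> str:
    idx = bisect.bisect_right(positions, h(key)) % len(positions)
    return ring[positions[idx]]

# With 3000 virtual nodes the three machines receive roughly equal shares.
load = Counter(owner(f"data-{i}") for i in range(30_000))
print(load)
```

Lookup is unchanged; only the ring now holds 3000 positions, and each physical machine's load is the sum of many small, scattered segments rather than one large arc.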

[Figure: 3000 virtual nodes spread each machine's load evenly around the ring]

Origin www.cnblogs.com/clarencezzh/p/11703747.html