Consistent hash algorithm


In a distributed cache system, data must be spread evenly across the machines in the cache server cluster. A common approach is to compute a hash value from the key of the cached data and take that hash value modulo the number of server nodes; the result determines which server node stores the data. This algorithm is very simple and distributes data evenly, but when nodes are added or removed, most keys map to a different node, so almost all cached data is effectively invalidated.


Traditional modulo

For example, with 10 pieces of data (0 to 9) and 3 nodes, the modulo method (key mod 3) gives:

  • node a: 0,3,6,9
  • node b: 1,4,7
  • node c: 2,5,8

When a fourth node, d, is added (so keys are now taken mod 4), the distribution becomes:

  • node a: 0,4,8
  • node b: 1,5,9
  • node c: 2,6
  • node d: 3,7

Summary: data 3, 4, 5, 6, 7, 8 and 9 all have to be relocated when a single node is added, i.e. 7 out of 10 pieces of data, which is far too costly.
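The relocation count above can be checked with a short Python sketch. It assumes, as the example does, that each key's hash value is simply the key itself; `mod_placement` is a hypothetical helper name.

```python
def mod_placement(keys, nodes):
    """Map each key to a node with hash(key) mod N; here the key itself serves as its hash."""
    return {k: nodes[k % len(nodes)] for k in keys}

keys = range(10)
before = mod_placement(keys, ["a", "b", "c"])       # 3 nodes: key mod 3
after = mod_placement(keys, ["a", "b", "c", "d"])   # 4 nodes: key mod 4

# Keys whose node changed must be migrated.
moved = [k for k in keys if before[k] != after[k]]
print("moved:", moved)  # moved: [3, 4, 5, 6, 7, 8, 9]
```

Only keys 0, 1 and 2 stay where they were; changing N changes the result of the modulo for nearly every key.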

Consistent hashing

The key difference is that both the nodes and the data are hashed into the same hash space, and each piece of data is stored on the first node whose hash value is greater than or equal to the data's hash value, i.e. the nearest node going clockwise around the ring. This ensures that when nodes are added or removed, as little data as possible is affected. Take the same example again (the hash values below are computed from the ASCII codes of simple strings, for illustration only):

Hash values of the ten pieces of data:

  • 0:192
  • 1:196
  • 2:200
  • 3:204
  • 4:208
  • 5:212
  • 6:216
  • 7:220
  • 8:224
  • 9:228

Hash values of the three nodes:

  • node a: 203
  • node g: 209
  • node z: 228

Each piece of data is then assigned to the first node whose hash value is greater than or equal to the data's hash value. A data hash greater than 228 (the largest node hash) wraps around to the smallest node hash, 203 (node a), so the whole hash space behaves like a ring. The resulting mapping:

  • node a: 0,1,2
  • node g: 3,4
  • node z: 5,6,7,8,9
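A minimal Python sketch of this lookup, assuming the illustrative hash values listed above rather than a real hash function. Node positions are kept in a sorted list, and `bisect_left` finds the first node whose hash is greater than or equal to the data's hash; running off the end wraps around to the start of the list, which is what makes the hash space a ring. The helper names `build_ring` and `lookup` are hypothetical.

```python
import bisect

# Hash values from the example (illustrative, not produced by a real hash function).
data_hashes = {k: 192 + 4 * k for k in range(10)}   # 0:192, 1:196, ..., 9:228
node_hashes = {"a": 203, "g": 209, "z": 228}

def build_ring(node_hashes):
    """Return (hash, node) pairs sorted by hash, i.e. the ring in clockwise order."""
    return sorted((h, name) for name, h in node_hashes.items())

def lookup(ring, h):
    """First node whose hash is >= h; past the largest node, wrap to the smallest."""
    positions = [pos for pos, _ in ring]
    i = bisect.bisect_left(positions, h)
    return ring[i % len(ring)][1]

ring = build_ring(node_hashes)
mapping = {k: lookup(ring, h) for k, h in data_hashes.items()}
for node in ("a", "g", "z"):
    print(node, [k for k, n in mapping.items() if n == node])
# a [0, 1, 2]
# g [3, 4]
# z [5, 6, 7, 8, 9]
```

Because the node positions are sorted, each lookup is a binary search, O(log N) in the number of nodes.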

Now add a node n and compute its hash value:

  • node n: 216

The mapping after adding node n:

  • node a: 0,1,2
  • node g: 3,4
  • node n: 5,6
  • node z: 7,8,9

Only data 5 and 6 need to be migrated (from node z to node n); all other data stays on its original node.
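The migration can be checked with a small Python sketch, again using the illustrative hash values from the example. It computes each key's owner before and after inserting node n (hash 216) and keeps only the keys whose owner changed; the helper name `owner` is hypothetical.

```python
import bisect

data_hashes = {k: 192 + 4 * k for k in range(10)}  # illustrative hashes from the example

def owner(node_hashes, h):
    """First node clockwise from h: smallest node hash >= h, wrapping around the ring."""
    ring = sorted((pos, name) for name, pos in node_hashes.items())
    i = bisect.bisect_left([pos for pos, _ in ring], h)
    return ring[i % len(ring)][1]

before_nodes = {"a": 203, "g": 209, "z": 228}
after_nodes = {**before_nodes, "n": 216}  # add node n

# key -> (old owner, new owner), only for keys that must be migrated
moved = {k: (owner(before_nodes, h), owner(after_nodes, h))
         for k, h in data_hashes.items()
         if owner(before_nodes, h) != owner(after_nodes, h)}
print(moved)  # {5: ('z', 'n'), 6: ('z', 'n')}
```

Only the keys in the arc between node g (209) and the new node n (216) change owner, and they all come from n's clockwise neighbour z.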

In addition, with only three node hash values on the ring, the data is easily distributed unevenly when compared against them (in the example above, node z holds half of the data). This is why virtual nodes are introduced: each node computes n hash values, for example by appending an ID suffix to the node name or by some other scheme, and all of these points are placed on the hash ring, where the data's hash values are compared against them. Each virtual point maps back to its physical node.
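A sketch of a ring with virtual nodes, under several assumptions not in the original: MD5 truncated to 64 bits serves as the hash function, virtual node i of a node named `a` is hashed as the string `a#i`, and the class name `VirtualNodeRing` and the default of 100 virtual nodes per physical node are illustrative choices. Each virtual point stores its physical node, so a lookup still returns a real server.

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(s: str) -> int:
    """Map a string to a 64-bit ring position (MD5 chosen for illustration only)."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class VirtualNodeRing:
    """Consistent hash ring where each physical node owns `replicas` virtual points."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted list of (position, physical node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Hypothetical naming scheme: virtual node i of "a" is hashed as "a#i".
        for i in range(self.replicas):
            bisect.insort(self._points, (ring_hash(f"{node}#{i}"), node))

    def lookup(self, key) -> str:
        """Physical node of the first virtual point whose position is >= hash(key)."""
        h = ring_hash(str(key))
        # (h, "") sorts before any (h, name), so this finds the first position >= h.
        i = bisect.bisect_left(self._points, (h, ""))
        return self._points[i % len(self._points)][1]

for replicas in (1, 100):
    ring = VirtualNodeRing(["a", "g", "z"], replicas=replicas)
    load = Counter(ring.lookup(k) for k in range(10_000))
    print(replicas, dict(sorted(load.items())))
```

The loop prints how many of 10,000 keys each node receives with 1 and with 100 virtual nodes per physical node. With more virtual nodes the counts are typically much closer to one third each, although the exact numbers depend on the hash function.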

Using this algorithm for data distribution greatly reduces the amount of data that must be migrated when nodes are added or removed: only the data in the arc of the ring claimed or released by the changed node moves.
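To put a number on the reduction, the following sketch compares modulo placement with a consistent hash ring (100 virtual nodes per node) when a fourth node is added to three. MD5 truncated to 64 bits, the `node#i` naming of virtual nodes and all helper names are assumptions made for illustration. For uniformly distributed hashes, going from mod 3 to mod 4 keeps a key in place only when its hash mod 12 is 0, 1 or 2, so roughly 75% of keys move; with the ring, roughly a quarter should move, and every one of them lands on the new node.

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    # MD5 truncated to 64 bits: an illustrative choice, not required by the algorithm
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def build_ring(nodes, replicas=100):
    """Sorted (position, node) pairs, with `replicas` virtual points per node ("node#i")."""
    return sorted((ring_hash(f"{n}#{i}"), n) for n in nodes for i in range(replicas))

def ring_owner(ring, key):
    """Node of the first ring point whose position is >= hash(key), wrapping around."""
    i = bisect.bisect_left(ring, (ring_hash(str(key)), ""))
    return ring[i % len(ring)][1]

def mod_owner(nodes, key):
    return nodes[ring_hash(str(key)) % len(nodes)]

keys = range(10_000)
old_nodes, new_nodes = ["a", "g", "z"], ["a", "g", "z", "n"]
old_ring, new_ring = build_ring(old_nodes), build_ring(new_nodes)

mod_moved = sum(mod_owner(old_nodes, k) != mod_owner(new_nodes, k) for k in keys)
ring_moved = [k for k in keys if ring_owner(old_ring, k) != ring_owner(new_ring, k)]

print(f"modulo: {mod_moved / len(keys):.0%} of keys moved")
print(f"consistent hashing: {len(ring_moved) / len(keys):.0%} of keys moved")
# Every key that moved under consistent hashing moved onto the new node "n".
assert all(ring_owner(new_ring, k) == "n" for k in ring_moved)
```

The final assertion holds by construction: adding a node only inserts new points, so a key's nearest clockwise point either stays the same or becomes one of the new node's points.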

Virtual nodes

Because server nodes are placed on the ring according to their hash values, they are sometimes clustered unevenly, which leads to uneven data distribution. Adding virtual nodes greatly increases the number of points on the ring, so they are scattered more evenly around it.



Origin http://43.154.161.224:23101/article/api/json?id=325948609&siteId=291194637