Understanding consistent hashing in 10 minutes

Scenarios

Once a table grows past roughly 5 million rows, we start to consider splitting it across multiple databases and tables (sharding). Likewise, when a single cache server can no longer meet demand, we add more cache servers. The question then is how to decide which database table or cache server a given request should go to. Looping over all of them or picking one at random is not an option; instead, every access applies the same hash algorithm to locate the specific target.

Simple hash algorithm

We can take a field such as id modulo the number of shards and use the result to spread the data across different databases or tables.

For example, suppose up-front planning shows that five databases are enough for one business's data. Records are then assigned by id modulo 5, as in the figure below.

Routing to the right database by hash modulo is easy, but this simple algorithm has a drawback. If five databases are no longer enough and we need nine, the formula changes from mod 5 to mod 9, most of the data has to be redistributed, and the migration workload is huge. Is there a way to avoid this once and for all? There is: consistent hashing.
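The cost of changing the modulus can be checked with a short sketch. It counts how many ids land on a different shard when the formula changes from mod 5 to mod 9. The class name, method name, and the id range are assumptions made for illustration. Because an id keeps its shard only when its remainders modulo 5 and modulo 9 are equal, 5 of every 45 consecutive ids stay put and the other 40 (about 89%) move.

```csharp
using System;

public static class ModuloResharding
{
    // Counts the ids in [1, total] whose shard changes when the modulus changes.
    public static int CountMoved(int total, int oldShards, int newShards)
    {
        int moved = 0;
        for (int id = 1; id <= total; id++)
        {
            if (id % oldShards != id % newShards)
                moved++;
        }
        return moved;
    }

    public static void Main()
    {
        int total = 45000; // hypothetical number of rows
        int moved = CountMoved(total, 5, 9);
        Console.WriteLine($"{moved} of {total} ids change shard ({100.0 * moved / total:F1}%)");
    }
}
```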

Consistent hashing algorithm

Algorithm Overview

Consistent hashing was proposed in 1997 by Karger and his collaborators at MIT, in the paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". In brief, consistent hashing organizes the entire hash value space into a virtual ring. Suppose the value space of a hash function H is 0 to 2^32-1 (that is, the hash value is a 32-bit unsigned integer); the whole hash space then forms the following ring:

Each server is hashed by its own identifier (IP address or host name) to determine its position on the hash ring. For example, the IPs 192.168.4.101, 192.168.4.102, and 192.168.4.103 correspond to the nodes node1-101, node2-102, and node3-103 in the figure.
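As a minimal sketch of how a server identifier could be turned into a ring position, the code below takes the first four bytes of an MD5 digest as a 32-bit unsigned integer. The choice of MD5 and little-endian byte order mirrors the core code at the end of this article; the class and method names are assumptions.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class RingPosition
{
    // Maps a string to a point in [0, 2^32 - 1] using the first four MD5 bytes
    // (little-endian). Any well-mixed 32-bit hash would serve the same purpose.
    public static uint Of(string value)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(value));
            return (uint)digest[0]
                 | ((uint)digest[1] << 8)
                 | ((uint)digest[2] << 16)
                 | ((uint)digest[3] << 24);
        }
    }

    public static void Main()
    {
        string[] servers = { "192.168.4.101", "192.168.4.102", "192.168.4.103" };
        foreach (string ip in servers)
            Console.WriteLine($"{ip} -> {Of(ip)}");
    }
}
```

The same function is then applied to data keys, so servers and keys share one coordinate space.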

 

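The same function is applied to a data key to compute a hash value h, which determines the data's position on the ring. From that position, "walk" clockwise around the ring; the first server encountered is the one the data belongs to. For example, take the five data objects "10", "11", "12", "13", and "14", with keys key10, key11, key12, key13, and key14. After hashing, their positions on the ring are as follows: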

 

According to the consistent hashing algorithm, key10 and key14 are located on node3-103, key12 and key13 on node1-101, and key11 on node2-102.
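The clockwise walk can be sketched as a binary search over the sorted node positions, wrapping to the first node when the key lies past the last one. The ring positions below are hypothetical numbers picked so that the result reproduces the mapping described above; they are not real hash values, and the class and method names are assumptions.

```csharp
using System;

public static class RingLookup
{
    // positions must be sorted ascending; names[i] is the node at positions[i].
    public static string Locate(uint[] positions, string[] names, uint keyPosition)
    {
        int index = Array.BinarySearch(positions, keyPosition);
        if (index < 0)
            index = ~index;          // first node clockwise from the key
        if (index == positions.Length)
            index = 0;               // past the last node: wrap around the ring
        return names[index];
    }

    public static void Main()
    {
        // Hypothetical ring positions chosen for illustration only.
        uint[] positions = { 1000, 2000, 3000 };
        string[] names = { "node1-101", "node2-102", "node3-103" };

        string[] keys = { "key10", "key11", "key12", "key13", "key14" };
        uint[] keyPositions = { 2200, 1500, 400, 800, 2700 };

        for (int i = 0; i < keys.Length; i++)
            Console.WriteLine($"{keys[i]} -> {Locate(positions, names, keyPositions[i])}");
    }
}
```

A key that hashes exactly onto a node's position is assigned to that node, which matches the ContainsKey check in the core code at the end of the article.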

Scalability
Adding a node

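Suppose we add a node node4-104 with IP 192.168.4.104. Its hash value is computed with the same hash algorithm and mapped onto the ring, as shown below:

Following the clockwise rule, key10 moves to node4-104, while all other data stays where it was.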

Removing a node

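If node3-103 is removed, then by the same clockwise rule key10 and key14 move to node1-101, and no other object is affected, as shown below:

When there are too few server nodes, data can be distributed unevenly; in the extreme case, all data lands on node1-101. Solving this data skew problem requires virtual nodes.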
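Both cases, adding node4-104 and removing node3-103, can be replayed in a small sketch that reports which keys change owner. As before, the ring positions are hypothetical values chosen to match the example, and all class and method names are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class RingChanges
{
    // Returns the first node at or after keyPosition, wrapping to the first node.
    public static string Locate(SortedList<uint, string> ring, uint keyPosition)
    {
        foreach (KeyValuePair<uint, string> node in ring)
        {
            if (node.Key >= keyPosition)
                return node.Value;
        }
        return ring.Values[0];
    }

    // Lists the keys whose owner differs between the two rings, sorted by name.
    public static List<string> MovedKeys(SortedList<uint, string> before,
                                         SortedList<uint, string> after,
                                         Dictionary<string, uint> keys)
    {
        var moved = new List<string>();
        foreach (KeyValuePair<string, uint> key in keys)
        {
            if (Locate(before, key.Value) != Locate(after, key.Value))
                moved.Add(key.Key);
        }
        moved.Sort();
        return moved;
    }

    // Hypothetical positions for the three original nodes.
    public static SortedList<uint, string> BaseRing()
    {
        return new SortedList<uint, string>
        {
            { 1000, "node1-101" }, { 2000, "node2-102" }, { 3000, "node3-103" }
        };
    }

    // Hypothetical positions for the five sample keys.
    public static Dictionary<string, uint> SampleKeys()
    {
        return new Dictionary<string, uint>
        {
            { "key10", 2200 }, { "key11", 1500 }, { "key12", 400 },
            { "key13", 800 }, { "key14", 2700 }
        };
    }

    public static void Main()
    {
        Dictionary<string, uint> keys = SampleKeys();

        SortedList<uint, string> withNode4 = BaseRing();
        withNode4.Add(2500, "node4-104"); // assumed position between key10 and key14
        Console.WriteLine("add node4-104: " + string.Join(", ", MovedKeys(BaseRing(), withNode4, keys)));

        SortedList<uint, string> withoutNode3 = BaseRing();
        withoutNode3.Remove(3000);
        Console.WriteLine("remove node3-103: " + string.Join(", ", MovedKeys(BaseRing(), withoutNode3, keys)));
    }
}
```

In both cases only the keys in the arc next to the changed node move, which is the property that plain modulo hashing lacks.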

Virtual nodes

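With few nodes on a ring that spans 0 to 2^32-1, the amount of data stored on each node can be uneven. Consistent hashing addresses this with virtual nodes. A virtual node is a replica of a real node (physical machine) on the hash ring. One real node corresponds to N virtual nodes; this count is also called the replica count. Virtual nodes are ordered on the ring by their hash values.

For example, take the ring above after one node has been removed, leaving only node1 and node2. Adding four virtual nodes for each of the two nodes gives eight nodes on the ring; the final mapping is shown in the figure.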
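To see the effect of virtual nodes, the following sketch builds a ring for two physical nodes with a configurable replica count and counts how many sample keys each physical node receives. The "node#i" naming of virtual nodes, the MD5-based position function, and the generated key names are assumptions for illustration; the exact counts depend on the hash values, so no output is quoted here.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class VirtualNodeDemo
{
    // First four MD5 bytes as a 32-bit ring position (an assumed hash choice).
    public static uint Position(string value)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] d = md5.ComputeHash(Encoding.UTF8.GetBytes(value));
            return (uint)d[0] | ((uint)d[1] << 8) | ((uint)d[2] << 16) | ((uint)d[3] << 24);
        }
    }

    // Places `replicas` virtual nodes per physical node; each maps back to its owner.
    public static SortedList<uint, string> BuildRing(IEnumerable<string> nodes, int replicas)
    {
        var ring = new SortedList<uint, string>();
        foreach (string node in nodes)
        {
            for (int i = 0; i < replicas; i++)
                ring[Position(node + "#" + i)] = node;
        }
        return ring;
    }

    // Binary search for the first virtual node at or after keyPosition, with wrap-around.
    public static string Locate(SortedList<uint, string> ring, uint keyPosition)
    {
        IList<uint> positions = ring.Keys;
        int lo = 0, hi = positions.Count;
        while (lo < hi)
        {
            int mid = (lo + hi) / 2;
            if (positions[mid] < keyPosition) lo = mid + 1; else hi = mid;
        }
        return ring.Values[lo == positions.Count ? 0 : lo];
    }

    // Counts how many of the keys "key0".."key{keyCount-1}" each physical node owns.
    public static Dictionary<string, int> CountKeys(SortedList<uint, string> ring, int keyCount)
    {
        var counts = new Dictionary<string, int>();
        foreach (string node in ring.Values)
            counts[node] = 0;
        for (int i = 0; i < keyCount; i++)
            counts[Locate(ring, Position("key" + i))]++;
        return counts;
    }

    public static void Main()
    {
        string[] nodes = { "node1-101", "node2-102" };
        foreach (int replicas in new[] { 1, 4, 160 })
        {
            Dictionary<string, int> counts = CountKeys(BuildRing(nodes, replicas), 10000);
            Console.WriteLine($"replicas={replicas}: node1-101={counts["node1-101"]}, node2-102={counts["node2-102"]}");
        }
    }
}
```

In general, the more virtual nodes each physical node has, the closer the split tends to get to an even share, at the cost of a larger ring to search.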

Core code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public class KetamaNodeLocator
{
    private SortedList<long, string> ketamaNodes = new SortedList<long, string>();
    private int numReps = 160;

    public KetamaNodeLocator(List<string> nodes, int nodeCopies)
    {
        ketamaNodes = new SortedList<long, string>();
        numReps = nodeCopies;

        // Generate nodeCopies virtual nodes for every physical node.
        foreach (string node in nodes)
        {
            // Virtual nodes are created in groups of four.
            for (int i = 0; i < numReps / 4; i++)
            {
                // node + i gives each group a unique name to hash.
                byte[] digest = HashAlgorithm.computeMd5(node + i);

                // An MD5 digest is 16 bytes long. Each run of four bytes yields
                // one virtual node, which is why they come in groups of four.
                for (int h = 0; h < 4; h++)
                {
                    long m = HashAlgorithm.hash(digest, h);
                    ketamaNodes[m] = node;
                }
            }
        }
    }

    public string GetPrimary(string k)
    {
        byte[] digest = HashAlgorithm.computeMd5(k);
        return GetNodeForKey(HashAlgorithm.hash(digest, 0));
    }

    string GetNodeForKey(long hash)
    {
        long key = hash;

        // If a node sits exactly at this hash, use it directly.
        if (!ketamaNodes.ContainsKey(key))
        {
            // Otherwise take the first position greater than the hash, which is
            // the nearest node clockwise. If there is none, wrap to the first node.
            // For details see: http://www.javaeye.com/topic/684087
            var tailMap = ketamaNodes.Keys.Where(position => position > hash).ToList();
            if (tailMap.Count == 0)
                key = ketamaNodes.FirstOrDefault().Key;
            else
                key = tailMap[0];
        }

        return ketamaNodes[key];
    }
}

public class HashAlgorithm
{
    // Builds a 32-bit value from the four digest bytes starting at nTime * 4 (little-endian).
    public static long hash(byte[] digest, int nTime)
    {
        long rv = ((long)(digest[3 + nTime * 4] & 0xFF) << 24)
                | ((long)(digest[2 + nTime * 4] & 0xFF) << 16)
                | ((long)(digest[1 + nTime * 4] & 0xFF) << 8)
                | ((long)digest[0 + nTime * 4] & 0xFF);
        return rv & 0xffffffffL; // Truncate to 32 bits
    }

    // Get the MD5 digest of the given key.
    public static byte[] computeMd5(string k)
    {
        using (MD5 md5 = MD5.Create())
        {
            return md5.ComputeHash(Encoding.UTF8.GetBytes(k));
        }
    }
}

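The implementation code is posted above; run it yourself to deepen your understanding. I hope it helps. Writing takes effort, so your support is appreciated.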


Origin www.cnblogs.com/chengtian/p/11304403.html