Consistent hash algorithm and its use in ShardedJedis

A few days ago I was looking at Redis Cluster, the server-side clustering solution supported since Redis 3.0. However, there are also mature client-side Redis clustering solutions. Their idea is to hash both the Redis nodes and the access keys with a consistent hash algorithm, so the client can work out which node holds the data for a given key. Let's first look at the consistent hash algorithm itself.
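This client-side approach is what the Jedis client exposes through ShardedJedis (in practice usually obtained from a ShardedJedisPool). Below is a minimal sketch against the Jedis 2.x API; the shard addresses are placeholders, not values from any real deployment.

package com.cw.demo.yizhixing.hash;

import java.util.Arrays;
import java.util.List;

import redis.clients.jedis.JedisShardInfo;
import redis.clients.jedis.ShardedJedis;

public class ShardedJedisSketch {
    public static void main(String[] args) {
        // Placeholder shard addresses; replace with real Redis nodes
        List<JedisShardInfo> shards = Arrays.asList(
                new JedisShardInfo("192.168.0.1", 6379),
                new JedisShardInfo("192.168.0.2", 6379));

        // ShardedJedis places the shards on a hash ring and routes each key by its hash
        ShardedJedis jedis = new ShardedJedis(shards);
        jedis.set("a", "A");                // stored on whichever shard "a" hashes to
        System.out.println(jedis.get("a")); // the same key always routes back to that shard
        jedis.disconnect();
    }
}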

Usage scenario

Suppose we have 100 Redis data servers. When a piece of data with key 101 arrives, its target server is computed with the formula hash(i) % 100. Assuming hash(i) = i, the data is hashed to server number 1. Now a new server is added, so the formula becomes hash(i) % 101. When data 101 is requested again, it is routed to server 0, but the data actually lives on server 1.

So at this point a large amount of data becomes invalid (it can no longer be found).
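To see the scale of the problem, a quick standalone check (assuming hash(i) = i as above) can count how many keys land on a different server when the server count grows from 100 to 101:

package com.cw.demo.yizhixing.hash;

public class ModuloRehashDemo {
    public static void main(String[] args) {
        int before = 100;   // server count before adding a node
        int after = 101;    // server count after adding a node
        int moved = 0;
        int total = 10000;
        for (int key = 0; key < total; key++) {
            // with hash(i) = i, the target server is simply key % serverCount
            if (key % before != key % after) {
                moved++;    // this key would now be looked up on the wrong server
            }
        }
        System.out.println(moved + " of " + total + " keys map to a different server");
    }
}

Almost every key ends up on a different server, whereas with consistent hashing only the keys that belonged to one segment of the ring need to move.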

Now suppose a new server is added. With persistent storage we could rehash all the data across the cluster, migrate it, and then resume serving, but that means every time a server is added or removed the cluster needs a lot of communication and data migration, which is very expensive. If it is only a cache, then most of the cache simply becomes useless. So what can be done?

The key problem is that when the number of servers changes, existing keys should still resolve to the servers where their data already lives, while new keys are placed according to the new layout, with as few keys as possible changing servers.

As shown in the figure above, we have four servers A, B, C, and D, spread around a ring covering 0 ~ 2^32. For example, the range 0 ~ 2^30 is stored on server A, 2^30 + 1 ~ 2^31 on server B, and C and D split the remaining range in the same way. The hash space is 0 ~ 2^32: when a piece of data arrives, we take its hash modulo 2^32 to get a value K1, and the position of K1 on the ring determines which server it is assigned to. In the figure, K1 is assigned to server B. Now suppose server B fails.

We can see that if B fails and storage is persistent, data recovery is needed: B's data can be migrated to C, while data originally hashed to A and D needs no change at all. Similarly, if a new server is added, only part of one existing server's data has to be migrated to the new node.

In short, consistent hashing means that when a cache node is removed or added, the existing key-to-node mapping changes as little as possible, satisfying the monotonicity requirement as far as possible.

--------------------------------------------------------------------------------- Code

Code implementation: put each node's hash value and node information (such as the node's IP) into an ordered map (a TreeMap in this example). For the key being operated on, find the first node whose hash is greater than the key's hash; if there is none, wrap around to the first node (simulating the ring).

 

package com.cw.demo.yizhixing.hash;

import java.util.HashMap;

/**
 * server node
 * Created by chenwei01 on 2017/1/18.
 */ 
public class ServerNode {
    // node hash value (position on the ring)
    private Integer index;
    // node address
    private String serverAddr;
    // simulated data storage
    private HashMap<String, String> data = new HashMap<String, String>();

    public ServerNode(String serverAddr) {
        this.serverAddr = serverAddr;
    }

    public ServerNode(int index, String serverAddr) {
        this.index = index;
        this.serverAddr = serverAddr;
    }

    public Integer getIndex() {
        return index;
    }

    public void setIndex(int index) {
        this.index = index;
    }

    public HashMap<String, String> getData() {
        return data;
    }

    public void setData(HashMap<String, String> data) {
        this.data = data;
    }

    public String getServerAddr() {
        return serverAddr;
    }

    public void setServerAddr(String serverAddr) {
        this.serverAddr = serverAddr;
    }

    @Override
    public String toString() {
        return "ServerNode{" +
                "index=" + index +
                ", serverAddr='" + serverAddr + '\'' +
                ", data=" + data +
                '}';
    }
}

package com.cw.demo.yizhixing.hash;

/**
 * Hash utility class
 * Created by chenwei01 on 2017/1/17.
 */
public class HashUtil {

    /**
     * Computes the ring index for a key using the FNV algorithm.
     * @param key the key (or node address) to hash
     * @return a non-negative hash value
     */
    public static int hash(String key) {
        final int p = 16777619;
        int hash = (int) 2166136261L;
        for (int i = 0; i < key.length(); i++) {
            hash = (hash ^ key.charAt(i)) * p;
        }
        hash += hash << 13;
        hash ^= hash >> 7;
        hash += hash << 3;
        hash ^= hash >> 17;
        hash += hash << 5;
        // make sure the result is non-negative
        hash = hash < 0 ? Math.abs(hash) : hash;
        // Take the remainder modulo 2^32. There is some doubt about why this number was
        // chosen; since a non-negative int is always smaller than 2^32, the modulo does
        // not actually change the value.
        return (int) (hash % Math.pow(2, 32));
    }
}

 

package com.cw.demo.yizhixing.hash;

import org.apache.commons.lang.StringUtils;

import java.util.*;

/**
 * Consistent hash algorithm demo
 * A commonly used strategy in distributed systems
 * Created by chenwei01 on 2017/1/18.
 */
public class YiZhiXingDemo {

    public static void main(String[] args) {

        // Hash each node and put it into the sorted map
        ServerNode node1 = new ServerNode("192.168.21.58");
        ServerNode node2 = new ServerNode("192.168.1.9");
        ServerNode node3 = new ServerNode("192.168.82.220");
        ServerNode node4 = new ServerNode("192.168.72.125");
        ServerNode node5 = new ServerNode("192.168.12.112");
        ServerNode node6 = new ServerNode("192.168.3.48");

        node1.setIndex(HashUtil.hash(node1.getServerAddr()));
        node2.setIndex(HashUtil.hash(node2.getServerAddr()));
        node3.setIndex(HashUtil.hash(node3.getServerAddr()));
        node4.setIndex(HashUtil.hash(node4.getServerAddr()));
        node5.setIndex(HashUtil.hash(node5.getServerAddr()));
        node6.setIndex(HashUtil.hash(node6.getServerAddr()));

        TreeMap<Integer, ServerNode> map = new TreeMap<Integer, ServerNode>();
        map.put(node1.getIndex(), node1);
        map.put(node2.getIndex(), node2);
        map.put(node3.getIndex(), node3);
        map.put(node4.getIndex(), node4);
        map.put(node5.getIndex(), node5);
        map.put(node6.getIndex(), node6);

        Integer serverKey = 0;
        String keyStr = "";
        for (int i = 97; i < 123; i++) {
            keyStr = String.valueOf((char) i);
            Integer index = HashUtil.hash(keyStr);
            // Get the first server whose hash is greater than or equal to the key's hash;
            // if there is none, wrap around to the first server on the ring
            SortedMap<Integer, ServerNode> sortedMap = map.tailMap(index);
            if (sortedMap == null || sortedMap.size() == 0) {
                serverKey = map.firstKey();
            } else {
                serverKey = sortedMap.firstKey();
            }
            ServerNode node = map.get(serverKey);

            node.getData().put(keyStr, StringUtils.upperCase(keyStr));
            System.out.println("key " + keyStr + " is routed to the node at ring position " + serverKey
                    + " (node index " + node.getIndex() + ") for storage");
        }
        System.out.println(map);

        // ----------------------------------- Simulate a node going down
        map.remove(node1.getIndex());
        System.out.println(map.size());
        for (int i = 97; i < 123; i++) {
            keyStr = String.valueOf((char) i);
            Integer index = HashUtil.hash(keyStr);
            // Get the first server whose hash is greater than or equal to the key's hash;
            // if there is none, wrap around to the first server on the ring
            SortedMap<Integer, ServerNode> sortedMap = map.tailMap(index);
            if (sortedMap == null || sortedMap.size() == 0) {
                serverKey = map.firstKey();
            } else {
                serverKey = sortedMap.firstKey();
            }
            ServerNode node = map.get(serverKey);
            System.out.println("key " + keyStr + " now routes to the node at ring position " + serverKey
                    + ", lookup result: " + node.getData().get(keyStr));
        }
    }
}

 

The result is shown in the figure

This shows the advantage of the consistent hash algorithm: when the number of servers changes, not all of the cache is invalidated, only a portion of it, so the full load does not fall on the backing database.

There is also a virtual-node scheme, used to even out imbalanced routing between nodes. The main idea is to establish a mapping between virtual nodes and real nodes: for example, virtual nodes named xuni_0_192.168.0.1, xuni_2_192.168.0.1, xuni_3_192.168.0.1 are placed on the ring, and when a key is routed to one of these virtual nodes, the real node that serves it is 192.168.0.1, as sketched below.
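As a rough sketch of that idea, reusing the HashUtil above, the ring can map each virtual position back to its real node. The class name and replica count below are only illustrative, not the implementation used by ShardedJedis:

package com.cw.demo.yizhixing.hash;

import java.util.SortedMap;
import java.util.TreeMap;

public class VirtualNodeSketch {

    // ring position -> real node address
    private final TreeMap<Integer, String> ring = new TreeMap<Integer, String>();

    // Put several virtual nodes on the ring for each real node to smooth out the distribution
    public void addNode(String realAddr, int virtualCopies) {
        for (int i = 0; i < virtualCopies; i++) {
            String virtualName = "xuni_" + i + "_" + realAddr;  // e.g. xuni_0_192.168.0.1
            ring.put(HashUtil.hash(virtualName), realAddr);     // the virtual position maps back to the real node
        }
    }

    // Route a key exactly as in the demo: first position clockwise, wrapping around if needed
    public String route(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(HashUtil.hash(key));
        return ring.get(tail.isEmpty() ? ring.firstKey() : tail.firstKey());
    }

    public static void main(String[] args) {
        VirtualNodeSketch sketch = new VirtualNodeSketch();
        sketch.addNode("192.168.0.1", 10);
        sketch.addNode("192.168.0.2", 10);
        System.out.println("key a -> " + sketch.route("a"));
    }
}

With enough virtual positions per real node, keys spread far more evenly across the real nodes than with a single position per server, and removing a real node only removes its virtual positions from the ring.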

 
