Distributed unique ID solution: the snowflake algorithm

Reading time: about 3 minutes

With source code

Preface

Gone are the days of monolithic services.

Business logic and data storage keep growing more complex, and distributed systems are now a very common answer to that.

Almost every system design runs into globally unique IDs, which play a vital role in storage and retrieval.

ID generator

Applications often need a globally unique ID as the primary key in the database. How do we generate one?

First, decide whether the globally unique ID should be an integer or a string. If it is a string, the existing UUID fully meets the need and no extra work is required. The disadvantage is that a string ID takes up a lot of space and is less efficient to index than an integer.

If an integer is used as the ID, the 32-bit int type is ruled out first because its range is too small, so the 64-bit long type must be used.

When an integer is used as the ID, how do we generate IDs that are increasing, globally unique, and never repeated?

Database auto-increment

Database auto-increment IDs are often used in systems with a small amount of data. An auto-increment ID starts from 1 and increases more or less continuously. Oracle can use a SEQUENCE and MySQL can use an AUTO_INCREMENT primary key. Global uniqueness is not guaranteed, but the ID is unique within each table, which basically meets the need.

The disadvantage of a database auto-increment ID is that the ID cannot be obtained before the row is inserted. After the insert the ID is unique, but it only becomes valid once the transaction is committed. Records that reference each other in both directions have to be updated after insertion, which is troublesome.

During our development we ran into a scenario with primary-primary database synchronization (simply put, the same SQL is executed again on the other database). With database auto-increment IDs, this leads to inconsistent primary keys or primary key conflicts.
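
The conflict can be illustrated with a minimal sketch that uses two in-memory counters as stand-ins for the two databases. The class `AutoIncrementTable` and its methods are hypothetical and no real database is involved; the point is only that both sides hand out the same key independently.

```java
import java.util.HashMap;
import java.util.Map;

class AutoIncrementConflictDemo {
    /** Hypothetical stand-in for a table with an auto-increment primary key. */
    static class AutoIncrementTable {
        private long nextId = 1;
        final Map<Long, String> rows = new HashMap<>();

        /** Local insert: the table assigns the next auto-increment ID. */
        long insert(String value) {
            long id = nextId++;
            rows.put(id, value);
            return id;
        }

        /** Replicated insert: returns false if the primary key is already taken. */
        boolean replicate(long id, String value) {
            return rows.putIfAbsent(id, value) == null;
        }
    }

    public static void main(String[] args) {
        AutoIncrementTable a = new AutoIncrementTable();
        AutoIncrementTable b = new AutoIncrementTable();
        long idA = a.insert("order written to A"); // A assigns 1
        long idB = b.insert("order written to B"); // B also assigns 1
        System.out.println("A assigned " + idA + ", B assigned " + idB);
        // Syncing A's row to B now hits an existing primary key.
        System.out.println("replicated without conflict: "
                + b.replicate(idA, "order written to A")); // false
    }
}
```

Generating the ID in the application before the insert, instead of letting each database assign it, avoids this class of conflict.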

Distributed ID generator

Option 1: UUID

Not recommended for distributed environments

UUID is the first method that comes to mind, and Java provides a corresponding class in the java.util package. It is a UUID that follows the RFC standard: https://www.ietf.org/rfc/rfc4122.txt

UUID generation performs very well (it is a local call) and has no network overhead.

However, a UUID is awkward to store (it is a string, it is long, and it does not fit many scenarios); it can leak information (version-1 UUIDs are generated from the MAC address, and this weakness was used to track down the author of the Melissa virus); it cannot guarantee increasing (or trend-increasing) values; and other bloggers report that taking only the first 20 characters as a unique ID produces duplicates at large volumes (at only about 2.2 million IDs).

UUID.randomUUID().toString()
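
As a quick check of the storage point, the sketch below prints a few properties of a `java.util.UUID`. The class name `UuidDemo` is made up for illustration. Note that `UUID.randomUUID()` returns a version-4 (random) UUID, so the MAC address concern applies to version-1 UUIDs rather than to this particular call.

```java
import java.util.UUID;

class UuidDemo {
    /** Length of the canonical string form (32 hex digits plus 4 hyphens). */
    static int stringLength(UUID id) {
        return id.toString().length();
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id);
        System.out.println("string length: " + stringLength(id)); // 36 characters
        System.out.println("version: " + id.version());           // 4 = randomly generated
        // A UUID is 128 bits wide, so it needs two longs; a 64-bit integer ID fits in one.
        System.out.println("high bits: " + id.getMostSignificantBits());
        System.out.println("low bits:  " + id.getLeastSignificantBits());
    }
}
```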

Option 2: snowflake (the snowflake algorithm)

This is currently one of the more widely used distributed ID solutions, and it is the recommended one.

I won't go into Twitter's background here; it is the same Twitter that suspended the account of the "Know-It-All King" (懂王) a while ago.

Algorithm introduction

The SnowFlake algorithm generates an ID that is a 64-bit integer, structured as follows:

[Figure: layout of the 64-bit snowflake ID]

1 bit, unused. In binary, a highest bit of 1 denotes a negative number, but the IDs we generate are normally positive integers, so the highest bit is fixed at 0.
41 bits, used to record the timestamp (in milliseconds).
- 41 bits can represent 2^{41} distinct values.
- Used as a non-negative integer (which includes 0), the range is 0 to 2^{41}-1; the 1 is subtracted because the range starts from 0, not 1.
- In other words, 41 bits can hold up to 2^{41}-1 milliseconds, which in years is (2^{41}-1) / (1000 * 60 * 60 * 24 * 365) ≈ 69 years.
10 bits, used to record the working machine ID.
- This allows deployment on 2^{10} = 1024 nodes, split into a 5-bit datacenterId and a 5-bit workerId.
- The largest non-negative integer that 5 bits can represent is 2^{5}-1 = 31, so the 32 numbers 0, 1, 2, 3, ... 31 are available for the datacenterId and for the workerId.
12 bits, the sequence number, used to distinguish IDs generated within the same millisecond.
- The largest non-negative integer that 12 bits can represent is 2^{12}-1 = 4095, so the 4096 numbers 0, 1, 2, 3, ... 4095 are available as sequence numbers for IDs generated by the same machine within the same millisecond.
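
The arithmetic in this list can be reproduced with a few shifts. This is only a worked check of the numbers above; the class and method names are made up for illustration.

```java
class SnowflakeBitBudget {
    static final int TIMESTAMP_BITS = 41;
    static final int MACHINE_BITS = 10;   // 5 bits datacenterId + 5 bits workerId
    static final int SEQUENCE_BITS = 12;

    /** Largest value an unsigned field of the given width can hold: 2^bits - 1. */
    static long maxValue(int bits) {
        return (1L << bits) - 1;
    }

    /** Whole years covered by the 41-bit millisecond timestamp (365-day years). */
    static long timestampYears() {
        long millisPerYear = 1000L * 60 * 60 * 24 * 365;
        return maxValue(TIMESTAMP_BITS) / millisPerYear;
    }

    public static void main(String[] args) {
        System.out.println("max timestamp offset (ms): " + maxValue(TIMESTAMP_BITS));      // 2199023255551
        System.out.println("years of timestamps:       " + timestampYears());              // 69
        System.out.println("machine IDs:               " + (maxValue(MACHINE_BITS) + 1));  // 1024
        System.out.println("IDs per ms per machine:    " + (maxValue(SEQUENCE_BITS) + 1)); // 4096
        System.out.println("total bits:                "
                + (1 + TIMESTAMP_BITS + MACHINE_BITS + SEQUENCE_BITS));                    // 64
    }
}
```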

Since 64-bit integers in Java are of type long, the id generated by the SnowFlake algorithm in Java is stored in long.

SnowFlake can guarantee:

All IDs generated by the same server trend upward over time
There will be no duplicate IDs in the entire distributed system (because the datacenterId and workerId tell the machines apart)
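
The first guarantee follows from the layout: the timestamp occupies the highest bits, so an ID generated in a later millisecond is always numerically larger. The sketch below packs and unpacks the fields with the 41 / 5 / 5 / 12 layout described above; the class and method names are hypothetical and it is not the generator itself.

```java
class SnowflakeLayoutDemo {
    // Shifts matching the 41 / 5 / 5 / 12 layout described above.
    static final int WORKER_SHIFT = 12;
    static final int DATACENTER_SHIFT = 17;
    static final int TIMESTAMP_SHIFT = 22;

    /** Packs the four fields into one long; callers must keep each field within its bit width. */
    static long compose(long timestampOffset, long dataCenterId, long workerId, long sequence) {
        return (timestampOffset << TIMESTAMP_SHIFT)
                | (dataCenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    static long timestampOffset(long id) { return id >>> TIMESTAMP_SHIFT; }
    static long dataCenterId(long id)    { return (id >>> DATACENTER_SHIFT) & 31; }
    static long workerId(long id)        { return (id >>> WORKER_SHIFT) & 31; }
    static long sequence(long id)        { return id & 4095; }

    public static void main(String[] args) {
        long earlier = compose(1000, 31, 31, 4095); // largest machine and sequence fields
        long later = compose(1001, 0, 0, 0);        // one millisecond later, smallest fields
        // The timestamp sits in the highest bits, so the later ID is the larger one.
        System.out.println(earlier < later);        // true
        System.out.println(timestampOffset(later) + " " + dataCenterId(earlier)
                + " " + workerId(earlier) + " " + sequence(earlier)); // 1001 31 31 4095
    }
}
```

Within a single millisecond, IDs from the same worker are ordered by the sequence field, which is why the guarantee is stated per server.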

Existing problems:

The configuration of the machine ID (5 bits) and the data center ID (5 bits) is left unsolved. If the same configuration is used on every node of a distributed deployment, there is still a risk of duplicate IDs.
An object has to be instantiated before use; there is no out-of-the-box utility class.
Strong dependence on the machine clock: if the clock on a machine is moved backwards, duplicate IDs may be generated or the service may become unavailable. (This does not happen under normal circumstances.)
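
One possible mitigation for the clock problem, which is not part of the original code, is to wait out small rollbacks and fail only on large ones. The sketch below assumes a hypothetical `ClockGuard` class with a `maxBackwardMs` threshold; the clock is injected so the behaviour can be exercised without touching the system time.

```java
import java.util.function.LongSupplier;

class ClockGuard {
    private final LongSupplier clock;   // source of "now" in milliseconds
    private final long maxBackwardMs;   // largest rollback we are willing to wait out
    private long lastTimestamp = -1L;

    ClockGuard(LongSupplier clock, long maxBackwardMs) {
        this.clock = clock;
        this.maxBackwardMs = maxBackwardMs;
    }

    /** Returns a timestamp that is never smaller than the previous one returned. */
    synchronized long nextTimestamp() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            long drift = lastTimestamp - now;
            if (drift > maxBackwardMs) {
                throw new IllegalStateException("Clock moved backwards by " + drift + " ms");
            }
            // Small rollback: spin until the clock catches up with the last timestamp.
            while (now < lastTimestamp) {
                now = clock.getAsLong();
            }
        }
        lastTimestamp = now;
        return now;
    }

    public static void main(String[] args) {
        ClockGuard guard = new ClockGuard(System::currentTimeMillis, 5);
        long first = guard.nextTimestamp();
        long second = guard.nextTimestamp();
        System.out.println(first + " -> " + second);
    }
}
```

The trade-off is that a generator using such a guard blocks briefly during a small rollback instead of throwing, while a large rollback still surfaces as an error.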

One way to address the first problem is to derive the two IDs from the machine itself: in the code below, the workerId is generated from the server's IP address and the dataCenterId from its hostName. This goes a long way toward preventing duplicates of the 10-bit machine code. However, since neither ID can exceed 31, only the remainder modulo 32 can be kept, so collisions cannot be ruled out entirely. In practice, the hostNames and IPs of a deployment are generally consecutive or similar, and two machines only clash when the hostName-derived value and the IP-derived value are both equal modulo 32 at the same time, which is unlikely. A distributed deployment also generally has no more than 10 containers.

Docker images used in production are generally built once and then distributed to different containers, so the containers do not carry different configurations. Whether the situation described above arises in that setup is uncertain; another reference article will be added in the comments.
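
The remainder approach can be sketched as follows. It mirrors the code-point sum used in the source code further down, but with plain JDK calls; the class name and the sample addresses are made up for illustration.

```java
class MachineIdDemo {
    /** Sums the code points of the text and takes the remainder modulo 32, giving 0..31. */
    static long idFrom(String text) {
        int sum = text.codePoints().sum();
        return sum % 32;
    }

    public static void main(String[] args) {
        // The sums of these two addresses differ by 1, so the IDs differ.
        System.out.println(idFrom("10.0.0.11") != idFrom("10.0.0.12")); // true
        // Addresses whose digits are merely reordered have the same sum and collide.
        System.out.println(idFrom("10.0.0.12") == idFrom("10.0.0.21")); // true
    }
}
```

The second case shows that similar addresses can still map to the same value, so it is worth checking the derived IDs of a deployment rather than assuming they are distinct.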

Source code

Java version of the snowflake ID generation algorithm

package com.my.blog.website.utils;

import org.apache.commons.lang3.RandomUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.SystemUtils;

import java.net.Inet4Address;
import java.net.UnknownHostException;

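/**
 * Twitter_Snowflake<br>
 * The SnowFlake ID is laid out as follows (fields separated by -):<br>
 * 0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000 <br>
 * 1 sign bit: long is signed in Java and the highest bit is the sign bit (0 for positive, 1 for negative); IDs are normally positive, so the highest bit is 0<br>
 * 41-bit timestamp (milliseconds): note that this field does not store the current timestamp but the difference (current timestamp - start timestamp).
 * The start timestamp is usually the time the ID generator went into use and is set by the program (the twepoch field of the class below). A 41-bit timestamp lasts 69 years: T = (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69<br>
 * 10 machine bits: allows deployment on 1024 nodes, made up of a 5-bit datacenterId and a 5-bit workerId<br>
 * 12-bit sequence: a counter within the millisecond; it lets each node produce 4096 IDs per millisecond (same machine, same timestamp)<br>
 * Together these add up to exactly 64 bits, one long.<br>
 * The advantages of SnowFlake are that IDs are sorted by time overall, that no ID collisions occur across the distributed system (the data center ID and machine ID tell nodes apart), and that it is efficient: in tests, SnowFlake generates about 260,000 IDs per second.
 */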
public class SnowflakeIdWorker {

    // ==============================Fields===========================================
    /** Start timestamp in milliseconds (1489111610226L is 2017-03-10 UTC) */
    private final long twepoch = 1489111610226L;

    /** Number of bits used by the worker ID */
    private final long workerIdBits = 5L;

    /** Number of bits used by the data center ID */
    private final long dataCenterIdBits = 5L;

    /** Largest supported worker ID, which is 31 (this shift trick quickly yields the largest value that fits in the given number of bits) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

    /** Largest supported data center ID, which is 31 */
    private final long maxDataCenterId = -1L ^ (-1L << dataCenterIdBits);

    /** Number of bits used by the sequence */
    private final long sequenceBits = 12L;

    /** The worker ID is shifted left by 12 bits */
    private final long workerIdShift = sequenceBits;

    /** The data center ID is shifted left by 17 bits (12+5) */
    private final long dataCenterIdShift = sequenceBits + workerIdBits;

    /** The timestamp is shifted left by 22 bits (5+5+12) */
    private final long timestampLeftShift = sequenceBits + workerIdBits + dataCenterIdBits;

    /** Mask for the sequence, here 4095 (0b111111111111=0xfff=4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);

    /** Worker ID (0~31) */
    private long workerId;

    /** Data center ID (0~31) */
    private long dataCenterId;

    /** Sequence within the millisecond (0~4095) */
    private long sequence = 0L;

    /** Timestamp of the last generated ID */
    private long lastTimestamp = -1L;

    /** Shared instance used by the static generateId() helper */
    private static SnowflakeIdWorker idWorker;

    static {
        idWorker = new SnowflakeIdWorker(getWorkId(),getDataCenterId());
    }

    //==============================Constructors=====================================
    /**
     * Constructor
     * @param workerId worker ID (0~31)
     * @param dataCenterId data center ID (0~31)
     */
    public SnowflakeIdWorker(long workerId, long dataCenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("workerId can't be greater than %d or less than 0", maxWorkerId));
        }
        if (dataCenterId > maxDataCenterId || dataCenterId < 0) {
            throw new IllegalArgumentException(String.format("dataCenterId can't be greater than %d or less than 0", maxDataCenterId));
        }
        this.workerId = workerId;
        this.dataCenterId = dataCenterId;
    }

    // ==============================Methods==========================================
    /**
     * Gets the next ID (this method is thread-safe)
     * @return SnowflakeId
     */
    public synchronized long nextId() {
        long timestamp = timeGen();

        // If the current time is earlier than the timestamp of the last ID, the system clock has moved backwards, so throw an exception
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        // Same millisecond as the last ID: advance the sequence within the millisecond
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // The sequence within the millisecond has overflowed
            if (sequence == 0) {
                // Block until the next millisecond and take the new timestamp
                timestamp = tilNextMillis(lastTimestamp);
            }
        }
        // The timestamp has changed: reset the sequence
        else {
            sequence = 0L;
        }

        // Remember the timestamp of this ID
        lastTimestamp = timestamp;

        // Shift each field into place and OR them together into the 64-bit ID
        return ((timestamp - twepoch) << timestampLeftShift)
                | (dataCenterId << dataCenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
    }

    /**
     * Blocks until the next millisecond, that is, until a new timestamp is obtained
     * @param lastTimestamp timestamp of the last generated ID
     * @return the current timestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * Returns the current time in milliseconds
     * @return current time (milliseconds)
     */
    protected long timeGen() {
        return System.currentTimeMillis();
    }

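    /** Derives the worker ID (0~31) from the local IP address: code-point sum modulo 32. */
    private static Long getWorkId(){
        try {
            String hostAddress = Inet4Address.getLocalHost().getHostAddress();
            int[] ints = StringUtils.toCodePoints(hostAddress);
            int sums = 0;
            for(int b : ints){
                sums += b;
            }
            return (long)(sums % 32);
        } catch (UnknownHostException e) {
            // If the lookup fails, fall back to a random number.
            // The upper bound of nextLong is exclusive, so 32 is needed to cover 0~31.
            return RandomUtils.nextLong(0,32);
        }
    }

    /** Derives the data center ID (0~31) from the host name: code-point sum modulo 32. */
    private static Long getDataCenterId(){
        String hostName = SystemUtils.getHostName();
        if (hostName == null) {
            // getHostName() may return null when the host name is not available; fall back to a random number.
            return RandomUtils.nextLong(0,32);
        }
        int[] ints = StringUtils.toCodePoints(hostName);
        int sums = 0;
        for (int i: ints) {
            sums += i;
        }
        return (long)(sums % 32);
    }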

    /**
     * Static utility method backed by the shared worker instance
     *
     * @return the next ID
     */
    public static synchronized Long generateId(){
        long id = idWorker.nextId();
        return id;
    }

    //==============================Test=============================================
    /** Test */
    public static void main(String[] args) {
        System.out.println(System.currentTimeMillis());
        long startTime = System.nanoTime();
        for (int i = 0; i < 50000; i++) {
            long id = SnowflakeIdWorker.generateId();
            System.out.println(id);
        }
        System.out.println((System.nanoTime()-startTime)/1000000+"ms");
    }
}

Original article for reference: https://blog.csdn.net/xiaopeng9275/article/details/72123709

Sharing and watching are my greatest encouragement. I’m pub, see you next time!