Twitter's distributed auto-increment ID algorithm Snowflake (Java version)

Overview
In distributed systems, some scenarios require globally unique IDs. UUIDs (128 bits, written as 36 characters) can be used to prevent ID conflicts, but they have disadvantages: they are relatively long, and they are generally unordered.
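As a small illustration of those two drawbacks, the sketch below prints two random UUIDs with the length of their canonical text form and shows that their sort order is unrelated to creation order. The class name UuidLengthDemo and the helper canonicalLength are hypothetical names chosen for this example; only java.util.UUID comes from the JDK.

```java
import java.util.UUID;

public class UuidLengthDemo {

    /** Length of the canonical text form: 32 hex digits plus 4 hyphens. */
    public static int canonicalLength(UUID uuid) {
        return uuid.toString().length();
    }

    public static void main(String[] args) {
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();
        System.out.println(first + " (" + canonicalLength(first) + " characters)");
        System.out.println(second + " (" + canonicalLength(second) + " characters)");
        // Random (version 4) UUIDs carry no creation time, so this result varies from run to run.
        System.out.println("first sorts before second: " + (first.compareTo(second) < 0));
    }
}
```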

Sometimes we want a simpler ID, and we want IDs to be generated in time order.

Twitter's snowflake meets this need. Twitter built it while migrating its storage system from MySQL to Cassandra: because Cassandra has no sequential ID generation mechanism, Twitter developed this globally unique ID generation service.



Structure
The structure of snowflake is as follows (each part is separated by -):

0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000

The first bit is unused. The next 41 bits hold a millisecond timestamp (41 bits cover about 69 years), followed by a 5-bit datacenterId and a 5-bit workerId (the 10 bits together support deployments of up to 1024 nodes). The last 12 bits are a sequence number within the millisecond, which lets each node generate up to 4096 IDs per millisecond.

These parts add up to exactly 64 bits, which fits a Java long. (Converted to a decimal string, an ID is at most 19 characters long.)
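The arithmetic behind these figures can be checked with a short sketch. The class name SnowflakeBitBudget and its helper methods are hypothetical names for this example; the bit widths are the ones given in the text, and a year is assumed to be 365 days.

```java
public class SnowflakeBitBudget {

    static final int TIMESTAMP_BITS = 41;
    static final int DATACENTER_ID_BITS = 5;
    static final int WORKER_ID_BITS = 5;
    static final int SEQUENCE_BITS = 12;

    /** Largest value that fits in the given number of bits, e.g. 5 bits -> 31. */
    public static long maxValue(int bits) {
        return -1L ^ (-1L << bits);
    }

    /** Whole 365-day years covered by a millisecond counter of the given width. */
    public static long yearsCovered(int bits) {
        long millisPerYear = 1000L * 60 * 60 * 24 * 365;
        return (1L << bits) / millisPerYear;
    }

    public static void main(String[] args) {
        int totalBits = 1 + TIMESTAMP_BITS + DATACENTER_ID_BITS + WORKER_ID_BITS + SEQUENCE_BITS;
        System.out.println("total bits           = " + totalBits);
        System.out.println("years covered        = " + yearsCovered(TIMESTAMP_BITS));
        System.out.println("nodes                = " + (1L << (DATACENTER_ID_BITS + WORKER_ID_BITS)));
        System.out.println("IDs per ms per node  = " + (maxValue(SEQUENCE_BITS) + 1));
        System.out.println("max decimal digits   = " + Long.toString(Long.MAX_VALUE).length());
    }
}
```

Under these assumptions the program reports 64 total bits, 69 years, 1024 nodes, 4096 IDs per millisecond per node, and 19 decimal digits.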

The IDs generated by snowflake are sorted by time overall, and no ID collisions occur across the distributed system as long as every node has a distinct combination of datacenterId and workerId. Generation is also efficient: in testing, snowflake is reported to generate about 260,000 IDs per second.


Source code
(Java version)

/**
 * Twitter_Snowflake<br>
 * The structure of a SnowFlake ID is as follows (each part is separated by -):<br>
 * 0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000 <br>
 * 1 sign bit: long is a signed primitive type in Java and its highest bit is the sign bit (0 for positive numbers, 1 for negative numbers). IDs are generally positive, so the highest bit is 0.<br>
 * 41-bit timestamp (millisecond precision): this field does not store the current timestamp itself but the difference (current timestamp - start timestamp). The start timestamp is normally the time at which the ID generator was first put into use and is set by the program (the twepoch field of the SnowflakeIdWorker class below). A 41-bit timestamp lasts about 69 years: T = (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69<br>
 * 10 machine bits: a 5-bit datacenterId and a 5-bit workerId, which together allow deployment on up to 1024 nodes.<br>
 * 12-bit sequence: a counter within the millisecond, which lets each node generate up to 4096 IDs per millisecond (same machine, same timestamp).<br>
 * These parts add up to exactly 64 bits, which fits a Java long.<br>
 * The advantage of SnowFlake is that IDs are sorted by time overall, no ID collisions occur across the distributed system (nodes are distinguished by datacenter ID and worker ID), and generation is efficient. In testing, SnowFlake is reported to generate about 260,000 IDs per second.
 */

public class SnowflakeIdWorker {


    // ==============================Fields===========================================
    /** Start epoch (2015-01-01 00:00 UTC+8) */
    private final long twepoch = 1420041600000L;

    /** Number of bits used by the worker ID */
    private final long workerIdBits = 5L;

    /** Number of bits used by the datacenter ID */
    private final long datacenterIdBits = 5L;

    /** Maximum supported worker ID, which is 31 (this shift trick quickly computes the largest value that fits in a given number of bits) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

    /** Maximum supported datacenter ID, which is 31 */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

    /** Number of bits used by the sequence */
    private final long sequenceBits = 12L;

    /** The worker ID is shifted left by 12 bits */
    private final long workerIdShift = sequenceBits;

    /** The datacenter ID is shifted left by 17 bits (12+5) */
    private final long datacenterIdShift = sequenceBits + workerIdBits;

    /** The timestamp is shifted left by 22 bits (5+5+12) */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    /** Mask for the sequence, here 4095 (0b111111111111=0xfff=4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);

    /** Worker ID (0~31) */
    private final long workerId;

    /** Datacenter ID (0~31) */
    private final long datacenterId;

    /** Sequence within the current millisecond (0~4095) */
    private long sequence = 0L;

    /** Timestamp of the last generated ID */
    private long lastTimestamp = -1L;

    //==============================Constructors=====================================
    /**
     * Constructor
     * @param workerId worker ID (0~31)
     * @param datacenterId datacenter ID (0~31)
     */
    public SnowflakeIdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    // ==============================Methods==========================================
    /**
     * Get the next ID (this method is thread-safe)
     * @return SnowflakeId
     */
    public synchronized long nextId() {
        long timestamp = timeGen();

        // If the current time is earlier than the timestamp of the last generated ID,
        // the system clock has moved backwards, so throw an exception
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        // Same millisecond as the previous ID: advance the sequence within the millisecond
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // The sequence overflowed within this millisecond
            if (sequence == 0) {
                // Wait until the next millisecond and get a new timestamp
                timestamp = tilNextMillis(lastTimestamp);
            }
        }
        // The timestamp changed: reset the sequence
        else {
            sequence = 0L;
        }

        // Remember the timestamp of this ID
        lastTimestamp = timestamp;

        // Shift each part into place and OR them together into the 64-bit ID
        return ((timestamp - twepoch) << timestampLeftShift)
                | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
    }

    /**
     * Spin until the next millisecond, that is, until a new timestamp is obtained
     * @param lastTimestamp timestamp of the last generated ID
     * @return the current timestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * Return the current time in milliseconds
     * @return current time (milliseconds)
     */
    protected long timeGen() {
        return System.currentTimeMillis();
    }

    //==============================Test=============================================
    /** Test */
    public static void main(String[] args) {
        SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0);
        for (int i = 0; i < 1000; i++) {
            long id = idWorker.nextId();
            System.out.println(Long.toBinaryString(id));
            System.out.println(id);
        }
    }
}
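
Because each field sits at a fixed bit position, a generated ID can be taken apart again with the reverse shifts and masks. The sketch below is one way this might be done; it is not part of the original source. The class name SnowflakeIdDecoder and its method names are hypothetical, and the constants are assumed to match the twepoch and bit widths used in SnowflakeIdWorker above.

```java
public class SnowflakeIdDecoder {

    // Layout constants, assumed to match SnowflakeIdWorker.
    static final long TWEPOCH = 1420041600000L;
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_ID_BITS = 5;
    static final int DATACENTER_ID_BITS = 5;

    static final int WORKER_ID_SHIFT = SEQUENCE_BITS;
    static final int DATACENTER_ID_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS;
    static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS + DATACENTER_ID_BITS;

    static final long SEQUENCE_MASK = -1L ^ (-1L << SEQUENCE_BITS);
    static final long WORKER_ID_MASK = -1L ^ (-1L << WORKER_ID_BITS);
    static final long DATACENTER_ID_MASK = -1L ^ (-1L << DATACENTER_ID_BITS);

    /** Build an ID from its parts, mirroring the return expression of nextId(). */
    public static long compose(long timestampMillis, long datacenterId, long workerId, long sequence) {
        return ((timestampMillis - TWEPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_ID_SHIFT)
                | (workerId << WORKER_ID_SHIFT)
                | sequence;
    }

    /** Wall-clock time in milliseconds at which the ID was generated. */
    public static long timestampMillis(long id) {
        return (id >>> TIMESTAMP_SHIFT) + TWEPOCH;
    }

    public static long datacenterId(long id) {
        return (id >>> DATACENTER_ID_SHIFT) & DATACENTER_ID_MASK;
    }

    public static long workerId(long id) {
        return (id >>> WORKER_ID_SHIFT) & WORKER_ID_MASK;
    }

    public static long sequence(long id) {
        return id & SEQUENCE_MASK;
    }

    public static void main(String[] args) {
        // Example values chosen for illustration: one second after the epoch, datacenter 3, worker 7.
        long id = compose(TWEPOCH + 1000L, 3L, 7L, 42L);
        System.out.println("id           = " + id);
        System.out.println("timestamp    = " + timestampMillis(id));
        System.out.println("datacenterId = " + datacenterId(id));
        System.out.println("workerId     = " + workerId(id));
        System.out.println("sequence     = " + sequence(id));
    }
}
```

Decoding like this can be handy when debugging, for example to see which node produced a given ID and when.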
