A Look at the Snowflake Algorithm

❝

Over the past few days I have been wrestling with how to generate primary key IDs, so I went looking for a suitable generation strategy. A well-chosen strategy can greatly reduce the cost of maintaining primary key IDs.

❞

So how are primary key IDs usually generated? The most common approaches are briefly introduced below.

UUID: globally unique, but the generated IDs are unordered and long (128 bits, usually stored as a 36-character string). It is not recommended as a database primary key mainly because it is unordered: the database builds a unique index on the primary key, and inserting random keys into that index causes frequent page splits, which makes index maintenance very expensive.
Database auto-increment ID: works fine on a single database instance, but in a distributed system with high concurrency the database has to be deployed as a cluster. To avoid primary key conflicts, each node must then be given its own starting value and step size for the auto-increment ID, which is also costly to maintain.
Redis INCR: Redis executes commands on a single thread, so INCR is atomic and can be used to generate unique IDs. This approach has problems too: under very high concurrency Redis must be clustered, which again requires a different step size on each node, and the counter keys are subject to Redis's expiration policy. Maintaining such an ID generation scheme is also expensive.
Snowflake algorithm: a distributed ID solution released by Twitter, which is introduced in detail below.
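The step-size workaround mentioned for database auto-increment and Redis INCR boils down to giving every node a disjoint arithmetic sequence. The sketch below simulates it in memory; `StepSizeDemo` and `StepSizeAllocator` are hypothetical names, not a database or Redis API. MySQL exposes the same idea through its `auto_increment_increment` and `auto_increment_offset` system variables.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory illustration of step-size ID allocation:
 * with N nodes, node k (1-based) hands out k, k + N, k + 2N, ...
 * so no two nodes can ever produce the same ID.
 */
public class StepSizeDemo {

    static final class StepSizeAllocator {
        private final long step; // step size = total number of nodes
        private long next;       // next value this node will hand out

        StepSizeAllocator(long offset, long step) {
            if (offset < 1 || offset > step) {
                throw new IllegalArgumentException("offset must be in [1, step]");
            }
            this.step = step;
            this.next = offset;
        }

        synchronized long nextId() {
            long id = next;
            next += step;
            return id;
        }
    }

    /** Draws `count` IDs from one allocator. */
    static List<Long> take(StepSizeAllocator allocator, int count) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(allocator.nextId());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Three nodes sharing step size 3: each gets its own arithmetic sequence.
        System.out.println(take(new StepSizeAllocator(1, 3), 4)); // [1, 4, 7, 10]
        System.out.println(take(new StepSizeAllocator(2, 3), 4)); // [2, 5, 8, 11]
        System.out.println(take(new StepSizeAllocator(3, 3), 4)); // [3, 6, 9, 12]
    }
}
```

The maintenance cost the list describes shows up directly: the step size equals the node count, so adding a fourth node means reconfiguring every existing node.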

Snowflake

Snowflake ID structure: sign bit + timestamp bits + worker (machine) bits + sequence bits, 64 bits in total (8 bytes), which fits exactly into a Java long.
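To make the layout concrete, here is a minimal sketch of how the four fields are packed into one long with shifts and bitwise OR. `SnowflakeLayout` is a hypothetical name, and the field values in `main` are made up purely for illustration.

```java
/**
 * Sketch of the 64-bit layout:
 * 1 sign bit | 41 timestamp bits | 5 data center bits | 5 worker bits | 12 sequence bits
 */
public class SnowflakeLayout {

    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;

    static final int WORKER_SHIFT = SEQUENCE_BITS;                         // 12
    static final int DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS;       // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    /** Packs the fields; the sign bit stays 0 as long as timestampDelta fits in 41 bits. */
    static long compose(long timestampDelta, long datacenterId, long workerId, long sequence) {
        return (timestampDelta << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    public static void main(String[] args) {
        // Every field set to 1: (1 << 22) + (1 << 17) + (1 << 12) + 1
        long id = compose(1L, 1L, 1L, 1L);
        System.out.println(id);                      // 4329473
        System.out.println(Long.toBinaryString(id)); // 10000100001000000000001
    }
}
```

Filling every field with its maximum value (a 41-bit timestamp, 31, 31, 4095) yields exactly `Long.MAX_VALUE`, which is why the sign bit is never touched.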

From left to right, the first bit is the sign bit (0 means positive, 1 means negative). It is always 0, so generated IDs are always positive.
Timestamp: 41 bits, in milliseconds. Converted to years: 2^41 / (365 * 24 * 60 * 60 * 1000) ≈ 69.73 years. Because the timestamp is stored as an offset from a chosen start time (the epoch), the algorithm can run for about 69 years after that epoch: counting from 1970 it would last only until about 2039, while the implementation below, whose default epoch is in October 2021, lasts until roughly 2091.
Worker bits: 10 bits, split into a 5-bit data center ID and a 5-bit worker ID, each ranging from 0 to 2^5 - 1 = 31 (1024 distinct nodes in total). In a distributed deployment, each node is assigned a different combination of data center ID and worker ID so that the IDs they generate never collide.
Sequence number: 12 bits, so each worker can generate 2^12 = 4096 different IDs (sequence values 0 to 4095) per millisecond.
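The capacity figures in the list can be double-checked with a few lines of Java. This is only a back-of-the-envelope sketch: `SnowflakeCapacity` is a hypothetical name, years are taken as 365 days as in the text, and the cluster-wide figure is a theoretical upper bound.

```java
import java.util.Locale;

public class SnowflakeCapacity {

    // 365-day years, leap days ignored (same simplification as the text)
    static final long MS_PER_YEAR = 365L * 24 * 60 * 60 * 1000;

    /** Years covered by a millisecond timestamp field of the given width. */
    static double years(int timestampBits) {
        return (double) (1L << timestampBits) / MS_PER_YEAR;
    }

    /** Number of distinct values a field of the given width can hold. */
    static long values(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        long nodes = values(5 + 5);     // 5 data center bits + 5 worker bits
        long perNodePerMs = values(12); // sequence values 0..4095
        // Locale.ROOT keeps "." as the decimal separator regardless of system locale
        System.out.println(String.format(Locale.ROOT, "timestamp range: %.2f years", years(41))); // 69.73
        System.out.println("nodes: " + nodes);                                    // 1024
        System.out.println("IDs per ms per node: " + perNodePerMs);               // 4096
        System.out.println("IDs per ms, whole cluster: " + nodes * perNodePerMs); // 4194304
    }
}
```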

With the structure covered, how is a Snowflake ID actually generated? Let's look at an implementation.

Snowflake implementation

package cn.org.ppdxzz.tool;

import java.io.Serializable;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/**
 * @author: PeiChen
 * @version: 1.0
 */
public class Snowflake implements Serializable {

    private static final long serialVersionUID = 1L;

    /**
     * Custom epoch (start timestamp)
     */
    private final long START_TIMESTAMP;
    /**
     * Number of bits used by the worker (machine) ID
     */
    private final long WORKERID_BIT = 5L;
    /**
     * Number of bits used by the data center ID
     */
    private final long DATACENTER_BIT = 5L;
    /**
     * Number of bits used by the sequence number
     */
    private final long SEQUENCE_BIT = 12L;
    /**
     * Maximum worker ID: 0~31
     */
    private final long MAX_WORKERID = -1L ^ (-1L << WORKERID_BIT);
    /**
     * Maximum data center ID: 0~31
     */
    private final long MAX_DATACENTER = -1L ^ (-1L << DATACENTER_BIT);
    /**
     * Maximum sequence number (12 bits): 0~4095
     */
    private final long MAX_SEQUENCE = -1L ^ (-1L << SEQUENCE_BIT);

    private final long WORKERID_LEFT_SHIFT = SEQUENCE_BIT;
    private final long DATACENTER_LEFT_SHIFT = SEQUENCE_BIT + WORKERID_BIT;
    private final long TIMESTAMP_LEFT_SHIFT = SEQUENCE_BIT + WORKERID_BIT + DATACENTER_BIT;

    private final long workerId;
    private final long datacenterId;
    private long sequence = 0L;

    /**
     * Timestamp of the last generated ID
     */
    private long lastTimestamp = -1L;

    public Snowflake(long workerId, long datacenterId) {
        this(null, workerId, datacenterId);
    }

    public Snowflake(LocalDateTime localDateTime, long workerId, long datacenterId) {
        if (localDateTime == null) {
            // Default epoch: 2021-10-23 22:41:20 Beijing time (UTC+8), in milliseconds
            this.START_TIMESTAMP = 1635000080000L;
        } else {
            // Interpret the given local time as UTC+8 and convert it to epoch milliseconds
            this.START_TIMESTAMP = localDateTime.toInstant(ZoneOffset.of("+8")).toEpochMilli();
        }
        if (workerId > MAX_WORKERID || workerId < 0) {
            throw new IllegalArgumentException("workerId can't be greater than MAX_WORKERID or less than 0");
        }
        if (datacenterId > MAX_DATACENTER || datacenterId < 0) {
            throw new IllegalArgumentException("datacenterId can't be greater than MAX_DATACENTER or less than 0");
        }

        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    /**
     * Returns the next ID
     * @return long:id
     */
    public synchronized long nextId() {
        long currentTimestamp = getCurrentTimestamp();
        // The system clock went backwards: refuse rather than risk duplicate IDs
        if (currentTimestamp < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards. Refusing to generate id.");
        }
        if (currentTimestamp == lastTimestamp) {
            // Same millisecond: increment the sequence, wrapping around at 4096
            sequence = (sequence + 1) & MAX_SEQUENCE;
            // Sequence overflowed within this millisecond: wait for the next millisecond
            if (sequence == 0L) {
                currentTimestamp = getNextTimestamp(lastTimestamp);
            }
        } else {
            // New millisecond: reset the sequence to 0
            sequence = 0L;
        }

        // Remember the timestamp of the ID being generated
        lastTimestamp = currentTimestamp;

        // Shift each part into place and OR them together into one 64-bit ID
        return  // timestamp part
                ((currentTimestamp - START_TIMESTAMP) << TIMESTAMP_LEFT_SHIFT)
                // data center part
                | (datacenterId << DATACENTER_LEFT_SHIFT)
                // worker part
                | (workerId << WORKERID_LEFT_SHIFT)
                // sequence part
                | sequence;

    }

    /**
     * Returns the next ID as a String
     * @return String:id
     */
    public String nextStringId() {
        return Long.toString(nextId());
    }

    /**
     * Returns the current system time in epoch milliseconds
     * @return timestamp in milliseconds
     */
    private long getCurrentTimestamp() {
        return System.currentTimeMillis();
    }

    /**
     * Spins until the clock reaches a millisecond later than lastTimestamp
     * @param lastTimestamp timestamp of the last generated ID
     * @return the next timestamp
     */
    private long getNextTimestamp(long lastTimestamp) {
        long timestamp = getCurrentTimestamp();
        while (timestamp <= lastTimestamp) {
            timestamp = getCurrentTimestamp();
        }
        return timestamp;
    }

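    /**
     * Extracts the worker (machine) ID from a generated ID
     * @param id a Snowflake ID generated by this class
     * @return the worker ID
     */
    public long getWorkerId(long id) {
        return (id >> WORKERID_LEFT_SHIFT) & ~(-1L << WORKERID_BIT);
    }

    /**
     * Extracts the data center ID from a generated ID
     * @param id a Snowflake ID generated by this class
     * @return the data center ID
     */
    public long getDataCenterId(long id) {
        return (id >> DATACENTER_LEFT_SHIFT) & ~(-1L << DATACENTER_BIT);
    }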

    public static void main(String[] args) {
        Snowflake snowflake = new Snowflake(0,0);
        for (int i = 0; i < 20; i++) {
            System.out.println(snowflake.nextId());
        }
    }
}
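The class above can recover the worker and data center IDs from an ID, but not when the ID was generated. A hypothetical helper for that is sketched below; it assumes the generator stores milliseconds since an epoch of 1635000080000 ms (2021-10-23 22:41:20 Beijing time) and shifts the timestamp left by 22 bits. `SnowflakeTime` and its constants are illustrative names, and `EPOCH_MILLIS` must match whatever epoch the generator actually uses.

```java
import java.time.Instant;

public class SnowflakeTime {

    static final long EPOCH_MILLIS = 1635000080000L; // assumed epoch, must match the generator
    static final int TIMESTAMP_SHIFT = 22;           // 12 sequence + 5 worker + 5 data center bits
    static final int SEQUENCE_BITS = 12;

    /** Wall-clock time at which the ID was generated. */
    static Instant creationTime(long id) {
        return Instant.ofEpochMilli((id >>> TIMESTAMP_SHIFT) + EPOCH_MILLIS);
    }

    /** Sequence number within that millisecond (lowest 12 bits). */
    static long sequence(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }

    public static void main(String[] args) {
        // An ID from 1000 ms after the epoch, data center 1, worker 2, sequence 3
        long id = (1000L << TIMESTAMP_SHIFT) | (1L << 17) | (2L << 12) | 3L;
        System.out.println(creationTime(id)); // 2021-10-23T14:41:21Z
        System.out.println(sequence(id));     // 3
    }
}
```

Because the timestamp occupies the high bits, IDs from the same generator sort by creation time, which is what keeps them index-friendly as primary keys.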

Hutool

<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-core</artifactId>
    <version>5.1.2</version>
</dependency>

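import cn.hutool.core.lang.Snowflake;
import cn.hutool.core.util.IdUtil;

// Pass the worker ID and the data center ID, each in the range 0~31
Snowflake snowflake = IdUtil.createSnowflake(1L, 1L);
System.out.println(snowflake.nextId());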

That covers both approaches. The Snowflake algorithm is currently one of the better solutions for distributed unique IDs and is widely used in practice. Which implementation to use depends on your own requirements; in short, whatever suits your system is the best choice. If anything here is wrong, feedback and corrections are welcome!