Art~Several unique ID generation schemes

Preface

An ID plays a very important role. Like a national ID number, it uniquely identifies one thing. In the era of big data, with enormous volumes of records, guaranteeing that every record gets a distinct ID becomes difficult, and there is a risk of ID conflicts.

In complex distributed business systems especially, an ID conflict can cause serious business problems.

We usually consider the following properties when designing an ID generation scheme.

Scheme characteristics

Uniqueness: every generated ID is unique across the whole system.
Ordered increment: for a given user or business, generated IDs increase in order.
High availability: an ID can be generated correctly at any time.
Time-bearing: the ID contains a timestamp, so you can tell at a glance which day a transaction happened. That is, it is highly readable.
Information security: an ID may carry some business information, which must be protected.

ID generation scheme

1. UUID

The core idea of the algorithm is to combine the machine's network card (MAC address), the local time, and a random number to generate a UUID.

Advantages: generated locally, simple to produce, good performance, no availability risk.
Disadvantages: long (128 bits, 36 characters as text), wasteful to store, unordered and unreadable, and inefficient as a key for queries.
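
As a minimal sketch of local generation in Java, the standard library class java.util.UUID can be used. Note that UUID.randomUUID() produces a random (version 4) UUID, not the network-card-and-time variant described above. The class and method names UuidDemo, newId, and newCompactId are made up for this illustration.

```java
import java.util.UUID;

class UuidDemo {
    /** Returns a random (version 4) UUID in its canonical 36-character form. */
    static String newId() {
        return UUID.randomUUID().toString();
    }

    /** Returns a random UUID with the hyphens removed: 32 hex characters (128 bits). */
    static String newCompactId() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        System.out.println(newId());
        System.out.println(newCompactId());
    }
}
```

No coordination between machines is needed, which is where the "no availability risk" advantage comes from, but the values carry no order.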

2. Database auto-increment ID

Use the database's auto-increment strategy, such as MySQL's auto_increment. For high availability, you can run two databases with different starting values and a shared step size (for example, start at 1 and 2 with a step of 2), so that the two never generate the same ID.

Advantages: IDs generated by the database are strictly ordered, and high availability is simple to implement.
Disadvantages: the database instances must be deployed independently, which is costly, and the database becomes a performance bottleneck.
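
The two-database idea can be illustrated without a real database. The sketch below is only an in-memory stand-in: SteppedSequence is a hypothetical class that mimics one instance's counter, with offset and step playing the roles of MySQL's auto_increment_offset and auto_increment_increment settings.

```java
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for one database instance's auto-increment counter (hypothetical). */
class SteppedSequence {
    private final long step;
    private final AtomicLong next;

    SteppedSequence(long offset, long step) {
        if (step < 1 || offset < 1 || offset > step) {
            throw new IllegalArgumentException("need 1 <= offset <= step");
        }
        this.step = step;
        this.next = new AtomicLong(offset);
    }

    /** Returns the current value and advances the counter by the step. */
    long nextId() {
        return next.getAndAdd(step);
    }

    public static void main(String[] args) {
        SteppedSequence dbA = new SteppedSequence(1, 2); // yields 1, 3, 5, ...
        SteppedSequence dbB = new SteppedSequence(2, 2); // yields 2, 4, 6, ...
        for (int i = 0; i < 3; i++) {
            System.out.println(dbA.nextId() + " " + dbB.nextId());
        }
    }
}
```

Because the two sequences cover different residues modulo the step, either instance can serve requests alone without producing an ID the other one could produce.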

3. Twitter's Snowflake algorithm

Twitter's Snowflake algorithm builds a 64-bit ID from the following parts:

1 sign bit:

Java's long type is signed, and the highest bit is the sign bit: 0 for positive numbers, 1 for negative numbers. IDs used in real systems are generally positive, so the highest bit is always 0.

41-bit timestamp (millisecond precision):

Note that the 41-bit timestamp is not the current timestamp itself but a difference (current timestamp - start timestamp). The start timestamp is generally the time at which the ID generator went into service, and it is specified by the program. A 41-bit millisecond timestamp can therefore be used for (1 << 41) / (1000 x 60 x 60 x 24 x 365) ≈ 69 years.
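
The 69-year figure can be checked with a few lines of Java. This is just the arithmetic from the paragraph above; the class name TimestampRange and its members are made up for the example.

```java
class TimestampRange {
    static final long TIMESTAMP_BITS = 41L;
    static final long MILLIS_PER_YEAR = 1000L * 60 * 60 * 24 * 365;

    /** Number of distinct millisecond values that fit in 41 bits. */
    static long maxMillis() {
        return 1L << TIMESTAMP_BITS;
    }

    /** The same range expressed in 365-day years, as a fraction. */
    static double maxYears() {
        return (double) maxMillis() / MILLIS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println(maxMillis());
        System.out.println(maxYears());
    }
}
```

With integer division the result truncates to 69; the fractional value is a little under 70.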

10-bit data center and machine ID:

These consist of a 5-bit data center ID and a 5-bit machine ID. Together the 10 bits allow at most 1 << 10 = 1024 nodes in a distributed system. If more nodes than this are deployed, some must share an ID pair, and the generated IDs may conflict.

12-bit sequence within a millisecond:

This 12-bit counter allows each node to generate at most 1 << 12 = 4096 IDs per millisecond (same machine, same millisecond).

The four parts add up to exactly 64 bits (1 + 41 + 10 + 12), which fits in a Java long.

Advantages: high performance, low latency, ordered by time, generally no ID collisions, no dependence on a third-party database, deployable as a service, and good stability.
Disadvantages: it must be developed and deployed independently, and it depends on the machine clock. If the machine clock is rolled back, duplicate IDs may be generated.

A simple Java implementation

public class SnowflakeIdWorker {

    // ============================== Fields ==============================
    /** Start epoch (2019-08-06 00:00 UTC+8) */
    private final long twepoch = 1565020800000L;

    /** Number of bits used by the worker (machine) ID */
    private final long workerIdBits = 5L;

    /** Number of bits used by the data center ID */
    private final long datacenterIdBits = 5L;

    /** Largest supported worker ID, which is 31 (this shift trick quickly computes the largest value that fits in a given number of bits) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

    /** Largest supported data center ID, which is 31 */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

    /** Number of bits used by the sequence */
    private final long sequenceBits = 12L;

    /** The worker ID is shifted left by 12 bits */
    private final long workerIdShift = sequenceBits;

    /** The data center ID is shifted left by 17 bits (12 + 5) */
    private final long datacenterIdShift = sequenceBits + workerIdBits;

    /** The timestamp is shifted left by 22 bits (5 + 5 + 12) */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    /** Mask for the sequence, here 4095 (0b111111111111 = 0xfff = 4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);

    /** Worker ID (0~31) */
    private long workerId;

    /** Data center ID (0~31) */
    private long datacenterId;

    /** Sequence within the current millisecond (0~4095) */
    private long sequence = 0L;

    /** Timestamp of the last generated ID */
    private long lastTimestamp = -1L;
 
    // ============================== Constructors ==============================
    /**
     * Constructor
     * @param workerId worker ID (0~31)
     * @param datacenterId data center ID (0~31)
     */
    public SnowflakeIdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }
 
    // ============================== Methods ==============================
    /**
     * Get the next ID (this method is thread-safe)
     * @return SnowflakeId
     */
    public synchronized long nextId() {
        long timestamp = timeGen();

        // If the current time is earlier than the timestamp of the last generated ID,
        // the system clock has been rolled back, so throw an exception
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        // Same millisecond as the last ID: advance the sequence within the millisecond
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // The sequence overflowed within this millisecond
            if (sequence == 0) {
                // Block until the next millisecond and get a new timestamp
                timestamp = tilNextMillis(lastTimestamp);
            }
        }
        // The timestamp changed: reset the sequence
        else {
            sequence = 0L;
        }

        // Remember the timestamp of the last generated ID
        lastTimestamp = timestamp;

        // Shift each part into place and OR them together into a 64-bit ID
        return ((timestamp - twepoch) << timestampLeftShift) //
                | (datacenterId << datacenterIdShift) //
                | (workerId << workerIdShift) //
                | sequence;
    }
 
    /**
     * Block until the next millisecond, until a new timestamp is obtained
     * @param lastTimestamp timestamp of the last generated ID
     * @return the current timestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * Return the current time in milliseconds
     * @return current time (milliseconds)
     */
    protected long timeGen() {
        return System.currentTimeMillis();
    }
 
    // ============================== Test ==============================
    /** Test */
    public static void main(String[] args) {
        SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0);
        for (int i = 0; i < 10; i++) {
            long id = idWorker.nextId();
            System.out.println(Long.toBinaryString(id));
            System.out.println(id);
        }
    }
}
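
To see that the layout really packs the timestamp, data center ID, worker ID, and sequence into one long, an ID can be taken apart again with the inverse shifts and masks. The following is a sketch under the assumption that the ID was produced with the same epoch and bit widths as the listing above; SnowflakeIdDecoder and its method names are hypothetical.

```java
class SnowflakeIdDecoder {
    // Assumed to match the constants used by the generator above.
    static final long TWEPOCH = 1565020800000L;
    static final long SEQUENCE_BITS = 12L;
    static final long WORKER_ID_BITS = 5L;
    static final long DATACENTER_ID_BITS = 5L;

    /** Lowest 12 bits: the sequence within the millisecond. */
    static long sequence(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }

    /** Next 5 bits: the worker ID. */
    static long workerId(long id) {
        return (id >>> SEQUENCE_BITS) & ((1L << WORKER_ID_BITS) - 1);
    }

    /** Next 5 bits: the data center ID. */
    static long datacenterId(long id) {
        return (id >>> (SEQUENCE_BITS + WORKER_ID_BITS)) & ((1L << DATACENTER_ID_BITS) - 1);
    }

    /** Remaining high bits plus the epoch: the wall-clock time in milliseconds. */
    static long timestampMillis(long id) {
        return (id >>> (SEQUENCE_BITS + WORKER_ID_BITS + DATACENTER_ID_BITS)) + TWEPOCH;
    }

    public static void main(String[] args) {
        // Compose an ID by hand, the same way nextId() does, then decode it.
        long id = ((1600000000000L - TWEPOCH) << 22) | (3L << 17) | (7L << 12) | 42L;
        System.out.println(timestampMillis(id));
        System.out.println(datacenterId(id) + " " + workerId(id) + " " + sequence(id));
    }
}
```

Being able to recover the generation time from the ID is what makes Snowflake IDs time-bearing, and it also means the ID leaks that information to anyone who knows the layout.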

4. Meituan Leaf

Leaf is Meituan's open-source distributed ID generator. It guarantees global uniqueness, a generally increasing trend, monotonic increase, and information security. It also addresses the defect that IDs may not be unique after a machine clock rollback, but it relies on middleware such as a relational database and ZooKeeper.
Official description: https://tech.meituan.com/2017/04/21/mt-leaf.html

The name Leaf comes from a saying of the German philosopher and mathematician Leibniz: "There are no two identical leaves in the world."

Solving the clock problem

When the service starts, it first checks whether it has already written its own node under leaf_forever in ZooKeeper:

  • If it has, it compares its own system time with the time recorded in leaf_forever/${self}. If the system time is earlier than the recorded time, the machine clock is considered to have stepped back significantly, so the service fails to start and an alarm is raised.
  • If it has not, this is a new service node. It creates the persistent node leaf_forever/${self}, writes its own system time there, and then checks its clock against the other Leaf nodes. Specifically, it reads the service IP:Port of every ephemeral node under leaf_temporary (all running Leaf-snowflake nodes), fetches each node's system time over RPC, and computes sum(time)/nodeSize. If abs(system time - sum(time)/nodeSize) < threshold, the local system time is considered accurate: the service starts normally and writes the ephemeral node leaf_temporary/${self} to maintain its lease. Otherwise the machine clock is considered to have drifted significantly, so startup fails and an alarm is raised.
  • At regular intervals (3s), the node reports its own system time by writing it to leaf_forever/${self}.
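
The two startup checks described above boil down to simple comparisons. The sketch below shows only that decision logic, with no ZooKeeper or RPC calls; it is not Leaf's actual code. LeafClockCheck, its method names, and the choice to accept a new node when there are no peers to compare against are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

class LeafClockCheck {
    /** Known node: the local clock must not be behind the time last recorded in leaf_forever/${self}. */
    static boolean okForKnownNode(long localMillis, long recordedMillis) {
        return localMillis >= recordedMillis;
    }

    /** New node: the local clock must be within thresholdMillis of the average peer time. */
    static boolean okForNewNode(long localMillis, List<Long> peerMillis, long thresholdMillis) {
        if (peerMillis.isEmpty()) {
            return true; // assumption: with no running peers there is nothing to compare against
        }
        long sum = 0L;
        for (long t : peerMillis) {
            sum += t;
        }
        long average = sum / peerMillis.size(); // sum(time) / nodeSize
        return Math.abs(localMillis - average) < thresholdMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(okForKnownNode(now, now - 3000));
        System.out.println(okForNewNode(now, Arrays.asList(now - 5, now + 5), 100));
    }
}
```

In a real deployment the recorded time would come from ZooKeeper and the peer times from RPC calls; a failed check would abort startup and raise an alarm.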

Leaf also caches a workerID file on the local file system. If ZooKeeper has a problem at the same time that the machine has a problem and must be restarted, the service can still start normally. This makes the dependence on third-party components a weak one and improves stability to a certain extent.
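
The local fallback can be pictured as a tiny read/write helper around a file. This is a sketch of the idea only, not Leaf's actual file format or code; WorkerIdCache, save, and load are hypothetical names, and storing the ID as plain decimal text is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

class WorkerIdCache {
    /** Writes the worker ID to the cache file as decimal text, replacing any previous content. */
    static void save(Path file, long workerId) {
        try {
            Files.write(file, Long.toString(workerId).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the cached worker ID, or returns empty if the file is missing or unreadable. */
    static OptionalLong load(Path file) {
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            return OptionalLong.of(Long.parseLong(text));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "leaf-worker-id-demo.txt");
        save(file, 7L);
        System.out.println(load(file));
    }
}
```

On startup, a service following this pattern would ask ZooKeeper for its worker ID first and fall back to load(...) only when ZooKeeper is unreachable.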

Origin blog.csdn.net/Shangxingya/article/details/114988314