Distributed ID Solution Comparison

In a complex distributed system, large volumes of data often need to be uniquely identified. For example, once an order table has been sharded across multiple databases and tables, the database's auto-increment ID can no longer serve as the unique identifier of an order. Other distributed scenarios place further requirements on distributed IDs:

  • Trend-increasing:  Most RDBMSs store index data in a B-tree variant, so primary keys should be roughly ordered to keep write performance high.

  • Monotonically increasing:  Each new ID must be greater than the previous one, for example when records need to be sorted by ID.

  • Information security:  If IDs are sequential, malicious users can easily enumerate and scrape data; if the ID is an order number it is even more dangerous, because outsiders can read the order volume directly from it. In such scenarios, IDs need to be irregular and hard to predict.

Different scenarios and requirements have produced many distributed ID solutions. This article introduces several of them, including their advantages and disadvantages, usage scenarios, and code examples.

1. UUID 

A UUID (Universally Unique Identifier) is 128 bits long. Time-based versions are computed from data such as the current time, a counter, and a hardware identifier (usually the MAC address of a network card), while version 4 is random. It is written as 32 hexadecimal digits split by hyphens into five groups, 36 characters in the form 8-4-4-4-12, and can be generated locally with high performance.

The JDK provides a UUID generation tool (UUID.randomUUID() returns a random, version 4 UUID). The code is as follows:

import java.util.UUID;

public class Test {
    public static void main(String[] args) {
        System.out.println(UUID.randomUUID());
        // Example output: b0378f6a-eeb7-4779-bffe-2a9f3bc76380
    }
}

UUID fully satisfies the uniqueness requirement of a distributed identifier, but it is generally not used as a business ID in practice, for the following reasons:

  • High storage cost:  A UUID is 128 bits (16 bytes) and is usually stored as a 36-character string, which is too long for many scenarios.

  • Information insecurity:  A UUID generated from the MAC address exposes that MAC address. The creator of the Melissa virus was traced this way.

  • Poor fit as a MySQL primary key:  MySQL officially recommends keeping the primary key as short as possible, because long keys hurt index efficiency. Under the InnoDB engine, the unordered nature of UUIDs also causes frequent data relocation when a UUID is used as the primary key, which seriously affects performance.
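
To make the storage-cost point concrete, here is a minimal sketch comparing the 36-character text form with the 16-byte binary form of a UUID. The class name UuidStorageDemo and its helper methods are hypothetical names chosen for illustration; only java.util.UUID and java.nio.ByteBuffer come from the JDK.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class UuidStorageDemo {

    /** Packs the 128 bits of a UUID into a 16-byte array (big-endian). */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Rebuilds the UUID from the 16-byte form produced by toBytes. */
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        System.out.println("text form length:   " + uuid.toString().length()); // 36 characters
        System.out.println("binary form length: " + toBytes(uuid).length);     // 16 bytes
        System.out.println("version:            " + uuid.version());           // 4 for randomUUID()
        System.out.println("round trip ok:      " + uuid.equals(fromBytes(toBytes(uuid))));
    }
}
```

Storing the binary form halves the size compared with the string, but it is still twice as wide as a 64-bit integer key and remains unordered.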

2. Database auto-increment ID

MySQL's auto-increment ID can uniquely identify rows, but after sharding into multiple databases and tables it only guarantees uniqueness within a single table, not across the whole system. There are two ways to solve this problem.

2.1 Primary key table

A separate primary key table can maintain the unique identifier and act as the single source of IDs, which guarantees global uniqueness. For example:

Create a primary key table

CREATE TABLE `unique_id`  (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `biz` char(1) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `biz` (`biz`)
) ENGINE = InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET =utf8;

The business service obtains an ID by running REPLACE INTO followed by LAST_INSERT_ID(), and then uses that ID when inserting into a sharded table.

BEGIN;

REPLACE INTO unique_id (biz) values ('o') ;
SELECT LAST_INSERT_ID();

COMMIT;

2.2 ID auto-increment step setting

We can set the auto-increment step size of the MySQL primary key so that table IDs generated on different instances never overlap, which guarantees global uniqueness.

For example, with two MySQL instances, set the step size (auto_increment_increment) to 2 on both, and set the starting offset (auto_increment_offset) to 1 on instance 1 and to 2 on instance 2.

View the auto-increment settings of the primary key:

show variables like '%increment%'

 

Clearly, this approach is hard to scale: when concurrency is high and more instances must be added, the step size has to be changed on every existing instance.
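
The effect of the offset and step settings can be simulated without a database. The sketch below is plain Java, not a MySQL client: the class SteppedIdDemo and its Instance type are hypothetical and only mimic an instance that issues offset, offset + step, offset + 2 × step, and so on.

```java
import java.util.ArrayList;
import java.util.List;

final class SteppedIdDemo {

    /** Simulates one database instance that issues offset, offset + step, offset + 2*step, ... */
    static final class Instance {
        private final long step;
        private long next;

        Instance(long offset, long step) {
            this.next = offset;
            this.step = step;
        }

        synchronized long nextId() {
            long id = next;
            next += step;
            return id;
        }
    }

    /** Returns the first count IDs issued by an instance with the given offset and step. */
    static List<Long> firstIds(long offset, long step, int count) {
        Instance instance = new Instance(offset, step);
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(instance.nextId());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("instance 1: " + firstIds(1, 2, 5)); // [1, 3, 5, 7, 9]
        System.out.println("instance 2: " + firstIds(2, 2, 5)); // [2, 4, 6, 8, 10]
    }
}
```

The two sequences never collide, but adding a third instance would require moving every instance to a step of 3, which is the scalability concern noted above.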

3. Number segment mode

The number segment mode is one of the mainstream implementation methods of the current distributed ID generator. The principle is as follows:

  • The number segment mode fetches one number segment from the database at a time and loads it into the service's memory. When the business requests an ID, the service simply increments a value within this segment.

  • When the IDs in the current segment are used up, the service requests a new segment from the database by updating the max_id field to max_id + step. The new segment covers (max_id, max_id + step], where max_id is the value before the update.

  • Because multiple service instances may request segments at the same time, the update uses optimistic locking on the version field.

For example, (0, 1000] represents 1000 IDs, and the business service generates auto-increment IDs from 1 to 1000 locally. The table structure is as follows:

CREATE TABLE id_generator (
  id int(10) NOT NULL,
  max_id bigint(20) NOT NULL COMMENT 'current maximum ID',
  step int(20) NOT NULL COMMENT 'length of a number segment',
  biz_type int(20) NOT NULL COMMENT 'business type',
  version int(20) NOT NULL COMMENT 'version number used as an optimistic lock; updated on every change to keep data correct under concurrency',
  PRIMARY KEY (`id`)
);

This approach does not depend heavily on the database: it accesses the database infrequently and puts far less pressure on it. It also has disadvantages: for example, a server restart or a single point of failure leaves gaps in the ID sequence.
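
The segment workflow described above can be sketched as follows. This is a simplified illustration under stated assumptions: SegmentStore is a hypothetical in-memory stand-in for the id_generator table, and its compare-and-set plays the role of an UPDATE guarded by the version column. A real implementation would issue SQL instead.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SegmentIdDemo {

    /** Immutable snapshot of one id_generator row: max_id and version. */
    static final class Row {
        final long maxId;
        final long version;

        Row(long maxId, long version) {
            this.maxId = maxId;
            this.version = version;
        }
    }

    /** In-memory stand-in for the id_generator table (hypothetical, replaces real SQL). */
    static final class SegmentStore {
        private final AtomicReference<Row> row = new AtomicReference<>(new Row(0, 0));

        Row read() {
            return row.get();
        }

        /** Mimics: UPDATE ... SET max_id = max_id + step, version = version + 1 WHERE version = ? */
        boolean tryAdvance(Row expected, long step) {
            return row.compareAndSet(expected, new Row(expected.maxId + step, expected.version + 1));
        }
    }

    /** Hands out IDs from the loaded segment (current, max] and refills when it is used up. */
    static final class SegmentIdGenerator {
        private final SegmentStore store;
        private final long step;
        private long current; // last ID handed out
        private long max;     // inclusive upper bound of the loaded segment

        SegmentIdGenerator(SegmentStore store, long step) {
            this.store = store;
            this.step = step;
        }

        synchronized long nextId() {
            if (current >= max) {
                loadSegment();
            }
            return ++current;
        }

        private void loadSegment() {
            while (true) {
                Row seen = store.read();
                if (store.tryAdvance(seen, step)) { // retry if another node won the race
                    current = seen.maxId;
                    max = seen.maxId + step;
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        SegmentStore store = new SegmentStore();
        SegmentIdGenerator nodeA = new SegmentIdGenerator(store, 1000);
        SegmentIdGenerator nodeB = new SegmentIdGenerator(store, 1000);
        System.out.println(nodeA.nextId()); // 1    (node A holds (0, 1000])
        System.out.println(nodeB.nextId()); // 1001 (node B holds (1000, 2000])
        System.out.println(nodeA.nextId()); // 2
    }
}
```

If node A restarted at this point, the rest of its segment would be lost and the next segment would start after 2000, which is the discontinuity mentioned above.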

4. Redis INCR

Because the Redis INCR command is atomic, it can be used to generate globally unique, increasing IDs.

A simple example of a Redis-based distributed ID generator:

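import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;

/**
 * Redis-based distributed ID generator.
 */
@Component
public class RedisDistributedId {

    @Autowired
    private StringRedisTemplate redisTemplate;

    /** Custom epoch in seconds: 2022-08-01T00:00:00Z. */
    private static final long BEGIN_TIMESTAMP = 1659312000L;

    /**
     * Generates a distributed ID.
     * Layout: sign bit | timestamp (31 bits) | auto-increment sequence (32 bits)
     *
     * @param item business key the ID is generated for
     * @return the generated ID
     */
    public long nextId(String item) {
        // 1. Build the timestamp part: seconds elapsed since BEGIN_TIMESTAMP.
        //    Take "now" in UTC so it matches the UTC offset used for the conversion.
        LocalDateTime now = LocalDateTime.now(ZoneOffset.UTC);
        long nowSecond = now.toEpochSecond(ZoneOffset.UTC);
        long timestamp = nowSecond - BEGIN_TIMESTAMP;
        // 2. Build the sequence part from Redis, using one counter key per item per day.
        String date = now.format(DateTimeFormatter.ofPattern("yyyy:MM:dd"));
        Long increment = redisTemplate.opsForValue().increment("id:" + item + ":" + date);
        // 3. Shift the timestamp into the high bits and OR in the sequence.
        return timestamp << 32 | increment;
    }

}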

Using Redis also has drawbacks, chiefly persistence: if Redis goes down, the counter must be recovered without reissuing an ID that was already handed out.
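
The bit layout used by nextId can be examined without a Redis server. The sketch below is a hypothetical helper (RedisIdLayoutDemo is not part of the example above) that composes an ID from a given second and counter value and splits it back. Unlike the original, it masks the counter to 32 bits so that an oversized counter cannot spill into the timestamp bits; that masking is an addition made for illustration.

```java
final class RedisIdLayoutDemo {

    static final long BEGIN_TIMESTAMP = 1659312000L; // same custom epoch, in seconds
    static final int SEQUENCE_BITS = 32;
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    /** Combines seconds-since-epoch and the per-day counter into one long. */
    static long compose(long epochSecond, long increment) {
        return ((epochSecond - BEGIN_TIMESTAMP) << SEQUENCE_BITS) | (increment & SEQUENCE_MASK);
    }

    /** Recovers the Unix second encoded in the high bits of the ID. */
    static long epochSecondOf(long id) {
        return (id >>> SEQUENCE_BITS) + BEGIN_TIMESTAMP;
    }

    /** Recovers the counter value stored in the low 32 bits of the ID. */
    static long sequenceOf(long id) {
        return id & SEQUENCE_MASK;
    }

    public static void main(String[] args) {
        long id = compose(1659312001L, 42L);
        System.out.println(id);                // 4294967338 = (1 << 32) | 42
        System.out.println(epochSecondOf(id)); // 1659312001
        System.out.println(sequenceOf(id));    // 42
    }
}
```

Because the timestamp occupies the high bits, IDs generated in a later second always compare greater than IDs from an earlier second, whatever the counter value.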

5. Snowflake Algorithm

Snowflake is a distributed ID generation algorithm open-sourced by Twitter. It divides a 64-bit integer into several parts, each with its own meaning. In Java, a 64-bit integer is a long, so an ID generated by the Snowflake algorithm is stored in a long. The layout is as follows:

  • The first part:  1 bit, the sign bit, which is not used

  • The second part:  41-bit timestamp. 41 bits can represent 2^41 values, each one millisecond, so the Snowflake algorithm can run for 2^41 / (1000×60×60×24×365) ≈ 69 years

  • The third part:  10 bits identify the machine, allowing 2^10 = 1024 machines, which is usually more than are actually deployed

  • The fourth part:  12 bits hold an auto-increment sequence, giving 2^12 = 4096 values, so one machine can generate 4096 IDs per millisecond. In theory, a single Snowflake node can therefore issue about 4.096 million IDs per second

Snowflake algorithm example code:
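
The capacity figures in the list above follow directly from the bit widths. A minimal sketch that reproduces the arithmetic (the class name SnowflakeCapacityDemo is hypothetical, and a year is taken as 365 days, as in the formula above):

```java
final class SnowflakeCapacityDemo {

    static final int TIMESTAMP_BITS = 41;
    static final int MACHINE_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    /** Number of distinct values that fit in the given number of bits. */
    static long capacity(int bits) {
        return 1L << bits;
    }

    /** Whole years covered by a millisecond timestamp of the given width (365-day years). */
    static long yearsCovered(int timestampBits) {
        long millisPerYear = 1000L * 60 * 60 * 24 * 365;
        return capacity(timestampBits) / millisPerYear;
    }

    public static void main(String[] args) {
        System.out.println(yearsCovered(TIMESTAMP_BITS));   // 69
        System.out.println(capacity(MACHINE_BITS));         // 1024 machines
        System.out.println(capacity(SEQUENCE_BITS));        // 4096 IDs per millisecond
        System.out.println(capacity(SEQUENCE_BITS) * 1000); // 4096000 IDs per second per machine
        System.out.println(1 + TIMESTAMP_BITS + MACHINE_BITS + SEQUENCE_BITS); // 64 bits in total
    }
}
```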

public class SnowflakeIdWorker {

    // ==============================Fields===========================================
    /**
     * Start epoch (2020-11-03). Once chosen it must not change; otherwise the timestamp part may move backwards and IDs may repeat or collide.
     */
    private final long twepoch = 1604374294980L;

    /**
     * Number of bits used for the worker ID
     */
    private final long workerIdBits = 5L;

    /**
     * Number of bits used for the datacenter ID
     */
    private final long datacenterIdBits = 5L;

    /**
     * Maximum supported worker ID, which is 31 (this shift trick quickly computes the largest value that fits in the given number of bits)
     */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

    /**
     * Maximum supported datacenter ID, which is 31
     */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

    /**
     * Number of bits used for the sequence
     */
    private final long sequenceBits = 12L;

    /**
     * The worker ID is shifted left by 12 bits
     */
    private final long workerIdShift = sequenceBits;

    /**
     * The datacenter ID is shifted left by 17 bits (12+5)
     */
    private final long datacenterIdShift = sequenceBits + workerIdBits;

    /**
     * The timestamp is shifted left by 22 bits (5+5+12)
     */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    /**
     * Mask for the sequence, here 4095 (0b111111111111=0xfff=4095)
     */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);

    /**
     * Worker ID (0~31)
     */
    private long workerId;

    /**
     * Datacenter ID (0~31)
     */
    private long datacenterId;

    /**
     * Sequence within the current millisecond (0~4095)
     */
    private long sequence = 0L;

    /**
     * Timestamp of the last generated ID
     */
    private long lastTimestamp = -1L;

    //==============================Constructors=====================================

    /**
     * Default constructor: uses worker ID 0 and datacenter ID 0
     *
     */
    public SnowflakeIdWorker() {
        this.workerId = 0L;
        this.datacenterId = 0L;
    }

    /**
     * Constructor
     *
     * @param workerId     worker ID (0~31)
     * @param datacenterId datacenter ID (0~31)
     */
    public SnowflakeIdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    // ==============================Methods==========================================

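    /**
     * Returns the next ID (this method is thread-safe)
     *
     * @return SnowflakeId
     */
    public synchronized long nextId() {
        long timestamp = timeGen();

        // If the current time is earlier than the timestamp of the last ID, the system clock has moved backwards, so throw an exception
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        // Same millisecond as the last ID: advance the sequence within the millisecond
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // The sequence overflowed within this millisecond
            if (sequence == 0) {
                // Block until the next millisecond to get a new timestamp
                timestamp = tilNextMillis(lastTimestamp);
            }
        }
        // The timestamp changed: reset the sequence
        else {
            sequence = 0L;
        }

        // Remember the timestamp of this ID
        lastTimestamp = timestamp;

        // Shift each part into place and OR them together into a 64-bit ID
        return ((timestamp - twepoch) << timestampLeftShift) //
                | (datacenterId << datacenterIdShift) //
                | (workerId << workerIdShift) //
                | sequence;
    }

    /**
     * Blocks until the next millisecond, that is, until a new timestamp is obtained
     *
     * @param lastTimestamp timestamp of the last generated ID
     * @return the current timestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * Returns the current time in milliseconds
     *
     * @return current time (milliseconds)
     */
    protected long timeGen() {
        return System.currentTimeMillis();
    }

    /**
     * Shared default worker (worker 0, datacenter 0) used by getSnowId().
     * Creating a new worker on every call would reset the sequence and could repeat IDs within one millisecond.
     */
    private static final SnowflakeIdWorker DEFAULT_WORKER = new SnowflakeIdWorker();

    /**
     * Generates an ID as a string using the Snowflake algorithm
     *
     * @return the generated ID in decimal string form
     */
    public static String getSnowId() {
        return String.valueOf(DEFAULT_WORKER.nextId());
    }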

    //=========================================Test=========================================

    /**
     * Test: prints 1000 generated IDs
     */
    public static void main(String[] args) {
        SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0);
        for (int i = 0; i < 1000; i++) {
            long id = idWorker.nextId();
            System.out.println(id);
        }
    }
}

The Snowflake algorithm depends heavily on the machine clock. If the clock is set back, duplicate IDs may be issued. This is usually handled by recording the timestamp of the last generated ID and refusing to issue IDs while the current time is earlier than it, as nextId() does above.
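
One possible refinement of that check, shown here only as a sketch, is to tolerate a small rollback by waiting for the clock to catch up and to reject only large rollbacks. The MonotonicClock class, its 5 ms tolerance in the usage example, and the injectable LongSupplier clock are all assumptions made for illustration; they are not part of the SnowflakeIdWorker code above.

```java
import java.util.function.LongSupplier;

final class ClockRollbackDemo {

    /** Hypothetical guard: tolerates small clock rollbacks by waiting, rejects large ones. */
    static final class MonotonicClock {
        private final LongSupplier clock;
        private final long maxRollbackMillis;
        private long lastTimestamp = -1L;

        MonotonicClock(LongSupplier clock, long maxRollbackMillis) {
            this.clock = clock;
            this.maxRollbackMillis = maxRollbackMillis;
        }

        /** Returns a timestamp that is never smaller than any previously returned one. */
        synchronized long currentMillis() {
            long now = clock.getAsLong();
            if (now < lastTimestamp) {
                long offset = lastTimestamp - now;
                if (offset > maxRollbackMillis) {
                    throw new IllegalStateException("Clock moved backwards by " + offset + " ms");
                }
                // Small rollback: spin until the clock catches up with the last timestamp used.
                while (now < lastTimestamp) {
                    now = clock.getAsLong();
                }
            }
            lastTimestamp = now;
            return now;
        }
    }

    public static void main(String[] args) {
        MonotonicClock guard = new MonotonicClock(System::currentTimeMillis, 5);
        long first = guard.currentMillis();
        long second = guard.currentMillis();
        System.out.println(second >= first); // true
    }
}
```

The guard may return the same millisecond twice, which is fine for Snowflake because the sequence field distinguishes IDs within a millisecond. A production version would likely sleep rather than spin while waiting.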

6. Meituan-Leaf

Open source project link: https://github.com/Meituan-Dianping/Leaf

Leaf supports both the number segment mode and the Snowflake mode, and can switch between them.

The Snowflake mode relies on ZooKeeper. It differs from the original Snowflake algorithm mainly in how the workerId is generated: Leaf derives it from a ZooKeeper sequential node. Each application that uses Leaf-snowflake creates a sequential node in ZooKeeper at startup, so each machine corresponds to one sequential node, that is, one workerId.

The number segment mode optimizes the direct use of database auto-increment IDs as distributed IDs by reducing how often the database is accessed. It fetches auto-increment IDs from the database in batches: each time, a number segment is taken from the database and loaded into memory. For example, (0, 1000] represents 1000 IDs, and the business service then generates IDs from 1 to 1000 locally.

7. Baidu-Uidgenerator

  • Open source project link: https://github.com/baidu/uid-generator
  • Chinese document address: https://github.com/baidu/uid-generator/blob/master/README.zh_cn.md

UidGenerator is Baidu's open-source, Java-based unique ID generator built on the Snowflake algorithm. It works in distributed environments and overcomes the concurrency limits of Snowflake; according to the project, single-instance QPS can exceed 6,000,000. Required environment: JDK 8+ and MySQL (used to assign the WorkerId).

Baidu's UidGenerator adjusts the bit layout. By default it uses 1 sign bit, 28 bits of delta seconds, 22 bits of worker node ID, and 13 bits of sequence.

The time part is only 28 bits and counts seconds, so by default UidGenerator lasts about 8.5 years ((2^28 - 1) / 86400 / 365). The number of bits given to delta seconds, worker node ID, and sequence can be adjusted as needed.
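
The trade-off between lifetime and the other fields can be checked with a few lines of arithmetic. The sketch below assumes the project's default split of 28 bits of delta seconds, 22 bits of worker node ID, and 13 bits of sequence; the 30/20/13 alternative is a hypothetical split chosen only to show the effect, and UidLayoutDemo is not part of UidGenerator.

```java
final class UidLayoutDemo {

    /** Years covered by a seconds-based timestamp of the given width (365-day years). */
    static double yearsCovered(int deltaSecondsBits) {
        long maxSeconds = (1L << deltaSecondsBits) - 1;
        return maxSeconds / 86400.0 / 365.0;
    }

    /** Checks that the sign bit plus the three fields fill exactly 64 bits. */
    static boolean fitsInLong(int deltaSecondsBits, int workerBits, int sequenceBits) {
        return 1 + deltaSecondsBits + workerBits + sequenceBits == 64;
    }

    public static void main(String[] args) {
        // Assumed default split: 28 / 22 / 13, about 8.5 years of IDs.
        System.out.println(fitsInLong(28, 22, 13) + " " + yearsCovered(28));
        // Hypothetical alternative: 30 / 20 / 13, about 34 years but fewer worker IDs.
        System.out.println(fitsInLong(30, 20, 13) + " " + yearsCovered(30));
    }
}
```

Each bit moved into the time field doubles the lifetime and halves the capacity of the field it was taken from.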

8. Didi-TinyID

Open source project link: https://github.com/didi/tinyid

Tinyid is an upgrade based on Meituan Leaf's leaf-segment algorithm. It supports a multi-master database mode and also provides a tinyid-client access method, which makes it more convenient to use.

Unlike Meituan Leaf, Tinyid supports only the number segment mode and not the Snowflake mode. Tinyid offers two calling methods: the HTTP interface provided by tinyid-server, and the tinyid-client library.

9. Comparative summary

Origin blog.csdn.net/u011487470/article/details/130357598