Snowflake Algorithm - How to generate globally unique distributed IDs under high cluster concurrency?

snowflake algorithm

the problem

Why do we need distributed globally unique IDs, and what business requirements do they have to meet?

In complex distributed systems, it is often necessary to uniquely identify a large amount of data and messages:

  • For example, the finance, payment, catering, and hotel businesses of Meituan-Dianping
  • Products such as Maoyan Movies accumulate ever-growing volumes of data; once the database is split into multiple databases and tables, a unique ID is needed to identify each row of data or message
  • Orders, riders, and coupons in particular all need unique IDs as identifiers

At this time, a system that can generate a globally unique ID is very necessary.


Part of the mandatory requirements for ID generation rules

  1. globally unique: Duplicate ID numbers cannot appear. Since it is a unique identifier, this is the most basic requirement.
  2. increasing trend: MySQL's InnoDB engine uses a clustered index. Since most RDBMSs store indexes in B+Tree structures, we should prefer ordered primary keys to preserve write performance.
  3. monotonically increasing: Ensure that the next ID must be greater than the previous ID, such as transaction version number, IM incremental message, sorting and other special requirements.
  4. information security: If IDs are consecutive, malicious crawling becomes trivial: just request the URLs in order. If the ID is an order number it is even more dangerous, since competitors can read off our daily order volume directly. Therefore, in some application scenarios the IDs need to be irregular and hard to guess.
  5. with timestamp: In this way, you can quickly understand the generation time of this distributed ID during development.

Usability Requirements for ID Number Generation Systems

  1. high availability: when a request for a distributed ID arrives, the service must succeed in issuing a unique distributed ID in 99.999% of cases.
  2. low latency: when a request for a distributed ID arrives, the server must return it quickly, with minimal latency.
  3. high QPS: for example, if 100,000 distributed-ID creation requests arrive concurrently, the server must withstand the burst and successfully create all 100,000 IDs.

Common solutions

UUID

UUID.randomUUID(): the standard form of a UUID (Universally Unique Identifier) contains 32 hexadecimal digits, divided by hyphens into 5 segments in the form 8-4-4-4-12, 36 characters in total, for example: 550e8400-e29b-41d4-a716-446655440000

Very high performance: generated locally (in the java.util package), with no network overhead.

Problem: if only uniqueness matters, UUID is fine, but its database insert performance is very poor, because it is unordered.

  1. Unordered: the generation order cannot be predicted, so it cannot produce incrementally ordered numbers

    A distributed ID is usually used as a primary key, and MySQL officially recommends that primary keys be as short as possible; every UUID is long, so it is not recommended.

  2. Using a UUID as the primary key causes problems in specific environments

    For example, as a DB primary key, UUID is a poor fit: MySQL officially recommends that primary keys be as short as possible, and a 36-character UUID does not meet that requirement.

  3. Index, splitting of B+ tree index

    Since the distributed ID is the primary key, it is indexed, and MySQL indexes are implemented as B+ trees. Every insert adjusts the underlying B+ tree to keep queries fast; because UUIDs are unordered, each UUID insert can heavily restructure the primary-key B+ tree, which severely degrades database insert performance.

UUID only guarantees global uniqueness; it satisfies neither the increasing-trend nor the monotonically-increasing requirement, so it is not recommended.
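
To see these properties concretely, here is a minimal JDK-only sketch (the class name is made up for illustration):

```java
import java.util.UUID;

public class UuidDemo {
    public static void main(String[] args) {
        // Generate a few random (version 4) UUIDs: each is globally unique with
        // overwhelming probability, but there is no ordering between them.
        for (int i = 0; i < 3; i++) {
            UUID id = UUID.randomUUID();
            // The string form is always 36 characters: 32 hex digits + 4 hyphens (8-4-4-4-12)
            System.out.println(id + "  length=" + id.toString().length());
        }
    }
}
```

Running it a few times shows that consecutive UUIDs share no ordering at all, which is exactly why they fragment a B+ tree index.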

Database auto-increment primary key

Stand-alone ✅

In a distributed setting, the database auto-increment scheme combines two things: the auto-increment ID itself and MySQL's REPLACE INTO.

REPLACE INTO works like INSERT, except that it first tries to insert the row; if a row with the same primary key or unique index already exists in the table, it deletes the old row first and then inserts the new one. In other words, REPLACE INTO inserts a record, replacing the old data whenever the value of a unique index conflicts.

CREATE TABLE t_test(
	id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
	stub CHAR(1) NOT NULL DEFAULT '',
	UNIQUE KEY stub (stub)
);

SELECT * FROM t_test;

REPLACE INTO t_test (stub) VALUES('b');

Create the table and execute REPLACE INTO for the first time; the number of affected rows is one.

Querying the table shows a single row: id = 1, stub = 'b'.

Execute REPLACE INTO a second time; this time the number of affected rows is two.

Querying again shows why: the original row was deleted, and a new row with primary key 2 was inserted.

At this point the scheme satisfies: uniqueness, increasing trend, monotonicity.

In a distributed deployment with modest concurrency, this scheme can be used to obtain globally unique IDs.
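
The delete-then-insert behaviour described above can be sketched in plain Java, simulating the t_test table in memory rather than talking to MySQL (the class and method names are invented for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simulates a table with an AUTO_INCREMENT primary key `id` and a
// UNIQUE KEY on `stub`, the way t_test above behaves under REPLACE INTO.
public class ReplaceIntoDemo {
    private final Map<String, Long> rowsByStub = new LinkedHashMap<>();
    private long autoIncrement = 0;

    /** Returns the number of affected rows, like MySQL's REPLACE INTO. */
    public int replaceInto(String stub) {
        int affected = 1;
        if (rowsByStub.containsKey(stub)) {
            rowsByStub.remove(stub); // the conflicting row is deleted first
            affected = 2;            // delete + insert => 2 affected rows
        }
        rowsByStub.put(stub, ++autoIncrement); // the new row gets the next id
        return affected;
    }

    public long currentId(String stub) { return rowsByStub.get(stub); }

    public static void main(String[] args) {
        ReplaceIntoDemo t = new ReplaceIntoDemo();
        System.out.println(t.replaceInto("b")); // 1 affected row, id = 1
        System.out.println(t.replaceInto("b")); // 2 affected rows
        System.out.println(t.currentId("b"));   // 2
    }
}
```

The second call reports two affected rows because the conflicting row is removed before a new row, carrying a fresh auto-increment id, is inserted.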

Cluster distributed ❌

Is the database auto-increment mechanism suitable for generating distributed IDs? Not really.

  • Horizontal scaling is difficult. Once the step size and the number of machines are fixed, what happens when a machine must be added?

    Suppose one machine issues 1, 2, 3, 4, 5 (step size 1) and a second machine must be added. You could set the second machine's initial value far above the first, which seems workable; but if there are already 100 machines online, expanding capacity becomes a nightmare. Horizontal scaling of this scheme is therefore complicated and hard to implement.

  • The pressure on the database remains high. Every ID requires a database read and write, which hurts performance and violates the low-latency and high-QPS requirements for distributed IDs (under high concurrency, fetching every ID from the database severely limits throughput).

  • Using the database auto-increment ID as the primary key in the scenario of sub-database and sub-table may encounter the following problems:

    1. Uniqueness conflict: In the environment of sharding databases and sharding tables, different database instances or tables may have their own auto-increment ID sequences, which may cause conflicts in the generated IDs. For example, if two database instances have an ID auto-increment column that increments from 1, the same ID may be generated, causing a uniqueness conflict.
    2. Data consistency is difficult to guarantee: When data is scattered in different databases and tables, queries and transaction operations need to be performed across multiple databases. This brings some challenges, such as maintaining data consistency and transaction isolation. The auto-increment ID is no longer globally unique, but unique within each shard, so special design and processing are required to ensure data consistency.
    3. Data Migration and Scaling Difficulty: When shards need to be added or data migrated, auto-increment IDs can become a limitation since they are used as identifiers for data. Reassigning ID ranges, resolving conflicts, and maintaining correct associations can become complex and time-consuming.

Redis-based global ID generation strategy

Stand-alone ✅

Because Redis is single-threaded, it naturally guarantees atomicity; the atomic operations INCR and INCRBY can be used to generate IDs.

SET count 10
INCR count      // returns 11
INCRBY count 5  // returns 16

Cluster distributed ✅❌

Note: with a Redis cluster, different growth step sizes must be configured, just as with MySQL, and the keys must be given an appropriate expiration policy.

A Redis cluster can be used for higher throughput. Assuming the cluster has 5 Redis nodes, initialize their counters to 1, 2, 3, 4, and 5 respectively, and set the step size to 5.

The ID generated by each Redis is:

A:1 6 11 16 21
B:2 7 12 17 22
C:3 8 13 18 23
D:4 9 14 19 24
E:5 10 15 20 25

The problem is that maintaining and configuring a Redis cluster is cumbersome: you need to guard against single points of failure, set up the sentinel mechanism, and so on.

Introducing an entire Redis cluster just to generate IDs feels like killing a chicken with a sledgehammer, so it is not recommended.
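
For reference, the initial-value-plus-step scheme itself can be sketched locally with an AtomicLong standing in for each Redis node's counter (illustration only; on a real node each nextId would be an INCRBY, and the class name is invented):

```java
import java.util.concurrent.atomic.AtomicLong;

public class StepIdGenerator {
    private final AtomicLong counter;
    private final long step;

    // initialValue: this node's starting id (1..5 above); step: total node count
    public StepIdGenerator(long initialValue, long step) {
        // store initialValue - step so the first nextId() returns initialValue
        this.counter = new AtomicLong(initialValue - step);
        this.step = step;
    }

    public long nextId() {
        return counter.addAndGet(step); // on Redis this would be INCRBY key step
    }

    public static void main(String[] args) {
        StepIdGenerator nodeA = new StepIdGenerator(1, 5);
        StepIdGenerator nodeB = new StepIdGenerator(2, 5);
        for (int i = 0; i < 5; i++) System.out.print(nodeA.nextId() + " "); // 1 6 11 16 21
        System.out.println();
        for (int i = 0; i < 5; i++) System.out.print(nodeB.nextId() + " "); // 2 7 12 17 22
    }
}
```

Because no two nodes share the same residue modulo the step, the sequences never overlap.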

Snowflake (snowflake algorithm)

The three solutions above: UUID, database auto-increment primary keys, and Redis-based global ID generation, can all be made to work, but each has its own trade-offs.

Next, let's take a look at the snowflake algorithm.

overview

Snowflake is Twitter's distributed auto-increment ID algorithm.

Twitter originally migrated its storage from MySQL to Cassandra (an open-source distributed NoSQL database initially developed at Facebook). Because Cassandra has no sequential ID generation mechanism, Twitter developed this globally unique ID generation service.

In tests, SnowFlake can generate roughly 260,000 sortable, increasing IDs per second.

  1. IDs generated by SnowFlake are ordered by time
  2. A generated ID is a 64-bit integer, i.e. a Java long (at most 19 digits when rendered as a string)
  3. IDs never collide across the distributed system (nodes are distinguished by the datacenterId data center and the workerID machine code), and generation is efficient

In a distributed system, some scenarios require a globally unique ID, with these basic requirements:

  1. it must be globally unique in a distributed environment
  2. it should generally increase monotonically: unique IDs usually end up in a database, and InnoDB stores rows in the leaf nodes of the primary-key index, ordered left to right, so for database performance a monotonically increasing ID is best. A 36-character UUID avoids conflicts, but it is long and generally unordered, which is why it falls short.
  3. it may also need to be non-guessable: if the unique ID doubles as an order number, it must not reveal how many orders are placed per day

structure

Several core components of the snowflake algorithm:

(Bit layout: 1 sign bit | 41 timestamp bits | 5 datacenterId bits | 5 workerId bits | 12 sequence bits)

Number segment analysis:

  1. 1 bit - sign bit

    Unused. The highest bit in binary is the sign bit: 1 means negative and 0 means positive. Generated IDs should be positive, so the highest bit is fixed at 0.

  2. 41 bits - timestamp, recording the time in milliseconds

    Used for non-negative integers only (non-negative numbers in computing include 0), 41 bits can represent the range 0 to 2^41 - 1; the minus 1 appears because the representable range starts from 0, not from 1.

    In other words, 41 bits cover 2^41 - 1 milliseconds, which converts to (2^41 - 1) / (1000 * 60 * 60 * 24 * 365) ≈ 69 years.

  3. 10 bits - worker machine ID

    Allows deployment on 2^10 = 1024 nodes, split into 5 bits of datacenterId (data center / machine room) and 5 bits of workerID (machine code).

    The largest value 5 bits can hold is 2^5 - 1 = 31, i.e. the 32 numbers [0, 1, 2, 3, ..., 31] can denote different data centers and machine codes.

  4. 12 bits - sequence number

    12 bits can hold 2^12 = 4096 values, i.e. the numbers [0, 1, 2, ..., 4095], representing up to 4096 ID sequence numbers generated by the same machine within the same timestamp (millisecond).

SnowFlake guarantees:

  • All generated IDs trend upward over time
  • No duplicate IDs appear anywhere in the distributed system, because datacenterId and workerId distinguish the nodes
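
The capacities quoted above can be verified with a few lines of bit arithmetic; `-1L ^ (-1L << n)` is the usual trick for computing the largest value an n-bit field can hold (a standalone sketch, class name invented):

```java
public class SnowflakeCapacity {
    // -1L ^ (-1L << n) yields the largest value an n-bit field can hold
    static long maxOf(long bits) {
        return -1L ^ (-1L << bits);
    }

    public static void main(String[] args) {
        System.out.println(maxOf(5));  // 31   -> workerId / datacenterId range 0..31
        System.out.println(maxOf(10)); // 1023 -> 1024 deployable nodes
        System.out.println(maxOf(12)); // 4095 -> 4096 ids per node per millisecond
        // 41 timestamp bits last about 69 years:
        System.out.println(maxOf(41) / (1000L * 60 * 60 * 24 * 365)); // 69
    }
}
```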

source code

Twitter's original snowflake implementation is written in Scala, but others have ported it to Java: GitHub address

package com.example.demo.util;

/**
 * @author: yuanyuan.jing
 * @date: 2018/11/30 11:09
 * @description: snowflake algorithm for generating globally unique ids
 *
 * Twitter_Snowflake
 * Snowflake layout (parts separated by "-"):
 * 0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000
 * 1 sign bit: long is signed in Java and the highest bit is the sign bit (0 positive, 1 negative); ids are positive, so it is fixed at 0.
 * 41 timestamp bits (milliseconds): they store not the current time but the offset (current timestamp - start timestamp);
 * the start timestamp is usually when the id generator went into service, chosen by the program (the twepoch field below).
 * 41 bits are good for 69 years: T = (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69.
 * 10 machine bits: deployable on 1024 nodes, 5 bits of datacenterId plus 5 bits of workerId.
 * 12 sequence bits: an in-millisecond counter letting each node (same machine, same timestamp) generate 4096 ids per millisecond.
 * 64 bits in total, exactly one long.
 * Snowflake's strengths: ids sort by time overall, no id collisions occur across the distributed system
 * (the datacenter id and machine id tell nodes apart), and generation is efficient: about 260,000 ids per second in tests.
 */
public class SnowflakeIdWorker {

    // ============================== Fields ==============================
    /** Start epoch (2015-01-01) */
    private final long twepoch = 1420041600000L;

    /** Number of bits taken by the machine id */
    private final long workerIdBits = 5L;

    /** Number of bits taken by the datacenter id */
    private final long datacenterIdBits = 5L;

    /** Maximum supported machine id: 31 (this shift trick quickly computes the largest decimal value an n-bit binary number can represent) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

    /** Maximum supported datacenter id: 31 */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

    /** Number of bits taken by the sequence */
    private final long sequenceBits = 12L;

    /** Machine id is shifted left by 12 bits */
    private final long workerIdShift = sequenceBits;

    /** Datacenter id is shifted left by 17 bits (12 + 5) */
    private final long datacenterIdShift = sequenceBits + workerIdBits;

    /** Timestamp is shifted left by 22 bits (12 + 5 + 5) */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    /** Mask for the sequence: 4095 (0b111111111111 = 0xfff = 4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);

    /** Worker machine id (0~31) */
    private long workerId;

    /** Datacenter id (0~31) */
    private long datacenterId;

    /** In-millisecond sequence (0~4095) */
    private long sequence = 0L;

    /** Timestamp of the last generated id */
    private long lastTimestamp = -1L;

    // ============================== Constructors ==============================
    /**
     * @param workerId worker id (0~31)
     * @param datacenterId datacenter id (0~31)
     */
    public SnowflakeIdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    // ============================== Methods ==============================
    /**
     * Get the next id (this method is thread-safe).
     * @return SnowflakeId
     */
    public synchronized long nextId() {
        long timestamp = timeGen();

        // If the current time is before the last id's timestamp, the system clock
        // has moved backwards, and an exception should be thrown.
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        // Same millisecond as the last id: advance the in-millisecond sequence
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // the sequence overflowed within this millisecond
            if (sequence == 0) {
                // block until the next millisecond to obtain a fresh timestamp
                timestamp = tilNextMillis(lastTimestamp);
            }
        }
        // The timestamp changed: reset the in-millisecond sequence
        else {
            sequence = 0L;
        }

        // remember the timestamp of this id
        lastTimestamp = timestamp;

        // shift the parts into place and OR them together into a 64-bit id
        return ((timestamp - twepoch) << timestampLeftShift)
                | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
    }

    /**
     * Block until the next millisecond, i.e. until a new timestamp is obtained.
     * @param lastTimestamp timestamp of the last generated id
     * @return the current timestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * @return the current time in milliseconds
     */
    protected long timeGen() {
        return System.currentTimeMillis();
    }

    // ============================== Test ==============================
    public static void main(String[] args) {
        SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0);
        for (int i = 0; i < 100; i++) {
            long id = idWorker.nextId();
            //System.out.println(Long.toBinaryString(id));
            System.out.println(id);
        }
    }
}
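
Going the other way, an ID can be decomposed back into its fields with the same shifts and masks; the sketch below hard-codes the 12/5/5 layout and the 2015-01-01 epoch (class and method names are invented for illustration):

```java
public class SnowflakeDecode {
    // Mirrors the layout above: 12 sequence bits, 5 worker bits,
    // 5 datacenter bits, 41 timestamp bits, epoch 2015-01-01.
    static final long TWEPOCH = 1420041600000L;

    public static long sequence(long id)     { return id & 0xFFF; }
    public static long workerId(long id)     { return (id >> 12) & 0x1F; }
    public static long datacenterId(long id) { return (id >> 17) & 0x1F; }
    public static long timestamp(long id)    { return (id >> 22) + TWEPOCH; }

    public static void main(String[] args) {
        // Compose an id by hand: timestamp offset 1, datacenter 3, worker 5, sequence 7
        long id = (1L << 22) | (3L << 17) | (5L << 12) | 7L;
        System.out.println(datacenterId(id)); // 3
        System.out.println(workerId(id));     // 5
        System.out.println(sequence(id));     // 7
    }
}
```

Decomposing an ID like this is handy for debugging, e.g. checking which machine issued a suspicious order number.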

In real work we would not use this code directly; it is shown here for learning. In projects we use the Hutool toolkit instead.

Using it in a real project

Hutool toolkit: https://github.com/looly/hutool

Integrating the snowflake algorithm with Spring Boot

  • pom.xml

    <dependency>
        <groupId>cn.hutool</groupId>
        <artifactId>hutool-all</artifactId>
        <version>4.6.8</version>
    </dependency>
    
  • New tool class

    package com.atguigu.boot.util;
    
    import cn.hutool.core.lang.Snowflake;
    import cn.hutool.core.net.NetUtil;
    import cn.hutool.core.util.IdUtil;
    import lombok.extern.slf4j.Slf4j;
    import org.springframework.stereotype.Component;
    import javax.annotation.PostConstruct;
    
    @Slf4j
    @Component
    public class IdGeneratorSnowflake {
    
        private long workerId = 0;
        private long datacenterId = 1;
        private Snowflake snowflake = IdUtil.createSnowflake(workerId, datacenterId);
    
        @PostConstruct
        public void init() {
            try {
                // derive this node's workerId from its IP address
                workerId = NetUtil.ipv4ToLong(NetUtil.getLocalhostStr());
                log.info("workerId of the current machine: {}", workerId);
            } catch (Exception e) {
                log.warn("failed to obtain the workerId of the current machine", e);
                workerId = NetUtil.getLocalhostStr().hashCode();
            }
            // Snowflake only accepts workerId/datacenterId in the 0~31 range, so clamp
            // the derived value and rebuild the generator (otherwise the field
            // initializer above would keep using the default workerId of 0).
            workerId = workerId & 31;
            snowflake = IdUtil.createSnowflake(workerId, datacenterId);
        }
    
        public synchronized long snowflakeId() {
            return snowflake.nextId();
        }
    
        public synchronized long snowflakeId(long workerId, long datacenterId) {
            Snowflake snowflake = IdUtil.createSnowflake(workerId, datacenterId);
            return snowflake.nextId();
        }
    
        // quick local test
        public static void main(String[] args) {
            System.out.println(new IdGeneratorSnowflake().snowflakeId());
        }
    }
    
    


  • used in business class

    @Slf4j
    @Service
    public class OrderService {
    
        @Autowired
        private IdGeneratorSnowflake idGeneratorSnowflake;
    
        public String getIDBySnowFlake() {
            // use a thread pool to create several ids concurrently
            ExecutorService threadPool = Executors.newFixedThreadPool(5);
            for (int i = 1; i <= 20; i++) {
                threadPool.submit(() -> {
                    log.info("{}", idGeneratorSnowflake.snowflakeId());
                });
            }
            threadPool.shutdown();
            return "hello snowflake";
        }
    }
    

    The generated IDs are all unique, but in a multi-threaded scenario the order in which they are printed differs from the order of submission.

Advantages and disadvantages

advantages

  • The millisecond timestamp occupies the high bits and the auto-increment sequence the low bits, so the whole ID trends upward over time
  • It does not depend on a third-party system such as a database; deployed as a service, it is more stable, and ID generation performance is high
  • Bits can be allocated according to your own business characteristics, which is very flexible

shortcomings

  • It relies on the machine clock: if the clock moves backwards, duplicate IDs may be generated
  • IDs are increasing on a single machine, but in a distributed environment the clocks of different machines are never perfectly synchronized, so the sequence is not always globally increasing

other supplements

To solve the clock problem thoroughly, major companies have open-sourced mature solutions that optimize the snowflake algorithm, for example Baidu's UidGenerator and Meituan's Leaf.

Reference: Shang Silicon Valley (bilibili)


Origin blog.csdn.net/weixin_53407527/article/details/131700381