Summary of common distributed ID solutions: databases, algorithms, open source components

Distributed ID

A distributed ID is a globally unique identifier generated in a distributed system to identify entities or data objects. Because storage, computation, and processing are spread across different nodes, the system needs a reliable way to identify and track these objects.

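Minimum requirements for a distributed ID:

Globally unique: global uniqueness of the ID is the first requirement to satisfy

High performance: IDs must be generated quickly, with little consumption of local resources

High availability: the service that generates IDs must have availability as close to 100% as possible

Easy to use: it works out of the box, is convenient to use, and is quick to integrate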

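A good distributed ID additionally offers:

Security: the ID contains no sensitive information

Ordered and increasing: if IDs are stored in a database, ordered IDs improve write speed and make it easy to sort by ID

Business meaning: if the ID carries a specific business meaning, troubleshooting and development become more transparent (the ID alone shows which business it belongs to)

Independent deployment: the distributed system has a dedicated ID issuing service used only for generating distributed IDs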

Database-based distributed ID schemes

Database primary key auto-increment

A database auto-increment ID is implemented by declaring an auto-increment ID column when the table is created. Whenever a record is inserted, the database automatically generates a unique ID for it.

Database auto-increment guarantees uniqueness well, but in a high-concurrency, large-scale distributed system it easily becomes a performance bottleneck. In addition, auto-increment only guarantees uniqueness within a single database, so once the data is split across multiple databases and tables (sharding), generating IDs on multiple machines requires extra work.

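In short:

Simple and convenient; IDs are ordered and increasing, which makes sorting and paging easy

Concurrency is limited by the performance of the database

With sharded databases and tables the scheme must be reworked, which is fairly complex

Sequential IDs leak data volume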

Database number segment mode

With primary key auto-increment, every ID requires a round trip to the database, which puts heavy pressure on it. Number segment mode instead fetches IDs in batches, keeps them in memory, and hands them out directly from memory when needed.

Primary key auto-increment: one ID per database access

1, 2, 3, ...

Number segment mode: one number segment per database access

Upper bounds recorded in the database: 100, 200, 300

Segments handed out: 1...100, 101...200, 201...300

Compared with primary key auto-increment, number segment mode performs better, and the IDs are still increasing.
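The segment idea above can be sketched in Java as follows. This is a minimal illustration, not a production design: the nested `Store` class is a hypothetical in-memory stand-in for the database row that records the highest ID reserved so far, and the class and method names are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hands out IDs from an in-memory segment and only contacts the store when the segment runs out. */
final class SegmentIdGenerator {

    /** Hypothetical stand-in for the database row holding the highest reserved ID. */
    static final class Store {
        private final AtomicLong maxId = new AtomicLong(0);

        /** Reserves the next `step` IDs and returns the new upper bound (inclusive). */
        long reserve(int step) {
            return maxId.addAndGet(step);
        }
    }

    private final Store store;
    private final int step;
    private long next = 1; // next ID to hand out
    private long max = 0;  // last ID of the current segment (inclusive)

    SegmentIdGenerator(Store store, int step) {
        this.store = store;
        this.step = step;
    }

    synchronized long nextId() {
        if (next > max) {              // segment exhausted, or nothing loaded yet
            max = store.reserve(step); // one "database" access for `step` IDs
            next = max - step + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        Store store = new Store();
        SegmentIdGenerator a = new SegmentIdGenerator(store, 100);
        SegmentIdGenerator b = new SegmentIdGenerator(store, 100);
        System.out.println(a.nextId()); // 1   (a holds segment 1...100)
        System.out.println(b.nextId()); // 101 (b holds segment 101...200)
    }
}
```

Each generator touches the shared store once per 100 IDs instead of once per ID, which is where the performance gain comes from. IDs from different generators interleave, so they are increasing per generator rather than strictly ordered overall.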

Redis auto-increment

Redis can generate distributed IDs with its auto-increment commands. A common approach is to call INCR on a dedicated key and use the returned value as the ID. INCR is atomic, so it is safe under concurrent access and can be used in distributed systems.

Even with AOF and RDB persistence, data can still be lost, which may lead to duplicate IDs

Performance is good and the generated IDs are ordered and increasing, but sequential IDs leak data volume

MongoDB

ObjectId is a built-in data type in MongoDB that uniquely identifies a document.

It consists of 12 bytes: the first 4 bytes are a timestamp in seconds, the next 3 bytes are a machine ID, the next 2 bytes are a process ID, and the last 3 bytes are an incrementing counter that starts from a random value. (Newer MongoDB versions replace the machine ID and process ID with a single 5-byte random value.)

Advantages and disadvantages:

The generated IDs are ordered and increasing

If the machine clock is wrong, duplicate IDs may be generated

ID generation follows a regular pattern, which is a security concern
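The timestamp prefix is one reason ObjectIds are both roughly time-ordered and partly predictable. As a small illustration using only the JDK (the class and method names are made up for this example, and no MongoDB driver is involved), the creation time can be read back from the first 8 hexadecimal characters:

```java
import java.time.Instant;

final class ObjectIdTimestamp {

    /** Returns the creation time stored in the first 4 bytes (8 hex characters) of an ObjectId. */
    static Instant creationTime(String objectIdHex) {
        if (objectIdHex.length() != 24) {
            throw new IllegalArgumentException("an ObjectId is 12 bytes, i.e. 24 hex characters");
        }
        long epochSeconds = Long.parseLong(objectIdHex.substring(0, 8), 16);
        return Instant.ofEpochSecond(epochSeconds);
    }

    public static void main(String[] args) {
        // 0x507f1f77 = 1350508407 seconds since the Unix epoch
        System.out.println(creationTime("507f1f77bcf86cd799439011")); // 2012-10-17T21:13:27Z
    }
}
```

Anyone who sees such an ID can recover when the document was created, which is worth keeping in mind before exposing ObjectIds in URLs.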

Algorithm-based distributed ID schemes

UUID

A UUID (universally unique identifier) is generated according to a set of standard algorithms that guarantee uniqueness on a global scale. UUID generation does not depend on any central node, so it guarantees ID uniqueness well in a distributed system. The disadvantage is that the generated ID is relatively long, which is bad for indexing and querying.

The Open Software Foundation (OSF) specification defines the elements used to build a UUID: the network card MAC address, a timestamp, a namespace, a random or pseudo-random number, and a clock sequence.

Advantages and disadvantages:

Generated locally without network I/O, so it is fast

Unordered: the generation order cannot be predicted

Large storage footprint (a string of 32 hexadecimal characters, 128 bits)

Cannot produce increasing, ordered numbers

For time-based versions, if the machine clock is wrong, duplicate IDs may be generated
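In Java, the points above can be seen with the JDK's `java.util.UUID`. This is only a small sketch: `UUID.randomUUID()` produces a random (version 4) UUID, which does not use the clock or MAC address, and the `compactUuid` helper name is an assumption for this example.

```java
import java.util.UUID;

final class UuidDemo {

    /** Generates a random (version 4) UUID and strips the hyphens, leaving 32 hex characters. */
    static String compactUuid() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id);            // 36 characters: 32 hex digits plus 4 hyphens
        System.out.println(id.version());  // 4 for randomUUID()
        System.out.println(compactUuid()); // 32 hex characters = 128 bits
    }
}
```

Two consecutive calls return unrelated values, which is why such IDs are poorly suited to an ordered primary key index.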

Snowflake

Snowflake is a distributed ID generation algorithm proposed by Twitter. It can generate unique IDs on multiple machines and supports high-concurrency, large-scale distributed systems, provided that each combination of data center ID and machine ID is unique.

It works by dividing a 64-bit long ID into a sign bit plus 4 fields: timestamp, data center ID, machine ID, and sequence number.

The timestamp occupies 41 bits and lasts about 69 years. The data center ID and machine ID occupy 5 bits each, which supports 32 data centers with 32 machines each. The sequence number occupies 12 bits, so each node can generate 4096 IDs per millisecond.

More specifically, the generated 64-bit ID has 5 parts:

1-bit sign - 41-bit timestamp - 5-bit data center ID - 5-bit machine ID - 12-bit sequence number


Time range

2^41/(365*24*60*60*1000) ≈ 69 years

Number of worker nodes

5+5: data center ID + machine ID

2^10=1024

Sequence numbers per millisecond

2^12=4096

| Section | Purpose | Description |
|---|---|---|
| 1 bit | Reserved | Java's long is signed and the highest bit is the sign bit (0 for positive, 1 for negative), so it stays 0 |
| 41 bits | Timestamp, in milliseconds | Stores the difference (current timestamp - start timestamp); the range is about 69.73 years |
| 5 bits | Data center ID | Supports up to 2^5 (32) data centers |
| 5 bits | Machine ID | Supports up to 2^5 (32) machines per data center |
| 12 bits | Per-millisecond counter | Each node generates up to 2^12 (4096) IDs per millisecond |
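The bit layout can be checked with a short Java sketch that packs the four fields into a long and unpacks them again. The class, constant, and method names here are illustrative only, and the timestamp argument is assumed to be milliseconds already measured from the chosen start epoch.

```java
final class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;
    static final int TIMESTAMP_BITS = 41;

    static final int WORKER_SHIFT = SEQUENCE_BITS;                         // 12
    static final int DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS;        // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    /** Returns a mask with the lowest `bits` bits set. */
    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    /** Packs the four fields into one 64-bit ID; the sign bit stays 0. */
    static long compose(long millisSinceEpoch, long datacenterId, long workerId, long sequence) {
        return (millisSinceEpoch << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    static long timestampOf(long id)  { return (id >>> TIMESTAMP_SHIFT) & mask(TIMESTAMP_BITS); }
    static long datacenterOf(long id) { return (id >>> DATACENTER_SHIFT) & mask(DATACENTER_BITS); }
    static long workerOf(long id)     { return (id >>> WORKER_SHIFT) & mask(WORKER_BITS); }
    static long sequenceOf(long id)   { return id & mask(SEQUENCE_BITS); }

    public static void main(String[] args) {
        long id = compose(1000L, 3, 7, 42);
        System.out.println(timestampOf(id) + " " + datacenterOf(id) + " "
                + workerOf(id) + " " + sequenceOf(id));                    // 1000 3 7 42
        System.out.println(mask(SEQUENCE_BITS) + 1);                       // 4096 IDs per node per millisecond
        System.out.println(mask(WORKER_BITS + DATACENTER_BITS) + 1);       // 1024 nodes
        double years = (mask(TIMESTAMP_BITS) + 1) / (365.0 * 24 * 60 * 60 * 1000);
        System.out.println(years);                                         // about 69.73
    }
}
```

Because the timestamp sits in the highest bits, comparing two IDs numerically compares their timestamps first, which is what makes Snowflake IDs roughly time-ordered.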

With the default layout, the 41-bit timestamp lasts about 69 years from the chosen start epoch (until roughly 2080 for Twitter's November 2010 epoch), the 10-bit worker ID supports 1024 machines, and the 12-bit sequence allows 4096 increasing IDs per millisecond on each machine. The advantages of Snowflake are that IDs are roughly ordered by time, that no ID collisions occur across the distributed system as long as each data center ID and machine ID pair is unique, and that generation is efficient. According to tests, Snowflake can generate about 260,000 IDs per second.

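Advantages and disadvantages:

Generation is fast, the generated IDs are ordered and increasing, and the scheme is flexible

It depends on the clock: if the machine time is wrong, duplicate IDs may be generated

Using the Snowflake algorithm

IdWorker utility class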

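import java.lang.management.ManagementFactory;
import java.net.InetAddress;
import java.net.NetworkInterface;

/**
 * Java implementation of Twitter's Snowflake algorithm.
 * Generates distributed, increasing IDs.
 */
public class IdWorker {

    // Start epoch used as the baseline, usually a recent system time (must never change once chosen)
    private final static long twepoch = 1288834974657L;
    // Number of bits for the machine ID
    private final static long workerIdBits = 5L;
    // Number of bits for the data center ID
    private final static long datacenterIdBits = 5L;
    // Maximum machine ID (31)
    private final static long maxWorkerId = -1L ^ (-1L << workerIdBits);
    // Maximum data center ID (31)
    private final static long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    // Number of bits for the per-millisecond sequence
    private final static long sequenceBits = 12L;
    // The machine ID is shifted left by 12 bits
    private final static long workerIdShift = sequenceBits;
    // The data center ID is shifted left by 17 bits
    private final static long datacenterIdShift = sequenceBits + workerIdBits;
    // The timestamp is shifted left by 22 bits
    private final static long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    // Mask that keeps the sequence within 12 bits (4095)
    private final static long sequenceMask = -1L ^ (-1L << sequenceBits);
    // Timestamp of the last generated ID (kept per instance, like the sequence)
    private long lastTimestamp = -1L;
    // Sequence within the current millisecond, starting at 0
    private long sequence = 0L;

    // Machine ID part
    private final long workerId;
    // Data center ID part
    private final long datacenterId;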
    public IdWorker() {
        this.datacenterId = getDatacenterId(maxDatacenterId);
        this.workerId = getMaxWorkerId(datacenterId, maxWorkerId);
    }

    /**
     * @param workerId     machine ID
     * @param datacenterId data center ID
     */
    public IdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    /**
     * Returns the next ID.
     *
     * @return the next unique ID
     */
    public synchronized long nextId() {
        long timestamp = timeGen();
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        if (lastTimestamp == timestamp) {
            // Same millisecond: increment the sequence
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next millisecond
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        // Combine the shifted parts into the final ID
        long nextId = ((timestamp - twepoch) << timestampLeftShift)
                | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift) | sequence;

        return nextId;
    }

    // Busy-waits until the clock moves past lastTimestamp
    private long tilNextMillis(final long lastTimestamp) {
        long timestamp = this.timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = this.timeGen();
        }
        return timestamp;
    }

    private long timeGen() {
        return System.currentTimeMillis();
    }

    /**
     * Derives a machine ID from the data center ID and the JVM process ID.
     */
    protected static long getMaxWorkerId(long datacenterId, long maxWorkerId) {
        StringBuilder mpid = new StringBuilder();
        mpid.append(datacenterId);
        String name = ManagementFactory.getRuntimeMXBean().getName();
        if (!name.isEmpty()) {
            // The runtime name usually looks like "pid@hostname"; keep the PID part
            mpid.append(name.split("@")[0]);
        }
        // Take the low 16 bits of the hash code of (data center ID + PID)
        return (mpid.toString().hashCode() & 0xffff) % (maxWorkerId + 1);
    }

    /**
     * Derives a data center ID from the MAC address of the local network interface.
     */
    protected static long getDatacenterId(long maxDatacenterId) {
        long id = 0L;
        try {
            InetAddress ip = InetAddress.getLocalHost();
            NetworkInterface network = NetworkInterface.getByInetAddress(ip);
            byte[] mac = (network == null) ? null : network.getHardwareAddress();
            if (mac == null || mac.length < 2) {
                // No usable interface or MAC address (for example loopback): fall back to 1
                id = 1L;
            } else {
                id = ((0x000000FF & (long) mac[mac.length - 1])
                        | (0x0000FF00 & (((long) mac[mac.length - 2]) << 8))) >> 6;
                id = id % (maxDatacenterId + 1);
            }
        } catch (Exception e) {
            System.out.println(" getDatacenterId: " + e.getMessage());
        }
        return id;
    }


    public static void main(String[] args) {
        IdWorker idWorker = new IdWorker(0, 0);
        for (int i = 0; i < 10000; i++) {
            long nextId = idWorker.nextId();
            System.out.println(nextId);
        }
    }

}

Configure the distributed ID generator

Add the configuration to application.yml

workerId: 0
datacenterId: 0

Register IdWorker in the Spring container

    // Place these members inside a @Configuration class
    @Value("${workerId}")
    private Integer workerId;

    @Value("${datacenterId}")
    private Integer datacenterId;

    @Bean
    public IdWorker idWorker() {
        return new IdWorker(workerId, datacenterId);
    }

Open source components for distributed IDs

uid-generator (Baidu)

UidGenerator is a unique ID generator open sourced by Baidu. It is based on Snowflake and improves on it.

GitHub: https://github.com/baidu/uid-generator

Tinyid

Tinyid is a unique ID generator based on the database number segment mode open sourced by Didi.

GitHub: https://github.com/didi/tinyid

Leaf (Meituan)

Leaf is a distributed ID solution open sourced by Meituan. It provides two modes for generating distributed IDs: number segment mode and Snowflake mode.

Leaf currently serves many of Meituan-Dianping's internal business lines, including finance, catering, food delivery, hotel and travel, and Maoyan Movies. On a 4-core, 8 GB VM, called through the company's RPC framework, load tests show a QPS of nearly 50,000 with a TP999 of 1 ms.

Leaf design document: https://tech.meituan.com/2017/04/21/mt-leaf.html

GitHub: https://github.com/Meituan-Dianping/Leaf

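Comparison of the three

Baidu (uid-generator): supports only the Snowflake algorithm

Didi (Tinyid): supports only database number segments; offers multiple databases, high availability, and a Java client; suited to scenarios that need highly available IDs

Meituan (Leaf): supports both number segment mode and Snowflake mode; suited to a variety of distributed ID scenarios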

Using the Leaf component

Build from source

git clone git@github.com:Meituan-Dianping/Leaf.git
cd Leaf
git checkout feature/spring-boot-starter
mvn clean install -Dmaven.test.skip=true

Add dependencies

Leaf currently uses the 2.0.1.RELEASE version of the Spring Boot starter parent
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.1.RELEASE</version>
    </parent>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Leaf, built from source and installed into the local Maven repository -->
        <dependency>
            <artifactId>leaf-boot-starter</artifactId>
            <groupId>com.sankuai.inf.leaf</groupId>
            <version>1.0.1-RELEASE</version>
        </dependency>
        <!--zk-->
        <dependency>
            <groupId>org.apache.curator</groupId>
            <artifactId>curator-recipes</artifactId>
            <version>2.6.0</version>
            <exclusions>
                <exclusion>
                    <artifactId>log4j</artifactId>
                    <groupId>log4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

Leaf configuration parameters

Leaf provides two ways of generating IDs: number segment mode and Snowflake mode. Both can be enabled at the same time, or only one of them; both are disabled by default.

| Configuration item | Meaning | Default |
|---|---|---|
| leaf.name | Leaf service name | |
| leaf.segment.enable | Whether to enable number segment mode | false |
| leaf.jdbc.url | MySQL database URL | |
| leaf.jdbc.username | MySQL username | |
| leaf.jdbc.password | MySQL password | |
| leaf.snowflake.enable | Whether to enable Snowflake mode | false |
| leaf.snowflake.zk.address | ZooKeeper address in Snowflake mode | |
| leaf.snowflake.port | Service registration port in Snowflake mode | |

Number segment mode configuration

To use number segment mode, create the database table and configure leaf.jdbc.url, leaf.jdbc.username, and leaf.jdbc.password.

If you do not want to use this mode, set leaf.segment.enable=false.

CREATE DATABASE leaf;
USE leaf;

CREATE TABLE `leaf_alloc` (
  `biz_tag` varchar(128)  NOT NULL DEFAULT '',
  `max_id` bigint(20) NOT NULL DEFAULT '1',
  `step` int(11) NOT NULL,
  `description` varchar(256)  DEFAULT NULL,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`biz_tag`)
) ENGINE=InnoDB;

insert into leaf_alloc(biz_tag, max_id, step, description) values('leaf-segment-test', 1, 2000, 'Test leaf Segment Mode Get Id');
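How a service might consume this table can be sketched as follows. This is a simplified illustration, not Leaf's actual implementation: the `AllocTable` class is a hypothetical in-memory stand-in for the leaf_alloc table, `advance` plays the role of an UPDATE of max_id by step followed by a SELECT in one transaction, and `ServiceInstance` is a made-up name for one running service.

```java
import java.util.HashMap;
import java.util.Map;

final class LeafAllocSketch {

    /** Hypothetical in-memory stand-in for the leaf_alloc table: biz_tag -> {max_id, step}. */
    static final class AllocTable {
        private final Map<String, long[]> rows = new HashMap<>();

        synchronized void insert(String bizTag, long maxId, int step) {
            rows.put(bizTag, new long[] {maxId, step});
        }

        /** Advances max_id by step for one biz_tag and returns {newMaxId, step}. */
        synchronized long[] advance(String bizTag) {
            long[] row = rows.get(bizTag);
            if (row == null) {
                throw new IllegalArgumentException("unknown biz_tag: " + bizTag);
            }
            row[0] += row[1];
            return new long[] {row[0], row[1]};
        }
    }

    /** One service instance; its cached segments are lost when the instance restarts. */
    static final class ServiceInstance {
        private final AllocTable table;
        // biz_tag -> {next ID to hand out, exclusive upper bound of the segment}
        private final Map<String, long[]> segments = new HashMap<>();

        ServiceInstance(AllocTable table) {
            this.table = table;
        }

        synchronized long getId(String bizTag) {
            long[] seg = segments.get(bizTag);
            if (seg == null || seg[0] >= seg[1]) {
                long[] row = table.advance(bizTag);          // fetch a new segment
                seg = new long[] {row[0] - row[1], row[0]};
                segments.put(bizTag, seg);
            }
            return seg[0]++;
        }
    }

    public static void main(String[] args) {
        AllocTable table = new AllocTable();
        table.insert("leaf-segment-test", 1, 2000);                // same values as the INSERT above
        ServiceInstance first = new ServiceInstance(table);
        System.out.println(first.getId("leaf-segment-test"));      // 1
        System.out.println(first.getId("leaf-segment-test"));      // 2
        ServiceInstance restarted = new ServiceInstance(table);    // simulates a service restart
        System.out.println(restarted.getId("leaf-segment-test"));  // 2001
    }
}
```

In this sketch a restart discards the unused remainder of the old segment and starts from the next one, so IDs stay unique at the cost of gaps. The sketch leaves out the advance loading of the next segment.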

Configure leaf.properties on the classpath

leaf.name=com.sankuai.leaf.opensource.test
leaf.segment.enable=true
leaf.segment.url=jdbc:mysql://127.0.0.1:3306/leaf
leaf.segment.username=root
leaf.segment.password=123456

Snowflake mode configuration

This mode is based on Twitter's open source Snowflake algorithm. If you do not want to use it, set leaf.snowflake.enable=false.

Configure leaf.properties on the classpath

In leaf.properties, configure leaf.snowflake.zk.address and the port the Leaf service listens on, leaf.snowflake.port.
leaf.snowflake.enable=true
leaf.snowflake.address=127.0.0.1
leaf.snowflake.port=2181

Start Leaf with an annotation

Use the @EnableLeafServer annotation to start Leaf

@SpringBootApplication
@EnableLeafServer
public class DistributedIdApplication {

    public static void main(String[] args) {
        SpringApplication.run(DistributedIdApplication.class, args);
    }
}

Using the API

@RestController
public class IdController {

    @Autowired
    private SegmentService segmentService;

    @Autowired
    private SnowflakeService snowflakeService;

    @GetMapping("/segment")
    public Result segment() {
        // To return only the numeric ID: segmentService.getId("leaf-segment-test").getId();
        return segmentService.getId("leaf-segment-test");
    }

    @GetMapping("/snowflake")
    public Result snowflake() {
        // The key parameter has no real meaning here; it exists only because both modes share one interface
        return snowflakeService.getId("snowflake");
    }
}

In Snowflake mode the key parameter has no real meaning; it exists only because both modes implement the same interface:

public interface IDGen {
    Result get(String var1);

    boolean init();
}

public Result getId(String key) {
    return this.idGen.get(key);
}

In number segment mode the key parameter matters: it is the biz_tag that selects the row in leaf_alloc.

Number segment mode test

Start from the freshly initialized database table, then request: http://localhost:8080/segment

After the first ID has been requested, the next number segment is loaded in advance.

After the service is restarted, the next request uses a new number segment.

The following number segment is then loaded in advance again.

Snowflake mode test

Request: http://localhost:8080/snowflake

Origin blog.csdn.net/qq_38628046/article/details/119361123