Detailed explanation of nine distributed ID generation algorithms

1. Introduction to Distributed ID

1. What is a distributed ID?

While business data is small, a single database with a single table is enough, and somewhat larger volumes can still be handled with MySQL master-slave replication and read-write splitting.

As data keeps growing, master-slave replication is no longer enough and the data has to be sharded across databases and tables. After sharding, each record still needs a unique identifier, and a per-database auto-increment ID clearly no longer meets that need. For example, an order needs a globally unique order number, and that number is a distributed ID.

2. What requirements must a distributed ID meet?

    1. Globally unique: the ID must be unique across the whole system. This is the basic requirement.
    2. High availability: ID generation should be available as close to 100% of the time as possible.
    3. High performance: low latency. ID generation must respond quickly, otherwise it becomes a business bottleneck.
    4. Simple and universal: it should be usable out of the box.

 

2. What are the ways to generate distributed IDs?

  1. UUID
  2. Database auto-increment ID
  3. Database cluster mode (master-slave or multi-master)
  4. Number segment mode
  5. Redis
  6. Snowflake algorithm
  7. Didi (TinyID)
  8. Baidu (UidGenerator)
  9. Meituan (Leaf)
  10. MongoDB ObjectID

 

1. UUID

   // java.util.UUID: a random (version 4) UUID with the hyphens removed
   String uuid = UUID.randomUUID().toString().replace("-", "");

 

Advantages:

  • Simple: the single line of code above generates an ID that is unique for all practical purposes.

Disadvantages:

  • A UUID is random and carries no business meaning. Used as an order number, it says nothing about the order.
  • A UUID is an unordered string and does not trend upward the way an auto-increment ID does.
  • It is long (32 hexadecimal characters without hyphens), so storing and querying it is costly in MySQL, and index performance is poor when it is used as a primary key.

 

2. Database auto-increment ID

A distributed ID can also be generated from the database's auto_increment column.

 

Advantages:

  • Simple to implement, IDs increase monotonically, and numeric keys are fast to query.

Disadvantages:

  • When traffic surges, the single MySQL instance becomes the bottleneck of the system, so relying on it for a distributed service is risky and not recommended.

 

3. Database cluster mode

To reduce the single-database risk of approach 2, a database cluster can be used.

  • Master-slave mode addresses the lack of high availability of a single DB. The issue to consider is replication lag between master and slave: if the master goes down before its latest increments have reached the slave, the promoted slave may hand out IDs that were already issued.
  • If a single master is still a concern, run a dual-master cluster in which two MySQL instances each produce auto-increment IDs independently. Two questions then need answers: (1) both instances start counting from 1 and will generate duplicate IDs, so how is that avoided? (2) How are more DB instances added later?
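One common answer to question (1) is to give every instance a different starting offset and a shared step equal to the number of instances; in MySQL this corresponds to the `auto_increment_offset` and `auto_increment_increment` system variables. The sketch below only simulates that idea in memory. The class `SteppedSequence` and its methods are hypothetical names chosen for illustration, and no database is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one MySQL instance's auto-increment counter.
final class SteppedSequence {
    private long next;        // next ID this instance will hand out
    private final long step;  // shared step, equal to the number of instances

    SteppedSequence(long offset, long step) {
        if (offset < 1 || offset > step) {
            throw new IllegalArgumentException("offset must be in 1..step");
        }
        this.next = offset;
        this.step = step;
    }

    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }

    // Convenience helper: take the next `count` IDs from one instance.
    static List<Long> take(SteppedSequence seq, int count) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(seq.nextId());
        }
        return ids;
    }

    public static void main(String[] args) {
        SteppedSequence first = new SteppedSequence(1, 2);   // instance 1: offset 1, step 2
        SteppedSequence second = new SteppedSequence(2, 2);  // instance 2: offset 2, step 2
        System.out.println(take(first, 3));  // [1, 3, 5]
        System.out.println(take(second, 3)); // [2, 4, 6]
    }
}
```

The same sketch hints at why question (2) is awkward: adding a third instance means changing the step on every existing instance, because the step is tied to the instance count.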

 

 

4. Number segment mode based on database

 

Number segment mode is one of the mainstream approaches in current distributed ID generators. The idea is to fetch auto-increment IDs from the database in batches: each request takes one range of numbers, for example the 1000 IDs from 1 to 1000, and the business service loads that range into memory and hands the IDs out locally. TinyID is a typical example. The table structure is as follows:

CREATE TABLE id_generator (
  id int(10) NOT NULL,
  max_id bigint(20) NOT NULL COMMENT 'current maximum id',
  step int(20) NOT NULL COMMENT 'segment step size',
  biz_type int(20) NOT NULL COMMENT 'business type',
  version int(20) NOT NULL COMMENT 'version number',
  PRIMARY KEY (`id`)
)

 

When the IDs of the current segment are used up, the service requests a new segment from the database by updating the max_id field: max_id = max_id + step. If the update succeeds, the new segment has been obtained, and its range is (max_id, max_id + step], where max_id is the value before the update.

SQL statement:

update id_generator set max_id = #{max_id+step} where biz_type = XXX

 

To handle concurrent updates, add optimistic locking with the version column. The optimized SQL:

update id_generator set max_id = #{max_id+step}, version = version + 1 where version = #{version} and biz_type = XXX

 

Advantages:

  • This approach does not depend heavily on the database: it is accessed once per segment rather than once per ID, so the load on it is much lower.
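The segment flow with optimistic locking can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not a production implementation: the `AtomicReference` is a hypothetical stand-in for the id_generator row, and its compare-and-set plays the role of the `where version = #{version}` condition. A real service would execute the UPDATE statement against MySQL and retry when no row is affected.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SegmentSketch {
    // Immutable stand-in for one id_generator row (only the columns used here).
    static final class Row {
        final long maxId;
        final int step;
        final int version;

        Row(long maxId, int step, int version) {
            this.maxId = maxId;
            this.step = step;
            this.version = version;
        }
    }

    private final AtomicReference<Row> table; // hypothetical in-memory "database"
    private long current;     // last ID handed out from the loaded segment
    private long segmentEnd;  // inclusive upper bound of the loaded segment

    SegmentSketch(AtomicReference<Row> table) {
        this.table = table;
    }

    // Claims the segment (max_id, max_id + step] and retries on a lost race.
    private void loadSegment() {
        while (true) {
            Row seen = table.get();
            Row updated = new Row(seen.maxId + seen.step, seen.step, seen.version + 1);
            // Succeeds only if nobody replaced the row since it was read.
            if (table.compareAndSet(seen, updated)) {
                current = seen.maxId;
                segmentEnd = updated.maxId;
                return;
            }
        }
    }

    synchronized long nextId() {
        if (current >= segmentEnd) {
            loadSegment(); // first call, or the current segment is used up
        }
        return ++current;
    }

    public static void main(String[] args) {
        AtomicReference<Row> table = new AtomicReference<>(new Row(0, 1000, 0));
        SegmentSketch serviceA = new SegmentSketch(table);
        SegmentSketch serviceB = new SegmentSketch(table);
        System.out.println(serviceA.nextId()); // 1, from segment (0, 1000]
        System.out.println(serviceB.nextId()); // 1001, from segment (1000, 2000]
    }
}
```

Each service instance touches the shared row only when its segment runs out, which is where the reduced database load comes from.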

 

5. Redis mode

Redis can also be used. The principle is to use the Redis incr command, which increments the ID atomically.

Redis is an in-memory database, so persistence has to be considered. Redis has two persistence methods, RDB and AOF:

  • RDB: takes a snapshot periodically. If Redis goes down after a run of increments that has not yet been persisted, IDs will be duplicated after Redis restarts.
  • AOF: persists every write command, so IDs are not duplicated even if Redis goes down. However, because every incr is logged as a separate command, replaying the log on restart can take a long time.
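The RDB failure mode can be made concrete with a small simulation. The sketch below does not talk to Redis at all: `SnapshotCounterSketch` is a hypothetical in-memory model in which `incr` mimics the atomic increment, `saveSnapshot` mimics a periodic RDB snapshot, and `crashAndRestart` mimics a restart that reloads the last snapshot.

```java
final class SnapshotCounterSketch {
    private long counter;   // in-memory value, like a key updated with incr
    private long snapshot;  // last value written to disk, like an RDB snapshot

    long incr() {
        return ++counter;
    }

    void saveSnapshot() {
        snapshot = counter;
    }

    // Everything after the last snapshot is lost.
    void crashAndRestart() {
        counter = snapshot;
    }

    public static void main(String[] args) {
        SnapshotCounterSketch redis = new SnapshotCounterSketch();
        redis.incr();
        redis.incr();
        redis.incr();                     // IDs 1, 2, 3 issued
        redis.saveSnapshot();             // snapshot holds 3
        long beforeCrash = redis.incr();  // ID 4 issued, not yet persisted
        redis.crashAndRestart();          // counter falls back to 3
        long afterRestart = redis.incr(); // ID 4 issued a second time
        System.out.println(beforeCrash + " " + afterRestart); // 4 4
    }
}
```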

 

6. Snowflake algorithm mode

Snowflake is the ID generation algorithm used in Twitter's internal distributed systems. After it was open-sourced, it was widely adopted by companies in China, and several major companies went on to build their own distributed ID generators on top of it.
 

Snowflake generates an ID of type long. A long occupies 8 bytes of 8 bits each, that is, 64 bits.

Snowflake ID structure: sign bit (1 bit) + timestamp (41 bits) + data center ID (5 bits) + machine ID (5 bits) + sequence number (12 bits), 64 bits in total.

  • Sign bit (1 bit): the highest bit of a Java long is the sign bit, 0 for positive and 1 for negative. Generated IDs are positive, so it is always 0.
  • Timestamp (41 bits): time in milliseconds. Rather than the current timestamp itself, store the difference (current timestamp - fixed start timestamp) so that generated IDs start from a smaller value. A 41-bit timestamp lasts about 69 years: (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69.
  • Worker machine ID (10 bits): also called workId. It can be configured flexibly, for example as a combination of data center ID and machine number.
  • Sequence number (12 bits): an auto-incrementing counter, so one node can generate 4096 IDs within the same millisecond.
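The capacity figures in the list above follow directly from the bit widths. The short sketch below just redoes that arithmetic. The class and constant names are illustrative, and the year length is taken as 365 days, as in the formula above.

```java
final class SnowflakeCapacity {
    static final int SIGN_BITS = 1;
    static final int TIMESTAMP_BITS = 41;
    static final int WORKER_BITS = 10;   // 5 data center bits + 5 machine bits
    static final int SEQUENCE_BITS = 12;

    // Whole years covered by a 41-bit millisecond counter.
    static long timestampYears() {
        long millisPerYear = 1000L * 60 * 60 * 24 * 365;
        return (1L << TIMESTAMP_BITS) / millisPerYear;
    }

    static long maxWorkers() {
        return 1L << WORKER_BITS;
    }

    static long idsPerMillisecond() {
        return 1L << SEQUENCE_BITS;
    }

    public static void main(String[] args) {
        System.out.println(timestampYears());    // 69
        System.out.println(maxWorkers());        // 1024
        System.out.println(idsPerMillisecond()); // 4096
    }
}
```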

The key to using Snowflake is allocating the 10-bit worker machine ID in the middle. Ideally the workId is generated automatically so that operations staff do not have to assign it by hand.

 

Advantages:

  • The millisecond timestamp sits in the high bits and the auto-incrementing sequence in the low bits, so IDs trend upward overall.
  • It does not depend on third-party systems such as databases. Deployed as a service, it is more stable and generates IDs with high performance.
  • Bits can be allocated to suit the business, which makes it very flexible.

Disadvantages:

  • Strong dependence on the machine clock. If the clock is set back, duplicate IDs may be issued or the service may become unavailable.

 

 

Following the algorithm, a utility class can be written directly in Java and called locally to generate globally unique IDs.

Reference github address: https://github.com/beyondfengyu/SnowFlake
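As one hedged illustration of such a utility class (this is not the code from the repository above), the sketch below packs a 41-bit timestamp delta, a 10-bit worker ID and a 12-bit sequence into a long. The start timestamp `EPOCH` (2020-01-01 UTC) is an arbitrary assumption, and a set-back clock is handled in the simplest way, by refusing to issue an ID.

```java
final class SnowflakeSketch {
    private static final long EPOCH = 1577836800000L; // assumed start: 2020-01-01T00:00:00Z
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId must be in 0.." + MAX_WORKER);
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock was set back: refuse rather than risk a duplicate ID.
            throw new IllegalStateException(
                "clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 sequence values used in this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastTimestamp) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
            | (workerId << SEQUENCE_BITS)
            | sequence;
    }

    // Extracts the worker ID field from a generated ID.
    static long workerIdOf(long id) {
        return (id >> SEQUENCE_BITS) & MAX_WORKER;
    }

    public static void main(String[] args) {
        SnowflakeSketch generator = new SnowflakeSketch(5);
        long first = generator.nextId();
        long second = generator.nextId();
        System.out.println(first + " < " + second);
        System.out.println("worker: " + workerIdOf(first));
    }
}
```

Throwing on a set-back clock trades availability for uniqueness, which matches the "duplicate numbers or the service will be unavailable" trade-off listed under the disadvantages.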

 

7. Didi (TinyID)

 

Tinyid is developed by Didi. GitHub address: https://github.com/didi/tinyid

Tinyid is based on the number segment model: each service obtains number segments such as (1000,2000], (2000,3000], (3000,4000].

1: Implementation principle:

  • Tinyid is built on the database number segment algorithm. Put simply, the available ID segments are stored in the database, Tinyid loads an available segment into memory, and IDs are then generated directly in memory.
  • The first segment is loaded when an ID is requested for the first time. Once usage of the current segment reaches a certain threshold, the next available segment is loaded asynchronously so that memory always holds a usable segment. When the current segment is used up, the next segment becomes the current one, and so on.
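The preload-and-swap behaviour described above can be sketched as follows. This is a simplified illustration, not Tinyid's actual code: the `AtomicLong` is a hypothetical stand-in for max_id in the database, the threshold is an assumed `preloadRatio` parameter, and the next segment is fetched inline where Tinyid would do so asynchronously.

```java
import java.util.concurrent.atomic.AtomicLong;

final class DoubleBufferSketch {
    // One number segment covering (start, end].
    static final class Segment {
        final long start;
        final long end;

        Segment(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    private final AtomicLong dbMaxId = new AtomicLong(0); // stand-in for max_id in the database
    private final int step;
    private final double preloadRatio; // assumed threshold, e.g. 0.8 = preload at 80% usage
    private Segment current;           // loaded lazily on the first request
    private Segment next;              // preloaded segment, or null
    private long cursor;               // last ID handed out

    DoubleBufferSketch(int step, double preloadRatio) {
        this.step = step;
        this.preloadRatio = preloadRatio;
    }

    private Segment fetchSegment() {
        long end = dbMaxId.addAndGet(step);
        return new Segment(end - step, end);
    }

    synchronized long nextId() {
        if (current == null || cursor >= current.end) {
            // First request, or current segment used up: switch to the preloaded one.
            current = (next != null) ? next : fetchSegment();
            next = null;
            cursor = current.start;
        }
        long id = ++cursor;
        long used = id - current.start;
        if (next == null && used >= step * preloadRatio) {
            next = fetchSegment(); // Tinyid loads this asynchronously; done inline for brevity
        }
        return id;
    }

    synchronized boolean nextSegmentReady() {
        return next != null;
    }

    public static void main(String[] args) {
        DoubleBufferSketch ids = new DoubleBufferSketch(10, 0.8);
        for (int i = 0; i < 12; i++) {
            System.out.print(ids.nextId() + " "); // 1 2 3 ... 12
        }
        System.out.println();
    }
}
```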

2: Architecture diagram

Two access methods are provided:

  1. HTTP access: IDs are generated by calling tinyid-server. tinyid-server is best deployed on multiple machines across multiple data centers. This method has to account for network latency.
  2. tinyid-client: the client obtains number segments from tinyid-server and generates IDs locally.

Advantages of tinyid-client:

  • IDs are generated locally (by calling AtomicLong.addAndGet), which greatly improves performance.
  • The client contacts the server only occasionally, which reduces server load and takes network latency out of most calls.
  • Even if all servers are down, the client can keep working for a while on the segment it has already loaded.

Disadvantages:

  • If client machines restart frequently, more IDs are wasted, because the unused remainder of the segment held in memory is lost on restart.

 

8. Baidu (UidGenerator)

UidGenerator is a unique ID generator implemented in Java and based on the Snowflake algorithm. GitHub address: https://github.com/baidu/uid-generator

UidGenerator runs inside an application as a component. It supports custom workerId bit lengths and initialization strategies, which makes it suitable for virtualized environments such as Docker, where instances restart and drift automatically. UidGenerator borrows future time to get around the inherent concurrency limit of the sequence, and it uses a RingBuffer to cache generated UIDs so that UID production and consumption run in parallel.

Specific algorithm implementation:

Snowflake algorithm:

Snowflake algorithm description: for a given machine, at a given time, a given concurrent sequence number is unique. On that basis a 64-bit unique ID (long) can be generated. The default bit allocation is:

  • sign (1 bit): fixed sign bit; the generated UID is always a positive number.
  • delta seconds (28 bits): the current time as an offset from the base time "2016-05-20", in seconds. Supports about 8.5 years (2^28 seconds).
  • worker id (22 bits): the machine ID, supporting about 4.2 million machine starts. The built-in implementation has the database allocate it at startup. The default allocation strategy is use-once, and a reuse strategy can be provided later.
  • sequence (13 bits): the concurrent sequence within each second; 13 bits support 8192 IDs per second.
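To show how the 1 + 28 + 22 + 13 layout fits into a long, the sketch below packs and unpacks the three fields with plain bit shifts. It is an illustration of the layout only, not UidGenerator's own code, and the class and method names are hypothetical.

```java
final class UidLayoutSketch {
    static final int SIGN_BITS = 1;
    static final int TIME_BITS = 28;
    static final int WORKER_BITS = 22;
    static final int SEQ_BITS = 13;

    // Packs the fields into one long; the sign bit stays 0.
    static long pack(long deltaSeconds, long workerId, long sequence) {
        return (deltaSeconds << (WORKER_BITS + SEQ_BITS))
            | (workerId << SEQ_BITS)
            | sequence;
    }

    static long deltaSecondsOf(long uid) {
        return uid >>> (WORKER_BITS + SEQ_BITS);
    }

    static long workerIdOf(long uid) {
        return (uid >>> SEQ_BITS) & ((1L << WORKER_BITS) - 1);
    }

    static long sequenceOf(long uid) {
        return uid & ((1L << SEQ_BITS) - 1);
    }

    public static void main(String[] args) {
        System.out.println(1L << WORKER_BITS); // 4194304 worker IDs
        System.out.println(1L << SEQ_BITS);    // 8192 sequence values per second
        long uid = pack(100, 7, 3);
        System.out.println(deltaSecondsOf(uid) + " " + workerIdOf(uid) + " " + sequenceOf(uid)); // 100 7 3
    }
}
```

By the same arithmetic, 2^28 seconds divided by the 31,536,000 seconds in a 365-day year comes to roughly 8.5 years.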

 

9. Meituan (Leaf)

Leaf is developed by Meituan. GitHub address: https://github.com/Meituan-Dianping/Leaf

It supports both number segment mode and Snowflake mode, and can switch between them.

1: Leaf-segment number segment mode:

Compared with the plain database scheme, the following changes were made:

  1. The original scheme read and wrote the database for every ID, which put heavy load on it. Leaf instead uses a proxy server to fetch IDs in batches, one segment at a time (step determines its size), and goes back to the database for a new segment only when the current one is used up. This greatly reduces database load.
  2. Each business's ID requirements are separated by the biz_tag field. The IDs of different biz_tag values are isolated and do not affect each other.

The number segment mode is implemented in the same way as in TinyID.

 

2: Leaf-snowflake Snowflake mode:

Leaf-snowflake reuses the Snowflake bit layout unchanged, assembling the ID as "1+41+10+12". For workerID assignment, Meituan's Leaf-snowflake differs from Baidu's approach: Baidu allocates the workerID through a database, while Leaf-snowflake assigns it automatically using ZooKeeper persistent sequential nodes, so Leaf depends on a ZooKeeper service.

Features:

  1. Weak dependency on ZooKeeper: besides fetching data from ZooKeeper, Leaf also caches a workerID file on the local file system. If ZooKeeper is unavailable just when the machine has to restart, the service can still start normally. This makes the dependency on ZooKeeper a weak one.
  2. Clock handling: the scheme depends on time, so a clock that is set back could produce duplicate IDs. Clock rollback therefore has to be handled.

 


Origin blog.csdn.net/Crystalqy/article/details/108410616