[Algorithm] Snowflake Algorithm for Generating Distributed IDs

An ID is a unique, immutable identifier for a piece of data, and database records are usually looked up by their ID. Generating globally unique IDs in a distributed environment is therefore an important problem.

The snowflake algorithm generates globally unique IDs in a distributed environment. It was created at Twitter to generate tweet IDs. In China, Baidu's UidGenerator and Meituan's Leaf both build on and optimize the snowflake algorithm, and both are open source on GitHub.

1. Why do we need distributed IDs?

On a single server, MySQL's auto-increment primary key is enough to meet our ID requirements.
As data volume and concurrency grow, a single server can no longer keep up. MySQL then has to be split into multiple databases and tables, and the servers deployed in a distributed way. At that point, relying only on MySQL's auto-increment primary key breaks down. Suppose we scale out to two database servers and the IDs in table1 on each server auto-increment from 1. The two servers then hand out the same IDs, so the IDs conflict: a query for the row with ID = 234 in table1 cannot determine which server holds it.
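The conflict is easy to reproduce in a small simulation. This is a minimal sketch that assumes two shards, each keeping an independent counter starting at 1. Plain Python dicts stand in for the MySQL tables, and the shard names `db1`/`db2` and the round-robin placement are hypothetical.

```python
from itertools import count

# Two hypothetical shards, each with its own auto-increment counter starting at 1,
# mimicking two independent MySQL servers that both hold table1.
shard_counters = {"db1": count(1), "db2": count(1)}
table1 = {"db1": {}, "db2": {}}

for i in range(500):
    shard = "db1" if i % 2 == 0 else "db2"  # naive round-robin placement
    new_id = next(shard_counters[shard])    # each shard increments independently
    table1[shard][new_id] = f"row-{i}"

# ID 234 exists on both shards, so the ID alone cannot locate a single row.
print([shard for shard in table1 if 234 in table1[shard]])
```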


In a distributed environment, data is spread across databases on different servers. How do we generate globally unique primary keys for all of that data?
The answer: use distributed IDs!

2. Implementation of the snowflake algorithm

A distributed ID generated by the snowflake algorithm is a 64-bit integer made up of four parts:

  1. The 1st bit is always 0. It is the sign bit of a signed 64-bit integer, so keeping it 0 makes every ID positive.
  2. The 2nd to 42nd bits (41 bits) hold a timestamp in milliseconds, typically measured from a chosen custom epoch rather than from 1970.
  3. The 43rd to 52nd bits (10 bits) hold the machine ID, allowing up to 1024 machine nodes. This field can be adapted to the service, for example by splitting it into a data-center ID and a worker ID.
  4. The 53rd to 64th bits (12 bits) hold the sequence number, i.e. the position of the ID among those generated by one machine within the same millisecond. These 12 bits distinguish up to 4096 IDs per machine per millisecond.
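The layout above can be expressed as bit shifts and masks. This is a minimal Python sketch that assumes the 41/10/12 split from the list. `compose_id` and `decompose_id` are hypothetical helper names, and the timestamp is assumed to already be relative to a custom epoch.

```python
from typing import Tuple

# Field widths from the layout: 1 sign bit + 41 + 10 + 12 = 64 bits.
TIMESTAMP_BITS = 41
MACHINE_ID_BITS = 10
SEQUENCE_BITS = 12

MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1
MAX_MACHINE_ID = (1 << MACHINE_ID_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1      # 4095

MACHINE_ID_SHIFT = SEQUENCE_BITS                   # 12
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_ID_BITS  # 22


def compose_id(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack the three fields into one non-negative 64-bit integer."""
    if not 0 <= timestamp_ms <= MAX_TIMESTAMP:
        raise ValueError("timestamp out of range")
    if not 0 <= machine_id <= MAX_MACHINE_ID:
        raise ValueError("machine_id out of range")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence out of range")
    # The top (sign) bit stays 0 because timestamp_ms < 2**41.
    return (timestamp_ms << TIMESTAMP_SHIFT) | (machine_id << MACHINE_ID_SHIFT) | sequence


def decompose_id(snowflake_id: int) -> Tuple[int, int, int]:
    """Split an ID back into (timestamp_ms, machine_id, sequence)."""
    timestamp_ms = snowflake_id >> TIMESTAMP_SHIFT
    machine_id = (snowflake_id >> MACHINE_ID_SHIFT) & MAX_MACHINE_ID
    sequence = snowflake_id & MAX_SEQUENCE
    return timestamp_ms, machine_id, sequence


if __name__ == "__main__":
    sid = compose_id(timestamp_ms=1_000, machine_id=5, sequence=7)
    print(sid, decompose_id(sid))
```

Because the timestamp sits in the highest bits, comparing two IDs numerically compares their timestamps first.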

So within 1 ms, the whole cluster can generate up to 1024 × 4096 = 4,194,304 IDs (4096 per machine).
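The throughput figure follows directly from the bit widths. The same arithmetic also gives the lifespan of the 41-bit timestamp, assuming it counts milliseconds from a fixed epoch:

```latex
\begin{aligned}
\text{IDs per ms, one machine}   &= 2^{12} = 4096 \\
\text{IDs per ms, all machines}  &= 2^{10} \times 2^{12} = 2^{22} = 4\,194\,304 \\
\text{timestamp range}           &= 2^{41}\ \text{ms} \approx 2.199 \times 10^{12}\ \text{ms} \approx 69.7\ \text{years}
\end{aligned}
```

So a deployment stops producing valid IDs roughly 69.7 years after its chosen epoch, which is one reason implementations pick a recent epoch instead of 1970.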

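The main advantages of the snowflake algorithm are that IDs are generated quickly and locally (no central service or database round trip is needed), the bit layout can be adapted flexibly, and the IDs increase over time: IDs from one machine are strictly increasing, and IDs from different machines are roughly ordered by time.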
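A minimal single-machine generator that puts the pieces together might look like the following. It is a sketch for illustration, not Twitter's or any library's implementation. The class name `SnowflakeGenerator`, the constant `CUSTOM_EPOCH_MS` (set here to 2020-01-01 UTC), and the choice to raise an error when the clock goes backwards are all assumptions.

```python
import threading
import time

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000

MACHINE_ID_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_ID_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
MACHINE_ID_SHIFT = SEQUENCE_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_ID_BITS


class SnowflakeGenerator:
    """Minimal thread-safe snowflake generator for a single machine."""

    def __init__(self, machine_id: int) -> None:
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must be in [0, 1023]")
        self.machine_id = machine_id
        self.last_ms = -1   # timestamp of the most recently issued ID
        self.sequence = 0   # position within the current millisecond
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        # Wall-clock milliseconds relative to the custom epoch.
        return time.time_ns() // 1_000_000 - CUSTOM_EPOCH_MS

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                # Clock moved backwards: refuse rather than risk duplicate IDs.
                raise RuntimeError("clock moved backwards; refusing to generate ID")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All 4096 sequence numbers used this millisecond:
                    # busy-wait until the clock reaches the next millisecond.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << TIMESTAMP_SHIFT) | (self.machine_id << MACHINE_ID_SHIFT) | self.sequence
```

Usage: `gen = SnowflakeGenerator(machine_id=3)` followed by `gen.next_id()`. The lock makes the shared sequence counter safe across threads. When 4096 IDs have already been issued in one millisecond, the generator waits for the next millisecond instead of reusing a sequence number.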

Its main drawback is its dependence on the system clock. If a machine's clock is inaccurate, and in particular if it moves backwards (for example after an NTP correction), the machine may generate duplicate IDs.
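To see how a clock rollback produces duplicates, here is a deliberately unguarded generator driven by a fake clock. This is a minimal sketch: the clock is injected as a plain function so the rollback can be replayed deterministically, and the names `NaiveSnowflake` and `fake_clock` are hypothetical.

```python
from typing import Callable, List


class NaiveSnowflake:
    """Deliberately unguarded generator that trusts whatever the clock returns."""

    def __init__(self, machine_id: int, clock: Callable[[], int]) -> None:
        self.machine_id = machine_id
        self.clock = clock  # returns milliseconds; injected so a rollback can be faked
        self.last_ms = -1
        self.sequence = 0

    def next_id(self) -> int:
        now = self.clock()
        if now == self.last_ms:
            # Sequence exhaustion is also ignored to keep the sketch short.
            self.sequence = (self.sequence + 1) & 0xFFF
        else:
            # No check for now < last_ms: a rolled-back clock looks like a fresh millisecond.
            self.sequence = 0
        self.last_ms = now
        return (now << 22) | (self.machine_id << 12) | self.sequence


def fake_clock(readings: List[int]) -> Callable[[], int]:
    """Return a clock that replays the given millisecond readings in order."""
    it = iter(readings)
    return lambda: next(it)


if __name__ == "__main__":
    # The clock reads 100, then 101, then is corrected back to 100.
    gen = NaiveSnowflake(machine_id=1, clock=fake_clock([100, 101, 100]))
    ids = [gen.next_id() for _ in range(3)]
    print(ids[0] == ids[2])
```

The first and third IDs come out identical. A generator that compares the current time with the last issued timestamp, and refuses to issue IDs while the clock is behind, avoids this particular duplicate at the cost of a temporary outage.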


Source: blog.csdn.net/weixin_45651194/article/details/129755661