Appreciating Design Ideas: Distributed ID Generation and the Snowflake Algorithm


How do we generate unique IDs?

When working with a database, the design criteria of the second normal form are commonly taken to require that every row be uniquely distinguishable, so we often need to generate a unique ID. In the era of the RDBMS (relational database management system), the database itself provided sequence generators, such as Oracle's sequences and MySQL's auto-increment columns. An RDBMS is a centralized (single-machine) environment: whatever that one machine says is unique is globally unique. In a distributed (decentralized) environment, however, multiple hosts coexist. How can they automatically generate IDs that never repeat globally?


The main solutions fall into the following two categories:


Method 1: Keep a centralized approach

    A batch of sequence numbers is pre-generated in the RDBMS. Each node in the distributed environment fetches a number segment from the RDBMS when it starts and then hands out the IDs in that segment on its own. The segment mode of Meituan's Leaf belongs to this type.
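The segment idea can be sketched roughly as follows. This is a minimal illustration, not Leaf's actual implementation: `SegmentAllocator` is a hypothetical in-memory stand-in for the RDBMS, and `Node`, `step`, and `next_segment` are names invented here.

```python
import threading


class SegmentAllocator:
    """Hypothetical stand-in for the central RDBMS: hands out non-overlapping ranges."""

    def __init__(self, step=1000):
        self._next_start = 1
        self._step = step
        self._lock = threading.Lock()

    def next_segment(self):
        # In a real system this would be one atomic UPDATE on a database row.
        with self._lock:
            start = self._next_start
            self._next_start += self._step
            return start, start + self._step  # half-open range [start, end)


class Node:
    """A node that consumes its segment locally and refills only when it runs out."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._current, self._end = allocator.next_segment()

    def next_id(self):
        if self._current >= self._end:
            # Segment exhausted: go back to the central allocator for a new one.
            self._current, self._end = self._allocator.next_segment()
        value = self._current
        self._current += 1
        return value
```

With a small `step`, two nodes sharing one allocator draw from disjoint ranges, so their IDs never collide even though each node only contacts the center once per segment.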


Method 2: Adopt a decentralized approach

    The nodes agree on a rule, and each node in the distributed environment then generates globally unique IDs by itself. UUID, GUID, and the snowflake algorithm all fall into this category.




snowflake algorithm ❉


In fact, many innovative methods are very simple, and the snowflake algorithm is no exception. What is worth learning is its design idea, which can be reused wherever IDs have to be generated in a distributed environment.


The snowflake algorithm was open-sourced by Twitter. An ID is 64 bits long [Thinking: Why 64 bits?] and consists of four parts: a leading bit, a timestamp, a machine ID, and an auto-increment sequence.


  • Leading bit, 1 bit, fixed to 0; [Thinking: Why is the first bit 0?]

  • Timestamp, 41 bits, the difference in milliseconds between the current time and a specified start date; [Thinking: Why a time difference?]

  • Cluster node ID, 10 bits, giving 2^10 = 1024 machines;

  • Auto-increment sequence, 12 bits, giving 2^12 = 4096 IDs.
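The four-part layout above can be illustrated with a few shifts and masks. The sketch below only packs and unpacks the fields. The function names `compose` and `decompose` are made up for this example, and the field order (timestamp in the high bits, then node ID, then sequence) is assumed from the order of the list.

```python
TIMESTAMP_BITS = 41
NODE_BITS = 10
SEQUENCE_BITS = 12

NODE_SHIFT = SEQUENCE_BITS                   # node ID sits above the sequence
TIMESTAMP_SHIFT = SEQUENCE_BITS + NODE_BITS  # timestamp sits above both


def compose(delta_ms, node_id, sequence):
    """Pack the three variable fields into one integer; the leading bit stays 0."""
    if not 0 <= delta_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp difference does not fit in 41 bits")
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node id does not fit in 10 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence does not fit in 12 bits")
    return (delta_ms << TIMESTAMP_SHIFT) | (node_id << NODE_SHIFT) | sequence


def decompose(snowflake_id):
    """Split an ID back into (delta_ms, node_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    node_id = (snowflake_id >> NODE_SHIFT) & ((1 << NODE_BITS) - 1)
    delta_ms = snowflake_id >> TIMESTAMP_SHIFT
    return delta_ms, node_id, sequence
```

Since 41 + 10 + 12 = 63, even the largest possible value stays below 2^63, so the ID is never negative when stored in a signed 64-bit integer.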



There are no two identical snowflakes in the world

     Within a single node, the combination of timestamp and auto-increment sequence makes every generated ID locally unique; adding the cluster node ID then makes it globally unique. This is how the snowflake algorithm lives up to its name: "no two snowflakes in the world are the same."


     At the same time, the timestamp is measured in milliseconds and each millisecond supports up to 4096 IDs, so each node can generate up to 4,096,000 IDs per second. The timestamp does not outgrow its 41 bits until (2^41-1)/1000/86400/365 ≈ 69 years after the chosen start date, which is enough for most workloads.
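The arithmetic in this paragraph can be checked in a few lines. The start date `EPOCH` below is a hypothetical choice made only for this example; the article does not name one.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS = 41
SEQUENCE_BITS = 12

# Hypothetical start date, chosen only for illustration.
EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)

ids_per_ms = 2 ** SEQUENCE_BITS          # 4096 IDs in one millisecond
ids_per_second = ids_per_ms * 1000       # per node

max_delta_ms = 2 ** TIMESTAMP_BITS - 1   # largest difference the field can hold
lifetime_years = max_delta_ms / 1000 / 86400 / 365

# The moment at which the 41-bit field would overflow for this start date.
overflow_at = EPOCH + timedelta(milliseconds=max_delta_ms)

print(ids_per_second, round(lifetime_years, 1), overflow_at.year)
```

Because the field stores a difference from the start date rather than an absolute time, the roughly 69 years are counted from that date, not from 1970.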

Design core

So the core of the design is:

1. An auto-increment sequence that is reused cyclically, which keeps IDs locally unique within one time unit;

2. A millisecond timestamp, which allows a large number of IDs per second and so copes with high request rates;

3. A cluster node ID, which makes the IDs globally unique.
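The three points above can be combined into a small generator. This is a sketch under stated assumptions, not Twitter's code: the start date `EPOCH_MS`, the injectable `clock` parameter, waiting for the next millisecond when the sequence wraps, and raising an error when the clock moves backwards are all choices made for this example.

```python
import threading
import time

EPOCH_MS = 1577836800000  # hypothetical start date: 2020-01-01 00:00:00 UTC
NODE_BITS = 10
SEQUENCE_BITS = 12
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


def _now_ms():
    return int(time.time() * 1000)


class SnowflakeGenerator:
    def __init__(self, node_id, clock=_now_ms):
        if not 0 <= node_id < (1 << NODE_BITS):
            raise ValueError("node id must fit in 10 bits")
        self._node_id = node_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Assumption: refuse to issue IDs rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: advance the cyclic sequence.
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # All 4096 values used; spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                # New millisecond: the sequence starts over.
                self._sequence = 0
            self._last_ms = now
            delta = now - EPOCH_MS
            return (
                (delta << (NODE_BITS + SEQUENCE_BITS))
                | (self._node_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Two generators created with different node IDs, for example `SnowflakeGenerator(1)` and `SnowflakeGenerator(2)`, never produce the same ID because the node ID bits differ. Within one generator the timestamp and sequence keep IDs increasing.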




      Once the design idea is understood, it can be adapted. Baidu, for example, has more than 1024 machines. What then?


      Baidu adjusted the snowflake algorithm: its UID is 1 leading bit + 28-bit timestamp + 22-bit machine ID + 13-bit sequence number. Baidu's UID therefore supports 2^22 = 4,194,304 nodes, and each node can generate 2^13 = 8192 IDs per second. The trade-off is that the shorter timestamp only has second-level resolution, and it outgrows its 28 bits after (2^28-1)/86400/365 ≈ 8.5 years.
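One way to see the trade-off is to treat the bit widths as parameters and compare the two layouts side by side. The `Layout` class and its field names below are invented for this sketch and are not part of either project.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 86400 * 365


@dataclass(frozen=True)
class Layout:
    timestamp_bits: int
    node_bits: int
    sequence_bits: int
    ticks_per_second: int  # 1000 for millisecond ticks, 1 for second ticks

    def __post_init__(self):
        total = 1 + self.timestamp_bits + self.node_bits + self.sequence_bits
        if total != 64:
            raise ValueError("leading bit plus the three fields must total 64 bits")

    @property
    def max_nodes(self):
        return 1 << self.node_bits

    @property
    def ids_per_second(self):
        # One full sequence per tick, per node.
        return (1 << self.sequence_bits) * self.ticks_per_second

    @property
    def lifetime_years(self):
        ticks = (1 << self.timestamp_bits) - 1
        return ticks / self.ticks_per_second / SECONDS_PER_YEAR


SNOWFLAKE = Layout(timestamp_bits=41, node_bits=10, sequence_bits=12, ticks_per_second=1000)
BAIDU_UID = Layout(timestamp_bits=28, node_bits=22, sequence_bits=13, ticks_per_second=1)

for name, layout in (("snowflake", SNOWFLAKE), ("baidu uid", BAIDU_UID)):
    print(name, layout.max_nodes, layout.ids_per_second, round(layout.lifetime_years, 1))
```

The 63 usable bits are a fixed budget: bits moved into the machine ID to support more nodes have to come out of the timestamp or the sequence, which is why the Baidu layout gains nodes but loses lifetime and per-second throughput.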



Origin blog.51cto.com/15127541/2665028