Distributed ID Generator - Snowflake Algorithm



1. Snowflake

Twitter's Snowflake is an algorithm for generating increasing IDs in a distributed system. The IDs are roughly ordered by time and are globally unique. Twitter's requirements for Snowflake were:

Performance

  • At least 10k IDs per second per process
  • Response time of 2 ms (plus network latency)

Uncoordinated

For high availability within and across data centers, the machines that generate IDs should not have to coordinate with each other. No communication between generator instances is needed.

Directly sortable

IDs should be sortable (by timestamp) without loading the full objects they represent.

Compact

The generated ID should be compact. In other words, it should be no longer than the business actually needs.

Highly available

The ID generation service should be at least as available as the services that depend on it, such as storage services.

Tips: Twitter's description of the Snowflake algorithm: github.com/twitter-arc…

1.1 Data Structure of Snowflake Algorithm

An ID produced by the Snowflake algorithm occupies 8 bytes (64 bits), which is the size of a Java long.


  • Sign bit (1 bit): generated IDs are always positive, so the highest bit is 0

  • Timestamp (41 bits): a millisecond-precision timestamp. In practice the stored value is not the raw timestamp but a difference: difference = current timestamp - a fixed timestamp chosen by the developer. A 41-bit field can then be used for about 69 years:

    (1L << 41) / (1000L * 60 * 60 * 24 * 365) is approximately 69.7, that is, roughly 69 years

  • Machine ID (10 bits): up to 1024 machines can be configured. If there are multiple data centers, the 10 bits can be split into a data center ID and a machine ID

  • Sequence number (12 bits): each machine can generate 4096 IDs per millisecond. If a machine needs more than 4096 IDs in one millisecond, the generator must guard against overflow, for example by waiting for the next millisecond
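
The layout above can be sketched in Java roughly as follows. This is a minimal illustration assuming the 1 + 41 + 10 + 12 split described in the list; the class name SnowflakeLayout and all of its constants and methods are hypothetical and are not taken from Twitter's implementation.

```java
// Illustrative sketch of the 1 + 41 + 10 + 12 bit layout; all names are hypothetical.
final class SnowflakeLayout {
    static final int TIMESTAMP_BITS = 41;
    static final int MACHINE_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    // Largest value each field can hold: 2^bits - 1.
    static final long MAX_TIMESTAMP = (1L << TIMESTAMP_BITS) - 1;
    static final long MAX_MACHINE_ID = (1L << MACHINE_BITS) - 1;   // 1023
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;    // 4095

    static final int MACHINE_SHIFT = SEQUENCE_BITS;                  // 12
    static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS; // 22

    /** Packs the three fields into one non-negative long. */
    static long pack(long timestampDelta, long machineId, long sequence) {
        if (timestampDelta < 0 || timestampDelta > MAX_TIMESTAMP
                || machineId < 0 || machineId > MAX_MACHINE_ID
                || sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("field out of range");
        }
        return (timestampDelta << TIMESTAMP_SHIFT) | (machineId << MACHINE_SHIFT) | sequence;
    }

    /** Whole 365-day years that a 41-bit millisecond counter can cover. */
    static long yearsCovered() {
        return (1L << TIMESTAMP_BITS) / (1000L * 60 * 60 * 24 * 365);
    }

    public static void main(String[] args) {
        System.out.println(yearsCovered());   // integer division of 2^41 ms by one year in ms
        System.out.println(pack(1, 1, 1));    // one bit set in each field
    }
}
```

Because the sign bit is never set, packing the maximum value of every field gives exactly Long.MAX_VALUE, so the three fields use all 63 remaining bits.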

1.2 System Clock Dependency

NTP should be used to keep the system clock accurate. Snowflake protects against non-monotonic clocks, that is, clocks that run backwards. If your clock is running fast and NTP tells it to repeat a few milliseconds, Snowflake will refuse to generate IDs until a time after the last time an ID was generated. Even better, run NTP in a mode where it does not move the clock backwards.

If the clock is set back and the generator does not guard against it, duplicate IDs may be generated.
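
The "refuse to generate" behaviour described above can be sketched as follows, assuming the generator remembers the last timestamp it used. The class name ClockGuard and the injected LongSupplier clock are hypothetical; they are introduced here only so that a rollback can be simulated without touching the real system clock.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Twitter's code: rejects timestamps that move backwards.
final class ClockGuard {
    private final LongSupplier clock;   // in production this could be System::currentTimeMillis
    private long lastTimestamp = -1L;

    ClockGuard(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns the current time, or throws if the clock is behind the last value handed out. */
    synchronized long next() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            throw new IllegalStateException(
                "Clock moved backwards by " + (lastTimestamp - now) + " ms; refusing to generate IDs");
        }
        lastTimestamp = now;
        return now;
    }

    public static void main(String[] args) {
        long[] fake = {1000L, 1001L, 999L};   // the third reading simulates a rollback
        int[] i = {0};
        ClockGuard guard = new ClockGuard(() -> fake[i[0]++]);
        System.out.println(guard.next());
        System.out.println(guard.next());
        try {
            guard.next();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Throwing is only one option. For a rollback of a few milliseconds, a generator could instead wait until the clock catches up with the last timestamp.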

2. Snowflake algorithm Java implementation

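/**
 * @author mxsm
 * @date 2022/4/9 21:17
 * @Since 1.0.0
 */
public class SnowflakeGenerator {

    // Custom epoch: the ID stores the offset from this timestamp
    private static final long FIXED_TIMESTAMP = 1649491204306L;

    // Largest value that fits in the 12-bit sequence field
    private static final int MAX_SEQUENCE = 4095;

    // Stored as long so that the shift in nextId() is done in 64 bits
    private final long machineId;

    private int sequenceNumber = 0;

    // Timestamp at which the last ID was generated
    private volatile long lastTimestamp = -1L;


    public SnowflakeGenerator(int machineId) {
        // The machine ID must fit in 10 bits
        if (machineId < 0 || machineId > 1023) {
            throw new IllegalArgumentException("machineId must be between 0 and 1023");
        }
        this.machineId = machineId;
    }

    public synchronized long nextId() {

        // Get the current time
        long currentTimestamp = System.currentTimeMillis();

        // Another ID is requested within the same millisecond
        if (currentTimestamp == lastTimestamp) {
            sequenceNumber += 1;
            // More than 4096 IDs in one millisecond: wait for the next millisecond
            if (sequenceNumber > MAX_SEQUENCE) {
                while (currentTimestamp <= lastTimestamp) {
                    currentTimestamp = System.currentTimeMillis();
                }
                sequenceNumber = 0;
            }
        } else {
            // Reset the sequence number
            // Note: a clock that moves backwards is not detected here and can cause duplicate IDs
            sequenceNumber = 0;
        }
        lastTimestamp = currentTimestamp;

        return ((currentTimestamp - FIXED_TIMESTAMP) << 22) | (machineId << 12) | sequenceNumber;
    }
}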

Tips: Code address github.com/mxsm/distri…

The above code is a simple implementation.
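
To inspect an ID produced by the generator above, the shifts in its return expression can be reversed. The following is a sketch that assumes the same 22-bit and 12-bit shifts and the same FIXED_TIMESTAMP value; the class SnowflakeDecoder and its method names are illustrative and are not part of the linked repository.

```java
// Illustrative decoder; assumes the layout timestamp(41) | machineId(10) | sequence(12).
final class SnowflakeDecoder {
    static final long FIXED_TIMESTAMP = 1649491204306L; // same custom epoch as the generator

    /** Milliseconds since the Unix epoch at which the ID was generated. */
    static long timestampOf(long id) {
        return (id >>> 22) + FIXED_TIMESTAMP;
    }

    /** The 10-bit machine ID stored in bits 12..21. */
    static long machineIdOf(long id) {
        return (id >>> 12) & 0x3FF;
    }

    /** The 12-bit sequence number stored in the lowest bits. */
    static long sequenceOf(long id) {
        return id & 0xFFF;
    }

    public static void main(String[] args) {
        // Hand-built ID: 5000 ms after the custom epoch, machine 7, sequence 42
        long id = (5000L << 22) | (7L << 12) | 42L;
        System.out.println(timestampOf(id) - FIXED_TIMESTAMP);
        System.out.println(machineIdOf(id));
        System.out.println(sequenceOf(id));
    }
}
```

Since the timestamp occupies the highest bits, comparing two IDs numerically compares their generation time first, which is what makes the IDs sortable without loading the objects they identify.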

3. Advantages and disadvantages

Advantages:

  • No coordination is needed between ID generation nodes. Each one works independently.
  • IDs are generated locally and entirely in memory, with no network cost and no dependency on a database, which gives high performance and high availability
  • High throughput: in theory each machine can generate up to 4096 IDs per millisecond, that is, about 4 million increasing IDs per second

Disadvantages:

  • It depends on a consistent system clock. If the system time is set back or otherwise changed, IDs may conflict or be duplicated.

Tips: At my company, Snowflake is not deployed as a separate production service. The generator is embedded directly in the code of each project, and the machine ID is the IP address modulo 32. Two instances can therefore end up with the same machine ID, so under high concurrency there is a small probability of duplicate IDs. For us this is within the acceptable range.
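
Why "IP address modulo 32" can collide is easier to see in code. The article does not show how the projects compute this value, so the following is only a hypothetical sketch that treats an IPv4 address as an unsigned 32-bit number; the class MachineIdFromIp and its methods are invented for illustration.

```java
// Hypothetical sketch of "machine ID = IP address modulo 32"; not the author's actual code.
final class MachineIdFromIp {
    /** Converts a dotted IPv4 string to its unsigned 32-bit value. */
    static long ipv4ToLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Machine ID in the range 0..31 derived from the address. */
    static int machineId(String ip) {
        return (int) (ipv4ToLong(ip) % 32);
    }

    public static void main(String[] args) {
        // Two different hosts that map to the same machine ID
        System.out.println(machineId("10.0.0.1"));
        System.out.println(machineId("10.0.0.33"));
    }
}
```

Because 256 is a multiple of 32, only the last octet influences the result in this sketch. Any two hosts whose last octets differ by a multiple of 32 share a machine ID, and they produce the same ID whenever they use the same sequence number in the same millisecond.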

4. Summary

A cluster of Snowflake generators needs no coordination or synchronization between its nodes. It is effectively a highly available distributed ID service made up of independent single machines. The Snowflake algorithm relies on timestamps, so if the clock is set back, duplicate IDs may be generated. Generally speaking, the Snowflake algorithm is a better choice than ID generation based on Redis, UUIDs, or MySQL.

I am "an ant carrying an elephant". If this article helped you, please like and follow. If anything in it is incorrect, please leave a comment. Thank you!


Origin juejin.im/post/7084624216074485774