JD.com: How to choose a distributed ID generation solution? So well written!

Background

In distributed systems, it is often necessary to use a globally unique ID generator to identify the data that needs to be stored. What kind of ID generator do we need?

Besides uniquely identifying data, an ID generator usually takes on further responsibilities in the system. The main considerations are as follows:

Uniqueness: "Globally unique" vs "Business unique"?

If a distributed system relies on a single globally unique ID generator, every application has to contend for it, which creates a serious mutual exclusion problem. Mutually exclusive locking raises cost and lowers performance, making a high-performance, highly reliable architecture hard to build. In business systems, globally unique IDs are often unnecessary.

For example, in a messaging system, chat messages do not need globally unique IDs. The ID of a message sent by a user only needs to be unique within that user. Because each message belongs to a user, and user IDs are themselves unique, the pair "user ID + message ID" already acts as a globally unique ID.
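The idea can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory counter per user (the class name PerUserMessageIds is made up here, and a real system would persist the counters):

```python
from collections import defaultdict
from itertools import count


class PerUserMessageIds:
    """Hands out message IDs that are unique only within one user."""

    def __init__(self):
        # One independent counter per user, each starting at 1.
        self._counters = defaultdict(lambda: count(1))

    def next_id(self, user_id):
        # The composite key (user_id, message_id) is globally unique
        # even though message IDs repeat across users.
        return (user_id, next(self._counters[user_id]))


ids = PerUserMessageIds()
a1 = ids.next_id("alice")
a2 = ids.next_id("alice")
b1 = ids.next_id("bob")
```

Here alice and bob both get message ID 1, yet the three composite keys are distinct, and no lock is shared between users.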

Time related: "seconds" vs "milliseconds"?

Time is naturally unique, so many designs build on it. But an 8-byte ID does not leave much room for time. Second-level precision takes 30 bits to cover about thirty years, and millisecond precision needs another 10 bits, leaving only 20-odd bits (64 - 40 = 24) for everything else. Generating just one ID per second or per millisecond is obviously not enough, so part of the remaining bits must hold a sequence number (SequenceID).
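The time budget can be checked with a few lines of arithmetic. This is only a sketch; the helper name years_covered and the 365.25-day year are assumptions made for illustration:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_covered(bits, ticks_per_second):
    """Years until a timestamp field of `bits` bits wraps around."""
    return (2 ** bits) / ticks_per_second / SECONDS_PER_YEAR


seconds_30 = years_covered(30, 1)     # second precision, 30 bits
millis_40 = years_covered(40, 1000)   # millisecond precision, 40 bits
```

Both values come out at a little over 34 years, which matches the "about thirty years" figure: the extra 10 bits buy precision, not lifetime.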

Should the time be in seconds or milliseconds? If milliseconds are not used, the 10 bits saved can be given to the Sequence, at the cost of a coarser time component in the ID. Peak throughput is the more practical consideration: the size of the Sequence space determines the peak rate, and a peak by definition does not last long. From this angle, a budget of about 1 million IDs per second is a looser limit than about 1,000 per millisecond, because a short burst can draw on the whole second's budget.
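A small calculation makes the peak argument concrete. The burst size of 5000 and the function name ids_rejected are hypothetical, and the 10-bit and 20-bit sequence sizes simply follow the numbers in the paragraph above:

```python
def ids_rejected(burst_size, sequence_bits):
    """IDs from a burst landing in one time slot that exceed the sequence space."""
    capacity = 2 ** sequence_bits
    return max(0, burst_size - capacity)


burst = 5000  # hypothetical burst arriving within a single millisecond
per_ms = ids_rejected(burst, 10)      # millisecond slots, 10-bit sequence
per_second = ids_rejected(burst, 20)  # second slots, 20-bit sequence
```

With millisecond slots, 3976 of the 5000 requests do not fit into that slot and have to wait or fail; with second slots the whole burst fits.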

Order: "roughly ordered" vs "precisely ordered"?

First, precise ordering requires serializing access to the sequence, so performance inevitably suffers. Second, IDs must be issued one at a time, which means only one ID generation service instance can serve requests at any moment. A precisely ordered design therefore also has to solve disaster recovery for that single instance.

The alternative is to give up ordering within a second and only guarantee that IDs are ordered by time at second granularity. An ID from a later second is always larger than any ID from an earlier second, but within the same second an ID issued later may be smaller than one issued earlier. Rough ordering must be weighed carefully: it is a candidate only if the business can accept it.
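The following sketch shows how this behaviour arises when several machines issue IDs independently. The field widths (4-bit machine, 15-bit sequence) and the function make_id are assumptions chosen for illustration only:

```python
def make_id(second, machine, seq, machine_bits=4, seq_bits=15):
    """Pack a second-level timestamp, machine number and sequence into one int."""
    return (second << (machine_bits + seq_bits)) | (machine << seq_bits) | seq


# Same second: machine 2 issues its ID first, machine 1 issues one afterwards.
first_issued = make_id(second=100, machine=2, seq=0)
issued_later = make_id(second=100, machine=1, seq=7)

# Next second: any ID is larger than every ID of the previous second.
next_second = make_id(second=101, machine=0, seq=0)
```

Because the machine number sits above the sequence, the ID issued later by machine 1 is numerically smaller than the earlier one from machine 2, while the ID from second 101 is larger than both.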

Solution details

Let's look at how the industry designs ID generators.

SnowFlake

Snowflake reserves 41 bits for the millisecond timestamp, 10 bits for the machine (MachineID), and the remaining 12 bits for the Sequence. The top bit of the 64-bit value is left unused.
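A minimal sketch of this layout, with the timestamp in the highest bits followed by the machine and the sequence. The function names pack and unpack are illustrative, and the timestamp is assumed to be milliseconds counted from some custom epoch:

```python
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12


def pack(millis, machine_id, sequence):
    """Combine the three fields into one integer, timestamp in the high bits."""
    if not 0 <= millis < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine ID does not fit in 10 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence does not fit in 12 bits")
    return (millis << (MACHINE_BITS + SEQUENCE_BITS)) | (machine_id << SEQUENCE_BITS) | sequence


def unpack(snowflake_id):
    """Split an ID back into (millis, machine_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    millis = snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)
    return millis, machine_id, sequence


example = pack(123456789, 5, 42)
largest = pack(2 ** 41 - 1, 1023, 4095)
```

The largest possible value equals 2^63 - 1, so every ID fits in a signed 64-bit integer.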

Weibo

Weibo uses 30 bits for second-level time, 4 bits to distinguish the IDC, 2 bits to distinguish the service, and 15 bits for the Sequence, giving a theoretical upper limit of about 32,000 IDs per second (2^15 = 32,768). Since the ID service is currently deployed centrally in one data center, 1 bit is used to distinguish the hot standby. In the end, the 64 bits are not fully used.
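The bit budget described above can be tallied as follows. The dictionary name and field keys are made up for this sketch, and the order of the fields inside the ID is not stated in the text, so only the widths are used:

```python
# Field widths as listed in the text; the field order is not specified there.
WEIBO_FIELDS = {"seconds": 30, "idc": 4, "service": 2, "sequence": 15, "hot_standby": 1}

bits_used = sum(WEIBO_FIELDS.values())
bits_unused = 64 - bits_used
max_ids_per_second = 2 ** WEIBO_FIELDS["sequence"]
```

The fields add up to 52 bits, leaving 12 of the 64 bits unused, and the 15-bit sequence caps each second at 32,768 IDs.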

Flickr

Flickr's global ID generation solution uses MySQL's auto-increment mechanism (auto_increment + REPLACE INTO + MyISAM). On the application side, the following two statements must be executed in the same session:

REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

Flickr runs two database servers to generate IDs for disaster recovery. Giving them different auto_increment starting values and the same step size makes one produce odd IDs and the other even IDs.

TicketServer1:
auto-increment-increment = 2
auto-increment-offset = 1

TicketServer2:
auto-increment-increment = 2
auto-increment-offset = 2
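The effect of these two settings can be simulated without a database. The class below is an in-memory stand-in written for illustration, not Flickr's code and not a MySQL client:

```python
class TicketServer:
    """In-memory stand-in for one MySQL ticket server (not a real database)."""

    def __init__(self, offset, increment):
        self._next = offset          # mirrors auto-increment-offset
        self._increment = increment  # mirrors auto-increment-increment

    def next_id(self):
        value = self._next
        self._next += self._increment
        return value


server1 = TicketServer(offset=1, increment=2)
server2 = TicketServer(offset=2, increment=2)

ids1 = [server1.next_id() for _ in range(3)]  # odd IDs
ids2 = [server2.next_id() for _ in range(3)]  # even IDs
```

The two servers never collide, so a client can ask either one, and if one server fails the other keeps issuing IDs.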

WeChat

WeChat persists the maximum unallocated ID in MySQL. Each time, a segment of IDs is fetched from the DB and held in memory to be allocated to callers. WeChat's ID generation is strictly increasing, which means only one machine can serve requests at a time, so an arbitration service, a lease mechanism, and a routing table are used for disaster recovery.
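The segment idea can be sketched like this. It is an illustration under assumptions, not WeChat's implementation: a plain dict stands in for the MySQL row, the key max_unallocated and the class SegmentAllocator are hypothetical names, and arbitration, leases and routing are left out:

```python
class SegmentAllocator:
    """Serves IDs from an in-memory segment, refilled from a persistent store."""

    def __init__(self, store, step=1000):
        self._store = store  # dict standing in for the MySQL row
        self._step = step    # number of IDs fetched per refill
        self._next = 0
        self._limit = 0      # exclusive upper bound of the cached segment

    def _load_segment(self):
        start = self._store["max_unallocated"]
        # Persist the new maximum before handing out any ID from the segment.
        self._store["max_unallocated"] = start + self._step
        self._next, self._limit = start, start + self._step

    def next_id(self):
        if self._next >= self._limit:
            self._load_segment()
        value = self._next
        self._next += 1
        return value


store = {"max_unallocated": 1}
allocator = SegmentAllocator(store, step=3)
ids = [allocator.next_id() for _ in range(4)]

# A replacement instance starts from the persisted maximum, never below it.
restarted = SegmentAllocator(store, step=3)
after_restart = restarted.next_id()
```

Only one store write is needed per segment. After a restart the unused IDs of the old segment are skipped, so the sequence has gaps but stays strictly increasing.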

How does Shopee Feeds generate IDs?

Given the characteristics of the Feeds business, precise ordering is not required, so we use the snowflake algorithm for ID generation. The layout is 39 bits (milliseconds) + 5 bits (machine) + 9 bits (seq), 53 bits in total, so that the ID does not overflow when used as a Redis score.

The score of a Redis sorted set is a double-precision 64-bit floating point number in IEEE 754 format. The range of integers it can represent exactly is -(2^53) to +(2^53).

Such an ID generator can be used for about 17 years (2^39 milliseconds), which is enough for the life cycle of a product.
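Both claims, the exact integer range of a double and the roughly 17-year lifetime, can be checked with Python floats, which are IEEE 754 doubles. The variable names are illustrative and a 365.25-day year is assumed:

```python
TIME_BITS, MACHINE_BITS, SEQ_BITS = 39, 5, 9
total_bits = TIME_BITS + MACHINE_BITS + SEQ_BITS

# The largest 53-bit ID survives a round trip through a double unchanged.
max_id = 2 ** total_bits - 1
round_trips = int(float(max_id)) == max_id

# One past 2^53 can no longer be represented exactly.
beyond = 2 ** 53 + 1
loses_precision = int(float(beyond)) != beyond

# Lifetime of a 39-bit millisecond timestamp, in years.
years = 2 ** TIME_BITS / 1000 / (365.25 * 24 * 3600)
```

A 53-bit ID is therefore safe as a sorted-set score, whereas a full 63-bit snowflake ID would be rounded.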

Clock rollback happens so rarely that a simple check is enough: if currentMillis < lastTime, that is, if currentMillis >= lastTime does not hold, an error is returned.
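Putting the 39 + 5 + 9 layout and the clock check together, a generator might look like the sketch below. This is not Shopee's code: the class and exception names, the field order, the injectable clock, and raising an error when the sequence is exhausted are all assumptions made for illustration.

```python
import time


class ClockMovedBackwards(Exception):
    """Raised when the current time is earlier than the last one seen."""


class FeedIdGenerator:
    TIME_BITS, MACHINE_BITS, SEQ_BITS = 39, 5, 9

    def __init__(self, machine_id, epoch_millis, clock=None):
        self._machine_id = machine_id
        self._epoch = epoch_millis
        # The clock is injectable so that a rollback can be simulated.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_time = -1
        self._seq = 0

    def next_id(self):
        current = self._clock() - self._epoch
        if current < self._last_time:
            # Simple handling: report the rollback instead of waiting it out.
            raise ClockMovedBackwards("clock moved backwards")
        if current == self._last_time:
            self._seq += 1
            if self._seq >= (1 << self.SEQ_BITS):
                # Assumption: fail rather than wait for the next millisecond.
                raise OverflowError("sequence exhausted for this millisecond")
        else:
            self._seq = 0
        self._last_time = current
        shift = self.MACHINE_BITS + self.SEQ_BITS
        return (current << shift) | (self._machine_id << self.SEQ_BITS) | self._seq


ticks = iter([1000, 1000, 1001, 999])  # the last reading simulates a rollback
gen = FeedIdGenerator(machine_id=3, epoch_millis=0, clock=lambda: next(ticks))
a, b, c = gen.next_id(), gen.next_id(), gen.next_id()
try:
    gen.next_id()
    rolled_back = False
except ClockMovedBackwards:
    rolled_back = True
```

The first three calls return increasing IDs, and the fourth raises because the clock reading 999 is earlier than the last timestamp 1001.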

Author: cyningsun
Source: www.cyningsun.com/12-26-2018/id-generator.html
