Distributed ID Generation [Repost]

Reposted from: http://blog.51cto.com/fulin0532/2094114

 

I came across the snowflake algorithm while reading some code. Looking it up, I found that it is Twitter's distributed ID generation algorithm, which can generate globally unique IDs in a distributed environment. I then searched online for how the industry handles this. So far I have read the approaches from Ctrip and Meituan, and these are my notes.

Background

In complex distributed systems, large amounts of data and messages often need to be uniquely identified. For example, in Meituan-Dianping's finance, payment, catering, hotel, Maoyan Movies, and other product systems, data keeps growing, and once the data is split across databases and tables, a unique ID is needed to identify each record or message. A database auto-increment ID clearly cannot meet this need. More specific entities such as orders, riders, and coupons also need unique IDs. At that point, a system that can generate globally unique IDs becomes essential.
To summarize, what requirements do business systems place on ID numbers?

Requirements

1, Globally unique: duplicate ID numbers must never appear. Since the ID is a unique identifier, this is the most basic requirement.
2, Trend-increasing: the MySQL InnoDB engine uses a clustered index, and since most RDBMSs store index data in a B-tree structure, we should choose an ordered primary key wherever possible to protect write performance.
3, Monotonically increasing: the next ID is guaranteed to be greater than the previous one, for special needs such as transaction version numbers, incremental IM messages, and sorting.
4, Information security: if IDs are consecutive, scraping by malicious users becomes very easy; they can simply download the target URLs in sequence. With order numbers it is even more dangerous, because a competitor can directly learn our daily order volume. So some scenarios require IDs that are irregular and follow no pattern.
Requirements 1, 2, and 3 above correspond to three different kinds of scenarios, and requirements 3 and 4 are mutually exclusive, so a single scheme cannot satisfy both.

Beyond the requirements on the ID numbers themselves, the business also demands extremely high availability from the ID generation system. Imagine the ID generation system going down: key actions across Meituan-Dianping such as payment, coupon issuing, and rider dispatch could not be carried out, which would be a disaster.

In summary, an ID generation system should do the following:
Be globally unique
Support high concurrency
Reflect certain attributes
Be highly reliable and tolerate single points of failure
Deliver high performance

Average latency and TP999 latency should both be as low as possible;
Availability of five nines;
High QPS.

Industry Solutions

UUID
A UUID (Universally Unique Identifier) in its standard form consists of 32 hexadecimal digits, divided by hyphens into five groups in the form 8-4-4-4-12, for 36 characters in total. Example: 550e8400-e29b-41d4-a716-446655440000. The industry has five ways to generate a UUID so far; for details see the IETF specification "A Universally Unique IDentifier (UUID) URN Namespace" (RFC 4122).
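To make the 8-4-4-4-12 form concrete, here is a minimal sketch using Python's standard uuid module. The choice of uuid4 (the random variant) is only for illustration; the notes above do not say which variant any particular system uses.

```python
import uuid

# uuid4 is the random variant; chosen here purely for illustration.
u = uuid.uuid4()
text = str(u)

print(text)                                      # canonical 8-4-4-4-12 form
print(len(text))                                 # 32 hex digits + 4 hyphens
print(len(u.bytes))                              # raw size in bytes
print([len(part) for part in text.split("-")])   # length of each group
```

Storing u.bytes (16 bytes) instead of the 36-character string roughly halves the space, though it does not fix the ordering problem discussed below.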

Advantages:

Very high performance: generated locally, with no network cost.
Disadvantages:

Not easy to store: a UUID is too long, 16 bytes (128 bits), and is usually represented as a 36-character string, which makes it unsuitable for many scenarios.
Information is not secure: the UUID generation algorithm based on MAC addresses may leak the MAC address. This weakness was once used to locate the creator of the Melissa virus.
As an ID, it causes problems in certain environments. For example, in the DB primary key scenario below, UUID is a poor fit:

① MySQL's official documentation explicitly recommends keeping the primary key as short as possible [4]; a 36-character UUID does not meet that recommendation.

All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index. If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.

② Bad for MySQL indexes: when used as the primary key under the InnoDB engine, the unordered nature of UUIDs may cause data locations to change frequently, which seriously affects performance.

Snowflake-like schemes
Broadly speaking, this kind of scheme is an algorithm that generates IDs by partitioning a namespace (UUID also counts, but it is so common that it was analyzed separately). It divides 64 bits into several segments that separately encode the machine, the time, and so on. For example, the 64 bits in snowflake are laid out as shown in the figure below (image from the internet):

image

41 bits of time can represent (1L << 41) / (1000L * 3600 * 24 * 365) ≈ 69 years, and the 10-bit machine field can represent 1024 machines. If we need to partition by IDC, the 10 bits can be split into 5 bits for the IDC and 5 bits for the worker machine. That gives 32 IDCs with 32 machines each, and the split can be defined as needed. The 12-bit auto-increment sequence can represent 2^12 IDs per millisecond, so the theoretical QPS of the snowflake scheme is about 4.096 million per second. This allocation guarantees that the IDs generated by any machine in any IDC within any given millisecond are all different.
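The bit layout and the arithmetic above can be sketched roughly as follows. This is a minimal illustration and not Twitter's implementation: the class name, the custom epoch, the injectable clock, and the choice to raise an error when the clock goes backwards are all assumptions made for the example.

```python
import threading
import time

# Bit widths follow the 41/10/12 layout described above.
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12
CUSTOM_EPOCH_MS = 1_500_000_000_000  # hypothetical epoch, not from the source

MAX_WORKER_ID = (1 << WORKER_BITS) - 1    # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095


class SnowflakeSketch:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock          # returns the current time in milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                # Assumed policy: refuse to issue IDs rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - CUSTOM_EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)


def decode(snowflake_id):
    """Split an ID back into (timestamp_ms, worker_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    worker_id = (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER_ID
    timestamp_ms = (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
    return timestamp_ms, worker_id, sequence


# The two figures quoted in the text.
years = (1 << TIMESTAMP_BITS) / (1000 * 3600 * 24 * 365)   # roughly 69.7
ids_per_second = (1 << SEQUENCE_BITS) * 1000               # per worker
```

Because the timestamp occupies the high bits, IDs from one worker sort by generation time, which is where the trend-increasing property comes from.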

The advantages and disadvantages of this approach are:

Advantages:

The millisecond timestamp is in the high bits and the auto-increment sequence is in the low bits, so the ID as a whole is trend-increasing.
It does not depend on third-party systems such as a database. It is deployed as a service, which makes it more stable, and ID generation performance is very high.
Bits can be allocated according to your own business characteristics, which is very flexible.
Disadvantages:

Strong dependence on the machine clock: if the clock on a machine is rolled back, duplicate IDs will be issued or the service will become unavailable.
Application example: MongoDB ObjectID
The ObjectID described in MongoDB's official documentation can be counted as a snowflake-like method. It is 12 bytes made up of "time + machine code + pid + inc" in a 4 + 3 + 2 + 3 layout, and is finally rendered as a 24-character hexadecimal string.
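As a rough illustration of the 4 + 3 + 2 + 3 layout just described, the sketch below packs the four fields into 12 bytes. It is not MongoDB driver code: the function name is made up, and deriving the 3-byte machine field from a hash of the hostname is an assumption for the example.

```python
import hashlib
import itertools
import os
import socket
import struct
import time

# Counter with a random starting point; only its low 3 bytes are used.
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def object_id_sketch(now=None):
    """Build a 24-character hex string from time + machine + pid + inc."""
    ts = int(time.time()) if now is None else now
    # Assumption: machine field is the first 3 bytes of an MD5 of the hostname.
    machine = hashlib.md5(socket.gethostname().encode()).digest()[:3]
    pid = os.getpid() & 0xFFFF            # keep the low 2 bytes of the process id
    inc = next(_counter) & 0xFFFFFF       # keep the low 3 bytes of the counter
    raw = (struct.pack(">I", ts)          # 4 bytes, big-endian seconds
           + machine                      # 3 bytes
           + struct.pack(">H", pid)       # 2 bytes
           + inc.to_bytes(3, "big"))      # 3 bytes
    return raw.hex()


print(object_id_sketch())
```

Since the timestamp comes first and is big-endian, the hex strings sort roughly by creation time at one-second granularity.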

Database generation
Taking MySQL as an example, auto_increment_increment and auto_increment_offset are set so that the ID field auto-increments, and each time the business needs an ID it reads and writes MySQL with the following SQL to obtain an ID number.

begin;
REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
commit;
image
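The REPLACE INTO trick can be tried locally without a MySQL server. The sketch below uses Python's built-in sqlite3 as a stand-in, which is an assumption for the sake of a runnable example: SQLite also supports REPLACE INTO, and cursor.lastrowid plays the role of LAST_INSERT_ID(). The table definition is also assumed, since the text only shows the statements that use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: an auto-increment id plus a unique one-character stub.
# AUTOINCREMENT matters in SQLite so that ids are never reused after a delete.
conn.execute(
    "CREATE TABLE Tickets64 ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " stub TEXT NOT NULL UNIQUE)"
)


def next_ticket(conn):
    # REPLACE removes the existing 'a' row and inserts a new one, so the
    # table stays at one row while the auto-increment id keeps advancing.
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
        return cur.lastrowid


tickets = [next_ticket(conn) for _ in range(3)]
row_count = conn.execute("SELECT COUNT(*) FROM Tickets64").fetchone()[0]
print(tickets, row_count)
```

Every ID costs one write transaction here, which is the performance bottleneck noted in the disadvantages below.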

The advantages and disadvantages of this approach are as follows:

Advantages:

Very simple: it is implemented with features of the existing database system, costs little, and is maintained by professional DBAs.
ID numbers increase monotonically, which can satisfy businesses with special requirements on IDs.
Disadvantages:

Strong dependence on the DB: when the DB fails, the whole system is unavailable, which is a fatal problem. Configuring master-slave replication can improve availability as far as possible, but data consistency is hard to guarantee in exceptional circumstances. An inconsistency during a master-slave switch may cause duplicate IDs to be issued.
The ID issuing bottleneck is the read/write performance of a single MySQL instance.
For the MySQL performance problem, the following solution is available: in a distributed system we can deploy several more machines, each with a different initial value and with a step size equal to the number of machines. For example, with two machines, set the step to 2. TicketServer1's initial value is 1 (1, 3, 5, 7, 9, 11, ...) and TicketServer2's initial value is 2 (2, 4, 6, 8, 10, ...). This is the primary key generation strategy the Flickr team described in 2010 (Ticket Servers: Distributed Unique Primary Keys on the Cheap). As shown below, to implement this scheme the two machines are given the corresponding parameters: TicketServer1 issues numbers starting from 1, TicketServer2 starts from 2, and each machine increments by 2 after every number it issues.

TicketServer1:
auto-increment-increment = 2
auto-increment-offset = 1

TicketServer2:
auto-increment-increment = 2
auto-increment-offset = 2
Suppose we want to deploy N machines. The step should be set to N, and the initial values of the machines are 0, 1, 2, ..., N-1 in turn. The overall architecture then looks like the figure below:

image
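The offset and step idea is easy to check with a small simulation. The sketch below models each ticket server as an arithmetic sequence and involves no database; the function names are made up for illustration.

```python
from itertools import count, islice


def ticket_server(offset, increment):
    """Yield the IDs one server hands out: offset, offset + increment, ..."""
    return count(start=offset, step=increment)


def deploy(n):
    # Step equals the number of machines; initial values are 0 .. n-1 as in the text.
    return [ticket_server(offset, n) for offset in range(n)]


# The two-machine example: TicketServer1 starts at 1, TicketServer2 at 2, step 2.
two = [ticket_server(1, 2), ticket_server(2, 2)]
first_ids = [list(islice(server, 5)) for server in two]
print(first_ids)

# With N = 4, the servers together cover every integer exactly once.
merged = sorted(i for server in deploy(4) for i in islice(server, 10))
print(merged == list(range(40)))
```

The sequences never collide because each server only produces values with its own remainder modulo N, which is also why changing N later is painful.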

This architecture seems to meet the performance requirement, but it has several drawbacks:

Horizontal scaling is difficult. For example, once the step size and the number of machines are fixed, what do you do when you need to add machines? Suppose there is only one machine issuing 1, 2, 3, 4, 5 (step 1) and one more machine is needed. It can be done like this: set the second machine's initial value well beyond what the first has issued, such as 14 (assuming the first machine cannot possibly reach 14 during the expansion), and set its step to 2, so that this machine issues only even numbers from 14 onward. Then take the first machine out of service, keep its ID value at an odd number, such as 7, and change its step to 2. This makes it conform to the number segments we defined; in this example, it means the first machine only produces odd numbers from then on. Does the expansion plan look complicated? It may seem manageable, but now imagine we have 100 machines online. How would we expand then? It would be a nightmare. So horizontal scaling is hard to achieve.
IDs are no longer monotonically increasing, only trend-increasing. This drawback is not very important for typical business needs and can be tolerated.
The database is still under heavy pressure: every ID requires a database read and write, and performance can only be improved by piling on machines.

Generating IDs with Redis [this seems to be what we use]

When the database cannot generate IDs fast enough, we can try generating them with Redis. This relies mainly on Redis being single-threaded, so it can be used to generate globally unique IDs. It can be done with Redis's atomic operations INCR and INCRBY.

A Redis cluster can be used to obtain higher throughput. Suppose a cluster has five Redis nodes. Initialize their values to 1, 2, 3, 4, 5 respectively, and use a step of 5. The IDs generated by each Redis node are then:

A:1,6,11,16,21

B:2,7,12,17,22

C:3,8,13,18,23

D:4,9,14,19,24

E:5,10,15,20,25

Redis is well suited to generating serial numbers that restart from 0 every day. For example, order number = date + that day's auto-increment number. A key can be created in Redis for each day and incremented with INCR.
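A sketch of the "date + daily counter" order number might look like this. To keep it runnable without a server, it uses a tiny in-memory stand-in that exposes only incr(); with a real Redis client the same call would map to the INCR command. The key prefix and the 8-digit padding are assumptions for illustration.

```python
import datetime


class FakeRedis:
    """In-memory stand-in exposing only incr(); not a real Redis client."""

    def __init__(self):
        self._data = {}

    def incr(self, key):
        # Like INCR: a missing key counts as 0, so the first call returns 1.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


def next_order_number(client, day, width=8):
    date_part = day.strftime("%Y%m%d")
    key = "order:seq:" + date_part      # hypothetical key naming, one key per day
    seq = client.incr(key)
    return date_part + str(seq).zfill(width)


client = FakeRedis()
day1 = datetime.date(2019, 2, 1)
day2 = datetime.date(2019, 2, 2)
print(next_order_number(client, day1))
print(next_order_number(client, day1))
print(next_order_number(client, day2))  # a new day uses a new key, so the counter restarts
```

Note that INCR returns 1 on the first call for a new key, so the daily sequence effectively starts at 1 unless the key is initialized differently.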

Advantages:

Does not depend on a database; it is flexible and performs better than a database.

Numeric IDs sort naturally, which helps with pagination or results that need sorting.

Using a Redis cluster also prevents a single point of failure.

Disadvantages:

If the system does not already use Redis, a new component must be introduced, which increases system complexity.

The coding and configuration workload is relatively large, and operating it across multiple environments is troublesome.

Once it is decided at the start which Redis instance each application instance is load-balanced to, it is hard to change later.

Other schemes also exist, such as order number generation at e-commerce platforms like JD.com and Taobao. Because order numbers and user IDs differ in business terms, an order number is expected to carry more redundant business information, for example:

Didi: starting-point number + time + license plate number

Taobao orders: timestamp + user ID

Other e-commerce platforms: timestamp + order channel + user ID; some also add the ID of the first item in the order.

User IDs, by contrast, need a simple meaning, may contain the registration channel, and should be as short as possible.

To sum up:
1, A competitor can work out roughly a day's order volume by analyzing order numbers.
2, The problem with snowflake is clock rollback.

Source 1: Meituan
Source 2: Ctrip

Origin blog.csdn.net/kingdelee/article/details/86760858