How do top-tier companies generate distributed unique IDs?


I. Introduction

In distributed systems, some business tables hold very large volumes of data, for example the user table and the orders table. Because a single table cannot carry that much data, we split it across multiple databases and tables (sharding). Readers can take a look at "Sharding databases and tables? How to never migrate data and avoid hot spots".

But once we shard, we have to generate unique primary-key IDs in a distributed system. The article on never migrating data and avoiding hot spots requires these IDs to have the following characteristics:

• The ID is unique across the whole system

• The ID is numeric and trend-increasing

• The ID is short, so queries are fast

**What is incrementing?** If the first generated ID is 12, the next is 13, and the one after that is 14, the IDs are incrementing.

**What is trend-increasing?** Over time, the generated IDs grow overall. For example, the IDs generated in one period fall in [0, 1000], and the IDs generated in the following period fall in [1000, 2000]. But within the [0, 1000] interval, the first ID might be 12, the second 10, and the third 14.

So what are the options? Read on!

II. Several distributed ID generation schemes

2.1 UUID

This is probably the first scheme that comes to mind.

Advantages:

• The code is simple.

• IDs are generated locally, so there is no performance problem

• Because each ID is globally unique, data is easy to migrate

Disadvantages:

• Every generated ID is unordered, so trend-increasing cannot be guaranteed

• UUIDs are stored as strings, so queries are slow

• They take a lot of storage space

• The ID carries no business meaning and is unreadable

Scenarios:

• Generating tokens and similar use cases

• Scenarios that do not require trend-increasing IDs

The UUID scheme does not fit Lao Gu's needs.
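
To make the trade-off concrete, here is a minimal Java sketch using the standard `java.util.UUID` class. The class and method names (`UuidDemo`, `newId`, `storedBytes`) are made up for illustration and are not from the original article.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical demo class, for illustration only.
class UuidDemo {
    // Generates a random (version 4) UUID in its canonical 36-character text form.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    // Bytes needed to store the text form; a numeric long ID needs only 8 bytes.
    static int storedBytes(String id) {
        return id.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Generating an ID is a single local call, but the text form takes 36 bytes, and two consecutive calls have no ordering relationship, which is why the scheme fails the "numeric and trend-increasing" requirement.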

2.2 MySQL auto-increment primary key

This scheme uses MySQL's auto_increment primary key; by default each new ID is the previous one plus 1.

Advantages:

• Numeric, incrementing IDs

• High query efficiency

• Some business readability

Disadvantages:

• Single point of failure: if MySQL goes down, no IDs can be generated

• Heavy load on the database, which cannot withstand high concurrency

2.3 MySQL multi-instance auto-increment primary key

This scheme addresses MySQL's single point of failure: on top of auto_increment, each instance is configured with a step.

The instances start at 1, 2, 3, ..., N respectively, and each uses a step of N (4 in this example).

Advantages:

• Solves the single point of failure

Disadvantages:

• Once the step is fixed, the cluster cannot be scaled out; each database is still under heavy pressure, and a database by itself cannot deliver high-concurrency performance

Scenarios:

• Data sets that will never need to be scaled out

This scheme does not fit Lao Gu's needs either, because scaling out is inconvenient (remember this scheme, though, hehe).
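
The offset-and-step idea can be simulated in a few lines of Java. This is only a sketch: `SteppedSequence` is a hypothetical class standing in for one MySQL instance, and no database is involved. In MySQL itself, the step and starting value are typically set through the `auto_increment_increment` and `auto_increment_offset` system variables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of one MySQL instance's auto-increment sequence.
class SteppedSequence {
    private long next;
    private final long step;

    // offset: this instance's first ID (1..N); step: the number of instances N.
    SteppedSequence(long offset, long step) {
        this.next = offset;
        this.step = step;
    }

    // Returns the current ID and advances by the step.
    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }

    // Convenience helper: the first `count` IDs an instance would hand out.
    static List<Long> firstIds(long offset, long step, int count) {
        SteppedSequence seq = new SteppedSequence(offset, step);
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(seq.nextId());
        }
        return ids;
    }
}
```

With four instances, `firstIds(1, 4, 3)` yields 1, 5, 9 and `firstIds(2, 4, 3)` yields 2, 6, 10, so the sequences never collide. The sketch also shows why scaling out is awkward: the step of 4 is baked into every instance, so adding a fifth means reconfiguring all of them.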

2.4 Snowflake algorithm

There are plenty of introductions to this algorithm online, so Lao Gu will not go into detail here. Snowflake generates a 64-bit binary integer, which is then stored as a decimal number. The 64 bits are made up of the following parts:

• 1 sign bit: always 0

• 41-bit timestamp: this does not store the current timestamp but the difference (current timestamp - start timestamp). The start timestamp is usually the moment the ID generator went into service, and it is specified by our program

• 10-bit machine ID: allows deployment on 1024 nodes. For a multi-data-center (IDC) deployment, the 10 bits can be split into a 5-bit data center ID plus a 5-bit machine ID

• 12-bit sequence: a counter within the millisecond. With 12 bits, each node can generate 4096 IDs per millisecond (same machine, same timestamp)
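
The bit layout above can be sketched as plain shift-and-mask arithmetic. `SnowflakeLayout` and its method names are hypothetical; only the 1 + 41 + 10 + 12 split comes from the text.

```java
// Hypothetical helper illustrating the 1 + 41 + 10 + 12 bit layout.
class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int MACHINE_BITS = 10;
    static final int TIMESTAMP_BITS = 41;
    static final int TIMESTAMP_SHIFT = MACHINE_BITS + SEQUENCE_BITS;   // 22
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;        // 4095
    static final long MAX_MACHINE_ID = (1L << MACHINE_BITS) - 1;       // 1023
    static final long MAX_TIMESTAMP = (1L << TIMESTAMP_BITS) - 1;

    // elapsedMillis is (current time - start time chosen by the application).
    static long compose(long elapsedMillis, long machineId, long sequence) {
        if (elapsedMillis < 0 || elapsedMillis > MAX_TIMESTAMP
                || machineId < 0 || machineId > MAX_MACHINE_ID
                || sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("field out of range");
        }
        // Timestamp in the high bits, machine ID in the middle, sequence in the low bits.
        return (elapsedMillis << TIMESTAMP_SHIFT) | (machineId << SEQUENCE_BITS) | sequence;
    }

    static long elapsedMillis(long id) {
        return id >>> TIMESTAMP_SHIFT;
    }

    static long machineId(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_MACHINE_ID;
    }

    static long sequence(long id) {
        return id & MAX_SEQUENCE;
    }
}
```

Because the timestamp occupies the highest bits, any ID from a later millisecond is numerically larger than every ID from an earlier one, whatever the machine ID and sequence are. The sign bit stays 0, so the result is always a non-negative `long`.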

Advantages:

• Each node can generate 4.096 million IDs per second, so performance is high

• The timestamp sits in the high bits and the incrementing sequence in the low bits, so the ID as a whole is trend-increasing and ordered by time

• Highly flexible: the bit allocation can be adjusted to suit different business needs

Disadvantages:

• It depends on the machine clock: if the server clock is rolled back, duplicate IDs can be generated

• In a distributed environment, server clocks get rolled back fairly often, usually by around 10 ms. You might think 10 ms is too short to matter. But this algorithm works at millisecond granularity, so once the clock is rolled back, duplicate IDs are very likely.
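
To show where the clock dependency bites, here is a minimal generator sketch. It assumes one common defensive choice, refusing to issue IDs when the clock moves backwards; the article does not say how the problem should be handled, so treat this policy as an assumption. `SnowflakeGenerator` is a hypothetical name, and the clock is injected as a `LongSupplier` so the rollback can be simulated.

```java
import java.util.function.LongSupplier;

// Hypothetical Snowflake-style generator that rejects clock rollback.
class SnowflakeGenerator {
    private static final long MAX_SEQUENCE = 4095;   // 12-bit sequence

    private final long epoch;          // start timestamp chosen by the application
    private final long machineId;      // 0..1023
    private final LongSupplier clock;  // e.g. System::currentTimeMillis
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeGenerator(long epoch, long machineId, LongSupplier clock) {
        this.epoch = epoch;
        this.machineId = machineId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastMillis) {
            // Continuing here could reuse a (timestamp, sequence) pair and duplicate an ID.
            throw new IllegalStateException(
                    "clock moved backwards by " + (lastMillis - now) + " ms");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // 4096 IDs already issued in this millisecond: spin until the next one.
                while ((now = clock.getAsLong()) <= lastMillis) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - epoch) << 22) | (machineId << 12) | sequence;
    }
}
```

In normal use the generator would be created with `System::currentTimeMillis` as the clock. Without the rollback check, a clock that jumps back 10 ms would replay timestamps the generator has already used, which is exactly the duplicate-ID risk described above.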

For now, this scheme does not meet Lao Gu's needs (hey, let's see how to optimize it; keep this one in mind).

2.5 Redis generation scheme

This scheme relies on the atomicity of Redis's incr operation. The general format is:

year + day of the year + hour + Redis increment value
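
A rough Java sketch of this format follows. It makes several assumptions the article does not state: a 4-digit year, a 3-digit day of year, a 2-digit hour, and a 6-digit counter that restarts every hour. To stay self-contained, an in-memory map of counters stands in for Redis; in a real deployment the `incrementAndGet` call would be an INCR on a key named after the prefix.

```java
import java.time.LocalDateTime;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical generator; the field widths are assumptions, not from the article.
class RedisStyleIdGenerator {
    // Stand-in for Redis: one atomic counter per key, mimicking INCR on that key.
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    long nextId(LocalDateTime now) {
        // year + day of the year + hour, e.g. "201915814" for 2019-06-07 14:xx
        String prefix = String.format(Locale.ROOT, "%04d%03d%02d",
                now.getYear(), now.getDayOfYear(), now.getHour());
        long n = counters.computeIfAbsent(prefix, k -> new AtomicLong()).incrementAndGet();
        if (n > 999_999L) {
            throw new IllegalStateException("hourly counter exhausted");
        }
        // Append the 6-digit counter to the numeric prefix.
        return Long.parseLong(prefix) * 1_000_000L + n;
    }
}
```

For 7 June 2019 at 14:00 (day 158 of the year), the first two IDs are 201915814000001 and 201915814000002. The IDs are numeric, ordered, and readable, but every call costs one round trip to Redis.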

Advantages:

• Ordered, incrementing, and readable

Disadvantages:

• Consumes bandwidth, because every ID requires a request to Redis

The overall performance test results are as follows:

Requirement: 100,000 simultaneous ID requests

1, Total time to complete all concurrent requests: about 9 s
2, Average time per task: 74 ms
3, Minimum time for a single thread: less than 1 ms
4, Maximum time for a single thread: 4.1 s

The performance is acceptable. If the performance requirements are not too high, this scheme basically meets Lao Gu's requirements.

But it does not fully match the business: Lao Gu wants IDs that are trend-increasing starting from 1. (Of course, the algorithm could be reduced to a plain Redis increment, without the year, day, and so on.)

2.6 Summary

The sections above describe several common distributed ID generation schemes. The schemes used by top-tier companies are certainly not this simple, because they have very high concurrency and high availability requirements.

Take the Redis scheme: every ID requires a request to Redis, each request costs network time, and under concurrency the system depends heavily on Redis. This design is risky: once Redis goes down, the whole system becomes unavailable.

Top-tier companies also consider ID security. With the Redis scheme, for example, a user can predict what the next ID will be, because the algorithm is incrementing.

In that case, a competitor can place an order at 12:00 on the first day and see the platform's order ID, then place another order at 12:00 the next day and see the order ID again. From the difference they can estimate how many orders the platform produces in a day. That is absolutely unacceptable; it is a top company secret.

III. How do top-tier companies design it?

In fact, their design ideas are about the same as yours. They just think one or two layers deeper and add one or two more design elements.

3.1 Reworking the database auto-increment primary key

As described above, a distributed ID can be built on the database's auto-increment primary key. Such an ID is short and clear, works well as a userId, and fits the ID requirements of the article "How to never migrate data and avoid hot spots? Allocating data volume based on server metrics (the reveal)". But this scheme has serious problems:

• Once the step is fixed, scaling out is difficult

• The database is under enormous pressure

Let's see how to optimize this scheme. Start with the database pressure: why is it so high? Because every time we need an ID, we send a request to the database. Can we avoid going to the database every time?

The idea: when we do request IDs from the database, we fetch a whole ID segment (a range) instead of a single ID.


The ID rule table has the following columns:

1, id: the primary key, with no business meaning.

2, biz_tag: identifies the business. Many businesses across the system need generated IDs, and they can all share and maintain this one table

3, max_id: the largest ID allocated so far for that business

4, desc: a description

5, update_time: the time of the most recent ID fetch

Let's look at the overall flow:

1, When registering a user, the [user service] needs a user ID, so it calls the interface of the [ID generation service] (an independent application)

2, The [ID generation service] queries the database for the row whose biz_tag is user_tag; at this point max_id = 0 and step = 1000

3, The [ID generation service] returns max_id and step to the [user service], and updates max_id to max_id + step, i.e. 1000

4, The [user service] receives max_id = 0, step = 1000

5, The [user service] can now use the ID segment [max_id + 1, max_id + step], i.e. [1, 1000]

6, The [user service] keeps this segment in JVM memory

7, When the [user service] needs an ID, it takes the next one in order from [1, 1000], for example with the getAndIncrement method of AtomicLong

8, When the segment is used up, it calls the [ID generation service] interface again and receives max_id = 1000, so it can use the segment [max_id + 1, max_id + step], i.e. [1001, 2000]
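
The eight steps above can be sketched in Java as follows. This is a simplified, in-process illustration: `SegmentService` stands in for the [ID generation service] and its rule table, `SegmentIdClient` stands in for the [user service], and both names are hypothetical. A real setup would have a network call and a database between them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for the ID generation service and its rule table (biz_tag -> max_id).
class SegmentService {
    private final Map<String, Long> maxIds = new HashMap<>();
    private final long step;

    SegmentService(long step) {
        this.step = step;
    }

    // Steps 2-3: read max_id, advance it by step, return {old max_id, step}.
    synchronized long[] allocate(String bizTag) {
        long oldMax = maxIds.getOrDefault(bizTag, 0L);
        maxIds.put(bizTag, oldMax + step);
        return new long[] {oldMax, step};
    }
}

// Stand-in for the user service: serves IDs from a segment cached in memory.
class SegmentIdClient {
    private final SegmentService service;
    private final String bizTag;
    private AtomicLong next;   // next ID to hand out from the cached segment
    private long end;          // last usable ID of the cached segment

    SegmentIdClient(SegmentService service, String bizTag) {
        this.service = service;
        this.bizTag = bizTag;
    }

    synchronized long nextId() {
        if (next == null || next.get() > end) {
            // Steps 1 and 8: no segment yet, or segment used up, so ask for a new one.
            long[] seg = service.allocate(bizTag);
            next = new AtomicLong(seg[0] + 1);   // step 5: segment starts at max_id + 1
            end = seg[0] + seg[1];               // and ends at max_id + step
        }
        return next.getAndIncrement();           // step 7
    }
}
```

With a step of 1000, the first client receives 1 through 1000 from memory and only contacts the service again for 1001. The `synchronized` keyword on `nextId` keeps the refill logic simple; `AtomicLong` is kept to mirror step 7 of the text.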

This scheme neatly solves the scaling problem of database auto-increment: the starting max_id and the step can both be defined freely, so expansion is easy.

It also relieves the database pressure, because for a while IDs are served from the segment in JVM memory and no database request is needed per ID. Even if the database goes down, the system is unaffected for some time, since the cached IDs keep it running.

3.2 Contention

In the scheme above, if multiple user services request IDs from the [ID generation service] at the same time, a concurrency problem arises when max_id is fetched.

For example, user service A reads max_id = 1000, and user service B also reads max_id = 1000. Now there is a problem: the IDs are duplicated. How do we solve it?

There are many options. One is a distributed lock that ensures only one user service fetches max_id at a time. You can also rely on the database's own locks.

With a transaction plus a row lock, until the first request has finished updating and reading max_id, a second user service's request is not let through; it can only block and wait.
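
The effect of the row lock can be imitated in memory with one lock per biz_tag. This is only an analogy, and `RowLockedSegmentTable` is a hypothetical name: in a real database the same guarantee would typically come from updating max_id and reading the new value inside one transaction.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory stand-in for the rule table with per-row locking.
class RowLockedSegmentTable {
    private final ConcurrentHashMap<String, Long> maxIds = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final long step;

    RowLockedSegmentTable(long step) {
        this.step = step;
    }

    // Reserves the segment [start, start + step - 1] and returns its start.
    long reserve(String bizTag) {
        ReentrantLock lock = rowLocks.computeIfAbsent(bizTag, k -> new ReentrantLock());
        lock.lock();                     // like taking the row lock in a transaction
        try {
            long oldMax = maxIds.getOrDefault(bizTag, 0L);   // read max_id
            maxIds.put(bizTag, oldMax + step);               // max_id = max_id + step
            return oldMax + 1;
        } finally {
            lock.unlock();               // like the commit releasing the row lock
        }
    }

    // Runs `threads` concurrent reservations and collects the segment starts.
    static Set<Long> reserveConcurrently(RowLockedSegmentTable table, String bizTag, int threads) {
        Set<Long> starts = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> starts.add(table.reserve(bizTag)));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return starts;
    }
}
```

Because the read and the update of max_id happen under the same lock, eight concurrent callers receive eight different segment starts. Requests for different business tags use different locks and do not block each other, much as row locks on different rows would not.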

3.3 Sudden blocking


Multiple user services each hold their own ID segment. Under high concurrency the IDs are consumed quickly; if three user services run out at the same moment, they all call the [ID generation service] at once. Because of the contention described above, only one user service can operate on the database, and the other two are blocked.

You may ask: is such a coincidence really likely, with all of them running out at once? With three user services the probability seems small. But what about 100 user services? The probability rises sharply.

The symptom is that the system intermittently slows down and then recovers. This is the cause. How do we solve it?

3.4 Double-buffer scheme

Double buffering shows up often in system design, and it can solve the problem above as well.


The double-buffer design works as follows:

1, IDs are currently served from buffer1; every request takes its ID from buffer1

2, 100 IDs in buffer1 have been used, i.e. 10% of the segment

3, On reaching 10%, first check whether buffer2 has already been fetched; if not, immediately start a thread to request a new ID segment and store the result in buffer2

4, When buffer1 is used up, switch automatically to buffer2

5, When 10% of buffer2 has been used, start a thread again to fetch a segment and store it in buffer1

6, Repeat in turn
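
The steps above can be sketched in Java as follows. This is an illustrative sketch rather than Meituan's actual implementation: `DoubleBufferIdClient`, `Segment`, and the `fetcher` supplier (which stands for the call to the ID generation service) are all hypothetical. Instead of two named buffers, it keeps a `current` and a `standby` segment, which amounts to the same alternation.

```java
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Hypothetical double-buffer client; a sketch, not a production implementation.
class DoubleBufferIdClient {
    // One cached ID range [next, end].
    static final class Segment {
        long next;
        final long end;
        final long size;

        Segment(long start, long size) {
            this.next = start;
            this.end = start + size - 1;
            this.size = size;
        }

        long used() {
            return size - (end - next + 1);
        }
    }

    private final Supplier<Segment> fetcher;   // asks the ID service for a new segment
    private final Executor executor;           // runs the prefetch, e.g. a thread pool
    private Segment current;                   // the buffer IDs are served from
    private Segment standby;                   // the prefetched buffer, if any
    private boolean loading;                   // true while a prefetch is in flight

    DoubleBufferIdClient(Supplier<Segment> fetcher, Executor executor) {
        this.fetcher = fetcher;
        this.executor = executor;
        this.current = fetcher.get();
    }

    synchronized long nextId() {
        if (current.next > current.end) {
            // Current buffer used up: wait for an in-flight prefetch, then switch.
            while (standby == null && loading) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
            if (standby == null) {
                standby = fetcher.get();   // fallback if the prefetch failed
            }
            current = standby;
            standby = null;
        }
        long id = current.next++;
        // Once 10% of the current buffer is used, prefetch the other buffer.
        if (standby == null && !loading && current.used() * 10 >= current.size) {
            loading = true;
            executor.execute(this::prefetch);
        }
        return id;
    }

    private void prefetch() {
        Segment seg = null;
        try {
            seg = fetcher.get();
        } finally {
            synchronized (this) {
                standby = seg;
                loading = false;
                notifyAll();
            }
        }
    }
}
```

In a service, the `executor` would usually be a small thread pool so that the fetch happens off the request path; passing `Runnable::run` makes the prefetch run inline, which is convenient for trying the class out. With segments of 1000 IDs, the second segment is requested when ID 100 is handed out, and the switch at ID 1001 needs no call to the ID service.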

Isn't the double-buffer scheme neat? The IDs a business needs are always served from JVM memory and never have to be fetched from the database on demand, so the system can tolerate longer database downtime.

Because a thread watches usage and fetches the next segment automatically, and the two buffers switch over by themselves, the sudden-blocking problem is solved.

IV. Summary

This scheme is the distributed ID algorithm used by Meituan. If you want a deeper understanding, you can search online; there should be fairly detailed introductions.

Of course, Meituan has made further optimizations on top of this, such as monitoring how fast IDs are consumed and adjusting the step automatically, so that IDs are used economically.

This scheme is a good fit for the ID requirements of "Sharding databases and tables? How to never migrate data and avoid hot spots".

But it has a problem: the IDs are too continuous, so competitors can predict them, which makes them unsuitable for order IDs. We will continue with this in the next article, so stay tuned!


Origin blog.csdn.net/weixin_44946117/article/details/91386091