Summary of distributed ID schemes

Source: https://www.yuque.com/renyong-jmovm/kb/lgz2xv#5m84f
An ID is a unique identifier for a piece of data. The traditional approaches are UUIDs and database auto-increment IDs. Most Internet companies use MySQL, and because they need transaction support they usually use the InnoDB storage engine. A UUID is too long and unordered, so it makes a poor primary key in InnoDB. An auto-increment ID is a better fit, but as the business grows and the data volume increases, the data has to be split across multiple tables. After sharding, each table increments at its own pace, so ID conflicts are likely. A separate mechanism is then needed to generate unique IDs. IDs generated this way are called distributed IDs, or global IDs. The sections below look at the various mechanisms for generating distributed IDs.
This article does not go into great detail; it is mainly a summary. Later articles will cover some of the schemes in depth.

1. Database auto-increment ID

The first scheme is still based on database auto-increment IDs. It requires a separate database instance, in which we create a dedicated table with the following structure:

CREATE DATABASE `SEQID`;

CREATE TABLE SEQID.SEQUENCE_ID (
	id bigint(20) unsigned NOT NULL auto_increment, 
	stub char(10) NOT NULL default '',
	PRIMARY KEY (id),
	UNIQUE KEY stub (stub)
) ENGINE=MyISAM;

The following statements generate and fetch an auto-increment ID:

begin;
replace into SEQUENCE_ID (stub) VALUES ('anyword');
select last_insert_id();
commit;

The stub field has no special meaning. It exists only so that a row can be inserted, because an auto-increment ID is produced only when a row is inserted. We insert with replace: replace first checks whether a row with the same stub value already exists; if it does, that row is deleted and a new one is inserted, otherwise the row is inserted directly.
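The delete-then-insert behaviour of replace can be tried without a MySQL server. The following is a minimal sketch that assumes SQLite as a stand-in (its REPLACE INTO and AUTOINCREMENT behave similarly for this purpose); the helper name next_id is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence_id ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " stub TEXT NOT NULL DEFAULT '' UNIQUE)"
)


def next_id(conn):
    """Replace the single stub row and return the ID assigned to the new row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("REPLACE INTO sequence_id (stub) VALUES ('anyword')")
        return cur.lastrowid


ids = [next_id(conn) for _ in range(3)]
row_count = conn.execute("SELECT COUNT(*) FROM sequence_id").fetchone()[0]
print(ids, row_count)
```

Each call yields a larger ID while the table never holds more than one row, which is the point of reusing the same stub value.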
This mechanism requires a separate MySQL instance. It works, but it falls short on performance and reliability: every time a business system needs an ID it has to query the database, which is slow, and if the database instance goes down, every business system is affected.
To address the database reliability problem, we can use the second scheme.

2. Multi-master database mode

If the two databases form a master-slave cluster, the reliability problem is solved under normal circumstances. But if the master goes down before its data has been synchronized to the slave, duplicate IDs will appear. We can use a dual-master cluster instead, in which both MySQL instances produce auto-increment IDs independently. This also improves throughput, but without further changes the two instances are likely to generate the same IDs. Each MySQL instance therefore needs its own start value and increment step.
Configuration of the first MySQL instance:

set @@auto_increment_offset = 1;     -- start value
set @@auto_increment_increment = 2;  -- step

Configuration of the second MySQL instance:

set @@auto_increment_offset = 2;     -- start value
set @@auto_increment_increment = 2;  -- step

With this configuration, the two MySQL instances generate the following ID sequences:

mysql1: start value 1, step 2, ID sequence: 1, 3, 5, 7, 9, ...
mysql2: start value 2, step 2, ID sequence: 2, 4, 6, 8, 10, ...
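The interleaving can be checked with a few lines of Python. This is only a model of the arithmetic, not of MySQL itself; the function name auto_increment_ids is hypothetical.

```python
from itertools import count, islice


def auto_increment_ids(offset, increment):
    """Model one instance: it hands out offset, offset + increment, ..."""
    return count(start=offset, step=increment)


mysql1 = list(islice(auto_increment_ids(offset=1, increment=2), 5))
mysql2 = list(islice(auto_increment_ids(offset=2, increment=2), 5))
overlap = set(mysql1) & set(mysql2)
print(mysql1, mysql2, overlap)
```

As long as every instance uses the same step and a distinct start value no larger than the step, the sequences never overlap.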

This scheme needs an additional application that generates distributed IDs, such as DistributIdService. It exposes an interface through which business applications obtain IDs. When a business application needs an ID, it calls DistributIdService over RPC, and DistributIdService picks one of the two MySQL instances above at random to fetch the ID.
With this scheme, even if one MySQL instance goes down, DistributIdService is not affected; it can still use the other instance to generate IDs.
However, the scheme does not scale well. If two MySQL instances are not enough and we need to add more to improve performance, things get troublesome.
Suppose we want to add an instance mysql3. What has to be done?
First, the step of mysql1 and mysql2 must be changed to 3. This can only be done manually, and it takes time.
Second, because mysql1 and mysql2 keep incrementing in the meantime, the start value of mysql3 may have to be set fairly high to leave enough time to change the step of mysql1 and mysql2.
Third, duplicate IDs may appear while the step is being changed; avoiding them may require stopping the service.
To solve these problems, and to further improve the performance of DistributIdService, we can use the third scheme.

3. Number segment mode

We can obtain auto-increment IDs in number segments. A number segment can be understood as batch acquisition: if DistributIdService fetches a batch of IDs from the database and caches them locally, business applications can obtain IDs much more efficiently.
For example, each time DistributIdService goes to the database it fetches a number segment such as (1, 1000], a range covering about 1000 IDs. When a business application asks for an ID, DistributIdService just increments a local counter and returns the value, with no database request, until the local counter reaches 1000. Only when the current segment is used up does it go back to the database for the next one.
So the database table needs to change as follows:
CREATE TABLE id_generator (
	id int(10) NOT NULL,
	current_max_id bigint(20) NOT NULL COMMENT 'current maximum ID',
	increment_step int(10) NOT NULL COMMENT 'segment length',
	PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This table records the step length and the current maximum ID (that is, the last value of the segment most recently handed out). Because the increment logic has moved into DistributIdService, the database no longer needs it.
This scheme no longer depends heavily on the database. Even if the database is unavailable, DistributIdService can keep serving for a while. However, if DistributIdService restarts, the unused IDs of its cached segment are lost, which leaves gaps in the ID sequence.
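The local caching described above can be sketched as follows. This is a minimal illustration, not the code of any particular framework: InMemorySegmentStore is a hypothetical stand-in for the id_generator table, SegmentIdGenerator is a hypothetical name for the logic inside DistributIdService, and segments are taken as (old_max, old_max + step] starting from 0.

```python
import threading


class InMemorySegmentStore:
    """Hypothetical stand-in for the single row of the id_generator table."""

    def __init__(self, current_max_id=0, increment_step=1000):
        self.current_max_id = current_max_id
        self.increment_step = increment_step
        self.fetches = 0  # how many times the "database" was asked

    def allocate_segment(self):
        """Advance current_max_id by one step and return (old_max, new_max)."""
        old_max = self.current_max_id
        self.current_max_id = old_max + self.increment_step
        self.fetches += 1
        return old_max, self.current_max_id


class SegmentIdGenerator:
    """Hands out IDs from a locally cached segment (old_max, new_max]."""

    def __init__(self, store):
        self._store = store
        self._lock = threading.Lock()
        self._next = 1
        self._max = 0  # empty segment, so the first call fetches one

    def next_id(self):
        with self._lock:
            if self._next > self._max:  # segment used up: fetch the next one
                old_max, new_max = self._store.allocate_segment()
                self._next, self._max = old_max + 1, new_max
            value = self._next
            self._next += 1
            return value


store = InMemorySegmentStore(increment_step=1000)
generator = SegmentIdGenerator(store)
ids = [generator.next_id() for _ in range(2500)]
fetches = store.fetches

# A restart discards the cached segment, so IDs 2501..3000 are never issued.
restarted = SegmentIdGenerator(store)
first_after_restart = restarted.next_id()
print(fetches, first_after_restart)
```

In this run 2500 IDs cost only three round trips to the store, and the restart shows where the gap in the ID sequence comes from.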
To improve the availability of DistributIdService, it needs to run as a cluster. A business application requesting an ID picks a DistributIdService node at random. All nodes connect to the same database, so several nodes may request a number segment at the same time. An optimistic lock is needed to control this, for example by adding a version field to the table and using the following SQL when acquiring a segment:
update id_generator set current_max_id = #{newMaxId}, version = version + 1 where version = #{version}
DistributIdService computes newMaxId as oldMaxId plus the step. If the update above succeeds, the number segment has been acquired.
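The optimistic lock can be illustrated with a small sketch. It assumes SQLite as a stand-in for MySQL, a single row with id = 1 (the extra id condition in the WHERE clause is an addition for the sketch), and hypothetical helper names read_row, try_claim and claim_segment.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table holds one row with id = 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE id_generator ("
    " id INTEGER PRIMARY KEY,"
    " current_max_id INTEGER NOT NULL,"
    " increment_step INTEGER NOT NULL,"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO id_generator VALUES (1, 0, 1000, 0)")
conn.commit()


def read_row(conn):
    """Return (current_max_id, increment_step, version) as currently stored."""
    return conn.execute(
        "SELECT current_max_id, increment_step, version"
        " FROM id_generator WHERE id = 1"
    ).fetchone()


def try_claim(conn, old_max, step, version):
    """Claim (old_max, old_max + step]; return None if the version is stale."""
    new_max = old_max + step
    with conn:
        cur = conn.execute(
            "UPDATE id_generator"
            " SET current_max_id = ?, version = version + 1"
            " WHERE id = 1 AND version = ?",
            (new_max, version),
        )
    return (old_max, new_max) if cur.rowcount == 1 else None


def claim_segment(conn):
    """Re-read and retry until the versioned update succeeds."""
    while True:
        segment = try_claim(conn, *read_row(conn))
        if segment is not None:
            return segment


# Two nodes read the same row and then race to update it.
seen_by_a = read_row(conn)
seen_by_b = read_row(conn)
first = try_claim(conn, *seen_by_a)   # wins the race
second = try_claim(conn, *seen_by_b)  # stale version, no row updated
retry = claim_segment(conn)           # node B re-reads and succeeds
print(first, second, retry)
```

The losing node does not receive an overlapping segment; it simply re-reads the row and tries again with the new version.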
To make the database layer highly available as well, it has to be deployed in multi-master mode, and each database must generate segments that do not overlap with the others. This reuses the idea from the previous scheme: add a start value and a step to the table above. For example, with two MySQL instances:

mysql1 generates the segment (1, 1001] and increments through the sequence 1, 3, 5, 7, ...
mysql2 generates the segment (2, 1002] and increments through the sequence 2, 4, 6, 8, 10, ...

For more detail, see the open-source project TinyId: https://github.com/didi/tinyid/wiki
TinyId adds one more step to improve efficiency. In the implementation above, the increment logic lives in DistributIdService; it can instead be moved into the business application itself. The business application then only needs to fetch a number segment and no longer has to call DistributIdService for every increment.

4. Snowflake algorithm

The three schemes above are all based on the idea of auto-increment. Next comes the well-known snowflake algorithm.
We can think about distributed IDs from another angle: it is enough that every machine responsible for generating IDs produces different IDs within each millisecond.
snowflake is a distributed ID generation algorithm open-sourced by Twitter. Unlike the three schemes above, it does not depend on a database.
The core idea: a distributed ID is a fixed-size number of type long. A long occupies 8 bytes, that is 64 bits, and the original snowflake algorithm allocates the bits as follows:
• The first part is a 1-bit flag. In Java the highest bit of a long is the sign bit (0 for positive, 1 for negative), and generated IDs are normally positive, so this bit is fixed at 0.
• The timestamp part takes 41 bits at millisecond precision. Implementations generally do not store the current timestamp itself but the difference (current time - fixed start time), so that generated IDs start from a smaller value. A 41-bit timestamp lasts 69 years: (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69.
• The worker machine ID takes 10 bits and is quite flexible. For example, the upper 5 bits can identify the data center and the lower 5 bits the individual machine, which allows 1024 nodes to be deployed.
• The sequence number takes 12 bits, so a single node can generate 4096 IDs within the same millisecond.
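The bit layout above can be sketched in Python as follows. This is an illustrative sketch, not Twitter's implementation: the epoch value, the class name SnowflakeGenerator, the decode helper and the choice to raise an error when the clock moves backwards are all assumptions.

```python
import threading
import time

TIMESTAMP_BITS, WORKER_BITS, SEQUENCE_BITS = 41, 10, 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1    # 1023
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1  # 4095
EPOCH_MS = 1577836800000  # assumed fixed start time: 2020-01-01T00:00:00Z


class SnowflakeGenerator:
    def __init__(self, worker_id, epoch_ms=EPOCH_MS, clock=None):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id must fit in 10 bits")
        self._worker_id = worker_id
        self._epoch_ms = epoch_ms
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Assumption: fail fast instead of waiting for the clock.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # 4096 IDs used in this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            delta = now - self._epoch_ms
            return ((delta << (WORKER_BITS + SEQUENCE_BITS))
                    | (self._worker_id << SEQUENCE_BITS)
                    | self._sequence)


def decode(snowflake_id, epoch_ms=EPOCH_MS):
    """Split an ID back into (timestamp_ms, worker_id, sequence)."""
    sequence = snowflake_id & SEQUENCE_MASK
    worker_id = (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER_ID
    timestamp_ms = (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + epoch_ms
    return timestamp_ms, worker_id, sequence


generator = SnowflakeGenerator(worker_id=5)
ids = [generator.next_id() for _ in range(5000)]
years = (1 << TIMESTAMP_BITS) // (1000 * 60 * 60 * 24 * 365)

# A fixed clock makes the bit packing easy to inspect.
fixed = SnowflakeGenerator(worker_id=3, clock=lambda: EPOCH_MS + 1)
first_fixed = fixed.next_id()
second_fixed = fixed.next_id()
print(years, decode(first_fixed), decode(second_fixed))
```

Because the timestamp occupies the high bits, IDs from one generator sort in generation order, and two generators with different worker IDs cannot collide even within the same millisecond.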
Following this logic, the algorithm only has to be implemented in Java and wrapped as a utility method. Each business application can then call the utility method directly to obtain distributed IDs, as long as every business application has its own worker machine ID. No separate ID service needs to be built.
The snowflake algorithm is not hard to implement. A Java implementation is available on GitHub: https://github.com/beyondfengyu/SnowFlake
Large companies do not use snowflake as is; they adapt it. The hardest part of using snowflake in practice is the worker machine ID: the original algorithm requires someone to assign a machine ID to each machine by hand and configure it somewhere for snowflake to read.
In a large company there are many machines, so doing this by hand is costly and error-prone. That is why large companies have adapted snowflake.

5. Baidu (uid-generator)

GitHub: https://github.com/baidu/uid-generator
uid-generator is based on snowflake, but it produces the machine ID, which it calls workId, differently.
In uid-generator the workId is generated automatically. To account for applications deployed in Docker, users can define their own workId generation strategy; the default strategy assigns the workId from the database at application startup. Put simply: on startup the application inserts a row into a database table (uid-generator requires an additional WORKER_NODE table), and the auto-increment ID returned by the successful insert becomes the machine's workId. The row itself consists of the host and port.
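The database-assigned workId can be sketched as follows. This is a simplified illustration, not uid-generator's code: SQLite stands in for MySQL, the table is reduced to an auto-increment ID plus host and port columns (assumed column names), and assign_worker_id is a hypothetical helper.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the real WORKER_NODE table may
# contain more columns than the ones shown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE worker_node ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " host TEXT NOT NULL,"
    " port TEXT NOT NULL)"
)


def assign_worker_id(conn, host, port):
    """Insert one row per application start; the new row ID is the workId."""
    with conn:
        cur = conn.execute(
            "INSERT INTO worker_node (host, port) VALUES (?, ?)", (host, port)
        )
        return cur.lastrowid


first = assign_worker_id(conn, "10.0.0.1", "8080")
second = assign_worker_id(conn, "10.0.0.2", "8080")
after_restart = assign_worker_id(conn, "10.0.0.1", "8080")  # same host again
print(first, second, after_restart)
```

Because a row is inserted on every start, a restarted application receives a fresh workId rather than reusing its old one.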
In uid-generator the workId takes 22 bits, the time takes 28 bits, and the sequence takes 13 bits. Note that, unlike the original snowflake, the unit of time is seconds rather than milliseconds. The workId also behaves differently: every restart of the same application consumes a new workId.
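The consequences of these bit widths can be worked out with a short sketch. The widths come from the text above; the field order (time, then workId, then sequence) and the helper names pack_uid and unpack_uid are assumptions made for illustration.

```python
SIGN_BITS, TIME_BITS, WORKER_BITS, SEQ_BITS = 1, 28, 22, 13


def pack_uid(delta_seconds, worker_id, sequence):
    """Pack the three fields into one integer (assumed order: time, worker, seq)."""
    fields = (
        ("delta_seconds", delta_seconds, TIME_BITS),
        ("worker_id", worker_id, WORKER_BITS),
        ("sequence", sequence, SEQ_BITS),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return ((delta_seconds << (WORKER_BITS + SEQ_BITS))
            | (worker_id << SEQ_BITS)
            | sequence)


def unpack_uid(uid):
    """Return (delta_seconds, worker_id, sequence)."""
    sequence = uid & ((1 << SEQ_BITS) - 1)
    worker_id = (uid >> SEQ_BITS) & ((1 << WORKER_BITS) - 1)
    delta_seconds = uid >> (WORKER_BITS + SEQ_BITS)
    return delta_seconds, worker_id, sequence


total_bits = SIGN_BITS + TIME_BITS + WORKER_BITS + SEQ_BITS
lifetime_years = (1 << TIME_BITS) / (60 * 60 * 24 * 365)  # seconds to years
max_worker_ids = 1 << WORKER_BITS  # application starts before workIds run out
ids_per_second = 1 << SEQ_BITS     # per workId

uid = pack_uid(delta_seconds=100, worker_id=7, sequence=1)
print(total_bits, round(lifetime_years, 2), max_worker_ids, ids_per_second)
```

With these widths the time field lasts about 8.5 years, each workId can issue 8192 IDs per second, and roughly 4.2 million application starts are possible before workIds are exhausted.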
For details, see: https://github.com/baidu/uid-generator/blob/master/README.zh_cn.md

6. Meituan (Leaf)

GitHub: https://github.com/Meituan-Dianping/Leaf
Leaf is Meituan's distributed ID generation framework. It is very comprehensive, supporting both number segment mode and snowflake mode. Number segment mode is not covered here, since the analysis is similar to the one above.
Leaf's snowflake mode differs from the original snowflake algorithm mainly in how the workId is generated. Leaf generates the workId from ZooKeeper sequential IDs: each application using Leaf-snowflake creates a sequential ID in ZooKeeper at startup, so each machine corresponds to one sequential node, that is, one workId.
In short, both of the frameworks above generate the workId automatically, which makes the system more stable and reduces manual work.

7. Redis

Finally, Redis can also be used to generate distributed IDs. This is similar to using MySQL auto-increment IDs: the Redis incr command atomically increments a value and returns it, for example:

127.0.0.1:6379> set seq_id 1     // initialize the auto-increment ID to 1
OK
127.0.0.1:6379> incr seq_id      // increment by 1 and return the result
(integer) 2
127.0.0.1:6379> incr seq_id      // increment by 1 and return the result
(integer) 3

Redis is very efficient, but persistence has to be considered. Redis supports two persistence modes, RDB and AOF.
RDB persistence takes periodic snapshots. If several increments happen after a snapshot and Redis goes down before the next snapshot is taken, duplicate IDs will appear after Redis restarts.
AOF persistence persists every write command, so duplicate IDs do not appear if Redis goes down. However, because there are so many incr commands, recovering the data on restart takes a long time.
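The RDB failure case can be made concrete with a toy model. The class below is not a Redis client and does not talk to a server; ToyRedis and its methods are hypothetical names that only imitate a counter with snapshot-style persistence.

```python
class ToyRedis:
    """Toy in-memory model of incr plus RDB-style snapshots (not real Redis)."""

    def __init__(self):
        self._data = {}
        self._snapshot = {}

    def incr(self, key):
        """Increment the counter and return the new value."""
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def save_snapshot(self):
        """Persist the current state, as a periodic snapshot would."""
        self._snapshot = dict(self._data)

    def crash_and_restart(self):
        """Lose everything written since the last snapshot."""
        self._data = dict(self._snapshot)


server = ToyRedis()
before_snapshot = [server.incr("seq_id") for _ in range(3)]
server.save_snapshot()
after_snapshot = [server.incr("seq_id") for _ in range(2)]
server.crash_and_restart()
after_restart = [server.incr("seq_id") for _ in range(2)]
duplicates = sorted(set(after_snapshot) & set(after_restart))
print(before_snapshot, after_snapshot, after_restart, duplicates)
```

The IDs issued between the last snapshot and the crash are issued a second time after the restart, which is exactly the duplication described for RDB.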


Origin blog.csdn.net/qq_43792385/article/details/105017107