Unique IDs in Distributed Systems

First, what is a distributed system unique ID

    In a complex distributed system, large amounts of data and messages often need to be uniquely identified.

      In systems for finance, e-commerce, payments, and similar products, data keeps growing. Once the data has been sharded across databases and tables, each record or message needs a unique ID to identify it, and a database auto-increment ID clearly cannot meet that need. At that point a system that can generate globally unique IDs becomes necessary.

Second, characteristics of a distributed system unique ID:

1, Globally unique: duplicate ID numbers must never appear. Since the ID is a unique identifier, this is the most basic requirement.

2, Trend increasing: MySQL's InnoDB engine uses a clustered index, and most RDBMSs store index data in a B-tree structure, so when choosing a primary key we should prefer an ordered key to protect write performance.

3, Monotonically increasing: the next ID is guaranteed to be greater than the previous one. Special requirements such as transaction version numbers, IM incremental messages, and sorting depend on this.

4, Information security: if IDs are consecutive, scraping by malicious users becomes very easy, since they can simply download the specified URLs in sequence. With order numbers it is even more dangerous, because a competitor can work out our daily order volume directly. Some scenarios therefore need IDs that are irregular and follow no pattern.

Beyond these requirements on the ID numbers themselves, the business also places extremely high availability requirements on the ID generation system. Imagine the ID generation system going down: it would be a disaster.

In summary, an ID generation system should achieve the following:

 

  1. Average latency and TP999 latency should both be as low as possible (TP90 is the minimum time within which 90% of network requests complete, TP99 is the minimum time within which 99% complete, and likewise TP999 is the minimum time within which 99.9% complete);
  2. Availability of five nines (99.999%);
  3. High QPS.
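
To make the TP90/TP99/TP999 definitions concrete, here is a minimal sketch that computes them with the nearest-rank method. The helper name `tp` and the sample latencies are assumptions for illustration, and the percentile is passed in per-mille (900, 990, 999) so the arithmetic stays in integers.

```python
def tp(latencies_ms, per_mille):
    """Smallest latency within which per_mille/1000 of the requests complete."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # Ceiling division in integers: rank of the sample that covers the share.
    rank = -(-per_mille * len(ordered) // 1000)
    return ordered[max(rank, 1) - 1]


# Hypothetical sample: 1000 requests taking 1 ms, 2 ms, ..., 1000 ms.
samples = list(range(1, 1001))
print("TP90 :", tp(samples, 900))
print("TP99 :", tp(samples, 990))
print("TP999:", tp(samples, 999))
```

A single slow request out of a thousand barely moves the average but shows up directly in TP999, which is why the two are tracked separately.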

Supplementary: QPS and TPS

QPS: Queries Per Second, the number of queries a server can respond to per second. It measures how much traffic a particular query server handles in a given period of time.
TPS: short for Transactions Per Second, the number of transactions per second. It is a unit of measurement for software test results. A transaction is the process in which a client sends a request to the server and the server responds. The client starts timing when it sends the request and stops when it receives the server's response, which gives the time used and the number of transactions completed.

Third, ways to implement a distributed system unique ID

 

1. UUID

A UUID (Universally Unique Identifier) in its standard form is 32 hexadecimal digits, split by hyphens into five groups in the pattern 8-4-4-4-12, 36 characters in total. Example: 550e8400-e29b-41d4-a716-446655440000. To date the industry has five ways of generating a UUID; for details see the IETF specification "A Universally Unique IDentifier (UUID) URN Namespace" (RFC 4122).
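
The format can be checked with Python's standard `uuid` module. This is only a small sketch; the helper name `describe` is an assumption for illustration.

```python
import uuid


def describe(u: uuid.UUID) -> dict:
    """Summarise the textual and binary shape of a UUID."""
    text = str(u)
    return {
        "text": text,
        "groups": [len(part) for part in text.split("-")],  # 8-4-4-4-12
        "chars": len(text),      # hex digits plus four hyphens
        "bytes": len(u.bytes),   # 128 bits
        "version": u.version,    # which generation method produced it
    }


example = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(describe(example))

# Generated locally from random bits, with no network round trip.
print(describe(uuid.uuid4()))
```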

Advantages:

  • Performance is very high: generated locally, no network consumption.

Disadvantages:

  • Not easy to store: a UUID is long, 16 bytes (128 bits), and is usually represented as a 36-character string, which makes it unsuitable for many scenarios.
  • Information is not secure: the algorithm that generates UUIDs from the MAC address can leak that MAC address. This weakness was once used to locate the creator of the Melissa virus.
  • Using it as a primary key causes problems in certain environments. As a DB primary key, for example, a UUID is a very poor fit.

2. Database generation

Taking MySQL as an example, set auto_increment_increment and auto_increment_offset so that IDs auto-increment. Each time the business needs an ID number, it reads and writes MySQL with SQL to obtain one.
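
As a rough illustration of how those two settings interact, the sketch below simulates in plain Python (not MySQL) the values a server hands out when auto_increment_offset is its starting value and auto_increment_increment is the step. The two-server setup and the helper name `auto_increment_sequence` are assumptions for illustration; the sketch does not reproduce the SQL used to fetch an ID.

```python
from itertools import islice


def auto_increment_sequence(offset: int, increment: int):
    """Yield offset, offset + increment, offset + 2 * increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


# Hypothetical setup: two servers, step 2, offsets 1 and 2.
server_a = list(islice(auto_increment_sequence(offset=1, increment=2), 5))
server_b = list(islice(auto_increment_sequence(offset=2, increment=2), 5))
print(server_a)
print(server_b)
```

With the step equal to the number of servers and a distinct offset per server, the sequences never collide.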

 

The advantages and disadvantages of this approach are as follows:

Advantages:

  • Very simple: it is implemented with features of the existing database system, costs little, and is maintained by professional DBAs.
  • ID numbers increase monotonically, which can satisfy businesses with special requirements on IDs.

Disadvantages:

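  • Strong dependence on the DB: when the DB fails, the whole system becomes unavailable, which is a fatal problem. Configuring master-slave replication can improve availability as much as possible, but data consistency is hard to guarantee in special cases. Inconsistency during a master-slave switchover may lead to duplicate IDs being issued.
  • ID issuing performance is bottlenecked by the read/write performance of a single MySQL server.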

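3. Generating IDs with Redis

When generating IDs with a database does not perform well enough, we can try generating them with Redis.

This relies mainly on Redis executing commands on a single thread, so it can also be used to generate globally unique IDs. It can be implemented with Redis's atomic operations INCR and INCRBY.

Redis is a good fit for generating serial numbers that restart from 0 every day. For example, order number = date + that day's auto-increment number. A key can be created in Redis each day and incremented with INCR.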
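
The date-plus-counter idea can be sketched as follows. So that the example runs without a server, it uses a small in-memory stand-in, `FakeRedis`, which is hypothetical and only mimics the `incr` call; the key naming scheme, the 8-digit counter width, and the function name `next_order_no` are likewise assumptions. In a real deployment a redis-py client could be passed in instead, since it offers an `incr(name, amount=1)` method.

```python
import datetime


class FakeRedis:
    """Hypothetical in-memory stand-in exposing incr() like a Redis client."""

    def __init__(self):
        self._data = {}

    def incr(self, name, amount=1):
        # A real Redis server performs INCR atomically; this dict does not.
        self._data[name] = self._data.get(name, 0) + amount
        return self._data[name]


def next_order_no(client, day: datetime.date, width: int = 8) -> str:
    """Order number = date + that day's auto-increment number."""
    key = f"order:seq:{day:%Y%m%d}"  # one counter key per day (assumed naming)
    seq = client.incr(key)           # a missing key counts up from 0, so this returns 1
    return f"{day:%Y%m%d}{seq:0{width}d}"


client = FakeRedis()
today = datetime.date(2019, 7, 4)
print(next_order_no(client, today))
print(next_order_no(client, today))
print(next_order_no(client, today + datetime.timedelta(days=1)))
```

Because each day uses its own key, the counter restarts on its own when the date changes. Old keys would still need an expiry or cleanup, which is left out here.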

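Advantages:

1) It does not depend on a database, is flexible and convenient, and outperforms a database.

2) Numeric IDs sort naturally, which helps with pagination or results that need sorting.

Disadvantages:

1) If the system does not already use Redis, a new component must be introduced, increasing system complexity.

2) It requires a fairly large amount of coding and configuration work.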

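4. Generating unique IDs with ZooKeeper (a distributed application coordination service)

ZooKeeper generates sequence numbers mainly through its znode data versions. It can produce 32-bit and 64-bit data version numbers, and clients can use such a version number as a unique sequence number.

ZooKeeper is rarely used to generate unique IDs, mainly because it requires depending on ZooKeeper and calling its API in multiple steps; under heavy contention, a distributed lock also has to be considered. As a result, its performance in a highly concurrent distributed environment is not ideal.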

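5. The snowflake scheme

Broadly speaking, this scheme is an algorithm that generates IDs by partitioning a namespace (UUID counts as one too, but it is analyzed separately because it is so common). It splits 64 bits into several segments that separately identify the machine, the time, and so on. In snowflake, for example, the 64 bits are divided into 1 unused sign bit, a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit sequence number.

The 41-bit time field can represent (1L<<41)/(1000L*3600*24*365) = about 69 years, and the 10-bit machine field can represent 1024 machines. If we need to partition by IDC, we can split the 10 bits into 5 bits for the IDC and 5 bits for the worker machine, which gives 32 IDCs with 32 machines each; this can be defined according to one's own needs. The 12-bit auto-increment sequence can represent 2^12 IDs per millisecond, so in theory the QPS of the snowflake scheme is about 4.096 million per second per machine. This allocation guarantees that the IDs generated by any machine in any IDC within any millisecond are all different.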
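
The bit arithmetic above can be checked with a short sketch. It assumes the 5-bit IDC / 5-bit worker split mentioned in the text; the function names `compose` and `decompose` are made up for illustration.

```python
TIME_BITS, IDC_BITS, WORKER_BITS, SEQ_BITS = 41, 5, 5, 12

WORKER_SHIFT = SEQ_BITS                   # 12
IDC_SHIFT = WORKER_SHIFT + WORKER_BITS    # 17
TIME_SHIFT = IDC_SHIFT + IDC_BITS         # 22


def compose(ms: int, idc: int, worker: int, seq: int) -> int:
    """Pack the four fields into one integer that fits in 63 bits."""
    fields = ((ms, TIME_BITS), (idc, IDC_BITS), (worker, WORKER_BITS), (seq, SEQ_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (ms << TIME_SHIFT) | (idc << IDC_SHIFT) | (worker << WORKER_SHIFT) | seq


def decompose(snowflake_id: int) -> tuple:
    """Recover (ms, idc, worker, seq) from a packed ID."""
    seq = snowflake_id & ((1 << SEQ_BITS) - 1)
    worker = (snowflake_id >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1)
    idc = (snowflake_id >> IDC_SHIFT) & ((1 << IDC_BITS) - 1)
    ms = snowflake_id >> TIME_SHIFT
    return ms, idc, worker, seq


years = (1 << TIME_BITS) // (1000 * 3600 * 24 * 365)  # whole years covered by 41 bits of ms
ids_per_second = (1 << SEQ_BITS) * 1000               # 2^12 IDs per ms, per machine
print(years, ids_per_second)
print(decompose(compose(ms=123456789, idc=3, worker=17, seq=42)))
```

Because the timestamp occupies the highest bits, an ID from a later millisecond always compares greater than one from an earlier millisecond, whatever the machine and sequence fields hold.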

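The advantages and disadvantages of this approach are:

Advantages:

  • The millisecond timestamp sits in the high bits and the auto-increment sequence in the low bits, so the whole ID is trend increasing.
  • It does not depend on third-party systems such as a database and is deployed as a service, so it is more stable, and ID generation performance is also very high.
  • Bits can be allocated according to the characteristics of one's own business, which is very flexible.

Disadvantages:

  • Strong dependence on the machine clock: if the clock on a machine is set back, duplicate IDs will be issued or the service will become unavailable.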
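
One way to see the clock dependence is a minimal generator that refuses to issue IDs when the clock moves backwards, trading availability for uniqueness. This is a sketch under stated assumptions, not a reference implementation: the class names, the injectable `clock` parameter, and the custom epoch (2019-01-01 UTC) are all hypothetical choices, and the layout used is 41-bit time, 10-bit worker, 12-bit sequence.

```python
import threading
import time


class ClockMovedBackwards(Exception):
    """Raised instead of risking a duplicate ID."""


class SnowflakeGenerator:
    WORKER_BITS = 10
    SEQ_BITS = 12
    MAX_SEQ = (1 << SEQ_BITS) - 1

    def __init__(self, worker_id, epoch_ms=1_546_300_800_000, clock=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms  # hypothetical custom epoch: 2019-01-01 UTC
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock was set back: stop issuing rather than repeat an ID.
                raise ClockMovedBackwards(f"clock went back {self._last_ms - now} ms")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & self.MAX_SEQ
                if self._seq == 0:
                    # All 4096 values used in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            elapsed = now - self.epoch_ms
            return (elapsed << (self.WORKER_BITS + self.SEQ_BITS)) \
                | (self.worker_id << self.SEQ_BITS) | self._seq


gen = SnowflakeGenerator(worker_id=3)
print(gen.next_id(), gen.next_id())
```

Raising an exception corresponds to the "service becomes unavailable" outcome described above. Other strategies, such as waiting until the clock catches up, are possible but not shown.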

 


Origin www.cnblogs.com/HUIWANG/p/11133638.html