Snow distributed algorithm [ID problems] [Xinyu]

Distributed ID

1 Select program

  • UUID

    Is the abbreviation UUID Universally Unique Identifier (Universally Unique Identifier), the Open Software Foundation (the OSF) specification defines a MAC address comprising, a time stamp, the name space (the Namespace), random or pseudo-random number sequence and other elements. With these elements generates a UUID.

    UUID is a 128-bit binary composition, generally converted to hexadecimal and expressed String.

    550e8400-e29b-41d4-a716-446655440000

    UUID advantages:

    • By locally generated, not via the network I / O, rapid performance
    • Disorderly, unpredictable order of his generation. (Of course, this is also one of his shortcomings)

    UUID disadvantages:

    • 128 Binary generally converted into 36 hex, too long can only be stored String, take up more space.
    • You can not generate incremental sequential numbers
  • Database primary key increment

    For everyone that uniquely identifies the most likely to think that the primary key increment, this is our most commonly used method. For example, we have a service order, then the order id as the primary key increment can be.

    • Separate database records primary key

    • The service database to set different increment step size and a fixed starting value, such as

      第一台 start 1  step 9  第二台 start 2  step 9  第三台 start 3  step 9 

    advantage:

    • Simple, incremental and orderly, easy sorting and paging

    Disadvantages:

    • Sub-library sub-table would cause some problems need to be transformed.
    • Concurrent performance is not high, limited by the performance of the database.
    • Simple incremental easily use other people to guess, like you have with a customer service increment, then other people can get you the day of service according to how many people are registered user ID registration analysis, so you can guess that the current service a rough situation.
    • Database downtime service is unavailable.
  • Redis

    Redis familiar with the students, should know that there are two command Incr, IncrBy in Redis because Redis is single-threaded so can guarantee atomicity.

    advantage:

    • Performance is better than a database, you can meet the orderly increments.

    Disadvantages:

    • Because the memory of KV redis database, even with AOF and RDB, but still there will be data loss, it may cause a duplicate ID.
    • It depends on redis, redis if the instability will affect the ID generation.
  • Snow algorithm -Snowflake

    Snowflake Twitter is an algorithm proposed, and its object is to generate a 64bit integer:

 

  • 1bit: General is the sign bit, not treated
  • 41bit:用来记录时间戳,这里可以记录69年,如果设置好起始时间比如今年是2018年,那么可以用到2089年,到时候怎么办?要是这个系统能用69年,我相信这个系统早都重构了好多次了。
  • 10bit:10bit用来记录机器ID,总共可以记录1024台机器,一般用前5位代表数据中心,后面5位是某个数据中心的机器ID
  • 12bit:循环位,用来对同一个毫秒之内产生不同的ID,12位可以最多记录4095个,也就是在同一个机器同一毫秒最多记录4095个,多余的需要进行等待下毫秒。

上面只是一个将64bit划分的标准,当然也不一定这么做,可以根据不同业务的具体场景来划分,比如下面给出一个业务场景:

  • 服务目前QPS10万,预计几年之内会发展到百万。
  • 当前机器三地部署,上海,北京,深圳都有。
  • 当前机器10台左右,预计未来会增加至百台。

这个时候我们根据上面的场景可以再次合理的划分62bit,QPS几年之内会发展到百万,那么每毫秒就是千级的请求,目前10台机器那么每台机器承担百级的请求,为了保证扩展,后面的循环位可以限制到1024,也就是2^10,那么循环位10位就足够了。

机器三地部署我们可以用3bit总共8来表示机房位置,当前的机器10台,为了保证扩展到百台那么可以用7bit 128来表示,时间位依然是41bit,那么还剩下64-10-3-7-41-1 = 2bit,还剩下2bit可以用来进行扩展。

 

 

时钟回拨

因为机器的原因会发生时间回拨,我们的雪花算法是强依赖我们的时间的,如果时间发生回拨,有可能会生成重复的ID,在我们上面的nextId中我们用当前时间和上一次的时间进行判断,如果当前时间小于上一次的时间那么肯定是发生了回拨,算法会直接抛出异常.

 

使用雪花算法 
# Twitter's Snowflake algorithm implementation which is used to generate distributed IDs.
# https://github.com/twitter-archive/snowflake/blob/snowflake-2010/src/main/scala/com/twitter/service/snowflake/IdWorker.scala

import time
import logging

class InvalidSystemClock(Exception):
    """
    时钟回拨异常
    """
    pass

# 64位ID的划分
WORKER_ID_BITS = 5
DATACENTER_ID_BITS = 5
SEQUENCE_BITS = 12

# 最大取值计算
MAX_WORKER_ID = -1 ^ (-1 << WORKER_ID_BITS)  # 2**5-1 0b11111
MAX_DATACENTER_ID = -1 ^ (-1 << DATACENTER_ID_BITS)

# 移位偏移计算
WOKER_ID_SHIFT = SEQUENCE_BITS
DATACENTER_ID_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS
TIMESTAMP_LEFT_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS + DATACENTER_ID_BITS

# 序号循环掩码
SEQUENCE_MASK = -1 ^ (-1 << SEQUENCE_BITS)

# Twitter元年时间戳
TWEPOCH = 1288834974657


logger = logging.getLogger('flask.app')


class IdWorker(object):
    """
    用于生成IDs
    """

    def __init__(self, datacenter_id, worker_id, sequence=0):
        """
        初始化
        :param datacenter_id: 数据中心(机器区域)ID
        :param worker_id: 机器ID
        :param sequence: 其实序号
        """
        # sanity check
        if worker_id > MAX_WORKER_ID or worker_id < 0:
            raise ValueError('worker_id值越界')

        if datacenter_id > MAX_DATACENTER_ID or datacenter_id < 0:
            raise ValueError('datacenter_id值越界')

        self.worker_id = worker_id
        self.datacenter_id = datacenter_id
        self.sequence = sequence

        self.last_timestamp = -1  # 上次计算的时间戳

    def _gen_timestamp(self):
        """
        生成整数时间戳
        :return:int timestamp
        """
        return int(time.time() * 1000)

    def get_id(self):
        """
        获取新ID
        :return:
        """
        timestamp = self._gen_timestamp()

        # 时钟回拨
        if timestamp < self.last_timestamp:
            logging.error('clock is moving backwards. Rejecting requests until {}'.format(self.last_timestamp))
            raise InvalidSystemClock

        if timestamp == self.last_timestamp:
            self.sequence = (self.sequence + 1) & SEQUENCE_MASK
            if self.sequence == 0:
                timestamp = self._til_next_millis(self.last_timestamp)
        else:
            self.sequence = 0

        self.last_timestamp = timestamp

        new_id = ((timestamp - TWEPOCH) << TIMESTAMP_LEFT_SHIFT) | (self.datacenter_id << DATACENTER_ID_SHIFT) | \
                 (self.worker_id << WOKER_ID_SHIFT) | self.sequence
        return new_id

    def _til_next_millis(self, last_timestamp):
        """
        等到下一毫秒
        """
        timestamp = self._gen_timestamp()
        while timestamp <= last_timestamp:
            timestamp = self._gen_timestamp()
        return timestamp


if __name__ == '__main__':
    worker = IdWorker(1, 2, 0)
    print(worker.get_id())

  

Guess you like

Origin www.cnblogs.com/LiuXinyu12378/p/11291075.html