Implementing Snowflake, a distributed auto-incrementing ID algorithm, in C#

Requirements overview

Some scenarios in a distributed system need a globally unique ID. To prevent ID conflicts, a 36-character universally unique identifier (UUID) can be used, but UUIDs have some disadvantages. First, they are relatively long. In addition, UUIDs are generally unordered. Sometimes we want a simpler ID, and we want the IDs to be generated in order.

 

Background of Twitter-Snowflake

Twitter stored its data in MySQL in the early days. As the number of users grew, a single MySQL instance could no longer handle the volume of data. The team then studied how to generate ideal auto-incrementing IDs that meet two basic requirements:

  • Hundreds of thousands of IDs can be generated per second to identify different records;
  • The IDs should be roughly ordered, that is, two records published at about the same time should have IDs that are close to each other, so that clients can sort the records easily.

The Twitter-Snowflake algorithm was produced in this context.

 

Snowflake core structure

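Twitter's solution to these two problems is simple and efficient: each ID is a 64-bit number composed of a timestamp, a worker machine node, and a sequence number, and the ID is generated locally by the current machine node. From the highest bit to the lowest, the layout is: 1 sign bit, 41 timestamp bits, 10 worker machine bits (a 5-bit datacenterId plus a 5-bit workerId), and 12 sequence bits.

Let's first explain the role of each field.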

  • Sign bit : Used to distinguish positive and negative numbers. 1 means a negative number and 0 means a non-negative number. Negative IDs are generally not needed, so the value is fixed at 0.
  • Timestamp : A total of 41 bits are reserved to save a millisecond timestamp. A millisecond timestamp has 13 decimal digits, and the maximum 41-bit value (T) is $2^{41}-1 = 2199023255551$, which is also exactly 13 digits. The number of years that can be represented is T / (1000 * 3600 * 24 * 365) ≈ 69.7 years. If the timestamp counts from 1970-01-01 00:00:00 UTC, the last representable time is 2039-09-07 15:47:35 UTC (23:47:35 in UTC+8).

This range may seem too short. That is not a problem; how to extend it is covered in the optimization section below.

  • Worker machine : 10 bits are reserved to save the machine ID, split into a 5-bit datacenterId and a 5-bit workerId. As long as the machine IDs differ, the IDs generated in the same millisecond will differ. How many machines can generate IDs at the same time? The answer is 1024 ($2^{10}$), with machine IDs ranging from 0 to 1023 ($2^{10}-1$).

    If there are only a few worker machines, this ID can be set in a configuration file or chosen at random (which risks two machines picking the same ID). If there are many machines, a separate worker machine ID allocator has to be implemented, for example with Redis auto-increment or the MySQL auto_increment mechanism.

  • Sequence number : The sequence number has 12 bits in total. It handles the case where multiple messages on the same machine need IDs within the same millisecond; 4096 sequence numbers ($2^{12}$, values 0 to 4095) are available per millisecond.

In summary, the ID has 64 = 1 + 41 + 10 + 12 bits in total, which fits a long (at most 19 digits when converted to a decimal string). One machine can generate 4096 IDs within 1 millisecond, and all machines together can generate 4096 * 1024 = 4194304 IDs within 1 millisecond. The IDs generated by Snowflake are, as a whole, sorted by increasing time, and no ID collisions occur in the entire distributed system (nodes are distinguished by datacenterId and workerId). Because IDs are generated locally on each machine, generation is very efficient.
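The field widths described above can be checked with a few shifts and masks. The following is a minimal sketch rather than part of the original implementation: the class name `SnowflakeLayout` and its members are made up for illustration, and the field order (timestamp, datacenterId, workerId, sequence) is assumed from the description above.

```csharp
using System;

// Hypothetical helper for illustration; the names are not from the original article.
public static class SnowflakeLayout
{
    public const int TimestampBits = 41;
    public const int DatacenterIdBits = 5;
    public const int WorkerIdBits = 5;
    public const int SequenceBits = 12;

    // Largest value each field can hold: (1 << bits) - 1.
    public const long MaxTimestamp = (1L << TimestampBits) - 1;       // 2199023255551
    public const long MaxDatacenterId = (1L << DatacenterIdBits) - 1; // 31
    public const long MaxWorkerId = (1L << WorkerIdBits) - 1;         // 31
    public const long MaxSequence = (1L << SequenceBits) - 1;         // 4095

    // Pack the four fields into one non-negative 64-bit value (the sign bit stays 0).
    public static long Pack(long timestamp, long datacenterId, long workerId, long sequence)
    {
        return (timestamp << (DatacenterIdBits + WorkerIdBits + SequenceBits))
             | (datacenterId << (WorkerIdBits + SequenceBits))
             | (workerId << SequenceBits)
             | sequence;
    }

    // Split an ID back into its fields by shifting and masking.
    public static (long Timestamp, long DatacenterId, long WorkerId, long Sequence) Unpack(long id)
    {
        long sequence = id & MaxSequence;
        long workerId = (id >> SequenceBits) & MaxWorkerId;
        long datacenterId = (id >> (WorkerIdBits + SequenceBits)) & MaxDatacenterId;
        long timestamp = id >> (DatacenterIdBits + WorkerIdBits + SequenceBits);
        return (timestamp, datacenterId, workerId, sequence);
    }

    // Years covered by the 41-bit millisecond timestamp, counting 365-day years.
    public static double TimestampYears()
    {
        return MaxTimestamp / (1000.0 * 3600 * 24 * 365);
    }
}
```

For example, `SnowflakeLayout.Unpack(SnowflakeLayout.Pack(1, 2, 3, 4))` gives back the same four values, and `SnowflakeLayout.TimestampYears()` is roughly 69.7.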

 

Optimization

1. Timestamp optimization

If the timestamp field holds the current millisecond Unix timestamp directly, it can only reach the year 2039, which is far from enough. The interval from 1970 up to the day the system goes live will never be used, so why not store an offset instead? That is, the timestamp part does not take the current millisecond timestamp directly, but subtracts a fixed past time from it:

id = (1572057648000 - 1569859200000) << 22; 

Output:

id=9220959240192000

In the code above, the first timestamp is the current millisecond timestamp, and the second is a fixed past timestamp (1569859200000 means 2019-10-01 00:00:00 in UTC+8). This way, the representable range extends about 69.7 years past the chosen epoch, which for this epoch reaches into 2089 and is enough for a long time.
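To see how long a given epoch lasts, the last representable instant can be computed directly. This is a rough sketch under the 41-bit assumption described earlier; `EpochRange` and its members are hypothetical names introduced here, and the epoch value is the one used in the example above.

```csharp
using System;

// Hypothetical helper for illustration; not part of the original code.
public static class EpochRange
{
    // Largest offset a 41-bit timestamp field can hold, in milliseconds.
    public const long MaxOffsetMs = (1L << 41) - 1;

    // Value stored in the timestamp field: milliseconds elapsed since the custom epoch.
    public static long OffsetMs(long nowUnixMs, long epochUnixMs)
    {
        return nowUnixMs - epochUnixMs;
    }

    // Last instant (in UTC) that still fits in 41 bits for the given epoch.
    public static DateTimeOffset LastUsableInstant(long epochUnixMs)
    {
        return DateTimeOffset.FromUnixTimeMilliseconds(epochUnixMs + MaxOffsetMs);
    }
}
```

With the epoch 1569859200000, `EpochRange.LastUsableInstant(1569859200000L)` falls in 2089, and `EpochRange.OffsetMs(1572057648000L, 1569859200000L) << 22` reproduces the value 9220959240192000 from the example.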

2. Serial number optimization

The sequence number starts at 0 and is incremented each time another ID is requested within the same millisecond. What happens when it reaches 4096, that is, when all sequence numbers for the current millisecond are used up? The generator has to wait until the next millisecond. A code fragment:

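// Called again within the same millisecond
if (ts == (iw.last_time_stamp)) {
    // Increment the sequence number
    iw.sequence = (iw.sequence+1) & MASK_SEQUENCE;

    // The sequence went past its maximum of 4095: 4096 & 4095 == 0
    if (iw.sequence == 0) {
        // Wait until the next millisecond
        ts = time_re_gen(ts);
    }
} else { // First call in this millisecond
    iw.last_time_stamp = ts;
}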
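The wrap-around relies on the mask being one less than a power of two. Below is a small sketch of just that step; the class and method names are illustrative assumptions, not taken from the fragment above.

```csharp
using System;

// Illustrative helper; the names are assumptions, not from the original code.
public static class SequenceCounter
{
    // 12 sequence bits give the mask 0xFFF (4095).
    public const long SequenceMask = (1L << 12) - 1;

    // Advance the per-millisecond sequence; the result wraps to 0 after 4095.
    public static long Next(long current)
    {
        return (current + 1) & SequenceMask;
    }

    // True when the sequence has wrapped, meaning the caller must wait for the next millisecond.
    public static bool IsExhausted(long next)
    {
        return next == 0;
    }
}
```

`SequenceCounter.Next(4094)` is 4095, while `SequenceCounter.Next(4095)` is 0, which is the signal to wait.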

 

C# implementation of the Snowflake distributed auto-incrementing ID algorithm

  • Generic singleton implementations (Singleton and ReflectionSingleton), the following code:
using System;
using System.Reflection;

namespace NSMS.Helper
{
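    /// <summary>
    /// Plain generic singleton.
    /// Pros: Simplifies building singletons, since each singleton class does not need its own boilerplate.
    /// Cons: Breaks the singleton principle: the constructor cannot be private, so the constructor of T is exposed.
    /// </summary>
    /// <typeparam name="T">class</typeparam>
    [Obsolete("Use ReflectionSingleton instead.")]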
    public abstract class Singleton<T> where T : class, new()
    {
        protected static T _Instance = null;

        public static T Instance
        {
            get
            {
                if (_Instance == null)
                {
                    _Instance = new T();
                }
                return _Instance;
            }
        }

        protected Singleton()
        {
            Init();
        }

        public virtual void Init()
        {

        }
    }

    /// <summary>
    /// Generic singleton implemented with reflection (recommended).
    /// Pros: 1. Simplifies building singletons, since each singleton class does not need its own boilerplate; 2. Follows the singleton principle: the private constructor is invoked through reflection, so it is never exposed publicly.
    /// Cons: Reflection has a small performance cost (negligible here).
    /// </summary>
    /// <typeparam name="T">class</typeparam>
    public abstract class ReflectionSingleton<T> where T : class
    {
        private static T _instance;
        private static readonly object _lock = new object(); // guards first-time creation across threads

        public static T Instance
        {
            get
            {
                lock (_lock)
                {
                    if (null == _instance)
                    {
                        Type type = typeof(T); // 1. Get the runtime type of T

                        // 2. Look through T's non-public instance constructors and use the parameterless one to build the singleton
                        ConstructorInfo[] constructorInfoArray = type.GetConstructors(BindingFlags.Instance | BindingFlags.NonPublic);
                        foreach (ConstructorInfo constructorInfo in constructorInfoArray)
                        {
                            ParameterInfo[] parameterInfoArray = constructorInfo.GetParameters();
                            if (0 == parameterInfoArray.Length)
                            {
                                // Parameterless constructor found: build the singleton
                                _instance = (T)constructorInfo.Invoke(null);
                                break;
                            }
                        }

                        if (null == _instance)
                        {
                            // Only a non-public parameterless constructor is supported
                            throw new NotSupportedException("No non-public parameterless constructor found.");
                        }
                    }
                    return _instance;
                }
            }
        }

        protected ReflectionSingleton() { }

        public static void Destroy()
        {
            lock (_lock)
            {
                _instance = null;
            }
        }
    }

}
  • Implementation of the Snowflake distributed ID generator, the following code:
using System;
using System.Threading;

namespace NSMS.Helper
{
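    /// <summary>
    /// Snowflake algorithm implemented in C#.
    /// Generates IDs that follow a regular pattern. Snowflake is a distributed algorithm for sortable IDs, built by Twitter engineers to get IDs that increase and never repeat.
    /// Twitter's distributed Snowflake algorithm generates about 260,000 sortable, increasing IDs per second.
    /// 1. IDs generated by Snowflake are ordered by time.
    /// 2. Each generated ID is a 64-bit integer.
    /// 3. No duplicate IDs are produced within the distributed system (datacenterId and machineId tell nodes apart).
    /// => datacenterId (distributed services): service ID 1, 2, 3, ..., hard-coded in each service.
    /// => machineId (for clusters): machine ID read from the machine's environment variable MACHINEID; each server gets a different ID at deployment.
    /// Reference: https://www.cnblogs.com/shiningrise/p/5727895.html
    /// </summary>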
    public class Snowflake : ReflectionSingleton<Snowflake>
    {
        /// <summary>
        /// Private constructor
        /// </summary>
        private Snowflake() { }

        #region Fields
        private static long machineId; // machine ID
        private static long datacenterId = 0L; // datacenter ID
        private static long sequence = 0L; // sequence number, counting from zero

        private static readonly long twepoch = 687888001020L; // custom epoch in milliseconds, subtracted from the current timestamp; choose it freely, but it must not be later than the current timestamp

        private static readonly long machineIdBits = 5L; // number of bits for the machine ID
        private static readonly long datacenterIdBits = 5L; // number of bits for the datacenter ID
        public static readonly long maxMachineId = -1L ^ -1L << (int)machineIdBits; // maximum machine ID (31)
        public static readonly long maxDatacenterId = -1L ^ (-1L << (int)datacenterIdBits); // maximum datacenter ID (31)

        private static readonly long sequenceBits = 12L; // number of bits for the sequence counter
        private static readonly long machineIdShift = sequenceBits; // left shift for the machine ID: the number of sequence bits below it
        private static readonly long datacenterIdShift = sequenceBits + machineIdBits; // left shift for the datacenter ID
        private static readonly long timestampLeftShift = sequenceBits + machineIdBits + datacenterIdBits; // left shift for the timestamp: sequence bits + machine ID bits + datacenter ID bits
        public static readonly long sequenceMask = -1L ^ -1L << (int)sequenceBits; // sequence mask (4095); when the counter wraps past it within one millisecond, generation waits for the next millisecond
        private static long lastTimestamp = -1L; // timestamp of the last generated ID

        private static readonly object syncRoot = new object(); // lock object
        #endregion

        #region Snowflake
        /// <summary>
        /// Initializes the generator
        /// </summary>
        /// <param name="machineId">Machine ID</param>
        /// <param name="datacenterId">Datacenter ID</param>
        public void SnowflakesInit(short machineId, short datacenterId)
        {
            if (machineId < 0 || machineId > Snowflake.maxMachineId)
            {
                // The two-argument constructor is used so that the text is reported as the message, not as the parameter name
                throw new ArgumentOutOfRangeException(nameof(machineId), $"The machineId is illegal! => Range interval [0,{Snowflake.maxMachineId}]");
            }
            else
            {
                Snowflake.machineId = machineId;
            }

            if (datacenterId < 0 || datacenterId > Snowflake.maxDatacenterId)
            {
                throw new ArgumentOutOfRangeException(nameof(datacenterId), $"The datacenterId is illegal! => Range interval [0,{Snowflake.maxDatacenterId}]");
            }
            else
            {
                Snowflake.datacenterId = datacenterId;
            }
        }

        /// <summary>
        /// Gets the current timestamp
        /// </summary>
        /// <returns>Timestamp in milliseconds</returns>
        private static long GetTimestamp()
        {
            return (long)(DateTime.UtcNow - new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)).TotalMilliseconds;
        }

        /// <summary>
        /// Waits for and returns the next millisecond timestamp
        /// </summary>
        /// <param name="lastTimestamp">Timestamp of the last generated ID</param>
        /// <returns>Timestamp in milliseconds</returns>
        private static long GetNextTimestamp(long lastTimestamp)
        {
            long timestamp = GetTimestamp();
            int count = 0;
            while (timestamp <= lastTimestamp) // Re-read the time; this can fail, because the algorithm, like COMB, depends heavily on an accurate machine clock
            {
                count++;
                if (count > 10) throw new Exception("The machine may not get the right time.");
                Thread.Sleep(1);
                timestamp = GetTimestamp();
            }
            return timestamp;
        }

        /// <summary>
        /// Gets the next ID as a long
        /// </summary>
        /// <returns>Distributed ID</returns>
        public long NextId()
        {
            lock (syncRoot)
            {
                long timestamp = GetTimestamp();
                if (timestamp < Snowflake.lastTimestamp)
                {
                    // The current timestamp is earlier than that of the last generated ID: throw, because there is no guarantee this ID was not generated before.
                    // This check runs first so that a rejected call leaves the sequence untouched.
                    throw new Exception($"Clock moved backwards.  Refusing to generate id for {Snowflake.lastTimestamp - timestamp} milliseconds!");
                }
                if (Snowflake.lastTimestamp == timestamp)
                {
                    // Another ID within the same millisecond
                    long nextSequence = (Snowflake.sequence + 1) & Snowflake.sequenceMask; // the & wraps the counter to 0 once this millisecond's limit is reached
                    if (nextSequence == 0)
                    {
                        // The IDs for this millisecond are used up: wait for the next millisecond
                        timestamp = GetNextTimestamp(Snowflake.lastTimestamp);
                    }
                    Snowflake.sequence = nextSequence; // stored only after the wait succeeded
                }
                else
                {
                    // First ID in a new millisecond
                    Snowflake.sequence = 0L; // reset the counter
                }
                Snowflake.lastTimestamp = timestamp; // remember the timestamp of the last generated ID
                long id = ((timestamp - Snowflake.twepoch) << (int)Snowflake.timestampLeftShift)
                    | (datacenterId << (int)Snowflake.datacenterIdShift)
                    | (machineId << (int)Snowflake.machineIdShift)
                    | Snowflake.sequence;
                return id;
            }
        } 
        #endregion
    }
}
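The class comments suggest reading the machine ID from an environment variable named MACHINEID at deployment time, but the lookup itself is not shown. A minimal sketch of it might look like the following; the `MachineIdProvider` class, the error handling, and the default upper bound of 31 (the maximum for 5 bits) are assumptions for illustration rather than part of the original code.

```csharp
using System;

// Hypothetical helper; the original article does not show this lookup.
public static class MachineIdProvider
{
    // Environment variable name mentioned in the Snowflake class comments.
    public const string VariableName = "MACHINEID";

    // Parse a raw value into a machine ID in [0, maxMachineId]; throws when it is missing or out of range.
    public static short Parse(string raw, long maxMachineId = 31)
    {
        if (!short.TryParse(raw, out short id) || id < 0 || id > maxMachineId)
        {
            throw new InvalidOperationException(
                $"{VariableName} must be an integer in [0,{maxMachineId}], but was '{raw}'.");
        }
        return id;
    }

    // Read the machine ID from the process environment.
    public static short FromEnvironment()
    {
        return Parse(Environment.GetEnvironmentVariable(VariableName));
    }
}
```

The result could then be passed to the initializer, for example `Snowflake.Instance.SnowflakesInit(MachineIdProvider.FromEnvironment(), 0)`. Failing fast on a missing or invalid value is a deliberate choice in this sketch, since silently falling back to 0 on several servers would produce duplicate IDs.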

The code above completes the C# implementation of the Snowflake algorithm. The algorithm can also be extended to fit business needs, for example so that the generated ID carries some business meaning. Here it is extended with a business prefix (such as an order prefix for order numbers) and a random string of length 6, as follows:

using System;
using System.Text;

namespace NSMS.Helper
{
    /// <summary>
    /// Combined ID generation rules
    /// </summary>
    public class IdWorker: ReflectionSingleton<IdWorker>
    {
        /// <summary>
        /// Private constructor
        /// </summary>
        private IdWorker() { }

        #region Formatted GUID
        public enum GuidType { N, D, B, P, X, Default };
        public enum IsToUpperOrToLower { ToUpper, ToLower };

        public string GetFormatGuid(GuidType guidType = GuidType.N, IsToUpperOrToLower isToUpperOrToLower = IsToUpperOrToLower.ToLower)
        {
            string guid = guidType switch
            {
                GuidType.N => Guid.NewGuid().ToString("N"), // e0a953c3ee6040eaa9fae2b667060e09 
                GuidType.D => Guid.NewGuid().ToString("D"), // 9af7f46a-ea52-4aa3-b8c3-9fd484c2af12
                GuidType.B => Guid.NewGuid().ToString("B"), // {734fd453-a4f8-4c5d-9c98-3fe2d7079760}
                GuidType.P => Guid.NewGuid().ToString("P"), // (ade24d16-db0f-40af-8794-1e08e2040df3)
                GuidType.X => Guid.NewGuid().ToString("X"), // {0x3fa412e3,0x8356,0x428f,{0xaa,0x34,0xb7,0x40,0xda,0xaf,0x45,0x6f}}
                GuidType.Default => Guid.NewGuid().ToString(), // same layout as "D": hyphen-separated hex digits without braces
                _ => throw new NotImplementedException(),
            };

            switch (isToUpperOrToLower)
            {
                case IsToUpperOrToLower.ToUpper:
                    guid = guid.ToUpper(); // return the GUID in upper case
                    break;
                case IsToUpperOrToLower.ToLower:
                    guid = guid.ToLower(); // return the GUID in lower case
                    break;
            }
            return guid;
        }
        #endregion

        /// <summary>
        /// Gets the unique machine code (MachineCode is a helper class that is not shown here)
        /// </summary>
        /// <returns></returns>
        public string GetMachineCodeString() => MachineCode.GetMachineCodeString();

        /// <summary>
        /// Gets a distributed ID (Snowflake)
        /// </summary>
        /// <param name="prefix">Business prefix</param>
        /// <param name="machineId">Machine ID (server ID within the cluster)</param>
        /// <param name="datacenterId">Datacenter ID (service ID)</param>
        /// <param name="hasRandom">Whether to include the random string</param>
        /// <returns></returns>
        public string GetSnowflakeId(string prefix, short machineId, short datacenterId, bool hasRandom = true) 
        {
            Snowflake.Instance.SnowflakesInit(machineId, datacenterId);
            string randomNo = GenerateRandomNumber(6);
            if (hasRandom)
            {
                if (string.IsNullOrWhiteSpace(prefix)) return $"{randomNo}.{Snowflake.Instance.NextId()}";
                else return $"{prefix}.{randomNo}.{Snowflake.Instance.NextId()}";
            }
            else
            {
                if (string.IsNullOrWhiteSpace(prefix)) return $"{Snowflake.Instance.NextId()}";
                else return $"{prefix}.{Snowflake.Instance.NextId()}";
            }
        }

        #region Random string
        /// <summary>
        /// Character set for random strings
        /// </summary>
        private readonly char[] _RandomBasicData =
        {
            '0','1','2','3','4','5','6','7','8','9',
            'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
            'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'
        };

        // Shared generator: on .NET Framework, Random instances created in quick succession can share a time-based seed, and Random is not thread-safe
        private static readonly Random _random = new Random();

        /// <summary>
        /// Generates a random string
        /// </summary>
        /// <param name="length">Length of the random string</param>
        /// <returns></returns>
        public string GenerateRandomNumber(int length)
        {
            int capacity = _RandomBasicData.Length;
            StringBuilder newRandom = new StringBuilder(length);
            lock (_random)
            {
                for (int i = 0; i < length; i++)
                {
                    newRandom.Append(_RandomBasicData[_random.Next(capacity)]);
                }
            }
            return newRandom.ToString();
        } 
        #endregion
    }
}
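Because GetSnowflakeId joins the prefix, the random string, and the numeric ID with dots, the numeric part is always the last segment. If a caller needs the raw long back (for sorting, say), a small parser along these lines could recover it. `ExtendedIdParser` is a hypothetical helper introduced here, not part of the original code, and the sample values in the note below are made up.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of the original IdWorker.
public static class ExtendedIdParser
{
    // Returns the numeric Snowflake part: everything after the last '.', or the whole string when there is no '.'.
    public static long ExtractSnowflakeId(string extendedId)
    {
        if (string.IsNullOrWhiteSpace(extendedId))
        {
            throw new ArgumentException("The ID must not be empty.", nameof(extendedId));
        }

        int lastDot = extendedId.LastIndexOf('.');
        string numericPart = lastDot >= 0 ? extendedId.Substring(lastDot + 1) : extendedId;
        return long.Parse(numericPart, CultureInfo.InvariantCulture);
    }
}
```

For a made-up value such as "order.aB3xY9.9220959240192000", `ExtendedIdParser.ExtractSnowflakeId` returns 9220959240192000. Taking the last segment also keeps the parser working when the prefix itself contains a dot.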

Next, we call the methods above and print the results. The calling code is as follows:

System.Console.WriteLine("[Native] Generating distributed IDs with Snowflake.");
Snowflake.Instance.SnowflakesInit(0, 0); // initialize Snowflake
for (int i = 0; i < 5; i++)
{
    long id = Snowflake.Instance.NextId(); // generate an ID
    System.Console.WriteLine($"=>No.:[{i + 1}],time:[{DateTime.Now:yyyy-MM-ddTHH:mm:ss.ffff}],id=[{id}]");
}

System.Console.WriteLine($"\n[Extended] Generating distributed IDs with Snowflake, extended with a business prefix and random string.");
for (int i = 0; i < 5; i++)
{
    string id = IdWorker.Instance.GetSnowflakeId("order", 1, 0); // generate an ID
    System.Console.WriteLine($"=>No.:[{i + 1}],time:[{DateTime.Now:yyyy-MM-ddTHH:mm:ss.ffff}],id=[{id}]");
}

The calling code above generates 5 IDs with each of the [Native] and [Extended] approaches and prints the generation time alongside each one. The results are as follows:

 


Origin blog.csdn.net/ChaITSimpleLove/article/details/114065926