snowflake

This algorithm was open-sourced by Twitter. It generates globally unique IDs in a distributed system.

How it works

Each time you generate an ID, it works like this.

A timestamp with millisecond precision is stored using 41 bits of the ID.
Then the NodeID is added in subsequent bits.
Then the Sequence Number is added, starting at 0 and incrementing for each ID generated in the same millisecond. If you generate enough IDs in the same millisecond that the sequence would roll over or overfill then the generate function will pause until the next millisecond.

The default Twitter format is shown below.

+--------------------------------------------------------------------------+
| 1 Bit Unused | 41 Bit Timestamp |  10 Bit NodeID  |   12 Bit Sequence ID |
+--------------------------------------------------------------------------+

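Put simply: 1 bit is unused, 41 bits hold the timestamp offset, 10 bits hold the node ID, and 12 bits hold a per-node sequence number. When the same node receives several requests within the same millisecond, the sequence number is incremented. If the requests in one millisecond exceed what 12 bits can represent, the generator waits until the next millisecond.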
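The layout above can be sketched as a few shifts and ORs. This is only an illustration: `pack` and the constant names are hypothetical and are not taken from any particular library.

```go
package main

import "fmt"

// Bit widths from the default Twitter layout described above.
const (
	nodeBits = 10
	stepBits = 12
)

// pack is a hypothetical helper that places the three fields into one int64:
// the timestamp offset in the high bits, then the node ID, then the sequence.
func pack(ms, node, step int64) int64 {
	return ms<<(nodeBits+stepBits) | node<<stepBits | step
}

func main() {
	id := pack(1, 1, 1)
	fmt.Println(id)           // 4198401 = 2^22 + 2^12 + 1
	fmt.Printf("%064b\n", id) // the same ID as 64 binary digits
}
```

The sign bit stays 0 as long as the timestamp offset fits in 41 bits, so the ID is always a positive `int64`.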

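In theory, the number of IDs supported per millisecond is:

    2^{10+12} = 2^{22} = 4194304

That is about 4.19 million IDs per millisecond across all nodes, and 2^{12} = 4096 IDs per millisecond on a single node, which should be enough for most applications.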
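The arithmetic can be checked with a short program. The helper names are hypothetical, and the lifetime estimate assumes 365-day years; it is included only to show how far a 41-bit millisecond counter reaches from the epoch.

```go
package main

import "fmt"

// capacity returns how many distinct values fit in the given number of bits.
func capacity(bits uint) int64 {
	return 1 << bits
}

// yearsCovered estimates how long a millisecond counter of the given width
// lasts before it overflows (assumption: 365-day years).
func yearsCovered(bits uint) float64 {
	const msPerYear = 1000 * 60 * 60 * 24 * 365
	return float64(capacity(bits)) / msPerYear
}

func main() {
	fmt.Println(capacity(10 + 12))             // 4194304 IDs per ms across all nodes
	fmt.Println(capacity(12))                  // 4096 IDs per ms on one node
	fmt.Printf("%.1f years\n", yearsCovered(41)) // 69.7 years
}
```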

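Advantages

A 128-bit UUID also guarantees global uniqueness, and most programming languages support it out of the box, so why use snowflake?
The difference is that snowflake IDs are increasing: the IDs a node produces, whether within the same millisecond or across different milliseconds, always increase. Why are increasing IDs better? That comes down to MySQL internals. (I know very little about MySQL; search for it if you are interested.)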
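The ordering property follows from the layout: the timestamp sits in the highest bits, so a later millisecond always yields a larger number, and within one millisecond the sequence number breaks ties on a given node. A minimal sketch, with `makeID` and `isSorted` as hypothetical helpers using the default 10/12 split:

```go
package main

import (
	"fmt"
	"sort"
)

// makeID builds an ID from a timestamp offset, node ID, and sequence number
// using the default layout (22 low bits shared by node and sequence).
func makeID(ms, node, step int64) int64 {
	return ms<<22 | node<<12 | step
}

// isSorted reports whether the IDs are already in ascending order.
func isSorted(ids []int64) bool {
	return sort.SliceIsSorted(ids, func(i, j int) bool { return ids[i] < ids[j] })
}

func main() {
	// IDs listed in the order node 7 would generate them.
	ids := []int64{
		makeID(100, 7, 0),
		makeID(100, 7, 1),
		makeID(101, 7, 0),
	}
	fmt.Println(isSorted(ids)) // true

	// A later millisecond wins even against the largest node and sequence.
	fmt.Println(makeID(101, 0, 0) > makeID(100, 1023, 4095)) // true
}
```

Note that IDs from different nodes within the same millisecond are ordered by node ID, not by the exact moment of creation.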

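Disadvantages

The weakness of the algorithm is obvious: it depends heavily on the server's system clock. If the system time is set back, duplicate IDs may be generated.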
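One common mitigation is to compare the current time with the time of the last generated ID and refuse to continue if the clock went backwards. The sketch below is an assumption about how such a guard could look; `checkClock` and `errClockMovedBack` are hypothetical names, and the source walked through later in this article does not include this check.

```go
package main

import (
	"errors"
	"fmt"
)

// errClockMovedBack is a hypothetical sentinel error for a rewound clock.
var errClockMovedBack = errors.New("clock moved backwards")

// checkClock compares the current millisecond timestamp with the timestamp
// of the last generated ID and returns an error if time went backwards.
func checkClock(last, now int64) error {
	if now < last {
		return fmt.Errorf("%w: last=%d now=%d", errClockMovedBack, last, now)
	}
	return nil
}

func main() {
	fmt.Println(checkClock(1000, 1001)) // <nil>
	fmt.Println(checkClock(1000, 999))  // clock moved backwards: last=1000 now=999
}
```

A caller could return the error, or wait until the clock catches up with `last`. Which is appropriate depends on how large the rollback is.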

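Source code walkthrough

The working principle already tells you how the algorithm is implemented, but to learn something along the way, let's read through one implementation anyway.

Package variables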

var (
	// Epoch is set to the twitter snowflake epoch of Nov 04 2010 01:42:54 UTC
	// You may customize this to set a different epoch for your application.
	Epoch int64 = 1288834974657

	// Number of bits to use for Node
	// Remember, you have a total 22 bits to share between Node/Step
	NodeBits uint8 = 10

	// Number of bits to use for Step
	// Remember, you have a total 22 bits to share between Node/Step
	StepBits uint8 = 12

	nodeMax   int64 = -1 ^ (-1 << NodeBits)
	nodeMask  int64 = nodeMax << StepBits
	stepMask  int64 = -1 ^ (-1 << StepBits)
	timeShift uint8 = NodeBits + StepBits
	nodeShift uint8 = StepBits
)

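Once you understand how snowflake works, the variables are easy to follow with the help of the comments in the source: they are simply the bit width, maximum value, mask, and shift amount of each part.
There is no need to dwell on why Epoch is initialized to 1288834974657. As the comment says, it is the Twitter snowflake epoch, and the timestamp part of an ID is stored as an offset from it.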
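The expression `-1 ^ (-1 << n)` may look cryptic at first. It produces a value whose lowest `n` bits are all 1, which is both the largest value that fits in `n` bits and the mask for those bits. A small sketch, with `maxValue` as a hypothetical wrapper around the same expression:

```go
package main

import "fmt"

// maxValue mirrors the expression -1 ^ (-1 << bits) from the package variables.
// -1 is all ones; shifting it left clears the low bits, and XOR with -1
// flips every bit, leaving only the low `bits` bits set.
func maxValue(bits uint8) int64 {
	return -1 ^ (-1 << bits)
}

func main() {
	fmt.Println(maxValue(10))     // 1023, the value of nodeMax
	fmt.Println(maxValue(12))     // 4095, the value of stepMask
	fmt.Println(maxValue(10) << 12) // 4190208, the value of nodeMask
}
```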

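Why define this set of variables when the principle already states which part occupies which bits? First, named values express intent better than bare literals, and magic numbers should be avoided in code. Second, it is more flexible: the number of bits given to each part can be adjusted to fit actual needs. Finally, it is easier to upgrade and maintain: many years from now, if CPUs with wider words appear, only a few variables need to change.

Node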

type Node struct {
	mu   sync.Mutex
	time int64
	node int64
	step int64
}

Creating a node

func NewNode(node int64) (*Node, error) {

	// re-calc in case custom NodeBits or StepBits were set
	nodeMax = -1 ^ (-1 << NodeBits)
	nodeMask = nodeMax << StepBits
	stepMask = -1 ^ (-1 << StepBits)
	timeShift = NodeBits + StepBits
	nodeShift = StepBits
	
	if node < 0 || node > nodeMax {
		return nil, errors.New("Node number must be between 0 and " + strconv.FormatInt(nodeMax, 10))
	}

	return &Node{
		time: 0,
		node: node,
		step: 0,
	}, nil
}
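Because NewNode recomputes the limits from NodeBits and StepBits, the accepted range of node numbers follows whatever split is configured before it is called. The stand-alone sketch below illustrates only the range check; `validateNode` is a hypothetical function that takes the bit width as a parameter instead of reading the package variable.

```go
package main

import "fmt"

// validateNode is a hypothetical stand-alone version of the range check in
// NewNode, parameterized by the number of bits reserved for the node ID.
func validateNode(node int64, nodeBits uint8) error {
	limit := int64(-1) ^ (int64(-1) << nodeBits)
	if node < 0 || node > limit {
		return fmt.Errorf("node number must be between 0 and %d", limit)
	}
	return nil
}

func main() {
	fmt.Println(validateNode(1023, 10)) // <nil>
	fmt.Println(validateNode(1024, 10)) // node number must be between 0 and 1023
	fmt.Println(validateNode(200, 8))   // <nil>
}
```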

Generating an ID

func (n *Node) Generate() ID {

	n.mu.Lock()

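	// Current time in milliseconds.
	now := time.Now().UnixNano() / 1000000

	if n.time == now {
		// An ID has already been generated in this millisecond,
		// so increment the sequence number.
		n.step = (n.step + 1) & stepMask

		// The sequence wrapped around: no IDs are left for this millisecond,
		// so spin until the next millisecond.
		if n.step == 0 {
			for now <= n.time {
				now = time.Now().UnixNano() / 1000000
			}
		}
	} else {
		// First ID in this millisecond: the sequence starts at 0.
		n.step = 0
	}

	// Record the timestamp used for this ID.
	n.time = now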

	r := ID((now-Epoch)<<timeShift |
		(n.node << nodeShift) |
		(n.step),
	)

	n.mu.Unlock()
	return r
}
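Since Generate only shifts and ORs the three fields together, an ID can be taken apart again with the inverse shifts and masks. The following is a minimal sketch under the default 10/12 split; `parts` and `decode` are hypothetical names introduced here for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	epoch    int64 = 1288834974657 // same value as Epoch above
	nodeBits       = 10
	stepBits       = 12
)

// parts is a hypothetical struct holding the decoded fields of an ID.
type parts struct {
	Millis int64 // Unix time in milliseconds
	Node   int64
	Step   int64
}

// decode reverses the packing done in Generate.
func decode(id int64) parts {
	return parts{
		Millis: (id >> (nodeBits + stepBits)) + epoch,
		Node:   (id >> stepBits) & (1<<nodeBits - 1),
		Step:   id & (1<<stepBits - 1),
	}
}

func main() {
	// Build an ID by hand for node 5, sequence 42, at a fixed timestamp.
	id := (int64(1700000000000)-epoch)<<22 | 5<<12 | 42
	p := decode(id)
	fmt.Println(p.Node, p.Step) // 5 42
	fmt.Println(time.UnixMilli(p.Millis).UTC())
}
```

Being able to recover the creation time from the ID alone is a handy side effect of the format, for example when debugging.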

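About the author

A final-year undergraduate sharing notes on data structures, interview questions, golang, C, and more. QQ group: 521625004. WeChat official account: 后台技术栈.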
