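1. Background and Advantages
The two ID generation schemes in widest use today are UUID and the Snowflake algorithm.
UUID: in its name-based variants, a UUID is essentially a hash (MD5, SHA, and so on) of some identifying information. Its main drawback is efficiency. In addition, any hash algorithm can collide, so there is no guarantee that IDs are mutually exclusive over time; for example, a newly generated ID may collide with one that has already been deleted. Its advantage is that IDs are random and need neither synchronization nor a central ID issuer.
Snowflake: a fixed-length encoding scheme that produces 64-bit IDs. It depends on a central ID issuer, and similar improved algorithms depend on a central database. It also requires synchronized clocks; otherwise duplicate IDs can be produced.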
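Forest Algorithm: combines the strengths of UUID and Snowflake and can generate IDs of 64 bits or more. The leading bits of the ID (for example 24 or 32 bits) are a hash value, followed by a timestamp of at least 30 bits and a sequence number of at least 10 bits. The default layout is built as follows:
- 1. The MAC address, database name, table name, and the upper 34 bits of the timestamp are concatenated into one binary sequence and hashed to a 24-bit value. The hash length can be increased if needed.
- 2. The lower 30 bits of the millisecond timestamp are used directly, so the hash only needs to be recomputed once every 12.4 days (2^30 ms). The length is adjustable.
- 3. A 10-bit sequence number supports 1024 concurrent IDs within the same timestamp. The length is adjustable.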
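The default 24 + 30 + 10 layout described above can be sketched as plain bit packing. This is a minimal illustration rather than the article's implementation: the class name `ForestIdLayout` and its methods are hypothetical, and the 24-bit hash is passed in as an already computed integer.

```java
// Hypothetical helper illustrating the 64-bit layout:
// [24-bit hash][30-bit timestamp][10-bit sequence number]
final class ForestIdLayout {
    static final int TIME_BITS = 30;
    static final int SEQ_BITS = 10;
    static final long TIME_MASK = (1L << TIME_BITS) - 1; // 0x3FFFFFFF
    static final long SEQ_MASK = (1L << SEQ_BITS) - 1;   // 0x3FF

    // Pack a 24-bit hash, a millisecond timestamp, and a sequence number into one long.
    static long pack(int hash24, long timeMillis, int sequence) {
        long hashPart = ((long) (hash24 & 0xFFFFFF)) << (TIME_BITS + SEQ_BITS);
        long timePart = (timeMillis & TIME_MASK) << SEQ_BITS;
        return hashPart | timePart | (sequence & SEQ_MASK);
    }

    // Extract the three fields again.
    static int hash24(long id) { return (int) (id >>> (TIME_BITS + SEQ_BITS)); }
    static long time30(long id) { return (id >>> SEQ_BITS) & TIME_MASK; }
    static int sequence(long id) { return (int) (id & SEQ_MASK); }

    public static void main(String[] args) {
        long id = pack(0xABCDEF, System.currentTimeMillis(), 7);
        System.out.printf("hash=%06X time30=%d seq=%d%n", hash24(id), time30(id), sequence(id));
    }
}
```

Because the hash occupies the top byte, the resulting `long` is negative whenever the highest hash bit is set, which is why signed IDs can appear when the value is printed as a Java `long`.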
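The Forest Algorithm has the following properties:
- 1. Decentralized: no central ID issuer or central database is required.
- 2. A newly generated ID is always different from every existing ID.
- 3. Efficient: the lower 30 bits of the timestamp span 12.4 days, so the hash is computed only once per 12.4-day window, and within that window ID generation is more efficient than Snowflake.
- 4. No timestamp synchronization is needed.
- 5. The generated ID fits in a 64-bit long. The ID can also be lengthened to widen the ranges of the hash, timestamp, and sequence number, and the ID and the hash can be used or stored separately.
- 6. Because the MAC address, database name, and table name are part of the hash input, IDs differ between servers, between databases, and between tables within one database. Hence the name "Forest Algorithm".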
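The 12.4-day figure follows directly from the 30-bit timestamp field: 2^30 ms is about 12.43 days, and the hash input only changes when the upper timestamp bits change. The sketch below checks both statements; the class `HashWindow` and the sample boundary value are assumptions made for illustration.

```java
// Hypothetical helper: how long one hash value stays valid with a 30-bit timestamp field.
final class HashWindow {
    static final int TIME_BITS = 30;
    static final double MILLIS_PER_DAY = 86_400_000.0;

    // Upper bits of the timestamp; this is the part that feeds the hash.
    static long headTime(long timeMillis) {
        return timeMillis >> TIME_BITS;
    }

    // Length of one window in days: 2^30 ms.
    static double windowDays() {
        return (1L << TIME_BITS) / MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.printf("window = %.4f days%n", windowDays());
        // An arbitrary window boundary chosen for the demonstration.
        long boundary = 1500L << TIME_BITS;
        System.out.println(headTime(boundary - 1)); // last millisecond of window 1499
        System.out.println(headTime(boundary));     // first millisecond of window 1500
    }
}
```

All timestamps between two boundaries share the same `headTime`, so the cached hash can be reused for every ID generated in that window.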
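This article gives a Java demo of the Forest Algorithm together with experimental results. Because the demo depends on WJLHA3, the source of WJLCoder.java and WJLHA3.java is included below; it is also available at the following link:
For the mathematical proof of uniqueness, see: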
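2. Why Not MD5, SHA, or SM3?
First, these hash algorithms produce digests of at least 128 bits, so only a few bytes of the digest can be used. The main problem then becomes which bytes to pick so that the ID collision probability is lowest, and there is no way to guarantee that the chosen bytes have a low collision probability.
Second, WJLHA is a hash algorithm built entirely on domestically developed (Chinese) theory. Its main feature is that it can produce hash values of a user-defined length, and the probability of a collision can be shown theoretically to be extremely low (see later blog posts). Because the hash length of WJLHA is configurable, the ID length is configurable as well.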
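To make the first point concrete, this is roughly what taking a 24-bit ID prefix from a standard digest looks like. It is a sketch only: the class `TruncatedDigest`, the choice of SHA-256, and the choice of the first three bytes are all assumptions made for illustration. The article itself uses WJLHA3, which emits the requested length directly.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical example: derive a short prefix by truncating a standard digest.
final class TruncatedDigest {
    // Hash the input with SHA-256 and keep only the first `bytes` bytes.
    static byte[] prefix(byte[] input, int bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] full = md.digest(input); // 32 bytes
            return Arrays.copyOf(full, bytes);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] p = prefix("PayOrder".getBytes(StandardCharsets.UTF_8), 3);
        System.out.printf("%02X%02X%02X%n", p[0] & 0xFF, p[1] & 0xFF, p[2] & 0xFF);
    }
}
```

Any other three bytes of the digest would serve equally well as a prefix, which is exactly the selection question the paragraph above raises.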
3. Forest Algorithm Source Code
package ForestAlgorithm;
// ForestAlgorithm.java
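/****
* ForestAlgorithm combines the strengths of UUID and Snowflake: a decentralized unique-ID generation algorithm that also keeps computation cheap.
* 1. The MAC address, database name, table name, and the upper 34 bits of the timestamp are concatenated into one binary sequence and hashed to a 24-bit value; the hash length can be increased.
* 2. The lower 30 bits of the timestamp are used directly, so the hash only needs to be recomputed once every 12.4 days (adjustable).
* 3. A 10-bit sequence number supports 1024 concurrent IDs within the same timestamp (adjustable).
* ForestAlgorithm has the following properties:
* 1. Decentralized: no central ID issuer or central database is required.
* 2. A newly generated ID is always different from every existing ID.
* 3. Efficient: the lower 30 bits of the timestamp span 12.4 days, so the hash is computed only once per 12.4-day window, and within that window ID generation is more efficient than Snowflake.
* 4. No timestamp synchronization is needed.
* 5. The generated ID fits in a 64-bit long; the ID can also be lengthened to widen the ranges of the hash, timestamp, and sequence number, or the ID and the hash can be stored separately in the database.
* 6. IDs have a degree of randomness.
* @author 王杰林
* @time 20220112
*/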
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Arrays;
public class ForestAlgorithm {
// Hash length in bytes
private static final int HashValueBytes = 3;
// Cached hash value
public static byte[] HashValue = null;
// MAC address
public static byte[] MAC = null;
// Database name
public static String DatabaseName = null;
// Table name
public static String DataTableName = null;
// Upper 34 bits of the timestamp
public static long HeadTime = 0;
// Cached upper 24 bits of the ID (the hash, already shifted into place)
public static long ID24 = 0;
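// Look up the MAC address of the local network interface and return it as a byte array.
// Returns null when the host or its interface cannot be resolved.
public static byte[] getLocalMac() {
byte[] mac = null;
try {
InetAddress ia = InetAddress.getLocalHost();
NetworkInterface ni = NetworkInterface.getByInetAddress(ia);
// getByInetAddress returns null when no interface is bound to the address
if (ni != null) {
mac = ni.getHardwareAddress();
}
} catch (UnknownHostException e) {
// log the exception here
} catch (SocketException e) {
// log the exception here
}
return mac;
}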
// Current system time in milliseconds, as a 64-bit value
public static long getSystemTimes() {
return System.currentTimeMillis();
}
// Compute the hash value
public static byte[] getHashValue(byte[] mac, String databasename, String datatablename, int headtime) {
WJLHA3 wjlha = new WJLHA3();
// headtime is passed in as a 32-bit int, so it is serialized into 4 bytes
byte headtime_arr[] = new byte[4];
byte[] databasename_arr = databasename.getBytes(); // for names containing Chinese characters, pass a charset, e.g. databasename.getBytes("GBK")
byte[] datatablename_arr = datatablename.getBytes();
// Write the 4 bytes of headtime in big-endian order
headtime_arr[0] = (byte) ((headtime >> 24) & 0xFF);
headtime_arr[1] = (byte) ((headtime >> 16) & 0xFF);
headtime_arr[2] = (byte) ((headtime >> 8) & 0xFF);
headtime_arr[3] = (byte) (headtime & 0xFF);
// Concatenate all parts into one array: mac | database name | table name | headtime
byte arr[] = new byte[mac.length + databasename_arr.length + datatablename_arr.length + headtime_arr.length];
int pos = 0;
System.arraycopy(mac, 0, arr, pos, mac.length);
pos += mac.length;
System.arraycopy(databasename_arr, 0, arr, pos, databasename_arr.length);
pos += databasename_arr.length;
System.arraycopy(datatablename_arr, 0, arr, pos, datatablename_arr.length);
pos += datatablename_arr.length;
System.arraycopy(headtime_arr, 0, arr, pos, headtime_arr.length);
// Return the hash value
return wjlha.getWJLHA(arr, HashValueBytes);
}
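// Core logic of the algorithm.
// Note: the cache lives in unsynchronized static fields, so concurrent callers need external locking.
public static long getFAID(String databasename, String datatablename, int sequencenumber) {
long ID = 0x0L;
// Read the timestamp
long time = getSystemTimes();
// Keep the upper 34 bits of the timestamp
long headtime = time >> 30;
// Read the local MAC address
byte[] mac = getLocalMac();
if (mac == null) {
// Without a MAC address the hash input cannot be built
throw new IllegalStateException("MAC address unavailable");
}
// Validate sequencenumber: it must fit in 10 bits (0..1023)
if (sequencenumber < 0 || sequencenumber > 1023) {
// Sequence number out of range: invalid input
return 0;
}
if (HashValue != null && Arrays.equals(mac, MAC) && databasename.equals(DatabaseName)
&& datatablename.equals(DataTableName) && headtime == HeadTime) {
// Cache hit: pack the 30-bit timestamp and the 10-bit sequence number
ID = ID24 | ((time & 0x3FFFFFFFL) << 10);
ID |= (sequencenumber & 0X3FF);
} else {
// Refresh the cached inputs used for the comparison above
MAC = mac;
DatabaseName = databasename;
DataTableName = datatablename;
HeadTime = headtime;
// The hash value has to be recomputed
HashValue = getHashValue(mac, databasename, datatablename, (int) headtime);
// Place the 3 hash bytes in the top 24 bits
ID = ((long)(HashValue[0] & 0xFF) << 56) | ((long)(HashValue[1] & 0xFF) << 48) | ((long)(HashValue[2] & 0xFF) << 40);
// Cache the shifted hash to speed up later calls
ID24 = ID;
// Pack the 30-bit timestamp and the 10-bit sequence number
ID |= ((time & 0x3FFFFFFFL) << 10);
ID |= (sequencenumber & 0X3FF);
}
return ID;
}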
// Test: print 19 IDs for database "Pay" and table "Order"
public static void main(String[] args) {
for(int i = 1; i < 20; ++i) {
System.out.println(getFAID("Pay","Order",i)); // second run: replace "Order" with "Notifciation"
}
}
}
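`getFAID` expects the caller to supply the sequence number. One possible way to do that, assumed here rather than taken from the article, is a wrapping atomic counter that always yields a value in 0..1023. The class `SequenceCounter` is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical caller-side source of 10-bit sequence numbers for getFAID.
final class SequenceCounter {
    private final AtomicInteger counter = new AtomicInteger(0);

    // Next value in 0..1023; wraps around after 1023, also across int overflow.
    int next() {
        return counter.getAndIncrement() & 0x3FF;
    }

    public static void main(String[] args) {
        SequenceCounter seq = new SequenceCounter();
        for (int i = 0; i < 3; i++) {
            System.out.println(seq.next());
        }
    }
}
```

A wrapping counter only keeps IDs distinct while fewer than 1024 IDs are requested within one millisecond; a caller that can exceed that rate would have to wait for the next millisecond or widen the sequence field.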
4. WJLHA3 Source Code
// WJLCoder.java
package ForestAlgorithm;
public class WJLCoder {
// A 32-bit variable is used without its sign bit, leaving 31 usable bits
private int rangeCodeBitsLenght;
// 31 - 8 = 23: subtract 8 to leave room for one byte
private int rangeCodeShiftBitsLenght;
// Maximum value of the interval
private long rangeCodeMaxIterval;
// Minimum value of the interval; the maximum is one byte (8 bits) larger than the minimum
private long rangeCodeMinIterval;
// Jielin code coefficient
private double coefficient;
// Lower bound of the current interval (EFLow)
private long encodeFlow = rangeCodeMinIterval;
// Length of the current interval (EFRange)
private long encodeFlowRange = rangeCodeMinIterval;
// Delayed output value (EFDigits)
private int encodeDelay;
// Count of delayed values (EFFollow)
private int encodeDelayCount;
// Write index into the output buffer (EOut_buff_loop)
private int encodeOutBufferLoop;
// Output buffer holding the hash value (EOut_buff)
private byte[] encodeOutBufferArray;
// Hash value length in bytes
private int hashValueOutByteLength;
// Input buffer length in bytes
private int inBytesBuFFLength;
public WJLCoder() {
this.rangeCodeBitsLenght = 31;
this.rangeCodeShiftBitsLenght = rangeCodeBitsLenght - 8;
this.rangeCodeMaxIterval = toUnsignedInteger(1L << this.rangeCodeBitsLenght);
this.rangeCodeMinIterval = toUnsignedInteger(1L << this.rangeCodeShiftBitsLenght);
this.coefficient = 0.0;
this.encodeFlow = this.rangeCodeMaxIterval;
this.encodeFlowRange = this.rangeCodeMaxIterval;
this.encodeDelay = 0;
this.encodeDelayCount = 0;
this.encodeOutBufferLoop = 0;
this.hashValueOutByteLength = 0;
this.inBytesBuFFLength = 0;
}
public int getRangeCodeBitsLenght() {
return rangeCodeBitsLenght;
}
public void setRangeCodeBitsLenght(int rangeCodeBitsLenght) {
this.rangeCodeBitsLenght = rangeCodeBitsLenght;
}
public int getRangeCodeShiftBitsLenght() {
return rangeCodeShiftBitsLenght;
}
public void setRangeCodeShiftBitsLenght(int rangeCodeShiftBitsLenght) {
this.rangeCodeShiftBitsLenght = rangeCodeShiftBitsLenght;
}
public long getRangeCodeMaxIterval() {
return rangeCodeMaxIterval;
}
public void setRangeCodeMaxIterval(long rangeCodeMaxIterval) {
this.rangeCodeMaxIterval = rangeCodeMaxIterval;
}
public long getRangeCodeMinIterval() {
return rangeCodeMinIterval;
}
public void setRangeCodeMinIterval(long rangeCodeMinIterval) {
this.rangeCodeMinIterval = rangeCodeMinIterval;
}
public double getCoefficient() {
return coefficient;
}
public void setCoefficient(double coefficient) {
this.coefficient = coefficient;
}
public long getEncodeFlow() {
return encodeFlow;
}
public void setEncodeFlow(long encodeFlow) {
this.encodeFlow = encodeFlow;
}
public long getEncodeFlowRange() {
return encodeFlowRange;
}
public void setEncodeFlowRange(long encodeFlowRange) {
this.encodeFlowRange = encodeFlowRange;
}
public int getEncodeDelay() {
return encodeDelay;
}
public void setEncodeDelay(int encodeDelay) {
this.encodeDelay = encodeDelay;
}
public int getEncodeDelayCount() {
return encodeDelayCount;
}
public void setEncodeDelayCount(int encodeDelayCount) {
this.encodeDelayCount = encodeDelayCount;
}
public int getEncodeOutBufferLoop() {
return encodeOutBufferLoop;
}
public void setEncodeOutBufferLoop(int encodeOutBufferLoop) {
this.encodeOutBufferLoop = encodeOutBufferLoop;
}
public byte[] getEncodeOutBufferArray() {
return encodeOutBufferArray;
}
public void setEncodeOutBufferArray(int encodeOutBufferArrayLength) {
this.encodeOutBufferArray = new byte[encodeOutBufferArrayLength];
}
public int getHashValueOutByteLength() {
return hashValueOutByteLength;
}
public void setHashValueOutByteLength(int hashValueOutByteLength) {
this.hashValueOutByteLength = hashValueOutByteLength;
}
public int getInBytesBuFFLength() {
return inBytesBuFFLength;
}
public void setInBytesBuFFLength(int inBytesBuFFLength) {
this.inBytesBuFFLength = inBytesBuFFLength;
}
// To convert signed int to unsigned int
public long toUnsignedInteger(long value) {
long result = value;
if (value > 0xffffffffL) {
result = (value % 0xffffffffL) - 1;
} else if (value < 0) {
result = value + 0xffffffffL + 1;
}
return result;
}
}
// WJLHA3.java
package ForestAlgorithm;
public class WJLHA3 {
public WJLCoder getWJLCoder() {
return new WJLCoder();
}
// Convert a signed byte to its unsigned value (0..255)
public int unsignedByte(byte value) {
int result = value;
if (value > 0xff) {
result = (value % 0xff) - 1;
} else if (value < 0) {
result = value + 0xff + 1;
}
return result;
}
public byte IntToByte(int value) {
return (byte) value;
}
// get the Jielin code coefficient
private void GetJieLinCoe(WJLCoder wjlha) {
wjlha.setCoefficient(Math.pow(2.0,
1.0 - (((double) wjlha.getHashValueOutByteLength()) / (double) wjlha.getInBytesBuFFLength())));
}
// Output bytes to cache and weighted encoding
private void OutPutByte(WJLCoder coder, int ucByte) {
ucByte &= 0x00FF;
coder.getEncodeOutBufferArray()[coder.getEncodeOutBufferLoop()
% coder.getHashValueOutByteLength()] = (byte) (coder
.getEncodeOutBufferArray()[(coder.getEncodeOutBufferLoop() + ucByte)
% coder.getHashValueOutByteLength()]
^ ucByte & 0x00FF);
coder.setEncodeOutBufferLoop(coder.getEncodeOutBufferLoop() + 1);
}
// Encode one symbol (bit) using the Jielin coefficient
private void Encode(WJLCoder coder, byte symbol) {
long High = 0, i = 0;
if (1 == symbol) {
// the Symbol 1
coder.setEncodeFlow(coder.toUnsignedInteger(coder.getEncodeFlow()
+ (long) ((double) coder.getEncodeFlowRange() * 0.5 * coder.getCoefficient())));
}
coder.setEncodeFlowRange(
coder.toUnsignedInteger((long) ((double) coder.getEncodeFlowRange() * 0.5 * coder.getCoefficient())));
while (coder.getEncodeFlowRange() <= coder.getRangeCodeMinIterval()) {
High = coder.toUnsignedInteger(coder.getEncodeFlow() + coder.getEncodeFlowRange() - 1);
if (coder.getEncodeDelayCount() != 0) {
if (High <= coder.getRangeCodeMaxIterval()) {
OutPutByte(coder, coder.getEncodeDelay());
for (i = 1; i <= (coder.getEncodeDelayCount() - 1); ++i) {
OutPutByte(coder, 0xFF);
}
coder.setEncodeDelayCount(0);
coder.setEncodeFlow(
coder.toUnsignedInteger(coder.getEncodeFlow() + coder.getRangeCodeMaxIterval()));
} else if (coder.getEncodeFlow() >= coder.getRangeCodeMaxIterval()) {
OutPutByte(coder, coder.getEncodeDelay() + 1);
for (i = 1; i <= coder.getEncodeDelayCount() - 1; ++i) {
OutPutByte(coder, 0x00);
}
coder.setEncodeDelayCount(0);
} else {
coder.setEncodeDelayCount(coder.getEncodeDelayCount() + 1);
coder.setEncodeFlow(coder
.toUnsignedInteger((coder.getEncodeFlow() << 8) & (coder.getRangeCodeMaxIterval() - 1)));
coder.setEncodeFlowRange(coder.toUnsignedInteger(coder.getEncodeFlowRange() << 8));
continue;
}
}
if (((coder.getEncodeFlow() ^ High) & (0x00FFL << coder.getRangeCodeShiftBitsLenght())) == 0) {
OutPutByte(coder, (int) ((coder.getEncodeFlow() & 0x7FFFFFFFL) >> coder.getRangeCodeShiftBitsLenght()));
} else {
coder.setEncodeFlow(coder.toUnsignedInteger(coder.getEncodeFlow() - coder.getRangeCodeMaxIterval()));
coder.setEncodeDelay(
(int) coder.toUnsignedInteger(coder.getEncodeFlow() >> coder.getRangeCodeShiftBitsLenght()));
coder.setEncodeDelayCount(1);
}
coder.setEncodeFlow(
coder.toUnsignedInteger((coder.getEncodeFlow() << 8) & (coder.getRangeCodeMaxIterval() - 1))
| coder.toUnsignedInteger(coder.getEncodeFlow() & coder.getRangeCodeMaxIterval()));
coder.setEncodeFlowRange(coder.toUnsignedInteger(coder.getEncodeFlowRange() << 8));
}
}
// Finish encoding and flush the remaining bytes
private void FinishEncode(WJLCoder coder) {
int n = 0;
if (coder.getEncodeDelayCount() != 0) {
if (coder.getEncodeFlow() < coder.getRangeCodeMaxIterval()) {
OutPutByte(coder, coder.getEncodeDelay());
for (n = 1; n <= coder.getEncodeDelayCount() - 1; n++) {
OutPutByte(coder, 0xFF);
}
} else {
OutPutByte(coder, coder.getEncodeDelay() + 1);
for (n = 1; n <= coder.getEncodeDelayCount() - 1; n++) {
OutPutByte(coder, 0x00);
}
}
}
coder.setEncodeFlow(coder.toUnsignedInteger(coder.getEncodeFlow() << 1));
n = coder.getRangeCodeBitsLenght() + 1;
do {
n -= 8;
OutPutByte(coder, (int) coder.toUnsignedInteger(coder.getEncodeFlow() >> n));
} while (!(n <= 0));
}
// the main function
public byte[] getWJLHA(byte[] InBytesBuFF, int HashValueBytesArray_Length) {
int i = 0, j = 0;
WJLCoder wjlha = getWJLCoder();
wjlha.setEncodeOutBufferArray(HashValueBytesArray_Length);
wjlha.setHashValueOutByteLength(HashValueBytesArray_Length);
wjlha.setInBytesBuFFLength(InBytesBuFF.length);
// Hash Algorithm for Weighted Probability Model
GetJieLinCoe(wjlha);
// Encode each bit, most significant bit first
for (i = 0; i < wjlha.getInBytesBuFFLength(); ++i) {
for (j = 7; j >= 0; --j) {
Encode(wjlha, (byte)((InBytesBuFF[i] >> j) & 0x01));
}
}
if(wjlha.getInBytesBuFFLength() <= HashValueBytesArray_Length) {
for (i = 0; i < wjlha.getInBytesBuFFLength() + 1; ++i) {
for (j = 0; j < 8; ++j) {
Encode(wjlha, (byte)(0x01));
}
}
}
FinishEncode(wjlha);
return wjlha.getEncodeOutBufferArray();
}
}
## 5. Forest Algorithm Experimental Results
Running main produces the following 64-bit IDs:
-6721494604282226687
-6721494604281655294
-6721494604281117693
-6721494604280580092
-6721494604280043515
-6721494604279504890
-6721494604278971385
-6721494604278437880
-6721494604277902327
-6721494604277362678
-6721494604276825077
-6721494604276278260
-6721494604275738611
-6721494604275204082
-6721494604274668529
-6721494604274134000
-6721494604273597423
-6721494604273036270
-6721494604272504813
Changing Order to Notifciation then produces the following IDs:
-5353051228372509695
-5353051228371942398
-5353051228371407869
-5353051228370879484
-5353051228370349051
-5353051228369820666
-5353051228369291257
-5353051228368763896
-5353051228368232439
-5353051228367704054
-5353051228367178741
-5353051228366650356
-5353051228366121971
-5353051228365592562
-5353051228365063153
-5353051228364535792
-5353051228364007407
-5353051228363474926
-5353051228362941421
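The printed values can be split back into their three fields to check them against the layout used in `getFAID` (hash in the top 24 bits, then 30 timestamp bits, then 10 sequence bits). The sketch below decodes the first two IDs of the first run; the class `DecodeSample` and its method names are hypothetical.

```java
// Hypothetical decoder for the 24 + 30 + 10 layout used by getFAID.
final class DecodeSample {
    static int hash24(long id) { return (int) (id >>> 40); }
    static long time30(long id) { return (id >>> 10) & 0x3FFFFFFFL; }
    static int sequence(long id) { return (int) (id & 0x3FF); }

    public static void main(String[] args) {
        // First two IDs from the run with table name "Order"
        long[] ids = { -6721494604282226687L, -6721494604281655294L };
        for (long id : ids) {
            System.out.printf("hash=0x%06X time30=%d seq=%d%n",
                    hash24(id), time30(id), sequence(id));
        }
    }
}
```

Both IDs decode to the same hash prefix 0xA2B874, with sequence numbers 1 and 2 matching the loop counter in `main`, and timestamp fields 289276774 and 289277332, that is, 558 ms apart.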