A Distributed Unique ID Generation Algorithm Based on WJLHA3: the Forest Algorithm (Java)

I. Background and advantages

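The two dominant ID generation schemes today are UUID and the Snowflake algorithm.
UUID: in its name-based variants, a UUID is essentially a digest obtained by hashing identifying information with an algorithm such as MD5 or SHA-1. Its main drawback is efficiency. In addition, every hash function admits collisions, so there is no way to guarantee that IDs stay distinct over time; a newly generated ID may, for example, collide with one that has already been deleted. Its advantage is that IDs look random and need neither synchronization nor a central ID issuer.
Snowflake: a 64-bit ID generation algorithm built on fixed-length field encoding. It depends on a central ID issuer, its derivatives depend on a central database, and clocks must be kept synchronized, otherwise duplicate IDs will be produced.
Forest Algorithm: combines the strengths of UUID and Snowflake and produces IDs of 64 bits or more. The leading bits of the ID (24 or 32 bits, for example) are a hash value, followed by a timestamp of at least 30 bits and a sequence number of at least 10 bits. The ID is therefore built as follows: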

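  • 1. The MAC address, database name, table name, and the high 34 bits of the timestamp are concatenated into one binary sequence, from which a 24-bit hash value is computed. The hash length can be increased as needed.
  • 2. The low 30 bits of the millisecond timestamp are stored in the ID, so the hash value has to be recomputed only once every 12.4 days. (The length is adjustable.)
  • 3. A 10-bit sequence number supports 1024 concurrent IDs per timestamp value. (The length is adjustable.)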
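The three fields above can be sketched as plain bit operations on a 64-bit long. This is only an illustration of the 24/30/10 split; the class name `ForestIdLayout` and its methods are hypothetical helpers, not part of the original source.

```java
// Hypothetical helper (not from the original source): illustrates the 24/30/10 bit split.
final class ForestIdLayout {
    static final int TIME_BITS = 30;                      // low bits of the millisecond timestamp
    static final int SEQ_BITS = 10;                       // sequence number
    static final long TIME_MASK = (1L << TIME_BITS) - 1;  // 0x3FFFFFFF
    static final long SEQ_MASK = (1L << SEQ_BITS) - 1;    // 0x3FF

    // Pack a 24-bit hash, the low 30 bits of a timestamp, and a 10-bit sequence number.
    static long pack(int hash24, long timeMillis, int sequence) {
        return ((long) (hash24 & 0xFFFFFF) << (TIME_BITS + SEQ_BITS))
                | ((timeMillis & TIME_MASK) << SEQ_BITS)
                | (sequence & SEQ_MASK);
    }

    // Bits 40..63: the hash field.
    static int hashOf(long id) {
        return (int) (id >>> (TIME_BITS + SEQ_BITS));
    }

    // Bits 10..39: the low 30 bits of the timestamp.
    static long timeOf(long id) {
        return (id >>> SEQ_BITS) & TIME_MASK;
    }

    // Bits 0..9: the sequence number.
    static int sequenceOf(long id) {
        return (int) (id & SEQ_MASK);
    }

    public static void main(String[] args) {
        long id = pack(0xABCDEF, System.currentTimeMillis(), 7);
        System.out.printf("id=%d hash=%06X time30=%d seq=%d%n",
                id, hashOf(id), timeOf(id), sequenceOf(id));
    }
}
```

Because the hash occupies the top bits, any hash whose first bit is 1 yields a negative long when printed as a signed value.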

ForestAlgorithm has the following properties:

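  • 1. Decentralized: no central ID issuer or central database is required.
  • 2. A newly generated ID is always different from every existing ID.
  • 3. Efficient: the low 30 bits of the millisecond timestamp span 2^30 ms, about 12.4 days, so the hash value is computed only once per 12.4-day window, and within that window ID generation is faster than Snowflake.
  • 4. Timestamp synchronization does not need to be considered.
  • 5. The generated ID fits in a 64-bit long. The ID can also be lengthened to widen the range of the hash, timestamp, and sequence fields, and the ID and the hash value can be used or stored separately.
  • 6. Because the MAC address, database name, and table name enter the hash, IDs are distinguished between servers, between databases, and between tables within a database. Hence the name "Forest Algorithm".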
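The 12.4-day figure follows directly from splitting a millisecond timestamp at bit 30. A minimal sketch of that arithmetic, with illustrative class and method names that do not appear in the original source:

```java
// Hypothetical helper (not from the original source): arithmetic behind the 12.4-day window.
final class HashWindow {
    static final int LOW_TIME_BITS = 30;
    static final long WINDOW_MILLIS = 1L << LOW_TIME_BITS; // 1,073,741,824 ms

    // Length of one window in days.
    static double windowDays() {
        return WINDOW_MILLIS / (1000.0 * 60 * 60 * 24);
    }

    // The high bits of the timestamp; the hash only changes when this value changes.
    static long windowIndex(long timeMillis) {
        return timeMillis >> LOW_TIME_BITS;
    }

    // Milliseconds left until the high bits change and the hash must be recomputed.
    static long millisUntilRollover(long timeMillis) {
        return WINDOW_MILLIS - (timeMillis & (WINDOW_MILLIS - 1));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.printf("window length: %.2f days%n", windowDays());
        System.out.printf("current window index: %d%n", windowIndex(now));
        System.out.printf("next hash recomputation in %d ms%n", millisUntilRollover(now));
    }
}
```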
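This article gives a Java demo of the Forest Algorithm together with experimental results. Because the demo depends on WJLHA3, the source of WJLCoder.java and WJLHA3.java is reproduced here as well; it is also available at the following link: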

Domestic Hash Algorithm WJLHA (Part 5): Open-Source WJLHA3 with Custom Hash Length (Java)

For the mathematical proof of uniqueness, see:

WJLHA3 Encoding Principle and Uniqueness Proof

II. Why not MD5, SHA, or SM3?

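First, these hash algorithms produce digests of at least 128 bits, so only a few bytes of the digest can be used. The main problem then becomes which bytes to pick so that the ID collision probability is minimized, and there is no guarantee that the chosen bytes collide rarely.
Second, WJLHA is currently a hash algorithm derived entirely from domestically developed foundational theory. Its main feature is that it can generate hash values of user-defined length, and it can be shown theoretically that the probability of a collision is extremely low (see the follow-up blog posts). Because the WJLHA hash length is configurable, the ID length is configurable too.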
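For comparison, the truncation approach described in the first point can be sketched with the JDK's `MessageDigest`. Keeping the first three bytes is an arbitrary choice made for this illustration, which is exactly the concern raised above; the class name `TruncatedDigest` and the sample input are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper (not from the original source): truncating a fixed-length digest.
final class TruncatedDigest {
    // Hash the input with SHA-256 and keep only the first `bytes` bytes of the digest.
    static byte[] truncatedSha256(byte[] input, int bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Arrays.copyOf(md.digest(input), bytes);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration only.
        byte[] input = "PayOrder".getBytes(StandardCharsets.UTF_8);
        byte[] prefix = truncatedSha256(input, 3);
        System.out.printf("%02X%02X%02X%n", prefix[0] & 0xFF, prefix[1] & 0xFF, prefix[2] & 0xFF);
    }
}
```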

III. Forest Algorithm source code

package ForestAlgorithm;
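// ForestAlgorithm.java
/****
 * ForestAlgorithm combines the strengths of UUID and Snowflake into a decentralized
 * unique ID generation algorithm that stays computationally efficient.
 * 1. The MAC address, database name, table name, and the high 34 bits of the timestamp are
 *    concatenated into a binary sequence to compute a 24-bit hash value (length adjustable).
 * 2. The low 30 bits of the timestamp are used, so the hash value is computed only once every 12.4 days (adjustable).
 * 3. A 10-bit sequence number supports 1024 concurrent IDs per timestamp value (adjustable).
 * ForestAlgorithm has the following properties:
 * 1. Decentralized: no central ID issuer or central database is required.
 * 2. A newly generated ID is always different from every existing ID.
 * 3. Efficient: the low 30 bits of the timestamp span about 12.4 days, so the hash value is computed
 *    only once per 12.4-day window, and within that window ID generation is faster than Snowflake.
 * 4. Timestamp synchronization does not need to be considered.
 * 5. The generated ID fits in a 64-bit long; the ID can also be lengthened to widen the range of the
 *    hash, timestamp, and sequence fields, or the ID and hash value can be stored separately in the database.
 * 6. IDs have a degree of randomness.
 * @author 王杰林
 * @time 20220112
 */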
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Arrays;

public class ForestAlgorithm {
	// Hash value length in bytes
	private static final int HashValueBytes = 3;
	// Cached hash value
	public static byte[] HashValue = null;
	// MAC address
	public static byte[] MAC = null;
	// Database name
	public static String DatabaseName = null;
	// Table name
	public static String DataTableName = null;
	// High 34 bits of the timestamp
	public static long HeadTime = 0;
	// Cached high 24 bits of the ID (the hash field)
	public static long ID24 = 0;

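	// Look up the MAC address of the local host and return it as a byte array
	// (null if it cannot be determined)
	public static byte[] getLocalMac() {
		byte[] mac = null;
		try {
			// Resolve the local address, then the network interface bound to it
			InetAddress ia = InetAddress.getLocalHost();
			NetworkInterface ni = NetworkInterface.getByInetAddress(ia);
			// getByInetAddress returns null when no interface is bound to the address
			if (ni != null) {
				mac = ni.getHardwareAddress();
			}
		} catch (UnknownHostException e) {
			// add exception logging here
		} catch (SocketException e) {
			// add exception logging here
		}
		return mac;
	}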

	// Current system time in milliseconds, as a 64-bit value
	public static long getSystemTimes() {
		return System.currentTimeMillis();
	}
	
	// Compute the hash value over MAC + database name + table name + head time
	public static byte[] getHashValue(byte[] mac, String databasename, String datatablename, int headtime) {
		WJLHA3 wjlha = new WJLHA3();
		// getLocalMac() may return null; fall back to an empty array so hashing still works
		// (IDs are then no longer distinguished per server)
		if (mac == null) {
			mac = new byte[0];
		}
		// headtime is serialized into 4 bytes, most significant byte first
		byte headtime_arr[] = new byte[4];
		// getBytes() uses the platform default charset; pass an explicit charset such as "GBK"
		// if the names contain Chinese characters
		byte[] databasename_arr = databasename.getBytes();
		byte[] datatablename_arr = datatablename.getBytes();
		// Take the 4 bytes of headtime
		headtime_arr[0] = (byte) ((headtime >> 24) & 0xFF);
		headtime_arr[1] = (byte) ((headtime >> 16) & 0xFF);
		headtime_arr[2] = (byte) ((headtime >> 8) & 0xFF);
		headtime_arr[3] = (byte) (headtime & 0xFF);
		// Serialize all parts into a single array
		byte arr[] = new byte[mac.length + databasename_arr.length + datatablename_arr.length + headtime_arr.length];
		int pos = 0;
		System.arraycopy(mac, 0, arr, pos, mac.length);
		pos += mac.length;
		System.arraycopy(databasename_arr, 0, arr, pos, databasename_arr.length);
		pos += databasename_arr.length;
		System.arraycopy(datatablename_arr, 0, arr, pos, datatablename_arr.length);
		pos += datatablename_arr.length;
		System.arraycopy(headtime_arr, 0, arr, pos, headtime_arr.length);
		// Return the hash value
		return wjlha.getWJLHA(arr, HashValueBytes);
	}

	// Core logic of the algorithm.
	// synchronized because the cached static fields are shared mutable state
	public static synchronized long getFAID(String databasename, String datatablename, int sequencenumber) {
		long ID = 0x0L;
		// Read the timestamp
		long time = getSystemTimes();
		// Keep the high 34 bits of the timestamp
		long headtime = time >> 30;
		// Read the current MAC address
		byte[] mac = getLocalMac();
		// Validate sequencenumber: it must fit in 10 bits (0 to 1023)
		if (sequencenumber < 0 || sequencenumber > 0x3FF) {
			// Sequence number out of range, input is invalid
			return 0;
		}
		if (HashValue != null && Arrays.equals(mac, MAC) && databasename.equals(DatabaseName)
				&& datatablename.equals(DataTableName) && headtime == HeadTime) {
			// Cache hit: pack the 30-bit timestamp and the 10-bit sequence number
			ID = ID24 | ((time & 0x3FFFFFFFL) << 10);
			ID |= (sequencenumber & 0X3FF);
		} else {
			// Update the cache used by the comparison above
			MAC = mac;
			DatabaseName = databasename;
			DataTableName = datatablename;
			HeadTime = headtime;
			// The hash value has to be recomputed
			HashValue = getHashValue(mac, databasename, datatablename, (int) headtime);
			// Place the 3 hash bytes in bits 40 to 63
			ID = ((long)(HashValue[0] & 0xFF) << 56) | ((long)(HashValue[1] & 0xFF) << 48) | ((long)(HashValue[2] & 0xFF) << 40);
			// Cache the hash field to speed up later calls
			ID24 = ID;
			// Pack the 30-bit timestamp and the 10-bit sequence number
			ID |= ((time & 0x3FFFFFFFL) << 10);
			ID |= (sequencenumber & 0X3FF);
		}
		return ID;
	}
	// test
	public static void main(String[] args) {
		for (int i = 1; i < 20; ++i) {
			// Table name "Order" for the first run, "Notifciation" for the second
			System.out.println(getFAID("Pay", "Order", i));
		}
	}
}
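In the demo the caller supplies the sequence number (main simply passes the loop index). One way a caller might allocate 10-bit sequence numbers per millisecond is sketched below. `SequenceAllocator` is a hypothetical class, not part of the original source, and since getFAID reads the clock itself, a real integration would have to make sure both use the same timestamp.

```java
// Hypothetical helper (not from the original source): hands out sequence numbers 0..1023
// for each millisecond and signals exhaustion with -1.
final class SequenceAllocator {
    private static final int MAX_SEQUENCE = 1024; // 2^10 values fit in the 10-bit field

    private long lastMillis = -1L;
    private int next = 0;

    // Returns the next free sequence number for nowMillis, or -1 when all 1024 are used.
    synchronized int nextFor(long nowMillis) {
        if (nowMillis != lastMillis) {
            // A new millisecond started: restart the counter.
            lastMillis = nowMillis;
            next = 0;
        }
        return next < MAX_SEQUENCE ? next++ : -1;
    }

    public static void main(String[] args) {
        SequenceAllocator allocator = new SequenceAllocator();
        for (int i = 0; i < 5; i++) {
            long now = System.currentTimeMillis();
            System.out.println(now + " -> sequence " + allocator.nextFor(now));
        }
    }
}
```

When `nextFor` returns -1 the caller can wait for the next millisecond before requesting another ID.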

IV. WJLHA3 source code

// WJLCoder.java
package ForestAlgorithm;

public class WJLCoder {
	// Use 32bit Variable and not use the signed, so 31bit of int
	private int rangeCodeBitsLenght;
	// 31 - 8 = 23, Subtract 8 because need to leave one byte space.
	private int rangeCodeShiftBitsLenght;
	// Maximum value of interval.
	private long rangeCodeMaxIterval;
	// Minimum value of interval, Maximum value is one byte larger than the minimum.
	private long rangeCodeMinIterval;
	// JieLin Code Coefficient
	private double coefficient;
	// interval subscript EFLow
	private long encodeFlow = rangeCodeMinIterval;
	// interval length EFRange
	private long encodeFlowRange = rangeCodeMinIterval;
	// Delayed value output EFDigits
	private int encodeDelay;
	// the Delayed value count EFFollow
	private int encodeDelayCount;
	// Array subscript pointer by EOut_buff EOut_buff_loop
	private int encodeOutBufferLoop;
	// Hash Value cache array EOut_buff
	private byte[] encodeOutBufferArray;
	// Hash Value Byte Length
	private int hashValueOutByteLength;
	// Input Bytes BuFF Byte Length
	private int inBytesBuFFLength;
	
	public WJLCoder() {
		this.rangeCodeBitsLenght = 31;
		this.rangeCodeShiftBitsLenght = rangeCodeBitsLenght - 8;
		this.rangeCodeMaxIterval = toUnsignedInteger(1L << this.rangeCodeBitsLenght);
		this.rangeCodeMinIterval = toUnsignedInteger(1L << this.rangeCodeShiftBitsLenght);
		this.coefficient = 0.0;
		this.encodeFlow = this.rangeCodeMaxIterval;
		this.encodeFlowRange = this.rangeCodeMaxIterval;
		this.encodeDelay = 0;
		this.encodeDelayCount = 0;
		this.encodeOutBufferLoop = 0;
		this.hashValueOutByteLength = 0;
		this.inBytesBuFFLength = 0;
	}

	public int getRangeCodeBitsLenght() {
		return rangeCodeBitsLenght;
	}

	public void setRangeCodeBitsLenght(int rangeCodeBitsLenght) {
		this.rangeCodeBitsLenght = rangeCodeBitsLenght;
	}

	public int getRangeCodeShiftBitsLenght() {
		return rangeCodeShiftBitsLenght;
	}

	public void setRangeCodeShiftBitsLenght(int rangeCodeShiftBitsLenght) {
		this.rangeCodeShiftBitsLenght = rangeCodeShiftBitsLenght;
	}

	public long getRangeCodeMaxIterval() {
		return rangeCodeMaxIterval;
	}

	public void setRangeCodeMaxIterval(long rangeCodeMaxIterval) {
		this.rangeCodeMaxIterval = rangeCodeMaxIterval;
	}

	public long getRangeCodeMinIterval() {
		return rangeCodeMinIterval;
	}

	public void setRangeCodeMinIterval(long rangeCodeMinIterval) {
		this.rangeCodeMinIterval = rangeCodeMinIterval;
	}

	public double getCoefficient() {
		return coefficient;
	}

	public void setCoefficient(double coefficient) {
		this.coefficient = coefficient;
	}

	public long getEncodeFlow() {
		return encodeFlow;
	}

	public void setEncodeFlow(long encodeFlow) {
		this.encodeFlow = encodeFlow;
	}

	public long getEncodeFlowRange() {
		return encodeFlowRange;
	}

	public void setEncodeFlowRange(long encodeFlowRange) {
		this.encodeFlowRange = encodeFlowRange;
	}

	public int getEncodeDelay() {
		return encodeDelay;
	}

	public void setEncodeDelay(int encodeDelay) {
		this.encodeDelay = encodeDelay;
	}

	public int getEncodeDelayCount() {
		return encodeDelayCount;
	}

	public void setEncodeDelayCount(int encodeDelayCount) {
		this.encodeDelayCount = encodeDelayCount;
	}

	public int getEncodeOutBufferLoop() {
		return encodeOutBufferLoop;
	}

	public void setEncodeOutBufferLoop(int encodeOutBufferLoop) {
		this.encodeOutBufferLoop = encodeOutBufferLoop;
	}

	public byte[] getEncodeOutBufferArray() {
		return encodeOutBufferArray;
	}

	public void setEncodeOutBufferArray(int encodeOutBufferArrayLength) {
		this.encodeOutBufferArray = new byte[encodeOutBufferArrayLength];
	}

	public int getHashValueOutByteLength() {
		return hashValueOutByteLength;
	}

	public void setHashValueOutByteLength(int hashValueOutByteLength) {
		this.hashValueOutByteLength = hashValueOutByteLength;
	}

	public int getInBytesBuFFLength() {
		return inBytesBuFFLength;
	}

	public void setInBytesBuFFLength(int inBytesBuFFLength) {
		this.inBytesBuFFLength = inBytesBuFFLength;
	}

	// To convert signed int to unsigned int
	public long toUnsignedInteger(long value) {
		long result = value;
		if (value > 0xffffffffL) {
			result = (value % 0xffffffffL) - 1;
		} else if (value < 0) {
			result = value + 0xffffffffL + 1;
		}
		return result;
	}
}
// WJLHA3.java
package ForestAlgorithm;

public class WJLHA3 {
	public WJLCoder getWJLCoder() {
		return new WJLCoder();
	}

	// To convert signed bytes to unsigned bytes
	public int unsignedByte(byte value) {
		int result = value;
		if (value > 0xff) {
			result = (value % 0xff) - 1;
		} else if (value < 0) {
			result = value + 0xff + 1;
		}
		return result;
	}

	public byte IntToByte(int value) {
		return (byte) value;
	}

	// get the Jielin code coefficient
	private void GetJieLinCoe(WJLCoder wjlha) {
		wjlha.setCoefficient(Math.pow(2.0,
				1.0 - (((double) wjlha.getHashValueOutByteLength()) / (double) wjlha.getInBytesBuFFLength())));
	}

	// Output bytes to cache and weighted encoding
	private void OutPutByte(WJLCoder coder, int ucByte) {
		ucByte &= 0x00FF;
		coder.getEncodeOutBufferArray()[coder.getEncodeOutBufferLoop()
				% coder.getHashValueOutByteLength()] = (byte) (coder
						.getEncodeOutBufferArray()[(coder.getEncodeOutBufferLoop() + ucByte)
								% coder.getHashValueOutByteLength()]
						^ ucByte & 0x00FF);
		coder.setEncodeOutBufferLoop(coder.getEncodeOutBufferLoop() + 1);
	}

	// Encode by JielinCeo
	private void Encode(WJLCoder coder, byte symbol) {
		long High = 0, i = 0;
		if (1 == symbol) { // the Symbol 1
			coder.setEncodeFlow(coder.toUnsignedInteger(coder.getEncodeFlow()
					+ (long) ((double) coder.getEncodeFlowRange() * 0.5 * coder.getCoefficient())));
		}
		coder.setEncodeFlowRange(
				coder.toUnsignedInteger((long) ((double) coder.getEncodeFlowRange() * 0.5 * coder.getCoefficient())));
		while (coder.getEncodeFlowRange() <= coder.getRangeCodeMinIterval()) {
			High = coder.toUnsignedInteger(coder.getEncodeFlow() + coder.getEncodeFlowRange() - 1);
			if (coder.getEncodeDelayCount() != 0) {
				if (High <= coder.getRangeCodeMaxIterval()) {
					OutPutByte(coder, coder.getEncodeDelay());
					for (i = 1; i <= (coder.getEncodeDelayCount() - 1); ++i) {
						OutPutByte(coder, 0xFF);
					}
					coder.setEncodeDelayCount(0);
					coder.setEncodeFlow(
							coder.toUnsignedInteger(coder.getEncodeFlow() + coder.getRangeCodeMaxIterval()));
				} else if (coder.getEncodeFlow() >= coder.getRangeCodeMaxIterval()) {
					OutPutByte(coder, coder.getEncodeDelay() + 1);
					for (i = 1; i <= coder.getEncodeDelayCount() - 1; ++i) {
						OutPutByte(coder, 0x00);
					}
					coder.setEncodeDelayCount(0);
				} else {
					coder.setEncodeDelayCount(coder.getEncodeDelayCount() + 1);
					coder.setEncodeFlow(coder
							.toUnsignedInteger((coder.getEncodeFlow() << 8) & (coder.getRangeCodeMaxIterval() - 1)));
					coder.setEncodeFlowRange(coder.toUnsignedInteger(coder.getEncodeFlowRange() << 8));
					continue;
				}
			}
			if (((coder.getEncodeFlow() ^ High) & (0x00FFL << coder.getRangeCodeShiftBitsLenght())) == 0) {
				OutPutByte(coder, (int) ((coder.getEncodeFlow() & 0x7FFFFFFFL) >> coder.getRangeCodeShiftBitsLenght()));
			} else {
				coder.setEncodeFlow(coder.toUnsignedInteger(coder.getEncodeFlow() - coder.getRangeCodeMaxIterval()));
				coder.setEncodeDelay(
						(int) coder.toUnsignedInteger(coder.getEncodeFlow() >> coder.getRangeCodeShiftBitsLenght()));
				coder.setEncodeDelayCount(1);
			}
			coder.setEncodeFlow(
					coder.toUnsignedInteger((coder.getEncodeFlow() << 8) & (coder.getRangeCodeMaxIterval() - 1))
							| coder.toUnsignedInteger(coder.getEncodeFlow() & coder.getRangeCodeMaxIterval()));
			coder.setEncodeFlowRange(coder.toUnsignedInteger(coder.getEncodeFlowRange() << 8));
		}
	}

	// Finish Encode by JielinCeo
	private void FinishEncode(WJLCoder coder) {
		int n = 0;
		if (coder.getEncodeDelayCount() != 0) {
			if (coder.getEncodeFlow() < coder.getRangeCodeMaxIterval()) {
				OutPutByte(coder, coder.getEncodeDelay());
				for (n = 1; n <= coder.getEncodeDelayCount() - 1; n++) {
					OutPutByte(coder, 0xFF);
				}
			} else {
				OutPutByte(coder, coder.getEncodeDelay() + 1);
				for (n = 1; n <= coder.getEncodeDelayCount() - 1; n++) {
					OutPutByte(coder, 0x00);
				}
			}
		}
		coder.setEncodeFlow(coder.toUnsignedInteger(coder.getEncodeFlow() << 1));
		n = coder.getRangeCodeBitsLenght() + 1;
		do {
			n -= 8;
			OutPutByte(coder, (int) coder.toUnsignedInteger(coder.getEncodeFlow() >> n));
		} while (!(n <= 0));
	}

	// the main function
	public byte[] getWJLHA(byte[] InBytesBuFF, int HashValueBytesArray_Length) {
		int i = 0, j = 0;
		WJLCoder wjlha = getWJLCoder();

		wjlha.setEncodeOutBufferArray(HashValueBytesArray_Length);
		wjlha.setHashValueOutByteLength(HashValueBytesArray_Length);
		wjlha.setInBytesBuFFLength(InBytesBuFF.length);
		// Hash Algorithm for Weighted Probability Model
		GetJieLinCoe(wjlha);
		// Encode each bits
		for (i = 0; i < wjlha.getInBytesBuFFLength(); ++i) {
			for (j = 7; j >= 0; --j) {
				Encode(wjlha, (byte)((InBytesBuFF[i] >> j) & 0x01));
			}
		}
		if(wjlha.getInBytesBuFFLength() <= HashValueBytesArray_Length) {
			for (i = 0; i < wjlha.getInBytesBuFFLength() + 1; ++i) {
				for (j = 0; j < 8; ++j) {
					Encode(wjlha, (byte)(0x01));
				}
			}
		}
		FinishEncode(wjlha);
		return wjlha.getEncodeOutBufferArray();
	}
}
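To get a feel for the coefficient set in GetJieLinCoe, the formula 2^(1 - hashBytes / inputBytes) can be evaluated on its own. The sketch below assumes a 6-byte MAC address, so the demo input ("Pay", "Order", 4-byte head time) is 18 bytes; the class name `JieLinCoefficient` is hypothetical and only the formula itself is taken from the source above.

```java
// Hypothetical helper (not from the original source): evaluates the GetJieLinCoe formula.
final class JieLinCoefficient {
    // Same expression as GetJieLinCoe: 2^(1 - hashBytes / inputBytes).
    static double coefficient(int hashBytes, int inputBytes) {
        return Math.pow(2.0, 1.0 - ((double) hashBytes / (double) inputBytes));
    }

    public static void main(String[] args) {
        // Assumption: a 6-byte MAC, plus "Pay" (3), "Order" (5), and a 4-byte head time.
        int inputBytes = 6 + "Pay".length() + "Order".length() + 4;
        double c = coefficient(3, inputBytes);
        // Encode multiplies the interval length by 0.5 * coefficient for every input bit.
        System.out.printf("input=%d bytes, coefficient=%.4f, per-bit range factor=%.4f%n",
                inputBytes, c, 0.5 * c);
    }
}
```

The coefficient approaches 2 as the input grows relative to the hash length, and equals 1 when the two lengths are the same.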

V. Forest Algorithm experimental results
Running main produces the following 64-bit IDs:
-6721494604282226687
-6721494604281655294
-6721494604281117693
-6721494604280580092
-6721494604280043515
-6721494604279504890
-6721494604278971385
-6721494604278437880
-6721494604277902327
-6721494604277362678
-6721494604276825077
-6721494604276278260
-6721494604275738611
-6721494604275204082
-6721494604274668529
-6721494604274134000
-6721494604273597423
-6721494604273036270
-6721494604272504813
Changing Order to Notifciation then yields the following IDs:
-5353051228372509695
-5353051228371942398
-5353051228371407869
-5353051228370879484
-5353051228370349051
-5353051228369820666
-5353051228369291257
-5353051228368763896
-5353051228368232439
-5353051228367704054
-5353051228367178741
-5353051228366650356
-5353051228366121971
-5353051228365592562
-5353051228365063153
-5353051228364535792
-5353051228364007407
-5353051228363474926
-5353051228362941421
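The listed IDs can be split back into their three fields to check the layout. The sketch below copies the first, second, and last ID of each run from the output above; the class name `SampleIdCheck` and its methods are hypothetical and only apply the 24/30/10 bit split.

```java
// Hypothetical helper (not from the original source): decodes sample IDs from the output above.
final class SampleIdCheck {
    // First, second, and last ID of the run with table name "Order".
    static final long[] ORDER_RUN = {
        -6721494604282226687L, -6721494604281655294L, -6721494604272504813L };
    // First, second, and last ID of the run with table name "Notifciation".
    static final long[] NOTIFICATION_RUN = {
        -5353051228372509695L, -5353051228371942398L, -5353051228362941421L };

    // Bits 40..63: hash field.
    static int hashOf(long id) {
        return (int) (id >>> 40);
    }

    // Bits 10..39: low 30 bits of the millisecond timestamp.
    static long timeOf(long id) {
        return (id >>> 10) & 0x3FFFFFFFL;
    }

    // Bits 0..9: sequence number.
    static int sequenceOf(long id) {
        return (int) (id & 0x3FF);
    }

    public static void main(String[] args) {
        for (long[] run : new long[][] { ORDER_RUN, NOTIFICATION_RUN }) {
            for (long id : run) {
                System.out.printf("%d -> hash=%06X time30=%d seq=%d%n",
                        id, hashOf(id), timeOf(id), sequenceOf(id));
            }
        }
    }
}
```

Within the first run the hash field is 0xA2B874 for all three IDs and the sequence fields are 1, 2, and 19, matching the loop index in main. The second run carries a different hash field, which is the effect of changing the table name.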


Reposted from blog.csdn.net/wjlxueshu/article/details/122467495