premise
Recently, it has been found that various channels have recommended a lot of ULID
related articles. Here ULID
, the specification files are interpreted and implemented based on the Java
language , and the underlying principles ULID
are displayed through this implementation process .ULID
The background of the emergence of ULID
ULID
The full name is Universally Unique Lexicographically Sortable Identifier
, the literal translation is the universally unique lexicographically sorted identifier . Its original repository is https://github.com/ulid/javascript
that the project was initiated by the front-end developer alizain and written based on the JavaScript
language. From the commit
history of the project, it has been more than a 5
year, and the theory has been fully verified in practice . ULID
The reason for this is that some developers think that the mainstream UUID
solution may not be optimal in many scenarios. There are the following reasons:
UUID
Not the most efficient implementation of128 bit
random encoding (128 bit
generating strings from random numbers through encoding)UUID
implementation is impractical inv1/v2
many environments, since implementations of both versions require access to unique, stableMAC
addressesUUID
Thev3/v5
implementation requires unique seeds and produces random distributionsID
, which can lead to fragmentation in many data structuresUUID
No need tov4
provide additional information other than randomness, which can lead to fragmentation in many data structures
To summarize here, it is unrealistic UUID
for the v1/v2
implementation to rely on a unique stable address, and the implementation will be "fragmented" due to randomness .MAC
v3/v4/v5
ID
Based on this proposed ULID
, it works like this:
ulid() // 01ARZ3NDEKTSV4RRFFQ69G5FAV
复制代码
ULID
The features are as follows:
- 设计为
128 bit
大小,与UUID
兼容 - 每毫秒生成
1.21e+24
个唯一的ULID
(高性能) - 按字典顺序(字母顺序)排序
- 标准编码为
26
个字符的字符串,而不是像UUID
那样需要36
个字符 - 使用
Crockford
的base32
算法来提高效率和可读性(每个字符5 bit
) - 不区分大小写
- 没有特殊字符串(
URL
安全,不需要进行二次URL
编码) - 单调排序(正确地检测并处理相同的毫秒,所谓单调性,就是毫秒数相同的情况下,能够确保新的
ULID
随机部分的在最低有效位上加1
位)
ULID规范
下面的ULID
规范在ULID/javascript
类库中实现,此二进制格式目前没有在JavaScript
中实现:
01AN4Z07BY 79KA1307SR9X4MV3
|----------| |----------------|
Timestamp Randomness
48bits 80bits
复制代码
组成
时间戳(Timestamp
)
- 占据
48 bit
(high
) - 本质是
UNIX-time
,单位为毫秒 - 直到公元
10889
年才会用完
随机数(Randomness
)
- 占据
80 bit
(low
) - 如果可能的话,使用加密安全的随机源
排序
"最左边"的字符必须排在最前面,"最右边"的字符排在最后(词法顺序,或者俗称的字典排序),并且所有字符必须使用默认的ASCII
字符集。在相同的毫秒(时间戳)内,无法保证排序顺序。
规范的表示形式
ULID
规范的字符串表示形式如下:
ttttttttttrrrrrrrrrrrrrrrr
where
t is Timestamp (10 characters)
r is Randomness (16 characters)
复制代码
也就是:
- 时间戳占据高(左边)
10
个(编码后的)字符 - 随机数占据低(右边)
16
个(编码后的)字符
ULID
规范的字符串表示形式的长度是确定的,共占据26
个字符。
编码
使用Crockford Base32
编码算法,这个编码算法的字母表如下:
0123456789ABCDEFGHJKMNPQRSTVWXYZ
复制代码
该字母表排除了I
、 L
、O
、U
字母,目的是避免混淆和滥用。此算法实现不难,它的官网有详细的算法说明(见https://www.crockford.com/base32.html
):
单调性
(如果启用了单调性这个特性为前提下)当在相同的毫秒内生成多个ULID
时,可以保证排序的顺序。也就是说,如果检测到相同的毫秒,则随机分量在最低有效位上加1
位(带进位)。例如:
monotonicUlid() // 01BX5ZZKBKACTAV9WEVGEMMVRZ
monotonicUlid() // 01BX5ZZKBKACTAV9WEVGEMMVS0
复制代码
溢出错误处理
从技术实现上来看,26
个字符的Base32
编码字符串可以包含130 bit
信息,而ULID
只包含128 bit
信息,所以该编码算法是能完全满足ULID
的需要。基于Base32
编码能够生成的最大的合法ULID
其实就是7ZZZZZZZZZZZZZZZZZZZZZZZZZ
,并且使用的时间戳为epoch time
的281474976710655
或者说2 ^ 48 - 1
。对于任何对大于此值的ULID
进行解码或编码的尝试都应该被所有实现拒绝,以防止溢出错误。
二进制布局
二进制布局的多个部分被编码为16 byte
,每个部分都以最高字节优先(网络字节序,也就是big-endian
)进行编码,布局如下:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 32_bit_uint_time_high |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 16_bit_uint_time_low | 16_bit_uint_random |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 32_bit_uint_random |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 32_bit_uint_random |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
复制代码
ULID使用
对于script
标签引用:
<script src="https://unpkg.com/ulid@latest/dist/index.umd.js"></script>
<script>
ULID.ulid()
</script>
复制代码
NPM
安装:
npm install --save ulid
复制代码
TypeScript
, ES6+
, Babel
, Webpack
, Rollup
等等下使用:
// import
import { ulid } from 'ulid'
ulid()
// CommonJS env
const ULID = require('ulid')
ULID.ulid()
复制代码
后端Maven
项目中使用需要引入依赖,这里选用ulid-creator
实现:
<dependency>
<groupId>com.github.f4b6a3</groupId>
<artifactId>ulid-creator</artifactId>
<version>5.0.2</version>
</dependency>
复制代码
然后调用UlidCreator#getUlid()
系列方法:
// 常规
Ulid ulid = UlidCreator.getUlid();
// 单调排序
Ulid ulid = UlidCreator.getMonotonicUlid();
复制代码
实现ULID
前面已经提到ULID
的规范,其实具体实现ULID
就是对着规范里面的每一个小节进行编码实现。先看二进制布局,由于使用128 bit
去存储,可以借鉴UUID
那样,使用两个long
类似的成员变量存储ULID
的信息,看起来像这样:
public final class ULID {
/*
* The most significant 64 bits of this ULID.
*
*/
private final long msb;
/*
* The least significant 64 bits of this ULID.
*
*/
private final long lsb;
public ULID(long msb, long lsb) {
this.msb = msb;
this.lsb = lsb;
}
}
复制代码
按照ULID
的组成来看,可以提供一个入参为时间戳和随机数字节数组的构造:
public ULID(long timestamp, byte[] randomness) {
if ((timestamp & TIMESTAMP_MASK) != 0) {
throw new IllegalArgumentException("Invalid timestamp");
}
if (Objects.isNull(randomness) || RANDOMNESS_BYTE_LEN != randomness.length) {
throw new IllegalArgumentException("Invalid randomness");
}
long msb = 0;
long lsb = 0;
// 时间戳左移16位,低位补零准备填入部分随机数位,即16_bit_uint_random
msb |= timestamp << 16;
// randomness[0]左移0位填充到16_bit_uint_random的高8位,randomness[1]填充到16_bit_uint_random的低8位
msb |= (long) (randomness[0x0] & 0xff) << 8;
// randomness[1]填充到16_bit_uint_random的低8位
msb |= randomness[0x1] & 0xff;
// randomness[2] ~ randomness[9]填充到剩余的bit_uint_random中,要左移相应的位
lsb |= (long) (randomness[0x2] & 0xff) << 56;
lsb |= (long) (randomness[0x3] & 0xff) << 48;
lsb |= (long) (randomness[0x4] & 0xff) << 40;
lsb |= (long) (randomness[0x5] & 0xff) << 32;
lsb |= (long) (randomness[0x6] & 0xff) << 24;
lsb |= (long) (randomness[0x7] & 0xff) << 16;
lsb |= (long) (randomness[0x8] & 0xff) << 8;
lsb |= (randomness[0x9] & 0xff);
this.msb = msb;
this.lsb = lsb;
}
复制代码
这是完全按照规范的二进制布局编写代码,可以像UUID
的构造那样精简一下:
long msb = 0;
long lsb = 0;
byte[] data = new byte[16];
byte[] ts = ByteBuffer.allocate(8).putLong(0, timestamp << 16).array();
System.arraycopy(ts, 0, data, 0, 6);
System.arraycopy(randomness, 0, data, 6, 10);
for (int i = 0; i < 8; i++)
msb = (msb << 8) | (data[i] & 0xff);
for (int i = 8; i < 16; i++)
lsb = (lsb << 8) | (data[i] & 0xff);
复制代码
接着可以简单添加下面几个方法:
public long getMostSignificantBits() {
return this.msb;
}
public long getLeastSignificantBits() {
return this.lsb;
}
// 静态工厂方法,由UUID实例生成ULID实例
public static ULID fromUUID(UUID uuid) {
return new ULID(uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
}
// 实例方法,当前ULID实例转换为UUID实例
public UUID toUUID() {
return new UUID(this.msb, this.lsb);
}
复制代码
接着需要覆盖toString()
方法,这是ULID
的核心方法,需要通过Crockford Base32
编码生成规范的字符串表示形式。由于128 bit
要映射为26 char
,这里可以考虑分三段进行映射,也就是48 bit
时间戳映射为10 char
,剩下的两部分随机数分别做40 bit
到8 char
的映射,加起来就是26 char
:
|----------| |----------------|
Timestamp Randomness[split to 2 part]
48bit => 10char 80bit => 16char
复制代码
编写方法:
/**
* Default alphabet of ULID
*/
private static final char[] DEFAULT_ALPHABET = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C',
'D', 'E', 'F', 'G', 'H', 'J', 'K', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Y', 'Z'};
/**
* Default alphabet mask
*/
private static final int DEFAULT_ALPHABET_MASK = 0b11111;
/**
* Character num of ULID
*/
private static final int ULID_CHAR_LEN = 0x1a;
@Override
public String toString() {
return toCanonicalString(DEFAULT_ALPHABET);
}
public String toCanonicalString(char[] alphabet) {
char[] chars = new char[ULID_CHAR_LEN];
long timestamp = this.msb >> 16;
// 第一部分随机数取msb的低16位+lsb的高24位,这里(msb & 0xffff) << 24作为第一部分随机数的高16位,所以要左移24位
long randMost = ((this.msb & 0xffffL) << 24) | (this.lsb >>> 40);
// 第二部分随机数取lsb的低40位,0xffffffffffL是2^40-1
long randLeast = (this.lsb & 0xffffffffffL);
// 接着每个部分的偏移量和DEFAULT_ALPHABET_MASK(31)做一次或运算就行,就是char[index] = alphabet[(part >> (step * index)) & 31]
chars[0x00] = alphabet[(int) (timestamp >>> 45 & DEFAULT_ALPHABET_MASK)];
chars[0x01] = alphabet[(int) (timestamp >>> 40 & DEFAULT_ALPHABET_MASK)];
chars[0x02] = alphabet[(int) (timestamp >>> 35 & DEFAULT_ALPHABET_MASK)];
chars[0x03] = alphabet[(int) (timestamp >>> 30 & DEFAULT_ALPHABET_MASK)];
chars[0x04] = alphabet[(int) (timestamp >>> 25 & DEFAULT_ALPHABET_MASK)];
chars[0x05] = alphabet[(int) (timestamp >>> 20 & DEFAULT_ALPHABET_MASK)];
chars[0x06] = alphabet[(int) (timestamp >>> 15 & DEFAULT_ALPHABET_MASK)];
chars[0x07] = alphabet[(int) (timestamp >>> 10 & DEFAULT_ALPHABET_MASK)];
chars[0x08] = alphabet[(int) (timestamp >>> 5 & DEFAULT_ALPHABET_MASK)];
chars[0x09] = alphabet[(int) (timestamp & DEFAULT_ALPHABET_MASK)];
chars[0x0a] = alphabet[(int) (randMost >>> 35 & DEFAULT_ALPHABET_MASK)];
chars[0x0b] = alphabet[(int) (randMost >>> 30 & DEFAULT_ALPHABET_MASK)];
chars[0x0c] = alphabet[(int) (randMost >>> 25 & DEFAULT_ALPHABET_MASK)];
chars[0x0d] = alphabet[(int) (randMost >>> 20 & DEFAULT_ALPHABET_MASK)];
chars[0x0e] = alphabet[(int) (randMost >>> 15 & DEFAULT_ALPHABET_MASK)];
chars[0x0f] = alphabet[(int) (randMost >>> 10 & DEFAULT_ALPHABET_MASK)];
chars[0x10] = alphabet[(int) (randMost >>> 5 & DEFAULT_ALPHABET_MASK)];
chars[0x11] = alphabet[(int) (randMost & DEFAULT_ALPHABET_MASK)];
chars[0x12] = alphabet[(int) (randLeast >>> 35 & DEFAULT_ALPHABET_MASK)];
chars[0x13] = alphabet[(int) (randLeast >>> 30 & DEFAULT_ALPHABET_MASK)];
chars[0x14] = alphabet[(int) (randLeast >>> 25 & DEFAULT_ALPHABET_MASK)];
chars[0x15] = alphabet[(int) (randLeast >>> 20 & DEFAULT_ALPHABET_MASK)];
chars[0x16] = alphabet[(int) (randLeast >>> 15 & DEFAULT_ALPHABET_MASK)];
chars[0x17] = alphabet[(int) (randLeast >>> 10 & DEFAULT_ALPHABET_MASK)];
chars[0x18] = alphabet[(int) (randLeast >>> 5 & DEFAULT_ALPHABET_MASK)];
chars[0x19] = alphabet[(int) (randLeast & DEFAULT_ALPHABET_MASK)];
return new String(chars);
}
复制代码
上面的方法toCanonicalString()
看起来很"臃肿",但是能保证性能比较高。接着添加常用的工厂方法:
public static ULID ulid() {
return ulid(System::currentTimeMillis, len -> {
byte[] bytes = new byte[len];
ThreadLocalRandom.current().nextBytes(bytes);
return bytes;
});
}
public static ULID ulid(Supplier<Long> timestampProvider,
IntFunction<byte[]> randomnessProvider) {
return new ULID(timestampProvider.get(), randomnessProvider.apply(RANDOMNESS_BYTE_LEN));
}
复制代码
The default is to ThreadLocalRandom
generate random numbers. If it is the JDK17
above, you can also choose a new PRNG
implementation with higher performance. The corresponding interface is RandomGenerator
, and the default implementation is L32X64MixRandom
. Write a main
method to run it:
public static void main(String[] args) {
System.out.println(ULID.ulid());
}
// 某次执行结果
01GFGGMBFGB5WKXBN7S84ATRDG
复制代码
Finally, to implement the "monotonically increasing" ULID
construction, first provide a "growth" method:
/**
* The least significant 64 bits increase overflow, 0xffffffffffffffffL + 1
*/
private static final long OVERFLOW = 0x0000000000000000L;
public ULID increment() {
long msb = this.msb;
long lsb = this.lsb + 1;
if (lsb == OVERFLOW) {
msb += 1;
}
return new ULID(msb, lsb);
}
复制代码
In fact, it is the low-order addition 1
, and the high-order addition after overflow 1
. Then add a static inner subclass and the response method as follows:
// 构造函数
public ULID(ULID other) {
this.msb = other.msb;
this.lsb = other.lsb;
}
public static byte[] defaultRandomBytes(int len) {
byte[] bytes = new byte[len];
ThreadLocalRandom.current().nextBytes(bytes);
return bytes;
}
public static MonotonicULIDSpi monotonicUlid() {
return monotonicUlid(System::currentTimeMillis, ULID::defaultRandomBytes);
}
public static MonotonicULIDSpi monotonicUlid(Supplier<Long> timestampProvider,
IntFunction<byte[]> randomnessProvider) {
return new MonotonicULID(timestampProvider, randomnessProvider, timestampProvider.get(),
randomnessProvider.apply(RANDOMNESS_BYTE_LEN));
}
// @SPI MonotonicULID
public interface MonotonicULIDSpi {
ULID next();
}
private static class MonotonicULID extends ULID implements MonotonicULIDSpi {
@Serial
private static final long serialVersionUID = -9158161806889605101L;
private volatile ULID lastULID;
private final Supplier<Long> timestampProvider;
private final IntFunction<byte[]> randomnessProvider;
public MonotonicULID(Supplier<Long> timestampProvider,
IntFunction<byte[]> randomnessProvider,
long timestamp,
byte[] randomness) {
super(timestamp, randomness);
this.timestampProvider = timestampProvider;
this.randomnessProvider = randomnessProvider;
this.lastULID = new ULID(timestamp, randomness);
}
// 这里没设计好,子类缓存了上一个节点,需要重写一下increment方法,父类可以移除此方法
@Override
public ULID increment() {
long newMsb = lastULID.msb;
long newLsb = lastULID.lsb + 1;
if (newLsb == OVERFLOW) {
newMsb += 1;
}
return new ULID(newMsb, newLsb);
}
@Override
public synchronized ULID next() {
long lastTimestamp = lastULID.getTimestamp();
long timestamp = getTimestamp();
// 这里做了一个恒为true的判断,后面再研读其他代码进行修改
if (lastTimestamp >= timestamp || timestamp - lastTimestamp <= 1000) {
this.lastULID = this.increment();
} else {
this.lastULID = new ULID(timestampProvider.get(), randomnessProvider.apply(RANDOMNESS_BYTE_LEN));
}
return new ULID(this.lastULID);
}
}
复制代码
Regarding the judgment of the timestamp between the previous ULID
and the next ULID
, the detailed implementation is not seen from the specification file. First, simply make a permanent true
branch judgment, and then further study and modify it later. It is used as follows:
public static void main(String[] args) {
MonotonicULIDSpi spi = ULID.monotonicUlid();
System.out.println(spi.next());
System.out.println(spi.next());
System.out.println(spi.next());
System.out.println(spi.next());
}
// 某次运行输出
01GFGASXXQXD5ZJ26PKSCFGNPF
01GFGASXXQXD5ZJ26PKSCFGNPG
01GFGASXXQXD5ZJ26PKSCFGNPH
01GFGASXXQXD5ZJ26PKSCFGNPJ
复制代码
In order to be more flexible here, instead of using a global static property to cache the previous ULID
instance, it is simply implemented by inheritance.
ULID performance evaluation
The introduction JMH
did a simple performance test, the code is as follows:
@Fork(1)
@Threads(10)
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 1, time = 1)
@Measurement(iterations = 5, time = 3)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class BenchmarkRunner {
private static ULID.MonotonicULIDSpi SPI;
@Setup
public void setup() {
SPI = ULID.monotonicUlid();
}
@Benchmark
public UUID createUUID() {
return UUID.randomUUID();
}
@Benchmark
public String createUUIDToString() {
return UUID.randomUUID().toString();
}
@Benchmark
public ULID createULID() {
return ULID.ulid();
}
@Benchmark
public String createULIDToString() {
return ULID.ulid().toString();
}
@Benchmark
public ULID createMonotonicULID() {
return SPI.next();
}
@Benchmark
public String createMonotonicULIDToString() {
return SPI.next().toString();
}
public static void main(String[] args) throws Exception {
new Runner(new OptionsBuilder().build()).run();
}
}
复制代码
A test report is as follows (development environment Intel 6700K 4C8T 32G
, use OpenJDK-19
):
Benchmark Mode Cnt Score Error Units
BenchmarkRunner.createMonotonicULID thrpt 5 20335.118 ± 1656.772 ops/ms
BenchmarkRunner.createMonotonicULIDToString thrpt 5 13091.975 ± 1207.403 ops/ms
BenchmarkRunner.createULID thrpt 5 152574.703 ± 23740.021 ops/ms
BenchmarkRunner.createULIDToString thrpt 5 51559.800 ± 3608.085 ops/ms
BenchmarkRunner.createUUID thrpt 5 819.890 ± 15.508 ops/ms
BenchmarkRunner.createUUIDToString thrpt 5 786.072 ± 44.770 ops/ms
复制代码
summary
This article interprets the specifications, and ULID
implements them through specifications and reference to existing class libraries . It is suitable for some scenarios where "sort ID" is generated or requires "monotone ID" generation. It can be considered for database key design, sequence number design and other scenarios. In terms of implementation, it will perform better (especially monotonic , since there is no need to re-fetch the random number part, the throughput will be improved by an order of magnitude).ULID
Java
ULID
UUID
ULID
Demo
Project repository:
framework-mesh/ulid4j
:https://github.com/zjcscut/framework-mesh/tree/master/ulid4j
References: