Hardcore - Evolution and Thinking of Java Random Number-related APIs (Part 1)

This series describes in detail the random number APIs before Java 17 and the unified API introduced in Java 17. It also briefly analyzes the properties and implementation ideas of random number generators, to help you understand why there are so many random number algorithms and what their design ideas are.

The series is divided into two parts. The first part covers the evolution of Java's random number algorithms and the underlying principles and considerations, and then introduces the random algorithm APIs before Java 17 and their benchmark results. The second part analyzes in detail the random number generation algorithms, APIs and underlying implementation classes after Java 17, their properties, performance and usage scenarios, how to choose a random algorithm, and how Java's random numbers apply to some future features of Java.

This is the first part.

How to generate random numbers

When we use a random number generator, we usually treat it (a Pseudo Random Number Generator, PRNG) as a black box:

image

The output of this black box is generally a number; assume it is an int. This result can be converted into whatever type we want. For example, if we actually want a long, we can draw twice, using one result as the upper 32 bits and the other as the lower 32 bits to form a long (boolean, byte, short, char and so on work similarly: draw once and take some of the bits as the result). If we want a floating-point number, we can combine several random ints, keep as many bits as the IEEE 754 format has precision for, and turn them into the floating-point value.
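The conversions described above can be sketched as follows. This is only a minimal illustration, not the JDK source; the class and method names (BitsToTypes, toLong, toBoolean, toUnitDouble) are made up for the example.

```java
final class BitsToTypes {
    // Build a long from two 32-bit results: one supplies the upper 32 bits, the other the lower 32 bits.
    static long toLong(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    // Smaller types take only some of the bits, here the top bit for a boolean.
    static boolean toBoolean(int raw) {
        return (raw >>> 31) != 0;
    }

    // Keep the top 53 bits (the precision of a double) and scale them into [0, 1).
    static double toUnitDouble(long raw) {
        return (raw >>> 11) * 0x1.0p-53;
    }

    public static void main(String[] args) {
        java.util.Random source = new java.util.Random(42);
        long value = toLong(source.nextInt(), source.nextInt());
        System.out.println(value + " " + toBoolean(source.nextInt()) + " " + toUnitDouble(value));
    }
}
```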

If you want to limit the range, the easiest way is remainder plus offset. For example, to get a number in the range [1, 100), we take the remainder of the result modulo 99, take the absolute value, and then add 1. Since the remainder operation is relatively expensive, the simplest optimization is to AND the range size N with N - 1. If the result is 0, N is a power of 2 (the binary representation of a power of 2 looks like 100000; after subtracting 1 it is 011111, so the AND of the two must be 0), and taking the remainder modulo a power of 2 is equivalent to an AND with that power of 2 minus one. This is only a simple optimization; the actual optimizations are much more complicated than this.
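The remainder-plus-offset approach and the power-of-2 shortcut can be sketched like this. The class name BoundedInt and the method bounded are hypothetical, and the sketch assumes bound > origin. Note that a plain remainder is slightly biased whenever the range size does not divide 2^32, which is one reason real implementations are more involved.

```java
final class BoundedInt {
    // Maps a raw 32-bit result into [origin, bound) using remainder + offset.
    static int bounded(int raw, int origin, int bound) {
        int n = bound - origin;        // size of the range, assumed to be positive
        int r;
        if ((n & (n - 1)) == 0) {
            r = raw & (n - 1);         // n is a power of 2: an AND replaces the remainder
        } else {
            r = Math.abs(raw % n);     // |raw % n| is always smaller than n, so abs cannot overflow
        }
        return origin + r;
    }

    public static void main(String[] args) {
        java.util.Random source = new java.util.Random();
        System.out.println(bounded(source.nextInt(), 1, 100));  // some value in [1, 100)
        System.out.println(bounded(source.nextInt(), 0, 8));    // some value in [0, 8)
    }
}
```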

The black box is generally initialized with a SEED. This SEED can come from various sources, which we will come back to later. Let's first look at some of the algorithms inside the black box.

image

Linear Congruence Algorithm

The first is the most common random number algorithm: the Linear Congruential Generator (LCG). It multiplies the current seed by a coefficient A, adds an offset B, and finally takes the remainder modulo C (this keeps the result within a fixed range so that suitable values of A and B can be chosen; why this matters is discussed below). The result is the random number, and it also serves as the seed for the next step, namely:

X(n+1) = ( A * X(n) + B ) % C

The advantage of this algorithm is that it is simple to implement and performs relatively well. A and B must be chosen carefully so that every number in the range of C is equally likely to appear. As an extreme example, with A = 2, B = 2, C = 10, the odd numbers 1, 3, 5, 7 and 9 can never be produced. To make it feasible to find suitable A and B, C should be limited to a manageable range. For computational efficiency, C is generally a power of 2, so that the remainder operation can be optimized into an AND operation. Fortunately, mathematicians have already found such values (the so-called magic numbers), so we can use them directly.
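To see the effect of the parameters, here is a small sketch that collects the values a linear congruential generator produces. The parameters A = 2, B = 2, C = 10 are the extreme example from the text; A = 5, B = 3, C = 16 is an illustrative choice of my own that satisfies the well-known full-period conditions. The class and method names are hypothetical.

```java
import java.util.TreeSet;

final class LcgCoverage {
    // Values produced by X(n+1) = (A * X(n) + B) % C over C steps, starting from seed.
    static TreeSet<Long> produced(long a, long b, long c, long seed) {
        TreeSet<Long> seen = new TreeSet<>();
        long x = seed;
        for (long i = 0; i < c; i++) {
            x = (a * x + b) % c;
            seen.add(x);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(produced(2, 2, 10, 1));         // prints [0, 2, 4, 6]: no odd number ever appears
        System.out.println(produced(5, 3, 16, 0).size());  // prints 16: every value in [0, 16) appears
    }
}
```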

The random sequence generated by this algorithm is deterministic: the value after X is always Y, and the value after Y is always Z. It can be understood as a fixed ring.

The size of this ring is called the Period. Because the Period is large enough and the initial SEED generally differs each time, the output is approximately random. Things get more troublesome when we need multiple random number generators. Although we can make sure every generator starts from a different initial SEED, with this algorithm we cannot rule out that the initial SEED of one generator is the next SEED (or a SEED within very few steps) of another generator's initial SEED. For example, suppose one generator starts at X and another at Z. X and Z may look very different, yet in this algorithm's sequence they are separated only by Y. Generators that sit this close together overlap almost immediately and do not work well as independent generators.

So how can we make sure that different random number generators are far apart? We want a simple calculation that directly produces an initial SEED for another generator that is a large distance away from the current one, instead of actually stepping the generator 1,000,000 times to reach the state 1,000,000 steps ahead. This property is called jumpability. The Xoshiro algorithm, which is based on the linear feedback shift register algorithm, gives us a jumpable random number algorithm.

Linear Feedback Shift Register Algorithm

Linear feedback shift register (LFSR) refers to a shift register that, given the output of the previous state, reuses a linear function of that output as the input. The XOR operation is the most common single-bit linear function: XOR some bits of the register as input, and then shift each bit in the register as a whole.

Which bits to choose, however, takes real expertise. The more common implementations today are the XorShift algorithm and the Xoshiro family, which optimizes it further. Xoshiro is a relatively new random number algorithm with simple computation and excellent performance, and it also provides jumpability.

This algorithm is jumpable. Suppose we want two random number generators that are far apart: we can create one generator from a random initial SEED, and then use the algorithm's jump operation to directly produce a SEED a large distance away as the initial SEED of the other generator.
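Why a jump can be computed cheaply is easiest to see on a small generator. The sketch below is not Xoshiro: it uses a plain 32-bit xorshift step (shift amounts 13, 17, 5) as a stand-in, and represents one step as a linear map over bits, so that jumping n steps only needs repeated squaring of that map. Real Xoshiro implementations ship a precomputed jump for one fixed distance instead; all names here are hypothetical.

```java
final class XorShiftJump {
    // One step of a 32-bit xorshift generator; shifts and XORs are linear over bits.
    static int step(int x) {
        x ^= x << 13;
        x ^= x >>> 17;
        x ^= x << 5;
        return x;
    }

    // Applies a linear map stored as 32 columns: column i is the image of the state with only bit i set.
    static int apply(int[] map, int x) {
        int r = 0;
        for (int i = 0; i < 32; i++) {
            if (((x >>> i) & 1) != 0) r ^= map[i];
        }
        return r;
    }

    // Returns the map "first b, then a".
    static int[] compose(int[] a, int[] b) {
        int[] c = new int[32];
        for (int i = 0; i < 32; i++) c[i] = apply(a, b[i]);
        return c;
    }

    // Jumps `steps` positions ahead with O(log steps) compositions instead of `steps` calls to step().
    static int jump(int state, long steps) {
        int[] result = new int[32];
        int[] base = new int[32];
        for (int i = 0; i < 32; i++) {
            result[i] = 1 << i;        // identity map
            base[i] = step(1 << i);    // map for a single step
        }
        for (long n = steps; n > 0; n >>>= 1) {
            if ((n & 1) != 0) result = compose(base, result);
            base = compose(base, base);
        }
        return apply(result, state);
    }

    public static void main(String[] args) {
        int seed = 2463534;                    // must be non-zero for xorshift
        int farAway = jump(seed, 1L << 30);    // initial SEED for a second generator, 2^30 steps ahead
        System.out.println(Integer.toHexString(seed) + " -> " + Integer.toHexString(farAway));
    }
}
```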

image

Another interesting point is reversibility. As commonly implemented, the linear congruential algorithm only steps forward: we derive X(n + 1) from X(n), and there is no direct operation for deriving X(n) from X(n + 1) (going backward is possible in principle when A has a modular inverse modulo C, but it takes extra work). A business scenario for this operation is shuffle play of a playlist: to support previous song and next song we do not need to record the whole play history, since both can be derived from the current random number. The linear feedback shift register algorithm is reversible.
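As a rough illustration of reversibility, the sketch below steps a 32-bit xorshift generator forward and backward. It is a stand-in for the linear feedback shift register family, not Xoshiro itself, and the names are hypothetical. Each XOR-with-shift is undone by XORing in the repeatedly shifted value.

```java
final class XorShiftReverse {
    // Forward step ("next song").
    static int next(int x) {
        x ^= x << 13;
        x ^= x >>> 17;
        x ^= x << 5;
        return x;
    }

    // Undoes y = x ^ (x << k): x = y ^ (y << k) ^ (y << 2k) ^ ...
    static int undoLeft(int y, int k) {
        int x = y;
        for (int s = k; s < 32; s += k) x ^= y << s;
        return x;
    }

    // Undoes y = x ^ (x >>> k): x = y ^ (y >>> k) ^ (y >>> 2k) ^ ...
    static int undoRight(int y, int k) {
        int x = y;
        for (int s = k; s < 32; s += k) x ^= y >>> s;
        return x;
    }

    // Backward step ("previous song"): undo the three operations in reverse order.
    static int previous(int x) {
        x = undoLeft(x, 5);
        x = undoRight(x, 17);
        x = undoLeft(x, 13);
        return x;
    }

    public static void main(String[] args) {
        int current = 20240101;                // non-zero starting state
        int following = next(current);
        System.out.println(previous(following) == current);  // prints true
    }
}
```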

The linear feedback shift register algorithm also has a limitation when it comes to creating different random sequence generators: they all still come from the same ring. Even if different generators are separated by jump operations, if the load on them is unbalanced, over time their SEEDs may still end up at the same position. So is there a random algorithm that can produce different rings, that is, different random sequences?

DotMix algorithm

The DotMix algorithm offers another idea: given an initial SEED and a fixed step size M, each time a random number is needed, add the step size M to the SEED and pass the sum through a HASH function to map it to a hash value. Writing S for the SEED and X for the generated random number:

S(n+1) = S(n) + M
X(n+1) = HASH(S(n+1))

This algorithm places relatively high demands on the HASH function. The key requirement is that a small change in the input causes a large change in the output. The SplitMix algorithm, which is based on DotMix, uses a MurmurHash3-style mixing function, and it is the algorithm behind SplittableRandom.

The nice thing about this algorithm is that it is easy to see that two generators with different parameters produce different sequences. For example, one generates 1, 4, 3, 7, ... while the other generates 1, 5, 3, 2, ... This is exactly what the linear congruential algorithm cannot do: no matter how the SEED is changed, its sequence is fixed, and we cannot change A, B and C at will because, as mentioned earlier, the generator might then no longer traverse all numbers. The same goes for Xoshiro. With SplitMix there is no such worry: specifying a different SEED and a different step size M ensures that the generated sequences are different. This property of producing different sequences is called splittability.
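A minimal sketch of the idea (add a step size to the SEED, then hash) might look like this. It is not the source of SplittableRandom; the class name is hypothetical, and the mixing constants are the ones I recall from the public SplitMix64 reference code, so treat them as an assumption.

```java
final class SplitMixSketch {
    private long seed;
    private final long gamma;   // the step size M; kept odd so the SEED visits all 2^64 values

    SplitMixSketch(long seed, long gamma) {
        this.seed = seed;
        this.gamma = gamma | 1L;
    }

    // A 64-bit mixing function in the MurmurHash3 finalizer style (constants assumed, see above).
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    long nextLong() {
        seed += gamma;          // advance the SEED by the step size
        return mix64(seed);     // hash the SEED to get the output
    }

    public static void main(String[] args) {
        SplitMixSketch first = new SplitMixSketch(1L, 0x9e3779b97f4a7c15L);
        SplitMixSketch second = new SplitMixSketch(1L, 5L);   // same SEED, different step size
        System.out.println(first.nextLong() + " vs " + second.nextLong());
    }
}
```

Because every step of mix64 is a one-to-one mapping, two generators whose SEEDs differ at a given step are guaranteed to output different values at that step.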

image

This is also why SplittableRandom is more suitable for multithreading than Random (Random is based on linear congruence):

  • If multiple threads share one Random, the randomness of the sequence is guaranteed, but updating the seed with CompareAndSet costs performance under contention.
  • If each thread uses its own Random with the same SEED, every thread generates the same random sequence.
  • If each thread uses its own Random with a different SEED, we still cannot guarantee that the SEED of one Random is not the next result (or a result within very few steps) of another Random's SEED. In that case, if the load across threads is uneven (when the thread pool is relatively idle, only some threads are actually working, and their private Random instances are likely to reach the same SEED position as those of other threads), some threads will again produce the same random sequence.

With SplittableRandom, we only need to call the split method to hand each thread a SplittableRandom with different parameters, and different parameters basically guarantee that the same sequence will not be generated.
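One possible way to hand out generators is sketched below: the coordinating thread calls split() once per worker and passes the resulting instance to that worker. The class and method names are hypothetical, and a fixed seed is used only to make the output repeatable.

```java
import java.util.SplittableRandom;

final class SplitPerThread {
    // Splits one generator per worker on the calling thread, then lets each worker draw from its own copy.
    static int[] drawPerWorker(long seed, int workers) {
        SplittableRandom root = new SplittableRandom(seed);
        int[] results = new int[workers];
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            SplittableRandom own = root.split();   // split here, then hand the copy to the worker
            int slot = i;
            threads[i] = new Thread(() -> results[slot] = own.nextInt(1, 100));
            threads[i].start();
        }
        try {
            for (Thread t : threads) t.join();     // join makes the workers' writes visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(drawPerWorker(7L, 4)));
    }
}
```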

Think: How do we generate random sequences whose Period is greater than the capacity of generating numbers?

The simplest way is to merge two sequences whose Period equals the capacity by taking turns between them, which gives a sequence with Period = capacity + capacity:

image

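We can also take one result from each of the two sequences and combine them with some operation, such as XOR or a hash. In the best case this gives Period = capacity * capacity; more precisely, the combined period is the least common multiple of the two periods, so the two periods should share no common factor.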
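Both ways of combining sequences can be tried on toy data. In the sketch below, two period-3 sequences are interleaved, and a period-3 sequence is XORed with a period-4 sequence (periods chosen without a common factor). The sequences and all names are made up for illustration.

```java
import java.util.function.IntUnaryOperator;

final class CombinedPeriod {
    static final int[] A = {0, 1, 2};       // period 3
    static final int[] B = {4, 5, 6};       // period 3
    static final int[] C = {0, 4, 8, 12};   // period 4

    // Taking turns between A and B.
    static final IntUnaryOperator INTERLEAVED = i -> (i % 2 == 0) ? A[(i / 2) % 3] : B[(i / 2) % 3];
    // Combining one element of A and one element of C with XOR.
    static final IntUnaryOperator XORED = i -> A[i % 3] ^ C[i % 4];

    // Smallest p with seq(i) == seq(i + p) for every checked i, or -1 if there is none up to limit.
    static int period(IntUnaryOperator seq, int limit) {
        for (int p = 1; p <= limit; p++) {
            boolean repeats = true;
            for (int i = 0; i < 2 * limit && repeats; i++) {
                repeats = seq.applyAsInt(i) == seq.applyAsInt(i + p);
            }
            if (repeats) return p;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(period(INTERLEAVED, 20));  // prints 6  (3 + 3)
        System.out.println(period(XORED, 20));        // prints 12 (3 * 4)
    }
}
```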

If we want to extend the period even further, we can keep splicing sequences together in the ways above. By splicing sequences from different algorithms together with suitable operations, we can also combine the randomness advantages of each algorithm. The LXM algorithm introduced in Java 17 is an example.

LXM algorithm

This algorithm was introduced in Java 17. The LXM algorithm (L stands for linear congruential, X for an XOR-based generator such as Xoshiro, and M for a mixing function in the MurmurHash style) is relatively simple to implement: it combines the linear congruential algorithm with the Xoshiro algorithm and then hashes the combined result with the mixing function. For example:

  • L32X64M: one 32-bit number holds the linear congruential state, two 32-bit numbers hold the Xoshiro state, and the mixing function combines them into the output number.
  • L128X256M: two 64-bit numbers hold the linear congruential state, four 64-bit numbers hold the Xoshiro state, and the mixing function combines them into one 64-bit number.

The LXM algorithm achieves splittability through the mixing function, but it does not retain Xoshiro's jumpability.
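The structure described above (a linear congruential part, an XOR-based part, and a final mix) can be sketched as follows. This is not the JDK implementation: the state sizes (one 64-bit LCG word, two 64-bit XOR-based words), the multiplier, the rotation amounts and the mixing constant are all illustrative assumptions, and the class name is hypothetical.

```java
final class LxmSketch {
    private static final long MULTIPLIER = 0xd1342543de82ef95L;  // LCG multiplier (assumed)
    private final long addend;   // LCG addend, kept odd; a different addend gives a different sequence
    private long s;              // L: linear congruential state
    private long x0, x1;         // X: XOR-based state, must not be all zero

    LxmSketch(long addend, long s, long x0, long x1) {
        this.addend = addend | 1L;
        this.s = s;
        this.x0 = x0;
        this.x1 = (x0 | x1) == 0 ? 1L : x1;
    }

    // M: mixing function applied to the combined state (constant assumed).
    static long mix(long z) {
        z = (z ^ (z >>> 32)) * 0xdaba0b6eb09322e3L;
        z = (z ^ (z >>> 32)) * 0xdaba0b6eb09322e3L;
        return z ^ (z >>> 32);
    }

    long nextLong() {
        long result = mix(s + x0);       // combine both sub-generators, then hash
        s = MULTIPLIER * s + addend;     // advance the linear congruential part
        long q0 = x0, q1 = x1;           // advance the XOR-based part
        q1 ^= q0;
        q0 = Long.rotateLeft(q0, 24) ^ q1 ^ (q1 << 16);
        q1 = Long.rotateLeft(q1, 37);
        x0 = q0;
        x1 = q1;
        return result;
    }

    public static void main(String[] args) {
        LxmSketch generator = new LxmSketch(1L, 42L, 1L, 2L);
        System.out.println(generator.nextLong() + " " + generator.nextLong());
    }
}
```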

Source of SEED

Since every random algorithm in the JDK derives its next output from the previous state, a fixed SEED always produces the same random sequence. This is not suitable for security-sensitive scenarios. For a SEED to be cryptographically secure, it must be unpredictable and must produce non-deterministic output.

Linux collects system activity data such as user input and interrupts, generates random seeds from it and puts them into a pool. Programs can read this pool to obtain random numbers. But the pool is only filled after a certain amount of data has been collected, its size is limited, and its distribution is certainly not good enough, so we cannot use it directly as our source of random numbers; instead we use it as the seed for our random number generator. Linux exposes this pool as two files: /dev/random and /dev/urandom. The first only releases data once a certain amount of entropy has been collected and blocks otherwise; the second returns whatever data is available, regardless of whether enough has been collected.

Before Linux 4.8:

image

After Linux 4.8:

image

When the entropy pool runs low, file:/dev/random blocks, but file:/dev/urandom does not. For our purposes /dev/urandom is generally enough, so we usually switch to urandom to reduce blocking by setting the JVM startup parameter -Djava.security.egd=file:/dev/./urandom.

We can also use some business metrics to reset the SEED of all Random instances regularly, which makes cracking even harder. For example, every hour we can use (number of active users in the past hour * number of orders placed in the past hour) as the new SEED.

Testing Random Algorithms for Randomness

All of the algorithms above are pseudo-random, that is, each random result is determined by the previous state. In fact, basically all fast random algorithms today work this way.

Even if we keep the SEED secret, anyone who knows the algorithm may still be able to infer the next random output from the current one. And even when the algorithm is unknown, it can sometimes be deduced from a handful of random results, and the subsequent results along with it.

For this pseudo-random algorithm, it is necessary to verify that the random number generated by the algorithm satisfies some characteristics, such as:

  • The period should be as long as possible: a full cycle, or period, is the number of results generated while the random sequence traverses all possible random results and returns to the initial seed. This period should be as long as possible.
  • Equidistribution: within one Period, every possible result should appear as close to the same number of times as possible. Otherwise some business uses are affected; in a lottery, for example, we need the probabilities to be accurate.
  • Complexity test: whether the generated random sequence is complex enough that it contains no regular number patterns, such as arithmetic or geometric sequences.
  • Security test: it should be difficult to deduce the random algorithm from a relatively small number of results.
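A very rough equidistribution check can be written in a few lines: count how often each bounded result appears and compare the counts with the uniform expectation. This is only a toy sketch with hypothetical names; real test suites run far more demanding tests.

```java
import java.util.Random;

final class BucketCheck {
    // Counts how often each value in [0, buckets) appears in `draws` bounded results.
    static int[] histogram(Random generator, int buckets, int draws) {
        int[] counts = new int[buckets];
        for (int i = 0; i < draws; i++) counts[generator.nextInt(buckets)]++;
        return counts;
    }

    // Chi-square statistic against the uniform expectation; large values hint at a skewed distribution.
    static double chiSquare(int[] counts, int draws) {
        double expected = (double) draws / counts.length;
        double sum = 0;
        for (int c : counts) sum += (c - expected) * (c - expected) / expected;
        return sum;
    }

    public static void main(String[] args) {
        int draws = 100_000;
        int[] counts = histogram(new Random(1L), 10, draws);
        System.out.println(java.util.Arrays.toString(counts) + " chi-square = " + chiSquare(counts, draws));
    }
}
```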

At present, there are many framework tools used to test the random sequence generated by an algorithm, evaluate the results of the random sequence, and verify the randomness of the algorithm. Commonly used ones include:

The random algorithms built into Java pass most of the TestU01 tests. All of the optimized algorithms mentioned above still show some randomness problems to a greater or lesser degree. Currently, the LXM algorithm in Java 17 performs best in randomness tests. Note that this refers to the quality of the randomness, not to speed.

All random algorithms involved in Java (excluding SecureRandom)

image

Why we rarely consider random security in practical business applications

The main reason is that we generally deploy multiple load-balanced instances, each running multiple threads. Typically each thread uses its own Random instance with a different initial SEED (e.g. ThreadLocalRandom). In addition, a randomness-sensitive business such as a lottery generally limits the number of draws per user, so it is hard to collect enough results to deduce the algorithm and the next result, and an attacker's draws are interleaved with those of other users. Furthermore, we generally restrict the range of the random numbers instead of exposing the raw values, which makes reverse-engineering much harder. Finally, we can also reseed regularly from real-time business metrics, for example by using (number of active users * number of orders) of the past hour as the new SEED every hour.

Therefore, in ordinary business code we rarely use SecureRandom. If we want an initial SEED that cannot be guessed even by the programmer (a timestamp can be guessed), we can make the random classes take their initial SEED from SecureRandom with the JVM parameter -Djava.util.secureRandomSeed=true. This takes effect for the generators that read this property, such as SplittableRandom and ThreadLocalRandom; the default constructor of java.util.Random does not consult it.

Corresponding source code:

static {
        String sec = VM.getSavedProperty("java.util.secureRandomSeed");
        if (Boolean.parseBoolean(sec)) {
            // Take the initial SEED from SecureRandom
            // On Linux, SecureRandom's SEED source is /dev/random or /dev/urandom, as selected by the java.security.egd property mentioned earlier
            byte[] seedBytes = java.security.SecureRandom.getSeed(8);
            long s = (long)seedBytes[0] & 0xffL;
            for (int i = 1; i < 8; ++i)
                s = (s << 8) | ((long)seedBytes[i] & 0xffL);
            seeder.set(s);
        }
    }
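The same idea can be applied by hand when only one particular generator should get an unpredictable SEED. The sketch below folds 8 seed bytes into a long in the same way as the loop above and passes it to a SplittableRandom; the class and method names are hypothetical.

```java
import java.security.SecureRandom;
import java.util.SplittableRandom;

final class SecureSeed {
    // Packs 8 bytes into a long, most significant byte first.
    static long toLong(byte[] bytes) {
        long s = 0;
        for (int i = 0; i < 8; i++) {
            s = (s << 8) | (bytes[i] & 0xffL);
        }
        return s;
    }

    // Creates a fast generator whose initial SEED comes from the system's secure seed source.
    static SplittableRandom securelySeeded() {
        return new SplittableRandom(toLong(SecureRandom.getSeed(8)));
    }

    public static void main(String[] args) {
        System.out.println(securelySeeded().nextInt(1, 100));
    }
}
```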

Therefore, for our business we generally only care about the algorithm's performance and the uniformity of its randomness. Algorithms that have passed the tests generally have no major randomness problems, so in practice we mainly care about performance.

For security-sensitive uses, such as generating cryptographic random values for SSL encryption, stronger randomness guarantees are needed. In that case, consider using SecureRandom. The random algorithms in SecureRandom implementations are more complex and build on cryptographic techniques. We will not focus on these SecureRandom algorithms here.

How to generate random numbers and the corresponding random algorithm before Java 17

First, here is the mapping between the algorithms and their implementation classes:

image

Using the JDK's API

1. Using java.util.Random and the APIs based on it:

Random random = new Random();
random.nextInt();

Under the hood, Math.random() is also based on Random:

java.lang.Math

public static double random() {
    return RandomNumberGeneratorHolder.randomNumberGenerator.nextDouble();
}
private static final class RandomNumberGeneratorHolder {
    static final Random randomNumberGenerator = new Random();
}

Random is thread-safe by design, because the SEED is held in an AtomicLong and each random call simply updates it with CAS:

java.util.Random

protected int next(int bits) {
    long oldseed, nextseed;
    AtomicLong seed = this.seed;
    do {
        oldseed = seed.get();
        nextseed = (oldseed * multiplier + addend) & mask;
    } while (!seed.compareAndSet(oldseed, nextseed));
    return (int)(nextseed >>> (48 - bits));
}

At the same time, it can be seen that Random is based on the linear congruence algorithm.
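To make the linear congruential structure concrete, here is a single-threaded sketch that reproduces the sequence of java.util.Random.nextInt() for a given seed, using the multiplier, addend and 48-bit mask documented for that class. The class name RandomLcg is hypothetical, and the CAS loop is deliberately left out.

```java
import java.util.Random;

final class RandomLcg {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;   // C = 2^48, so the remainder becomes an AND

    private long seed;

    RandomLcg(long seed) {
        this.seed = (seed ^ MULTIPLIER) & MASK;        // same initial scrambling as java.util.Random
    }

    int nextInt() {
        seed = (seed * MULTIPLIER + ADDEND) & MASK;    // X(n+1) = (A * X(n) + B) % C
        return (int) (seed >>> 16);                    // keep the upper 32 of the 48 bits
    }

    public static void main(String[] args) {
        RandomLcg mine = new RandomLcg(42L);
        Random jdk = new Random(42L);
        System.out.println(mine.nextInt() == jdk.nextInt());  // prints true
    }
}
```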

2. Using java.util.SplittableRandom and the APIs based on it:

SplittableRandom splittableRandom = new SplittableRandom();
splittableRandom.nextInt();

As analyzed earlier, SplittableRandom is implemented with the SplitMix algorithm: given an initial SEED and a fixed step size M, each random call adds the step size M to the SEED and passes the result through a HASH function (here a MurmurHash3-style mix) to map it to a hash value.

SplittableRandom itself is not thread-safe (java.util.SplittableRandom):

public int nextInt() {
    return mix32(nextSeed());
}   
private long nextSeed() {
    // not thread-safe here
    return seed += gamma;
}

ThreadLocalRandom is implemented based on SplittableRandom, and it is what we use in a multithreaded environment:

ThreadLocalRandom.current().nextInt();

The split method of SplittableRandom returns a new SplittableRandom with different parameters and a very different random sequence. We can hand these to different threads to generate random numbers, which is very common in parallel Streams:

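// usersService and splittableRandom are assumed to be defined elsewhere
IntStream.range(0, 1000)
    .parallel()
    // mapToObj is required here: IntStream.map only accepts int -> int functions
    .mapToObj(index -> usersService.getUsersByGood(index))
    .map(users -> users.get(splittableRandom.split().nextInt(users.size())))
    .collect(Collectors.toList());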

However, because it lacks alignment padding and other multithreading optimizations, SplittableRandom still performs worse in a multithreaded environment than ThreadLocalRandom, which is based on it.

3. Using java.security.SecureRandom to generate more secure random numbers:

SecureRandom drbg = SecureRandom.getInstance("DRBG");
drbg.nextInt();

These algorithms are generally built on cryptographic algorithms, so the computation is more complex and the performance is relatively poor. They are used only by services that are very security-sensitive; ordinary business (such as a lottery) does not use them.

Test performance

Single thread test:

Benchmark                                      Mode  Cnt          Score          Error  Units
TestRandom.testDRBGSecureRandomInt            thrpt   50     940907.223 ±    11505.342  ops/s
TestRandom.testDRBGSecureRandomIntWithBound   thrpt   50     992789.814 ±    71312.127  ops/s
TestRandom.testRandomInt                      thrpt   50  106491372.544 ±  8881505.674  ops/s
TestRandom.testRandomIntWithBound             thrpt   50   99009878.690 ±  9411874.862  ops/s
TestRandom.testSplittableRandomInt            thrpt   50  295631145.320 ± 82211818.950  ops/s
TestRandom.testSplittableRandomIntWithBound   thrpt   50  190550282.857 ± 17108994.427  ops/s
TestRandom.testThreadLocalRandomInt           thrpt   50  264264886.637 ± 67311258.237  ops/s
TestRandom.testThreadLocalRandomIntWithBound  thrpt   50  162884175.411 ± 12127863.560  ops/s

Multithreaded test:

Benchmark                                      Mode  Cnt          Score           Error  Units
TestRandom.testDRBGSecureRandomInt            thrpt   50    2492896.096 ±     19410.632  ops/s
TestRandom.testDRBGSecureRandomIntWithBound   thrpt   50    2478206.361 ±    111106.563  ops/s
TestRandom.testRandomInt                      thrpt   50  345345082.968 ±  21717020.450  ops/s
TestRandom.testRandomIntWithBound             thrpt   50  300777199.608 ±  17577234.117  ops/s
TestRandom.testSplittableRandomInt            thrpt   50  465579146.155 ±  25901118.711  ops/s
TestRandom.testSplittableRandomIntWithBound   thrpt   50  344833166.641 ±  30676425.124  ops/s
TestRandom.testThreadLocalRandomInt           thrpt   50  647483039.493 ± 120906932.951  ops/s
TestRandom.testThreadLocalRandomIntWithBound  thrpt   50  467680021.387 ±  82625535.510  ops/s

The results are basically what we expected: ThreadLocalRandom performs best in the multithreaded environment. In the single-threaded environment, SplittableRandom and ThreadLocalRandom are close to each other and outperform the rest. SecureRandom is hundreds of times slower than the others.

The test code is as follows (note that although Random and SecureRandom are both thread-safe, ThreadLocal is still used to avoid excessive performance degradation caused by compareAndSet):

package prng;

import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Random;
import java.util.SplittableRandom;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

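// The metric under test is throughput
@BenchmarkMode(Mode.Throughput)
// Warm-up is needed to exclude the effects of JIT compilation and JVM profiling; each iteration loops many times, so one warm-up iteration is enough
@Warmup(iterations = 1)
// Number of threads
@Threads(10)
@Fork(1)
// Number of measurement iterations; we measure 50 times
@Measurement(iterations = 50)
// Defines the lifecycle of the state instance: all benchmark threads share one instance
@State(value = Scope.Benchmark)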
public class TestRandom {
	ThreadLocal<Random> random = ThreadLocal.withInitial(Random::new);
	ThreadLocal<SplittableRandom> splittableRandom = ThreadLocal.withInitial(SplittableRandom::new);
	ThreadLocal<SecureRandom> drbg = ThreadLocal.withInitial(() -> {
		try {
			return SecureRandom.getInstance("DRBG");
		}
		catch (NoSuchAlgorithmException e) {
			throw new IllegalArgumentException(e);
		}
	});

	@Benchmark
	public void testRandomInt(Blackhole blackhole) throws Exception {
		blackhole.consume(random.get().nextInt());
	}

	@Benchmark
	public void testRandomIntWithBound(Blackhole blackhole) throws Exception {
		// Deliberately not a power of 2: such ranges are rarely used in real applications, but the implementation has a special optimization for them
		blackhole.consume(random.get().nextInt(1, 100));
	}

	@Benchmark
	public void testSplittableRandomInt(Blackhole blackhole) throws Exception {
		blackhole.consume(splittableRandom.get().nextInt());
	}

	@Benchmark
	public void testSplittableRandomIntWithBound(Blackhole blackhole) throws Exception {
		// Deliberately not a power of 2: such ranges are rarely used in real applications, but the implementation has a special optimization for them
		blackhole.consume(splittableRandom.get().nextInt(1, 100));
	}

	@Benchmark
	public void testThreadLocalRandomInt(Blackhole blackhole) throws Exception {
		blackhole.consume(ThreadLocalRandom.current().nextInt());
	}

	@Benchmark
	public void testThreadLocalRandomIntWithBound(Blackhole blackhole) throws Exception {
		// Deliberately not a power of 2: such ranges are rarely used in real applications, but the implementation has a special optimization for them
		blackhole.consume(ThreadLocalRandom.current().nextInt(1, 100));
	}

	@Benchmark
	public void testDRBGSecureRandomInt(Blackhole blackhole) {
		blackhole.consume(drbg.get().nextInt());
	}

	@Benchmark
	public void testDRBGSecureRandomIntWithBound(Blackhole blackhole) {
		// Deliberately not a power of 2: such ranges are rarely used in real applications, but the implementation has a special optimization for them
		blackhole.consume(drbg.get().nextInt(1, 100));
	}

	public static void main(String[] args) throws RunnerException {
		Options opt = new OptionsBuilder().include(TestRandom.class.getSimpleName()).build();
		new Runner(opt).run();
	}
}


