Are you still using Random to generate random numbers under high concurrency?

Foreword

Generating random numbers is a very common task, and the JDK provides a ready-made Random class for it. The Random class is also thread-safe.

Here is the implementation of Random.next(), which produces the bits behind each random integer:

    protected int next(int bits) {
        long oldseed, nextseed;
        AtomicLong seed = this.seed;
        do {
            oldseed = seed.get();
            nextseed = (oldseed * multiplier + addend) & mask;
            // CAS: retries under contention make this loop inefficient
        } while (!seed.compareAndSet(oldseed, nextseed));
        return (int)(nextseed >>> (48 - bits));
    }

As the code shows, the seed is updated with a CAS operation. When many threads compete, the CAS is likely to fail and must be retried. Each retry burns CPU cycles, so performance drops sharply.
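
To make the retry cost visible, here is a minimal sketch of the same CAS loop with a retry counter added. The class name CasLcgSketch and the retries field are hypothetical additions for illustration. The constants and the initial seed scrambling are the ones documented in the java.util.Random Javadoc.

```java
import java.util.concurrent.atomic.AtomicLong;

final class CasLcgSketch {
    // Constants documented in the java.util.Random Javadoc.
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private final AtomicLong seed;
    // Instrumentation only (not part of Random): counts failed CAS attempts.
    final AtomicLong retries = new AtomicLong();

    CasLcgSketch(long initialSeed) {
        // Same initial scrambling that Random(long) documents.
        this.seed = new AtomicLong((initialSeed ^ MULTIPLIER) & MASK);
    }

    int next(int bits) {
        long oldSeed, nextSeed;
        while (true) {
            oldSeed = seed.get();
            nextSeed = (oldSeed * MULTIPLIER + ADDEND) & MASK;
            if (seed.compareAndSet(oldSeed, nextSeed)) {
                break;
            }
            retries.incrementAndGet(); // another thread won the race; try again
        }
        return (int) (nextSeed >>> (48 - bits));
    }

    public static void main(String[] args) throws InterruptedException {
        CasLcgSketch rng = new CasLcgSketch(42L);
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1_000_000; j++) {
                    rng.next(32);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("failed CAS attempts: " + rng.retries.get());
    }
}
```

The number of failed attempts depends on the machine and the thread count, so treat the printed figure as an indication of contention rather than a benchmark.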

Therefore, although Random is thread-safe, it is not "highly concurrent".

To address this problem and improve random number generation under high concurrency, the JDK provides ThreadLocalRandom, a random number generator designed for highly concurrent use.

ThreadLocalRandom extends Random. By the Liskov substitution principle, it can be used wherever a Random is expected and offers the same random number generation functions; only the underlying algorithm differs.
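
As a usage sketch, the public entry point is ThreadLocalRandom.current(), which returns the generator bound to the calling thread. The class name TlrUsageSketch and the rollDie() helper below are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

final class TlrUsageSketch {
    /** Rolls a six-sided die; safe to call from any thread. */
    static int rollDie() {
        // Call current() at the point of use instead of caching the
        // instance in a field that other threads could see.
        return ThreadLocalRandom.current().nextInt(1, 7); // origin inclusive, bound exclusive
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() ->
                System.out.println(Thread.currentThread().getName() + " rolled " + rollDie()));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }
}
```

Each thread draws from its own state, so no thread waits on or retries against another.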

Variables in Thread

To deal with thread contention, Java has the ThreadLocal class, which gives each thread its own independent storage space.

The implementation of ThreadLocal depends on the ThreadLocal.ThreadLocalMap threadLocals member field in the Thread object.

Similarly, in order to allow the random number generator to only access local thread data and avoid competition, three more members are added to Thread:

    /** The current seed for a ThreadLocalRandom */
    @sun.misc.Contended("tlr")
    long threadLocalRandomSeed;
    /** Probe hash value; nonzero if threadLocalRandomSeed initialized */
    @sun.misc.Contended("tlr")
    int threadLocalRandomProbe;
    /** Secondary seed isolated from public ThreadLocalRandom sequence */
    @sun.misc.Contended("tlr")
    int threadLocalRandomSecondarySeed;

As members of the Thread class, these three fields are bound to each Thread object, so they are thread-local variables in the literal sense, and the random number generator that relies on them is therefore called ThreadLocalRandom.

Eliminate false sharing

You may have noticed the @sun.misc.Contended annotation on these variables. What is it for? To understand it, you first need to know about an important issue in concurrent programming: false sharing.

The CPU does not access main memory directly. Data is loaded into registers from the cache, and the cache has several levels (L1, L2, L3, and so on). Here we simplify this complex hierarchy and assume there is only a first-level cache and main memory.

The CPU reads and updates the cache in units of lines, called cache lines. A line is generally 64 bytes, which is the size of 8 longs.

This raises a question. A cache line can hold several variables. What happens if multiple threads access different variables at the same time, and those variables happen to sit in the same cache line?

Suppose X and Y are two adjacent variables in the same cache line, and CPU core1 and core2 have both loaded that line. If core1 updates X while core2 updates Y, the two cores compete even though they touch different variables, because reads and updates work on whole cache lines. Each core may have to reload its copy after the other core invalidates it, and system performance drops sharply. This is the false sharing problem.

How can this be improved? Let X occupy one cache line by itself and Y occupy another, so that updates and reads of one variable no longer affect the other.

The @sun.misc.Contended("tlr") annotation in the code above asks the virtual machine to insert padding before and after the marked variable, so that it sits in its own cache line and does not share a line with other variables.

In the Thread object, threadLocalRandomSeed, threadLocalRandomProbe, and threadLocalRandomSecondarySeed are all marked with the same group, tlr. The three variables are therefore placed together in a cache line of their own, isolated from other variables, which improves access speed in a concurrent environment.
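
Application code generally cannot rely on @Contended, so the sketch below imitates the idea with manual padding. Everything here (PaddingSketch, Unpadded, Padded, race()) is hypothetical, and it assumes the JVM lays the fields out in declaration order, which is not guaranteed. Treat the timings as a rough illustration only.

```java
final class PaddingSketch {
    /** Two hot fields that are likely to share one cache line. */
    static final class Unpadded {
        volatile long x;
        volatile long y;
    }

    /** Seven filler longs (56 bytes) intended to push y onto a different 64-byte line than x. */
    static final class Padded {
        volatile long x;
        long p1, p2, p3, p4, p5, p6, p7; // filler; assumes declaration order is kept
        volatile long y;
    }

    /** Runs both tasks on their own threads and returns the elapsed time in nanoseconds. */
    static long race(Runnable a, Runnable b) {
        Thread t1 = new Thread(a);
        Thread t2 = new Thread(b);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int n = 50_000_000;
        Unpadded u = new Unpadded();
        Padded p = new Padded();
        // Each field has exactly one writer thread, so x++ and y++ lose no updates.
        long unpadded = race(() -> { for (int i = 0; i < n; i++) u.x++; },
                             () -> { for (int i = 0; i < n; i++) u.y++; });
        long padded = race(() -> { for (int i = 0; i < n; i++) p.x++; },
                           () -> { for (int i = 0; i < n; i++) p.y++; });
        System.out.println("unpadded: " + unpadded / 1_000_000 + " ms");
        System.out.println("padded:   " + padded / 1_000_000 + " ms");
    }
}
```

On hardware with 64-byte cache lines the padded variant is often faster, but the result depends on the JVM, the CPU, and how the threads are scheduled.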

An Efficient Alternative to Reflection

Generating random numbers requires access to Thread members such as threadLocalRandomSeed. For the sake of encapsulation, however, these members are package-private, visible only inside their own package.

Unfortunately, ThreadLocalRandom is located in the java.util.concurrent package, while Thread is located in the java.lang package. Therefore, ThreadLocalRandom has no way to access Thread's threadLocalRandomSeed and other variables.

At this point, Java veterans may object: that is no problem at all, reflection can dig out and access anything.

It is true that reflection is a method that can bypass encapsulation and directly access the internal data of the object. However, the performance of reflection is not very good, and it is not suitable as a high-performance solution.

Is there a way for ThreadLocalRandom to access the internal members of Thread that is far faster than reflection and comes close to direct field access? Yes: the Unsafe class.

Here is a brief introduction to the two Unsafe methods used:

    public native long getLong(Object o, long offset);
    public native void putLong(Object o, long offset, long x);

getLong() reads a long value located offset bytes from the start of object o; putLong() writes x to the location offset bytes from the start of object o.

This C-like style of access brings a large performance gain. More importantly, because it uses an offset rather than a field name, it bypasses the visibility restrictions on members.

With performance taken care of, the next question is how to find the offset of the threadLocalRandomSeed member within a Thread object. That is the job of Unsafe's objectFieldOffset() method.

In a static initializer that runs when the ThreadLocalRandom class is initialized, the offsets of the Thread fields threadLocalRandomSeed, threadLocalRandomProbe, and threadLocalRandomSecondarySeed within the object are obtained.

From then on, whenever ThreadLocalRandom needs these variables, it accesses them through Unsafe's getLong() and putLong() (and getInt() and putInt() for the int fields).
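
The pattern can be sketched on an ordinary class. This is not the JDK's own code: Holder is a hypothetical stand-in for Thread, its seed field stands in for threadLocalRandomSeed, and the Unsafe instance is obtained through the theUnsafe field because application code cannot call Unsafe.getUnsafe() directly. Recent JDK releases deprecate these sun.misc.Unsafe memory-access methods in favor of VarHandle, so the sketch assumes a JDK where they still work.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeOffsetSketch {
    /** Hypothetical stand-in for Thread; seed plays the role of threadLocalRandomSeed. */
    static final class Holder {
        private long seed;

        long peek() {
            return seed;
        }
    }

    private static final Unsafe UNSAFE;
    private static final long SEED; // byte offset of Holder.seed, looked up once

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            SEED = UNSAFE.objectFieldOffset(Holder.class.getDeclaredField("seed"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Reads the field by offset, adds delta, writes it back, and returns the new value. */
    static long addAndGet(Holder h, long delta) {
        long r = UNSAFE.getLong(h, SEED) + delta;
        UNSAFE.putLong(h, SEED, r);
        return r;
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        addAndGet(h, 5L);
        System.out.println(h.peek());
    }
}
```

Reflection is used only once, during class initialization, to look up the field. Every later access is a plain read or write at a fixed offset.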

For example, when generating a random number:

    protected int next(int bits) {
        return (int)(mix64(nextSeed()) >>> (64 - bits));
    }
    final long nextSeed() {
        Thread t; long r; // read and update per-thread seed
        // ThreadLocalRandom accesses Thread's threadLocalRandomSeed field here
        UNSAFE.putLong(t = Thread.currentThread(), SEED,
                       r = UNSAFE.getLong(t, SEED) + GAMMA);
        return r;
    }

How fast is the Unsafe approach in practice? Let's run an experiment.

We write our own ThreadTest class that reads and writes the threadLocalRandomSeed member repeatedly, once through reflection (byReflection()) and once through Unsafe (byUnsafe()), to compare their performance. Each method reads and writes the variable 100 million times, with the following results:

byUnsafe spend :171ms
byReflection spend :645ms

The Unsafe approach is clearly faster than reflection, nearly four times faster in this run, which is one of the reasons the JDK uses Unsafe internally instead of reflection.

Random number seed

We know that pseudo-random number generation requires a seed, and threadLocalRandomSeed and threadLocalRandomSecondarySeed are the seeds here. Among them, threadLocalRandomSeed is long type, and threadLocalRandomSecondarySeed is int.

threadLocalRandomSeed is the more widely used of the two: most random numbers are generated from it. threadLocalRandomSecondarySeed is used only in a few specific places inside the JDK.

By default, the initial seed is derived from the system time. Once the seed has been initialized, it is stored at the SEED offset (that is, in threadLocalRandomSeed) through UNSAFE.

Then you can use the nextInt() method to get a random integer:

    public int nextInt() {
        return mix32(nextSeed());
    }    
    final long nextSeed() {
        Thread t; long r; // read and update per-thread seed
        UNSAFE.putLong(t = Thread.currentThread(), SEED,
                       r = UNSAFE.getLong(t, SEED) + GAMMA);
        return r;
    }

Each call to nextInt() updates threadLocalRandomSeed through nextSeed(). Because the seed is a per-thread variable, there is no contention and no CAS retry, so performance is much better.
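
The seed update is a plain addition followed by a bit-mixing step. The sketch below reproduces that shape outside the JDK. AdditiveSeedSketch is a hypothetical class, a ThreadLocal stands in for the Thread field, seeding from System.nanoTime() is an assumption made for the demo, and the GAMMA and mix32 constants are written as recalled from the JDK 8 source, so treat them as illustrative.

```java
final class AdditiveSeedSketch {
    // Increment added to the seed on every call (value recalled from JDK 8; illustrative).
    static final long GAMMA = 0x9e3779b97f4a7c15L;

    // One instance per thread, standing in for Thread.threadLocalRandomSeed.
    private static final ThreadLocal<AdditiveSeedSketch> LOCAL =
        ThreadLocal.withInitial(() -> new AdditiveSeedSketch(System.nanoTime()));

    private long seed; // touched only by the owning thread, so no CAS is needed

    AdditiveSeedSketch(long seed) {
        this.seed = seed;
    }

    static AdditiveSeedSketch current() {
        return LOCAL.get();
    }

    /** Advances the seed by GAMMA and returns the new value. */
    long nextSeed() {
        return seed += GAMMA;
    }

    /** Scrambles a 64-bit value into 32 well-mixed bits. */
    static int mix32(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        return (int) (((z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L) >>> 32);
    }

    int nextInt() {
        return mix32(nextSeed());
    }

    public static void main(String[] args) {
        System.out.println(current().nextInt());
    }
}
```

The seed sequence itself is a simple counter. The mixing function is what makes consecutive outputs look unrelated.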

The role of the probe

In addition to the seed, there is also a threadLocalRandomProbe probe variable. What is this variable used for?

We can think of threadLocalRandomProbe as a per-thread hash value (nonzero once initialized). It serves as a characteristic value of the thread, and from it a specific position in an array can be derived for that thread.

    static final int getProbe() {
        return UNSAFE.getInt(Thread.currentThread(), PROBE);
    }

Let's look at a code snippet from the JDK (ConcurrentHashMap.addCount()) that uses the probe:

        CounterCell[] as; long b, s;
        if ((as = counterCells) != null ||
            !U.compareAndSwapLong(this, BASECOUNT, b = baseCount, s = b + x)) {
            CounterCell a; long v; int m;
            boolean uncontended = true;
            if (as == null || (m = as.length - 1) < 0 ||
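                // Use the probe to pick a slot in the array as for this thread.
                // Probe values differ between threads, so threads will most likely
                // map to different elements and can update them fully concurrently
                // without conflict. The probe thus helps prevent collisions here.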
                (a = as[ThreadLocalRandom.getProbe() & m]) == null ||
                !(uncontended =
                  U.compareAndSwapLong(a, CELLVALUE, v = a.value, v + x))) {

If this code does run into a conflict, the implementation can call ThreadLocalRandom.advanceProbe() to change the thread's probe value. This makes future conflicts less likely, which reduces contention and improves concurrent performance.

    static final int advanceProbe(int probe) {
        // Compute a new probe value from the current one
        probe ^= probe << 13;   // xorshift
        probe ^= probe >>> 17;
        probe ^= probe << 5;
        // Store the new probe in the thread object, i.e. update threadLocalRandomProbe
        UNSAFE.putInt(Thread.currentThread(), PROBE, probe);
        return probe;
    }
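
To see the two steps together, here is a minimal sketch of a striped counter that picks a cell by probe and moves to another cell after a failed CAS. All names (ProbeStripedCounterSketch, PROBE, cells) are hypothetical. A ThreadLocal holds the probe because application code cannot add fields to Thread, and the cells are not padded, unlike the JDK's CounterCell.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

final class ProbeStripedCounterSketch {
    private static final AtomicInteger PROBE_GENERATOR = new AtomicInteger();

    // Per-thread probe; a one-element array so it can be updated in place.
    private static final ThreadLocal<int[]> PROBE = ThreadLocal.withInitial(() -> {
        int p = PROBE_GENERATOR.addAndGet(0x9e3779b9);
        return new int[] { p == 0 ? 1 : p }; // keep the probe nonzero
    });

    // Length must be a power of two so that (probe & mask) is a valid index.
    private final AtomicLongArray cells = new AtomicLongArray(8);

    /** Same xorshift step as advanceProbe(), without the write to Thread. */
    static int xorshift(int probe) {
        probe ^= probe << 13;
        probe ^= probe >>> 17;
        probe ^= probe << 5;
        return probe;
    }

    void add(long x) {
        int[] probe = PROBE.get();
        int mask = cells.length() - 1;
        while (true) {
            int i = probe[0] & mask;
            long v = cells.get(i);
            if (cells.compareAndSet(i, v, v + x)) {
                return;
            }
            probe[0] = xorshift(probe[0]); // collision: try a different cell next time
        }
    }

    long sum() {
        long s = 0;
        for (int i = 0; i < cells.length(); i++) {
            s += cells.get(i);
        }
        return s;
    }

    public static void main(String[] args) throws InterruptedException {
        ProbeStripedCounterSketch counter = new ProbeStripedCounterSketch();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    counter.add(1);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter.sum());
    }
}
```

In application code, java.util.concurrent.atomic.LongAdder is the ready-made class for this kind of contended counter.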

Summary

Today we introduced ThreadLocalRandom, a high-performance random number generator for high-concurrency environments.

We covered not only what ThreadLocalRandom does and how it is implemented, but also how it achieves high performance (for example, by avoiding false sharing and by using Unsafe). Hopefully you can apply these techniques in your own projects.

Origin blog.csdn.net/blueheartstone/article/details/128136884