This article is adapted from a chapter of the book "The Art of Java Concurrent Programming": HLA - A2 Peng Wei Fang Tengfei

The word atom (atomic) originally means "the smallest particle that cannot be divided any further", and an atomic operation means "an operation, or a series of operations, that cannot be interrupted". Implementing atomic operations on a multiprocessor is somewhat more complicated. Let's look at how atomic operations are implemented on Intel processors and in Java.

Definition of Terms

Before looking at how atomic operations are implemented, first get familiar with the relevant terms:

Term	English	Explanation
Cache line	Cache line	The smallest unit on which the cache operates.
Compare and swap	Compare and Swap (CAS)	A CAS operation takes two input values: an old value (the value expected before the operation) and a new value. The operation first checks whether the old value has changed. If it has not changed, the new value is swapped in; if it has changed, no swap takes place.
CPU pipeline	CPU pipeline	A CPU pipeline works like an industrial assembly line. Inside the CPU, 5 to 6 circuit units with different functions form an instruction-processing pipeline, and each x86 instruction is split into 5 to 6 steps that these circuit units execute in turn. This allows one instruction to complete per CPU clock cycle, which increases the speed of the CPU.
Memory order violation	Memory order violation	A memory order violation is usually caused by false sharing. False sharing means that multiple CPUs modify different parts of the same cache line at the same time, which invalidates the operation of one of the CPUs. When a memory order violation occurs, the CPU must flush its pipeline.
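
The compare-and-swap entry in the table can be illustrated with a minimal sketch in Java. It uses the JDK's AtomicInteger.compareAndSet; the class name CasSemanticsDemo and the concrete values are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasSemanticsDemo {
    // Returns {first CAS succeeded (1/0), second CAS succeeded (1/0), final value}
    static int[] run() {
        AtomicInteger value = new AtomicInteger(5);
        // Old value is still 5, so the new value 6 is swapped in
        boolean first = value.compareAndSet(5, 6);
        // The value is now 6, not the expected 5, so no swap takes place
        boolean second = value.compareAndSet(5, 7);
        return new int[] { first ? 1 : 0, second ? 1 : 0, value.get() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "1 0 6"
    }
}
```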

How the processor implements atomic operations

32-bit IA-32 processors implement atomic operations between multiple processors by locking the cache or locking the bus. First, the processor automatically guarantees the atomicity of basic memory operations. It guarantees that reading or writing a single byte of system memory is atomic, meaning that while one processor is reading a byte, other processors cannot access that byte's memory address. P6-family and newer processors also automatically guarantee that a 16-, 32-, or 64-bit operation by a single processor within the same cache line is atomic. The processor cannot automatically guarantee atomicity for complex memory operations, however, such as accesses that span the bus width, multiple cache lines, or page tables. For those cases the processor provides two mechanisms, bus locking and cache locking, to guarantee the atomicity of complex memory operations.

The 2019 edition of the Intel document describes this in essentially the same way; for details, see page 2957, section 8.1 LOCKED ATOMIC OPERATIONS, of the Intel document.

Using a bus lock to guarantee atomicity

The first mechanism guarantees atomicity through a bus lock. If multiple processors perform a read-modify-write operation on a shared variable at the same time (i++ is the classic read-modify-write operation), the shared variable is operated on by several processors simultaneously. The read-modify-write is then not atomic, and the value of the shared variable after the operation does not match what we expect. For example, if i = 1 and we perform i++ twice, we expect the result to be 3, but the result may be 2, as shown in Figure 2-3.

The reason is that several processors may each read the variable i from their own cache, each increment it, and each write it back to system memory. So to make a read-modify-write on a shared variable atomic, we must ensure that while CPU1 is performing the read-modify-write, CPU2 cannot operate on the cache that holds the memory address of that shared variable. The processor's bus lock solves exactly this problem. A bus lock uses the LOCK# signal provided by the processor: when one processor asserts this signal on the bus, requests from other processors are blocked, and that processor has exclusive use of the shared memory.
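
The lost update described above can be replayed by hand. The following is a minimal, single-threaded sketch that walks through one possible interleaving; it is not real multiprocessor code, and the names LostUpdateDemo, memory, cpu1Cache, and cpu2Cache are hypothetical stand-ins for system memory and the two processors' private copies.

```java
class LostUpdateDemo {
    // Replays one possible interleaving of two i++ operations step by step
    static int run() {
        int memory = 1;             // shared variable i in system memory

        int cpu1Cache = memory;     // CPU1 reads i = 1
        int cpu2Cache = memory;     // CPU2 reads i = 1 before CPU1 writes back

        cpu1Cache = cpu1Cache + 1;  // CPU1 increments its copy to 2
        cpu2Cache = cpu2Cache + 1;  // CPU2 increments its copy to 2

        memory = cpu1Cache;         // CPU1 writes 2 back to memory
        memory = cpu2Cache;         // CPU2 writes 2 back, CPU1's update is lost
        return memory;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2, although two increments of 1 should give 3
    }
}
```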

Intel document, page 2959: 8.1.2 Bus Locking

Using a cache lock to guarantee atomicity

The second mechanism guarantees atomicity through cache locking. At any given moment we only need the operation on one particular memory address to be atomic, but a bus lock locks all communication between the CPU and memory, so for the duration of the lock other processors cannot operate on data at other memory addresses either. Bus locking is therefore relatively expensive, and recent processors use cache locking instead of bus locking as an optimization in some situations.

Frequently used memory is cached in the processor's L1, L2, and L3 caches, so atomic operations can be carried out directly in the processor's internal caches without asserting a bus lock. P6-family and recent processors can use "cache locking" to implement complex atomic operations. "Cache locking" means that if a memory area is cached in a processor cache line and is locked for the duration of a LOCK operation, then when the locked operation writes back to memory, the processor does not assert the LOCK# signal on the bus. Instead it modifies the memory address internally and relies on its cache coherency mechanism to guarantee atomicity, because the cache coherency mechanism prevents a memory area cached by two or more processors from being modified simultaneously: when another processor writes back data for a locked cache line, that cache line is invalidated. In the example shown in Figure 2-3, when CPU1 uses cache locking while modifying i in its cache line, CPU2 cannot cache the cache line that holds i at the same time.

There are two situations, however, in which the processor does not use cache locking:

The first situation: the data being operated on cannot be cached inside the processor, or it spans multiple cache lines. In that case the processor uses a bus lock.
The second situation: some processors do not support cache locking. Intel 486 and Pentium processors use a bus lock even when the locked memory area is in the processor's cache line.

Both mechanisms are available through the many LOCK-prefixed instructions that Intel processors provide. Examples are the bit test and modify instructions BTS, BTR, and BTC; the exchange instructions XADD and CMPXCHG; and some other arithmetic and logical instructions such as ADD and OR. The memory area operated on by one of these instructions is locked, so other processors cannot access it at the same time.

Intel document, page 2961: 8.1.4 Effects of a LOCK Operation on Internal Processor Caches

How Java implements atomic operations

In Java, atomic operations can be implemented with locks or with looping CAS.

Implementing atomic operations with looping CAS

CAS operations in the JVM are implemented with the CMPXCHG (compare and exchange) instruction provided by the processor. The basic idea of spin CAS is to retry the CAS operation in a loop until it succeeds. The following code implements a CAS-based thread-safe counter method, safeCount, and a non-thread-safe counter method, count.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    public class Counter {

        private AtomicInteger atomicI = new AtomicInteger(0);
        private int i = 0;

        public static void main(String[] args) {
            final Counter cas = new Counter();
            List<Thread> ts = new ArrayList<Thread>(600);
            long start = System.currentTimeMillis();
            for (int j = 0; j < 100; j++) {
                Thread t = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        for (int i = 0; i < 10000; i++) {
                            cas.count();
                            cas.safeCount();
                        }
                    }
                });
                ts.add(t);
            }
            for (Thread t : ts) {
                t.start();
            }
            // Wait for all threads to finish
            for (Thread t : ts) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            // The unsafe counter may print less than 1000000; the CAS counter prints 1000000
            System.out.println(cas.i);
            System.out.println(cas.atomicI.get());
            System.out.println(System.currentTimeMillis() - start);
        }

        /**
         * Thread-safe counter implemented with CAS
         */
        private void safeCount() {
            for (; ; ) {
                int i = atomicI.get();
                // Retry until no other thread changed the value between get() and the CAS
                boolean suc = atomicI.compareAndSet(i, i + 1);
                if (suc) {
                    break;
                }
            }
        }

        /**
         * Non-thread-safe counter
         */
        private void count() {
            i++;
        }
    }

Starting with Java 1.5, the JDK's concurrent package provides classes that support atomic operations, such as AtomicBoolean (a boolean value updated atomically), AtomicInteger (an int value updated atomically), and AtomicLong (a long value updated atomically). These atomic wrapper classes also provide useful utility methods, such as atomically incrementing or decrementing the current value by 1.
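
As a minimal sketch of those utility methods, the following uses AtomicInteger.incrementAndGet and decrementAndGet from several threads. The class name AtomicCounterDemo and the thread and iteration counts are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {
    // Increments a shared counter from several threads, then decrements it once
    static int run(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.incrementAndGet(); // atomic +1, no explicit CAS loop needed
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        counter.decrementAndGet(); // atomic -1
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 3999
    }
}
```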

Three major problems with implementing atomic operations via CAS

Some concurrency frameworks in the Java concurrent package also use spin CAS to implement atomic operations, for example the xfer method of the LinkedTransferQueue class. Although CAS solves atomic operations very efficiently, it still has three major problems: the ABA problem, high overhead when the loop spins for a long time, and the ability to guarantee atomicity for only one shared variable.

ABA problem. CAS checks whether a value has changed before operating on it and updates the value only if it has not changed. But if a value was A, became B, and then became A again, CAS finds that the value has not changed even though it actually has. The solution to the ABA problem is to use a version number: prepend a version number to the variable and increment it by 1 on every update, so that A → B → A becomes 1A → 2B → 3A. Starting with Java 1.5, the JDK's atomic package provides the class AtomicStampedReference to solve the ABA problem. Its compareAndSet method first checks whether the current reference equals the expected reference and whether the current stamp equals the expected stamp; if both are equal, it atomically sets the reference and the stamp to the given updated values.

public boolean compareAndSet(
		V expectedReference, 	// expected reference
		V newReference, 		// updated reference
		int expectedStamp, 		// expected stamp
		int newStamp 			// updated stamp
)
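
The version-number idea can be sketched with AtomicStampedReference as follows. This is a single-threaded walk-through under the assumption that a slow thread remembered stamp 1 before the value went A → B → A; the class name StampedAbaDemo and the string values are hypothetical.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

class StampedAbaDemo {
    // Returns {CAS with the stale stamp succeeded, CAS with the current stamp succeeded}
    static boolean[] run() {
        String a = "A", b = "B", c = "C";
        AtomicStampedReference<String> ref = new AtomicStampedReference<>(a, 1);

        int staleStamp = ref.getStamp();   // a slow thread remembers 1A

        ref.compareAndSet(a, b, 1, 2);     // 1A -> 2B
        ref.compareAndSet(b, a, 2, 3);     // 2B -> 3A, reference is A again

        // The reference matches, but the stamp is 3 rather than 1, so this fails
        boolean stale = ref.compareAndSet(a, c, staleStamp, staleStamp + 1);

        // With the current stamp the update goes through
        int current = ref.getStamp();
        boolean fresh = ref.compareAndSet(a, c, current, current + 1);
        return new boolean[] { stale, fresh };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints "false true"
    }
}
```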

High overhead for long spin times. If spin CAS keeps failing for a long time, it imposes a very large execution cost on the CPU. If the JVM can use the pause instruction provided by the processor, efficiency improves somewhat. The pause instruction has two effects. First, it delays pipelined execution of instructions (de-pipeline) so that the CPU does not consume excessive execution resources; the length of the delay depends on the specific implementation, and on some processors it is zero. Second, it avoids the CPU pipeline flush caused by a memory order violation when exiting the loop, which improves the execution efficiency of the CPU.
Atomicity is guaranteed for only one shared variable. When operating on a single shared variable, we can use looping CAS to guarantee atomicity, but looping CAS cannot guarantee atomicity for an operation on several shared variables; in that case a lock can be used. There is also a trick: merge the shared variables into a single shared variable and operate on that. For example, given two shared variables i = 2 and j = a, merge them into ij = 2a and then operate on ij with CAS. Starting with Java 1.5, the JDK provides the AtomicReference class to guarantee atomicity for object references, so you can place several variables in one object and perform the CAS operation on that object.
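
The AtomicReference approach can be sketched as follows, assuming the two shared variables are wrapped in an immutable holder object. The names PairCasDemo and Pair, and the choice of updating i by +1 and j by -1, are hypothetical and serve only to show both fields changing in one CAS.

```java
import java.util.concurrent.atomic.AtomicReference;

class PairCasDemo {
    // Hypothetical immutable holder for the two shared variables i and j
    static final class Pair {
        final int i;
        final int j;
        Pair(int i, int j) { this.i = i; this.j = j; }
    }

    // Each update replaces the whole Pair, so i and j always change together
    static Pair run(int threads, int perThread) {
        AtomicReference<Pair> state = new AtomicReference<>(new Pair(0, 0));
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    Pair old;
                    Pair next;
                    do {
                        old = state.get();
                        next = new Pair(old.i + 1, old.j - 1);
                    } while (!state.compareAndSet(old, next)); // retry if another thread won
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return state.get();
    }

    public static void main(String[] args) {
        Pair p = run(4, 1000);
        System.out.println(p.i + " " + p.j); // prints "4000 -4000"
    }
}
```

Because a fresh Pair object is created for every update, the reference comparison in compareAndSet detects any intervening change to either field.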

Implementing atomic operations with locks

The lock mechanism guarantees that only the thread that holds the lock can operate on the locked memory area. The JVM implements several kinds of locks internally: biased locks, lightweight locks, and mutex locks. Interestingly, apart from biased locks, every lock implementation in the JVM uses looping CAS: when a thread wants to enter a synchronized block, it acquires the lock with looping CAS, and when it leaves the synchronized block, it releases the lock with looping CAS.
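
To make the idea of acquiring a lock with looping CAS concrete, here is a minimal spin-lock sketch built on AtomicBoolean. It is an illustration only and not how the JVM implements synchronized; the names SpinLockDemo and SpinLock are hypothetical, and unlike the JVM behavior described above, this sketch releases the lock with a plain volatile write rather than a CAS.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SpinLockDemo {
    // Hypothetical spin lock: false means free, true means held
    static final class SpinLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        void lock() {
            // Loop until this thread is the one that flips false -> true
            while (!held.compareAndSet(false, true)) {
                Thread.yield();
            }
        }

        void unlock() {
            held.set(false);
        }
    }

    // Protects a plain int counter with the spin lock
    static int run(int threads, int perThread) {
        SpinLock lock = new SpinLock();
        int[] counter = { 0 };
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    lock.lock();
                    try {
                        counter[0]++; // only the lock holder runs this line
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 4000
    }
}
```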

Interview essentials: How Java implements atomic operations [in-depth long read]