Java concurrent programming principles 1 (atomicity, visibility, ordering, volatile, synchronized)

1. Atomicity:

1.1 How to achieve thread safety in Java?

Thread-safety problems arise when multiple threads operate on shared data.
Locking:

  • Pessimistic lock: synchronized, lock
  • Optimistic lock: CAS

Depending on the business scenario, you can also choose ThreadLocal and let each thread work on its own private copy of the data (a sketch follows).
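A minimal sketch of the ThreadLocal option (the SimpleDateFormat here is just a stand-in for any non-thread-safe state):

import java.text.SimpleDateFormat;
import java.util.Date;

public class ThreadLocalDemo {
    // each thread gets its own SimpleDateFormat, so the non-thread-safe
    // formatter is never shared between threads
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static void main(String[] args) {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " -> " + FORMAT.get().format(new Date()));
        new Thread(task, "t1").start();
        new Thread(task, "t2").start();
    }
}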

1.2 CAS underlying implementation

From the Java side, the deepest you can trace CAS is to a native method.
CAS stands for compare-and-swap:

  • first compare the current value with the expected value; if they match, swap in the new value and return true
  • if they do not match, do not swap and return false

The CAS operations are exposed by the Unsafe class.

They take four parameters: the target object, the memory offset of the field, the expected value (oldValue), and the new value (newValue).
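At the Java level you normally reach these Unsafe calls through the atomic classes; a minimal sketch of CAS behavior and the usual retry loop, using AtomicInteger:

import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(0);

        // current value matches the expected value 0: swap succeeds
        System.out.println(value.compareAndSet(0, 1)); // true

        // current value is 1, not 0: no swap, returns false
        System.out.println(value.compareAndSet(0, 2)); // false

        // typical usage: spin until the CAS succeeds
        int old;
        do {
            old = value.get();
        } while (!value.compareAndSet(old, old + 1));
        System.out.println(value.get()); // 2
    }
}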


A native method calls straight into the C++ code of the native library.

https://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/69087d08d473/src/share/vm/prims/unsafe.cpp


https://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/69087d08d473/src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp

At the bottom of CAS, on a multi-core system a lock prefix has to be added to the instruction.

A single core does not need it, because cmpxchg is a single instruction and cannot be split any further.


So cmpxchg is an assembly instruction: the CPU hardware supports compare-and-exchange directly. But cmpxchg by itself does not guarantee atomicity across cores (the instruction itself cannot be split on a single core).

That is why the code checks whether the CPU is multi-core and, if it is, adds the lock prefix.

You can think of the lock prefix as a CPU-level lock. Its granularity is usually a cache-line lock; a bus lock also exists, but its cost is too high, so the CPU chooses according to the situation.

1.3 Problems with CAS

ABA: ABA is not necessarily a problem! For operations that are only ++ and --, even if ABA occurs, the result is not affected.

Thread A: expects to change the value from A1 to B2

Thread B: expects to change the value from B2 to A3

Thread C: expects to change the value from A1 to C4

Thread C's CAS still succeeds, because the value is A again; judged by atomicity alone, thread safety cannot be guaranteed.

The solution is simple, and Java already provides it.


In plain terms: when modifying the value, also specify a version number.

JUC's AtomicStampedReference implements exactly this.
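A minimal sketch of how the stamp (version) defends against ABA:

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // value "A" with an initial stamp (version) of 1
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 1);

        int stamp = ref.getStamp(); // 1

        // another thread does A -> B -> A, bumping the stamp each time
        ref.compareAndSet("A", "B", ref.getStamp(), ref.getStamp() + 1);
        ref.compareAndSet("B", "A", ref.getStamp(), ref.getStamp() + 1);

        // the value is "A" again, but the stamp is now 3, not 1,
        // so this CAS fails and the ABA change is detected
        System.out.println(ref.compareAndSet("A", "C", stamp, stamp + 1)); // false
    }
}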

Excessive spinning:

Too many spins take up a lot of CPU resources, which is a waste.

  • The synchronized approach: after several failed CAS attempts, suspend the thread (WAITING) to avoid occupying too much CPU!
  • The LongAdder approach: a solution based on something like segmented locking (with business-dependent restrictions). A traditional AtomicLong does ++ on a single value in memory; LongAdder creates many cells in memory, and each thread adds to a different cell. When you ask for the result, it sums all the cells and returns the total (see the sketch below).
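A minimal sketch contrasting the two counters (both totals come out equal; LongAdder just spreads the contention across cells):

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class AdderDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong atomic = new AtomicLong();
        LongAdder adder = new LongAdder();

        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                atomic.incrementAndGet(); // every thread CASes on one value
                adder.increment();        // threads land on different cells
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();

        System.out.println(atomic.get()); // 200000
        System.out.println(adder.sum());  // 200000, cells are summed on demand
    }
}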

CAS guarantees atomicity for only one field: for the solution, see AQS. ReentrantLock is implemented on top of AQS, and AQS implements its core functionality on top of CAS.

1.4 Four reference types + ThreadLocal

Four reference types:

  • Strong reference: User xx = new User(); xx is a strong reference. As long as the reference exists, GC will not reclaim the object!

  • Soft reference: an object referenced by a SoftReference is softly reachable. When memory is insufficient, objects reachable only through soft references are reclaimed. Generally used for caches.

    SoftReference<User> xx = new SoftReference<>(new User());

    User user = xx.get();
    
  • Weak reference: an object referenced by a WeakReference is weakly reachable. As soon as a GC runs, objects reachable only through weak references are reclaimed. Weak references can solve memory-leak problems; see ThreadLocal.

  • Phantom reference: PhantomReference is used to track the garbage collector's activity. During GC, when a PhantomReference is found, the GC puts the reference into its ReferenceQueue, to be handled by the programmer. After the programmer calls ReferenceQueue.poll() and removes the Reference from the queue, the Reference becomes inactive, which means the referenced object can be reclaimed.
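A minimal sketch of soft and weak references (System.gc() is only a hint, so the exact output depends on the JVM):

import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceDemo {
    public static void main(String[] args) {
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]);
        WeakReference<Object> weak = new WeakReference<>(new Object());

        System.out.println(soft.get()); // alive: memory is sufficient
        System.out.println(weak.get()); // alive: no GC has run yet

        System.gc(); // request a GC (only a hint to the JVM)

        // the softly referenced array survives while memory is sufficient;
        // the weakly referenced object is eligible as soon as GC runs
        System.out.println(soft.get()); // usually still non-null
        System.out.println(weak.get()); // very likely null after the GC
    }
}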

2. Visibility:

2.1 Java memory model

When executing instructions, the CPU fetches data in priority order from L1 to L2 to L3; on a miss it has to go to main memory. The JMM coordinates between the CPU caches and main memory to guarantee visibility and correct ordering.

The JMM (Java Memory Model) is not the JVM's runtime memory structure; they are not the same thing!

(figure: CPU cores, CPU caches, and main memory)

The CPU core is the core itself (with its registers).

The CPU cache is split into L1 (exclusive to a hardware thread), L2 (exclusive to a core), and L3 (shared across cores).

The JMM sits at the core of this picture; visibility and ordering are implemented on top of it.

On the JVM side, main memory is your heap memory.

2.2 Ways to ensure visibility

What is visibility: visibility refers to whether one thread's changes to a variable are visible to other threads.

At the Java level, there are many ways to ensure visibility:

  • volatile: using volatile on a primitive-type variable ensures that every CPU operation on it reads from and writes to main memory directly (see the sketch after this list).
  • synchronized: the memory semantics of synchronized guarantee that, after the lock is acquired, the data written before it was last released is visible.
  • Lock (CAS + volatile): likewise guarantees that after a CAS or an operation on a volatile variable, the previously written data is visible.
  • final: a final field is a constant and cannot be modified, so it can never become stale.
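A minimal sketch of the volatile case: without volatile on flag, the worker may spin forever on a stale cached value.

public class VisibilityDemo {
    // remove volatile and the worker may never see the write from main
    private static volatile boolean flag = true;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (flag) {
                // busy-wait until another thread clears the flag
            }
            System.out.println("worker saw flag = false, exiting");
        });
        worker.start();

        Thread.sleep(1000);
        flag = false; // volatile write: promptly visible to the worker
    }
}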

2.3 volatile on reference data types

The conclusion first: volatile on a reference-type field only guarantees that the reference (the address) is visible; it does not guarantee that the object's internal fields are visible.

However, this conclusion only holds for HotSpot. On a different JVM implementation the effect may differ: the JVM specification simply does not standardize volatile on reference types at this level, so each vendor can implement it as it sees fit.
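A minimal sketch of the claim (the Config class is hypothetical, just to illustrate; per the conclusion above, only writes to the reference itself are guaranteed visible):

public class VolatileRefDemo {
    static class Config {           // hypothetical holder class
        int timeout;                // plain field: not covered by the volatile below
    }

    // volatile guarantees visibility of the reference (the address) only
    private static volatile Config config = new Config();

    static void unsafeUpdate() {
        config.timeout = 500;       // write to an internal field: NOT guaranteed visible
    }

    static void safeUpdate() {
        Config fresh = new Config();
        fresh.timeout = 500;
        config = fresh;             // volatile write of the reference itself: visible
    }
}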

2.4 With the MESI protocol, why is there still volatile?

MESI is a protocol for CPU cache coherence, and most CPU manufacturers have achieved the effect of cache coherence based on MESI.

The CPU already has the MESI protocol, so isn't volatile a bit redundant?!

First of all, the two do not conflict: one is coherence at the CPU hardware level, the other is consistency at the level of Java's JMM.

MESI works by a fixed mechanism: whether or not you declare volatile, it maintains cache coherence (visibility) on its own. At the same time, be clear that without MESI volatile would run into problems, though there are other solutions (a bus lock, whose time cost is too high: while the bus is locked, only one CPU core can work).

MESI is a protocol, a plan, an interface; it has to be implemented by the CPU manufacturers.

Since the CPU has MESI, why do we still need volatile? Naturally, because MESI alone has gaps. MESI guarantees visibility between the exclusive caches of the cores, but the CPU does not necessarily write register data straight into L1: on most x86 CPUs there is a store buffer between the registers and L1, so a value may land in the store buffer without reaching L1, which leads to cache inconsistency. Beyond x86, ARM and POWER CPUs also have load buffers and invalidate queues that affect cache consistency to a greater or lesser degree!

So the MESI protocol and volatile do not conflict: MESI is at the CPU level, different manufacturers implement it differently, and details of the CPU architecture (such as the store buffer delaying writes from the registers to the L1 cache) can still cause cache inconsistency. How volatile deals with this is covered next.

2.5 Underlying implementation of volatile visibility

The bottom layer of volatile generates a lock-prefixed assembly instruction. This instruction forces the write out to main memory, bypassing the store buffer to achieve visibility, and uses the MESI protocol to invalidate the corresponding cache lines in other cores.

3. Ordering (high-frequency questions):

3.1 What is the ordering problem

The lazy-initialization singleton has exactly this problem.

To ensure thread safety, lazy initialization generally uses DCL (double-checked locking).

But with DCL alone, there is still a chance of problems:

a thread may get a half-initialized object to operate on, and a NullPointerException is then very likely.

(Initializing an object has three steps: allocate the space, initialize the internal fields, and point the reference at the memory.)

On the Java side, .java is compiled to .class, and at runtime the JIT optimizes further and may adjust the order of instructions to improve execution efficiency.

At the CPU level, some instructions are also reordered to improve execution efficiency.

This reordering of instructions causes problems in certain special operations, such as the DCL sketch below.
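A minimal sketch of DCL; the volatile on the field is exactly what forbids the reordering described above:

public class Singleton {
    // volatile forbids reordering of "allocate -> initialize -> publish",
    // so no thread can observe a half-initialized instance
    private static volatile Singleton instance;

    private Singleton() {}

    public static Singleton getInstance() {
        if (instance == null) {                  // first check, no lock
            synchronized (Singleton.class) {
                if (instance == null) {          // second check, under the lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}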

3.2 The underlying implementation of volatile order

Fields modified by volatile get memory barriers inserted before and after them at compile time.

SS (StoreStore): write operations before the barrier must complete before subsequent write operations

SL (StoreLoad): all write operations before the barrier must complete before subsequent read operations

LL (LoadLoad): read operations before the barrier must complete before subsequent read operations

LS (LoadStore): read operations before the barrier must complete before subsequent write operations


These memory barriers are specified by the JDK; their purpose is to ensure that volatile fields are never subject to instruction reordering.

volatile lives at the JMM level, so it is understandable that it keeps the JIT from reordering, but how does the CPU enforce it?

Check out this document: https://gee.cs.oswego.edu/dl/jmm/cookbook.html


Different CPUs support the memory barriers to different degrees. For example, the x86 architecture already enforces LS, LL, and SS internally, and only needs explicit support for SL.

Check OpenJDK again for how this is supported via mfence: at the bottom it still comes down to the lock prefix to solve the instruction-reordering problem.


4. synchronized:

4.1 The process of synchronized lock upgrade

A lock is an object, and any object will do: every object in Java can act as a lock.

The lock states: no lock (anonymously biased), biased lock, lightweight lock, heavyweight lock.

Lock-free (anonymously biased): normally a freshly new-ed object is in the lock-free state. Because biased locking is enabled with a delay, there are no biased locks in the first 4 s after JVM startup; but if the biased-locking delay is switched off, a new object starts out anonymously biased (see the flags below).
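For reference, the HotSpot (JDK 8) flags involved; a sketch only, since defaults vary by JVM version:

-XX:BiasedLockingStartupDelay=0   // remove the startup delay: new objects start anonymously biased
-XX:-UseBiasedLocking             // disable biased locking entirely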

Biased lock: when some thread acquires the lock resource, the lock becomes biased, and the biased lock stores that thread's ID.

When a biased lock is upgraded, biased-lock revocation is triggered. Revocation has to wait for a safepoint (for example, during GC). Because revocation is so expensive, biased locking is delayed at startup by default.

Safepoints:

  • GC
  • before a method returns
  • after calling a method
  • at the position of an exception
  • at the end of a loop

Lightweight lock: when multiple threads compete, the lock needs to upgrade to a lightweight lock (it can go directly from lock-free to lightweight, and it can also upgrade from biased to lightweight). A lightweight lock tries to acquire the lock resource via CAS, using adaptive spinning: the number of spins this time depends on whether the last CAS succeeded.

Heavyweight lock: nothing subtle here. While a thread holds the lock, the other competing threads are suspended.

4.2 Synchronized lock coarsening & lock elimination

Lock coarsening (lock expansion) is a JIT optimization:

while (condition) {
    synchronized (lock) {
        // acquiring and releasing the lock on every iteration is too costly,
        // so the JIT coarsens it into the form below
    }
}
// ----
synchronized (lock) {
    while (condition) {
        // optimized: the lock is acquired once around the whole loop
    }
}

Lock elimination: if a synchronized block involves no shared resources and there can be no lock contention, the JIT simply optimizes the lock instructions away when it compiles.
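A minimal sketch of a case the JIT can eliminate: sb is a local object that never escapes the method, so its locks can never be contended and escape analysis lets the JIT strip them:

public class LockElisionDemo {
    public static String concat(String a, String b) {
        // StringBuffer's methods are synchronized, but sb never escapes,
        // so the JIT can elide the locking entirely
        StringBuffer sb = new StringBuffer();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }
}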

4.3 The principle of synchronized mutual exclusion

Biased lock: check the thread ID in the Mark Word of the object header to see whether it is the current thread. If not, try to change it with CAS; if it is, you already hold the lock resource.

Lightweight lock: check whether the Lock Record pointer in the Mark Word points into the current thread's stack. If so, you hold the lock; go execute the business logic. If not, try to modify it with CAS; if that fails several times, upgrade to a heavyweight lock.

Heavyweight lock: follow the Mark Word to the ObjectMonitor and check whether the owner is the current thread. If not, the thread is put into the ObjectMonitor's EntryList to queue up and is suspended, waiting to be woken.


4.4 Why is wait a method under Object?

Executing the wait method requires holding the synchronized lock,
and the synchronized lock can be any object.
Also, wait is what releases the lock resource while the synchronized lock is held.
Furthermore, wait has to operate on the ObjectMonitor, and the ObjectMonitor may only be manipulated while holding the lock resource; wait throws the current thread into the WaitSet waiting pool.

In the same way, notify has to move threads from the WaitSet into the EntryList; if you do not own the ObjectMonitor, how could you do that?
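A minimal sketch: both wait and notify must run inside synchronized on the same lock object, otherwise they throw IllegalMonitorStateException:

public class WaitNotifyDemo {
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (lock) {        // must hold the lock first
                try {
                    lock.wait();         // releases the lock, joins the WaitSet
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("woken up");
            }
        });
        waiter.start();

        Thread.sleep(500);
        synchronized (lock) {            // must own the monitor to notify
            lock.notify();               // moves a waiter toward the EntryList
        }
    }
}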

A class lock locks on the class: the Class object itself serves as the lock.
An object lock uses a new-ed instance as the lock.
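A minimal sketch of the two granularities:

public class LockKindsDemo {
    public synchronized void instanceMethod() {
        // object lock: the monitor is this particular instance
    }

    public static synchronized void staticMethod() {
        // class lock: the monitor is LockKindsDemo.class
    }

    public void explicit() {
        synchronized (this) { /* object lock */ }
        synchronized (LockKindsDemo.class) { /* class lock */ }
    }
}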


Origin blog.csdn.net/lx9876lx/article/details/129112793