The underlying implementation principles of Java's concurrency mechanisms

Context switch

Even a single-core processor can execute multi-threaded code: the CPU does this by allocating a time slice to each thread. A time slice is the slot of CPU time given to a thread, generally a few tens of milliseconds (ms). Because each slice is so short, the CPU keeps switching between threads, which gives the impression that multiple threads are running at the same time.

The CPU runs tasks in a cycle driven by a time-slice allocation algorithm: after the current task uses up its slice, the CPU switches to the next task. Before switching, it saves the state of the current task so that this state can be restored the next time the task is scheduled. The process of saving a task's state and later reloading it is a context switch.

When an accumulation operation is executed concurrently for no more than about a million iterations, it is actually slower than the same accumulation executed serially. Why is concurrent execution slower than serial here? Because threads carry the overhead of creation and context switching.
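
This comparison can be reproduced with a small benchmark. The sketch below is only illustrative (the class name, loop bodies, and the COUNT constant are not from the original text), and timings will vary by machine:

```java
// A minimal sketch comparing concurrent and serial accumulation.
public class ConcurrencyTest {
    private static final long COUNT = 1_000_000L; // at this scale, concurrency may be slower

    public static void main(String[] args) throws InterruptedException {
        concurrent();
        serial();
    }

    private static void concurrent() throws InterruptedException {
        long start = System.currentTimeMillis();
        Thread thread = new Thread(() -> {
            long a = 0;
            for (long i = 0; i < COUNT; i++) {
                a += 5;
            }
        });
        thread.start();
        long b = 0;
        for (long i = 0; i < COUNT; i++) {
            b--;
        }
        thread.join(); // thread creation and context switching add overhead
        System.out.println("concurrent: " + (System.currentTimeMillis() - start) + " ms");
    }

    private static void serial() {
        long start = System.currentTimeMillis();
        long a = 0;
        for (long i = 0; i < COUNT; i++) {
            a += 5;
        }
        long b = 0;
        for (long i = 0; i < COUNT; i++) {
            b--;
        }
        System.out.println("serial: " + (System.currentTimeMillis() - start) + " ms");
    }
}
```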

Reduce context switching

  • Lock-free concurrent programming. Contention between threads for a lock causes context switches, so when processing data with multiple threads, avoid locks where possible: for example, partition the data by hashing its ID so that different threads process different segments.

  • CAS algorithm. Java's Atomic package uses the CAS algorithm to update data without locking (see the sketch after this list).

  • Use as few threads as possible. Avoid creating unnecessary threads; for example, when there are few tasks but many threads are created to handle them, a large number of threads end up sitting in the waiting state.

  • Coroutines: schedule multiple tasks within a single thread, and keep the switching between those tasks inside that one thread.
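
As a sketch of the lock-free and CAS points above, the counter below uses AtomicLong from java.util.concurrent.atomic; the class and method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

// A minimal lock-free counter built on the Atomic package (CAS under the hood).
public class LockFreeCounter {
    private final AtomicLong count = new AtomicLong(0);

    // No synchronized needed: incrementAndGet retries internally with CAS,
    // so contending threads do not block or trigger lock-induced context switches.
    public long increment() {
        return count.incrementAndGet();
    }

    public long get() {
        return count.get();
    }
}
```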

CAS

Before JDK 5, the Java language relied on the synchronized keyword to ensure synchronization, which introduces locks and, with them, overheads such as context switching.

synchronized is an exclusive lock: all other threads that need the lock are suspended until the thread holding it releases it. A more efficient alternative is the optimistic lock. An optimistic lock completes each operation without locking, on the assumption that there is no conflict; if the operation fails because of a conflict, it is retried until it succeeds.

CAS is the abbreviation of compare-and-swap.

In its early days, the Java language could not take advantage of the facilities the hardware provides for improving system performance. As Java evolved and the Java Native Interface (JNI) appeared, Java programs gained a convenient way to call native methods directly, bypassing the JVM, which opened up more ways to implement concurrency in Java.

A CAS operation involves three operands: a memory location (V), the expected original value (A), and the new value (B). If the value at the memory location matches the expected original value, the processor atomically updates the location to the new value; otherwise it does nothing. In either case, it returns the value that was at the location before the CAS instruction. (Some CAS variants only report whether the CAS succeeded, without fetching the current value.) In effect, CAS says: "I think location V should contain value A; if it does, put B there; otherwise, leave it unchanged and just tell me its current value."

Java's non-blocking algorithms are built on the CPU's CAS instruction, reached through JNI.
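
In the JDK this contract is exposed through classes such as AtomicInteger. A minimal sketch of the compare-then-swap behaviour described above (the values and names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(10); // memory location V currently holds 10

        boolean swapped = v.compareAndSet(10, 20); // expected A = 10, new value B = 20
        System.out.println(swapped + ", value = " + v.get()); // true, value = 20

        swapped = v.compareAndSet(10, 30); // fails: V no longer contains the expected 10
        System.out.println(swapped + ", value = " + v.get()); // false, value = 20
    }
}
```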

Deadlock avoidance method

  • Avoid having one thread acquire multiple locks at the same time.
  • Avoid having one thread occupy multiple resources inside a lock; try to ensure that each lock guards only one resource.
  • Prefer timed locks: use lock.tryLock(timeout) instead of the intrinsic lock mechanism (see the sketch after this list).
  • For database locks, locking and unlocking must happen on the same database connection, otherwise unlocking will fail.
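
A minimal sketch of the timed-lock suggestion, assuming a ReentrantLock and an illustrative 500 ms timeout: the thread gives up instead of waiting forever, which breaks the hold-and-wait condition of deadlock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TimedLockDemo {
    private final ReentrantLock lock = new ReentrantLock();

    public boolean doWork() throws InterruptedException {
        if (lock.tryLock(500, TimeUnit.MILLISECONDS)) { // wait at most 500 ms for the lock
            try {
                // critical section
                return true;
            } finally {
                lock.unlock();
            }
        }
        return false; // could not acquire the lock in time; back off or retry later
    }
}
```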

Under the hood

Java code is compiled into Java bytecode, and the class loader loads that bytecode into the JVM. The JVM executes the bytecode, which ultimately has to be turned into assembly instructions that run on the CPU. The concurrency mechanisms used in Java therefore depend on the JVM's implementation and on CPU instructions.

The implementation principle of volatile

Both synchronized and volatile play important roles in multi-threaded concurrent programming. volatile is a lightweight synchronized that guarantees the "visibility" of shared variables in multiprocessor development. Visibility means that when one thread modifies a shared variable, another thread can read the modified value. Used appropriately, volatile is cheaper to use and execute than synchronized because it does not cause thread context switching and scheduling.

When a variable is declared volatile, a write to that shared variable produces a Lock-prefixed instruction in the generated assembly, which does two things:
1) The data in the current processor's cache line is written back to system memory.
2) This write-back invalidates the copies of that memory address cached by other CPUs.
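
A minimal sketch of the visibility guarantee, using a volatile stop flag (the class and field names are illustrative); without volatile, the worker thread might never observe the write made by the other thread:

```java
public class VolatileFlag {
    private volatile boolean running = true;

    public void worker() {
        while (running) {
            // do some work; every read of running sees the latest write
        }
    }

    public void stop() {
        running = false; // the Lock-prefixed write flushes the cache line and invalidates other CPUs' copies
    }
}
```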

The implementation principle of synchronized

Java object header
The lock used for synchronized is stored in the Java object header.

By default, the Mark Word in the Java object header stores the object's HashCode, generational age, and lock flag bits.

Lock upgrade
To reduce the performance cost of acquiring and releasing locks, Java SE 1.6 introduced "biased locks" and "lightweight locks". In Java SE 1.6 there are four lock states which, from lowest to highest, are: the lock-free state, the biased-lock state, the lightweight-lock state, and the heavyweight-lock state. These states are upgraded step by step as contention increases. A lock can be upgraded but not downgraded, meaning that once a biased lock has been upgraded to a lightweight lock it cannot go back to being a biased lock. This upgrade-but-never-downgrade strategy exists to improve the efficiency of acquiring and releasing locks.

  • Biased locks
    When a thread accesses a synchronized block and acquires the lock, the thread ID of the biased lock is stored in the lock record in the object header and in the stack frame. Afterwards, that thread does not need CAS operations to lock and unlock when entering and exiting the synchronized block; it simply tests whether the Mark Word of the object header holds a biased lock pointing to the current thread. If the test succeeds, the thread has already acquired the lock. If it fails, the thread then tests whether the biased-lock flag in the Mark Word is set to 1 (indicating a biased lock): if it is not set, CAS is used to compete for the lock; if it is set, the thread tries to use CAS to point the biased lock in the object header to itself.

The implementation principle of atomic operations

How processors implement atomic operations

  • Use bus locks to ensure atomicity

The so-called bus lock uses the LOCK# signal provided by the processor. When one processor asserts this signal on the bus, requests from other processors are blocked, so that processor can access the shared memory exclusively.

  • Use cache locks to ensure atomicity

How Java achieves atomicity

In Java, atomic operations can be implemented by means of locks and by means of loop (spin) CAS.
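
A minimal sketch of a loop-CAS increment, written by hand on top of AtomicInteger (the class and method names are illustrative): the thread keeps retrying the CAS until no other thread has changed the value in between.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    public void safeIncrement() {
        for (;;) {                        // retry until the CAS succeeds
            int current = value.get();    // read the current value
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return;                   // no other thread changed the value in between
            }
        }
    }
}
```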

The lock mechanism achieves atomicity

The lock mechanism guarantees that only the thread holding the lock can operate on the locked memory region. The JVM implements many kinds of locks internally, including biased locks, lightweight locks, and mutual-exclusion (heavyweight) locks. Interestingly, except for the biased lock, the JVM implements its locks with loop CAS: when a thread wants to enter a synchronized block it acquires the lock with a loop CAS, and when it exits the block it releases the lock with a loop CAS.
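
The idea of acquiring a lock with a loop CAS can be illustrated at the user level with a simple spin lock; this is only an analogy under that assumption, not the JVM's actual synchronized implementation:

```java
import java.util.concurrent.atomic.AtomicReference;

public class SpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void lock() {
        Thread current = Thread.currentThread();
        // spin: keep retrying CAS(null -> current) until it succeeds
        while (!owner.compareAndSet(null, current)) {
            Thread.onSpinWait(); // hint that we are busy-waiting (Java 9+)
        }
    }

    public void unlock() {
        owner.compareAndSet(Thread.currentThread(), null); // release with CAS
    }
}
```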

Three major problems of implementing atomic operations with loop CAS

  • The ABA problem. CAS has to check whether the value has changed before updating it. If a value starts as A, changes to B, and then changes back to A, a CAS check will conclude that it never changed, even though it actually did. The solution to the ABA problem is to use a version number: prepend a version number to the variable and increment it on every update, so that A→B→A becomes 1A→2B→3A (see the sketch after this list).
  • Long spin times and high overhead. If the CAS keeps failing, the retry loop keeps spinning and imposes a significant execution cost on the CPU.
  • Atomicity is only guaranteed for a single shared variable. When operating on multiple shared variables, loop CAS cannot guarantee the atomicity of the operation, and locks can be used instead. Another trick is to combine several shared variables into one and operate on the combined value with CAS; for example, with two shared variables i=2 and j=a, merge them into ij=2a and then CAS on ij.
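
A minimal sketch of the version-number fix for ABA, using the JDK's AtomicStampedReference (the values and stamps are illustrative):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 1); // 1A

        int stamp = ref.getStamp();
        // Another thread performs A -> B -> A, bumping the stamp each time (2B, 3A)
        ref.compareAndSet("A", "B", stamp, stamp + 1);
        ref.compareAndSet("B", "A", stamp + 1, stamp + 2);

        // The plain value is "A" again, but the stale stamp makes this CAS fail,
        // so the intermediate change is detected.
        boolean swapped = ref.compareAndSet("A", "C", stamp, stamp + 1);
        System.out.println(swapped + ", value = " + ref.getReference()); // false, value = A
    }
}
```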


Origin blog.csdn.net/AIJXB/article/details/113836902