Java Spring Recruitment Interview Sprint Series: Java Concurrency Foundation

Table of Contents

Why does Java have concurrency problems

The reason is that Java uses a multi-threaded processing model: when a request arrives, Java dispatches a thread to handle it. If multiple threads access the same shared variable, concurrency problems can occur, so one precondition for a concurrency problem is a "shared variable". What counts as a shared variable is defined by the Java Memory Model (JMM). In the JMM, before a thread can use a variable it must first copy it from main memory into its own working memory, going through the read -> load -> use sequence (with assign -> store -> write on the way back). Since every thread goes through this process independently, thread A may have modified a variable without yet writing it back to main memory at the moment thread B reads or changes it, which naturally causes inconsistency.

Concurrency problems and general solutions

  • Visibility problem
    Under the Java memory model, if thread A's update to a variable obj has not yet been written back, thread B may still read obj's old value; the resulting data inconsistency is a visibility problem. Visibility means that when one thread modifies obj, the new value is immediately visible to other threads. The visibility problem can be solved with the volatile keyword: a volatile variable is not served from the thread's possibly stale working copy, but is always read from main memory. Volatile thus solves visibility by forcing use of the value in main memory.

  • The problem of atomicity
    But volatile does not completely solve the concurrency problem, because the discussion above implicitly treated each operation as atomic. In fact many operations in Java are not atomic, and this is the atomicity problem.
    To solve the atomicity problem you can use the Atomic classes provided in the Java concurrency package, which are built on CAS optimistic locking. The most important way to address both visibility and atomicity, and still the most popular among programmers, is locking with synchronized; synchronized and the related concurrency topics are discussed separately.
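As a minimal sketch (not from the original article) of the atomicity problem just described, compare a plain `count++` with `AtomicInteger.incrementAndGet()` under two threads:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicityDemo {
    static int plainCount = 0;                      // not thread-safe: ++ is read-modify-write
    static final AtomicInteger atomicCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                plainCount++;                       // three steps: load, add, store -> updates can be lost
                atomicCount.incrementAndGet();      // one CAS-based atomic step
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("plain  = " + plainCount);        // often less than 20000
        System.out.println("atomic = " + atomicCount.get()); // always 20000
    }
}
```

The plain counter frequently loses updates under contention, while the CAS-backed counter never does.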

What is CAS

CAS algorithm
CAS stands for Compare And Swap. JDK 5 added the concurrency package java.util.concurrent.*, whose classes use the CAS algorithm to implement an optimistic lock that differs from the lock of synchronized. Before JDK 5, the Java language relied on the synchronized keyword to ensure synchronization, which is an exclusive lock and also a pessimistic lock.

Understanding the CAS algorithm
CAS is a lock-free algorithm with three operands: the memory value V, the expected old value A, and the new value B to be written. If and only if the expected value A equals the memory value V is V updated to B; otherwise nothing is done and false is returned. An interviewer may ask how CAS is implemented at the bottom. In Java, CAS bottoms out in native (C++) code in the JVM, which in turn emits CPU instructions, and these differ significantly between architectures. For example, x86 CPUs provide the cmpxchg instruction, while reduced-instruction-set architectures typically use instruction pairs such as "load and reserve" / "store conditional". On most processors CAS is a very lightweight operation, which is its main advantage.
The compare-and-swap retry loop can be expressed in pseudocode as:

do {
    back up the old value;
    build the new value from the old one;
} while (!CAS(memory address, backed-up old value, new value))

Suppose threads t1 and t2 simultaneously update the same variable whose value is 56. Both copy the value from main memory into their working memory, so both threads' expected value is 56. Suppose t1 wins the race and updates the variable while the other thread fails (a failed thread is not suspended; it is simply told it lost this round and may try again). t1 updates the value to 57 and writes it back to memory. For t2, the memory value is now 57, which no longer matches its expected value 56, so its operation fails: the value it wanted to change is no longer what it originally read.
In other words, when the comparison finds the two equal, the shared data has not been modified, so it is replaced with the new value and execution continues; when they differ, the shared data has been modified, so the work done so far is discarded and the operation is retried from the start. It is easy to see that CAS is built on the assumption that shared data will usually not be modified concurrently, using a commit-and-retry model similar to databases. When synchronization conflicts are rare, this assumption yields a significant performance gain.
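The retry loop in the pseudocode above can be written directly against `AtomicInteger.compareAndSet` (a minimal sketch; the `addAndGet` helper here is illustrative, not from the original):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasRetryDemo {
    // Mirrors the pseudocode: back up the old value, build the new value,
    // and retry until compareAndSet succeeds.
    static int addAndGet(AtomicInteger cell, int delta) {
        int oldVal, newVal;
        do {
            oldVal = cell.get();          // back up the old value
            newVal = oldVal + delta;      // build the new value from the old
        } while (!cell.compareAndSet(oldVal, newVal)); // retry on conflict
        return newVal;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger value = new AtomicInteger(56); // the shared variable from the example
        Thread t1 = new Thread(() -> addAndGet(value, 1));
        Thread t2 = new Thread(() -> addAndGet(value, 1));
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(value.get()); // 58: whichever thread loses simply retries
    }
}
```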

Advantages and disadvantages of the CAS algorithm
CAS is a single CPU-instruction-level atomic operation, so it is very fast. Moreover, CAS avoids asking the operating system to arbitrate a lock: everything is done directly inside the CPU. But is CAS free of overhead? No! There is the cost of cache misses. Consider an 8-core system: each CPU has its own cache (cache and registers inside the CPU), an interconnect module inside each die lets its cores communicate, and a system interconnect lets the dies communicate with each other and with main memory. Data moves through the system in units of "cache lines", power-of-two-sized blocks of memory, usually between 32 and 256 bytes. When a CPU reads a variable from memory into a register, it must first pull the cache line containing it into its cache. Likewise, when a CPU stores a register value to memory, it must not only hold the cache line containing that value, but also ensure that no other CPU still has a copy of it. The overhead of invalidating and flushing other CPUs' caches is exactly what makes CAS not free. A lock operation is still more time-consuming than a CAS, because a lock requires two atomic operations (lock + update).

  • The disadvantages of CAS are as follows:
    • ABA problem
      If a thread finds that the memory value and the expected value are both A during the CAS operation, can it be determined that no thread modifies the value during the period? The answer is not necessarily. If the update of A -> B -> A occurs during the period, just judging that the value is A, may lead to unreasonable modification operations. In response to this situation, Java provides the AtomicStampedReference tool class to ensure the correctness of CAS by establishing a similar version number (stamp) for the reference.
    • Long cycle time and high cost
      The failure retry mechanism used in CAS hides an assumption that the contention situation is short-lived. In most application scenarios, it is true that most retries will only occur once and be successful. But there are always unexpected situations, so when necessary, consider limiting the number of spins to avoid excessive CPU consumption.
    • CAS only guarantees atomicity for a single shared variable
      One CAS covers one variable. To update several variables atomically, you must either fall back to a lock, or wrap the variables in a single object and swap the reference with AtomicReference.

Unsafe class interpretation

  • The Unsafe class lives in the sun.misc package and is not part of the Java standard. It encapsulates many pointer-like operations and can directly perform memory management, object manipulation, thread parking/unparking, and so on. Java itself does not support pointer operations directly, which is one reason the class is named Unsafe. Yet many fundamental Java class libraries, including widely used high-performance libraries such as Netty, Hadoop, and Kafka, are built on the Unsafe class.
  • Unsafe can be used to access system memory resources directly and manage them independently. It plays a large role in improving Java's runtime efficiency and enhancing the language's low-level capabilities.
  • Unsafe can be regarded as a backdoor left in Java, providing low-level operations such as direct memory access and thread scheduling.
  • Officially, use of Unsafe is discouraged.

Many CAS methods in JUC are actually operated by Unsafe classes internally.
For example, the compareAndSet method of AtomicBoolean:

public final boolean compareAndSet(boolean expect, boolean update) {
    int e = expect ? 1 : 0;
    int u = update ? 1 : 0;
    return unsafe.compareAndSwapInt(this, valueOffset, e, u);
}

compareAndSwapInt(Object o, long offset, int expected, int x);
// o: the object to modify; offset: the field's offset from the object header
// (the offset quickly locates which field is being modified);
// expected: the expected value; x: the value to set

The unsafe.compareAndSwapInt method is a native method: if the field's current value in the object equals the expected value, it sets the field to x and returns true; otherwise it returns false.

Most of Unsafe's APIs are native methods, mainly including the following categories:

  1. Class related: operations on a Class and its static fields.
  2. Object related: operations on an Object and its fields.
  3. Array related: operations on an array and its elements.
  4. Concurrency related: low-level synchronization primitives such as CAS, thread scheduling, volatile access, and memory barriers.
  5. Memory related: direct memory access (bypassing the Java heap to manipulate native memory), allowing system memory to be used as freely as in C.
  6. System related: low-level platform information such as the address size and memory page size.

JUC atomic class

Introduction to JUC Atomic Class

According to the modified data type, the atomic operation classes in the JUC package can be divided into 4 categories.

  1. Basic types: AtomicInteger, AtomicLong, AtomicBoolean;
  2. Array types: AtomicIntegerArray, AtomicLongArray, AtomicReferenceArray;
  3. Reference types: AtomicReference, AtomicStampedReference, AtomicMarkableReference;
  4. Object attribute modification types: AtomicIntegerFieldUpdater, AtomicLongFieldUpdater, AtomicReferenceFieldUpdater.
    The purpose of these classes is to perform atomic operations on the corresponding data type. An atomic operation is one that cannot be interrupted partway through, guaranteeing that the data is manipulated atomically.
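The fourth category is the least obvious, so here is a minimal sketch (illustrative, not from the original) of updating an ordinary object's volatile field atomically with `AtomicIntegerFieldUpdater`:

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

public class FieldUpdaterDemo {
    static class Counter {
        // the field managed by the updater must be a volatile int
        volatile int hits;
    }

    private static final AtomicIntegerFieldUpdater<Counter> HITS =
            AtomicIntegerFieldUpdater.newUpdater(Counter.class, "hits");

    public static void main(String[] args) {
        Counter c = new Counter();
        HITS.incrementAndGet(c); // CAS on c.hits, no AtomicInteger object needed
        HITS.incrementAndGet(c);
        System.out.println(c.hits); // 2
    }
}
```

The updater avoids allocating a separate AtomicInteger per object, which matters when there are many Counter instances.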

Basic type AtomicInteger

Principle Interpretation (Take AtomicInteger as an example)
AtomicInteger is located in the java.util.concurrent.atomic package. It encapsulates int and provides atomic access and update operations. The implementation of atomic operations is based on CAS.

  • CAS operation

  • The value field is declared volatile

  • Analysis on the Principle of AtomicInteger

public class AtomicInteger extends Number implements java.io.Serializable {
    private static final long serialVersionUID = 6214790243416807050L;

    // setup to use Unsafe.compareAndSwapInt for updates
    private static final Unsafe unsafe = Unsafe.getUnsafe();
    private static final long valueOffset;

    static {
        try {
            valueOffset = unsafe.objectFieldOffset
                (AtomicInteger.class.getDeclaredField("value"));
        } catch (Exception ex) { throw new Error(ex); }
    }

    // modified with the volatile keyword
    private volatile int value;
}

As can be seen from AtomicInteger's internal fields, it relies on low-level capabilities provided by Unsafe (CAS, and modifying a variable through its memory offset): for example, valueOffset records the offset of the value field in memory and is used to locate the data.
The variable value is modified with volatile to ensure memory visibility between multiple threads.
Let's take getAndIncrement as an example to illustrate the atomic update process:

public final int getAndIncrement() {
    return unsafe.getAndAddInt(this, valueOffset, 1);
}

// Unsafe.getAndAddInt
public final int getAndAddInt(Object var1, long var2, int var4) {
    int var5;
    do {
        var5 = this.getIntVolatile(var1, var2);
    } while(!this.compareAndSwapInt(var1, var2, var5, var5 + var4));

    return var5;
}

Suppose thread 1 and thread 2 both read value as 1 via getIntVolatile, then thread 1 is suspended while thread 2 continues.
In thread 2's compareAndSwapInt, the expected value and the memory value are both 1, so the memory value is successfully updated to 2.
Thread 1 then resumes; in its compareAndSwapInt the expected value is 1 but the memory value is now 2, so the CAS fails, nothing is changed, and false is returned.
Thread 1 reads the latest value 2 via getIntVolatile again and performs another compareAndSwapInt, which succeeds and updates the memory value to 3.
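The end state of that scenario is deterministic regardless of which thread loses the race, as this minimal sketch shows:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class GetAndIncrementDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger value = new AtomicInteger(1); // both threads start from 1
        Thread t1 = new Thread(value::getAndIncrement);
        Thread t2 = new Thread(value::getAndIncrement);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Whichever thread loses the CAS retries with the fresh value,
        // so both increments land and the final value is deterministic.
        System.out.println(value.get()); // 3
    }
}
```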

Others
AtomicLong performs atomic operations on long integers. On a 32-bit platform, operations on 64-bit long and double variables are not atomic, because the JVM may treat them as two separate 32-bit operations. Using AtomicLong keeps operations on a long atomic.

Array type

AtomicIntegerArray, AtomicLongArray, and AtomicReferenceArray are the three array-type atomic classes; their principle and usage are similar.
AtomicIntegerArray corresponds to AtomicInteger, AtomicLongArray to AtomicLong, and AtomicReferenceArray to AtomicReference.
Where AtomicLong performs atomic operations on a single long, AtomicLongArray performs atomic operations on the elements of a long array.
unsafe is the Unsafe object returned by Unsafe.getUnsafe(); atomic element updates go through Unsafe's CAS functions.
In fact, reading the source shows that these array atomic classes differ from the corresponding scalar atomic classes only in one extra step: computing the element's memory address from its index.
Note: an atomic array does not mean a thread can operate on all elements of the array atomically at once; it means each individual element can be operated on atomically.
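A minimal sketch of per-element atomicity with `AtomicIntegerArray` (illustrative, not from the original):

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

public class AtomicArrayDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicIntegerArray slots = new AtomicIntegerArray(4); // all elements start at 0
        Runnable task = () -> {
            for (int i = 0; i < 1_000; i++) {
                slots.incrementAndGet(i % 4); // atomic update of one element at a time
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // 2 threads x 250 increments per slot -> every slot ends at 500
        System.out.println(slots); // [500, 500, 500, 500]
    }
}
```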

Thread Pool

The java.util.concurrent package provides ready-made thread pool implementations.


Using a pool takes roughly three steps:

  1. Call a static factory method of the Executors class to create the thread pool
  2. Call the pool's submit method to submit Runnable or Callable objects
  3. When no more tasks need to be added, call shutdown to close the entrance
// create the thread pool object
ExecutorService service = Executors.newCachedThreadPool();
// create a Runnable that prints increasing values of i
Runnable runnable = new Runnable() {
    @Override
    public void run() {
        for (int i = 0; i < 400; i++) {
            System.out.println(i);
        }
    }
};
// submit the runnable to the pool (submitted tasks are executed automatically)
service.submit(runnable);
service.submit(runnable);
// when no more tasks are needed, call shutdown to close the entrance
service.shutdown();

Note that shutdown cannot stop the pool's tasks outright: it only closes the entrance, and tasks already added continue to execute. To stop everything, call the pool's shutdownNow method instead, which closes the entrance and attempts to terminate all tasks currently in the pool.

// closes the pool's entrance and attempts to stop all currently executing tasks
service.shutdownNow();

The submit method of service returns an object of type Future<?>. What is this type? Let's look at the method summary in the API:
(figure: img/java-thread-future-api.png — Future API method summary)
The method summary shows that the object is used to cancel a task after it has joined the pool, check its status, and so on. If you may need to cancel a task later, save the Future object when submitting.
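A minimal sketch of saving the Future returned by submit and using it (illustrative values, not from the original):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService service = Executors.newCachedThreadPool();

        // keep the Future so the task can be queried or cancelled later
        Future<Integer> future = service.submit(() -> 21 + 21); // Callable<Integer>

        System.out.println(future.get());    // blocks until the result is ready
        System.out.println(future.isDone()); // true once the task has completed
        service.shutdown();
    }
}
```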

Why use thread pool:

  • Pooling is applied widely: thread pools, database connection pools, HTTP connection pools, etc.
  • The idea of pooling is to reduce the cost of acquiring a resource each time and improve resource utilization.
  • A thread pool provides a strategy for limiting and managing resources; each pool also maintains basic statistics, such as the number of completed tasks.

Benefits of using thread pool:

  • Lower resource consumption: reusing already-created threads reduces the cost of thread creation and destruction.
  • Faster response: when a task arrives it can be executed immediately, without waiting for a thread to be created.
  • Better thread manageability: threads are a scarce resource; creating them without limit not only consumes system resources but also reduces system stability. A thread pool allows unified allocation, monitoring, and tuning.

The top-level interface of the thread pool framework in Java is Executor, but strictly speaking Executor is not a thread pool, only a tool for executing tasks. The real thread pool interface is ExecutorService.

(figure: img/java-thread-executor-framework.png — Java Executor framework)

  • The main thread first creates a task object implementing the Runnable or Callable interface.
  • The created Runnable/Callable object is handed to the ExecutorService for execution:
  • ExecutorService.execute(Runnable command), ExecutorService.submit(Runnable task), or ExecutorService.submit(Callable<T> task).
  • If you call ExecutorService.submit(...), the ExecutorService returns an object implementing the Future interface. The main thread can then call FutureTask.get() to wait for the task to complete, or FutureTask.cancel() to cancel its execution.

(figure: img/java-thread-pool-process.png — thread pool processing flow)

There is a simple and widely used formula:

  • CPU-intensive tasks (N+1): these tasks mainly consume CPU, so set the thread count to N (the number of CPU cores) + 1. The one extra thread guards against occasional pauses such as page faults or other interruptions: when a task stalls, the CPU would otherwise sit idle, and the extra thread can make use of that idle time.
  • I/O-intensive tasks (2N): such tasks spend most of their time on I/O interaction, during which the thread does not occupy the CPU and can hand it over to other threads. Applications dominated by I/O can therefore be configured with more threads, commonly computed as 2N.
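The two sizing rules above can be computed from the runtime (a minimal sketch; the printed numbers depend on the machine):

```java
public class PoolSizing {
    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        int cpuBound = n + 1;  // CPU-intensive rule: N + 1
        int ioBound  = 2 * n;  // I/O-intensive rule: 2N
        System.out.println("cores=" + n + " cpuBound=" + cpuBound + " ioBound=" + ioBound);
    }
}
```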

Strategy processing
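This heading presumably refers to the pool's saturation (rejection) policies; the original leaves it empty, so here is a minimal sketch, assuming ThreadPoolExecutor's built-in AbortPolicy handler, of what happens when a bounded pool is saturated:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionDemo {
    public static void main(String[] args) {
        // 1 worker and a queue of capacity 1: the third submission must be rejected
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy()); // default policy: throw on saturation

        Runnable sleepy = () -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
        };
        pool.execute(sleepy); // occupies the single worker
        pool.execute(sleepy); // sits in the queue
        try {
            pool.execute(sleepy); // worker busy, queue full -> rejected
        } catch (RejectedExecutionException e) {
            System.out.println("rejected");
        }
        pool.shutdown();
    }
}
```

The other built-in handlers (CallerRunsPolicy, DiscardPolicy, DiscardOldestPolicy) change what happens at this saturation point instead of throwing.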

References

  • Why does Java have concurrency problems? https://blog.csdn.net/somehow1002/article/details/97049957
  • Java concurrency scenarios & reasons & problems talk https://blog.csdn.net/zangdaiyang1991/article/details/98481346
  • What is CAS https://www.jianshu.com/p/ab2c8fce878b
  • An article to understand the Unsafe class in Java https://www.jb51.net/article/140726.htm
  • Talking about the realization principle of AtomicInteger https://www.jianshu.com/p/cea1f9619e8f

Three concurrency scenarios

Division of labor
Division of labor is the most basic scenario for multi-threaded concurrency: each thread performs its own duties and completes a different task. There are many patterns of division of labor. For example:

The producer-consumer pattern;
The MapReduce pattern, which splits work into pieces that multiple threads complete together before the results are merged. Stream and Fork/Join in Java 8 embody this pattern;
The Thread-Per-Message pattern, used by servers that hand each incoming message to a different thread for processing.
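The first pattern above can be sketched with a BlockingQueue (a minimal, illustrative example; the sentinel value is an assumption of this sketch):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);
        final int POISON = -1; // sentinel telling the consumer to stop

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 3; i++) queue.put(i); // blocks when the queue is full
                queue.put(POISON);
            } catch (InterruptedException ignored) { }
        });
        Thread consumer = new Thread(() -> {
            try {
                int sum = 0;
                for (int item = queue.take(); item != POISON; item = queue.take()) {
                    sum += item; // take() blocks when the queue is empty
                }
                System.out.println("sum=" + sum);
            } catch (InterruptedException ignored) { }
        });
        producer.start(); consumer.start();
        producer.join(); consumer.join();
    }
}
```

The queue itself does the synchronization: neither side needs explicit locks or wait/notify.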
Synchronization
Where there is a division of labor there must be synchronization: different workers need to cooperate, and so do different threads. The execution of one thread often depends on the result of another.

The most basic communication mechanism between threads is the monitor pattern with wait/notify. Beyond that there are several tool classes, such as:

Future and its derived tools FutureTask/CompletableFuture, etc., which enable asynchronous programming;
CountDownLatch/CyclicBarrier, which provide coordination in specific scenarios;
Semaphore, which offers the classic PV synchronization primitive and can also act as a rate limiter;
ReentrantLock and Condition, an extension of monitor-style synchronization.

Mutual exclusion
When multiple threads access the same shared variable, mutual exclusion is required. Division of labor and synchronization emphasize performance; mutual exclusion emphasizes correctness, i.e. thread safety. Java provides many ideas and tools to solve the mutual exclusion problem.

Avoid sharing: with nothing shared there is no race and no harm, e.g. ThreadLocal;
Avoid mutation: if everything is read-only and nobody writes, nothing can go wrong;
Copy-on-write: each writer works on its own copy, producing a new copy on every change; as long as there is no conflict, work can proceed in parallel;
CAS: before writing, check whether the variable is still the same as when you read it; if so, write, otherwise start over;
Locks are the last resort, but should not be heavier than necessary: ReadWriteLock/StampedLock offer just-enough locking.
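The "avoid sharing" option can be sketched with ThreadLocal (a minimal, illustrative example):

```java
public class ThreadLocalDemo {
    // Each thread sees its own StringBuilder, so there is no shared state to race on
    private static final ThreadLocal<StringBuilder> BUF =
            ThreadLocal.withInitial(StringBuilder::new);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            BUF.get().append(Thread.currentThread().getName());
            // true in both threads: each buffer only ever saw its own thread's name
            System.out.println(BUF.get().length() == Thread.currentThread().getName().length());
        };
        Thread a = new Thread(task, "worker-a");
        Thread b = new Thread(task, "worker-b");
        a.start(); b.start();
        a.join(); b.join();
    }
}
```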

Causes of concurrency problems
Visibility problems caused by caching
At runtime the same data exists in two copies, one in main memory and one in the CPU cache, and each CPU has its own cache (cf. the JMM memory model).

Atomicity problems caused by thread switching
A computer appears to run more threads than it has cores because of the time-sharing scheduling of modern operating systems. Time-sharing improves CPU utilization and gives threads relatively fair access to the CPU, but it inevitably causes thread switches. When a switch occurs, the suspended thread's context, including the PC (program counter) and stack, is saved; when it is woken again, it may find the world has changed underneath it, because a single high-level-language statement can correspond to multiple CPU instructions.

Ordering problems caused by compiler optimization
Java may reorder instructions to optimize performance. These reorderings are harmless most of the time, but they can occasionally cause surprising bugs. The classic reordering bug is the double-checked locking singleton.
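The standard fix for that classic bug is making the instance field volatile (a minimal sketch of the double-checked locking idiom):

```java
public class Singleton {
    // volatile forbids the reordering that could publish a
    // partially constructed instance to another thread
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                 // first check, without the lock
            synchronized (Singleton.class) {
                if (instance == null) {         // second check, under the lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // same instance
    }
}
```

Without volatile, `instance = new Singleton()` may be reordered so the reference is written before the constructor finishes, and another thread could observe a half-built object at the first, unlocked check.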

The three kinds of concurrency problems
Safety problems
Concurrent programs can become incorrect because of visibility, atomicity, and ordering issues.

Liveness problems
refer to an operation that can never proceed, such as a deadlock.

Performance problems
are generally caused by the abuse of locks.
There are three main indicators for performance: throughput, latency, and concurrency.

Throughput: the number of requests processed per unit time;
Latency: the average time taken to process a single request;
Concurrency: the number of requests that can be served at the same time.

Spring Boot uses the HikariCP connection pool, known as the fastest. When the application shuts down, the log prints:

Disconnected from the target VM, address: '127.0.0.1:57835', transport: 'socket'
2020-05-18 21:28:54.525  INFO 15232 --- [extShutdownHook] o.s.s.concurrent.ThreadPoolTaskExecutor  : Shutting down ExecutorService 'applicationTaskExecutor'
2020-05-18 21:28:54.526  INFO 15232 --- [extShutdownHook] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown initiated...
2020-05-18 21:28:55.925  INFO 15232 --- [extShutdownHook] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown completed.

Origin blog.csdn.net/weixin_43314519/article/details/113831223