The source of Java concurrent programming bugs

This article is the third in the "Java High Concurrency" series, first published on the author's personal website.

Everyone has heard of concurrent programming, and it comes up frequently in interviews. Interviewers sometimes ask whether you have any hands-on experience with concurrency and push for details. The result is predictable: the theory can be recited, but practical experience is thin, and what makes it worse is the huge gap between theory and practice. At work, the concurrency of most systems is relatively low; with the help of databases and middleware like Tomcat, we basically never need to write concurrent programs ourselves.

In short, when system concurrency is low, concurrency problems are mostly solved for us by middleware and the database; it is only when the data volume is large and performance matters that we have to write concurrent programs ourselves.

Concurrent programming is a good thing, but there is no free lunch: everything has a price. In exchange for high performance, we have to shoulder the many problems that concurrency brings.

Visibility issues caused by caching

Visibility (shared object visibility) refers to whether one thread's modifications to a shared variable can be seen by other threads. Ideally, when a thread modifies the value of a shared variable, other threads immediately become aware of the modification.

The article Java Memory Model introduced the relationship between threads, working memory, and main memory; review it if it is no longer fresh in your mind.

If two or more threads share an object, an update made by one thread may not be visible to the others: the shared object is initialized in main memory; a thread running on a CPU reads it into that CPU's cache and modifies it there. As long as the CPU cache has not been flushed back to main memory, the modified version of the object is invisible to threads running on other CPUs. The result can be that each thread ends up with its own private copy of the shared object, each sitting in a different CPU cache.

The figure below illustrates this situation. The thread running on the left CPU copies the shared object into its CPU cache and changes the value of the count variable to 2. This modification is invisible to the thread running on the right CPU, because the updated count has not been flushed back to main memory.

Variables change between CPU cache and main memory

Let's demonstrate with the following example. Thread B changes the value of the stopRequested variable, but before the new value has been written back to main memory, thread B moves on to other work. Thread A therefore never learns of thread B's change to stopRequested and keeps looping.

import java.util.concurrent.TimeUnit;

public class VisibilityCacheTest {

  private static boolean stopRequested = false;

  public static void main(String[] args) throws InterruptedException {
    Thread thread1 = new Thread(() -> {
      int i = 0;
      while (!stopRequested) {
        i++;
      }
    }, "A");

    Thread thread2 = new Thread(() -> {
      stopRequested = true;
    }, "B");

    thread1.start();
    TimeUnit.SECONDS.sleep(1); // sleep one second so the loop is already running, to demonstrate the potential infinite loop
    thread2.start();
  }
}

This is a classic piece of code, and many people use this flag-based approach to stop a thread. But will it always work as intended? Will the thread actually stop? Not necessarily. Most of the time the flag does stop the thread, but it is also possible that the thread is never stopped; the probability is small, but whenever it happens, the result is an infinite loop.
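
The standard fix for this visibility problem is to declare the flag volatile; the Java memory model then guarantees that thread B's write becomes visible to thread A's subsequent reads. A minimal sketch, changing only the field declaration of the class above:

  // volatile: the write in thread B is guaranteed to be visible
  // to the read in thread A's loop condition, so the loop terminates
  private static volatile boolean stopRequested = false;

(Thread.interrupt() is the more idiomatic way to ask a thread to stop, but the volatile flag illustrates the visibility guarantee directly.)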

Atomicity problem caused by thread switching

Even a single-core processor supports multi-threaded code execution; the CPU implements this by allocating time slices to the threads. A time slice is the slice of CPU time allocated to a thread, typically a few tens of milliseconds (ms). Because each slice is so short, the CPU switches between threads continuously, giving us the impression that multiple threads are executing at the same time.

The CPU cycles through tasks using a time-slice allocation algorithm: after the current task has run for one time slice, the CPU switches to the next task. The state of the previous task is saved before the switch, so that it can be restored the next time the CPU switches back to that task. This cycle of saving and later reloading a task's state is a context switch.

The schematic diagram of thread switching is as follows:

Schematic diagram of thread switching

Again, a classic example demonstrates the problem:

public class AtomicityTest {

  static int count = 0;

  public static void main(String[] args) throws InterruptedException {
    AtomicityTest obj = new AtomicityTest();
    Thread t1 = new Thread(() -> {
      obj.add();
    }, "A");

    Thread t2 = new Thread(() -> {
      obj.add();
    }, "B");

    t1.start();
    t2.start();

    t1.join();
    t2.join();

    System.out.println("main线程输入结果为==>" + count);
  }

  public void add() {
    for (int i = 0; i < 100000; i++) {
      count++;
    }
  }
}

What the code above does is very simple: it starts 2 threads, each performing 100,000 plus-1 operations on the same shared integer variable. We expect the final printed value of count to be 200,000, but that is not what happens. Run the code and the value of count is very likely not 200,000; the result differs from run to run, and it is almost always less than 200,000. Why?

The auto-increment operation is not atomic: it consists of reading the variable's current value, adding 1, and writing the result back to working memory. These three sub-operations may be interleaved across threads, which can lead to the following situation:

Suppose the value of count is 10 at some moment.

Thread A starts an increment: it reads the original value of count, and is then suspended by a thread switch (if one occurs);

Thread B then performs its increment: it also reads the original value of count. Because thread A has only read count and not yet modified it, the value of count in main memory is unchanged, so thread B reads 10 from main memory, adds 1, writes 11 into its working memory, and finally writes 11 back to main memory.

Thread A then resumes and continues its increment. It has already read count, and note that in thread A's working memory the value of count is still 10; after adding 1, count is 11, which thread A writes to its working memory and then back to main memory.

So although each of the two threads performed one increment, count increased by only 1.
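
The simplest fix here, sketched below as one common approach rather than the only one, is to replace the plain int with java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet() performs the read-add-write sequence as a single atomic operation:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicityFixedTest {

  // AtomicInteger performs the read-modify-write atomically (via CAS)
  static AtomicInteger count = new AtomicInteger(0);

  public static void main(String[] args) throws InterruptedException {
    AtomicityFixedTest obj = new AtomicityFixedTest();
    Thread t1 = new Thread(obj::add, "A");
    Thread t2 = new Thread(obj::add, "B");

    t1.start();
    t2.start();
    t1.join();
    t2.join();

    // Now reliably prints 200000
    System.out.println("Result in main thread ==> " + count.get());
  }

  public void add() {
    for (int i = 0; i < 100000; i++) {
      count.incrementAndGet(); // atomic increment
    }
  }
}

Declaring add() as synchronized would also work, at the cost of taking a lock on every increment.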

Ordering problems caused by compiler optimization

The first article of this Java high-concurrency series introduced some computer fundamentals. To improve CPU efficiency, processors commonly execute instructions out of order: two instructions with no data dependency between them may be executed in a different order without affecting the final result. Similar to the processor's out-of-order execution, the just-in-time compiler of the Java virtual machine performs an analogous optimization called instruction reordering (Instruction Reorder).

A classic example in Java is double-checked locking for creating a singleton, as in the code below. In the getInstance() method, we first check whether instance is null; if it is, we lock Singleton.class and check instance again; if it is still null, we create a Singleton instance.

public class Singleton {
  static Singleton instance;

  static Singleton getInstance() {
    if (instance == null) {
      synchronized (Singleton.class) {
        if (instance == null) {
          instance = new Singleton();
        }
      }
    }
    return instance;
  }
}

Suppose two threads A and B call getInstance() at the same time. Both find instance == null, so both try to lock Singleton.class. The JVM guarantees that only one thread acquires the lock (say thread A), while the other waits (thread B). Thread A creates a Singleton instance and then releases the lock; thread B is awakened and acquires the lock in turn. When thread B re-checks instance == null, it finds that a Singleton instance has already been created, so it does not create another one.

There are three steps to instantiating an object:

(1) Allocate memory space.

(2) Initialize the object.

(3) Assign the address of the memory space to the corresponding reference.

But because the compiler and processor may reorder instructions, the process may actually become:

(1) Allocate memory space.

(2) Assign the address of the memory space to the corresponding reference.

(3) Initialize the object.

With this ordering, a reference to an uninitialized object can escape in a multi-threaded environment: suppose thread A has finished steps (1) and (2) of the reordered sequence when a thread switch occurs. Thread B now calls getInstance(), finds instance non-null, returns it, and may then access an object that has not yet been initialized, with unpredictable results.

Multithreaded initialization object
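
The well-known remedy, valid since the Java 5 memory model, is to declare instance as volatile: the volatile write forbids reordering the assignment of the reference ahead of the object's initialization, so other threads can never observe a partially constructed Singleton. A sketch:

public class Singleton {
  // volatile forbids reordering the write of the reference
  // with the initialization of the object it points to
  private static volatile Singleton instance;

  static Singleton getInstance() {
    if (instance == null) {
      synchronized (Singleton.class) {
        if (instance == null) {
          instance = new Singleton();
        }
      }
    }
    return instance;
  }
}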

Summary

To write good concurrent programs, you must first know where concurrency bugs come from. Only once the "target" is identified can the problem be solved; after all, every solution exists for some problem.

Caching, thread switching, and compilation optimization all share the goal we have when writing concurrent programs: improving performance. But whenever a technique solves one problem, it inevitably introduces another. So when adopting a technique, be clear about what problems it brings and how to avoid them.

Origin: juejin.im/post/7117517103791341605