The three core issues of concurrent multi-threaded Java programming

Outline

Concurrent programming is one of the important features of the Java language. It lets us express complex logic more naturally and thus greatly simplifies the development of complex systems. Concurrent programs can exploit the computing power of multi-processor systems, and as the number of processor cores keeps growing, making efficient use of concurrency becomes more and more important. But concurrent programs are hard to develop and even harder to get right: they are prone to bugs, and those bugs tend to be strange, hard to track down, and hard to reproduce. To solve these problems we first have to understand where they come from, which means clarifying the nature of concurrent programming and the problems it has to solve. This article explains the three core problems of concurrency: atomicity, visibility, and ordering.

Basic concepts

Hardware development

As hardware has developed, a contradiction has emerged: the CPU, memory, and I/O devices differ enormously in speed.

Speed ranking: CPU >> memory >> I/O devices

To balance the speed differences among the three, the following optimizations were made:

  1. The CPU gained caches, to balance the speed difference between the CPU and memory;

  2. The operating system added processes and threads to time-share the CPU, to balance the speed difference between the CPU and I/O devices;

  3. The compiler optimizes the order of instruction execution so that caches can be used more effectively.

These optimizations improved speed and performance, but they were also accompanied by a variety of new problems, for example the problems that arise when multiple threads run on multiple CPUs.

 

Concurrency and parallelism

The definitions quoted in The Art of Concurrency, by the American author Breshears, are:

If a system can support two or more actions existing at the same time, it is a concurrent system. If a system can support two or more actions executing at the same time, it is a parallel system. The key difference between these two definitions lies in the word "existing".

In a concurrent program, two or more threads can be in progress at the same time. This means that if the program runs on a single-core processor, the two threads are alternately swapped into and out of memory. The threads "exist" simultaneously: each of them is somewhere in the middle of its execution. If the program can execute in parallel, it must be running on a multi-core processor. In that case each thread of the program is assigned to its own processor core, so they really do run at the same time.
You can probably already draw the conclusion: "parallel" is a subset of "concurrent". In other words, you can write a concurrent program with multiple threads or processes, but without a multi-core processor to execute it, the code cannot run in parallel. Therefore, any programming model that deals with more than one flow of execution within a single run of a program falls within the scope of concurrent programming.
 
The concept of reordering
To improve performance when executing a program, the compiler and the processor often reorder instructions.
On the way from the Java source code to the instruction sequence that is actually executed, three kinds of reordering take place:
 

 

 1. Compiler reordering: as long as the semantics of the single-threaded program are not changed, the compiler may rearrange the execution order of statements.

2. Instruction-level parallelism (ILP) reordering: processors overlap the execution of multiple instructions; if there is no data dependence, the processor may change the order in which the machine instructions corresponding to the statements are executed.

3. Memory-system reordering: because the processor uses caches and read/write buffers, loads and stores may appear to execute out of order.

Reordering has to comply with certain rules:

Data dependence: if two operations access the same variable and one of them is a write, there is a data dependence between them. Operations with a data dependence on each other must not be reordered. Note that data dependence is only considered within the instruction sequence of a single processor and the operations of a single thread.

As-if-serial semantics: no matter how instructions are reordered, the result of a single-threaded program must not change.
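
As a rough sketch of these two rules (the ReorderRules class and its fields are my own illustration, not from the article): the two independent writes below may legally be swapped by the compiler or processor, while the write/read pair with a data dependence may not.

public class ReorderRules {
    int a, b, c;

    void independentWrites() {
        a = 1;      // no data dependence between these two statements,
        b = 2;      // so they may be reordered: a single thread cannot tell the difference (as-if-serial)
    }

    void dependentWriteThenRead() {
        a = 1;      // write a
        c = a + 1;  // read a after writing it: a data dependence exists,
                    // so these two statements must not be reordered
    }
}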

However, reordering also brings problems: it leads to the visibility and ordering problems of multi-threaded programs. These are described one by one below.

Java memory model (JMM)

The Java memory model (Java Memory Model) describes the access rules for variables in a Java program (variables shared among threads), together with the low-level details of how such variables are stored into and read from memory in the JVM.

"Variable" here means a shared variable.

1. All shared variables are stored in main memory.

2. Each thread has its own independent working memory, which holds copies of the shared variables the thread uses (copies of the main-memory variables).

For example, a variable x that is read and written by more than one thread is such a shared variable: each thread works on its own copy of x, and the copies are synchronized through main memory.

 

Visibility

In short: when one thread modifies a shared variable, other threads can immediately see the modification. This is what we call visibility.

Why do visibility problems arise?

On today's multi-core processors, each CPU has its own cache, and that cache is visible only to the processor it belongs to, so the data in the CPU cache and in main memory do not stay consistent easily. To avoid the delay of stalling while waiting for data to be written to memory, the processor uses a write buffer to temporarily hold the data being written. The write buffer merges multiple writes to the same memory address and flushes them in batches, which means it does not flush data to main memory immediately. Caches that are not flushed in time are what cause visibility problems.

Example:

public class Test {
	public int a = 0;

	public void increase() {
		a++;
	}

	public static void main(String[] args) {
		final Test test = new Test();
		for (int i = 0; i < 10; i++) {
			new Thread() {
				@Override
				public void run() {
					for (int j = 0; j < 1000; j++)
						test.increase();
				}
			}.start();
		}

		while (Thread.activeCount() > 1) {
			// wait until all the worker threads above have finished
			Thread.yield();
		}
		System.out.println(test.a);
	}
}

  

Goal: the 10 threads increment a up to 10000.

Result: on every run, the printed value is less than 10000.

Analysis:

Suppose thread 1 and thread 2 start at the same time. Each first reads a = 0 into its own CPU cache. Thread 1 executes a++ and gets a = 1, but thread 2 cannot see thread 1's value of a, so thread 2 still has a = 0 and after its own a++ it also gets a = 1.

Now both thread 1's and thread 2's caches hold the value 1, and when they each write their cached a = 1 back to main memory, memory ends up with a = 1 instead of the 2 we expected. This is why the final value of a always comes out below 10000. This is the cache visibility problem.

To make a shared variable visible, two things must be guaranteed:

1. After a thread modifies the shared variable, the new value is flushed from its working memory to main memory in time.
2. Other threads promptly refresh the latest value of the shared variable from main memory into their own working memory.
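
A minimal sketch of how these two guarantees can be obtained (the StopFlag class below is my own illustration, not the article's code): declaring the shared variable volatile forces the writing thread to flush the new value to main memory and forces every reader to reload it from main memory.

public class StopFlag {
    // Without volatile, the worker thread may keep looping on a stale cached copy of stop.
    // volatile guarantees the write is flushed to main memory and every read fetches the latest value.
    private volatile boolean stop = false;

    public void requestStop() {
        stop = true;               // immediately visible to the worker thread
    }

    public void work() {
        while (!stop) {
            // keep working until another thread calls requestStop()
        }
    }
}

Note that volatile only addresses visibility; it does not make a compound operation such as a++ atomic, which is exactly the problem the next section turns to.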

 

Atomicity

Atomicity: the property that one or more operations are executed by the CPU without being interrupted is called atomicity.

In concurrent programming, atomicity should not be defined the same way as atomicity in transactions (where code can be rolled back if it fails). It should be understood as: a piece of code, or an operation on a variable, cannot be executed by other threads before one thread has finished executing it. In other words:

1. Atomic operations only matter for multiple threads; for a single thread there is no notion of atomicity. Anyone with basic multi-threading knowledge knows this, but it is worth keeping in mind.

2. Atomic operations concern shared variables. There is no need to require atomicity for local variables (such as variables inside a method).

3. An atomic operation is indivisible. (Viewed from the perspective of multiple threads:) to any thread other than the one executing it, an operation on a shared variable either has already completed or has not yet started; other threads never see an intermediate result. Anyone who has studied databases will be familiar with this kind of atomicity. Looking at it from the perspective of the variables being accessed: if we change an object that contains a group of shared variables that must be changed together, then once one thread starts changing them, other threads must see either all of the object's fields modified or none of them, never a partially modified intermediate state.

Also remember that in the Java language, a write to a variable of any type other than long and double is an atomic operation. (Reads are not mentioned here because if all threads only read, there is no need for atomicity.)
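
A small sketch of the long/double caveat (the LongHolder class is my own illustration): on a 32-bit JVM a write to a plain long may be split into two 32-bit writes, so another thread could observe a half-written value; declaring the field volatile makes single reads and writes of it atomic.

public class LongHolder {
    // volatile guarantees that reads and writes of this 64-bit field are atomic
    // (and also makes the writes visible to other threads)
    private volatile long value;

    public void set(long v) { value = v; }   // performed as one indivisible write
    public long get()       { return value; }
}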

Why do atomicity problems arise?

A thread is the basic unit of CPU scheduling. The CPU schedules threads according to its scheduling algorithm and assigns time slices to them. When a thread gets a time slice it starts executing, and when the time slice runs out it loses the CPU. In a multi-threaded scenario, because time slices rotate among threads, atomicity problems occur.

For example: a thread has not finished executing a piece of code when its time slice runs out, so it has to wait for the CPU to assign it another one. Meanwhile another thread can get a time slice and execute the same piece of code, so multiple threads end up executing the same code at the same time. That is the atomicity problem.

Thread switching causes atomicity problems.

In Java, reads of and assignments to variables of primitive types are atomic operations, meaning they cannot be interrupted: they either happen completely or not at all.

 

i = 0;		// atomic: a single assignment of a constant to i
j = i;		// not atomic: two operations, read i, then assign its value to j
i++; 		// not atomic: three operations, read i, compute i + 1, assign the result back to i
i = j + 1;	// not atomic: three operations, read j, compute j + 1, assign the result to i

 

Example: the same code as above, with 10 threads incrementing a up to 10000. Even assuming visibility is guaranteed, the atomicity problem still keeps the result from reaching the expected value. For convenience the code is repeated here:

public class Test {
	public int a = 0;

	public void increase() {
		a++;
	}

	public static void main(String[] args) {
		final Test test = new Test();
		for (int i = 0; i < 10; i++) {
			new Thread() {
				@Override
				public void run() {
					for (int j = 0; j < 1000; j++)
						test.increase();
				}
			}.start();
		}

		while (Thread.activeCount() > 1) {
			// wait until all the worker threads above have finished
			Thread.yield();
		}
		System.out.println(test.a);
	}
}

  

Goal: the 10 threads increment a up to 10000.
Result: on every run, the printed value is less than 10000.

Analysis:

First look at the a++ operation. It actually consists of three steps:

① read a = 0;

② compute 0 + 1 = 1;

③ assign 1 to a.

Guaranteeing the atomicity of a++ means guaranteeing that these three steps cannot be executed by another thread before one thread has finished all of them.

 

The key step: when thread 2 reads the value of a, thread 1 has not yet finished the assignment a = 1, so thread 2 also computes a = 1.

The problem is that the atomicity of a++ is not guaranteed. If it were guaranteed, thread 2 could not execute a++ before thread 1 finished all three steps; then thread 2 would read a = 1 when it executes a++ and we would get the correct result.
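
One way to guarantee the atomicity of a++, sketched below with an AtomicTest class of my own (not the article's code): replacing the plain int with an AtomicInteger turns the read-modify-write into a single indivisible operation, so the ten threads reliably reach 10000; synchronizing increase() would have the same effect.

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicTest {
    private final AtomicInteger a = new AtomicInteger(0);

    public void increase() {
        a.incrementAndGet();   // read, add 1 and write back as one atomic operation
    }

    public int get() {
        return a.get();
    }
}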

 

Ordering

Ordering: the program executes in the order in which the code is written. Out-of-order execution is caused by instruction reordering and by reordering in the memory subsystem, which come from the compiler and processor and from caches and write buffers, respectively.

To optimize performance, the compiler sometimes changes the order of statements in a program. For example, "a = 6; b = 7;" may become "b = 7; a = 6;" after compiler optimization. In this example the compiler changed the order of the statements, but the final result of the program is unaffected. Sometimes, however, the optimizations done by the compiler and interpreter lead to unexpected bugs.

Example:

public class Singleton {
  static Singleton instance;

  static Singleton getInstance() {
    if (instance == null) {
      synchronized (Singleton.class) {
        if (instance == null) {
          instance = new Singleton();
        }
      }
    }
    return instance;
  }
}

  

 

In the getInstance() method we first check whether instance is null; if it is, we lock on Singleton.class and check instance again; if it is still null, we create an instance of Singleton.
This looks perfect: it guarantees that only one thread initializes the singleton, and the synchronized lock is only taken after instance is found to be null. But there is still a problem!

instance = new Singleton(); — creating the object actually takes three steps:
① allocate the memory;
② initialize the Singleton object;
③ assign the address of the memory to instance.

But after reordering, the three steps may become:
① allocate the memory;
② assign the address of the memory to instance;
③ initialize the Singleton object.

What can this lead to?

Thread A executes getInstance() first, and a thread switch happens just after it finishes step ②; execution switches to thread B. If thread B now also calls getInstance(), its first check finds instance != null, so it returns instance directly. But this instance has not been initialized yet; if we access its member variables at this point (they are still at their default values), we may trigger a NullPointerException.
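
The usual remedy, sketched below (this is the standard fix for the pattern, not code from the article): declaring instance as volatile forbids the reordering of steps ② and ③, so no thread can ever see a non-null reference to an object that has not finished initializing.

public class Singleton {
  private static volatile Singleton instance;   // volatile forbids reordering steps ② and ③

  static Singleton getInstance() {
    if (instance == null) {                     // first check, without locking
      synchronized (Singleton.class) {
        if (instance == null) {                 // second check, under the lock
          instance = new Singleton();           // safely published thanks to volatile
        }
      }
    }
    return instance;
  }
}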


 

Summary

The essence of concurrent programming is solving three big problems: atomicity, visibility, and ordering.

Atomicity: the property that one or more operations are executed by the CPU without interruption. Thread switching lets multiple threads execute the same piece of code at the same time, which causes atomicity problems.

Visibility: a modification made to a shared variable by one thread can be seen immediately by other threads. Caches that are not flushed in time cause visibility problems.

Ordering: the program executes in the order in which the code is written. The compiler changes the order of statements to optimize performance, which causes ordering problems.

Takeaway: thread switching, caching, and compiler optimizations all exist to improve performance, yet they create the problems of concurrent programming. This reminds us that when a technique solves one problem it usually brings another; we need to think ahead about the problems a new technique introduces in order to avoid the risks.

Origin www.cnblogs.com/boanxin/p/11706795.html