The principles of multithreading - 2

A preliminary understanding of volatile

Questions raised by a piece of code
The following code demonstrates the impact of using, or not using, the volatile keyword on how updates to a variable are observed:
public class VolatileDemo {
  // Try running this with and without the volatile modifier: without it,
  // the reader thread may never observe stop == true and will loop forever.
  public /*volatile*/ static boolean stop = false;

  public static void main(String[] args) throws InterruptedException {
    Thread thread = new Thread(() -> {
      int i = 0;
      while (!stop) {
        i++;
      }
    });
    thread.start();
    System.out.println("begin start thread");
    Thread.sleep(1000);
    stop = true;
  }
}
The role of volatile
volatile ensures the visibility of shared variables in a multi-processor environment. So what exactly is visibility? Think about it: if we first write a value to a variable and then read that variable in a single-threaded environment with no intervening writes, the value we read is naturally the value we just wrote. That is the normal, expected behaviour. In a multi-threaded environment, however, where the read and the write happen on different threads, a reading thread may not observe the latest value written by another thread in a timely manner. This is the visibility problem. To make writes to memory visible across threads, some mechanism is required, and volatile is one such mechanism.
How does the volatile keyword ensure visibility?
We can use the hsdis tool to look at the assembly generated for the code above (see its installation instructions for how to set it up). Run the code with the following JVM parameters: -server -Xcomp -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:CompileCommand=compileonly,*App.* (replace App with the class actually being run). Searching the output for the lock prefix shows that when the member variable is declared volatile, an extra lock-prefixed instruction is emitted. lock is a control instruction for multi-processor environments; the locking mechanism may be implemented either as a bus lock or as a cache lock, and it is this locking that produces the visibility effect. To understand the nature of visibility more deeply, we need to start at the hardware level.
Understanding the nature of visibility from the hardware level
The core components of a computer are the CPU, memory, and I/O devices. Throughout the evolution of computing, the CPU, memory, and I/O devices have all been upgraded iteratively to improve processing performance, but one central contradiction remains: the difference in processing speed between the three. The CPU is extremely fast, memory is slower, and I/O devices such as disks are slowest of all. Most programs access memory to some degree, and some also access I/O devices. To improve computing performance, CPUs have gone from single-core to multi-core and even use hyper-threading to squeeze out as much performance as possible, but improving the CPU alone is not enough: if the other two cannot keep up, the overall efficiency is determined by the slowest device. To balance the speed differences of the three and make the best use of the CPU, a great deal of optimization has been done at the hardware, operating system, and compiler levels:
1. The CPU added caches to balance the speed difference with memory
2. The operating system added processes and threads, switching between them on CPU time slices to maximize CPU utilization
3. The compiler optimizes instructions to make better use of the CPU cache. Each of these optimizations, in turn, brings corresponding problems, and these problems are the root cause of thread-safety issues. To understand the nature of the visibility problem mentioned earlier, we need to look at these optimizations.
CPU cache
A thread is the smallest unit of CPU scheduling, and the ultimate purpose of threads is still to make better use of the computer's processing power. However, the vast majority of computing tasks cannot be completed by the processor's "computation" alone; the processor must interact with memory, for example to read operands and to store results, and this interaction is hard to eliminate. Because the gap between the speed of the computer's storage devices and that of the processor is so large, modern computer systems add a cache, whose read/write speed is as close as possible to the processor's speed, as a buffer between memory and the processor: the data needed by an operation is copied into the cache so that the computation can run quickly, and when the computation finishes the result is synchronized from the cache back to memory. Caching resolves the speed mismatch between the processor and memory nicely, but it also adds complexity to the computer system, because it introduces a new problem: cache coherency.
What is cache coherency?
With caches present, each CPU works as follows: the data to be used is first loaded into the CPU's cache, the CPU reads the data directly from the cache during computation, and writes the result back to the cache when it is done. Only after the whole operation completes is the cached data synchronized back to main memory. Because there are multiple CPUs, each thread may run on a different CPU, and each CPU has its own cache, the same data may be cached by several CPUs at once. If threads running on different CPUs see different values for the same piece of memory, we have a cache-inconsistency problem. To solve it, a lot of work was done at the CPU level.
Two main solutions are provided:
1. Bus locks
2. Cache lock
Bus locks and cache locks
A bus lock, simply put, means that in a multi-CPU system, when one processor wants to operate on shared memory it asserts a LOCK# signal on the bus. This signal prevents other processors from accessing the shared data through the bus; the bus lock effectively locks the communication between the CPUs and memory, so while the lock is held, other processors cannot operate on data at other memory addresses either. The overhead of a bus lock is therefore quite large, and this mechanism is clearly inappropriate.
How can this be optimized? The best way is to control the granularity of the lock: we only need to ensure that, for the same piece of data cached by multiple CPUs, the caches stay consistent. Hence the cache lock mechanism was introduced; at its core it relies on the cache coherency protocol.
Cache coherency protocol
To keep data access consistent, each processor must follow certain protocols when accessing its cache and operate according to them when reading and writing. Common protocols include MSI, MESI, MOSI, and so on; the most common is MESI, which is briefly explained next.
MESI stands for the four states a cache line can be in, namely:
1. M (Modified): the shared data is cached only in the current CPU's cache and has been modified, i.e. the cached data is inconsistent with the data in main memory
2. E (Exclusive): the cache line is in the exclusive state; the data is cached only in the current CPU's cache and has not been modified
3. S (Shared): the data may be cached by multiple CPUs, and the data in each cache is consistent with main memory
4. I (Invalid): the cache line is invalid
In the MESI protocol, each cache's cache controller not only knows about its own reads and writes but also snoops on the reads and writes of the other caches. From the CPU's point of view, reads and writes follow these rules:
CPU read request: a cache line in the M, E, or S state can be read directly; in the I state the CPU can only read the data from main memory
CPU write request: a cache line can only be written in the M or E state. To write a line in the S state, the copies in the other CPUs' caches must first be invalidated
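To make the states and the read/write rules above a bit more concrete, here is a minimal, purely illustrative Java sketch of a MESI-style cache line. It models the protocol only conceptually; the class and method names are invented for this example, and the real protocol lives in hardware.

public class MesiSketch {
    // The four MESI states of a cache line
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    static class CacheLine {
        State state = State.INVALID;
        int data;

        // Read rule: M, E and S can be read locally; I must go to main memory
        boolean canReadLocally() {
            return state != State.INVALID;
        }

        // Write rule: only M and E can be written directly;
        // S requires invalidating the copies in the other CPUs' caches first
        boolean canWriteWithoutInvalidate() {
            return state == State.MODIFIED || state == State.EXCLUSIVE;
        }

        // What the snooping cache controller does when another CPU writes the same line
        void onRemoteWrite() {
            state = State.INVALID;
        }
    }

    public static void main(String[] args) {
        CacheLine line = new CacheLine();
        line.state = State.EXCLUSIVE;
        System.out.println("can write directly: " + line.canWriteWithoutInvalidate()); // true
        line.onRemoteWrite(); // another CPU modified the same data
        System.out.println("can read locally: " + line.canReadLocally());              // false
    }
}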
With the bus lock and cache lock mechanisms in place, the CPU's operations on memory can be roughly abstracted into the structure shown below, which is how the effect of cache coherency is achieved.
Summarizing the essence of visibility
Because of CPU caches, if multiple CPUs cache the same shared data at the same time, visibility problems can arise: a change CPU0 makes in its own local cache is not visible to CPU1. The consequence is that when CPU1 later operates on that data, it uses stale (dirty) data, making the final result unpredictable. Many readers will want to simulate this visibility problem in code, but it is actually very hard to do. We cannot pin a thread to a particular CPU; that is decided by low-level scheduling algorithms which even the JVM cannot control. More importantly, we cannot predict when the CPU will flush a cached value to main memory; the interval may be so short that it cannot be observed. Finally, there is the question of thread execution order: we cannot control which statement of one thread runs immediately after which statement of another. So we can only understand, from the principles, that the problem objectively exists. Having learned this much, you should have a question: didn't we just say that cache coherency, whether via bus locks or cache locks based on the cache coherency protocol, already satisfies the requirement?
Why do we still need the volatile keyword? Or, why do visibility problems still exist?
Visibility problems caused by MESI optimizations
Although MESI achieves cache coherency, it has its own problems: the state of each CPU's cache lines is maintained through message passing. If CPU0 wants to write to a shared variable in its cache, it must first send an invalidate message to the other CPUs that cache the data and wait for their acknowledgements. During this time CPU0 is blocked. To avoid wasting resources while blocked, store buffers were introduced into the CPU: when writing shared data, CPU0 writes the data directly into its store buffer, sends the invalidate message at the same time, and then carries on processing other instructions.
Only when the invalidate acknowledgements from all the other CPUs have been received is the data in the store buffer stored into the cache line, and finally synchronized from the cache line to main memory.
But this optimization introduces two problems:
1. When the data is actually committed is uncertain, because it has to wait for the other CPUs to reply before the data is synchronized; this is effectively an asynchronous operation
2. After store buffers are introduced, the processor first tries to read a value from its store buffer; if the store buffer holds the data it reads it from there directly, otherwise it reads from the cache line
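As a rough illustration of point 2 (this is only a toy model, not how a real CPU is implemented; all class and method names here are invented), the store-buffer behaviour described above could be sketched like this:

import java.util.HashMap;
import java.util.Map;

// Toy model of one CPU's store buffer sitting in front of its cached data
class StoreBufferModel {
    private final Map<String, Integer> cache = new HashMap<>();
    private final Map<String, Integer> storeBuffer = new HashMap<>();

    // A write first goes into the store buffer while the invalidate message is in flight
    void write(String variable, int value) {
        storeBuffer.put(variable, value);
    }

    // Once every other CPU has acknowledged the invalidate, the value moves to the cache line
    void onAllInvalidateAcks(String variable) {
        Integer v = storeBuffer.remove(variable);
        if (v != null) {
            cache.put(variable, v);
        }
    }

    // A read prefers the store buffer (store forwarding), otherwise it falls back to the cache line
    Integer read(String variable) {
        Integer buffered = storeBuffer.get(variable);
        return buffered != null ? buffered : cache.get(variable);
    }
}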
Let's look at an example:
int value = 3;
boolean isFinish = false;

void exeToCPU0() {
    value = 10;
    isFinish = true;
}

void exeToCPU1() {
    if (isFinish) {
        assert value == 10;
    }
}
exeToCPU0 and exeToCPU1 run on two different CPUs. Suppose the cache line in CPU0 holding the shared variable isFinish is in the Exclusive (E) state, while value is in the Shared (S) state. When CPU0 executes, it first writes value = 10 into its store buffer and notifies the other CPUs caching value to invalidate their copies. While waiting for their replies, CPU0 continues with the isFinish = true instruction, and because isFinish is in the Exclusive state in CPU0's cache, it can be modified directly. At this point CPU1 may issue a read and see isFinish as true while value is still not 10. We can regard this as out-of-order execution by the CPU, or as a kind of reordering, and this reordering brings a visibility problem. The hardware engineers' position is understandable: the hardware level cannot know the dependencies that exist at the software level, so there is no way to solve this automatically. So the hardware engineers said: since no optimization can satisfy all of your requirements, decide it yourselves. Therefore the CPU provides memory barrier instructions; from the hardware perspective a memory barrier is an instruction that flushes the store buffer. The software level can then decide where to insert memory barriers.
CPU-level memory barriers
What is a memory barrier? From the preceding content you can probably already guess: a memory barrier is an instruction that writes the data in the store buffer out to memory, so that other threads accessing the same shared memory can see it. x86's memory barrier instructions include lfence (load barrier), sfence (store barrier), and mfence (full barrier).
A Store Memory Barrier (write barrier) tells the processor to synchronize everything already stored in the store buffers to main memory before the barrier; simply put, it makes the results of writes issued before the barrier visible to the reads or writes issued after it.
A Load Memory Barrier (read barrier) makes read operations after the barrier execute only after the barrier itself. Combined with a write barrier, it makes memory updates made before the write barrier visible to reads issued after the read barrier.
A Full Memory Barrier (full barrier) ensures that the results of the reads and writes before the barrier are committed to memory before any reads and writes after the barrier execute.
With memory barriers available, the example above can be changed as follows:
int value = 3;
boolean isFinish = false;

void exeToCPU0() {
    value = 10;
    storeMemoryBarrier(); // pseudo-instruction: insert a write barrier so value = 10 is flushed to main memory
    isFinish = true;
}

void exeToCPU1() {
    if (isFinish) {
        loadMemoryBarrier(); // pseudo-instruction: insert a read barrier so the CPU fetches value from main memory
        assert value == 10;
    }
}
This avoids the problem. In general, memory barriers work by preventing out-of-order memory accesses by the CPU, thereby ensuring the visibility of shared data when it is manipulated in parallel by multiple threads. But how should these barriers be added? This brings us back to the volatile keyword we talked about at the beginning: this keyword generates a lock-prefixed assembly instruction, which in effect acts as a kind of memory barrier. There is still one issue, though: things like memory barriers and reordering are tied to the hardware platform and architecture, while Java, as a write-once-run-anywhere language, should not force us to think about platform-specific details; these so-called memory barriers should not be something the programmer has to care about.
JMM
JMM stands for Java Memory Model. What is the JMM? From the analysis above, the root causes of visibility problems are caching and reordering. The JMM provides reasonable ways to disable caching and prohibit reordering, so its core value lies in solving visibility and ordering.
The JMM is a language-level abstract memory model; it can simply be understood as an abstraction over the hardware model. It defines the behaviour of reads and writes to shared memory in a multi-threaded program: the rules for storing a shared variable into memory and reading it back out inside the virtual machine. These rules regulate read and write operations on memory through the underlying implementation details so as to guarantee the correctness of instructions; they address the problems caused by multi-level CPU caches, processor optimizations, and instruction reordering, guaranteeing visibility in concurrent scenarios. Note that the JMM does not restrict the execution engine from using processor registers or caches to speed up instruction execution, nor does it forbid the compiler from reordering instructions; in other words, cache coherency problems and instruction reordering problems still exist within the JMM. The JMM simply abstracts these low-level problems up to the JVM level, and then solves the concurrency problems on the basis of the memory barrier instructions provided at the CPU level plus limits on compiler reordering.
The JMM's abstract model is divided into main memory and working memory. Main memory is shared by all threads; object instances, static fields, array objects, and so on are typically stored there, on the heap. Working memory is exclusive to each thread; all of a thread's operations on variables must happen in working memory, a thread cannot read or write variables in main memory directly, and shared variable values are passed between threads via main memory. The underlying implementation of the Java memory model is, simply put: reordering is prohibited through memory barriers, and at compile time the compiler replaces these memory barriers with specific CPU instructions depending on the underlying architecture. For the compiler, memory barriers limit the reordering optimizations it can perform; for the processor, memory barriers cause the cache to be flushed. For example, for a volatile field, the compiler inserts memory barriers before and after each read and write of that field.
How does the JMM solve the visibility and ordering problems?
In simple terms, the JMM provides methods that disable caching and prohibit certain reorderings to solve the visibility and ordering problems. These methods are ones we are already familiar with: volatile, synchronized, and final.
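A minimal sketch (purely illustrative; the class name is invented here) of how these three keywords are typically used to obtain visibility and ordering guarantees:

public class JmmTools {
    private final int config = 42;            // final: safely visible to other threads after construction completes
    private volatile boolean ready = false;   // volatile: visibility + ordering for this flag
    private int counter = 0;

    public synchronized void increment() {    // synchronized: the unlock happens-before the next lock,
        counter++;                             // so increments are visible to the next thread that locks
    }

    public synchronized int getCounter() {
        return counter;
    }

    public void publish()   { ready = true; }
    public boolean isReady() { return ready; }
    public int getConfig()  { return config; }
}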
How does the JMM solve the ordering problem?
Reordering problem
To improve execution performance, both the compiler and the processor reorder instructions; processor reordering was analysed earlier. "Reordering" simply refers to changing the order in which a sequence of instructions is executed. Compiler reordering means that, after compilation, the compiler may rearrange the program's instructions to optimize execution performance. From the source code to the finally executed instructions, a program may go through three kinds of reordering: 1. compiler optimization reordering; 2. instruction-level parallelism reordering by the processor; 3. memory-system reordering.
Of these, 2 and 3 are processor reordering, and reordering can cause visibility problems. For compiler reordering, the JMM prohibits certain kinds of compiler reordering. For processor reordering, the JMM requires the compiler to insert memory barriers into the generated instruction sequence to prohibit specific kinds of processor reordering. Of course, not all programs exhibit reordering problems: both compiler reordering and CPU reordering follow the data-dependence principle; the compiler and the processor will not change the execution order of two operations that have a data dependence between them, as in the following code:
a=1; b=a;
a=1;a=2;
a=b;b=1;
In all three cases, changing the execution order of the code in a single thread would change the result, so such instructions will not be reordered. This rule is known as as-if-serial: no matter how instructions are reordered, the result of a single-threaded program must not change. For example:
int a=2; //1
int b=3; //2
int rs=a*b; //3
There is a data dependence between 1 and 3, and between 2 and 3, so in the final instruction sequence 3 cannot be reordered before 1 or 2, otherwise the program would be wrong. Since there is no data dependence between 1 and 2, their order can be swapped.
JMM-level memory barrier
To guarantee memory visibility, the Java compiler inserts memory barriers into the generated instruction sequence at appropriate places to prohibit particular kinds of processor reordering. The JMM divides memory barriers into four categories: LoadLoad, StoreStore, LoadStore, and StoreLoad barriers.
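The sketch below is only a conceptual illustration (following the conventions of the JSR-133 cookbook; the barriers exist only as comments here, while the JVM emits the real instructions, such as the lock-prefixed instruction seen earlier on x86) of where those four barriers sit around volatile reads and writes:

public class BarrierSketch {
    int a = 0;
    volatile boolean flag = false;

    void writer() {
        a = 1;
        // StoreStore barrier: earlier ordinary stores complete before the volatile store
        flag = true;
        // StoreLoad barrier: the volatile store completes before any later load
    }

    void reader() {
        boolean f = flag;
        // LoadLoad barrier: the volatile load completes before later loads
        // LoadStore barrier: the volatile load completes before later stores
        if (f) {
            int r = a; // guaranteed to see a == 1 once flag is observed as true
        }
    }
}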

Happens-Before

Its meaning is that the result of a previous operation is visible to subsequent operations, so it is a way of expressing memory visibility across multiple threads. We can therefore say that in the JMM, if the result of one operation needs to be visible to another operation, there must be a happens-before relationship between the two. The two operations may be in the same thread or in different threads.
What rules does the JMM use to establish happens-before relationships?
1. Program order rule: each operation in a thread happens-before any subsequent operation in that thread. It can simply be regarded as as-if-serial: no matter how the code of a single thread is rearranged internally, the result stays the same. In the sketch below, this rule gives us 1 happens-before 2.
2. Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that same volatile variable. In the sketch below, this rule gives us 2 happens-before 3.
3. Transitivity rule: if 1 happens-before 2 and 2 happens-before 3, then transitivity tells us that 1 happens-before 3.
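A small sketch combining rules 1-3 (the class name is made up for this example; the numbered comments match the rule descriptions above):

class HappensBeforeDemo {
    int x = 0;
    volatile boolean v = false;

    void writer() {
        x = 42;      // 1
        v = true;    // 2: program order rule => 1 happens-before 2
    }

    void reader() {
        if (v) {     // 3: volatile rule => 2 happens-before 3
            // transitivity => 1 happens-before 3, so x is guaranteed to be 42 here
            assert x == 42;
        }
    }
}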
4. Start rule: if thread A executes ThreadB.start(), then thread A's ThreadB.start() operation happens-before any operation in thread B.
public class StartDemo {
  static int x = 0;

  public static void main(String[] args) {
    Thread t1 = new Thread(() -> {
      // Everything the main thread wrote to shared variables
      // before calling t1.start() is visible here.
      // In this example, x == 10
    });
    // The main thread modifies the shared variable x here
    x = 10;
    // The main thread starts the child thread
    t1.start();
  }
}

 

5. Join rule: if thread A executes ThreadB.join() and it returns successfully, then any operation in thread B happens-before thread A's successful return from ThreadB.join().
Thread t1 = new Thread(() -> {
  // The child thread modifies the shared variable x here
  x = 100;
});
// Modifications the main thread makes to shared variables here,
// before start(), are visible to thread t1
// The main thread starts the child thread
t1.start();
t1.join();
// All of the child thread's writes to shared variables
// are visible after the main thread returns from t1.join()
// In this example, x == 100
6. Monitor lock rule: an unlock of a lock happens-before every subsequent lock of that same lock.
synchronized (this) { // the lock is acquired automatically here
  // x is a shared variable with an initial value of 10
  if (this.x < 12) {
    this.x = 12;
  }
} // the lock is released automatically here
Assume the initial value of x is 10. After thread A finishes executing the block, x becomes 12 (the lock is released automatically on exit). When thread B enters the block, it can see thread A's write to x; that is, thread B sees x == 12.


Origin www.cnblogs.com/qlsem/p/11482108.html