In-depth understanding of JMM and Happens-Before

Hello everyone, I am Wang Youzhi. Welcome to chat with me about technology and life abroad.
Come join our Java brick-moving group: Java folks getting rich together.

Recently I've been hooked on P5R, so my writing progress hasn't been ideal, but I have to say Gao Juanxing is the GOAT. Without further ado, let's start today's topic: JMM and Happens-Before.

There aren't many interview questions about them, basically just two:

  • What is the JMM? Describe it in detail.

  • Tell me about your understanding of the JMM. Why is it designed this way?

Tips : This article focuses on JMM theory.

What is the JMM?

JMM stands for Java Memory Model. The JSR-133 FAQ explains the memory model as follows:

At the processor level, a memory model defines necessary and sufficient conditions for knowing that writes to memory by other processors are visible to the current processor, and writes by the current processor are visible to other processors.

In other words, at the processor level, a memory model defines the conditions under which memory writes made by one core become visible to other cores. The FAQ continues:

Moreover, writes to memory can be moved earlier in a program; in this case, other threads might see a write before it actually “occurs” in the program. All of this flexibility is by design – by giving the compiler, runtime, or hardware the flexibility to execute operations in the optimal order, within the bounds of the memory model, we can achieve higher performance.

That is, within the bounds of the memory model, the compiler, runtime, or hardware is allowed to execute instructions in an "optimal order" to improve performance, and that optimal order is the order produced by instruction reordering.

To summarize, a processor-level memory model:

  • defines the visibility of write operations between cores;

  • constrains instruction reordering.

Then look at the description of JMM:

The Java Memory Model describes what behaviors are legal in multithreaded code, and how threads may interact through memory. It describes the relationship between variables in a program and the low-level details of storing and retrieving them to and from memory or registers in a real computer system. It does this in a way that can be implemented correctly using a wide variety of hardware and a wide variety of compiler optimizations.

Extract the key information from this passage:

  • JMM describes which behaviors are legal in multithreaded code and how threads interact through memory;

  • JMM shields the implementation differences between hardware and compilers to achieve consistent memory-access behavior.

So what exactly is the JMM?

  • From the JVM's perspective, JMM shields the differences between underlying hardware and platforms to achieve consistent memory-access behavior;

  • From a Java developer's perspective, JMM defines the visibility of write operations between threads and constrains instruction reordering.

So why have a memory model?

"Weird" concurrency issues

8 Questions You Must Know About Threads (Part 1) introduced the three elements of concurrent programming and the problems that arise when they cannot be guaranteed. Next, let's explore the underlying causes.

Tips: first, a little background on thread scheduling in Linux.

Linux thread scheduling is preemptive and based on time slices: a thread that has not finished executing is suspended once its time slice is exhausted. Linux then picks the highest-priority thread from the wait queue and allocates it a time slice, so high-priority threads get to run first.

Atomicity problems caused by context switching

Let's take the common increment operation count++ as an example. Intuitively, we think the increment happens in one step, without any pause, but it actually compiles to three instructions:

  • Instruction 1: read count into the cache;

  • Instruction 2: perform the increment;

  • Instruction 3: write the incremented value back to count in memory.

So here is the question: if two threads t1 and t2 increment count at the same time, and a thread switch happens right after t1 executes instruction 1, what happens?

[Figure: t1 and t2 interleaving on count++, producing a lost update]

We expected the result to be 2, but actually got 1. This is the atomicity problem caused by thread switching. So wouldn't prohibiting thread switching solve the atomicity problem?

Even so, the cost of prohibiting thread switching is too high. We know the CPU computes "fast" while I/O is "slow". Imagine downloading P5R on Steam while your computer is frozen, so you can only start happily writing bugs after the download finishes. Wouldn't that be infuriating?

That is why, when a thread performs an I/O operation, the operating system makes it give up its CPU time slice to other threads, improving CPU utilization.
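The lost update above can be reproduced in a few lines. Below is a minimal sketch (the class name LostUpdateDemo and the iteration count are my own); because the interleaving is random, the final value varies between runs and is often less than the expected total:

```java
// Two threads each increment a shared counter 100_000 times.
// count++ is read-modify-write (3 instructions), so increments
// can be lost when a thread switch lands between them.
public class LostUpdateDemo {
    static int count;

    static int run() throws InterruptedException {
        count = 0;
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                count++; // read count, add 1, write back: not atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return count; // often below 200_000 due to lost updates
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("count = " + run());
    }
}
```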

P5R is the best in the world!!!

Visibility problems caused by caching

You may wonder: in the example above, aren't threads t1 and t2 operating on the same count?

It looks like the same count, but each thread actually operates on a copy of the in-memory count held in a different cache. This is because, besides the huge speed gap between I/O and the CPU, the gap between memory and the CPU is also considerable. To bridge it, a CPU cache sits between memory and the CPU.

When a CPU core operates on data in memory, it first copies the data into its cache and then operates on that cached copy.

[Figure: CPU cores each operating on their own cached copy of a shared variable]

Ignoring the effect of MESI for now, we can conclude that a thread's modification of a variable in its cache is not immediately visible to other threads.

Tips: the basics of the MESI protocol are covered in the expanded content section.
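A minimal sketch of the visibility problem and its cure (the class name VisibilityDemo is my own): a reader thread spins on a flag that a writer thread sets. With a plain boolean, the reader may spin forever on its stale cached copy; declaring the flag volatile forces the write to become visible:

```java
// A reader spins until it sees the writer's update.
// With volatile, the write to `ready` is guaranteed to become
// visible to the reader; without it, the loop may never exit.
public class VisibilityDemo {
    static volatile boolean ready = false; // try removing volatile: may hang

    // Returns true if the reader observed the write within the timeout.
    static boolean demo() throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the write is visible */ }
        });
        reader.start();
        Thread.sleep(50);  // let the reader start spinning on the old value
        ready = true;      // volatile write: published to other threads
        reader.join(2000); // with volatile this returns almost immediately
        return !reader.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader saw the write: " + demo());
    }
}
```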

Ordering problems caused by instruction reordering

Besides the speed-boosting techniques above, there is another "troublemaker": instruction reordering. Let's adapt the example from 8 Questions You Must Know About Threads (Part 1).

public class Singleton {

  // Note: instance is deliberately NOT volatile here;
  // that is exactly the bug discussed below.
  private static Singleton instance;

  private Singleton() {
  }

  public static Singleton getInstance() {
    if (instance == null) {
      synchronized (Singleton.class) {
        if (instance == null) {
          instance = new Singleton();
        }
      }
    }
    return instance;
  }
}

In Java, new Singleton() involves three steps:

  1. Allocate memory;

  2. Initialize the Singleton object;

  3. Point instance at that memory.

Now analyze the dependencies among these three steps. Memory allocation must happen first, otherwise steps 2 and 3 cannot proceed. As for steps 2 and 3, whichever executes first does not affect the semantics of a single-threaded program, so there is no dependency between them.

But when it comes to multi-threaded scenarios, the situation becomes complicated:

[Figure: t1 reordered to publish instance before initializing it; t2 reads the uninitialized instance]

At this point, the instance that thread t2 gets is an object that has not yet been initialized: this is the ordering problem caused by reordering.

Tips: instruction reordering is covered further in the expanded content section.
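For completeness, here is a sketch of the standard fix described in JSR-133 (the class is renamed SafeSingleton here to avoid clashing with the example above): declaring the field volatile forbids reordering step 3 with step 2, so no thread can ever observe a half-constructed object:

```java
// Double-checked locking done correctly: the volatile write in
// step 3 cannot be reordered with the initialization in step 2.
public class SafeSingleton {
    private static volatile SafeSingleton instance;

    private SafeSingleton() {
    }

    public static SafeSingleton getInstance() {
        if (instance == null) {                      // first check, lock-free
            synchronized (SafeSingleton.class) {
                if (instance == null) {              // second check, under lock
                    instance = new SafeSingleton();  // safe volatile publish
                }
            }
        }
        return instance;
    }
}
```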

What did JMM do?

Before formally describing JMM, note that JSR-133 mentions two other memory models:

  • Sequential Consistency Memory Model

  • Happens-Before memory model

The sequential consistency memory model prohibits compiler and processor optimizations and provides a strong memory-visibility guarantee. It requires that:

  • During execution, all read/write operations have a total order relationship;

  • The operations in the thread must be executed in the order of the program;

  • Operations must execute atomically and be immediately visible to all threads.

The sequential consistency model is too restrictive and is clearly unsuitable as the memory model of a programming language that supports concurrency.

Happens-Before

Happens-Before describes the relationship between the results of two operations. If operation A happens-before operation B (written A hb B), then even after reordering, the result of operation A must be visible to operation B.

Tips: Happens-Before is a causal relationship: A hb B is the "cause", the visibility of A's result to B is the "effect", and the actual execution order is beside the point.

For the Happens-Before rules, we quote the translation in "The Art of Java Concurrent Programming":

  • Program order rule: each operation in a thread happens-before any subsequent operation in that thread.

  • Monitor lock rule: the unlocking of a lock happens-before every subsequent locking of that same lock.

  • Volatile variable rule: a write to a volatile variable happens-before any subsequent read of that volatile variable.

  • Transitivity: if A happens-before B, and B happens-before C, then A happens-before C.

  • start() rule: if thread A executes ThreadB.start() (starting thread B), then thread A's ThreadB.start() operation happens-before any operation in thread B.

  • join() rule: if thread A executes ThreadB.join() and returns successfully, then any operation in thread B happens-before thread A's successful return from ThreadB.join().

These rules appear in JSR-133 Chapter 5, Happens-Before and Synchronizes-With Edges; the original text is hard to read.

The rules may look like truisms, but don't forget that we are facing a multithreaded environment plus compiler and hardware reordering.

Take the monitor lock rule as an example again: although it only says the unlock happens-before the subsequent lock, what it really guarantees is that the results produced before the unlock are visible after the subsequent lock.

Tips : Happens-Before can be translated as happening before… , Synchronizes-With can be translated as synchronizing with… .

In addition, JSR-133 also gives the rule for non-volatile variables:

The values that can be seen by a non-volatile read are determined by a rule known as happens-before consistency.

That is, the values visible to a read of a non-volatile variable are determined by happens-before consistency.

Happens-Before consistency: given a write operation W and a read operation R on a variable V, if W hb R holds, then the result of W is visible to R (the definition in JSR-133 reflects the rigor of its authors).

Although JMM does not adopt all the Happens-Before rules unchanged (some are strengthened), we can still consider: Happens-Before rules ≈ JMM rules.
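The program order rule, the volatile variable rule, and transitivity can be combined in a small sketch (the class name HappensBeforeDemo is my own). The write to the plain field data happens-before the volatile write to flag (program order), which happens-before the volatile read of flag (volatile rule), which happens-before the read of data (program order); by transitivity, the reader is guaranteed to see 42:

```java
// Publishing a plain field safely through a volatile flag.
public class HappensBeforeDemo {
    static int data = 0;                  // plain, non-volatile field
    static volatile boolean flag = false;

    // Returns the value of data as observed by the reader thread.
    static int demo() throws InterruptedException {
        final int[] observed = new int[1];
        Thread reader = new Thread(() -> {
            while (!flag) { }    // 3: volatile read
            observed[0] = data;  // 4: guaranteed to see the write in step 1
        });
        reader.start();
        data = 42;               // 1: plain write (program order: 1 hb 2)
        flag = true;             // 2: volatile write (volatile rule: 2 hb 3)
        reader.join();
        return observed[0];      // transitivity: 1 hb 4, so this is 42
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("observed data = " + demo());
    }
}
```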

So why choose Happens-Before? It is the result of a trade-off among ease of programming, strength of constraints, and execution efficiency.

[Figure: the memory models mentioned in this article, including the X86/ARM hardware models]

The figure includes only the memory models mentioned in this article; X86/ARM refers to the memory models of those hardware architectures.

Although Happens-Before is the core of JMM, JMM also shields the differences between hardware platforms and provides Java developers with three concurrency primitives: synchronized, volatile, and final.

Expanded content

The theory of memory models and JMM ends here. Below is a supplement on the concepts that appeared in the article; most of it is hardware-level content, so feel free to skip it if you are not interested.

Cache Coherency Protocol

Cache Coherence Protocol: note that the "coherence" here is not the more common "consistency".

Coherence and consistency both appear frequently in concurrent programming, compiler optimization, and distributed system design, and they are easy to confuse if you rely only on translation. The difference between the two is significant. Wikipedia's explanation of the consistency model says:

Consistency is different from coherence, which occurs in systems that are cached or cache-less, and is consistency of data with respect to all processors. Coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors. Consistency deals with the ordering of operations to multiple locations with respect to all processors.

In short, coherence concerns writes to a single variable or location, while consistency concerns the ordering of operations across multiple locations.

MESI protocol

The MESI protocol is the most commonly used invalidation-based cache coherence protocol. MESI stands for the four states of a cache line:

  • M (Modified, modified) , the data in the cache has been modified and is different from the data in the main memory.

  • E (Exclusive, exclusive) , the data only exists in the cache of the current core, and is the same as the main memory data.

  • S (Shared, shared) , the data exists in multiple cores, and is the same as the main memory data.

  • I (Invalid, invalid) , the data in the cache is invalid.

Tips : In addition to the MESI protocol, there are MSI protocol, MOSI protocol, MOESI protocol, etc. The initials describe the status, and O stands for Owned.

MESI is a guarantee made at the hardware level: it guarantees the order of reads and writes to a single variable across multiple cores.

Different CPU architectures implement MESI differently. For example, X86 introduces a store buffer, while ARM introduces a load buffer and an invalidate queue. These read/write buffers and invalidate queues improve speed but bring another problem.

Instruction reordering

Reordering falls into three categories:

  • Processor reordering: in the absence of data dependencies, the processor may optimize the execution order of instructions on its own;

  • Compiler reordering: the compiler may rearrange the execution order of statements without changing single-threaded semantics;

  • Memory system reordering: store/load buffers execute memory operations asynchronously, which makes instructions appear to execute "out of order".

The first two are easy to understand, but how should we understand memory system reordering?

Introducing store buffers, load buffers, and invalidate queues turns the originally synchronous interaction with memory into an asynchronous one. Although this reduces synchronous blocking, it also introduces the possibility of "out of order" execution.

Of course, reordering is not unrestricted; it has two bottom lines:

  • Data dependency: if two operations access the same data and at least one of them is a write, there is a data dependency between them, and neither the compiler nor the processor may change the relative order of the two operations when reordering;

  • as-if-serial semantics: this does not mean the program actually executes as in a single-threaded scenario, but that no matter how instructions are reordered, the semantics (i.e., the execution results) observed in a single-threaded scenario must not change.
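The two bottom lines can be seen in a tiny sketch (the class name AsIfSerialDemo is my own): the writes to a and b have no dependency on each other and are free to be reordered, but c depends on both, so the data dependency pins it last and the single-threaded result never changes:

```java
// as-if-serial: reordering may shuffle independent operations,
// but the single-threaded result must stay the same.
public class AsIfSerialDemo {
    static int compute() {
        int a = 1;      // independent of b: may be reordered with the next line
        int b = 2;      // independent of a
        int c = a + b;  // data dependency on a and b: cannot move earlier
        return c;       // always 3, no matter how a and b were ordered
    }

    public static void main(String[] args) {
        System.out.println(compute()); // always prints 3
    }
}
```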

Recommended reading

Readings on the memory model and the JMM

Although "Time, Clocks, and the Ordering of Events in a Distributed System" discusses problems in the distributed field, it has also had a huge impact on concurrent programming.

In addition, I have prepared "Time, Clocks, and the Ordering of Events in a Distributed System" plus the Chinese and English versions of JSR-133, three PDFs in total; just reply [JSR133] to get them.

Finally, a fun aside: the blogs of the big names are very "plain".
Doug Lea's blog homepage:

[Screenshot: Doug Lea's blog homepage]

Lamport's blog homepage:

[Screenshot: Lamport's blog homepage]

Epilogue

Recently I've been addicted to P5R and have been slacking off~~

The content of JMM is tangled, because when it comes to the principles of concurrency, it is never just the programming language fighting alone: every layer from the CPU up to the language is involved, so it is hard to pin down the details of every part.

Fortunately, we have at least grasped the essence of JMM and the reasoning behind it. I hope this article helps you; feel free to leave a comment and correct me.


Well, that's all for today, Bye~~

Origin blog.csdn.net/wyz_1945/article/details/131689955