In-Depth Understanding of the Java Memory Model (Nine): Understanding Memory Barriers

In "In-Depth Understanding of the Java Memory Model (Seven): Consistency" we said that the hardware layer provides capabilities that satisfy certain consistency requirements, and that the Java memory model builds on those capabilities to define a set of language-level rules, so that Java developers are insulated from the low-level implementation and can focus on concurrent logic. This article looks at how the hardware layer provides those capabilities.

The hardware layer provides consistency through a family of memory barriers (also called memory fences, Intel's term). On the x86 platform the main ones are:

1. lfence, a read barrier (Load Barrier)

2. sfence, a write barrier (Store Barrier)

3. mfence, a full barrier that has the capabilities of both lfence and sfence

4. The Lock prefix. Lock is not itself a memory barrier, but it accomplishes something similar. Lock locks the cache or the CPU bus, and can be thought of as a lock at the CPU instruction level. It can prefix instructions such as ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG.

A memory barrier has two capabilities:

1. It prevents the instructions on either side of the barrier from being reordered across it.

2. It forces dirty data in the write buffer or cache to be written back to main memory, and invalidates the corresponding data in the cache.

A Load Barrier is inserted before a read instruction; it invalidates the data in the cache so that the data is reloaded from main memory.

A Store Barrier is inserted after a write instruction; it causes the latest data in the write buffer to be written back to main memory.
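The placement rule above (store barrier after the write, load barrier before the read) can be sketched in Java with the static fence methods on java.lang.invoke.VarHandle, available since Java 9. This is only an illustration under assumptions: the class and method names are hypothetical, and the JVM decides which machine instructions, if any, each fence becomes on a given platform.

```java
import java.lang.invoke.VarHandle;

// Hypothetical demo class; all names here are illustrative, not from the article.
final class FenceSketch {
    static int data;       // plain (non-volatile) field
    static boolean ready;  // plain (non-volatile) flag

    // Writer side: the store barrier comes after the data write.
    static void publish(int value) {
        data = value;
        VarHandle.storeStoreFence(); // the store to data may not be reordered after the store to ready
        ready = true;
    }

    // Reader side: the load barrier comes before the data read.
    static int consume() {
        while (!ready) {
            VarHandle.fullFence();   // a fence between successive reads of the flag
        }
        VarHandle.loadLoadFence();   // the load of ready may not be reordered after the load of data
        return data;
    }

    // Starts a writer thread and returns what the calling thread observes.
    static int roundTrip(int value) {
        data = 0;
        ready = false;
        Thread writer = new Thread(() -> publish(value));
        writer.start();
        int seen = consume();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));
    }
}
```

In ordinary application code a volatile field or a lock is the usual way to get this ordering; explicit fences are shown here only because they map most directly onto the barrier vocabulary used in this article.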

The Lock prefix achieves similar capabilities:

1. It first locks the bus or the cache, then executes the instruction that follows, and when it releases the lock it flushes all dirty data in the cache back to main memory.

2. While Lock holds the bus, read and write requests from other CPUs are blocked until the lock is released. After the locked write, the related cache lines on the other CPUs are invalidated, so those CPUs reload the latest data from memory. This is done through the cache coherency protocol.
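From Java, the closest thing to a Lock-prefixed read-modify-write instruction is an atomic class such as java.util.concurrent.atomic.AtomicInteger. The sketch below uses hypothetical names, and it is an assumption about the JIT compiler, not something the code itself can show, that incrementAndGet ends up as a locked instruction such as lock xadd on x86.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; the names are illustrative, not from the article.
final class LockedAddSketch {
    // Several threads increment one shared counter; no increment is lost.
    static int countAtomically(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet(); // one indivisible read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countAtomically(4, 100_000)); // prints 400000
    }
}
```

Replacing the AtomicInteger with a plain int field and counter++ would usually lose some increments, because the read, the add, and the write would then be separate steps that other CPUs can interleave with.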

The following passage, quoted from an article that asks whether the _asm lock instruction can guarantee mutual exclusion between threads on a multiprocessor, says more about the Lock prefix:

"Starting with the P6 processor, if the memory area the instruction accesses is already present in the processor's internal cache, the LOCK prefix does not assert the bus lock signal; it locks only this processor's internal cache and relies on the cache coherency protocol to ensure that the operation is atomic.

When an IA32 CPU executes an instruction with the lock prefix, or an instruction such as xchg, it causes the other CPUs to take certain actions to synchronize their caches. The CPU's #LOCK pin is connected to the #LOCK pin of the North Bridge chip. When an instruction with the lock prefix executes, the North Bridge's #LOCK level is pulled low, locking the bus until the instruction finishes. Locking the bus automatically invalidates every CPU's cache of the memory touched by the instruction, so the barrier ensures the consistency of all the CPU caches.

The lock prefix (or an instruction such as cpuid or xchg) causes this CPU's cache to be written to memory, and that write also causes the other CPUs to invalidate their caches. Every IA32 CPU implements snooping (bus watching): it monitors the bus for memory writes, whether issued by a CPU or by a DMA controller, and whenever one occurs it invalidates the relevant cache line. Therefore, as long as the lock prefix causes this CPU to write to memory, all the other CPUs invalidate the relevant cache line."

Once the concept of a memory barrier is understood, the rest follows: different hardware implements memory barriers in different ways, and the Java memory model hides those differences between hardware platforms by having the JVM generate the appropriate machine code for each platform.

Let's look at the implementation of volatile.

Some material says that Java implements volatile with a memory barrier such as mfence, but in my tests on the x86 platform volatile was implemented with the Lock prefix. The tests used JDK 6 and JDK 7.

                   

The assembly generated for a volatile write contains lock addl $0x0,(%rsp). Adding 0 to the top of the stack changes no data; the instruction is there for its lock prefix, which locks the bus or the cache line for the corresponding address so that other reads and writes must wait until the lock is released. Once the write completes and the lock is released, the cache is flushed to main memory.

1. A volatile read is easy to understand: it needs no additional assembly instruction. If the CPU finds that the cache line for the address is locked, it waits for the lock to be released, and the cache coherency protocol ensures that it reads the latest value.

2. A volatile write only needs the lock prefix to lock the bus, so that other reads and writes wait until the bus is released before continuing. Lock also invalidates the other CPUs' caches, which then reload the data from memory.
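A small program for reproducing this kind of test might look like the sketch below. The class and method names are hypothetical. Viewing the generated code relies on the HotSpot diagnostic flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which also need the separate hsdis disassembler library to be installed; what the output contains depends on the JVM version and platform.

```java
// Hypothetical demo class; the names are illustrative, not from the article.
// To inspect the JIT output (requires the hsdis library):
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly VolatileSketch
final class VolatileSketch {
    static int data;               // plain field, published through the volatile flag
    static volatile boolean ready; // the volatile write is the one to look for in the assembly

    static void publish(int value) {
        data = value;
        ready = true;              // volatile store
    }

    static int consume() {
        while (!ready) {           // volatile load, needs no extra instruction on x86
            Thread.onSpinWait();
        }
        return data;
    }

    // Starts a writer thread and returns what the calling thread observes.
    static int roundTrip(int value) {
        data = 0;
        ready = false;
        Thread writer = new Thread(() -> publish(value));
        writer.start();
        int seen = consume();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Call publish many times so that the JIT compiler is likely to compile it.
        for (int i = 0; i < 200_000; i++) {
            publish(i);
        }
        System.out.println(roundTrip(42));
    }
}
```

The warm-up loop is there because only methods that run often enough get compiled, and only compiled methods appear in the PrintAssembly output.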

Now look at the implementation of synchronized. For a synchronized block the JVM generates the bytecode instructions monitorenter and monitorexit, and the assembly finally generated is

lock cmpxchg %r15, 0x16(%r10) and lock cmpxchg %r10, (%r11)

cmpxchg is the assembly instruction for CAS. The meaning here is that the lock prefix locks the bus or the cache, and cmpxchg then performs a CAS on the synchronization flag in the object header. When the CAS completes, the lock is released and the cache is flushed to main memory.

So at the lowest level, synchronized works as follows: to acquire the lock, lock cmpxchg sets the lock flag in the object header to the "locked" state; to release it, lock cmpxchg changes the flag back to the "released" state, and the pending writes are flushed to main memory at once. The JVM additionally blocks the threads whose CAS fails. That part of the logic does not appear in the lock cmpxchg instruction, and my guess is that it is implemented with some kind of semaphore. In lock cmpxchg, the lock prefix guarantees visibility and prevents reordering, while cmpxchg guarantees the atomicity of the operation.
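The idea of taking a lock by CAS-ing a flag can be sketched with AtomicBoolean.compareAndSet. This is a deliberately simplified spin lock with hypothetical names, not how HotSpot implements monitors: a thread whose CAS fails simply retries here, whereas the JVM, as noted above, blocks it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; a simplified spin lock, not the JVM's monitor implementation.
final class CasLockSketch {
    // Stands in for the lock flag in the object header.
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // CAS the flag from "released" to "locked"; retry while another thread holds it.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        locked.set(false); // volatile write that puts the flag back to "released"
    }

    // Several threads increment a plain counter, each increment guarded by the lock.
    static int countWithLock(int threads, int perThread) {
        CasLockSketch lock = new CasLockSketch();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithLock(4, 50_000)); // prints 200000
    }
}
```

The counter is an ordinary int, so the correct total comes entirely from the lock: the CAS gives mutual exclusion, and the volatile semantics of compareAndSet and set make each thread's increment visible to the next holder.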

                      

                       

 

Origin blog.csdn.net/weixin_42073629/article/details/104743748