Locks are essential in multithreaded programming. They provide synchronization, so that multiple threads can execute synchronized blocks with both mutual exclusion and visibility guarantees.

This article starts from the happens-before relationship and then walks through the ReentrantLock source code to show how memory barriers, CAS operations, and the LOCK instruction prefix are used to implement a lock.

Lock happens-before relationship

happens-before rule

Program order rule: within a single thread, each operation happens-before every operation that follows it in program order.

Lock rule: for the same lock, an unlock happens-before every subsequent lock.

Transitivity rule: if A happens-before B and B happens-before C, then A happens-before C.

Consider the happens-before relationships in the following code, where thread A executes store() first and thread B executes load() afterwards:

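int value = 0;
boolean finish = false;

// Thread A
void store() {
    // A: operations before locking
    synchronized (this) {   // B: lock
        value = 1;          // C: write value
        finish = true;      // D: write finish
    }                       // E: unlock
    // F: operations after unlocking
}

// Thread B
void load() {
    // G: operations before locking
    synchronized (this) {   // H: lock
        if (finish) {       // I: read finish
            assert value == 1; // J: read value
        }
    }                       // K: unlock
    // L: operations after unlocking
}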

There are 13 happens-before relationships here. ①~⑤ are the program order relationships of thread A, ⑥~⑩ are the program order relationships of thread B, ⑪ is the lock rule relationship, and ⑫~⑬ are the transitive relationships.

Figure: lock happens-before relationships

Analyze visibility from happens-before relationships

①~⑩ By the program order rule, as long as data-dependent instructions are not reordered, the execution result is correct, and visibility within a single thread is guaranteed.

⑪ By the lock rule, E happens-before H; that is, thread A's unlock happens-before thread B's lock.

⑫ By the transitivity rule, every operation before thread A's unlock must be visible to thread B's lock: A, B, C, D, E happen-before H. That is, thread A's unlock and its preceding operations happen-before thread B's lock.

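⑬ Applying the transitivity rule again, the operations before thread A's unlock must also be visible to the operations after thread B's lock: A, B, C, D, E happen-before H, I, J, K, L. The final conclusion is that thread A's unlock and its preceding operations happen-before thread B's lock and its subsequent operations.

Seen this way, to guarantee the visibility of the unlock and the operations before it, the unlocking thread's local memory must be flushed to main memory. Likewise, to guarantee that the locking thread reads the latest values, its local copies of the shared variables must be invalidated and re-read from main memory.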
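The conclusion above can be observed with a small runnable program. The following is a minimal sketch, not part of the original article: the class name SynchronizedVisibilityDemo and the helper runOnce() are hypothetical, and thread B simply retries load() until it runs after thread A's unlock.

```java
// Hypothetical demo class; its name and helper methods are illustrative only.
class SynchronizedVisibilityDemo {
    private int value = 0;
    private boolean finish = false;

    // Thread A: both writes happen while holding the monitor of this object.
    synchronized void store() {
        value = 1;
        finish = true;
    }

    // Thread B: reads under the same monitor; returns -1 if store() has not run yet.
    synchronized int load() {
        return finish ? value : -1;
    }

    // Starts both threads and returns the value thread B finally observed.
    static int runOnce() {
        SynchronizedVisibilityDemo demo = new SynchronizedVisibilityDemo();
        int[] observed = new int[1];
        Thread a = new Thread(demo::store);
        Thread b = new Thread(() -> {
            int v;
            // Retry until a load() that follows thread A's unlock sees finish == true.
            while ((v = demo.load()) == -1) {
                Thread.yield();
            }
            observed[0] = v;
        });
        b.start();
        a.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 1
    }
}
```

Because both methods synchronize on the same object, any load() that sees finish == true is ordered after the unlock in store(), so it must also see value == 1.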

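The principle behind implementing a lock

The lock visibility derived above is: thread A's unlock and its preceding operations happen-before thread B's lock and its subsequent operations.

This visibility can be broken down into three levels:

Thread A's unlock happens-before thread B's lock
Thread A's unlock and its preceding operations happen-before thread B's lock
Thread A's unlock and its preceding operations happen-before thread B's lock and its subsequent operations

Because this visibility has to hold across threads, the inconsistency between local memory and main memory must be taken into account, which calls for the JMM memory barriers: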

Figure: memory barriers

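Visibility is implemented level by level:

1) For the first level, thread A's unlock must be visible to thread B's lock. Across threads this runs into cache inconsistency, so thread A's local memory has to be flushed to main memory. A store-load memory barrier is therefore needed between the unlock and the lock, and there are two ways to place it:

Add a StoreLoad Barrier after thread A unlocks
Add a StoreLoad Barrier before thread B locks

In common development patterns, one thread often writes while many threads read, as in the typical producer-consumer pattern. Compared with the second option, the first executes the barrier fewer times and performs better, so the first option is preferable.

2) For the second level, the operations before thread A's unlock must be visible to the lock, which means they must not be reordered to after the unlock. Since only writes change shared variables, a StoreStore Barrier is needed before the unlock.

3) For the third level, the reads and writes after thread B's lock must not be reordered to before the lock; otherwise thread B might fail to read the results of thread A's operations, or might modify the shared variables before thread A does. A LoadLoad Barrier and a LoadStore Barrier are therefore needed after thread B locks.

In summary:

Add a StoreStore Barrier before unlocking
Add a StoreLoad Barrier after unlocking
Add a LoadLoad Barrier and a LoadStore Barrier after locking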

The program with the memory barriers added:

int value = 0;
boolean finish = false;

// Thread A
void store() {
    // A: operations before locking
    synchronized (this) {   // B: lock
        loadLoadBarrier();
        loadStoreBarrier();
        value = 1;          // C: write value
        finish = true;      // D: write finish
        storeStoreBarrier();
                            // E: unlock
        storeLoadBarrier();
    }
    // F: operations after unlocking
}

// Thread B
void load() {
    // G: operations before locking
    synchronized (this) {   // H: lock
        loadLoadBarrier();
        loadStoreBarrier();
        if (finish) {       // I: read finish
            assert value == 1; // J: read value
        }
        storeStoreBarrier();
                            // K: unlock
        storeLoadBarrier();
    }
    // L: operations after unlocking
}
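The barrier calls in the listing above are conceptual and do not exist as Java methods. As a rough sketch, assuming Java 9 or later, they can be approximated with the static fence methods on java.lang.invoke.VarHandle. The Barriers class and its method names are hypothetical and chosen only to match the listing; the mapping is an approximation, not what the JVM emits for synchronized.

```java
import java.lang.invoke.VarHandle;

// Hypothetical helper: maps the article's barrier names onto VarHandle fences (Java 9+).
final class Barriers {
    private Barriers() {}

    // Loads before the fence are not reordered with loads after it.
    static void loadLoadBarrier() {
        VarHandle.loadLoadFence();
    }

    // There is no dedicated LoadStore fence; acquireFence() keeps earlier loads
    // ahead of later loads and stores, which covers LoadLoad and LoadStore.
    static void loadStoreBarrier() {
        VarHandle.acquireFence();
    }

    // Stores before the fence are not reordered with stores after it.
    static void storeStoreBarrier() {
        VarHandle.storeStoreFence();
    }

    // fullFence() orders all earlier loads and stores before all later ones,
    // which includes the StoreLoad ordering.
    static void storeLoadBarrier() {
        VarHandle.fullFence();
    }

    public static void main(String[] args) {
        loadLoadBarrier();
        loadStoreBarrier();
        storeStoreBarrier();
        storeLoadBarrier();
        System.out.println("fences executed");
    }
}
```

In ordinary application code these fences are rarely needed; synchronized, volatile, and the java.util.concurrent classes already provide the required ordering.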

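Analyzing the lock source code

The locks Java provides fall into two kinds: implicit locks and explicit locks. The implicit lock is the familiar synchronized statement, which is provided by the Java language itself, so its source is harder to locate. This article analyzes the source of an explicit lock instead. An explicit lock is really a utility class in Java that lets you lock and unlock through method calls. Functionally, explicit locks are more powerful, because different locking algorithms can be implemented through inheritance and the appropriate lock chosen for the situation at hand. ReentrantLock is used here for the source analysis.

The earlier section on the principle behind implementing a lock concluded that visibility is achieved by adding memory barriers around locking and unlocking. At first glance this is exactly how volatile works, down to the kinds and order of the memory barriers used. ReentrantLock therefore reuses the visibility that volatile provides and does not insert memory barriers of its own.

ReentrantLock, through its inner class Sync, has a volatile variable named state (inherited from AbstractQueuedSynchronizer). Unlocking and locking are implemented by writing and reading this volatile variable, respectively. The state variable can be understood as the number of times the lock has been re-entered (ReentrantLock is a reentrant lock): 0 means no thread holds the lock, and 2 means the owner has acquired it twice in a row without releasing it.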
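The reentrancy count can be observed from outside through ReentrantLock.getHoldCount(), which reports how many times the current thread holds the lock. A minimal sketch follows; the class name HoldCountDemo and the method holdCounts() are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class showing how the hold count tracks reentrant acquisitions.
class HoldCountDemo {
    // Records the hold count after each step: lock, lock, unlock, unlock.
    static int[] holdCounts() {
        ReentrantLock lock = new ReentrantLock();
        int[] counts = new int[4];
        lock.lock();
        counts[0] = lock.getHoldCount(); // first acquisition
        lock.lock();
        counts[1] = lock.getHoldCount(); // reentered by the same thread
        lock.unlock();
        counts[2] = lock.getHoldCount(); // one hold released
        lock.unlock();
        counts[3] = lock.getHoldCount(); // fully released
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(holdCounts())); // prints [1, 2, 1, 0]
    }
}
```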

ReentrantLock comes in fair and nonfair variants. The source for unlocking and locking in each is examined below.

Unlock implementation

The fair and nonfair locks implement unlocking in the same way: both write the state variable, and both end up calling ReentrantLock.Sync.tryRelease().

// In java.util.concurrent.locks.ReentrantLock.Sync.tryRelease()
protected final boolean tryRelease(int releases) {
    int c = getState() - releases;
    if (Thread.currentThread() != getExclusiveOwnerThread()) // throw if the current thread does not own the lock
        throw new IllegalMonitorStateException();
    boolean free = false; // whether the lock is now available
    if (c == 0) { // state == 0 means the owner has fully released the lock: mark it free and clear the owner
        free = true;
        setExclusiveOwnerThread(null);
    }
    setState(c); // at the very end of the release, write state
    return free;
}

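From how volatile works, writing the volatile variable state is equivalent to

storeStoreBarrier();
unlock;
storeLoadBarrier();

These memory barriers are the same as those derived in the lock principle analysis above, so a volatile write has the same effect as unlocking, and unlocking can be implemented with a volatile write.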
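The same effect can be seen without any lock class. The following is a minimal sketch under the assumption that a volatile flag stands in for state; the class VolatilePublishDemo and its members are hypothetical names. The writer performs an ordinary write and then a volatile write, and the reader performs a volatile read and then an ordinary read.

```java
// Hypothetical demo class: a volatile flag publishes an ordinary field to another thread.
class VolatilePublishDemo {
    private int data = 0;                    // plain, non-volatile field
    private volatile boolean ready = false;  // plays the role of state

    void writer() {
        data = 42;     // ordinary write before the "unlock"
        ready = true;  // volatile write, analogous to setState(c) in tryRelease()
    }

    int reader() {
        while (!ready) {   // volatile read, analogous to getState()
            Thread.yield();
        }
        return data;       // sees 42 because ready was read as true
    }

    // Runs the reader and writer on separate threads and returns what the reader saw.
    static int runOnce() {
        VolatilePublishDemo demo = new VolatilePublishDemo();
        int[] observed = new int[1];
        Thread r = new Thread(() -> observed[0] = demo.reader());
        Thread w = new Thread(demo::writer);
        r.start();
        w.start();
        try {
            w.join();
            r.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 42
    }
}
```

If ready were not volatile, the reader could spin forever or return 0, since nothing would order the write to data before the read of it.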

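Lock implementation

For locking, the fair and nonfair locks differ considerably. The fair lock begins with a volatile read of state, whereas the nonfair lock begins with a compare-and-set (CAS).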
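Which variant is used is decided by the ReentrantLock constructor: the no-argument constructor creates a nonfair lock, and passing true creates a fair one. A short sketch, with the hypothetical class name FairnessDemo:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class showing how the fair and nonfair variants are selected.
class FairnessDemo {
    // The default constructor creates a nonfair lock.
    static boolean defaultIsFair() {
        return new ReentrantLock().isFair();
    }

    // Passing true asks for the fair ordering policy.
    static boolean explicitFairIsFair() {
        return new ReentrantLock(true).isFair();
    }

    public static void main(String[] args) {
        System.out.println(defaultIsFair());      // prints false
        System.out.println(explicitFairIsFair()); // prints true
    }
}
```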

Fair lock implementation of locking

First, the fair lock's locking path, which reads state. The core code is in ReentrantLock.FairSync.tryAcquire().

// In java.util.concurrent.locks.ReentrantLock.FairSync.tryAcquire()
protected final boolean tryAcquire(int acquires) {
    final Thread current = Thread.currentThread();
    int c = getState(); // at the very start of locking, read state
    if (c == 0) { // the lock is available
        if (!hasQueuedPredecessors() &&
            compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(current); // mark the lock as owned by the current thread
            return true;
        }
    }
    else if (current == getExclusiveOwnerThread()) { // state > 0 and we are the owner: reentry
        int nextc = c + acquires;
        if (nextc < 0)
            throw new Error("Maximum lock count exceeded"); // count overflowed past Integer.MAX_VALUE (2147483647)
        setState(nextc); // update state
        return true;
    }
    return false;
}

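From how volatile works, reading the volatile variable state is equivalent to

lock;
loadLoadBarrier();
loadStoreBarrier();

These memory barriers are the same as those derived in the lock principle analysis above, so a volatile read has the same effect as locking, and locking can be implemented with a volatile read.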

Nonfair lock implementation of locking

// In java.util.concurrent.locks.ReentrantLock.NonfairSync.lock()
final void lock() {
    if (compareAndSetState(0, 1)) // if the lock is available, take it
        setExclusiveOwnerThread(Thread.currentThread());
    else // otherwise try to re-enter
        acquire(1);
}
// In java.util.concurrent.locks.AbstractQueuedSynchronizer.compareAndSetState()
protected final boolean compareAndSetState(int expect, int update) {
    return unsafe.compareAndSwapInt(this, stateOffset, expect, update);
}

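When the lock is not held, the call that matters is unsafe.compareAndSwapInt(), which is a CAS operation. When the lock is already held, the code tries to re-enter, and that part is implemented the same way as the fair lock, by reading state.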
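To see the two mechanisms working together, here is a minimal sketch of a non-reentrant mutex built on AbstractQueuedSynchronizer. It is not the ReentrantLock implementation: the class SimpleMutex and the helper countWithTwoThreads() are hypothetical, and reentry, fairness, and owner checks are left out. Locking uses compareAndSetState() and unlocking uses the volatile write setState().

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical, simplified mutex: CAS on state to lock, volatile write of state to unlock.
class SimpleMutex {
    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int ignored) {
            // CAS 0 -> 1 succeeds only if no thread currently holds the lock.
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }

        @Override
        protected boolean tryRelease(int ignored) {
            if (getState() == 0) {
                throw new IllegalMonitorStateException();
            }
            setExclusiveOwnerThread(null);
            setState(0); // volatile write publishes everything done under the lock
            return true;
        }
    }

    private final Sync sync = new Sync();

    void lock() { sync.acquire(1); }    // queues and parks the thread if tryAcquire fails
    void unlock() { sync.release(1); }  // wakes a queued thread after tryRelease succeeds

    // Two threads increment a plain counter under the mutex; returns the final count.
    static int countWithTwoThreads(int perThread) {
        SimpleMutex mutex = new SimpleMutex();
        int[] counter = new int[1];
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                mutex.lock();
                try {
                    counter[0]++;
                } finally {
                    mutex.unlock();
                }
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(10_000)); // prints 20000
    }
}
```

The counter is a plain int, so the correct total depends entirely on the mutual exclusion and visibility supplied by the CAS and the volatile write.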

unsafe.compareAndSwapInt() is a native method, implemented in C++ or assembly and called through JNI, so its source has to be looked up in OpenJDK. It is located at: openjdk-7-fcs-src-b147-27_jun_2011\openjdk\hotspot\src\os_cpu\windows_x86\vm\atomic_windows_x86.inline.hpp

// CAS source code:
inline jint Atomic::cmpxchg (jint exchange_value, volatile jint* dest,
        jint compare_value) {
        // alternative for InterlockedCompareExchange
    int mp = os::is_MP(); // whether this is a multiprocessor system
    __asm {
        mov edx, dest           // address to modify, i.e. the state variable
        mov ecx, exchange_value // new value
        mov eax, compare_value  // expected value
        LOCK_IF_MP(mp)          // on a multiprocessor, add the LOCK prefix to the next instruction
        cmpxchg dword ptr [edx], ecx // compare [edx] with eax; if equal, [edx] = ecx, otherwise do nothing
    }
}

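Note the LOCK_IF_MP here: on a multiprocessor it adds the LOCK prefix to the instruction. On a single processor there is no cache inconsistency problem: all threads run on one CPU and share the same cache, so local memory and main memory cannot disagree and no visibility problem arises. On a multi-core processor, however, shared memory has to be flushed from the write buffer to main memory, and the other processors are told to update their caches according to the cache coherence protocol.

What LOCK does here:

While cmpxchg executes, the memory address [edx] is locked and other processors cannot access it, which guarantees atomicity. Atomicity is guaranteed even when modifying 64-bit memory on a 32-bit machine.
All of this processor's write buffers are forced back to main memory, which acts as a write barrier and keeps every thread's local memory consistent with main memory.
cmpxchg is not reordered with any instruction before or after it, which prevents instruction reordering.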
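The compare-and-swap semantics described above can be tried from Java through AtomicInteger.compareAndSet(). The sketch below uses a hypothetical class CasDemo and mirrors the compareAndSetState(0, 1) call in the nonfair lock: the first CAS finds the expected value 0 and succeeds, while the second finds 1 and changes nothing.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: the same CAS succeeds once and then fails.
class CasDemo {
    static String run() {
        AtomicInteger state = new AtomicInteger(0);
        boolean first = state.compareAndSet(0, 1);  // expected 0, found 0: state becomes 1
        boolean second = state.compareAndSet(0, 1); // expected 0, found 1: state unchanged
        return first + " " + second + " " + state.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true false 1"
    }
}
```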

It can be seen that the CAS operation has the same function as reading and writing volatile variables, and both can ensure visibility.

The realization principle of lock