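1 Uniprocessor programming rarely requires attention to the underlying system architecture, but multiprocessor programming does, because the architecture has a large effect on performance.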
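2 Any mutual exclusion protocol raises a question: what should a thread do when it cannot acquire the lock? It has two choices:
- keep trying: spin (busy-wait). Spinning is sensible when the critical section is very short and the machine is a multiprocessor.
- blocking: suspend the thread and let the scheduler run another one. Because a context switch is expensive, blocking only pays off when the critical section takes a long time.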
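The two choices can be contrasted in a short sketch. The class `WaitStrategies` and its method names are made up for illustration; the spinning variant uses an `AtomicBoolean`, and the blocking variant assumes `java.util.concurrent.locks.ReentrantLock`, which suspends threads that have to wait.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

final class WaitStrategies {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private final ReentrantLock blocking = new ReentrantLock();
    private int spinCounter = 0;   // guarded by busy
    private int blockCounter = 0;  // guarded by blocking

    // Option 1: keep trying. The waiting thread stays on its CPU.
    void incrementSpinning() {
        while (!busy.compareAndSet(false, true)) {
            Thread.onSpinWait(); // busy-wait hint (Java 9+)
        }
        try {
            spinCounter++;
        } finally {
            busy.set(false);
        }
    }

    // Option 2: block. ReentrantLock suspends a thread that has to wait.
    void incrementBlocking() {
        blocking.lock();
        try {
            blockCounter++;
        } finally {
            blocking.unlock();
        }
    }

    // Runs both variants from several threads; returns {spinCounter, blockCounter}.
    static int[] run(int threads, int perThread) {
        WaitStrategies w = new WaitStrategies();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    w.incrementSpinning();
                    w.incrementBlocking();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return new int[] { w.spinCounter, w.blockCounter };
    }

    public static void main(String[] args) {
        int[] r = run(4, 10_000);
        System.out.println(r[0] + " " + r[1]); // both should equal 4 * 10_000
    }
}
```

Both counters end up correct; what differs is how a waiting thread spends its time, which is why the length of the critical section decides between the two.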
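3 Implementing a lock in a multithreaded environment means dealing with concurrent modification of shared variables, which calls for atomic operations. Each thread operates on its own working memory (registers and caches) rather than directly on shared memory, so a write made by one thread is not guaranteed to become visible to other threads promptly.
For example, with only two threads (IDs 0 and 1 in the code), the lock below can, because of this visibility problem, be acquired by both threads at the same time, so it fails to provide mutual exclusion.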
    // Deliberately naive: flag and victim are plain fields, so a write by one
    // thread may not be visible to the other in time.
    class Peterson implements Lock {
        private boolean[] flag = new boolean[2];
        private int victim;

        public void lock() {
            int i = ThreadID.get(); // either 0 or 1
            int j = 1 - i;
            flag[i] = true;
            victim = i;
            while (flag[j] && victim == i) {} // spin
        }
    }
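One way to remove the visibility problem, sketched here under assumptions: give every shared access volatile semantics by using an `AtomicIntegerArray` for the flags and a `volatile` field for the victim. The class name `VisiblePeterson` is hypothetical, the thread id is passed in explicitly instead of coming from `ThreadID.get()`, and `Thread.yield()` is added only so the demo stays responsive on machines with fewer cores than threads.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

final class VisiblePeterson {
    // flag[i] == 1 means "thread i wants the lock"; get/set act like volatile accesses.
    private final AtomicIntegerArray flag = new AtomicIntegerArray(2);
    private volatile int victim;
    private int counter = 0; // guarded by this lock

    void lock(int i) {       // i is the caller's id: 0 or 1
        int j = 1 - i;
        flag.set(i, 1);
        victim = i;
        while (flag.get(j) == 1 && victim == i) {
            Thread.yield();  // spin, politely
        }
    }

    void unlock(int i) {
        flag.set(i, 0);
    }

    // Two threads each increment the shared counter perThread times.
    static int run(int perThread) {
        VisiblePeterson lock = new VisiblePeterson();
        Thread[] workers = new Thread[2];
        for (int t = 0; t < 2; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock(id);
                    try {
                        lock.counter++;
                    } finally {
                        lock.unlock(id);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return lock.counter;
    }

    public static void main(String[] args) {
        System.out.println(run(20_000)); // should equal 2 * 20_000
    }
}
```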
4 Test-And-Set Locks
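    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.locks.Lock;

    public class TASLock implements Lock {
        AtomicBoolean state = new AtomicBoolean(false);

        public void lock() {
            // every iteration is an atomic read-modify-write on the shared flag
            while (state.getAndSet(true)) {}
        }

        public void unlock() {
            state.set(false);
        }
    }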
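The atomic operation guarantees visibility of the shared variable state. From the point of view of CPU cache hit rate, however, a spinning thread issues a stream of getAndSet calls, and each call invalidates the copies of state cached by other CPUs, which lowers the hit rate and degrades performance.
Its improved version, the test-and-test-and-set (TTAS) lock: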
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.locks.Lock;

    public class TTASLock implements Lock {
        AtomicBoolean state = new AtomicBoolean(false);

        public void lock() {
            while (true) {
                while (state.get()) {}       // spin on a plain read
                if (!state.getAndSet(true))  // lock looks free: try to take it
                    return;
            }
        }

        public void unlock() {
            state.set(false);
        }
    }
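While spinning, a thread only reads; it calls getAndSet only when the lock appears to be free. Spinning therefore invalidates CPU caches far less often, and performance improves noticeably. Under heavy contention, however, it still cannot avoid a burst of getAndSet calls whenever the lock is released.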
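A small harness can make the difference between the two locks observable. The sketch below restates both locks behind a made-up `SimpleLock` interface (so it does not need the full `java.util.concurrent.locks.Lock` contract) and times the same workload on each. The thread and iteration counts are arbitrary assumptions, and the timings depend heavily on the machine, so no particular result is claimed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinCompare {
    interface SimpleLock { void lock(); void unlock(); } // hypothetical minimal interface

    static final class TAS implements SimpleLock {
        private final AtomicBoolean state = new AtomicBoolean(false);
        public void lock() { while (state.getAndSet(true)) { } }
        public void unlock() { state.set(false); }
    }

    static final class TTAS implements SimpleLock {
        private final AtomicBoolean state = new AtomicBoolean(false);
        public void lock() {
            while (true) {
                while (state.get()) { }              // spin on a read
                if (!state.getAndSet(true)) return;  // try only when it looks free
            }
        }
        public void unlock() { state.set(false); }
    }

    // Returns {final counter value, elapsed nanoseconds}.
    static long[] run(SimpleLock lock, int threads, int perThread) {
        long[] counter = {0}; // guarded by lock
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return new long[] { counter[0], System.nanoTime() - start };
    }

    public static void main(String[] args) {
        long[] tas = run(new TAS(), 4, 100_000);
        long[] ttas = run(new TTAS(), 4, 100_000);
        System.out.println("TAS  ms: " + tas[1] / 1_000_000);
        System.out.println("TTAS ms: " + ttas[1] / 1_000_000);
    }
}
```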
5 Exponential Backoff
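Exponential backoff refines the locks above: when a thread fails to win the lock it sleeps, and the sleep time grows as the number of failures grows.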
    import java.util.Random;

    public class Backoff {
        final int minDelay, maxDelay;
        int limit;
        final Random random;

        public Backoff(int min, int max) {
            minDelay = min;
            maxDelay = max;
            limit = minDelay;
            random = new Random();
        }

        public void backoff() throws InterruptedException {
            int delay = random.nextInt(limit);     // random delay in [0, limit)
            limit = Math.min(maxDelay, 2 * limit); // double the bound, capped at maxDelay
            Thread.sleep(delay);
        }
    }
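To see how the upper bound on the delay evolves, the arithmetic of `limit` can be isolated from the sleeping. This is a minimal sketch, assuming `maxDelay` is taken from the constructor's `max` argument; the class `BackoffLimits` and its `limits` method are hypothetical helpers, not part of the lock.

```java
import java.util.Arrays;

final class BackoffLimits {
    // Upper bounds used by successive backoff() calls: start at min, double, cap at max.
    static int[] limits(int min, int max, int calls) {
        int[] out = new int[calls];
        int limit = min;
        for (int k = 0; k < calls; k++) {
            out[k] = limit;
            limit = Math.min(max, 2 * limit);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(limits(1, 16, 7)));
        // prints [1, 2, 4, 8, 16, 16, 16]
    }
}
```

Each actual delay is drawn uniformly below the current bound, so doubling raises the expected wait while the randomness keeps competing threads from retrying in lockstep.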
    public class BackoffLock implements Lock {
        private AtomicBoolean state = new AtomicBoolean(false);
        private static final int MIN_DELAY = ...;
        private static final int MAX_DELAY = ...;

        public void lock() {
            Backoff backoff = new Backoff(MIN_DELAY, MAX_DELAY);
            while (true) {
                while (state.get()) {}
                if (!state.getAndSet(true)) {
                    return;
                } else {
                    try {
                        backoff.backoff(); // lost the race: sleep before retrying
                    } catch (InterruptedException e) {
                        // keep the interrupt status and go on retrying
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }

        public void unlock() {
            state.set(false);
        }
        ...
    }
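Its drawback is that the right sleep time is hard to determine and hard to control.
The locks above still face two problems. First, all threads spin on the same memory location, so cache invalidations are frequent, which hurts the hit rate and therefore performance. Second, the backoff time is hard to tune: if it is too long, the critical section sits idle and is underutilized.
Queue locks address both problems.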
6 Queue Locks
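A queue lock spreads the spinning over different memory locations, which narrows the scope of the cache invalidations that spinning causes. It improves critical-section utilization, because a thread no longer has to guess when to try for the lock. It also provides FIFO service, which improves fairness.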
7 Array-Based Locks
An array-based lock:
    public class ALock implements Lock {
        ThreadLocal<Integer> mySlotIndex = new ThreadLocal<Integer>() {
            protected Integer initialValue() {
                return 0;
            }
        };
        AtomicInteger tail;
        // Plain array elements carry no visibility guarantee in Java; a real
        // implementation needs volatile-style access to each slot.
        boolean[] flag;
        int size;

        public ALock(int capacity) {
            size = capacity;
            tail = new AtomicInteger(0);
            flag = new boolean[capacity];
            flag[0] = true;
        }

        public void lock() {
            int slot = tail.getAndIncrement() % size;
            mySlotIndex.set(slot);
            while (!flag[slot]) {} // spin on this thread's own slot
        }

        public void unlock() {
            int slot = mySlotIndex.get();
            flag[slot] = false;
            flag[(slot + 1) % size] = true; // hand the lock to the next slot
        }
    }
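Under heavy contention there can still be many cache invalidations. Caches work in units of cache lines; suppose, for illustration, that one line holds four array items. When one item is invalidated, the other items in the same line are invalidated with it (false sharing). The fix is to pad the array so that each flag sits in its own line: compute 4(i + 1) mod 32 instead of (i + 1) mod 8.
This algorithm also prevents starvation. It has one serious problem, though: space. The array must be allocated up front with a fixed length, and when n threads compete for L locks it needs O(Ln) space.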
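The padding idea can be sketched as follows. This is an illustration under assumptions rather than a tuned implementation: the class name `PaddedALock` is made up, the stride of 16 ints assumes a 64-byte cache line, `AtomicIntegerArray` is used so that each slot gets volatile-style access, and `Thread.yield()` appears only so the demo behaves on machines with fewer cores than threads.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

final class PaddedALock {
    private static final int STRIDE = 16; // assumption: 16 ints = 64 bytes = one cache line
    private final ThreadLocal<Integer> mySlot = ThreadLocal.withInitial(() -> 0);
    private final AtomicInteger tail = new AtomicInteger(0);
    private final AtomicIntegerArray flag; // 1 = "the thread in this slot may enter"
    private final int size;                // must be >= number of competing threads

    PaddedALock(int capacity) {
        size = capacity;
        flag = new AtomicIntegerArray(capacity * STRIDE);
        flag.set(0, 1);
    }

    void lock() {
        // floorMod keeps the index non-negative even if the counter wraps around
        int slot = Math.floorMod(tail.getAndIncrement(), size);
        mySlot.set(slot);
        while (flag.get(slot * STRIDE) == 0) {
            Thread.yield();
        }
    }

    void unlock() {
        int slot = mySlot.get();
        flag.set(slot * STRIDE, 0);
        flag.set(((slot + 1) % size) * STRIDE, 1); // wake the next slot only
    }

    // Several threads increment one counter under the lock; returns the final value.
    static int run(int threads, int perThread) {
        PaddedALock lock = new PaddedALock(threads);
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }
}
```

The padding trades memory for fewer invalidations, which makes the O(Ln) space cost noted above larger by the stride factor.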
8 The CLH Queue Lock
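It uses a virtual (implicit) queue in which each node is associated with a thread. Its space complexity is O(L+n), and no array length has to be fixed in advance.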
    public class CLHLock implements Lock {
        AtomicReference<QNode> tail;
        ThreadLocal<QNode> myPred;
        ThreadLocal<QNode> myNode;

        public CLHLock() {
            tail = new AtomicReference<QNode>(new QNode());
            myNode = new ThreadLocal<QNode>() {
                protected QNode initialValue() {
                    return new QNode();
                }
            };
            myPred = new ThreadLocal<QNode>() {
                protected QNode initialValue() {
                    return null;
                }
            };
        }
        ...
        public void lock() {
            QNode qnode = myNode.get();
            qnode.locked = true;                // announce intent to acquire
            QNode pred = tail.getAndSet(qnode); // enqueue behind the current tail
            myPred.set(pred);
            while (pred.locked) {}              // spin on the predecessor's node
        }

        public void unlock() {
            QNode qnode = myNode.get();
            qnode.locked = false;               // release the successor
            myNode.set(myPred.get());           // reuse the predecessor's node next time
        }
    }
It suits SMP architectures but not NUMA, because each thread spins on its predecessor's node, which may have to be fetched from remote memory.
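For experimenting, here is a self-contained variant that compiles on its own. The shape of `QNode` (a single `volatile boolean locked` field) is an assumption, since the listing above elides it; the class name `ClhDemo` is hypothetical, and `Thread.yield()` is added only so the demo stays responsive when threads outnumber cores.

```java
import java.util.concurrent.atomic.AtomicReference;

final class ClhDemo {
    // Assumed node shape: one flag that the successor spins on.
    static final class QNode {
        volatile boolean locked;
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
    private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

    void lock() {
        QNode qnode = myNode.get();
        qnode.locked = true;
        QNode pred = tail.getAndSet(qnode); // join the implicit queue
        myPred.set(pred);
        while (pred.locked) {
            Thread.yield();                 // wait for the predecessor to release
        }
    }

    void unlock() {
        QNode qnode = myNode.get();
        qnode.locked = false;
        myNode.set(myPred.get());           // recycle the predecessor's node
    }

    // Several threads increment one counter under the lock; returns the final value.
    static int run(int threads, int perThread) {
        ClhDemo lock = new ClhDemo();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(run(4, 2_000)); // should equal 4 * 2_000
    }
}
```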
9 The MCS Queue Lock
The MCS lock addresses the NUMA problem. It is also a list-based queue lock, but its list is explicit, whereas the CLH lock's list is virtual.
    public class MCSLock implements Lock {
        AtomicReference<QNode> tail;
        ThreadLocal<QNode> myNode;

        public MCSLock() {
            tail = new AtomicReference<QNode>(null);
            myNode = new ThreadLocal<QNode>() {
                protected QNode initialValue() {
                    return new QNode();
                }
            };
        }
        ...
        class QNode {
            // volatile so that spinning threads see updates made by other threads
            volatile boolean locked = false;
            volatile QNode next = null;
        }

        public void lock() {
            QNode qnode = myNode.get();
            QNode pred = tail.getAndSet(qnode);
            if (pred != null) {
                qnode.locked = true;
                pred.next = qnode;
                // wait until predecessor gives up the lock
                while (qnode.locked) {}
            }
        }

        public void unlock() {
            QNode qnode = myNode.get();
            if (qnode.next == null) {
                if (tail.compareAndSet(qnode, null))
                    return;
                // wait until the successor fills in our next field
                while (qnode.next == null) {}
            }
            qnode.next.locked = false;
            qnode.next = null;
        }
    }
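This lock keeps the advantages of CLH, and releasing it invalidates only the successor's cached node. It is better suited to NUMA, because each thread controls the memory location it spins on. Its space complexity is also O(L+n).
Its drawback is that releasing the lock may require spinning, and it needs more reads, writes, and CAS operations. On SMP architectures, CLH is therefore the better choice.
CLH also has a timeout variant, which supports tryLock with a time limit. There is also a composite lock that combines queueing with exponential backoff: spin first, then back off.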
    public class TOLock implements Lock {
        static QNode AVAILABLE = new QNode();
        AtomicReference<QNode> tail;
        ThreadLocal<QNode> myNode;

        public TOLock() {
            tail = new AtomicReference<QNode>(null);
            myNode = new ThreadLocal<QNode>() {
                protected QNode initialValue() {
                    return new QNode();
                }
            };
        }
        ...
        static class QNode {
            // volatile so that waiting threads see a release or an abandonment
            public volatile QNode pred = null;
        }

        public boolean tryLock(long time, TimeUnit unit)
                throws InterruptedException {
            long startTime = System.currentTimeMillis();
            long patience = TimeUnit.MILLISECONDS.convert(time, unit);
            QNode qnode = new QNode();
            myNode.set(qnode);
            qnode.pred = null;
            QNode myPred = tail.getAndSet(qnode);
            if (myPred == null || myPred.pred == AVAILABLE) {
                return true;
            }
            while (System.currentTimeMillis() - startTime < patience) {
                QNode predPred = myPred.pred;
                if (predPred == AVAILABLE) {
                    return true;
                } else if (predPred != null) {
                    myPred = predPred; // predecessor timed out: skip over it
                }
            }
            // timed out: unlink our node, or mark it abandoned for the successor
            if (!tail.compareAndSet(qnode, myPred))
                qnode.pred = myPred;
            return false;
        }

        public void unlock() {
            QNode qnode = myNode.get();
            if (!tail.compareAndSet(qnode, null))
                qnode.pred = AVAILABLE;
        }
    }
    public class CompositeLock implements Lock {
        private static final int SIZE = ...;
        private static final int MIN_BACKOFF = ...;
        private static final int MAX_BACKOFF = ...;
        AtomicStampedReference<QNode> tail;
        QNode[] waiting;
        Random random;
        ThreadLocal<QNode> myNode = new ThreadLocal<QNode>() {
            protected QNode initialValue() { return null; };
        };

        public CompositeLock() {
            tail = new AtomicStampedReference<QNode>(null, 0);
            waiting = new QNode[SIZE];
            for (int i = 0; i < waiting.length; i++) {
                waiting[i] = new QNode();
            }
            random = new Random();
        }

        public void unlock() {
            QNode acqNode = myNode.get();
            acqNode.state.set(State.RELEASED);
            myNode.set(null);
        }
        ...
    }
    enum State {FREE, WAITING, RELEASED, ABORTED};

    class QNode {
        AtomicReference<State> state;
        QNode pred;

        public QNode() {
            state = new AtomicReference<State>(State.FREE);
        }
    }
    public boolean tryLock(long time, TimeUnit unit)
            throws InterruptedException {
        long patience = TimeUnit.MILLISECONDS.convert(time, unit);
        long startTime = System.currentTimeMillis();
        Backoff backoff = new Backoff(MIN_BACKOFF, MAX_BACKOFF);
        try {
            QNode node = acquireQNode(backoff, startTime, patience); // claim a node from the array
            QNode pred = spliceQNode(node, startTime, patience);     // enqueue it at the tail
            waitForPredecessor(pred, node, startTime, patience);     // wait for our turn
            return true;
        } catch (TimeoutException e) {
            return false;
        }
    }
    private QNode acquireQNode(Backoff backoff, long startTime,
                               long patience)
            throws TimeoutException, InterruptedException {
        QNode node = waiting[random.nextInt(SIZE)];
        QNode currTail;
        int[] currStamp = {0};
        while (true) {
            if (node.state.compareAndSet(State.FREE, State.WAITING)) {
                return node;
            }
            currTail = tail.get(currStamp);
            State state = node.state.get();
            if (state == State.ABORTED || state == State.RELEASED) {
                if (node == currTail) {
                    QNode myPred = null;
                    if (state == State.ABORTED) {
                        myPred = node.pred;
                    }
                    if (tail.compareAndSet(currTail, myPred,
                            currStamp[0], currStamp[0] + 1)) {
                        node.state.set(State.WAITING);
                        return node;
                    }
                }
            }
            backoff.backoff();
            if (timeout(patience, startTime)) {
                throw new TimeoutException();
            }
        }
    }
    private QNode spliceQNode(QNode node, long startTime, long patience)
            throws TimeoutException {
        QNode currTail;
        int[] currStamp = {0};
        do {
            currTail = tail.get(currStamp);
            if (timeout(patience, startTime)) {
                node.state.set(State.FREE);
                throw new TimeoutException();
            }
        } while (!tail.compareAndSet(currTail, node,
                currStamp[0], currStamp[0] + 1));
        return currTail;
    }
    private void waitForPredecessor(QNode pred, QNode node, long startTime,
                                    long patience)
            throws TimeoutException {
        int[] stamp = {0};
        if (pred == null) {
            myNode.set(node);
            return;
        }
        State predState = pred.state.get();
        while (predState != State.RELEASED) {
            if (predState == State.ABORTED) {
                QNode temp = pred;
                pred = pred.pred;
                temp.state.set(State.FREE);
            }
            if (timeout(patience, startTime)) {
                node.pred = pred;
                node.state.set(State.ABORTED);
                throw new TimeoutException();
            }
            predState = pred.state.get();
        }
        pred.state.set(State.FREE);
        myNode.set(node);
        return;
    }
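The listings call a `timeout` helper that is never shown. One plausible shape, offered purely as an assumption (the argument order and the strict comparison are guesses, and the wrapper class `TimeoutHelper` is hypothetical), compares the elapsed wall-clock time against the caller's patience:

```java
final class TimeoutHelper {
    // True once more than `patience` milliseconds have passed since `startTime`.
    static boolean timeout(long patience, long startTime) {
        return System.currentTimeMillis() - startTime > patience;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(timeout(1_000, now));     // just started, still patient
        System.out.println(timeout(10, now - 1_000)); // started a second ago, patience of 10 ms
    }
}
```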