Background
When concurrent programming problems come up, the first thing most people think of is the most frequently cited example of a thread-safety issue:
...
i++; // increment
...
It is then natural to notice that this i++ operation actually consists of three operations at a lower level:
tmp1 = i;
tmp2 = tmp1 + 1;
i = tmp2;
Therefore i++ is not an atomic operation and is not thread-safe in a multithreaded environment.
So the question is: if you want to implement an accumulator, i.e. an i++ style function that works correctly under concurrency, how should you do it? For example:
- Recording how many times an interface method is called within one second
- In the LFU (Least Frequently Used) algorithm, counting how many times an object is used within a period of time
- ...
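To make the problem concrete, here is a minimal sketch (the class name and iteration counts are my own) in which two threads each perform 100,000 unsynchronized increments; because i++ is three operations, increments are routinely lost:

```java
public class LostUpdateDemo {
    static long counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // not atomic: read, add, write back
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Usually prints a value below 200000 because concurrent increments overwrite each other
        System.out.println(counter);
    }
}
```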
Option 1: AtomicLong
AtomicLong lives in the java.util.concurrent.atomic package. It is a thread-safe class that implements operations such as i++ and i += x without locking; its author is Doug Lea.
Doug Lea needs no introduction: coming from his hands, the efficiency and thread safety of AtomicLong are "beyond doubt". Even if you had never heard his name, a look at the author stamps on the classes in the java.util.concurrent package, most of which bear it, should convince you.
- The AtomicLong class has a field named value that records the underlying long value;
- value is volatile, so a modification made in one thread is visible to all other threads;
- The class achieves lock-free thread safety through a loop + CAS (Compare And Swap) pattern;
- It provides methods such as getAndAdd and addAndGet that implement i++, i += x and similar operations atomically.
In the AtomicLong implementation, the get and set methods read or write value directly; thanks to the volatile guarantee, both operations are thread-safe.
Core methods in Java 7
compareAndSet
This method accepts two parameters. It compares the expected value against the actual current value and performs the update only when they are equal; otherwise it fails. (There is also the ABA problem here, which we will set aside for now.)
The CAS operation is implemented by the Unsafe native method compareAndSwapLong, which maps to direct hardware support. Under high contention it can fail, because the real value may already have been modified by another thread, making the comparison come out unequal.
public final boolean compareAndSet(long expect, long update) {
    return unsafe.compareAndSwapLong(this, valueOffset, expect, update);
}
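A minimal usage sketch (variable names are mine): the CAS succeeds only when the current value matches the expected value:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasDemo {
    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(10);

        // Expected value matches the current value: the update succeeds
        boolean ok = counter.compareAndSet(10, 11);
        System.out.println(ok + " " + counter.get());    // true 11

        // Expected value is stale: the update fails and the value is unchanged
        boolean stale = counter.compareAndSet(10, 99);
        System.out.println(stale + " " + counter.get()); // false 11
    }
}
```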
getAndSet
Reads the old value and writes the new one via loop + CAS: if the CAS fails, another thread won the race, and the method keeps retrying until it succeeds.
public final long getAndSet(long newValue) {
    while (true) {
        long current = get();
        if (compareAndSet(current, newValue))
            return current;
    }
}
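A quick usage sketch: getAndSet atomically installs the new value and returns the old one:

```java
import java.util.concurrent.atomic.AtomicLong;

public class GetAndSetDemo {
    public static void main(String[] args) {
        AtomicLong value = new AtomicLong(1);
        long old = value.getAndSet(5); // atomically swap in 5, return the previous value
        System.out.println(old + " " + value.get()); // 1 5
    }
}
```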
Other core methods
Several other commonly used core methods implement increment and decrement operations:
- getAndIncrement: i++
- getAndDecrement: i--
- getAndAdd: read the old value first, then add
- incrementAndGet: ++i
- decrementAndGet: --i
- addAndGet: add first, then return the new value
All of these methods are implemented in the same atomic loop + CAS fashion. Taking getAndIncrement as an example:
// getAndIncrement
while (true) {
    long current = get();
    long next = current + 1;
    if (compareAndSet(current, next)) // effectively unsafe.compareAndSwapLong
        return current;
}
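The getAndXxx and xxxAndGet families differ only in their return value, as this quick sketch shows:

```java
import java.util.concurrent.atomic.AtomicLong;

public class ReturnValueDemo {
    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(0);
        System.out.println(counter.getAndIncrement()); // 0: returns the value before adding
        System.out.println(counter.incrementAndGet()); // 2: returns the value after adding
        System.out.println(counter.getAndAdd(10));     // 2: old value; counter becomes 12
        System.out.println(counter.addAndGet(10));     // 22: new value
    }
}
```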
Implementation in Java 8
Java 8 adds the methods getAndSetLong and getAndAddLong to Unsafe; they are essentially the Java 7 AtomicLong logic moved into Unsafe.
Apart from compareAndSet, which still uses compareAndSwap, the other methods are now built on these two Unsafe methods:
- getAndIncrement
- getAndDecrement
- getAndAdd
- incrementAndGet
- decrementAndGet
- addAndGet
Of course, getAndAddLong and getAndSetLong are still essentially implemented with loop + CAS. Here is part of the decompiled Unsafe class from Java 8:
public final long getAndAddLong(Object var1, long var2, long var4) {
    long var6;
    do {
        var6 = this.getLongVolatile(var1, var2);
    } while(!this.compareAndSwapLong(var1, var2, var6, var6 + var4));
    return var6;
}
However, according to the article "How did the CAS-related implementation of AtomicLong change in Java 8?" (Code Log), the conditional branch inside the Java 7 CAS loop suffers from a branch-prediction problem under high concurrency, which reduces efficiency. The original translation reads oddly, so let me restate it:
When the CAS inside the loop fails frequently, the CPU's branch predictor kicks in and starts speculating on the expected (failing) path to improve throughput; once the branch is mispredicted, that is, once the CAS succeeds, the processor has to stall, roll back, and restart the pipeline, which is costly.
We will cover branch predictors in a later post.
PS: Another explanation is that the underlying CAS instruction in Java 7 is LOCK CMPXCHG, while Java 8 uses LOCK XADD, which makes CAS in Java 8 more efficient than in Java 7. See https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7023898; as I understand it, the JDK developers implemented the CMPXCHG-to-XADD replacement some time after JDK 7u40.
Option 2: LongAdder
In Java 8, Doug Lea added several new classes to the java.util.concurrent.atomic package, one of which is LongAdder.
As its introduction says, this class is designed for accumulation under high concurrency:
This class is usually preferable to {@link AtomicLong} when multiple threads update a common sum that is used for purposes such as collecting statistics, not for fine-grained synchronization control. Under low update contention, the two classes have similar characteristics. But under high contention, expected throughput of this class is significantly higher, at the expense of higher space consumption.
In other words: when multiple threads update a common sum that is used for purposes such as collecting statistics rather than for fine-grained synchronization control, this class (LongAdder) usually performs better than AtomicLong. Under low contention the two classes have similar characteristics, but under high contention the expected throughput of LongAdder is significantly higher, at the cost of consuming more space.
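A minimal sketch of the statistics use case (the request-counter framing and counts are my own): many threads record events with increment(), and sum() is read only occasionally:

```java
import java.util.concurrent.atomic.LongAdder;

public class RequestStats {
    private static final LongAdder requests = new LongAdder();

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    requests.increment(); // contended writes stay cheap
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        // With all writers finished, the sum is exact
        System.out.println(requests.sum()); // 400000
    }
}
```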
LongAdder inherits from Striped64. Let us look at what these two classes do, and why the expected efficiency for statistics is higher.
Striped64
Striped64 contains a nested class Cell that implements a subset of AtomicLong's functionality: it keeps only the value field and the CAS operation, omitting the rest, such as get and incrementAndGet. The Cell class is annotated with @Contended to avoid false sharing.
Internally it defines transient volatile long base, which holds one part of the value, and transient volatile Cell[] cells, which holds the other part. That is, base and cells together make up the final value.
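The base + cells idea can be sketched with a toy striped counter. This is a simplification of my own, not the real Striped64: the stripe count is fixed, there is no resizing, no probe rehashing, and no @Contended padding.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy illustration of striping: each thread hashes to a stripe, so contention
// is spread across several CAS targets instead of a single one.
public class ToyStripedCounter {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLong[] stripes = new AtomicLong[8]; // fixed size, unlike Striped64

    public ToyStripedCounter() {
        for (int i = 0; i < stripes.length; i++) stripes[i] = new AtomicLong();
    }

    public void add(long x) {
        // Try base first; on contention, fall back to this thread's stripe
        long b = base.get();
        if (!base.compareAndSet(b, b + x)) {
            int i = (int) (Thread.currentThread().getId() & (stripes.length - 1));
            stripes[i].addAndGet(x);
        }
    }

    public long sum() {
        // The logical value is base plus all stripes, just like base + cells
        long s = base.get();
        for (AtomicLong stripe : stripes) s += stripe.get();
        return s;
    }

    public static void main(String[] args) throws InterruptedException {
        ToyStripedCounter c = new ToyStripedCounter();
        Thread[] ts = new Thread[4];
        for (int t = 0; t < ts.length; t++) {
            ts[t] = new Thread(() -> { for (int i = 0; i < 50_000; i++) c.add(1); });
            ts[t].start();
        }
        for (Thread w : ts) w.join();
        System.out.println(c.sum()); // 200000
    }
}
```

Every add lands either on base or on exactly one stripe, so the total is conserved and the sum after all threads join is exact.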
Striped64 has two core methods, longAccumulate and doubleAccumulate, with similar logic: one handles integers (used by LongAdder, for example) and the other handles floating-point numbers (used by DoubleAdder). Both are rather complex and long; the integer one is attached here. A close read is recommended, but feel free to skip the code:
final void longAccumulate(long x, LongBinaryOperator fn,
                          boolean wasUncontended) {
    int h; // probe value
    if ((h = getProbe()) == 0) {
        // Initialize the probe value, a thread-related pseudo-random value
        ThreadLocalRandom.current(); // force initialization
        h = getProbe();
        // wasUncontended marks whether the caller's CAS was uncontended;
        // when longAccumulate is called after a CAS failure it is obviously false,
        // but the probe was just initialized, so reset it to true
        wasUncontended = true;
    }
    // Collision flag: true when the slot this thread probed was non-empty,
    // i.e. when threads actually collided
    boolean collide = false;
    for (;;) {
        Cell[] as; Cell a; int n; long v;
        // Already initialized: the branch the vast majority of calls enter
        if ((as = cells) != null && (n = as.length) > 0) {
            // Map the probe value onto the array (n is a power of two)
            // to check whether this thread's Cell is null
            if ((a = as[(n - 1) & h]) == null) {
                if (cellsBusy == 0) {       // nobody holds the lock: try to lock and add a Cell
                    Cell r = new Cell(x);   // optimistically create
                    if (cellsBusy == 0 && casCellsBusy()) { // lock via CAS
                        boolean created = false;
                        try {               // recheck under the lock whether to add
                            Cell[] rs; int m, j;
                            if ((rs = cells) != null &&
                                (m = rs.length) > 0 &&
                                rs[j = (m - 1) & h] == null) {
                                rs[j] = r;
                                created = true;
                            }
                        } finally {
                            cellsBusy = 0;  // release the lock
                        }
                        // Creation succeeded: break out of the loop, accumulation done
                        if (created)
                            break;
                        // The slot was filled by another thread during the recheck; next round
                        continue;           // Slot is now non-empty
                    }
                }
                // The slot is empty but acquiring the lock via CAS failed
                collide = false;
            }
            // This thread's Cell exists and the CAS before calling longAccumulate failed;
            // the logic below rehashes the probe and continues the loop,
            // so this branch is entered at most once
            else if (!wasUncontended)       // CAS already known to fail
                wasUncontended = true;      // Continue after rehash
            // This thread's Cell exists and the earlier CAS succeeded:
            // try a normal accumulation and CAS the Cell's value
            else if (a.cas(v = a.value, ((fn == null) ? v + x :
                                         fn.applyAsLong(v, x))))
                break;
            // The Cell exists, the earlier CAS succeeded, but the CAS in the branch above failed
            else if (n >= NCPU || cells != as)
                // The array has reached its length limit, or cells was concurrently replaced;
                // clear collide and go to the next round
                collide = false;            // At max size or stale
            // The Cell exists, the earlier CAS succeeded, this CAS failed, the array is not
            // at its limit and not stale, and collide is still false.
            // The collide flag is the last line of defense before expansion
            else if (!collide)
                collide = true;             // mark the collision
            // All other branches were tried in vain; the final resort is to lock and expand.
            // If even that fails, keep looping
            else if (cellsBusy == 0 && casCellsBusy()) {
                try {
                    if (cells == as) {      // Expand table unless stale
                        Cell[] rs = new Cell[n << 1];
                        for (int i = 0; i < n; ++i)
                            rs[i] = as[i];
                        cells = rs;
                    }
                } finally {
                    cellsBusy = 0;
                }
                collide = false;
                continue;                   // Retry with expanded table
            }
            h = advanceProbe(h);
        }
        // Not yet initialized, cells not replaced by another thread, and the lock acquired via CAS
        else if (cellsBusy == 0 && cells == as && casCellsBusy()) {
            boolean init = false;
            try {                           // Initialize table
                // Recheck after taking the lock
                if (cells == as) {
                    // Create a length-2 array and put the value into this thread's Cell
                    Cell[] rs = new Cell[2];
                    rs[h & 1] = new Cell(x);
                    cells = rs;
                    init = true;
                }
            } finally {
                cellsBusy = 0;              // unlock
            }
            // Break only if this thread performed the initialization;
            // if another thread got there first, keep looping
            if (init)
                break;
        }
        // cells is uninitialized and another thread unluckily holds the lock:
        // fall back to CASing base. If that CAS also fails, keep looping
        else if (casBase(v = base, ((fn == null) ? v + x :
                                    fn.applyAsLong(v, x))))
            break;                          // Fall back on using base
    }
}
The code above looks complicated, but only a few points are at its core:
- As described above, the value of Striped64 consists of two parts, long base and Cell[] cells, and each thread is mapped to one Cell of the array
- The approach is optimistic: avoid locking where possible, and use CAS when competing for resources
- There is no looping when there is no conflict, and even under conflict certain conditions avoid looping (Doug Lea thought this through very carefully)
- In essence, most of the branches deal with contention; the real core is only two branches:
  - Create or update the Cell corresponding to the current thread
  - When contention is heavy, expand the cells array (its length has an upper bound)
Why Striped64 uses base + cells
At higher degrees of concurrency, AtomicLong's CAS operations fail more frequently, wasting resources on retries and degrading performance.
Striped64 recognizes that AtomicLong makes every thread compete over a single CAS target, and instead spreads the competition when conflicts occur: each thread is assigned its own Cell, so threads mostly contend over their own slots, which significantly reduces conflicts.
LongAdder
Striped64 is the parent class of LongAccumulator, LongAdder and the like: it factors out the shared concurrent-accumulation logic and leaves the value-specific parts to be implemented by subclasses.
So by inheriting Striped64, LongAdder not only gets the methods needed to implement an accumulator, it also adds several methods for reading the value.
Core methods
// add the given value
public void add(long x) {
    Cell[] as; long b, v; int m; Cell a;
    // In the optimistic, completely uncontended case, the parent Striped64.casBase updates base.
    // Once casBase contends, cells becomes non-null, and each thread finds its own Cell
    // via the probe value and updates its value with CAS.
    // Once CASing a Cell's value contends, Striped64.longAccumulate updates cells or base.
    if ((as = cells) != null || !casBase(b = base, b + x)) {
        boolean uncontended = true;
        if (as == null || (m = as.length - 1) < 0 ||
            (a = as[getProbe() & m]) == null ||
            !(uncontended = a.cas(v = a.value, v + x)))
            longAccumulate(x, null, uncontended);
    }
}
// increment
public void increment() {
    add(1L);
}
// decrement
public void decrement() {
    add(-1L);
}
As you can see, the core is to spread out concurrent conflicts by calling Striped64's longAccumulate method.
The biggest difference from AtomicLong is that the increment method does not return a value. (Obviously; the method name says as much.)
sum
Returns the sum, i.e. base plus the values of all Cells in the array.
Since the traversal is unlocked, other threads may update the value while the sum is being computed, so the result is not guaranteed to be exact.
reset
Resets the data without locking, clearing both the Cell elements and base. This means callers must ensure that no other thread is updating concurrently, or the state may not be completely cleared.
It does not, however, shrink the cells array or remove its elements, so an object reused after reset will not have to repeat the expansion it went through before.
sumThenReset
Sums the values while traversing and clears each Cell element along the way. Note that this too is unlocked, so the concurrent-modification problems of sum and reset also apply to this method.
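A single-threaded sketch of these methods (with no concurrent writers, the results are exact):

```java
import java.util.concurrent.atomic.LongAdder;

public class SumResetDemo {
    public static void main(String[] args) {
        LongAdder adder = new LongAdder();
        for (int i = 0; i < 5; i++) adder.increment();

        System.out.println(adder.sum());          // 5
        long snapshot = adder.sumThenReset();     // returns the sum, then zeroes the state
        System.out.println(snapshot);             // 5
        System.out.println(adder.sum());          // 0: everything was cleared
    }
}
```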
Other methods
Value-reading methods such as longValue and doubleValue are provided; at their core, they all rely on sum.
Shortcomings of LongAdder
As said before, AtomicLong's get and set operate directly on the core value, with O(1) time complexity.
By contrast, LongAdder has a few disadvantages:
- It provides no assignment method; to reuse an instance you must call reset, whose drawbacks were noted above, and which also costs more than AtomicLong.set because it traverses the array
- Reading the value also has relatively high time complexity, because summing requires a traversal
- As the author notes, it uses more space, because of the @Contended annotation used to avoid false sharing
Admittedly, the cells array LongAdder traverses is either empty or at most roughly as long as the number of CPU cores, which is not large. Still, in the concurrent-accumulator scenario, if reads of the value make up too high a proportion of the calls, the impact on performance will show.
AtomicLong vs. LongAdder
LongAdder is better suited to highly concurrent scenarios in which writes far outnumber reads. This is what Doug Lea's class description means by "for purposes such as collecting statistics, not for fine-grained synchronization control"; there, its expected efficiency is much higher than AtomicLong's.
AtomicLong is better suited to scenarios in which reads far outnumber writes, or the number of threads is small:
- At low concurrency, AtomicLong's write efficiency is roughly the same as LongAdder's, or even slightly better;
- AtomicLong's read efficiency is always better than LongAdder's;
- But AtomicLong's write efficiency drops roughly linearly as contention grows, whereas LongAdder's write efficiency holds up well.
Overall, the two target very different scenarios, and LongAdder should not simply be used as a drop-in replacement for AtomicLong.
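For pure accumulation the two are functionally interchangeable, as this sketch (class name and counts are mine) shows; the difference under contention is throughput, which a simple correctness test cannot capture, so use a benchmark harness such as JMH for real measurements:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class SameTotalDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong atomic = new AtomicLong();
        LongAdder adder = new LongAdder();

        Thread[] ts = new Thread[4];
        for (int t = 0; t < ts.length; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    atomic.incrementAndGet(); // a single contended CAS target
                    adder.increment();        // contention spread across cells
                }
            });
            ts[t].start();
        }
        for (Thread w : ts) w.join();
        // Both arrive at the same correct total
        System.out.println(atomic.get() + " " + adder.sum()); // 400000 400000
    }
}
```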
Reference material
- "A summary of the principle of LongAdder, a Java concurrency tool" · Issue #22 · aCoder2013/blog
- "How did the CAS-related implementation of AtomicLong change in Java 8?" - Code Log
- Bug ID: JDK-7023898, Intrinsify AtomicLongFieldUpdater.getAndIncrement() - bugs.java.com
This article is mirrored from my blog; you are welcome to visit!