Java Concurrent Programming: The Beauty - Reading Notes 2

2.1 What is multi-threaded programming

  Concurrency: multiple tasks make progress within the same time period, but not necessarily at the same instant (the period spans several time slices, so even a single CPU can interleave multiple tasks).

  Parallelism: multiple tasks execute at the same instant within a unit of time (i.e. multiple CPUs each executing a task simultaneously).

  

  In multi-threaded programming practice, the number of threads is generally greater than the number of CPUs.

2.2 Why multi-threaded programming

  Multiple CPUs can execute multiple tasks at the same time, making full use of the hardware while reducing the overhead of thread context switches.

2.3 thread-safety issues

  Shared resource: a resource that can be held, or accessed, by multiple threads.

   Modifications to a shared resource are what cause thread-safety issues.

2.4 Memory visibility of shared variables

  The Java memory model (JMM) specifies that all variables are stored in main memory. When a thread uses a variable, it copies the variable from main memory into its own working memory (an L1 or L2 cache, or a register), and from then on the thread operates on the copy in its working memory.

  For variables whose memory is not visible (i.e. not declared volatile), different threads may see different values. Consider a dual-core system with a shared variable X (assume X holds its default value in main memory). When thread A first operates on X, its working memory does not yet contain the variable, so A copies X from main memory into its working memory (L1/L2 cache) and then assigns it a new value, say 2. After the modification, A flushes the new value back to main memory, and A's working memory keeps 2. Thread B then operates on X in the same way: it copies X from main memory into its working memory (the value is now 2, the result of A's operation), modifies it to 3, and flushes 3 back to main memory. At this point main memory holds 3, A's cache holds 2, and B's cache holds 3. When thread A operates on X again, it uses its cached value 2 directly, which is no longer the correct value: this is the memory visibility problem.

  Memory visibility problems with a shared variable X can be solved by declaring it volatile or by accessing it under synchronized.
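As a minimal sketch of the scenario above (class and field names are my own), the volatile flag below makes the writer's update visible to a spinning reader thread; if stop were not volatile, the worker could keep reading a stale cached false:

```java
public class VisibilityDemo {
    // volatile guarantees the write below becomes visible to the spinning worker
    static volatile boolean stop = false;
    static long iterations = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            long i = 0;
            while (!stop) {    // re-reads stop instead of trusting a stale cached copy
                i++;
            }
            iterations = i;
        });
        worker.start();
        Thread.sleep(100);     // let the worker spin for a moment
        stop = true;           // without volatile, this write might never be observed
        worker.join();         // join also gives a happens-before edge for iterations
        System.out.println("worker stopped after " + iterations + " iterations");
    }
}
```

Whether the non-volatile variant actually hangs depends on the JIT, but the volatile version is guaranteed to terminate.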

  

2.5 The synchronized keyword

  synchronized solves the memory-visibility problem for shared variables, and it is usually used to solve atomicity problems as well.

  Memory semantics of synchronized: on entering a synchronized block, the variables used inside the block are cleared from the thread's working memory, so the thread reads them directly from main memory; on exiting the block, the thread's modifications to the shared variables are flushed back to main memory. (Locking and unlocking have the same semantics: acquiring the lock clears the thread's working-memory cache of the shared variables, so they are loaded from main memory when used, and releasing the lock flushes the thread's shared-variable values back to main memory.)
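A small illustration of these semantics (class and method names are illustrative): because both threads increment under the same monitor, each increment reads the freshest value from main memory on entry and flushes the result back on exit, so no update is lost:

```java
public class SyncCounter {
    private int count = 0;

    // Entering the synchronized method refreshes count from main memory;
    // exiting it flushes the new value back before the lock is released.
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) c.increment();
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get()); // always 200000
    }
}
```

Removing synchronized from increment() would make the final count nondeterministic, typically below 200000.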

2.6 The volatile keyword

  volatile guarantees that a thread's writes to a shared variable are visible to other threads. It ensures visibility, but it cannot guarantee atomicity (synchronized ensures both visibility and atomicity).

2.7 Atomic operations

  Atomic operation: a sequence of operations that either all succeed or all fail.

  For example, the counter increment count++; is not an atomic operation, because internally it consists of three steps: read, modify, write.
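The lost-update effect is easy to observe (a sketch; the class name is mine): two threads run the three-step increment on a plain field and a hardware-atomic increment on an AtomicInteger side by side:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IncrementDemo {
    static int plain = 0;                         // plain++ is read-modify-write
    static final AtomicInteger atomic = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                          // three steps: read, add, write back
                atomic.incrementAndGet();         // one atomic operation
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("plain=" + plain + " atomic=" + atomic.get());
    }
}
```

On most runs plain prints less than 200000, because interleaved read-modify-write sequences overwrite each other; atomic is always exactly 200000.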

2.8 CAS operations

  CAS, i.e. compare and swap, is the non-blocking atomic primitive provided by the JDK; the hardware guarantees that the "compare then update" pair executes as a single atomic operation.

  A classic problem with CAS is the ABA problem. It arises when a variable's value forms a cycle, i.e. A -> B -> A: the value is back to A, so a CAS cannot tell that it was ever changed in between.

  AtomicStampedReference solves the ABA problem (by attaching a stamp, effectively a version number, to every update of the variable).
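A short sketch of the stamped fix (class name is mine): the reference goes A -> B -> A, but the stamp records the intermediate change, so a CAS that remembers the old stamp fails:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // value "A" with initial stamp 0
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);

        int stamp = ref.getStamp();               // remember the stamp before A -> B -> A
        ref.compareAndSet("A", "B", stamp, stamp + 1);
        ref.compareAndSet("B", "A", stamp + 1, stamp + 2);

        // value is back to "A", but the stale stamp exposes the intermediate change
        boolean swapped = ref.compareAndSet("A", "C", stamp, stamp + 1);
        System.out.println(swapped);              // false: the stamp no longer matches
    }
}
```

A plain AtomicReference would have let the last CAS succeed, silently missing the A -> B -> A history.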

2.9 The Unsafe class

  It provides hardware-level atomic operations (using this class in application code is not recommended).

2.10 Instruction reordering

  The Java memory model allows the compiler and the processor to reorder instructions to improve performance; only instructions with no data dependency between them may be reordered.

  In a single thread, instruction reordering has no effect on the final result, but in a multi-threaded program it can cause problems.
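The classic hazard can be sketched as follows (illustrative names). Within one thread, reordering writes (1) and (2) is unobservable, so main always prints 1; but if writer() and reader() ran on two threads, reader() could observe ready == true while a is still 0, because nothing forbids the two plain writes from being reordered:

```java
public class ReorderDemo {
    static int a = 0;
    static boolean ready = false;   // not volatile: the writes below may be reordered

    static void writer() {
        a = 1;                      // (1)
        ready = true;               // (2) may be moved before (1) by compiler or CPU
    }

    static int reader() {
        if (ready) {
            return a;               // on another thread, could see 0 after reordering
        }
        return -1;
    }

    public static void main(String[] args) {
        writer();
        System.out.println(reader()); // single-threaded: always prints 1
    }
}
```

Declaring ready as volatile would forbid moving (1) past (2) and restore a sane result for a cross-thread reader.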

2.11 False Sharing

To understand false sharing you first need to understand the CPU caches (L1, L2 and L3) and cache lines.

The CPU is the heart of the computer: all computation ultimately runs on it. To bridge the speed gap between the CPU and main memory, several levels of cache are placed between them, because direct access to main memory is very slow.

If you perform the same operation on a piece of data many times, it makes sense to load it close to the CPU while the operations run (the closer a cache is to the CPU, the faster it is and the smaller its capacity). For example, with a loop counter you do not want to go out to main memory on every iteration just to fetch the count.

 

The closer a cache is to the CPU, the faster and smaller it is: the L1 cache is small but very fast, and sits right next to the CPU core.

L2 is larger and slower, but still private to a single CPU core. L3, common in modern multi-core machines, is larger and slower still, and shared by all CPU cores on a single socket.

Finally, main memory holds all the running program's data; it is the largest, the slowest, and shared by all CPU cores on all sockets.

When the CPU executes an operation, it looks for the data in L1 first, then L2, then L3; if the data is in none of the caches, it has to be fetched from main memory. The farther out it has to go, the longer the operation takes, so for very frequent operations you want the data to stay in the L1 cache.

CPU cache lines

A cache is made up of cache lines, each usually a power-of-two number of bytes; 64-byte cache lines are common on current processors (older processors used 32-byte lines). A cache line effectively caches a block of consecutive addresses in main memory.

A Java long is 8 bytes, so one cache line can hold 8 long variables.

While a program runs, each cache fill loads 64 consecutive bytes from main memory. So if you access a long[] array, loading one array element into the cache loads 7 neighboring elements for free.

However, if the items of a data structure are not adjacent in memory, as in a linked list, you do not get this free cache loading.

This free loading has a downside, though. Suppose we have a long variable a that is not part of an array but a standalone variable, and another long variable b sits right next to it; then loading a also loads b for free (assuming both variables are declared volatile).

That looks harmless, but now suppose a thread on one CPU core is modifying a while a thread on another core is reading b.

When the writer modifies a, both a and b are loaded into the writer core's cache line; once a is updated, every other cache line containing a is invalidated, because the copy of a in those caches is stale. When the reader then reads b, it finds its cache line invalidated and must reload it from main memory.

Remember, the cache handles everything in units of cache lines, so invalidating a's line also invalidates b, and vice versa.

So we have a problem: b and a are completely unrelated, yet every update of a forces b to be re-read from main memory; the reader is slowed down by cache misses. This is false sharing.

When multiple threads modify mutually independent variables that happen to share a cache line, they unintentionally hurt each other's performance. That is false sharing.

The example below shows exactly how false sharing plays out.

public class FalseSharingTest {

    public static void main(String[] args) throws InterruptedException {
        testPointer(new Pointer());
    }

    private static void testPointer(Pointer pointer) throws InterruptedException {
        long start = System.currentTimeMillis();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100000000; i++) {
                pointer.x++;
            }
        });

        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100000000; i++) {
                pointer.y++;
            }
        });

        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(System.currentTimeMillis() - start);
        System.out.println(pointer);
    }
}

class Pointer {
    volatile long x;
    volatile long y;
}

In this example we declare a Pointer class containing two variables, x and y (they must be declared volatile to guarantee visibility; memory barriers are covered later). One thread increments x 100 million times, and another thread increments y 100 million times.

As you can see, x and y have nothing to do with each other, but every update of x invalidates the other cache lines containing x, and with them y. Running this program printed a time of 3890ms.

Now that we know how false sharing works (a cache line is 64 bytes and a long is 8 bytes), avoiding it is simple. There are roughly three ways:

(1) Insert 7 more long variables between the two long variables.

We change the Pointer above to the following structure:

class Pointer {
    volatile long x;
    long p1, p2, p3, p4, p5, p6, p7; // these 7 longs push x and y into different cache lines, so modifying x no longer affects operations on y
    volatile long y;
}

Run the program again, and the output time magically drops to 695ms.

(2) Create your own long wrapper type instead of using Java's built-in long.

Modify Pointer as follows:

class Pointer {
    MyLong x = new MyLong();
    MyLong y = new MyLong();
}

class MyLong {
    volatile long value;
    long p1, p2, p3, p4, p5, p6, p7; // likewise pads out the rest of the cache line
}

At the same time, change pointer.x++; to pointer.x.value++; and pointer.y++; to pointer.y.value++;. Running the program again gives 724ms.

(3) Use the @sun.misc.Contended annotation (Java 8).

Modify MyLong as follows:

@sun.misc.Contended
class MyLong {
    volatile long value;
}

By default this annotation has no effect; you must add the JVM startup flag -XX:-RestrictContended for it to take effect. Running the program again gives 718ms.

Note that the first two approaches work by adding padding fields, and since those fields are never used anywhere, the JVM may optimize them away; the third approach is therefore recommended.

(1) The CPU has multiple levels of cache; the closer a cache is to the CPU, the smaller and faster it is.

(2) Data in the CPU cache is handled in units of cache lines.

(3) Cache lines load adjacent data for free, which is why processing arrays is so fast.

(4) Cache lines also have a downside: threads working on unrelated variables can interfere with each other, which is false sharing.

(5) The main idea for avoiding false sharing is to keep unrelated variables out of the same cache line.

(6) One way is to pad seven long fields between every two variables.

(7) Another is to create your own long wrapper type instead of using the raw type.

(8) A third is to use the annotation provided in Java 8.

 

2.12 Locks

  Optimistic and pessimistic locks: differ in how they treat conflicting transactions (a pessimistic lock locks before touching the data; an optimistic lock proceeds without locking and validates at update time).

  Fair and unfair locks: differ in the acquisition policy (first come, first served is fair; random preemption is unfair).

  Exclusive and shared locks: differ in whether the lock can be held by several threads at once (a shared lock can; an exclusive lock cannot).

  Reentrant lock: a thread that already holds the lock is not blocked when it acquires the same lock again.

  Spin lock: when a thread finds the lock already held by another thread, it does not block immediately but retries in a loop (10 times by default); only if it still cannot acquire the lock does it block. The spin count can be set with -XX:PreBlockSpin.
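Reentrancy and fairness can be seen directly with ReentrantLock (a sketch; class and method names are mine): the constructor argument true requests a fair, first-come-first-served lock, and the nested lock() call does not deadlock because the holder is the current thread:

```java
import java.util.concurrent.locks.ReentrantLock;

public class ReentrantDemo {
    // passing true asks for a fair (first come, first served) lock
    static final ReentrantLock lock = new ReentrantLock(true);

    static void outer() {
        lock.lock();                  // first acquisition
        try {
            inner();                  // re-acquiring the same lock does not block
        } finally {
            lock.unlock();
        }
    }

    static void inner() {
        lock.lock();                  // hold count goes from 1 to 2
        try {
            System.out.println("hold count = " + lock.getHoldCount()); // prints 2
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        outer();
        System.out.println("released: " + !lock.isLocked()); // prints released: true
    }
}
```

The lock is only fully released once unlock() has been called as many times as lock(), which is why each lock() sits with a matching unlock() in a finally block.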

 

 

 

Reference (false sharing): https://www.jianshu.com/p/7758bb277985


Origin www.cnblogs.com/nxzblogs/p/11329270.html