False sharing and cache lines in Java

1. False sharing and cache lines

1. CPU cache architecture

The CPU is the heart of the computer, and all operations and programs are ultimately executed by it.

Main memory (RAM) is where program data lives. Several levels of cache sit between the CPU and main memory, because direct access to main memory is comparatively slow.

A CPU core runs much faster than memory. To bridge this gap, CPUs introduce a three-level cache: L1, L2, and L3. L1 is closest to the core, L2 comes next, L3 is the farthest from the CPU, and beyond L3 is main memory. Speed decreases in the order L1 > L2 > L3 > main memory, and the closer a cache is to the CPU, the smaller its capacity. When the CPU needs data, it searches these cache levels in order.

When the CPU reads a piece of data, it first looks in the L1 cache; on a miss it looks in L2, then L3, and finally main memory. Roughly speaking, each cache level has a hit rate of about 80%, meaning about 80% of requests are satisfied by the L1 cache and only 20% fall through to L2, L3, or memory. The L1 cache is therefore the most important part of the CPU cache hierarchy.
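The practical impact of the cache hierarchy is easy to observe. Below is a rough, machine-dependent sketch (not from the original article): the stride-16 loop touches roughly one int per 64-byte cache line and does only 1/16 of the writes of the full pass, yet it often takes a comparable amount of time, because both loops are bound by how many cache lines they pull in rather than by arithmetic.

public class CacheLineEffect {
    public static void main(String[] args) {
        int[] arr = new int[64 * 1024 * 1024];
        // warm-up pass so the JIT compiles the loop and the array is paged in
        run(arr, 1);
        run(arr, 16);
        System.out.println("stride 1  (every int):            " + run(arr, 1) + " ms");
        System.out.println("stride 16 (about 1 int per line): " + run(arr, 16) + " ms");
    }

    static long run(int[] arr, int stride) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < arr.length; i += stride) {
            arr[i] *= 3; // stride 16 does 1/16 of the writes but touches every cache line
        }
        return System.currentTimeMillis() - start;
    }
}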

2. What is false sharing?

To close the speed gap between main memory and the CPU, one or more levels of cache memory (Cache) are inserted between them. This cache is usually integrated into the CPU chip, so it is also called the CPU cache; a two- or three-level cache structure is typical.

Internally, the cache is organized by rows, where each row is called a cache line. The cache line is the unit of data exchange between the cache and main memory, and its size is a power of two, 64 bytes on most modern processors.
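On Linux you can check the line size yourself; a minimal sketch, assuming the standard sysfs layout where index0 is the L1 data cache (not part of the original article):

import java.nio.file.Files;
import java.nio.file.Paths;

public class CacheLineSize {
    public static void main(String[] args) throws Exception {
        // Linux-specific: sysfs exposes the L1 data cache line size (usually 64)
        byte[] bytes = Files.readAllBytes(
                Paths.get("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"));
        System.out.println(new String(bytes).trim() + " bytes");
    }
}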

When the CPU accesses a variable, it first checks whether the variable is present in the CPU cache. If so, it reads it from there; otherwise it fetches the block of memory containing the variable from main memory and copies that whole block into the cache (the cache line being the unit of exchange between the cache and main memory). Because a cache line holds a block of memory rather than a single variable, several variables can end up in the same cache line. When multiple threads modify different variables that share a cache line, only one core can own that line for writing at a time, so performance is worse than if each variable had a cache line of its own. This is false sharing.

 

Suppose variables x and y sit in the same cache line, which is loaded into the L1 and L2 caches of both CPU 1 and CPU 2. When thread 1 on CPU 1 updates x, it modifies the line in CPU 1's L1 cache, and the cache coherence protocol invalidates the corresponding line in CPU 2's caches. When thread 2 on CPU 2 then writes y, it finds its copy of the line invalid and must reload it from a lower cache level. In the worse case, if the CPU had only an L1 cache, this would lead to frequent direct accesses to main memory.

3. Why does false sharing occur?

False sharing occurs because multiple variables fall into one cache line and multiple threads write to different variables in that line at the same time. So why are multiple variables put into one cache line? Because the unit of data exchange between the cache and main memory is the cache line, not the individual variable: when the variable the CPU wants is not in the cache, the locality principle of program execution says nearby data will probably be needed soon, so the entire cache-line-sized block of memory containing the variable is loaded. For example, with 64-byte lines, two 8-byte longs at offsets 72 and 80 of the same region both fall into the line covering bytes 64-127.

4. False sharing in Java

The most direct way to avoid false sharing is padding. In the DataPadding example below, a long occupies 8 bytes, and the Java object header occupies 8 bytes on a 32-bit JVM, or 12 bytes on a 64-bit JVM with compressed object pointers (the default; 16 bytes with compression turned off). With a 64-byte cache line, we can pad with 6 longs (6 × 8 = 48 bytes), so the header, padding, and the hot field together roughly fill one line.

To see why this arithmetic works, consider the memory layout of JVM objects. Every Java object starts with an 8-byte header. The first 4 bytes (the mark word) store the object's identity hash code and lock state; once the object is locked, this word is displaced out of the object and reached through a pointer to a lock record. The other 4 bytes store a reference to the object's class. Arrays additionally store their length in 4 bytes. The size of every object is aligned up to a multiple of 8 bytes, with filler added where needed. For efficiency, the JVM also groups an object's fields by type when laying them out, typically in descending size order: longs and doubles, then ints and floats, then shorts and chars, then bytes and booleans, and finally object references.
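You can verify the header size, field order, and alignment padding for yourself with the OpenJDK JOL (Java Object Layout) tool; a minimal sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath:

import org.openjdk.jol.info.ClassLayout;

public class LayoutDemo {
    boolean flag;
    long data;
    int count;

    public static void main(String[] args) {
        // prints the header bytes, the JVM's chosen field offsets, and any alignment padding
        System.out.println(ClassLayout.parseClass(LayoutDemo.class).toPrintable());
    }
}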

Therefore, by filling long fields around a hot field, we can isolate hot variables in different cache lines. Reducing false sharing in this way can greatly improve efficiency on multi-core CPUs.

The simplest way
/**
 * Cache line padding (direct approach)
 */
public class DataPadding {
    // padding: 6 long fields, 6 * 8 = 48 bytes
    private long p1, p2, p3, p4, p5, p6;
    // the data we actually operate on
    private long data;
}

However, from JDK 1.7 on, the JIT compiler may treat fields that are never read as useless and optimize them away, so this direct padding approach may not take effect on JDK 1.7 and later.
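One well-known workaround, used in early versions of the LMAX Disruptor, is to expose a method that reads the padding fields so they do not look dead to the compiler; a sketch, with no guarantee on every JVM:

public class PaddedData {
    // padding: 6 long fields, 6 * 8 = 48 bytes
    private long p1, p2, p3, p4, p5, p6;
    // the data we actually operate on
    private long data;

    // referencing the padding fields makes them harder to prove unused
    public long sumPaddingToPreventOptimisation() {
        return p1 + p2 + p3 + p4 + p5 + p6;
    }
}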

Inheritance method
/**
 * Cache line padding parent class
 */
public class DataPadding {
    // padding: 6 long fields, 6 * 8 = 48 bytes
    private long p1, p2, p3, p4, p5, p6;
}

A subclass then inherits the padding:

/**
 * Inherits DataPadding
 */
public class VolatileData extends DataPadding {
    // 8 bytes of data + 48 bytes of padding + object header ≈ 64 bytes
    private long data = 0;

    public VolatileData() {
    }

    public VolatileData(long defValue) {
        this.data = defValue;
    }

    public long accumulationAdd() {
        // only one thread writes each instance, so no locking is needed
        data++;
        return data;
    }

    public long getValue() {
        return data;
    }
}

This inheritance-based padding still works on JDK 1.8, since superclass fields are laid out separately from, and before, subclass fields.

The @Contended annotation (sun.misc.Contended, as defined in JDK 8):
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.TYPE})
public @interface Contended {
    String value() default "";
}

The @Contended annotation can be applied to classes and to fields. When it is present, the JVM automatically adds padding to avoid false sharing. The annotation is used in the JDK 8 ConcurrentHashMap, ForkJoinPool, and Thread classes. Let's look at how ConcurrentHashMap uses @Contended to solve the false-sharing problem, as shown in the snippet below (all ConcurrentHashMap references here are to the Java 8 version).
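The striped counter cells that back ConcurrentHashMap's size bookkeeping look like this (condensed from the JDK 8 source):

// From java.util.concurrent.ConcurrentHashMap (JDK 8), condensed:
// each cell is padded so threads updating different cells
// do not contend for the same cache line.
@sun.misc.Contended static final class CounterCell {
    volatile long value;
    CounterCell(long x) { value = x; }
}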

Note: **@sun.misc.Contended is provided in Java 8 to avoid false sharing. For it to take effect in application code, the JVM must be started with -XX:-RestrictContended**; otherwise the annotation is ignored outside JDK classes.
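A minimal sketch of using the annotation in application code (the class and field names are made up for illustration):

import sun.misc.Contended; // JDK 8; moved to jdk.internal.vm.annotation in JDK 9

public class ContendedDemo {
    @Contended               // the JVM pads this field into its own cache line region
    private volatile long x;

    private volatile long y; // unannotated: laid out normally
}
// remember to run with: java -XX:-RestrictContended ...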

The effect of cache line padding


import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Cache line test
 */
public class CacheLineTest {
    /**
     * whether to enable cache line padding
     */
    private final boolean isDataPadding = false;
    /**
     * plain variables
     */
    private volatile long x = 0;
    private volatile long y = 0;
    private volatile long z = 0;
    /**
     * variables isolated by cache line padding
     */
    private volatile VolatileData volatileDataX = new VolatileData(0);
    private volatile VolatileData volatileDataY = new VolatileData(0);
    private volatile VolatileData volatileDataZ = new VolatileData(0);

    /**
     * number of loop iterations
     */
    private final long size = 100000000;

    /**
     * accumulate x
     */
    public void accumulationX() {
        // record the start time
        long currentTime = System.currentTimeMillis();
        long value = 0;
        // increment in a loop
        for (int i = 0; i < size; i++) {
            if (isDataPadding) {
                // padded version
                value = volatileDataX.accumulationAdd();
            } else {
                // unpadded version; each variable has a single writer, so no lock is needed
                value = (++x);
            }
        }
        // print the result
        System.out.println(value);
        // print the elapsed time
        System.out.println("elapsed: " + (System.currentTimeMillis() - currentTime));
    }

    /**
     * accumulate y
     */
    public void accumulationY() {
        long currentTime = System.currentTimeMillis();
        long value = 0;
        for (int i = 0; i < size; i++) {
            if (isDataPadding) {
                value = volatileDataY.accumulationAdd();
            } else {
                value = ++y;
            }
        }
        System.out.println(value);
        System.out.println("elapsed: " + (System.currentTimeMillis() - currentTime));
    }

    /**
     * accumulate z
     */
    public void accumulationZ() {
        long currentTime = System.currentTimeMillis();
        long value = 0;
        for (int i = 0; i < size; i++) {
            if (isDataPadding) {
                value = volatileDataZ.accumulationAdd();
            } else {
                value = ++z;
            }
        }
        System.out.println(value);
        System.out.println("elapsed: " + (System.currentTimeMillis() - currentTime));
    }

    public static void main(String[] args) {
        // create the object shared by all three threads
        CacheLineTest cacheRowTest = new CacheLineTest();
        // thread pool with three threads
        ExecutorService executorService = Executors.newFixedThreadPool(3);
        // start three threads, each incrementing its own variable
        executorService.execute(() -> cacheRowTest.accumulationX());
        executorService.execute(() -> cacheRowTest.accumulationY());
        executorService.execute(() -> cacheRowTest.accumulationZ());
        executorService.shutdown();
    }
}
Test without cache line padding
/**
 * whether to enable cache line padding
 */
private final boolean isDataPadding = false;

output

100000000
elapsed: 7960
100000000
elapsed: 7984
100000000
elapsed: 7989
Test with cache line padding
/**
 * whether to enable cache line padding
 */
private final boolean isDataPadding = true;

output

100000000
elapsed: 176
100000000
elapsed: 178
100000000
elapsed: 182

With the same structure, the two runs differ in speed by a factor of nearly 50. One caveat when reading this number: in the unpadded branch the counters x, y, and z are volatile fields that share a cache line, while the padded VolatileData.data field is a plain (non-volatile) long, so part of the gap comes from avoiding volatile write traffic, not only from eliminating false sharing.
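Hand-rolled timing loops like the one above are also sensitive to JIT warmup and dead-code elimination. A more reliable way to measure false sharing is a JMH benchmark; a minimal sketch, assuming the org.openjdk.jmh dependencies and annotation processor are set up:

import org.openjdk.jmh.annotations.*;

@State(Scope.Group)
public class FalseSharingBench {
    // adjacent volatile fields, likely to share one cache line
    private volatile long x;
    private volatile long y;

    // two threads run in the same group, one per method,
    // hammering the two fields concurrently
    @Benchmark @Group("shared") @GroupThreads(1)
    public long incX() { return ++x; }

    @Benchmark @Group("shared") @GroupThreads(1)
    public long incY() { return ++y; }
}

Annotating the fields with @Contended (and running with -XX:-RestrictContended) should visibly raise the measured throughput.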

Summary

When multiple threads write to variables that share a cache line, the cache coherence protocol turns their independent writes into contention for that line; this is the false-sharing problem. A common solution is to pad and align shared variables to the cache line size so that, when loaded into the cache, each hot variable occupies a cache line of its own instead of sharing one with other shared variables.

 
