[Java] How can Java demonstrate that CPU cache lines exist on Linux?


1. Overview

(figure: the three-level cache structure)

In the three-level cache hierarchy, the L1 and L2 caches live inside each CPU core, while the L3 cache sits on the CPU package and is shared by all cores of that CPU.
Why add caches at all? Because CPU and memory speeds are badly mismatched, by roughly 100:1.

Two principles are at work here: temporal locality and spatial locality (look the concepts up if they are unfamiliar).

(figure: spatial locality)

This illustrates spatial locality: when x is read, its neighbor y is likely to be read soon after, so the hardware pulls them into the cache together.
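A quick way to feel spatial locality from Java is to sum a 2-D array in row-major order (walking memory sequentially, cache-friendly) versus column-major order (jumping between rows). The class name and array size below are just for illustration; on most machines the row-major pass is noticeably faster:

```java
public class LocalityDemo {
    public static void main(String[] args) {
        int n = 2048;
        int[][] m = new int[n][n];

        long t0 = System.nanoTime();
        long sum = 0;
        // Row-major: inner loop walks one row's backing array sequentially.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) sum += m[i][j];
        long rowMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        // Column-major: each step jumps to a different row array,
        // defeating the cache's prefetch of neighboring elements.
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++) sum += m[i][j];
        long colMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.println("row-major " + rowMs + " ms, column-major " + colMs + " ms, sum=" + sum);
    }
}
```

The exact timings vary by machine, but the gap exists purely because of how the same data is traversed.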

And after one CPU core modifies x, the other cores must be notified to re-read it; this is where cache coherency comes in.

On current mainstream CPUs, a cache line is 64 bytes.
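On Linux you can check the cache line size yourself through sysfs. A small sketch (the sysfs path below is the standard Linux location; the code falls back to the common 64-byte value when the file is absent, e.g. on another OS):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CacheLineSize {
    public static void main(String[] args) throws Exception {
        // Linux exposes the L1 data cache line size per CPU via sysfs.
        Path p = Paths.get("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size");
        int size = Files.exists(p)
                ? Integer.parseInt(new String(Files.readAllBytes(p)).trim())
                : 64; // fallback assumption when sysfs is unavailable
        System.out.println("cache line size: " + size + " bytes");
    }
}
```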

2. Proof

package com.java.memory.memoryline;

import java.util.concurrent.CountDownLatch;

/**
 * @author: chuanchuan.lcc
 * @date: 2020-12-20 16:26
 * @modifiedBy: chuanchuan.lcc
 * @version: 1.0
 * @description:
 */
public class MemoryLineDemo1 {

    private static final long COUNT = 100000000L;

    private static class T1 {
        private volatile long x = 0L;
    }

    public static T1[] arr = new T1[2];

    static {
        arr[0] = new T1();
        arr[1] = new T1();
    }

    // takes about 2843 ms on the author's machine
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(2);
        Thread t1 = new Thread(() -> {
            for (long i = 0; i < COUNT; i++) {
                arr[0].x = i;
            }
            latch.countDown();
        });

        Thread t2 = new Thread(() -> {
            for (long i = 0; i < COUNT; i++) {
                arr[1].x = i;
            }
            latch.countDown();
        });

        final long start = System.nanoTime();
        t1.start();
        t2.start();
        latch.await();

        final long end = System.nanoTime();
        System.out.println("elapsed " + (end - start) / 1000000 + " ms");
    }
}

This program takes about 2843 ms to run. Now change it to the following:

package com.java.memory.memoryline;

import java.util.concurrent.CountDownLatch;

/**
 * @author: chuanchuan.lcc
 * @date: 2020-12-20 16:31
 * @modifiedBy: chuanchuan.lcc
 * @version: 1.0
 * @description:
 */
public class MemoryLineDemo2 {

    private static final long COUNT = 100000000L;

    private static class T1 {
        private long p1, p2, p3, p4, p5, p6, p7; // 56 bytes of padding before x
        private volatile long x = 0L;
        private long p8, p9, p10, p11, p12, p13, p14; // 56 bytes of padding after x
    }

    public static T1[] arr = new T1[2];

    static {
        arr[0] = new T1();
        arr[1] = new T1();
    }

    // takes about 1093 ms on the author's machine
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(2);
        Thread t1 = new Thread(() -> {
            for (long i = 0; i < COUNT; i++) {
                arr[0].x = i;
            }
            latch.countDown();
        });

        Thread t2 = new Thread(() -> {
            for (long i = 0; i < COUNT; i++) {
                arr[1].x = i;
            }
            latch.countDown();
        });

        final long start = System.nanoTime();
        t1.start();
        t2.start();
        latch.await();

        final long end = System.nanoTime();
        System.out.println("elapsed " + (end - start) / 1000000 + " ms");
    }
}

This program takes about 1093 ms. The only difference between the two programs is the layout of the T1 class.

3. Analysis

In the first program, a long is 8 bytes, and the two small T1 objects created for arr[0] and arr[1] are typically allocated right next to each other in memory. Since a cache line is usually 64 bytes, both x fields are very likely to land in the same cache line. Each core then caches a copy of that line, and because x is volatile, every write by one core invalidates the other core's copy and forces it to re-read the line. This constant invalidation traffic, known as false sharing, is where the time goes.

In the second program the allocation picture is the same, but x is now surrounded by padding: p1 through p7 occupy 7 * 8 = 56 bytes before x, and p8 through p14 occupy another 56 bytes after it:

56 bytes (p1-p7)
x: 8 bytes
56 bytes (p8-p14)

Since a cache line is 64 bytes and x plus either side of its padding already fills a full line, the two x fields can no longer share a cache line. The two threads therefore stop invalidating each other's cached lines on every write, and the program runs much faster.

4. Cases in practice

Is this technique used in practice? Yes. A well-known example is the Disruptor framework, a high-performance inter-thread messaging library.
Look at its source code:

(figure: Disruptor source code)

Here there are 7 long fields that are never used, and another 7 longs in front of the hot field, exactly the padding pattern shown above.
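Disruptor builds this padding through a small class hierarchy rather than plain fields in one class, because the JVM lays out superclass fields before subclass fields, keeping the padding on both sides of the hot field. A simplified sketch of the pattern (the class and field names here are illustrative, not Disruptor's exact source):

```java
// 56 bytes of padding laid out before the hot field.
class LhsPadding {
    protected long p1, p2, p3, p4, p5, p6, p7;
}

// The frequently-written field sits alone between the padding layers.
class HotValue extends LhsPadding {
    protected volatile long value;
}

// 56 bytes of padding laid out after the hot field.
public class PaddedCounter extends HotValue {
    protected long p9, p10, p11, p12, p13, p14, p15;

    public void set(long v) { value = v; }
    public long get() { return value; }
}
```

Two PaddedCounter instances updated by two different threads will not false-share, for the same reason MemoryLineDemo2 is faster than MemoryLineDemo1.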

(figure: ConcurrentHashMap source code)

The same technique is also used in ConcurrentHashMap.

Origin blog.csdn.net/qq_21383435/article/details/111461020