1. Overview
The three-level cache structure: the L1 and L2 caches live inside each CPU core, while the L3 cache sits on the CPU package and is shared by all cores of that CPU.
Why is a cache needed at all?
Because CPU speed and memory speed are badly mismatched; the ratio is roughly 100:1.
Caching works because programs exhibit temporal locality and spatial locality; look these concepts up if they are unfamiliar.
The figure illustrates spatial locality: after x is read, an adjacent y is likely to be read soon, so it pays to load y into the cache together with x.
And after one CPU core modifies x, the other cores must be notified so that they re-read it; this is where cache coherency (e.g. the MESI protocol) comes in.
On mainstream CPUs today, a cache line is 64 bytes.
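Spatial locality can be demonstrated directly. The following is a minimal sketch (the class and method names are my own, not from the text): summing a 2D array row by row walks memory sequentially, so every 64-byte cache line fetched is fully used; summing column by column jumps a whole row ahead on each access and wastes most of each line. Same work, very different cache behavior.

```java
public class LocalityDemo {

    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++)
                sum += m[i][j];              // sequential accesses: cache-friendly
        return sum;
    }

    static long sumColMajor(int[][] m) {
        long sum = 0;
        for (int j = 0; j < m[0].length; j++)
            for (int i = 0; i < m.length; i++)
                sum += m[i][j];              // strided accesses: touches a new cache line almost every time
        return sum;
    }

    public static void main(String[] args) {
        int n = 2048;
        int[][] m = new int[n][n];
        for (int[] row : m) java.util.Arrays.fill(row, 1);

        long t1 = System.nanoTime();
        long a = sumRowMajor(m);
        long t2 = System.nanoTime();
        long b = sumColMajor(m);
        long t3 = System.nanoTime();

        System.out.println("row-major:    " + (t2 - t1) / 1_000_000 + " ms, sum=" + a);
        System.out.println("column-major: " + (t3 - t2) / 1_000_000 + " ms, sum=" + b);
    }
}
```

On most machines the row-major loop is noticeably faster, even though both compute the same sum.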
2. Proof
package com.java.memory.memoryline;
import java.util.concurrent.CountDownLatch;
/**
* @author: chuanchuan.lcc
* @date: 2020-12-20 16:26
* @modifiedBy: chuanchuan.lcc
* @version: 1.0
* @description:
*/
public class MemoryLineDemo1 {
private static long COUNT = 100000000L;
private static class T1 {
private volatile long x = 0L;
}
public static T1[] arr = new T1[2];
static {
arr[0] = new T1();
arr[1] = new T1();
}
// takes about 2843 ms
public static void main(String[] args) throws InterruptedException {
CountDownLatch latch = new CountDownLatch(2);
Thread t1 = new Thread(() -> {
for (long i = 0; i < COUNT; i++) {
arr[0].x = i;
}
latch.countDown();
});
Thread t2 = new Thread(() -> {
for (long i = 0; i < COUNT; i++) {
arr[1].x = i;
}
latch.countDown();
});
final long start = System.nanoTime();
t1.start();
t2.start();
latch.await();
final long end = System.nanoTime();
System.out.println("Elapsed(ms): " + (end - start) / 1000000);
}
}
This program takes about 2843 ms to run. Now change the program as follows:
package com.java.memory.memoryline;
import java.util.concurrent.CountDownLatch;
/**
* @author: chuanchuan.lcc
* @date: 2020-12-20 16:31
* @modifiedBy: chuanchuan.lcc
* @version: 1.0
* @description:
*/
public class MemoryLineDemo2 {
private static long COUNT = 100000000L;
private static class T1 {
private long p1, p2, p3, p4, p5, p6, p7;
private volatile long x = 0L;
private long p8, p9, p10, p11, p12, p13, p14;
}
public static T1[] arr = new T1[2];
static {
arr[0] = new T1();
arr[1] = new T1();
}
// takes about 1093 ms
public static void main(String[] args) throws InterruptedException {
CountDownLatch latch = new CountDownLatch(2);
Thread t1 = new Thread(() -> {
for (long i = 0; i < COUNT; i++) {
arr[0].x = i;
}
latch.countDown();
});
Thread t2 = new Thread(() -> {
for (long i = 0; i < COUNT; i++) {
arr[1].x = i;
}
latch.countDown();
});
final long start = System.nanoTime();
t1.start();
t2.start();
latch.await();
final long end = System.nanoTime();
System.out.println("Elapsed(ms): " + (end - start) / 1000000);
}
}
This version takes about 1093 ms. The only difference between the two programs is the layout of the T1 class.
3. Analysis
In the first program, a long is 8 bytes, and new T1[2] allocates two tiny objects that usually end up adjacent in memory, so arr[0] and arr[1] sit next to each other in RAM. Since a cache line is typically 64 bytes, arr[0].x and arr[1].x are very likely to land on the same cache line. Each CPU core then caches that line, and because x is volatile, every write by one thread invalidates the line in the other core's cache and forces a reload. This constant cross-core notification, known as false sharing, is what makes the program slow.
In the second program the objects are laid out the same way, adjacent in memory, with the same 64-byte cache line. But now x is preceded by p1 through p7 (7 * 8 = 56 bytes of padding) and followed by p8 through p14 (another 56 bytes):
56 bytes of padding
x (8 bytes)
56 bytes of padding
Because a cache line is 64 bytes, and x together with the padding on either side already spans 64 bytes (56 + 8), the two x fields can never fall on the same cache line. The two threads therefore never invalidate each other's cached line, and the notification overhead disappears.
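The arithmetic behind this can be checked directly. A sketch (the class and method names are illustrative, not from the original):

```java
// With 7 longs before x and 7 after, any 64-byte cache line that holds
// this object's x is already filled by x plus its own padding, so it
// cannot also hold the other object's x.
public class PaddingMath {
    static final int LONG_BYTES = 8;
    static final int CACHE_LINE = 64;

    // bytes occupied by x plus the padding on one side of it
    static int bytesAround(int padLongsEachSide) {
        return padLongsEachSide * LONG_BYTES + LONG_BYTES;
    }

    public static void main(String[] args) {
        int span = bytesAround(7);   // 7 * 8 + 8 = 64
        System.out.println("x plus one side of padding spans " + span
                + " bytes; cache line is " + CACHE_LINE);
        System.out.println("isolated: " + (span >= CACHE_LINE));
    }
}
```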
4. Real-world cases
Is this trick used in practice? Yes. The LMAX Disruptor, a very fast inter-thread messaging framework, relies on it.
Look at its source code: the hot field is surrounded by 7 long fields that are never used, with another 7 in front, exactly the padding pattern shown above.
The same technique is also used in ConcurrentHashMap.
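The Disruptor pattern can be sketched like this (the class names below are illustrative, modeled on the padding style of its Sequence class rather than copied from the library):

```java
// Padding via a superclass chain, in the style of Disruptor's Sequence.
// HotSpot lays out superclass fields before subclass fields, so the
// 7 longs in LhsPadding sit before `value` and the 7 in RhsPadding
// after it, keeping the hot field isolated on its own cache line.
class LhsPadding { protected long p1, p2, p3, p4, p5, p6, p7; }
class Value extends LhsPadding { protected volatile long value; }
class RhsPadding extends Value { protected long p9, p10, p11, p12, p13, p14, p15; }

public class PaddedCounter extends RhsPadding {
    public long get() { return value; }
    public void set(long v) { value = v; }
}
```

Since JDK 8 the JVM also provides the @Contended annotation (enabled for application classes with -XX:-RestrictContended), which asks the JVM to pad the annotated field automatically instead of declaring dummy fields by hand; ConcurrentHashMap's internal counter cells use it.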