Operating System Articles: CPU

The following notes are written from a Java perspective.

Assembly language (machine language) execution process

Power on the computer -> the CPU reads the program from memory (as electrical signal input) -> the clock generator oscillates on and off continuously -> each tick pushes the CPU one step forward (the number of steps an instruction takes depends on how many clock cycles it requires) -> the calculation completes -> the result is written back (as electrical signals) -> the result is written to the graphics card for output (System.out, or graphics)

Why does the CPU need a clock generator? The clock synchronizes the various gate circuits inside the CPU.

Hyperthreading

"Four cores and eight threads" means that each core has one ALU paired with two sets of registers. Normally, the data of one thread is held in its registers, the address of its next instruction is held in the PC (program counter), and the ALU performs the calculation. When the CPU time slice expires and the CPU switches threads, it can simply switch to the other register set and continue with the next thread's data. With only one register set, a thread switch would first require saving the current register contents back to memory.
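One way to observe hardware threads from Java is `Runtime.getRuntime().availableProcessors()`, which reports logical processors rather than physical cores. This is a minimal sketch; the number depends on the machine and on any container or CPU-affinity limits, so 8 on a 4-core/8-thread CPU is typical rather than guaranteed. The class and method names are hypothetical.

```java
public class LogicalCpus {
    // Number of logical processors (hardware threads) the JVM may use.
    // With hyperthreading this is usually twice the number of physical cores.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors available to the JVM: " + logicalProcessors());
    }
}
```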

The structure of the CPU cache

In the figure below, there are 2 CPUs, each with 2 cores.
On some systems the L3 cache sits on the motherboard instead, very close to the CPU.

Cache line

Data is read in blocks; in caching, these blocks are called cache lines.
Reading whole lines exploits the principle of program locality to improve efficiency.
It also makes full use of the bus and the CPU pins, fetching more data in each transfer.
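A minimal sketch of locality using only the JDK: summing a 2D `int` array row by row walks memory sequentially, so each cache line fetched serves several consecutive elements, while summing column by column jumps to a different row array on every access. Class and method names are hypothetical, the array is assumed to be rectangular, and the timings are only indicative (no JIT warm-up, results vary by machine).

```java
public class LocalityDemo {
    // Row by row: consecutive accesses fall in the same cache line.
    static long sumRowMajor(int[][] grid) {
        long sum = 0;
        for (int[] row : grid) {
            for (int v : row) sum += v;
        }
        return sum;
    }

    // Column by column: each access goes to a different row array,
    // so most accesses touch a different cache line.
    static long sumColumnMajor(int[][] grid) {
        long sum = 0;
        int cols = grid[0].length; // assumes every row has the same length
        for (int c = 0; c < cols; c++) {
            for (int[] row : grid) sum += row[c];
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 4096;
        int[][] grid = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) grid[r][c] = r + c;
        }
        long t0 = System.nanoTime();
        long a = sumRowMajor(grid);
        long t1 = System.nanoTime();
        long b = sumColumnMajor(grid);
        long t2 = System.nanoTime();
        System.out.println("row-major:    " + (t1 - t0) / 1_000_000 + " ms, sum " + a);
        System.out.println("column-major: " + (t2 - t1) / 1_000_000 + " ms, sum " + b);
    }
}
```

Both loops compute the same sum; only the memory access order differs.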

The larger the cache line, the more spatial locality it captures, but the longer each read takes.
The smaller the cache line, the less spatial locality it captures, but the faster each read is.
The size is therefore a compromise; most current CPUs use 64 bytes.

Cache Coherence Protocol

Data is first read into a cache line. If the same data is accessed by two threads running on different CPUs and one CPU modifies it, this triggers the cache coherence protocol (MESI on Intel CPUs), which invalidates the other CPU's copy of the cache line, so the other thread has to read the data from memory again.
Simply put, MESI marks each cache line with one of four states: Modified, Exclusive, Shared, and Invalid. Each CPU acts according to the state of the line, which keeps the data consistent.
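The state transitions described above can be sketched as a toy simulation of a single cache line held by several cores. This is a simplified model under stated assumptions (no bus messages, write-backs, or main memory are modeled; the names `MesiDemo`, `read`, and `write` are hypothetical), not how any real CPU implements MESI.

```java
import java.util.Arrays;

public class MesiDemo {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    // State of ONE cache line in each core's private cache.
    private final State[] states;

    MesiDemo(int cores) {
        states = new State[cores];
        Arrays.fill(states, State.INVALID);
    }

    void read(int core) {
        if (states[core] != State.INVALID) return; // hit: M, E and S lines can be read locally
        boolean othersHaveCopy = false;
        for (int c = 0; c < states.length; c++) {
            if (c != core && states[c] != State.INVALID) {
                othersHaveCopy = true;
                // A Modified/Exclusive holder is downgraded to Shared
                // (real hardware writes a Modified line back first).
                states[c] = State.SHARED;
            }
        }
        states[core] = othersHaveCopy ? State.SHARED : State.EXCLUSIVE;
    }

    void write(int core) {
        // Invalidate every other copy, then own the line in Modified state.
        for (int c = 0; c < states.length; c++) {
            if (c != core) states[c] = State.INVALID;
        }
        states[core] = State.MODIFIED;
    }

    State stateOf(int core) {
        return states[core];
    }

    public static void main(String[] args) {
        MesiDemo line = new MesiDemo(2);
        line.read(0);   // core 0: EXCLUSIVE
        line.read(1);   // both cores: SHARED
        line.write(0);  // core 0: MODIFIED, core 1: INVALID
        System.out.println(line.stateOf(0) + " " + line.stateOf(1)); // prints MODIFIED INVALID
        line.read(1);   // core 1 re-reads: both SHARED again
        System.out.println(line.stateOf(0) + " " + line.stateOf(1)); // prints SHARED SHARED
    }
}
```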
However, some data does not fit in a single cache line. To keep such data consistent, the CPU has to lock the bus.
Data that spans multiple cache lines must use a bus lock.
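Whether a value crosses a cache-line boundary is simple arithmetic on its address. A sketch assuming a 64-byte line and hypothetical byte addresses (Java does not expose object addresses directly); `CacheLineSpan` and `spansLines` are illustrative names.

```java
public class CacheLineSpan {
    static final int LINE_SIZE = 64; // assumed cache-line size in bytes

    // True if the byte range [address, address + size) touches more than one cache line.
    static boolean spansLines(long address, int size) {
        long firstLine = address / LINE_SIZE;
        long lastLine = (address + size - 1) / LINE_SIZE;
        return firstLine != lastLine;
    }

    public static void main(String[] args) {
        System.out.println(spansLines(0, 8));  // false: bytes 0..7 sit in line 0
        System.out.println(spansLines(56, 8)); // false: bytes 56..63 still sit in line 0
        System.out.println(spansLines(60, 8)); // true: bytes 60..67 straddle lines 0 and 1
    }
}
```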

Cache line alignment

Cache line alignment: some particularly sensitive values are accessed by many threads under heavy contention. To make sure false sharing does not occur, you can use cache line alignment. False sharing happens when variables used by different threads sit in the same cache line: when one thread writes its variable (for example a volatile write, which must become visible to other threads), the coherence protocol invalidates the whole line in the other CPUs' caches, so they must reload it even though the data they use did not change, which degrades performance.

In JDK 7, a common technique was to pad hot fields with extra long fields to improve efficiency.
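A minimal false-sharing experiment in the JDK 7 padding style: two threads each write their own volatile counter, first on unpadded objects (which likely share a cache line), then on objects padded with seven longs on each side. Assumptions: HotSpot allocates the two unpadded objects next to each other and keeps the padding fields in declaration order; neither is guaranteed by the JLS, so the timing gap varies by JVM and hardware. All names are hypothetical.

```java
public class FalseSharingDemo {
    static final long ITERATIONS = 50_000_000L;

    // Unpadded: two adjacent instances very likely share one 64-byte line.
    static final class Plain {
        volatile long value;
    }

    // JDK 7-style padding: 7 spare longs on each side of the hot field.
    static final class Padded {
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long value;
        long q1, q2, q3, q4, q5, q6, q7;
    }

    // Runs both tasks on separate threads and returns the elapsed milliseconds.
    static long run(Runnable first, Runnable second) {
        Thread t1 = new Thread(first);
        Thread t2 = new Thread(second);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Plain[] plain = { new Plain(), new Plain() };
        Padded[] padded = { new Padded(), new Padded() };

        long plainMs = run(
            () -> { for (long i = 0; i < ITERATIONS; i++) plain[0].value = i; },
            () -> { for (long i = 0; i < ITERATIONS; i++) plain[1].value = i; });
        long paddedMs = run(
            () -> { for (long i = 0; i < ITERATIONS; i++) padded[0].value = i; },
            () -> { for (long i = 0; i < ITERATIONS; i++) padded[1].value = i; });

        System.out.println("unpadded: " + plainMs + " ms, padded: " + paddedMs + " ms");
    }
}
```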

JDK 8 added the @Contended annotation (sun.misc.Contended); to use it in your own classes, start the JVM with -XX:-RestrictContended.

Example from an open-source project: Disruptor, a high-performance concurrent queue.
In Java a long is 8 bytes, so seven padding longs p1 to p7 plus the value field itself add up to 64 bytes, exactly one cache line.
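The padding arithmetic, plus a layout trick in the style of the Disruptor's `Sequence` class: the padding is split across a class hierarchy, because HotSpot lays out superclass fields before subclass fields, which keeps padding on both sides of the hot field. This sketch is loosely modeled on that idea and is not the Disruptor's source; `PaddedSequence` and the helper class names are hypothetical.

```java
// Superclass fields come first in HotSpot's layout, so this chain
// places 7 longs before and 7 longs after `value`.
class LhsPadding { protected long p1, p2, p3, p4, p5, p6, p7; }
class Value extends LhsPadding { protected volatile long value; }
class RhsPadding extends Value { protected long p9, p10, p11, p12, p13, p14, p15; }

public class PaddedSequence extends RhsPadding {
    public static final int CACHE_LINE_BYTES = 64; // assumed cache-line size

    PaddedSequence(long initial) { value = initial; }

    long get() { return value; }

    void set(long v) { value = v; }

    public static void main(String[] args) {
        // 7 padding longs + 1 value long = 8 * 8 bytes = one 64-byte line
        System.out.println((7 + 1) * Long.BYTES == CACHE_LINE_BYTES); // prints true
        PaddedSequence seq = new PaddedSequence(-1);
        seq.set(42);
        System.out.println(seq.get()); // prints 42
    }
}
```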

Out of order execution

While waiting for a slow read to complete, the CPU executes other, independent instructions. This is the root cause of out-of-order execution: the reordering is not chaos, it exists to improve efficiency.
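The classic litmus test for reordering: two threads each write one variable and then read the other. In any in-order interleaving at least one of `x` and `y` ends up as 1, so observing `(0, 0)` means a store and a later load were reordered (by the JIT or by the CPU's store buffer). A sketch with hypothetical names; the outcome is nondeterministic and may be rare or never appear on a given machine, since each round starts fresh threads.

```java
public class ReorderingDemo {
    // Deliberately NOT volatile, so the JIT and the CPU are free to reorder accesses.
    private static int a, b, x, y;

    // Runs the litmus test `rounds` times; returns how often (x, y) == (0, 0) was seen.
    static int countReorderings(int rounds) {
        int hits = 0;
        for (int i = 0; i < rounds; i++) {
            a = 0; b = 0; x = 0; y = 0;
            Thread t1 = new Thread(() -> { a = 1; x = b; });
            Thread t2 = new Thread(() -> { b = 1; y = a; });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            // Without reordering, at least one thread must see the other's write.
            if (x == 0 && y == 0) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        int hits = countReorderings(100_000);
        System.out.println("(0, 0) observed " + hits + " times");
    }
}
```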

Problems that may arise from out-of-order execution

Classic example: why does a DCL (Double-Checked Locking) singleton need volatile?

```java
public class Singleton {
    // volatile forbids instruction reordering around the assignment below
    private static volatile Singleton singleton;

    // The constructor is private to prevent instantiation from outside
    private Singleton() {
    }

    public static Singleton getInstance() {
        // First check
        if (singleton == null) {
            synchronized (Singleton.class) {
                // Second check: the origin of the name "double check"
                if (singleton == null) {
                    singleton = new Singleton();
                }
            }
        }
        return singleton;
    }
}
```

Creating an object with new is not atomic:
the new instruction only allocates a block of memory and pushes its address (i.e. the reference) onto the top of the operand stack. Only after invokespecial executes is m assigned the value 8.
dup (duplicate) copies the reference to the newly allocated space and pushes the copy onto the top of the stack. This is necessary because invokespecial locates the constructor through constant pool entry #3 and needs to know which object it is constructing, so it consumes one reference: the constructor runs on that reference, which is then popped off the stack.
astore_1: pops the value now on top of the stack into local variable slot 1 (slot 0 holds this in an instance method, or args in a static method such as main).
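The bytecode sequence above can be reproduced with a tiny class and `javap -c`. A sketch: the class name `ObjectCreation` and the field `m = 8` are illustrative, and constant pool indices such as `#2`/`#3` depend on the compiled class, so treat the listing in the comments as approximate.

```java
// Compile, then inspect with: javap -c ObjectCreation
public class ObjectCreation {
    int m = 8;

    public static void main(String[] args) {
        ObjectCreation obj = new ObjectCreation();
        // javap -c shows roughly (indices may differ):
        //   new           #2  // allocate the object; fields are zeroed, so m == 0 for now
        //   dup               // duplicate the reference on the operand stack
        //   invokespecial #3  // Method "<init>":()V consumes one reference, runs the constructor: m = 8
        //   astore_1          // pop the remaining reference into local slot 1 (slot 0 is args)
        System.out.println(obj.m); // prints 8
    }
}
```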

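Therefore, instructions can be reordered while new is only half finished: if astore (publishing the reference to the variable) is reordered ahead of invokespecial (running the constructor), another thread performing the first check sees a non-null reference to an object that is not yet initialized (m is still 0). That is why the DCL (Double Check Lock) singleton field must be volatile.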

How is out-of-order execution prohibited at the CPU level?

Memory barriers
A barrier is placed before and after operations on a region of memory; operations on either side of the barrier cannot be reordered across it.
Intel CPU low-level implementation: primitive assembly instructions (mfence, lfence, sfence) or a bus lock.

sfence, write barrier, the write operation before the sfence instruction must be completed before the write operation after the sfence instruction.
lfence, read barrier: read operations before the lfence instruction must complete before read operations after the lfence instruction.
mfence, read and write barrier, read and write operations before mfence execution must be completed before read and write operations after mfence instruction.
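Java code does not emit `sfence`/`lfence`/`mfence` directly; since Java 9 it can request ordering through `java.lang.invoke.VarHandle` access modes, and the JIT chooses the machine instructions for the platform. A sketch with hypothetical names: the release store to `ready` keeps the earlier write to `data` from moving below it, and the acquire load keeps the later read of `data` from moving above it. This is an analogy to the barriers above, not a one-to-one mapping onto Intel instructions.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class ReleaseAcquireDemo {
    private int data;       // plain payload field
    private boolean ready;  // flag, accessed only through READY

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(ReleaseAcquireDemo.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer: the plain store to data cannot move below the release store.
    void publish(int value) {
        data = value;
        READY.setRelease(this, true);
    }

    // Reader: the load of data cannot move above the acquire load.
    int awaitAndRead() {
        while (!(boolean) READY.getAcquire(this)) {
            Thread.onSpinWait();
        }
        return data;
    }

    public static void main(String[] args) throws InterruptedException {
        ReleaseAcquireDemo demo = new ReleaseAcquireDemo();
        Thread reader = new Thread(() -> System.out.println(demo.awaitAndRead())); // prints 42
        reader.start();
        demo.publish(42);
        reader.join();
    }
}
```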

The Intel lock instruction is a full barrier: it locks the memory subsystem while it executes to guarantee the order of execution, even across multiple CPUs.

How the JVM specification prohibits reordering

JSR memory barriers: the following is how the JVM specification implements volatile with these barriers, but in HotSpot it is implemented through lock addl $0x0,(%rsp).

StoreStoreBarrier   // write-write barrier
volatile write operation
StoreLoadBarrier    // write-read barrier

LoadLoadBarrier     // read-read barrier
volatile read operation
LoadStoreBarrier    // read-write barrier
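Where those barriers conceptually sit in ordinary Java code: the bracketed comments mark the barrier positions from the scheme above around one volatile write and one volatile read. They are annotations only; the JVM inserts (or omits) the actual instructions, and on HotSpot/x86 the StoreLoad barrier is the lock addl mentioned above. Names are hypothetical.

```java
public class VolatileBarrierDemo {
    private int payload;                // plain field
    private volatile boolean published; // volatile: the JVM orders accesses around it

    void write(int value) {
        payload = value;
        // [StoreStoreBarrier]: payload is stored before the volatile write
        published = true;
        // [StoreLoadBarrier]: e.g. a lock-prefixed instruction on HotSpot/x86
    }

    int readOrDefault(int fallback) {
        // [LoadLoadBarrier]
        boolean ok = published;
        // [LoadStoreBarrier]: later accesses stay after the volatile read
        return ok ? payload : fallback;
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileBarrierDemo demo = new VolatileBarrierDemo();
        Thread writer = new Thread(() -> demo.write(42));
        writer.start();
        writer.join();
        System.out.println(demo.readOrDefault(-1)); // prints 42
    }
}
```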

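The 8 happens-before rules:
1. Program order rule: within one thread, each action happens-before every action that comes later in program order.
2. Monitor lock rule: an unlock of a monitor happens-before every subsequent lock of the same monitor.
3. Volatile variable rule: a write to a volatile field happens-before every subsequent read of that field.
4. Thread start rule: a call to Thread.start() happens-before every action in the started thread.
5. Thread termination rule: every action in a thread happens-before another thread detects that it has terminated (for example, Thread.join() returns).
6. Thread interruption rule: a call to interrupt() on a thread happens-before the interrupted thread detects the interrupt.
7. Finalizer rule: the end of an object's constructor happens-before the start of its finalizer.
8. Transitivity: if A happens-before B and B happens-before C, then A happens-before C.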

as-if-serial: regardless of how the hardware reorders instructions, the result of single-threaded execution stays the same, so execution looks sequential.
For example, in a single thread,
x=1
y=2
may be changed to
y=2
x=1
The result of single-threaded execution is unchanged, so it still looks as if it executed in order.


Source: blog.csdn.net/qq_33873431/article/details/109018238