"Operating Systems in Practice: 45 Lectures" 07: Cache and memory: where are programs placed? (study notes)

These are only my study notes for "Operating Systems in Practice: 45 Lectures". Original course: Geek Time, "Operating Systems in Practice: 45 Lectures" by Peng Dong (online handle LMOS).

The principle of program locality

The principle of program locality says that a program exhibits locality during execution: over a period of time, execution is confined to a certain part of the program, and correspondingly the storage it accesses is confined to a certain region of memory. Locality takes two forms: temporal locality and spatial locality. Temporal locality means that if an instruction is executed once, it is likely to be executed again soon, and if some data is accessed, it is likely to be accessed again soon. Spatial locality means that once a program accesses a storage unit, nearby storage units are likely to be accessed shortly thereafter.

--Baidu Encyclopedia

RAM

The locality principle above means that once a program is loaded into memory, its addresses are fixed, and most of the time the CPU accesses the same or adjacent addresses. A program must be loaded into memory before it can run.

Memory, also called internal memory or main memory, is an important part of the computer. It temporarily holds the CPU's working data and the data exchanged with external storage such as hard disks, and acts as the bridge between external storage and the CPU. All programs in the computer run in memory, so memory performance strongly affects the performance of the computer as a whole. Whenever the computer is running, the operating system moves the data to be processed from memory to the CPU for computation, and when the computation finishes, the CPU transmits the result back out.
How well the memory works determines the overall running speed of the computer.
A memory module is made up of memory chips, a circuit board, gold-finger contacts, and other parts.

--Baidu Encyclopedia

Professionally speaking, memory should be called DRAM, i.e. dynamic random access memory. A storage cell in a memory chip is built from a capacitor and related components; the amount of charge stored in the capacitor represents the digital 0 or 1.

But the capacitor leaks charge over time, so DRAM must be refreshed periodically. DRAM has a simple structure and high density, and is usually used to make the memory chips on memory modules.

Memory technology standards keep being updated, but the mechanism of charging and discharging a capacitor has stayed the same; what has improved is the bit width, the operating frequency, and the number of bits transferred and prefetched per operation. The internal structure has not changed in any essential way.

For example, DDR SDRAM, i.e. double data rate synchronous dynamic random access memory, transfers data twice per clock cycle, once on the rising edge and once on the falling edge, hence the name double data rate SDRAM.

The later DDR2, DDR3, and DDR4 further raised the core frequency and the number of prefetched bits. DDR4 drops the dual-channel mechanism, so one memory module is one channel; its operating frequency can reach 4266 MHz, and the data bandwidth of a single DDR4 module is up to 34 GB/s.

CPU to memory performance bottleneck

The CPU and memory use different process technologies and materials, with different design priorities and prices; CPU technology and materials are far ahead of memory's. So although DDR4 memory bandwidth reaches 34 GB/s, it is still orders of magnitude slower than the CPU's data throughput. On top of that, multiple CPU cores access memory at the same time, so the effective memory throughput drops even further.

The CPU processes data quickly and memory cannot keep up, so memory becomes the key limit on overall system performance. Since raising memory performance directly to the CPU's level is not feasible, the Cache appeared.

Cache

Cache originally refers to a memory that can be accessed faster than ordinary random access memory (RAM). It usually does not use the DRAM technology of system main memory, but the more expensive and faster SRAM technology. The cache is one of the important factors behind the high performance of all modern computer systems.

--Baidu Encyclopedia

How the Cache works:

The working principle of the cache is that when the CPU wants to read a piece of data, it first looks in the CPU cache; if the data is found, it is read immediately and sent to the CPU for processing. If not, the data is read from the comparatively slow memory and sent to the CPU, and at the same time the block containing that data is brought into the cache, so that future reads of the whole block can come from the cache without touching memory. This read mechanism gives the CPU a very high cache hit rate (around 90% for most CPUs): about 90% of the data the CPU reads next is already in the CPU cache, and only about 10% has to be read from memory. This greatly reduces the time the CPU spends reading memory directly and means the CPU rarely has to wait for data. In general, the CPU reads from the cache first and from memory second.

--Baidu Encyclopedia

Problems with Cache

Although the Cache improves the performance of CPU memory reads, it also brings a new problem to software and hardware development: data consistency.

The following is the Cache structure of the x86 CPU drawn by the original author:

Image source: original text of the course

[Figure: a dual-core x86 CPU with three levels of Cache]

This is a simple dual-core CPU with three levels of Cache. The level-1 Cache is split into separate instruction and data caches, each CPU core has its own level-2 Cache, and the level-3 Cache is shared by all CPU cores.

The consistency problem of the Cache mainly involves the following three aspects:

  1. Coherence between a single CPU core's instruction Cache and data Cache.
  2. Coherence between the level-2 Caches of different CPU cores.
  3. Consistency between the CPU's level-3 Cache and device memory, such as DMA buffers, network card frame stores, and video memory.

To solve these problems, hardware engineers have developed a variety of protocols; the typical multi-core Cache data synchronization protocols are MESI and MOESI. MOESI and MESI are similar, with only minor differences, so below is a brief introduction to the MESI protocol.

Cache's MESI protocol

The MESI protocol is an invalidate-based cache coherence protocol and one of the most common protocols supporting write-back caches. It is also known as the Illinois protocol (after its development at the University of Illinois at Urbana-Champaign). A write-back cache saves a lot of the bandwidth that a write-through cache would waste. There is always a dirty state in a write-back cache, indicating that the data in the cache differs from the data in main memory. If the block resides in another cache, the Illinois protocol requires a cache-to-cache transfer on a miss. Relative to the MSI protocol, this reduces the number of main memory transactions, which is a significant performance improvement.

--Baidu Encyclopedia

  1. Modified (M)
    The cache line exists only in the current cache and is dirty: it has been modified from the value in main memory. The cache must write the data back to main memory at some point before permitting any other read of the (no longer valid) main memory state. The write-back changes the line to the Shared (S) state.

  2. Exclusive (E)
    The cache line exists only in the current cache but is clean: it matches main memory. It can change to the Shared state at any time in response to a read request, or to the Modified state when written.

  3. Shared (S)
    The cache line may also be stored in other caches of the machine and is clean: it matches main memory. The line can be discarded (changed to the Invalid state) at any time.

  4. Invalid (I)
    The cache line is invalid (unused).


Origin blog.csdn.net/weixin_43772810/article/details/124244958