[Concurrency topic] Operating system model and three-level cache architecture

Course content

1. Detailed Explanation of the Von Neumann Computer Model

The modern computer is based on the von Neumann computer model.
When a computer runs, it first fetches the first instruction from memory. The controller decodes it and, as the instruction requires, fetches data from memory, performs the specified arithmetic or logical operation, and writes the result back to memory at the given address. It then fetches the second instruction and carries it out under the controller's direction, and so on until a halt instruction is encountered.
Programs are stored in the same way as data. Instructions are fetched one by one in program order, and the operations they specify are carried out automatically. This is the most basic working model of a computer. The principle was first proposed in 1945 by the Hungarian-American mathematician John von Neumann, which is why it is called the von Neumann computer model.
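
The fetch, decode, execute cycle described above can be illustrated with a toy machine. This is only a minimal sketch: the four opcodes (LOAD, ADD, STORE, HALT), the single accumulator register, and the function name `run` are assumptions made for illustration and do not correspond to any real instruction set. What it does show is the key idea that instructions and data sit in the same memory.

```python
# Toy stored-program machine: instructions and data share one memory list.
# The opcodes (LOAD, ADD, STORE, HALT) are hypothetical, chosen for illustration.

def run(memory):
    """Fetch, decode, and execute instructions until HALT is reached."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator: the single working register
    while True:
        opcode, operand = memory[pc]   # fetch the instruction at pc
        pc += 1                        # advance to the following instruction
        if opcode == "LOAD":           # decode, then execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Addresses 0-3 hold the program; addresses 4-6 hold data.
program = [
    ("LOAD", 4),    # acc = memory[4]
    ("ADD", 5),     # acc += memory[5]
    ("STORE", 6),   # memory[6] = acc
    ("HALT", 0),
    2, 3, 0,
]
result = run(program)
print(result[6])  # 2 + 3 is stored at address 6
```

Because the program is just a list of memory cells, nothing stops a STORE from overwriting an instruction. That is a direct consequence of storing programs the same way as data.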

1. The five core components of a computer

  1. Controller (Control): The central nervous system of the entire computer. It interprets the control information specified by the program and acts on it: it schedules programs, data, and addresses, and coordinates the work of the other parts of the computer, including access to memory and peripherals.
  2. Arithmetic unit (Datapath): Performs arithmetic and logical operations on data; in other words, it processes the data.
  3. Memory (Memory): Stores programs, data, and various signals and commands, and supplies this information when it is needed.
  4. Input (Input system): The input device is an important part of the computer. Input and output devices together are called external devices, or peripherals for short. The input device transfers programs, raw data, text, characters, control commands, data collected in the field, and similar information into the computer. Common input devices include keyboards, mice, photoelectric input machines, tape drives, disk drives, and CD drives.
  5. Output (Output system): Like the input device, the output device is an important part of the computer. It sends the computer's intermediate or final results, the data, symbols, and text held in the machine, and various control signals to the outside world. Common output devices for microcomputers include display terminals, CRTs, printers, laser printers, plotters, tape drives, and CD-ROM drives.

[Figure: Von Neumann computer model diagram]
That figure is an abstract theoretical model. Its concrete realization is the hardware structure of a modern computer:
[Figure: hardware structure of a modern computer]
This hardware structure has many components, but only two are at the core: the CPU and memory. The rest of this article focuses on these two.

2. CPU internal structure

The internal structure of the CPU includes a control unit, an arithmetic unit, and a storage unit. Their functions are as follows:

  • Control unit: The command and control center of the entire CPU. It consists of the instruction register IR (Instruction Register), the instruction decoder ID (Instruction Decoder), and the operation controller OC (Operation Controller), and it is essential for coordinating the orderly work of the whole computer. Following the user's pre-written program, it fetches each instruction from memory in turn, places it in the instruction register IR, and decodes (analyzes) it to determine which operation to perform. The operation controller OC then sends micro-operation control signals to the relevant components according to the determined timing. The operation controller OC mainly comprises control logic such as the beat pulse generator, control matrix, clock pulse generator, reset circuit, and start-stop circuit;
  • Arithmetic unit: The core of the datapath. It performs arithmetic operations (basic operations such as addition, subtraction, multiplication, and division, plus operations derived from them) and logical operations (shifts, logical tests, and comparisons of two values). Unlike the control unit, the arithmetic unit acts only under the control unit's command: every operation it performs is directed by control signals from the control unit, so it is an execution component;
  • Storage unit: Comprises the CPU's on-chip cache (Cache) and the register set, and is where the CPU holds data temporarily: data waiting to be processed and data that has already been processed. Accessing a register takes the CPU less time than accessing memory. Registers are internal to the CPU and have very high read and write speeds, so transfers between registers are very fast. Using registers reduces the number of times the CPU has to access memory, which speeds up the CPU. The register set is divided into special-purpose and general-purpose registers. Each special-purpose register has a fixed role and holds a specific kind of data, while general-purpose registers serve many purposes and can be assigned by the programmer.

The internal structure diagram of the CPU is as follows:
[Figure: internal structure of the CPU]

3. CPU cache structure

To improve execution efficiency and reduce interaction between the CPU and memory (which slows the CPU down), modern CPUs generally integrate a multi-level cache on the chip. A common three-level cache structure is:

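  • L1 Cache: split into a data cache and an instruction cache; private to each physical core (on CPUs with hyper-threading, the logical cores of one physical core share it)
  • L2 Cache: private to each physical core, shared by its logical cores
  • L3 Cache: shared by all physical cores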
[Figure: three-level cache structure]
Storage capacity: memory > L3 > L2 > L1 > registers;
Access speed: registers > L1 > L2 > L3 > memory;
It is also worth noting that a cache is made up of cache lines (cacheline), the smallest unit of cache storage, and a cache line is usually 64 bytes.
What does a cache line mean in practice?
For example, if your L1 cache is 512 KB and a cache line is 64 bytes, then L1 holds 512 * 1024 / 64 = 8192 cache lines (much like MySQL's page structure).
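
The cache line arithmetic can be checked with a few lines of Python. This is a small sketch, not a model of any particular CPU: the function names `cache_line_count` and `block_and_offset` are made up for illustration, and the 64-byte default is simply the usual line size mentioned in the text.

```python
def cache_line_count(cache_size_bytes, line_size_bytes=64):
    """Number of cache lines that fit in a cache of the given size."""
    return cache_size_bytes // line_size_bytes


def block_and_offset(address, line_size_bytes=64):
    """Split a byte address into (line-sized memory block number, offset inside it)."""
    return address // line_size_bytes, address % line_size_bytes


print(cache_line_count(512 * 1024))  # the 512 KB example from the text
print(block_and_offset(130))         # byte 130 lies in block 2, at offset 2
```

The second function shows why neighboring addresses travel together: every address from 128 to 191 maps to the same block, so loading any one of them brings the whole 64-byte line into the cache.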

4. CPU reads memory data process

Generally speaking, the process of CPU reading memory data is as follows:

  1. To get the value of register X, the CPU needs only one step: read it directly;
  2. To fetch a value from the L1 cache, the CPU needs 1-3 steps (or more): lock the cache line, fetch the data, and unlock it. If the line cannot be locked right away, the read is slower;
  3. To fetch a value from the L2 cache, the CPU first looks in L1. When the value is not in L1, it goes to L2 and locks the line there. Once locked, the data is copied from L2 to L1, the L1 read described in step 2 is performed, and then the L2 line is unlocked;
  4. Fetching from the L3 cache works the same way, except that the data is copied from L3 to L2, then from L2 to L1, and finally from L1 to the CPU;
  5. Fetching from memory is the most complicated: the CPU notifies the memory controller to claim bus bandwidth, notifies memory to lock, issues a memory read request, and waits for the response. The returned data is saved to L3 (or to L2 if there is no L3), then copied from L3/L2 to L1, and from L1 to the CPU, after which the bus is unlocked.
This is quite complicated, so why is it designed this way?
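
The lookup order in the steps above can be sketched as a small simulation. It is deliberately simplified: the class name `CacheHierarchy` and the cost numbers are hypothetical (arbitrary "steps", not measurements of real hardware), each level is an unbounded dictionary with no eviction, and locking is left out entirely. It only shows that a miss falls through to the next level and that the value is copied back up on the way to the CPU.

```python
# Hypothetical access costs in arbitrary "steps"; not measurements of real hardware.
COSTS = {"L1": 3, "L2": 10, "L3": 40, "memory": 200}


class CacheHierarchy:
    def __init__(self, memory):
        self.memory = memory                        # address -> value
        self.levels = {"L1": {}, "L2": {}, "L3": {}}

    def read(self, address):
        """Return (value, level that served the read, total cost in steps)."""
        cost = 0
        missed = []
        for name in ("L1", "L2", "L3"):
            cost += COSTS[name]                     # pay for probing this level
            if address in self.levels[name]:
                value = self.levels[name][address]
                source = name
                break
            missed.append(name)
        else:
            # No cache level held the address: go all the way to memory.
            cost += COSTS["memory"]
            value = self.memory[address]
            source = "memory"
        # Copy the value into every level that missed, so the next read hits in L1.
        for name in missed:
            self.levels[name][address] = value
        return value, source, cost


hierarchy = CacheHierarchy({0x10: 42})
print(hierarchy.read(0x10))  # first read falls through to memory
print(hierarchy.read(0x10))  # second read is served from L1
```

Under these assumed costs, the first read pays for probing all three levels plus memory, while the repeated read pays only the L1 cost.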

5. Why does the CPU have a cache

The main reason is that CPU performance has grown at the pace described by Moore's Law, roughly doubling every 18 months, while memory and hard disks have developed far more slowly. As a result, high-performance memory and hard disks are extremely expensive. Yet high-speed computation requires data to be delivered at high speed. To solve this, CPU manufacturers built a small amount of high-speed cache into the CPU to bridge the mismatch between I/O speed and CPU speed.
When the CPU accesses storage, whether for data or for instructions, its accesses tend to cluster in a contiguous region (much like MySQL's page structure). This is called the principle of locality, and it comes in two forms:

  • Temporal locality (Temporal Locality): If an item is being accessed now, it is likely to be accessed again in the near future (this is what the CPU assumes). Examples: loops, recursion, and repeated method calls;
  • Spatial locality (Spatial Locality): If a memory location is referenced, nearby locations are likely to be referenced soon as well (again, this is what the CPU assumes). Examples: sequentially executed code, two objects created one after the other, and arrays.
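
The effect of spatial locality can be made concrete by counting cache misses for two traversal orders of the same array. This is a rough sketch under stated assumptions: the cache is modeled as fully associative with LRU replacement, elements are assumed to be 8 bytes, the array is assumed to be stored row by row, and the names `count_misses` and `traversal` are invented for illustration. Real caches are set-associative and use hardware prefetching, so actual numbers will differ.

```python
from collections import OrderedDict

LINE_SIZE = 64      # bytes per cache line, as in the text
ELEMENT_SIZE = 8    # assumed bytes per array element


def count_misses(addresses, num_lines):
    """Count misses in a fully associative LRU cache holding num_lines lines."""
    cache = OrderedDict()   # keys are line numbers, least recently used first
    misses = 0
    for address in addresses:
        line = address // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)         # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)   # evict the least recently used line
    return misses


def traversal(rows, cols, row_major):
    """Yield byte addresses of a rows x cols array that is stored row by row."""
    if row_major:
        order = ((r, c) for r in range(rows) for c in range(cols))
    else:
        order = ((r, c) for c in range(cols) for r in range(rows))
    for r, c in order:
        yield (r * cols + c) * ELEMENT_SIZE


row_misses = count_misses(traversal(64, 64, True), num_lines=16)
col_misses = count_misses(traversal(64, 64, False), num_lines=16)
print(row_misses, col_misses)
```

Walking the array row by row touches each 64-byte line once and then uses all eight elements in it, so it misses once per line. Walking it column by column jumps a full row ahead on every access, and with only 16 lines in this model each line is evicted before it is reused, so every access misses.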

Summary

  1. Learned the basic computer system model
  2. Learned the CPU's three-level cache structure

Origin blog.csdn.net/qq_32681589/article/details/132011036