11 The Memory Hierarchy

The memory system stores instructions and data for the CPU. In the simplest model, the memory system is a linear array of bytes, and the CPU can access each memory location in constant time.

memory system

  • A memory system is a hierarchy of storage devices with varying capacities, costs, and access times.
  • CPU (Central Processing Unit) registers hold the most frequently used data.
  • A small, fast cache memory close to the CPU serves as a buffer area for data and instructions stored in relatively slow main memory.
  • Main memory caches data stored on larger, slower disks.
  • Disks often serve as buffer areas for data stored on the disks or tapes of other machines connected by a network.

A memory hierarchy organizes memory units of varying speeds and capacities into a series of levels, each with its own capacity, speed, and cost characteristics. From bottom to top, the memory hierarchy usually includes the following levels:
(1) Slow but large-capacity auxiliary memory (such as hard disks, optical disks, and flash memory): large capacity and low cost, but slow access.
(2) Faster but smaller main memory (such as DRAM).
(3) Still faster but even smaller cache memory (such as the CPU caches).
Each level of the memory hierarchy stores a copy of data and instructions from the level below it so that they can be accessed faster. Through this hierarchy, a computer system can combine the benefits of different memory technologies while minimizing cost and latency.

Hard disk
The hard disk is an important auxiliary storage device in the computer. Its main function is to store the operating system, applications, data files, and other files of the computer system. Compared with main memory (RAM), the hard disk has a larger storage capacity and longer data retention: it can hold a large amount of data and keep it for a long time, and the data is not lost even when the computer is turned off.

The storage medium inside a hard drive is usually one or more platters, each covered with a layer of magnetic material used to store binary data. Hard drives read and write data through mechanical arms and read-write heads. These components can be moved and precisely positioned on the platter to access and manipulate data in different locations.

In addition to being the primary storage device for a computer system, hard drives can also be used to back up important data in case of data loss or corruption. In addition, hard drives can also be used to expand the storage capacity of computer systems, allowing users to store more data.

The bus determines the upper limit of the hard disk's transmission speed.

Main memory
Main memory is the memory that directly exchanges data with the CPU.

Connection between main memory and CPU:
The process of CPU accessing memory through data bus, address bus, and control bus can be divided into the following steps:

(1) Instruction addressing: The CPU uses the program counter (also called the instruction pointer) to determine the address of the next instruction to execute.

(2) Address transmission: The CPU places the address from the instruction on the address bus, telling the memory which storage unit to read or write.

(3) Memory response: After the memory receives the address signal, it retrieves or writes the corresponding data and returns it to the CPU.

(4) Data transmission: The data being read from or written to memory is transferred between the CPU and the memory over the data bus.

(5) Data processing: The CPU processes the data read from the memory and stores the results back to the memory.

In this process, the CPU and the memory communicate through the address bus, data bus and control bus. The address bus transmits address signals, the data bus transmits data, and the control bus transmits control signals. The communication speed and bandwidth between the CPU and memory are affected by many factors, including memory type, capacity, clock frequency, and bus bandwidth. Therefore, the performance of a computer system depends largely on the performance of the memory.
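
As a rough illustration of the read path described above, the sketch below models the address, data, and control buses as plain C variables. All names (mem, addr_bus, cpu_load, and so on) are invented for this example and do not correspond to any real hardware interface.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model of one memory-read bus transaction; everything here is illustrative. */
static uint8_t mem[256];     /* "main memory"                       */
static uint8_t addr_bus;     /* address bus (only 8 bits wide here) */
static uint8_t data_bus;     /* data bus                            */
static int     ctrl_read;    /* control bus: 1 = read request       */

static uint8_t cpu_load(uint8_t addr) {
    addr_bus  = addr;           /* (2) CPU drives the address bus        */
    ctrl_read = 1;              /* (2) control bus signals a read        */
    data_bus  = mem[addr_bus];  /* (3) memory responds with the data     */
    return data_bus;            /* (4) CPU latches the value off the bus */
}

int main(void) {
    mem[0x10] = 42;
    printf("loaded %d from address 0x10\n", cpu_load(0x10));  /* (5) CPU processes the data */
    return 0;
}
```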

Registers
Registers are divided into CPU-internal registers and CPU-external registers.
CPU-internal registers: used to temporarily hold the operands and results involved in computations.
CPU-external registers: registers located in peripherals; the CPU communicates with them to control the operation of the peripherals.

In a computer, a register is a very fast internal memory used to store data that the CPU (Central Processing Unit) needs to access frequently. Registers are usually implemented as hardware circuits inside the CPU that can store and read data quickly.
Unlike other storage (such as RAM or hard disks), registers have very limited capacity, usually only a few bytes each. Therefore, registers are used only to hold data the CPU needs most urgently, such as intermediate results during calculations, and return addresses and function arguments during function calls.
In computers, different types of registers are used for different purposes. For example, general-purpose registers are used to store integer data, floating-point registers are used to store floating-point data, status registers are used to store information about the state of the CPU (for example, whether a certain error or interrupt has occurred), and so on. The number and type of registers vary depending on the CPU architecture, but they play a vital role in the operation of the CPU.

working principle

Registers are very fast storage devices in computers. They can be integrated directly inside the CPU chip or connected to the CPU as a separate chip. A register works like other storage devices in that it stores binary data in circuit elements, typically flip-flops built from transistors.
The working principle of the register includes the following aspects:
(1) Register storage cells: A register consists of multiple storage cells, each holding a fixed amount of data. For example, in a 32-bit CPU a general-purpose register consists of 32 storage cells, each storing one binary bit, so the register as a whole holds a 32-bit value.
(2) Register read and write operations: The CPU can write data to the register and read data from the register. When the CPU needs to write data to a register, it sends the binary representation of the data to the input end of the register, and then controls the circuit to cause the data to be stored in the designated storage unit. When the CPU needs to read data from a register, it will read the data from the specified storage unit in the register and send it to other circuits inside the CPU for processing.
(3) Register clock control: In order to synchronize the data transmission between the CPU and the register, a clock signal is usually required to control the read and write operations of the register. The clock signal indicates the operation timing between the CPU and the register, so that the reading and writing of data can be performed at the correct time point.
(4) Register read/write speed: Registers are the fastest storage devices in a computer; they can be read and written within a single CPU clock cycle, i.e., billions of times per second on modern processors. This makes registers suitable for data that must be read and written constantly, such as the program counter, stack pointer, and function arguments (a toy software model of a register follows this list).
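
The clock-gated behaviour in points (2)-(4) can be mimicked in ordinary C: the value presented at the input is captured only when write-enable is asserted. This is a toy software model for illustration, not hardware description code; the names reg32_t and reg_clock_tick are made up.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model of a 32-bit register: 32 one-bit cells written together on a clock tick. */
typedef struct {
    uint32_t bits;   /* the 32 storage cells, one bit each */
} reg32_t;

/* Data is captured only when write_enable is set, emulating clocked storage. */
static void reg_clock_tick(reg32_t *r, uint32_t input, int write_enable) {
    if (write_enable)
        r->bits = input;      /* capture the input on this clock edge */
}

static uint32_t reg_read(const reg32_t *r) {
    return r->bits;           /* reads need no clock in this model */
}

int main(void) {
    reg32_t r = {0};
    reg_clock_tick(&r, 0xDEADBEEF, 1);                            /* CPU writes the register */
    printf("register holds 0x%08X\n", (unsigned)reg_read(&r));    /* CPU reads it back       */
    return 0;
}
```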

The role of registers

Registers play a very important role in computers, and their main functions are as follows:
(1) Storing operation results and intermediate values: When the arithmetic logic unit (ALU) performs operations such as addition, subtraction, multiplication, and division, it uses registers to hold the operands, intermediate values, and results. Because these registers can be read and written quickly, they increase the speed of computation.
(2) Store program counter: The program counter (Program Counter) is a register that points to the address of the instruction currently being executed. When the computer executes a program, the program counter is constantly updated to point to the address of the next instruction to be executed, thereby achieving sequential execution of the program.
(3) Storing function parameters and return values: When a function is called, its parameters and return value are often passed in registers. Because registers can be accessed quickly, this improves the efficiency of function calls.
(4) Storage stack pointer: The stack pointer (Stack Pointer) is a register pointing to the top of the stack and is used to store local variables and other temporary data when functions are called. When the function call ends, the stack pointer points to the next free location at the top of the stack.
(5) Store status information: The status register is used to store information about the CPU status, such as carry flag, zero flag, overflow flag, interrupt flag, etc. These flags can indicate whether the CPU needs to perform certain operations.
In short, registers are one of the fastest and most commonly used storage devices in computers. They can store some key data so that the CPU can perform operations and process data efficiently.

What is cache?

Cache is a type of computer memory used to store data and instructions that are frequently accessed during the execution of computer programs. It is usually located between the computer's main memory and the central processing unit (CPU) to allow faster access to required data, making the computer run faster.
The cache is divided into multiple levels. The first-level cache (L1 cache) is located inside the CPU core and is the fastest, but has a small capacity, typically tens of KB. The second-level cache (L2 cache) sits between the L1 cache and main memory; it is slightly slower but larger, typically hundreds of KB to a few MB. Many systems also have a third-level cache (L3 cache) and higher-level caches, which are larger still (tens of MB) but progressively slower.
A cache improves computer performance through its mapping scheme and data replacement policy. Common mapping schemes include direct-mapped, set-associative, and fully associative; common replacement policies include least recently used (LRU) and first-in-first-out (FIFO). The choice of these schemes and policies affects cache efficiency and performance.

Why do these algorithms affect cache efficiency and performance?

The choice of mapping scheme and data replacement policy directly affects the efficiency and performance of the cache, because they determine how the cache stores and replaces data.
The mapping scheme affects the cache hit rate, i.e., the probability of finding the required data in the cache; the higher the hit rate, the more effective the cache. Direct mapping is the simplest scheme, but its hit rate is lower because multiple memory blocks that map to the same index compete for the same cache block, causing conflicts. Set-associative and fully associative schemes reduce conflicts and improve the hit rate, but they also increase complexity and cost.
The data replacement policy affects cache performance. When the cache is full, some data must be replaced to make room. The goal of the replacement policy is to evict the data least likely to be used again, thereby maximizing the cache hit rate. LRU is the most commonly used replacement policy: it evicts the least recently used data. However, LRU must track how recently each block was accessed, which increases the complexity and cost of the cache. FIFO is another replacement policy that evicts the oldest cached data, but it may result in a lower hit rate.
Therefore, choosing a mapping scheme and replacement policy requires weighing efficiency, performance, and cost against the needs of different application scenarios.
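
To make the trade-off concrete, the sketch below replays one short access trace against a 3-slot fully associative cache under LRU and under FIFO and counts the hits. The trace, cache size, and function names are all made up for illustration; on this particular trace LRU hits more often than FIFO.

```c
#include <stdio.h>
#include <string.h>

#define SLOTS 3

/* Replay a block-number trace against a SLOTS-entry cache and count hits.
 * use_lru = 1 -> LRU replacement; use_lru = 0 -> FIFO replacement.        */
static int replay(const int *trace, int n, int use_lru) {
    int cache[SLOTS], age[SLOTS], filled = 0, hits = 0;
    memset(age, 0, sizeof(age));
    for (int tick = 0; tick < n; tick++) {
        int hit = -1;
        for (int j = 0; j < filled; j++)
            if (cache[j] == trace[tick]) { hit = j; break; }
        if (hit >= 0) {                       /* cache hit                     */
            hits++;
            if (use_lru) age[hit] = tick;     /* LRU refreshes the recency     */
        } else if (filled < SLOTS) {          /* cold miss into a free slot    */
            cache[filled] = trace[tick];
            age[filled++] = tick;
        } else {                              /* miss: evict the oldest entry  */
            int victim = 0;                   /* FIFO: oldest insertion time,  */
            for (int j = 1; j < SLOTS; j++)   /* LRU: oldest last-use time     */
                if (age[j] < age[victim]) victim = j;
            cache[victim] = trace[tick];
            age[victim]   = tick;
        }
    }
    return hits;
}

int main(void) {
    int trace[] = {1, 2, 3, 1, 2, 4, 1, 2, 3, 4};
    int n = (int)(sizeof(trace) / sizeof(trace[0]));
    printf("LRU  hits: %d / %d\n", replay(trace, n, 1), n);
    printf("FIFO hits: %d / %d\n", replay(trace, n, 0), n);
    return 0;
}
```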

How registers work:

Registers are one of the fastest storage devices in computers and are usually used to store data, instructions and status information required by the CPU for calculation operations.
The working principle of registers is as follows:
(1) Registers are usually hardware circuits inside the CPU and are located closest to the core components of the CPU for quick access.
(2) The storage unit of the register usually uses a flip-flop circuit, and each flip-flop can store one bit of data (0 or 1).
(3) The CPU can send the register's address and read and write control signals to the register through the address bus and control bus, and control the register to read or write data.
(4) When the CPU needs to store data into the register, the data is sent to the input end of the register and stored in the flip-flop through the control signal.
(5) When the CPU needs to read data from the register, the control signal is sent to the output end of the register, the data is read from the flip-flop, and returned to the CPU through the data bus.
Since the register is a memory inside the CPU, the read and write speed is very fast, and the read and write operations can be completed within the clock cycle of the CPU. This makes the register one of the most important storage devices in the computer, used to optimize the performance of the computer and improve the efficiency and speed of instruction execution.

Detailed explanation of hard disk structure

  • Disk: A disk (Disk) is a storage device, usually composed of a set of magnetic platters, used to store data and programs in a computer system. It usually consists of a controller, spindle motor, magnetic head, read and write circuit, data interface and other components, which can achieve fast read and write operations.
  • Track: A track is a circular track on the surface of the disk. Each track can store a certain amount of data. The surface of a disk is usually divided into concentric circles, and data is stored on the disk in units of tracks. The number and size of tracks determines the disk's storage capacity and performance.
  • Sector: A sector is an arc-shaped area on each track on the disk surface used to store data and metadata. Each sector is identified by a unique sector number, usually an integer. A sector is the smallest unit of disk data storage, usually 512 bytes or 4KB in size. The number and size of sectors determines the disk's storage capacity and performance.

During read and write operations, the magnetic head will be positioned at the specified track and sector location to read or write data. Because the magnetic head needs to move on the surface of the disk, the read and write speed of the disk is slower than that of other storage devices. However, the disk has a large storage capacity and a relatively low cost, and is widely used in computer systems.

  • Calculation of disk capacity: The calculation of disk capacity is determined based on the physical characteristics of the disk and the management method of the file system. Generally speaking, disk capacity can be calculated in two ways:

Physical capacity calculation method:
The physical capacity of a disk refers to the maximum amount of data that can be stored in the disk. The physical capacity of a disk is determined by the number of tracks on the disk's surface, the number of sectors per track, and the number of bytes per sector. The usual calculation formula for disk capacity is:

Disk capacity = number of tracks × number of sectors per track × number of bytes per sector

For example, if a disk has 1000 tracks, each track has 10 sectors, and the number of bytes in each sector is 512, then the capacity of the disk is:

Disk capacity = 1000 × 10 × 512 = 5,120,000 bytes ≈ 5 MB

Logical capacity calculation method:
The logical capacity of a disk is the storage capacity that the file system can actually use. It is affected by the file system's management method, the way the disk is formatted, and how efficiently disk space is utilized. Usually, the logical capacity of a disk can be obtained by viewing the disk properties or by using disk management tools.

It should be noted that because the file system takes up a certain amount of space when storing files on the disk, the actual available capacity of the disk is usually smaller than the logical capacity. In addition, different operating systems and file systems may calculate disk capacity differently.

  • Factors affecting disk capacity:
    (1) Track density: the number of tracks per unit length along the radius of the platter. The greater the track density, the greater the disk capacity.
    (2) Recording density: the number of bits that can be stored per unit length along a track. The greater the recording density, the greater the disk capacity.
    (3) Areal density: the amount of data that can be stored per unit area of the platter surface. The greater the areal density, the greater the disk capacity.
    (4) Disk size: the larger the physical size of the platters, the greater the disk capacity.
    (5) Disk technology: different disk technologies have different capacity limits; as technology develops, disk capacity continues to increase.

  • Disk operations: refers to the process of reading and writing data to disks in a computer system. Disk operations typically involve disk rotation and head and arm movement.

  • Disk rotation: Disk rotation refers to the process of rotating the disk so that the data bits on the disk pass through the head one by one. The speed at which a disk spins is usually expressed in revolutions per minute (RPM). The rotation speed of modern hard disks is usually between 5400RPM and 15000RPM. High-speed disks can read and write data faster, but they also consume more power.

  • Magnetic arm movement: Magnetic arm movement refers to the movement of the magnetic head to different tracks of the disk to read or write data. The magnetic head of the disk is suspended from the magnetic arm. The magnetic arm can move horizontally on the disk so that the magnetic head can access data on different tracks. Modern hard drives usually use an electric head position adjustment system that can quickly locate the specified track position. The speed at which a magnet arm moves is usually expressed in terms of seek time, which is the time it takes to move from one track to an adjacent track. Modern hard drives typically have seek times between 3ms and 20ms.

  • When performing disk operations, the computer system needs to control the rotation of the disk and the movement of the magnetic head through the controller and drive. The speed and efficiency of disk operations have an important impact on the performance of computer systems. Therefore, when performing disk operations, you need to pay attention to optimizing the way of disk access to improve the speed and efficiency of disk access.

  • Disk read operation: A disk read is the process of reading data from the disk. It includes the time for the head to seek to the correct track, the time waiting for the disk to rotate to the correct sector position, and the time to transfer the data off the disk.

  • Access time: refers to the time required from issuing a read request to actually reading the data, which includes seek time, rotation delay time and data transfer time.

  • Seek time: the time for the magnetic arm to move the head to the target track. As noted above, modern hard drives typically have seek times between 3 ms and 20 ms, and the seek time grows as the distance the head must travel increases.

  • Rotational latency (spin time): the time it takes for the platter to rotate until the desired sector passes under the head. The rotation speed of modern hard disks is usually between 5,400 RPM and 15,000 RPM. High-speed disks can read and write data faster, but they also consume more power, tend to run hot under heavy load, and may require additional cooling.

  • Disk access time is an important indicator of disk read and write performance. It is determined by the seek time, rotational latency, and data transfer time (a rough worked example follows after this list). Optimizing the way the disk is accessed can improve disk read and write performance, and thereby the overall performance of the computer system.

  • Logical disk: A logical disk refers to an abstract disk created and used by the operating system. It is built based on storage devices such as physical disks or disk arrays. At the abstract level of a logical disk, the operating system can divide the disk into multiple partitions, and each partition can be independently formatted and installed with different file systems for file storage and management. The operating system can also combine multiple physical disks into a logical disk array to increase disk capacity, performance, and reliability.

  • Spare cylinders: Spare cylinders can improve disk reliability and reduce the risk of data loss. The number of spare cylinders is usually proportional to the disk capacity, and as the capacity of the disk increases, the number of spare cylinders increases accordingly. Spare cylinders may be used multiple times during the life of a disk, so when the number of spare cylinders decreases, the disk becomes less reliable and data needs to be backed up more frequently to avoid data loss.
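
As promised above, here is a rough back-of-the-envelope calculation of the access-time components (seek time, rotational latency, transfer time). The numbers used (a 7,200 RPM drive, 9 ms average seek, 50 MB/s sustained transfer, 4 KB request) are typical assumed values, not measurements of any particular drive.

```c
#include <stdio.h>

int main(void) {
    double seek_ms      = 9.0;     /* average seek time (assumed)       */
    double rpm          = 7200.0;  /* spindle speed (assumed)           */
    double transfer_mbs = 50.0;    /* sustained transfer rate (assumed) */
    double request_kb   = 4.0;     /* size of one read request          */

    /* On average the platter must turn half a revolution before the
     * target sector passes under the head.                             */
    double rotation_ms = 0.5 * (60.0 * 1000.0 / rpm);
    double transfer_ms = request_kb / 1024.0 / transfer_mbs * 1000.0;

    printf("seek %.2f ms + rotation %.2f ms + transfer %.3f ms = %.2f ms\n",
           seek_ms, rotation_ms, transfer_ms,
           seek_ms + rotation_ms + transfer_ms);
    return 0;
}
```

With these assumed numbers the mechanical components (seek plus rotation) dominate; the actual data transfer takes well under a millisecond.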

bus

IO bus

  • What is it?

The IO bus is a data transmission channel that connects the CPU, memory and external devices in a computer system. It is responsible for transmitting data and control signals involved in input/output (I/O) operations. The IO bus is usually a set of physical circuits consisting of multiple signal lines and control lines. These lines are used to transmit data, addresses, control signals, clocks and other information.

  • working principle?

The working principle of the IO bus is to send commands and data for I/O operations from the CPU to external devices, or to transfer response data from external devices back to the CPU. In computer systems, the IO bus usually uses DMA (direct memory access) technology to speed up data transfer. DMA technology allows peripherals to directly access system memory, thereby avoiding CPU intervention and improving data transmission efficiency.

  • effect?

The function of the IO bus is to connect various components in the computer system to realize data transmission between the CPU and external devices. It is an important part of the computer system and can help the computer system achieve various input/output operations, such as printing, reading and writing from the hard disk, etc. The performance and data transmission speed of the IO bus have an important impact on the overall performance and response speed of the computer system. Therefore, when designing the computer system, it is necessary to select the appropriate IO bus type and transmission technology according to the system needs and performance requirements.

bus

  • What is it?

A bus is a shared data transmission channel in a computer system, used to connect components such as the CPU, memory, and I/O devices. It is a collection of lines that carry data, address, and control signals between different components. Buses are usually classified by the width of the data they transmit, such as 8-bit, 16-bit, 32-bit, and 64-bit. The wider the data path, the faster the data transfer, but more lines and hardware support are required.

  • working principle?

The working principle of the Bus is that the sender sends a piece of data, and then the data is transmitted to the receiver through the bus. The bus's controller, such as a chipset or controller chip, controls the bus so that different hardware components can work together and share the bus.

  • What is the function?

The role of the Bus is to enable different hardware components to share data and instructions. Through the Bus, hardware components can transmit their data and instructions to other components to implement various calculations and operations. At the same time, the Bus can also support multiple hardware components to access the same data line at the same time, thereby improving the concurrency performance of the system.

In general, the Bus is an important part of the computer system. It connects various hardware components in the computer system so that they can work together and share data. In the design and optimization of computer systems, choosing the appropriate Bus type and transmission technology can greatly improve the performance and response speed of the system.

The steps that occur when the CPU accesses the disk

  • The CPU sends a disk access request through the bus. The request includes read and write instructions, the accessed disk number, sector number, head number and other information.
  • After the disk controller receives the access request from the CPU, it converts the request into physical operations of the disk, including head seeking, head positioning, disk rotation and other operations. Among them, head seeking and head positioning are the main time costs of disk access.
  • When head seeking and head positioning are complete, the disk controller begins reading or writing data from the disk. When reading data, the disk controller reads the data into the disk cache and transmits it to the CPU through the bus. When writing data, the CPU sends the data to be written to the disk controller, which writes the data to the disk cache and flushes it to the disk if necessary.
  • When the disk controller completes reading or writing data, it sends a completion signal to the CPU and transmits the data to the CPU through the bus. After receiving the data, the CPU can continue to execute the next instruction.

In general, the CPU needs to go through many steps to access the disk, including head seeking, head positioning, data reading and data writing operations. These operations take a certain amount of time, so disk access is often one of the bottlenecks in computer systems. In order to improve disk access performance, a variety of technologies can be used, such as disk arrays, caching technology, data compression, and fast access methods.

Single-core vs. dual-core CPU

  • Single-core and dual-core are both central processing unit (CPU) terms, indicating the number of cores in the processor. Single-core processors have only one core, while dual-core processors have two cores. Each core has its own control logic, cache, execution unit, etc., which can perform multiple tasks at the same time, improving the overall performance and efficiency of the processor.

  • Specifically, a single-core processor can only execute one task and only one instruction at a time. It needs to wait for the completion of the previous instruction before it can execute the next instruction. A dual-core processor can perform two tasks at the same time, and each core can execute instructions independently, thus greatly improving the performance and response speed of the processor.

For example, if you run multiple programs at the same time (such as opening several browser windows while editing a document), the system will feel slower on a single-core processor, because the processor must finish executing the instructions of one program before it can execute the instructions of the next; with a dual-core processor, each core can execute a program independently, so the system feels smoother.
It should be noted that the number of processor cores is not the only indicator of processor performance. It is also affected by factors such as processor architecture, frequency, cache, etc. Different applications and tasks have different processor performance requirements, so multiple factors need to be considered when selecting a processor.

The key to bridging the gap between CPU and memory

Temporal Locality and Spatial Locality are two important concepts in computer science. They are used to describe the patterns and rules when a program accesses memory, which is the key to bridging the gap between CPU and memory.

  • Temporal locality means that after a data item is accessed at a certain point in time, it is likely to be accessed again in the future. For example, when a program traverses an array, it will access the same memory multiple times, so these memory data have temporal locality. This access mode allows the CPU to temporarily store these data in the cache, avoiding frequent reading of data from the memory, thereby improving program execution efficiency.

  • Spatial locality means that when a data item is accessed, its nearby data items are also likely to be accessed. For example, when a program traverses a two-dimensional array, it will access adjacent memory blocks multiple times, so these memory data have spatial locality. This access mode allows the CPU to read nearby data into the cache in advance, reducing the number of times data is read from the memory, thereby improving program execution efficiency.

effect:

Both temporal locality and spatial locality allow the cache in the CPU to be used effectively, improving program execution efficiency. Therefore, when writing a program, you should try to take advantage of temporal and spatial locality to reduce frequent accesses to main memory. In addition, factors such as CPU cache size and cache policy also affect how much benefit temporal and spatial locality provide, and these require reasonable tuning and optimization.
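
A classic way to see spatial locality at work is to sum a two-dimensional array row by row (matching its row-major layout in memory) and then column by column. Both loops do the same arithmetic, but on most machines the row-wise loop runs noticeably faster because nearly all of its accesses hit in cache lines that are already loaded. The array size here is arbitrary.

```c
#include <stdio.h>

#define N 1024
static double a[N][N];   /* stored row-major: a[i][0..N-1] are contiguous in memory */

int main(void) {
    double sum_row = 0.0, sum_col = 0.0;

    /* Good spatial locality: consecutive iterations touch neighbouring bytes,
     * so most accesses hit in the cache line fetched by the first one.        */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum_row += a[i][j];

    /* Poor spatial locality: each access jumps N * sizeof(double) bytes ahead,
     * so nearly every access lands in a different cache line.                  */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum_col += a[i][j];

    printf("%f %f\n", sum_row, sum_col);   /* keep the loops from being optimised away */
    return 0;
}
```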

SRAM and DRAM

  • SRAM (Static Random Access Memory), static random access memory, is a memory that uses flip-flops as storage units. SRAM has fast access speeds, can read and write data quickly, and does not require refreshing when reading. The disadvantage of SRAM is that it is larger and more expensive than DRAM, and it is not as highly integrated as DRAM.
  • DRAM (Dynamic Random Access Memory), dynamic random access memory, is a memory that uses capacitors as storage units. The advantage of DRAM is that it is highly integrated, so it is smaller and less expensive. The disadvantage of DRAM is that it needs to be refreshed to prevent charge leakage in the memory cells. DRAM has slower access speeds, and it takes time to write and read data.
  • Compared:

(1) The storage units of SRAM and DRAM are different. SRAM uses flip-flops as storage units, while DRAM uses capacitors as storage units.
(2) SRAM has a fast access speed and does not need to be refreshed when reading and writing data, while DRAM has a slower access speed and needs to regularly refresh the charge in the storage unit.
(3) SRAM has a relatively high price and low integration level, so it is large in size, while DRAM has a relatively low price and high integration level, so it is small in size.
(4) SRAM and DRAM have different application fields. SRAM is usually used in situations where data needs to be read and written quickly, such as caches, registers, and other memories that need to be accessed quickly; while DRAM is usually used in situations where large amounts of data are stored, such as main memory.
In general, SRAM and DRAM each have their own advantages and disadvantages and are suitable for different applications. In computer systems, the appropriate memory type is usually selected based on the needs to achieve optimal performance and cost-effectiveness.

ROM

What is it?

ROM (Read-Only Memory) is memory whose contents are written in advance and cannot be modified during normal operation. Compared with RAM (Random Access Memory), the main feature of ROM is that the data stored in it can only be read; it cannot be written or modified in ordinary use.

working principle?

ROM is usually used to store data that does not change over time, such as firmware, operating system, boot program, etc. Since these data do not need to be modified frequently, using ROM storage can ensure its stability and reliability. Another feature of ROM is that the data stored in it can still be retained after powering off or restarting the system, because the data is stored through physical or chemical methods rather than through capacitance or resistance.

Classification:

(1) Mask ROM: Mask ROM is a ROM that is programmed during the manufacturing process, and its data cannot be modified. Since masks are required during manufacturing, the manufacturing cost is higher, but the storage density and stability are higher.
(2) PROM (Programmable Read-Only Memory): Programmable ROM can be programmed with special equipment or a programmer after manufacturing, and data can be written into it, but once programmed it cannot be modified again. The cost is lower because it can be programmed in the field after manufacturing, but the data storage density and stability are lower.
(3) EPROM (Erasable Programmable Read-Only Memory): Erasable programmable ROM can erase the data in it through a specific device or method and then reprogram it. The advantage of EPROM is that it can be reused, but it is more inconvenient to use because it requires a special erasing device.
(4) EEPROM (Electrically Erasable Programmable Read-Only Memory): Electrically erasable programmable ROM is similar to EPROM, but can be erased and programmed through electronic signals, so it is more convenient to use.
In general, ROM is a kind of read-only memory, suitable for storing data that is stable and does not need to be modified frequently. Different types of ROM have different characteristics and application ranges, and you need to choose according to your needs when using them.

DMA

What is it?

DMA (Direct Memory Access) transfer means that devices (such as disk controllers, network cards, etc.) directly access memory through the DMA controller without CPU intervention.

working principle?

(1) The peripheral device requests DMA transfer from the DMA controller, including target memory address, transfer length and other information.
(2) The DMA controller sends an interrupt request to the CPU, requesting the CPU to stop controlling the bus so that the DMA controller can access the memory.
(3) The CPU responds to the interrupt request, releases bus control, and hands over DMA transfer control to the DMA controller.
(4) The DMA controller directly accesses the memory through the bus and transfers data to the target memory address.
(5) After the transfer is complete, the DMA controller sends an interrupt request to the CPU to signal completion and returns bus control to the CPU (these steps are sketched in code below).
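
As mentioned in step (5), the sequence can be sketched as a plain-C simulation in which the "DMA controller" is just a function that copies into a memory buffer and the completion interrupt is a callback. Every structure and name here (dma_request, dma_engine_run, and so on) is invented for illustration and does not correspond to any real controller's register layout or driver API.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Purely illustrative "descriptor" that the device hands to the DMA engine. */
struct dma_request {
    const uint8_t *src;   /* device buffer                  */
    uint8_t       *dst;   /* target address in main memory  */
    size_t         len;   /* transfer length in bytes       */
};

static uint8_t main_memory[64];

/* Steps (3)-(4): the DMA engine moves the data without the CPU touching each byte. */
static void dma_engine_run(const struct dma_request *req) {
    memcpy(req->dst, req->src, req->len);
}

/* Step (5): the completion interrupt is modelled as an ordinary callback. */
static void dma_complete_irq(void) {
    printf("DMA done, memory now holds: \"%s\"\n", (const char *)main_memory);
}

int main(void) {
    const uint8_t disk_buffer[] = "sector data";           /* data sitting in the device */
    struct dma_request req = { disk_buffer, main_memory,   /* steps (1)-(2): describe    */
                               sizeof(disk_buffer) };      /* the transfer and hand off  */
    dma_engine_run(&req);
    dma_complete_irq();
    return 0;
}
```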

What is the function?

When transferring data from disk to memory, DMA transfer is usually used. Because the amount of data is large and the processing power of the CPU is limited, if each byte of data requires CPU intervention, the performance of the system will be significantly affected. Using DMA transfer, the control of transferring data can be handed over to the DMA controller, allowing it to directly access the memory, reducing the burden on the CPU and improving data transfer efficiency. At the same time, because the DMA controller can directly access the memory during the data transfer process, it does not need to go through the CPU to complete operations such as address conversion and data copying, and can also improve the efficiency of data transfer.

What is the relationship between DMA, CPU and memory?

Memory refers to the storage device used by computers to store programs and data. It usually refers to main memory, which is the memory chip component located on the computer motherboard. It is connected to the CPU through the address bus, data bus and control bus. The CPU can read or write programs and data by accessing the main memory. However, when accessing external devices (such as disks, network cards, etc.), the CPU needs to communicate with the device through the IO interface. Data transmission needs to go through the IO interface and main memory and therefore cannot completely avoid CPU intervention.

DMA transfer is a special data transfer method that transfers control of data transfer to the DMA controller and allows it to directly access the memory, thus reducing the burden on the CPU and improving data transfer efficiency. However, DMA transfer is not a method for all data transfers. For some operations that require CPU processing (such as encryption and decryption, etc.), CPU intervention is still required. At the same time, DMA transfer also requires CPU initialization and management, as well as processing of interrupt requests after the transfer is completed, etc. Therefore, although DMA transfer can reduce the burden on the CPU, it does not mean that data transfer does not go through the CPU at all.

SSD

What is it?

A solid-state drive (SSD) is a storage device that uses non-volatile memory (such as flash memory) to store data. Compared with traditional mechanical hard drives, its advantages include higher data transfer speed, lower energy consumption, better durability, and smaller size. It is widely used in personal computers, servers, industrial control, and other fields.

working principle?

A solid-state drive reads and writes data through electronic signals. It consists of a controller chip and storage chips: the controller chip is responsible for managing read and write operations, and the storage chips actually hold the data. In an SSD, data is stored in flash memory chips. Each flash chip contains a large number of storage cells, and each cell stores a binary bit (0 or 1). Eight bits make up a byte, and groups of bytes form a sector (usually 512 bytes or 4 KB). The SSD's controller chip drives the storage chips through data signals to perform read and write operations, thereby storing and retrieving data.

effect:

Solid-state drives are used to store and read data. They can replace traditional mechanical hard drives as the main memory of a computer, and can also be used as auxiliary storage.

Compared with traditional mechanical disks

Compared with traditional mechanical hard drives, solid-state drives have faster read and write speeds because they have no moving mechanical parts and are controlled entirely by electronic signals, giving them lower access latency and higher data transfer speeds. Solid-state drives are also more durable: with no mechanical parts they are less affected by vibration and shock, so they offer better reliability and stability.

Mechanical disks are more suitable for scenarios that require large-capacity storage and sequential reading and writing, such as large databases, media storage, and backups. Although the disk capacity is already large, the advantages of solid-state drives are even more obvious when high-speed data transmission is required.

Therefore, in actual applications, spinning disks or solid-state drives can be selected according to different needs and application scenarios.

Flash memory related concepts

Flash Translation Layer (FTL), Block and Page are important concepts in flash memory.

  • The flash memory in an SSD is divided into blocks and pages. A block is the smallest erasable unit in flash memory and usually contains tens to hundreds of pages. A page is a subunit of a block, usually a few KB to tens of KB in size, and is the smallest unit that can be written.
  • The Flash Translation Layer (FTL) is an important component of an SSD. Its role is to translate logical addresses into physical addresses, manage the use of flash blocks, and perform garbage collection and related functions. The FTL implements the logical-to-physical translation by mapping logical blocks to physical blocks, and it records the usage and erase counts of flash blocks to perform wear levelling and extend the drive's lifetime (a minimal sketch of this mapping follows this list).
  • When the operating system needs to write data, the flash translation layer first checks for free blocks and then writes the data to one page of the block. When all pages of the block are filled, the block is marked as full and no new data can be written. When the operating system needs to read data, FTL will map the logical address to the corresponding physical address, and then read the corresponding data from the flash memory.
  • Due to the special characteristics of flash memory, such as limited erasing and writing times and slow erasing and writing speed, FTL will perform block erasure and garbage collection operations to ensure the reliability and performance of flash memory.
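
As mentioned above, a minimal view of the logical-to-physical translation the FTL performs is a lookup table indexed by logical page number. Real FTLs also handle wear levelling, garbage collection, and bad-block management, which this sketch (with made-up sizes and a naive "always write to a fresh page" policy) leaves out.

```c
#include <stdio.h>

#define LOGICAL_PAGES  8     /* sizes are arbitrary example values */
#define PHYSICAL_PAGES 16
#define UNMAPPED       -1

static int l2p[LOGICAL_PAGES];   /* logical page -> physical page            */
static int next_free = 0;        /* naive allocator: always use a fresh page */

static void ftl_init(void) {
    for (int i = 0; i < LOGICAL_PAGES; i++)
        l2p[i] = UNMAPPED;
}

/* Flash pages cannot be overwritten in place: a rewrite goes to a new
 * physical page, and the stale page is left for garbage collection.   */
static void ftl_write(int logical_page) {
    l2p[logical_page] = next_free++ % PHYSICAL_PAGES;
}

static int ftl_read(int logical_page) {
    return l2p[logical_page];    /* physical page to read, or UNMAPPED */
}

int main(void) {
    ftl_init();
    ftl_write(3);                                     /* first write of logical page 3        */
    ftl_write(3);                                     /* rewrite lands on a new physical page */
    printf("logical page 3 -> physical page %d\n", ftl_read(3));
    return 0;
}
```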

cache

What is it?

Caching is a computer storage technique used to temporarily store frequently or recently accessed data so that it can be retrieved quickly the next time it is needed. The main purpose of caching is to improve the speed and efficiency of data access, thereby improving computer performance.

A cache is a smaller but faster storage space, usually located near the computer's processor (CPU). The data in the cache is a copy of a portion of the data in main memory (such as RAM). When the computer needs to access certain data, it first checks whether the data exists in the cache. If it does (a cache hit), the computer reads the data directly from the cache, saving the time of accessing main memory. If the data is not in the cache (a cache miss), the computer must read it from main memory.

Caching technology is widely used in various scenarios, such as CPU cache, hard disk cache, Web cache, database cache and content delivery network (CDN), etc. In these applications, caching plays a role in reducing data access latency, reducing system load, and improving response speed.

working principle?

Caching works by leveraging locality to temporarily store frequently accessed or recently accessed data in a smaller but faster-access storage space. The principle of locality includes temporal locality and spatial locality. Temporal locality means that the data that has been accessed is likely to be accessed again in the future, while spatial locality means that data near the accessed data is likely to be accessed again.

Proceed as follows:

  • Requesting data: When the computer needs to access certain data, it first checks whether the data exists in the cache.
  • Query cache: The computer looks for requested data in the cache. Cache is usually located near the computer's processor (CPU) and can be accessed faster than main memory (such as RAM).
  • Hit or miss: If the requested data is present in the cache (called a "cache hit"), the computer will read the data directly from the cache, saving the time of accessing main memory. If the requested data does not exist in the cache (called a "cache miss"), the computer reads the data from main memory and caches the data for subsequent access.
  • Replacement policy: When the cache is full but new data needs to be added to the cache, the cache manager needs to decide which data should be retained and which data should be replaced based on the replacement policy. Common replacement strategies include least recently used (LRU), least frequently used (LFU), first in first out (FIFO), etc.

Through the above steps, caching technology can significantly reduce data access delays and improve computer performance.

How the CPU accesses hardware

The CPU (Central Processing Unit) executes instructions and accesses data by communicating and cooperating with other computer hardware components.
Here's how the CPU communicates with other hardware components:

1. System Bus: The CPU communicates with other hardware components through the system bus. The system bus consists of data bus, address bus and control bus. The data bus is used to transfer data between the CPU, memory and other devices; the address bus is used to transfer the memory address of data or instructions; the control bus is used to send control signals between the CPU and other components.
2. Memory: The CPU accesses data and instructions through memory. Memory mainly includes RAM (random access memory) and ROM (read only memory). The CPU sends memory addresses through the address bus, reads or writes data through the data bus, and sends control signals for read or write operations through the control bus.
3. Cache: The CPU speeds up access to data and instructions through the cache. The cache is usually divided into three levels: L1, L2, and L3. The L1 and L2 caches are typically private to each CPU core, while the L3 cache is shared by all cores on the same chip. The CPU accesses the cache over internal buses, which makes these accesses very fast.
4. Input/output (I/O) devices: The CPU communicates with input/output devices (such as keyboard, mouse, monitor, hard disk, etc.) through the I/O bus and I/O controller. The I/O controller converts the device's data into a format that the CPU can understand and transmits data between the CPU and the device.
5. Chipset: A chipset is a set of integrated circuits that coordinates and controls data transmission between various components in a computer system. Mainly including North Bridge and South Bridge. The North Bridge is responsible for coordinating high-speed data communications, such as communications between the CPU and memory and graphics cards; the South Bridge is responsible for coordinating low-speed data communications, such as communications between the CPU and hard drives, USB devices, and other peripherals.

Through the cooperation of these hardware combinations, the CPU can execute instructions, access data, and communicate with other components in the computer system.

Memory chunking

What is it?

Memory Block is a concept of computer memory management. It refers to dividing computer memory (usually RAM) into multiple consecutive areas of equal or unequal size. Memory blocking helps organize and manage memory resources more efficiently while reducing memory fragmentation and waste.

Purpose : Memory blocking is mainly used to allocate and reclaim memory space. When a program needs memory space, the operating system will find a suitable space in the available memory block and allocate it to the program; when the program no longer needs the memory space, the operating system will reclaim and release the memory block so that other programs can use it.

**Memory management method:** According to the memory management method used, memory blocks can be organized as fixed partitions, dynamic partitions, or pages/segments.
+ Fixed partitioning: Memory is divided into a fixed number of partitions of equal or unequal size, and each partition can hold only one program. This method is simple and easy to implement, but it can lead to memory fragmentation and waste.
+ Dynamic partitioning: Memory is allocated on demand, giving each program exactly as much memory as it requests. This makes more efficient use of memory space, but the overhead of allocating and reclaiming memory is relatively large.
+ Paging/segmentation: Memory is divided into fixed-size pages or variable-size segments. A program's memory space is mapped onto multiple pages or segments, which need not be stored contiguously in memory. This approach effectively reduces memory fragmentation and waste, while supporting advanced features such as virtual memory and memory protection.

**Memory allocation strategy:** The operating system can adopt different strategies when allocating memory blocks, such as First Fit, Best Fit, Worst Fit, etc. These strategies affect the efficiency and fragmentation of memory allocation.

Through memory blocking, the operating system can manage memory resources more effectively, meet the memory requirements of programs and reduce memory fragmentation and waste.

Why is dynamic partitioning expensive to allocate and reclaim memory?

Dynamic partitioning memory management is a method of dynamically allocating memory to programs on demand. While it makes more efficient use of memory space, it does have a larger overhead when allocating and reclaiming memory.
Here’s why:

  • Finding a suitable free memory block: In dynamic partition memory management, when a program requests memory, the operating system must traverse the memory space to find a free block large enough to satisfy the request. This can involve a large number of search and comparison operations, resulting in significant time overhead.
  • Memory fragmentation: Dynamic partitioning is prone to memory fragmentation, especially external fragmentation. External fragmentation refers to unusable free memory due to uneven size and location distribution of memory blocks. As the program runs and memory is allocated and reclaimed, external fragmentation may become increasingly severe, resulting in slower memory allocation and lower memory utilization.
  • Memory merging and splitting: In dynamic partition memory management, the operating system needs to merge and split memory blocks when allocating and reclaiming memory. For example, when two adjacent free memory blocks are reclaimed, the operating system needs to merge them to reduce external fragmentation; when a free memory block is allocated to a program, the operating system may need to split it into the required size. These operations add memory management complexity and overhead.
  • Data structure and algorithm overhead: To implement dynamic partition memory management, the operating system must maintain data structures (such as linked lists or trees) to track the status and location of memory blocks, and it must implement allocation algorithms (such as first fit, best fit, or worst fit) to allocate and reclaim memory. These data structures and algorithms add to the overhead of memory management (a stripped-down first-fit sketch follows this list).
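
As noted above, even a stripped-down first-fit allocator shows where the cost comes from: every allocation walks a free list looking for a large-enough block and then splits it. The structures, addresses, and sizes below are invented purely for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* One node of a singly linked free list (addresses and sizes are made up). */
struct free_block {
    size_t addr;
    size_t size;
    struct free_block *next;
};

static struct free_block blocks[] = {
    { 0x1000, 100, &blocks[1] },
    { 0x2000, 400, &blocks[2] },
    { 0x3000, 250, NULL       },
};
static struct free_block *free_list = &blocks[0];

/* First fit: walk the list until a block is big enough, then split it.
 * The walk itself is the search overhead discussed above.              */
static size_t first_fit_alloc(size_t want) {
    for (struct free_block *b = free_list; b != NULL; b = b->next) {
        if (b->size >= want) {
            size_t addr = b->addr;
            b->addr += want;       /* split: the remainder stays on the free list */
            b->size -= want;
            return addr;
        }
    }
    return 0;                      /* no block was large enough */
}

int main(void) {
    printf("allocated 300 bytes at 0x%zx\n", first_fit_alloc(300));  /* lands in the 400-byte block */
    return 0;
}
```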

Explanation of related terms:

Whole-block memory copy:
A whole-block memory copy means copying the data of one memory region completely into another memory region. This is usually done with a memory copy function (such as the memcpy function in C).
Whole-block memory copies are typically used in the following situations:
+ Data backup: copying the data in one memory region to another region to back it up and guard against data loss or corruption.
+ Data transfer: copying the data in one memory region to another region so that the data can be passed between different processes or systems.
+ Memory management: during dynamic memory allocation, a memory copy can move an allocated block to a new block in order to manage memory space better.
Note that when performing a memory copy, you must make sure the destination region has enough space to hold the data from the source region, and pointers and boundary conditions must be handled carefully to avoid problems such as memory leaks and out-of-bounds accesses.
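
A minimal use of memcpy that respects the caveats above (the destination is at least as large as the source, and the size is checked before copying) might look like the following; the buffer names and sizes are arbitrary.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    char src[16] = "backup me";
    char dst[16];                        /* destination must be at least as large as src */

    if (sizeof(dst) >= sizeof(src)) {    /* guard against overflowing the destination    */
        memcpy(dst, src, sizeof(src));   /* copy the whole block, byte for byte          */
        printf("copied: %s\n", dst);
    }
    return 0;
}
```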

Cache hits and cache misses:
A cache is a fast storage device in a computer that is used to temporarily store frequently accessed data for quick access. When the CPU needs to read or write data, it will first search the cache. If the data already exists in the cache, it is called a cache hit; if the data is not in the cache, it is called a cache miss.

A cache hit indicates that the required data is already in the cache, so the CPU can read or write the data directly from the cache without fetching the data from main memory, which can speed up data access. The cache hit rate represents the ratio between the data stored in the cache and the data actually accessed within a certain period of time. It is an important indicator for measuring cache performance. It is usually hoped that the higher the cache hit rate, the better.

A cache miss means that the required data is not in the cache and needs to be fetched from main memory, which introduces additional latency and overhead, resulting in slower access. When a cache miss occurs, the system attempts to read the required data from main memory into the cache so that the next access can read the data directly from the cache. If the cache capacity is insufficient or the cache replacement algorithm is improper, frequent cache misses may result, thereby degrading system performance.

Conflict Miss:
A conflict miss is a type of cache miss. Because of the way the cache is organized, several data blocks may be forced to share the same cache block; these data blocks then conflict with one another and cannot reside in the cache at the same time. In the simplest cache organization, the cache is divided into multiple fixed-size blocks, each of which can hold one block of data. A conflict miss occurs when two or more data blocks are mapped to the same cache block.
For example, if the cache has 8 blocks and data-block addresses are mapped to the cache with the function i mod 8, then data blocks whose numbers have the same remainder are mapped to the same cache block, and conflict misses occur when they are accessed alternately.
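
With the i mod 8 mapping just described, data blocks whose numbers differ by a multiple of 8 always land in the same cache block. The snippet below simply evaluates that mapping for a few arbitrary block numbers to show which of them would collide.

```c
#include <stdio.h>

#define CACHE_BLOCKS 8

int main(void) {
    int block_numbers[] = {0, 5, 8, 13, 16};

    /* Blocks 0, 8, and 16 all map to cache block 0, so accessing them in
     * turn evicts each other and produces conflict misses.               */
    for (int i = 0; i < 5; i++)
        printf("data block %2d -> cache block %d\n",
               block_numbers[i], block_numbers[i] % CACHE_BLOCKS);
    return 0;
}
```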

To reduce conflict misses, more complex cache organizations can be used, such as fully associative mapping or set-associative mapping. In these organizations, a data block can be placed in any one of several cache blocks, which reduces the probability of conflict misses and improves cache efficiency and performance.

Working Set:
Working Set refers to the set of all pages that a process accesses within a period of time. In an operating system, a process may need to access multiple pages, and these pages may be scattered in different locations in main memory. The working set can be understood as the set of all pages currently required by the process. It can reflect the memory access situation of the process and is also an important reference for memory management and page replacement algorithms.

Generally speaking, the operating system loads the pages required by the process into memory so that the process can quickly access these pages and reduce the time and overhead of accessing main memory. If the number of pages that a process needs to access exceeds the size of available memory, problems such as page faults or page replacements may occur, resulting in performance degradation.

Working set size can be used to evaluate memory usage and predict memory requirements. If the working set size is smaller than the available memory size, then the process can run well without problems such as page replacement; if the working set size is larger than the available memory size, then you need to adjust the memory usage of the process or increase the available memory. to solve the problem. Therefore, understanding the concept and size of the working set is very important for memory management and performance optimization of the operating system.

Associative mapping:
Associative mapping is a cache mapping method and one of the most commonly used mapping methods in cache. In associative mapping, each block in the cache can be mapped to any data block, unlike in direct mapping, where each data block can only be mapped to a fixed cache block.

Specifically, the cache structure of the associative mapping contains two parts: the cache block array and the tag array. The cache block array is used to store data blocks, and the tag array is used to mark which cache blocks are valid, which cache blocks are empty, and which cache blocks store which data block.

In associative mapping, when a data block needs to be accessed, the cache controller will first search for a valid cache block in the tag array. If there is an empty cache block, the data block will be read into the cache block. If there is no empty cache block, you need to select a cache block to replace, which is usually implemented using some replacement algorithms, such as the Least Recently Used algorithm. When choosing a cache block to replace, you need to consider which cache blocks store the least frequently used data blocks, and then replace that cache block with the new data block.

The functions and advantages of association mapping:

  • Fewer misses and a higher hit rate. Because a cache block can hold any data block, associative mapping effectively avoids the conflict-miss problems of direct mapping and set-associative mapping. In fully associative mapping, if a data block is in the cache it can be found in any cache block, without following a fixed mapping rule.
  • Improve cache utilization. Since each cache block in associative mapping can be mapped to any data block, cache space can be used more flexibly and redundant data storage can be reduced. This can improve the cache utilization, allowing the cache to cache more data blocks, thus improving the cache hit rate.
  • Supports more flexible replacement algorithms. Since each cache block in the associative mapping can be mapped to any data block, a more flexible replacement algorithm can be used when replacing cache blocks, such as the LRU (least recently used) algorithm. This can better utilize cache space and improve cache hit rate and efficiency.

Compared with direct mapping, associative mapping can avoid the problem of conflict misses, and can use replacement algorithms to optimize cache efficiency and hit rate. However, due to the need to compare each data block, the hardware implementation of associative mapping is usually more complex than direct mapping, and requires more hardware resources and therefore is more expensive.

How each block in the cache maps to a data block:

  • Direct Mapping: In direct mapping, each block in the cache is mapped to a unique data block in main memory. The specific mapping method is to determine the location of the cache block based on a part of the main memory address. For example, if the cache size is 4 blocks and the main memory address is 12 bits, then the lower two bits of the main memory address can be used to determine the location of the cache block, that is, the remainder of the main memory address modulo 4 is used as the index of the cache block.
  • Associative mapping: In fully associative mapping, each block in the cache can be mapped to any data block in the main memory. The specific mapping method is to determine which block is mapped to the data block by comparing the main memory address and the tags of all blocks in the cache.
  • Set-associative mapping: A compromise between the two. The cache is divided into several sets, each containing a few blocks; a main memory block is mapped to a fixed set (for example, by the set-index bits of its address) but may be placed in any block within that set.

No matter which mapping method is used, the mapping relationship between cache blocks and data blocks needs to be maintained in the cache controller. When a data block needs to be accessed, the cache controller will first determine whether the data block has been cached in the cache based on the mapping relationship. If it has been cached, the data block in the cache can be accessed directly. Otherwise, it needs to be read from the main memory. block of data and store it into the cache.
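
For a direct-mapped (or set-associative) cache, the mapping described above is usually realised by slicing the address into offset, index, and tag fields. The sketch below assumes 64-byte cache lines and 256 sets purely as example numbers; real caches differ.

```c
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 64u     /* bytes per cache line (assumed)   */
#define NUM_SETS   256u    /* number of sets / lines (assumed) */

int main(void) {
    uint32_t addr = 0x12345678u;

    uint32_t offset = addr % LINE_BYTES;               /* byte within the cache line       */
    uint32_t index  = (addr / LINE_BYTES) % NUM_SETS;  /* which line (or set) to look in   */
    uint32_t tag    = addr / LINE_BYTES / NUM_SETS;    /* identifies which block is cached */

    /* On a lookup, the hardware selects the line named by `index` and compares
     * its stored tag with `tag`; equal tags mean a cache hit.                   */
    printf("address 0x%08X -> tag 0x%X, index %u, offset %u\n",
           (unsigned)addr, (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}
```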
