2024 Postgraduate Entrance Exam 408 - Computer Organization Principles, Chapter 3: Storage Systems - Study Notes

Article Directory

foreword

While preparing for the 2024 postgraduate entrance examination, I am summarizing here the knowledge points learned and organized for the computer science 408 exam.

Blogger's article directory index: blog directory index (continuously updated)

1. Memory overview

image-20230529150359857

image-20230529150418828


1.1. Hierarchical structure

image-20230529151010582

Registers include ACC, MQ, etc., which are much faster than cache.

image-20230529151027763

  • Data exchange between main memory and auxiliary (secondary) memory is managed by system software; the page replacement algorithms belong to the operating system.
  • Data exchange between the Cache and main memory is completed automatically by hardware; the software programmer need not care about it at all. This part is realized by hardware engineers.

Because of this relationship, the main memory seen by an application programmer usually appears larger than it physically is. The following are the problems solved by the main memory-auxiliary memory level and the Cache-main memory level:

image-20230529151044895

The speed and price of each storage layer: CD -> mechanical hard disk -> solid-state drive -> memory (RAM) stick

  • The speed goes from low to high, and the price also goes from low to high

image-20230529151537856


1.2. Memory classification

1.2.1. Hierarchical classification

image-20230529154927802

image-20230529155031978


1.2.2. Classification of storage media

image-20230529155318182

①Semiconductor memory: such as main memory, Cache

image-20230529155439228

② Magnetic surface storage: floppy disk, tape, mechanical hard disk (from left to right in the figure below)

image-20230529155453999

③ Optical storage: CD, DVD, VCD are all optical storage

image-20230529155458459


1.2.3. Access method

① Random access memory, such as a RAM stick: the time to access any specified address is the same

image-20230529155641494

②Sequential access memory: e.g. the tape in a tape player

If you need to read a piece of content on the tape, you must wait for the head to reach that position

image-20230529155717872

③ Direct access memory

For example, mechanical hard disks (magnetic disks) are typical direct access memories, which combine random access and sequential access characteristics.

First, the head arm moves to the track you want to read; then the platter rotates continuously beneath the head, and the data is read or written as it passes.

Access speed (slow to fast): sequential access memory -> direct access memory -> random access memory.

image-20230529160152481

④ Associative memory: finds the location directly according to the content you are looking for.

image-20230529160402414

The difference: ①-③ are accessed by address, while ④ is accessed by content.


1.2.4. Classification by the changeability of information (read-write vs. read-only)

image-20230529162113593


1.3. Memory performance indicators

①Storage capacity: number of storage words × word length (e.g. 1M × 8 bits)

  • The number of MDR bits reflects the memory word length; the number of MAR bits reflects the number of storage words.

②Unit cost: price per bit = total cost / total number of bits, i.e. the money paid for each bit of storage.

Example:

image-20230529162517150

③Storage speed: data transfer rate = data width / storage cycle, where the data width is the storage word length

  • One storage cycle can read or write one storage word's worth of data.
  • The access cycle is as follows:

image-20230529162907572
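As a quick sketch of these three indicators, the Python snippet below computes the capacity, price per bit, and data transfer rate for a hypothetical 1M × 8-bit chip (the cost and cycle-time figures are assumed for illustration, not taken from the text):

```python
# Hypothetical chip: 1M x 8 bits (number of storage words x word length)
words = 1 * 2**20      # number of storage words (so the MAR needs 20 bits)
word_len = 8           # bits per word (so the MDR needs 8 bits)

capacity_bits = words * word_len            # storage capacity in bits

total_cost = 100.0                          # assumed total cost
cost_per_bit = total_cost / capacity_bits   # unit cost = total cost / total bits

cycle_s = 100e-9                            # assumed storage cycle: 100 ns
data_rate = word_len / cycle_s              # transfer rate = data width / cycle

print(capacity_bits)   # 8388608 bits, i.e. 1 MB
print(data_rate)       # 8e7 bits per second
```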


knowledge review

image-20230529150235646

2. Main memory

2.1. Basic composition of the main memory (introduction to DRAM)

image-20230529165801623

2.1.1. Basic components of the main memory

The main memory is divided into three parts: memory bank, MAR (address register), and MDR (data register):
image-20230529170013028

These three parts work together under the timing control logic circuit:

image-20230529170124681

  • A memory bank is composed of multiple storage units => each storage unit is composed of multiple storage elements => one storage element stores one binary bit, 0 or 1

image-20230529171212188

  • Applying the threshold voltage (5 V) to a MOS transistor's gate makes it conduct; otherwise it acts as an insulator and does not conduct.

2.1.2. Basic principles of memory chips

Based on the above storage elements, the following read and write principles are introduced:

  • The principle of reading a binary bit: the data 0 or 1 is stored as charge in the capacitor, so how is it read? Apply the 5 V threshold voltage to the MOS transistor's gate so that it conducts; if the capacitor holds charge, a 1 flows out at the right end, otherwise a 0 is read.
  • The principle of writing a binary bit: apply a 5 V high level to the right end (bit line) and 5 V to the MOS transistor's gate at the same time; the transistor conducts and a 1 is stored in the capacitor. Once the gate is closed again, the charge in the capacitor cannot escape.

**How are multiple stored bits read at once?** By reading a whole storage unit: a storage unit is composed of multiple storage elements, and the gates of all their MOS transistors are connected to the same word line. To read one unit's value, raise that line's voltage and all the binary bits of the unit (the capacitor in each storage element) are read out together.

  • A memory bank, in turn, consists of multiple storage units.

image-20230529171556838

How do we decide which storage word to read or write, based on the address?

image-20230529172423749

  • This is where the decoder comes in: according to the address bits given in the MAR, the decoder raises exactly one of its output lines to a high level.
  • Each address corresponds to one output line of the decoder.
  • An address is thus converted into a high-level signal on one output line. Once that word select line is driven high, each binary bit of the selected word is transmitted over the data lines into the MDR, and the CPU then takes the whole word from the MDR via the data bus.

Calculating the total capacity: with 3 address bits there are 2^3 = 8 word lines.

image-20230529172522561
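The decoder's behavior can be sketched as a tiny Python model (a hypothetical one-hot decoder for illustration, not any real chip): n address bits in the MAR select exactly one of 2^n word lines:

```python
def decode(address_bits):
    """One-hot decoder: n address bits select 1 of 2**n word lines."""
    n = len(address_bits)
    row = int("".join(map(str, address_bits)), 2)  # address as an integer
    lines = [0] * (2 ** n)
    lines[row] = 1            # only the selected word line goes high
    return lines

# 3 address bits in the MAR -> 8 word lines; address 101 selects line 5
lines = decode([1, 0, 1])
print(len(lines), lines.index(1))   # 8 5
```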

The function of the control circuit: ① The signal arriving in the MAR may not be stable immediately, so one task of the control circuit is to open the decoder, so that it reads and translates the address, only after the data in the MAR has stabilized. ② Likewise, the data placed in the MDR must be stable before it is put on the data bus; this too is governed by the control circuit.

image-20230529173531111

It has a chip select signal, written in one of two ways:

  • CS: chip select, the chip select signal
  • CE: chip enable, the chip enable signal

There are also read/write control lines, with two design schemes: ① a separate line each for reading and writing. ② a single shared line, where the high or low level determines which operation is performed.

Note: exam questions usually state how many read/write lines are used.


2.1.3. Complete memory chip and package diagram (the chip select line and the function of the metal pins)

A complete memory chip is as follows :

image-20230529173601127

At this point, the entire memory chip is packaged :

image-20230529174653062

  • The drivers are mainly used to amplify signals.
  • The address lines receive address information from outside; the CPU transmits it over the address bus.
  • The data lines are used for data transmission.
  • The chip select line determines whether this chip is active.
  • The read/write control lines: there may be one, or there may be two.

What is the function of the chip select line ?

image-20230529174500508

  • A memory stick may contain multiple memory chips. If we want to read the data of one particular chip, we drive that chip's select line with a low voltage and the other chips' with a high voltage, so only the specified memory chip is read.

**What are the metal pins on each memory chip, shown in the picture below?** They are the pins corresponding to the address lines, data lines, chip select line, and read/write control lines after the packaging shown above; other modules use them to pass signals in and out.

image-20230529174633516

image-20230529175357436


2.1.4. Memory chip addressing

Addressing is mainly a question of how the data is laid out in the chip's storage matrix: memory can be addressed by byte, word, half word, or double word, and the number of addressable units differs accordingly. The larger the unit, the fewer units there are (by a power-of-two factor), so converting between unit counts is a matter of shifting the address.

image-20230529175725392
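As an illustrative sketch (the 4-byte machine word and the helper name are assumptions), converting a word, half-word, or double-word address into a byte address is just a left shift:

```python
# Byte-addressable main memory with a 32-bit (4-byte) machine word.
# A word address maps to a byte address by shifting left 2 bits,
# a half-word address by 1 bit, and a double-word address by 3 bits.
def to_byte_address(unit_address, unit_bytes):
    shift = unit_bytes.bit_length() - 1   # log2 of the unit size in bytes
    return unit_address << shift

print(to_byte_address(3, 4))   # word 3 starts at byte 12
print(to_byte_address(3, 2))   # half word 3 starts at byte 6
print(to_byte_address(3, 8))   # double word 3 starts at byte 24
```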


knowledge review

image-20230529180136202


2.2. SRAM and DRAM

image-20230529181352507


2.2.1. Differences in characteristics caused by different storage elements

DRAM

Section 2.1 studied the DRAM chip, which uses capacitor storage; SRAM, by contrast, uses a bistable flip-flop to store information:

image-20230529181746513

image-20230529182844491

SRAM

The read/write principle of the bistable flip-flop :

image-20230529182801974

The flip-flop is built from 6 MOS transistors and can present two stable states:

  • First stable state: A is high and B is low, which represents binary 1.
  • Second stable state: A is low and B is high, which represents binary 0.

The capacitive storage element on the left needs only one data line to read data; the bistable flip-flop on the right needs two. We determine what was read by whether the 0 or 1 signal appears on BL or on BLX.

How is data written?

  • To write 0: drive BL low and BLX high.
  • To write 1: drive BL high and BLX low.

2.2.2 The difference between DRAM and SRAM

The difference between the two :

On one hand: a difference caused by the storage element, namely whether read data must be rewritten. DRAM needs a rewrite (restore) operation after reading data, while SRAM does not.

On the other hand :

  • Manufacturing cost: a DRAM cell needs only one transistor and one capacitor, while an SRAM cell needs 6 MOS transistors, so the cost of DRAM is lower and the cost of SRAM is higher.
  • Integration density: the DRAM cell is much smaller than the SRAM cell, so more cells fit in the same area; the integration of DRAM is high and that of SRAM is low.
  • Power consumption: DRAM requires less power than SRAM because each cell has fewer components.

2.2.3 Refresh of DRAM and SRAM

2.2.3.1. Understanding Refresh

image-20230529220328838

DRAM must be refreshed every 2 ms; if the charge in a capacitor is not refreshed in time, the data it holds is lost!

2.2.3.2. Refresh detailed and in-depth understanding

A detailed introduction for DRAM refresh :

1. How often is it refreshed?
  • Refresh cycle: generally 2 ms.
2. How many storage cells are refreshed each time? (two models, with examples)
  • Refreshing is done in units of rows: one row of cells per refresh. But refreshing one word at a time would be far too slow: with 20 address bits there are 2^20 (about a million) addresses, so a flat layout would need a million refreshes per pass. This is why row and column addresses are introduced.
  • image-20230529221930306

Simple model:

image-20230529220646515

Row and column address model:

Splitting the one decoder into a row decoder and a column decoder means each decoder handles only 2^10 outputs, so a full pass needs only 2^10 (about 1000) row operations, which greatly improves efficiency. The n address bits are split n/2 and n/2:

image-20230529221032164

With the development of memory, the storage capacity is getting larger and larger, and now some memories are arranged in three dimensions, and the principle is similar.

Example: given the address 00000000, how is it accessed?

First scheme: a single decoder strobes storage unit 0.

image-20230529221358500

A total of 2^8 = 256 select lines are required:

image-20230529221748163

Second scheme: the address is split and the two halves are sent to the row address decoder and the column address decoder respectively.

image-20230529221529815

The row and column decoders together need only 2 × 2^4 = 32 select lines:

image-20230529221830192
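The saving can be checked with a small calculation (a sketch; `select_lines` is a hypothetical helper name):

```python
# Select lines needed for an n-bit address: a single decoder drives 2**n
# word lines, while a split row/column design needs only 2 * 2**(n//2).
def select_lines(n, split):
    return 2 * 2 ** (n // 2) if split else 2 ** n

print(select_lines(8, split=False))   # 256 lines
print(select_lines(8, split=True))    # 32 lines
print(select_lines(20, split=True))   # 2048 instead of 1,048,576
```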

3. How is it refreshed?
  • With hardware support: a row of information is read out and rewritten, occupying one read/write cycle.
4. When is it refreshed? (dispersed, centralized, asynchronous refresh)
  • Assume the DRAM is arranged as a 128 × 128 array and the read/write (access) cycle is 0.5 us [the time to refresh one row]. A 2 ms period therefore contains 2 ms / 0.5 us = 4000 cycles.

There are a variety of refresh ideas as follows:

①Dispersed refresh: each access cycle is stretched to include a refresh, giving 2000 refresh opportunities per 2 ms period, far more than the 128 a 128 × 128 array needs.

image-20230529222449274

②Centralized refresh: read and write normally for most of the period, then refresh all rows in one concentrated stretch. During that stretch of refreshing, the memory cannot be accessed.

image-20230529222644638

③Asynchronous refresh: the dead time is dispersed across the period, and refreshes are scheduled in time slots when the CPU does not need to access memory (for example, while the CPU is decoding an instruction).

image-20230529223011404
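Plugging in the figures from the text (128 × 128 array, 0.5 us read/write cycle, 2 ms refresh period) gives the numbers behind the three schemes:

```python
# Figures from the text: a 128 x 128 DRAM array, a 0.5 us read/write
# cycle, and a 2 ms refresh period.
rows = 128
cycle_us = 0.5
period_us = 2000.0

total_cycles = int(period_us / cycle_us)   # 4000 cycles per 2 ms period

# Centralized refresh: 128 consecutive refresh cycles of "dead time"
# during which the CPU cannot access the memory.
dead_time_us = rows * cycle_us             # 64 us of dead time

# Dispersed refresh: every access cycle is stretched to include a refresh,
# so each effective cycle is 1 us and 2000 of them fit in 2 ms --
# far more than the 128 row refreshes actually required.
dispersed_accesses = int(period_us / (2 * cycle_us))

print(total_cycles, dead_time_us, dispersed_accesses)   # 4000 64.0 2000
```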


2.2.4. Sending row and column addresses (DRAM optimization: address line multiplexing)

Simultaneous transmission: the first half of the address is sent to the row address decoder and the second half to the column address decoder. Sending the row and column addresses at the same time requires n address lines.

image-20230529223601332

DRAM usually uses address line multiplexing technology instead: the row address and the column address are transmitted one after the other over the same n/2 address lines, and a row address buffer and a column address buffer latch them separately.

image-20230529225645386

With this strategy, the original n address lines can be reduced to n/2.
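A minimal sketch of the two-phase transfer (the helper is hypothetical, and it assumes the high half of the address is the row part):

```python
# Address-line multiplexing: an n-bit address is sent over n/2 lines in
# two phases: the row half first (latched in the row address buffer),
# then the column half (latched in the column address buffer).
def send_multiplexed(address, n):
    half = n // 2
    row = address >> half               # high half -> row address buffer
    col = address & ((1 << half) - 1)   # low half  -> column address buffer
    return row, col                     # two transfers over n/2 pins

# An 8-bit address sent over 4 pins:
row, col = send_multiplexed(0b10110011, 8)
print(row, col)   # 11 3  (0b1011 then 0b0011)
```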


Review of this section

image-20230529181442109


2.3. Read-only memory (ROM)

2.3.1. The development history of ROM (MROM, PROM, EPROM, flash memory, SSD)

image-20230530090541193

The development history of ROM falls into the following five stages:

①MROM: a memory that can only be read, never written.

Its contents are fixed at the factory: the manufacturer writes the specified data using mask technology, which only supports batch customization.

image-20230530093015616

② To improve the flexibility of read-only memory, PROM was invented: users can write information once with a dedicated PROM programmer, after which it cannot be changed.

image-20230530093030994

③ Then EPROM appeared, which allows repeated rewriting; data can be written into it by special means.

image-20230530093041493

EPROM is divided into the following two types:

  • UVEPROM (UV = ultraviolet): erased by ultraviolet irradiation. Note that this erases all the information, not part of it. [Low flexibility]
  • EEPROM: electrically erasable, and able to erase specific words. [High flexibility]

④ Next came Flash Memory, which retains EEPROM's advantage of keeping information after power-off and also supports fast, repeated erasing and rewriting.

Points to note: flash memory must be erased before writing, and writing is slower than reading.

image-20230530093124061

  • High bit density explained: for two chips of the same size, Flash Memory has more storage elements than RAM and can store more bits.

⑤ With further development came the SSD (solid-state drive), which also stores binary data in flash memory chips, with an added control unit that manages reading and writing across multiple flash chips.

In fact, the storage medium of many solid-state drives and USB flash drives is the same flash chip; the difference between them lies in the control unit.

image-20230530093246068


2.3.2. Understanding BIOS

When the CPU has just started up, the program it executes first must read its instructions from the BIOS:

image-20230530093924257

In fact, main memory also includes ROM: it is composed of RAM + ROM, and the two are addressed in one unified address space, as follows:

image-20230530094059769


Review of this section

image-20230530094226212


3. The connection between the main memory and the CPU

image-20230530172925178

3.1. The connection between a single memory chip and the CPU

image-20230530174818007

  • Word extension: a single memory chip here is 8 × 8 bits. If you want to increase the number of words in main memory, use word extension. [Connecting multiple memory chips expands the word count.]
  • Bit extension: for an 8 × 8-bit chip, each word is only 8 bits, while a modern computer can process 64 bits at a time. If the chip's word length is less than the width of the data bus, bit extension is used. [By connecting multiple chips appropriately, the storage word length of the whole main memory is expanded to match the data bus width.]

In the past, the MDR and MAR were integrated in the memory chip; in modern computers these two components are integrated in the CPU:

image-20230530175341205

  • The roles of the three buses: the MDR transfers data over the data bus; the MAR transmits the address over the address bus; and the CPU sends control information over the control bus to perform read and write operations with the main memory.

At present, the main memory contains multiple memory chips as follows :

image-20230530175600844


3.2. Introduction to the input and output signal components of the memory chip

image-20230530180131135

  • Address (Address), Data (Data), WE (write), WR (read)

image-20230530180932971

WE: the write enable signal (Write Enable). When high it means write data; when low it means read data. The CPU has a corresponding WE line and drives it through its control signals.

CS: the chip select signal. Since only one chip is working here, it can be tied directly to a high level; there is no overbar over CS, which means this chip select is active-high.

Looking at the overall connection, the data bus is using only one bit and the address bus only part of its width, which does not exploit the CPU's full capability. To solve this problem, more chips of the same type can be added to the main memory.


3.3. Connection between multiple memory chips and CPU

3.3.1. Bit extension

Bit extension: with two chips connected, the data read out is two bits wide, so two bits can be read and written at the same time.

To connect a new memory chip, wire its D0 to the CPU's D1 and connect its address lines A identically to the first chip's, so that both chips receive the same address.

image-20230530181246703

To complete an 8-bit data bus, eight 8K × 1-bit chips are needed:

image-20230530182019964


3.3.2. Word expansion (line selection method and decoding chip selection method, including example of word expansion)

There are mainly two options :

image-20230530205026389

Line selection method: n CPU lines correspond to n memory chips.

  • Example: A0-A13 connect to every chip's address lines; A14 selects chip A and A15 selects chip B. Only the patterns 01xx…x and 10xx…x each select one chip, so the combinations 00 and 11 are wasted (with line selection, 00 selects nothing, and 11 would let both chips drive the data lines at once, a conflict that is not allowed).

Decoding chip selection method: n CPU lines correspond to 2^n memory chips, so the utilization is better.

  • Example: A0-A13 connect to every chip's address lines. One extra line plus a decoder can select between two chips: 0 enables one chip and 1 enables the other. The decoding method therefore makes higher use of the address lines.

line selection

image-20230530205932183

Using two of the CPU's address lines as an example:

When one of A13 and A14 is 0 and the other is 1, the figure shows the lowest and the highest address of each chip, respectively:

image-20230530210234548

If A13 and A14 are both 1, a conflict arises; you can see the clash at the bottom:

image-20230530205800017

If both are 0, then since CS is active-high, neither memory chip is selected.

decode chip select

Decoding chip selection method: a decoder performs the conversion. 1 address line can select between 2 chips, 2 lines between 4, i.e. n lines between 2^n.

Here only A13 feeds the decoder. When A13 is 1: after decoding, the first chip's select line is driven low, so it is not selected, while the second chip's select line is driven high, so the second memory chip is enabled.

image-20230530210522399

When A13 is 0, the first memory chip is enabled instead:

image-20230530210752261

The address ranges are:

image-20230530210806862

The A13 above selects one of two outputs, so the corresponding decoder is a 1-to-2 decoder: 0xxxx and 1xxxx address the two memory chips.

image-20230530210918777

3-to-8 decoder: the input patterns 000, 001, 010, 011, 100, 101, 110, 111 can select among 8 memory chips.

image-20230530210930814
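A small sketch of decoding chip selection (the `select_chip` helper is hypothetical, assuming 8 chips of 8K words each): the high address bits drive the decoder while the low bits address within the selected chip:

```python
# Decoding chip selection: the high 3 address bits drive a 3-to-8 decoder
# that picks one of 8 chips; the remaining bits address within the chip.
def select_chip(address, chip_addr_bits):
    chip_no = address >> chip_addr_bits            # high bits -> decoder input
    inner = address & ((1 << chip_addr_bits) - 1)  # low bits -> chip's own lines
    return chip_no, inner

# 8 chips of 8K words each (13 in-chip address bits):
chip, inner = select_chip(0b101_0000000000011, 13)
print(chip, inner)   # chip 5, word 3 inside it
```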

The following example expands to 4 memory chips: the CS on each chip below carries an overbar, indicating active-low. Correspondingly, each decoder output has a small circle indicating inversion: the line driven to a chip's CS selects that chip only when it is 0:

image-20230530211716435


3.3.3. Simultaneous word and bit expansion

Example of word bit expansion at the same time:

image-20230530212006355

The CPU's data width is 8 bits and the chips are 16K × 4 bits. Word expansion needs 4 groups of chips, and bit expansion needs 2 chips per group, so 2 × 4 = 8 chips are needed in total. Each red box on the left of the figure above forms a 16K × 8-bit group, and the four groups together expand the main memory capacity to 64K × 8 bits.
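The chip count generalizes into a simple formula, sketched below (the helper name is hypothetical):

```python
# Chips needed = (target words / chip words) * (target word length / chip bits):
# the first factor is the word-expansion count, the second the bit-expansion
# count (chips per group).
def chips_needed(total_words, total_bits, chip_words, chip_bits):
    word_groups = total_words // chip_words   # word expansion factor
    chips_per_group = total_bits // chip_bits # bit expansion factor
    return word_groups * chips_per_group

# 64K x 8-bit main memory built from 16K x 4-bit chips, as in the example:
print(chips_needed(64 * 1024, 8, 16 * 1024, 4))   # 4 * 2 = 8 chips
```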


Supplement: the decoder (the CPU -> decoder -> memory chip process, and a detailed look at the RAM read cycle)

The left and right sides of the figure below show two different enable polarities: the left is active-high, the right is active-low:

image-20230530213031215

Decoders also come with enable signals: the one on the left has a single enable, while the one on the right has multiple:

image-20230530213952745

The CPU -> decoder -> memory chip process: the CPU first sends the address over the address lines (here the lower 13 bits plus the higher 3 bits). When the CPU has only just begun driving the lines, the electrical signals may still be unstable, so after sending the address it waits a moment and only then asserts the main memory request signal (MREQ), that is, it makes the strobe (enable) line valid, so that by the time a memory chip is selected, the signals the chip receives are guaranteed to be stable.

image-20230530214111193

Detailed explanation of RAM read cycle :

image-20230530214949244

image-20230530214858982


Summary review

image-20230530212659972

Bit extension: makes the memory's word length longer, to make better use of the data bus's transmission capacity.

Word extension: increases the number of words the memory stores, to make better use of the CPU's addressing capability.

Expanding in both dimensions at once both widens the data path and increases the word count, exploiting the data bus and the CPU's addressing ability together.


4. Improving the read speed of main memory (dual-port RAM, multi-module memory)

4.1. Understand the access cycle and raise questions (including knowledge overview)

The access cycle is as follows :

image-20230531112637292

A read of a DRAM chip is destructive, so the data must be restored after reading. The access time in the figure above is the time for the DRAM to read one word; after the read comes a recovery time, during which the CPU cannot access the chip.

  • For SRAM the recovery time is much shorter.

Because a memory chip must wait out the recovery time after reading a word, two problems arise and are solved:

1. In a multi-core CPU, after one core accesses a block of memory, another core may want to access the same block during the recovery time of the first. How is this solved?

  • Solution: dual-port RAM.

2. A single-core CPU reads and writes much faster than main memory. How do we cope with main memory's long recovery time?

  • Solution: multi-module memory.

image-20230531091830228


4.2. Dual-port RAM solves problem 1: how multiple CPUs accessing the same memory avoid waiting out the recovery time

The dual-port RAM is designed as follows :

image-20230531113225271

Introduction: both sides of the RAM in the figure above can be connected to a CPU. To support this multi-CPU access mode, the RAM needs two completely independent sets of data lines, address lines, and control lines, together with more complex control circuitry inside.

Benefit: with this design, the speed at which a multi-core CPU accesses one memory stick is improved.

Operations the two ports may perform on the same main memory: the two ports may access different address units, and both ports may read from the same address.

  • Operations not allowed: write-write or read-write on the same address.

Handling the disallowed operations: with two ports there are four combinations, and the simultaneous write-write and read-write cases must be prohibited; this prohibition is implemented in the circuit.

image-20230531113657820


4.3. Multi-bank parallel memory solves problem 2: a single CPU accessing main memory repeatedly waits too long for recovery

4.3.1. High-order and low-order interleaved addressing (the difference, and the difference in read speed)

To cut the long waits when a single CPU accesses main memory repeatedly, we can use multi-bank parallel memory: split the memory into multiple banks for the CPU to access. The addressing can be high-order or low-order interleaved:

image-20230531114406606

What is the difference between high-order and low-order interleaved addressing? Each address consists of a bank number and an in-bank address. With high-order interleaving, the bank number comes first and the in-bank address follows; with low-order interleaving, the in-bank address comes first and the bank number follows.

image-20230531114517131

The two designs number addresses in different orders. With high-order interleaving, consecutive addresses run down the first bank from top to bottom before moving on to the second; with low-order interleaving, consecutive addresses are spread across the banks. What impact does this difference have?

Here is an example: access 5 consecutive addresses, and compare the read times of high-order and low-order interleaving.

① High-order interleaving

The 5 consecutive addresses all fall in the first of the four banks, so this amounts to accessing the same bank 5 times in a row; each access comprises the access time 1r plus the waiting (recovery) time 3r:

image-20230531115051580

Each access thus effectively takes 4r, i.e. 1T. The second access must wait for the first to finish because it targets the same bank, and likewise for the rest, so the total time required is 5T.

② Low-order interleaving

image-20230531115335010

With low-order interleaving, consecutive address numbers land on successive banks. The first access hits bank 1 and takes 1r; after that 1r the second bank can be read immediately with no wait, and the same holds for banks 3 and 4. Since we assumed 4-way interleaving, the fifth access returns to bank 1, and by then bank 1's 3r recovery time has just ended, so it too proceeds without waiting.

Reading this way is very efficient: the total time is only T + 4r, which is 2T!

Summary: every access to a given bank must be followed by a recovery wait. For the same sequence of 5 consecutive reads:

  • High-order interleaving: 5 sequential reads of the same bank take 5 × 4r = 20r; with 1T = 4r, that is 5T.
  • Low-order interleaving: the 5 reads hit different banks, and by the time the fifth read returns to the first bank its recovery has just ended, so the total is 1T + 4r = 2T. | In general, the micro-level time cost is T + (n − 1)·r, so at the macro level reading or writing one word takes close to r.

In terms of efficiency, low-order interleaving here is about 4 times faster than high-order.
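The timing comparison can be sketched in a few lines (assuming T = 4r and 4-way interleaving, as in the example; the helper names are illustrative):

```python
# Time to read n consecutive words; access cycle T = 4r (access time r,
# recovery time 3r), with 4-way low-order interleaving as in the example.
def high_order_time(n, r):
    # consecutive addresses hit the same bank: every access costs a full T = 4r
    return n * 4 * r

def low_order_time(n, r):
    # banks fire one after another, r apart; only the last word still
    # needs its full access cycle: total = T + (n - 1) * r
    return 4 * r + (n - 1) * r

r = 1
print(high_order_time(5, r))   # 20r, i.e. 5T
print(low_order_time(5, r))    # 8r, i.e. 2T
```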


4.3.2. Why should we discuss the situation of "continuous" access?

Program instructions are executed one after another and are generally stored contiguously, so memory accesses are mostly to consecutive addresses, except when a branch (if/else) is encountered.


4.3.3. How many banks are optimal for low-order interleaving?

As shown above, low-order interleaved multi-bank memory is very efficient for consecutive address access. So how many banks should be used?

  • Enough that the number of banks m ≥ T/r (access cycle T, access time r). If m < T/r, then by the time the accesses wrap back around to the first bank, that bank has not yet finished recovering, and the CPU must wait before reading.

image-20230531120101451

The following is the case of different number of modules :

image-20230531120450398

Summary statement:

image-20230531120125905


4.3.4. Thinking: Given an address x, how to determine which memory bank it belongs to?

① Judge directly from the low-order bits of the address, which form the bank number. ② Equivalently, given m, compute x % m.
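Both views amount to the same computation (a tiny sketch, assuming m = 4, a power of two):

```python
# Which low-order-interleaved bank does address x belong to?
def bank_of(x, m=4):
    return x % m              # equals x & (m - 1) when m is a power of two

def offset_in_bank(x, m=4):
    return x // m             # position of the word inside its bank

print(bank_of(13), offset_in_bank(13))   # prints 1 3
```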


4.3.5. The difference between multi-bank parallel memory and single multi-word memory

image-20230531120552754

The former can independently read any word from each bank; the latter reads one whole line (several words) at a time, so an access that crosses a line boundary may return more data than was actually needed.

Speed comparison: the former starts a new word roughly every r, so reading four words takes about 4r; the latter reads a whole line (four words) per access cycle, i.e. 1T ≈ 4r per line, so the reading speeds are similar.


Extended introduction: What is dual channel? How to combine for dual channel?

image-20230531121938579

The "expansion" and "dual channel" we talk about in daily life correspond to the high-order and low-order interleaving introduced above: low-order interleaving across two sticks forms a dual channel, which greatly improves read speed!

How to form a dual channel ? Replace one 16GB stick with two 8GB sticks and insert them into the matching (same-colored) slots; the two sticks then work as a dual channel.

So why do we choose the same main frequency and the same capacity ?

  • Same frequency: if the two sticks run at different frequencies, the higher-frequency stick is down-clocked to match the slower one.
  • Same capacity: if the capacities differ, only the matching portions of the two sticks form low-order cross addressing (dual channel), which reads smoothly; the extra capacity of the larger stick runs as a single channel, which can cause stuttering, for example while a game is running.

Practical check : dual channel can also be verified in the operating system. For example, the two 8GB sticks below sit in slots 0 and 2 respectively, so this machine was configured for dual channel at the factory.

image-20230531122248263


Review of this section

image-20230531122353861


5. External memory

image-20230531213854879

5.1, disk storage

5.1.1. Understanding disk storage

Computer Organization mainly examines the hardware characteristics of disks; Operating Systems examines disk management and scheduling algorithms.

For the 8-bit data sent by the host, a conversion circuit is needed to serialize it so that it is written to the disk one bit at a time; reading data works the same way in reverse.

image-20230531214244201

The principle of reading and writing binary bits on a disk : as the disk surface passes under the magnetic head, the head writes data one bit at a time, and only one bit can be written per pass; likewise, reading induces a signal in the head's coil one bit at a time.

Features : ① Every disk read or write proceeds one bit at a time. ② Read and write operations cannot be performed simultaneously.

Advantages and disadvantages of magnetic surface memory :
image-20230531214309542

Get to know disk storage :

image-20230531214434979


5.1.2, Composition of disk storage

①Storage area

image-20230531214609348

Heads: each platter surface has its own read-write head (a disk storage device may contain multiple platters).

Tracks: the concentric circles on each platter surface are the tracks; since the drive contains multiple platters, each surface is divided into many tracks.

Cylinder: Tracks with the same number on different platters form a cylinder.

Sector: each track is divided into multiple sectors

Every time the host reads and writes to the disk, it uses sectors as the unit.

② Hard disk storage

Requires disk drives, disk controllers (IO controllers), platters

image-20230531214658950

Notice that both surfaces of each inner platter have their own head (the heads move up and down together), but the topmost and bottommost surfaces are usually not used.

image-20230531214921394


5.1.3. Disk performance indicators

①Disk capacity

image-20230531215316390

capacity:

  • Unformatted capacity: from a physical point of view, the upper limit of bits that can be stored.
  • Formatted capacity: formatting sets aside spare sectors so that damaged sectors do not stop the disk from working normally; if sector A is damaged, spare sector B replaces it. This is why disks from many manufacturers must be formatted.

Formatted capacity is smaller than unformatted capacity.

②Recording density

image-20230531215411471

  • Track density: the number of tracks per unit of radial distance, e.g. 60 tracks/cm.
  • Bit density: the number of binary bits that can be recorded per unit length of track, e.g. 600 bit/cm.
  • Areal density is simply the product of the two.

image-20230531215354775

Every track on the disk stores the same number of bits, from the outermost circle to the innermost, but the bit density differs: the closer to the center, the shorter the track, so the higher the density.

image-20230531215424263

③Average access time

image-20230531215506881

Seek time + rotational delay + transfer time (the time for the target region to pass under the head so it can be read or written)

  • If the rotational delay is not given in the problem, it is generally taken as the time of half a revolution

The entire access time process diagram:

image-20230531215558842

Disk controller latency: It also takes some time to issue read and write commands to the disk.

④Data transfer rate

image-20230531215746852
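The access-time formula above can be combined into a small numeric sketch (the disk parameters here — 7200 rpm, 6 ms average seek, 1 MB per track — are hypothetical, not from the notes):

```python
def avg_access_time_ms(seek_ms, rpm, bytes_to_read, track_bytes):
    """Seek time + average rotational delay (half a revolution, as the
    notes assume when no figure is given) + transfer time."""
    rev_ms = 60_000 / rpm                      # one full revolution in ms
    rotational_ms = rev_ms / 2                 # half a revolution on average
    transfer_ms = bytes_to_read / track_bytes * rev_ms
    return seek_ms + rotational_ms + transfer_ms

t = avg_access_time_ms(6, 7200, 4096, 1_000_000)
print(round(t, 2))   # prints 10.2 (ms) for this hypothetical disk
```

Note how the transfer time (about 0.03 ms here) is dwarfed by seek time and rotational delay, which is why disk scheduling focuses on the latter two.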


5.1.4. How to determine the sector to be read on the disk? Know the disk address

Numbering according to disk address:

image-20230531215907559

  • The drive number selects the corresponding disk in the system
  • Cylinder (track) number: moves the head arm back and forth to the specified circle
  • Platter (head) number: as the earlier figure shows there is one head per surface, so a specific head must be selected
  • Finally, the sector number specifies the sector to interact with.
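The width of each address field follows from the geometry (a sketch with a hypothetical geometry of 4 drives, 1024 cylinders, 16 heads, 64 sectors):

```python
import math

def field_bits(count):
    """Bits needed to number `count` items (0 .. count-1)."""
    return math.ceil(math.log2(count))

drives, cylinders, heads, sectors = 4, 1024, 16, 64
bits = [field_bits(n) for n in (drives, cylinders, heads, sectors)]
print(bits, sum(bits))   # prints [2, 10, 4, 6] 22
```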

5.1.5. The working process of the disk

The working process of the disk : addressing, disk reading, and disk writing are all issued through the control word.

image-20230531220314638

The reading process requires the help of a serial-to-parallel conversion circuit:

image-20230531220334777



5.1.6. Understand the disk array RAID0-5 (to improve system performance and reliability of disk storage)

The main purpose of the disk array is to improve system performance and reliability of disk storage.

image-20230531220424287

Redundant array of disks: logically adjacent data are actually placed on different physical disks, so that they can be accessed in parallel.

  • This is the same idea as interleaved storage: stripe the data across blocks on different disks.

RAID0 : Data cannot be recovered if some sectors are damaged. There is no fault tolerance.

image-20230531220515562

image-20230531220809175

Through software, the four physical disks are managed as a single logical disk, which makes reads and writes of the whole disk system faster.

The problem that arises:

  • No redundancy: there is no extra space storing a backup copy of the data; if one bit goes wrong, it is permanently lost.
  • No check: if one of the striped bits is in error, there is no way to detect or correct the problem.

RAID1 : mirrored disk array

Solution: a safer disk array stores each piece of data in duplicate, one copy on each physical disk of a mirrored pair. This provides redundancy (and the two copies can be compared as a check), at the relative cost of wasting half the storage space. 1:1

image-20230531221138857

RAID2 : to use the disks more efficiently, several logically adjacent bits are striped across four physical disks, and additional disks are added to store the 3-bit Hamming code corresponding to every 4 data bits, which can correct a one-bit error and recover the data. 4:3

image-20230531221229889

RAID3-5 : other striping and parity strategies; the later the scheme, the more reliable and the safer.

In order to increase the reliability and parallel access capability, the commercial level often uses this kind of redundant array of disks to improve the performance and reliability of the disk system.

A small summary is as follows :

image-20230531221417125


Summary review

image-20230531221511658


5.2, SSD solid state drive

The probability of multiple-choice questions appearing in 2023 is very high.

image-20230611144847762


5.2.1. Mechanical HDD VS SSD

A mechanical hard disk records binary 0s and 1s using the magnetic material on its platter surfaces; a solid-state drive's storage medium is based on flash memory technology (a USB flash drive works the same way).

image-20230611153538957

  • Each black block in the solid state drive is a flash memory chip.

5.2.2 Composition of SSD

The logical address is sent over the I/O bus and mapped to the corresponding physical address by the flash translation layer, which performs the address translation.

image-20230611153610464

Digging deeper into the internal structure of a flash memory chip: a chip is composed of several blocks, each 16KB-512KB in size.

image-20230611153622378

Each block can be disassembled into pages, each page size is 512B-4KB.

image-20230611153632294

Note: The system reads and writes the SSD in units of pages. Read/write one page at a time.

  • For a disk, one read or write corresponds to one sector ("block"). A page of a solid-state drive is equivalent to a disk sector, and a block of a solid-state drive is equivalent to a track: just as a track contains multiple sectors, a block contains multiple pages.

5.2.3. Read/write characteristics

image-20230611153946746

One key characteristic : a solid-state drive is erased in units of blocks; after a block is erased, each page in it can be written once and read an unlimited number of times.

Once some pages in a block have been written, those pages cannot be modified in place; the entire block must be erased before they can be rewritten.

What if only one page needs rewriting: must the whole block be erased? In practice the SSD copies all the other pages of the block into a free block, writes the new version of the page into that block as well, and then erases the original block.

  • At the same time, the flash translation layer remaps the logical block number to the new physical block.

image-20230611154008828

Due to this feature, the SSD reads fast and writes slowly.
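The copy-then-erase rewrite described above can be sketched as follows (a toy model, not a real FTL API; all names are illustrative):

```python
def rewrite_page(blocks, free_block, blk, page_no, new_data):
    """Rewrite one page of `blk` by relocating the block to `free_block`."""
    old, fresh = blocks[blk], blocks[free_block]
    for i, page in enumerate(old):
        fresh[i] = new_data if i == page_no else page  # copy, swapping in the new page
    blocks[blk] = [None] * len(old)                    # erase the original block
    return free_block                                  # the FTL now remaps here

blocks = {0: ["a", "b", "c", "d"], 1: [None] * 4}      # block 1 is free
new_home = rewrite_page(blocks, free_block=1, blk=0, page_no=2, new_data="C!")
print(blocks[new_home])   # prints ['a', 'b', 'C!', 'd']
```

The extra copying for a single-page update is why SSD writes are slower than reads.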


5.2.4. Wear Leveling Technology

The difference between how mechanical and solid-state drives locate data : a solid-state drive locates data quickly through electronic circuitry; a mechanical hard disk must move its arm and rotate the platters.

  • Solid-state drives support true random access: any address takes the same time to reach. On a mechanical disk, a physical address far from the head costs extra arm movement and rotation.

Disadvantage of solid-state drives : frequently erasing and rewriting the same block wears it out and eventually damages it.

image-20230611154044100

This shortcoming motivates the solution: wear leveling technology , which spreads erasures evenly across all blocks to extend the drive's lifespan. Data that is read often but written rarely can be migrated to older (more worn) blocks, since read-mostly data seldom triggers further erasing.

  • It will monitor how many times each block is read/written in the background, and perform appropriate migration according to actual needs.
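The migration decision can be sketched minimally (the monitoring numbers are made up; real controllers use far more elaborate policies):

```python
def choose_migration(erase_counts, write_rate):
    """Move read-mostly ("cold") data onto the most-worn block, so that
    future erases land on younger blocks."""
    most_worn = max(erase_counts, key=erase_counts.get)
    coldest = min(write_rate, key=write_rate.get)   # read-many, write-few
    return coldest, most_worn

erase_counts = {"blk0": 900, "blk1": 120, "blk2": 400}  # erases so far
write_rate = {"blk0": 5, "blk1": 300, "blk2": 0}        # recent writes
print(choose_migration(erase_counts, write_rate))  # prints ('blk2', 'blk0')
```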

Extended: The Lifespan of SSDs

image-20230611154109269


5. Cache memory

High-frequency test points for big and small questions

5.1. Basic concepts and principles of Cache

5.1.1. Problems in the storage system

The problem with the storage system: even after optimization, main memory is still far slower than the CPU; adding a Cache layer alleviates the gap.

image-20230611213931761


5.1.2, the working principle of Cache

The working principle of the Cache: over a short period of time, the same piece of code or data tends to be accessed repeatedly, so we can read it into the Cache.

  • Nowadays the Cache is integrated directly into the CPU and implemented with SRAM. SRAM's low integration density is also what limits the Cache's capacity.

image-20230611214017038


5.1.3, the principle of program locality

Spatial locality (addresses near those currently in use are likely to be used soon); temporal locality (information being used now is likely to be used again in the near future)

  • The principle of program locality: use program A as an example.

For example: when traversing a two-dimensional array, jumping across rows (column by column) gives poor spatial locality, while accessing the elements of one row in sequence gives good spatial locality.

image-20230611214622951
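The two traversal orders can be compared directly (row-major storage assumed, as in C):

```python
# A 4x4 array laid out row by row: element a[r][c] sits at address r*N + c.
N = 4
a = [[r * N + c for c in range(N)] for r in range(N)]

row_major = [a[r][c] for r in range(N) for c in range(N)]  # stride-1 accesses
col_major = [a[r][c] for c in range(N) for r in range(N)]  # stride-N jumps

print(row_major[:5])  # prints [0, 1, 2, 3, 4]: consecutive addresses
print(col_major[:5])  # prints [0, 4, 8, 12, 1]: jumps of N each step
```

The stride-1 order touches addresses within the same block repeatedly (good spatial locality), while the stride-N order jumps a whole row each step.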


5.1.4, performance analysis (average access time calculation)

image-20230611214737002

Performance analysis : H denotes the hit rate, i.e. the fraction of accesses the CPU finds in the Cache; 1 - H is the miss rate.

Average access time (with Cache access time t_c and main memory access time t_m):

  • Scheme 1: look in the Cache first; only on a miss is main memory accessed: t = H·t_c + (1 - H)(t_c + t_m)
  • Scheme 2: search the Cache and main memory at the same time: t = H·t_c + (1 - H)·t_m

Actual example :

image-20230611214819240
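The two formulas can be checked with hypothetical numbers (H = 0.95, t_c = 5 ns, t_m = 100 ns; these are not taken from the example above):

```python
def t_sequential(H, tc, tm):
    """Scheme 1: check the Cache first, then main memory on a miss."""
    return H * tc + (1 - H) * (tc + tm)

def t_parallel(H, tc, tm):
    """Scheme 2: search the Cache and main memory simultaneously."""
    return H * tc + (1 - H) * tm

print(round(t_sequential(0.95, 5, 100), 2))  # prints 10.0 (ns)
print(round(t_parallel(0.95, 5, 100), 2))    # prints 9.75 (ns)
```

Both averages sit far closer to t_c than to t_m, which is the whole point of a high hit rate.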


5.1.5. The principle of spatial locality, how to define "surroundings"

image-20230611214922349

Since main memory and the Cache exchange data in units of "blocks", for an array element such as a[0][1] we can determine from its address which block it lies in, and load that entire block into the Cache.

  • Both the cache and the main memory are based on blocks, and it is very convenient to exchange data between the cache and the main memory at this time.
  • A block of main memory is also called a page / page frame. A block of the Cache is also called a line.

image-20230611215055788

At this time, the address of the main memory can be divided into the block number and the address within the block.


Review of Knowledge Points

image-20230611213820979

The following are the problems to be studied and solved in the next chapters:

  • Correspondence between cache and main memory data block: the mapping method of cache and main memory.
  • The cache is small, but the main memory is large. What if the cache is full? Use the replacement algorithm.
  • The cache modifies the data copy in the cache, how to ensure the consistency of the data master copy in the main memory.

5.2, Cache and main memory mapping algorithm (three kinds)

Three mapping methods and understanding Cache tag numbers and effective bits

Three mapping methods : fully associative mapping, direct mapping, group associative mapping

  • Fully associative mapping: The main memory block can be placed anywhere in the Cache.
  • Direct Mapping: Each block of main memory can only be placed in a specific location.
  • Group associative mapping: each main memory block can be placed in any position of a specific group. After confirming the grouping, choose a free location to store.

image-20230612185415085

How to distinguish which main memory block is stored in the Cache ?

  • Add a "tag" to each Cache line to record which main memory block it holds. A line holding no main memory data has its tag set to 0.

But tag 0 is ambiguous: main memory block numbers also start from 0, so a tag of 0 could mean either "empty" or "holding block 0". How is this conflict resolved?

  • Set a valid bit: 1 means the line holds valid data, 0 means it does not.

image-20230612185433318


5.2.1, fully associative mapping (play freely)

image-20230612185556824

①Main memory block number and address distribution within the block?

Given: the total main memory address space is 256MB, and the line (block) length is 64B.

256MB = 2^28 B, and 2^28 B / 2^6 B = 2^22 blocks, so the main memory block number takes 22 bits and the in-block address takes 6 bits.

②How to divide?

When the first main memory block is brought in, it can be placed in any Cache line; after it is placed, the line records the tag (the 22-bit main memory block number) and sets its valid bit to 1 (marking the line occupied).

③ How to access the main memory address?

1. The high 22 bits of the address (the tag) are compared in parallel against the tags of all Cache lines. If a tag matches and its valid bit is 1, the Cache hits.

2. If no line matches (or the matching line's valid bit is 0), it is a miss, and the data is fetched directly from main memory.


5.2.2. Direct mapping (can only be placed in a fixed position)

image-20230612185954693

Formula for a main memory block's position in the Cache: Cache line number = main memory block number % total number of Cache lines

Storage process : suppose the Cache has lines 0-7 and main memory has blocks 0 to 2^22 - 1. Following the rule, place main memory blocks 0 and 8 in turn:

  • First place block 0: 0 % 8 = 0, Cache line 0 is free, so block 0's data goes straight into line 0.
  • Then place block 8: 8 % 8 = 0, line 0 is already occupied, but the placement still goes ahead, directly overwriting block 0's data.

This process exposes a clear problem : when a main memory block's assigned line is occupied, the block must overwrite that line even though other lines may be free.

Disadvantage : free Cache lines elsewhere cannot be used by the conflicting main memory block, so space utilization is poor.

Optimization about tag storage content

Lead : suppose the high 22 bits (the block number) are 0...01000, i.e. block 8. Taking 8 mod 8 locates Cache line 0, and the tag stored there would be the full 22-bit block number 0...01000. This tag can be shortened.

Optimization : since the line number is the block number mod the number of Cache lines, the low 3 bits of the block number are exactly the Cache line number. The tag can therefore omit those 3 bits (the number of bits in the Cache line number) and store only the remaining 19 bits, 0...01, which still pinpoints the main memory block.

image-20230612190051838

Address distribution within the block

Based on the previous case of fully associative mapping.

The main memory block number is divided into : a 19-bit tag + a 3-bit line number, with a 6-bit in-block address.

Direct mapping memory access process :

① Determine the Cache line from the low 3 bits of the main memory block number.

② If the high 19 bits of the block number match that line's tag and the valid bit = 1, the Cache hits, and the unit at in-block offset 001110 is accessed.

③ On a miss (or valid bit = 0), access main memory normally.
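The direct-mapped address split can be sketched directly (19-bit tag | 3-bit line | 6-bit offset, matching the 8-line, 64 B example above):

```python
def split_direct(addr, line_bits=3, offset_bits=6):
    """Split a 28-bit address for the direct-mapped example above."""
    offset = addr & ((1 << offset_bits) - 1)
    line = (addr >> offset_bits) & ((1 << line_bits) - 1)
    tag = addr >> (offset_bits + line_bits)
    return tag, line, offset

block_no = 8                         # main memory block 0...01000
addr = (block_no << 6) | 0b001110    # 6-bit in-block offset 001110
print(split_direct(addr))   # prints (1, 0, 14): block 8 maps to line 8 % 8 = 0
```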


5.2.3, group associative mapping (can be placed in a specific group)

The formula for locating main memory data in the Cache : group number = main memory block number % number of groups

  • Address split for this example: 2-way set-associative mapping (2 lines per group, 4 groups in total); n-way means n lines form one group.

The main memory block number is divided into : a 20-bit tag + a 2-bit group number, with a 6-bit in-block address.

image-20230612190622841

Tag optimization : since there are 4 groups, the group number needs only 2 bits, and those are exactly the low 2 bits of the block number; the tag can omit them, so only 20 bits need to be stored to identify the main memory location.

image-20230612190709165

Group associative mapping memory access process :

① Determine the group from the low 2 bits of the main memory block number.

② If the high 20 bits match some tag within that group and its valid bit is 1, the Cache hits.

③ On a miss, access main memory directly.
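The set-associative split works the same way (20-bit tag | 2-bit group | 6-bit offset for the 4-group example above):

```python
def split_set_assoc(addr, group_bits=2, offset_bits=6):
    offset = addr & ((1 << offset_bits) - 1)
    group = (addr >> offset_bits) & ((1 << group_bits) - 1)
    tag = addr >> (offset_bits + group_bits)
    return tag, group, offset

block_no = 9
tag, group, offset = split_set_assoc(block_no << 6)
assert group == block_no % 4     # group = block number mod number of groups
print(tag, group, offset)        # prints 2 1 0
```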


Review of Knowledge Points

image-20230612185302524


5.3, Cache replacement algorithm (to solve the problem of full Cache)

5.3.1. Discussion on whether to use the replacement algorithm for the three mapping methods

image-20230613114327404

Fully associative mapping: Only after the Cache is full, it needs to be replaced.

Direct mapping: It can be directly replaced without considering the replacement algorithm.

Group associative mapping: Only when the specified group is full, it needs to be replaced, and it is necessary to choose which block to replace in the group.

Summary: Full associative and group associative maps need replacement algorithms.


5.3.2. Four replacement algorithms

5.3.2.1, random algorithm

image-20230613114436181

Random algorithm : if the Cache is full, a block is selected at random for replacement.

  • Implementation is simple, but the algorithm ignores the principle of locality, so the hit rate is low and actual performance is unstable.


5.3.2.2. First-in-first-out algorithm

image-20230613114524999

First-in-first-out algorithm : if the Cache is full, replace the block that was brought into the Cache earliest.

  • Implementation is simple: blocks 0, 1, 2, 3 are loaded into the Cache first and later replaced in that same order. The algorithm does not account for locality: the earliest-loaded block may still be accessed frequently.
  • Thrashing (jitter): blocks are frequently swapped in and out.

5.3.2.3. Least Recently Used Algorithm (LRU)

image-20230613114639421

Least Recently Used Algorithm (LRU) : each Cache line has a counter recording how long it has gone without being accessed. When the Cache is full, the line with the largest counter value is replaced.

  • Done by hand: from the current access, scan backward through the access sequence; the line whose most recent use lies furthest in the past is the one replaced.

Process :

  1. On a hit, the hit line's counter is cleared to 0, counters lower than it are incremented by 1, and the rest stay unchanged.
  2. On a miss with a free line available, the newly loaded line's counter is set to 0, and the counters of all other non-free lines are incremented by 1.
  3. On a miss with no free line, the line with the largest counter value is evicted, the newly loaded line's counter is set to 0, and all the others are incremented by 1.

Counter optimization : if the total number of Cache lines is 2^n, each counter needs only n bits, and once the Cache is full, no two counters ever hold the same value.

image-20230613114656354

Effect : The LRU algorithm is based on the principle of locality, which is reasonable and has excellent practical application effect.

Thrashing (jitter) : if the number of frequently accessed main memory blocks exceeds the number of Cache lines, thrashing still occurs.
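The three counter rules can be simulated on a tiny fully associative Cache (a sketch; `lines` holds `[block, counter]` pairs, or `None` for a free line):

```python
def lru_access(lines, block):
    for line in lines:
        if line and line[0] == block:           # rule 1: hit
            hit_count = line[1]
            for other in lines:
                if other and other[1] < hit_count:
                    other[1] += 1               # counters lower than it: +1
            line[1] = 0                         # hit line cleared to 0
            return "hit"
    for i, line in enumerate(lines):
        if line is None:                        # rule 2: miss, free line
            for other in lines:
                if other:
                    other[1] += 1
            lines[i] = [block, 0]
            return "miss"
    victim = max(lines, key=lambda l: l[1])     # rule 3: evict max counter
    for line in lines:
        line[1] += 1
    victim[0], victim[1] = block, 0             # newly loaded line set to 0
    return "replace"

cache = [None, None]                            # a 2-line Cache
results = [lru_access(cache, b) for b in [1, 2, 3, 1]]
print(results)   # prints ['miss', 'miss', 'replace', 'replace']
```

With only 2 lines and 3 hot blocks, the trace above already shows thrashing: block 1 is re-fetched immediately after being evicted.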


5.3.2.4. Least Frequently Used Algorithm (LFU)

image-20230613114724873

LFU algorithm : Each cache block also has a counter, which is used to record how many times each cache block has been accessed. When the cache is full, replace the one with the smallest "counter".

Counting rule : a newly loaded block's counter starts at 0 and is incremented by 1 on every access. When replacement is needed, the line with the smallest counter is selected.

  • If several lines tie for the smallest counter, break the tie by line number (e.g. choose the lowest line number first) or by first-in-first-out order.

Problem : a block that was heavily accessed at one time may never be used again (for example, blocks belonging to a finished WeChat video chat). LFU does not follow the principle of locality well, so its practical results are worse than LRU's.

  • Example: if a piece of code is used heavily for a while, its counter climbs very high; even if it is rarely used afterwards, the high count keeps it in the Cache for a long time without being replaced.
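A minimal LFU sketch with the tie-breaking described above (here a dict's insertion order stands in for "lowest line number first"):

```python
def lfu_access(cache, capacity, block):
    if block in cache:
        cache[block] += 1                        # hit: counter +1
        return "hit"
    if len(cache) < capacity:
        cache[block] = 0                         # new block starts at 0
        return "miss"
    victim = min(cache, key=lambda b: cache[b])  # smallest counter wins;
    del cache[victim]                            # min() breaks ties by order
    cache[block] = 0
    return "replace"

cache = {}
for b in [1, 1, 2, 3]:
    lfu_access(cache, capacity=2, block=b)
print(sorted(cache))   # prints [1, 3]: block 2 (count 0) was evicted
```

Note how block 1's count of 1 protects it even though block 2 was loaded more recently, which is exactly the behavior that can backfire when old hot blocks go cold.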

Summary of knowledge points

image-20230613114800309


5.4. Cache write strategy (modify the data copy in the cache to ensure the consistency of the data master copy in the main memory)

image-20230613150944669

Different processing is performed according to whether the cache hits or not.

Write hit : full write method, write back method.

Write miss : write allocation method, non-write allocation method.

Why not discuss the case of read hits and read misses?

  • The read operation will not cause data inconsistency between Cache and main memory.

5.4.1, write hit

5.4.1.1, write back method

image-20230613151513572

Write-back method : When the CPU hits the Cache, it only modifies the content in the Cache and does not immediately write it into the main memory, and writes it back to the main memory only when the block is swapped out.

Note : For whether the specified cache block has been modified, we can use a dirty bit to indicate its status, 1 means modified, 0 means not modified.

Advantages and disadvantages : It can reduce the number of memory accesses, but there is a hidden danger of data inconsistency.


5.4.1.2, full writing method (or writing straight through method)

image-20230613151545922

Full write method (or write-through method) : on a Cache write hit, the data must be written to both the Cache and main memory, generally via a write buffer.

Pros and cons : The number of memory accesses increases and the speed becomes slower, but it can better ensure data consistency.

Write buffer : a FIFO (first-in-first-out) queue implemented in SRAM, so writing to the buffer is fast.

Detailed process : on a write hit to Cache block #2, the CPU first updates block #2 in the Cache, then places data A into the write buffer. On a subsequent hit to block #1, it likewise updates the Cache first and then places B into the buffer, so the queue holds A followed by B. After these two quick steps the CPU moves on to other work, while a dedicated control circuit drains the write buffer into main memory one entry at a time.

  • Since the write buffer is implemented by SRAM, it is much faster for the CPU to write directly to SRAM than to main memory.

Pros and cons : with a write buffer the CPU writes very quickly. If write operations are infrequent, this works well; if they are frequent, the buffer may fill up (saturate) and stall the CPU.


5.4.2. Write miss

5.4.2.1, write allocation method

image-20230613152122021

Write allocation method : When the CPU misses writing to the Cache, the block in the main memory is transferred into the cache and modified in the cache. Usually used with the write-back method.


5.4.2.2, non-write allocation method

image-20230613152155153

Non-write allocation method : When the CPU misses writing to the Cache, it only writes to the main memory, and does not transfer to the Cache, with the full write method.

Note : only on a read miss is the corresponding main memory block brought into the Cache.
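The two usual pairings can be contrasted with a toy one-word "cache" (a sketch; a real Cache would also track dirty bits and handle evictions):

```python
def write(cache, memory, addr, value, policy):
    if policy == "write-back + write-allocate":
        if addr not in cache:
            cache[addr] = memory[addr]   # write miss: bring the block in
        cache[addr] = value              # modify only the Cache copy;
        # main memory is updated later, when the dirty block is evicted
    elif policy == "write-through + no-write-allocate":
        if addr in cache:
            cache[addr] = value          # hit: keep the Cache copy current
        memory[addr] = value             # always write main memory

cache, memory = {}, {0: "old"}
write(cache, memory, 0, "new", "write-back + write-allocate")
print(cache[0], memory[0])   # prints: new old  (memory stale until write-back)
```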


5.4.3, multi-level cache

image-20230613152344069

The closest to the CPU is L1, which contains the Write Buffer.

Example : Currently there are two levels of Cache, L1 and L2. L2 stores a small part of the data in the main memory, while L1 stores a small part of the data in L2.

Data consistency must also be maintained between the Cache levels; between levels, the "full write method" + non-write allocation is used.


6. Virtual memory

6.1. Page memory

6.1.1. Understanding page storage (leading out logical address, physical address, page table)

The transfer between cache and main memory is in units of blocks.

image-20230614164649230

Paging storage : a program (process) is logically divided into several equal-sized "pages"; the page size equals the main memory block size. A 4KB program can be divided into 4 pages, which can then be placed discretely at different locations in main memory.

  • The code data of the entire program will not be directly put into the memory continuously.

image-20230614164713410

  • The page number indicates which virtual page is accessed, and the page offset indicates which byte within that page. (In detail: the CPU adds the offset to the starting address of the physical page found in the page table entry to obtain the physical address .)

If a program is split into multiple blocks and stored discretely in the main memory, how to execute it ? Introduce the concept of logical address and physical address.

Logical address : from the programmer's perspective, a logical address is split into logical page number + in-page offset, where the logical page number identifies the program's page. The logical page number is mapped (via the page table) to a main memory block number (logical address mapped to physical address), and finally the physical block number is concatenated with the in-page offset.

  • For this mapping relationship, we store it in the page table, which contains the logical page number and the main memory block number.

image-20230614164822340

Page table : stored in main memory, so the CPU must perform one memory access to consult it during address translation. Each row of the page table is a page table entry, and one entry maps one logical page number to one main memory block number.


6.1.2. Address conversion process (slow table, fast table)

Address translation process based on slow table

image-20230614164950407

Address translation process : the page table base address register stores the address of the page table in main memory

Query the page table (slow table) process in main memory :

① The instruction gives an opcode and an address (000001 001000000011); splitting off the address part yields the logical address (001000000011), which equals logical page number + in-page offset (00 + 1000000011).

② Query the page table in main memory: logical page number 00 maps to main memory block 000000000010; concatenating the block number with the in-page offset gives the physical address 000000000010 1000000011. [One memory access to query the page table]

③ After obtaining the physical address, we will first go to the cache to query.

Address translation process based on fast table

image-20230614165009099

From the address translation process above, each time logical page 00 is accessed, the page table in main memory must be queried again. If later accesses also target page 00, repeating that memory access is inefficient. How can this be solved?

Based on this problem, let's introduce the fast table (TLB), and the page table we stored in the main memory before that can be called the slow table here. How is the whole process after the introduction of the fast table?

Fast table (TLB) : stored in a high-speed associative memory; each entry contains a tag (the logical page number) and the main memory block number. Initially the fast table is empty.

Query fast table process :

① Look up the given logical page number in the fast table; on a hit, the main memory block number is obtained directly. 【1 access to the fast table】

② On a miss, fall back to querying the page table (slow table) in main memory. 【1 extra memory access】

③ After the slow-table query returns the logical page number + main memory block number pair, the entry is stored in the fast table so it can be reused later.
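The fast-table lookup with slow-table fallback can be sketched as follows. The structure and table contents are illustrative assumptions, not from the notes; the point is the three-step flow: check the TLB, fall back to the page table on a miss, then fill the TLB entry.

```python
# Sketch of TLB lookup with fallback to the page table in main memory.
page_table = {0b00: 0b000000000010, 0b01: 0b000000000111}  # slow table (assumed values)
tlb = {}  # fast table starts empty: logical page number -> block number

def lookup(page_num: int) -> tuple:
    """Return (main memory block number, where it was found)."""
    if page_num in tlb:                 # (1) check the fast table first
        return tlb[page_num], "TLB hit"
    block = page_table[page_num]        # (2) miss: one extra access to the slow table
    tlb[page_num] = block               # (3) cache the entry for later reuse
    return block, "TLB miss"

print(lookup(0b00))  # first access: miss, entry is loaded into the TLB
print(lookup(0b00))  # second access: hit, no main-memory page-table access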


6.1.3. The difference between fast table and cache, fast table and slow table

Note the difference between fast table and cache:

  1. The fast table stores copies of page table entries, while the Cache stores copies of main memory blocks.
  2. The fast table speeds up address translation; the Cache speeds up the subsequent access to the translated physical address.

The difference between the fast table and the slow table:

  1. The fast table uses SRAM; the slow table uses DRAM.
  2. In terms of circuitry, the fast table uses "associative memory" and can be addressed by content.

The role of the fast table: to speed up address translation by saving one memory access on a hit.


knowledge review

image-20230614165050609


6.2, virtual memory

image-20230614173445392

Virtual storage system: the user perceives a memory capacity larger than the actual physical capacity.

For example: WeChat's program and data may total 1GB, but only a part of it is actually loaded into memory at any time. How do we manage which part is loaded?


6.2.1. Paging virtual memory (including hierarchical structure of memory)

image-20230614173523066

  • Main memory block number (the physical page number).
  • Valid bit: indicates whether the corresponding page has been brought into main memory; it is 1 if loaded, in which case the physical page / disk address field points into main memory.
  • Access bit: used by the page replacement algorithm; after the page table fills up, some page must be chosen for replacement. It can record the actual number of accesses to each page, which allows LFU replacement based on those counts (page replacement handles movement between main memory and auxiliary memory).
  • Dirty bit: if the contents of the main memory block have been modified, they must be written back later, so a dirty bit is set to record this.
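The page-table-entry fields above, and the LFU use of the access bit as a counter, can be sketched like this. Field names and table contents are illustrative assumptions, not from the notes.

```python
# Sketch of a page table entry (valid / access-count / dirty / frame fields)
# and an LFU victim choice driven by the access counter.
from dataclasses import dataclass

@dataclass
class PageTableEntry:
    valid: bool = False     # 1 if the page has been loaded into main memory
    access_count: int = 0   # access bit used as a counter for LFU replacement
    dirty: bool = False     # 1 if the page was modified and must be written back
    frame: int = -1         # physical page number (or disk address when not valid)

def touch(entry: PageTableEntry, write: bool = False) -> None:
    """Record one access; a write also marks the page dirty."""
    entry.access_count += 1
    if write:
        entry.dirty = True  # a modified page must be written back on eviction

def lfu_victim(table: dict) -> int:
    """Pick the resident page with the fewest accesses for replacement."""
    resident = {p: e for p, e in table.items() if e.valid}
    return min(resident, key=lambda p: resident[p].access_count)

table = {0: PageTableEntry(valid=True, frame=3),
         1: PageTableEntry(valid=True, frame=5)}
touch(table[0]); touch(table[0]); touch(table[1], write=True)
print(lfu_victim(table))  # page 1 was accessed least, so it is the LFU victim
```

On eviction, a victim with `dirty == True` would first be written back to auxiliary memory, while a clean page can simply be discarded.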

image-20230614173740724

memory hierarchy

Main memory-auxiliary storage: mainly completed by the operating system.

Cache-main memory: mainly done automatically by hardware.

image-20230614173838345


6.2.2. Segmented virtual memory

Comparison of page and segment memory :

image-20230614173901903

Segmented memory : the program is split by functional module, so segments have different sizes.

Virtual address structure : segment number + in-segment address

Segment table : compared with a page table entry, a segment length field is added; each entry contains the segment base address, load bit, and segment length.

  • Base address: the starting address of the segment in physical memory. Together with the in-segment offset it determines the actual physical address.
  • Load bit: indicates whether the segment has been loaded into main memory (analogous to the valid bit of a page table entry).
  • Segment length: the size of the segment. Since segments vary in size, the in-segment offset must be checked against the segment length to detect out-of-bounds accesses.
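The three segment-table fields can be exercised with a short sketch. The table contents and error handling are assumptions for illustration; the essential steps are the load-bit check, the bounds check against the segment length, and the base + offset addition.

```python
# Sketch of segmented address translation (base address, load bit, segment length).
segment_table = {
    0: {"base": 0x1000, "loaded": True, "length": 0x200},  # assumed entries
    1: {"base": 0x4000, "loaded": True, "length": 0x100},
}

def translate(seg_num: int, offset: int) -> int:
    entry = segment_table[seg_num]
    if not entry["loaded"]:
        raise RuntimeError("segment fault: segment not in main memory")
    if offset >= entry["length"]:      # bounds check against the segment length
        raise RuntimeError("address out of segment bounds")
    return entry["base"] + offset      # physical address = base address + offset

print(hex(translate(0, 0x10)))
```

Unlike paging, the offset here is added to the base rather than spliced onto a block number, because segment boundaries need not fall on fixed power-of-two sizes.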

6.2.3. Segment page virtual memory

Segment-page virtual memory : first segment the program, then page each segment. That is, split by functional module first, then divide each variable-sized segment into fixed-size pages (every page has the same capacity).

image-20230614174153840

Segment-page virtual storage divides the program's address space into multiple logical segments, and each logical segment is further divided into multiple fixed-size pages. When the program accesses memory, it generates a virtual address that is divided into a segment number and a page number (plus an in-page offset), each part with its own meaning.

Specifically, virtual addresses are broken down into the following two parts:

  1. Segment number: selects which logical segment is being accessed; it indexes the segment table, whose entry locates that segment's page table.
  2. Page number and offset: the page number indexes the segment's page table to find the required page table entry, while the page offset gives the location of the target unit (byte) within the page.

The logical address is then translated into a physical address: in a segment-page virtual storage system, the segment number and page number are used to retrieve the mapping from the segment table and the corresponding page table respectively. The resulting physical address is finally used to access the data actually located in physical memory.
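The two-level translation described above can be sketched as follows. Names, bit widths, and the per-segment page-table contents are illustrative assumptions; the point is that the segment number selects a page table, and the page number then selects a frame within it.

```python
# Sketch of segment-page address translation: segment table -> page table -> frame.
PAGE_OFFSET_BITS = 10

# Each segment has its own page table: page number -> physical page (frame) number.
segment_tables = {
    0: {0: 7, 1: 3},   # segment 0's page table (assumed mappings)
    1: {0: 12},        # segment 1's page table
}

def translate(seg_num: int, page_num: int, offset: int) -> int:
    page_table = segment_tables[seg_num]         # 1st lookup: the segment table
    frame = page_table[page_num]                 # 2nd lookup: that segment's page table
    return (frame << PAGE_OFFSET_BITS) | offset  # splice frame number with page offset

phys = translate(0, 1, 0b0000000101)
print(phys)  # frame 3 spliced with offset 5
```

Note the cost: without a TLB, every access needs one lookup in the segment table and one in the page table before the data itself can be fetched.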


Organizer: Long Road Time: 2023.5.28-6.14

Origin blog.csdn.net/cl939974883/article/details/131242207