Memory Series Learning (8): The Final Chapter - The End Is the Beginning

Foreword

Modern computers, whether embedded systems, PCs, or servers, are based on the von Neumann architecture. Its main features are: the CPU (Central Processing Unit, or processor for short) and memory (Memory) are the two principal components of a computer; memory stores both data and instructions; the CPU fetches (Fetch) instructions from memory and executes them, and some instructions make the CPU do calculations while others make it read or write data in memory. This chapter briefly introduces the CPU, memory, and devices that make up a computer and the relationships between them, laying the foundation for subsequent chapters.

Memory and Address

We have all seen banks of mailboxes hanging on a wall, as shown in Figure 16.1. Each mailbox is labeled with a room number, and letters are put in or taken out of the mailbox found by that number. Memory works similarly: each memory cell has an address (Address), an integer numbered from 0, and the CPU uses an address to find the corresponding memory cell, fetch the instruction in it, or read and write the data in it. Unlike a mailbox, the memory cell at one address cannot hold many things: it stores exactly one byte. Multi-byte data types such as int and float mentioned earlier occupy several consecutive addresses in memory, in which case the address of the data is the starting address of the memory cells it occupies.
[Figure 16.1: mailboxes on a wall, each identified by a room number, analogous to memory cells identified by addresses]
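A minimal C sketch can make this concrete. It prints the size and starting address of a char and an int, and then the addresses of the int's individual bytes, showing that a multi-byte object occupies consecutive addresses; the actual addresses printed depend on the system and change from run to run.

```c
#include <stdio.h>

int main(void)
{
    char c = 'A';
    int n = 3;

    /* Each object occupies one or more consecutive bytes in memory;
       its address is the address of the first byte it occupies. */
    printf("c occupies %zu byte  at %p\n", sizeof c, (void *)&c);
    printf("n occupies %zu bytes at %p\n", sizeof n, (void *)&n);

    /* The bytes of n live at consecutive addresses. */
    unsigned char *p = (unsigned char *)&n;
    for (size_t i = 0; i < sizeof n; i++)
        printf("byte %zu of n is at %p\n", i, (void *)(p + i));
    return 0;
}
```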

CPU

The CPU does the same thing over and over: fetch an instruction from memory, interpret and execute it, then fetch and execute the next one (a toy simulation of this loop is sketched after the lists below). The core functional units of the CPU include:

  • Registers. A register is a small amount of high-speed storage inside the CPU. It holds data just like memory does, but it is much faster to access than memory. In later chapters we will introduce the x86 registers eax, esp, eip, and so on in detail. Some registers can only be used for a specific purpose, for example eip is used as the program counter; these are called special-purpose registers (Special-purpose Register). Other registers can take part in all kinds of arithmetic and memory-access instructions, for example eax; these are called general-purpose registers (General-purpose Register).

  • Program Counter (PC). This is a special register that holds the address of the next instruction the CPU will fetch. The CPU fetches an instruction from memory at the address saved in the program counter and then interprets and executes it; meanwhile the program counter is automatically advanced by the length of that instruction, so that it points to the next instruction in memory.

  • Instruction Decoder. An instruction fetched by the CPU consists of several bytes. Some bits in these bytes encode a memory address, some encode a register number, and some encode what the instruction does, whether it is an addition, subtraction, multiplication, division, or a memory read or write. The instruction decoder is responsible for interpreting the meaning of the instruction and then dispatching the appropriate execution unit to carry it out.

  • Arithmetic and Logic Unit (ALU). If the decoder determines that an instruction is an arithmetic or logic instruction, it dispatches the ALU to perform the operation, for example addition, subtraction, multiplication, division, bit operations, or logical operations. The instruction indicates where to save the result, which may be a register or a memory location.

  • Address and data bus (Bus). The CPU and the memory are connected by an address bus, a data bus, and control lines, and each line carries one of the two states 1 and 0. If an instruction needs to access memory while it executes, for example reading a number from memory into a register, the process can be pictured like this (as shown in Figure 16.2):
    [Figure 16.2: the CPU connected to memory by 32 address lines, 32 data lines, and control lines]

  1. Inside the CPU, the register is connected to the data bus, so that each bit of the register is attached to one data line, waiting to receive data.

  2. The CPU sends a read request over the control lines and puts the memory address on the address lines.

  3. When the memory receives the address and the read request, it connects the corresponding memory cell to the other end of the data bus, so that the 1 or 0 state of each bit of that cell travels along a data line to the corresponding bit of the CPU register, completing the data transfer.

  4. Writing data to memory works similarly, except that data travels in the opposite direction on the data lines.
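The fetch-decode-execute loop described above can be pictured with a deliberately toy simulation in C. The two-byte instruction format, the opcodes, and the 16-byte memory are all invented for this sketch and bear no relation to a real instruction set such as x86; the point is only the shape of the loop: fetch, advance the program counter, decode, execute.

```c
#include <stdio.h>
#include <stdint.h>

/* A toy machine: 16 bytes of "memory", one general-purpose register,
   and an invented two-byte instruction format [opcode][operand]. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

int main(void)
{
    uint8_t mem[16] = {
        OP_LOAD,  12,   /* reg = mem[12]      */
        OP_ADD,    1,   /* reg = reg + 1      */
        OP_STORE, 13,   /* mem[13] = reg      */
        OP_HALT,   0,
        0, 0, 0, 0,
        3,              /* data at address 12 */
        0, 0, 0
    };
    uint8_t pc  = 0;    /* program counter */
    uint8_t reg = 0;    /* general-purpose register */

    for (;;) {
        uint8_t op  = mem[pc];        /* fetch the instruction        */
        uint8_t arg = mem[pc + 1];
        pc += 2;                      /* point to the next instruction */
        switch (op) {                 /* decode and execute            */
        case OP_LOAD:  reg = mem[arg];  break;
        case OP_ADD:   reg = reg + arg; break;
        case OP_STORE: mem[arg] = reg;  break;
        case OP_HALT:  printf("mem[13] = %d\n", mem[13]); return 0;
        }
    }
}
```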

Figure 16.2 shows 32 address lines and 32 data lines, and the CPU registers are also 32 bits wide, so this architecture can be called 32-bit; x86 is such an architecture. Mainstream processors today are 32-bit or 64-bit. The address lines, data lines, and CPU registers usually have the same number of bits. From Figure 16.2 it is clear that the data lines and the CPU registers should match in width; in addition, some registers (such as the program counter) need to hold a memory address, so the address lines should also match the register width. The number of bits of a processor is also called its word length. The concept of a word (Word) is confusing: in some contexts it means 16 bits, in others it means 32 bits (in which case 16 bits is called a Half Word), and in still others it means the word length of the processor, so that a word is 32 bits on a 32-bit processor and 64 bits on a 64-bit processor. A 32-bit computer has 32 address lines, giving an address space (Address Space) of 0x00000000~0xffffffff, or 4GB in total, while a 64-bit computer has a much larger address space.
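A quick way to see the word length a program was compiled for is to print the size of a pointer; on most Unix-like systems long has the same size, though the C standard does not guarantee that.

```c
#include <stdio.h>

int main(void)
{
    /* On a typical 32-bit build a pointer is 4 bytes; on a 64-bit
       build it is 8 bytes. This reflects the size of an address,
       i.e. the word length the program targets. */
    printf("sizeof(void *) = %zu bytes\n", sizeof(void *));
    printf("sizeof(long)   = %zu bytes\n", sizeof(long));
    return 0;
}
```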

Finally, note that the address lines and data lines discussed in this section are the CPU's internal bus, which connects directly to the CPU's execution units. The internal bus passes through the MMU and the bus interface before being brought out to the chip pins as the external bus, and the external address and data lines do not necessarily have the same width as the internal ones. For example, the external address bus of a 32-bit processor may be able to address more than 4GB; this will be explained in detail in Section 16.4.

Now let us walk through the CPU's fetch-and-execute process for the program in Table 1.1, as shown in Figure 16.3.

[Figure 16.3: the example program's instructions and data laid out in memory for the fetch-execute walkthrough below]

  1. The eip register points to address 0x80483a2; the CPU fetches the 5-byte instruction that starts there, and eip then points to 0x80483a7, the start of the next instruction.
  2. The CPU decodes these 5 bytes and learns that the instruction asks it to load the 4 bytes starting at address 0x804a01c into the eax register.
  3. The CPU executes the instruction, reads memory, obtains the number 3, and saves it in the eax register. Note that the four bytes stored at addresses 0x804a01c~0x804a01f are not to be read as 0x03000000 in order from low address to high address, but as 0x00000003 in order from high address to low address. In other words, for multi-byte integer types the low address holds the low-order bits of the integer; this is called little-endian byte order (Little Endian Byte Order). The x86 platform is little-endian, while some other platforms specify that the low address holds the high-order bits of the integer, which is called big-endian (Big Endian) byte order; a small C sketch after this list illustrates the difference. Also note that Figure 16.3 only shows the first three steps; readers can draw their own pictures to follow the remaining steps.
  4. The CPU fetches the 3-byte instruction at the address pointed to by eip, and eip then points to 0x80483aa, the start of the next instruction.
  5. The CPU decodes these 3 bytes and learns that the instruction asks it to add 1 to the value in the eax register and keep the result in eax.
  6. The CPU executes the instruction; the number in eax is now 4.
  7. The CPU fetches the 5-byte instruction at the address pointed to by eip, and eip then points to 0x80483af, the start of the next instruction.
  8. The CPU decodes these 5 bytes and learns that the instruction asks it to store the value of the eax register into the 4 bytes starting at address 0x804a018.
  9. The CPU executes the instruction and stores the value 4 into the 4 bytes starting at address 0x804a018 (in little-endian byte order).
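The little-endian layout described in step 3 is easy to observe from C by looking at an integer one byte at a time. The sketch below assumes nothing beyond standard C; its output depends on the byte order of the machine it runs on.

```c
#include <stdio.h>

int main(void)
{
    unsigned int n = 3;                      /* 0x00000003 */
    unsigned char *p = (unsigned char *)&n;  /* view the same bytes one by one */

    /* On a little-endian machine such as x86 this prints 03 00 00 00:
       the low byte of the integer is stored at the low address.
       On a big-endian machine it would print 00 00 00 03. */
    for (size_t i = 0; i < sizeof n; i++)
        printf("byte at offset %zu: %02x\n", i, p[i]);
    return 0;
}
```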

Devices

Besides accessing memory, the CPU also needs to execute instructions that access many devices (Device), such as the keyboard, mouse, hard disk, and monitor. How are these connected to the CPU? See Figure 16.4.
[Figure 16.4: devices and memory chips attached to the processor's address and data buses, plus devices integrated inside the chip]

Some devices, like memory chips, are attached to the processor's address and data buses. It is called a "bus" precisely because multiple devices and memory chips can be attached to the same address and data lines, but different devices and chips must occupy different address ranges. Accessing such a device looks just like accessing memory: you read and write by address. Unlike memory, however, writing data to an address may simply send a command to the device, and the data need not be stored anywhere; likewise, reading from an address does not necessarily return data previously stored there, but rather the current state of the device. The readable and writable cells in a device are usually called device registers (note that these are not the same as CPU registers), and operating a device is a matter of reading and writing these device registers. For example, writing data to a serial port's transmit register sends that data out over the serial line, and reading the serial port's receive register retrieves the data the serial port has received.

Some devices are also integrated into the processor chip. In Figure 16.4, one end of the address and data bus coming out of the CPU core is brought out to the chip pins through the bus interface, and the other end is not brought out but connects to devices integrated inside the chip. Whether a device sits on the bus outside the CPU or on the bus inside it, it has its own address range and can be accessed just like memory. Many architectures (such as ARM) operate devices this way, which is called memory-mapped I/O (Memory-mapped I/O).

x86, however, is special: it has an independent port address space for devices. The CPU core has additional address lines (separate from those used to access memory) to reach on-chip devices, and device registers are accessed with the dedicated in/out instructions rather than with the same instructions used to access memory. This approach is called port I/O (Port I/O).
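The two access styles can be sketched in C the way they typically appear in bare-metal or kernel code. The UART register addresses and layout below are invented for illustration, and an ordinary user-space program cannot run this as-is: memory-mapped registers must actually be mapped, and the x86 out instruction needs I/O privilege. The sketch only shows the idioms: a volatile pointer for memory-mapped I/O, and the dedicated out instruction for port I/O.

```c
#include <stdint.h>

/* Memory-mapped I/O, bare-metal style: a device register is just an
   address, but it must be accessed through a volatile pointer so the
   compiler does not optimize the reads and writes away. The address
   0x10009000 and the register layout are invented for this sketch. */
#define UART_TX ((volatile uint32_t *)0x10009000)  /* transmit register */
#define UART_RX ((volatile uint32_t *)0x10009004)  /* receive register  */

static void mmio_send_byte(uint8_t b)
{
    *UART_TX = b;               /* writing the register sends the byte out */
}

static uint8_t mmio_recv_byte(void)
{
    return (uint8_t)*UART_RX;   /* reading the register returns received data */
}

/* Port I/O, x86 style: device registers live in a separate port address
   space reached with the special in/out instructions, here via inline
   assembly. Port 0x3f8 is the conventional base of the first PC serial
   port; running this requires I/O privilege. */
static void port_send_byte(uint8_t b)
{
    __asm__ volatile ("outb %0, %1" : : "a"(b), "Nd"((uint16_t)0x3f8));
}
```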

From the CPU's point of view there are only two ways to access a device, memory-mapped I/O and port I/O: either access it like memory, or access it with dedicated instructions. In reality, accessing devices is more complicated. Computers have a wide variety of devices with very different performance requirements: some need high bandwidth, some need fast response, and some need hot plugging. Hence the many device buses such as PCI, AGP, USB, 1394, and SATA. These device buses are not connected to the CPU directly; the CPU reaches the corresponding bus controller through memory-mapped I/O or port I/O, and then reaches the devices on that bus through the controller. So a box labeled "Device" in the figure above may be an actual device, or it may be the controller of a device bus.

On the x86 platform, the hard disk is a device attached to the IDE, SATA, or SCSI bus. A program stored on the hard disk cannot be fetched and executed by the CPU directly; when a program is run, the operating system copies it from the hard disk into memory so that the CPU can fetch its instructions and execute them. This process is called loading. Once loaded into memory, the program becomes a task scheduled and executed by the operating system, called a process. Processes and programs are not in one-to-one correspondence.

One program can be loaded into memory several times and run as several simultaneous processes. For example, you can open several terminal windows at the same time, each running a Shell process, and all of them correspond to the same /bin/bash file on disk.

The operating system (Operating System) itself is also a program stored on disk. When the computer starts, it runs a fixed piece of startup code (called the Bootloader), which loads the operating system from disk into memory; the operating system then loads whatever other programs are needed. The difference between the operating system and ordinary user programs is that the operating system stays resident in memory, while user programs do not necessarily: whatever program the user wants to run, the operating system loads it into memory, and when the user no longer needs it, the operating system terminates it and releases the memory it occupied.

The core job of the operating system is to manage process scheduling, manage memory allocation and use, and manage the various devices; the program that does this work is called the kernel (Kernel). On my system the kernel is the file /boot/vmlinuz-2.6.28-13-generic, which is loaded into memory when the computer starts and stays resident there. In a broader sense the operating system also includes some essential user programs: for example, a Shell is indispensable on every Linux system, while an Office suite is optional, so the former also belongs to the operating system in the broad sense, while the latter is application software.

Accessing devices also differs from accessing memory in another way. Memory only stores data and never produces new data; if the CPU does not read it, it never needs to supply data to the CPU on its own initiative, so memory always waits passively to be read or written. A device, by contrast, often produces data on its own and needs to actively notify the CPU to read it. For example, pressing a key on the keyboard produces an input character, and the user expects the computer to respond to the input immediately, which requires the keyboard device to actively notify the CPU to read the character, process it, and respond to the user. This is achieved by the interrupt (Interrupt) mechanism. Each device has an interrupt line connected to the CPU through an interrupt controller. When a device needs to notify the CPU, it raises an interrupt signal; the instruction stream the CPU is executing is interrupted, and the program counter is made to point to a fixed address (defined by the architecture), so the CPU starts fetching instructions from that address (in effect jumping there) and executes the Interrupt Service Routine (ISR). When interrupt handling is finished, the CPU returns to where it was interrupted and continues executing the subsequent instructions.

For example, if an architecture specifies that the CPU jumps to address 0x00000010 when an interrupt occurs, then an ISR must have been loaded at that address beforehand. The ISR is part of the kernel code; it first determines which device caused the interrupt and then calls that device's interrupt handler for further processing.

Since every kind of device is operated differently, each device needs a dedicated device driver (Device Driver), and an operating system needs a large number of drivers to support a wide range of devices; in fact, the vast majority of the Linux kernel source code consists of device drivers. A device driver is usually a set of functions in the kernel that implement operations such as initializing, reading, and writing the device by reading and writing its device registers, and some devices also provide an interrupt-handling function for the ISR to call; a schematic sketch follows.
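Here is the promised schematic sketch of a driver for a hypothetical UART, written in a bare-metal style rather than against the real Linux driver API. Every register address, bit name, and function name is invented; the point is that initialization, writing, and interrupt handling all come down to reading and writing device registers.

```c
#include <stdint.h>

/* Hypothetical UART registers; addresses and bit layout are invented. */
#define UART_BASE   0x10009000u
#define UART_DATA   (*(volatile uint32_t *)(UART_BASE + 0x0))
#define UART_STATUS (*(volatile uint32_t *)(UART_BASE + 0x4))
#define UART_CTRL   (*(volatile uint32_t *)(UART_BASE + 0x8))

#define STATUS_RX_READY (1u << 0)   /* a received byte is waiting      */
#define STATUS_TX_EMPTY (1u << 1)   /* transmitter can accept a byte   */
#define CTRL_ENABLE     (1u << 0)
#define CTRL_IRQ_ENABLE (1u << 1)

/* Initialization: turn the device on and enable its interrupt. */
static void uart_init(void)
{
    UART_CTRL = CTRL_ENABLE | CTRL_IRQ_ENABLE;
}

/* Write: wait until the device is ready, then write the data register. */
static void uart_write_byte(uint8_t b)
{
    while (!(UART_STATUS & STATUS_TX_EMPTY))
        ;                           /* busy-wait for the transmitter */
    UART_DATA = b;
}

/* Interrupt handler: called by the kernel's ISR after it has determined
   that this device raised the interrupt. */
static uint8_t uart_interrupt_handler(void)
{
    if (UART_STATUS & STATUS_RX_READY)
        return (uint8_t)UART_DATA;  /* reading the register fetches the byte */
    return 0;
}
```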

MMU

Modern operating systems generally use a virtual memory management (Virtual Memory Management) mechanism, which requires support from the MMU (Memory Management Unit) in the processor. This section briefly introduces what the MMU does.

First, two concepts: virtual address and physical address. If the processor has no MMU, or has one but it is disabled, the memory address issued by the CPU execution unit goes directly to the chip pins and is received by the memory chip (referred to below as physical memory, to distinguish it from virtual memory); this address is called the physical address (Physical Address, PA below), as shown in Figure 16.5.
[Figure 16.5: with the MMU disabled, the address from the CPU execution unit goes directly to the chip pins as a physical address]
If the MMU is enabled, the memory address issued by the CPU execution unit is intercepted by the MMU. The address going from the CPU to the MMU is called a virtual address (Virtual Address, VA below); the MMU translates it into another address and puts that on the CPU chip's external address pins, that is, it maps the VA to a PA, as shown in Figure 16.6.

[Figure 16.6: with the MMU enabled, the virtual address from the CPU execution unit is translated by the MMU into a physical address on the chip's external address pins]
On a 32-bit processor the internal address bus is 32 bits wide and connects to the CPU execution unit (the figure schematically shows only 4 address lines), but the external address bus after MMU translation is not necessarily 32 bits wide. In other words, the virtual address space and the physical address space are independent: a 32-bit processor has a 4GB virtual address space, while the physical address space may be either larger or smaller than 4GB. The MMU maps VAs to PAs in units of pages (Page); on a 32-bit processor the page size is usually 4KB.

For example, through one mapping entry the MMU can map the VA page 0xb7001000~0xb7001fff to the PA page 0x2000~0x2fff; if the CPU execution unit then accesses virtual address 0xb7001008, the physical address actually accessed is 0x2008. Pages of physical memory are called physical pages or page frames (Page Frame).

Which page of virtual memory is mapped to which page frame of physical memory is described by a page table (Page Table), which is stored in physical memory; the MMU looks up the page table to determine which PA a given VA should be mapped to.
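The translation in the example above can be modeled in a few lines of C. This is only a conceptual model: a real page table is a multi-level structure maintained by the operating system and walked by the MMU hardware, whereas the lookup_frame function below simply hard-codes the single mapping from the text.

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12      /* 4KB pages: the low 12 bits are the offset */
#define PAGE_MASK  0xfffu

/* Toy "page table": one mapping, VA page 0xb7001000 -> PA page 0x2000. */
static uint32_t lookup_frame(uint32_t vpn)
{
    if (vpn == (0xb7001000u >> PAGE_SHIFT))
        return 0x2000u >> PAGE_SHIFT;
    return 0;              /* unmapped in this toy model */
}

int main(void)
{
    uint32_t va  = 0xb7001008u;
    uint32_t vpn = va >> PAGE_SHIFT;       /* virtual page number    */
    uint32_t off = va & PAGE_MASK;         /* offset within the page */
    uint32_t pa  = (lookup_frame(vpn) << PAGE_SHIFT) | off;

    printf("VA 0x%08x -> PA 0x%08x\n", va, pa);   /* prints 0x00002008 */
    return 0;
}
```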

The operating system and MMU cooperate like this:

  1. When the operating system initializes, or when it allocates or releases memory, it executes instructions that fill in the page table in physical memory, and then uses instructions to configure the MMU, telling it where in physical memory the page table is.
  2. Once this is set up, every time the CPU executes an instruction that accesses memory, the MMU automatically performs the page-table lookup and address translation; the translation is done entirely by hardware and does not need to be driven by instructions.

The variables and functions we use in a program all have addresses; after the program is compiled, these become the addresses in its instructions, and the addresses in instructions, once interpreted and executed by the CPU, become the memory addresses issued by the CPU execution unit. Therefore, when the MMU is enabled, the addresses used in a program are all virtual addresses, and every one of them triggers the MMU's lookup and translation. So why design such an elaborate memory management mechanism? What do we gain from the extra layer of VA-to-PA translation? "All problems in computer science can be solved by another level of indirection." Remember that quote? This extra level of indirection must exist to solve some problem. After the necessary groundwork has been laid, the role of the virtual memory management mechanism will be discussed in a later section.

Besides address translation, the MMU also provides a memory protection mechanism. Processor architectures distinguish a user mode (User Mode) and a privileged mode (Privileged Mode). As shown in Figure 16.7, the operating system can set access permissions for each memory page in the page table: some pages may not be accessed at all, some may be accessed only when the CPU is in privileged mode, and some may be accessed in both user mode and privileged mode; the permissions are further divided into readable, writable, and executable. With this set up, whenever the CPU is about to access a VA, the MMU checks whether the CPU is currently in user mode or privileged mode and whether the access is a data read, a data write, or an instruction fetch; if this matches the page permissions set by the operating system, the access is allowed and translated to a PA, otherwise the access is refused and an exception is raised.
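The permission check can be sketched the same way. The page-table-entry bit layout below is invented (real architectures encode these bits differently), but the logic, checking present, user/privileged, and read/write/execute permissions before producing a PA, is the essence of what the MMU does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented page-table-entry permission bits for illustration only. */
#define PTE_PRESENT (1u << 0)   /* page is mapped                      */
#define PTE_WRITE   (1u << 1)   /* writable                            */
#define PTE_EXEC    (1u << 2)   /* instructions may be fetched from it */
#define PTE_USER    (1u << 3)   /* accessible in user mode             */

enum access { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH };

/* Returns true if the access may proceed; otherwise the MMU would
   raise an exception instead of producing a physical address. */
static bool access_allowed(uint32_t pte, enum access how, bool user_mode)
{
    if (!(pte & PTE_PRESENT))
        return false;                              /* unmapped page   */
    if (user_mode && !(pte & PTE_USER))
        return false;                              /* kernel-only page */
    if (how == ACCESS_WRITE && !(pte & PTE_WRITE))
        return false;                              /* read-only page  */
    if (how == ACCESS_FETCH && !(pte & PTE_EXEC))
        return false;                              /* not executable  */
    return true;
}
```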

Exception handling proceeds much like interrupt handling, except that an interrupt is raised by an external device while an exception is raised by the CPU itself: the cause of an interrupt has nothing to do with the instruction the CPU is currently executing, whereas an exception is caused by a problem with the currently executing instruction. For example, a memory access that the MMU detects as a permission violation, or a division instruction whose divisor is 0, raises an exception.

[Figure 16.7: per-page access permissions recorded in the page table and checked by the MMU]

The operating system usually divides the virtual address space into user space and kernel space. For example, on the x86 platform the virtual address space of a Linux system is 0x00000000~0xffffffff; the first 3GB (0x00000000~0xbfffffff) is user space and the last 1GB (0xc0000000~0xffffffff) is kernel space. User programs are loaded into user space and executed in user mode; they can neither access data in the kernel nor jump into kernel code to execute it.

This protects the kernel: if a process accesses an illegal address, at worst that process crashes, without affecting the stability of the kernel or of the whole system. When the CPU takes an interrupt or an exception, it not only jumps to the interrupt or exception service routine but also automatically switches mode, from user mode to privileged mode, so the interrupt or exception service routine is able to jump into kernel code and execute it. In fact, the whole kernel is made up of the various interrupt and exception handlers.

To sum up: under normal circumstances the processor executes user programs in user mode; when an interrupt or exception occurs, it switches to privileged mode to execute kernel code, and after the interrupt or exception has been handled it returns to user mode and continues executing the user program.

We have run into segmentation faults many times; a segmentation fault is produced like this (a small program that triggers one follows the list):

  1. The user program tries to access a VA that fails the MMU's permission check.
  2. The MMU raises an exception, the CPU switches from user mode to privileged mode, and it jumps into kernel code to execute the exception service routine.
  3. The kernel interprets this exception as a segmentation fault and terminates the process that caused it.
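The following small C program triggers exactly this sequence by writing through a null pointer: address 0 is normally not mapped in the process's page table, so the MMU raises an exception and the kernel kills the process, which the shell reports as a segmentation fault.

```c
#include <stdio.h>

int main(void)
{
    int *p = NULL;

    /* The VA 0 fails the MMU's check, the CPU traps into the kernel,
       and the kernel delivers SIGSEGV to this process. */
    *p = 1;

    printf("never reached\n");
    return 0;
}
```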

Memory Hierarchy

The hard disk, memory, CPU registers, and cache discussed in this section are all forms of storage. Why does a computer have so many kinds of storage, and what are the characteristics of each? That is the question for this section. Because of the limits of hardware technology, we can build storage that is small but fast to access, or storage that is large but slow to access, but we cannot have both at once: it is not feasible to build large-capacity storage that is also fast.

Modern computers therefore organize storage into several levels, called the memory hierarchy (Memory Hierarchy). Ordered from nearest to the CPU to farthest, they are the CPU registers, the cache, main memory, and the hard disk; the closer to the CPU, the smaller the capacity but the faster the access. The figure shows typical values for the capacity and access speed of each kind of storage, and the table describes them in detail.
[Figure: typical capacity and access speed of each level of the memory hierarchy]
[Table: characteristics of CPU registers, cache, memory, and hard disk]

The table is summarized as follows.
● Data in registers, the cache, and memory is lost when power is turned off; they are volatile memory (Volatile Memory). By contrast, the hard disk is a kind of non-volatile memory (Non-volatile Memory).
● Apart from registers, whose access is directly controlled by program instructions, access to the other kinds of storage is not directly controlled by instructions: some of it is completed automatically by hardware, and some by the operating system working together with the hardware.

● When the cache fetches data from memory it prefetches a whole cache line (Cache Line), and when the operating system reads data from the hard disk it prefetches and caches several pages, in the hope that the program will soon access that data. Most programs exhibit locality: they spend most of their time repeatedly executing a small piece of code (such as a loop), or repeatedly accessing data in a small address range (such as an array). So prefetching plus caching works very well: when the CPU fetches an instruction, the neighboring instructions are cached too, and the CPU is very likely to fetch them next; when the CPU accesses a piece of data, the neighboring data is cached too, and the CPU is very likely to access it next (the sketch after this list makes the effect measurable).
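The effect of locality is easy to measure. The sketch below sums the same large array twice, once row by row (consecutive addresses, cache-friendly) and once column by column (large strides, cache-hostile). The absolute times depend on the machine and compiler settings, but the row-major pass is typically several times faster.

```c
#include <stdio.h>
#include <time.h>

#define N 4096

static int a[N][N];   /* row-major: a[i][0..N-1] sit at consecutive addresses */

int main(void)
{
    long sum = 0;
    clock_t t;

    /* Row by row: consecutive accesses touch consecutive addresses, so
       almost every access hits data already pulled into the cache. */
    t = clock();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    printf("row-major:    %.3f s\n", (double)(clock() - t) / CLOCKS_PER_SEC);

    /* Column by column: each access jumps N*sizeof(int) bytes ahead, so
       the cache line fetched for one element is rarely reused. */
    t = clock();
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    printf("column-major: %.3f s\n", (double)(clock() - t) / CLOCKS_PER_SEC);

    return (int)(sum & 1);   /* use sum so the loops are not optimized away */
}
```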

Imagine two computers, one with a 256KB cache and one without; both have 512MB of memory and a 100GB hard disk. Although the extra 256KB of cache is negligible next to the capacity of the memory and the hard disk, accessing the cache is orders of magnitude faster than accessing memory or the hard disk, and because of locality the CPU spends most of its time dealing with the cache, so the computer with the cache is noticeably faster. Fast storage can only be made small, yet it can still improve a computer's performance dramatically: that is the point of the memory hierarchy.

Summary

The memory series really, truly ends here!!!


Origin: blog.csdn.net/weixin_45264425/article/details/132315782