Principles of Computers and Operating Systems

1. The concept of the operating system

Definition: In essence, the operating system abstracts the underlying hardware into a virtual machine, so the computer the user sees is a virtual machine. The computer by itself does nothing; it is just a pile of metal, and even powered on it does nothing, because the CPU only acts under the direction of a program. Starting the operating system is therefore a bootstrapping process: at the moment of power-on, the code in a ROM chip on the motherboard is automatically mapped into the low address space of memory. This ROM chip stores the BIOS.

2. The five core components

In the von Neumann architecture, a computer has five components: the arithmetic unit, the controller, the memory, input devices, and output devices. The core of the CPU is the arithmetic unit, the controller, and the registers.

Arithmetic unit : responsible for arithmetic operations, logical operations, and so on.

Controller : controls the execution of instructions, including the procedures for fetching data. A program is made up of instructions + data.

Registers : temporarily hold the data that has been fetched, and also hold the intermediate results of calculations.

The arithmetic unit, under the control of the controller, continuously reads data from the registers. The registers are the fastest storage in the computer; they are called registers (temporary storage) because the data in them can be refreshed in step with the CPU's own clock frequency. Next come the L1 cache, the L2 cache, and the L3 cache, and then main memory outside them. From the inside out, the cost gets lower, the access speed gets slower, and the capacity gets larger. The L1 cache is further split into an L1 instruction cache and an L1 data cache; the L2 cache makes no such distinction. On a multi-core CPU, each core has its own L1 and L2 caches, while the L3 cache is shared.

Registers can store data, but their capacity is far too small for them to serve as the main storage component, so the CPU must work with the memory device.

Memory (RAM) : consists of many storage units; one byte (one cell) is one storage unit. Each cell has its own address, usually written in hexadecimal.

If the CPU wants to access data, it must know the data's address in memory; that is, it must be able to address the data.

Northbridge chip (NorthBridge) : handles high-speed signals, usually the communication among the CPU (processor), RAM (memory), the AGP or PCI Express port, and the southbridge chip. In other words, it establishes the connection between the CPU's circuitry and the RAM's storage circuitry.

32-bit CPU : the CPU effectively has 32 address lines connected to memory, and each line carries a one-bit signal with two possible values, 0 and 1. The 32 lines together determine 2^32 distinct positions, and each position holds 1 byte, the basic unit of memory, so the maximum memory that can be addressed is 2^32 bytes = 4 GB. Likewise, a 64-bit CPU can address 2^64 bytes, i.e. 4G * 4G, more than four billion times 4 GB.
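The arithmetic above can be checked with a small sketch (the function name is illustrative):

```python
# Addressable space for an N-bit address bus: each of the N lines carries
# one bit, so there are 2**N distinct addresses, each naming one byte.
def addressable_bytes(address_lines: int) -> int:
    return 2 ** address_lines

GiB = 2 ** 30
print(addressable_bytes(32) // GiB)   # 32-bit CPU: 4 (GiB)
```

The same function also confirms the PAE figure discussed below: 36 address bits give 64 GB.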

 

The CPU needs lines for addressing, lines for reading data, and lines for control commands. If each purpose were given its own 32 lines, the wiring would become quite complicated, so the CPU multiplexes these lines (line multiplexing), using control bits to distinguish which transfers are data reads, which are addresses, and so on.

PAE (Physical Address Extension) : adds 4 bits to the 32-bit address bus, which is equivalent to 2^36 bytes = 64 GB, but it requires an operating system kernel that supports addressing 64 GB. On a 32-bit operating system, regardless of whether the kernel supports PAE, the address space usable by a single process cannot exceed 3 GB; the remaining 1 GB is mapped to the kernel. For example, MySQL runs as a single process with multiple threads, and on a 32-bit operating system the maximum memory it can actually use is only about 2.7 GB, so it is best to run 64-bit MySQL on a 64-bit operating system.

This raises a question: why does caching improve speed? Because of the principle of program locality, what we often call the 80/20 rule: the most frequently executed code in a program is often only 20% of it, the other 80% is rarely used, and that 20% of the code performs 80% of the program's work. We can keep that 20% of the code in the CPU's L1 or L2 cache; since the caches inside the CPU run at speeds closest to the CPU's clock frequency, the program runs faster.

The general idea behind cache algorithm design is the least-recently-used (LRU) principle: evict the least recently used data from the cache, since cache space is limited. (If cache cost the same as memory, there would be no need to design a cache at all.) The principle of program locality divides into spatial locality and temporal locality. Spatial locality: when a piece of code is accessed, the code around it is very likely to be accessed too. Temporal locality: if a piece of code is accessed at some moment, it is very likely to be accessed again soon after.
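The LRU eviction idea can be sketched in a few lines of Python; this is a minimal illustration of the principle, not any real CPU cache's mechanism:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: on overflow, evict the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order tracks recency

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
```

For example, with capacity 2, after inserting "a" and "b" and then touching "a", inserting "c" evicts "b", the least recently used entry.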

N-way associative : the L1 cache of a PC is usually 64 KB, while memory is far larger. The data the CPU reads must come from the L1 cache; on an L1 miss it looks in the L2 cache, and if found there the data replaces a line in L1, and so on outward. Mapping each storage unit in RAM to exactly one location in the L1 cache is called direct mapping; because of the huge size difference between the L1 cache and memory, the probability of the CPU hitting the data it needs is then extremely low, so N-way associativity is introduced to improve the hit rate. In principle, RAM is divided into blocks and the L1 cache into corresponding sets. As shown in the figure below, in so-called 1-way association (1 way), memory blocks 00, 08, 16, and 24 can only be cached in set 0, and 01, 09, 17, and 25 only in set 1. If 00 is already in set 0 and 08 wants to be cached there, 00 must be evicted.

 

2-way associative

As shown in the figure below, 00 and 08 can be cached in set 0 at the same time, while 16 and 24 must evict the first two when they want to be cached.

 

4-way associative

As shown in the figure below, 00, 08, 16, and 24 can all be cached in set 0 at the same time, while 01, 09, 17, and 25 must evict the first four when they want to be cached.

 

 

Fully associative: any memory block can be cached in any location in the cache.

 

 
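The set-mapping rule in the examples above (blocks 00, 08, 16, 24 all competing for set 0) can be sketched in Python; the 8-set figure is an assumption chosen to match the example numbers:

```python
# A memory block lands in set (block number mod number of sets).
# With 8 sets, blocks 0, 8, 16, 24 all compete for set 0; an N-way
# design gives each set N slots, so N of them can coexist there.
def cache_set(block: int, num_sets: int) -> int:
    return block % num_sets

print([cache_set(b, 8) for b in (0, 8, 16, 24)])  # all map to set 0
```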

Write-through strategy (Write-through) : when data is updated, write both the cache and the backing store at the same time. The advantage of this mode is simplicity; the disadvantage is slower writes, since every modification must also be written to the store.

Write-back strategy (Write-back) : when data is updated, write only the cache; modified cached data is written to the backing store only when it is evicted from the cache. The advantage of this mode is fast writes, since data need not go to the store immediately; the disadvantage is that if the system loses power before updated data reaches the store, that data cannot be recovered.
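Both policies can be contrasted in a toy sketch (the class and its fields are illustrative, not any real cache's API):

```python
class Cache:
    """Toy cache over a backing store dict, contrasting the two policies."""

    def __init__(self, backing: dict, write_back: bool = False):
        self.backing = backing
        self.write_back = write_back
        self.lines = {}       # cached key -> value
        self.dirty = set()    # modified but not yet flushed (write-back only)

    def write(self, key, value):
        self.lines[key] = value
        if self.write_back:
            self.dirty.add(key)          # defer the slow store write
        else:
            self.backing[key] = value    # write-through: update both at once

    def evict(self, key):
        if key in self.dirty:
            self.backing[key] = self.lines[key]   # flush on eviction
            self.dirty.discard(key)
        self.lines.pop(key, None)
```

With write-through, the backing store is up to date after every write; with write-back, it only catches up on eviction, which is exactly the window in which a power loss loses data.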

Graphics card (video card) : exchanges a large amount of data with the CPU and is also connected to the northbridge, over a high-speed bus.

IO devices : everything other than the CPU's arithmetic unit, controller, and registers is an IO device. IO devices are divided into low-speed IO and high-speed IO; high-speed IO usually refers to the PCI bus.

To connect the many devices in the computer system that are slower than the CPU, early motherboards integrated a northbridge chip and a southbridge chip (current motherboards may no longer be designed this way). The southbridge gathers the slow devices together and connects them into the northbridge; in short, the bridge chips aggregate the external devices and ultimately complete their interaction with the CPU. The bus attached to the southbridge is usually called the ISA bus. The early PCI bus was attached to the southbridge, and the bus attached to the northbridge is called PCI-E; the PCI-E bus is much faster than the PCI bus. Common disk buses come in PCI form: SCSI, IDE, and SATA are collectively spoken of as PCI buses, PCI (Peripheral Component Interconnect) being just a general term. The mouse and keyboard use serial interfaces.

Usually a USB flash drive exchanges data through the PCI bus via the southbridge, the northbridge, and then the CPU. If a flash drive were given a PCI-E interface, the line bandwidth would be ample but the flash drive itself too slow; so N flash units are connected in parallel and used as one storage disk, with a PCI-E bus connecting it through the northbridge to the CPU for data exchange. This is a solid-state drive. Today many SSDs ship with SATA interfaces; SSDs with PCI-E interfaces are recommended.

So the question arises: with so many external devices, how does the CPU tell the different IO devices apart? By analogy with how a computer distinguishes the processes communicating over the network: a computer relies on sockets, that is, IP address + port number, to distinguish the processes behind external communication. Here, the CPU distinguishes IO devices by port numbers, called IO ports; a computer likewise has 65536 IO ports (numbered 0-65535).
When any hardware device is connected to the computer over the IO bus, it must apply to register a batch of consecutive IO ports as soon as it powers on.

The circuitry of a hardware device may not match the CPU's internal circuitry, so every external device has a controller or adapter. The controller or adapter converts the device's signals into signals the CPU bus can understand, acting as a translator, and also handles the device's transmission rate, checksumming, and similar functions. A driver, in turn, is what commands the controller chip and the hardware to do their work.

Polling : with so many external devices attached, how does the CPU tell whether an electrical signal comes from the hard disk, the mouse, or the network card? It polls every few milliseconds, checking each device for pending signals.

Interrupt : because polling is very inefficient, instead each device notifies the CPU when it has a signal to deliver. But then how does the CPU know which device the signal came from? You might think of identifying it through IO ports, but IO ports are for data interaction, not signal interaction.

Interrupt Controller : a chip outside the CPU that receives interrupt signals. When a signal arrives from an external device (such as the network card), the CPU interrupts its current work and receives the signal into memory. The interrupt controller is wired with interrupt lines, each line representing a device (not a fixed one), used to distinguish the external devices; the lines can be reused.
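The line-to-handler idea can be sketched as a lookup table; the IRQ number and handler here are hypothetical, chosen only for illustration:

```python
# Sketch: the interrupt controller associates each interrupt line with a
# registered handler; when a device raises its line, the CPU suspends its
# current work, looks the line up, and runs the handler.
handlers = {}

def register_irq(line: int, handler):
    handlers[line] = handler

def raise_irq(line: int):
    return handlers[line]()   # dispatch by line number, not by IO port

register_irq(11, lambda: "network card: packet arrived")
print(raise_irq(11))
```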

Direct Memory Access (DMA) : if the CPU had to handle the signal traffic of every external device itself, it would be hopelessly busy, so DMA solves this. The CPU allocates the memory space a given transfer needs and authorizes a line for the DMA controller's use. This lets hardware devices of different speeds communicate without burdening the CPU with a heavy interrupt load; otherwise the CPU would have to copy every piece of data from the source into a register and then write it back out to the new location, and during that time the CPU would be unavailable for other work.

When a DMA transfer is performed, the DMA controller manages the bus directly, so bus control must be handed over: before the transfer, the CPU gives bus control to the DMA controller, and when the transfer completes, the DMA controller immediately returns it to the CPU. A complete DMA transfer goes through four steps: DMA request, DMA response, DMA transfer, and DMA end.
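The four steps above can be traced in a toy simulation; the function and its log strings are illustrative, not a real controller interface:

```python
# Toy transcript of the four DMA phases: request, response, transfer
# (device data moved into memory without the CPU copying byte by byte),
# and end, at which point bus control returns to the CPU.
def dma_transfer(memory: list, device_data: list, start: int):
    log = ["DMA request"]              # device asks the DMA controller
    log.append("DMA response")         # CPU grants bus control
    for i, byte in enumerate(device_data):
        memory[start + i] = byte       # controller moves data directly
    log.append("DMA transfer")
    log.append("DMA end")              # bus control handed back to the CPU
    return log

mem = [0] * 8
print(dma_transfer(mem, [1, 2, 3], start=4), mem)
```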

In physical memory, the lowest-addressed and most easily addressed region is reserved for DMA, generally 16 MB, and the 1 MB of space in front of the DMA region is reserved for the BIOS.

The CPU's working frequency is relatively fast and memory's is relatively slow. While memory transmits data to the CPU, the CPU sits idle most of the time, so the CPU wastes many clock cycles when dealing with slow devices. Inside the CPU is a clock generator (crystal oscillator) that constantly produces clock pulses. As shown in the figure below, the CPU runs for several cycles before it begins interacting with memory. To coordinate the two, the CPU plans how many cycles apart to interact with memory; generally the interaction happens on the rising edge of the clock cycle (i.e., when the signal toggles from low level to high level).

 

Operating system : the CPU's interactions with external devices are often mismatched in speed. To make reasonable use of CPU resources, the Monitor was developed, which later became the OS (operating system); the operating system then abstracts the computer into a virtual machine.

 

The operating system is called a virtual machine because we have only one CPU chip (perhaps multi-core), only one memory, one mouse, one keyboard... yet every process wants exclusive use of this entire set of resources. The CPU virtualizes the one chip into multiple CPUs by round-robin time slicing, and memory is virtualized by cutting it into fixed-size pages through the paging mechanism. With that, the two most important components of the computer system have been virtualized: the processor and the memory (virtual CPU and virtual memory). What about the remaining IO devices? IO virtualization needs no special treatment: whichever process currently holds the right to run, the IO devices are handed over to that process.
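The round-robin time slicing mentioned here can be sketched in a few lines; job names and slice units are of course made up for illustration:

```python
from collections import deque

# Round-robin time slicing sketch: each process gets one fixed slice per
# turn; a process with work remaining goes to the back of the queue.
def round_robin(jobs: dict, slice_units: int = 1):
    queue = deque(jobs.items())   # (name, remaining work units)
    order = []                    # which process holds the CPU each slice
    while queue:
        name, left = queue.popleft()
        order.append(name)
        left -= slice_units
        if left > 0:
            queue.append((name, left))
    return order

print(round_robin({"A": 2, "B": 1, "C": 3}))
```

Each process makes progress every cycle through the queue, which is exactly how one physical CPU appears to be several virtual ones.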

Process : a program has many functions, but an instance that loads part of a program's functionality into the CPU for execution is called a process; it is an independent unit of execution.

When multiple independent processes run at the same time, how are the CPU, cache, memory, and IO devices reasonably allocated?

1. CPU : divide time into independent units; slicing along the time dimension completes CPU virtualization.

2. Cache : if there is enough space available, nothing needs to be done. If not, state must be saved: a process may not have finished when its CPU time expires and the next process's turn arrives. At that point the current process's instruction count (kept in the instruction counter, a register inside the CPU) must be saved, that is, the scene is saved, and restored when the process runs again.

3. Memory : slice the space: reserve a part for the kernel, then assign pieces to process 1, process 2, and so on. But divided that way, processes start and terminate at arbitrary times, some processes need a lot of space and some need little, the space given to one process runs out while another got too much, so a memory protection mechanism has to be introduced. Memory is therefore cut into fixed-size units, for example 4 KB each (storage slots): each slot is a page frame, and the data stored in a slot is a page. On top of pages and page frames a page-to-page-frame mapping mechanism is added, and each process on this map believes it owns all of memory.

A process's space (instruction area + code area + data area + bss segment + heap + stack) might occupy, say, one page of instructions, one page of code, two pages of data, and so on. As shown in the figure below, the code area and the stack are mapped by a control chip onto page frames in memory, and those page frames need not be contiguous. From the process's point of view, the address of the data it needs is a linear address, while the real data lives at a physical address, so the mapping must be looked up through the control chip. With so much data, how does the control chip search quickly? Inside it, the correspondence between the two is organized into page directories (first-level directory, second-level directory, third-level directory, and so on).

 

4. I/O devices : to deal with hardware (i.e., I/O devices), a process must go through the kernel, and the data is transferred from the kernel to the process.
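The page-directory lookup described in step 3 can be illustrated with the classic two-level split of a 32-bit linear address (an assumption matching x86-style paging: 10 bits of directory index, 10 bits of table index, 12 bits of offset into a 4 KB page):

```python
# Split a 32-bit linear address into (directory index, table index, offset).
def split_linear_address(addr: int):
    directory = (addr >> 22) & 0x3FF   # top 10 bits pick the page directory entry
    table = (addr >> 12) & 0x3FF       # next 10 bits pick the page table entry
    offset = addr & 0xFFF              # low 12 bits: offset inside the 4 KB page
    return directory, table, offset

print(split_linear_address(0x00403025))  # (1, 3, 37)
```

The control chip walks directory, then table, to find the page frame, and the offset locates the byte inside it.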

There is only one CPU chip; at any given moment, either the kernel is running on it or a user-space process is. When the kernel runs on the CPU it is called kernel mode, and when a process runs on it, user mode. The memory occupied by the kernel is called kernel space, and the space occupied by user processes is called user space. In user mode, a process cannot control the hardware directly. This is because inside the CPU, the manufacturer divides the instructions the CPU can run into 4 rings (this applies to the x86 architecture): ring0, ring1, ring2, ring3. For historical reasons ring1 and ring2 are unused; Linux uses only ring0 and ring3. Ring0 is called kernel mode, also known as privileged-instruction mode, and can control the hardware directly; ring3 is user mode and can execute general instructions.

When a running process wants to open a file or use the microphone, it finds that it has no permission to execute privileged instructions, so it issues a system call. Once the system call is made, the process steps aside and the CPU switches from user mode to privileged mode; this is called a mode switch. The kernel is responsible for loading the data into physical memory (which is divided into kernel space and each process's user space): first into kernel space, then into the process's user space, where it is mapped to linear addresses; at that point the kernel wakes the user process again for the data interaction.

If there are multiple processes (that is, process queues), process states come into play. Briefly: the ready state means that, among all the process queues, this process has all the resources it needs ready. A process that is not ready is sleeping, and sleep divides into interruptible and uninterruptible sleep. The difference: interruptible sleep can be woken at any time; uninterruptible sleep means the data the kernel is preparing for the process is not yet ready, so even if you wake it, it cannot work. Interruptible sleep is not sleep for lack of resources; the process has simply finished one stage of work and the next has not yet arrived, so it sleeps and can be woken at any time, hence "interruptible". Uninterruptible sleep generally comes from waiting on IO.
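The states named above can be arranged into a tiny transition table; this is a simplification for illustration, not the full Linux state machine:

```python
# Toy process-state transitions: ready, running, and the two sleep states.
TRANSITIONS = {
    ("ready", "scheduled"): "running",
    ("running", "slice expired"): "ready",
    ("running", "waits on IO"): "uninterruptible sleep",
    ("uninterruptible sleep", "IO complete"): "ready",
    ("running", "waits on event"): "interruptible sleep",
    ("interruptible sleep", "signal"): "ready",   # wakeable at any time
}

def step(state: str, event: str) -> str:
    return TRANSITIONS[(state, event)]

s = step("ready", "scheduled")   # process gets the CPU
s = step(s, "waits on IO")       # blocks on IO
print(s)                         # uninterruptible sleep
```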
