Computer architecture framework

This is an overview based on Professor Hu Weiwu's computer architecture course. I hope that readers who have never taken the course can see how computers are built, and that those about to take it can form a picture of the overall framework and a preliminary understanding before the course begins.
If you don't want to read the text, you can watch the video!

Representing 0 and 1 with transistors

The computer is a great invention: it transmits our information, processes it, and even completes our work automatically. So how does it work inside? Let's start with the input information.
The information a computer needs, such as images, text, and sound, is converted into various codes inside the machine, such as ASCII, and is ultimately represented in binary. Binary is really just a form of expression: our various inputs are abstractly mapped onto it.
Just as the English adopt English as their mother tongue and we adopt Chinese as ours, these are all rules set by people, and they can be translated into one another. Binary is the computer's language, so we must convert all information into it before the machine can understand it. The essence of a conversion rule (an encoding) is really just a mapping table.
Why binary? Because it is the most universal, and more importantly because the transistor, the most basic component of a computer, can express binary through a circuit switching on and off under a voltage: a high voltage represents 1 and a low voltage represents 0. Other fields follow the same idea: a quantum computer uses energy levels to represent binary, and superconducting devices express 0 and 1 through the presence or absence of magnetic flux.
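As a tiny illustration of such a mapping table, here is a Python sketch that encodes text into binary using the ASCII encoding mentioned above:

```python
# Each character is mapped to a number by an encoding table (ASCII
# here), and that number is then written out as an 8-bit binary string.
def to_binary(text):
    return [format(ord(ch), "08b") for ch in text]

print(to_binary("Hi"))  # ['01001000', '01101001']  (H=72, i=105)
```

The same principle applies to images and sound: each is reduced to numbers by some agreed rule, and the numbers are stored in binary.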

A computer is used for calculation. The transistors just mentioned express 0 and 1 in the calculation part, but we also need to store values in binary. That part actually uses capacitors: a capacitor that holds charge represents 1, and one with no charge represents 0.

In junior high school physics we learned the simplest circuit, a light bulb controlled by a switch, but that switch is very large. To do more complicated tasks we need far smaller devices to switch circuits on and off, and this basic unit is the transistor, mainly the CMOS transistor in computers. CMOS transistors come in two kinds, nMOS and pMOS. When an nMOS transistor's gate is at high voltage it conducts, and at low voltage it is an open circuit; the pMOS transistor is the opposite, because the opposite doping of the material inside reverses the direction in which current flows.
[figure]
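The switching behavior just described can be sketched in Python. This is a behavioral sketch, not an electrical model: each transistor is treated as a voltage-controlled switch, and the two kinds are wired into an inverter as in the next section.

```python
def nmos(gate):
    """nMOS: conducts (switch closed) when the gate voltage is high."""
    return gate == 1

def pmos(gate):
    """pMOS: the opposite, conducts when the gate voltage is low."""
    return gate == 0

def inverter(a):
    # The pMOS pulls the output up to Vdd (logic 1), the nMOS pulls it
    # down to ground (logic 0); exactly one conducts for any input.
    return 1 if pmos(a) else 0

print(inverter(0), inverter(1))  # 1 0
```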

Basic structure

So what does the tiny transistor have to do with our computer?
Just as arithmetic starts from the simplest addition and subtraction, logical calculation starts from the simplest AND, OR, and NOT. If you have studied digital electronics this should be familiar; if not, you can think of these gates as the basic units of logic, analogous to addition and subtraction.
[figure]

As shown in the figure, this is an inverter, also a basic unit. Notice that it needs only one pMOS transistor and one nMOS transistor to implement the inverting function.
In the same way, other basic units and other functional modules are also implemented by combining CMOS transistors.
More abstractly, we can package this into a module: inputs as lines, a black box in the middle, and outputs. Then we can keep combining small modules into large modules, build more complex logic from the large modules, package that into an even larger module, and stack layer upon layer to build whatever logical structure we want.
How do we package something into a black box? Suppose the box is the inverter above: we can enumerate its inputs and the corresponding outputs, and this enumeration is called a truth table. We list every possible input situation and record what output each input produces.
In this way we can encapsulate it as a sealed module and describe the module with its truth table, and the module can then be used in more complex logic. Hardware today actually works the same way: mature modules of all kinds are provided, and we only need to understand each module's function to assemble our various hardware structures.
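To make the module stacking concrete, here is a Python sketch: basic gates built out of NAND, composed into a one-bit full adder, whose behavior is then captured as a truth table, exactly the "seal the box, keep the table" idea above.

```python
# Basic logic units (everything below is built out of NAND).
def NAND(a, b): return 1 - (a & b)
def NOT(a):     return NAND(a, a)
def AND(a, b):  return NOT(NAND(a, b))
def OR(a, b):   return NAND(NOT(a), NOT(b))
def XOR(a, b):  return OR(AND(a, NOT(b)), AND(NOT(a), b))

# Small modules combined into a larger module: a one-bit full adder.
def full_adder(a, b, cin):
    s    = XOR(XOR(a, b), cin)                  # sum bit
    cout = OR(AND(a, b), AND(cin, XOR(a, b)))   # carry out
    return s, cout

# The sealed module is fully described by its truth table.
table = {(a, b, c): full_adder(a, b, c)
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print(table[(1, 1, 0)])  # (0, 1): 1 + 1 = binary 10
```

Chaining full adders bit by bit gives a multi-bit adder, another layer of the same packaging idea.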

Calculation process

We can again turn the adder above into a black box; here it is a shift adder. If we just hand it data, will it compute? We find that a control terminal tells it whether to perform a calculation on the data. This is where instructions come in. The set of all instructions we might give is the instruction set, and specific instructions are encoded for each situation. My understanding is that an instruction sets some control terminals to high voltage and others to low. For example, if the instruction is an addition, the adder's control terminal becomes 1 while the multiplier's control terminal becomes 0, and so on.
[figure]

In the figure I marked storage, instructions, and calculation. Storage is where data and instructions are kept; the operation part is the adder module. This is the von Neumann design: storage and calculation are separated. Instructions are the code, telling the computer what to do.
What our computer does all along is repeatedly execute such a pipeline, but a task may involve thousands of instructions, so serial execution is very slow. Computer architecture is about how to improve the speed of this pipeline from different angles.
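The separation of storage and calculation, with instructions driving the machine, can be sketched as a toy fetch-decode-execute loop. The instruction format and opcodes here are invented for illustration only, not a real instruction set.

```python
# A minimal von Neumann machine: one memory holds both instructions
# and data; the CPU repeatedly fetches, decodes, and executes.
memory = {
    0: ("LOAD", "r0", 100),    # r0 <- mem[100]
    1: ("LOAD", "r1", 101),    # r1 <- mem[101]
    2: ("ADD", "r0", "r1"),    # r0 <- r0 + r1
    3: ("STORE", "r0", 102),   # mem[102] <- r0
    4: ("HALT",),
    100: 3, 101: 4,            # data lives in the same memory
}
regs, pc = {}, 0
while True:
    instr = memory[pc]         # fetch
    op = instr[0]              # decode
    pc += 1
    if op == "LOAD":           # execute
        regs[instr[1]] = memory[instr[2]]
    elif op == "ADD":
        regs[instr[1]] += regs[instr[2]]
    elif op == "STORE":
        memory[instr[2]] = regs[instr[1]]
    elif op == "HALT":
        break
print(memory[102])  # 7
```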

Speed-up structures

[figure]

Static pipelining

[figure]

Each clock cycle, every pipeline stage can do one thing. Suppose that while the first instruction is executing, the third instruction arrives at the fetch stage, and we find the third instruction needs the value of a. But the first instruction has not finished executing and has not written a back to the register. In other words: I don't yet know the latest result of your calculation, yet I'm supposed to use that result in mine. What to do? The later instruction has to wait where it is, and only when the producer announces the latest data can it continue.
To reduce the number of waiting cycles, an extra wire is added from the end of the third and fourth stages back to the second stage, so that already at the second stage an instruction can be told the new data produced in the third and fourth stages (this is forwarding, or bypassing).

Software scheduling uses the compiler to reorder the program's instructions so that dependent instructions end up farther apart. By the time, say, the sixth instruction uses a, the first instruction that produced a has already gone through the entire pipeline, so at compile time we can already guarantee that the latest value of a is known.
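Both tricks, forwarding and compiler scheduling, can be seen in a toy stall counter. The latency numbers (2 stall cycles without forwarding, 0 with it) and the example programs are illustrative assumptions, not measurements of a real pipeline.

```python
def total_stalls(program, latency):
    """program: list of (dest, srcs). latency: extra cycles after issue
    before a result can be consumed (e.g. 2 without forwarding,
    0 with a forwarding wire back to the early stages)."""
    ready = {}                    # register -> cycle its value is usable
    cycle = stalls = 0
    for dest, srcs in program:
        start = max([cycle] + [ready[s] for s in srcs if s in ready])
        stalls += start - cycle   # cycles spent waiting in place
        cycle = start + 1         # one instruction issues per cycle
        ready[dest] = cycle + latency
    return stalls

dependent = [("a", ["b", "c"]),   # a = b + c
             ("d", ["a", "e"]),   # needs a immediately -> hazard
             ("x", ["y", "z"]),
             ("u", ["v", "w"])]
scheduled = [dependent[0], dependent[2], dependent[3], dependent[1]]

print(total_stalls(dependent, 2))  # 2: waits for a
print(total_stalls(dependent, 0))  # 0: forwarding removes the wait
print(total_stalls(scheduled, 2))  # 0: compiler moved the user of a away
```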

Dynamic pipelining

[figure]

A dynamic pipeline can "overtake on a curve": it does not have to execute instructions strictly in program order.
Just add one more hardware module, called a reservation station, and let instructions that have not yet obtained their source data wait inside it. The key is that later instructions whose data is already ready can keep going without being blocked.
Looking at the pipeline below, we find that while the first instruction is writing back to the register, the second instruction is still at its decode stage, and the third instruction has already bypassed the second and started executing. This avoids the situation where a stalled instruction in the middle, for example one waiting on a memory access, leaves everything behind it idle with no data being processed.
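A minimal sketch of this "overtaking": instructions sit in a reservation station, and each cycle the oldest one whose operands are ready starts, so an independent instruction passes one that is blocked on a slow load. The latencies and the one-start-per-cycle limit are illustrative assumptions.

```python
def run_ooo(program, latency):
    """program: list of (name, dest, srcs). Returns start order."""
    station = list(program)   # reservation station: waiting instructions
    ready_at = {}             # register -> cycle its value is ready
    started, cycle = [], 0
    while station:
        cycle += 1
        for instr in station:  # oldest first, but skip blocked ones
            name, dest, srcs = instr
            if all(ready_at.get(s, 0) <= cycle for s in srcs):
                ready_at[dest] = cycle + latency.get(name, 1)
                started.append(name)
                station.remove(instr)
                break          # at most one instruction starts per cycle
    return started

prog = [("load_a", "a", []),     # slow memory access
        ("use_a",  "b", ["a"]),  # must wait for the load's result
        ("indep",  "c", [])]     # independent: free to overtake
print(run_ooo(prog, {"load_a": 4}))  # ['load_a', 'indep', 'use_a']
```

With no slow instruction (`run_ooo(prog, {})`) the start order matches program order; the overtaking only appears when something ahead is blocked.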

Multiple issue

[figure]
Besides overtaking, we can also widen the lanes to make processing faster. Everyone knows that in a factory, if only one person works on an assembly line it is very slow; with several people working on it, it goes much faster. So we use multiple issue, that is, we add several parallel pipelines alongside the one above. But there is an important point to take care of: the data dependences just mentioned. With multiple issue we must constantly check whether instructions on the parallel paths depend on each other, to avoid the logical error of a newer instruction not getting the latest data.

Branch prediction

[figure]
Logically, programs generate many instruction jumps: if you do this, consequence A occurs; if you don't, consequence B occurs. But which consequence it is will only be known when the branch is executed, and only then do we know where to fetch the next instruction. So the pipeline would have to wait two extra cycles just to learn the result before knowing where to fetch from next. To avoid this, we add branch prediction for jump instructions to increase the speed.

Why can we guess like this? Because tasks often have regularity. For example, suppose I need to go to a room ten times to get an apple: I keep jumping back into the room. This is history, and we can use it to predict: if you jumped last time, I will predict a jump this time.

In that case, over ten loop iterations the prediction fails only on entering the loop at the start and on exiting at the end, so my prediction accuracy is 80%. That means for 80% of these branches the pipeline does not have to wait: I guess the target address and let the pipeline keep going. For the 20% guessed wrong, we pay the misprediction cost: the wrongly fetched follow-up instructions are deleted, and we jump again to the correct instruction.
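The 80% figure can be checked with a tiny "predict the same as last time" predictor. This is only a sketch of the idea above; real CPUs use fancier predictors.

```python
def one_bit_predictor(outcomes, initial=0):
    """Predict each branch outcome as whatever happened last time;
    return the fraction of correct predictions."""
    pred, correct = initial, 0
    for actual in outcomes:
        if pred == actual:
            correct += 1
        pred = actual              # remember only the last outcome
    return correct / len(outcomes)

# A 10-iteration loop: the backward branch is taken 9 times, then not
# taken on exit. Starting cold (predict not-taken), we mispredict on
# entry and on exit: 8/10 correct, the 80% from the text.
outcomes = [1] * 9 + [0]
print(one_bit_predictor(outcomes))  # 0.8
```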
But overall, with this guessing the pipeline still gets faster, because many of the events in our programs do have a certain regularity.

The regularity just mentioned may be loop repetition, or it may be correlation between branches. For example, if I go to one place, then I will definitely not go to another; so once I know I went to the first place, I know the next branch will not be taken. This too is a basis for guessing.

Cache

[figure]
How do we fetch data faster? Here we introduce a technology called the cache. After the pipeline accelerations above, calculation is very fast, but fetching data is still time-consuming: the pipeline keeps fetching data, the fetch part stays busy, while the calculation finishes quickly and sits idle. So how do we use that idle time? We need to optimize data fetching so that it balances with the calculation time and makes up for this shortcoming.
Normally we fetch data from memory, but memory and storage obey a rule:

The larger a storage device's capacity, the slower its access speed. We still need large storage in the computer to meet our needs, so what do we do? Inside the CPU we add a smaller storage space called the cache. It is really a subset of memory, not simply additional storage space.

Research shows that during some tasks, the CPU repeatedly fetches the same data from memory. So we put this repeated data in the cache, where the CPU can fetch it much faster, and put the uncommon data in the larger memory, transferring it from memory into the cache when needed.
If a CPU needs the same 4 KB of data 100 times, we add that 4 KB to the cache; then every time it reads the 4 KB from the cache, which is faster than fetching it from memory each time. If some data is infrequent, we just spend a little extra time fetching it from memory.
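The hit/miss arithmetic in the example above can be simulated with a toy fully-associative cache using LRU replacement. The capacity and addresses are illustrative assumptions.

```python
from collections import OrderedDict

class Cache:
    """A tiny fully-associative cache with LRU replacement,
    counting accesses served fast (hits) vs. from slow memory (misses)."""
    def __init__(self, capacity):
        self.lines = OrderedDict()
        self.capacity = capacity
        self.hits = self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)        # mark most recently used
        else:
            self.misses += 1                    # fetch from memory
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used

cache = Cache(capacity=4)
for _ in range(100):              # the same block, fetched 100 times
    cache.access(0x4000)
print(cache.hits, cache.misses)   # 99 1: only the first fetch is slow
```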

TLB

[figure]
In order to explain the whole computer framework clearly later, I will say something here about the TLB, which is actually related to the operating system. In daily life we find that a computer can serve multiple users, and each running process feels that the CPU and memory are its own, to use freely. Of course, the storage disk is also larger than any one process's needs, so the processes do not interfere with each other.

This is virtual memory technology. The operating system constructs virtual addresses: each process has its own virtual address space, and the operating system maps these virtual addresses onto the physical address space, with the disk as backing store. This mapping table is saved and maintained by the operating system.

What about the CPU? It must translate a virtual address into a physical address before it can address and fetch data. But suppose one program's page table, mapping virtual space to physical space, takes 4 MB; if 100 programs are running, that is 400 MB. When the CPU gets a virtual address to translate, must it search this 400 MB? If the mapping were one huge table, the CPU would have to walk the 400 MB table to find the corresponding physical address, which is extremely time-consuming. So multi-level page tables were proposed, but they add a few more translation steps, which lowers translation speed, and that is not what we want; we want to increase the speed. So, just as with the cache, we use a smaller subset called the TLB to store our page table entries.
Therefore, when the CPU performs virtual-to-physical address translation, it first checks the TLB, and only if the entry is not found does it consult the complete mapping table maintained by the operating system. This increases translation speed.
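The TLB-first lookup can be sketched like this. The page size, table contents, TLB size, and eviction policy here are illustrative assumptions, not how any particular CPU does it.

```python
PAGE = 4096
# A pretend OS-maintained page table: virtual page -> physical page.
page_table = {vpn: vpn + 0x100 for vpn in range(1024)}

tlb, TLB_SIZE = {}, 16            # small subset of page-table entries
tlb_hits = table_walks = 0

def translate(vaddr):
    global tlb_hits, table_walks
    vpn, offset = divmod(vaddr, PAGE)
    if vpn in tlb:                # fast path: entry cached in the TLB
        tlb_hits += 1
        ppn = tlb[vpn]
    else:                         # slow path: walk the full page table
        table_walks += 1
        ppn = page_table[vpn]
        if len(tlb) >= TLB_SIZE:
            tlb.pop(next(iter(tlb)))  # evict an entry to make room
        tlb[vpn] = ppn
    return ppn * PAGE + offset

for _ in range(100):              # a loop touching the same page
    translate(0x2000)
print(tlb_hits, table_walks)      # 99 1: only the first access walks
```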

Overall architecture

We have now basically covered the content of computer architecture. Starting from the five-stage pipeline, we can see how the entire computer works when it computes. The picture below is the architecture of the computer: on top of the five-stage pipeline, it adds the speed-improving structures we just discussed, forming the complete computer architecture.
You can see the video for details, or get a clear understanding of this picture from the text above.
[figure]

Origin blog.csdn.net/Carol_learning/article/details/130466133