Computer organization review (3): Pipelined data path, pipeline hazard detection and handling

Preface

Yesterday I studied the single-cycle MIPS data path and how the hardware structure uses a set of control signals to handle different instructions differently.

However, the single-cycle data path is an idealization: one instruction occupies all of the hardware for its entire execution. In reality, for efficiency, the pipelined hardware structure can be shared by up to 5 instructions at the same time (one per MIPS stage), which creates data-consistency problems.

The pipelined data path avoids these data conflicts by buffering intermediate results, cleverly using the five-stage hardware while separating the stages logically, so that instructions truly execute in a pipelined fashion.



The idea of pipelining

An instruction is logically divided into 5 stages, which are:

  1. Instruction fetch (IF)
  2. Instruction decode (ID)
  3. Execute (EX)
  4. Memory access (MEM)
  5. Write back (WB)

Because every instruction goes through these 5 steps, the hardware itself can likewise be divided into 5 stages.

[Figure: the hardware divided into the five pipeline stages]
Divided this way, as soon as a component finishes its work, the corresponding part of the next instruction can start on it immediately. This is what makes pipelining possible and greatly improves efficiency:

[Figure: pipelined execution timeline (laundry analogy)]
Note: I reuse the laundry figure from my previous blog as the example (too lazy to draw a new one).

Speedup ratio

As shown in the figure above, assume each pipeline stage takes the same amount of time (say, 1 clock cycle) and that no hazards occur (an idealization). Then:

  • Serial execution of n instructions takes 5n cycles
  • Pipelined execution of n instructions takes n + 4 cycles: the first instruction needs 5 cycles to fill the pipeline, then one instruction completes per cycle (the 3-stage laundry figure above shows the analogous n + 2)
  • The speedup is 5n ÷ (n + 4), which approaches 5 as n grows

If the stages do not take equal time, or the instruction sequence is not long, the speedup drops further below 5 (the clock must match the slowest stage, which lowers throughput, and the pipeline-fill overhead is no longer negligible).
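A tiny sanity check (my own sketch, under the ideal assumptions above) makes the limit concrete:

# Idealized k-stage pipeline, 1 cycle per stage, no hazards.
def serial_cycles(n, k=5):
    return k * n                # each instruction runs start-to-finish alone

def pipelined_cycles(n, k=5):
    return k + (n - 1)          # k cycles to fill, then 1 instruction/cycle

for n in (1, 10, 100, 10000):
    print(n, serial_cycles(n) / pipelined_cycles(n))
# speedup: 1.0, 3.57..., 4.80..., 4.99... -> approaches 5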

Pipelined data path

Pipelining wants each component to be independent, but reality gets in the way: within a single instruction, each stage's component depends on the results of the previous stage's component.

Take the ID and EX stages as an example: the EX stage needs the control signals and register data produced in the ID stage before the ALU can do its operation.

For example, pipeline instructions 1 and 2. When execution reaches instruction 1's EX stage, the data from instruction 1's ID stage is needed. But there is only one set of hardware, and by now the ID component has been filled with instruction 2's data; reading it again would produce an error:

[Figure: conflict when two instructions share one set of stage hardware]
The clever MIPS designers solved this by adding buffers between adjacent stages to store the results of the previous stage's work (both data and control signals), ready for the next stage to read:

[Figure: pipeline buffers inserted between stages]
With 5 stages, a total of 4 buffers suffice to pass along all the data and control signals. They are:

  1. IF/ID buffer
  2. ID/EX buffer
  3. EX/MEM buffer
  4. MEM/WB buffer

Note
These "buffers" are actually registers (the pipeline registers)
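As a rough model (field names are my own illustration, not a complete MIPS), the four buffers can be written as plain records, each written by one stage and read by the next on the clock edge:

from dataclasses import dataclass

@dataclass
class IF_ID:
    pc_plus_4: int = 0
    instruction: int = 0      # the fetched 32-bit instruction word

@dataclass
class ID_EX:
    pc_plus_4: int = 0
    read_data_1: int = 0      # register file output for rs
    read_data_2: int = 0      # register file output for rt
    imm: int = 0              # sign-extended immediate
    rs: int = 0
    rt: int = 0
    rd: int = 0               # register numbers travel along too
    # ...plus the control signals decoded in ID (see below)

@dataclass
class EX_MEM:
    alu_result: int = 0
    write_data: int = 0       # rt value, used by sw
    rd: int = 0               # destination register number

@dataclass
class MEM_WB:
    mem_data: int = 0         # word loaded from memory (lw)
    alu_result: int = 0
    rd: int = 0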

Pipelined data path of the lw instruction

Let's walk through the pipelined data path using the lw instruction. In the figures below, red means writing and blue means reading.

First, the IF stage. Since this is a brand-new instruction, it does not need to read any pipeline buffer: it fetches the instruction word directly, computes PC+4 at the same time, and writes both results into the IF/ID buffer:

[Figure: lw IF stage writing the instruction and PC+4 into IF/ID]

Then the ID stage. It reads the IF stage's results from the IF/ID buffer, produces its own results, and stores them in the ID/EX buffer. Here it takes the source and destination register numbers from the instruction and reads the actual data out of the register file; don't forget that the PC value is passed along as well. As shown below:
[Figure: lw ID stage reading IF/ID and writing ID/EX]

Then the EX stage repeats the pattern (read from the previous stage's buffer, write its results into the next). Here it reads the base register value and the offset given by the immediate, adds them to form the target address, and sends it into the EX/MEM buffer for the memory access in the next stage:

[Figure: lw EX stage computing the address and writing EX/MEM]
In the MEM stage, the address computed in EX is read from the EX/MEM buffer, memory is accessed at that address, and the data obtained is written into the MEM/WB buffer:
[Figure: lw MEM stage reading memory and writing MEM/WB]

Finally, the WB stage reads from the MEM/WB buffer both the data lw loaded (the blue arrow in the figure) and lw's destination register number (the green arrow in the figure), and then uses that register number to write the data back into the correct destination register. As shown below:

[Figure: lw WB stage writing the loaded data back to the register file]
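Putting the five figures together, here is a rough sketch (reusing the record types above, tracing a single lw with no overlapping instructions) of the whole flow:

def trace_lw(pc, instr_mem, regs, data_mem):
    # IF: fetch the instruction and compute PC+4, latch into IF/ID
    if_id = IF_ID(pc_plus_4=pc + 4, instruction=instr_mem[pc])

    # ID: extract register numbers and the immediate, read the register file
    instr = if_id.instruction
    rs, rt = (instr >> 21) & 0x1F, (instr >> 16) & 0x1F
    imm = instr & 0xFFFF
    if imm & 0x8000:
        imm -= 1 << 16                       # sign extension
    id_ex = ID_EX(read_data_1=regs[rs], imm=imm, rt=rt)

    # EX: ALU adds base register + offset to form the address
    ex_mem = EX_MEM(alu_result=id_ex.read_data_1 + id_ex.imm, rd=id_ex.rt)

    # MEM: read data memory at that address
    mem_wb = MEM_WB(mem_data=data_mem[ex_mem.alu_result], rd=ex_mem.rd)

    # WB: write the loaded word back into the destination register (rt for lw)
    regs[mem_wb.rd] = mem_wb.mem_data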

Control signal transmission

Like the data, the 9 control signal bits generated in the decode stage also need to be passed along to later stages! Among them:

  • ALUSrc, RegDst, and ALUOp are used in the EX stage
  • MemRead, MemWrite, and Branch are used in the MEM stage
  • MemtoReg and RegWrite are used in the WB stage

The data path passes each stage's required control signals along, level by level:
[Figure: control signals handed down stage by stage]
The following table shows how the control signals are transferred between levels:
[Table (as figure): stage-by-stage control signal transfer]
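In sketch form (the grouping just restates the bullet list above; ALUOp is 2 bits wide, hence 9 bits total), each buffer keeps only the control bits still needed downstream:

# Control bits carried by each pipeline register; a stage's own bits are
# consumed and dropped before the bundle moves on.
ID_EX_CTRL  = ["ALUSrc", "RegDst", "ALUOp",        # consumed in EX
               "MemRead", "MemWrite", "Branch",    # consumed in MEM
               "MemtoReg", "RegWrite"]             # consumed in WB
EX_MEM_CTRL = ["MemRead", "MemWrite", "Branch",
               "MemtoReg", "RegWrite"]
MEM_WB_CTRL = ["MemtoReg", "RegWrite"]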

Data hazard

A data hazard is when, of two consecutive instructions 1 and 2, instruction 2 depends on instruction 1's computed result, so it would have to wait for instruction 1's WB stage before the correct data can be read from the destination register.

Quoting the relevant content from my previous blog here (lazy):
[Figure: data hazard illustration from the previous blog]

Data hazards can be resolved by forwarding: immediately after instruction 1's EX stage, its result is sent straight to instruction 2's EX stage, so instruction 2 can continue without extra waiting.

Data hazard detection

There are two types of data hazards, namely:

  1. A data hazard between two back-to-back instructions is called a type 1 data hazard
  2. A data hazard that still occurs one instruction later (with an unrelated instruction in between) is called a type 2 data hazard

As shown:

[Figure: type 1 and type 2 data hazards]

Following the buffering idea of the data path above, we pass each instruction's source and destination register numbers along through the pipeline buffers, which is what makes data hazards detectable.

Type 1 hazard detection

Let's first look at the type 1 hazard, that is, the data hazard between two immediately adjacent instructions. Suppose there is the following code:

instruction 1
instruction 2

Then before instruction 2's EX stage executes, it needs the result produced by instruction 1's EX stage. From the data path we know:

  1. The ID/EX buffer holds instruction 2's source register number
  2. The EX/MEM buffer holds instruction 1's ALU result and instruction 1's destination register number

So by checking whether the [source register number in the ID/EX buffer] and the [destination register number in the EX/MEM buffer] are equal, we can detect a type 1 data hazard! The principle is as follows:

[Figure: type 1 hazard detection via the ID/EX and EX/MEM buffers]
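As a sketch of the check (assuming the RegWrite control bit also travels in the EX/MEM buffer, and that register 0 never counts as a real destination):

def type1_hazard(id_ex, ex_mem):
    # instruction 1's result sits in EX/MEM while instruction 2 enters EX
    return (ex_mem.regwrite and ex_mem.rd != 0
            and ex_mem.rd == id_ex.rs)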

Type 2 hazard detection

The type 2 hazard works on a similar principle: the data hazard occurs between two instructions separated by one unrelated instruction, so the code is:

instruction 1
instruction 2 (unrelated)
instruction 3

Instruction 3's EX stage likewise needs the result of instruction 1's EX stage. From the data path we can see:

  1. When instruction 3 reaches its EX stage, instruction 1 has finished MEM and is about to enter WB, so instruction 1's destination register number must be sitting in the MEM/WB buffer
  2. When instruction 3 reaches its EX stage, the ID/EX buffer carries instruction 3's source register number

So by checking whether the [source register number in the ID/EX buffer] and the [destination register number in the MEM/WB buffer] are equal, we can detect a type 2 data hazard! The principle is as follows:

[Figure: type 2 hazard detection via the ID/EX and MEM/WB buffers]
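The check has the same shape, just one buffer further along (again assuming the RegWrite bit travels in MEM/WB):

def type2_hazard(id_ex, mem_wb):
    # instruction 1's result has reached MEM/WB while instruction 3 enters EX
    return (mem_wb.regwrite and mem_wb.rd != 0
            and mem_wb.rd == id_ex.rs)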

Complete hazard detection

The idea above is to check whether a register being read is the same as the register about to be written; if so, a hazard occurs. In fact, both R-type and I-type instructions can read two registers:

[Figure: R-type and I-type instructions both read rs and rt]

That is, the rs and rt registers. So each hazard type needs two checks, giving 4 conditions in total:

[Figure: the four hazard-detection conditions]
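Spelled out as a sketch of the forwarding unit (the conditions mirror the classic textbook ones; the return value names which source the ALU input mux should select):

def forward_a(id_ex, ex_mem, mem_wb):
    # selects the ALU's first input (the rs operand)
    if ex_mem.regwrite and ex_mem.rd != 0 and ex_mem.rd == id_ex.rs:
        return "EX_MEM"        # type 1 hazard on rs
    if mem_wb.regwrite and mem_wb.rd != 0 and mem_wb.rd == id_ex.rs:
        return "MEM_WB"        # type 2 hazard on rs
    return "REG_FILE"          # no hazard: use the value read in ID

def forward_b(id_ex, ex_mem, mem_wb):
    # selects the ALU's second input (the rt operand)
    if ex_mem.regwrite and ex_mem.rd != 0 and ex_mem.rd == id_ex.rt:
        return "EX_MEM"        # type 1 hazard on rt
    if mem_wb.regwrite and mem_wb.rd != 0 and mem_wb.rd == id_ex.rt:
        return "MEM_WB"        # type 2 hazard on rt
    return "REG_FILE"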

Double hazard

A double hazard is when the type 1 and type 2 conditions are satisfied at the same time (both of the two preceding instructions write the register we are about to read). The solution is simple: perform the type 2 forwarding (from the MEM side) only when the type 1 hazard (at the EX side) does not occur, so that the most recent value wins. In detail:
[Figure: double hazard handling]
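In the forward_a/forward_b sketch above, this priority falls out of the if ordering: the EX/MEM (type 1) check fires first. Written out explicitly, the MEM-side condition carries a "no type 1 hazard" guard:

def mem_forward_rs(id_ex, ex_mem, mem_wb):
    # forward from MEM/WB for rs only when the newer EX/MEM value does
    # not also target rs (otherwise the older value would wrongly win)
    return (mem_wb.regwrite and mem_wb.rd != 0
            and not (ex_mem.regwrite and ex_mem.rd != 0
                     and ex_mem.rd == id_ex.rs)
            and mem_wb.rd == id_ex.rs)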

Load-use hazard

Unlike the data hazards above, a load-use hazard is when the next instruction's source operand depends on the data the previous instruction reads from memory, for example:

lw r2, 2(r3)
add r1, r1, r2

This means the forwarding can no longer happen right after EX; it can only happen after the lw's MEM stage. And no matter how we forward, there is always a one-cycle delay:

[Figure: load-use forwarding still costs one cycle]
Take again:

lw r2, 2(r3)
add r1, r1, r2

as the example: because we must delay one cycle no matter what, we cannot wait until the add's EX stage to detect the load-use hazard: by then the IF and ID components would already have done the wrong work without the delay (that is, fetched and decoded the instructions that follow the add).

Instead, we must already detect the load-use hazard in the add instruction's ID stage, by checking whether

  • the lw instruction's destination register number in the ID/EX buffer (for lw this is the rt field)
  • the add instruction's source register numbers rs / rt in the IF/ID buffer

are equal, to decide whether a load-use hazard occurs. As shown below:
[Figure: load-use hazard detection in the ID stage]
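As a sketch (assuming if_id.rs / if_id.rt are the source register fields already extracted from the instruction in IF/ID, and that MemRead travels in ID/EX):

def load_use_hazard(id_ex, if_id):
    # only a load (MemRead set) can cause this hazard, and a load's
    # destination register is its rt field
    return (id_ex.memread
            and id_ex.rt in (if_id.rs, if_id.rt))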

Stall handling

"Fetch-Use" risk will definitely delay a cycle, we achieve by blocking the pipeline, we do three things:

  1. In the ID/EX buffer, the control signals are all set to 0, so that the following EX, MEM, WB are all no operations (nop)
  2. PC, IF/ID cache remains unchanged
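As a sketch (the write-enable flags are my own naming), one stall cycle then looks like:

ZERO_CONTROLS = dict.fromkeys(ID_EX_CTRL, 0)   # all 9 bits cleared -> nop

def stall_one_cycle(state):
    state.id_ex_controls = ZERO_CONTROLS  # the bubble: EX, MEM, WB do nothing
    state.pc_write = False                # PC held: same instruction refetched
    state.if_id_write = False             # IF/ID held: same instruction redecoded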

Pipeline stall + forwarding together resolve the load-use hazard, as diagrammed below:
[Figure: stall plus forwarding resolving the load-use hazard]
Take again:

lw r2, 2(r3)
add r1, r1, r2

as the example: after the add's IF stage completes, the IF/ID buffer contents (for add) and the ID/EX buffer contents (for lw) are fed to the hazard-detection (stall) unit, which decides whether a load-use hazard occurs and, if so, holds the PC to stall one clock cycle, as shown below:
[Figure: the hazard-detection unit holding the PC to stall]

Branch hazard

MIPS branch prediction here is simply predict-not-taken. The branch instruction is typically beq: in the EX stage the ALU computes whether the two operands are equal and produces a control signal, but the branch is not actually resolved until the MEM stage.

As for why it is the MEM stage, see my previous blog: Computer organization review (2): Single-cycle data path and control signals
[Figure: branch resolution in the MEM stage]

If the branch turns out to be taken, some instructions must be cancelled. At most 3 instructions ever need cancelling: when beq is in its MEM stage, the IF, ID, and EX stages behind it hold the 3 instructions fetched after it, and there can be no more than those 3!

Just as with stalling, we cancel them by setting all their control signals to 0.

[Figure: cancelling instructions by zeroing their control signals]
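A sketch of the flush (reusing the names above; when beq resolves as taken in MEM, the three younger instructions sit in IF, ID, and EX):

def flush_taken_branch(state, branch_target):
    state.if_id = IF_ID()                  # squash the instruction in IF
    state.id_ex_controls = ZERO_CONTROLS   # squash the instruction in ID
    state.ex_mem_controls = ZERO_CONTROLS  # squash the instruction in EX
    state.pc = branch_target               # next fetch starts at the target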

Interrupts and exceptions

An insider says this part will be tested. But I think there is nothing special to watch out for, mainly that the address jumped to for the interrupt service routine is 8000 0180 (hex).

[Figure: exception handling in the pipeline]

Finally, the complete picture of the pipeline:

[Figure: the complete pipelined data path]

Origin: blog.csdn.net/weixin_44176696/article/details/112471051