Computer Architecture Experiment 1
1. Experimental purpose
Understand the data flow and control signals of RISC-V instruction execution, and be familiar with the working process of the instruction pipeline.
2. Experimental process
1. RISC-V related instructions
The simulator used in this experiment implements the RISC-V instruction set. To make the later analysis easier, we first review the RISC-V instructions that appear in the experiment.
The base RISC-V ISA uses fixed-length 32-bit instructions, but the standard encoding scheme also supports variable-length instructions, including 16-bit ones. These 16-bit instructions form the standard compressed extension, named C. They encode common operations and reduce both static and dynamic code size. All instructions below written in the form "c.inst" belong to this extension.
The assembly code in the experiment mainly includes the following types of instructions:
- Control transfer instructions: c.j, c.jr, bge
- Arithmetic and logical operation instructions: c.slli, c.addi, c.add, addi
- Integer generation instructions: c.li
- Memory access instructions: lw, sw
Control transfer instructions
The c.j instruction is an unconditional, PC-relative jump: the offset is added to the PC to form the target address. It is equivalent to jal with rd=x0. jal writes the address of the following instruction (PC+4) to rd after the jump, but because x0 cannot be written, c.j only performs the jump. The c.jr instruction is an unconditional jump whose target is the value of rs1 with the lowest bit set to 0. It is equivalent to jalr with rd=x0, so again no return address is saved. The bge instruction compares the values of rs1 and rs2 as signed numbers and jumps to the target address if rs1 is greater than or equal to rs2.
Arithmetic and logical operation instructions
The c.slli instruction performs a logical left shift on the value in rd and writes the result back to rd. The shamt field of the instruction gives the shift amount. c.addi, c.add, and addi are all addition instructions and differ only in their operands.
Integer generation instructions
The c.li instruction writes the sign-extended 6-bit immediate value to register rd, and is only valid when rd!=x0.
Memory access instructions
The lw instruction reads the value with rs1+offset as the address from the memory and stores it in the rd register. The sw instruction writes the value of rs2 into the memory, and the address is rs1+offset.
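The address and condition rules described above can be written down as a few Python helpers. This is only a sketch of the architectural behavior, not the simulator's code; the function names (`jal_target`, `jalr_target`, `bge_taken`, `load_store_address`) are hypothetical and register values are modeled as 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def jal_target(pc, offset):
    """c.j / jal: PC-relative target address."""
    return (pc + offset) & MASK32


def jalr_target(rs1_value, offset=0):
    """c.jr / jalr: register-based target with the lowest bit cleared."""
    return (rs1_value + offset) & MASK32 & ~1


def bge_taken(rs1_value, rs2_value):
    """bge: taken when rs1 >= rs2, comparing both as signed numbers."""
    return to_signed32(rs1_value) >= to_signed32(rs2_value)


def load_store_address(rs1_value, offset):
    """lw / sw: effective address rs1 + offset."""
    return (rs1_value + offset) & MASK32
```

For example, `jal_target(0x10150, 32)` gives `0x10170`, which matches the `c.j 32` instruction at address 10150 in the first loop.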
2. Assembly code analysis
By analyzing the code given in the experiment and converting it to assembly code, we can find the assembly instruction parts of the two loops.
The part of the first loop is as follows:
1014c: fe042623 sw x0 -20 x8 #x8-20 : i
10150: a005 c.j 32 #-> 10170
10152: fec42783 lw x15 -20 x8
10156: 078a c.slli x15 2 #x15 = 4i
10158: ff040713 addi x14 x8 -16
1015c: 97ba c.add x15 x14
1015e: fec42703 lw x14 -20 x8
10162: e6e7a623 sw x14 -404 x15 #A[i] = i
#i++
10166: fec42783 lw x15 -20 x8
1016a: 0785 c.addi x15 1
1016c: fef42623 sw x15 -20 x8
#if(i<100)
10170: fec42703 lw x14 -20 x8
10174: 06300793 addi x15 x0 99
10178: fce7dde3 bge x15 x14 -38 #i<=99 -> 10152
The part of the second loop is as follows:
1017c: 4785 c.li x15 1
1017e: fef42623 sw x15 -20 x8 #x8-20: i
10182: a80d c.j 50 #-> 101B4
10184: fec42783 lw x15 -20 x8
10188: 17fd c.addi x15 -1
1018a: 078a c.slli x15 2
1018c: ff040713 addi x14 x8 -16
10190: 97ba c.add x15 x14
10192: e6c7a783 lw x15 -404 x15 #x15: A[i-1]
10196: 3e878713 addi x14 x15 1000
1019a: fec42783 lw x15 -20 x8
1019e: 078a c.slli x15 2
101a0: ff040693 addi x13 x8 -16
101a4: 97b6 c.add x15 x13
101a6: e6e7a623 sw x14 -404 x15 #A[i] = A[i-1]+1000
#i++
101aa: fec42783 lw x15 -20 x8
101ae: 0785 c.addi x15 1
101b0: fef42623 sw x15 -20 x8
#if(i<100)
101b4: fec42703 lw x14 -20 x8
101b8: 06300793 addi x15 x0 99
101bc: fce7d4e3 bge x15 x14 -56 #i<=99 -> 10184
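The branch offsets in the two listings can be checked by decoding the machine words by hand. The following Python sketch extracts the fields of a B-type instruction according to the standard RV32I encoding; the function name `decode_b_type` is a hypothetical helper and is not part of the experiment's tooling.

```python
def decode_b_type(word):
    """Return (rs1, rs2, funct3, signed offset) of a 32-bit B-type word."""
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    funct3 = (word >> 12) & 0x7
    # The immediate is scattered: imm[12|10:5] in bits 31:25,
    # imm[4:1|11] in bits 11:7; imm[0] is always 0.
    imm = (
        ((word >> 31) & 0x1) << 12
        | ((word >> 7) & 0x1) << 11
        | ((word >> 25) & 0x3F) << 5
        | ((word >> 8) & 0xF) << 1
    )
    if imm & 0x1000:  # sign-extend the 13-bit offset
        imm -= 0x2000
    return rs1, rs2, funct3, imm


# The two loop branches from the listings above.
for pc, word in ((0x10178, 0xFCE7DDE3), (0x101BC, 0xFCE7D4E3)):
    rs1, rs2, funct3, offset = decode_b_type(word)
    print(f"{pc:x}: bge x{rs1} x{rs2} {offset} -> {pc + offset:x}")
```

Both words decode to funct3 = 5 (bge) with rs1 = x15 and rs2 = x14, and the offsets -38 and -56 lead to the loop bodies at 10152 and 10184.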
3. RISC-V circuit analysis
The circuit design diagram of RISC-V is as follows:
Instruction fetch stage
The circuit is analyzed stage by stage, starting with the instruction fetch stage. The NPC Generator in the fetch stage produces the address of the next instruction. The candidate addresses come from the Jal target calculation, the Jalr target calculation, and the branch target calculation, and the next address is chosen by the select signals from the control unit and the branch selector. The PC value produced by the NPC Generator is used as the address to read the instruction memory, and the fetched instruction is passed to the decode stage. The PCF output of the NPC Generator is connected to branch prediction. For a conditional branch such as bge, if the branch is predicted taken, there is no need to wait for the condition to be evaluated in the execute stage: the jump address can be calculated in the ID stage and written to the NPC directly.
Decode stage
The decode stage generates the control signals and reads the operands. Its main units are the register file and the immediate unit. RegWriteW, WD3, and A3 of the register file are write-back stage signals: they are produced in the decode stage and carried down the pipeline to the write-back stage. The immediate unit extends the immediate and passes it to the execute stage. The Jal target address can be calculated directly in this stage and passed to the NPC Generator, so for the unconditional jump Jal the control unit issues the JalD control signal here and the jump is made.
Execute stage
The execute stage performs three tasks: selecting operands and carrying out the ALU operation, branch selection, and calculating the target address of the Jalr instruction. Operands can come from the register values read in the ID stage, from results forwarded from the MEM and WB stages, and from immediates. All control signals are generated by the control unit in the decode stage and then carried to the components of the execute stage.
The branch selector compares the values of the two registers according to the branch type and decides whether to take the branch. The values may be those read in the decode stage or those forwarded from the memory access and write-back stages. The branch selector sends the branch signal to the NPC Generator. Because branch selection is done in this stage, the branch target address is also passed to the NPC Generator in this stage, as the BrNPC output of EX; its value is obtained from the immediate in the decode stage. If the branch was initially predicted not taken but the evaluation shows that it is taken, the prediction was wrong and the jump target address is written to the NPC in the EX stage.
The ALU selects its operands from forwarded data and from the register values read in the decode stage. The Forward control signals are issued by the Hazard unit: if the destination register of one of the two preceding instructions is the same as a source register of the current operation, the result from the memory access or write-back stage is selected. The AluSrc signals choose between register, PC, and immediate operands, and AluContrl gives the operation type. For a Jalr instruction, the lowest bit of the value of rs1 is set to 0 and the result is passed through AluOut to the NPC Generator to perform the jump.
Memory access stage
The main work of the memory access stage is to read and write data memory; it also passes on the results of the execute stage. The write enable signal is carried down from the decode stage.
For arithmetic instructions, the result of the execute stage is passed straight to the write-back stage and written to the rd register. For instructions such as jal and jalr, PC+4 is written to the rd register instead. The control unit generates the LoadNpc control signal at decode, and this signal selects between the two.
Write-back stage
The write-back stage selects between the memory access result and the ALU/PC+4 result and sends the chosen value to the register file. MemToReg is the select signal; it is generated by the control unit and carried down to the write-back stage.
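The two selections described for the memory access and write-back stages can be modeled as a pair of multiplexers. This is a behavioral sketch in Python, assuming LoadNpc and MemToReg are simple one-bit selects as the text describes; the function name `writeback_value` is hypothetical.

```python
def writeback_value(alu_out, pc_plus_4, mem_data, load_npc, mem_to_reg):
    """Value that reaches the WD3 port of the register file."""
    # Memory access stage: LoadNpc chooses PC+4 (jal/jalr) over the ALU result.
    result = pc_plus_4 if load_npc else alu_out
    # Write-back stage: MemToReg chooses the loaded data over that result.
    return mem_data if mem_to_reg else result
```

An arithmetic instruction uses `load_npc=False, mem_to_reg=False`, a load uses `mem_to_reg=True`, and a jal that saves a return address uses `load_npc=True`.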
Control unit
The control unit is as follows:
The functions of some of these signals are as follows:
- RegWriteD: write enable signal of the register file
- MemToRegD: whether the data written to the register file comes from memory
- LoadNpcD: whether PC+4 is stored in the rd register
- RegReadD: whether the registers are read; passed to the Hazard unit
- BranchTypeD: the branch type
- AluSrc1D, AluSrc2D: source select signals for the ALU operands. AluSrc1D chooses between the register value and the PC value; AluSrc2D chooses between the register value and the immediate.
- AluContrlD: selects the specific ALU operation
- ImmType: the type (length) of the immediate
4. Execution process of instructions
a. Execution process of add x15, x14, x15 (c.add x15 x14)
The instruction is located at:
10188: 17fd c.addi x15 -1
1018a: 078a c.slli x15 2
1018c: ff040713 addi x14 x8 -16
10190: 97ba c.add x15 x14
In the instruction fetch stage, the PC value is 10190, the instruction at this address is read, and the instruction is transferred to the ID stage. Key control signals:
- JalrE, JalE, BrE: all not enabled, because there are no branches or jumps among the preceding instructions.
In the decode stage, A1 is 0xE, which selects register x14, and A2 is 0xF, which selects register x15. The values of these two registers are read from the register file and passed to the EX stage, together with the control signals for the add instruction. The write control signals seen by the register file at this point belong to an earlier instruction that is in its write-back stage. The add instruction has no key control signals in this stage.
In the execute stage, the register values enter the operand selectors through RegOut1E and RegOut2E. Operand x14 is the result of the immediately preceding instruction (addi), so the value forwarded from the memory access stage is selected. Operand x15 is the result of the instruction before that (c.slli), so the value forwarded from the write-back stage is selected. AluContrlD selects addition, and the two operands are added. Key control signals:
- Forward1E: Select the value pushed forward in the memory access stage as the register read value
- Forward2E: Select the value pushed forward in the writeback stage as the register read value
- AluSrc1E, AluSrc2E: both select the value of the register
- AluContrlD: ADDOP signal, ALU performs addition operation
In the memory access stage, since there is no need to access memory, the MemWriteM signal is in a non-enabled state. The data written is the result of the ALU operation, not the value of PC+4. The key signals are:
- LoadNPCM: Disabled, selects the ALU operation result to be passed to the writeback stage
In the write-back stage, MemToRegW is not enabled, so the selector chooses the ALU result and drives it onto the WD3 port of the register file. A3 of the register file holds the address of the destination register (0xF). RegWriteW, which has been carried down the pipeline to the register file, enables the write, and the result of the add instruction is written to register x15. Key signals:
- RegWriteW: enable state
- MemToRegW: Disabled state, select ALU operation result
The data path of the entire process is:
b. Execution process of the bge x15, x14, -56 instruction
This instruction and the previous two instructions are:
101b4: fec42703 lw x14 -20 x8
101b8: 06300793 addi x15 x0 99
101bc: fce7d4e3 bge x15 x14 -56
The key signals in the instruction fetch stage are:
- JalrE, JalE, BrE: all not enabled, because there are no branches or jumps among the preceding instructions.
In the decode stage, A1 is 0xF, which selects register x15, and A2 is 0xE, which selects register x14. The values of these two registers are read from the register file and passed to the EX stage. The branch target PC-56 is calculated, passed to the EX stage, and also passed to the NPC as the branch target address. At this point the branch prediction connected to PCF decides whether to jump. If a jump is predicted, the target is written to the NPC.
In the execute stage, x15 and x14 have both been updated by the two preceding instructions, so the values forwarded from the memory access and write-back stages are selected instead of the values read from the register file. Both operands take the register path, and the branch selector compares the two values to decide whether a jump is required. The key signals are:
- Forward1E: Select the value pushed forward in the memory access stage as the register read value
- Forward2E: Select the value pushed forward in the writeback stage as the register read value
- AluSrc1E, AluSrc2E: both select the value of the register
- BrType, BrE: bge type branch, compare the values of the two operands, if op1>=op2, BrE is enabled, the jump should be made, otherwise it should not jump
The outcome depends on the prediction. If a jump is predicted and the branch condition is met, the jump happens: the pipeline is flushed (the Hazard unit generates the flush signal), instruction fetch continues from the jump target, and the instruction that sequentially follows bge is invalidated. If a jump is predicted but the branch condition is not met, the prediction was wrong and the correct instruction must be fetched again in the next cycle. If no jump is predicted and the condition is not met, execution continues normally. If no jump is predicted but the condition is met, the prediction was wrong and the jump target must be written to the NPC in the EX stage. The rest of this walkthrough assumes the last case: no jump is predicted, the prediction is wrong, and the NPC is written in the EX stage.
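The four prediction outcomes can be summarized in a small Python sketch. It only captures the decision made when the branch is evaluated in EX; the function name `resolve_branch` is hypothetical, and the fall-through address assumes a 4-byte branch instruction such as the bge in this example.

```python
def resolve_branch(predicted_taken, actually_taken, branch_pc, target):
    """Return (mispredicted, address that fetch must continue from)."""
    mispredicted = predicted_taken != actually_taken
    # The correct path is the target if the condition holds,
    # otherwise the instruction that sequentially follows the branch.
    correct_pc = target if actually_taken else branch_pc + 4
    return mispredicted, correct_pc


# The case assumed in this walkthrough: predicted not taken, actually taken.
print(resolve_branch(False, True, 0x101BC, 0x10184))
```

When `mispredicted` is true, the instructions fetched along the wrong path have to be cancelled and `correct_pc` written to the NPC.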
In the memory access stage, there is no memory access, so the MemWriteM signal is not enabled. This instruction also does not write a register. The key signals are:
- LoadNPCM: nothing is written, so the selection has no effect. It is treated as not enabled by default, and the ALU result is passed on.
In the write-back stage, no data needs to be written, so RegWriteW is not enabled. Key signals:
- RegWriteW: not enabled, no write
- MemToRegW: no effect
The data path of the entire process is (no jump is predicted but the prediction is wrong, the jump address is written in the EX stage):
c. Execution process of the lw x15 -20 x8 instruction
This instruction and the previous two instructions are:
101a4: 97b6 c.add x15 x13
101a6: e6e7a623 sw x14 -404 x15 #A[i] = A[i-1]+1000
101aa: fec42783 lw x15 -20 x8
The key signals in the instruction fetch stage are:
- JalrE, JalE, BrE: all not enabled, because there are no branches or jumps among the preceding instructions.
In the decode stage, A1 is 0x8, which selects register x8. Its value is read from the register file and passed to the EX stage. The other operand is an immediate, which the immediate unit passes to the EX stage. There are no key control signals in this stage.
In the execution phase, operand 1 selects the value of x8 read from the register, operand 2 selects the value of the immediate value, the operation type is addition, and the result is passed to the memory access phase. Key signals:
- Forward1E: Select the value read from the register
- Forward2E: No impact
- AluSrc1E, AluSrc2E: OP1 selects the register value, OP2 selects the immediate value
- AluContrl: ADDOP signal, addition operation
In the memory access stage, the result of the ALU operation is used as an address to read the value from the memory and pass it to the write-back stage. Key signals:
- LoadNPCM: no impact, reads data from memory
- MemWriteM: Disabled, read data
In the write-back stage, the memory access result is passed to the register file and written to register x15. Key signals:
- RegWriteW: enable status, write to register
- MemToRegW: Enabled state, select the memory access result
The data path of the entire process is:
d. Execution process of the sw x15 -20 x8 instruction
This instruction and the previous two instructions are:
101aa: fec42783 lw x15 -20 x8
101ae: 0785 c.addi x15 1
101b0: fef42623 sw x15 -20 x8
The key signals in the instruction fetch stage are:
- JalrE, JalE, BrE: all not enabled, because there are no branches or jumps among the preceding instructions.
In the decode stage, A1 is 0x8, so register x8 is read, and A2 is 0xF, so register x15 is read. The other operand is an immediate, which is read and passed on in the same way as for the lw instruction. There are no key control signals.
In the execute stage, the store address must be calculated. The two preceding instructions do not write x8, so the value of x8 read in the decode stage and the immediate are selected as the ALU operands. The key signals are:
- Forward1E: Select the value read from the register
- Forward2E: No impact
- AluSrc1E, AluSrc2E: OP1 selects the register value, OP2 selects the immediate value
- AluContrl: ADDOP signal, addition operation
In the memory access stage, it is necessary to write to the memory. The value written is the data written back by the previous instruction, and the address is the result of the ALU calculation. Key signals:
- LoadNPCM: no impact
- MemWriteM: write enable, write data
The writeback phase does not require writing to the register. Key signals:
- RegWriteW: non-enabled state, no need to write to the register
Its data path is:
The blue path is the result of the previous instruction, now in the write-back stage, which is passed to StoreDataM as the write data. The data source of StoreDataM is not shown in the original circuit diagram. It should work like operand selection in the execute stage: either the value read from the register file or the result of an earlier instruction can be chosen.
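The key signal values from the four walkthroughs above can be collected in one place. The Python table below is only a summary sketch: the dictionary `CONTROL` and the helper `instructions_where` are hypothetical names, `None` marks a signal the walkthrough describes as having no effect, and the `MemToReg` entry for sw is an assumption because the text does not mention it.

```python
# Signal values as described in walkthroughs a to d.
# None marks a "no effect" (don't-care) signal.
CONTROL = {
    "add": {"RegWrite": True, "MemToReg": False, "MemWrite": False, "LoadNpc": False},
    "bge": {"RegWrite": False, "MemToReg": None, "MemWrite": False, "LoadNpc": None},
    "lw": {"RegWrite": True, "MemToReg": True, "MemWrite": False, "LoadNpc": None},
    # MemToReg for sw is assumed to be a don't-care (not stated in the text).
    "sw": {"RegWrite": False, "MemToReg": None, "MemWrite": True, "LoadNpc": None},
}


def instructions_where(signal, value=True):
    """Names of the instructions whose given signal has the given value."""
    return sorted(name for name, sig in CONTROL.items() if sig[signal] is value)


print(instructions_where("RegWrite"))  # instructions that write a register
print(instructions_where("MemWrite"))  # instructions that write memory
```

Only add and lw write the register file, and only sw writes memory, which is why RegWriteW and MemWriteM are never enabled for the same instruction in these examples.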
4. The role of the BranchE signal
The BranchE signal indicates whether the branch condition holds. For a branch instruction, the values of rs1 and rs2 are compared in the execute stage according to the branch type to decide whether a jump is needed. If so, the BrE signal is enabled and the jump target address should be written to the NPC Generator. With dynamic branch prediction, this address may already have been written in the ID stage, and the EX stage only verifies that the prediction was correct. If the prediction was wrong, the pipeline must be flushed and the correct instruction fetched again.
If the prediction is correct and a jump occurs, the instruction after the jump instruction is invalid, so the pipeline must be flushed and that instruction cancelled. The BrE signal is therefore also passed to the Hazard unit. When a jump is required, the Hazard unit receives the jump signal and sends a flush signal to the pipeline registers between the stages, which cancels the instructions in flight. The instruction at the jump target address is then fetched and execution continues from there.
Predicted taken, prediction correct
Instruction | CLK1 | CLK2 | CLK3 | CLK4 | CLK5 | CLK6 | CLK7 |
---|---|---|---|---|---|---|---|
pipeline i1 | IF (NPC=NPC+4) | ID (branch prediction, branch address calculation) | EX (calculate branch condition) | MEM | WB | | |
pipeline i2 | | IF (update NPC based on prediction result: NPC+4 or branch target) | ID | flush | flush | flush | |
pipeline i3 | | | IF (NPC=PC+4) | ID (branch target) | EX | MEM | WB |
Predicted not taken, prediction correct
Instruction | CLK1 | CLK2 | CLK3 | CLK4 | CLK5 | CLK6 | CLK7 |
---|---|---|---|---|---|---|---|
pipeline i1 | IF (NPC=NPC+4) | ID (branch prediction, branch address calculation) | EX (calculate branch condition) | MEM | WB | | |
pipeline i2 | | IF (update NPC based on prediction result: NPC+4 or branch target) | ID | EX | MEM | WB | |
pipeline i3 | | | IF (NPC=NPC+4) | ID (LAST NPC) | EX | MEM | WB |
5. NPC Generator jump target selection
The NPC Generator has four candidate targets:
- PC+4: execute the next instruction by default
- BrT: branch jump address
- JalrT: unconditional jump address, the target address is the value of the register, set the lowest bit to 0, and is calculated during the execution phase
- JalT: unconditional jump address, the target address is calculated in the decoding stage, which is PC+Imm
There are three corresponding enable signals plus a branch prediction signal. Whenever a branch or jump enable, or a predicted-jump signal, is asserted, PC+4 is not used as the next address. The branch outcome and the Jalr target address are not available until the execute stage. The Jal target address is calculated in the decode stage, and its control signal is also passed to the NPC Generator in the decode stage. At that moment, a branch or Jalr instruction in the execute stage may also be determined to jump. The instruction in the execute stage is earlier in program order, so the jump must follow the target address from the execute stage. Therefore, when their enable signals are asserted, BrT and JalrT take precedence over JalT.
instruction | |||
---|---|---|---|
BrT | IF | ID | EX: BrE true |
Jal | | IF | ID: JalE true |
Here BrT covers the case where the branch was predicted not taken or the prediction was wrong, which is the latest point at which the jump is determined. If a jump was predicted correctly, the jump is already made in the ID stage and there is no conflict.
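The priority among the four targets can be sketched as a simple chain of checks. This Python model is an illustration only: the function name `next_pc` and its parameter names are hypothetical, and the branch prediction input is left out for brevity.

```python
def next_pc(pc, br_e=False, jalr_e=False, jal_d=False,
            br_target=0, jalr_target=0, jal_target=0):
    """Choose the next PC; execute-stage jumps win over the decode-stage Jal."""
    # BrE and JalrE both come from the instruction in EX, which is
    # earlier in program order than the Jal sitting in ID.
    if br_e:
        return br_target
    if jalr_e:
        return jalr_target
    if jal_d:
        return jal_target
    return pc + 4  # default: the sequentially next instruction


# A taken branch in EX and a Jal in ID in the same cycle: the branch wins.
print(hex(next_pc(0x100, br_e=True, jal_d=True,
                  br_target=0x200, jal_target=0x300)))
```

Since a branch and a Jalr cannot occupy the execute stage at the same time, the order of the first two checks does not matter; only their position ahead of the Jal check does.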
6. Hazard unit (additional discussion questions)
Hazard handling
There are three types of hazards in the pipeline:
- Structural hazards: caused by resource conflicts
- Data hazards: caused by nearby instructions reading and writing the same data object
- Control hazards: caused by branch and jump instructions modifying the NPC
Using separate instruction and data memories, and performing register reads and writes in different halves of a clock cycle, avoids most structural hazards, but some can remain. When an instruction cache miss occurs, the next instruction cannot enter the fetch stage. When a data cache miss occurs, the next instruction cannot enter the memory access stage. Therefore, when the Hazard unit receives DCacheMiss or ICacheMiss, it must stall the pipeline, possibly for multiple cycles.
There are three types of data hazards: RAW, WAR, and WAW. WAW and WAR do not occur in an in-order scalar pipeline, so the focus is on RAW (read-after-write) hazards. A RAW hazard occurs when a source operand of the current instruction is the result of an earlier instruction that has not yet been written back, as in the following case:
c.slli x15 2
addi x14 x8 -16
c.add x15 x14
The updated values of x15 and x14 have not yet been written back when the add instruction needs them in the execute stage. This hazard is resolved by data forwarding. When the Hazard unit sees RegWrite and RegRead enabled at the same time, with the source register equal to the destination register being written, it selects the forwarded result through the Forward signal.
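The forwarding decision can be sketched as follows. This is a minimal Python model under stated assumptions: the function name `forward_select`, its parameters, and the numeric encodings of the three choices are all hypothetical, and writes to x0 are ignored because x0 cannot be written.

```python
NO_FORWARD, FROM_MEM, FROM_WB = 0, 1, 2  # hypothetical encodings


def forward_select(src_reg, reg_read, rd_m, reg_write_m, rd_w, reg_write_w):
    """Pick the source of one EX-stage operand."""
    if not reg_read or src_reg == 0:
        return NO_FORWARD
    # The instruction in MEM is the more recent writer, so it is checked first.
    if reg_write_m and rd_m == src_reg:
        return FROM_MEM
    if reg_write_w and rd_w == src_reg:
        return FROM_WB
    return NO_FORWARD


# c.add x15 x14 in EX, addi x14 in MEM, c.slli x15 in WB:
print(forward_select(14, True, 14, True, 15, True))  # x14 comes from MEM
print(forward_select(15, True, 14, True, 15, True))  # x15 comes from WB
```

Checking the memory access stage before the write-back stage matters when both earlier instructions write the same register, because the newer value must win.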
There are also RAW conflicts that forward push cannot handle, as follows:
lw x15 -20 x8
addi x14 x15 -16
The lw instruction only obtains the new value of x15 after the memory access stage, but by then the addi instruction is already in the execute stage and needs this value. In this case the pipeline has to be stalled. When the Hazard unit sees that MemToRegE is enabled and that the rs register is the same as the rd register, it detects the hazard and sends the stall signal, which holds the execute stage and all earlier stages while the memory access and write-back stages continue. A stall of one cycle is enough: the write-back result can then be forwarded to the execute stage and the pipeline resumes.
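The load-use check can be sketched as a comparison between the load's destination register and the source registers of the instruction right behind it. This Python model is an assumption-laden illustration: the function name `load_use_stall` and its parameters are hypothetical, and which pipeline-register copies of the signals are compared depends on the actual design.

```python
def load_use_stall(load_mem_to_reg, load_rd, rs1, rs2,
                   reads_rs1=True, reads_rs2=True):
    """True if the instruction after a load needs the loaded value too early."""
    if not load_mem_to_reg or load_rd == 0:
        return False  # not a load, or a load into x0
    return (reads_rs1 and rs1 == load_rd) or (reads_rs2 and rs2 == load_rd)


# lw x15 -20 x8 followed by addi x14 x15 -16 (addi reads only rs1).
print(load_use_stall(True, 15, 15, 0, reads_rs2=False))
```

The `reads_rs1` and `reads_rs2` flags play the role of the RegRead signal: an instruction with an immediate operand should not cause a stall just because its unused rs2 field happens to match.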
Control hazards are caused by branch and jump instructions. When one of BrE, JalE, and JalrE is enabled, the Hazard unit detects the jump and sends a flush signal that flushes the stages before the execute stage, while the memory access and write-back stages keep working and complete the earlier instructions.
Predict not taken
With the predict-not-taken scheme, the instructions after the branch are fetched and executed as usual. When a jump turns out to be required, the two instructions fetched after the branch are invalid and the jump target is written to the NPC. The flush signal is asserted for IF, ID, and EX to flush the pipeline, and there is no need to stall. In the next cycle, the instruction at the new address is fetched and execution continues.
instruction | ||||||
---|---|---|---|---|---|---|
BrT | IF | ID | EX: BrE true -> flush | | | |
Jal | | IF | ID: JalE true | | | |
add | | | IF | | | |
… | | | | | | |
BrTnext | | | | IF | ID | EX… |