Computer Architecture Experiment 1

1. Experimental purpose

Understand the data flow and control signals of RISC-V instruction execution, and be familiar with the working process of the instruction pipeline.

2. Experimental process

1. RISC-V related instructions

The simulator of the experiment uses the RISC-V instruction set. In order to facilitate subsequent analysis, first learn the RISC-V instructions used in the experiment.

Base RISC-V uses 32-bit fixed-length instructions. The standard RISC-V encoding scheme also supports variable-length instructions, including 16-bit instructions. These are the standard compressed instructions, and the extension is named C. They encode common operations and reduce both static and dynamic code size. All instructions below of the form "c.inst" belong to this group.

The assembly code in the experiment mainly includes the following types of instructions:

Control transfer instructions: c.j, c.jr, bge
Arithmetic and logical operation instructions: c.slli, c.addi
Integer generation instructions: c.li
Memory access instructions: lw, sw

Control transfer instructions

The c.j instruction is an unconditional jump. It is PC-relative: the offset is added to the PC to form the jump target address. It is equivalent to a jal instruction, which writes the value of PC+4 to the rd register when it jumps. For c.j, rd = x0, and x0 cannot be written, so the instruction only performs the jump. The c.jr instruction is an unconditional jump that takes the value of rs1 and sets its lowest bit to 0 to form the jump address. It is equivalent to a jalr instruction, which also writes the value of PC+4 to rd, but here rd is again x0. The bge instruction compares the values of rs1 and rs2 as signed numbers and jumps to the target address if rs1 is greater than or equal to rs2.

Insert image description here

c.j instruction

Insert image description here

c.jr instruction

Insert image description here

bge instruction
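The behaviour of the three control transfer instructions described above can be sketched in Python. This is only an illustration under stated assumptions, not the simulator's implementation: the helper names (to_signed32, jal_target, jalr_target, bge_taken) are introduced here for the example, and the sample addresses are the ones that appear in the loop code of this experiment.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def jal_target(pc, offset):
    """c.j / jal: PC-relative target, PC + offset."""
    return (pc + offset) & MASK32

def jalr_target(rs1_value, offset=0):
    """c.jr / jalr: rs1 + offset with the lowest bit cleared."""
    return (rs1_value + offset) & MASK32 & ~1

def bge_taken(rs1_value, rs2_value):
    """bge: branch when rs1 >= rs2, both compared as signed numbers."""
    return to_signed32(rs1_value) >= to_signed32(rs2_value)

# c.j 32 located at 0x10150, and bge ... -56 located at 0x101bc
print(hex(jal_target(0x10150, 32)))
print(hex(jal_target(0x101bc, -56)))
# 0xFFFFFFFF is -1 when read as a signed number, so 99 >= -1 holds
print(bge_taken(99, 0xFFFFFFFF))
```

The signed conversion matters for bge: an unsigned comparison would treat 0xFFFFFFFF as a very large value and give the opposite answer.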

Arithmetic and logical operation instructions

The c.slli instruction performs a logical left shift on the value in rd and writes the result back to rd. The shamt field in the instruction is the shift amount. c.addi, c.add, addi and similar instructions are all addition instructions; only the operands differ.

Insert image description here

c.slli

Integer generation instructions

The c.li instruction writes the sign-extended 6-bit immediate value to register rd, and is only valid when rd!=x0.
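Sign extension of the 6-bit immediate can be illustrated with a short Python sketch. The function names and the list-based register file are assumptions made for this example only.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of `value` to a Python int."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def c_li(regs, rd, imm6):
    """c.li rd, imm: write the sign-extended 6-bit immediate to rd (rd != x0)."""
    if rd == 0:
        raise ValueError("c.li is only valid when rd != x0")
    regs[rd] = sign_extend(imm6, 6)

regs = [0] * 32            # hypothetical register file x0..x31
c_li(regs, 15, 0b000001)   # c.li x15 1, as used at address 1017c
c_li(regs, 14, 0b111111)   # all six bits set encodes -1
print(regs[15], regs[14])
```

With 6 bits, the representable range is -32 to 31, which is why larger constants such as 99 are loaded with addi in the loop code.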

Insert image description here

Memory access instructions

The lw instruction reads the value at address rs1+offset from memory and stores it in the rd register. The sw instruction writes the value of rs2 to memory at address rs1+offset.

Insert image description here

2. Assembly code analysis

By disassembling the code given in the experiment, we can find the assembly instructions of the two loops.

The part of the first loop is as follows:

1014c:        fe042623        sw x0 -20 x8        # i is stored at x8-20; i = 0
10150:        a005            c.j 32              # -> 10170 (loop condition)
10152:        fec42783        lw x15 -20 x8       # x15 = i
10156:        078a            c.slli x15 2        # x15 = 4i
10158:        ff040713        addi x14 x8 -16     # x14 = x8 - 16
1015c:        97ba            c.add x15 x14       # x15 = x8 - 16 + 4i
1015e:        fec42703        lw x14 -20 x8       # x14 = i
10162:        e6e7a623        sw x14 -404 x15     # A[i] = i
# i++
10166:        fec42783        lw x15 -20 x8       # x15 = i
1016a:        0785            c.addi x15 1        # x15 = i + 1
1016c:        fef42623        sw x15 -20 x8       # i = i + 1
# if (i < 100)
10170:        fec42703        lw x14 -20 x8       # x14 = i
10174:        06300793        addi x15 x0 99      # x15 = 99
10178:        fce7dde3        bge x15 x14 -38     # 99 >= i, i.e. i < 100 -> 10152

The part of the second loop is as follows:

1017c:        4785            c.li x15 1          # x15 = 1
1017e:        fef42623        sw x15 -20 x8       # i is stored at x8-20; i = 1
10182:        a80d            c.j 50              # -> 101b4 (loop condition)
10184:        fec42783        lw x15 -20 x8       # x15 = i
10188:        17fd            c.addi x15 -1       # x15 = i - 1
1018a:        078a            c.slli x15 2        # x15 = 4(i - 1)
1018c:        ff040713        addi x14 x8 -16     # x14 = x8 - 16
10190:        97ba            c.add x15 x14       # x15 = x8 - 16 + 4(i - 1)
10192:        e6c7a783        lw x15 -404 x15     # x15 = A[i-1]
10196:        3e878713        addi x14 x15 1000   # x14 = A[i-1] + 1000
1019a:        fec42783        lw x15 -20 x8       # x15 = i
1019e:        078a            c.slli x15 2        # x15 = 4i
101a0:        ff040693        addi x13 x8 -16     # x13 = x8 - 16
101a4:        97b6            c.add x15 x13       # x15 = x8 - 16 + 4i
101a6:        e6e7a623        sw x14 -404 x15     # A[i] = A[i-1] + 1000
# i++
101aa:        fec42783        lw x15 -20 x8       # x15 = i
101ae:        0785            c.addi x15 1        # x15 = i + 1
101b0:        fef42623        sw x15 -20 x8       # i = i + 1
# if (i < 100)
101b4:        fec42703        lw x14 -20 x8       # x14 = i
101b8:        06300793        addi x15 x0 99      # x15 = 99
101bc:        fce7d4e3        bge x15 x14 -56     # 99 >= i, i.e. i < 100 -> 10184
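Read together, the two loops behave roughly like the following Python sketch. It is a reconstruction from the disassembly, not the original source of the experiment: the function name run_loops and the array length of 100 are assumptions inferred from the constant 99 in the loop conditions.

```python
def run_loops(n=100):
    """Mimic the two loops; n = 100 is inferred from the constant 99."""
    a = [0] * n

    # First loop: i starts at 0, branch back while 99 >= i
    i = 0
    while (n - 1) >= i:
        a[i] = i                   # sw x14 -404 x15
        i += 1

    # Second loop: i starts at 1 (c.li x15 1), branch back while 99 >= i
    i = 1
    while (n - 1) >= i:
        a[i] = a[i - 1] + 1000     # addi x14 x15 1000, then sw
        i += 1
    return a

A = run_loops()
print(A[0], A[1], A[99])
```

Because the second loop overwrites every element except A[0], each A[i] ends up depending only on A[i-1], which is what creates the chain of loads and stores analysed in the following sections.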

3. RISC-V circuit analysis

The circuit design diagram of RISC-V is as follows:

Insert image description here

Instruction fetch stage

The circuit is analyzed stage by stage, starting with the instruction fetch stage. The NPC Generator in the fetch stage generates the address of the next instruction. The candidate addresses come from the jal target calculation, the jalr target calculation, and the branch target calculation, and the address of the next instruction is selected by the select signals from the control unit and the branch selector. The PC value generated by the NPC Generator is used as the address to fetch the instruction from the instruction memory, and the instruction is passed to the decoding stage. The PCF of the NPC Generator is connected to branch prediction. For conditional branches such as bge, if a jump is predicted, there is no need to wait until the execution stage to evaluate the condition. The jump address can be calculated directly in the ID stage and written to the NPC.

Insert image description here

Decoding stage

The decoding stage generates control signals and reads operands. Its main units are the register group and the immediate unit. RegWriteW, WD3 and A3 of the register group are write-back stage signals; they are passed from the decoding stage down to the write-back stage. The immediate unit extends the immediate value and passes it to the execution stage. The decoding stage can also directly calculate the jal jump target address and pass it to the NPC Generator. For the unconditional jump jal, the control unit can issue the control signal JalD at this stage, and the jump is made.

Insert image description here

Execution phase

The execution phase completes three tasks: selecting operands and completing ALU operations; branch selection; and calculation of the target address of the Jalr instruction. The sources of operands include register values read in the ID stage, data forward results in the EX stage and MEM stage, and immediate numbers. All control signals are generated by the control unit in the decoding stage and then passed to various components in the execution stage.

The branch selector compares the values of the two registers according to the branch type (the values may be those read from the registers or those forwarded from the execution and memory access results), decides whether to take the branch, and sends the branch signal to the NPC Generator. Since branch selection is carried out in this stage, the branch target address is also passed to the NPC Generator at this stage. It is the BrNPC output of EX, and its value is computed from the immediate obtained in the decoding stage. If dynamic prediction did not predict a jump but the branch condition turns out to be met, the prediction is wrong, and the jump target address is written to the NPC in the EX stage.

Insert image description here

The ALU selects each operand from either the forwarded data or the register value read in the decoding stage. The Forward control signal is issued by the Harzard unit. If the destination register written by one of the two preceding instructions is the same as a source register of the current instruction, the result from the memory access or write-back stage is selected. The AluSrc signals select the operand from the register, the PC, or the immediate, and AluContrl gives the operation type. For a jalr instruction, the lowest bit of the value of rs1 is set to 0 and the result is passed to the NPC Generator through AluOut to perform the jump.

Insert image description here

Memory access phase

The main work of the memory access phase is to read and write the memory, and also complete the transfer of the results of the execution phase. The write enable signal is passed from the decoding stage.

Insert image description here

For arithmetic instructions, the result of the execution phase is passed directly to the write-back phase and written into the rd register. For instructions such as jal and jalr, PC+4 is written into the rd register. The LoadNpc control signal, which the control unit generates during decoding, chooses between the two.

Insert image description here

Write-back phase

The write-back stage selects between the memory access result and the ALU operation/PC+4 result, and transmits it to the register group. MemToReg is the selection signal, which is generated by the control unit and passed to the write-back stage.
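The two selections described for the memory access and write-back phases amount to a pair of multiplexers. The following Python sketch shows the idea; the function and parameter names are illustrative assumptions and do not correspond to module names in the circuit.

```python
def mem_stage_result(alu_out, pc_plus_4, load_npc):
    """MEM stage: LoadNpc chooses PC+4 (jal/jalr link value) over the ALU result."""
    return pc_plus_4 if load_npc else alu_out

def write_back_data(mem_data, result, mem_to_reg):
    """WB stage: MemToReg chooses the memory read data over the ALU/PC+4 result."""
    return mem_data if mem_to_reg else result

# Arithmetic instruction: ALU result flows through both selectors
add_value = write_back_data(mem_data=0,
                            result=mem_stage_result(42, 0x10194, load_npc=False),
                            mem_to_reg=False)
# Load instruction: the memory read data is written back instead
lw_value = write_back_data(mem_data=7,
                           result=mem_stage_result(0x2000, 0x101ae, load_npc=False),
                           mem_to_reg=True)
print(add_value, lw_value)
```

The numeric values are arbitrary placeholders; only the selection logic is meant to reflect the text.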

Insert image description here

Control unit

The control unit is as follows:

Insert image description here

The functions of some of these signals are as follows:

RegWriteD: Write enable signal of register group
MemToRegD: Whether the data written to the register group comes from memory
LoadNpcD: Whether to calculate PC+4 and store it in the rd register
RegReadD: Whether to read the register and pass it to the Harzard unit
BranchTypeD: Indicates the branch type
AluSrc1D, AluSrc2D: The source selection signals of the ALU operands. The optional sources of AluSrc1D include the register value and the PC value; the optional sources of AluSrc2D include the register value and the immediate data.
AluContrlD: ALU specific operation selection signal
ImmType: Type of immediate data (length)
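As a compact summary, the control unit can be thought of as a lookup from instruction to a bundle of signal values. The Python sketch below records the values that the instruction walkthroughs in this report state for add, lw, sw and bge. The class name, field names and dictionary are assumptions for illustration; the real control unit decodes opcode fields and produces more signals than are listed here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ControlSignals:
    reg_write: bool     # RegWrite: write enable of the register group
    mem_to_reg: bool    # MemToReg: write-back data comes from memory
    load_npc: bool      # LoadNpc: PC+4 is written to rd
    mem_write: bool     # MemWrite: data memory write enable
    alu_src2_imm: bool  # AluSrc2: second ALU operand is the immediate
    is_branch: bool     # BranchType indicates a branch

CONTROL = {
    "add": ControlSignals(reg_write=True, mem_to_reg=False, load_npc=False,
                          mem_write=False, alu_src2_imm=False, is_branch=False),
    "lw": ControlSignals(reg_write=True, mem_to_reg=True, load_npc=False,
                         mem_write=False, alu_src2_imm=True, is_branch=False),
    "sw": ControlSignals(reg_write=False, mem_to_reg=False, load_npc=False,
                         mem_write=True, alu_src2_imm=True, is_branch=False),
    "bge": ControlSignals(reg_write=False, mem_to_reg=False, load_npc=False,
                          mem_write=False, alu_src2_imm=False, is_branch=True),
}

for name, signals in CONTROL.items():
    print(name, signals)
```

Signals described as "no effect" for an instruction (for example MemToReg for sw) are shown as False here purely as a default.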

4. Execution process of instructions

a. Execution process of add x15, x14, x15 (c.add x15 x14)
The instruction is located at:

10188:        17fd            c.addi x15 -1
1018a:        078a            c.slli x15 2
1018c:        ff040713        addi x14 x8 -16
10190:        97ba            c.add x15 x14

In the instruction fetch stage, the PC value is 10190, the instruction at this address is read, and the instruction is transferred to the ID stage. Key control signals:

JalrE, JalE, BrE: all disabled, because there are no branches or jumps among the preceding instructions.

In the decoding stage, the value of A1 is 0xE, which represents the x14 register, and the value of A2 is 0xF, which represents the x15 register. Read the values of these two registers from the register group and pass the values to the EX stage. At the same time, the control signal corresponding to the add instruction is also passed to the EX stage. At this time, the write control signal of the register group is the write signal of the previous instruction. For the add instruction, there are no important control signals at this stage.

In the execution phase, the register values enter the selectors from RegOut1E and RegOut2E. Since operand x14 is the result of the previous instruction (addi), the value forwarded from the memory access phase is selected. Operand x15 is the result of the instruction before that (c.slli), so the value forwarded from the write-back phase is selected. AluContrl selects the addition operation, and the two operands are added. Key control signals:

Forward1E: Select the value pushed forward in the memory access stage as the register read value
Forward2E: Select the value pushed forward in the writeback stage as the register read value
AluSrc1E, AluSrc2E: both select the value of the register
AluContrlD: ADDOP signal, ALU performs addition operation

In the memory access stage, since there is no need to access memory, the MemWriteM signal is in a non-enabled state. The data written is the result of the ALU operation, not the value of PC+4. The key signals are:

LoadNPCM: Disabled, selects the ALU operation result to be passed to the writeback stage

In the write-back phase, MemToRegW is not enabled. The selector selects the result of the ALU operation and outputs it to the WD3 port of the register group. A3 of the register group is the address (0xF) of the destination register. The RegWriteW signal is also passed down to the register group as the write enable, so the result of the add instruction is written to the x15 register. Key signals:

RegWriteW: enable state
MemToRegW: Disabled state, select ALU operation result

The data path of the entire process is:

Insert image description here

b. Execution process of bge x15, x14, -56

This instruction and the previous two instructions are:

101b4:        fec42703        lw x14 -20 x8
101b8:        06300793        addi x15 x0 99
101bc:        fce7d4e3        bge x15 x14 -56

The key signals in the instruction fetch stage are:

JalrE, JalE, BrE: all disabled, because there are no branches or jumps among the preceding instructions.

In the decoding stage, the value of A1 is 0xF, which represents the x15 register, and the value of A2 is 0xE, which represents the x14 register. The values of these two registers are read from the register group and passed to the EX stage. The value of PC-56 is also calculated, passed to the EX stage, and passed to the NPC Generator as the branch target address. At this time, the branch prediction connected to PCF predicts whether to jump. If a jump is predicted, the target is written to the NPC.

In the execution phase, the values of x14 and x15 have just been updated by the two preceding instructions, so the values forwarded from the memory access and write-back phases are selected. Both operands select register values, and the branch selector compares the two values to determine whether a jump is required. The key signals are:

Forward1E: Select the value pushed forward in the memory access stage as the register read value
Forward2E: Select the value pushed forward in the writeback stage as the register read value
AluSrc1E, AluSrc2E: both select the value of the register
BrType, BrE: bge type branch, compare the values of the two operands, if op1>=op2, BrE is enabled, the jump should be made, otherwise it should not jump

If a jump is predicted and the branch condition is met, the jump takes place: the pipeline is flushed (the Harzard unit generates a flush signal), instruction fetch continues from the jump target, and the instruction following bge is invalidated. If a jump is predicted but the branch condition is not met, the prediction is wrong and the instruction must be fetched again in the next cycle. If no jump is predicted and the condition is not met, execution continues normally. If no jump is predicted but the condition is met, the prediction is wrong and the jump target must be written to the NPC in the EX stage. The following assumes this last case: no jump is predicted, the prediction is wrong, and the NPC is written in the EX stage.
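The four prediction outcomes can be summarised as a small decision function. This Python sketch is an interpretation of the cases listed above, not the actual hardware logic; the function name, its return convention, and the fall-through address 0x101c0 (PC+4 of the bge at 0x101bc) are assumptions for the example.

```python
def resolve_in_ex(predicted_taken, condition_met, branch_target, fall_through):
    """Return (mispredicted, redirect_pc) as decided in the EX stage.

    redirect_pc is None when the fetch stream is already correct.
    """
    mispredicted = predicted_taken != condition_met
    if not mispredicted:
        redirect_pc = None            # prediction was right, nothing to fix
    elif condition_met:
        redirect_pc = branch_target   # predicted not taken, but branch is taken
    else:
        redirect_pc = fall_through    # predicted taken, but branch falls through
    return mispredicted, redirect_pc

TARGET, NEXT = 0x10184, 0x101C0   # bge at 0x101bc: target and PC+4
for predicted in (True, False):
    for taken in (True, False):
        print(predicted, taken, resolve_in_ex(predicted, taken, TARGET, NEXT))
```

The case followed in the rest of this walkthrough is predicted not taken with the condition met, where the redirect address is the branch target.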

In the memory access stage, there is no need to access memory, so the MemWriteM signal is disabled. This instruction does not write to a register. The key signals are:

LoadNPCM: nothing is written, so the selection has no effect. It is treated as disabled by default, and the ALU result is passed on.

In the write-back stage, there is no need to write data, so RegWriteW is not enabled. Key signals:

RegWriteW: disabled, no writing
MemToRegW: no effect

The data path of the entire process is (no jump is predicted but the prediction is wrong, the jump address is written in the EX stage):

Insert image description here

c. Execution process of lw x15, -20 x8

This instruction and the previous two instructions are:

101a4:        97b6            c.add x15 x13
101a6:        e6e7a623        sw x14 -404 x15	#A[i] = A[i-1]+1000
101aa:        fec42783        lw x15 -20 x8

The key signals in the instruction fetch stage are:

JalrE, JalE, BrE: all disabled, because there are no branches or jumps among the preceding instructions.

In the decoding stage, the value of A1 is 0x8, which represents the x8 register. The value of this register is read from the register group and passed to the EX stage. The other operand is an immediate value, which is passed to the EX stage through the immediate unit. There are no important control signals at this stage.

In the execution phase, operand 1 selects the value of x8 read from the register, operand 2 selects the value of the immediate value, the operation type is addition, and the result is passed to the memory access phase. Key signals:

Forward1E: Select the value read from the register
Forward2E: No impact
AluSrc1E, AluSrc2E: OP1 selects the register value, OP2 selects the immediate value
AluContrl: ADDOP signal, addition operation

In the memory access stage, the result of the ALU operation is used as an address to read the value from the memory and pass it to the write-back stage. Key signals:

LoadNPCM: no impact, reads data from memory
MemWriteM: Disabled, read data

In the write-back phase, the memory access result is transferred to the register group and written to register x15. Key signals:

RegWriteW: enable status, write to register
MemToRegW: Enabled state, select the memory access result

The data path of the entire process is:

Insert image description here

d. Execution process of sw x15, -20 x8

This instruction and the previous two instructions are:

101aa:        fec42783        lw x15 -20 x8
101ae:        0785            c.addi x15 1
101b0:        fef42623        sw x15 -20 x8

The key signals in the instruction fetch stage are:

JalrE, JalE, BrE: all disabled, because there are no branches or jumps among the preceding instructions.

In the decoding stage, the value of A1 is 0x8, so the x8 register is read, and the value of A2 is 0xF, so the x15 register is read. The other operand is an immediate, which is read and passed on in the same way as for the lw instruction. There are no important control signals.

In the execution phase, the storage address needs to be calculated. The first two instructions have not written to x8, so the value of x8 and the immediate value read in the decoding phase are selected as operands for operation. The key signals are:

Forward1E: Select the value read from the register
Forward2E: No impact
AluSrc1E, AluSrc2E: OP1 selects the register value, OP2 selects the immediate value
AluContrl: ADDOP signal, addition operation

In the memory access stage, memory must be written. The value written is the data written back by the previous instruction (c.addi x15 1), and the address is the result of the ALU calculation. Key signals:

LoadNPCM: no impact
MemWriteM: write enable, write data

The writeback phase does not require writing to the register. Key signals:

RegWriteW: non-enabled state, no need to write to the register

Its data path is:

Insert image description here

The blue path is the result of the previous instruction in the write-back stage, which is passed to StoreDataM as the write data. The data source of StoreDataM is not shown in the original circuit diagram. It should work in the same way as the operand selection in the execution stage: either the value read from the register or the result of a previous instruction can be selected.

4. The role of the BranchE signal

The BranchE signal indicates whether the branch condition is met. For branch instructions, the values of rs1 and rs2 are compared in the execution phase according to the branch type to determine whether a jump is needed. If a jump is needed, the BrE signal is enabled, and the NPC Generator should write the jump target address. With dynamic branch prediction, this address may already have been written in the ID stage, and the EX stage only verifies that the prediction is correct. If it is not, the pipeline must be flushed and the instruction fetched again.

If the prediction is correct and a jump occurs, the instruction after the jump instruction is invalid, so the pipeline needs to be flushed and the execution of that instruction cancelled. Therefore, the BrE signal is also passed to the Harzard unit. When a jump is required, the Harzard unit receives the jump signal and sends a flush signal to the pipeline registers, which flushes the pipeline, cancels the instruction being executed, and re-fetches the instruction at the jump target address to continue execution.

A jump is predicted and the prediction is correct

	CLK1	CLK2	CLK3	CLK4	CLK5	CLK6	CLK7
pipeline i1	IF(NPC=NPC+4)	ID (branch prediction, branch address calculation)	EX (calculate branch condition)	MEM	WB
pipeline i2		IF (update NPC based on prediction results=NPC+4/branch target)	ID	flush	flush	flush
pipeline i3			IF(NPC=PC+4)	ID(branch target)	EX	MEM	WB

No jump is predicted and the prediction is correct

	CLK1	CLK2	CLK3	CLK4	CLK5	CLK6	CLK7
pipeline i1	IF(NPC=NPC+4)	ID (branch prediction, branch address calculation)	EX (calculate branch condition)	MEM	WB
pipeline i2		IF (update NPC based on prediction results=NPC+4/branch target)	ID	EX	MEM	WB
pipeline i3			IF(NPC=NPC+4)	ID(LAST NPC)	EX	MEM	WB

5. NPC Generator jump target selection

The NPC Generator has four candidate targets:

PC+4: execute the next instruction by default
BrT: branch jump address
JalrT: unconditional jump address, the target address is the value of the register, set the lowest bit to 0, and is calculated during the execution phase
JalT: unconditional jump address, the target address is calculated in the decoding stage, which is PC+Imm

There are three corresponding enable signals and a branch prediction signal. Whenever a branch or jump enable signal or a predicted-jump signal is asserted, PC+4 is not used as the next address. The verification of the branch outcome and the target address of jalr are not available until the execution stage. The target address of the jal instruction is calculated in the decoding stage, and its control signal is also passed to the NPC Generator in the decoding stage. At that moment, a branch or jalr instruction in the execution stage may also be determined to jump. The instruction in the execution stage comes first in program order, so the jump should follow the target address from the execution stage. Therefore, when their enable signals are asserted, BrT and JalrT take precedence over JalT.

instruction
BrT	IF	ID	EX: BrE true
Jal		IF	ID: JalE true

Here BrT is the case where no jump was predicted or the prediction was wrong, which is the latest point at which the jump is determined. If a jump is predicted correctly, the jump is made in ID and there is no conflict.
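The selection priority described in this section can be sketched as follows. This is a simplified Python model under stated assumptions: the function and parameter names are hypothetical, the branch prediction input is left out, and the relative order of the branch and jalr checks is arbitrary because both belong to the single instruction in the execution stage and cannot be asserted together.

```python
def npc_generator(pc_plus_4, br_e, br_target, jalr_e, jalr_target,
                  jal, jal_target):
    """Select the next PC; EX-stage redirects win over the ID-stage jal."""
    if br_e:           # branch resolved as taken in EX
        return br_target
    if jalr_e:         # jalr target computed in EX
        return jalr_target
    if jal:            # jal target computed in ID
        return jal_target
    return pc_plus_4   # default: next sequential instruction

# A taken branch in EX and a jal in ID in the same cycle: the branch wins.
print(hex(npc_generator(0x101C4, True, 0x10184, False, 0, True, 0x10200)))
```

The address 0x10200 is an arbitrary placeholder for a jal target; only the priority order is meant to reflect the text.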

6. Harzard unit (additional thinking questions)

Conflict handling

There are three types of conflicts in the pipeline:

Structural conflict: caused by resource conflict
Data conflict: conflict caused by adjacent instructions reading and writing the same data object
Control conflicts: conflicts caused by branch and jump instructions modifying NPC

Using separate instruction and data memories, and performing register reads and writes in different halves of a clock cycle, avoids most structural conflicts, but some may still exist. When an instruction cache miss occurs, the next instruction cannot enter the instruction fetch stage. When a data cache miss occurs, the next instruction cannot enter the memory access phase. Therefore, when the Harzard unit receives DCacheMiss or ICacheMiss, the pipeline must be stalled, possibly for multiple cycles.

There are three types of data conflicts: RAW, WAR, and WAW. WAW and WAR will not appear in sequentially executed scalar processors. Focus on RAW read-after-write conflicts. A read-after-write conflict occurs when the source operand of the current instruction is the result of a previous instruction, such as the following situation:

c.slli x15 2
addi x14 x8 -16
c.add x15 x14

The updated values of x15 and x14 have not yet been written back when the add instruction needs them in the execution phase. This conflict is resolved by data forwarding. When the Harzard unit sees that the RegWrite and RegRead signals are enabled at the same time and that the source register is the same as the destination register being written, it selects the forwarded result through the Forward signal.

Insert image description here
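The forwarding decision can be sketched as a function that picks the data source for one source register of the instruction in EX. The Python below is an illustrative model, not the Harzard unit's actual logic: the names are hypothetical, and the check that x0 is never forwarded is an added assumption based on x0 being hard-wired to zero.

```python
def forward_select(rs, reg_read, rd_mem, reg_write_mem, rd_wb, reg_write_wb):
    """Return "MEM", "WB" or "REG": where EX should take the value of rs from."""
    if reg_read and rs != 0:
        # The instruction in MEM is the younger producer, so it is checked first
        if reg_write_mem and rd_mem == rs:
            return "MEM"
        if reg_write_wb and rd_wb == rs:
            return "WB"
    return "REG"

# c.add x15 x14 in EX, addi x14 ... in MEM, c.slli x15 2 in WB
print(forward_select(14, True, rd_mem=14, reg_write_mem=True,
                     rd_wb=15, reg_write_wb=True))
print(forward_select(15, True, rd_mem=14, reg_write_mem=True,
                     rd_wb=15, reg_write_wb=True))
```

Checking the memory access stage before the write-back stage ensures that, when both hold a pending write to the same register, the most recent value is used.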

There are also RAW conflicts that forwarding cannot handle, as follows:

lw x15 -20 x8
addi x14 x15 -16

The lw instruction only obtains the new value of x15 after the memory access stage, but by then the addi instruction is already in the execution stage and needs this value. In this case the pipeline can only be stalled. When the Harzard unit sees that MemToRegE is enabled and that the rs register and the rd register are the same register, it detects the conflict and sends a stall signal that suspends the execution phase and all earlier phases, while the memory access and write-back phases continue. Only one cycle of stall is needed. After that, the write-back result can be forwarded to the execution stage and the pipeline can continue to work.
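The load-use check described above can be sketched in Python. The function and parameter names are assumptions for this example, with the load placed in the execution stage and the dependent instruction in the decoding stage when the check is made, and an unused source register represented by None.

```python
def load_use_stall(mem_to_reg_e, rd_e, rs1_d, rs2_d=None):
    """True when a load in EX writes a register that the instruction in ID reads."""
    return bool(mem_to_reg_e and rd_e != 0 and rd_e in (rs1_d, rs2_d))

# lw x15 -20 x8 followed by addi x14 x15 -16: one stall cycle is needed
print(load_use_stall(True, rd_e=15, rs1_d=15))
# lw x15 -20 x8 followed by an instruction that only reads x8: no stall
print(load_use_stall(True, rd_e=15, rs1_d=8))
```

After the single stall cycle, the loaded value is available for forwarding, so no further stall is required for this dependency.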

Control conflicts are caused by branch and jump instructions. When one of BrE, JalE, and JalrE is enabled, the Harzard unit detects the branch or jump and sends a flush signal that flushes the stages before the execution phase, while the memory access and write-back phases continue to work and complete the earlier instructions.

prediction miss

If a predict-not-taken scheme is used, the following instructions are fetched and executed normally. When a jump turns out to be required, the two instructions fetched after the branch are invalid, and the jump target is written to the NPC. The flush signal is asserted for IF, ID and EX, and the pipeline is flushed. There is no need to stall the pipeline. In the next cycle, the instruction at the new address is fetched and execution continues.

instruction
BrT	IF	ID	EX: BrE true -> flush
Jal		IF	ID: JalE true
add			IF
…
BrTnext				IF	ID	EX…
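The number of instructions that must be discarded follows from the stage in which the redirect is decided. The short Python sketch below illustrates this relation for the five-stage pipeline; the stage list and function name are assumptions for the example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def wrong_path_instructions(resolve_stage):
    """Younger instructions already fetched when a redirect is decided in resolve_stage."""
    return STAGES.index(resolve_stage)

print(wrong_path_instructions("ID"))   # jal decided in ID
print(wrong_path_instructions("EX"))   # branch or jalr decided in EX
```

This matches the timeline above: a branch resolved in EX invalidates the two instructions behind it, while a jal resolved in ID invalidates only one.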