Verilog example - pipeline (Pipeline)

1. Introduction to the pipeline

Concept

Pipeline design divides a large, multi-level combinational logic circuit into several stages and inserts a register group at each stage to temporarily store the intermediate data.

A K-stage pipeline has exactly K register groups between the input and the output of the combinational logic (the logic is divided into K stages, each stage followed by a register group); the output of each stage is the input of the next stage, and the circuit contains no feedback paths.

In essence, a pipeline can be understood as a means of trading area for performance (Trade Area for Performance), or space for timing (Trade Space for Timing).

The performance improvement of a pipelined design comes at the cost of more register resources. Pipelining is the most common way to increase the processing speed and throughput of a combinational logic design.
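
As a minimal sketch of this idea (the module and signal names below are illustrative, not taken from any particular design), the combinational computation (a + b) * c is cut into two stages, with a register group holding the intermediate sum, plus a delayed copy of c, between them:

module pipe2_example
#(
    parameter W = 16
)
(
    input                 clk ,
    input                 rstn,
    input       [W-1:0]   a   ,
    input       [W-1:0]   b   ,
    input       [W-1:0]   c   ,
    output reg  [2*W:0]   y
);
    reg [W:0]   sum_r ;   // stage-1 register group: intermediate sum a + b
    reg [W-1:0] c_r   ;   // c delayed one cycle so both stage-2 operands stay aligned

    always @(posedge clk or negedge rstn) begin
        if(!rstn) begin
            sum_r <= 0 ;
            c_r   <= 0 ;
            y     <= 0 ;
        end
        else begin
            // stage 1: first half of the combinational logic
            sum_r <= a + b       ;
            c_r   <= c           ;
            // stage 2: second half, fed only by the stage-1 registers
            y     <= sum_r * c_r ;
        end
    end
endmodule

Compared with computing (a + b) * c in a single cycle, the critical path now contains only an adder or only a multiplier, so the clock can run faster, at the cost of the extra registers and a two-cycle latency.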

A brief introduction to the MIPS five-stage pipeline

The life cycle of an instruction in this pipeline is divided into the following steps.
(1) IF (Instruction Fetch)

  • Instruction fetching refers to the process of reading instructions from memory.

(2) Decoding ID (Instruction Decode)

  • Instruction decoding refers to the process of translating the instruction fetched from memory. After decoding, the indexes of the operand registers required by the instruction are obtained, and these indexes are used to read the operands from the general-purpose register file (Register File, Regfile).

(3) Execute EX (Instruction Execute)

  • After the instruction has been decoded, the type of computation to perform is known and the required operands have been read from the general-purpose register file; the next step is to execute the instruction. Instruction execution refers to the process of performing the instruction's actual computation. For example, if the instruction is an addition, the operands are added; if it is a subtraction, they are subtracted.
  • The most common component in the "execute" stage is the arithmetic logic unit (Arithmetic Logic Unit, ALU), a hardware functional unit that implements specific operations.

(4) Memory access MEM (Memory Access)

  • Memory access instructions are often among the most important instruction types in an instruction set. Memory access refers to the process by which a memory access instruction reads data from, or writes data into, the memory.

(5) Write back WB (Write-Back)

  • Write-back refers to the process of writing the result of instruction execution back to the general-purpose register file. For an ordinary arithmetic instruction, the result value comes from the computation in the "execute" stage; for a memory load instruction, the result comes from the data read from memory in the "memory access" stage. A minimal sketch of one of the register groups that separates adjacent stages in such a pipeline is given after this list.
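
The sketch below is illustrative only (it is not part of a complete MIPS implementation, and the signal names are assumptions): it shows the IF/ID register group, which moves the fetched instruction and its PC from the fetch stage into the decode stage on each clock edge. Similar register groups (ID/EX, EX/MEM, MEM/WB) sit between the other adjacent stages.

module if_id_reg
(
    input              clk      ,
    input              rstn     ,
    input      [31:0]  if_pc    ,   // PC of the instruction fetched in the IF stage
    input      [31:0]  if_instr ,   // instruction word fetched in the IF stage
    output reg [31:0]  id_pc    ,   // the same values, one cycle later, seen by the ID stage
    output reg [31:0]  id_instr
);
    always @(posedge clk or negedge rstn) begin
        if(!rstn) begin
            id_pc    <= 32'b0 ;
            id_instr <= 32'b0 ;   // a real design would typically reset to a NOP encoding
        end
        else begin
            id_pc    <= if_pc    ;
            id_instr <= if_instr ;
        end
    end
endmodule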

2. The role of Pipeline

  1. Improved performance
  2. Optimized timing
  3. Improved throughput

Note: state machines are the opposite of pipelines in this respect: they reuse the same hardware over multiple clock cycles, trading time for area.

3. The depth of the pipeline

The number of subtasks into which the main task is divided is called the pipeline depth.
The greater the depth, the smaller each processing unit, the less hardware logic each pipeline stage contains, and the less time each unit needs to complete its subtask.

  • Less hardware logic between two register stages (each pipeline stage ends in a register group) means the circuit can run at a higher frequency. A higher clock frequency also means a higher pipeline throughput and therefore higher performance; this is the positive side of deepening the pipeline.
  • Since each pipeline stage ends in a register group, more pipeline stages consume more registers and more area; this is one negative side of deepening the pipeline.
  • Since adjacent pipeline stages need to handshake with each other, the back-pressure signal of the last stage may ripple all the way back to the first stage, causing serious timing problems. More advanced techniques are needed to solve such back-pressure timing problems; this is another negative side of deepening the pipeline.
  • Deep processor pipelines have a further problem: in the fetch stage it is impossible to know whether a conditional branch will be taken, so the outcome can only be predicted, and only near the end of the pipeline does the actual computation reveal whether the branch really jumps. If the real result (for example, taken) does not match the earlier prediction (for example, predicted not taken), the prediction has failed, all wrongly prefetched instructions must be discarded, and the correct instruction stream must be fetched again; this process is called a pipeline flush (Pipeline Flush). A branch predictor can make early predictions as accurate as possible, but it can never be foolproof. The deeper the pipeline, the more wrongly predicted instructions must be discarded and refetched on a misprediction, which both wastes power and costs performance; the deeper the pipeline, the more serious this waste and loss, and the shallower the pipeline, the smaller they are. This is another major negative side of deepening the pipeline.

Different pipeline depths have their own advantages and disadvantages, and the depth should be chosen according to the application.
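
The register cost of depth can be seen in the trivial sketch below (a hypothetical module, with the inter-stage combinational logic omitted): the parameter STAGES is the pipeline depth, and each additional stage adds one more DATA_WIDTH-bit register group.

module pipe_regs
#(
    parameter DATA_WIDTH = 32,
    parameter STAGES     = 4          // pipeline depth: one register group per stage
)
(
    input                       clk ,
    input                       rstn,
    input      [DATA_WIDTH-1:0] din ,
    output     [DATA_WIDTH-1:0] dout
);
    reg [DATA_WIDTH-1:0] stage [0:STAGES-1];
    integer i;

    always @(posedge clk or negedge rstn) begin
        if(!rstn) begin
            for(i = 0; i < STAGES; i = i + 1)
                stage[i] <= {DATA_WIDTH{1'b0}};
        end
        else begin
            stage[0] <= din;
            for(i = 1; i < STAGES; i = i + 1)
                stage[i] <= stage[i-1];
        end
    end

    assign dout = stage[STAGES-1];
endmodule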

4. Backpressure in the pipeline

The deeper the pipeline, the more serious this becomes: because each pipeline stage needs to handshake with its neighbors, the back-pressure signal of the last stage may ripple all the way back to the first stage, causing serious back-pressure (Back-pressure) timing problems. Some more advanced techniques are needed to address these timing issues.

  • Cancel the handshake: this method prevents back pressure from occurring, and the timing is very good. But cancelling the handshake means that each pipeline stage no longer handshakes with the next stage, which may cause functional errors or loss of instructions. Therefore this method usually needs to be combined with other mechanisms, such as re-execution (Replay) or reserving a sufficiently large buffer.
  • Add a ping-pong buffer: adding a ping-pong buffer (Ping-pong Buffer) trades area for timing and is the simplest way to resolve back pressure. By using a ping-pong buffer (with two entries) in place of an ordinary single pipeline stage (with only one entry), the handshake signal returned to the upstream stage only needs to indicate whether the ping-pong buffer still has a free entry, and the handshake signal received from the downstream stage no longer needs to propagate combinationally up to the upstream stage.
  • Add a forward bypass buffer: adding a forward bypass buffer (Forward Bypass Buffer) also trades area for timing and is a very clever way to resolve back pressure. The bypass buffer has only one entry; with this additional entry, the timing path of the backward handshake signal can be cut while the forward path is unaffected, so it can be widely used on handshake interfaces. A minimal sketch of such a buffer is given after this list.
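
Below is a minimal sketch of a one-entry bypass buffer on a valid/ready handshake interface (the module and signal names are assumptions for illustration). The ready returned to the upstream stage depends only on a local register, so the downstream back-pressure no longer propagates combinationally backwards, while the forward valid/data path stays purely combinational.

module bypass_buffer
#(
    parameter DATA_WIDTH = 32
)
(
    input                       clk      ,
    input                       rstn     ,
    // upstream handshake
    input                       in_valid ,
    output                      in_ready ,
    input   [DATA_WIDTH-1:0]    in_data  ,
    // downstream handshake
    output                      out_valid,
    input                       out_ready,
    output  [DATA_WIDTH-1:0]    out_data
);
    reg                     buf_valid;   // the single buffer entry is occupied
    reg [DATA_WIDTH-1:0]    buf_data ;

    // in_ready depends only on a local register: the backward timing path is cut here
    assign in_ready  = ~buf_valid;
    // forward path is unaffected: the buffered entry, if any, has priority
    assign out_valid = buf_valid | in_valid;
    assign out_data  = buf_valid ? buf_data : in_data;

    always @(posedge clk or negedge rstn) begin
        if(!rstn) begin
            buf_valid <= 1'b0;
            buf_data  <= {DATA_WIDTH{1'b0}};
        end
        else if(buf_valid) begin
            if(out_ready)
                buf_valid <= 1'b0;            // the buffered word has been accepted downstream
        end
        else if(in_valid && !out_ready) begin
            buf_valid <= 1'b1;                // catch the word that downstream refused this cycle
            buf_data  <= in_data;
        end
    end
endmodule

When the downstream never back-pressures, the buffer stays empty and data flows through with no added latency; when back-pressure does occur, the single entry absorbs the in-flight word, so nothing is lost even though in_ready is a registered signal.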

5. Conflicts in the pipeline

Another problem in processor pipeline design is hazards (Hazards) in the pipeline, which are mainly divided into resource conflicts and data conflicts.

(a) Resource Conflicts

A resource conflict refers to a conflict over hardware resources in the pipeline; the most common case is a conflict over an arithmetic unit. For example, a divider may need multiple clock cycles to complete its operation, so if a new division instruction needs the divider before the previous division instruction has finished, a resource conflict occurs.

(b) Data Conflicts

Data conflicts refer to conflicts caused by data dependencies among the operands of different instructions. The common data dependencies are as follows.

  • WAR (Write-After-Read) dependency, also called an anti-dependency: the result register index that an instruction executed later needs to write back is the same as a source operand register index that an instruction executed earlier needs to read. Therefore, in the pipeline, the later instruction must not take effect before the earlier instruction with which it has a WAR dependency; otherwise the later instruction would write its result back to the general-purpose register file first, and the earlier instruction would then read a wrong operand value.
  • WAW (Write-After-Write) dependency, also called an output dependency: the result register index that an instruction executed later needs to write back is the same as the result register index that an instruction executed earlier needs to write. Therefore, in the pipeline, the later instruction must not take effect before the earlier instruction with which it has a WAW dependency; otherwise the later instruction would write its result to the general-purpose register file first, and the earlier instruction would then overwrite it with a stale result.
  • RAW (Read-After-Write) dependency, also called a true dependency: the source operand register index that an instruction executed later needs to read is the same as the result register index that an instruction executed earlier needs to write back. Therefore, in the pipeline, the later instruction must not be executed before the earlier instruction with which it has a RAW dependency; otherwise the later instruction would read a wrong source operand from the general-purpose register file.

Among the above three kinds of dependencies, only RAW is a true data dependency.

6. Examples of pipeline design

(1) Pipeline adder

/*----------------------------------------------------------
Filename				:	adder_pipelined 
Author					:	deilt   
Description				:	two 32-bit adders pipelined to form a 64-bit adder
Called by				:	
Revision History		:	10/25/2022
                            Revision 1.0
Email					:	[email protected]
Company:Deilt Technology.INC
Copyright(c) 1999, Deilt Technology Inc, All right reserved
--------------------------------------------------------------*/
module adder_pipelined
#(
    parameter           DATA_WIDTH          =   64  ,
    parameter           HALF_DATA_WIDTH     =   32
)
(   
    input                              clk  ,
    input                              rstn ,
    input   [DATA_WIDTH-1:0]           a    ,
    input   [DATA_WIDTH-1:0]           b    ,
    output  [DATA_WIDTH:0  ]           out 
);

    wire    [HALF_DATA_WIDTH:0  ]      add1     ;   // stage-1 sum of the low halves (with carry)
    wire    [HALF_DATA_WIDTH:0  ]      add2     ;   // stage-2 sum of the high halves plus carry
    reg     [HALF_DATA_WIDTH:0  ]      add1_d1  ;   // low-half sum registered after stage 1
    reg     [HALF_DATA_WIDTH:0  ]      add1_d2  ;   // low-half sum delayed again to align with stage 2
    reg     [HALF_DATA_WIDTH:0  ]      add2_d1  ;   // high-half sum registered after stage 2
    reg     [HALF_DATA_WIDTH-1:0]      a_63_32  ;   // high half of a, delayed one cycle
    reg     [HALF_DATA_WIDTH-1:0]      b_63_32  ;   // high half of b, delayed one cycle
    wire                               add1_carry_d1    ;

    // stage 1: add the low 32 bits
    assign add1 = a[HALF_DATA_WIDTH-1:0] + b[HALF_DATA_WIDTH-1:0]   ;
    assign add1_carry_d1 = add1_d1[HALF_DATA_WIDTH]   ;

    // stage 2: add the high 32 bits plus the registered carry from stage 1
    assign add2 = a_63_32 + b_63_32 + add1_carry_d1 ;

    // the carry bit of add1_d2 has already been absorbed into add2, so only its low half is used
    assign out = {add2_d1, add1_d2[HALF_DATA_WIDTH-1:0]}  ;

    always @(posedge clk or negedge rstn)begin
        if(!rstn)begin
            add1_d1 <= 0  ;
            add2_d1 <= 0  ;
            a_63_32 <= 0  ;
            b_63_32 <= 0  ;
            add1_d2 <= 0  ;
        end
        else begin
            add1_d1 <= add1   ;
            add1_d2 <= add1_d1;
            add2_d1 <= add2   ;
            a_63_32 <= a[DATA_WIDTH-1:HALF_DATA_WIDTH]  ;
            b_63_32 <= b[DATA_WIDTH-1:HALF_DATA_WIDTH]  ;
        end
    end

endmodule
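
A minimal simulation sketch follows (the testbench, stimulus values, and the name tb_adder_pipelined are assumptions, not part of the original design). It applies a couple of operand pairs and prints the output, which lags the inputs by the two-stage pipeline latency; the first pair is chosen so that a carry must propagate from the low half into the high half.

module tb_adder_pipelined;
    reg          clk  = 1'b0;
    reg          rstn = 1'b0;
    reg  [63:0]  a, b;
    wire [64:0]  out;

    // device under test with the default 64-bit width
    adder_pipelined u_dut (.clk(clk), .rstn(rstn), .a(a), .b(b), .out(out));

    always #5 clk = ~clk;

    initial begin
        $monitor("t=%0t a=%h b=%h out=%h", $time, a, b, out);
        a = 64'd0; b = 64'd0;
        repeat (2) @(posedge clk);
        rstn = 1'b1;
        // a carry must propagate from the low half into the high half here
        a <= 64'hFFFF_FFFF_0000_0001;  b <= 64'h0000_0000_FFFF_FFFF;
        @(posedge clk);
        a <= 64'd123;                  b <= 64'd456;
        @(posedge clk);
        // each sum appears on out two clock edges after its operands are driven
        repeat (3) @(posedge clk);
        $finish;
    end
endmodule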

(2) Parallel adder

/*----------------------------------------------------------
Filename				:   adder_parallel	
Author					:	deilt
Description				:	
Called by				:	
Revision History		:	10/26/2022
                            Revision 1.0
Email					:	[email protected]
Company:Deilt Technology.INC
Copyright(c) 1999, Deilt Technology Inc, All right reserved
--------------------------------------------------------------*/
module adder_parallel
#(
    parameter           DATA_WIDTH = 64
)
(
    input                                clk         ,
    input                                rstn        ,
    input [DATA_WIDTH-1:0]               a           ,
    input [DATA_WIDTH-1:0]               b           ,
    input                                alternate   ,// selects which adder path captures the inputs
    output[DATA_WIDTH:0]                 Finalsum    
);
    reg [DATA_WIDTH-1:0]                a_d1            ;   // operands captured by the adder1 path
    reg [DATA_WIDTH-1:0]                b_d1            ;
    
    reg [DATA_WIDTH-1:0]                a_alte_d1       ;   // operands captured by the adder2 path
    reg [DATA_WIDTH-1:0]                b_alte_d1       ;
    
    wire [DATA_WIDTH:0]                 sum1            ;
    wire [DATA_WIDTH:0]                 sum2            ;
    reg [DATA_WIDTH:0]                  sum1_d1         ;
    reg [DATA_WIDTH:0]                  sum2_d1         ;

    // when alternate is high, the new operands are captured by the adder1 path;
    // when alternate is low, they are captured by the adder2 path;
    // the unselected path simply holds its previous operands
    always @(posedge clk or negedge rstn)begin
        if(!rstn)begin
            a_d1 <= 0   ;
            b_d1 <= 0   ;
            a_alte_d1 <= 0  ;
            b_alte_d1 <= 0  ;
        end
        else if(alternate)begin
            a_d1 <= a   ;
            b_d1 <= b   ;
            a_alte_d1 <= a_alte_d1  ;
            b_alte_d1 <= b_alte_d1  ;
        end
        else begin            
            a_d1 <= a_d1   ;
            b_d1 <= b_d1   ;
            a_alte_d1 <= a  ;
            b_alte_d1 <= b  ;   
        end         
    end

    //adder1
    assign sum1 = a_d1 + b_d1   ;
    //adder2
    assign sum2 = a_alte_d1 + b_alte_d1 ;

    //register each sum for one more cycle
    always @(posedge clk or negedge rstn)begin
        if(!rstn)begin
            sum1_d1 <= 0    ;
            sum2_d1 <= 0    ;
        end
        else begin
            sum1_d1 <= sum1 ;
            sum2_d1 <= sum2 ;
        end
    end 

    // output selection: with alternate toggling every cycle, the two adders work in turn,
    // so a new sum is available on Finalsum every clock cycle
    assign Finalsum = alternate ? sum2_d1 : sum1_d1 ;

endmodule
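
Below is a minimal, assumed simulation sketch (the testbench name and stimulus are illustrative, not from the original post): alternate is toggled every cycle and a new operand pair is applied every cycle, so after the initial latency Finalsum delivers one sum per clock while each individual adder only works every other cycle.

module tb_adder_parallel;
    reg          clk  = 1'b0;
    reg          rstn = 1'b0;
    reg          alternate = 1'b0;
    reg  [63:0]  a, b;
    wire [64:0]  Finalsum;

    adder_parallel u_dut (
        .clk(clk), .rstn(rstn),
        .a(a), .b(b),
        .alternate(alternate),
        .Finalsum(Finalsum)
    );

    always #5 clk = ~clk;

    initial begin
        $monitor("t=%0t alternate=%b a=%0d b=%0d Finalsum=%0d",
                 $time, alternate, a, b, Finalsum);
        a = 64'd0; b = 64'd0;
        repeat (2) @(posedge clk);
        rstn = 1'b1;
        // apply a new operand pair every cycle while toggling alternate,
        // so the two adder paths take turns working on consecutive pairs
        repeat (10) begin
            @(posedge clk);
            alternate <= ~alternate;
            a <= a + 64'd1;
            b <= b + 64'd10;
        end
        $finish;
    end
endmodule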

Origin blog.csdn.net/qq_70829439/article/details/127604560