Implementing a RAM-based ping-pong buffer

Ping-pong design idea

The ping-pong operation is generally used to convert a fast data stream into a slow one, and it is also often used to handle clock domain crossing. Combined with serial-to-parallel conversion, it is an ingenious application of pipelining, as shown in the figure below:
[Figure: ping-pong operation]
The problem is how to move the input data from the fast clock domain to the slow clock domain and process it efficiently, and this is where the ping-pong operation comes in. Assume that clock domain 1 runs at 50 MHz and clock domain 2 at 25 MHz, and that sixteen 8-bit words must be transferred in each buffer cycle (the buffer cycle is really the storage capacity of one BUFFER). In the first buffer cycle, data is written to BUFFER1. In the second buffer cycle, data is written to BUFFER2 while BUFFER1 is read using clock 2. If BUFFER1 is read at the original 8-bit width, however, only half of its data has been read by the end of the second buffer cycle, so when the third buffer cycle starts writing BUFFER1 again, its old contents have not all been read out.

The solution is what we call serial-to-parallel conversion: instead of reading 8 bits at a time, we read 16 bits at a time. All of BUFFER1's data is then read within the second buffer cycle, so in the third buffer cycle BUFFER1 can be written again while BUFFER2 is read, and repeating this over and over forms a pipeline.
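The timing argument above can be checked with a short calculation using the numbers assumed in the text (one buffer holds sixteen 8-bit words, writes run at 50 MHz, reads at 25 MHz). Reading 8 bits per 25 MHz cycle takes twice as long as filling a buffer, while reading 16 bits per cycle matches the write bandwidth exactly:

```latex
\begin{aligned}
T_{\text{write}} &= 16 \times \frac{1}{50\ \text{MHz}} = 16 \times 20\ \text{ns} = 320\ \text{ns} \\
T_{\text{read, 8-bit}} &= 16 \times \frac{1}{25\ \text{MHz}} = 16 \times 40\ \text{ns} = 640\ \text{ns} = 2\,T_{\text{write}} \\
T_{\text{read, 16-bit}} &= 8 \times \frac{1}{25\ \text{MHz}} = 8 \times 40\ \text{ns} = 320\ \text{ns} = T_{\text{write}} \\
8\ \text{bit} \times 50\ \text{MHz} &= 16\ \text{bit} \times 25\ \text{MHz} = 400\ \text{Mbit/s}
\end{aligned}
```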

RAM IP

The RAM IP is declared as a dual-port RAM: the input (write) side is 8 bits wide and 16 words deep, and the output (read) side is 16 bits wide and 8 words deep, so both sides cover the same 128 bits. The configuration is shown in the figures below:
[Figures: RAM IP configuration]
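To simulate the design without generating the Vivado IP, a small behavioral model of the asymmetric RAM could stand in for `blk_mem_gen_0`. The sketch below is hypothetical: the module name `asym_dp_ram_model` is made up, its ports mirror only those used in the instantiations later in this post, and two details are assumptions rather than facts taken from the IP configuration: a one-cycle read latency, and a byte order in which the byte written at the even address lands in the low half of the 16-bit word. Both should be checked against the Block Memory Generator settings before relying on the model.

```verilog
`timescale 1ns/1ns
// Hypothetical behavioral stand-in for blk_mem_gen_0 (simulation only).
// Assumptions (not taken from the Vivado IP): write port 8 bit x 16,
// read port 16 bit x 8, 1-cycle read latency, and the byte written at the
// even address ends up in the low byte of the 16-bit read word.
module asym_dp_ram_model (
    input  wire        clka,
    input  wire        wea,
    input  wire [3:0]  addra,
    input  wire [7:0]  dina,
    input  wire        clkb,
    input  wire [2:0]  addrb,
    output reg  [15:0] doutb
);
    reg [7:0] mem [0:15];   // storage organised by the narrow (write) port

    // Port A: one byte per clka edge while wea is high
    always @(posedge clka)
        if (wea) mem[addra] <= dina;

    // Port B: word k = {byte 2k+1, byte 2k}, registered (1-cycle latency)
    always @(posedge clkb)
        doutb <= {mem[{addrb, 1'b1}], mem[{addrb, 1'b0}]};
endmodule
```

Because the port names match the `blk_mem_gen_0` instantiations, renaming the module would let it replace the IP in a quick simulation.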

Phase Locked Loop IP

Its role is to take the 50 MHz input clock and generate a 25 MHz output clock.
[Figures: PLL IP configuration]
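For a simulation without the Clocking Wizard IP, a behavioral stand-in with the same port names as the `clk_pll_v1` instantiation (`clk_in1`, `reset`, `locked`, `clk_out1`) can be handy. This is a hypothetical, simulation-only sketch (`clk_pll_v1_model` is a made-up name): it divides the input clock by two with a flip-flop and raises `locked` after an arbitrary 15 input cycles, whereas a real PLL has its own lock time and output phase. In hardware, the PLL (or a clock enable) is the better choice than a clock divided in fabric logic.

```verilog
`timescale 1ns/1ns
// Hypothetical simulation-only stand-in for clk_pll_v1 (same port names as
// the instantiation in the design). Divides the 50 MHz input by 2 and raises
// 'locked' after 15 input cycles; a real PLL's lock time and phase differ.
module clk_pll_v1_model (
    input  wire clk_in1,
    input  wire reset,      // active-high, like the IP port
    output reg  clk_out1,
    output wire locked
);
    reg [3:0] lock_cnt;

    always @(posedge clk_in1 or posedge reset) begin
        if (reset) begin
            clk_out1 <= 1'b0;
            lock_cnt <= 4'd0;
        end else begin
            clk_out1 <= ~clk_out1;                          // 50 MHz / 2 = 25 MHz
            if (lock_cnt != 4'd15) lock_cnt <= lock_cnt + 4'd1;
        end
    end

    assign locked = (lock_cnt == 4'd15);   // "lock" after 15 input cycles
endmodule
```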

Design of the ping-pong BUFFER

Write BUFFER

A state machine serves as the control logic that decides whether BUFFER1 or BUFFER2 is written. Its states are:

IDLE      idle state
WR_BFR1   write BUFFER1
WR_BFR2   write BUFFER2
END_WR    writing finished

The state transitions are coded as follows:

always @(*) begin
    case(state)
        IDLE:    next_state = start ? WR_BFR1 : IDLE;
        WR_BFR1: next_state = wr_buf1_end ? WR_BFR2 : WR_BFR1;
        WR_BFR2: next_state = wr_buf2_end ? END_WR : WR_BFR2;
        END_WR:  next_state = IDLE;
        default: next_state = IDLE;     //combinational block: blocking "=", plus a default to avoid a latch
    endcase
end
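The four-state machine above fills each buffer once per `start` pulse. For a continuous stream, a common alternative (a different technique, not the author's method) is a single write address counter plus a bank-select bit that flips every time the counter wraps; the two RAM write enables are then derived from that bit. A minimal sketch, where `pingpong_wr_ctrl`, `din_valid`, `bank`, `we1`, `we2` and `bank_done` are hypothetical names:

```verilog
`timescale 1ns/1ns
// Hypothetical write-side controller for a continuous ping-pong buffer.
// 'bank' selects which RAM is written: 0 -> BUFFER1, 1 -> BUFFER2.
module pingpong_wr_ctrl (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       din_valid,   // one 8-bit word arrives when high
    output reg  [3:0] wr_addr,     // shared write address for both RAMs
    output reg        bank,
    output wire       we1,         // write enable for BUFFER1
    output wire       we2,         // write enable for BUFFER2
    output wire       bank_done    // high during the last write of a bank
);
    assign we1       = din_valid & ~bank;
    assign we2       = din_valid &  bank;
    assign bank_done = din_valid & (wr_addr == 4'd15);

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            wr_addr <= 4'd0;
            bank    <= 1'b0;
        end else if (din_valid) begin
            wr_addr <= wr_addr + 4'd1;             // wraps 15 -> 0 naturally
            if (wr_addr == 4'd15) bank <= ~bank;   // swap buffers after 16 words
        end
    end
endmodule
```

With this scheme BUFFER1's `wea` would be `we1` and BUFFER2's would be `we2`, and `bank_done` marks when a full buffer can be handed to the read side (after crossing into the 25 MHz domain).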

Read BUFFER

A second state machine, clocked at 25 MHz, decides whether BUFFER1 or BUFFER2 is read. Its states are:

IDLE       idle state
RD_BFR1    read BUFFER1
RD_BUBBLE  read bubble: one extra cycle after BUFFER1, in which the last word of BUFFER1 is output (the RAM output lags the read address by one cycle)
RD_BFR2    read BUFFER2
END_RD     reading finished

The state transitions are as follows:

always @(*) begin
    case(read_state)
        IDLE:      rd_nxt_state = wr_buf1_end ? RD_BFR1 : IDLE;
        RD_BFR1:   rd_nxt_state = rd_buf1_end ? RD_BUBBLE : RD_BFR1;
        RD_BUBBLE: rd_nxt_state = RD_BFR2;
        RD_BFR2:   rd_nxt_state = rd_buf2_end ? END_RD : RD_BFR2;
        END_RD:    rd_nxt_state = IDLE;
        default:   rd_nxt_state = IDLE;     //avoids a latch for unused codes
    endcase
end
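One detail in the read state machine deserves attention: `wr_buf1_end` is generated in the 50 MHz domain and stays high for a single 50 MHz cycle (20 ns), while `read_state` is updated at 25 MHz (40 ns period). Whether a 25 MHz edge lands inside that 20 ns window depends on the phase of the two clocks when the pulse occurs, so the pulse may be missed. A common remedy, not part of the original design, is a toggle-based pulse synchronizer: the source domain flips a level on every pulse, the level crosses through two flip-flops, and an XOR edge detector regenerates one pulse in the destination domain. The sketch below is hypothetical (`pulse_sync` and its port names are made up); it adds about two to three 25 MHz cycles of latency and assumes source pulses are spaced several destination cycles apart.

```verilog
`timescale 1ns/1ns
// Toggle-based pulse synchronizer (hypothetical helper, not in the original).
// Carries a one-cycle pulse from clk_src to clk_dst. Source pulses must be
// spaced several clk_dst cycles apart for each of them to be seen.
module pulse_sync (
    input  wire clk_src,
    input  wire rst_src_n,
    input  wire pulse_src,     // e.g. wr_buf1_end in the 50 MHz domain
    input  wire clk_dst,
    input  wire rst_dst_n,
    output wire pulse_dst      // one clk_dst-cycle pulse per source pulse
);
    reg       toggle_src;
    reg [2:0] sync_dst;        // [0],[1]: two-flop synchronizer, [2]: history

    // Source domain: flip the level once per input pulse
    always @(posedge clk_src or negedge rst_src_n)
        if (!rst_src_n)     toggle_src <= 1'b0;
        else if (pulse_src) toggle_src <= ~toggle_src;

    // Destination domain: shift the level through the synchronizer
    always @(posedge clk_dst or negedge rst_dst_n)
        if (!rst_dst_n) sync_dst <= 3'b000;
        else            sync_dst <= {sync_dst[1:0], toggle_src};

    // Any change of the synchronized level corresponds to one source pulse
    assign pulse_dst = sync_dst[2] ^ sync_dst[1];
endmodule
```

In `pingpong_buffer`, the synchronized pulse would replace `wr_buf1_end` in the `IDLE` transition of the read state machine, with the source side clocked by `clk` and the destination side by `clk_25mhz`.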

Complete code (with detailed comments)

//date:2022/9/7
//function: RAM-based ping-pong buffer
//input data stream clock: 50MHz
//output (processing) data stream clock: 25MHz
//RAMs generated with the Vivado IP
module pingpong_buffer(
    input   wire                clk         ,
    input   wire                rst_n       ,
    input   wire    [7:0]       data_in     ,
    input   wire                start       ,

    output  wire    [15:0]      data_out    ,
    output  wire                locked       
);
    parameter   IDLE    =   3'd0    ,       //idle
                WR_BFR1 =   3'd1    ,       //write BUF1
                WR_BFR2 =   3'd2    ,       //write BUF2
                END_WR  =   3'd3    ,       //end of writing
                RD_BFR1 =   3'd4    ,       //read BUF1
                RD_BFR2 =   3'd5    ,       //read BUF2
                END_RD  =   3'd6    ,       //end of reading
                RD_BUBBLE=  3'd7    ;       //read bubble

    wire                clk_25mhz   ;
    //wire                locked      ;
    wire                asso_rst_n  ;
    wire                wr_buf1_end ;   
    wire                wr_buf2_end ;
    wire                rd_buf1_end ;
    wire                rd_buf2_end ;
    wire    [15:0]      data_o_buf1 ;
    wire    [15:0]      data_o_buf2 ;
    

    reg                 wea         ;       //RAM1 write enable
    reg     [2:0]       state       ;
    reg     [2:0]       next_state  ;
    reg     [3:0]       wr_buf1_addr;       //RAM1 write address
    reg     [3:0]       wr_buf2_addr;       //RAM2 write address
    reg     [2:0]       read_state  ;       //3 bits, same width as the state parameters
    reg     [2:0]       rd_nxt_state;
    reg     [2:0]       rd_buf1_addr;       //RAM1 read address
    reg     [2:0]       rd_buf2_addr;       //RAM2 read address
    reg                 wr_end      ;       //write finished
    reg                 rd_end      ;       //read finished

    assign asso_rst_n = rst_n && locked;   //new reset: stays active until the PLL is locked

    always @(posedge clk or negedge asso_rst_n) begin
        if(!asso_rst_n) begin
            state <= IDLE;
        end
        else begin
            state <= next_state;
        end
    end

    always @(posedge clk_25mhz or negedge asso_rst_n) begin
        if(!asso_rst_n) begin
            read_state <= IDLE;
        end
        else begin
            read_state <= rd_nxt_state;
        end
    end
    
    //RAM1 write enable: registered from next_state, so it is high exactly while state == WR_BFR1
    always @(posedge clk or negedge asso_rst_n) begin
        if(~asso_rst_n) begin
            wea <= 1'b0;
        end
        else begin
            wea <= (next_state == WR_BFR1);
        end
    end

    always @(posedge clk or negedge asso_rst_n) begin
        if(!asso_rst_n) begin
            wr_buf1_addr <= 4'd0;
            wr_buf2_addr <= 4'd0;
            wr_end <= 1'b0;
        end
        else begin
            case(state)
                IDLE:   begin
                    wr_buf1_addr <= 4'd0;
                    wr_buf2_addr <= 4'd0;
                    wr_end <= 1'b0;
                end
                WR_BFR1: begin
                    wr_buf1_addr <= (wr_buf1_addr == 4'd15) ? 4'd0 : wr_buf1_addr + 1'b1;
                    wr_buf2_addr <= 4'd0;
                end
                WR_BFR2: begin
                    wr_buf1_addr <= 4'd0;
                    wr_buf2_addr <= (wr_buf2_addr == 4'd15) ? 4'd0 : wr_buf2_addr + 1'b1;
                end
                END_WR: begin
                    wr_buf1_addr <= 4'd0;
                    wr_buf2_addr <= 4'd0;
                    wr_end <= 1'b1;
                end
                default: begin
                    wr_buf1_addr <= 4'd0;
                    wr_buf2_addr <= 4'd0;
                    wr_end <= 1'b0;
                end
            endcase
        end
    end
    always @(*) begin
        case(state)
            IDLE:    next_state = start ? WR_BFR1 : IDLE;
            WR_BFR1: next_state = wr_buf1_end ? WR_BFR2 : WR_BFR1;
            WR_BFR2: next_state = wr_buf2_end ? END_WR : WR_BFR2;
            END_WR:  next_state = IDLE;
            default: next_state = IDLE;     //unused codes; avoids a latch
        endcase
    end

    assign wr_buf1_end = (wr_buf1_addr == 4'd15) ? 1'b1 : 1'b0;
    assign wr_buf2_end = (wr_buf2_addr == 4'd15) ? 1'b1 : 1'b0;
    assign rd_buf1_end = (rd_buf1_addr == 3'd7) ? 1'b1 : 1'b0;
    assign rd_buf2_end = (rd_buf2_addr == 3'd7) ? 1'b1 : 1'b0;
    always @(posedge clk_25mhz or negedge asso_rst_n) begin
        if(!asso_rst_n) begin
            rd_buf1_addr <= 3'd0;
            rd_buf2_addr <= 3'd0;
            rd_end <= 1'b0;
        end
        else begin
            case(read_state)
                IDLE:   begin
                    rd_buf1_addr <= 3'd0;
                    rd_buf2_addr <= 3'd0;
                    rd_end <= 1'b0;
                end
                RD_BFR1: begin
                    rd_buf1_addr <= (rd_buf1_addr == 3'd7) ? 3'd0 : rd_buf1_addr + 1'b1;
                    rd_buf2_addr <= 3'd0;
                end
                RD_BFR2:begin
                    rd_buf1_addr <= 3'd0;
                    rd_buf2_addr <= (rd_buf2_addr == 3'd7) ? 3'd0 : rd_buf2_addr + 1'b1;
                end
                END_RD:begin
                    rd_buf1_addr <= 3'd0;
                    rd_buf2_addr <= 3'd0;
                    rd_end <= 1'b1;
                end
                default: begin
                    rd_buf1_addr <= 3'd0;
                    rd_buf2_addr <= 3'd0;
                    rd_end <= 1'b0;
                end
            endcase
        end
    end

    always @(*) begin
        case(read_state)
            IDLE:      rd_nxt_state = wr_buf1_end ? RD_BFR1 : IDLE;
            RD_BFR1:   rd_nxt_state = rd_buf1_end ? RD_BUBBLE : RD_BFR1;
            RD_BUBBLE: rd_nxt_state = RD_BFR2;
            RD_BFR2:   rd_nxt_state = rd_buf2_end ? END_RD : RD_BFR2;
            END_RD:    rd_nxt_state = IDLE;
            default:   rd_nxt_state = IDLE;     //unused codes; avoids a latch
        endcase
    end
    //instantiate the clock PLL IP
    clk_pll_v1   u_clk_pll (
        .clk_in1    (clk)       ,
        .reset      (~rst_n)    ,

        .locked     (locked)    ,
        .clk_out1   (clk_25mhz)
    );
    //instantiate the RAM IP
    blk_mem_gen_0 buffer1(
        .clka   (clk)           ,
        .clkb   (clk_25mhz)     ,
        .wea    (wea)           ,
        .addra  (wr_buf1_addr)  ,
        .addrb  (rd_buf1_addr)  ,
        .dina   (data_in)       ,
        .doutb  (data_o_buf1)
    );
    blk_mem_gen_0 buffer2(
        .clka   (clk)           ,
        .clkb   (clk_25mhz)     ,
        .wea    (state == WR_BFR2)  ,   //write only in WR_BFR2 (~wea was also high in IDLE and END_WR)
        .addra  (wr_buf2_addr)  ,
        .addrb  (rd_buf2_addr)  ,
        .dina   (data_in)       ,
        .doutb  (data_o_buf2)
    );

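    //The RAM output lags the read address by one clk_25mhz cycle (the buffer1 gating already
    //assumes this). Buffer1 words appear while rd_buf1_addr = 1..7 and in RD_BUBBLE; buffer2
    //is gated the same way (rd_buf2_addr = 1..7 and END_RD) so that its first word is not
    //output twice and its last word is not dropped.
    assign data_out = ((read_state == RD_BFR1 && rd_buf1_addr >= 1) || read_state == RD_BUBBLE) ? data_o_buf1 :
                      ((read_state == RD_BFR2 && rd_buf2_addr >= 1) || read_state == END_RD)    ? data_o_buf2 : 16'd0;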

endmodule

The testbench is as follows:

`timescale 1ns/1ns
`define CLK_CYCLE 20
module pingpong_tb();

    reg             clk         ;
    reg             rst_n       ;
    reg     [7:0]   data_in     ;
    reg             start       ;

    wire    [15:0]  data_out    ;
    wire            locked      ;

    pingpong_buffer u_pingpong_buffer(
        .clk        (clk),
        .rst_n      (rst_n),
        .data_in    (data_in),
        .start      (start),

        .data_out   (data_out),
        .locked     (locked)
    );

    initial begin
        clk = 0;
        rst_n = 0;
        data_in = 8'd0;
        start = 1'b0;
        #30
        rst_n = 1;
        #40
        @(posedge locked);
        start = 1'b1;
        #20
        start = 1'b0;
        repeat(50) begin
            data_in = $random % 256;
            @(posedge clk);
        end
    end
    
    always # (`CLK_CYCLE / 2) clk = ~clk;

endmodule

Simulation waveform (it can be seen that the result is correct):
[Figure: simulation waveform]

Summary

Working through the ping-pong design has made me more proficient at designing state machines. Keep up the good work!


Source: blog.csdn.net/weixin_45614076/article/details/126771064