Ping pong design ideas
The ping-pong operation is generally used to handle the transition from a fast data stream to a slow data stream. It is also often used to deal with clock domain crossing. The idea of serial-to-parallel conversion is an important idea in pipeline technology. As shown in the figure below:
The problem is how to transfer the input data from the fast clock domain to the slow clock domain and process it without losing any of it. This is where the ping-pong operation comes in. Assume that clock domain 1 runs at 50MHz and clock domain 2 runs at 25MHz, and that 16 8-bit data words are transmitted within one buffer cycle (the length of a buffer cycle is set by the storage capacity of a BUFFER). During the first buffer cycle, data is written to BUFFER1. During the second buffer cycle, data is written to BUFFER2 while BUFFER1 is read using clock 2. If the data is read at its original 8-bit width, however, only half of BUFFER1 has been read by the end of the second buffer cycle. In other words, when BUFFER1 is due to be written again in the third buffer cycle, its data has not been completely read yet. The solution is the serial-to-parallel conversion mentioned above: instead of reading 8 bits at a time, we read 16 bits at a time, so that all of BUFFER1 is read within the second buffer cycle. In the third buffer cycle BUFFER1 is written while BUFFER2 is read, and the pipeline repeats in this way.
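The timing argument above can be checked with a little arithmetic, assuming one write per 50MHz cycle (20 ns), one read per 25MHz cycle (40 ns), and a BUFFER that holds 16 bytes:

```latex
\begin{aligned}
T_{\text{fill}} &= 16 \times 20\,\text{ns} = 320\,\text{ns} \\
T_{\text{drain, 8-bit reads}} &= 16 \times 40\,\text{ns} = 640\,\text{ns} = 2\,T_{\text{fill}} \\
T_{\text{drain, 16-bit reads}} &= 8 \times 40\,\text{ns} = 320\,\text{ns} = T_{\text{fill}}
\end{aligned}
```

Only the 16-bit read drains one BUFFER in the time it takes to fill the other. Both sides then carry 400 Mbit/s (50MHz x 8 bits = 25MHz x 16 bits).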
RAM IP
The RAM IP is configured as a dual-port RAM. The write port is 8 bits wide with a depth of 16, and the read port is 16 bits wide with a depth of 8. As shown in the following figure:
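For simulating without the vendor IP, the asymmetric port widths can be approximated by a small behavioral model. This is only a sketch: the module name `asym_ram_model` is hypothetical, and both the byte ordering inside the 16-bit word and the one-cycle read latency are assumptions that should be checked against the actual IP configuration.

```verilog
// Behavioral stand-in for the asymmetric dual-port RAM (simulation only).
// Assumption: the byte written at the even address ends up in the low half
// of the 16-bit read word, and reads have one cycle of latency.
module asym_ram_model (
    input  wire        clka,
    input  wire        wea,
    input  wire [3:0]  addra,   // write side: 16 locations x 8 bits
    input  wire [7:0]  dina,
    input  wire        clkb,
    input  wire [2:0]  addrb,   // read side: 8 locations x 16 bits
    output reg  [15:0] doutb
);
    reg [7:0] mem [0:15];       // 128 bits of storage in total

    // Write port: one byte per clka cycle while wea is high.
    always @(posedge clka) begin
        if (wea)
            mem[addra] <= dina;
    end

    // Read port: one 16-bit word is made of two adjacent bytes.
    always @(posedge clkb) begin
        doutb <= {mem[{addrb, 1'b1}], mem[{addrb, 1'b0}]};
    end
endmodule
```

The port names follow the instantiation used later in this post, so the model could be swapped in for `blk_mem_gen_0` in a quick simulation.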
Phase Locked Loop IP
The PLL IP takes the 50MHz clock as input and outputs a 25MHz clock.
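For a quick simulation without the vendor IP, the PLL could be replaced by a plain divide-by-two model. This is a rough sketch and not what the PLL does internally: the module name `clk_div2_model` and the `LOCK_CYCLES` parameter are hypothetical, and `locked` is simply raised a fixed number of input cycles after reset is released.

```verilog
// Simulation-only stand-in for the PLL: divides clk_in1 by two.
// Assumptions: active-high reset, as in the instantiation later in this post;
// "locked" rises a few input cycles after reset (LOCK_CYCLES is made up).
module clk_div2_model #(
    parameter LOCK_CYCLES = 8
) (
    input  wire clk_in1,
    input  wire reset,
    output reg  clk_out1,
    output reg  locked
);
    reg [7:0] cnt;

    always @(posedge clk_in1 or posedge reset) begin
        if (reset) begin
            clk_out1 <= 1'b0;
            cnt      <= 8'd0;
            locked   <= 1'b0;
        end else begin
            clk_out1 <= ~clk_out1;      // toggles every input cycle: f_in / 2
            if (cnt == LOCK_CYCLES)
                locked <= 1'b1;         // pretend the clock is now stable
            else
                cnt <= cnt + 1'b1;
        end
    end
endmodule
```

On real hardware a flop-divided clock is usually avoided in favor of the PLL or a clock enable. The model is only meant to make the 50MHz to 25MHz relationship visible in a waveform.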
Design of the ping-pong BUFFER
Write BUFFER
A state machine is used as the control logic that decides whether BUFFER1 or BUFFER2 is written.
State | Description |
---|---|
IDLE | Idle state |
WR_BFR1 | Write BUFFER1 |
WR_BFR2 | Write BUFFER2 |
END_WR | End-of-write state |
The following code implements the state transitions:
always @(*) begin
    case (state)
        IDLE:    next_state = start       ? WR_BFR1 : IDLE;    // wait for start
        WR_BFR1: next_state = wr_buf1_end ? WR_BFR2 : WR_BFR1; // BUFFER1 is full
        WR_BFR2: next_state = wr_buf2_end ? END_WR  : WR_BFR2; // BUFFER2 is full
        END_WR:  next_state = IDLE;
        default: next_state = IDLE;                            // avoid inferring a latch
    endcase
end
Read BUFFER
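A state machine is used as the control logic that decides whether BUFFER1 or BUFFER2 is read.
State | Description |
---|---|
IDLE | Idle state |
RD_BFR1 | Read BUFFER1 |
RD_BUBBLE | One-cycle bubble between reading BUFFER1 and reading BUFFER2 |
RD_BFR2 | Read BUFFER2 |
END_RD | End-of-read state |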
The state transition logic is as follows:
always @(*) begin
    case (read_state)
        IDLE:      rd_nxt_state = wr_buf1_end ? RD_BFR1   : IDLE;    // BUFFER1 has been filled
        RD_BFR1:   rd_nxt_state = rd_buf1_end ? RD_BUBBLE : RD_BFR1;
        RD_BUBBLE: rd_nxt_state = RD_BFR2;                           // always lasts one cycle
        RD_BFR2:   rd_nxt_state = rd_buf2_end ? END_RD    : RD_BFR2;
        END_RD:    rd_nxt_state = IDLE;
        default:   rd_nxt_state = IDLE;                              // avoid inferring a latch
    endcase
end
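Complete code (with detailed comments)
//date:2022/9/7
//function: implement a RAM-based ping-pong buffer
//the input data stream is clocked at 50MHz
//the output (processing) data stream is clocked at 25MHz
//the RAMs are generated with the Vivado IP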
module pingpong_buffer(
input wire clk ,
input wire rst_n ,
input wire [7:0] data_in ,
input wire start ,
output wire [15:0] data_out ,
output wire locked
);
parameter IDLE = 3'd0 , //idle state
WR_BFR1 = 3'd1 , //write BUF1 state
WR_BFR2 = 3'd2 , //write BUF2 state
END_WR = 3'd3 , //end-of-write state
RD_BFR1 = 3'd4 , //read BUF1 state
RD_BFR2 = 3'd5 , //read BUF2 state
END_RD = 3'd6 , //end-of-read state
RD_BUBBLE= 3'd7 ; //read bubble
wire clk_25mhz ;
//wire locked ;
wire asso_rst_n ;
wire wr_buf1_end ;
wire wr_buf2_end ;
wire rd_buf1_end ;
wire rd_buf2_end ;
wire [15:0] data_o_buf1 ;
wire [15:0] data_o_buf2 ;
reg wea ; //RAM write enable
reg [2:0] state ;
reg [2:0] next_state ;
reg [3:0] wr_buf1_addr; //write address of RAM1
reg [3:0] wr_buf2_addr; //write address of RAM2
reg [3:0] read_state ;
reg [3:0] rd_nxt_state;
reg [2:0] rd_buf1_addr; //read address of RAM1
reg [2:0] rd_buf2_addr; //read address of RAM2
reg wr_end ; //write finished
reg rd_end ; //read finished
assign asso_rst_n = rst_n && locked; //combine the external reset with the PLL locked signal into a new reset
always @(posedge clk or negedge asso_rst_n) begin
if(!asso_rst_n) begin
state <= IDLE;
end
else begin
state <= next_state;
end
end
always @(posedge clk_25mhz or negedge asso_rst_n) begin
if(!asso_rst_n) begin
read_state <= IDLE;
end
else begin
read_state <= rd_nxt_state;
end
end
always @(posedge clk or negedge asso_rst_n) begin
if(~asso_rst_n) begin
wea <= 1'b0;
end
else begin
wea <= (next_state == WR_BFR1); //BUFFER1 is written only in WR_BFR1
end
end
always @(posedge clk or negedge asso_rst_n) begin
if(!asso_rst_n) begin
wr_buf1_addr <= 4'd0;
wr_buf2_addr <= 4'd0;
wr_end <= 1'b0;
end
else begin
case(state)
IDLE: begin
wr_buf1_addr <= 4'd0;
wr_buf2_addr <= 4'd0;
wr_end <= 1'b0;
end
WR_BFR1: begin
wr_buf1_addr <= (wr_buf1_addr == 4'd15) ? 4'd0 : wr_buf1_addr + 1'b1;
wr_buf2_addr <= 4'd0;
end
WR_BFR2: begin
wr_buf1_addr <= 4'd0;
wr_buf2_addr <= (wr_buf2_addr == 4'd15) ? 4'd0 : wr_buf2_addr + 1'b1;
end
END_WR: begin
wr_buf1_addr <= 4'd0;
wr_buf2_addr <= 4'd0;
wr_end <= 1'b1;
end
default: begin
wr_buf1_addr <= 4'd0;
wr_buf2_addr <= 4'd0;
wr_end <= 1'b0;
end
endcase
end
end
always @(*) begin
case(state)
IDLE: next_state = start ? WR_BFR1 : IDLE;
WR_BFR1:next_state = wr_buf1_end ? WR_BFR2 : WR_BFR1;
WR_BFR2:next_state = wr_buf2_end ? END_WR : WR_BFR2;
END_WR: next_state = IDLE;
default:next_state = IDLE; //avoid inferring a latch
endcase
end
assign wr_buf1_end = (wr_buf1_addr == 4'd15) ? 1'b1 : 1'b0;
assign wr_buf2_end = (wr_buf2_addr == 4'd15) ? 1'b1 : 1'b0;
assign rd_buf1_end = (rd_buf1_addr == 3'd7) ? 1'b1 : 1'b0;
assign rd_buf2_end = (rd_buf2_addr == 3'd7) ? 1'b1 : 1'b0;
always @(posedge clk_25mhz or negedge asso_rst_n) begin
if(!asso_rst_n) begin
rd_buf1_addr <= 3'd0;
rd_buf2_addr <= 3'd0;
rd_end <= 1'b0;
end
else begin
case(read_state)
IDLE: begin
rd_buf1_addr <= 3'd0;
rd_buf2_addr <= 3'd0;
rd_end <= 1'b0;
end
RD_BFR1: begin
rd_buf1_addr <= (rd_buf1_addr == 3'd7) ? 3'd0 : rd_buf1_addr + 1'b1;
rd_buf2_addr <= 3'd0;
end
RD_BFR2:begin
rd_buf1_addr <= 3'd0;
rd_buf2_addr <= (rd_buf2_addr == 3'd7) ? 3'd0 : rd_buf2_addr + 1'b1;
end
END_RD:begin
rd_buf1_addr <= 3'd0;
rd_buf2_addr <= 3'd0;
rd_end <= 1'b1;
end
default: begin
rd_buf1_addr <= 3'd0;
rd_buf2_addr <= 3'd0;
rd_end <= 1'b0;
end
endcase
end
end
always @(*) begin
case(read_state)
IDLE: rd_nxt_state = wr_buf1_end ? RD_BFR1 : IDLE;
RD_BFR1:rd_nxt_state = rd_buf1_end ? RD_BUBBLE : RD_BFR1;
RD_BUBBLE:rd_nxt_state = RD_BFR2;
RD_BFR2:rd_nxt_state = rd_buf2_end ? END_RD : RD_BFR2;
END_RD: rd_nxt_state = IDLE;
default:rd_nxt_state = IDLE; //avoid inferring a latch
endcase
end
//instantiate the PLL IP
clk_pll_v1 u_clk_pll (
.clk_in1 (clk) ,
.reset (~rst_n) ,
.locked (locked) ,
.clk_out1 (clk_25mhz)
);
//instantiate the RAM IP
blk_mem_gen_0 buffer1(
.clka (clk) ,
.clkb (clk_25mhz) ,
.wea (wea) ,
.addra (wr_buf1_addr) ,
.addrb (rd_buf1_addr) ,
.dina (data_in) ,
.doutb (data_o_buf1)
);
blk_mem_gen_0 buffer2(
.clka (clk) ,
.clkb (clk_25mhz) ,
.wea (state == WR_BFR2) , //enable writes to BUFFER2 only in WR_BFR2, not while idle or in reset
.addra (wr_buf2_addr) ,
.addrb (rd_buf2_addr) ,
.dina (data_in) ,
.doutb (data_o_buf2)
);
assign data_out = ((read_state == RD_BFR1 && rd_buf1_addr >= 1) || read_state == RD_BUBBLE ) ? data_o_buf1 : (read_state == RD_BFR2) ? data_o_buf2 : 16'd0;
endmodule
The testbench is as follows:
`timescale 1ns/1ns
`define CLK_CYCLE 20
module pingpong_tb();
reg clk ;
reg rst_n ;
reg [7:0] data_in ;
reg start ;
wire [15:0] data_out ;
wire locked ;
pingpong_buffer u_pingpong_buffer(
. clk (clk),
. rst_n (rst_n),
. data_in (data_in),
. start (start),
. data_out (data_out),
. locked (locked)
);
initial begin
clk = 0;
rst_n = 0;
data_in = 8'd0;
start = 1'b0;
#30
rst_n = 1;
#40
@(posedge locked);
start = 1'b1;
#20
start = 1'b0;
repeat(50) begin
data_in = $random() % 256;
@(posedge clk);
end
end
always # (`CLK_CYCLE / 2) clk = ~clk;
endmodule
Simulation waveform (it can be seen that the result is correct):
Summary
Working through the ping-pong design has made me more proficient at designing state machines. Keep up the good work!