[Xilinx Vivado Timing Analysis/Constraints Series 10] FPGA Development Timing Analysis/Constraints-FPGA DDR-Direct Interface Input Delay Constraint Optimization Method

Contents

 

DDR Sampling Brief

The first model (without PLL)

Actual operation

Summarize constraints

practical engineering

top level code

clock constraints

input delay constraint

View Timing Reports

Solution

Add primitives

Primitive interpretation

View Timing Reports

Timing Analysis

Summarize

Past series of blogs


 

DDR Sampling Brief

The previous posts analyzed SDR sampling, that is, single-edge sampling. This post introduces DDR sampling, that is, double-edge sampling. DDR sampling is widely used in practice: CMOS image sensors, DRAM, ADCs, and Gigabit Ethernet all use DDR interfaces, so it is necessary to analyze whether their timing is correct and to learn how to write the timing constraints.

For SDR, two timing models were introduced: one with a PLL and one without. The same two models exist for DDR.

As before, there is an upstream device and a downstream device. The downstream device is the FPGA, and the upstream device can be an Ethernet interface or an ADC. The delays involved are shown in the figure.

The analysis assumes that the PCB data path delay Td_bd and the PCB clock path delay Tc_bd are equal. The clock-to-data relationship at the FPGA pins is then the same as at the output pins of the upstream device, so knowing the clock and data timing at the upstream pins is enough to constrain the FPGA clock and data and to analyze the phase relationship between them.

The first model (without PLL)

Start with the first timing model, the one without a PLL.

In SDR, a rising edge is the launch edge, and the next rising edge is both the capture edge and the launch edge for the next data. The data transition after the rising edge has a maximum and a minimum delay.

In DDR, a rising edge is the launch edge, and the next falling edge in the same cycle is both the capture edge and the launch edge for the next data. Another difference from SDR is that DDR has maximum and minimum delays not only at the rising edge but also at the falling edge, because both edges act as launch and capture edges. Note also that the max/min range at the rising edge and the max/min range at the falling edge are not necessarily the same.

This timing model has no PLL and uses edge sampling: the data captured on the current falling edge was launched by the preceding rising edge, and the data captured on the next rising edge was launched by the preceding falling edge. The launch edge and the capture edge are therefore complementary. If the launch edge is a rising edge, the capture edge is a falling edge; if the launch edge is a falling edge, the capture edge is a rising edge. Keep this relationship in mind when reading the timing reports in Vivado later.
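The pairing described above can be written out as a small sketch. It is only an illustration of the launch/capture relationship, not something Vivado needs; the function name capture_edge is made up for this example, and the 18.519 ns period is the 54 MHz clock used in the practical example of this post.

```python
# Launch/capture pairing for the no-PLL DDR model:
# every launch edge is captured by the next opposite edge, half a period later.
PERIOD_NS = 18.519  # 54 MHz clock from the practical example in this post


def capture_edge(launch_edge: str, launch_time_ns: float):
    """Return (capture edge type, capture time in ns) for a launch edge."""
    opposite = {"rise": "fall", "fall": "rise"}
    return opposite[launch_edge], launch_time_ns + PERIOD_NS / 2


for edge, t in (("rise", 0.0), ("fall", PERIOD_NS / 2)):
    cap_edge, cap_t = capture_edge(edge, t)
    print(f"launch {edge} at {t:.4f} ns -> capture {cap_edge} at {cap_t:.4f} ns")
```

Both relationships span half a clock period, which is why each edge gets its own max/min input delay pair.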

Actual operation

Take a real example: the datasheet of a Sony CMOS device whose output supports both SDR and DDR. Its SDR sampling mode was analyzed and exercised in an earlier post; this post looks at its DDR sampling mode.

The clock frequency is 54 MHz, so the period is 18.519 ns and the half period is 9.259 ns. The parameter table gives a maximum skew of 2 ns. As the figure shows, the data captured on the first falling edge was launched by the preceding rising edge; arrow 1 marks the minimum input delay and arrow 2 the maximum. Similarly, data launched on the first falling edge is captured on the next rising edge, and the corresponding input delay limits are the times marked by arrows 3 and 4.

Summarize constraints

Rise Max = T/2 + skew_afe = 9.259ns + 2ns = 11.259ns

Rise Min = T/2 - skew_afe = 9.259ns - 2ns = 7.259ns

Fall Max = T/2 + skew_afe = 9.259ns + 2ns = 11.259ns

Fall Min = T/2 - skew_afe = 9.259ns - 2ns = 7.259ns
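As a quick check of the arithmetic above, the four values can be reproduced with a few lines of Python. This is only a sketch: the function name ddr_input_delays is invented here, and the inputs are the 54 MHz clock and 2 ns skew quoted from the datasheet.

```python
def ddr_input_delays(clock_mhz: float, skew_ns: float) -> dict:
    """Input delay limits (ns) for the edge-sampled DDR model without a PLL."""
    half_period_ns = (1000.0 / clock_mhz) / 2
    return {
        "rise_max": half_period_ns + skew_ns,
        "rise_min": half_period_ns - skew_ns,
        "fall_max": half_period_ns + skew_ns,
        "fall_min": half_period_ns - skew_ns,
    }


delays = ddr_input_delays(clock_mhz=54.0, skew_ns=2.0)
for name, value in delays.items():
    print(f"{name} = {value:.3f} ns")
```

The rise and fall values are identical here only because the datasheet gives a single skew figure; with separate rise and fall skews the two pairs would differ.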

practical engineering

Continue with the project from the previous post and add the DDR constraints.

top level code

module top_ioddr(
    input wire          rx_clk,
    input wire          rx_ctrl,
    input wire  [3:0]   rx_dat,
    //tx
    output  wire        tx_clk,
    output  wire [3:0]  tx_d,
    output  wire        tx_dv,
    input   wire 	sdrclk,
    input   wire [3:0]	sdrdata,
    input   wire 	sdrden,
    input   wire 	sysclk,
    output  reg 	tout 	
	);

wire         rst;
wire         rx_clk_90;
wire         rx_en;
wire  [7:0]  rx_data;

reg          tx_en1,tx_en2;
reg   [7:0]  tx_data1,tx_data2;
wire	     sdrclk1;
assign rst = 0;

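assign sdrclk1 = sdrclk;
assign rx_clk_90 = rx_clk; // direct connection; commented out later when the delay primitive is added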

always @(posedge rx_clk_90 or posedge rst) begin
	if (rst == 1'b1) begin
		tx_data1 <= 'd0;
	end
	else if (rx_en == 1'b1) begin
		tx_data1 <= rx_data+ rx_data -1;
	end
end

always @(posedge rx_clk_90 or posedge rst) begin
	if (rst == 1'b1) begin
		tx_data2 <= 'd0;
	end
	else if (tx_en1 == 1'b1) begin
		tx_data2 <= tx_data1+ tx_data1 -5;
	end
end

always @(posedge rx_clk_90 ) begin
	tx_en1 <= rx_en;
end

always @(posedge rx_clk_90 ) begin
	tx_en2 <= tx_en1;
end

	iddr_ctrl inst_iddr_ctrl
		(
			.rx_clk_90 (rx_clk_90),
			.rst       (rst),
			.rx_dat    (rx_dat),
			.rx_ctrl   (rx_ctrl),
			.rx_en     (rx_en),
			.rx_data   (rx_data)
		);

	oddr_ctrl inst_oddr_ctrl
		(
			.sclk    (rx_clk_90),
			.tx_dat  (tx_data2),
			.tx_en   (tx_en2),
			.tx_c    (rx_clk_90),
			.tx_data (tx_d),
			.tx_dv   (tx_dv),
			.tx_clk  (tx_clk)
		);

//sdr clock domain

reg [3:0] sdrdata_r1,sdrdata_r2;
reg 	sdrden_r1,sdrden_r2;

always @(posedge sdrclk1 ) begin
	{sdrdata_r2,sdrdata_r1} <= {sdrdata_r1,sdrdata};
end

always @(posedge sdrclk1 ) begin
	{sdrden_r2,sdrden_r1} <= {sdrden_r1,sdrden};
end

always @(posedge sdrclk1) begin
	if(sdrden_r2 == 1'b1) begin
		tout <= (&sdrdata_r1)|(&sdrdata_r2);
	end
	else begin
		tout <= (^sdrdata_r2);
	end
end

endmodule

The code of the other modules is the same as in the previous project, so it is not repeated here.

Run place and route on the project, and open the implemented design when it finishes.

After opening, click Edit Timing Constraints

clock constraints

The rx_clk clock is constrained with a period of 18.518 ns and a 50% duty cycle; the corresponding create_clock command appears in the XDC file listed further down.

input delay constraint

Here, four input delays are constrained, namely max and min of the rising edge and max and min of the falling edge.

The constraints are as follows:

Rise Max = T/2 + skew_afe = 9.259ns + 2ns = 11.259ns
Rise Min = T/2 - skew_afe = 9.259ns - 2ns = 7.259ns
Fall Max = T/2 + skew_afe = 9.259ns + 2ns = 11.259ns
Fall Min = T/2 - skew_afe = 9.259ns - 2ns = 7.259ns

Maximum value of rising edge

Minimum value of rising edge

Maximum value of falling edge

Note: when constraining the falling edge, the option marked by the arrow in the figure must be checked. It controls whether this constraint is added alongside the existing rising-edge constraint instead of overwriting it (it corresponds to -add_delay in the XDC file). Because this experiment uses DDR double-edge sampling, both the rising and falling edges act as launch and capture edges, so the falling-edge constraints must not overwrite the rising-edge ones. The earlier SDR case did not need this.

The minimum value of the falling edge

The no-overwrite option must be checked here as well.

This edge-aligned timing model has no PLL, so the constraints ask the implementation tool to increase the clock routing delay as much as possible, that is, to move the clock to the right in the figure until the setup time requirement is met.

The XDC constraint file at this time is

set_property IOSTANDARD LVCMOS33 [get_ports rx_clk]
set_property PACKAGE_PIN J19 [get_ports rx_clk]
set_property PACKAGE_PIN H22 [get_ports rx_ctrl]
set_property IOSTANDARD LVCMOS33 [get_ports rx_ctrl]
set_property PACKAGE_PIN K22 [get_ports {rx_dat[0]}]
set_property PACKAGE_PIN K21 [get_ports {rx_dat[1]}]
set_property PACKAGE_PIN J22 [get_ports {rx_dat[2]}]
set_property PACKAGE_PIN J20 [get_ports {rx_dat[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports {rx_dat[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports {rx_dat[2]}]
set_property IOSTANDARD LVCMOS33 [get_ports {rx_dat[1]}]
set_property IOSTANDARD LVCMOS33 [get_ports {rx_dat[0]}]
set_property PACKAGE_PIN M18 [get_ports tx_dv]
set_property IOSTANDARD LVCMOS33 [get_ports tx_dv]
set_property PACKAGE_PIN K18 [get_ports tx_clk]
set_property IOSTANDARD LVCMOS33 [get_ports tx_clk]
set_property PACKAGE_PIN M22 [get_ports {tx_d[0]}]
set_property PACKAGE_PIN L18 [get_ports {tx_d[1]}]
set_property PACKAGE_PIN L19 [get_ports {tx_d[2]}]
set_property PACKAGE_PIN L20 [get_ports {tx_d[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports {tx_d[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports {tx_d[2]}]
set_property IOSTANDARD LVCMOS33 [get_ports {tx_d[1]}]
set_property IOSTANDARD LVCMOS33 [get_ports {tx_d[0]}]
set_property PACKAGE_PIN W19 [get_ports sdrclk]
set_property PACKAGE_PIN Y22 [get_ports sdrden]
set_property PACKAGE_PIN V20 [get_ports {sdrdata[0]}]
set_property PACKAGE_PIN U20 [get_ports {sdrdata[1]}]
set_property PACKAGE_PIN AB22 [get_ports {sdrdata[2]}]
set_property PACKAGE_PIN AB21 [get_ports {sdrdata[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports sdrclk]
set_property IOSTANDARD LVCMOS33 [get_ports {sdrdata[*]}]
set_property IOSTANDARD LVCMOS33 [get_ports sdrden]
set_property PACKAGE_PIN Y21 [get_ports tout]
set_property IOSTANDARD LVCMOS33 [get_ports tout]
set_property PACKAGE_PIN Y18 [get_ports sysclk]
set_property IOSTANDARD LVCMOS33 [get_ports sysclk]
create_clock -period 18.518 -name sdrclk -waveform {0.000 9.259} [get_ports sdrclk]
set_input_delay -clock [get_clocks *] -rise -max 2.000 [get_ports {{sdrdata[0]} {sdrdata[1]} {sdrdata[2]} {sdrdata[3]} sdrden}]
set_input_delay -clock [get_clocks *] -rise -min -2.000 [get_ports {{sdrdata[0]} {sdrdata[1]} {sdrdata[2]} {sdrdata[3]} sdrden}]
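# DDR receive clock: 18.518 ns period (54 MHz), 50% duty cycle
create_clock -period 18.518 -name rx_clk -waveform {0.000 9.259} [get_ports rx_clk]
# Input delays measured from the rising edge of rx_clk
# (-rise/-fall limit a constraint to rising/falling transitions of the data signal)
set_input_delay -clock [get_clocks rx_clk] -rise -max 11.259 [get_ports {rx_ctrl {rx_dat[0]} {rx_dat[1]} {rx_dat[2]} {rx_dat[3]}}]
set_input_delay -clock [get_clocks rx_clk] -rise -min 7.259 [get_ports {rx_ctrl {rx_dat[0]} {rx_dat[1]} {rx_dat[2]} {rx_dat[3]}}]
# Input delays measured from the falling edge (-clock_fall); -add_delay keeps the constraints above instead of overwriting them
set_input_delay -clock [get_clocks rx_clk] -clock_fall -fall -max -add_delay 11.259 [get_ports {rx_ctrl {rx_dat[0]} {rx_dat[1]} {rx_dat[2]} {rx_dat[3]}}]
set_input_delay -clock [get_clocks rx_clk] -clock_fall -fall -min -add_delay 7.259 [get_ports {rx_ctrl {rx_dat[0]} {rx_dat[1]} {rx_dat[2]} {rx_dat[3]}}]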

Re-run place and route, then open the implemented design; a timing violation is reported.

View Timing Reports

Run Report Timing to see the detailed timing report.

Restrict the analysis to rx_clk.

Set the number of paths per group to 100 and the number of paths per endpoint to 10; larger values show more paths.

Number of paths per group: the maximum number of paths reported for each path group, starting from the worst.

Number of paths per endpoint: the maximum number of paths reported for each endpoint. Here the different paths to one endpoint correspond to the different launch/capture relationships, for example rising-edge launch with falling-edge capture, or falling-edge launch with rising-edge capture. The value is set to 10, but there are fewer than 10 such relationships, so all of them are displayed.

The report shows that the setup time is violated while the hold time is met.

Click Path 1 to view its timing report.

The data arrival time is 11.714 ns.

The actual arrival time of the clock is 10.727ns

The data arrival time is later than the clock arrival time. The clock path is too short, so the data arrives after the capturing clock edge and cannot be sampled, which causes the setup violation. The clock path delay therefore needs to be made longer.

Solution

The input delay constraints only describe the relationship between clock and data. They tell the tool which clock-to-data relationship it has to achieve, but the tool may not be able to achieve it. In that case the clock delay has to be extended manually to meet the timing requirements.

Add primitives

Click Primitives in the left menu bar

Select the input delay primitive template (the IDELAYCTRL and IDELAYE2 code is listed in the next section).

Add the primitive to the top-level file, and comment out the line assign rx_clk_90 = rx_clk.

Primitive interpretation

Each line of the primitive carries a comment. Its purpose is simple: add delay to the clock. The input clock rx_clk enters through IDATAIN, is delayed, and leaves through DATAOUT as the output clock rx_clk_90. The delay is set by the parameter IDELAY_VALUE, which is set to 18 as a first attempt.

IDELAYCTRL IDELAYCTRL_inst (
.RDY(RDY),       // 1-bit output: Ready output
.REFCLK(sysclk), // 1-bit input: Reference clock input
.RST(1'b0)        // 1-bit input: Active high reset input
);

IDELAYE2 #(
      .CINVCTRL_SEL("FALSE"),          // Enable dynamic clock inversion (FALSE, TRUE)
      .DELAY_SRC("IDATAIN"),           // Delay input (IDATAIN, DATAIN)
      .HIGH_PERFORMANCE_MODE("FALSE"), // Reduced jitter ("TRUE"), Reduced power ("FALSE")
      .IDELAY_TYPE("FIXED"),           // FIXED, VARIABLE, VAR_LOAD, VAR_LOAD_PIPE
      .IDELAY_VALUE(18),                // Input delay tap setting (0-31)
      .PIPE_SEL("FALSE"),              // Select pipelined mode, FALSE, TRUE
      .REFCLK_FREQUENCY(200.0),        // IDELAYCTRL clock input frequency in MHz (190.0-210.0, 290.0-310.0).
      .SIGNAL_PATTERN("CLOCK")          // DATA, CLOCK input signal
   )
   IDELAYE2_inst_dv (
      .CNTVALUEOUT(), // 5-bit output: Counter value output
      .DATAOUT(rx_clk_90),         // 1-bit output: Delayed data output
      .C(1'b0),                     // 1-bit input: Clock input
      .CE(1'b0),                   // 1-bit input: Active high enable increment/decrement input
      .CINVCTRL(1'b0),       // 1-bit input: Dynamic clock inversion input
      .CNTVALUEIN(5'd0),   // 5-bit input: Counter value input
      .DATAIN(1'b0),           // 1-bit input: Internal delay data input
      .IDATAIN(rx_clk),         // 1-bit input: Data input from the I/O
      .INC(1'b0),                 // 1-bit input: Increment / Decrement tap delay input
      .LD(1'b0),                   // 1-bit input: Load IDELAY_VALUE input
      .LDPIPEEN(1'b0),       // 1-bit input: Enable PIPELINE register to load data input
      .REGRST(1'b0)            // 1-bit input: Active-high reset tap-delay input
   );
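To get a feel for what IDELAY_VALUE = 18 means in time, the nominal tap delay can be estimated. The sketch below assumes the 7-series tap resolution of 1 / (32 × 2 × F_REF) given in the Xilinx SelectIO documentation, with F_REF being the 200 MHz REFCLK_FREQUENCY set above; the function names are made up for this example.

```python
def idelay_tap_ps(refclk_mhz: float) -> float:
    """Nominal delay of one IDELAYE2 tap in ps, assuming 1 / (32 * 2 * F_REF)."""
    return 1e6 / (32 * 2 * refclk_mhz)


def idelay_delay_ns(taps: int, refclk_mhz: float) -> float:
    """Nominal delay in ns added by a fixed tap setting (IDELAY_VALUE)."""
    if not 0 <= taps <= 31:
        raise ValueError("IDELAY_VALUE must be in the range 0-31")
    return taps * idelay_tap_ps(refclk_mhz) / 1000.0


tap_ps = idelay_tap_ps(200.0)
delay_ns = idelay_delay_ns(18, 200.0)
print(f"tap resolution: {tap_ps:.3f} ps, 18 taps: {delay_ns:.3f} ns")
```

This is only a nominal estimate of the tap contribution. The delay that the timing report attributes to the IDELAYE2 cell can differ from it, so the report remains the reference when tuning the tap value.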

View Timing Reports

Re-run place and route, then reload the design and run Report Timing again; the procedure is the same as before.

The timing is now met. Click path 101 to view the detailed timing report.

Timing Analysis

Taking the setup time analysis as an example, the hold time analysis method is the same.

The actual arrival time of the data is 11.714ns

The actual arrival time of the clock is 12.832 ns, clearly later than before. The added delay comes from the primitive that was just inserted, the IDELAYE2 in the red box, which contributes 1.556 ns.

Applying the setup time formula gives a setup slack of 1.119 ns, which meets the timing requirement.
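The before/after comparison can be checked with a rough calculation. The sketch below assumes that the clock arrival time quoted from each report can be treated as the data required time, that is, that it already accounts for the setup time and clock uncertainty; the function name setup_slack is made up for this example.

```python
def setup_slack(required_ns: float, arrival_ns: float) -> float:
    """Setup slack = data required time - data arrival time (negative = violation)."""
    return required_ns - arrival_ns


DATA_ARRIVAL_NS = 11.714  # same in both reports

slack_before = setup_slack(10.727, DATA_ARRIVAL_NS)  # clock routed directly
slack_after = setup_slack(12.832, DATA_ARRIVAL_NS)   # clock delayed through IDELAYE2

print(f"before IDELAYE2: {slack_before:.3f} ns")
print(f"after  IDELAYE2: {slack_after:.3f} ns")
```

Under this assumption the slack is negative before the change and about 1.118 ns after it, which is within 0.001 ns of the 1.119 ns shown in the report; the small difference is presumably rounding of the displayed values.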

Summarize

For the first model, the one without a PLL, if the tool cannot meet the timing requirements from the constraints alone, the code has to be changed manually, for example by adding primitives that increase the clock or data delay until timing is met.

Past series of blogs

 [Xilinx Vivado timing analysis/constraint series 1] FPGA development timing analysis/constraint-inter-register timing analysis

 [Xilinx Vivado Timing Analysis/Constraints Series 2] FPGA Development Timing Analysis/Constraints - Setup Time

 [Xilinx Vivado timing analysis/constraint series 3] FPGA development timing analysis/constraint-hold time

 [Xilinx Vivado Timing Analysis/Constraints Series 4] FPGA Development Timing Analysis/Constraints-Experimental Engineering Hands-on Operation

 [Xilinx Vivado timing analysis/constraint series 5] FPGA development timing analysis/constraint-IO timing analysis

 [Xilinx Vivado timing analysis/constraint series 6] FPGA development timing analysis/constraint-IO timing input delay

 [Xilinx Vivado Timing Analysis/Constraints Series 7] FPGA Development Timing Analysis/Constraints-FPGA Single-Edge Sampling Data Input Delay Timing Constraint Practice

 [Xilinx Vivado Timing Analysis/Constraints Series 8] FPGA Development Timing Analysis/Constraints-FPGA Data Intermediate Sampling, Edge Sampling PLL Timing Optimization Practice

 [Xilinx Vivado Timing Analysis/Constraints Series 9] FPGA Development Timing Analysis/Constraints-FPGA Single-Edge Data Input Delay Edge Alignment, Different Timing Models Practical Practice

 

Origin blog.csdn.net/m0_61298445/article/details/124049546