DDR Sampling Brief
The previous article analyzed SDR sampling, that is, single-edge sampling. This article introduces DDR sampling, that is, double-edge sampling. DDR interfaces are very common in practice: CMOS devices, DRAM, ADCs, Gigabit Ethernet and others all use them, so it is just as necessary to analyze whether their timing is correct and to learn how to write the timing constraints.
The SDR article introduced two timing models, one with a PLL and one without. Both models exist for DDR as well.
As before, there is an upstream device and a downstream device. The downstream device is the FPGA, and the upstream device can be an Ethernet interface or an ADC. The delays involved are shown in the figure.
For the analysis, the PCB data path delay Td_bd and the PCB clock path delay Tc_bd are assumed to be equal. The clock-to-data relationship at the FPGA pins is then the same as at the output pins of the upstream device, so knowing the latter is enough to constrain the FPGA clock and data inputs and to analyze the phase relationship between them.
The first model (without PLL)
We start with the first timing model, the one without a PLL.
In SDR, a rising edge is the launch edge, and the next rising edge is the sampling edge, which at the same time launches the next data. The data delay relative to the rising edge has a maximum and a minimum value.
In DDR, a rising edge is the launch edge, and the falling edge in the same cycle is the sampling edge, which at the same time launches the next data. Another difference from SDR is that DDR has maximum and minimum delays not only for the rising edge but also for the falling edge, because both edges launch and sample data. Note also that the maximum/minimum range for the rising edge is not necessarily the same as the range for the falling edge.
This timing model has no PLL and uses edge-aligned sampling: the data sampled on the current falling edge was launched by the previous rising edge, and the data sampled on the next rising edge was launched by the previous falling edge. The launch edge and the sampling edge are therefore complementary. If the launch edge is a rising edge, the sampling edge is a falling edge; if the launch edge is a falling edge, the sampling edge is a rising edge. Keep this in mind when reading the timing reports in Vivado later.
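The complementary launch/sampling relationship can be tabulated with a small sketch. This is only an illustration: the function name ddr_edge_pairs and the 18.518 ns period are assumptions chosen for the example, and edge times are ideal (no skew or jitter).

```python
def ddr_edge_pairs(period_ns, n_edges=4):
    """Pair each launch edge with the opposite edge half a period later.

    Edge 0 is a rising edge at t = 0; edges then alternate fall/rise.
    Returns tuples (launch_kind, launch_ns, sample_kind, sample_ns).
    """
    half = period_ns / 2.0
    pairs = []
    for i in range(n_edges):
        launch_kind = "rise" if i % 2 == 0 else "fall"
        sample_kind = "fall" if launch_kind == "rise" else "rise"
        pairs.append(
            (launch_kind, round(i * half, 3), sample_kind, round((i + 1) * half, 3))
        )
    return pairs


# Illustrative period only (hypothetical value for this sketch).
pairs = ddr_edge_pairs(18.518)
for launch_kind, launch_ns, sample_kind, sample_ns in pairs:
    print(f"{launch_kind} @ {launch_ns} ns -> sampled on {sample_kind} @ {sample_ns} ns")
```

Every row has opposite polarities for launch and sampling, which is the pattern to look for in the source and destination clock edges of each reported path.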
Actual operation
Take a real device as an example. The figure below is from the datasheet of a Sony CMOS device whose output supports both SDR and DDR. Its SDR mode was analyzed and tested previously; this time we look at its DDR mode.
The clock frequency is 54 MHz, so the period is 18.519 ns and the half period is 9.259 ns. The parameter table gives a maximum skew of 2 ns. As the figure shows, the data sampled on the first falling edge was launched by the preceding rising edge; arrow 1 marks the minimum input delay and arrow 2 the maximum input delay. Likewise, when the first falling edge launches data, the next rising edge samples it, and the maximum and minimum input delays are the instants marked by arrows 3 and 4, respectively.
Summary of constraints
Rise Max = T/2 + skew_afe = 9.259ns + 2ns = 11.259ns
Rise Min = T/2 - skew_afe = 9.259ns - 2ns = 7.259ns
Fall Max = T/2 + skew_afe = 9.259ns + 2ns = 11.259ns
Fall Min = T/2 - skew_afe = 9.259ns - 2ns = 7.259ns
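The four values above can be reproduced with a short calculation. This is a minimal sketch, assuming the same skew applies before and after both clock edges, as in the formulas above; the function name ddr_input_delays is made up for this example.

```python
def ddr_input_delays(clock_mhz, skew_ns):
    """Return the four edge-aligned DDR input-delay values in ns.

    Each value is T/2 plus or minus the skew, rounded to 1 ps.
    """
    half_period_ns = 1000.0 / clock_mhz / 2.0  # T/2 in ns
    return {
        "rise_max": round(half_period_ns + skew_ns, 3),
        "rise_min": round(half_period_ns - skew_ns, 3),
        "fall_max": round(half_period_ns + skew_ns, 3),
        "fall_min": round(half_period_ns - skew_ns, 3),
    }


# 54 MHz clock and 2 ns maximum skew, as in the datasheet example.
delays = ddr_input_delays(54.0, 2.0)
for name, value in delays.items():
    print(f"{name} = {value:.3f} ns")
```

If the datasheet gave different skews for the rising and falling edges, the function would need separate skew arguments for each of the four values.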
Practical project
Continue with the previous project and add the DDR constraints.
Top-level code
module top_ioddr(
    input  wire       rx_clk,
    input  wire       rx_ctrl,
    input  wire [3:0] rx_dat,
    //tx
    output wire       tx_clk,
    output wire [3:0] tx_d,
    output wire       tx_dv,
    input  wire       sdrclk,
    input  wire [3:0] sdrdata,
    input  wire       sdrden,
    input  wire       sysclk,
    output reg        tout
);

wire       rst;
wire       rx_clk_90;
wire       rx_en;
wire [7:0] rx_data;
reg        tx_en1, tx_en2;
reg  [7:0] tx_data1, tx_data2;
wire       sdrclk1;

assign rst     = 0;
assign sdrclk1 = sdrclk;
// Direct clock connection; this line is commented out later,
// when the IDELAYE2 primitive drives rx_clk_90 instead.
assign rx_clk_90 = rx_clk;

// First processing stage on the received data
always @(posedge rx_clk_90 or posedge rst) begin
    if (rst == 1'b1) begin
        tx_data1 <= 'd0;
    end
    else if (rx_en == 1'b1) begin
        tx_data1 <= rx_data + rx_data - 1;
    end
end

// Second processing stage
always @(posedge rx_clk_90 or posedge rst) begin
    if (rst == 1'b1) begin
        tx_data2 <= 'd0;
    end
    else if (tx_en1 == 1'b1) begin
        tx_data2 <= tx_data1 + tx_data1 - 5;
    end
end

// Delay the enable to match the two data stages
always @(posedge rx_clk_90) begin
    tx_en1 <= rx_en;
end

always @(posedge rx_clk_90) begin
    tx_en2 <= tx_en1;
end

iddr_ctrl inst_iddr_ctrl
(
    .rx_clk_90 (rx_clk_90),
    .rst       (rst),
    .rx_dat    (rx_dat),
    .rx_ctrl   (rx_ctrl),
    .rx_en     (rx_en),
    .rx_data   (rx_data)
);

oddr_ctrl inst_oddr_ctrl
(
    .sclk    (rx_clk_90),
    .tx_dat  (tx_data2),
    .tx_en   (tx_en2),
    .tx_c    (rx_clk_90),
    .tx_data (tx_d),
    .tx_dv   (tx_dv),
    .tx_clk  (tx_clk)
);

//sdr clock domain
reg [3:0] sdrdata_r1, sdrdata_r2;
reg       sdrden_r1, sdrden_r2;

always @(posedge sdrclk1) begin
    {sdrdata_r2, sdrdata_r1} <= {sdrdata_r1, sdrdata};
end

always @(posedge sdrclk1) begin
    {sdrden_r2, sdrden_r1} <= {sdrden_r1, sdrden};
end

always @(posedge sdrclk1) begin
    if (sdrden_r2 == 1'b1) begin
        tout <= (&sdrdata_r1) | (&sdrdata_r2);
    end
    else begin
        tout <= (^sdrdata_r2);
    end
end

endmodule
The code of the other modules is the same as in the previous project, so it is not repeated here.
Run place and route on the project, and open the implemented design when it finishes.
Once it is open, click Edit Timing Constraints.
Clock constraints
Input delay constraints
Here, four input delays are constrained, namely max and min of the rising edge and max and min of the falling edge.
The constraints are as follows:
Rise Max = T/2 + skew_afe = 9.259ns + 2ns = 11.259ns
Rise Min = T/2 - skew_afe = 9.259ns - 2ns = 7.259ns
Fall Max = T/2 + skew_afe = 9.259ns + 2ns = 11.259ns
Fall Min = T/2 - skew_afe = 9.259ns - 2ns = 7.259ns
Maximum value for the rising edge
Minimum value for the rising edge
Maximum value for the falling edge
Note: when constraining the falling edge, the option indicated by the arrow in the figure must be checked. It controls whether this constraint overwrites the earlier rising-edge constraint on the same ports, and it corresponds to -add_delay in the XDC file. Because this experiment uses DDR double-edge sampling, both the rising and the falling edge act as launch and sampling edges, so the falling-edge constraint must be added without overwriting the rising-edge one. The earlier SDR project did not need this.
Minimum value for the falling edge
The no-overwrite option must be checked here as well.
This edge-aligned timing model has no PLL, so the setup time can only be met if the tool makes the clock path delay as long as possible, that is, moves the clock as far to the right as possible in the figure.
The XDC constraint file now reads:
set_property IOSTANDARD LVCMOS33 [get_ports rx_clk]
set_property PACKAGE_PIN J19 [get_ports rx_clk]
set_property PACKAGE_PIN H22 [get_ports rx_ctrl]
set_property IOSTANDARD LVCMOS33 [get_ports rx_ctrl]
set_property PACKAGE_PIN K22 [get_ports {rx_dat[0]}]
set_property PACKAGE_PIN K21 [get_ports {rx_dat[1]}]
set_property PACKAGE_PIN J22 [get_ports {rx_dat[2]}]
set_property PACKAGE_PIN J20 [get_ports {rx_dat[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports {rx_dat[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports {rx_dat[2]}]
set_property IOSTANDARD LVCMOS33 [get_ports {rx_dat[1]}]
set_property IOSTANDARD LVCMOS33 [get_ports {rx_dat[0]}]
set_property PACKAGE_PIN M18 [get_ports tx_dv]
set_property IOSTANDARD LVCMOS33 [get_ports tx_dv]
set_property PACKAGE_PIN K18 [get_ports tx_clk]
set_property IOSTANDARD LVCMOS33 [get_ports tx_clk]
set_property PACKAGE_PIN M22 [get_ports {tx_d[0]}]
set_property PACKAGE_PIN L18 [get_ports {tx_d[1]}]
set_property PACKAGE_PIN L19 [get_ports {tx_d[2]}]
set_property PACKAGE_PIN L20 [get_ports {tx_d[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports {tx_d[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports {tx_d[2]}]
set_property IOSTANDARD LVCMOS33 [get_ports {tx_d[1]}]
set_property IOSTANDARD LVCMOS33 [get_ports {tx_d[0]}]
set_property PACKAGE_PIN W19 [get_ports sdrclk]
set_property PACKAGE_PIN Y22 [get_ports sdrden]
set_property PACKAGE_PIN V20 [get_ports {sdrdata[0]}]
set_property PACKAGE_PIN U20 [get_ports {sdrdata[1]}]
set_property PACKAGE_PIN AB22 [get_ports {sdrdata[2]}]
set_property PACKAGE_PIN AB21 [get_ports {sdrdata[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports sdrclk]
set_property IOSTANDARD LVCMOS33 [get_ports {sdrdata[*]}]
set_property IOSTANDARD LVCMOS33 [get_ports sdrden]
set_property PACKAGE_PIN Y21 [get_ports tout]
set_property IOSTANDARD LVCMOS33 [get_ports tout]
set_property PACKAGE_PIN Y18 [get_ports sysclk]
set_property IOSTANDARD LVCMOS33 [get_ports sysclk]
create_clock -period 18.518 -name sdrclk -waveform {0.000 9.259} [get_ports sdrclk]
# SDR inputs: the clock is named explicitly instead of matching every clock with *.
# -rise/-fall select the data transition, not the clock edge, so they are omitted
# here and the delays apply to both rising and falling data transitions.
set_input_delay -clock [get_clocks sdrclk] -max 2.000 [get_ports {
{sdrdata[0]} {sdrdata[1]} {sdrdata[2]} {sdrdata[3]} sdrden}]
set_input_delay -clock [get_clocks sdrclk] -min -2.000 [get_ports {
{sdrdata[0]} {sdrdata[1]} {sdrdata[2]} {sdrdata[3]} sdrden}]
create_clock -period 18.518 -name rx_clk -waveform {0.000 9.259} [get_ports rx_clk]
# DDR inputs relative to the rising clock edge
set_input_delay -clock [get_clocks rx_clk] -max 11.259 [get_ports {rx_ctrl {rx_dat[0]} {rx_dat[1]} {rx_dat[2]} {rx_dat[3]}}]
set_input_delay -clock [get_clocks rx_clk] -min 7.259 [get_ports {rx_ctrl {rx_dat[0]} {rx_dat[1]} {rx_dat[2]} {rx_dat[3]}}]
# DDR inputs relative to the falling clock edge; -add_delay keeps the rising-edge constraints
set_input_delay -clock [get_clocks rx_clk] -clock_fall -max -add_delay 11.259 [get_ports {rx_ctrl {rx_dat[0]} {rx_dat[1]} {rx_dat[2]} {rx_dat[3]}}]
set_input_delay -clock [get_clocks rx_clk] -clock_fall -min -add_delay 7.259 [get_ports {rx_ctrl {rx_dat[0]} {rx_dat[1]} {rx_dat[2]} {rx_dat[3]}}]
Run place and route again, then open the implemented design; a timing violation is now reported.
View Timing Reports
Run Report Timing to see the detailed timing report.
Timing analysis for rx_clk
Set the number of paths per group to 100 and the number of paths per endpoint to 10; choose values that are as large as practical.
Number of paths per group: how many paths are reported in each group, starting from the worst.
Number of paths per endpoint: how many launch/sampling relationships are reported for each endpoint. Rising-edge launch with falling-edge sampling is one relationship, and falling-edge launch with rising-edge sampling is another. The value is set to 10 here, but there are fewer than 10 relationships, so all of them are shown.
The report shows that the setup time is violated, while the hold time is met.
Click on path 1 to view its timing report.
The data arrives at 11.714 ns.
The clock actually arrives at 10.727 ns.
The data arrives later than the clock, which shows that the clock path is too short: the sampling edge has already passed by the time the data is valid, so the setup time is violated. The clock delay should be increased as much as possible.
Solution
The input delay constraints above only describe the relationship between clock and data. They tell the tool which clock-to-data relationship it has to accommodate, but the tool may not be able to achieve it on its own. In that case the clock delay has to be extended by hand to meet the timing requirements.
Add primitives
Click Primitives in the menu on the left.
Choose the following option.
Add the primitive to the top-level file, and comment out the line assign rx_clk_90 = rx_clk.
Primitive interpretation
Every port and parameter of the primitive has a comment. The purpose of the primitive is simple: it adds delay to the clock. The input clock rx_clk enters through IDATAIN, is delayed for a period of time, and leaves through DATAOUT as the output clock rx_clk_90. The delay is set by the parameter IDELAY_VALUE, which is initially set to 18.
IDELAYCTRL IDELAYCTRL_inst (
    .RDY(RDY),       // 1-bit output: Ready output
    .REFCLK(sysclk), // 1-bit input: Reference clock input
    .RST(1'b0)       // 1-bit input: Active high reset input
);

IDELAYE2 #(
    .CINVCTRL_SEL("FALSE"),          // Enable dynamic clock inversion (FALSE, TRUE)
    .DELAY_SRC("IDATAIN"),           // Delay input (IDATAIN, DATAIN)
    .HIGH_PERFORMANCE_MODE("FALSE"), // Reduced jitter ("TRUE"), Reduced power ("FALSE")
    .IDELAY_TYPE("FIXED"),           // FIXED, VARIABLE, VAR_LOAD, VAR_LOAD_PIPE
    .IDELAY_VALUE(18),               // Input delay tap setting (0-31)
    .PIPE_SEL("FALSE"),              // Select pipelined mode, FALSE, TRUE
    .REFCLK_FREQUENCY(200.0),        // IDELAYCTRL clock input frequency in MHz (190.0-210.0, 290.0-310.0).
    .SIGNAL_PATTERN("CLOCK")         // DATA, CLOCK input signal
)
IDELAYE2_inst_dv (
    .CNTVALUEOUT(),      // 5-bit output: Counter value output
    .DATAOUT(rx_clk_90), // 1-bit output: Delayed data output
    .C(1'b0),            // 1-bit input: Clock input
    .CE(1'b0),           // 1-bit input: Active high enable increment/decrement input
    .CINVCTRL(1'b0),     // 1-bit input: Dynamic clock inversion input
    .CNTVALUEIN(5'd0),   // 5-bit input: Counter value input
    .DATAIN(1'b0),       // 1-bit input: Internal delay data input
    .IDATAIN(rx_clk),    // 1-bit input: Data input from the I/O
    .INC(1'b0),          // 1-bit input: Increment / Decrement tap delay input
    .LD(1'b0),           // 1-bit input: Load IDELAY_VALUE input
    .LDPIPEEN(1'b0),     // 1-bit input: Enable PIPELINE register to load data input
    .REGRST(1'b0)        // 1-bit input: Active-high reset tap-delay input
);
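As a rough guide to choosing IDELAY_VALUE, the nominal delay of a tap setting can be estimated. The sketch below assumes the tap resolution usually quoted for the 7-series IDELAYE2, 1 / (32 x 2 x F_REF), which should be checked against the device documentation; the function names are made up for this example. The delay shown for the IDELAYE2 cell in a timing report need not equal this nominal figure.

```python
def idelay_tap_ps(refclk_mhz):
    """Nominal delay of one IDELAYE2 tap in ps, assuming 1 / (32 * 2 * F_REF)."""
    return 1.0e6 / (32 * 2 * refclk_mhz)


def idelay_nominal_ns(taps, refclk_mhz=200.0):
    """Nominal delay in ns of a fixed tap setting (valid range 0-31)."""
    if not 0 <= taps <= 31:
        raise ValueError("IDELAY_VALUE must be in the range 0-31")
    return taps * idelay_tap_ps(refclk_mhz) / 1000.0


tap_ps = idelay_tap_ps(200.0)      # REFCLK_FREQUENCY(200.0) from the primitive above
delay_ns = idelay_nominal_ns(18)   # IDELAY_VALUE(18) from the primitive above
print(f"tap = {tap_ps:.3f} ps, 18 taps = {delay_ns:.3f} ns")
```

Under this assumption the 32 taps span about 2.4 ns at a 200 MHz reference clock, so a fixed IDELAYE2 can only shift the clock by a few nanoseconds.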
View Timing Reports
Run place and route again, then reload the design and run Report Timing to view the timing report; the procedure is the same as before.
The timing is now met. Click on path 101 to view its detailed timing report.
Timing Analysis
The setup time analysis is taken as the example; the hold time is analyzed in the same way.
The data actually arrives at 11.714 ns.
The clock actually arrives at 12.832 ns, clearly later than before. The increase comes from the primitive that was just added: the IDELAYE2 marked by the red box contributes a delay of 1.556 ns.
According to the setup time formula, the setup slack is 1.119 ns, which meets the timing requirement.
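The two reports can be compared with a simplified slack check. This is only a sketch: it treats the reported clock arrival time as the time by which the data must arrive and takes the plain difference, whereas the slack Vivado reports is computed from its full required-time calculation. The function name setup_slack_ns is made up for this example.

```python
def setup_slack_ns(clock_arrival_ns, data_arrival_ns):
    """Simplified setup slack: positive when the data arrives before the clock."""
    return round(clock_arrival_ns - data_arrival_ns, 3)


# Arrival times quoted from the two timing reports in this article.
slack_before = setup_slack_ns(10.727, 11.714)  # without IDELAYE2
slack_after = setup_slack_ns(12.832, 11.714)   # with IDELAYE2, IDELAY_VALUE = 18

print(f"before: {slack_before} ns ({'violated' if slack_before < 0 else 'met'})")
print(f"after:  {slack_after} ns ({'violated' if slack_after < 0 else 'met'})")
```

The simplified result after the fix is 1.118 ns, which agrees with the reported 1.119 ns to within the rounding of the displayed values.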
Summary
For the first model, the one without a PLL, if the tool cannot meet timing from the constraints alone, the code has to be changed by hand, for example by adding primitives that increase the clock or data delay until the timing requirements are met.