[Ten Hard Guides] - 7.4 [Dynamic RAM] DDR4 design overview and analysis and simulation cases

Introduction

  As performance requirements for computers and servers keep rising, DDR4 has begun to appear in some high-end designs, yet material covering its signal integrity (SI) aspects, particularly in Chinese, is still scarce. At the same time, the high data rate of DDR4 makes SI problems very likely. Once a problem such as a DDR4 margin test failure occurs, it becomes a headache for many designers: debugging is difficult, signal measurements become harder and less accurate, and fixes are hard to verify. Optimizing the PCB layout afterwards and verifying it by respinning the board is inefficient and adds considerable cost.

  In this case, it is much more convenient to use signal simulation methods to analyze and verify problems. Starting from the basic concepts of DDR4, this article introduces the key technologies and some new methods related to DDR4. In addition, it combines an actual DDR4 Margin Fail problem to briefly explain the problem analysis ideas and simulation methods.

1. Analysis of DDR4 key technologies and methods

1.1 Differences between DDR4 and DDR3

  Compared with DDR3, DDR4 first differs in appearance. The bottom edge of a DDR4 module protrudes slightly in the middle and is shorter at the edges, with a smooth curve between the high point in the center and the low points at both ends. This shape ensures sufficient contact area between the gold fingers and the memory slot, which keeps the module stable. The gold fingers themselves have also changed noticeably: the fool-proof notch sits closer to the center than on DDR3. Of course, the most important mission of DDR4 is to increase frequency and bandwidth. Generally speaking, DDR4 offers higher performance, better stability, and lower power consumption. From an SI perspective, the main differences are summarized below and explained in the following sections.

Insert image description here

1.2 Comparison between POD and SSTL

  POD (Pseudo Open Drain) is the new driver standard introduced with DDR4. The biggest difference is that the termination voltage at the receiving end equals VDDQ, whereas the SSTL (Stub Series Terminated Logic) receiver used in DDR3 terminates to VDDQ/2. This reduces parasitic pin capacitance and I/O termination power, and allows stable operation even at a reduced VDD. The equivalent circuits are shown in Figure 1 (DDR4) and Figure 2 (DDR3).

Insert image description here
It can be seen that when DRAM is in a low level state, both SSTL and POD have current flowing.

Insert image description here
When the DRAM is in a high-level state, SSTL still has current flowing, but with POD both ends of the termination are at the same voltage, so no current flows. This is why DDR4 is more power efficient.

Insert image description here
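The current argument above can be checked with a simple static model. The following is a minimal sketch, assuming a driver on-resistance of 34 ohms and a termination of 60 ohms (both illustrative values, not taken from the figures), with VDDQ = 1.2 V for DDR4 and 1.5 V for DDR3:

```python
def termination_current_ma(vddq, vtt, drive_high, ron=34.0, rtt=60.0):
    """DC current (mA) through driver plus termination for a static level.

    ron and rtt are assumed example values, not figures from the article.
    """
    v_rail = vddq if drive_high else 0.0  # rail the driver connects to
    return abs(v_rail - vtt) / (ron + rtt) * 1000.0

# DDR4 POD: termination tied to VDDQ (1.2 V)
pod_low = termination_current_ma(1.2, 1.2, drive_high=False)
pod_high = termination_current_ma(1.2, 1.2, drive_high=True)

# DDR3 SSTL: termination tied to VDDQ/2 (VDDQ = 1.5 V)
sstl_low = termination_current_ma(1.5, 0.75, drive_high=False)
sstl_high = termination_current_ma(1.5, 0.75, drive_high=True)

print(f"POD  low: {pod_low:.2f} mA, high: {pod_high:.2f} mA")
print(f"SSTL low: {sstl_low:.2f} mA, high: {sstl_high:.2f} mA")
```

In this model POD draws current only for a low level, while SSTL draws the same current for both levels, which matches the behavior described in this section.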

1.3 Data Bus Inversion (DBI)

  As described above, with POD no current flows when the data is at a high level, so one way to reduce DDR4 power consumption is to drive as many high levels as possible. This is the core of DBI technology. For example, if at least 5 bits in a group of 8 are low, all 8 bits are inverted so that at least 5 bits are high. A low DBI signal indicates that the data has been inverted; a high DBI signal indicates that the original data is sent unchanged. As a result, among the 9 signals in a group (8 DQ signals and 1 DBI signal), at least five are always high, which effectively reduces power consumption.

Insert image description here
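The DBI rule described above is easy to express in code. The following is a minimal sketch; the function names are hypothetical and model one 8-bit lane only:

```python
def dbi_encode(byte):
    """Return (dq, dbi_n) for one 8-bit lane; dbi_n == 0 means 'inverted'."""
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros >= 5:                 # too many low bits: send the inverse
        return (~byte) & 0xFF, 0
    return byte, 1                 # send the data unchanged

def dbi_decode(dq, dbi_n):
    """Recover the original byte at the receiver."""
    return dq if dbi_n else (~dq) & 0xFF

def highs_on_bus(dq, dbi_n):
    """Number of high signals among the 8 DQ lines plus the DBI line."""
    return bin(dq).count("1") + dbi_n

dq, dbi_n = dbi_encode(0x01)       # seven low bits, so the byte is inverted
print(hex(dq), dbi_n, highs_on_bus(dq, dbi_n))
```

Checking all 256 input bytes confirms that the original data is always recovered and that at least five of the nine signals are high in every case.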

1.4 ODT control

  In order to improve the signal quality, starting from DDR2, the termination resistors of DQ, DM, DQS/DQS# are built into the Controller and DRAM, which is called ODT (On Die Termination). Clock and ADD/CMD/CTRL signals still require the use of external Termination resistors.

Insert image description here
  In the DRAM, the equivalent resistance of the on-die termination is set through the Mode Register (MR), and the accuracy of ODT is controlled by the reference resistor RZQ. DDR4 ODT supports 240, 120, 80, 60, 48, 40, and 34 ohms.
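The supported values are simply RZQ divided by an integer. A minimal sketch, assuming the usual 240-ohm RZQ reference resistor (the article does not state the RZQ value):

```python
RZQ_OHMS = 240  # assumed reference resistor value

# RZQ/1 through RZQ/7, rounded to the nearest ohm
odt_values = [round(RZQ_OHMS / n) for n in range(1, 8)]
print(odt_values)
```

The result reproduces the list of ODT values given above.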

  Different from DDR3, DDR4's ODT has four modes: Data termination disable, RTT_NOM, RTT_WR, and RTT_PARK.

  The Controller can control the RTT state through read and write commands and the ODT pin. RTT_PARK is a new DDR4 option that is generally used in multi-rank configurations. For example, suppose a system has Rank0, Rank1, and Rank2. While the Controller writes data to Rank0, Rank1 and Rank2 can be either high impedance (Hi-Z) or weakly terminated (240, 120, 80 ohms, etc.). RTT_PARK provides a more flexible termination scheme so that Rank1 and Rank2 do not always have to stay in high-impedance mode, which allows the DRAM to operate at a higher frequency.

  Generally speaking, the ODT value can be changed by adjusting the Controller registers through the BIOS, but some Controller manufacturers do not recommend this. Taking Intel as an example, the MRC code supplied by Intel already contains optimized ODT values. In theory, users can derive other ODT values through simulation or other methods and modify them in the BIOS, but the design manufacturer then bears responsibility for any resulting problems. The following table shows the optimized settings provided by Intel.
Insert image description here

1.5 Reference voltage Vref

  As we all know, a DDR receiver generally decides whether a signal is high or low by comparing the input with a reference voltage (Vref). In DDR4, however, one Vref is missing from the board. Compare the following two designs: in the DDR4 design, VREFCA is the same as in DDR3 and is generated by an external resistor divider or a power control chip, but VREFDQ no longer appears in the design and is instead generated inside the chip. This both saves design cost and frees up routing space.

      Insert image description here
  The internal VREFDQ of DRAM is adjusted through register (MR6). The main parameters are Voltage range, step size, VREF step time, VREF full step time, as shown in the table below.

Insert image description here
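To make the range and step-size parameters concrete, here is a minimal sketch of how a register code maps to a VREFDQ level. The table above did not survive as text, so the numbers used here are assumptions based on the commonly cited JEDEC DDR4 values (Range 1 starting at 60% of VDDQ, Range 2 at 45%, 0.65% per step, codes 0 to 50); check them against the DRAM datasheet before relying on them.

```python
# Assumed values; verify against the DRAM datasheet.
RANGE_BASE_PERCENT = {1: 60.0, 2: 45.0}
STEP_PERCENT = 0.65
MAX_CODE = 50

def vrefdq_percent(range_sel, code):
    """VREFDQ as a percentage of VDDQ for a given range and step code."""
    if not 0 <= code <= MAX_CODE:
        raise ValueError("code out of range")
    return RANGE_BASE_PERCENT[range_sel] + STEP_PERCENT * code

def vrefdq_volts(range_sel, code, vddq=1.2):
    """Same setting expressed in volts for a given VDDQ."""
    return vddq * vrefdq_percent(range_sel, code) / 100.0

print(vrefdq_percent(1, 0), vrefdq_percent(1, MAX_CODE))
print(round(vrefdq_volts(1, 25), 4))
```

Under these assumptions the two ranges overlap, which lets training pick a level anywhere between 45% and 92.5% of VDDQ.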
  At every power-up, the DRAM Controller runs a series of calibrations to adjust the VREFDQ used for the input data signals at the DRAM end, optimizing the timing and voltage margins. In other words, VREFDQ depends not only on VDD but also on the transmission line characteristics and the characteristics of the receiving chip, so the value of VREFDQ may differ from one power-up to the next.

  Because Vref differs, Vih/Vil differ as well. The difference can be seen by adjusting ODT, illustrated here with a simulation example. For DDR3, adjusting ODT makes the waveform move up and down symmetrically, whereas for DDR4, adjusting ODT moves the waveform on one side only.

Insert image description here
Insert image description here
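The one-sided movement can be reproduced with a static voltage-divider model. This is a minimal sketch, assuming a 34-ohm driver and ideal resistive termination (illustrative values, not the settings used in the simulation above):

```python
def rx_levels(vddq, vtt, ron, rtt):
    """Static (high, low) voltage at the receiver pin.

    The driver connects the line to VDDQ or ground through ron;
    the receiver terminates the line to vtt through rtt.
    """
    k = ron / (ron + rtt)          # fraction of the swing lost in the driver
    high = vddq + (vtt - vddq) * k
    low = vtt * k
    return high, low

RON = 34  # assumed driver impedance in ohms
for rtt in (40, 60, 120):
    pod_h, pod_l = rx_levels(1.2, 1.2, RON, rtt)      # DDR4: Vtt = VDDQ
    sstl_h, sstl_l = rx_levels(1.5, 0.75, RON, rtt)   # DDR3: Vtt = VDDQ/2
    print(f"ODT {rtt:3d}: POD {pod_l:.3f}..{pod_h:.3f} V "
          f"(mid {(pod_h + pod_l) / 2:.3f}), "
          f"SSTL {sstl_l:.3f}..{sstl_h:.3f} V "
          f"(mid {(sstl_h + sstl_l) / 2:.3f})")
```

In this model the POD high level stays at VDDQ while only the low level moves with ODT, so the midpoint of the eye shifts. The SSTL midpoint stays at VDDQ/2. This is one way to see why DDR4 trains VREFDQ instead of fixing it at VDDQ/2.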

1.6 New method of DDR4 Layout Routing

  Among all Layout traces, DDR is undoubtedly the most complex. Not only impedance matching must be considered, but also length matching must be considered. Moreover, there are a large number of data and address lines, and the impact of crosstalk must be considered.

  As the DDR4 data rate increases, these effects become more severe. In addition, to save cost, many designs now require the PCB size and layer count to be as small as possible, which makes the impedance and crosstalk requirements even more challenging. SI engineers and layout engineers generally try various ways to meet these requirements and often have to compromise, for example by making traces narrower in the stack-up design or using thinner traces in the BGA breakout area. However, such methods only allow minor adjustments and rarely solve the problem fundamentally. A method recently developed by Intel is very interesting and can, to a certain extent, balance impedance (line width) against crosstalk (line spacing). It is summarized here for reference.

  Let’s first look at an actual Layout example. The trace between the two red lines adopts a zigzag shape. Yes, this is a new method developed by Intel, and its official name is "Tabbed Routing".

Insert image description here
  The idea of Tabbed Routing is to reduce the line width and add small protruding blocks (tabs) to the traces in areas where space is tight (generally the BGA area and the DIMM slot area), as shown in the figure below.

Insert image description here
  This method increases the mutual capacitance between two adjacent lines while leaving their inductance almost unchanged. The added capacitance helps control the impedance on each layer and reduces far-end crosstalk on the outer layers. The simulation results are shown in the figure below.

Insert image description here
  The simulation results show that this method can indeed strike a good balance between impedance and far-end crosstalk. The tab dimensions, of course, must be determined by detailed simulation based on the actual PCB. Intel also provides some tools for reference, and interested readers can consult them for more information.
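The reasoning behind Tabbed Routing can be illustrated with two textbook relations for coupled lines: the characteristic impedance is sqrt(L/C), and far-end crosstalk is proportional to the difference between the capacitive and inductive coupling ratios, Cm/C - Lm/L. The following is a minimal sketch; all per-unit-length values are hypothetical numbers chosen for illustration, not data from Intel or from the figures above.

```python
import math

def z0_ohms(l_self, c_self):
    """Characteristic impedance from per-unit-length L (H/m) and C (F/m)."""
    return math.sqrt(l_self / c_self)

def fext_term(l_self, c_self, l_mutual, c_mutual):
    """Relative far-end crosstalk term; zero means the couplings cancel."""
    return c_mutual / c_self - l_mutual / l_self

# Hypothetical narrow microstrip pair without tabs
plain = dict(l_self=400e-9, c_self=100e-12, l_mutual=80e-9, c_mutual=10e-12)
# Same pair with tabs: more self and mutual capacitance, same inductance
tabbed = dict(l_self=400e-9, c_self=120e-12, l_mutual=80e-9, c_mutual=22e-12)

for name, p in (("plain", plain), ("tabbed", tabbed)):
    print(f"{name:6s}: Z0 = {z0_ohms(p['l_self'], p['c_self']):.1f} ohm, "
          f"FEXT term = {fext_term(**p):+.3f}")
```

With these example numbers, the tabs pull the impedance of the narrow trace back down and bring the capacitive coupling ratio closer to the inductive one, so the far-end crosstalk term shrinks. In a homogeneous dielectric (inner layers) the two ratios are already equal, which is consistent with the text mentioning far-end crosstalk only for the outer layers.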

2. DDR4 Simulation

2.1 Pre-Simulation with HyperLynx

  If IBIS models are available for both the Controller and the DRAM, DDR4 can be simulated very conveniently with HyperLynx, using the same approach as for other DDR generations. Pre-simulation determines the topology of the whole system and a number of details, such as the impedance (determined by the stack-up, line width, and line spacing), the choice of ODT value, the stub length in a T-shaped structure, and the value of the ADD/CMD/CTRL terminal resistors.

2.1.1 ADD/CMD/CTRL terminal resistor value

Assume the ADD circuit shown below, running at 2400 MT/s (ADD/CMD at 1.2 Gbps). The transmitter is U16, which drives five groups of DRAM chips in a fly-by structure. Each group of DRAMs uses a T structure (in the actual layout, one DRAM chip is on the top side and one on the bottom side). The T-shaped stub length is 77 mil, the terminal resistance is 32 ohms, and the termination voltage is 0.6 V.

Insert image description here
  The simulation results show that the waveforms at the two ends of each T-shaped structure are almost identical because the structure is completely symmetrical, so for ease of observation only one of them is shown. From nearest to farthest from the Controller, the DRAMs are U5, U4, U3, U2, and U1. Their eye diagrams are as follows:

Insert image description here
  The chips closer to the Controller have "messier" waveforms but very fast rising edges, while the chips closer to the terminal resistor have cleaner waveforms but slower rising edges. How can the waveform be optimized? Let's sweep the value of the terminal resistor to see whether it improves the signal quality. Using the Sweep function of HyperLynx, set the terminal resistor to four values: 27, 33, 39, and 45 ohms.

Insert image description here
The eye diagram of U5 (closest to the Controller) is as follows, corresponding to terminal resistor values of 27, 33, 39, and 45 ohms in sequence:

Insert image description here
The eye diagram of U4 is as follows, corresponding to terminal resistor values of 27, 33, 39, and 45 ohms in sequence:
Insert image description here
The eye diagram of U3 is as follows, corresponding to terminal resistor values of 27, 33, 39, and 45 ohms in sequence:
Insert image description here
The eye diagram of U2 is as follows, corresponding to terminal resistor values of 27, 33, 39, and 45 ohms in sequence:
Insert image description here
The eye diagram of U1 is as follows, corresponding to terminal resistor values of 27, 33, 39, and 45 ohms in sequence:
Insert image description here
  As can be seen from the waveforms above, the third waveform is the best for every DRAM, which means that the optimized waveform is obtained with a terminal resistor of 39 ohms.
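One way to reason about such a sweep before running it is to look at the reflection coefficient at the end of the line, (R - Z) / (R + Z), where Z is the effective impedance of the loaded fly-by line. The following is a minimal sketch. The value Z_LOADED = 40 ohms is an assumption picked for illustration, not a value extracted from this board; DRAM loading typically lowers the effective impedance below the unloaded trace impedance.

```python
def reflection_coefficient(r_term, z_line):
    """Voltage reflection coefficient at a resistive termination."""
    return (r_term - z_line) / (r_term + z_line)

Z_LOADED = 40.0               # assumed effective line impedance in ohms
SWEEP_OHMS = [27, 33, 39, 45]  # the four values swept in HyperLynx

for r in SWEEP_OHMS:
    print(f"{r} ohm: gamma = {reflection_coefficient(r, Z_LOADED):+.3f}")

# The termination with the smallest |gamma| reflects the least energy
best = min(SWEEP_OHMS, key=lambda r: abs(reflection_coefficient(r, Z_LOADED)))
print("smallest reflection at", best, "ohm")
```

This estimate only narrows the range. The eye diagrams from the full simulation remain the basis for the final choice, because the T-branches and the DRAM input capacitance are not captured by a single reflection coefficient.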

2.1.2 Length of Data signal Stub

  In a typical DDR4 design, the data signals are routed point to point. In some designs, however, PCB space limitations or controller limitations make a one-to-two design (T-shaped structure) necessary. The author encountered this situation in one design and considered the following two options:

  Option 1: use a T-shaped topology, as shown in Figure 20. This saves as much PCB space as possible. However, if only DIMM0 or only DIMM1 is inserted, a longer stub appears on the other branch, which affects the signal quality.
  Option 2: use a daisy-chain structure, as shown in Figure 21. When only DIMM0 is inserted, there is also a stub effect. Moreover, this topology requires length matching of the signal lines between DIMM0 and DIMM1. When DIMM0 and DIMM1 are close together, serpentine routing is difficult; if the distance between them is increased, the stub becomes longer and the signal quality cannot be controlled.

  From a signal integrity perspective, both options suffer from stubs, but from a layout perspective Option 1 is more convenient, and its stub can be kept within 500 mil. Option 1 was therefore selected. This design comes at the expense of signal margin, and the achievable data rate is affected to some extent: in the author's project, when only one memory module is inserted, the data rate can reach at most 1866 Mb/s.

Insert image description here
  From a simulation perspective, many factors have to be considered here: the controller model, the PCB model, the connector model, and finally the memory module model. Connector and memory module models are usually hard to obtain, and even when they are available they tend to be different types of models, so a full channel simulation takes considerable time and effort.

  If time is limited and the design needs a quick evaluation, HyperLynx can also be used for a quick simulation. In the following example, assume that a Controller drives two DIMMs or two memory chips and the system runs at 2400 Mb/s. The lengths of TL2 and TL3 roughly represent the PCB stub length plus the connector length plus the memory module trace length. (This is only a rough evaluation. If time permits, it is strongly recommended to obtain accurate models of each part for a more accurate simulation.)

  This simple simulation shows that the stub has a very obvious impact on signal quality, especially when one memory slot is left empty. In the example above, when the stub reaches 1000 mil and only one memory slot is populated, the eye diagram is already very poor. In an actual design, a trade-off is therefore needed between design cost and data rate. In the author's design, due to PCB space limitations, the final choice was to run a single memory module at only 1866 Mb/s.

Insert image description here
When the stub length is 500 mil, the eye diagrams with both memories installed and with only one memory installed are as follows:
Insert image description here
When the stub length is 1000 mil, the eye diagrams with both memories installed and with only one memory installed are as follows:
Insert image description here
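A rough way to see why the 1000 mil stub hurts so much more than the 500 mil stub is to compare the stub's round-trip delay with the unit interval (UI). The following is a minimal sketch, assuming a propagation delay of about 170 ps per inch, a typical figure for FR-4 that is not taken from this design:

```python
PS_PER_INCH = 170.0  # assumed propagation delay for FR-4

def unit_interval_ps(rate_mbps):
    """Bit period in picoseconds for a data rate in Mb/s."""
    return 1e6 / rate_mbps

def stub_round_trip_ps(stub_mil):
    """Time for a reflection to travel down an open stub and back."""
    return 2 * (stub_mil / 1000.0) * PS_PER_INCH

for rate in (2400, 1866):
    ui = unit_interval_ps(rate)
    for stub in (500, 1000):
        rt = stub_round_trip_ps(stub)
        print(f"{rate} Mb/s, {stub:4d} mil stub: "
              f"round trip {rt:.0f} ps = {rt / ui:.2f} UI")
```

Under this assumption, the reflection from a 1000 mil open stub returns most of a UI later at 2400 Mb/s and lands in the middle of the next bit, while the 500 mil reflection returns within the first half of the bit. Lowering the rate to 1866 Mb/s lengthens the UI and leaves more room, which is consistent with the trade-off described above.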
  When an Intel chip is used as the DDR Controller, the SI model provided by Intel allows a relatively complete simulation. The simulation deck provided by Intel includes DDR connector and DIMM models. If a model that matches the actual project is available, it can replace the model in the deck; if not, using the models supplied with the deck directly is still very useful.

2.2 Intel SISTAI simulation

  The Memory Bit Error Rate Executable (MBERE) tool provided by Intel is integrated into the Intel SISTAI (Signal Integrity Support Tools for Advanced Interfaces) web system. SISTAI can simulate PCIe, SATA, USB, QPI, and other high-speed signals, and its DDR4 simulation module is MBER. The basic idea is to first generate a step response with Hspice, then upload the resulting .tr0 file to SISTAI, which computes the worst-case eye diagram. The general simulation process is as follows:

2.2.1 DDR channel modeling

  Intel's simulation is based on a 10-line model: eight DQ lines plus two DQS lines. The Causal-W Element Tool provided by Intel can generate W-element models, and tools such as ADS and Hspice can also be used to model the transmission lines. For post-layout work, the S-parameters of the DDR channel can be extracted with PowerSI, SIwave, or similar software. Note that the order of DQ and DQS must match the order specified by Intel, as shown in Figure 23.

Insert image description here

2.2.2 Hspice simulation

  The Intel simulation model is quite detailed and provides models and simulation decks for a variety of cases. For an actual simulation, the parameters in the deck must be replaced with the models of the actual design. Taking S-parameters as an example, assuming the S-parameters of the entire DDR channel have been extracted, the PCB channel model must be added after the package parameters, as shown in the second red box in the figure below. Some of the preceding parameters can be deleted or commented out with *.

Insert image description here
Hspice simulation obtains Step Response, the results are as follows:

Insert image description here

2.2.3 SISTAI simulation

After obtaining the .tr0 file, upload it to the SISTAI system for calculation. The procedure is as follows:
Insert image description here
Click Success to view the simulation results. Unfortunately, SISTAI only reports simulation data such as eye width and eye height and does not display eye diagrams.
Insert image description here
Intel's documents also provide Spec to compare and judge the simulation results.

Insert image description here
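The flow in this section goes from a step response to a worst-case eye. The article does not say how MBERE does this internally, so the following sketch uses a different, well-known technique, peak distortion analysis, only to illustrate the idea. The single-pole channel and its time constants are hypothetical stand-ins for a real Hspice step response.

```python
import math

def step_response(t_ps, tau_ps):
    """Hypothetical single-pole channel: normalized 0-to-1 step response."""
    return 0.0 if t_ps < 0 else 1.0 - math.exp(-t_ps / tau_ps)

def pulse_response(t_ps, ui_ps, tau_ps):
    """Response to a single one-UI-wide bit: s(t) - s(t - UI)."""
    return step_response(t_ps, tau_ps) - step_response(t_ps - ui_ps, tau_ps)

def worst_case_eye_height(ui_ps, tau_ps, n_cursors=20):
    """Main cursor minus the sum of all |ISI cursors| (peak distortion)."""
    sample_ps = ui_ps  # this pulse response peaks one UI after the bit starts
    main = pulse_response(sample_ps, ui_ps, tau_ps)
    isi = sum(
        abs(pulse_response(sample_ps + k * ui_ps, ui_ps, tau_ps))
        for k in range(-n_cursors, n_cursors + 1)
        if k != 0
    )
    return main - isi

UI_PS = 1e6 / 2400  # unit interval at 2400 Mb/s
for tau in (80.0, 200.0):  # a faster and a slower hypothetical channel
    print(f"tau = {tau:.0f} ps: worst-case eye height = "
          f"{worst_case_eye_height(UI_PS, tau):.3f} of full swing")
```

The slower channel leaves more residual energy in neighboring bit slots, so its worst-case eye is smaller. A real tool would also include crosstalk from the other lines in the 10-line model, jitter, and bit error rate statistics.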

3. DDR4 RMT Margin test Fail problem example

3.1 Design situation

This design uses an Intel Haswell-EP CPU as the DDR4 Controller in a 3DPC (three DIMMs per channel) configuration, as shown in Figure 29 below. DDR4 runs at 1600 Mb/s.

Insert image description here

3.2 Problem description

  After the motherboard is built, the DDR4 signals need to be tested and verified. For DDR4 memory modules, however, test points are very hard to find and the measurements are very inaccurate, so the only option is to test the memory margin. Using RMT, the margin test tool provided by Intel, we tested memory modules from several manufacturers: Hynix 8G, Hynix 16G, Samsung 8G, Samsung 16G, Samsung 32G, Micron 8G, and Micron 16G. Only the Micron 8G module showed RxVLow and RxVHigh values below 14 (the spec requires at least 14); all other modules met the spec.
Insert image description here

3.3 Memory Margin Test

  The RMT test failed, but what is the RMT test? Here is a general introduction to memory testing. After the PCB is built, it must be tested to verify signal integrity. Usually an oscilloscope is used to measure the quality of the DDR signals during reads and writes. This approach has significant limitations. The signal cannot be measured where it actually arrives at each component, because the test point is often some distance from the chip pad, and the additional test fixtures required inevitably affect accuracy. Separating DDR read and write traffic has also always been difficult, and even the professional test software provided by instrument vendors often fails to give very accurate results. Finally, the waveforms and test points are all outside the chip, so the signal timing adjustments made inside the Memory Controller cannot be measured. Therefore, in addition to measuring waveforms with an oscilloscope, a memory margin test is also necessary.

Insert image description here
  A simple memory margin test works when both the Controller and the DRAM are supplied with an external VREF: adjust the VREF voltage while running memory stress test software (such as Golden Memory or MSTRESS) until a failure occurs. The difference between that VREF value and the default VREF value is recorded as the VREF margin. Adjusting VREF does not change the transmitted waveform, because VREF is only the threshold the receiving end (Controller or DRAM) uses to decide whether the input is 0 or 1. In the DDR4 era, however, VREFDQ is integrated into the chip and cannot be adjusted externally.
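The VREF sweep procedure can be sketched as a small loop. In this minimal example the stress test is replaced by a hypothetical stand-in function with a made-up passing window; in a real setup each call would run the stress software at the given VREF setting.

```python
def vref_margin_mv(passes, vref_default_mv, step_mv=5, limit_mv=300):
    """Step VREF away from the default in each direction until a failure.

    Returns the largest passing offset (mV) below and above the default.
    """
    margins = {}
    for name, sign in (("low", -1), ("high", +1)):
        offset = 0
        while (offset + step_mv <= limit_mv
               and passes(vref_default_mv + sign * (offset + step_mv))):
            offset += step_mv
        margins[name] = offset
    return margins

def fake_stress_test(vref_mv):
    """Hypothetical stand-in: the system passes between 520 and 690 mV."""
    return 520 <= vref_mv <= 690

print(vref_margin_mv(fake_stress_test, vref_default_mv=600))
```

With the made-up window above, the sketch reports 80 mV of margin below the default and 90 mV above it. An asymmetric result like this suggests that the default VREF is not centered in the eye.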

  At this time, some specialized testing software is more convenient. For example, Intel provides RMT and EVTS for DDR Margin testing.

Insert image description here
  RMT (DDR Rank Margin Tool) works by modifying the settings so that the BIOS automatically runs the training program at boot and outputs the training results through the debug port. The printed output is then analyzed to obtain the memory margin. The results include not only the VREF margin but also the write/read timing margin, the ADD/CMD timing margin, and so on. EVTS supplements RMT and can perform per-bit margin testing. If the margin is poor, or is asymmetric left to right or top to bottom, the EVTS 2D margin can show whether the shape of the eye diagram is the cause.

3.4 Problem analysis

3.4.1 Micron 8G body analysis

  Because the RMT test passes for all other memory modules and fails only for Micron 8G, the first suspect is the DIMM itself. After we contacted the Micron FAE, Micron suspected that the tested module was from an old production lot and that a version change could affect the test results. However, the results did not improve with the latest samples.

  At the same time, the same samples pass when tested on the Intel CRB (Customer Reference Board).

  From this it can be concluded that the Micron 8G module itself is not the only factor causing the margin failure. The remaining option is to increase the margin of the motherboard PCB to improve the RMT results.

3.4.2 Analyze problems through Simulation

  From the problem description, the motherboard passes with most memory modules, and the problematic module passes on other motherboards. This looks like the most troublesome situation: worst case plus worst case. Looking at the design itself, every design parameter satisfies the relevant documents and the Design Guide. The only way forward is to work on the details and improve the margin through subtle adjustments and optimizations. The Micron 8G module is already in mass production, and without sufficient evidence there is no way to ask the manufacturer to change it. The motherboard is still in the design stage, so optimizing the motherboard layout appears to be the only way to increase the margin.

  However, since all DDR design parameters already meet the design rules, relying on empirical guessing, board revisions, and retesting is inefficient and poorly targeted. With simulation, different cases can be simulated to find the changes that clearly improve the margin before the layout is modified. This is more targeted and avoids the time and cost wasted on multiple revisions.

  Back to the design itself. As described in Section 3.1, this design has three memory modules on one channel (one Controller plus three DIMMs), as shown in Figure 33. Careful analysis of the test results shows that the worst margin is on DIMM2, the DIMM closest to the CPU. A simple theoretical analysis: whether the CPU writes data to DIMM2 or reads data from DIMM2, and whatever state DIMM1 and DIMM0 are in, L2 and L3 are always present. For DIMM2 they act as a stub, and the stub causes signal reflections that reduce the margin. So the root cause has been found, and the problem looks simple: revise the board quickly and build the next batch of PCBs. But what if the next batch still fails? Better to calm down and verify by simulation first.

Insert image description here

  Comparing the PCB design of the motherboard with the Intel CRB reveals a difference: L2 and L3 on the CRB are about 398 mil long, while L2 and L3 on our motherboard are about 462 mil. Given this length difference and the analysis above, the simulation results should differ as well, so let's simulate. As mentioned earlier, Intel SISTAI provides only simulation data and cannot display waveforms. The simulation results are organized as shown below.

Insert image description here

  Two points can be seen from the simulation results. First, the worst simulated data is also on DIMM2, which is consistent with the actual test results. Second, our motherboard simulates worse than the Intel CRB, which is consistent with the analysis above. So will the results improve if L2 and L3 are shortened? Due to constraints of the PCB and the connector itself, L2 and L3 on our motherboard can only be reduced to about 410 mil. The simulation data for the improved PCB is shown below. The D2 results improve for both write and read, but why are they still so different from the Intel CRB?

Insert image description here
  Comparing the layouts again, no further differences can be found in the trace routing. The stack-up, which had not received attention before, turns out to be the biggest difference. The CRB is an 8-layer board, while our motherboard is an 18-layer board with the DDR traces routed close to the top layer. Such a large stack-up difference directly produces a difference in the stub length of the PTH vias. Similarly, the pin length of the DIMM slot also contributes to the stub: the DIMM slot used on the CRB has 2.4 mm pins, while the slot on our motherboard has 3.2 mm pins. No matching DIMM slot model was available, so the slot pin length could only be approximated by decreasing or increasing the PCB stack-up thickness. With the motherboard DIMM slot pin length reduced to 2.4 mm in this way, the simulation results are as follows and are very close to those of the CRB. Although this simulation is not very accurate, it still shows the impact of the stub on signal quality.

Insert image description here

  Based on the analysis, L2 and L3 were shortened and the DIMM slot was changed to one with shorter pins. (Because the design was essentially finalized, only small changes were possible, and the DDR traces could not be moved to a layer close to the bottom.) After the revision, the margin that failed in the earlier test increased by 2 to 3 steps, and the test could finally pass.

  At this point the analysis and simulation of this case is essentially complete. The DIMM-to-DIMM length and the DIMM slot pin length (together with the PTH via stub) form stubs, and shortening them improves the signal margin. Therefore, in a 3DPC (three DIMMs per channel) design, the DIMM-to-DIMM length should be reduced as much as possible early in the design. For boards with a relatively large thickness, the DDR traces should be placed as close to the bottom side as possible to reduce the impact of stubs on signal quality.

4. Summary

  The design, simulation, and testing of DDR have always been a concern for most designers and a headache for most engineers. First, on the theoretical side, DDR involves many difficult concepts that must be understood, such as the interface circuits, timing, driver strength, and ODT. Second, from a layout perspective, DDR is unlike a serial bus, which has only a few differential pairs and makes problems easy to locate. Once a DDR problem occurs, locating it is troublesome for many designers and requires a lot of testing and experiments. Finally, from a simulation perspective, DDR simulation is much more complex than serial bus simulation: it must take into account the PCB, connectors, memory modules, various parameter settings, and so on.

  This article addresses some common points of confusion in DDR design. It first briefly describes the new and key technologies of DDR4, then introduces current DDR4 simulation methods and Intel's simulation solution for DDR4. Finally, through a practical memory margin case, it presents the approach used to analyze and solve the problem.

  In writing this article, the author consulted a variety of material on the Internet and articles by more experienced engineers, and also added many personal opinions. Given the author's limited expertise, mistakes are inevitable; corrections are welcome. The author also hopes that readers will improve and supplement this material, discuss it with one another, and make progress together.





Origin blog.csdn.net/qq_37952052/article/details/126236759