Summary of DDR-SDRAM technical principles

DDR SDRAM stands for Double Data Rate Synchronous Dynamic Random Access Memory.

Let's start with RAM (Random Access Memory). Its defining characteristic is that any memory address can be accessed at any time, and every access takes the same amount of time (in contrast to the tape and disk storage that preceded RAM).

Going further, DRAM (Dynamic Random Access Memory) is the counterpart of SRAM (Static Random Access Memory). The two differ in physical structure: SRAM uses 4-6 transistors to store 1 bit of data, while DRAM needs only one capacitor and one transistor. Because the charge held in the capacitor leaks away over time, the information stored in DRAM is gradually lost unless the capacitor is periodically recharged, that is, refreshed. This is the origin of the 'D' in DRAM, which stands for Dynamic.

SRAM circuit structure

SRAM uses a bistable flip-flop to store information; as long as power is maintained, the information is not lost. Each SRAM storage cell uses many MOS transistors and occupies a large silicon area, so SRAM consumes more power and has lower density. However, because information is held in a positive/negative-feedback flip-flop circuit, the stored state is preserved as long as DC power is applied, so no refresh is needed. The state is not disturbed by read operations, and read/write speed is high. Its storage principle can be viewed as the read/write behavior of a clocked RS flip-flop. Since SRAM is relatively expensive, it is suited to high-speed, small-capacity memories such as caches.

DRAM circuit structure

DRAM stores information as charge on MOS (metal-oxide-semiconductor) capacitors, so the capacitors must be recharged continually to retain the information. Each DRAM storage cell uses few MOS transistors and occupies a small silicon area, so DRAM consumes less power and achieves higher density. However, because the charge stored on the capacitors leaks, the cells must be refreshed periodically to keep their state; the refresh period is typically between 4 ms and 64 ms. A read operation is destructive (it disturbs the cell state), and read/write speed is much slower than SRAM. Its storage principle can be viewed as charging and discharging a capacitor. Compared with SRAM, DRAM is cheaper and is therefore suited to slower, large-capacity memories such as main memory.

SRAM and DRAM are also addressed differently. Although we usually think of memory as a one-dimensional array, it is actually organized as a two-dimensional array: every cell has a row address and a column address, and the same is true of caches. The difference is that for small-capacity SRAM we can deliver the row address and the column address at the same time, but doing the same for DRAM would require a very large number of address lines (the larger the capacity, the longer the address and the more address bits are needed). So for DRAM the row address and column address are delivered separately: first a whole row is selected and latched into a row buffer, and when the column address arrives the required data is selected from it. This is one of the reasons SRAM is faster than DRAM.
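To make the two-step addressing concrete, here is a minimal C sketch of splitting a flat cell address into row and column bits; the 1024 x 1024 array geometry is an assumption for illustration only.

```c
#include <stdio.h>
#include <stdint.h>

/* Assumed geometry for illustration only: a 1 Mi-cell array organized as
 * 1024 rows x 1024 columns (10 row bits, 10 column bits). */
#define ROW_BITS 10
#define COL_BITS 10

int main(void)
{
    uint32_t addr = 0x2F3A7;                          /* a flat cell address      */
    uint32_t row  = addr >> COL_BITS;                 /* upper bits select the row */
    uint32_t col  = addr & ((1u << COL_BITS) - 1u);   /* lower bits select the column */

    /* SRAM: row and column are presented together in one step.
     * DRAM: the row is sent first (RAS), the whole row is latched,
     * then the column is sent (CAS) over the same address pins. */
    printf("flat addr 0x%05X -> row %u, col %u\n", addr, row, col);
    printf("DRAM needs only %d shared address pins instead of %d\n",
           COL_BITS > ROW_BITS ? COL_BITS : ROW_BITS, ROW_BITS + COL_BITS);
    return 0;
}
```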

There are many types of DRAM; common ones include FPM RAM (Fast Page Mode), EDO RAM, SDRAM, DDR SDRAM, RDRAM, and so on.

SDRAM (Synchronous Dynamic Random Access Memory) is dynamic RAM synchronized to a clock: the CPU provides a clock signal, and a single system clock synchronizes all address, data, and control signals. Using SDRAM not only improves system performance but also simplifies design and enables high-speed data transfer. It is often used in embedded systems.

DDR is double data rate synchronous dynamic random access memory. Strictly speaking, DDR should be called DDR SDRAM; it has twice the throughput of SDRAM.

DDR2/DDR II (Double Data Rate 2) SDRAM is a memory technology standard developed by JEDEC (Joint Electron Device Engineering Council). It keeps DDR's basic approach of transferring data on both the rising and falling clock edges, but has twice the prefetch of the previous-generation DDR (a 4-bit prefetch). In other words, in each internal clock cycle DDR2 fetches 4 bits per data pin from the memory array, so the external data rate is four times the internal (core) clock.

DDR3 is a computer memory specification in the SDRAM family. It provides higher operating performance and lower voltage than DDR2 SDRAM and is its successor, raising the prefetch from 4 bits to 8 bits.

In a memory array, a bit is selected by the intersection of a row address and a column address. If two arrays are stacked, two bits are selected at the same time and the bit width is x2; the same logic gives x4 and x8. Multiple stacked arrays form a bank, and a DRAM chip/device is composed of multiple banks, but only one bank at a time can be selected for data transfer.

Take a 16Meg x16 memory as an example; its interior is organized as shown. So to address a cell inside the SDRAM we need to give three parameters: row, column, and bank.

In a memory array, a bit is selected by the intersection of the row address and the column address. If two arrays are stacked, two bits are selected at the same time and the bit width is x2; if four arrays are stacked, four bits are selected at the same time and the bit width is x4. In other words, for a DDR device that is x4 bits wide, once the row address and column address are given, 4 bits are output on the DQ data lines at the same time.

The DDR core frequency is the working frequency of the memory array. For DDR1, the core frequency and the clock frequency are the same; a separate clock frequency only appears with DDR2 and DDR3, obtained by multiplying the core frequency up through frequency-multiplication circuitry. The data transfer frequency is the rate at which data is transferred. For DDR2, the clock (bus) frequency is twice the core frequency; for DDR3, the clock frequency is 4 times the core frequency and the data transfer frequency is 8 times the core frequency (the data transfer frequency is generally 2 times the clock/bus frequency). DDR was followed by DDR2, DDR3, and DDR4; essentially, each generation uses a deeper prefetch and a higher clock frequency to double the data transfer rate of the previous generation.

Transfer rate (MT/s) is the number of transfers per second, generally 2 times the bus clock (one transfer on the rising edge and one on the falling edge of each clock cycle). Internal rate (MHz) is the read/write frequency of the internal memory array. Because SDRAM uses capacitors as the storage medium, process and physical limitations make it difficult to shorten the charge/discharge time further, so the internal array frequency is limited, currently to about 266.67 MHz at most, which is also the limit of SDR. This is the main reason DDR adopts a prefetch architecture: since the array's read/write frequency is capped, the only option is to widen each access. By increasing the data width fetched in a single internal read/write cycle, combined with higher bus and I/O frequencies, the overall transfer rate can be raised.
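To make the relationship between core frequency, prefetch depth, and transfer rate concrete, here is a minimal sketch; the speed grades and core clocks used are illustrative assumptions in line with the typical figures above.

```c
#include <stdio.h>

/* Transfer rate in MT/s = core (array) frequency * prefetch depth.
 * For DDR parts the I/O (bus) clock is transfer rate / 2, since data
 * moves on both clock edges. */
static void show(const char *name, double core_mhz, int prefetch)
{
    double mt_s = core_mhz * prefetch;
    printf("%-10s core %6.1f MHz  prefetch %2dn  I/O clock %6.1f MHz  %7.1f MT/s\n",
           name, core_mhz, prefetch, mt_s / 2.0, mt_s);
}

int main(void)
{
    show("DDR-266",   133.0, 2);   /* illustrative values */
    show("DDR2-533",  133.0, 4);
    show("DDR3-1600", 200.0, 8);
    show("DDR4-3200", 400.0, 8);   /* DDR4 raises the core clock instead of the prefetch */
    return 0;
}
```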

The architecture of DDR memory: the layer-by-layer relationship of the memory subsystem from the CPU down to the memory chip is CPU -> channel -> DIMM -> rank -> chip -> bank -> row/column. Starting from the memory controller, each channel has a set of control registers used to configure and operate the memory chips. Each channel can hold several DIMMs (Dual In-line Memory Modules), and DIMMs are what we usually call memory sticks.

Rank and chip: a rank is the set of chips connected to the same CS (chip select); the memory controller reads and writes all the chips of one rank together. A channel usually reads or writes 64 bits at a time (72 bits with ECC), so eight x8 chips form one rank, and likewise only four x16 chips are needed.
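As a quick check of that arithmetic, a minimal sketch (the 64-bit channel width is from the text; the x4/x8/x16 device widths are the usual options):

```c
#include <stdio.h>

int main(void)
{
    const int channel_bits = 64;            /* 72 with ECC */
    const int chip_widths[] = {4, 8, 16};   /* common DRAM device widths */

    for (int i = 0; i < 3; i++)
        printf("x%-2d chips per rank: %d\n",
               chip_widths[i], channel_bits / chip_widths[i]);
    return 0;
}
```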

A chip is further divided into banks, and each bank is the actual circuit array that stores data. The horizontal lines of a bank are rows and the vertical ones are columns. Under each bank there is also a row buffer (the sense amplifiers), which buffers the row that has been read, waits for the column address to arrive, and then outputs the requested bits; it is also what decides whether the stored charge represents a 0 or a 1.

DDR multi-channel and interleave

Looking at DDR's access characteristics, two accesses to the same DDR device are separated by certain time intervals, including CL (CAS latency), tRCD (RAS-to-CAS delay), tRP (precharge period), and so on.

To improve DDR access speed, multi-channel technology can be used. Typical desktop and notebook CPUs have long supported dual channels, and triple channels have since been added. If the data is spread across memory modules plugged into different channels, the memory controller can read them simultaneously without waiting out the delays above, and the speed can be doubled or tripled (the more channels supported, the bigger the improvement). Qualcomm's first-generation ARM server SoC uses four DDR controllers, supporting four channels.

However, programs rarely cooperate: a program does not scatter its data so that it lands on a different DIMM. Usually the program and its data sit in one DIMM, and the CPU's cache has already fetched the data for you, so the speedup from plain multi-channel is not that obvious.

At this point another method is used to improve speed: distributing the same block of memory across different channels. This technique is called interleaving. This way, whether or not the cache hits, the channels can be accessed at the same time, and multi-channel technology becomes much more useful.
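A minimal sketch of what interleaving means for address decoding; the two-channel, 64-byte cache-line-granularity mapping is an assumption for illustration, not any particular controller's scheme.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical 2-channel, cache-line-interleaved mapping: consecutive 64-byte
 * lines alternate between channel 0 and channel 1, so a long sequential
 * stream keeps both channels busy. */
#define LINE_SHIFT 6   /* 64-byte cache line */

int main(void)
{
    for (uint64_t addr = 0; addr < 4 * 64; addr += 64) {
        unsigned channel = (addr >> LINE_SHIFT) & 1u;
        printf("physical 0x%04llx -> channel %u\n",
               (unsigned long long)addr, channel);
    }
    return 0;
}
```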

Precharge: SDRAM row addressing within a bank is exclusive, so every time you switch to a different row of the same bank, the currently open row must be closed and the row and column addresses must be sent again. The process of closing the open row of an L-Bank and preparing to open a new one is called precharge. Precharge is essentially a write-back of the working row's data into the storage cells, resetting the row and releasing the sense amplifiers (S-AMP) in preparation for the new row. Address line A10 controls whether the current L-Bank is automatically precharged after a read or write. After a precharge command is issued, some time must pass before RAS# can be sent to open a new working row; this time is called tRP (precharge command period), measured in clock cycles.

Memory refresh (Refresh): DRAM is called dynamic precisely because it must be refreshed continually to retain data; a refresh is essentially a rewrite of the data. The capacitors in the storage cells hold their data for about 64 ms, so each row must be refreshed within a 64 ms period. There are two kinds of refresh: auto refresh and self refresh. Self refresh is used in the S3 sleep state, where the DRAM refreshes itself from an internal clock. The refresh operation is the same rewrite that precharge performs, using the sense amplifiers (S-AMP) to read and then write back. Why is refresh needed at all if precharge already rewrites data? Because precharge acts on whichever working rows happen to be open in the L-Banks, which is irregular, whereas refresh runs on a fixed cycle and walks through all rows in turn, preserving the data in cells that have not been rewritten for a long time. Also, unlike an all-bank precharge, the row refreshed here is the row with the same address in all L-Banks, while in a precharge the working row of each L-Bank is not necessarily at the same address. Sense amplifier: converts changes on the external circuit into a 0/1 stored in the capacitor, or drives the stored value out.

Delay-locked loop (DLL): DDR SDRAM places high demands on clock accuracy, and it has two clocks: the external bus clock and the internal working clock. In theory the two should be synchronized, but for various reasons, such as temperature and voltage fluctuations, delays make it hard to keep them aligned, so the internal clock's delay must be dynamically corrected against the external clock to stay synchronized with it. That is the job of the DLL.

CK/CK#: the differential clock, a necessary part of DDR. CK# is usually described as a second trigger clock, but it actually serves to calibrate the trigger clock. Due to various factors, the spacing between the rising and falling edges of CK can drift, and CK# provides the correction: CK rises quickly and falls slowly, while CK# rises slowly and falls quickly.

DQS: the data strobe, a bidirectional signal. On reads it is driven by the memory, and the edges of DQS are aligned with the edges of the data; on writes it is driven by the CPU/memory controller, and the DQS edge is placed in the middle of the data valid window.

DQ0-DQn: data input/output signals.

RAS#, CAS#, WE#: row address strobe, column address strobe, and write enable signals.

CS#: Chip select signal, enables the command decoder.

A0-An: The rows and columns share address lines, of which A10 is used for automatic precharge during read and write commands.

BA0-BA1: Bank strobe signal.

CKE: clock enable signal.

DM0-DMi: data mask signals.

DDR Command: the interaction between the Host and the SDRAM is initiated by the Host in the form of commands. A command is composed of multiple signals. The main commands are described below.

  • Active: the Active command selects a row in the specified bank via the BA[1:0] and A[12:0] signals and opens that row's wordline. An Active command must be issued before a Read or Write can be performed.
  • Read: the Read command sends the address of the column to be read to the SDRAM via the A[12:0] signals. The SDRAM then returns the data of that column, within the row opened by the Active command, to the Host over DQ[15:0]. The number of clock cycles between the Host issuing the Read command and the SDRAM placing the data on the bus is defined as CL.
  • Write: the Write command sends the address of the column to be written to the SDRAM via the A[12:0] signals, and at the same time sends the data to be written over DQ[15:0]. The SDRAM then writes the data to the specified column of the active row. The time from the SDRAM receiving the last data to completing the write into the memory array is defined as tWR (Write Recovery).
  • Precharge: a Precharge operation must be performed before the next Read or Write to a different row. Precharge is performed per bank; it can be applied to a single bank or to all banks at once. If A10 is high, the SDRAM performs an all-bank precharge; if A10 is low, the SDRAM precharges the bank specified by BA[1:0]. The time the SDRAM needs to complete the precharge is defined as tRP.
  • Auto-Refresh: the charge in a DRAM storage cell slowly leaks away over time, so the cells must be refreshed periodically to avoid losing the stored information. SDRAM is refreshed row by row, and the standard requires all rows to be refreshed within one refresh period (64 ms at normal temperature, 32 ms at high temperature). To simplify the SDRAM controller, the standard defines an Auto-Refresh (AR) mechanism: the controller must send 8192 Auto-Refresh commands to the SDRAM within one refresh period. Each time the SDRAM receives an AR it refreshes n rows, where n = total number of rows / 8192. The SDRAM also maintains an internal refresh counter; after each refresh it advances the counter to the next row to be refreshed. Normally the controller sends AR periodically, and the interval between two ARs is defined as tREFI = 64 ms / 8192 = 7.8 us (see the sketch after this list). The time the SDRAM needs to complete one refresh is defined as tRFC, and it grows as the number of rows increases. Because AR occupies the bus and blocks normal data requests, and refresh draws significant power, the SDRAM standard also provides some optimizations; for example, the controller may postpone up to 8 tREFI intervals and then issue 8 ARs back to back.
  • Self-Refresh: the Host can also put the SDRAM into Self-Refresh mode to reduce power consumption. In this mode the Host cannot read or write the SDRAM; the SDRAM performs refresh internally on its own to preserve data integrity. Typically, when the device enters a standby state, the Host puts the SDRAM into Self-Refresh mode to save power.
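A minimal sketch of the refresh arithmetic from the Auto-Refresh item above; the 64 ms period and 8192 commands per period come from the text, while the 65536-row count is an assumed example.

```c
#include <stdio.h>

int main(void)
{
    const double t_refresh_ms = 64.0;     /* full refresh period (normal temperature) */
    const int    ar_per_period = 8192;    /* Auto-Refresh commands per period */
    const long   total_rows   = 65536;    /* assumed example: 64K rows */

    double t_refi_us   = t_refresh_ms * 1000.0 / ar_per_period; /* = 7.8125 us */
    long   rows_per_ar = total_rows / ar_per_period;            /* rows refreshed per AR */

    printf("tREFI = %.4f us, rows per Auto-Refresh = %ld\n", t_refi_us, rows_per_ar);
    return 0;
}
```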

How DDR4-DRAM works

The most important signals in DDR4 are address signals and data signals.

As shown above, the DDR4 chip has 20 address lines (17 address lines, 2 BA, and 1 BG) and 16 data lines. Before figuring out what these signal lines do and why the address signals are multiplexed, let us first ask a question: if we used 20 address lines and 16 data lines to design a memory in the straightforward way, how much capacity could we address?

Using the simplest textbook binary (8421-weighted) addressing, where the full address is presented in a single transfer, 20 address lines give an address space of 2^20 (even ignoring read/write control signals), and 16 data lines transfer 16 bits at a time. So with single-pass addressing, the maximum capacity of the chip would be:

Size(max) = 2^20 × 16 bit = 1,048,576 × 16 bit = 16,777,216 bit = 2,097,152 B = 2048 KB = 2 MB.

In reality, however, this DDR chip can hold up to 1 GB, a full 512 times more than the simple single-pass addressing scheme allows. How is that done? The answer is simple: time-division multiplexing.

We can design the DDR storage space as follows:

First, divide the storage space into two large blocks, BANK GROUP 0 and BANK GROUP 1, and use 1 address line (19 remain), named BG, to encode the choice: if BG is driven high, BANK GROUP 0 is selected; if BG is driven low, BANK GROUP 1 is selected. (You could of course divide into 4 blocks and use 2 lines instead.)

Then divide each BANK GROUP into four smaller BANK areas, named BANK0, BANK1, BANK2, and BANK3. Pick out 2 more address lines (17 remain), name them BA0 and BA1, and use them to encode which of the 4 banks is addressed.

At this point the DDR memory device is divided into 2 bank groups of 4 banks each, 8 banks in total, using 3 address lines named BG0, BA0, and BA1. That leaves 17 signal lines. How should each bank be addressed? This is where time-division multiplexing comes in.

The remaining 17 lines carry the row address on the first transfer and the column address on the second transfer.

Originally, with the address transmitted once per data transfer, 17 lines can address at most 2^17 locations, i.e. 16 KB counted as one bit per address (and that is before reserving any lines for read/write signalling).

Now change the scheme to transmit the address twice for each data transfer, and the addressable range grows to as much as 2^34 cells, about 2 GB. Although this halves the rate at which addresses can be delivered, the storage space is expanded many times over, which is a worthwhile trade.

Therefore, of the remaining 17 address lines, 1 is reserved to indicate whether the address currently being transmitted is a row address.

On the first transfer, row-address selection is enabled and the other 16 address lines carry the row address, so the row address range is easily calculated as 2^16 = 65536 = 64K rows.

On the second transfer, row-address selection is disabled; of the 16 remaining lines, 10 carry the column address, giving a column range of 2^10 = 1024 = 1K columns, and the other 6 are multiplexed to indicate read/write state, refresh state, and other functions.

In this way each bank is divided into 65536 × 1024 = 67,108,864 = 64M addressable locations, as follows:

At each of these locations, all 16 data lines are used at once, storing 16 bits of data.

Therefore, 1 bank is divided into 65536 rows, each row has 1024 columns, and each storage location holds 16 bits.

Each row therefore stores 1024 × 16 bit = 16384 bit = 2048 B = 2 KB. The storage capacity of a row is called the page size.

A single BANK has a total of 65536 rows, so the storage capacity of each BANK is 65536*2KB=128MB.

A single BANK GROUP has a total of 4 BANKs, and the storage capacity of each BANK GROUP is 512MB.

A single DDR4 chip has 2 BANK GROUPs, so the storage capacity of a single DDR4 chip is 1024MB=1GB.

At this point all 20 address lines and 16 data lines have been accounted for. Working forward like this explains the storage principle of DDR4, as well as its interface definition and addressing method.

Summary: DDR4 has 20 address lines: 1 bank-group select, 2 bank selects, 1 row/column select, 16 row address lines (65536 rows), and 10 of those lines reused as column address lines (1024 columns). There are 16 data lines, so 16 bits are read or written per column access. Page size = 1024 × 16 bit = 16 Kb = 2 KB; bank size = 2 KB × 65536 = 128 MB; bank-group capacity = 128 MB × 4 = 512 MB; single-chip capacity = 512 MB × 2 = 1 GB.
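To tie the walk-through together, here is a minimal sketch that recomputes the page, bank, bank-group, and chip capacities from the field widths derived above; the field layout follows this article's simplified forward design, not the actual DDR4 command encoding.

```c
#include <stdio.h>
#include <stdint.h>

/* Simplified field widths from the walk-through above (not the real DDR4 bus
 * protocol): 1 bank-group bit, 2 bank bits, 16 row bits, 10 column bits,
 * 16 data bits per location. */
#define BG_BITS    1
#define BA_BITS    2
#define ROW_BITS  16
#define COL_BITS  10
#define DATA_BITS 16

int main(void)
{
    uint64_t rows  = 1ull << ROW_BITS;             /* 65536 */
    uint64_t cols  = 1ull << COL_BITS;             /* 1024  */
    uint64_t page  = cols * DATA_BITS / 8;         /* bytes per row (page size) */
    uint64_t bank  = rows * page;                  /* bytes per bank            */
    uint64_t group = bank * (1ull << BA_BITS);     /* bytes per bank group      */
    uint64_t chip  = group * (1ull << BG_BITS);    /* bytes per chip            */

    printf("page  = %llu KB\n", (unsigned long long)(page >> 10));
    printf("bank  = %llu MB\n", (unsigned long long)(bank >> 20));
    printf("group = %llu MB\n", (unsigned long long)(group >> 20));
    printf("chip  = %llu MB\n", (unsigned long long)(chip >> 20));
    return 0;
}
```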

DDR read and write timing

Before reading or writing memory, the corresponding physical bank (rank) must first be selected via CS#, then the logical bank via BA0/BA1, and then the specific row and column via row active (RAS#) and column select (CAS#); only then can data be read or written. Because the row address and column address share the same address lines, there must be an interval between RAS# and CAS# to give the storage array's circuitry time to respond. This interval is called tRCD (RAS-to-CAS delay) and is measured in clock cycles.

Once the column address has been determined, the data just needs to be driven onto the bus through DQ. But some time passes between CAS# and the data actually appearing on the bus; this is CL/RL (CAS latency / read latency). This latency exists mainly because the storage cells need a certain reaction time, so the data cannot be triggered on the same rising edge as CAS. In addition, the capacitance of each cell is tiny, so the signal must be amplified by the sense amplifiers (S-AMP) before it can be recognized and sent to the bus; this also takes time, called tAC (access time from CLK, the access time after the clock trigger), measured in ns.

Writes also take place after tRCD, but CL does not apply: data can be sent out together with CAS#. However, because the pass transistor must switch and the capacitor must charge, the actual write takes some time, so a sufficient write/recovery time (tWR, Write Recovery Time) is reserved. This operation is also called write back. tWR takes at least one clock cycle, or a little more.
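A minimal sketch of how such cycle counts translate into nanoseconds; the DDR3-1600-style CL = tRCD = 11 values are illustrative assumptions, not figures from the text.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative example (assumed, not from the article): a DDR3-1600-style
     * part with an 800 MHz bus clock, i.e. a 1.25 ns clock period. */
    const double tck_ns = 1.25;
    const int    trcd   = 11;   /* RAS-to-CAS delay, in clocks */
    const int    cl     = 11;   /* CAS latency, in clocks */

    double first_data_ns = (trcd + cl) * tck_ns; /* Activate -> first read data */
    printf("time from ACT to first read data: %.2f ns (%d clocks)\n",
           first_data_ns, trcd + cl);
    return 0;
}
```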

Burst mode: the mode in which adjacent memory cells in the same row are transferred back to back. The number of consecutive transfers is the burst length, BL. Without bursts, reading or writing several consecutive data words ties up the memory control resources (a command and column address must be issued for each one), and no new command can be accepted during the data transfer.

With burst transfer, only the starting column address and BL need to be given; the memory then automatically reads the corresponding number of subsequent cells in sequence without needing a column address each time. BL can be 1, 2, 4, or 8.
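A small sketch of why BL8 on a 64-bit channel matches a 64-byte cache line (the DDR2/DDR3/DDR4 comparison below makes the same point); the bus width and cache-line size are the usual values, assumed here for illustration.

```c
#include <stdio.h>

int main(void)
{
    const int bus_bits   = 64;   /* non-ECC channel width */
    const int burst_len  = 8;    /* BL8 */
    const int cache_line = 64;   /* bytes per cache line (typical) */

    int bytes_per_burst = bus_bits / 8 * burst_len;
    printf("one BL%d burst delivers %d bytes -> %s one %d-byte cache line\n",
           burst_len, bytes_per_burst,
           bytes_per_burst == cache_line ? "exactly" : "not exactly", cache_line);
    return 0;
}
```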

Memory address mapping: one of the SDRAM controller's main jobs is to convert the CPU's access to a given physical address into SDRAM read/write timing and complete the data transfer. In a real product, the mapping from the CPU's physical address to the SDRAM's bank, row, and column addresses must be considered. The following figure shows an example mapping of a 32-bit physical address:
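As a companion to that kind of mapping, here is one hypothetical decomposition of a 32-bit physical address into row, bank, and column fields; real controllers choose the field order and widths differently, often to maximize bank and channel parallelism.

```c
#include <stdio.h>
#include <stdint.h>

/* One hypothetical layout of a 32-bit physical address (low to high):
 * [2:0] byte-in-bus, [12:3] column (10 bits), [14:13] bank (2 bits),
 * [30:15] row (16 bits). */
int main(void)
{
    uint32_t pa = 0x1234ABCD;

    uint32_t byte_off = pa         & 0x7;      /* 3 bits  */
    uint32_t column   = (pa >> 3)  & 0x3FF;    /* 10 bits */
    uint32_t bank     = (pa >> 13) & 0x3;      /* 2 bits  */
    uint32_t row      = (pa >> 15) & 0xFFFF;   /* 16 bits */

    printf("PA 0x%08X -> row %u, bank %u, col %u, byte %u\n",
           pa, row, bank, column, byte_off);
    return 0;
}
```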

DDR2 vs DDR3 vs DDR4: from its invention up to DDR4, DDR did not change fundamentally; the transfer rate kept rising by widening the prefetch. The core frequency stayed in the 100-266 MHz range, and the prefetch topped out at 8n. From DDR4 onward, simply increasing the prefetch no longer works, because a cache line is only 64 bytes: with 8n prefetch on a 64-bit channel, one cache line needs exactly one BL8 burst, while a 16n prefetch would fetch 128 bytes at a time and waste 64 of them. So DDR4 instead raises the core frequency to 200-400 MHz.

In addition, DDR4 introduces the bank group mechanism to improve performance. Each bank group can read and write data independently, which greatly improves internal data throughput: large amounts of data can be accessed in parallel, and the effective frequency of the memory rises accordingly. The DDR4 architecture combines 8n prefetch with bank groups (two or four selectable groups), allowing each bank group to activate, read, write, and refresh independently. Much like multiplexing, up to 4 sets of data can be processed in one operating clock cycle, improving the memory's overall efficiency and bandwidth.


Memory Training:
From the CPU's point of view, each DRAM on a DIMM sits at a different distance from the CPU; from the DIMM's own point of view, the CLK and data traces to each DRAM chip are not of equal length (there is skew). Training is therefore required to compensate for the delays introduced by the board and the DRAM.
DQS Receiver Enable: when reading data, the controller first sends the Read command, then waits a while for the data to appear on the bus, and must then enable the DQS receiver pad to capture it. The DRAM announces that valid data is coming via the DQS signal: before the data, the DRAM drives DQS low for half a cycle, called the read preamble, which tells the CPU that the DQS edges that follow carry valid data. But before the preamble arrives, DQS is prone to glitches that can cause the receiver pad to mis-trigger. DQS Receiver Enable training adjusts the delay so that the pad opens right in the middle of the preamble.

For each rank on each channel, two addresses are chosen (64-byte cache-line aligned, 2 MB apart), and the CPU writes 64 bytes of a specific pattern (0x55, 0xAA) to them. The CPU uses the first QWORD of this data to train the receiver pad: the program keeps adjusting the delay, reads the data back from the two addresses, and compares it against the known pattern. Once the correct data can be read back, the DQS receiver enable point is sitting just at the left edge of the preamble, and this delay is saved into the relevant registers. Q: how do we ensure the test pattern was actually written into memory? The spec says that writing a specific cache-line pattern guarantees the first QWORD is correctly written to DRAM.
Write Leveling: fly-by routing was introduced with DDR3, meaning the address, command, and clock traces pass through each DDR chip in sequence, while DQ and DQS remain point-to-point. This helps reduce simultaneous switching noise, but it also skews CLK relative to DQ/DQS at each chip.

The purpose of Write Leveling is to align the DQS edge with the CLK edge as seen at the DRAM chip. The training flow is as follows: enter Write Leveling mode by setting bit A7 of the MR1 register to 1; the DDR controller then keeps adjusting the delay of DQS relative to CLK, and the DRAM samples the clock on its CLK pin at each rising edge of DQS. If the sampled value is low, the DRAM keeps all DQ[n] low to tell the controller that the tDQSS phase relationship is not yet satisfied. When a rising edge of DQS samples CLK high, the tDQSS phase relationship is considered satisfied, and the DRAM reports success by driving DQ[n] high. The controller then locks in this phase offset; at that point, CLK and DQS are edge-aligned as seen from the DRAM side.

See the figure above for the write-leveling adjustment sequence. t1: pull ODT high to enable on-die termination. t2: after waiting tWLDQSEN (to make sure ODT on the DQS pin has taken effect), the DDR controller toggles DQS; the DRAM samples CK on the rising edge of DQS, finds CK = 0, so DQ stays 0. t3: the controller toggles DQS again; the DRAM samples CK on the DQS rising edge, still finds CK = 0, so DQ stays 0. t4: the controller toggles DQS; the DRAM samples CK on the DQS rising edge, finds CK = 1, waits a short time, and then drives the DQ signal high.
The reason for this strategy: the DDR controller cannot measure the absolute positions of the CLK edge and the DQS edge, so it keeps adjusting the DQS delay and watches for CLK to change from 0 to 1 (or 1 to 0) as sampled on the DQS rising edge; once a transition is detected, write leveling stops. tDQSS (the DQS/DQS# rising edge relative to the CK/CK# rising edge) is required by the standard to be within +/-0.25 tCK, where tCK is the CLK clock period.
Routing requirements: 1. Write leveling only needs to be enabled when fly-by routing is used. 2. The memory controller inside the CPU can only delay the DQS signal, not advance it, so the CK trace must be longer than the DQS trace; otherwise tDQSS cannot be satisfied.
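A toy simulation of the sweep described above, just to illustrate the stop condition; the clock period, fly-by skew, and delay step are invented numbers, and real write leveling runs through mode registers and the DQ feedback path rather than a software loop like this.

```c
#include <stdio.h>

/* Toy model of the write-leveling sweep (not real hardware access): the "DRAM"
 * samples a square-wave CK at the DQS rising edge, and the controller increases
 * the DQS delay one step at a time until the sampled value changes from 0 to 1,
 * i.e. the DQS edge has caught up with the CK rising edge. */

#define TCK_PS        1250   /* clock period, e.g. an 800 MHz bus (assumed) */
#define CK_SKEW_PS     430   /* assumed extra flight time of CK vs DQS (fly-by) */
#define DELAY_STEP_PS   20   /* assumed delay-line step size */

static int sample_ck(int dqs_delay_ps)
{
    /* CK is high during the first half of each period, as seen at the DRAM. */
    int phase = (dqs_delay_ps - CK_SKEW_PS + 10 * TCK_PS) % TCK_PS;
    return phase < TCK_PS / 2;
}

int main(void)
{
    int prev = sample_ck(0);
    for (int delay = DELAY_STEP_PS; delay < TCK_PS; delay += DELAY_STEP_PS) {
        int cur = sample_ck(delay);
        if (prev == 0 && cur == 1) {        /* 0 -> 1 transition: edges aligned */
            printf("write leveling done: DQS delay = %d ps\n", delay);
            return 0;
        }
        prev = cur;
    }
    printf("no transition found within one tCK\n");
    return 0;
}
```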
DQS Centering: DQS centering comes in two flavors, Read DQS Timing and Write Data Timing. Both aim to place the DQS strobe in the middle of the DQ data eye (a sketch of the generic sweep-and-center idea follows the list below).

  • Read DQS Timing: when reading data, the CPU/memory controller delays DQS via its internal DLL (delay-locked loop) so that the strobe falls in the middle of the DQ data eye. The training procedure is:
    • Write a cache line of a specific data pattern to each rank of each channel.
    • Read the data back and mark PASS or FAIL for the current Read DQS timing.
    • Increase the Read DQS timing delay and repeat the previous step until the largest passing Read DQS timing delay is found.
    • Increase the Write Data timing delay and repeat the steps above. When, and only when, three consecutive sets pass, take the middle set and record the average Read DQS timing value.
  • Write Data Timing: by the time writes are trained, the Read DQS timing has already been found. Because the memory itself has no DLL, only the Write Data timing on the CPU/memory controller side can be adjusted to match the DQS signal so that DQS sits in the middle of the DQ data eye. The training procedure is:
    • Write a cache line of a specific data pattern to each rank of each channel.
    • Read the data back and record PASS or FAIL for the current Write Data timing.
    • Increase the Write Data timing delay and repeat the previous step until the largest passing Write DQ delay is found.
    • Compute the midpoint (average) of the passing window and program the corresponding Write DQ delay value.
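A toy sketch of the generic sweep-and-center step shared by both procedures; the pass/fail table is an invented stand-in for the write/read-back comparison, and real training runs in the BIOS/memory reference code against hardware delay registers.

```c
#include <stdio.h>

/* Toy model of the delay-sweep-and-center procedure: sweep a delay setting,
 * record pass/fail for each step, then program the midpoint of the longest
 * passing window so the strobe sits in the middle of the data eye. */

#define STEPS 32

/* Assumed pass/fail response of the link for each delay step (1 = data read
 * back correctly). In real training this comes from writing and re-reading a
 * known pattern at each setting. */
static const int pass[STEPS] = {
    0,0,0,0,0,1,1,1, 1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,0,0, 0,0,0,0,0,0,0,0
};

int main(void)
{
    int best_start = -1, best_len = 0;

    for (int i = 0; i < STEPS; ) {
        if (!pass[i]) { i++; continue; }
        int start = i;
        while (i < STEPS && pass[i]) i++;        /* extend the passing window */
        if (i - start > best_len) { best_len = i - start; best_start = start; }
    }

    if (best_len == 0) {
        printf("no passing window found\n");
        return 1;
    }
    int center = best_start + best_len / 2;      /* midpoint of the eye */
    printf("pass window: steps %d..%d, programming delay step %d\n",
           best_start, best_start + best_len - 1, center);
    return 0;
}
```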

