Using MIG in Vivado to Control DDR3 (AXI Interface), Part 3: Introduction to DDR3

        Before reading and writing DDR3 we need some background on how it works. Diving straight into DDR3 tends to be confusing, and it is hard to know where to start. So we will begin with SDRAM and work forward to DDR3 one generation at a time.

1 Introduction to SDRAM

        In a sense, SDRAM is the ancestor of today's memory: DDR4 and DDR5 both descend from it. SDRAM stands for Synchronous Dynamic Random Access Memory. Synchronous means that the memory runs from a clock (originally at the same frequency as the system clock of the CPU front-side bus) and that internal command issue and data transfer are referenced to that clock. Dynamic means that the storage array must be refreshed continually so that data is not lost. Random access means that data is not stored and accessed in linear sequence; any address can be specified freely for a read or write.

1.1 Physical Bank

        For the CPU to work normally, a traditional memory system had to deliver all the data the CPU needed in a single transfer cycle. The amount of data the CPU can accept in one transfer cycle is the width of the CPU data bus, measured in bits. The north bridge chip that handled data exchange between memory and CPU made the memory bus width equal to the CPU data bus width, and this width is called the width of the physical Bank (Physical Bank). Put simply, the physical Bank width matches the CPU data bus width.

1.2 Chip width

        Chip width is the data width of an individual SDRAM chip.

        An example: if the CPU data bus is 64 bits wide, the physical Bank width of the SDRAM is also 64 bits. If each chip is 16 bits wide, how does the SDRAM talk to the CPU? We need 64 / 16 = 4 SDRAM chips connected side by side.
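
        The arithmetic in this example can be written as a small helper. This is only an illustrative sketch; the function name is made up for this article.

```python
def chips_per_physical_bank(bus_width_bits, chip_width_bits):
    """Number of SDRAM chips needed side by side to fill the data bus width."""
    if bus_width_bits % chip_width_bits != 0:
        raise ValueError("bus width must be a whole multiple of the chip width")
    return bus_width_bits // chip_width_bits


# 64-bit CPU data bus built from 16-bit chips, as in the example above
print(chips_per_physical_bank(64, 16))  # 4
```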

1.3 SDRAM basic architecture

1.3.1 Logic Bank

        A logical Bank (Logic Bank, L-Bank for short) is one of the regions into which the SDRAM's storage space is divided.

[Figure: an L-Bank drawn as a grid, with column addresses across the top and row addresses down the side]

        We can think of an L-Bank as a grid in which each cell is one storage unit. If the chip is 16 bits wide, each cell holds 16 bits of data. When reading or writing SDRAM, RAS (row address strobe) and CAS (column address strobe) are used to select the row (ROW) and column (Column) addresses.
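
        A rough software picture of this grid, assuming a tiny 8 x 8 bank purely for illustration (real banks have thousands of rows and columns, and the class name here is hypothetical):

```python
class LBank:
    """Toy model of one L-Bank: a grid of cells, each chip_width bits wide."""

    def __init__(self, rows, cols, chip_width=16):
        self.mask = (1 << chip_width) - 1           # keeps values within the cell width
        self.cells = [[0] * cols for _ in range(rows)]

    def write(self, row, col, value):
        # Row selects the line of cells, column selects the cell within it
        self.cells[row][col] = value & self.mask

    def read(self, row, col):
        return self.cells[row][col]


bank = LBank(rows=8, cols=8)
bank.write(row=3, col=5, value=0xBEEF)
print(hex(bank.read(row=3, col=5)))  # 0xbeef
```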

1.3.2 Schematic diagram of storage principle

        The figure is a simplified schematic and should not be taken as the actual internal circuit. To read or write a storage cell we first select and activate the corresponding Bank and row, then supply the column address together with the read or write command, and only then is the cell accessed. Each cell stores its data as charge on a capacitor: a charged capacitor represents 1 and a discharged one represents 0, and reading a cell disturbs its charge. The refresh amplifier (usually called the sense amplifier) is therefore very important. On a write it drives the cell to a clean, standard high or low level. While data is being held, the capacitor leaks, so after a long enough time the stored level would drift to an indeterminate value; a refresh has the amplifier read the cell and write it back, recharging the capacitor when the stored value is 1 and fully discharging it when the stored value is 0. Finally, on a read, the charge in the storage capacitor is very small, so the amplifier must amplify it to a level that can be sensed reliably before it is sent to the I/O port.

1.3.3 Basic structure of SDRAM  

        As shown in the figure, CLK is the clock and CKE is the clock enable. CS#, WE#, CAS# and RAS# are the control signals; together they encode the commands, including the one that configures the mode register. The address lines in the lower left carry the Bank address, the row address and the column address, and the large block in the middle selects the corresponding row and column. The storage array is specified as 32M×4 bit, that is, a chip 4 bits wide; four such chips side by side give a physical Bank width of 4×4 = 16 bits. The data input/output registers on the right connect to the I/O pins for data transfer. DQM is the data mask signal; like a strobe, it selects which data is masked. Note the difference between the two directions. On a write, data masked by DQM is not written into the memory. On a read, the masked data is still read out of the array, but it is blocked at the data output register and never reaches the I/O port.
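
        The effect of DQM on a write can be sketched in a few lines. The sketch assumes a 32-bit word with one mask bit per byte lane, where a mask bit of 1 means "masked"; the function name and the lane layout are illustrative assumptions, not taken from a specific datasheet.

```python
def masked_write(old_word, new_word, dqm, lanes=4):
    """Return the word stored after a write with byte-lane mask dqm.

    Bit n of dqm set to 1 masks byte lane n, so that lane keeps its old value.
    """
    result = 0
    for lane in range(lanes):
        byte_mask = 0xFF << (8 * lane)
        source = old_word if (dqm >> lane) & 1 else new_word
        result |= source & byte_mask
    return result


# Mask lanes 0 and 2: those bytes keep the old contents
print(hex(masked_write(0x11223344, 0xAABBCCDD, dqm=0b0101)))  # 0xaa22cc44
```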

1.4 Operation timing of SDRAM

1.4.1 Pin Diagram

        A0~A11 form a 12-bit address bus shared by the row and column addresses. BA0 and BA1 are the Bank address. DQ0~DQ31 are the 32 data pins, and DQM0~DQM3 are the data mask lines. CLK is the clock and CKE is the clock enable. NC means not connected. WE#, CAS#, RAS# and CS# form the command bus: WE# is the write enable (0 for write, 1 for read), RAS# is the row address strobe, CAS# is the column address strobe, and CS# is the chip select.
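
        Because the row and column addresses share the same pins, a controller has to split a flat address into Bank, row and column fields and send them at different times. A minimal sketch follows. The 12 row bits and 2 Bank bits come from the pin list above; the 8 column bits and the field order (Bank, then row, then column) are assumptions made for illustration only.

```python
ROW_BITS = 12   # A0~A11 carry the row address
COL_BITS = 8    # assumption: the column address reuses A0~A7
BANK_BITS = 2   # BA0, BA1


def split_address(addr):
    """Split a flat address into (bank, row, col) fields."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col


bank, row, col = split_address((2 << 20) | (0x123 << 8) | 0x45)
print(bank, hex(row), hex(col))  # 2 0x123 0x45
```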

1.4.2 Instruction truth table

        NOP means no operation; the truth table lists two forms, the first used during initialization and the second when idle. ACTIVE selects one row of one Bank and activates it. READ supplies the column address in the corresponding Bank and starts a burst read. WRITE supplies the column address in the corresponding Bank and starts a burst write. BURST TERMINATE stops a burst that is still in progress, for example when another operation has to be performed before the burst has finished. PRECHARGE closes the open row of a Bank; a Bank must be precharged before a different row in it can be activated for reading or writing. AUTO REFRESH and SELF REFRESH are the two refresh commands. With AUTO REFRESH the controller issues the command periodically and the chip supplies the row to refresh from an internal counter. With SELF REFRESH, entered with CKE low, the chip refreshes itself from its own internal timer without further commands, which is how data is retained in low-power states.
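
        Since the truth table itself is a figure, here is a sketch of how the commands map onto the four command pins. The encodings are the ones commonly listed in SDR SDRAM datasheets; treat them as an assumption and check them against the manual of the actual device. The `decode` helper is hypothetical.

```python
# Pin levels as (CS#, RAS#, CAS#, WE#); 0 = low, 1 = high
COMMANDS = {
    "NOP":                (0, 1, 1, 1),
    "ACTIVE":             (0, 0, 1, 1),
    "READ":               (0, 1, 0, 1),
    "WRITE":              (0, 1, 0, 0),
    "BURST TERMINATE":    (0, 1, 1, 0),
    "PRECHARGE":          (0, 0, 1, 0),
    "AUTO REFRESH":       (0, 0, 0, 1),  # SELF REFRESH uses the same pins with CKE low
    "LOAD MODE REGISTER": (0, 0, 0, 0),
}


def decode(cs_n, ras_n, cas_n, we_n):
    """Return the command name for one sample of the command pins."""
    if cs_n == 1:
        return "COMMAND INHIBIT"  # chip not selected, the other pins are ignored
    for name, pins in COMMANDS.items():
        if pins == (cs_n, ras_n, cas_n, we_n):
            return name


print(decode(0, 0, 1, 1))  # ACTIVE
```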

1.4.3 SDRAM operation timing diagram

        The ACTIVE command activates one row of one Bank.

        Write command: once the row has been activated, supply the Bank address and column address together with the command, and the write is carried out. The read operation is similar. Note that the row and column addresses share one set of signal lines. A write has no latency, because the data is presented along with the command; a read has to be sensed and amplified first, so the data appears only after a delay.

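        Precharge timing: in the PRECHARGE command, A10 selects between precharging all Banks (A10 high) and a single Bank (A10 low), in which case BA0 and BA1 select the Bank. In READ and WRITE commands, A10 instead selects whether the row is precharged automatically when the burst ends.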

        SDRAM initialization sequence: issue the commands at the times shown in the figure to complete initialization, which consists mainly of precharging and configuring the mode register. The times marked in the figure are required delays; their specific values can be found in the memory's datasheet.

        The read and write timing diagrams follow similar logic, and both burst and random access modes exist. Take a burst read as an example. After the corresponding row has been activated, the READ command and the column address are given at the same time; after a delay of CL (CAS latency) clock cycles the data appears on the DQ signals. The figure shows the cases CL=2 and CL=3.
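
        To make the role of CL concrete, the sketch below lists the clock cycles on which the words of a burst read would appear on DQ. It is a simplified model that assumes one word per clock and ignores every other timing parameter; the function name is illustrative.

```python
def burst_read_cycles(read_cycle, cas_latency, burst_length):
    """Clock cycles on which each word of a burst read appears on DQ."""
    first = read_cycle + cas_latency          # first word arrives CL cycles after READ
    return [first + i for i in range(burst_length)]


print(burst_read_cycles(read_cycle=0, cas_latency=2, burst_length=4))  # [2, 3, 4, 5]
print(burst_read_cycles(read_cycle=0, cas_latency=3, burst_length=4))  # [3, 4, 5, 6]
```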

2 From SDRAM to DDR SDRAM

        The full name of DDR SDRAM is Double Data Rate SDRAM: a double-rate SDRAM, developed as an improvement on SDRAM.

2.1 Basic structure of DDR SDRAM

        The left half of the figure is clearly the same as the SDRAM structure; the difference is in the right half, where the 4-bit DQ0~DQ3 pins exchange data with an 8-bit internal path. A MUX (multiplexer) and the input registers split each 8-bit word into two 4-bit halves on the way out and combine two 4-bit halves into one 8-bit word on the way in. The key question is then how the 4-bit interface manages two transfers in the time the internal array performs one 8-bit operation. SDRAM transfers data only on the rising edge of the clock. DDR transfers on both the rising and the falling edge, which doubles the rate, and this is generally implemented with a pair of differential clocks.
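
        The split-and-combine step can be mimicked in software. In this sketch each 8-bit internal word leaves as two 4-bit transfers, one per clock edge. Sending the low half on the rising edge is an arbitrary choice made for illustration, and both function names are hypothetical.

```python
def ddr_serialize(words8):
    """Split each 8-bit word into two 4-bit transfers, one per clock edge."""
    transfers = []
    for word in words8:
        transfers.append(("rise", word & 0xF))         # low half on the rising edge
        transfers.append(("fall", (word >> 4) & 0xF))  # high half on the falling edge
    return transfers


def ddr_deserialize(transfers):
    """Combine consecutive rising/falling 4-bit transfers back into 8-bit words."""
    words = []
    for i in range(0, len(transfers), 2):
        low = transfers[i][1]
        high = transfers[i + 1][1]
        words.append((high << 4) | low)
    return words


print(ddr_serialize([0xA5]))  # [('rise', 5), ('fall', 10)]
```

        Two transfers per clock period on the 4-bit pins carry exactly one 8-bit internal word, which is why the interface keeps up with the array.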

2.2 DDR operation timing

        The figure shows that DDR SDRAM timing is much the same as SDRAM timing, except that the clock is now a differential pair and operations are timed at the points where the CK and CK# edges cross, which is how the doubled rate is achieved. DQS can be understood as a data strobe: each transition of DQS accompanies one new data item. Because data now moves on both clock edges, CL no longer has to be an integer; it only needs to be a multiple of 0.5.

3 DDR2 SDRAM

3.1 Basic structure of DDR2

        

        Where DDR transfers two bits per pin for each internal access, DDR2 transfers four. The mechanism is similar to DDR: multiplexers and input registers combine and split the data, but the internal data path is now four times as wide as the DQ0~DQ15 interface. There is something of a trick here. Like DDR, DDR2 still transfers on both clock edges; the remaining factor of two comes from doubling the clock frequency, so the clock at the I/O port runs at twice the internal clock of the DDR2 array.
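
        The relationship between internal clock, port clock and transfer rate can be checked with a short calculation. The prefetch factors for SDRAM, DDR and DDR2 follow the text so far, and the factor of 8 for DDR3 is the one given in section 4.1.1. The 100 MHz internal clock is just an example value, and the function names are illustrative.

```python
# Bits fetched per data pin for each internal array access
PREFETCH = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def transfer_rate_mtps(core_clock_mhz, generation):
    """Transfers per second on each data pin, in MT/s."""
    return core_clock_mhz * PREFETCH[generation]


def io_clock_mhz(core_clock_mhz, generation):
    """Clock frequency at the I/O port, in MHz."""
    # DDR-family parts move two transfers per port clock (both edges)
    transfers_per_clock = 1 if generation == "SDRAM" else 2
    return transfer_rate_mtps(core_clock_mhz, generation) // transfers_per_clock


for gen in PREFETCH:
    print(gen, transfer_rate_mtps(100, gen), io_clock_mhz(100, gen))
```

        With a 100 MHz internal clock, DDR2 comes out at 400 MT/s with a 200 MHz port clock, twice the internal clock as described above.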

3.2 Off-chip Driver Calibration (OCD, Off-chip Driver)

        DDR2 memory also goes through an initialization process at power-up. It changes little, apart from a new option in EMRS: an optional OCD function. OCD adjusts the pull-up and pull-down resistance of the I/O drivers to compensate the output voltage, with the aim of minimizing the skew between the DQS and DQ data signals. During adjustment, the synchronization of DQS high/DQ high and of DQS low/DQ low is tested separately. If the requirement is not met, the pull-up/pull-down resistance is stepped up or down by one level, and OCD mode is not exited until the test passes.

        The picture below is an OCD schematic drawn by someone else, which helps in understanding the process. The purpose of OCD is to adjust the synchronization between DQS and DQ so as to ensure signal integrity and reliability.

3.3 On-chip termination (ODT, On-Die Termination)

        Termination means letting the signal be absorbed at the end of the line so that it does not reflect back along the circuit and disturb the signals that follow. In the DDR era, termination of the control and data signals was done on the motherboard: every DDR motherboard had a termination voltage island next to the DIMM slots, made up mainly of a row of termination resistors. This voltage island was long a difficult part of DDR motherboard design, and ODT removed the difficulty. In other words, ODT moves the termination resistors into the memory chip instead of requiring a separate circuit on the motherboard.

3.4 Posted CAS, additive latency and write latency

        Posted CAS is a feature designed to resolve command conflicts in DDR2. It allows the CAS signal to be sent immediately after the RAS signal, that is, earlier than in DDR. The address lines are freed at once, which makes it easier to issue later commands and avoids having to postpone a command because of a conflict. The read or write itself does not happen any earlier, however; sufficient latency must still be guaranteed. For this reason DDR2 introduces additive latency (AL, Additive Latency), which, like CL, is measured in clock cycles. AL+CL is defined as the read latency (RL, Read Latency). DDR2 likewise defines a write latency (WL, Write Latency): the delay from the write command being issued to the first data input. By specification, WL=RL-1, that is, AL+CL-1.
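
        The two formulas can be checked with a few lines of code. The function name and the example values of CL and AL are chosen only for illustration.

```python
def ddr2_latencies(cl, al=0):
    """Return (RL, WL) in clock cycles for DDR2: RL = AL + CL, WL = RL - 1."""
    rl = al + cl
    wl = rl - 1
    return rl, wl


print(ddr2_latencies(cl=4, al=3))  # (7, 6)
print(ddr2_latencies(cl=3))        # (3, 2)
```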

3.5 DDR2 Timing Diagram

3.5.1 Read Timing

3.5.2 Write Timing

        Note that WL, the write latency, is the delay from the write command to the first data input; do not confuse it with tDQSS.

4 DDR3 SDRAM

4.1 New contents of DDR3

4.1.1 Burst Length (Burst Length, BL)

        Because DDR3 uses an 8-bit prefetch, the burst length BL is fixed at 8. BL=4 was also commonly used on DDR2 and early DDR systems, so DDR3 adds a 4-bit Burst Chop mode: a BL=4 read plus a BL=4 write together make up one BL=8 data burst, and the mode can be selected through an address line at the time of the command. (In other words, DDR3 does not support a standalone BL=4 burst; the burst length can only be 8.) Note also that burst interrupt operations of any kind are prohibited and unsupported in DDR3, replaced by more flexible burst control such as the 4-bit sequential burst.

4.1.2 Addressing Timing (Timing)

        Just as the latency in clock cycles rose going from DDR to DDR2, the CL of DDR3 is higher than that of DDR2. CL for DDR2 is generally between 2 and 5, while for DDR3 it is between 5 and 11. The additive latency AL has changed as well: DDR2 allows AL from 0 to 4, while DDR3 offers three options, 0, CL-1 and CL-2. DDR3 also adds a new timing parameter, the write latency CWD (called CAS write latency, CWL, in the JEDEC standard), whose value depends on the operating frequency.
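
        A small sketch of how these DDR3 parameters combine. The three AL options come from the paragraph above. The formulas RL = AL + CL and WL = AL + CWL are the usual DDR3 definitions rather than something stated in this article, so treat them as an assumption here; CWL is the JEDEC name for the write delay parameter that the text calls CWD. Function names and example values are illustrative.

```python
def ddr3_al_options(cl):
    """Additive latency values DDR3 allows for a given CL: 0, CL-1 and CL-2."""
    return [0, cl - 1, cl - 2]


def ddr3_latencies(cl, cwl, al=0):
    """Return (RL, WL) in clock cycles, assuming RL = AL + CL and WL = AL + CWL."""
    if al not in ddr3_al_options(cl):
        raise ValueError("AL must be 0, CL-1 or CL-2")
    return al + cl, al + cwl


print(ddr3_al_options(6))                 # [0, 5, 4]
print(ddr3_latencies(cl=6, cwl=5, al=5))  # (11, 10)
```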

4.1.3 Newly added Reset function

        Reset is an important new function in DDR3, and a dedicated pin is provided for it. The DRAM industry had asked for this feature for a long time, and DDR3 finally implements it. The pin simplifies DDR3 initialization. When Reset is asserted, DDR3 memory stops all operations and switches to its least active state to save power.

        During Reset, DDR3 memory shuts down most of its internal functions: all data receivers and transmitters are turned off, all internal logic is reset, the DLL (delay-locked loop) and clock circuits stop, and any activity on the data bus is ignored. This puts DDR3 in its lowest-power state.

4.1.4 DDR3 adds ZQ calibration function

        ZQ is also a new pin, to which a 240 ohm low-tolerance reference resistor is connected. Through a dedicated command, the on-chip calibration engine (On-Die Calibration Engine, ODCE) uses this pin to calibrate automatically the on-resistance of the data output drivers and the ODT termination resistance. When the system issues the command, the on-resistance and ODT resistance are recalibrated over a fixed number of clock cycles: 512 after power-up and initialization, 256 after exiting self-refresh, and 64 in other cases.
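
        To get a feel for how long these calibrations take, the cycle counts can be converted to time. The cycle counts are the ones quoted above; the 400 MHz clock is only an example value, and the dictionary keys and function name are made up for this sketch.

```python
# Clock cycles allotted to ZQ calibration in each situation (from the text above)
ZQ_CYCLES = {"init": 512, "refresh_exit": 256, "other": 64}


def zq_time_ns(kind, clock_mhz):
    """Calibration time in nanoseconds at the given memory clock frequency."""
    return ZQ_CYCLES[kind] * 1000 / clock_mhz


# Example: 400 MHz memory clock, i.e. a 2.5 ns clock period
print(zq_time_ns("init", 400))   # 1280.0
print(zq_time_ns("other", 400))  # 160.0
```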

4.1.5 The reference voltage is divided into two

        In a DDR3 system, the reference voltage VREF, which is critical to the operation of the memory system, is split into two signals: VREFCA for the command and address signals and VREFDQ for the data bus. This effectively improves the signal-to-noise level of the system data bus.

4.1.6 Point-to-Point Connection (Point-to-Point, P2P)

        This is an important change for system performance and a key difference between DDR3 and DDR2. In a DDR3 system, a memory controller handles only one memory channel, and that channel can have only one slot. The memory controller and the DDR3 memory module are therefore in a point-to-point (P2P) relationship (module with a single physical Bank) or a point-to-two-point (P22P) relationship (module with two physical Banks), which greatly reduces the load on the address/command/control and data buses. As for memory modules, as with DDR2 there are standard DIMMs (desktop PCs), SO-DIMMs/Micro-DIMMs (laptops) and FB-DIMM2 (servers); the second generation of FB-DIMM uses the higher-specification AMB2 (Advanced Memory Buffer).

        For 64-bit architectures DDR3 clearly has the advantage in frequency and speed. DDR3 also adds functions such as temperature-based automatic self-refresh and partial self-refresh, so it is much better in power consumption as well. It may therefore be adopted by mobile devices first, just as DDR2 was first welcomed not by desktops but by servers. On the desktop PC, where the CPU front-side bus is speeding up fastest, the future of DDR3 is also bright. Intel's new chipset, Bear Lake, will support the DDR3 specification, and AMD is also expected to support both DDR2 and DDR3 on the K9 platform.

4.2 DDR3 hardware design

4.3 DDR3 Timing

5 MIG introduction

        MIG (Memory Interface Generator) is the memory controller IP core provided by Xilinx. DDR3 timing is very complicated, but with the MIG IP core in between we only have to drive the timing of MIG's user interface, which greatly reduces design complexity and shortens the development cycle.

        The figure shows the basic structure of MIG, which has three parts: the user interface, the memory controller and the physical interface. The memory controller and the physical interface handle the DDR3 timing; we only need to drive the user interface.

        The user interface of the MIG IP core in the figure is the Native interface, whose timing is relatively simple. A command is accepted when app_en and app_rdy are high at the same time, and at that moment app_cmd (command) and app_addr (address) are taken as valid. So to issue a command, app_en must be held high, with app_cmd and app_addr kept stable, until app_rdy is high.
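
        The handshake can be modelled cycle by cycle in software. The sketch below is a behavioural illustration, not RTL and not part of the MIG deliverables: the function name and the app_rdy trace are made up, and the command codes 0 (write) and 1 (read) are assumed from the usual MIG convention and should be checked against the user guide.

```python
def run_native_interface(commands, rdy_trace):
    """Hold each (app_cmd, app_addr) pair until app_rdy is high; log acceptances.

    commands:  list of (app_cmd, app_addr) pairs to issue in order
    rdy_trace: app_rdy value sampled on each clock cycle
    Returns a list of (cycle, app_cmd, app_addr) for every accepted command.
    """
    accepted = []
    pending = list(commands)
    for cycle, app_rdy in enumerate(rdy_trace):
        if not pending:
            break
        app_en = 1                      # stays asserted while a command is waiting
        app_cmd, app_addr = pending[0]  # held stable until the command is accepted
        if app_en and app_rdy:
            accepted.append((cycle, app_cmd, app_addr))
            pending.pop(0)
    return accepted


WRITE, READ = 0, 1  # assumed command codes
print(run_native_interface([(WRITE, 0x00), (READ, 0x00)], [0, 1, 0, 0, 1]))
# [(1, 0, 0), (4, 1, 0)]
```

        The write is accepted on cycle 1 and the read on cycle 4, the first cycles on which app_rdy is high while each command is being held.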

        There is plenty of material explaining how to use the Native-interface MIG IP core, but this experiment mainly uses the AXI-interface version. The user-interface timing of the AXI version is simply that of the AXI bus protocol, which we covered earlier, so next we will go straight to how to configure the core and use it to read and write DDR3.

Origin blog.csdn.net/qq_57541474/article/details/127699412