Training in DDR?

https://zhuanlan.zhihu.com/p/107898009

Foreword

I've been reading boot code lately, and there is always a "DDR Training" step in it, so I wondered what it was. Searching Baidu really turned up nothing, but today I stumbled on an article by a senior engineer, so I had to go and study it right away. The link is at the very top; thank you for the excellent article.

Some people say the BIOS just fills in registers according to the hardware manual and the user's setup choices, which could be handled with a few tables. Why, then, is there so much code?

Although the combinatorial explosion of thousands of choices makes an exhaustive table impossible, this view does capture the essence of most BIOS code: filling in registers.
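
The "filling in registers" pattern the text describes can be sketched as a table-driven loop. Everything here is illustrative: the register offsets, the values, and the names are made up; real tables come from the silicon vendor's manual.

```c
/* Minimal sketch of table-driven register programming, the common BIOS
 * pattern described above. Offsets and values are hypothetical. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t offset;   /* register offset inside the device's MMIO block */
    uint32_t value;    /* value chosen from the manual / user setup option */
} reg_init_t;

/* Hypothetical init table: one row per register to program. */
static const reg_init_t init_table[] = {
    { 0x0010, 0x00000001 },  /* e.g. enable a block     */
    { 0x0014, 0x000000FF },  /* e.g. mask interrupts    */
    { 0x0020, 0x12345678 },  /* e.g. a timing parameter */
};

/* In firmware this would be a volatile MMIO write; here the register
 * file is modeled as a plain array so the sketch is runnable. */
static uint32_t regs[0x40];

static void reg_write(uint32_t offset, uint32_t value)
{
    regs[offset / 4] = value;
}

void program_registers(void)
{
    for (size_t i = 0; i < sizeof(init_table) / sizeof(init_table[0]); i++)
        reg_write(init_table[i].offset, init_table[i].value);
}
```

The point is that the code is trivial; the engineering effort lives in the tables, which is exactly why "a few tables" feels like it should be enough.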

There is a dissenting voice, though. The BIOS has long contained an outlier. Its name is MRC: Memory Reference Code. Its job is to initialize memory, but it calls itself Memory Training code, with a focus on tuning timing and improving signal integrity. What a fancy name: Training. Does it have something to do with artificial intelligence?

Of course not. But it is named for the same reason AI model training is: it finds a solution to the problem through experiments. Deep learning's training produces the weight matrices of a neural network;

Memory Training produces a set of alignment, compensation, and reference-voltage parameters that balance out trace differences and signal noise.
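
The "solution through experiments" idea can be sketched as a simple parameter sweep: try every delay setting, test each one, and keep the centre of the widest passing window. The pass/fail oracle below is a synthetic stand-in; real firmware would program a delay line, run a data pattern against DRAM, and compare.

```c
/* Sketch of training-by-experiment: sweep a delay tap, record which
 * taps pass a (simulated) data test, and pick the window centre. */

#define NUM_TAPS 64

/* Hypothetical pass/fail oracle: in hardware this programs the delay
 * and runs a pattern test. Here, taps 20..39 are modeled as passing. */
static int passes(int tap)
{
    return tap >= 20 && tap < 40;
}

/* Find the widest run of passing taps and return its midpoint, or -1. */
int train_delay(void)
{
    int best_start = -1, best_len = 0;
    int start = -1;
    for (int tap = 0; tap <= NUM_TAPS; tap++) {
        if (tap < NUM_TAPS && passes(tap)) {
            if (start < 0)
                start = tap;
        } else if (start >= 0) {
            int len = tap - start;
            if (len > best_len) {
                best_len = len;
                best_start = start;
            }
            start = -1;
        }
    }
    return best_len ? best_start + best_len / 2 : -1;
}
```

Choosing the window centre, rather than any passing value, is what buys margin against voltage and temperature drift after boot.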

If we look at Intel's BIOS code, we find that the MRC is very large. It is also different from the rest, because it is the only place that deals with analog signals and signal integrity; the large amount of sampling and eye-diagram code sets it apart. ARM and AMD do not have such a large body of Memory Training code. Why? Can memory be used without training?

Why does memory need Training?

The I/O frequency of memory keeps climbing, and at such high frequencies even a tiny error is amplified. Anyone familiar with motherboard layout knows that the length-matching constraints on high-speed traces are very strict. When a group of high-speed signals turns a corner on the board, the inner traces end up slightly shorter than the outer ones. The gap is tiny and would not matter for low-speed signals, but it violates the high-speed timing constraints, so the traces must serpentine back in the opposite direction to compensate.

With memory I/O frequencies in the gigahertz range, every small error must be compensated, so alignment and compensation have to be performed along the entire data link.
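
Some rough arithmetic shows why small mismatches matter at these speeds. The numbers below are illustrative assumptions: a DDR4-3200 data rate and a typical FR4 propagation delay of roughly 6.7 ps per mm.

```c
/* Back-of-the-envelope check: how much of a bit time does a small
 * trace-length mismatch consume? Assumed numbers: DDR4-3200 and
 * ~6.7 ps/mm propagation delay on FR4. */

/* One unit interval (bit time) in picoseconds for a given MT/s rate. */
static double unit_interval_ps(double mt_per_s)
{
    return 1.0e6 / mt_per_s;     /* 3200 MT/s -> 312.5 ps per bit */
}

/* Fraction of the unit interval consumed by a length mismatch. */
static double skew_fraction(double mismatch_mm, double mt_per_s)
{
    const double ps_per_mm = 6.7;   /* assumed FR4 propagation delay */
    return mismatch_mm * ps_per_mm / unit_interval_ps(mt_per_s);
}
```

Under these assumptions a 5 mm mismatch already costs over 10% of the bit time, before any other noise source is counted; hence the strict matching rules and the per-lane compensation that training provides.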

A relatively complete memory access link includes many parts:

Starting from the source: the MC (memory controller) and the PHY (the MC and PHY are sometimes integrated, and often designed separately to increase flexibility); from inside the package out to the pins; from the pins across the motherboard traces to the slot; from the slot's gold fingers to the memory chips, either via fly-by routing or directly; and from the memory chip to the memory cells.

Every point along such a long chain can introduce clock skew and sampling-delay problems, so each segment must be aligned separately to bring the hundreds of connections on a memory DIMM into line. (A DIMM, Dual In-line Memory Module, is the standard memory module format.)

This boils down to more than a dozen major alignment steps during memory initialization:

  • Alignment and compensation start from inside the chip, where signals exit the die;
  • Then DCA and DCS are aligned (because the later steps require working commands);
  • Then comes Read Leveling from the JEDEC spec, aligning read DQS/DQ;
  • Then Write Leveling, aligning write DQS/DQ;
  • For good signal quality, the RON and ODT termination resistances must also be matched;
  • vRef is adjusted to open the eye in the eye diagram, searching for a safe, suitable sampling point;
  • And because DDR5 speeds are so high, equalizers such as DFE are added to improve signal integrity.
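
The vRef and sampling-point steps above amount to a two-dimensional eye scan: sweep (delay, vRef) pairs, test each point, and centre the sampling point in the largest passing region. The sketch below uses a synthetic diamond-shaped eye as the pass model; real firmware would run pattern tests at each point.

```c
/* Sketch of a 2-D eye scan over (delay, vRef): find the widest
 * contiguous run of passing delays across all vRef steps and return
 * its centre. The pass model is synthetic, for illustration only. */
#include <stdlib.h>

#define DELAY_TAPS 64
#define VREF_STEPS 32

/* Synthetic eye: a diamond centred at delay 32, vRef step 16. */
static int point_passes(int delay, int vref)
{
    return abs(delay - 32) + 2 * abs(vref - 16) < 12;
}

typedef struct { int delay; int vref; } eye_center_t;

eye_center_t scan_eye(void)
{
    eye_center_t best = { -1, -1 };
    int best_width = 0;
    for (int v = 0; v < VREF_STEPS; v++) {
        int start = -1;
        for (int d = 0; d <= DELAY_TAPS; d++) {
            int ok = d < DELAY_TAPS && point_passes(d, v);
            if (ok) {
                if (start < 0)
                    start = d;
            } else if (start >= 0) {
                int width = d - start;
                if (width > best_width) {
                    best_width = width;
                    best.delay = start + width / 2;
                    best.vref  = v;
                }
                start = -1;
            }
        }
    }
    return best;
}
```

This per-point test-and-compare loop, repeated over every lane of every channel, is a large part of why the MRC contains so much sampling and eye-diagram code.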

These steps do not even include the backside training required by RDIMMs, or the extra data-buffer-to-DRAM training steps of LRDIMMs, which make server memory initialization much more complicated.

Who does the training?

Most of these steps are required by every memory solution, including solder-down memory soldered directly on the board, and by every kind of memory controller. The key question is who performs them: who trains the entire command and data chain?

There are two schemes: In Band and OOB (Out Of Band). You often hear "band" in the communications field, but there is no modulation here, so what band are we talking about? This usage is common in silicon vendors' technical documents: the "band" refers to the CPU's computing resources, that is, the CPU's compute bandwidth.

  • In Band means the CPU does it by itself, completing the task on its own;
  • OOB means the work does not occupy CPU resources and is done by someone else, typically an MCU. Of course, during and after the process the MCU interacts with the CPU through mechanisms such as mailboxes.
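
The mailbox handshake mentioned above can be sketched from the CPU's side: write a command, ring a doorbell, then poll a status register until the MCU reports completion. The register layout, status bits, and the simulated MCU below are all assumptions for illustration; real mailbox protocols are vendor-specific.

```c
/* Sketch of an OOB mailbox handshake between CPU and training MCU.
 * Register names, bit values, and the in-line "MCU" are hypothetical. */
#include <stdint.h>

enum { MBOX_CMD, MBOX_DOORBELL, MBOX_STATUS, MBOX_NREGS };
enum { STATUS_BUSY = 1u, STATUS_DONE = 2u };

static uint32_t mbox[MBOX_NREGS];

/* Stand-in for the MCU firmware; in real hardware it runs asynchronously. */
static void mcu_service(void)
{
    if (mbox[MBOX_DOORBELL]) {
        mbox[MBOX_DOORBELL] = 0;
        mbox[MBOX_STATUS] = STATUS_DONE;   /* pretend training finished */
    }
}

/* CPU side: issue a training command and wait (bounded) for completion. */
int request_training(uint32_t cmd)
{
    mbox[MBOX_CMD] = cmd;
    mbox[MBOX_STATUS] = STATUS_BUSY;
    mbox[MBOX_DOORBELL] = 1;               /* ring the doorbell */
    for (int spin = 0; spin < 1000; spin++) {
        mcu_service();                     /* simulate the MCU's side */
        if (mbox[MBOX_STATUS] == STATUS_DONE)
            return 0;
    }
    return -1;                             /* timeout */
}
```

Note that while the MCU works, the CPU's only cost is this cheap polling (or, with interrupts, none at all), which is exactly what "not occupying CPU bandwidth" means.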

OOB training is very common, more common than most people think. Almost all high-speed links now require training, including but not limited to PCIe, USB, and SATA. And it is often not the CPU that performs this training; it may be an MCU or a DSP.

Conclusion

Well, let's go back to the original question: why does Intel have a large amount of MRC code, while ARM and AMD do not have such a large amount of Memory Training code?

I believe you already have the answer. Yes: Intel uses In Band training, while ARM and AMD use OOB training.

Finally, a question to leave you with: what are the advantages and disadvantages of these two methods? Why does Intel use In Band, while ARM and AMD use OOB?

Highlighted comments:

The OOB training sequencer is integrated in the PHY, so its access to DRAM is faster, and training is therefore fast.

However, because third-party vendors are closed, the firmware interface they expose is limited, so the scheme is less flexible, and later compatibility tuning is laborious.

ARM has always been ahead of Intel here. Intel had no MCU back when it was developing its CPUs. MRC is a program, a program needs RAM to execute, yet its job is to initialize that very RAM, which cannot be used before initialization. A chicken-and-egg problem; that code must be quite contorted.
OOB uses an MCU, and the MCU comes with its own low-speed RAM that needs no training.

OOB just lets another piece of firmware do the initialization. It is not inherently easier to patch: whether the BIOS does the training or something else does, updating the training flow means updating firmware either way.

I don't think it is really a technical problem. Presumably Intel's MC and PHY are all made in-house, so using the processor itself for training is the natural choice; Intel would not build a small MCU core just for this, let alone integrate a third-party one. ARM and AMD ecosystems are more likely to purchase third-party MC/PHY IP, and the training solution integrated by the IP designer is naturally more reliable and efficient.


Origin: blog.csdn.net/weixin_45264425/article/details/130540196