Following Uncle Wolf into Memory

I stumbled into Uncle Wolf's articles and found this three-part series on memory.

For more good articles, take a look at Uncle Wolf's pages:
https://zhuanlan.zhihu.com/p/26255460
https://zhuanlan.zhihu.com/p/26387396

Memory Series 1: Quickly Read Memory Stick Labels

Memory is one of the pieces of computer hardware we deal with most often. Its size, speed and model are closely tied to the performance of our computers and phones. This memory series introduces the hardware and software characteristics of memory, and its relationship with firmware, across three articles going from shallow to deep. This first article uses an everyday scenario to give readers the background knowledge, laying the foundation for what follows.

0- Scenario

Xiao Zhang knows his way around computers. He recently bought two DDR3 sticks on JD.com, planning to upgrade his notebook to 8GB. But the moment he unpacked them, he was dumbfounded:

[Photo: the label on the memory stick]
4GB looks fine: two sticks make exactly 8GB. But what is 2Rx8? What is PC3? 10600 seems nowhere near the 1333 he meant to buy, and what does the string of numbers after it mean?

Xiao Zhang came to me and asked me to explain what these letters and numbers mean. Never one to pass up a chance to play teacher, I began today's introduction.

1- What are DIMMs?

In the 80286 era, memory chips were inserted directly into the motherboard, packaged as DIPs (Dual In-line Package).

In the 80386 era, this was replaced by a small circuit board with memory chips soldered on, called a SIMM (Single In-line Memory Module).

Moving from individually socketed chips to a circuit board brought many benefits: modularity, easy installation, and so on, and it made the DIY market possible.

The SIMMs of that time were 32 bits wide, that is, 4 bytes per transfer. With the Pentium, the bus width became 64 bits, or 8 bytes, and along the way the SIMM evolved into the DIMM (Dual In-line Memory Module).

This form factor has survived to this day and remains the basic shape of the memory stick.

At this point Xiao Zhang grew impatient: "What does this have to do with my memory?"

It does: with the S in 10600S. There are now several kinds of DIMM:

  • RDIMM (Registered DIMM): registered modules, used mainly in servers to increase memory capacity and stability. ECC and non-ECC variants exist, but nearly everything on the market is ECC.

  • UDIMM (Unbuffered DIMM): unbuffered modules, the standard DIMM in our desktop computers. ECC and non-ECC variants exist, usually non-ECC.

  • SO-DIMM (Small Outline DIMM): small-form-factor DIMMs used in notebook computers, also available with and without ECC.

  • Mini-DIMM: introduced in the DDR2 era, a shrunken version of the Registered DIMM, used where space is at a premium, such as blade servers.

A regular DIMM is 133.35mm long; the SO-DIMM is shortened to 67.6mm to fit the cramped interior of a notebook and is usually inserted at an angle.

Height also varies: a regular DIMM is about 30mm tall, VLP (Very Low Profile) shrinks that to 18.3mm, and ULP (Ultra Low Profile) to 17.8mm, mainly so modules fit into 1U blade servers.

[Figure: DIMM form factors and sizes]
Xiao Zhang now knows the S in 10600S stands for SO-DIMM, and the size checks out. But what about the speed?

2- DDR to DDR4

To humor Xiao Zhang's impatience, I skipped the story of how DDR and Rambus RDRAM fought for dominance after SDRAM.

DDR SDRAM stands for Double Data Rate SDRAM.

DDR SDRAM is an evolutionary improvement of the original SDRAM, and it was exactly this easy production switch, and the cost advantage it brought, that let DDR defeat its old rival RDRAM and become today's mainstream. As the name implies, compared with the original SDRAM, DDR SDRAM transfers data twice in one clock cycle:

[Figure: DDR transfers data on both clock edges]
The main difference from DDR through DDR4 is the transfer rate: as the clock period keeps shrinking, the transfer rate keeps climbing, and the operating voltage keeps dropping.

What's interesting is the naming convention. Most desktop DIMMs are labeled DDRx-yyy, where x is the generation and yyy is the data transfer rate.

Most SO-DIMMs and RDIMMs are labeled PCx-zzzz, where x is again the generation and zzzz is the peak bandwidth. Because the DDR bus is 64 bits, i.e. 8 bytes, wide, zzzz = yyy × 8, and yyy is twice the clock frequency. The following table shows the speeds of the main DDR generations:

[Table: speeds of the main DDR generations]
So the PC3-10600S on Xiao Zhang's stick means a DDR3-1333 SO-DIMM. Xiao Zhang asked again: what does 2Rx8 mean?

(1333 MT/s × 8 bytes ≈ 10667 MB/s, conventionally rounded down to 10600 in the module name.)
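The arithmetic behind the two naming schemes can be sketched in a few lines (a toy calculation for illustration, not vendor code):

```python
def ddr_names(clock_mhz):
    """Derive the DDRx-yyy and PCx-zzzz numbers from the I/O clock.

    yyy  = transfers per second (two per clock, hence 'double data rate')
    zzzz = peak bandwidth in MB/s over the 64-bit (8-byte) bus
    """
    transfer_rate = clock_mhz * 2     # MT/s -> the 'yyy' in DDRx-yyy
    bandwidth = transfer_rate * 8     # MB/s -> the 'zzzz' in PCx-zzzz
    return transfer_rate, bandwidth

rate, bw = ddr_names(666.67)          # DDR3-1333's I/O clock is about 666.67 MHz
print(round(rate), round(bw))         # ~1333 MT/s, ~10667 MB/s, sold as PC3-10600
```

The module name rounds the bandwidth down, which is why the label reads 10600 rather than 10667.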

3- Rank and Bank

In fact, you can see from the outside that Xiao Zhang's stick is built from many Hynix memory chips.

From the memory controller down to the internals of a memory chip, the hierarchy runs from large to small: channel > DIMM > rank > chip > bank > row/column, as shown below:

[Figure: the channel > DIMM > rank > chip > bank > row/column hierarchy]
A realistic example would be:

[Figure: dual-channel memory topology of an i7 system]
In this example,

  • an i7 CPU supports two channels (dual channel),
  • each channel can hold two DIMMs,
  • each DIMM consists of two ranks,
  • and 8 chips form one rank: since most memory chips are 8 bits wide while the CPU's memory bus is 64 bits wide, it commonly takes 8 chips to make up a rank.

So the 2Rx8 on Xiao Zhang's stick means it is made of 2 ranks, each rank containing eight memory chips (we will see why shortly).

Since the whole module is 4GB, we can work out that each individual chip holds 256MB.

4- Postscript

Xiao Zhang was relieved, but he mentioned that he had seen other numeric markings on many memory sticks, such as:

[Photos: latency markings printed on memory module labels]
What are these? They are the latency figures of the memory chips. Strings like 4-4-4-8, 5-5-5-15, 7-7-7-21 or 9-9-9-24 stand for CL-tRCD-tRP-tRAS, and smaller is better. Explaining exactly what they mean requires deeper background: the next article covers the hardware principles, and we will also detail how these parameters are used in the memory reference code (MRC) part of UEFI.

If, like Xiao Zhang, you are not yet satisfied, here are some questions to think about before the next article:

Each generation's memory stick has the same width: can you insert the wrong one?
From the speed tables, each generation's bandwidth range partially overlaps the previous generation's. Why?
If a stick of the previous generation and one of the next generation have the same bandwidth, which performs better?

(A reader asks:) The article says 2Rx8 means 2 ranks with eight memory chips per rank. Then why does the old DDR2 Nanya module in my notebook say PC2-5300S 512MB 2Rx16 667, yet carry only 4 memory chips on each side?

"So Xiao Zhang's memory stick 2R X 8 means that it is composed of 2 ranks, and each rank has eight memory particles", here it is said that each RANK has eight memory particles should be established when each chip is 8bit . Because if each chip has a bit width of 8 bits (that is, ×8), then it can be concluded that each rank can support 8 chips (that is, each rank has eight memory particles), and 2R supports 16 chips. If it is 2R×16, each rank supports 4 chips. Therefore, it is easy to be misunderstood as ×N in the original words means that each RANK has N memory particles, which is not the case. ×8 refers to the bit width of CHIP, which is just numerically equal to the number of memory particles of each rank.

Memory Series 2: In-depth understanding of hardware principles

This article follows the previous one and continues with the hardware principles of DDR memory: how it is addressed, its timing and latency, and what can be done to improve memory performance.

Although Xiao Zhang's problem was solved last time, it sparked his interest in how memory actually works. He came to me again, saying I still owed him an explanation, so this time we met at a coffee shop. The content would be a bit deeper: I brought some diagrams, and Xiao Zhang ordered a large Americano, ready for a long session. Seeing how serious he was, I decided to lead him down the IT engineer's road of no return...

1- Addressing

To understand the latency parameters mentioned last time, I first have to introduce how a DIMM is addressed.

You may have noticed that when we covered the relationship between rank and chip last time, we skipped over bank, column and row, and they are central to addressing. Remember the picture from last time?

[Figure: the channel > DIMM > rank > chip > bank > row/column hierarchy, again]
This time let’s take a look at what’s in the rank and Chip, as shown below:

[Figure: internal structure of a DDR3 rank and chip]
This is a schematic of one DDR3 rank. Take apart the 128MB chip on the left: it consists of 8 banks, and the core of each bank is a storage matrix, a large grid of cells arranged in many columns (Column) and many rows (Row). To access a particular cell, we only need to say which row and which column it is in.

At this point Xiao Zhang perked up: "I know this! We covered it at university in Computer Organization. These are the row and column address lines of the storage unit, and each cell in the grid is one bit!" Xiao Zhang knows quite a lot! It is similar, but not quite: for one thing, CAS# and RAS# are each a single signal line.

In fact, each cell stores as many bits as the chip is wide. Here a rank consists of 8 chips and the CPU fetches 64 bits at a time, so each chip contributes 64/8 = 8 bits: each cell is one byte.

Selecting a cell is not just a matter of two signals; a whole family of signals is involved. Take this 2GB DDR3 module as an example:

    1. Chip Select signals, S0# and S1#, each used to select one rank.
    2. Bank address lines, BA0-BA2: 2^3 = 8, so one of 8 banks can be selected.
    3. Column Address Strobe, CAS#, indicating that a column address is being presented.
    4. Row Address Strobe, RAS#, indicating that a row address is being presented.
    5. Address lines, A0-A13, carrying the row or column address (not all are used for column addresses; we ignore that here).
    6. Data lines, DQ0-DQ63, carrying the full 64 bits of data.
    7. Command lines, COMMAND, carrying commands such as read or write.

Note that there is no select line for individual memory chips, only for ranks. Once a rank is selected, all 8 of its chips are selected together and jointly supply the 64 bits of data.

Reading or writing data is a little more involved. Simplified, it takes three steps:

    1. Row activate. RAS# low, CAS# high: this signals that a row address is valid, and the address lines A0-A13 carry it, so up to 2^14 rows can be selected.
    2. Column select. RAS# high, CAS# low: this signals a column address, again carried on A0-A13. That's right, rows and columns share A0-A13, which is why pinpointing one cell requires both steps 1 and 2.
    3. Data read or write. Depending on COMMAND, data is read or written. With the cell selected, the storage location is fully determined, and all that remains is to move the data over the data I/O lines (DQ) to or from the memory bus.

Only random access is covered here; burst mode is skipped. The figure below gives a simple illustration:

[Figure: the row activate / column select / data transfer sequence]
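How a memory controller might carve a physical address into those fields can be sketched as pure bit-slicing (the field widths here are made up for illustration; real controllers choose them per platform):

```python
def decode_address(addr):
    """Split a physical address into toy channel/rank/bank/row/column fields.

    Field widths (low to high): 3 byte-offset bits for the 8-byte bus,
    10-bit column, 3-bit bank, 14-bit row, 1-bit rank, 1-bit channel.
    These widths are invented for the example; real controllers pick their own.
    """
    addr >>= 3                          # drop the byte offset within a 64-bit beat
    fields = {}
    fields["column"] = addr & 0x3FF
    addr >>= 10
    fields["bank"] = addr & 0x7
    addr >>= 3
    fields["row"] = addr & 0x3FFF
    addr >>= 14
    fields["rank"] = addr & 0x1
    addr >>= 1
    fields["channel"] = addr & 0x1
    return fields

print(decode_address(0x12345678))
```

The point is only that "addressing" is the controller translating one flat physical address into the channel/rank/bank/row/column coordinates described above.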

2- Timing

Having talked so long in one breath, my mouth went dry, so I paused for a big sip of coffee. Xiao Zhang thought I was done and asked anxiously: "I think I follow, but you still haven't explained those numbers." Don't worry, bear with me. Precisely because one access takes roughly three steps, the steps must be separated in time to preserve signal integrity: issuing them together would cause chaos, and spacing them too tightly makes sampling difficult and invites noise. So timing matters a great deal.

The following is a timing diagram for back-to-back read and write:

[Figure: back-to-back read/write timing diagram]

3- Latency

The moment Xiao Zhang saw the diagram, he cried out: "Too complicated! It gives me trypophobia, I can't follow it!" No matter, let's take it apart piece by piece.

    1. CL: CAS Latency. After CAS# and the read command are issued, some time must still pass before data appears; the interval from issuing CAS# and the read command to the first data output is defined as CL (CAS Latency). Since CL only applies to reads, it is also called read latency (RL, Read Latency). This is the wait in step 3 above. CL is the most important of the latency parameters and is sometimes printed separately on the label as CLx. It tells us how many clock cycles pass before data arrives: CL7 memory delivers data after 7 cycles, CL9 only after 9, so the smaller the number, the sooner we get the data. Note that the cycle here is the real clock cycle, not the advertised DDR3-1333 rate: since there are two transfers per cycle, the real clock is half that, 666MHz in this case. The figure below compares CL7 and CL9:
      [Figure: CL7 vs CL9 read timing]
      At the same frequency, CL7 memory has about a 22% latency advantage over CL9.
    2. tRCD: RAS-to-CAS Delay. There must be an interval between the row activate command and the column read/write command, a delay dictated by the response time of the storage array's circuitry. This is the gap between steps 1 and 2; the shorter, the better. Below is an example with tRCD = 3:

[Figure: tRCD = 3 example]
As you can see, this is also the interval between the activate command and the read command.

    3. tRP: Precharge command Period. After the last transfer completes and before the next row can be activated, the bank must be precharged, and that charging time must elapse before RAS# may be issued again. In other words, this is how long it takes to get ready for step 1 again. An example:

[Figure: tRP example]
There are two more related timings, tRAS and CMD, but seeing Xiao Zhang nearly asleep, I let them pass. In short, all these delays add up to the overall latency, and the smaller the delay, the better.

4- SPD

With all that said, Xiao Zhang finally understood what markings like 4-4-4-8 and 5-5-5-15, that is, CL-tRCD-tRP-tRAS(-CMD), on memory labels mean. But one thing puzzled him: consumers can read these figures off the label (well, the few who understand them can), but a computer has no eyes, so how does it know?

In fact, every DIMM carries a small memory chip (an EEPROM) that records all these parameters in detail, along with the manufacturer code and more. That is how the BIOS knows what kind of memory has been inserted. I pointed the chip out on Xiao Zhang's stick:

[Photo: the SPD EEPROM on the memory stick]
In fact, as DDR has evolved generation by generation, these latency figures counted in clock cycles have kept growing, but because the clock has sped up, the actual latency in time has been slowly shrinking.
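That trend, CL counts climbing while absolute latency stays roughly flat, is easy to see in nanoseconds (the CL values below are typical speed-grade figures chosen for illustration):

```python
def cas_latency_ns(ddr_rate_mts, cl):
    """Absolute CAS latency: CL cycles at the real clock (half the DDR rate)."""
    clock_mhz = ddr_rate_mts / 2
    return cl / clock_mhz * 1000  # convert to nanoseconds

# generation, DDR rate (MT/s), a common CL for that speed grade
for name, rate, cl in [("DDR-400", 400, 3), ("DDR2-800", 800, 6),
                       ("DDR3-1333", 1333, 9), ("DDR4-2400", 2400, 17)]:
    print(f"{name}: CL{cl} = {cas_latency_ns(rate, cl):.1f} ns")
```

Run it and the cycle counts go 3, 6, 9, 17 while the time hovers around 13-15 ns, exactly the pattern described above.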

5- Other Ways to Improve Performance

It was still early, so I chatted with Xiao Zhang about how to increase the memory access speed besides increasing the frequency.

1. Multi-channel

Modern memory controllers have all moved from the north bridge into the CPU, and they can operate several channels at once. Typical desktop and notebook CPUs have long supported dual channel, and some now triple channel. If data is spread across sticks in different channels, the memory controller can read them simultaneously, regardless of the delays and timings above, doubling or even tripling the speed! Xiao Zhang jumped up at this: "I want to double mine too!" Not so fast: to enable multi-channel you must first use the right slots. Motherboard makers now color-code the memory slots so that beginners insert modules correctly. Note that slots of the same channel have different colors! So the two sticks must go into slots of the same color, which puts them on different channels. It is best to check against the motherboard manual, then enter the BIOS to confirm the memory is running in multi-channel mode.

(An aside on the north bridge and south bridge:)

1. Location. The north bridge is the chip on the motherboard closest to the CPU; taking the CPU socket as "north", the interconnect chip beside it is called the north bridge.
The south bridge is another major part of the chipset, usually sitting farther from the CPU socket, below it and in front of the PCI slots, toward the front of the case.

2. Role. The north bridge mainly handles data exchange between the CPU and memory, controlling high-speed devices such as the CPU, memory and graphics card.
The south bridge controls I/O and other peripheral interfaces: IDE devices, add-on functions and so on.

3. Evolution. The north bridge moves enormous amounts of data and runs ever hotter, so modern north bridges are covered with heatsinks, sometimes aided by a fan.
The south bridge has instead evolved toward integrating more functions: network, RAID, IEEE 1394, even Wi-Fi.

2. Interleaving

Seeing Xiao Zhang eager to try, I had to pour some cold water on him. The fantasy is beautiful; reality is cruel. In many cases multi-channel brings no visible benefit! Because of program locality, a program does not scatter its data all over the address space, so it rarely spills onto a DIMM in the other channel. Program and data usually sit in the same DIMM, and the CPU's cache prefetches data for you anyway, so the improvement is small unless you run lots of huge workloads.

"Ah, I always open a game to play, it's useless to me, it's just tasteless!" Xiao Zhang said. Not necessarily, there is another way to distribute the same piece of memory to different channels. This technique is called Interleaving. In this way, regardless of whether the cache is hit or not, it can be accessed at the same time , and the multi-channel technology can be more useful. "Great, how can I enable this interleave?", I can't help but hehe, this function is generally only available in server CPUs , if you have an i5, who will buy thousands of server CPUs?

3. Overclocking

"Aren't you talking nonsense, how can I build a high-speed memory that is only equipped with a fever machine?". In fact, Xiao Zhang can buy enthusiast-level memory sticks. The DDR3 mark of these memory sticks reaches more than 2133! But it should be noted that if we plug these memory into a general motherboard, it is likely to run on 1333 or 1600, because this is the highest frequency specified by DDR3. A good horse with a good saddle needs a motherboard that can support overclocking memory. Only by boosting the voltage and frequency in the motherboard BIOS can you really make good use of these feverish memory sticks.

6- Epilogue

Time was nearly up. I promised Xiao Zhang that next time I would introduce how the mysterious BIOS initializes memory, and got up to leave. Xiao Zhang stopped me: "You haven't filled the holes you dug last time!" "What holes?" Perhaps I had dug too many to remember. "The three questions you left me to think about. I worked out the first one: DIMMs have a keying notch, and its position differs between DDR generations, so a wrong stick simply won't fit. I googled that. The other two I really couldn't figure out."

Each generation's memory stick has the same width: can you insert the wrong one?
From the speed tables, each generation's bandwidth range partially overlaps the previous generation's. Why?
If the previous and the next generation have the same bandwidth, which performs better?

Well, to make a long story short, the two questions can really be answered together. We now know that the latency parameters of each DDR generation keep rising in clock cycles, so at the same frequency a newer generation may actually perform worse! For example, DDR2-800 often has lower latency than DDR3-800. You can think of each generation as starting below where the previous one peaked, with an overlapping range; once the frequency climbs, the extra clock cycles of latency are compensated. The point is that latencies are specified in clocks, not time: with a faster clock, the absolute latency can end up smaller. This overlap also leaves room for different marketing strategies. (This part takes some pondering!)

Xiao Zhang still caught me out. He had looked up some terms somewhere, such as the prefetch depth growing each generation and the core frequency differing from the interface frequency. I hope he keeps researching on his own, and meanwhile I dug a couple of new holes:

  1. Why does DDR need a new generation each time instead of simply raising the frequency? Why is there no DDR2-3200?

  2. DDR memory still transfers data in parallel, yet serial links seem faster and leaner. Why not build serially accessed memory?

Xiao Zhang sank into thought, and I quietly congratulated myself on earning another coffee and afternoon tea. But back home I would still have to prepare material to keep the free meals coming: after covering the BIOS side of memory next time, what other topics could keep Xiao Zhang hooked?

Memory Series 3: Analysis of Memory Initialization

This article follows the previous two and introduces how firmware initializes DDR memory, how the initialization can be made more efficient, and more.

Xiao Zhang didn't come to see me for a long time after our last meeting. I thought he had lost interest in memory, but unexpectedly he invited me out again today, and when I arrived at the cafe he had already been waiting for a while. It turned out that after going home he had spent a long time digging through material; he now had an overall grasp of the hardware, but the more he read, the more questions he had. Knowing that firmware is my specialty, he wanted to ask about the software side. Moved by his thirst for knowledge, I ordered an Americano and began today's introduction.

1- Memory Initialization

Xiao Zhang got straight to the point: "The hardware structure of memory is so complicated, yet I've never dealt with any of it when writing programs. Does the operating system handle the addressing and the delays itself?"

"Not quite. The operating system's view of memory stops at segment and page management, that is, at the physical address; my blog has an introductory section on page management. The mapping from physical address to rank, bank and so on is done by the memory controller." I took a sip of coffee and warmed to the subject.

Ever since Intel and AMD dropped the north bridge, the memory controller has been integrated into the CPU. Eliminating the FSB (Front Side Bus, the bus that used to connect the CPU to the north bridge) greatly reduces latency, allows much higher bandwidth, and also lowers motherboard cost.

Initializing the memory controller and the memory is a major piece of firmware work, arguably one of its main tasks. If you have read my earlier introductions to UEFI, you will know that memory initialization is completed in the PEI phase. Broadly, it falls into three stages:

1. Preparation Phase

This stage prepares for memory training. Its jobs:

  • A. Initialize the memory controller's registers.

  • B. Read the SPD. Over SMBus, read the SPD contents, recording all the timing parameters for use in the next stage. Each DIMM has a different SMBus address, typically A0/A1/A2/A3, depending on the motherboard wiring. This step also detects whether a slot is populated at all. On some embedded boards the memory chips are soldered straight onto the board and there may be no SPD; the firmware engineer then hard-codes the timing data into the code from the memory chips' datasheet.
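What the firmware extracts from those SPD bytes can be sketched as follows. The byte offsets are my reading of the JEDEC DDR3 SPD layout (bytes 10/11 = medium timebase fraction, byte 12 = tCKmin, byte 16 = tAAmin); treat them as assumptions and check the specification before relying on them:

```python
def ddr3_speed_from_spd(spd):
    """Decode minimum cycle time and CAS latency from a DDR3 SPD dump.

    spd maps byte offset -> value. Offsets assumed per the JEDEC DDR3
    SPD layout (see lead-in); this is an illustrative sketch, not MRC code.
    """
    mtb_ns = spd[10] / spd[11]        # medium timebase, typically 1/8 ns
    tck_ns = spd[12] * mtb_ns         # minimum clock period
    taa_ns = spd[16] * mtb_ns         # minimum CAS latency time
    rate_mts = round(2000 / tck_ns)   # DDR: two transfers per clock
    cl = -(-taa_ns // tck_ns)         # smallest whole-cycle CL (ceiling)
    return rate_mts, int(cl)

# A fabricated SPD fragment for a DDR3-1333 CL9 part: MTB = 1/8 ns,
# tCKmin = 12 * 0.125 = 1.5 ns, tAAmin = 108 * 0.125 = 13.5 ns
spd = {10: 1, 11: 8, 12: 12, 16: 108}
print(ddr3_speed_from_spd(spd))       # (1333, 9)
```

This is the sense in which the SPD "tells" the BIOS what memory is installed: raw bytes are scaled by a timebase into nanosecond timings, then converted into clock counts for the controller.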

2. Memory Training

You may be surprised when you see Training. Artificial intelligence needs to be trained, but does memory also need to be trained?

A big change from DDR2 to DDR3 is how the signal lines are connected. A typical DDR2 topology looks like this:

[Figure: typical DDR2 signal routing]
And DDR3 becomes:

(Note: the VTT marked on the right of the DDR3 diagram, tied to the clock and address lines, is the termination voltage supplied to the terminating resistors that preserve the integrity of the command and clock signals.)

[Figure: DDR3 fly-by signal routing]
JEDEC, the body behind the DDR standards, has a special name for this daisy-chain arrangement: "fly-by".

This design greatly eases hardware manufacturing for DIMM makers. And because the CLK/CLK#, DQS, address and command signals no longer need to arrive everywhere at the same instant, signal integrity improves and higher frequencies become possible.

But there is no free lunch, and it brings plenty of trouble. A big one is that timing coordination at the memory controller becomes much harder.

A daisy chain means the signal reaches each memory chip at a different time. The skew between the first and second chip may be small, but by the eighth chip it is considerable.

Recall the last article: after row activate and column select, the first chip puts its data on its slice of the DQ lines after CL, but the eighth chip needs noticeably longer to present its data. When exactly should the memory controller sample?

Each channel nowadays usually has two slots, which complicates things further. Adding fuel to the fire, the DDR standard comes from JEDEC, while memory chips and DIMMs come from a whole ecosystem of vendors, good and bad mixed together, so chip and module delays come in every conceivable flavor.

This is completely unlike an ARM embedded platform, which only needs to support a fixed set of memory chips. Motherboard makers want to support as many of the modules on the market as possible. It is sometimes a tightrope walk: timings that work perfectly for memory A may fail for memory B.

Timing errors caught at boot are the lucky case. If wrong or marginal timings slip past the BIOS memory test, they cause far worse trouble once the operating system is running.

So how can the timings be set accurately? Fortunately, JEDEC defines a standard procedure called write leveling. Simply put, the memory controller keeps sending DQS strobes with different delays; the memory chip samples the state of CK at each DQS-DQS# rising edge and feeds the result back to the controller over the DQ lines (as a stream of 0s and 1s).

The controller repeatedly adjusts the DQS-DQS# delay until it detects the feedback on DQ flip from 0 to 1, then locks in the delay value at that point: one write-leveling pass is complete. As shown below:

[Figure: the write-leveling procedure]
Combined with other mechanisms such as on-die termination (ODT) and Vref voltage adjustment, this completes the training of the memory. It is a process of repeatedly searching for a balance point, teaching the memory controller the timing and voltage characteristics of the DIMMs in front of it.
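The search loop itself is simple in spirit: sweep the DQS delay, watch the fed-back bit, and lock at the 0-to-1 edge. A cartoon of the procedure (not real MRC code; the feedback function is a stand-in for the DRAM):

```python
def write_leveling(sample_ck_at, delay_steps=64):
    """Sweep the DQS delay; return the first step where the DRAM samples CK high.

    sample_ck_at(step) models the DRAM's feedback on DQ: 0 while DQS's rising
    edge lands before CK's rising edge, 1 once it lands after.
    """
    prev = sample_ck_at(0)
    for step in range(1, delay_steps):
        cur = sample_ck_at(step)
        if prev == 0 and cur == 1:   # found the 0 -> 1 transition: edges aligned
            return step
        prev = cur
    return None                      # no transition found: widen the sweep

# Toy DRAM whose CK edge sits 23 delay steps away from DQS's starting point
print(write_leveling(lambda step: 1 if step >= 23 else 0))  # 23
```

The real procedure runs this per chip and per rank, which is how the controller absorbs the per-chip flight-time skew that fly-by routing introduces.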

3. Wrap-up Stage

Assuming the previous stage successfully discovered and programmed the parameters, what follows is relatively simple: mainly configuring channels and interleaving, and reporting the results to the rest of the firmware.

There is a lot of interesting information here: memory vendor, exact model, count, which stick sits in which slot, the mapping between multiple CPUs and memory, and so on. All of it is put into HOBs for later consumers to use.

2- Other Questions

After holding forth for a long while, I thought that would keep Xiao Zhang busy for quite some time, and took a satisfied sip of coffee. To my surprise, he pulled a sheet of A4 paper from his pocket, covered with questions. The rascal had played dumb up front and ambushed me here. No matter; whatever came, I would deal with it. Let the questions come!

1. How to get the source code

"After listening for a long time, I seem to understand it. Is there any code to see? Talk is cheap, show me the code!". The memory initialization code is generally provided by the chip manufacturer, and Intel calls it MRC (memory reference code) . Because it involves a large number of register operations, it is generally provided to IBV and OEM in the form of authorized access instead of open source.

Fortunately, Intel's open hardware platform Galileo open-sourced the entire Quark SoC, including the MRC code, which lives at:

tianocore/edk2:https://github.com/tianocore/edk2-archive/tree/master/QuarkSocPkg/QuarkNorthCluster/MemoryInit/Pei

Interested readers can go and study it.

2. CAR

"I see that before the memory is initialized, UEFI is already executing the c program. Where is the stack at this time?".

Good question, until the memory is ready. **UEFI firmware generally initializes cache as memory (CAR, Cache As Ram). ** Not only that, we can also use part of the cache to continue to use as a cache, cache data and code, and the cache used as memory can also execute code! Cache is really powerful!

3. Fast Boot / S3

"The memory initialization needs to be trained, and it feels very slow. Is there a way to speed it up?". Yes, if we don't change the memory stick, we don't actually need to train from scratch every time. We can store the memory data of the first power-on training, and use the previous data in the future. This is the Fast Boot of many BIOS memory parts.

"Then if I change the memory, will it crash?". No, there is a GPIO on the motherboard connected to the chassis switch, we open the chassis, the BIOS will capture this information (called Intrude) . Using this signal, we can think that the user has done something to change the configuration, and we are training from scratch . Some MRCs will automatically retrain after the SPD information changes.

"Can you tell me about the memory settings in hibernation mode?". S3, which is Sleep to memory, is a sleep mode specified by ACPI. We will talk about it later (is there any more pits to dig?). Here I will simply talk about the state of the memory. **General SDRAM has to be refreshed, which is determined by its design. It has a special mode called self-refresh. **In this mode, the memory content will not disappear and the power consumption is extremely small. Our S3 is to set the memory to this state.

4. How to set various delays

"Can I change those delay parameters that I said last time?". Generally, the BIOS will automatically set various delays according to the SPD, and we can also set these parameters by ourselves in the BIOS settings , as shown in the figure below:

[Screenshot: memory timing settings in BIOS setup]

But be careful with these settings: wrong values may prevent the machine from booting, so make sure you know how to clear CMOS or reset the BIOS before changing them.

3- Epilogue

It was getting late, and time for me to say goodbye to Xiao Zhang. I hope these three introductions gave him something to take away.

Summary

I used to understand memory only at the level of mappings and page tables; today I learned more about the hardware underneath. Someday I hope to trace a virtual address all the way to a physical address and then follow that access down to the hardware. Stringing the whole path together end to end would be really cool.

Thanks to Uncle Wolf for the articles. A massive amount of work!


Origin blog.csdn.net/weixin_45264425/article/details/130540596