Computer memory mechanism in detail

Programs run in memory, so a competent programmer must understand memory. A programmer who does not understand memory can never make a qualitative leap in skill: he sees everything through a fog, knowing what happens but never why.

1. How does a program run on a computer?

The program is stored on the hard disk and must be loaded into memory to run; the CPU is designed to read data and instructions only from memory.

For the CPU, memory is just a place to store instructions and data; no calculation can be completed in memory itself. For example, to compute a = b + c, the values of b and c must first be read into the CPU before the addition can be performed. To understand the process concretely, let us look at the structure of the CPU.
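
As a rough illustration of why the data must enter the CPU, the statement a = b + c might compile to instructions like the ones sketched in the comments below (the exact sequence depends on the compiler and architecture):

#include <stdio.h>

int main(void) {
    int b = 2, c = 3, a;
    /* The addition cannot happen "in memory". A compiler typically
       emits something like (illustrative x86):
         mov eax, [b]   ; load b from memory into a register
         add eax, [c]   ; add c inside the CPU's computing unit
         mov [a], eax   ; write the result back to memory        */
    a = b + c;
    printf("%d\n", a);
    return 0;
}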

The CPU is a complex computer component that contains many small parts, as shown in the figure below:

[Figure: internal structure of the CPU]

The computing unit is the brain of the CPU, responsible for addition, subtraction, multiplication, division, comparison, shifting, and other operations. Each operation is backed by a dedicated circuit and is very fast.

A register (Register) is a very small, very fast storage component inside the CPU. Its capacity is very limited: on a 32-bit CPU each register generally holds 32 bits (4 bytes) of data, and on a 64-bit CPU, 64 bits (8 bytes). To support their many complex functions, modern CPUs contain dozens or even hundreds of registers. Embedded processors serve a single purpose and have fewer.

The number of bits we often hear attributed to a CPU refers to the width of its registers. Personal-computer CPUs have now entered the 64-bit era, for example Intel's Core i3, i5, and i7.

Registers are crucial and indispensable to program execution: they are used to perform arithmetic, count loop iterations, control the program's execution flow, record the CPU's running status, and so on.

For example, the EIP (Extended Instruction Pointer) register holds the address of the next instruction; after executing the current instruction, the CPU fetches the next one according to EIP, so changing EIP changes the program's execution flow. The CR3 register holds the physical address of the current process's page directory; switching processes changes CR3, which is explained in "MMU Components and Control of Memory Permissions". The EBP and ESP registers point to the bottom and top of the stack, and function calls change their values; this is explained in "The Concept of Stack and Stack Overflow".

So, why does the CPU need an internal cache?

Although memory is already quite fast, a large gap remains between it and the CPU, not even the same order of magnitude. If data were read from memory on every access, the CPU would be seriously slowed and would often sit idle with nothing to do. By placing a cache inside the CPU, frequently used data can be kept close at hand: when data at the same address is needed again, it is read directly from the cache instead of all the way from memory.

[Figure: the cache sits between the CPU's registers and main memory]

When buying a CPU, people often look at the cache capacity. For example, the Intel Core i7-3770K has an 8MB L3 cache, a 256KB L2 cache, and a 32KB L1 cache. Generally, the larger the cache, the more powerful the CPU.

Cache capacity is limited, so the CPU can keep only part of its data there; data used less frequently bypasses the cache and is read directly from memory. Not every access can therefore be served from the cache: an access that can is a hit, otherwise it is a miss. Raising the cache hit rate is a science of its own; deciding which data to keep and which to evict involves sophisticated algorithms.

CPU Instructions
To make the CPU work you must use specific instructions, such as add for addition, sub for subtraction, and cmp for comparing two numbers; together these form the CPU's instruction set (Instruction Set). Our C code is ultimately compiled into CPU instructions, one by one. Different CPU models support somewhat different instruction sets, but most instructions are common to all.

In this section we described the basic structure of the CPU and its instructions. The key takeaway is the register: a small, fast storage component that plays a vital role while a program runs. The CPU uses registers to record the program's running state and decides its next operation based on their values.

2. What exactly is virtual memory?

In C, the value of a pointer variable is a memory address, and the & operator likewise yields the memory address of a variable. Please see the following code:

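A minimal reconstruction of the code the original figure showed (the exact snippet is not preserved):

#include <stdio.h>

int a = 1, b = 255;   /* global variables */

int main(void) {
    printf("&a = %p\n", (void *)&a);
    printf("&b = %p\n", (void *)&b);
    return 0;
}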

Here a and b are global variables; their memory addresses are fixed at link time and cannot change afterwards, so no matter when the program is run, the output is the same.

So the question is: what if these two addresses in physical memory are occupied by another program? Wouldn't our program be unable to run?

Fortunately, these addresses are fake; they are not real physical memory addresses but virtual addresses. A virtual address is translated by the CPU into a physical address, and every time the program runs, the operating system rearranges the mapping between virtual and physical addresses, using whichever section of physical memory happens to be free. As shown below:

[Figure: mapping virtual addresses to free physical memory]

Virtual addresses

The whole idea of virtual addressing is this: treat the address the program uses as a virtual address (Virtual Address), then convert it to an actual physical address through some mapping mechanism. As long as the mapping from virtual to physical addresses is properly controlled, the program can use the same addresses on every run.

For example, the address of variable a in the code above is 0X404038. The first time the program runs, the corresponding physical address might be 0X12ED90AA; the second time it might be 0XED90. Our program does not need to care: all this complicated memory management is left to the operating system.

Let us return to the essence of how programs run. A user program does not want to be involved in complex memory management while running. As an ordinary program, it wants a simple execution environment, with its own memory and its own CPU, as if it occupied the entire computer, without having to care about other programs.

Besides the convenience of letting programmers use fixed memory addresses, virtual addresses also isolate the address spaces of different programs from one another and improve memory usage efficiency.

Isolate the address spaces of different programs from each other

If all programs used physical memory directly, their address spaces would not be isolated from one another. A malicious program could easily overwrite another program's memory to cause damage, and a non-malicious but buggy program might accidentally modify another program's data and make it crash.

This is intolerable for users, who need a safe and stable computing environment: if one task fails, at the very least it should not affect the others.

With virtual addresses, even though program A and program B may both access the same address, the corresponding physical addresses differ, so neither can modify the other's memory no matter what it does.

Improve memory usage efficiency

With virtual addresses, the operating system takes a larger role in memory management, which makes it possible to control memory permissions. For example, we may want memory that holds data to have no execute permission, memory that holds code to have no write permission, and memory occupied by the operating system to be unreadable by ordinary programs.

In addition, when physical memory runs short, the operating system can control the granularity of swapping in and out more flexibly. Disk I/O is very time-consuming, so finer-grained swapping can greatly improve program performance.

We will explain the above two points in detail in " Memory Paging Mechanism " and " Implementation of Memory Paging Mechanism ".

Thinking in intermediate layers

In computing, to make operations more intuitive and easier to understand and to improve the user experience, developers often reach for one magic weapon: adding an intermediate layer, that is, using indirection to hide complex underlying details and expose only a simple interface. Virtual addressing is a classic example of an intermediate layer.

In fact, the entire development of computers has been a continual introduction of new intermediate layers:

  • In the early days, programs ran directly on the hardware and managed the hardware themselves. Programmers wrote in binary and had to handle all kinds of boundary conditions and safety issues on their own.
  • Later, people could stand it no longer, so they developed operating systems to manage the hardware and invented assembly language to lighten the programmer's burden.
  • As software grew in scale, programming in assembly became difficult: the learning cost was high and development efficiency low, so the C language was born. The C compiler first translates C code into assembly, and the assembler then translates the assembly into machine instructions.
  • As hardware grew ever more powerful and software ever more complex, people were no longer satisfied with C, and modern languages such as C++, Java, C#, and PHP were born.

3. Virtual address space and compilation mode

The virtual address space is the range of virtual addresses a program may use. The mapping between virtual and physical addresses is decided by the operating system, and accordingly the size of the virtual address space is also determined by the operating system, though it is further constrained by the compilation mode.

In this section we first discuss the CPU and then the compilation modes, so that you can see how the compiler cooperates with the CPU to improve a program's running speed.

CPU data processing capabilities

The CPU is the core of the computer: it determines the machine's data-processing and addressing capabilities, and with them its performance. The amount of data a CPU can process at once (within one clock cycle) is determined by the width of its registers and the width of the data bus (that is, how many data lines there are). What we call the number of bits of a CPU can be understood either as the register width or as the data bus width; the two are usually equal.

Strictly speaking, the data bus is on the motherboard, not inside the CPU, and is not determined by the CPU; what is meant here is the maximum data bus width the CPU can support, that is, its maximum data-processing capability. For convenience, this article simply says "the CPU's data bus".

The data bus width and the clock frequency are both important CPU metrics: the bus width determines how much data the CPU processes at a time, and the frequency determines how many times per unit time it processes data. Their product is the CPU's data throughput per unit time.

Over the history of computing, CPU frequencies rose rapidly, from tens of KHz at first, to hundreds of MHz, to today's 4GHz, until the physical properties of silicon made further increases difficult and development turned toward multiple cores. Over the same period the data bus width kept doubling, from 8 bits in the early days to 16 bits and then 32 bits; most computers today use 64-bit CPUs.

Note that the data bus and the address bus are different things: the data bus transfers data between the CPU and memory, while the address bus locates data within memory. There is no necessary connection between them, and their widths need not be equal; in practice, the address bus has tended to grow along with the data bus in order to address larger memories.

16-bit CPU
Early CPUs were 16-bit, processing 16 bits (2 bytes) of data at a time. The computer industry was still in its infancy: personal computers had not yet reached ordinary households, the concept of a virtual address had not been proposed, and programs ran directly on physical memory. Operating-system memory management was very simple, and a programmer could easily write a malicious program that modified another program's memory.

Students who have studied assembly will know that the classic 16-bit processor is the Intel 8086: it has a 16-bit data bus and a 20-bit address bus, for an addressing capacity of 2^20 = 1MB.

32-bit CPU
As the industry advanced, 32-bit CPUs emerged, processing 32 bits (4 bytes) of data at a time. At this point the concept of the virtual address was proposed and applied: the CPU and the operating system jointly perform the mapping between virtual and physical addresses, which made programs easier to write and safer to run.

Typical 32-bit processors are Intel's 80386 and the Intel Pentium 4: the 80386's data bus and address bus are both 32 bits wide, for an addressing capacity of 4GB; the Pentium 4's address bus is 36 bits wide, for a theoretical addressing capacity of 2^36 = 64GB.

64-bit CPU
Modern computers use 64-bit CPUs, which process 64 bits (8 bytes) of data at a time. Typical 64-bit processors are Intel's Core i3, i5, and i7; their address buses are roughly 40 to 50 bits wide. The arrival of 64-bit CPUs gave personal computers another qualitative leap.

Actual supported physical memory
The physical memory a CPU can support is only a theoretical figure; in practice it is further limited by the operating system. For example, 64-bit Windows 7 Home editions support at most 8GB or 16GB of physical memory, while 64-bit Windows 7 Professional and Enterprise support up to 192GB.

Windows Server 2003 Datacenter Edition, designed for large enterprises and government institutions handling massive data, comes in 32-bit and 64-bit versions. The 32-bit version supports up to 512GB of physical memory, clearly beyond the addressing capability of a 32-bit CPU; this is achieved by performing the addressing in two steps.

Compilation modes

In order to be compatible with different platforms, most modern compilers provide two compilation modes: 32-bit mode and 64-bit mode.

32-bit compilation mode

In 32-bit mode, a pointer or address occupies 4 bytes (32 bits) of memory. Theoretically the accessible virtual address space is 2^32 = 0X100000000 bytes, that is 4GB, with a valid virtual address range of 0 ~ 0XFFFFFFFF.

In other words, in 32-bit compilation mode, no matter how large the actual physical memory is, the valid virtual address range a program can access is 0 ~ 0XFFFFFFFF: a 4GB virtual address space. The maximum memory the program can use is 4GB, regardless of the physical memory installed.

If the program needs more memory than physically exists, or the remaining free memory cannot accommodate it, the operating system writes temporarily unused data out to disk and reads it back when needed. Our program simply uses its 4GB of address space without caring whether the hardware resources suffice.

If physical memory exceeds 4GB (many PCs today carry 8GB), the program can do nothing about it; it can still use only 4GB.

64-bit compilation mode

In 64-bit compilation mode, a pointer or address occupies 8 bytes (64 bits). Theoretically the accessible virtual address space is 2^64 bytes, an enormous, almost inexhaustible value. With current technology, physical memory cannot come close to that size, and neither can the CPU's addressing capability; implementing full 64-bit virtual addresses would only add complexity and cost to address translation without any benefit. Both Windows and Linux therefore limit virtual addresses, using only the low 48 bits (6 bytes), for a total virtual address space of 2^48 = 256TB.
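
A quick way to check which mode your compiler used is to print the pointer size (the output depends on the compiler and target):

#include <stdio.h>

int main(void) {
    /* prints 4 for a 32-bit build, 8 for a 64-bit build */
    printf("sizeof(void*) = %zu\n", sizeof(void *));
    return 0;
}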

Note that:

  • A 32-bit operating system can run only 32-bit programs (programs compiled in 32-bit mode), while a 64-bit operating system can run both 32-bit programs (for backward compatibility, since a vast number of 32-bit applications already exist) and 64-bit programs (programs compiled in 64-bit mode).
  • A 64-bit CPU reaches its full performance only when running 64-bit programs; running 32-bit programs wastes part of its resources.
  • Computers today have by and large entered the 64-bit era. The 32-bit compilation mode is still provided to remain compatible with older hardware platforms and operating systems, and because in some cases a 32-bit environment is quite sufficient and a 64-bit environment would only raise costs, for example in embedded systems, microcontrollers, and industrial control.

The 32-bit environment mentioned here refers to: 32-bit CPU + 32-bit operating system + 32-bit program.

Also note that the 32-bit environment has a very classic design, is easy to understand, and suits teaching; much existing material is explained in terms of it. The same goes for this tutorial, which assumes a 32-bit environment unless otherwise noted. The design ideas of the 64-bit environment are not qualitatively different, and once you understand the 32-bit environment you can easily carry that understanding over to 64 bits.

4. Memory alignment to improve addressing efficiency

Computer memory is divided into bytes; in theory the CPU can access any individual byte, but in practice this is not what happens.

The CPU accesses memory through the address bus. If it can process n bytes of data at a time, it fetches n bytes from memory at a time: a 32-bit CPU processes 4 bytes at a time, so it reads 4 bytes from memory on each access; reading less would waste clock cycles, reading more would be useless. A 64-bit CPU likewise reads 8 bytes at a time.

Taking a 32-bit CPU as an example, the actual addressing step is 4 bytes: only addresses that are multiples of 4 are used, such as 0, 4, 8, 12, and 1000, and addresses such as 1, 3, 11, and 1001 are never used. As shown below:

[Figure: a 32-bit CPU addresses memory in 4-byte steps]

This makes addressing fastest: no byte is missed and no byte is addressed twice.

For a program it is best when a variable falls within one addressing step, so that its value can be read in a single access; if it straddles a step boundary, it must be read twice and the pieces spliced together, which obviously reduces efficiency.

For example, if an int is located at address 8, things are easy: address the memory at 8 once. If it is located at address 10, things are more troublesome: the CPU must first address memory 8 and read 4 bytes to get the first half of the value, then address memory 12 and read 4 more bytes to get the second half, and finally splice the two parts together.

Placing a piece of data within a single step, avoiding storage that straddles step boundaries, is called memory alignment. In 32-bit compilation mode the default alignment is 4 bytes; in 64-bit mode it is 8 bytes.

In order to improve access efficiency, the compiler will automatically perform memory alignment.
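
The compiler's alignment is visible in struct layout. A small sketch (the exact padding depends on the compiler and target, but this layout is typical):

#include <stdio.h>
#include <stddef.h>

struct Sample {
    char c;   /* 1 byte, then 3 padding bytes so that i starts on a 4-byte boundary */
    int  i;   /* 4 bytes, aligned */
};

int main(void) {
    printf("offset of c: %zu\n", offsetof(struct Sample, c));       /* 0 */
    printf("offset of i: %zu\n", offsetof(struct Sample, i));       /* typically 4 */
    printf("sizeof(struct Sample): %zu\n", sizeof(struct Sample));  /* typically 8 */
    return 0;
}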

5. Memory paging mechanism to complete virtual address mapping

There are many conceivable schemes for mapping virtual addresses to physical addresses. The simplest takes the program as the unit: map the stretch of virtual space the program needs to a stretch of physical space of the same size.

For example, suppose program A needs 10MB of memory, with a virtual address range of 0X00000000 to 0X00A00000, and suppose it is mapped to a physical region of the same size, 0X00100000 to 0X00B00000, so that each byte of the virtual space corresponds to a byte of the physical space.

When the program is running, their correspondence is as shown in the figure below:

[Figure: mapping an entire program's virtual space onto physical memory]
When program A needs to access 0X00001000, the system converts this virtual address into the actual physical address 0X00101000; 0X002E0000 is converted into 0X003E0000, and so on.

This scheme, which takes the whole program as its unit, solves the problem of addresses not being isolated between programs, and it also lets a program use fixed addresses.

Address isolation

As the figure above shows, program A and program B are mapped to two different regions of physical memory with no overlap. If a virtual address accessed by program A exceeds the range 0X00A00000, the system judges it an illegal access, rejects the request, and reports the error to the user; the usual handling is to forcibly close the program.

Programs can use fixed memory addresses

No matter which region of physical memory the virtual space is mapped to, it is transparent to the programmer: we need not care how the physical addresses change, and we simply write the program and lay out variables within the address range 0X00000000 to 0X00A00000. The program no longer needs relocation.

Memory usage efficiency issues

When virtual memory is mapped with the program as the unit and physical memory runs short, the entire program must be swapped in and out of the disk, which inevitably causes a large number of disk reads and writes and seriously hurts running speed. This method is therefore still crude; its granularity is too large.

Memory paging mechanism

We know that while a program runs, in any given period it uses only a small part of its data frequently; in other words, much of a program's data is not actually used within a given span of time.

Mapping with the entire program as the unit not only reads temporarily unneeded data from disk into memory, but also writes too much data to disk at once, severely reducing the program's running efficiency.

Modern computers therefore use paging (Paging): the virtual address space and the physical address space are divided into small fixed-size pieces and mapped piece by piece, which reduces the granularity of swapping and improves running efficiency.

The idea of paging is to artificially divide the address space into parts of equal, fixed size. Each such part is called a page, just as a book is composed of many pages of equal size. Memory can then be swapped in and out in units of pages:

  • When the program runs, only the needed data is read from disk into memory; temporarily unused data stays on disk and is read in when required.
  • When physical memory runs short, only part of the program's data needs to be written to disk to free enough space, rather than the whole program.

About page size

The page size is fixed and determined by the hardware, or else the hardware supports several page sizes and the operating system chooses one. For example, Intel Pentium processors support a page size of 4KB or 4MB, so the operating system may choose either, but only one at a time; for the whole system the page size is thus effectively fixed.

Almost all PC operating systems currently use 4KB pages. Suppose our PC is 32-bit; then the virtual address space totals 4GB, and divided into 4KB pages it contains 2^32 / 2^12 = 2^20 = 1M = 1048576 pages. Physical memory is divided the same way.

Map based on page

Below we use a simple example to illustrate how virtual addresses are mapped to physical addresses based on pages. Please look at the following figure first:

[Figure: virtual pages of two programs mapped to physical pages and disk pages]
After dividing the program's virtual space into pages, we load the frequently used data and code pages into memory, leave the less-used ones on disk, and read them in when they are needed. In the figure we assume two programs, Program 1 and Program 2, some of whose virtual pages are mapped to physical pages: for example VP0, VP1, and VP7 of Program 1 map to PP0, PP2, and PP3. Some pages are not mapped and remain on disk: for example VP2 and VP3 reside in disk pages DP0 and DP1. Still other pages, such as VP4, VP5, and VP6, have not yet been used or accessed, and for now are assigned nowhere.

Here, we call the page in the virtual space a virtual page (VP, Virtual Page), the page in the physical memory is called a physical page (PP, Physical Page), and the page in the disk is called a disk page (DP, Disk Page).

The lines in the figure represent mapping relationships. Notice that some virtual pages of Program 1 and Program 2 map to the same physical page; this is how memory sharing is achieved.

VP2 and VP3 of Program 1 are not in memory. When the process needs these two pages, the hardware catches the situation, a so-called page fault (Page Fault), and the operating system takes over: it reads VP2 and VP3 from the disk into memory and then establishes the mapping between those memory pages and VP2 and VP3.

6. Implementation of memory paging mechanism (mapping of virtual address and physical address)

Modern operating systems manage memory with the paging mechanism, which lets each program have its own address space. Whenever a program reads or writes through a virtual address, that address must be converted into an actual physical address before the data can really be located on the memory module. As shown below:

[Figure: a virtual address is translated into a physical address before memory is accessed]

Memory address conversion is completed through a mechanism called the page table (Page Table), which is the focus of this section, namely:

  • What is a page table? Why use the page table mechanism instead of other mechanisms?
  • How are virtual addresses translated to physical addresses via page tables?

Directly use array conversion

The easiest mapping scheme to think of uses an array: each element stores a physical address, and the virtual address serves as the array index. Mapping is then trivial, and efficiency is not bad. As shown below:

[Figure: a lookup array with one element per virtual address]

However, such an array has 2^32 elements, each element is 4 bytes in size, and takes up a total of 16GB of memory, which is unrealistic!

Use one-level page table

Since the memory is paged, as long as we can locate the page where the data is located and its offset within the page (that is, the number of bytes from the beginning of the page), we can convert it into a physical address.

For example, an int type value is stored on page 12, and the page offset is 240, then the corresponding physical address is 2^12 * 12 + 240= 49392.

2^12 is the size of a page, which is 4K.

The virtual address space is 4GB and contains 2^32 / 2^12 = 2^20 = 1M = 1048576 pages in total. We can therefore define an array of 2^20 = 1M elements, where each element's value is a physical page number (which page the data sits on) and occupies 4 bytes; the whole array takes 4MB of memory. Such an array is called a page table (Page Table): it records the page numbers of all pages in the address space.

A virtual address is 32 bits long. We may as well cut it in two, using the high 20 bits as the index into the page table array and the low 12 bits as the offset within the page. As shown below:

[Figure: a 32-bit virtual address split into a 20-bit page table index and a 12-bit page offset]

Why cut it this way? Because the page table array has 2^20 = 1M elements, the high 20 bits of the virtual address index exactly all of its elements; and because the size of a page is 2^12 = 4KB, the low 12 bits represent exactly all possible offsets within a page.

Note that only 20 bits are needed for the page number, yet each page table element is 4 bytes, that is 32 bits, leaving 32 - 20 = 12 bits to spare. These 12 bits are also of great use: they describe attributes of the current page, such as whether it is readable and writable, whether physical memory has been allocated for it, whether it has been swapped out to the hard disk, and so on.

For example, take the virtual address 0XA010BA01. Its high 20 bits are 0XA010B, so we access element 0XA010B of the page table array to find the physical page holding the data. Suppose that element's value is 0X0F70AAA0; its high 20 bits are 0X0F70A, so the data lies on physical page 0X0F70A. Looking again at the virtual address, its low 12 bits are 0XA01, so the page offset is also 0XA01. With the page number and the offset, the physical address comes out to 0X0F70A * 2^12 + 0XA01 = 0X0F70A000 + 0XA01 = 0X0F70AA01.
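
A minimal sketch of this one-level lookup in C (the page table here is a stand-in array, not a real operating-system structure):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT  12        /* page size = 2^12 = 4KB */
#define OFFSET_MASK 0xFFFu    /* low 12 bits: offset within the page */

/* Translate a 32-bit virtual address through a one-level page table. */
uint32_t translate(const uint32_t *page_table, uint32_t vaddr) {
    uint32_t index  = vaddr >> PAGE_SHIFT;              /* high 20 bits */
    uint32_t offset = vaddr & OFFSET_MASK;              /* low 12 bits  */
    uint32_t page   = page_table[index] >> PAGE_SHIFT;  /* physical page number */
    return (page << PAGE_SHIFT) | offset;
}

int main(void) {
    static uint32_t page_table[1 << 20];     /* 1M entries = 4MB */
    page_table[0xA010B] = 0x0F70AAA0;        /* the entry from the example above */
    printf("0X%X\n", translate(page_table, 0xA010BA01));  /* prints 0XF70AA01 */
    return 0;
}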

The mapping relationship formed by this idea is shown in the figure below:

[Figure: a one-level page table mapping virtual pages to physical memory and to disk]
Notice that some pages map to physical memory and some to the hard disk; the different mapping types are distinguished by the low 12 bits of the page table element.

With this scheme, no matter how little memory the program uses, 4MB must be allocated for the page table array (and the array itself must sit in physical memory), because the high 1GB or 2GB of the virtual address space is occupied by the system and the larger array indices must remain valid.

Hardware is cheap nowadays and memory is plentiful; many computers carry 4GB or 8GB, and a page table occupying 4MB may not seem like much. But when 32-bit systems first appeared, memory was a scarce resource: many computers had only 100MB or even a few tens of MB, and 4MB looked rather large. The scheme above therefore had to be improved to compress the memory occupied by the page table.

Use two-level page tables

(omitted)

Use multi-level page tables

(omitted)

7. MMU components and control of memory permissions

Mapping a virtual address to a physical address through page tables requires several conversions and calculations. If the operating system did this work in software, it would drastically reduce the performance of programs; the gain would not be worth the cost, so that approach is unrealistic.

MMU

Inside the CPU there is a component called the MMU (Memory Management Unit), which is responsible for mapping virtual addresses to physical addresses, as shown below:

[Figure: the MMU sits between the CPU core and memory]

In paged mode, the CPU issues a virtual address, which is the address we see in our programs. This address goes first to the MMU, which converts it into a physical address.

Even so, the MMU must access memory several times per translation, and performance would still be worrying, so a cache was added inside the MMU specifically to hold page directories and page tables. This cache is limited in size: when the page tables are too large, only the most commonly used entries fit. That turns out to be enough, because with cleverly designed algorithms the cache hit rate can be raised to around 90%; in the remaining 10% of cases the MMU falls back to the page tables in physical memory.

With direct hardware support, the performance loss when using virtual addresses is very small and within an acceptable range compared to using physical addresses.

The MMU only performs the virtual-to-physical mapping through the page tables; it does not build them. Building page tables is the operating system's job. While loading a program into memory and while the program runs, the operating system continually updates the program's page tables and saves the physical address of the page directory in the CR3 register. When the MMU loads page table entries into its cache, it finds the page directory via CR3, then finds the page tables; memory mapping is thus completed by hardware and software together.

CR3 is a register inside the CPU specifically used to save the physical address of the page directory.

Each program has its own set of page tables when running. When switching programs, just change the value of the CR3 register to switch to the corresponding page table.

Controlling memory permissions

Besides mapping virtual addresses to physical addresses, the MMU can also control memory permissions. As the previous section, "Implementation of the memory paging mechanism", mentioned, each page table element occupies 4 bytes, that is 32 bits; the high 20 bits hold the physical page number, leaving 12 bits. These 12 bits control the memory: whether the page is mapped to physical memory or to disk, whether the program has permission to access it, whether the page is executable, and so on.

The operating system defines the memory permissions when it builds the page tables. When the MMU maps a virtual address, it first checks the low 12 bits to see whether the current program has permission to use the page: if so, the mapping completes; if not, an exception is raised and handed to the operating system. The operating system generally handles this kind of memory error bluntly, simply terminating the program.
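
A toy sketch of such a permission check (the flag positions follow the classic x86 page-table-entry convention; the function itself is purely illustrative, since a real MMU does this in hardware):

#include <stdint.h>
#include <stdbool.h>

#define PTE_PRESENT  (1u << 0)   /* page is in physical memory      */
#define PTE_WRITABLE (1u << 1)   /* page may be written             */
#define PTE_USER     (1u << 2)   /* accessible from user-mode code  */

/* Decide whether a user-mode write to this page should be allowed;
   on failure a real MMU raises an exception for the OS to handle. */
bool user_write_allowed(uint32_t pte) {
    return (pte & PTE_PRESENT) && (pte & PTE_WRITABLE) && (pte & PTE_USER);
}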

8. Memory layout (memory model) of C language programs under Linux

The section "Virtual address space and compilation mode" mentioned that the virtual address space is 4GB in a 32-bit environment and 256TB in a 64-bit environment. So how is a C program's memory distributed within this address space? Where is the data? Where is the code? Why is it laid out that way? That is what this section is about.

The distribution of a program's memory within the address space is called the memory model. The memory model is set up by the operating system; it differs between Linux and Windows and is also affected by the compilation mode. In this section we explain the memory models of the 32-bit and 64-bit environments under Linux.

Kernel space and user space

In a 32-bit environment a program can in theory have a 4GB virtual address space; the variables, functions, strings, and so on that we use in C each correspond to some area of that memory.

However, part of this 4GB must be used by the operating system kernel, and applications cannot access that memory directly. This part of the address space is called kernel space (Kernel Space).

By default Windows gives the kernel the 2GB at high addresses (configurable down to 1GB), while Linux by default gives the kernel the 1GB at high addresses. Applications can use only the remaining 2GB or 3GB, called user space (User Space).

User space memory distribution in 32-bit environment under Linux

Setting aside the layout of kernel space for now, the figure below shows the classic memory model of a 32-bit environment under Linux:

[Figure: classic 32-bit Linux memory model: code area, constant area, global data area, heap, dynamic libraries, stack, kernel space]
Description of each memory partition:

  • Program code area (code): stores the binary code of function bodies. A C program consists of multiple functions, and its execution is functions calling one another.
  • Constant area (constant): stores general constants, string constants, and the like. This memory is read-only, so these values cannot be changed while the program runs.
  • Global data area (global data): stores global variables, static variables, and the like. This memory is readable and writable, so these values may change freely while the program runs.
  • Heap: generally allocated and released by the programmer; if the programmer does not release it, the operating system reclaims it when the program ends. Functions such as malloc(), calloc(), and free() operate on this memory, which is the focus of this chapter. Note: the heap area here is not the same concept as the heap in data structures; heap allocation works more like a linked list.
  • Dynamic link libraries: used for loading and unloading dynamic link libraries while the program runs.
  • Stack: stores function parameter values, local variable values, and the like; it operates like the stack in data structures.

Among these partitions (setting dynamic link libraries aside for now), the program code area holds instructions, while the constant area, global data area, heap, and stack hold data. The study of memory is chiefly the study of these data partitions.

The program code area, constant area, and global data area are allocated when the program is loaded into memory, persist for the whole run, and can be neither destroyed nor enlarged (their sizes are fixed); the system reclaims them only after the program ends. That is why global variables, string constants, and the like can be accessed anywhere in the program: their memory is always there.

The constant area and the global data area are sometimes together called the static data area, meaning memory dedicated to holding data that exists throughout the program's run.

When a function is called, information related to it, such as parameters, local variables, and the return address, is pushed onto the stack; when the function finishes, that information is destroyed. So local variables and parameters are valid only inside the current function and cannot be passed outside it, because their memory is gone.

Memory in the constant area, global data area, and stack is allocated and released automatically by the system; the programmer cannot control it.

The only area the programmer can control is the heap (Heap): a huge region that often occupies most of the virtual space. A program can request a block of heap memory and use it freely, storing whatever data it likes. Heap memory persists until the program actively releases it and does not expire when a function ends: data created inside a function can be used outside the function, as long as it is placed on the heap. The short sketch below illustrates this.
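
A minimal sketch of that difference (the helper name is made up for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns heap memory, which stays valid after the function returns;
   the caller is responsible for calling free(). */
char *make_on_heap(void) {
    char *p = malloc(16);
    if (p) strcpy(p, "heap");
    return p;   /* fine: heap memory outlives the function */
}

int main(void) {
    char *s = make_on_heap();
    if (s) {
        printf("%s\n", s);   /* prints "heap" */
        free(s);
    }
    return 0;
}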

In order to deepen your understanding of memory layout, please look at the following piece of code:

#include <stdio.h>

char *str1 = "www.c.xyz";       // string literal in the constant area; str1 in the global data area
int n;                          // global data area

char *func(void){
    char *str = "memory layout";    // string literal in the constant area; str on the stack
    return str;
}

int main(void){
    int a;                      // stack
    char *str2 = "01234";       // string literal in the constant area; str2 on the stack
    char arr[20] = "56789";     // both the string and arr on the stack
    char *pstr = func();        // stack
    int b;                      // stack
    return 0;
}

User space memory distribution in 64-bit environment under Linux

In a 64-bit environment the virtual address space is 256TB. Linux gives the high 128TB to the kernel and the low 128TB to user programs. As shown below:

[Figure: 64-bit Linux address space: low 128TB user space, high 128TB kernel space, an undefined region in between]

As mentioned in "Virtual address space and compilation mode", in a 64-bit environment only the lowest 48 bits of a virtual address are valid, even though the address occupies 64 bits. One thing to add here: bits 48 through 63 of any valid virtual address must equal bit 47.

In the figure above, bit 47 of a user-space address is 0, so the high 16 bits are also all 0; written in hexadecimal, the four leading digits are all 0. Bit 47 of a kernel-space address is 1, so the high 16 bits are also all 1; written in hexadecimal, the four leading digits are all F. The addresses in between are left vacant (the "undefined region" in the figure) and can never be accessed.
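
A small sketch of this canonical-address rule, checking that bits 48 through 63 replicate bit 47:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* A 48-bit-canonical address has bits 48..63 equal to bit 47. */
bool is_canonical(uint64_t addr) {
    /* sign-extend bit 47 upward, then compare with the original */
    int64_t extended = (int64_t)(addr << 16) >> 16;
    return (uint64_t)extended == addr;
}

int main(void) {
    printf("%d\n", is_canonical(0x00007FFFFFFFFFFFULL)); /* 1: top of user space */
    printf("%d\n", is_canonical(0xFFFF800000000000ULL)); /* 1: bottom of kernel  */
    printf("%d\n", is_canonical(0x0000800000000000ULL)); /* 0: undefined region  */
    return 0;
}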

9. Memory layout of C language program under Windows

In a 32-bit environment, Windows allocates 2GB of high-address space to the kernel by default (it can also be configured to 1GB), and allocates the remaining 2GB of space to user programs.

Unlike Linux, Windows is closed-source and protected by copyright; information about it is scarce, it is hard to study every detail in depth, and some of its internals remain unknown. Regarding the memory layout of the Windows address space, the official documentation gives only brief statements:

  • For 32-bit programs, the kernel occupies the higher 2GB, and the remaining 2GB is allocated to user programs;
  • For 64-bit programs, the kernel occupies the highest 248TB, and the user program occupies the lowest 8TB.

The following figure shows the memory distribution of a typical Windows 32-bit program:

[Figure: typical address-space layout of a 32-bit Windows program: exe image, DLLs, heaps, thread stacks]
As the figure shows, the Windows address space is parcelled out to various exe and dll files, heaps, and stacks. The exe file generally sits at the address starting at 0x00400000; some DLLs load at the address starting at 0x10000000, such as runtime-library DLLs; other DLLs load near 0x80000000, such as system DLLs like Ntdll.dll and Kernel32.dll.

Stacks appear at 0x00030000 and after the exe file. Some readers may wonder why Windows needs so many stacks. Recall that each thread has its own independent stack, so a process has as many stacks as it has threads. On Windows, the default stack size of each thread is 1MB.

With all of the above allocated, the Windows address space is already fragmented; when a program requests heap space from the system, it must be carved out of the remaining unoccupied addresses.

10. User mode and kernel mode

First we must explain a concept: the process (Process). Simply put, a running executable is a process; the programs we compiled from C earlier become processes once run. The most notable characteristic of a process is that it has an independent address space.

Strictly speaking, a program is a file stored on the disk, a collection of instructions and data, and is a static concept; a process is a series of activities after the program is loaded into memory and run, and is a dynamic concept.

When we discussed address spaces earlier we kept saying "the program's address space", which is actually not rigorous; we should say "the process's address space". One process has one address space, and one program may create several processes.

Kernel space holds the operating system kernel's code and data, which are shared by all programs. If a program were to modify data in kernel space, it would threaten not only the operating system's own stability but every other program as well; it is a very dangerous act, so operating systems forbid user programs from accessing kernel space directly.

To access kernel space, a program must go through the API functions provided by the operating system, executing code supplied by the kernel and letting the kernel perform the access itself. This ensures that kernel-space data cannot be modified at will and protects the stability of the operating system and of other programs.

A user program's call into a system API function is called a system call. When a system call occurs, the user program is suspended and kernel code executes instead (the kernel, too, is a program), accessing kernel space. This state is called kernel mode (Kernel Mode).

User space holds the application's code and data, which are private to the program and generally inaccessible to other programs. When the application's own code is executing, the CPU is said to be in user mode (User Mode).

Computers frequently switch between kernel mode and user mode:

  • When an application running in user mode needs a lower-level operation such as input/output or memory allocation, it must call an API function provided by the operating system, entering kernel mode;
  • when the operation completes, the application's own code resumes executing, returning to user mode.

In summary: user mode executes application-level code and accesses user space; kernel mode executes kernel code and accesses kernel space (and, of course, also has permission to access user space).
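
As a concrete example: on Linux, even a simple printf() ultimately funnels into the write() system call, and entering it is the moment the CPU switches to kernel mode (a sketch; error handling omitted):

#include <unistd.h>   /* write(): a thin wrapper over the system call */
#include <string.h>

int main(void) {
    const char *msg = "hello from user mode\n";
    /* The write() call triggers a system call: the CPU switches to
       kernel mode, the kernel copies the bytes to standard output,
       and execution then returns to user mode. */
    write(STDOUT_FILENO, msg, strlen(msg));
    return 0;
}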

Why should we distinguish between two modes?

We know that the kernel's main task is to manage the hardware, including the monitor, keyboard, mouse, memory, and hard disk, and that the kernel also provides interfaces (that is, functions) for upper-level programs to use. When a program wants to perform input/output, allocate memory, respond to the mouse, or carry out any other hardware-related operation, it must use the interfaces the kernel provides. But user programs are not to be trusted: the kernel trusts them not at all, and whenever a program calls a kernel interface the kernel performs all manner of validation to guard against errors.

Starting with the Intel 80386, for reasons of security and stability, the CPU can run at four privilege levels, ring0 through ring3, with four corresponding levels of data protection. Linux and Windows, however, use only two of these levels:

  • One is kernel mode, corresponding to ring0 level. The core part of the operating system and device drivers all run in this mode.
  • The other is user mode, which corresponds to ring3 level. The user interface part of the operating system (such as Windows API) and all user programs run at this level.

Why do kernel and user programs share address space?

Since the kernel is itself a kind of program, why not give it an independent 4GB address space of its own, instead of sharing with user programs and taking up their limited memory?

Giving the kernel a completely independent address space would mean putting it in a separate process, so every system call would require a process switch. Process switches are enormously expensive: besides pushing and popping registers, they invalidate the data caches in the CPU and the page-table cache in the MMU, leaving memory access quite inefficient for some time afterwards.

When the kernel and user programs share an address space, a system call needs only a mode switch. A mode switch merely pushes and pops registers and does not invalidate the caches; modern CPUs even provide instructions for entering and leaving kernel mode quickly. Compared with process switching, efficiency is greatly improved.


11. The concept of stack and stack overflow

In " Memory Layout of C Language Programs (Memory Model) " we mentioned that the virtual address space of the program is divided into multiple areas, and the stack (Stack) is the area with a higher address.

The Stack can store function parameters, local variables, local arrays and other data that are scoped inside the function. Its purpose is to complete the function call.

Stack memory is allocated and released automatically by the system: when a function call occurs, memory is allocated for the data the function uses while running, and when the call completes, all of that memory is destroyed. That is why local variables and parameters are valid only inside the current function and cannot be passed outside it.

Stack concept

In computing, a stack can be understood as a special container: data is put in item by item and taken out in the reverse order, so what goes in first comes out last and what goes in last must come out first. This is the First In, Last Out (FILO) principle.

Putting data onto the stack is usually called pushing, and taking data off is usually called popping. As shown below:

[Figure: push and pop operate only on the top of the stack]
Notice that the bottom of the stack never moves; pushing and popping move only the top. When the stack holds no data, the top and the bottom coincide.

In essence, the stack is a contiguous piece of memory, and both the bottom and the top must be tracked in order to locate the current stack. In modern computers the ebp register usually points to the bottom of the stack and the esp register to the top. As data is pushed and popped, the value of esp keeps changing: pushing decreases esp, popping increases it.

Both ebp and esp are CPU registers: ebp is short for Extended Base Pointer and usually points to the bottom of the stack; esp is short for Extended Stack Pointer and usually points to the top.

Stack size and stack overflow

For each program, the stack memory available is limited, generally 1M~8M. This limit is fixed at compile time and cannot be changed while the program runs. If the program uses more stack memory than the maximum, a stack overflow (Stack Overflow) error occurs.

A program can contain multiple threads, and each thread has its own stack. Strictly speaking, the maximum value of the stack is for threads, not for programs.

The size of the stack memory is related to the compiler. The compiler will specify a maximum value for the stack memory. Under VC/VS, the default is 1M, under C-Free, the default is 2M, and under Linux GCC, the default is 8M.

Of course, we can also modify the size of the stack memory through parameters.
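
For example, on Linux a per-thread stack size can be requested through the pthreads API (a sketch; the 16MB figure is arbitrary, and the program must be linked with -pthread):

#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
    char big[4 * 1024 * 1024];   /* a 4MB local array: would overflow a 1MB default stack */
    big[0] = 0;
    (void)arg;
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_t tid;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 16 * 1024 * 1024);   /* request a 16MB stack */
    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}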

Tip: the stack region is often simply called "the stack", while the heap is always called the heap; the word "stack" by itself does not include the heap, so take care to distinguish the two.

12. What does a function look like on the stack?

Function calls and stacks are inseparable. Without a stack, there would be no function calls. This section will explain how functions are called on the stack.

Stack frames / activation records

When a function call occurs, all the information needed to run the function is pushed onto the stack; this bundle is commonly called a stack frame or activation record.

Logically speaking, the stack frame is the environment in which a function executes: function parameters, local variables of the function, where to return to after the function is executed, etc.

An activation record generally includes the following:

  1. The function's return address, that is, where execution continues after the function finishes. For example:
int a, b, c;
func(1, 2);
c = a + b;

From the C point of view, after func() finishes, execution continues with the statement c = a + b;, so the return address is the location of that statement in memory.

Note: C code is ultimately compiled into machine instructions; strictly speaking, the return address is the address of the next instruction. Calling it the address of the next C statement is just a more intuitive way to put it.

  2. Parameters and local variables. Some compilers, or compilers with optimization enabled, pass parameters in registers instead of pushing them onto the stack; we set that case aside for now.

  3. Temporary data generated automatically by the compiler. For example, when a function's return value is large (say, 40 bytes), it is first pushed onto the stack and then handed to the caller.

When the return value is small (char, int, long, etc.), it is not pushed onto the stack; instead it is placed in a register and passed to the caller from there.

  4. Registers that need to be preserved, such as ebp, ebx, esi, and edi. Register values are saved so that when the function exits, the scene as it was before the call can be restored and the calling function can continue executing.

The following figure is an example of a function call:

[Figure: layout of a function's activation record on the stack]

In the figure, "old" before a register name denotes that register's value as it was before the function call.

When a function call occurs:

  • The actual arguments, the return address, and the ebp register are pushed onto the stack first;
  • then a block of memory is allocated for local variables, return values, and so on; this block is generally large enough to hold all the data, with room to spare;
  • finally, the values of other registers are pushed onto the stack.

13. Detailed analysis of an example of pushing a function into and out of the stack

So far we have only described what a function's activation record looks like; the detailed calling process is probably still not very clear. In this section we analyze it in depth, taking VS2010 in Debug mode as our example.

Please look at the code below:

void func(int a, int b){
    int p = 12, q = 345;
}

int main(){
    func(90, 26);
    return 0;
}

The function uses the default calling convention: parameters are pushed onto the stack from right to left, and the caller is responsible for popping them off afterwards. The push and pop process is shown in the figure below:

[Figure: steps ① through ⑨ of func() entering and leaving the stack]

Pushing the function onto the stack

Steps ① to ⑥ are the process of function pushing into the stack:

  1. main() is the main function and also needs to be pushed onto the stack, as shown in step ①.

  2. In step ②, execute the statement func(90, 26);, first push the actual parameters 90 and 26 onto the stack, and then push the return address onto the stack. These tasks are completed by the main() function (caller). At this time, the value of ebp has not changed, only the direction of esp has been changed.

  3. At step ③, the function body of func() begins to be executed. First, push the value of the original ebp register onto the stack (that is, the old ebp in the picture), and assign the value of esp to ebp, so that ebp points from the bottom of the stack of the main() function to the bottom of the stack of the func() function. , completing the function stack switching. Since the values ​​of esp and ebp are equal at this time, they point to the same location.

  4. Enough memory is reserved for local variables, the return value, and so on, as shown in step ④. Since the stack memory already exists before the function call, nothing is actually allocated here; the value of esp is simply reduced by a constant, for example sub esp, 0xC0, which reserves 0xC0 bytes.

  5. The values of the ebx, esi, and edi registers are pushed onto the stack in turn.

  6. The values of the local variables are written into the reserved memory. Note that there is a 4-byte gap between the first variable and old ebp, and gaps of several bytes between the variables as well.

Why leave so many gaps; isn't that a waste of memory? It is because the program was built in Debug mode: the spare memory makes it easier to insert debugging information. When the program is built in Release mode, the layout becomes compact and the gaps disappear.

At this point, the activity record of func() is complete. Notice that during an actual call the formal parameters do not exist and occupy no memory; only the actual arguments exist in memory, and they are pushed onto the stack by the caller before the function body executes.
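You can peek at this layout yourself with a small test program (a sketch; the exact gaps and ordering are specific to the compiler and build mode):

#include <stdio.h>

void func(int a, int b){
    int p = 12, q = 345;
    /* The arguments were pushed by the caller, so on a typical x86 build
       they sit at higher addresses than old ebp, while the locals sit at
       lower addresses, inside the reserved area. */
    printf("&a = %p, &b = %p\n", (void*)&a, (void*)&b);
    printf("&p = %p, &q = %p\n", (void*)&p, (void*)&q);
}

int main(){
    func(90, 26);
    return 0;
}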

Why do uninitialized local variables contain garbage values?

When memory is allocated for local variables, the value of esp is simply reduced to reserve a stretch of blank memory. Different compilers treat this blank memory differently in different build modes: some initialize it to a fixed value, others leave it untouched.

Even when the compiler does initialize this blank memory, the value generally means nothing to us, so we can regard it as a random garbage value.
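A quick way to see this (reading an uninitialized local variable is undefined behavior, so the output depends entirely on the compiler and build mode; VS Debug builds typically fill fresh stack memory with the byte pattern 0xCC):

#include <stdio.h>

int main(){
    int x;                  /* never initialized */
    printf("x = %d\n", x);  /* garbage: perhaps 0, perhaps -858993460
                               (0xCCCCCCCC), perhaps whatever an earlier
                               frame left behind */
    return 0;
}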

Popping the function off the stack

Steps ⑦ to ⑨ show func() being popped off the stack:

  1. After func() finishes executing, it starts to pop off the stack. First the values of the edi, esi, and ebx registers are popped.

  2. To pop the local variables, the return value, and other such data off the stack, the value of ebp is simply assigned to esp, so that ebp and esp again point to the same location.

  3. Next, old ebp is popped off the stack and assigned to ebp. ebp now points to where it pointed before func() was called, that is, to the old-ebp position in main()'s activity record, as shown in step ⑨.

This step is critical: it guarantees that the state before the function call is restored. It is also why old ebp must be pushed onto the stack at every function call.

Finally, the return address is popped and execution continues at the instruction it points to, and the actual parameters are popped off the stack as well. At this point esp points to the top of main()'s activity record, func() has been popped off completely, and the stack has been restored to its state before func() was called.

Lingering misconceptions

From the analysis above it can be seen that popping a function off the stack merely increases the value of the esp register so that it points above the old data; the data itself is not destroyed. The earlier claim that local variables are destroyed as soon as the function ends is therefore not literally true; it was stated that way only to make the scope of local variables easier to grasp.

Data on the stack is overwritten only when later functions push their own frames over it. This means that, if the timing is right, the value of a local variable can still be read outside the function. Consider the code below:

#include <stdio.h>

int *p;

void func(int m, int n){
    int a = 18, b = 100;
    p = &a;
}

int main(){
    int n;
    func(10, 20);
    n = *p;
    printf("n = %d\n", n);
    return 0;
}

Running results: n = 18

In func(), the address of the local variable a is assigned to p. main() then calls func(); immediately after the call returns, no other function has pushed a frame onto the stack, so the memory where a lived has not been overwritten, and its value can still be read through the statement n = *p;.
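Relying on this is, of course, undefined behavior and extremely fragile: letting any other function push a frame in between is usually enough to destroy the value. A sketch (clobber() is a helper invented purely for this illustration):

#include <stdio.h>

int *p;

void func(int m, int n){
    int a = 18, b = 100;
    p = &a;
}

void clobber(void){
    int x = 777, y = 888;  /* this frame lands on func()'s old stack memory */
}

int main(){
    func(10, 20);
    printf("before: %d\n", *p);  /* likely still 18 */
    clobber();
    printf("after:  %d\n", *p);  /* probably overwritten; in any case no
                                    longer guaranteed to be 18 */
    return 0;
}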

14. The principle of stack overflow attacks
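Consider a program along the following lines (a minimal sketch, assuming a ten-byte local buffer filled with gets(), which performs no bounds checking):

#include <stdio.h>

int main(){
    char str[10];
    gets(str);           /* keeps reading until newline, however small str is */
    printf("%s\n", str);
    return 0;
}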

[Figure: a local array overflowing into the blank memory, old ebp, and the return address]

Local arrays are also allocated on the stack. When "12345678901234567890" is entered, the array overflows, spilling into the memory occupied by the 4 bytes of blank space, old ebp, and the return address, and overwriting what was there. As a result, when the function returns, a wrong return address is fetched; what instruction lies at that address is unpredictable, or there may be no instruction there at all, so the program crashes on return.

C does not detect array overflows. This is a typical example of an array overflow overwriting a function's return address; we call such an error a "stack overflow error".

Note: The "stack overflow" mentioned here means that a certain data on the stack is too large and covers other data. It is not the same thing as the stack overflow mentioned in the " Concept of Stack and Stack Overflow " section.

Local arrays are allocated on the stack, and array overflows are not detected; this is the root cause of stack overflows. Besides the gets() function used above, strcpy(), scanf(), and any other function that writes data into an array carries the risk of causing a stack overflow.
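A common mitigation is to use the bounds-checked alternatives, for example fgets() in place of gets():

#include <stdio.h>

int main(){
    char str[10];
    fgets(str, sizeof(str), stdin);  /* stores at most 9 characters plus the
                                        terminating '\0', so str cannot overflow */
    printf("%s", str);
    return 0;
}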

15. Dynamic memory allocation in C language

In a process's address space, the memory for the code area, constant area, and global data area is allocated when the program starts. It has a fixed size, cannot be allocated or released by the programmer, and is reclaimed by the operating system only when the program exits. This is called static memory allocation.

The memory in the stack and heap areas, by contrast, can be allocated and released as actually needed while the program runs; it does not all have to be prepared at startup. This is called dynamic memory allocation.

The advantage of static memory is speed: it saves the time of requesting memory from the operating system. The disadvantage is inflexibility: the lifetime of the data cannot be controlled, and large amounts of memory cannot be used. Dynamic memory makes a program's memory management far more flexible and efficient: memory is allocated the moment it is needed, in exactly the amount needed, from a few bytes up to several GB; and the moment it is no longer needed it is reclaimed and made available again.

The difference between stack and heap

The stack area and the heap area are managed differently: stack memory is allocated and released by the system and is not under the programmer's control, whereas heap memory is entirely under the programmer's control, who can allocate as much as needed and release it whenever desired, which is very flexible.

When the program starts, a block of memory of suitable size is set aside for the stack area, which is enough for ordinary function calls. Pushing and popping functions merely changes where the ebp and esp registers point, or writes data into memory that already exists; no memory allocation or release is involved. Only when a function contains a very large local array, say 1024*10 elements, does the compiler insert a stack-growing routine into the function's code, so that the extra memory is claimed when the function is called and not otherwise.
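For example (a sketch; on MSVC the inserted routine is the stack-probe helper __chkstk, which touches each new 4 KB page in turn as the frame grows):

void big_frame(void){
    int buf[1024 * 10];  /* 40 KB of locals, far beyond a single 4 KB page */
    buf[0] = 0;          /* touch the array so it is not optimized away */
}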

We often hear that "stack memory is allocated more efficiently than heap memory". This is true: in most cases stack "allocation" does not actually allocate anything; it only operates on memory that already exists.

Dynamic memory allocation functions

The heap is the only memory area controlled by the programmer, and the dynamic memory allocation we usually speak of takes place here. Allocating and releasing memory on the heap is done with several functions from the C standard library: malloc(), calloc(), realloc(), and free().
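A minimal usage sketch of the four functions:

#include <stdlib.h>

int main(){
    int *a = (int*)malloc(10 * sizeof(int));        /* uninitialized block */
    int *b = (int*)calloc(10, sizeof(int));         /* zero-initialized block */
    if(a == NULL || b == NULL) return 1;            /* allocation can fail */

    int *tmp = (int*)realloc(a, 20 * sizeof(int));  /* grow a to 20 ints */
    if(tmp != NULL) a = tmp;                        /* keep a valid if realloc fails */

    free(a);                                        /* hand both blocks back */
    free(b);
    return 0;
}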

16. The implementation principle behind malloc() (memory pool)

Compared with the stack, the heap faces a more complicated pattern of behavior: at any moment the program may request a block of memory, or release one it has already requested, and the size requested can be anything from a few bytes to several GB. We cannot predict how much heap space a program will ask for at any given time, so heap management is considerably more complex.

So, how is using malloc() to allocate memory on the heap implemented?

One approach is to hand malloc()'s memory management over to the system kernel. Since the kernel manages the process's address space anyway, it could expose a system call that malloc() uses to request memory. Would that work? In theory, yes; in practice its performance is poor, because every request for or release of heap space would then require a system call, and system calls carry significant overhead. For a program that operates on the heap frequently, this would seriously hurt performance.

A better approach is for malloc() to request one suitably large chunk of heap space from the operating system and then manage that space by itself.

In effect, malloc() buys a large block of memory "wholesale" from the operating system and then "retails" it to the program; when everything is "sold out", or the program makes an unusually large request, it goes back to the operating system to "restock". Of course, while retailing heap space to the program, malloc() must keep track of the space it has wholesaled: it cannot sell the same address twice and cause a conflict. malloc() therefore needs an algorithm to manage its heap space, and that algorithm is the heap allocation algorithm.
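On Unix-like systems this "wholesale" request is classically made with the brk/sbrk system calls (modern allocators also use mmap() for large blocks); a tiny illustration, assuming a Linux environment:

#include <stdio.h>
#include <unistd.h>

int main(){
    void *before = sbrk(0);  /* current program break: the end of the heap */
    sbrk(4096);              /* ask the kernel to grow the heap by one page */
    void *after = sbrk(0);
    printf("break moved from %p to %p\n", before, after);
    return 0;
}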

Allocation algorithm of malloc() and free()

While a program runs, heap memory is allocated continuously from low addresses toward high addresses. As memory is released, discontinuous free areas appear, as shown in the figure below:

[Figure: heap memory with allocated (shaded) and free (white) areas]

The shaded boxes are allocated memory and the white boxes are free memory, or memory that has been released. When the program needs memory, malloc() first walks the free areas to see whether one of suitable size exists: if so, it allocates from it; if not, it requests more from the operating system (a system call occurs). Because the memory handed to the program must be contiguous, malloc() can allocate only within a single free area; it cannot stitch several free areas together.

Memory blocks, both allocated and free, are structured much like a linked list, connected to one another through pointers. In practice, the structure of a memory block looks like this:

[Figure: structure of a memory block, with next and used fields ahead of the data]

next is a pointer to the next memory block, and used indicates whether the current block is in use. In this way the whole heap area forms a linked list, as shown below:

[Figure: heap blocks chained into a singly linked list]

Now suppose 100 bytes of memory must be allocated for the program. Searching the figure, the first free area (200 bytes) satisfies the request, so the allocation happens there: malloc() splits this free area into two parts, gives one part to the program, and leaves the remainder free, as shown below:

[Figure: the 200-byte free area split into an allocated part and a smaller free part]

Still taking Figure 3 as the example: when the program releases the third memory block, a new free area appears, and free() merges the now-adjacent second, third, and fourth free areas into one, as shown below:

[Figure: adjacent free blocks merged into a single free area]

As you can see, the work of malloc() and free() consists mainly of splitting and merging existing memory blocks, without frequently requesting memory from the operating system, which makes memory allocation far more efficient.

In addition, since a singly linked list can be traversed in only one direction, it is awkward for merging and splitting blocks. Most malloc() implementations therefore add a pre pointer to each memory block, pointing to the previous block and forming a doubly linked list, as shown below:
[Figure: memory blocks linked into a doubly linked list via pre and next pointers]
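
Putting the pieces together, here is a toy first-fit allocator over a fixed arena (a teaching sketch only: every name is invented, alignment is ignored, and real malloc() implementations are far more elaborate):

#include <stddef.h>

#define ARENA_SIZE 4096
#define MIN_SPLIT  8  /* smallest leftover payload worth splitting off */

typedef struct block {
    size_t size;         /* payload size in bytes           */
    int used;            /* 1 = allocated, 0 = free         */
    struct block *pre;   /* previous block in address order */
    struct block *next;  /* next block in address order     */
} block;

static char arena[ARENA_SIZE];  /* the chunk "wholesaled" from the OS */
static block *head = NULL;

static void toy_init(void){
    head = (block*)arena;
    head->size = ARENA_SIZE - sizeof(block);
    head->used = 0;
    head->pre = NULL;
    head->next = NULL;
}

void *toy_malloc(size_t size){
    if(head == NULL) toy_init();
    for(block *b = head; b != NULL; b = b->next){  /* first fit */
        if(b->used || b->size < size) continue;
        if(b->size >= size + sizeof(block) + MIN_SPLIT){
            /* split: carve the tail off as a new free block */
            block *rest = (block*)((char*)(b + 1) + size);
            rest->size = b->size - size - sizeof(block);
            rest->used = 0;
            rest->pre = b;
            rest->next = b->next;
            if(b->next) b->next->pre = rest;
            b->next = rest;
            b->size = size;
        }
        b->used = 1;
        return b + 1;  /* the payload begins right after the header */
    }
    return NULL;  /* arena exhausted: a real malloc() would ask the OS for more */
}

void toy_free(void *p){
    if(p == NULL) return;
    block *b = (block*)p - 1;
    b->used = 0;
    if(b->next && !b->next->used){  /* merge with the next block */
        b->size += sizeof(block) + b->next->size;
        b->next = b->next->next;
        if(b->next) b->next->pre = b;
    }
    if(b->pre && !b->pre->used){    /* merge with the previous block */
        b->pre->size += sizeof(block) + b->size;
        b->pre->next = b->next;
        if(b->next) b->next->pre = b->pre;
    }
}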

The linked list is a classic heap-management scheme and is often used in teaching. Many C tutorials say that "stack memory is allocated like the stack in data structures, while heap memory is allocated like the linked list in data structures"; this is where the linked list comes in.

Although linked list memory management is simple and easy to understand, there are many problems, such as:

  • Once a pre or next pointer in the list is corrupted, the whole heap stops working, and these fields are easy to hit with out-of-bounds reads and writes.
  • Small free areas are often hard to allocate again, producing a lot of memory fragmentation.
  • Frequent allocation and release make the linked list very long and increase traversal time.

To address the shortcomings of linked lists, bitmap and object-pool management schemes were proposed later. Modern malloc() implementations usually combine several approaches, applying different strategies to memory blocks of different sizes to ensure both the safety and the efficiency of allocation.

Memory pool

Whatever the specific allocation algorithm, the overall idea of malloc() is the same: to reduce system calls and physical memory fragmentation, first request a suitably large chunk of memory from the operating system, then manage it internally. That chunk is the memory pool (Memory Pool).

The focus of memory-pool research is not requesting memory from the operating system but managing the memory already obtained, which involves very intricate algorithms and is a subject one never finishes studying. Besides the malloc() that ships with the C standard library, there are third-party implementations such as Google's tcmalloc, and jemalloc.

We know that C and C++ are compiled languages with no garbage collection mechanism: programmers must release unneeded memory themselves. This brings great flexibility, but also many risks; memory leaks, for example, are common in C/C++ programs. Such a program uses very little memory when it first starts, but its memory footprint keeps growing over time until the whole machine slows down.

Memory leaks are often hard to debug and to discover, or reproduce only under particular conditions, which puts many obstacles in the way of fixing the code. To improve program stability and robustness, later languages that run on virtual machines, such as Java, Python, C#, JavaScript, and PHP, added automatic garbage collection, so that programmers no longer manage memory by hand: the runtime identifies memory that is no longer used and releases it, avoiding leaks. It is fair to say that these high-level languages implement their own memory pools underneath, that is, their own memory management mechanisms.

Pooling technology

"Pool" techniques appear in many places in computing: besides memory pools there are connection pools, thread pools, object pools, and so on. Take a server-side thread pool as an example. Its main idea is to start a number of threads in advance and put them to sleep; when a client request arrives, a sleeping thread in the pool is woken to handle it, and once the request has been processed the thread goes back to sleep.

The so-called "pooling technology" means that the program first applies for excessive resources from the system and then manages them by itself in case of emergency. The reason why you need to apply for excessive resources is because there is a large overhead every time you apply for the resource. It is better to apply in advance, so that it will be very fast to use and greatly improve the efficiency of the program.

17. Memory leaks in C (lost memory)

If memory allocated dynamically with malloc(), calloc(), or realloc() ends up with no pointer pointing to it, no operation can be performed on it any more. That memory remains occupied by the program until the program exits and the operating system reclaims it.

Please look at the code below:

#include <stdio.h>
#include <stdlib.h>

int main(){
    char *p = (char*)malloc(100 * sizeof(char));  /* first block: 100 bytes */
    p = (char*)malloc(50 * sizeof(char));         /* p now points elsewhere */
    free(p);
    p = NULL;
    return 0;
}

In this program, 100 bytes of memory are allocated first and p points to them; then 50 bytes are allocated and p is pointed at the new block instead.

This creates a problem: the 100 bytes allocated first now have no pointer pointing to them, and since we no longer know that block's address, it can never be accessed or freed again. The block has become garbage memory: useless, yet still occupying resources. The only remedy is to wait for the operating system to reclaim it when the program exits.

This is a memory leak (Memory Leak): the program has lost contact with the memory and can no longer do anything with it.
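The fix is simply to release each block before discarding the last pointer to it; a corrected sketch of the program above:

#include <stdlib.h>

int main(){
    char *p = (char*)malloc(100 * sizeof(char));
    free(p);                               /* release the 100 bytes first */
    p = (char*)malloc(50 * sizeof(char));  /* now p can be reused safely */
    free(p);
    p = NULL;
    return 0;
}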

A vivid way to put it: a memory leak is "one program steadily draining the memory that the operating system could offer to all programs". The end result is that the longer the program runs, the more memory it occupies, until all of it is used up and the entire system crashes.

Writing a memory-leak example in C

The operating system allows a program to allocate memory for itself and use it freely; when finished, the program can release it and return the memory to the computer.

Allocating memory means the program asks the computer for a piece of memory space and then uses it; releasing memory means the program tells the computer that it no longer needs that space and returns it for other programs to use.

If a program keeps allocating memory without ever releasing it, its footprint grows and grows while the computer's memory is steadily exhausted; other programs have less and less memory to work with, and the whole machine becomes slow or even freezes.

Below we use a while loop to write a memory leak example:

#include <stdlib.h>
#include <stdio.h>

int main(){
    while(1){            // infinite loop
        malloc(1024);    // allocate 1024 bytes and never free them
    }
    return 0;
}

The condition of the while loop is 1, which is always true, so the loop runs forever and never ends: it is an "infinite loop".

Each time through the loop, the program asks the computer for 1024 bytes (1KB) of memory and never releases them. After 1024 iterations the program occupies 1024 × 1024 bytes (1MB); after 1024 × 1024 iterations it occupies 1024 × 1024 × 1024 bytes (1GB).

Don't be afraid to try it yourself: open the Task Manager on Windows and you can watch the memory usage soar; after a while the program is terminated. Windows's memory management notices that our program is taking up too much memory and kills it to keep the system from freezing (other operating systems have similar measures).

Source: blog.csdn.net/qq_44721831/article/details/128419125