C++ application performance optimization (five) - operating system memory management


One, Introduction to operating system memory management

For a long time, memory has been a scarce and valuable resource in computer systems, and an application can only execute once it has been loaded into memory. In early systems, when memory was not large enough, the number of applications that could run simultaneously was severely limited, and an application that needed more memory than the available physical memory could not run at all. Modern operating systems (Windows, Linux) solve the problem of applications being unable to run for lack of memory by introducing virtual memory.
In essence, virtual memory lets a program run without all of its code and data being loaded into memory. While the program runs, when it executes code or accesses data that has not yet been loaded, the virtual memory manager dynamically loads the corresponding code or data from the hard disk into memory. Usually the virtual memory manager must first swap some code or data out of memory to the hard disk to make room for what is about to be loaded.
Because transferring data between memory and the hard disk is very slow compared with executing code, the virtual memory manager must consider efficiency as well as correctness. For example, its replacement algorithm must avoid swapping out code or data that is about to be executed or accessed, while not letting code or data that has not been accessed for a long time stay resident. It also needs to keep the amount of each process's code and data resident in memory at a reasonable level, adjusting it dynamically according to how the process behaves, so that the program performs as little disk I/O as possible and runs faster.
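To make the idea of a good replacement algorithm concrete, the sketch below counts page faults for a page reference string under two classic policies: FIFO (evict the page loaded earliest) and LRU (evict the page used least recently). It is an illustration only; the function names are hypothetical, and the article does not say which policy Windows or Linux uses (production kernels generally approximate recency rather than tracking an exact LRU order).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <list>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Page faults for a reference string under FIFO: evict the page loaded earliest.
std::size_t fifo_faults(const std::vector<int>& refs, std::size_t frames) {
    assert(frames > 0);
    std::deque<int> load_order;            // oldest load at the front
    std::unordered_set<int> resident;      // pages currently in memory
    std::size_t faults = 0;
    for (int page : refs) {
        if (resident.count(page) != 0) continue;  // hit: FIFO ignores recency
        ++faults;
        if (resident.size() == frames) {          // memory full: evict oldest load
            resident.erase(load_order.front());
            load_order.pop_front();
        }
        load_order.push_back(page);
        resident.insert(page);
    }
    return faults;
}

// Page faults under LRU: evict the page that was used least recently.
std::size_t lru_faults(const std::vector<int>& refs, std::size_t frames) {
    assert(frames > 0);
    std::list<int> recency;                // most recently used at the front
    std::unordered_map<int, std::list<int>::iterator> where;
    std::size_t faults = 0;
    for (int page : refs) {
        auto it = where.find(page);
        if (it != where.end()) {           // hit: move the page to the front
            recency.splice(recency.begin(), recency, it->second);
            continue;
        }
        ++faults;
        if (recency.size() == frames) {    // memory full: evict the LRU page
            where.erase(recency.back());
            recency.pop_back();
        }
        recency.push_front(page);
        where[page] = recency.begin();
    }
    return faults;
}
```

On the textbook reference string 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1 with 3 frames, FIFO incurs 15 faults and LRU 12, because LRU keeps the recently used pages that are likely to be needed again.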

Two, Windows memory management

1, Introduction to Windows virtual memory management

The Win32 virtual memory manager gives each Win32 process a private, page-based, linear virtual address space 4GB in size (on 32-bit systems).
Private means that each process can access only its own address space and cannot access the address space of other processes, so it need not worry about its address space being seen by another process (processes in a parent/child relationship are an exception; for example, a debugger uses such a relationship to access the address space of the process being debugged). DLLs used by a process at run time do not have an address space of their own; they live in the virtual address space of the process that loads them. A DLL's global data, and memory that the application requests through DLL functions, are all allocated from that process's virtual address space.
Page-based means the virtual address space is divided into units called pages, whose size is determined by the underlying processor; on the x86 architecture a page is 4KB. The page is the smallest unit managed by the Win32 virtual memory manager, and physical memory is likewise divided into pages. Allocating and releasing virtual address space, and transferring or swapping data between memory and disk, all happen in units of pages.
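Because the page is the unit of allocation and transfer, sizes and addresses are routinely rounded to page boundaries. A minimal sketch of that arithmetic, assuming the 4KB x86 page size mentioned above (the helper names are illustrative, not Win32 APIs):

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4KB pages, as on x86.
constexpr std::uint64_t kPageSize = 4096;

// Round a byte count up to a whole number of pages.
constexpr std::uint64_t round_up_to_page(std::uint64_t bytes) {
    return (bytes + kPageSize - 1) / kPageSize * kPageSize;
}

// Page number that contains an address, and the byte offset inside that page.
constexpr std::uint64_t page_number(std::uint64_t addr) { return addr / kPageSize; }
constexpr std::uint64_t page_offset(std::uint64_t addr) { return addr % kPageSize; }

// Number of distinct pages touched by the byte range [addr, addr + len).
constexpr std::uint64_t pages_spanned(std::uint64_t addr, std::uint64_t len) {
    return len == 0 ? 0 : page_number(addr + len - 1) - page_number(addr) + 1;
}

// Compile-time self-checks of the arithmetic.
static_assert(round_up_to_page(4097) == 8192, "4097 bytes need two pages");
static_assert(pages_spanned(4000, 200) == 2, "range straddles a page boundary");
```

For example, a 200-byte object starting at offset 4000 straddles two pages, so touching it can cost two page faults instead of one.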
A 4GB range means addresses run from 0x00000000 to 0xFFFFFFFF. Win32 leaves the lower 2GB to the process and reserves the upper 2GB for system use.
The hard disk file that Win32 uses to implement virtual memory is called the paging file; there can be up to 16 of them. Paging files hold memory data that the virtual memory manager has swapped out. When the process accesses such data again, the virtual memory manager swaps it back into memory so the process can access it normally. Users can configure the paging files themselves. For efficiency and to save space, program code, which is never modified (in exe and dll files), is not written to the paging file when its pages are evicted from memory; it is simply discarded. When it is needed again, the virtual memory manager reads it directly from the exe or dll file. Read-only data contained in exe and dll files is handled the same way as program code and is not given space in the paging file.
When a process executes code or accesses data that is not in memory, a page fault occurs. Page faults have many causes; the most common is that the virtual memory manager has swapped the code or data out of memory, in which case it brings them back into memory before the code is executed or the data accessed. Swapping is transparent to developers and greatly simplifies their work. However, page faults involve disk I/O, and a large number of them greatly reduces overall program performance, so it is worth knowing the main causes of page faults and how to avoid them.

2, Using virtual memory

Win32 allocates memory in two steps: reserve and commit. A virtual page in a process's address space is therefore in one of three states: free, reserved or committed.
Free: the page has not been allocated yet and can be used to satisfy new allocation requests.
Reserved: a region (an integer multiple of the page size) is set aside from the virtual address space. It can no longer be used to satisfy new allocation requests; it is kept for later use by the code that reserved it. Reserving allocates no physical memory; it only adds an entry to a data structure describing how the process uses its virtual address space (the VAD, virtual address descriptor) to record that this range has been reserved. Reserving is relatively fast because no real storage is allocated, and for the same reason reserved space cannot be accessed directly: accessing a reserved page causes a memory access violation.
Committed: to obtain real storage, reserved memory must be committed. Committing sets aside space in the paging file and updates the corresponding VAD entry. Even then, physical memory is not allocated immediately; only paging-file space on disk is set aside as backing store for swapping. When code first accesses data in committed memory, the system finds no physical memory behind it and raises a page fault, and only while handling that page fault does the virtual memory manager really allocate physical memory. Memory can also be reserved and committed in a single call. Because committing sets aside paging-file space on disk, it takes longer than reserving.
Win32 virtual memory management does not allocate real physical memory for a virtual address until that address is actually accessed; this is the demand-paging strategy. There are two reasons for it. The first is performance: the work is spread out over time instead of being done all at once, which improves overall performance. The second is space efficiency: until data is actually accessed, Win32 assumes the process will not access most of it, so there is no need to allocate space for it or bring it into physical memory, which improves storage utilization.
If a program has a large memory requirement but does not need all of it immediately, allocating physical storage for the whole potential demand at once wastes both execution time and storage. Because the demand is only potential, a large part of the allocated memory may well never be used, so allocating all of the physical memory at request time would greatly reduce space utilization.
On the other hand, without the reserve-and-commit mechanism, if every request were satisfied by allocating memory on the spot, code that requests memory frequently at different points in time would receive non-contiguous virtual addresses, because other code may request memory in the gaps between its requests. Such scattered memory makes poor use of spatial locality, and accessing it as a whole (for example, traversing it) increases the number of page faults and reduces program performance.
Win32 programs reserve and commit memory with the VirtualAlloc function, passing MEM_RESERVE to reserve and MEM_COMMIT to commit. Virtual memory is released with VirtualFree, which mirrors VirtualAlloc: depending on the parameters passed, it can decommit the physical storage behind a committed range while leaving the range reserved (MEM_DECOMMIT), or release the whole reserved range so that it returns to the free state (MEM_RELEASE).
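A minimal sketch of the reserve-then-commit pattern follows. On Windows it calls VirtualAlloc and VirtualFree with the flags named above; elsewhere it swaps in a POSIX analog that is not part of Win32, using an inaccessible `mmap` mapping to reserve addresses, `mprotect` to make pages usable, and `madvise(MADV_DONTNEED)` to play the role of decommit. The helper names are hypothetical, and error handling is reduced to return values.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Reserve address space only; nothing is committed yet.
void* reserve_region(std::size_t bytes) {
#ifdef _WIN32
    return VirtualAlloc(nullptr, bytes, MEM_RESERVE, PAGE_NOACCESS);
#else
    // POSIX analog: an inaccessible mapping that only claims the addresses.
    void* p = mmap(nullptr, bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Commit part of a reserved region so it can be read and written.
bool commit_pages(void* addr, std::size_t bytes) {
#ifdef _WIN32
    return VirtualAlloc(addr, bytes, MEM_COMMIT, PAGE_READWRITE) != nullptr;
#else
    return mprotect(addr, bytes, PROT_READ | PROT_WRITE) == 0;
#endif
}

// Give the storage back but keep the addresses reserved.
bool decommit_pages(void* addr, std::size_t bytes) {
#ifdef _WIN32
    return VirtualFree(addr, bytes, MEM_DECOMMIT) != 0;
#else
    return madvise(addr, bytes, MADV_DONTNEED) == 0 &&
           mprotect(addr, bytes, PROT_NONE) == 0;
#endif
}

// Release the whole region; the range returns to the free state.
bool release_region(void* base, std::size_t bytes) {
#ifdef _WIN32
    (void)bytes;  // MEM_RELEASE requires a size of 0
    return VirtualFree(base, 0, MEM_RELEASE) != 0;
#else
    return munmap(base, bytes) == 0;
#endif
}
```

A typical use is to reserve, say, 64 pages for a growing buffer, commit only the first two, and commit more as the data grows; the addresses stay contiguous, which keeps the locality benefit described above.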
Both the thread stack and the process heap are implemented with the two-step reserve-and-commit mechanism. On Win32, the thread stack uses it as follows:
When a thread stack is created, only a virtual address region is reserved, 1MB by default (this can be changed through CreateThread or the linker's /STACK option), and only the first two pages are committed initially. As nested function calls need more stack pages, the virtual memory manager dynamically commits the following pages of the reserved region, up to the 1MB limit. When the reserved limit (1MB by default) is reached, the virtual memory manager does not enlarge the reserved region; instead, it raises a stack overflow exception when it commits the last page. At that moment one page of stack space is still available, so the program can keep running normally. If the program keeps using the stack, exhausts the last page and still needs more space, it exceeds the limit and the process exits immediately.
To keep a thread stack overflow from terminating the whole program, control how much stack space is used: reduce function nesting, reduce recursion, and avoid large local variables in functions (allocate large objects on the heap instead, because the heap grows dynamically, whereas a thread's usable stack region is fixed when the thread is created and cannot be extended during the thread's lifetime).
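A sketch of both suggestions in portable C++ (the `Node` type and the function names are hypothetical): a deep traversal written with an explicit work stack held in a `std::vector` instead of recursion, and a large scratch buffer taken from the heap instead of being declared as a local array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tree node used only for this illustration.
struct Node {
    int value = 0;
    std::vector<std::size_t> children;  // indices into the node array
};

// Sum every value reachable from `root` without recursion: the pending-work
// stack lives on the heap, so depth is not bounded by the thread stack.
long long sum_tree(const std::vector<Node>& nodes, std::size_t root) {
    assert(root < nodes.size());
    long long total = 0;
    std::vector<std::size_t> pending{root};
    while (!pending.empty()) {
        const std::size_t i = pending.back();
        pending.pop_back();
        total += nodes[i].value;
        for (std::size_t c : nodes[i].children) pending.push_back(c);
    }
    return total;
}

// Instead of a local `unsigned char buf[4 << 20];`, which would not fit in a
// default 1MB Win32 thread stack, take the scratch buffer from the heap.
std::size_t checksum_with_scratch(std::size_t bytes) {
    std::vector<unsigned char> scratch(bytes);       // heap, not stack
    for (std::size_t i = 0; i < bytes; ++i)
        scratch[i] = static_cast<unsigned char>(i);  // wraps every 256 bytes
    std::size_t sum = 0;
    for (unsigned char b : scratch) sum += b;
    return sum;
}
```

A chain of 200,000 nodes is traversed without trouble this way, whereas a recursive version would need one stack frame per node and could overflow a 1MB stack.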
Alternatively, to keep a thread stack overflow from terminating the whole program, add exception handling (structured exception handling, __try/__except, in MSVC) to thread functions that may overflow the stack, catch the stack overflow exception raised when the last page is committed, and respond accordingly.

3, What happens when virtual memory is accessed

Once a virtual memory region has been reserved and committed, the data in it can be accessed. When the program accesses memory, the process is as follows:
If the data is already in physical memory, the virtual memory manager only needs to map the virtual address of the data to its physical address, and the data can be accessed in physical memory. No disk I/O is involved, so this case is fast.
A page fault is triggered in two cases: when memory that has just been committed is accessed for the first time, because no real physical memory has been allocated for it yet; or when the data was accessed before but has since been swapped out of physical memory by the virtual memory manager. The virtual memory manager then handles the page fault. It first checks whether the data has backing space in the paging file (for exe and dll code pages and read-only data, the backing store is not the paging file but the exe or dll file itself). If the data does have backing space on disk, the virtual memory manager finds a suitable page in physical memory and swaps the backed-up data from disk into it.
To find a page, the virtual memory manager first checks whether physical memory currently has a free page. It maintains a data structure called the page-frame database, which is system-wide, is initialized when Windows starts, and tracks the state of every physical page; all free pages are linked together in a free page list. When a free page is needed, the manager looks in this list and uses a free page if there is one; otherwise it selects a page according to its paging algorithm. The virtual memory manager does not bring in just one page: to take advantage of locality, it brings in several adjacent pages along with the page containing the required data, to improve efficiency. Once a physical page has been chosen, its state is checked. If the page has not been modified since it was last brought into memory, it can be reused directly (code pages and read-only pages can also be reused directly). If it has been modified, its contents must first be written to the corresponding backing page in the paging file, after which it is marked free. There is now a free page to hold the data about to be accessed. The virtual memory manager then checks whether this is the first access to newly committed memory. If so, the free page is simply zeroed and used, with no need to read anything from the paging file; if not, the corresponding backing page is read from the paging file into the free page. Finally, the page changes from the free state to an active page.
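The fault-handling steps above can be modeled in a few dozen lines. The sketch below is a hypothetical simulation, not Windows code: a free-page list, victim selection (plain FIFO here, standing in for the real paging algorithm), write-back of modified victims, and then either a zero-filled page for a first access or a read from the paging file. It does not model the prefetching of adjacent pages.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy model of the page-fault path (hypothetical, not Windows code).
class FaultSim {
public:
    explicit FaultSim(std::size_t frames) : dirty_(frames, false), owner_(frames, -1) {
        assert(frames > 0);
        for (std::size_t f = 0; f < frames; ++f) free_list_.push_back(f);
    }

    void access(int vpage, bool write) {
        auto hit = resident_.find(vpage);
        if (hit != resident_.end()) {                  // mapped: no fault
            if (write) dirty_[hit->second] = true;
            return;
        }
        ++faults;
        const std::size_t frame = take_frame();
        if (in_paging_file_.count(vpage) != 0) ++disk_reads;  // swapped out before
        else ++zero_fills;                                    // first touch
        owner_[frame] = vpage;
        dirty_[frame] = write;
        resident_[vpage] = frame;
        fifo_.push_back(frame);
    }

    std::size_t faults = 0, zero_fills = 0, disk_reads = 0, disk_writes = 0;

private:
    std::size_t take_frame() {
        if (!free_list_.empty()) {                     // 1. use a free page if any
            const std::size_t f = free_list_.back();
            free_list_.pop_back();
            return f;
        }
        const std::size_t f = fifo_.front();           // 2. otherwise pick a victim
        fifo_.pop_front();
        if (dirty_[f]) {                               // 3. modified: write it back
            ++disk_writes;
            in_paging_file_.insert(owner_[f]);
        }
        resident_.erase(owner_[f]);                    // unmap the victim
        return f;
    }

    std::vector<bool> dirty_;                          // modified since loaded?
    std::vector<int> owner_;                           // virtual page in each frame
    std::vector<std::size_t> free_list_;
    std::deque<std::size_t> fifo_;
    std::unordered_map<int, std::size_t> resident_;    // virtual page -> frame
    std::unordered_set<int> in_paging_file_;           // pages with a disk copy
};
```

With two frames, writing page 1, reading pages 2 and 3, and then reading page 1 again produces four faults: three zero-filled first touches, one write-back (page 1 was modified when it was evicted) and one paging-file read when page 1 comes back.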
At this point the data is in a physical memory page, and it can be accessed by mapping its virtual address to the physical address.
Real data accesses can be more complicated. Suppose a user-defined array ends exactly at the boundary of its page, and the next page is free or reserved (not committed, so it has no real storage behind it). When the program accidentally accesses past the end of the array, a page fault occurs first. While handling it, the virtual memory manager finds that the data has no backing in the paging file either, which is a so-called access violation. An access violation means that the specified virtual address has not been committed, that is, no storage corresponds to it, and it directly causes the whole process to exit (crash).
The consequences of an out-of-bounds pointer access therefore vary with the runtime situation. When the end of the array is not at its page boundary, the out-of-bounds access still lands in the same page, so it only accesses other data on that page incorrectly (an erroneous read affects only the code currently executing, while an erroneous write corrupts data used elsewhere and can affect other code), but it does not crash the process. Even if the array does end at a page boundary and the out-of-bounds pointer falls into the adjacent page, the access is still just incorrect and does not crash the process as long as that adjacent page is also committed. That is why, in the same program, an out-of-bounds array or pointer access sometimes crashes and sometimes does not.
Microsoft provides a tool, PageHeap, for detecting out-of-bounds pointer accesses. It works by forcing every allocation to end at a page boundary and forcing the adjacent page to be a non-committed, inaccessible page, so that every out-of-bounds access causes an access violation and crashes the program. Out-of-bounds pointer errors are thus exposed during development instead of staying hidden in a released version until end users run into them.
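The PageHeap idea can be reproduced on POSIX systems with `mmap` and `mprotect`; this is an analog for illustration, not the Microsoft tool, and `guarded_alloc`/`guarded_free` are hypothetical names. The block is placed so that it ends exactly at a page boundary, and the following page is made inaccessible, so the first byte written past the end faults at once instead of silently corrupting a neighbor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Allocate n bytes that end right before an inaccessible guard page.
void* guarded_alloc(std::size_t n) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t data_pages = (n + page - 1) / page;
    const std::size_t total = (data_pages + 1) * page;      // + 1 guard page
    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    char* guard = static_cast<char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {             // trap overruns
        munmap(base, total);
        return nullptr;
    }
    return guard - n;                                        // block ends at guard
}

// Free a block returned by guarded_alloc(n), guard page included.
void guarded_free(void* p, std::size_t n) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t data_pages = (n + page - 1) / page;
    char* guard = static_cast<char*>(p) + n;
    munmap(guard - data_pages * page, (data_pages + 1) * page);
}
```

Writing `p[100]` after `guarded_alloc(100)` would hit the guard page and crash immediately, which is the point. One caveat of this layout: the returned pointer is aligned only as far as `n` allows, so a real tool pads small blocks to keep the usual alignment.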

4, Mapping virtual addresses to physical addresses

Once the data is known to be in physical memory, the virtual address must be converted into a physical address, which is called address mapping, before the data can be accessed.
Win32 implements address mapping with a two-level table structure. Because each process has its own private 4GB virtual address space, each process maintains its own hierarchical structure for its address mapping. The first level is the page directory, which is actually one memory page (4KB = 4096 bytes) divided into 1024 4-byte entries, each called a page directory entry (PDE). The second level consists of page tables, 1024 in total. Each page directory entry corresponds to one page table, and each page table also occupies one memory page, likewise divided into 1024 4-byte entries called page table entries (PTEs). Each page table entry points to a page frame in physical memory.
Win32 provides a 4GB (32-bit) virtual address space, so each virtual address is a 32-bit integer made up of three parts. The first 10 bits are the page directory index, which selects one of the 1024 page directory entries; the value of that entry locates a page table at the second level. The next 10 bits are the page table index, which selects one of the 1024 entries in that page table; its value tells whether the page containing the data addressed by the virtual address is in physical memory, and where. The last 12 bits are the byte index, which locates a byte within the physical page; 12 bits can address any of the 4096 bytes in a page.
Suppose the program accesses a pointer (virtual address) whose value is 0x2A8E317F. The virtual-to-physical mapping proceeds as follows:
0x2A8E317F in binary is 0010 1010 1000 1110 0011 0001 0111 1111. It is split into three parts. The first 10 bits, 0010101010, locate the page directory entry in the page directory. Because each page directory entry is 4 bytes, 0010101010 is shifted left by 2 bits, giving 10 1010 1000 (0x2A8), the byte offset of the corresponding page directory entry, which points to a page table. The next 10 bits, 0011100011, locate the page table entry in that page table; shifted left by 2 bits they give 11 1000 1100 (0x38C), the byte offset of the corresponding page table entry, which points to the physical page holding the data. Finally, the last 12 bits, 0001 0111 1111 (0x17F), locate the data within that page, which is what the pointer points to.
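The field extraction in this worked example is plain shifting and masking. A small sketch, assuming the 32-bit non-PAE x86 layout described here (the type and function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Fields of a 32-bit x86 virtual address without PAE: 10 + 10 + 12 bits.
struct VaFields {
    std::uint32_t dir_index;    // bits 31..22: one of 1024 page directory entries
    std::uint32_t table_index;  // bits 21..12: one of 1024 page table entries
    std::uint32_t offset;       // bits 11..0: byte within the 4KB page
};

constexpr VaFields split_va(std::uint32_t va) {
    return VaFields{va >> 22, (va >> 12) & 0x3FFu, va & 0xFFFu};
}

// Entries are 4 bytes, so index << 2 is the entry's byte offset in its 4KB page.
constexpr std::uint32_t entry_byte_offset(std::uint32_t index) { return index << 2; }

static_assert(split_va(0x2A8E317Fu).table_index == 0x0E3u, "worked example");
```

For 0x2A8E317F this yields directory index 0xAA, table index 0xE3 and byte offset 0x17F; the corresponding entry byte offsets are 0x2A8 and 0x38C.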
Win32 always first assumes that the data is in physical memory and performs address mapping. Each page table entry has a flag indicating whether the page it describes is in physical memory. When the page table entry is fetched, this bit is checked: if it is set, address mapping proceeds; if not, a page fault is raised. The manager then checks whether the page described by this entry is in the paging file; if it is not, the result is an access violation. If the page table entry shows that the page is in the paging file, together with the page's starting position there, the page is brought from disk into physical memory and address mapping continues. To give every process a private virtual address space, each process has its own page directory and page table structures. The page directory entries and page table entries of different processes differ, so the same pointer (virtual address) maps to different physical addresses in different processes; in other words, passing a pointer between processes is meaningless.
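The present-bit check and the per-process tables can be modeled together. In this hypothetical sketch each `AddressSpace` owns its own page directory; an entry holds a 20-bit frame number and a present flag as separate fields (real PDEs and PTEs pack them into one 32-bit word), and a missing page table or a clear present flag is reported as `std::nullopt` where the hardware would raise a page fault.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>

// Toy page table entry (not the real x86 bit layout).
struct Entry {
    bool present = false;
    std::uint32_t frame = 0;   // 20-bit physical page frame number
};
using PageTable = std::array<Entry, 1024>;

// One per process: a private page directory of 1024 page table pointers.
struct AddressSpace {
    std::array<std::unique_ptr<PageTable>, 1024> directory;

    void map(std::uint32_t va, std::uint32_t frame) {
        auto& table = directory[va >> 22];
        if (!table) table = std::make_unique<PageTable>();  // table on demand
        (*table)[(va >> 12) & 0x3FF] = Entry{true, frame};
    }

    // Physical address, or nullopt where real hardware would page-fault.
    std::optional<std::uint32_t> translate(std::uint32_t va) const {
        const auto& table = directory[va >> 22];
        if (!table) return std::nullopt;                    // no page table
        const Entry& e = (*table)[(va >> 12) & 0x3FF];
        if (!e.present) return std::nullopt;                // present bit clear
        return (e.frame << 12) | (va & 0xFFF);
    }
};
```

Mapping the same virtual address 0x2A8E317F to frame 0x123 in one address space and to frame 0x456 in another gives physical addresses 0x12317F and 0x45617F, which is why a raw pointer means nothing to another process.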

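5, Recording the usage state of the virtual address space

The Win32 virtual memory manager uses another data structure to record and maintain the usage and state of each process's 4GB virtual address space: the virtual address descriptor tree (VAD, Virtual Address Descriptor). Each process has its own set of VADs, organized as a self-balancing binary tree to make lookups efficient. Only reserved or committed memory blocks have VADs; free blocks have none (that is, any virtual address block that is not in the VAD tree is free).
(1) When the program requests a new block of memory, the virtual memory manager walks the VADs looking for two adjacent ones; as long as the gap between the upper bound of the lower VAD and the lower bound of the higher VAD is large enough for the requested block, the virtual memory between them can be used.
(2) When committed memory is accessed for the first time, the virtual memory manager always assumes that the page containing the data is already in physical memory and performs the virtual-to-physical address mapping. When it finds the corresponding page directory entry and discovers that the entry does not point to a valid page table, it searches the process's VAD tree for the VAD containing the address and, using the information in that VAD (such as the block's size and range and its starting position in the paging file), generates the corresponding page table entries on demand. Address mapping then continues from the point where the page fault occurred. Therefore, when a virtual memory page is committed, apart from a backing page being set aside in the paging file, no page table is created for it, no page table entry pointing to it is filled in, and no real physical page is allocated. Only on the first access to a committed page does the manager obtain, on demand, the information for the whole region containing it from the VAD, create the page table, and fill in the page table entry for that page.
(3) When reserved memory is accessed, the virtual memory manager performs the virtual-to-physical mapping, finds that the page directory entry does not point to a valid page table, and searches the process's VAD tree. It finds the VAD containing the address, sees that this block is only reserved and not committed (that is, it has no storage behind it), and immediately raises an access violation; if the exception is not handled, the process exits.
(4) When free memory is accessed, the virtual memory manager performs the virtual-to-physical mapping, finds that the page directory entry does not point to a valid page table, and searches the process's VAD tree. It finds no VAD containing the address, concludes that the page holding this address is free, and immediately raises an access violation; the process exits.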
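The VAD bookkeeping in steps (1) to (4) can be modeled with an ordered map keyed by start address. In this hypothetical sketch `std::map` (usually a red-black tree) stands in for the self-balancing VAD tree, `find_gap` does a first-fit search between adjacent descriptors as in step (1), and `state_of` reports whether an address falls in a committed or reserved descriptor or in no descriptor at all (free). The real Windows structure and search policy may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

enum class PageState { Free, Reserved, Committed };

struct Vad { std::uint32_t end; PageState state; };  // covers [start, end)

class VadTree {
public:
    void insert(std::uint32_t start, std::uint32_t end, PageState s) {
        assert(start < end);
        vads_[start] = Vad{end, s};
    }

    // Step (1): first gap of at least `size` bytes between adjacent VADs.
    std::optional<std::uint32_t> find_gap(std::uint32_t size, std::uint32_t lo,
                                          std::uint32_t hi) const {
        std::uint32_t cursor = lo;
        for (const auto& [start, vad] : vads_) {
            if (start >= cursor && start - cursor >= size) return cursor;
            if (vad.end > cursor) cursor = vad.end;  // skip past this VAD
        }
        if (hi >= cursor && hi - cursor >= size) return cursor;
        return std::nullopt;
    }

    // Steps (2) to (4): which descriptor, if any, contains addr?
    PageState state_of(std::uint32_t addr) const {
        auto it = vads_.upper_bound(addr);           // first VAD starting after addr
        if (it == vads_.begin()) return PageState::Free;
        --it;                                        // last VAD starting at or before addr
        return addr < it->second.end ? it->second.state : PageState::Free;
    }

private:
    std::map<std::uint32_t, Vad> vads_;              // keyed by start address
};
```

With committed [0x10000, 0x20000) and reserved [0x30000, 0x40000), a 64KB request fits in the gap at 0x20000, a 128KB request has to go to 0x40000, and address 0x25000 is reported free, so touching it would be an access violation as in case (4).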

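6, Process working set

Because disk I/O caused by frequent paging greatly reduces program efficiency, the virtual memory manager keeps a certain number of each process's pages resident in physical memory, tracks the process's performance metrics, and dynamically adjusts the number of resident pages. In Win32, the pages of a process that are resident in physical memory are called its working set. The working set can be viewed in Task Manager; the memory usage column shows the working set size.
The working set changes dynamically. Initially only a few code and data pages of a process are loaded into physical memory. When execution reaches code that has not been loaded, or data that has not been loaded is accessed, the corresponding code or data pages are brought into physical memory and the working set grows. But the working set cannot grow indefinitely: the system sets a minimum and a maximum working set size for each process. When the working set has reached its maximum and the process needs to bring in another page, the virtual memory manager first evicts some pages of the existing working set from physical memory and then brings the new page in.
Because working-set pages reside in physical memory, accessing them involves no disk I/O and is very fast. Accessing code or data outside the working set causes extra disk I/O and reduces execution efficiency. In extreme cases so-called thrashing occurs: the program spends most of its time paging rather than executing code.
When paging, the virtual memory manager brings in not only the required page but also nearby pages. To improve program efficiency, developers should therefore consider the following:
(1) For code, write compact code. Ideally the working set never reaches its maximum threshold, so bringing in a new page never requires evicting a page that is already loaded. By the principle of locality, code executed and data accessed earlier are very likely to be executed or accessed again later, so page faults, and with them disk I/O, drop sharply while the program runs. Task Manager can also show how many page faults a process has incurred from its start up to now. Even when the ideal cannot be reached, compact code usually means the code executed next is more likely to be on the current page or an adjacent one. By temporal locality, a program often spends 80% of its time in 20% of its code; keeping that time-consuming 20% compact and laid out together can greatly improve overall performance.
(2) For data, keep data that is accessed together (such as the nodes of a linked list) together, so that accessing it touches the same or adjacent pages and needs only one paging operation. If the data is scattered over many non-adjacent pages, every pass over it causes many page faults and performance drops. With Win32's two-step reserve-and-commit mechanism, a large block can be reserved for data that is accessed together; no storage is allocated at that point, and memory is committed as needed while the data is generated later. This wastes no storage (neither physical memory nor paging-file space) and still takes advantage of locality.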
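The effect of layout on paging can be estimated by counting the distinct pages a traversal touches. This sketch assumes 4KB pages and 32-byte nodes and only computes addresses (no memory is allocated); the two layouts are hypothetical: nodes packed back to back, as they would be if carved from one reserved block, versus one node per page, standing in for nodes scattered across the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Distinct 4KB pages touched by a sequence of accesses (4KB is an assumption).
std::size_t pages_touched(const std::vector<std::uintptr_t>& addresses) {
    std::unordered_set<std::uintptr_t> pages;
    for (std::uintptr_t a : addresses) pages.insert(a / 4096);
    return pages.size();
}

// Addresses of `count` 32-byte nodes packed back to back from `base`.
std::vector<std::uintptr_t> packed_nodes(std::uintptr_t base, std::size_t count) {
    std::vector<std::uintptr_t> out;
    for (std::size_t i = 0; i < count; ++i) out.push_back(base + i * 32);
    return out;
}

// Addresses of `count` nodes placed one per page (a scattered layout).
std::vector<std::uintptr_t> scattered_nodes(std::uintptr_t base, std::size_t count) {
    std::vector<std::uintptr_t> out;
    for (std::size_t i = 0; i < count; ++i) out.push_back(base + i * 4096);
    return out;
}
```

Walking 256 packed nodes from a page-aligned base touches 2 pages, while the scattered layout touches 256, so in the worst case the same traversal could cause 128 times as many page faults.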

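Three, Linux memory management

1, Introduction to Linux memory management

Linux memory management has two main parts. One part handles allocating and freeing physical memory, whose smallest unit is the page (4KB on IA32). The other handles virtual memory, whose main operations include mapping the virtual address space to the physical address space and swapping pages between physical memory and disk.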

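2, Linux process memory layout

The address space of a 32-bit Linux process is 4GB: the upper 1GB, 0xC0000000 to 0xFFFFFFFF, is kernel space, and the lower 3GB, 0x00000000 to 0xBFFFFFFF, is user space. User space is further divided into the code segment, the data segments (the initialized data segment DATA and the uninitialized data segment BSS), the heap and the stack. The code segment occupies the lowest part, followed by DATA and BSS. The code segment holds the application's machine code; code cannot be modified at run time, so this memory is read-only and fixed in size. The data segments hold the application's global data, static data and constant strings, and are also fixed in size.
The heap starts after the uninitialized data segment and grows upward dynamically, toward higher virtual addresses; the stack starts at a high address and grows downward dynamically, toward lower virtual addresses.
The heap is memory that the application requests dynamically at run time. When objects or buffers are created with malloc/new, the allocator ultimately uses the brk system call to adjust the size of the data segment. When the dynamically allocated memory is no longer needed, the developer must release it explicitly with the matching free/delete, and free/delete may in turn use brk to shrink the data segment. (With glibc, requests at or above the mmap threshold, 128KB by default, are served by mmap rather than brk.)
The stack holds function arguments, temporary variables, return addresses and similar data. It needs no malloc/new: it grows and shrinks as functions are called and return, without any action by the developer, so it carries no risk of memory leaks.
The initialized data segment holds global and static variables whose initial values are set by the program and known at compile time. Those initial values must be stored in the final binary, and at run time this area is mapped unchanged into the process's initialized data segment. A global or static variable without an initializer in the source code is zero after the program starts and before its first assignment; in essence it does have an initial value, namely zero. However, in the final binary the uninitialized data segment does not occupy space equal to the total size of those variables; only a single value recording the total size of the uninitialized data segment is stored. For example, if a program has 100KB of code instructions, 100KB of initialized data and 150KB of uninitialized data, the final binary contains 100KB of code, then 100KB of initialized data, then a 4-byte field recording the size of the uninitialized data segment, whose value is 150*1024 (153,600); this saves disk space. In the process's virtual address space, however, the uninitialized data segment must really be 150KB, because at run time the program must be able to access every one of those variables. So when the program starts and the system sees that the uninitialized data size recorded in the binary is 150*1024, it sets up a 150KB region as the process's uninitialized data segment and fills it with zeros.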
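The DATA/BSS distinction is visible from ordinary C++ globals. A minimal sketch (the variable names are illustrative): the initialized array's contents must be stored in the executable, while the 150KB zero-initialized buffer normally contributes only its size. Building it into a program and running `size` from GNU binutils on the result should show the buffer in the `bss` column rather than in `data`.

```cpp
#include <cassert>
#include <cstddef>

// Initialized data (DATA): these initial bytes are stored in the binary.
int initialized_table[4] = {1, 2, 3, 4};

// Uninitialized data (BSS): only its size is recorded in the binary; the
// loader provides zero-filled memory for it when the program starts.
char uninitialized_buffer[150 * 1024];

// True if every byte of the BSS buffer is still zero.
bool bss_is_zero() {
    for (std::size_t i = 0; i < sizeof(uninitialized_buffer); ++i)
        if (uninitialized_buffer[i] != 0) return false;
    return true;
}
```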

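3, Linux physical memory management

Physical memory is where code instructions and the data they operate on ultimately reside, so managing it is one of the most important tasks of the memory management system. Linux manages physical memory with the page allocator, which allocates and reclaims all physical pages (the smallest unit of allocation and reclamation is a 4KB page).
The core algorithm of the page allocator is the buddy-heap algorithm. The idea is that every physical memory region has an adjacent so-called buddy region; when both have been freed, they are merged into one region, and if the buddy of the merged region is also free, they are merged further into an even larger region. When a request for physical memory arrives, the page allocator first checks whether there is a free region of exactly the requested size. If there is, it uses that region to satisfy the request; if not, it takes a larger region and keeps splitting it until a piece matches the request. To support the buddy algorithm, free regions are tracked in lists: free regions of the same size are linked together, with one list per size. The sizes of free regions are always powers of two.
For example, suppose an 8KB request arrives and the smallest free region available is 64KB. The 64KB region is split into two 32KB regions, the lower 32KB is split into two 16KB regions, the lowest 16KB is split into two 8KB regions, and the higher 8KB region is then allocated to satisfy the request.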
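A compact buddy allocator illustrating split and merge follows. It is a sketch, not the kernel's implementation: offsets within a single power-of-two arena stand in for physical page frames, each size class has its own free list (a sorted `std::set`), and a block's buddy is found by flipping the bit that corresponds to its size. When splitting, it keeps the lower half and puts the upper half on a free list, so for the 8KB request from a 64KB region it hands out the lowest 8KB rather than the higher one as in the example above; either choice works as long as the other half goes on a free list. `Buddy` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>

// Minimal buddy allocator over byte offsets in one power-of-two arena.
class Buddy {
public:
    Buddy(std::size_t arena, std::size_t min_block) : arena_(arena), min_(min_block) {
        assert(is_pow2(arena) && is_pow2(min_block) && min_block <= arena);
        free_[arena].insert(0);
    }

    std::optional<std::size_t> alloc(std::size_t size) {
        if (size > arena_) return std::nullopt;
        const std::size_t want = round_up(size);
        auto it = free_.lower_bound(want);           // smallest size class >= want
        while (it != free_.end() && it->second.empty()) ++it;
        if (it == free_.end()) return std::nullopt;  // nothing large enough
        std::size_t block = it->first;
        const std::size_t off = *it->second.begin();
        it->second.erase(it->second.begin());
        while (block > want) {                       // split; upper half stays free
            block >>= 1;
            free_[block].insert(off + block);
        }
        return off;
    }

    void release(std::size_t off, std::size_t size) {
        std::size_t block = round_up(size);
        while (block < arena_) {
            auto& list = free_[block];
            auto buddy = list.find(off ^ block);     // buddy differs in this bit
            if (buddy == list.end()) break;          // buddy in use: stop merging
            list.erase(buddy);
            off &= ~block;                           // merged block starts lower
            block <<= 1;
        }
        free_[block].insert(off);
    }

    std::size_t free_count(std::size_t block) const {
        auto it = free_.find(block);
        return it == free_.end() ? 0 : it->second.size();
    }

private:
    static bool is_pow2(std::size_t x) { return x != 0 && (x & (x - 1)) == 0; }
    std::size_t round_up(std::size_t size) const {
        std::size_t b = min_;
        while (b < size) b <<= 1;
        return b;
    }

    std::size_t arena_, min_;
    std::map<std::size_t, std::set<std::size_t>> free_;  // size -> free offsets
};
```

After `alloc(8 * 1024)` on a 64KB arena with a 4KB minimum block, the free lists hold one 8KB, one 16KB and one 32KB block; releasing the 8KB block merges everything back into a single 64KB block.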

4, Linux virtual memory management

The main task of the virtual memory manager is to maintain information about how the application uses its virtual address space: for example, which areas are in use (mapped), and whether an area has a disk file as backing storage and, if so, which part of the disk file each area corresponds to. Another important function is paging: when the program accesses data that has not yet been brought into physical memory, the virtual memory manager locates the data and brings it into physical memory. If there is no free physical page at that moment, it must first evict some pages from physical memory.
The data structure that maintains the usage information of the application's virtual address space is vm_area_struct. Each vm_area_struct describes one allocated region of the process's virtual address space. When there are no more than 32 vm_area_structs, they are linked into a list; when there are more than 32, they are organized into a self-balancing binary tree, which speeds up lookups. (This describes older kernels; later kernels use a red-black tree for every process, and since Linux 6.1 a maple tree.) When the program accesses data through a pointer, the system searches the vm_area_struct tree; if the address does not fall within any area represented by a vm_area_struct, the address is considered unallocated, that is, the access through the pointer is illegal.
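The lookup described here can be sketched with an ordered map keyed by each area's end address. The sketch mirrors the documented behavior of the kernel's `find_vma()`, which returns the first area whose end lies above the address, so the caller must still check that the area actually starts at or below it. `Area`, `AddressMap` and the method names are hypothetical, and this is not kernel code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Simplified stand-in for one vm_area_struct: covers [start, end).
struct Area { std::uintptr_t start, end; };

class AddressMap {
public:
    void add(std::uintptr_t start, std::uintptr_t end) {
        assert(start < end);
        by_end_[end] = Area{start, end};
    }

    // Like find_vma(): first area whose end is above addr, or nullptr.
    const Area* find_area(std::uintptr_t addr) const {
        auto it = by_end_.upper_bound(addr);
        return it == by_end_.end() ? nullptr : &it->second;
    }

    // An access is legal only if some area actually contains the address.
    bool is_mapped(std::uintptr_t addr) const {
        const Area* a = find_area(addr);
        return a != nullptr && a->start <= addr;
    }

private:
    std::map<std::uintptr_t, Area> by_end_;  // keyed by end address
};
```

An address in a hole between two areas, such as 0x20000000 below an area starting at 0x40000000, returns the next area up but is not contained in it, so the access is treated as illegal.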

5, Mapping virtual addresses to physical addresses

When the program accesses data through a pointer, the pointer is essentially a virtual address value, so that value must be converted into a physical address before the data it refers to can actually be accessed.
Linux maps virtual addresses to physical addresses with a three-level page table. Compared with Windows it has an extra middle level; on the IA32 architecture without PAE the middle level is not used, so Linux then maps addresses the same way as Windows.
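The effect of folding the middle level can be checked with a generic three-level split. In this sketch (names illustrative) the bit widths are parameters: with the IA32 10/0/10/12 split the middle index is always 0 and the top and bottom indices equal the Windows page directory and page table indices, whereas the 2/9/9/12 split used with PAE gives the middle level a real index.

```cpp
#include <cassert>
#include <cstdint>

// Top (PGD), middle (PMD) and bottom (PTE) indices plus the page offset.
struct Split { std::uint32_t pgd, pmd, pte, offset; };

// Each width must be below 32 bits; pmd_bits == 0 means the level is folded.
constexpr Split split3(std::uint32_t va, unsigned pgd_bits, unsigned pmd_bits,
                       unsigned pte_bits, unsigned offset_bits) {
    const std::uint32_t offset = va & ((1u << offset_bits) - 1);
    const std::uint32_t pte = (va >> offset_bits) & ((1u << pte_bits) - 1);
    const std::uint32_t pmd = pmd_bits == 0 ? 0u
        : (va >> (offset_bits + pte_bits)) & ((1u << pmd_bits) - 1);
    const std::uint32_t pgd = (va >> (offset_bits + pte_bits + pmd_bits)) &
                              ((1u << pgd_bits) - 1);
    return Split{pgd, pmd, pte, offset};
}
```

For 0x2A8E317F the non-PAE split gives indices 0xAA, 0 and 0xE3 with offset 0x17F; the PAE split gives 0, 0x154 and 0xE3 with the same offset.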


Source: blog.51cto.com/9291927/2406548