[Translation] Linux Performance and Tuning Guidelines: Linux Memory Architecture

This article is a translation of Section 1.2 of the IBM Redpaper Linux Performance and Tuning Guidelines.
Original document: http://www.redbooks.ibm.com/redpapers/pdfs/redp4285.pdf
Original authors: Eduardo Ciliendo, Takechika Kunimasa, Byron Braswell


The translation is as follows:

1.2 Linux Memory Architecture

To execute a process, the Linux kernel allocates a portion of memory to the requesting process. The process uses this memory as its workspace and performs the requested work. It is similar to being assigned a desk and then using it to hold the papers, documents, and notes you need for your work. The difference is that the kernel must allocate memory in a far more dynamic way. The number of running processes sometimes reaches tens of thousands, while the amount of memory is limited, so the Linux kernel must handle memory efficiently. In this section, we describe the Linux memory architecture, the address layout, and how Linux manages memory space efficiently.


1.2.1 Physical and virtual memory

Today we face the choice between 32-bit and 64-bit systems. For enterprise customers, one of the most important differences is whether virtual memory addresses can exceed 4GB. From a performance perspective, it is therefore important to understand how the Linux kernel maps physical memory into virtual memory on 32-bit and 64-bit systems.


Figure 1-10 shows the clear difference in how the Linux kernel handles memory on 32-bit and 64-bit systems. A detailed description of the mapping from physical to virtual memory is beyond the scope of this article, so we focus on a few specifics of the Linux memory architecture.


On 32-bit architectures such as IA-32, the Linux kernel can directly address only the first 1GB of physical memory (896MB once the reserved range is taken into account). Memory above the so-called ZONE_NORMAL (that is, ZONE_HIGHMEM) must be mapped into the lower 1GB before the kernel can access it. This mapping is completely transparent to applications, but allocating a memory page in ZONE_HIGHMEM causes a slight performance drop.


On 64-bit architectures such as x86-64 (also called x64), on the other hand, ZONE_NORMAL extends all the way up to 64GB, or up to 128GB on IA-64 systems. As you can see, a 64-bit architecture eliminates the overhead of mapping memory pages from ZONE_HIGHMEM into ZONE_NORMAL.
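As a rough illustration of the 32-bit zone boundaries, the Python sketch below classifies a physical address on IA-32. The 896MB ZONE_NORMAL limit is taken from the text; the 16MB ZONE_DMA boundary is the classic x86 value and is an assumption here, as are the helper names zone_for_phys_addr() and highmem_bytes().

```python
# Simplified model of the IA-32 physical memory zones.
# Assumption: 16 MB is the classic x86 ZONE_DMA limit (not stated in the text).
MB = 1024 * 1024
DMA_LIMIT = 16 * MB
NORMAL_LIMIT = 896 * MB  # 1 GB of kernel space minus the 128 MB reserved range


def zone_for_phys_addr(addr: int) -> str:
    """Return the zone a physical address falls into on a 32-bit (IA-32) kernel."""
    if addr < 0:
        raise ValueError("physical address must be non-negative")
    if addr < DMA_LIMIT:
        return "ZONE_DMA"
    if addr < NORMAL_LIMIT:
        return "ZONE_NORMAL"   # permanently mapped into the kernel's 1 GB
    return "ZONE_HIGHMEM"      # must be mapped in before the kernel can use it


def highmem_bytes(total_ram: int) -> int:
    """How much of total_ram ends up in ZONE_HIGHMEM on IA-32."""
    return max(0, total_ram - NORMAL_LIMIT)


if __name__ == "__main__":
    for addr in (1 * MB, 512 * MB, 2048 * MB):
        print(f"{addr // MB:5d} MB -> {zone_for_phys_addr(addr)}")
    print(highmem_bytes(4096 * MB) // MB, "MB of a 4 GB machine is high memory")
```

On a 64-bit kernel the same RAM would be directly mapped, so the ZONE_HIGHMEM branch would never be taken.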


Figure 1-10 Linux kernel memory layout for 32-bit and 64-bit systems


Virtual Memory Address Layout
Figure 1-11 shows the Linux virtual address layout for 32-bit and 64-bit architectures.


On a 32-bit architecture, the maximum address space a single process can access is 4GB, a limit imposed by 32-bit virtual addressing. In the standard implementation, the virtual address space is divided into 3GB of user space and 1GB of kernel space. Variants of this split exist; one example is the 4G/4G addressing layout.


On 64-bit architectures such as x86-64 and IA-64, by contrast, there is no such restriction, and every process can benefit from a vast address space.
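The arithmetic behind the 3GB/1GB split can be sketched in a few lines of Python. PAGE_OFFSET = 0xC0000000 is assumed here as the usual IA-32 default (other splits are configurable), and the 47-bit x86-64 user address space assumes 4-level paging; neither value comes from the text, and the function names are hypothetical.

```python
# Sketch of the standard 3 GB / 1 GB virtual address split on 32-bit Linux.
# Assumption: PAGE_OFFSET = 0xC0000000, the usual IA-32 default.
GB = 1024 ** 3
TB = 1024 ** 4
PAGE_OFFSET = 0xC0000000  # first kernel-space virtual address (3 GB)


def address_space_size(bits: int) -> int:
    """Number of bytes addressable with a virtual address of `bits` bits."""
    return 1 << bits


def classify_vaddr_32(addr: int) -> str:
    """Say whether a 32-bit virtual address is in user or kernel space."""
    if not 0 <= addr < address_space_size(32):
        raise ValueError("not a 32-bit virtual address")
    return "user" if addr < PAGE_OFFSET else "kernel"


if __name__ == "__main__":
    total = address_space_size(32)
    print(f"{total // GB} GB total: {PAGE_OFFSET // GB} GB user, "
          f"{(total - PAGE_OFFSET) // GB} GB kernel")
    print(classify_vaddr_32(0x08048000), classify_vaddr_32(0xC0100000))
    # Assumption: x86-64 with 4-level paging gives user space 47 address bits.
    print(address_space_size(47) // TB, "TB of user space per process on x86-64")
```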


Figure 1-11 Virtual memory address layout for 32-bit and 64-bit architectures


1.2.2 Virtual Memory Management
The physical memory architecture of an operating system is usually hidden from applications and users, because the operating system maps all memory into virtual memory. To understand the tuning possibilities in the Linux operating system, we must understand how Linux handles virtual memory. As explained in 1.2.1, "Physical and virtual memory", applications do not allocate physical memory directly; instead, they request a memory map of a certain size from the Linux kernel and receive a map in virtual memory in exchange. As Figure 1-12 shows, virtual memory does not necessarily have to be mapped into physical memory. If your application allocates a large amount of memory, some of it may be mapped to the swap file on disk.
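The point that virtual memory need not be backed by physical memory can be observed directly. This Linux-only Python sketch (the function names are hypothetical) creates a large anonymous mapping with mmap and reads the resident set size from /proc/self/statm: the mapping alone barely changes it, while writing to every page makes it grow by roughly the size of the mapping, because each first write faults in a physical page.

```python
import mmap

PAGE = mmap.PAGESIZE


def resident_bytes() -> int:
    """Resident set size of this process, from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # 2nd field = resident pages
    return resident_pages * PAGE


def map_then_touch(size: int = 32 * 1024 * 1024):
    """Return RSS before mapping, after mapping, and after touching every page."""
    before = resident_bytes()
    # Anonymous private mapping: the kernel hands back virtual addresses only.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    after_map = resident_bytes()
    for offset in range(0, size, PAGE):
        region[offset] = 1  # first write to each page faults in a physical frame
    after_touch = resident_bytes()
    region.close()
    return before, after_map, after_touch


if __name__ == "__main__":
    before, mapped, touched = map_then_touch()
    print(f"RSS before mmap : {before // 1024} KB")
    print(f"RSS after mmap  : {mapped // 1024} KB")
    print(f"RSS after touch : {touched // 1024} KB")
```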


Figure 1-12 also shows that applications usually do not write directly to disk, but into the cache or buffers. The pdflush kernel threads then flush the cached/buffered data out to disk when they have time to do so, or when a file size exceeds the buffer cache. See "Flushing a dirty buffer".
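The write path just described (writes land in memory and reach the disk later, either in the background or once too much data is waiting) can be modeled with a toy write-back cache. Everything here is hypothetical: WriteBackCache, dirty_limit, and the dict standing in for the disk are illustrative names, and the real kernel tracks dirty pages and thresholds very differently.

```python
class WriteBackCache:
    """Toy model of write-back caching (hypothetical; not the kernel's code).

    Writes only update memory and mark blocks dirty. Dirty blocks reach the
    'disk' when the dirty total exceeds dirty_limit or when flush() runs,
    roughly the role a flusher thread such as pdflush plays in the background.
    """

    def __init__(self, dirty_limit: int):
        self.dirty_limit = dirty_limit  # bytes of dirty data tolerated
        self.cache = {}                 # block number -> bytes held in memory
        self.dirty = set()              # blocks modified but not yet on disk
        self.disk = {}                  # block number -> bytes persisted

    def write(self, block: int, data: bytes) -> None:
        self.cache[block] = data        # fast path: memory only
        self.dirty.add(block)
        if self.dirty_bytes() > self.dirty_limit:
            self.flush()                # too much dirty data: write it out now

    def dirty_bytes(self) -> int:
        return sum(len(self.cache[b]) for b in self.dirty)

    def flush(self) -> int:
        """Write every dirty block to disk; return how many were written."""
        written = len(self.dirty)
        for b in sorted(self.dirty):
            self.disk[b] = self.cache[b]
        self.dirty.clear()
        return written


if __name__ == "__main__":
    c = WriteBackCache(dirty_limit=8)
    c.write(0, b"abcd")
    print("on disk:", sorted(c.disk))   # nothing written yet
    c.write(1, b"efgh")                 # 8 dirty bytes: still within the limit
    c.write(2, b"ij")                   # 10 dirty bytes: exceeds it, flushes all
    print("on disk:", sorted(c.disk))
```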


Figure 1-12 Linux virtual memory management


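The way the Linux kernel handles writes to the physical disk subsystem is closely tied to the way it manages the disk cache. Whereas other operating systems allocate only a portion of memory as disk cache, Linux handles memory resources far more efficiently: by default, the virtual memory manager uses all available free memory as disk cache. It is therefore common to see Linux systems with large amounts of memory that have only about 20MB free.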
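You can see this on a Linux system by comparing MemFree with the memory used for buffers and page cache in /proc/meminfo. The parser below is a minimal sketch; parse_meminfo() and cache_summary() are hypothetical helper names, and MemAvailable only exists on kernels 3.14 and later, which is why it is read with get().

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def cache_summary(info: dict) -> dict:
    """Compare truly free memory with memory used as buffers and page cache."""
    return {
        "free_kb": info["MemFree"],
        "buffers_and_cache_kb": info.get("Buffers", 0) + info.get("Cached", 0),
        # Kernel estimate of memory available for new work, cache included (3.14+).
        "available_kb": info.get("MemAvailable"),
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        for name, value in cache_summary(parse_meminfo(f.read())).items():
            print(f"{name:22s} {value}")
```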


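In the same context, Linux also manages swap space very efficiently. The fact that swap space is in use does not indicate a memory bottleneck; rather, it shows how efficiently Linux manages system resources. See "Page frame reclaiming".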


Page frame allocation
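A page is a group of contiguous linear addresses in physical memory (a page frame) or in virtual memory. The Linux kernel manages memory in units of pages; a page is usually 4KB in size. When a process requests a certain number of pages, the Linux kernel allocates them immediately if enough pages are available. Otherwise, pages must be taken from some other process or from the page cache. The Linux kernel knows how many memory pages are available and where they are located.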
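Because the kernel hands out memory in whole pages, a page-level request for n bytes needs n divided by the page size, rounded up, pages. This small Python sketch reads the page size with os.sysconf and does that calculation; pages_for() is a hypothetical helper.

```python
import os

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")  # usually 4096 bytes on x86


def pages_for(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole pages needed to cover nbytes."""
    if nbytes < 0:
        raise ValueError("size must be non-negative")
    return -(-nbytes // page_size)  # ceiling division


if __name__ == "__main__":
    print("page size on this system:", PAGE_SIZE)
    for size in (1, 4096, 4097, 10_000):
        print(f"{size:6d} bytes -> {pages_for(size, 4096)} page(s) of 4 KB")
```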


Buddy system
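The Linux kernel maintains free pages using a mechanism called the buddy system. The buddy system manages free pages and tries to satisfy page allocation requests, doing its best to keep memory areas contiguous. If small pages were handed out without regard to where they lie, memory would become fragmented, making it difficult to allocate a large run of pages in one contiguous area. This leads to inefficient memory use and degraded performance.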


Figure 1-13 illustrates how the buddy system allocates pages.


Figure 1-13 Buddy system
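The idea in Figure 1-13 can be sketched with a textbook buddy allocator. In this Python sketch (a simplified model, not the kernel's implementation; BuddyAllocator and its methods are hypothetical), a block of 2^k pages is split in half until it matches the request, and on release it is merged with its buddy, whose starting frame differs only in bit k, whenever that buddy is also free. Returning None marks the point where, per the text, page reclaiming would be triggered.

```python
class BuddyAllocator:
    """Minimal buddy allocator over page frame numbers (illustrative sketch).

    free_lists[k] holds the starting frame of every free block of 2**k pages.
    """

    def __init__(self, max_order: int):
        self.max_order = max_order
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)  # one block covering all frames

    def alloc(self, order: int):
        """Return the first frame of a free 2**order block, or None."""
        for k in range(order, self.max_order + 1):
            if self.free_lists[k]:
                start = min(self.free_lists[k])
                self.free_lists[k].remove(start)
                # Split until the block has the requested size; each upper
                # half (the "buddy") goes onto the next-lower free list.
                while k > order:
                    k -= 1
                    self.free_lists[k].add(start + (1 << k))
                return start
        return None  # allocation failed: page reclaiming would start here

    def free(self, start: int, order: int) -> None:
        """Return a block, merging it with its buddy while the buddy is free."""
        while order < self.max_order:
            buddy = start ^ (1 << order)  # buddies differ only in bit `order`
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free_lists[order].add(start)


if __name__ == "__main__":
    b = BuddyAllocator(max_order=3)  # 8 page frames in total
    one = b.alloc(0)                 # 1 page: splits 8 -> 4 -> 2 -> 1
    two = b.alloc(1)                 # 2 pages, taken from the existing 2-page block
    print(one, two, [sorted(s) for s in b.free_lists])
    b.free(one, 0)
    b.free(two, 1)
    print([sorted(s) for s in b.free_lists])  # merged back into one 8-page block
```

Merging on free is what keeps large contiguous blocks available; without it, the 8-page block in the example could never be handed out again.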


When an attempt to allocate pages fails, page reclaiming is activated. See "Page frame reclaiming".


You can find information about the buddy system in /proc/buddyinfo. For details, see "Memory used in a zone".
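Each line of /proc/buddyinfo lists, for one node and zone, the number of free blocks of order 0, 1, 2, and so on, where a block of order k spans 2^k pages. A minimal parser (the function names are hypothetical) can total the free pages and report the largest order still available, which serves as a quick fragmentation check: plenty of free pages but no high-order blocks means large contiguous allocations may fail.

```python
def parse_buddyinfo(text: str) -> dict:
    """Map (node, zone) -> list of free-block counts, index = order."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node <n>, zone <name> <count order 0> <count order 1> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        result[(node, parts[3])] = [int(x) for x in parts[4:]]
    return result


def free_pages(counts: list) -> int:
    """Total free pages: a block of order k contains 2**k pages."""
    return sum(n << order for order, n in enumerate(counts))


def largest_free_order(counts: list):
    """Highest order that still has a free block, or None if there is none."""
    orders = [order for order, n in enumerate(counts) if n > 0]
    return max(orders) if orders else None


if __name__ == "__main__":
    try:
        with open("/proc/buddyinfo") as f:
            zones = parse_buddyinfo(f.read())
    except OSError:
        zones = {}
        print("/proc/buddyinfo is not available here")
    for (node, zone), counts in zones.items():
        print(f"node {node} {zone:8s} free pages={free_pages(counts):8d} "
              f"largest free order={largest_free_order(counts)}")
```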


Page frame reclaiming
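When a process requests a mapping of a certain number of pages and those pages are not available, the Linux kernel tries to satisfy the new request by releasing certain pages (pages that were used before and are no longer in use, but are still marked as active according to certain principles) and allocating that memory to the process. This is called page frame reclaiming. The kswapd kernel thread and the try_to_free_pages() kernel function are responsible for page reclaiming.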


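The kswapd thread normally sits in an interruptible sleep state; the buddy system wakes it when the number of free pages in a zone falls below a threshold. It then tries to find candidate pages to take from the active pages based on the Least Recently Used (LRU) principle: the least recently used pages are released first. The active list and the inactive list are used to maintain the candidate pages. kswapd scans part of the active list, checks how recently each page was used, and moves pages that have not been used recently to the inactive list. You can use the vmstat -a command to see how much memory is active and how much is inactive.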
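The active/inactive mechanism can be approximated with a simplified two-list LRU model. This Python sketch is not the kernel's algorithm: TwoListLRU and its methods are hypothetical, and a per-page boolean stands in for the page's referenced bit. New pages start on the active list (an assumption of this model), a scan demotes pages that were not referenced since the previous scan, and reclaim takes the oldest inactive pages first.

```python
from collections import OrderedDict


class TwoListLRU:
    """Simplified active/inactive page lists (a model, not kernel code).

    Each OrderedDict maps page -> referenced flag. The first entry is the
    oldest page (the end that gets scanned), the last entry the newest.
    """

    def __init__(self):
        self.active = OrderedDict()
        self.inactive = OrderedDict()

    def add(self, page):
        """Model assumption: newly used pages start on the active list."""
        self.active[page] = False

    def access(self, page):
        """Record a use of a page."""
        if page in self.inactive:
            del self.inactive[page]      # used again: promote back to active
            self.active[page] = False
        elif page in self.active:
            self.active[page] = True     # set the referenced flag in place

    def scan_active(self, count):
        """Age up to `count` of the oldest active pages."""
        for _ in range(min(count, len(self.active))):
            page, referenced = self.active.popitem(last=False)
            if referenced:
                self.active[page] = False    # recently used: rotate, clear flag
            else:
                self.inactive[page] = False  # not used recently: deactivate

    def reclaim(self, count):
        """Free up to `count` pages, oldest inactive pages first."""
        freed = []
        while self.inactive and len(freed) < count:
            page, _ = self.inactive.popitem(last=False)
            freed.append(page)
        return freed


if __name__ == "__main__":
    lru = TwoListLRU()
    for p in "ABCD":
        lru.add(p)
    lru.access("B")
    lru.access("D")
    lru.scan_active(4)
    print("active:", list(lru.active), "inactive:", list(lru.inactive))
    print("reclaimed:", lru.reclaim(1))
```

On a real system, vmstat -a reports the sizes of the kernel's actual active and inactive lists.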


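kswapd also follows another principle. Pages are used mainly for two purposes: the page cache and process address spaces. The page cache consists of pages mapped to files on disk. Pages that belong to a process address space (called anonymous memory, because they are not mapped to any file and have no name) are used for the heap and stack. See 1.1.8, "Process memory segments". When kswapd reclaims pages, it prefers to shrink the page cache rather than page out (or swap out) the pages owned by processes.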
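The split between file-backed and anonymous memory is visible in /proc/<pid>/maps. As a simplified sketch, the Python function below (classify_mappings() is a hypothetical name) sums mapping sizes by kind: entries with a file path are file-backed, entries without a name or named [heap]/[stack] are anonymous, and everything else (for example [vdso]) is counted separately. It ignores the fact that a private file mapping that has been written to also holds anonymous copy-on-write pages.

```python
def classify_mappings(maps_text: str) -> dict:
    """Sum mapping sizes from /proc/<pid>/maps text by kind (simplified)."""
    totals = {"file": 0, "anonymous": 0, "other": 0}
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) == 6 else ""
        if path.startswith("/"):
            kind = "file"       # file-backed: its pages belong to the page cache
        elif path in ("", "[heap]") or path.startswith("[stack"):
            kind = "anonymous"  # no file behind it: heap, stack, anonymous mmap
        else:
            kind = "other"      # special mappings such as [vdso] or [vvar]
        totals[kind] += end - start
    return totals


if __name__ == "__main__":
    with open("/proc/self/maps") as f:
        for kind, size in classify_mappings(f.read()).items():
            print(f"{kind:10s} {size // 1024:10d} KB")
```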


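Page out and swap out: The terms "page out" and "swap out" are often confused. "Page out" means moving some pages (part of an entire address space) to the swap area, while "swap out" means moving an entire address space to the swap area. The two terms are nevertheless sometimes used interchangeably.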


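The right balance between reclaiming page cache and reclaiming process address space depends on the usage scenario, and it affects performance. You can exert some control over this behavior through /proc/sys/vm/swappiness.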
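The current setting can be read from /proc/sys/vm/swappiness. To give a feel for what the number means, anon_scan_share() below (a hypothetical helper) uses a deliberately rough model: some kernel versions balanced reclaim scanning by weighting anonymous pages with swappiness and page-cache pages with 200 minus swappiness, but real kernels also factor in recent page activity, so treat the result as an indication only. Changing the value requires root.

```python
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def read_swappiness(path: str = SWAPPINESS_PATH) -> int:
    """Return the current vm.swappiness value."""
    with open(path) as f:
        return int(f.read().strip())


def anon_scan_share(swappiness: int) -> float:
    """Rough share of reclaim scanning aimed at anonymous pages.

    Hypothetical model: anonymous pages weighted by swappiness, page-cache
    pages by 200 - swappiness. Real kernels also weigh recent page activity.
    """
    if not 0 <= swappiness <= 200:
        raise ValueError("swappiness is expected to be in the range 0..200")
    return swappiness / 200


if __name__ == "__main__":
    try:
        value = read_swappiness()
    except OSError:
        print(SWAPPINESS_PATH, "is not readable here")
    else:
        print(f"vm.swappiness = {value} "
              f"(~{anon_scan_share(value):.0%} of scanning aimed at anonymous memory)")
    # Changing the value requires root, for example: sysctl -w vm.swappiness=10
```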


Swap
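As mentioned earlier, when page reclaiming occurs, candidate pages on the inactive list that belong to a process address space may be paged out. Swapping in itself does not indicate a problem. While in other operating systems swap is nothing more than a safeguard against overallocation of main memory, Linux uses swap space far more efficiently. As Figure 1-12 shows, virtual memory is made up of both physical memory and the disk subsystem or swap partition. In the Linux virtual memory manager, if a memory page has been allocated but not used for a significant amount of time, Linux moves that page to swap space.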
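A first step in judging swap usage is to see how much swap is in use overall and which processes own swapped-out pages. This Python sketch parses the kB fields of /proc/meminfo (SwapTotal, SwapFree) and of /proc/<pid>/status (VmSwap, present on reasonably recent kernels); the helper names are hypothetical. A large VmSwap on a long-idle daemon is the kind of benign pattern the text describes, not by itself a sign of a memory shortage.

```python
def parse_kb_fields(text: str) -> dict:
    """Parse 'Key:   value kB' lines (as in /proc/meminfo or /proc/<pid>/status)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB":
            values[key] = int(parts[0])
    return values


def swap_used_percent(meminfo: dict) -> float:
    """Percentage of configured swap currently in use."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return 0.0  # no swap configured
    return 100.0 * (total - meminfo["SwapFree"]) / total


def process_swap_kb(status: dict) -> int:
    """kB of a process's address space currently paged out (VmSwap)."""
    return status.get("VmSwap", 0)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(f"swap in use: {swap_used_percent(parse_kb_fields(f.read())):.1f}%")
    with open("/proc/self/status") as f:
        print("this process has", process_swap_kb(parse_kb_fields(f.read())),
              "kB swapped out")
```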


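You will often see daemons such as getty that are launched at system startup but are hardly ever used. It would seem more efficient to free the precious main memory held by such pages and move them to the swap area. This is exactly how Linux manages swap, so when you find that the swap area is already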

