Linux磁盘高速缓存

磁盘高速缓存是一种软件机制，允许系统把通常存放的磁盘上的一些数据保留在RAM中。

例如，目录项高速缓存（dentry cache）,加速从文件路径名到最后一个路径分量的索引节点转换过程。Linux还有其他磁盘高速缓存，如页高速缓存、缓冲区高速缓存。

这里需要注意缓存和缓冲的差异，缓冲是buffer，缓存是cache。

2.4之前的内核中有两种缓存，一种是vfs的页高速缓存，另外一种是缓冲区高速缓存，缓冲区对应于磁盘块，是磁盘块在内存中的表示罢，其中的数据也还是文件中的数据。页高速缓存是按照页面管理的，而缓冲区高速缓存是按照块来管理的，两个缓存在数据本身上有一定的重合，这就造成了冗余.

到了2.6内核，有了bio，那么buffer_head（高速缓存使用的数据结构）再也不用作为IO容器了，一般来说，能用bio的话就用bio。现在缓冲区高速缓存是页高速缓存的一部分了。

1.1.1 页高速缓存

页高速缓存是由内存中的物理页面组成的，内容对应磁盘上的物理块。

使用的回收策略主要由LRU、双链策略（LRU/n）、

页高速缓存中的页可能包含了多个不连续的物理磁盘块。因为不连续，所以导致定位特定数据是困难的。

为了管理缓存项和页I/O操作，引入了address_space结构体。位于include/linux/fs.h文件中。

struct address_space {

struct inode *host; /* owner: inode, block_device */

struct radix_tree_root i_pages; /* cached pages */

atomic_t i_mmap_writable;/* count VM_SHARED mappings */

struct rb_root_cached i_mmap; /* tree of private and shared mappings */

struct rw_semaphore i_mmap_rwsem; /* protect tree, count, list */

/* Protected by the i_pages lock */

unsigned long nrpages; /* number of total pages */

/* number of shadow or DAX exceptional entries */

unsigned long nrexceptional;

pgoff_t writeback_index;/* writeback starts here */

const struct address_space_operations *a_ops; /* methods */

unsigned long flags; /* error bits */

spinlock_t private_lock; /* for use by the address_space */

gfp_t gfp_mask; /* implicit gfp mask for allocations */

struct list_head private_list; /* for use by the address_space */

void *private_data; /* ditto */

errseq_t wb_err;

}

和虚拟地址vm_area_struct的物理地址对等体。一个文件可以被多个进程映射到多个虚拟地址，但是在物理内存中只有一份。

操作函数表的结构体是address_space_operations。指定缓存对象实现的页I/O操作。

1.1.2 缓冲区高速缓存

独立的磁盘块通过块I/O缓冲要被存入页高速缓存。一个缓冲是一个物理磁盘块在内存里的表示。缓冲的作用是映射内存中的页面到磁盘块，使得页高速缓存在块I/O操作时减少了磁盘访问。

高速缓冲区中分两类结构，第一类是高速缓冲区头部结构主要用来索引高速缓冲块地址，第二类是高速缓存块。

缓冲头的结构体定义在文件include/linux/buffer_head.h中

在源码中可以看到其头部对齐注释如下：

* Historically, a buffer_head was used to map a single block

* within a page, and of course as the unit of I/O through the

* filesystem and block layers. Nowadays the basic I/O unit

* is the bio, and buffer_heads are used for extracting block

* mappings (via a get_block_t call), for tracking state within

* a page (via a page_mapping) and for wrapping bio submission

* for backward compatibility reasons (e.g. submit_bh).

2.4以前，页高速缓存和缓冲区是独立的两个磁盘缓存，前者缓存页面，后者缓存缓冲区。导致一个磁盘块会同时在两个缓存区中，浪费内存，这当然是不能接受的。现在已完成统一，只有一个页高速缓存，而缓冲则用来表示磁盘块用页映射块，包含在了页高速缓存中了。

1.1.3 刷新机制

内存中的数据早晚是要刷到磁盘的，一般由多种激发方式，具体可以查看/proc/sys/vm文件夹中相关文件，例如：

lÂ Â 当空闲内存低于一个特定的阈值/proc/sys/vm/dirty_background_ratio

lÂ Â 当脏页在内存中驻留时间超过一个特定的阈值, /proc/sys/vm/dirty_expire_centisecs

lÂ Â 用户进程调用sync和fsync

如果系统崩溃，内存中没有刷会的脏页就会丢失，所以周期性同步页是非常重要的。

这个刷盘动作，最早是由bdflush完成，后来由pdflush。可以在系统中执行ps -ef | grep -i flush来查看。

现在也是支持多线程刷盘的。