[Reprint] Docker Basic Technologies: AUFS


AUFS is a union file system. A UnionFS merges directories that live in different physical locations and mounts them onto a single directory. One of the main uses of UnionFS is to union-mount a CD/DVD together with a directory on a hard disk, which lets you "modify" the read-only files on the CD/DVD (the modified files are, of course, saved to the directory on the hard disk).

AUFS originally stood for Another UnionFS, was later renamed Alternative UnionFS, and then, perhaps because that did not sound bold enough, became Advanced UnionFS. It was developed in 2006 by Junjiro Okajima. AUFS is a complete rewrite of the early UnionFS 1.x aimed mainly at reliability and performance, and it introduced some new features, such as load balancing across writable branches. AUFS is fully compatible with UnionFS in usage and is much better than the earlier UnionFS in stability and performance; later UnionFS 2.x releases even started copying features from AUFS. Yet AUFS never made it into the mainline Linux kernel, because Linus would not accept it, basically because the code was large and badly written (union mount has only about 3,000 lines and UnionFS about 10,000, other VFS file systems average around 6,000 lines, and AUFS had a whopping 30,000). So Okajima kept improving the code quality, kept submitting, and kept getting rejected by Linus, which is why AUFS is still not in the mainline kernel today. (If you look at the AUFS code today it is actually quite decent, N times better than OpenSSL; so either Linus has very high standards for code quality, or Linus simply does not like AUFS.)

The good news, however, is that many distributions use AUFS: Ubuntu 10.04, Debian 6.0 and the Gentoo Live CD all support it, so it is still perfectly usable.

Well, enough gossip. Let's look at an example (environment: Ubuntu 14.04).

 

First, we create two directories (fruits and vegetables) and put some files in them: fruits contains apple and tomato, and vegetables contains carrots and tomato.

$ tree
.
├── fruits
│   ├── apple
│   └── tomato
└── vegetables
     ├── carrots
     └── tomato

We then enter the following commands:

# Create a mount point
$ mkdir mnt

# Union-mount the fruits and vegetables directories onto ./mnt
$ sudo mount -t aufs -o dirs=./fruits:./vegetables none ./mnt

# Inspect ./mnt
$ tree ./mnt
./mnt
├── apple
├── carrots
└── tomato

We can see three files in the ./mnt directory: apple, carrots and tomato. The contents of fruits and vegetables have been unioned into ./mnt.
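
To make the merge concrete, here is a minimal Python sketch that models each branch as a set of file names and computes what a listing of the union mount shows. It is a simplified model of the behavior described above, not aufs code; `union_listing` is a hypothetical helper.

```python
# Simplified model: each branch is a set of file names (not real aufs code).

def union_listing(branches):
    """Return the sorted names visible when all branches are union-mounted."""
    names = set()
    for branch in branches:
        names.update(branch)  # a name present in several branches shows up once
    return sorted(names)


fruits = {"apple", "tomato"}
vegetables = {"carrots", "tomato"}
print(union_listing([fruits, vegetables]))  # ['apple', 'carrots', 'tomato']
```

Using a set means the duplicate tomato appears only once, just as in the `tree ./mnt` output; which copy you actually read is a separate question about branch priority.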

Let's modify the contents of the file:

$ echo mnt > ./mnt/apple
$ cat ./mnt/apple
mnt
$ cat ./fruits/apple
mnt

In the example above, after we change the contents of ./mnt/apple, the contents of ./fruits/apple change as well.

$ echo mnt_carrots > ./mnt/carrots
$ cat ./vegetables/carrots

$ cat ./fruits/carrots
mnt_carrots

In this example, when we modify ./mnt/carrots, ./vegetables/carrots does not change. Instead, a carrots file appears in ./fruits, and its contents are what we wrote to ./mnt/carrots.

In other words, our mount command did not specify permissions for fruits and vegetables. By default, the first (left-most) directory on the command line is read-write and all the directories after it are read-only. (In general, the first directory should be writable and the later ones read-only.)
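
The default-permission rule, together with the copy-up behavior seen with ./mnt/carrots, can be modeled roughly as follows. This is a sketch under simplifying assumptions: `parse_dirs` and `write_target` are hypothetical helpers, the parser only understands the `path[=perm]:path[=perm]` form used in this article, and a file on a read-only branch is assumed to be copied up to the top-most writable branch (real aufs has further mount options that affect this).

```python
# Hypothetical helpers modeling aufs branch permissions and copy-up (simplified).

def parse_dirs(option):
    """Parse 'path[=perm]:path[=perm]...' into [(path, perm)].

    Without an explicit perm, the first branch defaults to rw and the rest to ro.
    """
    branches = []
    for index, item in enumerate(option.split(":")):
        path, has_perm, perm = item.partition("=")
        if not has_perm:
            perm = "rw" if index == 0 else "ro"
        branches.append((path, perm))
    return branches


def write_target(branches, holders):
    """Return the branch that receives a write to a file.

    `holders` is the set of branch paths that already contain the file.
    If the top-most holder is writable, the file is modified in place;
    otherwise (or for a brand-new file) it goes to the top-most writable branch.
    """
    for path, perm in branches:
        if path in holders:
            if perm.startswith("rw"):
                return path
            break  # top-most copy is read-only: copy up instead
    for path, perm in branches:
        if perm.startswith("rw"):
            return path
    raise PermissionError("no writable branch")


default = parse_dirs("./fruits:./vegetables")
print(default)                                  # [('./fruits', 'rw'), ('./vegetables', 'ro')]
print(write_target(default, {"./vegetables"}))  # ./fruits (copy-up)

both_rw = parse_dirs("./fruits=rw:./vegetables=rw")
print(write_target(both_rw, {"./vegetables"}))  # ./vegetables (in place)
```

The two calls reproduce the carrots experiments: with the default permissions the write lands in ./fruits, and with both branches marked rw it lands in ./vegetables.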

So if we mount aufs with explicit permissions, as below, the result is different (remember to delete the ./fruits/carrots file created above first):

$ sudo mount -t aufs -o dirs=./fruits=rw:./vegetables=rw none ./mnt

$ echo "mnt_carrots" > ./mnt/carrots

$ cat ./vegetables/carrots
mnt_carrots

$ cat ./fruits/carrots
cat: ./fruits/carrots: No such file or directory

Now, in this setup, if we modify ./mnt/tomato, which file actually gets written?

$ echo "mnt_tomato" > ./mnt/tomato

$ cat ./fruits/tomato
mnt_tomato

$ cat ./vegetables/tomato
I am a vegetable

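So when a file name appears in more than one branch, the further left the branch is on the mount command line, the higher its priority.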
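
The priority rule can be sketched as a top-down search that stops at the first (left-most) branch containing the name. The branch contents below mirror the state of the example after the writes above; `lookup` is a hypothetical helper, not an aufs function.

```python
def lookup(branches, name):
    """Return (branch, contents) from the left-most branch that has `name`."""
    for branch, files in branches:
        if name in files:
            return branch, files[name]
    raise FileNotFoundError(name)


# Contents mirror the example's state after the writes above.
branches = [
    ("fruits", {"apple": "mnt", "tomato": "mnt_tomato"}),
    ("vegetables", {"carrots": "mnt_carrots", "tomato": "I am a vegetable"}),
]
print(lookup(branches, "tomato"))   # ('fruits', 'mnt_tomato')
print(lookup(branches, "carrots"))  # ('vegetables', 'mnt_carrots')
```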

You can use this example to run all sorts of experiments. My aim here is mainly to give you an intuitive feel, so I won't go any further.

So what is this kind of UnionFS good for?

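Historically, there was a Linux distribution called Knoppix, used mainly for Linux demos, CD-based teaching, system rescue and commercial product demos. It needed no hard-disk installation: the image on the CD/DVD ran directly on top of a writable storage device (such as a USB stick). In effect, the CD/DVD file system and the writable USB file system were union-mounted together, so any change you made to the image on the CD/DVD was applied on the USB stick. You could therefore modify the CD/DVD contents however you liked, and since the changes all lived on the USB stick, you could never damage the original.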

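Let's stretch our imagination a bit further. You could also take a directory, say your source code, as a read-only template and union it with your working directory; then you can make all kinds of changes without worrying about breaking the source code. It is a bit like an ad hoc snapshot.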

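Docker takes this imaginative use of UnionFS to container images. Remember how, in the first part of my Linux Namespace article, I faked an image with a mount namespace and chroot? Now that you have seen UnionFS, it should be clear that you can use this kind of technique to build layered images.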

The figure below, from the Layer page of Docker's official documentation, nicely illustrates the layered images Docker builds with UnionFS.

[Figure: docker-filesystems-multilayer]

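Regarding Docker's layered images: besides aufs, Docker also supports btrfs, devicemapper and vfs, and you can use the -s or --storage-driver= option to choose the storage driver. On Ubuntu 14.04, Docker uses aufs by default (on CentOS 7 it uses devicemapper, which I will cover in a later article). You can find the image of each layer in the following directory: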

/var/lib/docker/aufs/diff/<id>

Once Docker is running (for example: docker run -it ubuntu /bin/bash), you can inspect the aufs mount under /sys/fs/aufs/si_[id]. Here is an example:

# ls /sys/fs/aufs/si_b71b209f85ff8e75/
br0      br2      br4      br6      brid1    brid3    brid5    xi_path
br1      br3      br5      brid0    brid2    brid4    brid6

# cat /sys/fs/aufs/si_b71b209f85ff8e75/*
/var/lib/docker/aufs/diff/87315f1367e5703f599168d1e17528a0500bd2e2df7d2fe2aaf9595f3697dbd7=rw
/var/lib/docker/aufs/diff/87315f1367e5703f599168d1e17528a0500bd2e2df7d2fe2aaf9595f3697dbd7-init=ro+wh
/var/lib/docker/aufs/diff/d0955f21bf24f5bfffd32d2d0bb669d0564701c271bc3dfc64cfc5adfdec2d07=ro+wh
/var/lib/docker/aufs/diff/9fec74352904baf5ab5237caa39a84b0af5c593dc7cc08839e2ba65193024507=ro+wh
/var/lib/docker/aufs/diff/a1a958a248181c9aa6413848cd67646e5afb9797f1a3da5995c7a636f050f537=ro+wh
/var/lib/docker/aufs/diff/f3c84ac3a0533f691c9fea4cc2ceaaf43baec22bf8d6a479e069f6d814be9b86=ro+wh
/var/lib/docker/aufs/diff/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158=ro+wh
64
65
66
67
68
69
70
/run/shm/aufs.xino

You can see that only the top layer (branch) has rw permission; all the others are ro+wh, that is, read-only.
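
If you want to check this programmatically, the sketch below parses the `path=perm` lines from the sysfs output above and picks out the writable branch. It assumes each `br<N>` file holds exactly one such line, as the `cat` output suggests; `parse_branch_line` and `read_branches` are hypothetical helpers, and `read_branches` only works on a host where Docker actually uses aufs (it is not called below).

```python
from pathlib import Path


def parse_branch_line(line):
    """Split 'path=perm[+attr...]' into (path, [perm, attr, ...])."""
    path, _, perms = line.strip().rpartition("=")
    return path, perms.split("+")


def read_branches(si_dir):
    """Read br0, br1, ... (skipping brid*) from /sys/fs/aufs/si_<id>, top first."""
    files = [p for p in Path(si_dir).iterdir()
             if p.name.startswith("br") and p.name[2:].isdigit()]
    files.sort(key=lambda p: int(p.name[2:]))
    return [parse_branch_line(p.read_text()) for p in files]


# Three lines copied from the sample output above.
sample = [
    "/var/lib/docker/aufs/diff/87315f1367e5703f599168d1e17528a0500bd2e2df7d2fe2aaf9595f3697dbd7=rw",
    "/var/lib/docker/aufs/diff/87315f1367e5703f599168d1e17528a0500bd2e2df7d2fe2aaf9595f3697dbd7-init=ro+wh",
    "/var/lib/docker/aufs/diff/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158=ro+wh",
]
branches = [parse_branch_line(line) for line in sample]
writable = [path for path, perms in branches if perms[0] == "rw"]
print(len(writable), writable[0].endswith("dbd7"))  # 1 True
```

`rpartition` splits on the last `=`, so the permission suffix is separated cleanly even though the layer paths are long.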

Docker's aufs configuration can be found in the file /var/lib/docker/repositories-aufs.

Some features of AUFS

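AUFS has all the features of a union FS: it merges multiple directories into one, lets you assign permissions to each directory being merged, and lets you add, remove and modify the directories of an already mounted union on the fly. On top of that, it can load-balance across multiple writable branches/dirs.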

We have already seen AUFS mount examples above. Now let's look at the permissions of the unioned directories (branches):

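  • rw means read-write.
  • ro means read-only. If you don't specify a permission, ro is the default for every branch except the first. A ro branch never receives write operations, nor lookups for whiteouts.
  • rr means real-read-only. Unlike ro, rr marks a branch that is inherently read-only, which lets AUFS improve performance, for example by no longer setting up inotify to watch for file changes.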

In the permissions we came across a term, whiteout. Let me explain it.

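Generally, ro branches carry the wh attribute, as in "[dir]=ro+wh". Whiteout works like this: if a file you delete in the union actually lives on a read-only branch (directory), you should no longer see it in the mounted union directory. But we cannot change anything on the read-only layer, so we have to "white out" that file in the read-only directory instead. AUFS implements a whiteout by creating a corresponding hidden whiteout file in a writable directory above it.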

Let's look at an example.

Suppose we have three directories with the files shown below (test is an empty directory):

# tree
.
├── fruits
│   ├── apple
│   └── tomato
├── test
└── vegetables
     ├── carrots
     └── tomato

We mount them as follows:

# mkdir mnt

# mount -t aufs -o dirs=./test=rw:./fruits=ro:./vegetables=ro none ./mnt

# ls ./mnt/
apple  carrots  tomato

Now create a hidden whiteout file named .wh.apple in the rw test directory, and you will find that ./mnt/apple disappears:

# touch ./test/.wh.apple
 
# ls ./mnt
carrots  tomato

This has the same effect as rm ./mnt/apple.
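
The whiteout mechanism can be modeled in a few lines: scan branches from top to bottom, and let a `.wh.<name>` entry in an upper branch hide `<name>` in every branch below it. In this simplified sketch (hypothetical helpers `listing` and `remove`, branches modeled as sets of names, top branch assumed writable), `rm` on the union just creates the whiteout, which is why it matches `touch ./test/.wh.apple`.

```python
WH = ".wh."  # whiteout prefix used by aufs


def listing(branches):
    """Names visible in the union; branches are ordered top (index 0) to bottom."""
    visible, hidden = set(), set()
    for files in branches:
        # Names in this branch are visible unless an upper branch whited them out.
        visible.update(f for f in files if not f.startswith(WH) and f not in hidden)
        # Whiteouts in this branch hide the name in all lower branches.
        hidden.update(f[len(WH):] for f in files if f.startswith(WH))
    return sorted(visible)


def remove(branches, name):
    """Model `rm` on the union, assuming branches[0] is the writable branch."""
    top = branches[0]
    top.discard(name)                  # delete a copy living in the writable branch
    if any(name in files for files in branches[1:]):
        top.add(WH + name)             # ro branches can't change: whiteout instead


test, fruits, vegetables = set(), {"apple", "tomato"}, {"carrots", "tomato"}
branches = [test, fruits, vegetables]
print(listing(branches))                # ['apple', 'carrots', 'tomato']
remove(branches, "apple")               # same effect as: touch ./test/.wh.apple
print(sorted(test), listing(branches))  # ['.wh.apple'] ['carrots', 'tomato']
```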

Related terms

Branch: each of the directories being unioned (that is, what I passed with dirs= on the command line above).

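  • Branches form a stack in the order in which they are unioned; generally the top one is writable and the ones below are read-only.
  • The branch stack can be modified after mounting, for example by reordering it, adding new branches, removing branches, or changing a branch's permissions directly.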

Whiteout and Opaque

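  • If a directory in the UnionFS is deleted, it should become invisible, even if a lower branch still contains that directory.
  • A whiteout is an upper-layer entry that covers the lower-layer entry of the same name. It is used to hide files in lower branches, and also to stop readdir from descending into lower branches.
  • Opaque means that no lower-layer copy of a directory is allowed to show through.
  • When hiding a lower-layer file, the whiteout is named '.wh.<filename>'.
  • When blocking readdir, the name is '.wh..wh..opq' or '.wh.__dir_opaque'.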
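
An opaque directory differs from a file whiteout in that it stops the search for the whole directory. The sketch below models only the `.wh..wh..opq` marker (not the `.wh.__dir_opaque` variant); each branch is a dict mapping a directory path to its entry names, and `readdir` is a hypothetical helper, not the kernel call.

```python
OPAQUE = ".wh..wh..opq"  # marker that makes a directory opaque
WH = ".wh."


def readdir(branches, directory):
    """List `directory` in the union, stopping at an opaque marker."""
    names, hidden = set(), set()
    for branch in branches:             # top (index 0) to bottom
        entries = branch.get(directory)
        if entries is None:
            continue                    # this branch has no such directory
        names.update(e for e in entries if not e.startswith(WH) and e not in hidden)
        hidden.update(e[len(WH):] for e in entries if e.startswith(WH))
        if OPAQUE in entries:
            break                       # lower branches are not consulted
    return sorted(names)


upper = {"docs": {OPAQUE, "new.txt"}}
lower = {"docs": {"old.txt", "readme"}}
print(readdir([upper, lower], "docs"))  # ['new.txt']

upper["docs"].discard(OPAQUE)
print(readdir([upper, lower], "docs"))  # ['new.txt', 'old.txt', 'readme']
```
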

Related questions

After reading all this, you probably have a few questions.

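First, you might ask: what happens if a file is modified in its original location? Does the mounted directory change too? The answer is: it can, or it can not. You can set a parameter called udba (short for User's Direct Branch Access), which takes three values: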

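  • udba=none: AUFS runs faster, because aufs does not pick up modifications made outside the mount directory, so the data can become inconsistent.
  • udba=reval: AUFS checks whether files have been updated and, if so, pulls the changes into the mount directory.
  • udba=notify: AUFS registers inotify for every branch, which makes picking up file modifications more efficient.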

Second, if several rw branches (directories) are unioned, where does aufs create a new file? aufs provides a create parameter for configuring the creation policy. Here are a few examples.

create=rr | round-robin: round-robin placement. The example below shows newly created files being written to the three directories in turn.

hchen$ sudo mount -t aufs -o dirs=./1=rw:./2=rw:./3=rw -o create=rr none ./mnt
hchen$ touch ./mnt/a ./mnt/b ./mnt/c
hchen$ tree
.
├── 1
│   └── a
├── 2
│   └── c
└── 3
    └── b
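
The round-robin idea can be sketched with `itertools.cycle`. This is an idealized model: as the tree output above shows, the branch each file actually lands on depends on aufs internals (b and c did not land in alphabetical order), so only the rotation itself should be read from it. `round_robin` is a hypothetical helper.

```python
from itertools import cycle


def round_robin(writable_branches):
    """Return an iterator that hands out writable branches in turn (create=rr)."""
    return cycle(writable_branches)


picker = round_robin(["./1", "./2", "./3"])
placement = {name: next(picker) for name in ["a", "b", "c", "d"]}
print(placement)  # {'a': './1', 'b': './2', 'c': './3', 'd': './1'}
```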

create=mfs[:second] | most-free-space[:second]: selects the branch with the most available space. You can optionally specify an interval, in seconds, for re-checking the free disk space.

create=mfsrr:low[:second]: selects a branch whose free space is larger than low; once the free space drops below low, aufs switches to round-robin.
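
Here is a simplified sketch of the two free-space policies. Free space can be read with `shutil.disk_usage`, but the example passes in fixed numbers so it is reproducible; the `[:second]` re-check interval is not modeled, and the mfsrr fallback follows this article's description rather than aufs source code. `most_free_space`, `MfsRR` and `disk_free` are hypothetical names.

```python
import shutil
from itertools import cycle


def disk_free(path):
    """Free bytes on the file system holding `path`."""
    return shutil.disk_usage(path).free


def most_free_space(branches, free=disk_free):
    """create=mfs: choose the writable branch with the most free space."""
    return max(branches, key=free)


class MfsRR:
    """create=mfsrr:low as described above: use mfs while the best branch has
    more than `low` bytes free, otherwise fall back to round-robin."""

    def __init__(self, branches, low, free=disk_free):
        self.branches, self.low, self.free = branches, low, free
        self._rr = cycle(branches)

    def pick(self):
        best = most_free_space(self.branches, self.free)
        if self.free(best) > self.low:
            return best
        return next(self._rr)


# Fixed numbers (in bytes) instead of real disks, so the example is reproducible.
fake_free = {"./1": 10, "./2": 50, "./3": 30}.get
print(most_free_space(["./1", "./2", "./3"], fake_free))  # ./2
low_threshold = MfsRR(["./1", "./2", "./3"], low=100, free=fake_free)
print(low_threshold.pick(), low_threshold.pick())          # ./1 ./2
```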

For more details on AUFS parameters, you can run man aufs on Ubuntu 14.04 and read about the various parameters and commands there.

AUFS performance

Is AUFS slow? Somewhat, but not too slow. Because AUFS mounts all the branches together, looking up a file is relatively slow: it has to traverse every branch, which is an O(n) algorithm (there is clearly plenty of room for improvement). So the more branches there are, the slower file lookup becomes. However, once AUFS has found the file's inode, subsequent reads and writes perform basically the same as on the original file.

So if your program runs on AUFS, open and stat operations will be noticeably slower, and the more branches, the worse it gets; read and write performance, however, does not change.
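
A toy model makes the O(n) cost visible: count how many branches a lookup probes before finding a file that lives only in the bottom layer, as files from a base image do. `lookup_cost` and the file name are illustrative only; treat this as a picture of the linear scan, not a benchmark of real aufs.

```python
def lookup_cost(branches, name):
    """Number of branches probed, top to bottom, until `name` is found."""
    for probes, files in enumerate(branches, start=1):
        if name in files:
            return probes
    return len(branches)  # not found: every branch was probed


for n in (1, 10, 100):
    # n - 1 empty upper layers plus a bottom layer holding the file
    branches = [set() for _ in range(n - 1)] + [{"libc.so.6"}]
    print(n, lookup_cost(branches, "libc.so.6"))
# 1 1
# 10 10
# 100 100
```

Only the open/stat path pays this cost; once a file is open, reads and writes go straight to its inode, which matches the observation above.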

IBM Research published a very good performance report on Docker (PDF): "An Updated Performance Comparison of Virtual Machines and Linux Containers".

I clipped two figures from it: the first shows sequential read/write, the second random read/write. There is almost no performance loss. KVM is a bit slower at random read/write (but what if the disk were an SSD?)

 

[Figure: Sequential read and write]

[Figure: Random read and write]


Source: www.cnblogs.com/jinanxiaolaohu/p/12398262.html