Docker Basic Technology: DeviceMapper

Introduction to Device Mapper

Device Mapper, introduced in Linux 2.6, has become one of the most important storage technologies in Linux. It provides a generic device-mapping mechanism for logical volume management in the kernel, with a highly modular architecture for implementing block device drivers for storage resource management. It is built around three important concepts: the Mapped Device, the Mapping Table, and the Target Device.

A Mapped Device is a logical abstraction: a logical device exposed by the kernel. It is tied to its Target Devices through the mapping relationships described in the Mapping Table. A Target Device represents the segment of physical space that a Mapped Device maps onto; from the perspective of the logical Mapped Device, it is the physical device underneath.

The Mapping Table records the starting address and length of the Mapped Device's logical range, the address offset on the physical device behind the Target Device, and the target type. (Note: these addresses and offsets are in units of disk sectors, i.e. 512 bytes, so when you see 128 it actually means 128 x 512 = 64KB.)
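
To make the table format concrete, here is a minimal, hypothetical example that is not part of Docker's setup: a one-line table mapping the first 409600 sectors (200MB) of a new logical device straight onto /dev/sdb1. The device name and the sizes are made up for illustration only.

~hchen$ # table format: <logical start sector> <length in sectors> <target type> <target args>
~hchen$ sudo dmsetup create my-linear-dev --table "0 409600 linear /dev/sdb1 0"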

A Mapped Device in DeviceMapper can map not only to one or more Target Devices (physical devices) but also to another Mapped Device, so the structure can be built up iteratively or recursively, much like directories in a file system, which can contain other directories and can in theory be nested without limit.

DeviceMapper filters or redirects IO requests through modular Target Driver plug-ins in the kernel. The plug-ins implemented so far include software RAID, encryption, multipath, mirroring, and snapshots, a design that follows the principle of separating policy from mechanism. As the figure below shows, DeviceMapper itself is just a framework into which various policies can be plugged (I can't help being reminded of the object-oriented Strategy pattern). Among these many "plug-ins" there is one called Thin Provisioning Snapshot, and that is the most important module Docker relies on in DeviceMapper.

Image source: http://people.redhat.com/agk/talks/FOSDEM_2005/

Introduction to Thin Provisioning

How to translate Thin Provisioning into Chinese is a real headache, so I won't translate it. It is a kind of virtualization technology. What does it mean? Think of the "virtual memory" used in our computers' memory management: the operating system gives every process a seemingly inexhaustible address space (on a 32-bit system, each process gets its own 4GB virtual address space), but we know there is nowhere near that much physical memory. If process memory were mapped one-to-one onto physical memory, how much RAM would we need? So the operating system introduced the design of virtual memory: logically you are given practically unlimited memory, but physical memory is only handed out as it is actually used, because the OS knows you will never use anywhere near all of it. The result is much better memory utilization. (Much of the so-called virtualization in cloud computing today uses the same Thin Provisioning idea as "virtual memory": so-called over-provisioning, or over-selling.)

Okay, back to the topic: here we are talking about storage. Look at the two pictures below (picture source): the first shows Fat Provisioning, the second Thin Provisioning. Together they illustrate well what is going on (conceptually it is the same as virtual memory).

[Figures: Fat Provisioning and Thin Provisioning]

So, how does Docker use Thin Provisioning to get layered images like UnionFS? The answer is that Docker uses Thin Provisioning's Snapshot technology. Let's introduce Thin Provisioning snapshots.

Thin Provisioning Snapshot Demo

Below, we use a series of commands to demonstrate how Device Mapper's Thin Provisioning Snapshot works.

First, we need to create two files, one is data.img and the other is meta.data.img:

~hchen$ sudo dd if=/dev/zero of=/tmp/data.img bs=1K count=1 seek=10M
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000621428 s, 1.6 MB/s

~hchen$ sudo dd if=/dev/zero of=/tmp/meta.data.img bs=1K count=1 seek=100K
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000140858 s, 7.3 MB/s

Pay attention to the seek option in these commands. It means: skip the first seek x blocksize bytes of the output file given by of before writing anything. Since bs is 1K and seek is 10M, data.img has an apparent size of about 10GB (and meta.data.img about 100MB), but neither actually occupies that much space on disk; only 1K of content has been written to each. Space is allocated on disk only when something is actually written. With the ls command we can see that only 12K and 4K are really allocated:

~hchen$ sudo ls -lsh /tmp/data.img
12K -rw-r--r--. 1 root root 11G Aug 25 23:01 /tmp/data.img

~hchen$ sudo ls -slh /tmp/meta.data.img
4.0K -rw-r--r--. 1 root root 101M Aug 25 23:17 /tmp/meta.data.img
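
(A side note of mine, not in the original flow: du is another quick way to tell a sparse file's allocated size apart from its apparent size.)

~hchen$ du -h /tmp/data.img    # blocks actually allocated, a few KB
~hchen$ ls -lh /tmp/data.img   # apparent size, about 11G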

Then, we create a loopback device for each of these files (loop2015 and loop2016 are two names I picked arbitrarily):

~hchen$ sudo losetup /dev/loop2015 /tmp/data.img
~hchen$ sudo losetup /dev/loop2016 /tmp/meta.data.img

~hchen$ sudo losetup -a
/dev/loop2015: [64768]:103991768 (/tmp/data.img)
/dev/loop2016: [64768]:103991765 (/tmp/meta.data.img)
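
(One practical note of mine: on a modern distro the /dev/loop2015 and /dev/loop2016 nodes may not exist in advance. In that case losetup -f --show picks the first free loop device and prints its name; just substitute whatever it prints in the commands below.)

~hchen$ sudo losetup -f --show /tmp/data.img       # prints e.g. /dev/loop0
~hchen$ sudo losetup -f --show /tmp/meta.data.img  # prints e.g. /dev/loop1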

Now, we build a Thin Provisioning pool on top of these two devices, using the dmsetup command:

~hchen$ sudo dmsetup create hchen-thin-pool \
                  --table "0 20971522 thin-pool /dev/loop2016 /dev/loop2015 \
                           128 65536 1 skip_block_zeroing"

The parameters are explained as follows (for more information, please refer to the kernel documentation for Thin Provisioning):

  • dmsetup create is the command used to create a thin pool.
  • hchen-thin-pool is a custom pool name; anything goes as long as it doesn't conflict with an existing one.
  • --table is the mapping table that defines this pool:
    • 0 is the starting sector.
    • 20971522 is the number of sectors. As mentioned earlier, a sector is 512 bytes, so 20971522 sectors is just over 10GB, exactly the size of data.img.
    • /dev/loop2016 is the device holding the meta file (we built it earlier).
    • /dev/loop2015 is the device holding the data file (we built it earlier).
    • 128 is the data block size, the smallest unit of allocation, in sectors (128 x 512 = 64KB).
    • 65536 is the low water mark, a threshold: when the number of free data blocks drops below it, the kernel raises an event.
    • 1 means one additional argument follows.
    • skip_block_zeroing is that additional argument; it skips zeroing newly allocated blocks.
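
By the way, the 20971522 above is not magic: it is simply the size of data.img expressed in 512-byte sectors. If you would rather not do the arithmetic yourself, blockdev can report it (a convenience check of mine, assuming the loop device built above):

~hchen$ sudo blockdev --getsz /dev/loop2015   # size in 512-byte sectors, should print 20971522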

Then, we can see a Device Mapper device:

~hchen$ sudo ll /dev/mapper/hchen-thin-pool
lrwxrwxrwx. 1 root root 7 Aug 25 23:24 /dev/mapper/hchen-thin-pool -> ../dm-4

Next, our initialization is still not complete: we also have to create a Thin Provisioning volume:

~hchen$ sudo dmsetup message /dev/mapper/hchen-thin-pool 0 "create_thin 0"
~hchen$ sudo dmsetup create hchen-thin-volumn-001 \
            --table "0 2097152 thin /dev/mapper/hchen-thin-pool 0"

Here:

  • create_thin in the first command is a keyword; the 0 that follows is the device id of this volume.
  • The second command actually creates a mountable device for this volume, named hchen-thin-volumn-001. Its 2097152 sectors amount to only 1GB.
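
(Another aside that is not part of the original flow: at any point you can ask the pool how full it is. dmsetup status prints, among other things, the used/total metadata blocks and the used/total data blocks.)

~hchen$ sudo dmsetup status hchen-thin-pool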

Well, before mounting, we have to format it:

~hchen$ sudo mkfs.ext4 /dev/mapper/hchen-thin-volumn-001
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=16 blocks, Stripe width=16 blocks
65536 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

Ok, now we can mount it (in the commands below, I also create a file):

~hchen$ sudo mkdir -p /mnt/base
~hchen$ sudo mount /dev/mapper/hchen-thin-volumn-001 /mnt/base
~hchen$ sudo echo "hello world, I am a base" > /mnt/base/id.txt
~hchen$ sudo cat /mnt/base/id.txt
hello world, I am a base
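
(A caveat on the echo lines here: output redirection is performed by the calling shell, not by sudo, so on a system where your user cannot write to /mnt/base you would get "Permission denied". A common workaround, shown as a sketch below, is to pipe through sudo tee.)

~hchen$ echo "hello world, I am a base" | sudo tee /mnt/base/id.txt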

Ok, next, let's take a look at how the snapshot works:

~hchen$ sudo dmsetup message /dev/mapper/hchen-thin-pool 0 "create_snap 1 0"
~hchen$ sudo dmsetup create mysnap1 \
                   --table "0 2097152 thin /dev/mapper/hchen-thin-pool 1"

~hchen$ sudo ll /dev/mapper/mysnap1
lrwxrwxrwx. 1 root root 7 Aug 25 23:49 /dev/mapper/mysnap1 -> ../dm-5

In the above commands:

  • The first command sends a create_snap message to hchen-thin-pool, followed by two ids: the first is the new dev id, and the second is the dev id of the existing device to take the snapshot from (dev id 0 is the volume we created earlier).
  • The second command actually creates the mysnap1 device, which can be mounted.
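
(If you are curious, dmsetup table will show that the snapshot is just another thin target on the same pool, using dev id 1; the pool appears as its major:minor number.)

~hchen$ sudo dmsetup table mysnap1    # should print something like: 0 2097152 thin 253:4 1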

Let's take a look:

~hchen$ sudo mkdir -p /mnt/mysnap1
~hchen$ sudo mount /dev/mapper/mysnap1 /mnt/mysnap1

~hchen$ sudo ll /mnt/mysnap1/
total 20
-rw-r--r--. 1 root root 25 Aug 25 23:46 id.txt
drwx------. 2 root root 16384 Aug 25 23:43 lost+found

~hchen$ sudo cat /mnt/mysnap1/id.txt
hello world, I am a base

Let's modify /mnt/mysnap1/id.txt and add a snap1.txt file:

~hchen$ sudo echo "I am snap1" >> /mnt/mysnap1/id.txt
~hchen$ sudo echo "I am snap1" > /mnt/mysnap1/snap1.txt

~hchen$ sudo cat /mnt/mysnap1/id.txt
hello world, I am a base
I am snap1

~hchen$ sudo cat /mnt/mysnap1/snap1.txt
I am snap1

Let's look at /mnt/base again, and you will find that nothing has changed:

~hchen$ sudo ls /mnt/base
id.txt      lost+found
~hchen$ sudo cat /mnt/base/id.txt
hello world, I am a base

Do you start to see what layered images look like?

Let's go further and build another snapshot on top of the snapshot we just made:

~hchen$ sudo dmsetup message /dev/mapper/hchen-thin-pool 0 "create_snap 2 1"
~hchen$ sudo dmsetup create mysnap2 \
                   --table "0 2097152 thin /dev/mapper/hchen-thin-pool 2"

~hchen$ sudo ll /dev/mapper/mysnap2
lrwxrwxrwx. 1 root root 7 Aug 25 23:52 /dev/mapper/mysnap2 -> ../dm-7

~hchen$ sudo mkdir -p /mnt/mysnap2
~hchen$ sudo mount /dev/mapper/mysnap2 /mnt/mysnap2
~hchen$ sudo  ls /mnt/mysnap2
id.txt  lost+found  snap1.txt 

Well, I believe you can now see how layered images work.

Now that we've watched the demo, let's add some theory:

  • Snapshots come from LVM (Logical Volume Manager); a snapshot of a device can be taken without interrupting service.
  • Snapshots are Copy-On-Write: a data block is copied only when it is modified.
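
You can actually watch the copy-on-write happen with the pool we built above. Compare the used data blocks reported by dmsetup status before and after writing into a snapshot (my own experiment, assuming the demo devices are still mounted); /mnt/base stays untouched the whole time.

~hchen$ sudo dmsetup status hchen-thin-pool    # note the used/total data blocks
~hchen$ sudo dd if=/dev/urandom of=/mnt/mysnap1/big.bin bs=1M count=8
~hchen$ sudo dmsetup status hchen-thin-pool    # the used count grows by the newly allocated blocks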

In addition, here is an article, Storage thin provisioning benefits and challenges, that you may want to read as well.

Docker's DeviceMapper

The above is basically how Docker plays it. Let's take a look at Docker's loopback devices:

~hchen $ sudo losetup -a
/dev/loop0: [64768]:38050288 (/var/lib/docker/devicemapper/devicemapper/data)
/dev/loop1: [64768]:38050289 (/var/lib/docker/devicemapper/devicemapper/metadata)

Here, data is 100GB and metadata is 2.0GB in apparent size; as before, they are sparse files, and only 506M and 1.1M are actually allocated:

~hchen $ sudo ls -alhs /var/lib/docker/devicemapper/devicemapper
506M -rw-------. 1 root root 100G Sep 10 20:15 data
1.1M -rw-------. 1 root root 2.0G Sep 10 20:15 metadata 
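
(Those 100GB and 2.0GB apparent sizes are just defaults. On the Docker versions of that era they could be tuned when starting the daemon with the devicemapper driver's storage options; the flag names below are from the old devicemapper driver docs, so double-check them against your version.)

# hypothetical daemon invocation; dm.loopdatasize / dm.loopmetadatasize size the sparse files
docker daemon --storage-driver=devicemapper \
              --storage-opt dm.loopdatasize=200GB \
              --storage-opt dm.loopmetadatasize=4GB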

Below is the related thin pool. The device whose name ends in a long hash string is the container that is currently started:

~hchen $ sudo ll /dev/mapper/dock*
lrwxrwxrwx. 1 root root 7 Aug 25 07:57 /dev/mapper/docker-253:0-104108535-pool -> ../dm-2
lrwxrwxrwx. 1 root root 7 Aug 25 11:13 /dev/mapper/docker-253:0-104108535-deefcd630a60aa5ad3e69249f58a68e717324be4258296653406ff062f605edf -> ../dm-3

We can take a look at its device id (Docker records them):

~hchen $ sudo cat /var/lib/docker/devicemapper/metadata/deefcd630a60aa5ad3e69249f58a68e717324be4258296653406ff062f605edf
{"device_id":24,"size":10737418240,"transaction_id":26,"initialized":false}

The device_id is 24 and the size is 10737418240 bytes; divided by 512, that is 20971520 sectors.
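
The shell can confirm the sector arithmetic:

~hchen$ echo $((10737418240 / 512))
20971520

We use this information to take a snapshot of it (note: I deliberately used a relatively large dev id, 1024):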

~hchen$ sudo dmsetup message "/dev/mapper/docker-253:0-104108535-pool" 0 \
                                    "create_snap 1024 24"
~hchen$ sudo dmsetup create dockersnap --table \
                    "0 20971520 thin /dev/mapper/docker-253:0-104108535-pool 1024"
~hchen$ sudo mkdir /mnt/docker
~hchen$ sudo mount /dev/mapper/dockersnap /mnt/docker/
~hchen$ sudo ls /mnt/docker/
id lost+found rootfs
~hchen$ sudo ls /mnt/docker/rootfs/
bin dev etc home lib lib64 lost+found media mnt opt proc root run sbin srv sys tmp usr var

We can also use the findmnt command inside the Docker container to look at the mounts (the full output is long; below is just an excerpt):

# findmnt
TARGET                SOURCE               
/                 /dev/mapper/docker-253:0-104108535-deefcd630a60[/rootfs]
/etc/resolv.conf  /dev/mapper/centos-root[/var/lib/docker/containers/deefcd630a60/resolv.conf]
/etc/hostname     /dev/mapper/centos-root[/var/lib/docker/containers/deefcd630a60/hostname]
/etc/hosts        /dev/mapper/centos-root[/var/lib/docker/containers/deefcd630a60/hosts]

Is Device Mapper reliable?

According to the Thin Provisioning documentation, it is still in the experimental stage; do not take it to production:

These targets are very much still in the EXPERIMENTAL state. Please do not yet rely on them in production.

Also, Jeff Atwood tweeted something to this effect:

[Image: Jeff Atwood's tweet about DeviceMapper]

In the discussion that tweet points to, there is a code diff which basically says that DeviceMapper has too many problems and should be added to the blacklist. Docker's founder also replied there.

So, if you are using the loopback flavor of devicemapper and something goes wrong with your storage, the "correct" fix is:

rm -rf /var/lib/docker

Origin: blog.csdn.net/Wis57/article/details/130064075