Linux snapshot (snapshot) principle and practice (2) Snapshot function practice


0. Overview

The previous article "Linux snapshot (snapshot) principle and practice (1)" briefly introduced the basic principles of snapshots and focused on how the snapshot model is implemented under Linux. It may well be the only article online devoted to the Linux snapshot model.

For front-line engineers, theory alone is far from enough, so this article presents a set of hands-on experiments that verify the various functions of the Linux snapshot model and deepen the understanding of snapshots.

Specific experiments include:

  • Creating the snapshot-origin target (Section 2)

  • Creating the snapshot target (Section 3)

  • COW mode (Section 4)

    • Verifying the first write (Section 4.1)
    • Verifying the second write (Section 4.2)
  • ROW mode (Section 5)

    • Verifying the first write (Section 5.1)
    • Verifying the second write (Section 5.2)
  • Creating the snapshot-merge target (Section 6)

  • How data changes in COW mode and ROW mode during the merge (Section 7)

Since these experiments build on one another, I recommend starting from the first section, repeating them on your own machine with the commands provided, and checking the data changes yourself; that is the best way to get the most out of them.

This article uses the following Linux command line tools:

echo, tr, dd, md5sum, blockdev, hexdump, xxd, losetup, dmsetup

The last 3 commands are less common, but important:

  • xxd, a very handy binary tool that can also edit files conveniently
  • losetup, for managing loop devices
  • dmsetup, for managing device-mapper virtual devices

1. Prepare demo data

For convenience, two data files, data-base.img and data-cow.img, are created to serve as the source volume and the snapshot volume respectively.

  • The source volume data-base.img is 100M and filled with 0xFF, except for special strings written at 0x0000 and 0x1000 (4K) for use in later steps.

  • The snapshot volume data-cow.img is 50M and filled with 0x00.

The source and snapshot volumes are deliberately filled with 0xFF and 0x00 respectively, so if any data from the source volume is copied into the snapshot volume, it is easy to spot when viewing the snapshot volume.

For ease of reference, all steps are numbered: step 1a, step 1b, and so on.

#
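# step 1. Prepare the source volume data-base.img
#
# step 1a. data-base.img: 100M of 0xff, 100M = 1024 x 102400
# (iflag=fullblock makes dd wait for full 1024-byte blocks, since reads from a pipe may come back short)
$ tr '\000' '\377' < /dev/zero | dd of=data-base.img bs=1024 count=102400 iflag=fullblock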
$ hexdump -C data-base.img 
00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 1b. Write "Great China!" starting at 0x0000
$ echo -n "Great China!" | xxd
00000000: 4772 6561 7420 4368 696e 6121            Great China!
$ echo -n "00000000: 4772 6561 7420 4368 696e 6121" | xxd -r - data-base.img 
$ hexdump -C data-base.img 
00000000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 1c. Write "Rocky Can Do It!" starting at 0x1000 (4096)
$ echo -n 'Rocky Can Do It!' | xxd
00000000: 526f 636b 7920 4361 6e20 446f 2049 7421  Rocky Can Do It!
# Remember to change the offset to 00001000 (4096), as below
$ echo -n "00001000: 526f 636b 7920 4361 6e20 446f 2049 7421" | xxd -r - data-base.img
$ hexdump -C data-base.img 
00000000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  52 6f 63 6b 79 20 43 61  6e 20 44 6f 20 49 74 21  |Rocky Can Do It!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

#
# step 2. Prepare the snapshot volume data-cow.img
#
# step 2a. data-cow.img: 50M of 0x00, 50M = 1024 x 51200
$ dd if=/dev/zero of=data-cow.img bs=1024 count=51200
$ hexdump -C data-cow.img 
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
03200000

#
# step 3. Attach the source and snapshot volume files as loop devices
#
# step 3a. Compute the md5 of data-base.img and data-cow.img
$ md5sum data-base.img data-cow.img
54bc2af511613f6405b27144c410fb94  data-base.img
25e317773f308e446cc84c503a6d1f85  data-cow.img

# step 3b. Attach data-base.img and data-cow.img as loop devices
$ sudo losetup -f data-base.img --show
/dev/loop3
$ sudo losetup -f data-cow.img --show
/dev/loop6

# step 3c. Check the md5 of the data files and the loop devices again
$ sudo md5sum /dev/loop3 data-base.img /dev/loop6 data-cow.img 
54bc2af511613f6405b27144c410fb94  /dev/loop3
54bc2af511613f6405b27144c410fb94  data-base.img
25e317773f308e446cc84c503a6d1f85  /dev/loop6
25e317773f308e446cc84c503a6d1f85  data-cow.img

To sum up the operations above:

  1. Prepared a 100M source volume file data-base.img, filled with 0xFF except for special data written in two places for later use;
  2. Prepared a 50M snapshot volume file data-cow.img, filled with 0x00;
  3. Attached data-base.img and data-cow.img as loop devices /dev/loop3 and /dev/loop6 respectively.

Special data in the source volume file data-base.img includes:

  • At the beginning of 0x0000, write the string "Great China!"

  • At the beginning of 0x1000 (offset address 4096), write the string "Rocky Can Do It!"
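Working out the hex bytes and the xxd offset by hand is easy to get wrong. As an alternative sketch, assuming a standard dd, the hypothetical helper write_at below writes a plain string at a byte offset in place; conv=notrunc keeps dd from truncating the rest of the image.

```shell
# write_at FILE OFFSET STRING  (hypothetical helper, not part of any tool)
# Overwrite bytes of FILE starting at byte OFFSET with STRING, in place.
# OFFSET may be decimal or hex (e.g. 0x1000); shell arithmetic converts it.
write_at() {
  # bs=1 makes seek count bytes; conv=notrunc leaves the rest of FILE intact
  printf '%s' "$3" | dd of="$1" bs=1 seek="$(($2))" conv=notrunc 2>/dev/null
}

# Equivalent of steps 1b and 1c:
#   write_at data-base.img 0x0000 'Great China!'
#   write_at data-base.img 0x1000 'Rocky Can Do It!'
```

Because write_at only relies on printf and dd, it also works on the loop and device-mapper devices, but it has to run in a root shell: sudo cannot call a shell function.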

In step 3c, md5sum computes the hashes of the two data files and their loop devices, showing that each data file has exactly the same content as its loop device. Because a data file and its loop device can fall out of sync (see below), later steps do not compute the md5 of the loop devices separately, only that of the data files.

Synchronization issue between data files and the loop devices backed by them

In my experiments, I found that in some cases, after a data file is modified, the loop device does not reflect the change until it is re-attached.

After searching on Google, I found that many people have run into this synchronization problem, but there is no good solution. I haven't studied the loop device driver yet; if anyone knows how to solve it, please share a pointer or two. Thank you very much~

This article uses the xxd command to convert and modify hexadecimal content, which is a very, very useful skill.

For how to use xxd efficiently, please refer to my article: "Don't look for it, this command allows you to freely convert between strings and hexadecimals"

2. Create a snapshot-origin target

#
# step 4. Create the snapshot-origin target device on the loop device of the source volume
#
# step 4a. Get the sector counts of /dev/loop3 and /dev/loop6 (512 bytes per sector)
$ sudo blockdev --getsz /dev/loop3 /dev/loop6
204800
102400

# step 4b. Create the snapshot-origin device /dev/mapper/origin on /dev/loop3
$ sudo dmsetup create origin --table "0 204800 snapshot-origin /dev/loop3"
$ sudo dmsetup table origin
0 204800 snapshot-origin 7:3

# step 4c. Check the md5 of the data files and the virtual device
$ sudo md5sum data-base.img /dev/mapper/origin data-cow.img
54bc2af511613f6405b27144c410fb94  data-base.img
54bc2af511613f6405b27144c410fb94  /dev/mapper/origin
25e317773f308e446cc84c503a6d1f85  data-cow.img

It can be seen from the above that the virtual device /dev/mapper/origin has the same content as data-base.img.
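The 204800 in the table is simply the image size divided by the 512-byte sector size, the same number blockdev --getsz reports in step 4a. A minimal sketch, assuming GNU stat (stat -c %s), that derives it from the image file itself; sectors_of is a hypothetical helper name.

```shell
# sectors_of FILE  (hypothetical helper; uses GNU stat)
# Print the size of FILE in 512-byte sectors, the unit device-mapper tables use.
sectors_of() {
  size=$(stat -c %s "$1") || return 1
  if [ $((size % 512)) -ne 0 ]; then
    echo "sectors_of: size of $1 is not a multiple of 512" >&2
    return 1
  fi
  echo $((size / 512))
}

# Step 4b without hard-coding 204800:
#   sudo dmsetup create origin --table "0 $(sectors_of data-base.img) snapshot-origin /dev/loop3"
```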

3. Create a snapshot target

#
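# step 5. Create the snapshot target on the loop devices of the source and snapshot volumes
#         The snapshot must be created on /dev/loop3 (backing data-base.img), not on /dev/mapper/origin
# step 5a. Create the snapshot target device on the loop devices of the source and snapshot volumes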
$ sudo dmsetup create snapshot --table "0 204800 snapshot /dev/loop3 /dev/loop6 P 8"            
$ sudo dmsetup table snapshot
0 204800 snapshot 7:3 7:6 P 8

# step 5b. Check the md5 of the data files and the virtual devices
$ sudo md5sum data-base.img /dev/mapper/origin /dev/mapper/snapshot data-cow.img
54bc2af511613f6405b27144c410fb94  data-base.img
54bc2af511613f6405b27144c410fb94  /dev/mapper/origin
54bc2af511613f6405b27144c410fb94  /dev/mapper/snapshot
f0cb475bc4c1a84c31ba9c9053445daf  data-cow.img

The --table parameters deserve a closer look:

--table "0 204800 snapshot /dev/loop3 /dev/loop6 P 8"
  • "0 204800 snapshot": the start (sector 0) and length (204800 sectors) of the mapped virtual device, and the target type (snapshot)
  • "/dev/loop3": the origin device the snapshot is taken of
  • "/dev/loop6": the COW device that stores the snapshot data
  • "P": create a persistent snapshot, meaning the metadata is also stored on the COW device so the snapshot survives a reboot; "N" would create a non-persistent snapshot whose metadata is kept only in memory
  • "8": the chunk size of the snapshot in sectors, so each chunk is 512 x 8 = 4096 bytes; common values are 8 or 16, i.e. 4K or 8K chunks

The basic unit of device mapper device is sector, each sector is 512 bytes.
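Since the chunk size is given in sectors, the chunk that a byte offset falls into is just integer division by chunk_size x 512. The hypothetical helper chunk_index below is only a sketch of that arithmetic; it shows why the later writes at 0x0000 and 0x1000 touch two different chunks (0 and 1) when the chunk size is 8.

```shell
# chunk_index OFFSET CHUNK_SECTORS  (hypothetical helper)
# Print the chunk number that byte OFFSET falls into.
chunk_index() {
  chunk_bytes=$(( $2 * 512 ))   # chunk size in bytes (8 sectors -> 4096)
  echo $(( $1 / chunk_bytes ))
}

# chunk_index 0x0000 8   prints 0
# chunk_index 0x0fff 8   prints 0  (last byte of the first chunk)
# chunk_index 0x1000 8   prints 1
```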

Look closely at the md5 of data-cow.img computed in step 5b: it has changed compared with step 4c.

Figure 1. Comparison of the md5 values of each device before and after creating the snapshot target

The reason is that creating the snapshot writes a 16-byte disk header at the start of the COW device:

# step 5c. View the snapshot volume data-cow.img after creating the snapshot target
$ hexdump -C data-cow.img 
00000000  53 6e 41 70 01 00 00 00  01 00 00 00 08 00 00 00  |SnAp............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
03200000

The snapshot has just been created and no COW or ROW operation has happened yet, so only the disk header has been written at the start of the COW device; the rest is still all 0.
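The 16 header bytes can be decoded with od instead of by eye. This sketch assumes the layout of the persistent exception store in the kernel's dm-snap-persistent.c (magic, valid, version, chunk_size, each a little-endian 32-bit field) and GNU od for --endian; cow_header is a hypothetical name.

```shell
# cow_header DEV  (hypothetical helper; needs GNU od for --endian)
# Decode the 16-byte disk header at offset 0 of a persistent COW device:
# magic ("SnAp"), then valid, version and chunk_size as little-endian
# 32-bit integers; chunk_size is counted in 512-byte sectors.
cow_header() {
  magic=$(head -c 4 "$1")
  # od prints the three integers; word splitting turns them into $1 $2 $3
  set -- $(od -A n -t u4 --endian=little -j 4 -N 12 "$1")
  echo "magic=$magic valid=$1 version=$2 chunk_size=$3 chunk_bytes=$(( $3 * 512 ))"
}

# Usage (a regular file can be read without sudo):
#   cow_header data-cow.img
```

On the data-cow.img from step 5c it should print magic=SnAp valid=1 version=1 chunk_size=8 chunk_bytes=4096.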

So far, the devices we created relate to each other as follows:

Figure 2. Device relationships in the snapshot experiments

4. Verify COW operation

COW is verified by writing to the origin device.

According to the conclusions of the article "Linux snapshot (snapshot) principle and practice (1)", for COW:

When new data is written to a location on the origin device for the first time:

  1. First read the original content in the source volume data-base.img and write it to the snapshot volume data-cow.img;
  2. Then write the new data to the source volume data-base.img.

Step 1 happens only on the first write. Later writes to the same location go straight to the source volume, without any copy-on-write (COW).

4.1 Write data for the first time

Write the string "Wonderful World!" to the origin device starting at 0x0000. A Copy-On-Write is expected:

  • The old data of the chunk starting at 0x0000 (containing "Great China!") is copied to the snapshot volume data-cow.img (the COW device);
  • The new data is written directly to the source volume data-base.img starting at 0x0000.

#
# step 6. The first write to 0x0000 of the origin device triggers Copy-On-Write
#
# step 6a. Write "Wonderful World!" to the origin device starting at 0x0000
$ echo -n "Wonderful World!" | xxd
00000000: 576f 6e64 6572 6675 6c20 576f 726c 6421  Wonderful World!
$ echo -n "00000000: 576f 6e64 6572 6675 6c20 576f 726c 6421" | sudo  xxd -r - /dev/mapper/origin 

# step 6b. View the contents of the origin device
$ sudo hexdump -C /dev/mapper/origin
00000000  57 6f 6e 64 65 72 66 75  6c 20 57 6f 72 6c 64 21  |Wonderful World!|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  52 6f 63 6b 79 20 43 61  6e 20 44 6f 20 49 74 21  |Rocky Can Do It!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 6c. View the contents of the source volume data-base.img
$ hexdump -C data-base.img
00000000  57 6f 6e 64 65 72 66 75  6c 20 57 6f 72 6c 64 21  |Wonderful World!|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  52 6f 63 6b 79 20 43 61  6e 20 44 6f 20 49 74 21  |Rocky Can Do It!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 6d. Check the md5 values after the first write to origin
$ sudo md5sum data-base.img /dev/mapper/origin /dev/mapper/snapshot data-cow.img 
feaabda81f5f031cba6ed20c1914dff1  data-base.img
feaabda81f5f031cba6ed20c1914dff1  /dev/mapper/origin
54bc2af511613f6405b27144c410fb94  /dev/mapper/snapshot
b7ce3102b418858009e5a3662d8a6a5a  data-cow.img

Compared with the values before the write, the md5 of data-base.img, /dev/mapper/origin and data-cow.img have changed, while /dev/mapper/snapshot, which was not touched, keeps the same md5.

Figure 3. Changes in the md5 values of each device before and after the first write to origin

The changes to data-base.img and /dev/mapper/origin are easy to understand: we changed the content of the source volume at 0x0000 from "Great China!" to "Wonderful World!".

What about the COW device? Let's look at its data:

# step 6e. View the data in data-cow.img
$ hexdump -C data-cow.img 
00000000  53 6e 41 70 01 00 00 00  01 00 00 00 08 00 00 00  |SnAp............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00001010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00002010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00003000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
03200000

Before explaining this data, recall that the chunk size specified when creating the snapshot device is 8, so one chunk is 4K (0x1000) bytes.

Now let me explain the results here:

  1. Area: 0x00000000-0x00000fff, 1 chunk, holding the disk header with the following fields:

    • magic: 53 6e 41 70, corresponding to "SnAp";
    • valid: 01 00 00 00, the value is 1;
    • version: 01 00 00 00, the value is 1;
    • chunk_size: 08 00 00 00, the value is 8, corresponding to 512 x 8 = 4KB chunk data

    The parts other than the disk header are all filled with 0;

  2. Area: 0x00001000-0x00001fff, 1 chunk, holding the COW mapping table. It has only one entry so far, 00 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00, which maps chunk 0 of the origin to chunk 2 of the COW device; the rest of the chunk is all 0.

  3. Area: 0x00002000-0x00002fff, 1 chunk, holding the old content of the first chunk (0x0000~0x0fff) of the origin device.

In summary, on the first write to offset 0x0000 of the origin, the old data of that chunk is saved to chunk 2 of the COW device (numbering from 0), and the new data is written directly to the origin.
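Each 16-byte mapping entry reads as two little-endian 64-bit numbers: the chunk number on the origin and the chunk number on the COW device that holds its data. The sketch below lists those entries. It assumes the kernel's persistent exception format (old_chunk and new_chunk per entry, with new_chunk == 0 ending the table, since chunk 0 holds the header) and only reads the first mapping area in chunk 1, which is enough for a small experiment like this one; cow_exceptions is a hypothetical name and GNU od is required for --endian.

```shell
# cow_exceptions DEV [CHUNK_SECTORS]  (hypothetical helper; needs GNU od)
# List the entries of the first mapping area, stored in chunk 1 of a
# persistent COW device. Each entry is two little-endian 64-bit numbers:
# old_chunk (on the origin) and new_chunk (on the COW device).
cow_exceptions() {
  chunk_bytes=$(( ${2:-8} * 512 ))
  # -v disables od's "*" line folding; 16 bytes per line = one entry per line
  od -v -A n -t u8 --endian=little -j "$chunk_bytes" -N "$chunk_bytes" "$1" |
  while read -r old new; do
    if [ "$new" = 0 ]; then
      break                       # new_chunk == 0 marks the end of the table
    fi
    echo "origin chunk $old -> COW chunk $new"
  done
}

# Usage: cow_exceptions data-cow.img 8
```

Run on data-cow.img after step 6, it should print origin chunk 0 -> COW chunk 2; after the ROW write in step 8, a second line, origin chunk 1 -> COW chunk 3, should appear.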

4.2 Write data for the second time

Write the string "Go away, COVID-19!" to the origin device starting at 0x0000. No Copy-On-Write is expected:

  • This time the new data is written directly to the source volume data-base.img.

#
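# step 7. The second write to 0x0000 of the origin device does not trigger Copy-On-Write
#
# step 7a. Write "Go away, COVID-19!" to the origin device starting at 0x0000
$ echo -n "Go away, COVID-19!" | xxd -c 18
00000000: 476f 2061 7761 792c 2043 4f56 4944 2d31 3921  Go away, COVID-19!
# Note: xxd -r reads at most -c bytes per line (16 by default), so the command below
# only writes "Go away, COVID-1", as step 7b shows; use xxd -r -c 18 to write all 18 bytes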
$ echo -n "00000000: 476f 2061 7761 792c 2043 4f56 4944 2d31 3921" | sudo xxd -r - /dev/mapper/origin

# step 7b. View the contents of the origin device
$ sudo hexdump -C /dev/mapper/origin
00000000  47 6f 20 61 77 61 79 2c  20 43 4f 56 49 44 2d 31  |Go away, COVID-1|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  52 6f 63 6b 79 20 43 61  6e 20 44 6f 20 49 74 21  |Rocky Can Do It!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 7c. View the contents of the source volume data-base.img
$ hexdump -C data-base.img
00000000  47 6f 20 61 77 61 79 2c  20 43 4f 56 49 44 2d 31  |Go away, COVID-1|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  52 6f 63 6b 79 20 43 61  6e 20 44 6f 20 49 74 21  |Rocky Can Do It!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 7d. Check the md5 values after the second write to origin
$ sudo md5sum data-base.img /dev/mapper/origin /dev/mapper/snapshot data-cow.img
b566e218b059f01f4aac7ad185845fe8  data-base.img
b566e218b059f01f4aac7ad185845fe8  /dev/mapper/origin
54bc2af511613f6405b27144c410fb94  /dev/mapper/snapshot
b7ce3102b418858009e5a3662d8a6a5a  data-cow.img

Compared with the md5 values after the first write, data-base.img and /dev/mapper/origin have changed, but the snapshot volume data-cow.img has not, confirming that a second write to the same chunk does not trigger a COW operation.

Figure 4. Changes in the md5 values of each device before and after the second write to origin

This can also be confirmed by viewing the contents of the snapshot volume data-cow.img:

# step 7e. View the data in data-cow.img
$ hexdump -C data-cow.img 
00000000  53 6e 41 70 01 00 00 00  01 00 00 00 08 00 00 00  |SnAp............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00001010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00002010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00003000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
03200000

Therefore, when writing data to the location 0x0000 of the origin device for the second time, the new data is directly written into the source volume data-base.img, without any write operation on the snapshot volume data-cow.img.

5. Verify ROW operation

ROW is verified by writing to the snapshot device.

According to the conclusions of the article "Linux snapshot (snapshot) principle and practice (1)", for ROW:

  • When new data is written to the snapshot device for the first time, it is redirected to the snapshot volume data-cow.img, and the source volume data-base.img does not change.

  • When the same location is written again, the data in the source volume data-base.img still stays unchanged and the write is again redirected to the snapshot volume data-cow.img.

5.1 Write data for the first time

Write the string "You Can Do It!!!" to the snapshot device starting at 0x1000. A Redirect-On-Write is expected:

  • The new data "You Can Do It!!!" is redirected to the snapshot volume data-cow.img;
  • The data in the source volume data-base.img remains unchanged.

#
# step 8. The first write to 0x1000 (4KB) of the snapshot device triggers Redirect-On-Write
#
# step 8a. Write "You Can Do It!!!" to the snapshot device starting at 0x1000
$ echo -n 'You Can Do It!!!' | xxd
00000000: 596f 7520 4361 6e20 446f 2049 7421 2121  You Can Do It!!!

# Be sure to change the offset to 00001000
$ echo -n "00001000: 596f 7520 4361 6e20 446f 2049 7421 2121" | sudo xxd -r - /dev/mapper/snapshot

# step 8b. View the contents of the snapshot device
$ sudo hexdump -C /dev/mapper/snapshot 
00000000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  59 6f 75 20 43 61 6e 20  44 6f 20 49 74 21 21 21  |You Can Do It!!!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 8c. View the contents of the source volume data-base.img
$ hexdump -C data-base.img 
00000000  47 6f 20 61 77 61 79 2c  20 43 4f 56 49 44 2d 31  |Go away, COVID-1|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  52 6f 63 6b 79 20 43 61  6e 20 44 6f 20 49 74 21  |Rocky Can Do It!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 8d. Check the md5 values after the first write to snapshot
$ sudo md5sum data-base.img /dev/mapper/origin /dev/mapper/snapshot data-cow.img 
b566e218b059f01f4aac7ad185845fe8  data-base.img
b566e218b059f01f4aac7ad185845fe8  /dev/mapper/origin
27ceb83bde809ec2288a2cb3493cf033  /dev/mapper/snapshot
effae9af0f5a4aa453af2185b741e300  data-cow.img

This time the snapshot device and the snapshot volume data-cow.img have changed, but the source volume data-base.img has not, as shown in the following figure:

Figure 5. md5 changes before and after the first write to the snapshot device

Let’s take a look at the contents of the snapshot volume data-cow.img:

# step 8e. View the data in data-cow.img
$ hexdump -C data-cow.img 
00000000  53 6e 41 70 01 00 00 00  01 00 00 00 08 00 00 00  |SnAp............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00001010  01 00 00 00 00 00 00 00  03 00 00 00 00 00 00 00  |................|
00001020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00002010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00003000  59 6f 75 20 43 61 6e 20  44 6f 20 49 74 21 21 21  |You Can Do It!!!|
00003010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00004000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
03200000

It can be seen that the data "You Can Do It!!!" written to the snapshot device appears in the snapshot volume data-cow.img.

Let me explain the contents of the snapshot volume data-cow.img:

  1. Area: 0x00000000-0x00000fff, 1 chunk, holding the 16-byte disk header; the rest is filled with 0;

    # 16-byte disk header
    00000000  53 6e 41 70 01 00 00 00  01 00 00 00 08 00 00 00  |SnAp............|
    
    # The rest is filled with 0
    00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    
  2. Area: 0x00001000-0x00001fff, 1 chunk. Besides the COW mapping entry created in the COW experiment, it now also holds the ROW mapping entry created in the ROW experiment, which maps chunk 1 to COW chunk 3; the rest is all 0.

    # COW mapping table
    00001000  00 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
    
    # ROW mapping table
    00001010  01 00 00 00 00 00 00 00  03 00 00 00 00 00 00 00  |................|
    
    # The rest is filled with 0
    00001020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    
  3. Area: 0x00002000-0x00002fff, 1 chunk, holding the origin data saved by the first write in the COW experiment.

    00002000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
    00002010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
    *
    
  4. Area: 0x00003000-0x00003fff, 1 chunk, holding the data redirected here by the ROW write.

    00003000  59 6f 75 20 43 61 6e 20  44 6f 20 49 74 21 21 21  |You Can Do It!!!|
    00003010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
    *
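Because COW chunk N starts at byte N x chunk size, the data a mapping entry points to can be extracted directly. A minimal sketch with dd; cow_chunk is a hypothetical helper and the default of 8 sectors per chunk matches this experiment.

```shell
# cow_chunk DEV N [CHUNK_SECTORS]  (hypothetical helper)
# Copy chunk N of DEV to stdout; chunk N starts at byte N * chunk size.
cow_chunk() {
  chunk_bytes=$(( ${3:-8} * 512 ))
  dd if="$1" bs="$chunk_bytes" skip="$2" count=1 2>/dev/null
}

# For the layout in step 8e:
#   cow_chunk data-cow.img 2 | head -c 12     # Great China!
#   cow_chunk data-cow.img 3 | head -c 16     # You Can Do It!!!
```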
    

5.2 Write data for the second time

Write to the snapshot device starting at 0x1000 again, this time the string "Everyone Can Do!":

  • The new data "Everyone Can Do!" is also redirected to the snapshot volume data-cow.img;
  • The data in the source volume data-base.img remains unchanged;
  • Watch whether the previously redirected data "You Can Do It!!!" is kept.

#
# step 9. The second write to 0x1000 (4KB) of the snapshot device is again redirected (Redirect-On-Write)
#
# step 9a. Write "Everyone Can Do!" to the snapshot device starting at 0x1000
$ echo -n 'Everyone Can Do!' | xxd
00000000: 4576 6572 796f 6e65 2043 616e 2044 6f21  Everyone Can Do!

# Remember to change the offset to 00001000
$ echo -n '00001000: 4576 6572 796f 6e65 2043 616e 2044 6f21' | sudo xxd -r - /dev/mapper/snapshot 

# step 9b. View the contents of the snapshot device
$ sudo hexdump -C /dev/mapper/snapshot 
00000000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  45 76 65 72 79 6f 6e 65  20 43 61 6e 20 44 6f 21  |Everyone Can Do!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 9c. View the contents of the source volume data-base.img
$ hexdump -C data-base.img 
00000000  47 6f 20 61 77 61 79 2c  20 43 4f 56 49 44 2d 31  |Go away, COVID-1|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  52 6f 63 6b 79 20 43 61  6e 20 44 6f 20 49 74 21  |Rocky Can Do It!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 9d. Check the md5 values after the second write to snapshot
$ sudo md5sum data-base.img /dev/mapper/origin /dev/mapper/snapshot data-cow.img
b566e218b059f01f4aac7ad185845fe8  data-base.img
b566e218b059f01f4aac7ad185845fe8  /dev/mapper/origin
5e974c6c78213a70a77d4757cb0b8205  /dev/mapper/snapshot
7766259753fa0a3a75798e4a6416e2bc  data-cow.img

The following is the comparison result of md5 value:

Figure 6. md5 changes before and after the second write to the snapshot device

As shown above, the content of the snapshot device has changed: because of write redirection, the new data is again written to the snapshot volume data-cow.img, while the source volume data-base.img is unchanged.

Let's take a look at the contents of the snapshot volume data-cow.img:

# step 9e. View the data in data-cow.img
$ hexdump -C data-cow.img 
00000000  53 6e 41 70 01 00 00 00  01 00 00 00 08 00 00 00  |SnAp............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00001010  01 00 00 00 00 00 00 00  03 00 00 00 00 00 00 00  |................|
00001020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00002010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00003000  45 76 65 72 79 6f 6e 65  20 43 61 6e 20 44 6f 21  |Everyone Can Do!|
00003010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00004000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
03200000

This shows that "You Can Do It!!!", redirected in step 8, has been overwritten in place in data-cow.img by the new data "Everyone Can Do!": the data still sits in COW chunk 3 (0x3000) and the mapping table is unchanged, so no new chunk was allocated.

6. Create a snapshot-merge target

According to the conclusions of the article "Linux snapshot (snapshot) principle and practice (1)", snapshot-merge merges the data in the snapshot volume data-cow.img back into the source volume data-base.img:

  • For COW, the source volume data-base.img holds the latest data, and the merge rolls it back to the data at the snapshot point in time.

  • For ROW, the source volume data-base.img holds the data at the snapshot point in time, and the merge updates it to the latest data.

snapshot-merge works together with the other two targets, snapshot-origin and snapshot:

  • snapshot-merge takes the same parameters as snapshot, but only works with persistent snapshots;
  • snapshot-merge takes over the role of snapshot-origin, so it must not be loaded while the source volume still has a snapshot-origin device.

Normally, a snapshot-merge target device named merge can be created directly:

# step 10. Create a new snapshot-merge target device directly on the loop devices of the source and snapshot volumes
$ sudo dmsetup create merge --table '0 204800 snapshot-merge /dev/loop3 /dev/loop6 P 8'

Here, however, as Figure 2 shows, the source volume data-base.img is still associated with two devices:

  • the snapshot-origin device origin,
  • and the snapshot device snapshot.

Therefore, the step 10 command cannot be used directly to create the snapshot-merge target here.

Before the snapshot-merge target can be created, the snapshot-origin device mapped to the source volume data-base.img has to be taken out of service.

Since snapshot-merge takes over the role of snapshot-origin, the simplest approach is to turn the existing origin device into the snapshot-merge device, as follows:

#
# step 11. Create the snapshot-merge target on the loop devices of the source and snapshot volumes
#
# step 11a. Suspend the snapshot-origin device origin bound to the source volume data-base.img
$ sudo dmsetup suspend origin
# step 11b. Remove the snapshot device snapshot mapped to the source volume data-base.img
$ sudo dmsetup remove snapshot
# step 11c. Use reload to change origin from a snapshot-origin target into a snapshot-merge target
$ sudo dmsetup reload origin --table '0 204800 snapshot-merge /dev/loop3 /dev/loop6 P 8'
# step 11d. Resume origin, which is now a snapshot-merge target device
$ sudo dmsetup resume origin

Creating the snapshot-merge device involves the following steps:

  1. Suspend the origin device;
  2. Remove the snapshot mapping;
  3. Reuse the snapshot parameters to switch the origin device from the snapshot-origin target to the snapshot-merge target;
  4. Resume the origin device; since it is now a snapshot-merge target, it starts merging internally.
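Steps 11a to 11d can be collected into one function. This is only a sketch under the assumptions of this experiment (the device names origin and snapshot, /dev/loop3, /dev/loop6, 204800 sectors, chunk size 8); DMSETUP is a hypothetical variable that defaults to sudo dmsetup and can be set to echo to dry-run the sequence.

```shell
# Sketch of steps 11a-11d. DMSETUP is a hypothetical override:
# DMSETUP=echo prints the commands instead of running them.
DMSETUP=${DMSETUP:-"sudo dmsetup"}

# start_merge ORIGIN_NAME SNAP_NAME ORIGIN_DEV COW_DEV SECTORS CHUNK
start_merge() {
  table="0 $5 snapshot-merge $3 $4 P $6"
  $DMSETUP suspend "$1" || return 1                 # 11a: suspend origin
  $DMSETUP remove "$2" || return 1                  # 11b: remove the snapshot target
  $DMSETUP reload "$1" --table "$table" || return 1 # 11c: load table (inactive until resume)
  $DMSETUP resume "$1"                              # 11d: activate it; merging starts
}

# For this experiment:
#   start_merge origin snapshot /dev/loop3 /dev/loop6 204800 8
```

The function stops at the first failing command; if that happens after step 11a, origin is left suspended and has to be resumed by hand with dmsetup resume origin.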

7. Verify the merge operation

Once origin, now a snapshot-merge target, is running, how can we confirm that the merge has finished?

The answer is to check the status of the snapshot-merge device.

In this demonstration, that means checking the status of origin, the current snapshot-merge device:

# step 12a. Check the status of the origin device
$ sudo dmsetup status origin
0 204800 snapshot-merge 16/102400 16

The last three fields of the dmsetup status output shown here are:

# <sectors_allocated>/<total_sectors> <metadata_sectors>
  16                 /102400          16

<sectors_allocated> and <total_sectors> both include data and metadata.

During the merge process, the number of allocated sectors will decrease.

Merging is done when the number of sectors holding data is zero, in other words <sectors_allocated> == <metadata_sectors>.

The status returned in step 12a describes the snapshot volume (COW device), here the 50M data-cow.img:

  • <total_sectors>: the device has 102400 sectors in total, because 50M = 512 x 102400;
  • <sectors_allocated>: 16 sectors are currently allocated;
  • <metadata_sectors>: as analyzed earlier, the disk header and the mapping table each occupy 1 chunk, 2 chunks = 16 sectors in total, which is why the metadata takes 16 sectors.
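The completion rule can be checked mechanically. A minimal sketch: the hypothetical helper merge_done takes a status line in the format of step 12a and succeeds once <sectors_allocated> equals <metadata_sectors>; it assumes a normal snapshot-merge status line and treats anything with fewer fields as not done.

```shell
# merge_done STATUS_LINE  (hypothetical helper)
# STATUS_LINE is dmsetup status output such as
#   0 204800 snapshot-merge 16/102400 16
# Field 4 is <sectors_allocated>/<total_sectors>, field 5 <metadata_sectors>.
merge_done() {
  set -- $1                      # split the line into fields
  [ $# -ge 5 ] || return 1       # unexpected format: treat as not done
  allocated=${4%%/*}             # keep the part before the slash
  [ "$allocated" = "$5" ]        # done once only metadata is allocated
}

# Wait for the merge started in step 11 to finish:
#   until merge_done "$(sudo dmsetup status origin)"; do sleep 1; done
```

In a real script the wait loop would also need a timeout, since an error status never satisfies the check.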

Therefore, the current status shows:

The device has allocated 16 sectors in total, and the metadata occupies exactly those 16 sectors. The space that held the modified data has been released, which means the merge is complete.

Only two chunks (two 4K regions) were modified in this demonstration, so writing them back is practically instantaneous. That is why the status already showed the merge as complete when I checked it right after resuming origin.

Let's look at the data in the source volume data-base.img and the snapshot volume data-cow.img after the merge is completed.

First, look at the origin device and the source volume data-base.img:

# step 12b. View the contents of the origin device
$ sudo hexdump -C /dev/mapper/origin 
00000000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  45 76 65 72 79 6f 6e 65  20 43 61 6e 20 44 6f 21  |Everyone Can Do!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000

# step 12c. View the contents of the source volume data-base.img
$ hexdump -C data-base.img 
00000000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00001000  45 76 65 72 79 6f 6e 65  20 43 61 6e 20 44 6f 21  |Everyone Can Do!|
00001010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
06400000
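Instead of comparing the two dumps by eye, cmp can confirm that the origin device and data-base.img hold the same bytes. same_content is a hypothetical wrapper, a sketch rather than part of the original steps.

```shell
#!/bin/sh
# same_content A B: compare two files or block devices byte by byte.
# Prints "identical" and returns 0 if they match; otherwise prints the first
# difference reported by cmp and returns 1.
same_content() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        cmp "$1" "$2" 2>&1 | head -n 1
        return 1
    fi
}

# Usage (reading /dev/mapper/origin requires a root shell):
#   same_content /dev/mapper/origin data-base.img
```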

These two dumps show that the data stored in the snapshot volume by both the COW and the ROW operations has been merged back into the source volume.

  • For the COW data, the area 0x0000 - 0x0fff (1 chunk) of the source volume data-base.img has been restored with the old data saved in the snapshot volume data-cow.img.

  • For the ROW data, the area 0x1000 - 0x1fff (1 chunk) of the source volume data-base.img has been updated with the new data written to the snapshot volume data-cow.img.
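With the 4K (8-sector) chunk size used in this experiment, the chunk that contains a byte offset is simply offset / 4096. chunk_range below is a hypothetical helper that prints that chunk and its byte range, which makes it easy to map any hexdump offset back to a chunk.

```shell
#!/bin/sh
# chunk_range OFFSET: print the chunk index containing a byte offset and that
# chunk's byte range. Assumes the 4096-byte (8-sector) chunk size of this
# experiment; OFFSET may be decimal or 0x-prefixed hex.
CHUNK_BYTES=4096

chunk_range() {
    off=$(($1))
    idx=$((off / CHUNK_BYTES))
    start=$((idx * CHUNK_BYTES))
    end=$((start + CHUNK_BYTES - 1))
    printf 'chunk %d: 0x%04x - 0x%04x\n' "$idx" "$start" "$end"
}

# Example: "Everyone Can Do!" at offset 0x1000 lies in chunk 1.
#   chunk_range 0x1000    # prints: chunk 1: 0x1000 - 0x1fff
```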

Look at the snapshot volume data-cow.img:

# step 12d. View the data in the data-cow.img file
$ hexdump -C data-cow.img 
00000000  53 6e 41 70 01 00 00 00  01 00 00 00 08 00 00 00  |SnAp............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  47 72 65 61 74 20 43 68  69 6e 61 21 ff ff ff ff  |Great China!....|
00002010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00003000  45 76 65 72 79 6f 6e 65  20 43 61 6e 20 44 6f 21  |Everyone Can Do!|
00003010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00004000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
03200000

At first glance, the data looks almost the same as before the merge, but something seems to be missing.

The following figure compares the data before and after the merge:

[Figure: md5-diff-for-merge.png]

The comparison makes it clear that after the merge the mapping table area (0x1000-0x1fff) has been cleared to 0; in other words, all mappings have been invalidated.
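The same observation can be checked from the command line. The sketch below is hypothetical and assumes the layout visible in the step 12d dump: chunk 0 holds a header of four little-endian 32-bit fields (magic, valid, version, chunk_size, which is my reading of the kernel's persistent exception store), chunk 1 holds the mapping table, and chunks 2 onward hold data. On data-cow.img, show_header should report magic 0x70416e53 ("SnAp"), valid 1, version 1 and chunk_size 8, and nonzero_bytes for chunk 1 should report 0 once the mapping table has been cleared.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a persistent snapshot COW file such as
# data-cow.img. Assumed layout (matching the dump in step 12d):
#   chunk 0: header; chunk 1: mapping table; chunks 2+: data; 4096-byte chunks
CHUNK_BYTES=4096

# le32_at FILE OFFSET: print the little-endian 32-bit value stored at OFFSET
le32_at() {
    set -- $(od -An -v -t u1 -j "$2" -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# show_header FILE: decode the 16-byte header at the start of the COW file
show_header() {
    printf 'magic=0x%08x valid=%d version=%d chunk_size=%d\n' \
        "$(le32_at "$1" 0)" "$(le32_at "$1" 4)" \
        "$(le32_at "$1" 8)" "$(le32_at "$1" 12)"
}

# nonzero_bytes FILE CHUNK_INDEX: count the non-zero bytes in one chunk
nonzero_bytes() {
    dd if="$1" bs="$CHUNK_BYTES" skip="$2" count=1 2>/dev/null |
        tr -d '\000' | wc -c | tr -d ' '
}

# Usage against the file from this experiment:
#   show_header data-cow.img        # header fields
#   nonzero_bytes data-cow.img 1    # mapping table chunk
```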

Something to think about:

Why are the data areas (0x2000-0x2fff and 0x3000-0x3fff) in the snapshot volume data-cow.img not cleared after the merge?

After the merge is complete, the origin device mapped to the snapshot-merge target can be removed:

# step 13. Remove the origin device
$ sudo dmsetup remove origin

Alternatively, if desired, switch the origin device from the snapshot-merge target back to the snapshot-origin target:

#
# step 14. Change the origin device from the snapshot-merge target to the snapshot-origin target
#
# step 14a. Suspend the origin device
$ sudo dmsetup suspend origin
# step 14b. Swap the table in place: use reload to turn origin into a snapshot-origin target
$ sudo dmsetup reload origin --table "0 204800 snapshot-origin /dev/loop3"
# step 14c. Resume the origin device
$ sudo dmsetup resume origin
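Step 14 can also be wrapped in a single function that computes the table length from the origin device instead of hard-coding 204800. This is a sketch: to_snapshot_origin is a hypothetical name, blockdev --getsz (util-linux) is used to read the device size in 512-byte sectors, and DMSETUP/BLOCKDEV can be overridden, for example with DMSETUP="echo dmsetup" for a dry run that only prints the commands.

```shell
#!/bin/sh
# Commands used by the wrapper; override them for a dry run.
DMSETUP=${DMSETUP:-"sudo dmsetup"}
BLOCKDEV=${BLOCKDEV:-"sudo blockdev"}

# to_snapshot_origin DM_NAME ORIGIN_DEV
# Replace the table of DM_NAME with a snapshot-origin target covering ORIGIN_DEV.
to_snapshot_origin() {
    name=$1 dev=$2
    sectors=$($BLOCKDEV --getsz "$dev") || return 1   # size in 512-byte sectors
    table="0 $sectors snapshot-origin $dev"
    $DMSETUP suspend "$name" || return 1               # step 14a
    if ! $DMSETUP reload "$name" --table "$table"; then  # step 14b
        $DMSETUP resume "$name"    # keep the old table, but do not stay suspended
        return 1
    fi
    $DMSETUP resume "$name"                            # step 14c
}

# Usage:
#   to_snapshot_origin origin /dev/loop3
```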

This completes all of our experiments verifying how snapshots work.

With that, both the principles and the hands-on practice of "Linux snapshot (snapshot) principle and practice" have been covered.

8. Epilogue

The two articles of "Linux snapshot (snapshot) principle and practice" took nearly a month from planning to the final draft. Along the way, the drafts were revised many times and the experiments were adjusted repeatedly.

Although the principles of Linux snapshots and the experiments have been explained, words can only convey so much, and I found that many topics are still not explained clearly, such as:

  • How the various snapshot devices under Linux relate to and interact with each other
  • Details of the COW device
  • An analysis of the driver code
  • How to expand the snapshot device
  • How to debug the snapshot device, etc.

However, the original goal of writing these articles has been reached, so the introduction to snapshot devices under Linux ends here. Whether this series will be extended later is still to be decided.

That said, my recent analysis shows that the Linux device mapper is a real treasure trove, so I may well continue digging into it in the future.

If you have any questions or find any errors in the description, feel free to discuss them with me on WeChat: reply "wx" to the official account ("Rocky Watching the World") to get the QR code.


Origin blog.csdn.net/guyongqiangx/article/details/128496471