Snapshot technology

Table of Contents

Introduction: The principle of snapshots in Linux

1. Snapshot overview

2. Related terms

2.1 Mapping table

2.2 COW technology

3. The principle of snapshot function 

4. Configuration process

5. Application scenarios


Introduction: The principle of snapshots in Linux

Several concepts:

Metadata: The descriptive information stored about a file is called metadata. It usually includes the location of the data on the hard disk, permissions, creator, timestamps, and so on, and it is very small. The mapping between a file name and its inode number is stored in the directory that contains the file. On Linux, use ls -i to view inode numbers.

Inode number: In Linux, each file has a corresponding inode number. The inode number locates the file's metadata on the hard disk, and the metadata in turn locates the actual data. The basic unit of data is the data block.
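The name-to-inode relationship described above can be inspected from a program as well as with ls -i. The following is a minimal sketch using Python's standard os module; the helper names inode_of and directory_inode_map are made up for this illustration.

```python
import os
import tempfile


def inode_of(path):
    """Return the inode number recorded in the file's metadata."""
    return os.stat(path).st_ino


def directory_inode_map(directory):
    """Map each entry name in a directory to its inode number, like `ls -i`."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w") as f:
            f.write("hello")
        # The directory listing and the file's own metadata report the same inode.
        print(inode_of(path), directory_inode_map(tmp))
```

The directory scan returns the same number as the stat call because the directory entry only stores the name and the inode number; everything else lives in the metadata the inode number points to.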

Principle:

Taking a snapshot actually copies only the metadata. Because the metadata is very small, this takes very little time. The snapshot associates the copied metadata with the existing data blocks, so that when the snapshot is used, the location of each data block can be found by reading the copied metadata. As long as the file data has not changed, the data read through the snapshot is the data as it was when the snapshot was taken. When the file data does change, the system first copies the original data block to a snapshot data block, then removes the association between the snapshot metadata and the original data block, and finally points the snapshot metadata to the new block in the snapshot's data block space. This process applies only to data that changes.

Note: A snapshot is an ongoing process that starts when the snapshot is taken and continues until the snapshot is restored. During this period, whenever data changes, the snapshot metadata and the snapshot data blocks change accordingly. When the snapshot is restored, unchanged data blocks need no operation; only the changed data blocks are overwritten.
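The principle above can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: one data block per file, a single snapshot at a time, and a hypothetical TinyVolume class that does not correspond to any real file system API.

```python
class TinyVolume:
    """Toy model: metadata maps a file name to a block id; blocks hold the data."""

    def __init__(self):
        self.blocks = {}      # block id -> data
        self.metadata = {}    # file name -> block id (live metadata)
        self.snapshot = None  # copied metadata, or None when no snapshot exists
        self._next_id = 0

    def _alloc(self, data):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id

    def take_snapshot(self):
        # Only the small metadata table is copied; no data block is duplicated.
        self.snapshot = dict(self.metadata)

    def write(self, name, data):
        block_id = self.metadata.get(name)
        if block_id is None:
            # New file: allocate a block and record it in the live metadata.
            self.metadata[name] = self._alloc(data)
            return
        # First change after the snapshot: copy the original block aside and
        # point the snapshot metadata at the copy.
        if self.snapshot is not None and self.snapshot.get(name) == block_id:
            self.snapshot[name] = self._alloc(self.blocks[block_id])
        self.blocks[block_id] = data  # overwrite the live block in place

    def read_snapshot(self, name):
        return self.blocks[self.snapshot[name]]
```

With this model, writing "old" to a file, taking a snapshot, and then writing "new" leaves read_snapshot returning "old", and the extra block is allocated only at the moment of the first overwrite.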

1. Snapshot overview

Definition: A snapshot is a consistent copy of the source data at a certain point in time. Once generated, it can be read by the host or used as a backup of the data at that point in time.

The main features of snapshots include:

  • Instant generation: The storage system can generate a snapshot in a few seconds to obtain a consistent copy of the source data.
  • Small storage footprint: The generated snapshot is not a complete physical copy of the data, so it takes up very little storage space even when the amount of source data is large.

Snapshots not only generate a consistent copy of the source volume at a certain point in time quickly, they also provide a mechanism for restoring the source volume data. When data on the source volume is accidentally deleted, corrupted, or infected by a virus, a snapshot rollback can quickly restore the source volume (the source LUN) to its contents at the time the snapshot was activated, reducing the loss of source volume data.
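A rollback of this kind only has to touch blocks that changed after the snapshot point. Below is a minimal sketch, assuming the volume is modeled as a dictionary of block address to data and that the snapshot-time contents of every overwritten block were saved aside; the rollback function and its arguments are hypothetical names for this illustration.

```python
def rollback(source, preserved):
    """Restore source blocks to their snapshot-time contents.

    source:    dict of block address -> current data on the source volume
    preserved: dict of block address -> data saved before it was overwritten
    Blocks absent from `preserved` never changed, so they are left untouched.
    Returns the number of blocks that were restored.
    """
    for address, original in preserved.items():
        source[address] = original
    return len(preserved)


if __name__ == "__main__":
    source = {0: "DataX", 1: "Data2"}  # block 0 was overwritten after the snapshot
    preserved = {0: "Data1"}           # its snapshot-time contents
    print(rollback(source, preserved), source)
```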

2. Related terms

Source volume: The volume that holds the source data to be snapshotted. It is presented to the user as a LUN. A source volume consists of a Meta Volume and a Data Volume.

– Meta Volume: Records the location of the source data in the source volume.

– Data Volume: Records the data stored in the source volume.

COW data space: After a snapshot is generated and activated, this is the storage space allocated from the storage pool to hold the activation-time data of the source volume whenever a copy-on-write occurs. All snapshot volumes of the same source volume share one COW data space.

Snapshot volume: The logical data copy generated by creating a snapshot of the source volume. It is presented to the user as a snapshot LUN.

Snapshot rollback: Copying the data of the snapshot LUN back to the source LUN, so that the source LUN is restored to its contents at the time the snapshot LUN was generated.

Mapping table: Records the changes to the source volume data and the snapshot volume data after a given point in time, together with the storage location of the changed data. It is divided into a shared part and an exclusive part.

Inactive: A snapshot state. In this state the snapshot is unavailable; it can be used only after it has been activated.

  • COW (Copy on Write): Data is copied before it is overwritten. Before the host writes to the source LUN of a snapshot, the protected data on the source LUN is copied elsewhere. Once the copy completes, the host I/O proceeds and writes to the source LUN.
  • COW data space: The space that stores the protected data of the source LUN during copy-on-write. The user does not need to configure this space. It is requested from the storage pool when a copy-on-write occurs on the source LUN, and it does not consume the source LUN's own data space. If no I/O is issued to the source LUN after the snapshot is activated, no copy-on-write takes place and the COW data space actually in use is 0.
  • Shared mapping table: After the snapshot is activated, changes made to the source LUN of the snapshot are recorded in the shared mapping table. All snapshots of a source LUN share this mapping table.
  • Exclusive mapping table: After the snapshot is activated, data written to the snapshot volume is recorded in the exclusive mapping table. Each snapshot has its own exclusive mapping table.

2.1 Mapping table

The mapping table expresses the mapping relationship of snapshot data; this relationship is referred to as a "pointer".

  • The left field of a mapping entry is the source address, which is used as the lookup key;
  • The right field records the address of the resource block;
  • Entries can be added to and deleted from the table;
  • The table is stored as a B+ tree.

The mapping table indicates where the actual data of a snapshot is located. There are two kinds: the exclusive mapping table and the shared mapping table. They work the same way; the difference is that the exclusive mapping table records changes made by writes to the snapshot, while the shared mapping table records changes made by writes to the source LUN.
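The key/value behavior of such a table can be sketched in a few lines. Note the substitution: a sorted list with binary search (Python's bisect module) stands in for the B+ tree here, since it keeps keys ordered in a similar spirit but is not a B+ tree. The MappingTable class and its method names are assumptions made for this illustration.

```python
import bisect


class MappingTable:
    """Source address (key) -> resource block address (value), kept in key order."""

    def __init__(self):
        self._keys = []    # sorted source addresses
        self._values = []  # resource block addresses, parallel to _keys

    def _find(self, source_addr):
        i = bisect.bisect_left(self._keys, source_addr)
        found = i < len(self._keys) and self._keys[i] == source_addr
        return i, found

    def add(self, source_addr, resource_addr):
        """Insert a new entry, or update the entry if the key already exists."""
        i, found = self._find(source_addr)
        if found:
            self._values[i] = resource_addr
        else:
            self._keys.insert(i, source_addr)
            self._values.insert(i, resource_addr)

    def delete(self, source_addr):
        """Remove an entry; return True if it existed."""
        i, found = self._find(source_addr)
        if found:
            del self._keys[i]
            del self._values[i]
        return found

    def lookup(self, source_addr):
        """Return the resource block address, or None if there is no entry."""
        i, found = self._find(source_addr)
        return self._values[i] if found else None
```

A lookup that returns None corresponds to "no mapping entry exists", which is the condition the write path checks before deciding whether a copy is needed.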

2.2 COW technology

After the snapshot is activated (this moment is called the snapshot time point), a host write to the source LUN is handled as follows:

  1. After the snapshot is activated, the host writes data to the source LUN.
  2. The snapshot mapping table is queried first. If no mapping entry exists for the target address, a copy-on-write is required; once it completes, the location of the preserved source LUN data is recorded in the mapping table. If a mapping entry already exists, the data is written directly over the corresponding location on the source LUN.
  3. Copy-on-write means reading the data from the corresponding location on the source LUN and writing it to the COW volume space.
  4. The COW volume space and the source LUN space are allocated from the same pool, so writing to the COW volume means writing to space in that pool.
  5. After the copy-on-write completes, the host data is written to the pool space where the source LUN resides.
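The write steps above can be sketched as a small model. This is an illustration under assumptions, not a storage array's implementation: the source LUN and the COW space are plain dictionaries, COW addresses are handed out sequentially, and the CowSource class is a made-up name.

```python
class CowSource:
    """Toy source LUN with a shared mapping table and a COW data space."""

    def __init__(self, blocks):
        self.source = dict(blocks)  # address -> data on the source LUN
        self.cow_space = {}         # COW address -> preserved data
        self.mapping = {}           # source address -> COW address
        self.copies = 0             # number of copy-on-write operations so far

    def write(self, address, data):
        # Step 2: query the mapping table for the target address.
        if address not in self.mapping:
            # Step 3: read the original data and write it to the COW space.
            cow_address = len(self.cow_space)
            self.cow_space[cow_address] = self.source.get(address)
            # Record where the preserved data now lives.
            self.mapping[address] = cow_address
            self.copies += 1
        # Step 5: write the host data to the source LUN.
        self.source[address] = data


if __name__ == "__main__":
    lun = CowSource({0: "Data1"})
    lun.write(0, "DataX")  # first write: "Data1" is preserved, then overwritten
    lun.write(0, "DataY")  # entry exists: direct overwrite, no second copy
    print(lun.source, lun.cow_space, lun.copies)
```

Because the mapping entry is created on the first write, any later write to the same address skips the copy, so the COW space keeps only the snapshot-time data.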

3. The principle of snapshot function 

  • After the snapshot is created and activated, a data copy consistent with the source volume is generated. The storage system allocates COW data space for the source volume and automatically generates a snapshot volume.
  • As long as no write has been made to the source volume, there are no records in the COW Meta area or the COW Data area.
  • After the snapshot is activated, when the application server issues a write request to the source volume, the storage system does not write the new data immediately. It uses the copy-on-write mechanism to copy the data about to be overwritten to the COW data space, updates the mapping relationship in the mapping table, and then writes the new data to the source volume.
  • Note: Within one snapshot period, data at a given location is copied on write only once, as determined by the corresponding value in the mapping table. Later writes to that location overwrite it directly. For example, if the mapping table value for "DataX" is "1", a copy-on-write has already been performed; a further write request is applied directly, and "DataX" is not copied to the COW data space.
  • After the snapshot is activated, the application server can write to the snapshot volume. When the application server issues a write request, the data is written directly to the snapshot volume, and its storage location in the snapshot volume is recorded in the exclusive part of the mapping table.
  • After the snapshot is activated, the application server can read the snapshot volume. If the requested data has been written to the snapshot volume, the storage system uses the exclusive part of the mapping table to locate and read it. If no data has been written to the snapshot volume at that location, the storage system uses the shared part of the mapping table to locate and read the snapshot data.

Snapshot data write

  1. The application server sends a request to write the source LUN after Time 1: "Data1" is changed to "DataX".
  2. Use the copy-on-write mechanism to copy "Data1" to the COW data space.
  3. Update the mapping relationship in the mapping table to record that "Data1" is now stored in the COW data space.
  4. Write "DataX" to the source LUN.

Snapshot data read

When the application server requests snapshot data, the storage system handles the request as follows:
1. Query the exclusive mapping table to determine the storage location of the data in the snapshot LUN.

  • If there is corresponding data in the snapshot LUN, the data in the snapshot LUN is directly read out and returned to the application server.
  • If there is no corresponding data in the snapshot LUN, query the shared mapping table.

2. Query the shared mapping table to determine the location of the data in the COW data space and source LUN.

  • If the value in the mapping table is 0, the corresponding data is read from the source LUN.
  • If the value in the mapping table is 1, read the corresponding data from the COW data space.
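The two-stage lookup above can be sketched as a single function. In this sketch, an address being present in the shared table plays the role of the value 1 and an absent address plays the role of the value 0; that encoding, the dictionary-based tables, and all parameter names are assumptions made for illustration.

```python
def read_snapshot(address, exclusive, snapshot_data, shared, cow_space, source):
    """Resolve a snapshot read through the exclusive, then the shared, mapping table.

    exclusive:     source address -> location in the snapshot LUN's own data
    snapshot_data: location -> data written directly to the snapshot LUN
    shared:        source address -> location in the COW data space
    cow_space:     location -> data preserved by copy-on-write
    source:        address -> current data on the source LUN
    """
    # 1. Exclusive mapping table: data that was written to the snapshot itself.
    if address in exclusive:
        return snapshot_data[exclusive[address]]
    # 2. Shared mapping table: entry present (value 1) means the snapshot-time
    #    data was moved to the COW data space.
    if address in shared:
        return cow_space[shared[address]]
    # No entry (value 0): the block never changed, so read the source LUN.
    return source[address]


if __name__ == "__main__":
    source = {0: "DataX", 1: "Data2", 2: "Data3"}
    shared, cow_space = {0: 0}, {0: "Data1"}
    exclusive, snapshot_data = {2: 0}, {0: "SnapW"}
    for addr in (0, 1, 2):
        print(addr, read_snapshot(addr, exclusive, snapshot_data,
                                  shared, cow_space, source))
```

Checking the exclusive table first means that data written to the snapshot itself always takes precedence over both the preserved copy and the current source data.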

 

4. Configuration process

5. Application scenarios

Snapshots can be used directly for data backup. A snapshot backup allows data to be restored quickly in the following scenarios:

  • Virus infection.
  • Operator error.
  • Malicious tampering.
  • Data corruption caused by system downtime.
  • Data corruption caused by application bugs.
  • Data corruption caused by a bug in the storage system.
  • Storage media is damaged (only snapshots based on split mirror technology can restore data).

Origin blog.csdn.net/weixin_43997530/article/details/108214587