7.2 Cassandra snapshot backup

 

7.2.1. About snapshots

Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory. You can take a snapshot of all keyspaces, a single keyspace, or a single table while the system is online.

Using a parallel ssh tool (such as pssh), you can snapshot the entire cluster. This provides an eventually consistent backup. Although no node is guaranteed to be consistent with its replica nodes at the moment the snapshot is taken, a restored snapshot regains consistency through Cassandra's built-in consistency mechanisms.
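For example, a minimal cluster-wide snapshot sketch with pssh, assuming hosts.txt lists every node in the cluster and mykeyspace is an example keyspace (both are illustrative assumptions, not from the original):

# Run nodetool snapshot on every host listed in hosts.txt, printing each node's output inline.
$ pssh -h hosts.txt -i "nodetool -p 7199 snapshot mykeyspace"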

After performing a system-wide snapshot, you can enable incremental backups on each node to back up data that has changed since the last snapshot: each time a memtable is flushed to disk and an SSTable is written, a hard link to that SSTable is created in a backups subdirectory of the keyspace data directory (when JNA is enabled; otherwise the file is copied). Compacted SSTables do not create links in backups, because those SSTables contain no data that has not already been backed up.

7.2.2. Take a snapshot

Use the nodetool snapshot command to create a snapshot on each node. To obtain a cluster-wide snapshot, run the nodetool snapshot command with a parallel ssh utility (such as pssh).

Taking a snapshot first flushes all in-memory writes (memtables) to disk, then creates a hard link to each SSTable file in every keyspace. The node must have enough free disk space to hold the snapshot of its data files. A single snapshot requires very little disk space; however, because a snapshot prevents old, obsolete data files from being deleted, snapshots can cause disk usage to grow quickly over time. After a snapshot completes, you can move the snapshot files to another location as needed, or leave them in place.

Note: Cassandra can only restore data from a snapshot if the table schema exists. It is recommended that you also back up the schema.
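One hedged way to capture the schema, assuming cqlsh can reach the local node and using the mykeyspace keyspace from the examples below:

# Export the keyspace's full schema to a CQL file that can re-create it later.
$ cqlsh localhost -e "DESCRIBE KEYSPACE mykeyspace" > mykeyspace_schema.cql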

Run the nodetool snapshot command, specifying the host name, JMX port, and keyspace. For example:

$ nodetool -h localhost -p 7199 snapshot mykeyspace

Snapshots are created in the data_directory/keyspace_name/table_name-UUID/snapshots/snapshot_name directory. Each snapshot directory contains the .db files that hold the snapshot's data.

For example:

  • Cassandra package installation: /var/lib/cassandra/data/mykeyspace/users-081a1500136111e482d09318a3b15cc2/snapshots/1406227071618/mykeyspace-users-ka-1-Data.db

  • Cassandra tarball installation: install_location/data/data/mykeyspace/users-081a1500136111e482d09318a3b15cc2/snapshots/1406227071618/mykeyspace-users-ka-1-Data.db
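To verify which snapshots exist on a node, recent Cassandra versions also provide nodetool listsnapshots (output columns vary by version):

$ nodetool -h localhost -p 7199 listsnapshots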

7.2.3. Delete snapshot files

Taking a new snapshot does not automatically delete older snapshot files. You should delete old snapshots that are no longer needed.

The nodetool clearsnapshot command deletes all existing snapshot files from the snapshot directory of each keyspace. Run this command before taking a new snapshot backup to clear out the old snapshots.

  • To delete all snapshots on a node, run the nodetool clearsnapshot command. For example:
    $ nodetool -h localhost -p 7199 clearsnapshot
  • To delete snapshots on all nodes at once, run the nodetool clearsnapshot command with a parallel ssh utility.
  • To delete a single snapshot, run the clearsnapshot command with the snapshot name:
    $ nodetool clearsnapshot -t <snapshot_name>

The file name and path vary depending on the type of snapshot. For more information about snapshot names and paths, see nodetool snapshot.

7.2.4. Enable incremental backup

When incremental backups are enabled (they are disabled by default), Cassandra hard-links each flushed SSTable to a backups directory under the keyspace data directory. This allows backups to be stored off-site without transferring entire snapshots. Moreover, combining incremental backups with snapshots provides a dependable, up-to-date backup mechanism. Compacted SSTables do not create links in backups, because those SSTables contain no data that has not already been backed up. A point-in-time snapshot, plus all incremental backups and commit logs since the snapshot, together form a complete backup.
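As a hedged illustration following the same layout as the snapshot paths above, an incremental backup file for the users table would land under a path like this (the generation number -ka-5- is made up):

/var/lib/cassandra/data/mykeyspace/users-081a1500136111e482d09318a3b15cc2/backups/mykeyspace-users-ka-5-Data.db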

As with snapshots, Cassandra does not automatically clear incremental backup files. It is recommended to set up a process that clears incremental backup files each time a new snapshot is created.
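A minimal sketch of such a process, assuming a package installation with data under /var/lib/cassandra/data and an illustrative snapshot tag (neither comes from the original):

# Take a named snapshot, then remove the incremental backup files it supersedes.
$ nodetool snapshot -t daily_$(date +%F) mykeyspace
$ find /var/lib/cassandra/data/mykeyspace/*/backups/ -type f -delete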

Edit the cassandra.yaml configuration file on each node in the cluster, and change the value of incremental_backups to true.
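For example, a hedged one-liner for a package installation where the configuration file lives at /etc/cassandra/cassandra.yaml (tarball installations typically use install_location/conf/cassandra.yaml instead):

# Flip the incremental_backups setting to true in place.
$ sed -i 's/^incremental_backups:.*/incremental_backups: true/' /etc/cassandra/cassandra.yaml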

7.2.5. Restoring data from snapshots

Restoring a keyspace from a snapshot requires all of the snapshot files for the table and, if you use incremental backups, any incremental backup files and SSTables (from repair, decommission, and the like) created after the snapshot was taken.

Note: Restoring from snapshots and incremental backups temporarily causes intensive CPU and I/O activity on the node being restored.

7.2.6. Restoring from the local node

This method copies the SSTables from the snapshot directory into the correct data directory.

1. Make sure the table schema exists. Cassandra can only restore data from a snapshot when the table schema exists. If the schema does not exist and was not backed up, you must re-create it.

2. If necessary, truncate the table (see the combined sketch after step 5).

3. Find the most recent snapshot folder. For example:

data_directory/keyspace_name/table_name-UUID/snapshots/snapshot_name

4. Copy the SSTable files from that snapshot directory into the data_directory/keyspace_name/table_name-UUID directory.

5. Run nodetool refresh.
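Putting steps 2, 4, and 5 together, here is a minimal sketch for the users table from the earlier examples; the snapshot name, table UUID, and data path are carried over from those examples and are illustrative, not taken from a real node:

# Step 2 (when needed): truncate so older snapshot data is not shadowed by newer tombstones.
$ cqlsh localhost -e "TRUNCATE mykeyspace.users;"
# Step 4: copy the snapshot's SSTable files back into the table's data directory.
$ cp -p /var/lib/cassandra/data/mykeyspace/users-081a1500136111e482d09318a3b15cc2/snapshots/1406227071618/* \
     /var/lib/cassandra/data/mykeyspace/users-081a1500136111e482d09318a3b15cc2/
# Step 5: load the newly placed SSTables without restarting the node.
$ nodetool refresh mykeyspace users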

Note: Under certain conditions you may not need to truncate. For example, if a node lost a disk, you might restart the node before restoring, so that it can continue receiving new writes while the restore proceeds.
Truncating is usually necessary, however. For example, after an accidental deletion, the tombstone written by the deletion carries a later timestamp than the data in the snapshot; if you restore without truncating (that is, without removing the tombstone), Cassandra continues to shadow the restored data. The same problem occurs with other kinds of overwrites.

If the node is on a DataStax Enterprise version earlier than 5.0.10, restart the node. This restart is necessary because nodetool refresh does not respect the existing LCS levels on disk, which can cause a compaction backlog.

7.2.7. Restoring from cluster backups

This method uses sstableloader to restore the snapshot.

1. Make sure the table schema exists. Cassandra can only restore data from a snapshot when the table schema exists. If the schema does not exist and was not backed up, you must re-create it.

2. If necessary, truncate the table.

3. Restore the most recent snapshot by running the sstableloader tool on the backed-up SSTables, as shown below.
sstableloader streams the SSTables to the correct nodes. There is no need to remove commit logs or restart the node.
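A hedged invocation, assuming the backed-up SSTables were staged in a directory whose last two path components are the keyspace and table names (the layout sstableloader expects), and that 10.0.0.1 is a reachable node in the cluster:

# Stream the staged SSTables to the nodes that own the data.
$ sstableloader -d 10.0.0.1 /backups/staging/mykeyspace/users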

Note: Under certain conditions you may not need to truncate. For example, if a node lost a disk, you might restart the node before restoring, so that it can continue receiving new writes while the restore proceeds.
Truncating is usually necessary, however. For example, after an accidental deletion, the tombstone written by the deletion carries a later timestamp than the data in the snapshot; if you restore without truncating (that is, without removing the tombstone), Cassandra continues to shadow the restored data. The same problem occurs with other kinds of overwrites.
