Learning HBase: HBase Snapshots

Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission. https://blog.csdn.net/wjandy0211/article/details/90167539

HBase's snapshot feature is quite useful. This article is translated from a Cloudera blog post, in the hope that it helps readers who want to understand snapshots. Where the translation is unclear, please check it against the original, Introduction to Apache HBase Snapshots.

Previously, the only ways to back up or clone a table were to use CopyTable / ExportTable, or to disable the table and copy all of its HFiles in HDFS. CopyTable / ExportTable use MapReduce to scan and copy the table, which has a direct impact on Region Server performance, while disabling the table makes it unavailable for the duration of the copy.

By contrast, HBase snapshots let an administrator clone a table without copying any data and with minimal impact on the Region Servers. Exporting a snapshot to another cluster does not touch the Region Servers directly; it is essentially a file copy with a little extra logic.

Here are some practical use cases for snapshots:

  • Recovery from user / application errors
    • Restore or recover from a known safe state.
    • View previous snapshots and selectively merge the differences into production.
    • Save a snapshot right before a major upgrade or change.
  • Auditing and / or reporting on the data as of a specific time
    • Capture data on a monthly basis.
    • Run end-of-day or end-of-month reports.
  • Application testing
    • Test the effect of schema or application changes on production-like data taken from a snapshot, then throw it away. For example: take a snapshot, create a new table from its contents, and then work on that table.
  • Offloading work
    • Take a snapshot, export it to another cluster, and analyze it there with MapReduce jobs. Because the export works at the HDFS level, it does not slow down the main HBase cluster the way a table copy does.
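For the "save a snapshot before an upgrade" and monthly-capture cases, it helps to generate snapshot names mechanically. The following is a minimal sketch only: snapshot_cmd is a hypothetical helper, and the naming scheme table-label-date is an assumption of this example, not an HBase convention. It only prints the HBase shell command; it does not contact a cluster.

```shell
#!/bin/sh
# Sketch: emit an HBase shell command that takes a dated snapshot of a table.
# The <table>-<label>-<YYYYMMDD> naming scheme is an assumption of this example.

snapshot_cmd() {
    table=$1   # source table name
    label=$2   # free-form tag such as "pre-upgrade" or "monthly"
    day=$3     # date stamp, e.g. the output of: date +%Y%m%d
    printf "snapshot '%s', '%s-%s-%s'\n" "$table" "$table" "$label" "$day"
}

# Print the command for today's pre-upgrade snapshot of a hypothetical table.
snapshot_cmd TestTable pre-upgrade "$(date +%Y%m%d)"
```

The printed line can be pasted into the HBase shell, or the script's output can be reviewed and then piped to it.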

What is a Snapshot?

A snapshot is a set of metadata that lets an administrator restore a table to a previous state. A snapshot is not a copy of the table; it is just a list of file names, and no data is copied. Fully restoring a snapshot rolls the table back to the schema and data it had when the snapshot was taken, discarding any changes made since then.

Operations

  • Take: tries to take a snapshot of the specified table. The operation may fail while regions are moving, for example during balancing, a split or a merge.
  • Clone: creates a new table with the schema and data of the specified snapshot. It has no effect on the original table or on the snapshot.
  • Restore: rolls a table's schema and data back to their state at the time of the snapshot.
  • Delete: removes a snapshot from the system and frees disk space, without affecting any clones or other snapshots.
  • Export: copies a snapshot's data and metadata to another cluster. It involves only HDFS and does not communicate with the HBase Master or Region Servers, so it works even when the HBase cluster is down.
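The take, clone and delete operations combine naturally into a "test on a clone, then discard" cycle. The sketch below prints the corresponding HBase shell commands; clone_test_cycle is a hypothetical helper and all table and snapshot names are placeholders. Nothing is executed against a cluster.

```shell
#!/bin/sh
# Sketch: print the HBase shell commands for a "test on a clone, then discard" cycle.
# Table and snapshot names are hypothetical placeholders.

clone_test_cycle() {
    table=$1     # production table to snapshot
    snap=$2      # snapshot name
    scratch=$3   # throwaway table created from the snapshot
    cat <<EOF
snapshot '$table', '$snap'
clone_snapshot '$snap', '$scratch'
# ... run schema changes or test writes against '$scratch' here ...
disable '$scratch'
drop '$scratch'
delete_snapshot '$snap'
EOF
}

clone_test_cycle TestTable TestSnapshot TestTable_scratch
```

Because the clone is independent of the original table, dropping it at the end leaves production data untouched.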

Zero-copy snapshot, restore and clone

The biggest difference between a snapshot and CopyTable / ExportTable is that snapshot operations write only metadata; no data is copied.

An important HBase design principle is that a file is never modified once it has been written. Because files are immutable, a snapshot only needs to keep track of the files in use at the moment it was taken. When a compaction occurs, the snapshot is responsible for telling the system to archive those files rather than delete them.

The same applies to clone and restore operations: because the files are immutable, a new table created from a snapshot simply links to the files the snapshot references.

Export is the only snapshot operation that needs to copy data, because the other cluster does not have the data files.
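The idea can be illustrated with ordinary files. The following toy model is only an analogy and not how HBase is implemented: the directory layout, the file names and the use of hard links are all assumptions of this example. It shows a snapshot as a list of file names, a compaction that archives referenced files instead of deleting them, and a clone that links to the archived files without copying them.

```shell
#!/bin/sh
# Toy model of zero-copy snapshots using ordinary files. This is NOT how HBase
# is implemented; the layout and file names are made up for illustration.
set -e

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/table" "$work/archive" "$work/clone"

# Two immutable "HFiles" belonging to the table.
echo "rows a-m" > "$work/table/hfile1"
echo "rows n-z" > "$work/table/hfile2"

# Take a snapshot: record file names only, copy no data.
ls "$work/table" > "$work/snapshot.manifest"

# Compaction: write a merged file, then retire the inputs. Files named in the
# manifest are moved to the archive instead of being deleted.
cat "$work/table/hfile1" "$work/table/hfile2" > "$work/table/hfile3"
for f in hfile1 hfile2; do
    if grep -qx "$f" "$work/snapshot.manifest"; then
        mv "$work/table/$f" "$work/archive/$f"
    else
        rm "$work/table/$f"
    fi
done

# Clone: link to the archived files rather than copying them.
while read -r f; do
    ln "$work/archive/$f" "$work/clone/$f"
done < "$work/snapshot.manifest"

ls "$work/table"   # only the compacted file, hfile3, remains in the live table
ls "$work/clone"   # hfile1 and hfile2, sharing storage with the archive
```

The snapshot step touches no data at all, and the clone step creates only links, which is why both are cheap regardless of table size.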

Export Snapshot vs Copy / Export Table

Besides offering better consistency guarantees than a Copy / Export job, the main difference is that exporting a snapshot works at the HDFS level. The Master and the Region Servers are not involved, so the export creates no unnecessary data caches and triggers no extra GC pauses from the many objects created during a scan. The only impact on the HBase cluster is the extra network and disk load absorbed by the HDFS DataNodes.

HBase Shell: Snapshot Operations

To use the snapshot feature, make sure hbase.snapshot.enabled is set to true in hbase-site.xml, as follows:

<property>
  <name>hbase.snapshot.enabled</name>
  <value>true</value>
</property>
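As a quick sanity check, the setting can be read back from the configuration file. The sketch below is a hypothetical helper (snapshots_enabled is not an HBase tool) and assumes the name and value elements of the property each sit on their own line. The demo runs against a throwaway file rather than a real hbase-site.xml.

```shell
#!/bin/sh
# Hypothetical helper, not part of HBase: succeeds when hbase.snapshot.enabled
# is set to true in the given hbase-site.xml.
# Assumes <name> and <value> are on separate lines, as in the usual layout.
snapshots_enabled() {
    awk '
        /<name>hbase\.snapshot\.enabled<\/name>/ { found = 1; next }
        found && /<value>/ {
            ok = ($0 ~ /<value>[ \t]*true[ \t]*<\/value>/)
            exit
        }
        END { exit (ok ? 0 : 1) }
    ' "$1"
}

# Demo against a throwaway file; point the function at a real hbase-site.xml instead.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<configuration>
  <property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
  </property>
</configuration>
EOF

if snapshots_enabled "$demo"; then
    echo "snapshots enabled"
else
    echo "snapshots not enabled"
fi
rm -f "$demo"
```

A line-oriented check like this is fragile compared with a real XML parser, so it is best treated as a convenience for well-formatted files.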

To take a snapshot, use the following command. No files are copied by this operation:

hbase> snapshot 'tableName', 'snapshotName'

To see which snapshots exist in the system, use the list_snapshots command. It displays each snapshot's name, source table, and creation date and time.

hbase> list_snapshots
SNAPSHOT               TABLE + CREATION TIME
 TestSnapshot          TestTable (Mon Feb 25 21:13:49 +0000 2013)

To remove a snapshot, use the delete_snapshot command. Removing a snapshot has no effect on tables already cloned from it or on snapshots taken afterwards.

hbase> delete_snapshot 'snapshotName'

To create a new table from a snapshot, use the clone_snapshot command. No data is copied by this operation either.

hbase> clone_snapshot 'snapshotName', 'newTableName'

To replace the current schema and data of a table with the contents of a snapshot, use the restore_snapshot command.

hbase> restore_snapshot 'snapshotName'
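One practical detail: HBase expects the table to be disabled before it is restored. The sketch below prints the full command sequence. restore_cycle is a hypothetical helper, and taking a safety snapshot first (with a made-up -before-restore suffix) is a choice made in this example, since a restore discards every change made after the snapshot.

```shell
#!/bin/sh
# Sketch: print the HBase shell commands for restoring a table from a snapshot.
# The "-before-restore" safety snapshot name is an assumption of this example.

restore_cycle() {
    table=$1   # table to roll back
    snap=$2    # snapshot to restore from
    cat <<EOF
snapshot '$table', '$snap-before-restore'
disable '$table'
restore_snapshot '$snap'
enable '$table'
EOF
}

restore_cycle TestTable TestSnapshot
```

If the restore turns out to be a mistake, the safety snapshot gives a way back to the state just before it.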

To export a snapshot to another cluster, use the ExportSnapshot tool. The export puts no extra load on the Region Servers because it works at the HDFS level; you only need to specify an HDFS location (the hbase.rootdir of the other cluster), as follows.

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -overwrite -snapshot SnapshotName -mappers 16 -copy-to hdfs://srv2:8082/hbase -bandwidth 40
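Since an export can move a lot of data, it can be useful to assemble and review the command before running it. The sketch below only prints the command line. export_snapshot_cmd is a hypothetical helper, the defaults of 16 mappers and a bandwidth of 40 are taken from the example in this article, the bandwidth value is understood to be in MB per second, and the single-dash option spelling follows the form used in the HBase reference guide.

```shell
#!/bin/sh
# Sketch: assemble an ExportSnapshot command line without running it.
# export_snapshot_cmd is a hypothetical helper, not an HBase tool.

export_snapshot_cmd() {
    snap=$1              # snapshot to export
    dest=$2              # hbase.rootdir of the destination cluster
    mappers=${3:-16}     # number of MapReduce mappers used for the copy
    bandwidth=${4:-40}   # bandwidth limit passed to -bandwidth
    if [ -z "$snap" ] || [ -z "$dest" ]; then
        echo "usage: export_snapshot_cmd SNAPSHOT DEST_URL [MAPPERS] [BANDWIDTH]" >&2
        return 1
    fi
    printf 'hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to %s -mappers %s -bandwidth %s\n' \
        "$snap" "$dest" "$mappers" "$bandwidth"
}

# Dry run: print the command for review.
export_snapshot_cmd SnapshotName hdfs://srv2:8082/hbase
```

Keeping the helper as a printer rather than a runner makes it easy to check the destination URL and limits first.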

Current Limitations

Snapshots rely on a number of assumptions, and some tools are not yet fully integrated with the new feature:

  • Merging regions that are referenced by a snapshot can cause data loss in the snapshot and in tables cloned from it.
  • Restoring a table that has replication enabled can leave the two clusters out of sync, because the restore is not applied on the replica.

Summary

The snapshot feature currently includes all the basic functionality, but there is still a lot of work to do, such as metrics, Web UI integration and disk usage optimization.
