HBase disaster recovery and backup
I. Introduction
This article introduces three simple disaster recovery and backup solutions commonly used with HBase: CopyTable, Export/Import, and Snapshot. Each is described below.
II. CopyTable
2.1 Introduction
CopyTable copies data from an existing table to a new table, and has the following characteristics:
- It supports options such as a time range, a row range, changing the table name, changing the column family name, and whether to copy deleted data;
- Before running the command, you need to create the new table with the same structure as the original table;
- CopyTable works through the HBase Client API: it reads with scan and writes with put.
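Because the target table has to exist before CopyTable runs, the creation step can be sketched as a small helper that prints the HBase shell `create` statement. This is only an illustration: the table name `tableCopy` and the column families `cf1` and `cf2` are placeholders, and the real families must match those of the source table.

```shell
#!/bin/sh
# Sketch: print the HBase shell statement that creates the CopyTable target.
# Table and column family names are placeholders, not taken from a real schema.

# create_stmt TABLE FAMILY [FAMILY...] -> create 'TABLE', 'FAMILY', ...
create_stmt() {
    table=$1
    shift
    stmt="create '$table'"
    for family in "$@"; do
        stmt="$stmt, '$family'"
    done
    printf '%s\n' "$stmt"
}

# Dry run: only prints the statement, nothing is sent to HBase.
create_stmt tableCopy cf1 cf2
```

The printed statement can be pasted into the HBase shell on the destination cluster; running `describe 'tableOrig'` on the source first shows which column families are needed.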
2.2 Command format
Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
2.3 Common commands
- CopyTable in the same cluster
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=tableCopy tableOrig
- CopyTable between different clusters
# When both tables have the same name
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--peer.adr=dstClusterZK:2181:/hbase tableOrig
# A new table name can also be specified
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--peer.adr=dstClusterZK:2181:/hbase \
--new.name=tableCopy tableOrig
- The following is a fairly complete example from the official documentation. It specifies the start and end times and the peer cluster address, and copies only the listed column families:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--starttime=1265875194289 \
--endtime=1265878794289 \
--peer.adr=server1,server2,server3:2181:/hbase \
--families=myOldCf:myNewCf,cf2,cf3 TestTable
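The `--starttime` and `--endtime` values are timestamps in milliseconds; in the example above they are exactly 3,600,000 ms (one hour) apart. A minimal sketch of computing such a window from the current time is shown here. The helper names `window_ms` and `copytable_cmd` are made up for illustration, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: compute a millisecond time window and assemble a CopyTable command.
# Helper names are hypothetical; nothing is executed against HBase.

# window_ms END_SECONDS HOURS -> prints "START_MS END_MS"
window_ms() {
    end_s=$1
    hours=$2
    start_s=$((end_s - hours * 3600))
    printf '%s %s\n' "$((start_s * 1000))" "$((end_s * 1000))"
}

# copytable_cmd START_MS END_MS TABLE -> prints the command line
copytable_cmd() {
    printf 'hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=%s --endtime=%s %s\n' \
        "$1" "$2" "$3"
}

# Dry run for the last hour of data in TestTable.
window=$(window_ms "$(date +%s)" 1)
copytable_cmd "${window% *}" "${window#* }" TestTable
```

Other options such as `--peer.adr` or `--families` would be appended in the same way before the table name.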
2.4 More parameters
Run the command with --help to see all supported parameters:
# hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help
III. Export/Import
3.1 Introduction
Export supports exporting data to HDFS, and Import supports importing data from HDFS. Export also supports specifying the start time and end time of the exported data, so it can be used for incremental backup. Like CopyTable, Export relies on HBase scan operations.
3.2 Command format
# Export
hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
# Import
hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
- The outputdir directory does not need to be created in advance; the program creates it automatically. After the export completes, the exported files are owned by the user who ran the export command.
- By default, only the latest version of each Cell is exported, regardless of its version history. To export multiple versions, replace the <versions> parameter with the desired number of versions.
3.3 Common commands
- Export command
hbase org.apache.hadoop.hbase.mapreduce.Export tableName <hdfs_path>/tableName.db
- Import command
hbase org.apache.hadoop.hbase.mapreduce.Import tableName <hdfs_path>/tableName.db
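The positional `<starttime>` and `<endtime>` arguments from section 3.2 are what make incremental backups possible. One possible way to drive them is sketched below, assuming the end of the previous export window is kept in a small state file. The state file name and the `/backup` output path are hypothetical, and the script prints the Export command rather than running it. Note that `<versions>` has to be supplied whenever a time range is given, because the arguments are positional.

```shell
#!/bin/sh
# Sketch of an incremental Export (dry run: the command is printed, not run).
# STATE_FILE and the /backup path are hypothetical names for illustration.

# next_window STATE_FILE NOW_MS -> prints "START_MS NOW_MS".
# START_MS is the end of the previous run, or 0 if no run is recorded.
next_window() {
    if [ -s "$1" ]; then
        start_ms=$(cat "$1")
    else
        start_ms=0
    fi
    printf '%s %s\n' "$start_ms" "$2"
}

# export_cmd TABLE OUTPUT_DIR VERSIONS START_MS END_MS -> prints the command
export_cmd() {
    printf 'hbase org.apache.hadoop.hbase.mapreduce.Export %s %s %s %s %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

STATE_FILE=./tableName.last_export
now_ms=$(( $(date +%s) * 1000 ))
window=$(next_window "$STATE_FILE" "$now_ms")

# Each run gets its own output directory, named after the window end.
export_cmd tableName "/backup/tableName/$now_ms" 1 "${window% *}" "${window#* }"

# After a successful real run, record the window end for the next run:
# printf '%s\n' "$now_ms" > "$STATE_FILE"
```

With no state file present, the window starts at 0, which amounts to a full export; later runs pick up only the cells written since the recorded timestamp.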
IV. Snapshot
4.1 Introduction
HBase's Snapshot feature lets you obtain a copy of a table (including its content and metadata) with very little performance overhead, because a snapshot stores only the table metadata and references to its HFiles. The clone operation creates a new table from a snapshot, and the restore operation reverts the contents of a table to the state captured in the snapshot. Neither clone nor restore needs to copy any data, because the underlying HFiles (the files that hold the HBase table data) are not modified; only the table metadata is changed.
4.2 Configuration
Whether the snapshot feature is on by default depends on the HBase version: it is off by default in 0.94.x and enabled by default in 0.95 and later. To turn it on explicitly, add the following entry to the hbase-site.xml configuration file:
<property>
<name>hbase.snapshot.enabled</name>
<value>true</value>
</property>
4.3 Common commands
All snapshot commands are run in the HBase Shell interactive command line.
1. Take a Snapshot
# Take a snapshot
hbase> snapshot 'table_name', 'snapshot_name'
By default, in-memory data is flushed to disk before the snapshot is taken, so that the data held in memory is included in the snapshot. If you do not want to include the in-memory data, use the SKIP_FLUSH option to disable the flush.
# Disable the memory flush
hbase> snapshot 'table_name', 'snapshot_name', {SKIP_FLUSH => true}
2. Listing Snapshots
# List the snapshots
hbase> list_snapshots
3. Deleting Snapshots
# Delete a snapshot
hbase> delete_snapshot 'snapshot_name'
4. Clone a table from snapshot
# Create a new table from an existing snapshot
hbase> clone_snapshot 'snapshot_name', 'new_table_name'
5. Restore a snapshot
This restores the table to its state at the time of the snapshot. The table must be disabled before it is restored:
hbase> disable 'table_name'
hbase> restore_snapshot 'snapshot_name'
Note that if HBase is configured with Replication-based master-slave replication, the replica and the master cluster will be in different states after a restore, because Replication works at the log level while snapshots work at the file system level. In that case, stop the synchronization first, and re-establish it after all servers have been restored to a consistent data point.
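For scheduled backups, the `snapshot` statement from step 1 of section 4.3 can be generated by a script. The sketch below builds the statement with a date-stamped snapshot name; the naming scheme `<table>_snapshot_<suffix>` and the helper name `snapshot_stmt` are assumptions made for illustration. The script only prints the statement.

```shell
#!/bin/sh
# Sketch: generate an HBase shell snapshot statement with a dated name.
# The naming scheme TABLE_snapshot_SUFFIX is an assumption, not an HBase rule.

# snapshot_stmt TABLE SUFFIX [skip_flush] -> prints the shell statement
snapshot_stmt() {
    name="$1_snapshot_$2"
    if [ "${3:-}" = "skip_flush" ]; then
        printf "snapshot '%s', '%s', {SKIP_FLUSH => true}\n" "$1" "$name"
    else
        printf "snapshot '%s', '%s'\n" "$1" "$name"
    fi
}

# Dry run: a snapshot of tableOrig named after today's date.
snapshot_stmt tableOrig "$(date +%Y%m%d)"
```

Besides the interactive prompt, the HBase shell has a non-interactive mode that reads statements from standard input, so the output could be piped into `hbase shell -n` from a cron job.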