I. Introduction
This article describes three commonly used, simple HBase disaster recovery solutions: CopyTable, Export/Import, and Snapshot. They are introduced as follows:
II. CopyTable
2.1 Introduction
CopyTable can copy data from an existing table to a new table. It has the following characteristics:
- Supports copying by time range and row range, renaming the destination table, renaming column families, and choosing whether to copy deleted data;
- Before running the command, you must first create a new table with the same structure as the original table.
CopyTable is implemented on top of the HBase client API, i.e. it reads with scan and writes with put.
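Since the destination table must exist before CopyTable runs, a typical same-cluster session can be sketched as below. The table and column-family names ('tableOrig', 'tableCopy', 'cf1') are placeholders for your own schema, and the script only prints the commands; on a real cluster, pipe the create statement into `hbase shell -n` and run the CopyTable command directly.

```shell
# Sketch only: table and column-family names are placeholders.
SRC_TABLE="tableOrig"
DST_TABLE="tableCopy"
# Step 1: create the destination table with the same column families,
# e.g. by piping this into `hbase shell -n`.
CREATE_CMD="create '${DST_TABLE}', 'cf1'"
# Step 2: run CopyTable against the newly created table.
COPY_CMD="hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=${DST_TABLE} ${SRC_TABLE}"
echo "$CREATE_CMD"
echo "$COPY_CMD"
```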
2.2 Command Format
Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
2.3 Common Commands
- CopyTable within the same cluster
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=tableCopy tableOrig
- CopyTable across different clusters
# Both tables have the same name
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--peer.adr=dstClusterZK:2181:/hbase tableOrig
# A new table name can also be specified
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--peer.adr=dstClusterZK:2181:/hbase \
--new.name=tableCopy tableOrig
- Here is a complete example from the official documentation, which specifies the start and end times, the destination cluster address, and copies only the specified column families:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--starttime=1265875194289 \
--endtime=1265878794289 \
--peer.adr=server1,server2,server3:2181:/hbase \
--families=myOldCf:myNewCf,cf2,cf3 TestTable
2.4 More Parameters
You can use --help to see more supported parameters:
# hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help
III. Export/Import
3.1 Introduction
Export supports exporting data to HDFS, and Import supports importing data from HDFS. Export also supports specifying a start time and an end time for the exported data, so it can be used for incremental backups. Like CopyTable, Export relies on HBase scan operations.
3.2 Command Format
# Export
hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
# Import
hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
- The output directory (outputdir) must not be created in advance; the program creates it automatically. After the export completes, the exported files are owned by the user who ran the export command.
- By default, Export only exports the latest version of each Cell, ignoring the version history. To export multiple versions, replace <versions> with the desired number of versions.
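Since Export takes its time arguments as epoch milliseconds, a daily incremental export can be sketched as follows. The table name and HDFS path are placeholders, `date -d` assumes GNU coreutils, and the final command is only echoed rather than executed:

```shell
# Placeholders: adjust the table name and HDFS path for your cluster.
TABLE="tableOrig"
OUTDIR="/backup/${TABLE}-$(date +%F)"
# Export expects epoch *milliseconds*; cover yesterday's writes only.
START_MS=$(( $(date -d "yesterday 00:00" +%s) * 1000 ))
END_MS=$(( $(date -d "today 00:00" +%s) * 1000 ))
# <versions>=1 keeps only the latest version of each Cell.
echo "hbase org.apache.hadoop.hbase.mapreduce.Export ${TABLE} ${OUTDIR} 1 ${START_MS} ${END_MS}"
```

Remove the echo to run the command on a real cluster; scheduling it from cron gives a simple rolling incremental backup.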
3.3 Common Commands
- Export command
hbase org.apache.hadoop.hbase.mapreduce.Export tableName <hdfs-path>/tableName.db
- Import command
hbase org.apache.hadoop.hbase.mapreduce.Import tableName <hdfs-path>/tableName.db
IV. Snapshot
4.1 Introduction
HBase's snapshot (Snapshot) feature lets you take a copy of a table (both contents and metadata) with little performance overhead, because a snapshot only stores the table's metadata and references to its HFiles. The clone operation creates a new table from a snapshot, and the restore operation rolls a table back to the state captured by the snapshot. Neither clone nor restore needs to copy any data, because the underlying HFiles (the files containing the HBase table data) are not modified; only the table's metadata is changed.
4.2 Configuration
The HBase snapshot feature may not be enabled by default, depending on your version. To enable snapshots, add the following entry to the hbase-site.xml configuration file:
<property>
<name>hbase.snapshot.enabled</name>
<value>true</value>
</property>
4.3 Common Commands
All snapshot commands must be executed in the HBase shell interactive command line.
1. Take a Snapshot
# Take a snapshot
hbase> snapshot 'tableName', 'snapshotName'
By default, an in-memory flush is performed before the snapshot is taken, to ensure that in-memory data is included in the snapshot. If you do not want to include the in-memory data, you can use the SKIP_FLUSH option to disable the flush.
# Disable the in-memory flush
hbase> snapshot 'tableName', 'snapshotName', {SKIP_FLUSH => true}
2. Listing Snapshots
# List snapshots
hbase> list_snapshots
3. Deleting Snapshots
# Delete a snapshot
hbase> delete_snapshot 'snapshotName'
4. Clone a table from snapshot
# Create a new table from an existing snapshot
hbase> clone_snapshot 'snapshotName', 'newTableName'
5. Restore a snapshot
To restore a table to the point of the snapshot, the table must first be disabled:
hbase> disable 'tableName'
hbase> restore_snapshot 'snapshotName'
It should be noted that if HBase is configured with master-slave Replication, then because Replication works at the log level while snapshots work at the file-system level, the slave cluster will be in a different state from the master after a restore. In that case you can stop synchronization, restore all servers to a consistent data point, and then re-establish synchronization.
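One possible recovery sequence can be sketched with the HBase shell replication commands. The peer id '1' and the table/snapshot names are placeholders, the script only prints the shell commands (on a real cluster they would be piped into `hbase shell -n` on the master), and you should verify the exact procedure for your HBase version:

```shell
# Sketch only: peer id, table name, and snapshot name are placeholders.
CMDS="disable_peer '1'
disable 'tableName'
restore_snapshot 'snapshotName'
enable 'tableName'
enable_peer '1'"
# Print the sequence; on a real cluster, pipe it into `hbase shell -n`.
echo "$CMDS"
```

Before re-enabling the peer, the same snapshot (or a fresh copy of the table) should also be restored on the slave cluster so that both sides resume from a consistent state.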
Reference Material
More articles in this big data series can be found in the GitHub open-source project: Big Data Getting Started