HBase disaster recovery and backup

I. Introduction

This article introduces three simple disaster recovery and backup solutions commonly used with HBase: CopyTable, Export/Import, and Snapshot. Each is described below.

II. CopyTable

2.1 Introduction

CopyTable can copy data from an existing table to a new table, and has the following characteristics:

  • It supports options such as a time range, a row range, renaming the table, renaming column families, and choosing whether to copy deleted data;
  • Before running the command, you must create the new table with the same structure as the original table;
  • CopyTable works through the HBase client API: it reads with scan and writes with put.

2.2 Command format

Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>

2.3 Common commands

  1. CopyTable in the same cluster
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=tableCopy  tableOrig
  2. CopyTable in different clusters
# Case where both tables have the same name
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--peer.adr=dstClusterZK:2181:/hbase tableOrig

# You can also specify a new table name
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--peer.adr=dstClusterZK:2181:/hbase \
--new.name=tableCopy tableOrig
  3. The following is a fairly complete example from the official documentation. It specifies the start and end times and the destination cluster address, and copies only the specified column families:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--starttime=1265875194289 \
--endtime=1265878794289 \
--peer.adr=server1,server2,server3:2181:/hbase \
--families=myOldCf:myNewCf,cf2,cf3 TestTable
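
The --starttime and --endtime values in the example above are epoch timestamps in milliseconds. As a minimal sketch of how such a window might be computed, the following POSIX shell snippet builds a CopyTable command for the most recent hour. The helper names to_millis and copytable_cmd are hypothetical, and the snippet only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: convert epoch seconds to epoch milliseconds.
to_millis() {
    echo "$(( $1 * 1000 ))"
}

# Hypothetical helper: print (not run) a CopyTable command for the time
# window that ends at end_s and is window_s seconds long.
# Args: table, end time in epoch seconds, window length in seconds, peer address
copytable_cmd() {
    table=$1; end_s=$2; window_s=$3; peer=$4
    start_ms=$(to_millis "$(( end_s - window_s ))")
    end_ms=$(to_millis "$end_s")
    echo "hbase org.apache.hadoop.hbase.mapreduce.CopyTable" \
         "--starttime=$start_ms --endtime=$end_ms" \
         "--peer.adr=$peer $table"
}

# Example: build the command for the last hour of TestTable.
copytable_cmd TestTable "$(date +%s)" 3600 server1,server2,server3:2181:/hbase
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without HBase installed.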

2.4 More parameters

Run the command with --help to see all supported parameters:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help

III. Export/Import

3.1 Introduction

  • Export writes table data to HDFS, and Import loads data from HDFS. Export also accepts a start time and an end time for the exported data, so it can be used for incremental backups.
  • Like CopyTable, Export relies on HBase scan operations.

3.2 Command format

# Export
hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

# Import
hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
  • The outputdir directory does not need to be created in advance; the program creates it automatically. After the export completes, the exported files are owned by the user who ran the export command.
  • By default, Export writes only the latest version of each Cell, regardless of the stored version history. To export multiple versions, replace the <versions> parameter with the number of versions you want.
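
Because Export accepts a start and an end time, an incremental backup can remember where the previous run stopped. The following is a minimal sketch under stated assumptions, not a definitive implementation: the helper name incremental_export_cmd, the state file, and the output directory layout are all hypothetical, and the function only prints the Export command.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an incremental Export command.
# The end time of the previous run is kept in a state file so that each
# run starts where the last one stopped.
# Args: table, output base dir, state file, current time in epoch milliseconds
incremental_export_cmd() {
    table=$1; base=$2; state=$3; now_ms=$4
    # First run: no state file yet, so start from timestamp 0.
    start_ms=0
    [ -f "$state" ] && start_ms=$(cat "$state")
    # <versions> is 1 here: only the latest version of each cell in the window.
    echo "hbase org.apache.hadoop.hbase.mapreduce.Export" \
         "$table $base/$table-$now_ms 1 $start_ms $now_ms"
    # Record the window end for the next run. A real job should do this
    # only after the export has finished successfully.
    echo "$now_ms" > "$state"
}
```

A call might look like incremental_export_cmd tableName /backup /tmp/tableName.last "$(( $(date +%s) * 1000 ))". Each run gets its own output directory because outputdir is created by the job itself.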

3.3 Common commands

  1. Export command
hbase org.apache.hadoop.hbase.mapreduce.Export tableName <hdfs_path>/tableName.db
  2. Import command
hbase org.apache.hadoop.hbase.mapreduce.Import tableName <hdfs_path>/tableName.db

IV. Snapshot

4.1 Introduction

HBase's snapshot feature lets you obtain a copy of a table (both content and metadata) with very little performance overhead, because a snapshot stores only the table metadata and references to its HFiles. The clone operation creates a new table from a snapshot, and the restore operation reverts a table to the state captured in the snapshot. Neither clone nor restore needs to copy any data: the underlying HFiles (the files that hold an HBase table's data) are not modified, only the table's metadata is.

4.2 Configuration

Snapshots are enabled by default in HBase 0.95 and later. On older releases (0.94.x), where the feature is off by default, enable it by adding the following entry to hbase-site.xml:

<property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
</property>

4.3 Common commands

All snapshot commands are run in the interactive HBase shell.

1. Take a Snapshot

# Take a snapshot
hbase> snapshot 'table_name', 'snapshot_name'

By default, in-memory data is flushed to disk before the snapshot is taken, so that it is included in the snapshot. If you do not want to include in-memory data, use the SKIP_FLUSH option to disable the flush.

# Disable the memory flush
hbase> snapshot 'table_name', 'snapshot_name', {SKIP_FLUSH => true}

2. Listing Snapshots

# List all snapshots
hbase> list_snapshots

3. Deleting Snapshots

# Delete a snapshot
hbase> delete_snapshot 'snapshot_name'

4. Clone a table from snapshot

# Create a new table from an existing snapshot
hbase> clone_snapshot 'snapshot_name', 'new_table_name'

5. Restore a snapshot

Restoring returns the table to the state captured in the snapshot. The table must be disabled before the restore and enabled again afterwards:

hbase> disable 'table_name'
hbase> restore_snapshot 'snapshot_name'
hbase> enable 'table_name'
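
The restore sequence can also be scripted. The following is a minimal sketch, assuming the HBase shell's non-interactive mode (hbase shell -n) is available in your version; the helper name restore_script is hypothetical. It only prints the shell commands, including an enable step to bring the table back online, so they can be reviewed before being piped to HBase.

```shell
#!/bin/sh
# Hypothetical helper: print the HBase shell commands for a restore.
# Args: table name, snapshot name
restore_script() {
    printf "disable '%s'\n" "$1"
    printf "restore_snapshot '%s'\n" "$2"
    printf "enable '%s'\n" "$1"   # bring the table back online afterwards
}

# Review the generated commands first:
restore_script tableName snapshotName
# Then feed them to the shell in non-interactive mode, for example:
#   restore_script tableName snapshotName | hbase shell -n
```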

Note that if HBase is configured with Replication-based master-slave replication, the replica and the master will be in different states after the restore, because Replication works at the log level while snapshots work at the file system level. In that case, stop replication first, then re-establish it once all servers have been restored to a consistent data point.

Reference

  1. Online Apache HBase Backups with CopyTable
  2. Apache HBase ™ Reference Guide

Origin blog.csdn.net/weixin_44302240/article/details/112345949