HBase Series (9): HBase Disaster Recovery and Backup

I. Introduction

This article describes three commonly used, simple HBase disaster recovery and backup solutions: CopyTable, Export/Import, and Snapshot. Each is introduced below.

II. CopyTable

2.1 Introduction

CopyTable can copy data from an existing table to a new table. It has the following characteristics:

  • Supports copying by time range and row range, renaming the destination table, renaming column families, and choosing whether to copy deleted data;
  • Before running the command, you must first create a new table with the same structure as the source table;
  • CopyTable is built on the HBase client API: it reads with scan and writes with put.
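Since the destination table must exist before CopyTable runs, a minimal sketch of pre-creating it could look as follows. The table name tableCopy and the column family cf are placeholders; the pipe into the HBase shell is commented out so the sketch runs without a cluster:

```shell
# Pre-create the destination table with the same column family layout
# as the source table (sketch; 'tableCopy' and 'cf' are placeholders).
CREATE_DDL="create 'tableCopy', 'cf'"
echo "$CREATE_DDL"   # | hbase shell   (uncomment on a live cluster)
```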

2.2 Command Format

Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>

2.3 Common Commands

  1. CopyTable within the same cluster
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=tableCopy  tableOrig
  2. CopyTable between different clusters
# when the two tables have the same name
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--peer.adr=dstClusterZK:2181:/hbase tableOrig

# a new table name can also be specified
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--peer.adr=dstClusterZK:2181:/hbase \
--new.name=tableCopy tableOrig
  3. A full example from the official documentation: specify the start and end times and the target cluster address, and copy only the specified column families:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--starttime=1265875194289 \
--endtime=1265878794289 \
--peer.adr=server1,server2,server3:2181:/hbase \
--families=myOldCf:myNewCf,cf2,cf3 TestTable
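The --starttime/--endtime values above are epoch milliseconds, which are awkward to write by hand. A small sketch of deriving them from human-readable dates with GNU date (the date strings and table name are illustrative):

```shell
# CopyTable's --starttime/--endtime expect epoch milliseconds.
# GNU date prints epoch seconds, so multiply by 1000.
START_MS=$(( $(date -u -d '2024-01-01 00:00:00' +%s) * 1000 ))
END_MS=$(( $(date -u -d '2024-01-02 00:00:00' +%s) * 1000 ))

# Echo the assembled command instead of executing it,
# so the sketch runs without a cluster.
echo "hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=$START_MS --endtime=$END_MS tableOrig"
```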

2.4 More parameters

You can use --help to see more supported parameters:

# hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help

III. Export/Import

3.1 Introduction

  • Export supports exporting data to HDFS, and Import supports importing data from HDFS. Export can also restrict the export to a given start and end time, so it can be used for incremental backups.
  • Like CopyTable, Export relies on HBase's scan operation.

3.2 Command Format

# Export
hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

# Import
hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
  • The output directory outputdir must not be created in advance; the program creates it automatically. After the export completes, the exported files are owned by the user who ran the export command.
  • By default, only the latest version of a given Cell is exported, regardless of its version history. To export multiple versions, replace <versions> with the desired number of versions.

3.3 Common Commands

  1. Export command
hbase org.apache.hadoop.hbase.mapreduce.Export tableName  <HDFS path>/tableName.db
  2. Import command
hbase org.apache.hadoop.hbase.mapreduce.Import tableName  <HDFS path>/tableName.db
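Putting the pieces together, an incremental export fills in the optional <versions>, <starttime>, and <endtime> arguments. The sketch below only assembles and echoes the command line (table name, output path, and time window are placeholders), so it runs without a cluster:

```shell
# Assemble an incremental Export invocation from its parts.
TABLE=tableName
OUTDIR=/backup/tableName.db
VERSIONS=1                  # only the latest version of each Cell
STARTTIME=1704067200000     # epoch milliseconds, start of the window
ENDTIME=1704153600000       # epoch milliseconds, end of the window

CMD="hbase org.apache.hadoop.hbase.mapreduce.Export $TABLE $OUTDIR $VERSIONS $STARTTIME $ENDTIME"
echo "$CMD"                 # echoed rather than executed
```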

IV. Snapshot

4.1 Introduction

The HBase snapshot (Snapshot) feature lets you take a copy of a table (both data and metadata) with very little performance overhead, because a snapshot stores only the table's metadata and references to its HFiles. The clone operation creates a new table from a snapshot, and the restore operation rolls a table back to the state it had when the snapshot was taken. Neither clone nor restore needs to copy any data, because the underlying HFiles (the files that store an HBase table's data) are not modified; only the table's metadata is changed.

4.2 Configuration

The HBase snapshot feature is not enabled by default. To enable it, add the following entry to the hbase-site.xml configuration file:

<property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
</property>

4.3 Common Commands

All snapshot commands must be run in the HBase shell interactive command line.

1. Take a Snapshot

# take a snapshot
hbase> snapshot 'tableName', 'snapshotName'

By default, an in-memory flush is performed before the snapshot is taken, so that in-memory data is included in the snapshot. If you do not want to include in-memory data, you can use the SKIP_FLUSH option to skip the flush.

# skip the in-memory flush
hbase> snapshot 'tableName', 'snapshotName', {SKIP_FLUSH => true}
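For scheduled backups, snapshots are usually given dated names and taken non-interactively. A sketch of building such a name (the table name is a placeholder; the pipe into the HBase shell is commented out so the sketch runs without a cluster):

```shell
# Build a dated snapshot name and print the snapshot command.
TABLE=tableName
SNAPSHOT="${TABLE}-$(date -u +%Y%m%d)"
echo "snapshot '${TABLE}', '${SNAPSHOT}'"   # | hbase shell   (uncomment on a live cluster)
```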

2. Listing Snapshots

# list snapshots
hbase> list_snapshots

3. Deleting Snapshots

# delete a snapshot
hbase> delete_snapshot 'snapshotName'

4. Clone a table from snapshot

# create a new table from an existing snapshot
hbase> clone_snapshot 'snapshotName', 'newTableName'

5. Restore a snapshot

Restoring a snapshot rolls the table back to the state it had when the snapshot was taken. The table must be disabled before the restore:

hbase> disable 'tableName'
hbase> restore_snapshot 'snapshotName'
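The full disable → restore → enable sequence (the table is left disabled after restore_snapshot, so it must be re-enabled) can be scripted and piped into the HBase shell. Table and snapshot names are placeholders; the pipe is commented out so the sketch runs without a cluster:

```shell
# Script the full restore sequence as one HBase shell input.
RESTORE_SCRIPT="disable 'tableName'
restore_snapshot 'snapshotName'
enable 'tableName'"
echo "$RESTORE_SCRIPT"   # | hbase shell   (uncomment on a live cluster)
```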

Note that if HBase is configured with master-slave replication, a restore will leave the master and its replicas in different states, because replication works at the log level while snapshots work at the file-system level. In that case, stop synchronization first, restore all servers to a consistent data point, and then re-establish replication.

Reference material

  1. Online Apache HBase Backups with CopyTable
  2. Apache HBase ™ Reference Guide

More articles in this big data series can be found in the GitHub open-source project: Big Data Getting Started


Origin www.cnblogs.com/heibaiying/p/11416170.html