HBase snapshot

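Snapshot:
(1) take a snapshot
If the table is enabled, EnabledTableSnapshotHandler is used.
If the table is disabled, DisabledTableSnapshotHandler is used, and all the work is done on the HMaster.
(2) getCompletedSnapshots
Returns the completed snapshots; snapshots still in progress are ignored.
(3) deleteSnapshot
Deletes the snapshot's directory directly.
(4) isSnapshotDone
Checks whether a snapshot has finished, using information from the snapshotHandlers queue.
(5) restoreSnapshot
The behavior depends on whether the table exists:
If it exists, it must be disabled; RestoreSnapshotHandler restores it to the snapshot's state.
If it does not exist, CloneSnapshotHandler clones a new table. It reuses most of the CreateTableHandler code and replaces the handleCreateHdfsRegions part.
(6) isRestoreSnapshotDone
Checks whether a restore has finished, using information from the restoreHandlers queue.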
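
The handler selection described in (1) and (5) can be sketched as a small decision function. This is only an illustration: the class SnapshotHandlerChoice, the TableState enum, and both method names are hypothetical, and the handlers are represented by their class names as strings rather than by real HBase objects.

```java
import java.util.Optional;

// Hypothetical model of how the master picks a handler. The real HBase
// handler classes are not instantiated here; only their names are returned.
class SnapshotHandlerChoice {
    enum TableState { ENABLED, DISABLED }

    // Handler used by "take a snapshot", chosen by table state.
    static String takeSnapshotHandler(TableState state) {
        return state == TableState.ENABLED
                ? "EnabledTableSnapshotHandler"
                : "DisabledTableSnapshotHandler";
    }

    // Handler used by restoreSnapshot. An empty Optional means the table
    // does not exist, so the snapshot is cloned into a new table.
    static String restoreHandler(Optional<TableState> table) {
        if (!table.isPresent()) {
            return "CloneSnapshotHandler";
        }
        if (table.get() != TableState.DISABLED) {
            throw new IllegalStateException("table must be disabled before restore");
        }
        return "RestoreSnapshotHandler";
    }

    public static void main(String[] args) {
        System.out.println(takeSnapshotHandler(TableState.ENABLED));
        System.out.println(restoreHandler(Optional.empty()));
        System.out.println(restoreHandler(Optional.of(TableState.DISABLED)));
    }
}
```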

MasterSnapshotVerifier is used for verification. It checks:
(1) SnapshotDescription is readable
(2) Table info is readable
(3) Regions

  • Matching regions in the snapshot as currently in the table
  • HRegionInfo matches the current and stored regions
  • All referenced hfiles have valid names
  • All the hfiles are present (either in the .archive directory or in the region)
  • All recovered.edits files are present (by name) and have the correct file size
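
As a rough illustration of the first region check (the snapshot must contain the same regions as the table), the comparison could look like the sketch below. The class RegionMatchSketch and its methods are hypothetical and work on plain region-name strings; the real MasterSnapshotVerifier works on HRegionInfo objects and snapshot files.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compare region names of the live table with
// the region names recorded in the snapshot.
class RegionMatchSketch {
    // True when both sides contain exactly the same region names.
    static boolean regionsMatch(Set<String> tableRegions, Set<String> snapshotRegions) {
        return tableRegions.equals(snapshotRegions);
    }

    // Regions present in the table but absent from the snapshot, sorted by name.
    static Set<String> missingFromSnapshot(Set<String> tableRegions, Set<String> snapshotRegions) {
        Set<String> missing = new TreeSet<>(tableRegions);
        missing.removeAll(snapshotRegions);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> table = new HashSet<>(Arrays.asList("region-a", "region-b"));
        Set<String> snapshot = new HashSet<>(Arrays.asList("region-a"));
        System.out.println(regionsMatch(table, snapshot));
        System.out.println(missingFromSnapshot(table, snapshot));
    }
}
```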
 

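When taking a snapshot of an enabled table, EnabledTableSnapshotHandler is used.

The HMaster acts as the master and runs the Procedure.
Each RegionServer acts as a slave and runs a Subprocedure, which is responsible for snapshotting its regions.
They communicate through ZooKeeper to coordinate the task: when a slave starts working (the master initiates a task, each slave tells the master it is ready once it receives the notification, and the snapshot begins once all slaves are ready), what work to do (which table's regions to snapshot), and completion (each slave tells the master when its task is done, and the snapshot is complete once all slaves have finished).

ZooKeeper handling on the master side is in ZKProcedureCoordinatorRpcs.java.
ZooKeeper handling on the slave side is in ZKProcedureMemberRpcs.java.
ZooKeeper events trigger the related tasks and drive the procedure forward.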
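
The znode paths that appear in the walkthrough of this protocol (for a snapshot named t1-s2) can be built with a small helper. The class SnapshotZnodes is hypothetical, and naming the child node after the member is an assumption made for illustration; the notes only say that each member creates a child node.

```java
// Hypothetical helper that builds the znode paths used by the barrier protocol.
class SnapshotZnodes {
    static final String BASE = "/hbase/online-snapshot";

    // Znode the master creates to ask members to acquire the barrier.
    static String acquired(String snapshotName) {
        return BASE + "/acquired/" + snapshotName;
    }

    // Znode the master creates once every member has acquired the barrier.
    static String reached(String snapshotName) {
        return BASE + "/reached/" + snapshotName;
    }

    // Assumption: a member reports by creating a child named after itself.
    static String memberChild(String barrierZnode, String memberName) {
        return barrierZnode + "/" + memberName;
    }

    public static void main(String[] args) {
        System.out.println(acquired("t1-s2"));
        System.out.println(reached("t1-s2"));
        System.out.println(memberChild(acquired("t1-s2"), "rs1"));
    }
}
```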

EnabledTableSnapshotHandler

ZKProcedureCoordinatorRpcs
ZKProcedureMemberRpcs

The master starts a Procedure
(1) Master Procedure.sendGlobalBarrierAcquire
Notify the members to acquire barrier for the procedure
Creates a ZooKeeper node of the acquire znode type. For example, if the snapshot is named t1-s2, the node is /hbase/online-snapshot/acquired/t1-s2.
Waits for each Subprocedure (on the HRegionServers) to receive the ZooKeeper event and create its own znode.

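(2) The HRegionServer receives the ZooKeeper event and sees that it is for an acquiredZnode node
It starts a Subprocedure and calls Subprocedure.acquireBarrier.
It creates a child node under the acquire znode.
It waits for the Procedure to enter the reached state: Subprocedure.waitForReachedGlobalBarrier.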

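(3) The master is notified as child nodes are created under the acquire znode and counts them
Once every member has created its child node, the members are in agreement and the master can proceed.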

(4) Master Procedure.sendGlobalBarrierReached

  • Notify members that all members have acquired their parts of the barrier and that they can now execute under the global barrier.
    Creates a ZooKeeper node of the reached znode type: /hbase/online-snapshot/reached/t1-s2
    Waits for all members to finish their tasks (wait for all members to report barrier release)

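(5) The HRegionServer receives the ZooKeeper event, sees that it is for a reachedZnode node, and triggers the Subprocedure's receiveReachedGlobalBarrier
The Subprocedure proceeds to Subprocedure.insideBarrier; here the Subprocedure is a FlushSnapshotSubprocedure.
Subprocedure.insideBarrier flushes and snapshots the HRegion.
When the task is done, the Subprocedure creates a child node under the reachedZnode to signal completion, and then ends.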

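(6) The master is notified as child nodes are created under the reachedZnode and counts them
Once every Subprocedure has created its child node, all tasks are complete and the master continues.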

(7) Procedure.sendGlobalBarrierComplete

The Procedure is now finished.
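
The seven steps above form a two-phase barrier. The sketch below simulates it inside one JVM, with CountDownLatch objects standing in for the znodes and threads standing in for the RegionServers. It is not how HBase implements the protocol (that goes through ZooKeeper watches); the class TwoPhaseBarrierSketch and the method runProcedure are hypothetical, and the HBase method names appear only in comments to map each line to a step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// In-process stand-in for the ZooKeeper-based barrier: latches replace znodes.
class TwoPhaseBarrierSketch {

    // Runs one procedure and returns the members that completed their work.
    static List<String> runProcedure(List<String> members) {
        CountDownLatch acquireSent = new CountDownLatch(1);              // acquired znode exists
        CountDownLatch allAcquired = new CountDownLatch(members.size()); // children under acquired
        CountDownLatch reachedSent = new CountDownLatch(1);              // reached znode exists
        CountDownLatch allReleased = new CountDownLatch(members.size()); // children under reached
        List<String> finished = new CopyOnWriteArrayList<>();

        // One thread per member so that all members can wait at the barrier at once.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, members.size()));
        for (String member : members) {
            pool.submit(() -> {
                try {
                    acquireSent.await();      // step 2: member sees the acquired znode
                    allAcquired.countDown();  // step 2: acquireBarrier, report readiness
                    reachedSent.await();      // step 2: waitForReachedGlobalBarrier
                    finished.add(member);     // step 5: insideBarrier (flush + snapshot go here)
                    allReleased.countDown();  // step 5: report completion
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            acquireSent.countDown();          // step 1: sendGlobalBarrierAcquire
            if (!allAcquired.await(5, TimeUnit.SECONDS)) {   // step 3: count acquired children
                throw new IllegalStateException("members did not acquire the barrier in time");
            }
            reachedSent.countDown();          // step 4: sendGlobalBarrierReached
            if (!allReleased.await(5, TimeUnit.SECONDS)) {   // step 6: count reached children
                throw new IllegalStateException("members did not release the barrier in time");
            }
            return new ArrayList<>(finished); // step 7: the procedure is complete
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("coordinator interrupted", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<String> done = runProcedure(Arrays.asList("rs1", "rs2", "rs3"));
        System.out.println("members finished: " + done.size());
    }
}
```

No member enters insideBarrier until every member has acquired the barrier, which is the property the acquired and reached znodes provide in the real protocol. The order in which members finish is not deterministic.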

For the snapshot directory layout, see SnapshotDescriptionUtils

* Snapshots are laid out on disk like this:
 *
 * <pre>
 * /hbase/.snapshots
 *          /.tmp                <---- working directory
 *          /[snapshot name]     <----- completed snapshot
 * </pre>
 *
 * A completed snapshot named 'completed' then looks like (multiple regions, servers, files, etc.
 * signified by '...' on the same directory depth).
 *
 * <pre>
 * /hbase/.snapshots/completed
 *                   .snapshotinfo          <--- Description of the snapshot
 *                   .tableinfo             <--- Copy of the tableinfo
 *                    /.logs
 *                        /[server_name]
 *                            /... [log files]
 *                         ...
 *                   /[region name]           <---- All the region's information
 *                   .regioninfo              <---- Copy of the HRegionInfo
 *                      /[column family name]
 *                          /[hfile name]     <--- name of the hfile in the real region
 *                          ...
 *                      ...
 *                    ...
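
To make the layout above concrete, here is a minimal sketch that builds a few of these paths as plain strings. The class SnapshotLayoutSketch and its methods are hypothetical and are not the SnapshotDescriptionUtils API; the root directory /hbase and the file names are taken from the Javadoc above.

```java
// Hypothetical path builder that follows the on-disk layout quoted above.
class SnapshotLayoutSketch {
    static final String SNAPSHOT_DIR = ".snapshots";
    static final String WORKING_DIR = ".tmp";

    // Directory of a completed snapshot: <root>/.snapshots/<snapshot name>
    static String completedDir(String rootDir, String snapshotName) {
        return rootDir + "/" + SNAPSHOT_DIR + "/" + snapshotName;
    }

    // Working directory shared by in-progress snapshots: <root>/.snapshots/.tmp
    static String workingRoot(String rootDir) {
        return rootDir + "/" + SNAPSHOT_DIR + "/" + WORKING_DIR;
    }

    // Description file of a completed snapshot.
    static String snapshotInfoFile(String rootDir, String snapshotName) {
        return completedDir(rootDir, snapshotName) + "/.snapshotinfo";
    }

    // Entry for one hfile: <snapshot>/<region name>/<column family name>/<hfile name>
    static String hfileEntry(String rootDir, String snapshotName,
                             String region, String family, String hfile) {
        return completedDir(rootDir, snapshotName) + "/" + region + "/" + family + "/" + hfile;
    }

    public static void main(String[] args) {
        System.out.println(completedDir("/hbase", "completed"));
        System.out.println(workingRoot("/hbase"));
        System.out.println(snapshotInfoFile("/hbase", "completed"));
        System.out.println(hfileEntry("/hbase", "completed", "region1", "cf", "hfile1"));
    }
}
```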
 


Reposted from bupt04406.iteye.com/blog/1883304