Hadoop 2.x: HDFS snapshot

I don't want to reinvent the wheels of open source projects; rather, I just want to explore their implied features and use cases as far as possible. So I will write some notes as a summary or memo.

Agenda

1. What is it

2. How it works

3. Hadoop snapshot vs HBase snapshot

4. Demos of using snapshots

1. What is it

A long time ago, the term 'snapshot' was introduced to describe 'the state of something at a point in time', e.g. a memory snapshot, a database snapshot, or even Google's page snapshot. They all have a similar meaning: a certain view/image of one thing at some moment in its history.

Hadoop's snapshot is akin to these: we use this 'view' to capture the files at a point in time. Its typical uses are:

a. periodic backups

b. restoring key data after mistaken deletions

c. isolating some important data from production for testing, comparison, etc.

This snapshot also has some notable features:

- no data is moved or copied, so network bandwidth is not affected

- it does not create many extra tasks for the namenode or datanodes, so reliability is preserved

2. How it works

Hadoop snapshots work thanks to the write-once, read-many nature of HDFS files. A dir first has to be marked as snapshottable (an admin step); after that you can create a snapshot on it, and the namenode protects the snapshotted state: all later operations, including deletion, move, or creation of files and dirs, only affect the 'metadata' in the namenode, so the underlying data captured by the snapshot is not removed right away. After a while, if you want to restore some files/dirs, you can copy the snapshotted files or dirs from the '.snapshot' dir to anywhere you want. When you delete the snapshot created earlier, the deferred operations take effect: data that is no longer referenced by the live tree or by another snapshot can be reclaimed.
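To make the paragraph above concrete, here is a minimal toy model in Python. It is only a sketch and not HDFS code: the class `ToyNameNode`, its method names, and the block id `blk_1` are all hypothetical, and it copies the whole name table per snapshot, whereas the real namenode keeps this bookkeeping far more compactly. It only shows the idea that a deletion touches metadata while the data stays until no snapshot refers to it.

```python
class ToyNameNode:
    """Toy model of snapshot bookkeeping. Hypothetical, not HDFS code."""

    def __init__(self):
        self.current = {}            # file name -> block id (live namespace)
        self.snapshots = {}          # snapshot name -> frozen {file name: block id}
        self.stored_blocks = set()   # blocks still kept on the imaginary datanodes

    def create_file(self, name, block_id):
        self.current[name] = block_id
        self.stored_blocks.add(block_id)

    def delete_file(self, name):
        # Only the metadata entry goes away; the block survives
        # for as long as some snapshot still refers to it.
        del self.current[name]
        self._reclaim_unreferenced()

    def create_snapshot(self, snap):
        # Copies metadata only (names and block ids); no block data moves.
        self.snapshots[snap] = dict(self.current)

    def delete_snapshot(self, snap):
        del self.snapshots[snap]
        self._reclaim_unreferenced()

    def restore(self, snap, name):
        # Simplification: the toy re-links the old block, while a real
        # copy out of '.snapshot' would write new data.
        self.current[name] = self.snapshots[snap][name]

    def _reclaim_unreferenced(self):
        referenced = set(self.current.values())
        for files in self.snapshots.values():
            referenced.update(files.values())
        self.stored_blocks &= referenced


nn = ToyNameNode()
nn.create_file("a.txt", "blk_1")
nn.create_snapshot("s0")
nn.delete_file("a.txt")
print("blk_1" in nn.stored_blocks)   # True: the snapshot still protects it
nn.delete_snapshot("s0")
print("blk_1" in nn.stored_blocks)   # False: nothing refers to it any more
```

Calling `nn.restore("s0", "a.txt")` before deleting the snapshot would bring the file back into the live namespace, which is the toy version of copying it out of '.snapshot'.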

For a deeper study of the 'linked data structure' behind this, you can check out 'Making Data Structures Persistent'.
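As a taste of what 'persistent' means in that sense, below is a small Python sketch of a version-stamped field, in the spirit of the 'fat node' idea: a field keeps every value it has ever held together with the version at which it was written, so any old version can still be read. The class name `FatField` and its methods are my own hypothetical names, and I am not claiming that HDFS stores its snapshot data this way.

```python
import bisect


class FatField:
    """One field that remembers every value it has held, stamped by version."""

    def __init__(self):
        self._versions = []   # ascending version stamps
        self._values = []     # value written at the matching stamp

    def set(self, version, value):
        # Assumption: writes arrive with non-decreasing version stamps.
        if self._versions and version < self._versions[-1]:
            raise ValueError("versions must not go backwards")
        if self._versions and version == self._versions[-1]:
            self._values[-1] = value          # overwrite within the same version
        else:
            self._versions.append(version)
            self._values.append(value)

    def get(self, version):
        # Find the latest write whose stamp is <= the requested version.
        i = bisect.bisect_right(self._versions, version)
        if i == 0:
            raise KeyError(f"no value at version {version}")
        return self._values[i - 1]


owner = FatField()
owner.set(1, "alice")
owner.set(3, "bob")
print(owner.get(2))   # alice: version 2 still sees the write made at version 1
print(owner.get(3))   # bob
```

Reading an old version never changes anything, which is the same property a read-only snapshot gives you.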

3. Hadoop snapshot vs HBase snapshot

Judging by the release history of Hadoop and HBase, I think Hadoop's snapshot was derived from HBase's :), so their underlying implementations are similar. Here are some differences between the two snapshots:

feature                                       hadoop   hbase   notes
copy/move data                                n        n
generate new files that refer to originals    n        y       HBase generates many small files that point to the real HDFS files

So for an HBase cluster, I think it is unnecessary to back up (snapshot) HDFS again if you already use HBase snapshots, because the two snapshots largely overlap; otherwise, you should take HDFS snapshots.

4. Demos of using snapshots

There are some usage demos on the official Apache site [2], but I want to point out that this snapshot is read-only (RO) rather than read-write (RW); hence, if you try to make changes under the '.snapshot' dir, you will get errors. In addition, if you want to check how the commands really work, see the details in 'NameNodeRpcServer.java'.
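As a quick memo of the usual command sequence, here is a small Python sketch that only builds the `hdfs` command lines and prints them; it does not run anything. The helper names `snapshot_commands` and `restore_command`, the dir `/data/important`, the snapshot name `s0`, and the file `report.csv` are hypothetical placeholders; check the options against [2] for your Hadoop version.

```python
import shlex


def snapshot_commands(directory, name):
    """Build the usual snapshot lifecycle commands for one dir."""
    snap_root = f"{directory.rstrip('/')}/.snapshot/{name}"
    return {
        # Admin step: mark the dir as snapshottable.
        "allow": ["hdfs", "dfsadmin", "-allowSnapshot", directory],
        # Take a snapshot with the given name.
        "create": ["hdfs", "dfs", "-createSnapshot", directory, name],
        # Browse the read-only view under '.snapshot'.
        "list": ["hdfs", "dfs", "-ls", snap_root],
        # Drop the snapshot; data only it referenced can then be reclaimed.
        "delete": ["hdfs", "dfs", "-deleteSnapshot", directory, name],
    }


def restore_command(directory, name, relative_path):
    """Copy (not move) a file out of the read-only snapshot view."""
    base = directory.rstrip("/")
    src = f"{base}/.snapshot/{name}/{relative_path}"
    dst = f"{base}/{relative_path}"
    return ["hdfs", "dfs", "-cp", src, dst]


# Hypothetical dir, snapshot name, and file, for illustration only.
for step, cmd in snapshot_commands("/data/important", "s0").items():
    print(step + ":", " ".join(shlex.quote(part) for part in cmd))
print("restore:", " ".join(shlex.quote(part)
                           for part in restore_command("/data/important", "s0", "report.csv")))
```

The restore step uses a copy on purpose: since the snapshot is read-only, anything that would modify the '.snapshot' path, such as a move or a delete, is expected to fail.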

ref:

[1] JIRA: Support for RW/RO snapshots in HDFS

[2] HDFS Snapshots

hbase -tables replication/snapshot/backup within/cross clusters

hadoop-2.x --new features
