HBase-Storage-HFile Format

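The actual storage files are implemented by the HFile class, which was created for one specific purpose: storing HBase data efficiently. HFiles are based on Hadoop's TFile class and mimic the SSTable format used in Google's Bigtable architecture.
The detailed file format is shown in the figure below.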

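The files are variable in length; the only fixed blocks are the File Info block and the Trailer block. The Trailer holds pointers to the other blocks. It is written once all data has been persisted to the file, and from that point on the file is an immutable data store file. The Index blocks record the offsets of the Data and Meta blocks. Both Data and Meta blocks are actually optional, but given how HBase uses its data files, you will almost always find Data blocks in a store file.
The block size is configured via HColumnDescriptor; the user can specify it when creating a table or rely on a sensible default. Describing a table in the HBase shell shows the current value: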
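To make the role of the Trailer and Index concrete, here is a toy Python sketch. It is NOT the real HFile binary encoding: the layout (data blocks, then an offset index, then a fixed-size trailer holding the index offset) and the names `write_toy_file` and `read_block` are hypothetical simplifications. It only illustrates why a reader can start from a fixed-size trailer at the end of the file and follow pointers to any block.

```python
import struct
from typing import List

# Hypothetical toy layout (not the real HFile encoding):
#   [data block 0][data block 1]...[index: one 8-byte offset per block][trailer]
# The trailer has a fixed size and is written last, after all blocks are on disk.
TRAILER = struct.Struct(">QI")  # index offset (8 bytes) + block count (4 bytes)
OFFSET = struct.Struct(">Q")


def write_toy_file(blocks: List[bytes]) -> bytes:
    out = bytearray()
    offsets = []
    for block in blocks:
        offsets.append(len(out))  # the index remembers where each block starts
        out += block
    index_offset = len(out)
    for off in offsets:
        out += OFFSET.pack(off)
    out += TRAILER.pack(index_offset, len(blocks))  # written at the very end
    return bytes(out)


def read_block(data: bytes, n: int) -> bytes:
    # 1. The trailer sits at a known position: the last TRAILER.size bytes.
    index_offset, count = TRAILER.unpack_from(data, len(data) - TRAILER.size)
    # 2. The trailer points to the index, which holds every block's offset.
    offsets = [OFFSET.unpack_from(data, index_offset + i * OFFSET.size)[0]
               for i in range(count)]
    # 3. A block ends where the next block (or the index) begins.
    end = offsets[n + 1] if n + 1 < count else index_offset
    return data[offsets[n]:end]
```

For example, `read_block(write_toy_file([b"aaa", b"bbbb"]), 1)` returns `b"bbbb"`. Because the trailer is fixed-size and last, it can only be written once the file is complete, which matches the immutability described above.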

hbase(main):002:0> desc 'test_table_mr'
Table test_table_mr is ENABLED
test_table_mr
COLUMN FAMILIES DESCRIPTION
{NAME => 'data', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

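The default here is 64KB (65536 bytes), as the BLOCKSIZE attribute shows.
The HFile JavaDoc explains the setting as follows:
Minimum block size. We recommend a minimum block size between 8KB and 1MB for general usage. A larger block size is preferable if the application mainly performs sequential access, but it hurts random-read performance (because more data has to be decompressed). Smaller blocks are better for random data access, but they need more memory to hold the block index and may also be slower to create (because the compressor stream must be flushed at the end of every data block, which causes an FS I/O flush). Furthermore, because the compression codec caches data internally, the smallest practical block size is around 20KB-30KB.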
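The memory side of this trade-off can be estimated with simple arithmetic, since there is roughly one block index entry per data block. The sketch below assumes a hypothetical 1GB store file and ignores compression and block overshoot; `bytes_per_entry` is an assumed average (key plus offset and size), not a value taken from HBase.

```python
def index_entries(file_size: int, block_size: int) -> int:
    # One index entry per data block, rounding up for a partial last block.
    return -(-file_size // block_size)


def index_memory(file_size: int, block_size: int, bytes_per_entry: int = 64) -> int:
    # bytes_per_entry is an assumed average, not an HBase constant.
    return index_entries(file_size, block_size) * bytes_per_entry


GB = 1024 ** 3
for kb in (8, 64, 1024):
    n = index_entries(GB, kb * 1024)
    mem_kb = index_memory(GB, kb * 1024) // 1024
    print(f"{kb:>5} KB blocks -> {n:>7} index entries, ~{mem_kb} KB of index")
```

With 8KB blocks the 1GB file needs 131072 index entries, eight times the 16384 needed with the 64KB default, which is why very small blocks cost noticeably more index memory.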
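Each block contains a magic header and a number of serialized KeyValue instances. Without compression, each block is roughly the size of the configured block size. The writer has to accommodate whatever the user writes: if a KeyValue is larger than the block size, HBase must still accept it. Even with smaller values, the block size check only happens after the last value has been written, so in practice most blocks end up slightly larger than the configured size.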
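The overshoot follows directly from checking the size after each write. The following is a simplified Python model, assuming raw KeyValue sizes only (no block headers, no compression); `pack_blocks` is a hypothetical helper, not HBase's writer.

```python
from typing import List


def pack_blocks(kv_sizes: List[int], block_size: int = 65536) -> List[int]:
    """Return the raw size of each block when KeyValues of the given sizes
    are appended in order (simplified model, no headers or compression)."""
    blocks: List[int] = []
    current = 0
    for size in kv_sizes:
        current += size               # the KeyValue is always written, even if huge
        if current >= block_size:     # the size check happens only after the write
            blocks.append(current)    # so a closed block is usually a bit over block_size
            current = 0
    if current:
        blocks.append(current)        # the last, partially filled block
    return blocks


print(pack_blocks([30000, 30000, 30000, 200000, 100]))
```

This prints `[90000, 200000, 100]`: the first block exceeds 65536 by 24464 bytes because the third KeyValue was written before the check, and the 200000-byte KeyValue is accepted as a block of its own.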
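When compression is used, the user has even less control over the block size. Compression codecs achieve the best compression ratio when they can control how much data they consume. For example, if the block size is set to 256KB and LZO compression is used, the system will write smaller blocks to fit LZO's internal buffer size.
HBase does not know whether the user chose a compression algorithm: it writes raw data up to the block size limit and tries to keep the raw size of each block close to that limit. If compression is enabled, less data is stored on disk. This means the resulting store file consists of the same number of blocks, but because each block is smaller, the total size is smaller too.
In HDFS, the default file block size is 128MB, which is 2048 times the default HFile block size. Consequently, there is no correspondence between HBase store file blocks and Hadoop blocks; in fact, the two block types are entirely unrelated. HBase stores its files transparently in the file system, and the fact that HDFS also splits files into blocks is a mere coincidence. HDFS does not know what HBase stores; it only sees binary files.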
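A rough Python illustration of "same number of blocks, smaller total": blocks are cut by uncompressed size first and each one is compressed afterwards. This is a simplification under stated assumptions: `zlib` stands in for a real HBase codec such as GZ, LZO or Snappy, `write_blocks` is a hypothetical helper, and real HBase cuts blocks at KeyValue boundaries rather than slicing raw bytes.

```python
import zlib
from typing import List


def write_blocks(raw: bytes, block_size: int = 65536, compress: bool = False) -> List[bytes]:
    # Blocks are cut by the *uncompressed* size; the codec is applied afterwards,
    # so enabling compression does not change how many blocks there are.
    chunks = [raw[i:i + block_size] for i in range(0, len(raw), block_size)]
    if compress:
        chunks = [zlib.compress(chunk) for chunk in chunks]  # zlib stands in for a real codec
    return chunks


raw = b"rowkey-0001,value;" * 20000            # 360000 bytes of repetitive sample data
plain = write_blocks(raw)
packed = write_blocks(raw, compress=True)
print(len(plain), len(packed))                      # same block count
print(sum(map(len, plain)), sum(map(len, packed)))  # compressed total is much smaller
```

Both variants produce 6 blocks; only the bytes per block, and therefore the file size, shrink when compression is on.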
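The 2048 ratio is simple division, and a small Python sketch shows why the two block types do not line up once HFile blocks overshoot their nominal size. The uniform on-disk size of 65593 bytes used below is a hypothetical value chosen only because it is slightly above 64KB; `straddling_blocks` is likewise a hypothetical helper.

```python
from typing import List

HDFS_BLOCK = 128 * 1024 * 1024   # default HDFS block size: 128MB
HFILE_BLOCK = 64 * 1024          # default HBase BLOCKSIZE: 64KB


def straddling_blocks(on_disk_block: int, n_blocks: int,
                      hdfs_block: int = HDFS_BLOCK) -> List[int]:
    """Indexes of HFile blocks that cross an HDFS block boundary, assuming
    (hypothetically) that every HFile block has the same on-disk size."""
    hits = []
    for k in range(n_blocks):
        start, end = k * on_disk_block, (k + 1) * on_disk_block  # [start, end)
        if start // hdfs_block != (end - 1) // hdfs_block:
            hits.append(k)
    return hits


print(HDFS_BLOCK // HFILE_BLOCK)        # ratio of default block sizes
print(straddling_blocks(65536, 4096))   # exactly 64KB blocks: none cross a boundary
print(straddling_blocks(65593, 4096))   # slightly larger blocks: some do
```

The ratio is 2048; with exact 64KB blocks no HFile block crosses an HDFS boundary, while with 65593-byte blocks blocks 2046 and 4092 do. HBase does not try to align the two, and HDFS simply stores the bytes.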

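Sometimes it is necessary to bypass HBase and access an HFile directly, for example to check its health or to dump its contents. The HFile.main() method provides such a tool: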

[root@node-231 ~]# hbase org.apache.hadoop.hbase.io.hfile.HFile
usage: HFile [-a] [-b] [-e] [-f <arg> | -r <arg>] [-h] [-i] [-k] [-m] [-p]
       [-s] [-v] [-w <arg>]
 -a,--checkfamily             Enable family check
 -b,--printblocks             Print block index meta data
 -e,--printkey                Print keys
 -f,--file <arg>              File to scan. Pass full-path; e.g.
                              hdfs://a:9000/hbase/hbase:meta/12/34
 -h,--printblockheaders       Print block headers for each block.
 -i,--checkMobIntegrity       Print all cells whose mob files are missing
 -k,--checkrow                Enable row order check; looks for out-of-order
                              keys
 -m,--printmeta               Print meta data of file
 -p,--printkv                 Print key/value pairs
 -r,--region <arg>            Region to scan. Pass region name; e.g.
                              'hbase:meta,,1'
 -s,--stats                   Print statistics
 -v,--verbose                 Verbose output; emits file and meta data
                              delimiters
 -w,--seekToRow <arg>         Seek to this row and print all the kvs for this
                              row only
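When the tool is called from scripts, it can help to build the argument list programmatically. The sketch below is a hypothetical Python helper that uses only flags listed in the usage text (`-f`, `-v`, `-m`, `-p`, `-s`); it assumes the `hbase` launcher is on the PATH of the node where the command is eventually run, and the example path is made up.

```python
import shlex
from typing import List

HFILE_TOOL = "org.apache.hadoop.hbase.io.hfile.HFile"


def hfile_command(path: str, verbose: bool = True, meta: bool = True,
                  kv: bool = True, stats: bool = False) -> List[str]:
    # Only flags from the usage text are used; the helper itself is hypothetical.
    cmd = ["hbase", HFILE_TOOL, "-f", path]
    if verbose:
        cmd.append("-v")   # --verbose
    if meta:
        cmd.append("-m")   # --printmeta
    if kv:
        cmd.append("-p")   # --printkv
    if stats:
        cmd.append("-s")   # --stats
    return cmd


print(shlex.join(hfile_command("/apps/hbase/data/data/default/t/r/f/hfile")))
```

On a cluster node the list can be passed to `subprocess.run(cmd, check=True)`; building a list instead of a shell string avoids quoting problems with unusual paths.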

Listing the directory

[root@node-231 ~]# hadoop fs -lsr /apps/hbase
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hbase hdfs 0 2018-09-19 18:26 /apps/hbase/data
drwxr-xr-x - hbase hdfs 0 2018-09-27 17:49 /apps/hbase/data/.tmp
drwxr-xr-x - hbase hdfs 0 2018-09-27 17:49 /apps/hbase/data/.tmp/data
drwxr-xr-x - hbase hdfs 0 2018-09-28 17:54 /apps/hbase/data/.tmp/data/default
drwxr-xr-x - hbase hdfs 0 2018-10-09 17:59 /apps/hbase/data/MasterProcWALs
-rw-r--r-- 3 hbase hdfs 0 2018-10-09 17:59 /apps/hbase/data/MasterProcWALs/state-00000000000000001877.log
drwxr-xr-x - hbase hdfs 0 2018-09-19 18:27 /apps/hbase/data/WALs
drwxr-xr-x - hbase hdfs 0 2018-07-10 10:31 /apps/hbase/data/WALs/node231,16020,1531189330072
drwxr-xr-x - hbase hdfs 0 2018-07-10 10:54 /apps/hbase/data/WALs/node231,16020,1531189883651
drwxr-xr-x - hbase hdfs 0 2018-07-10 11:02 /apps/hbase/data/WALs/node231,16020,1531191257857
drwxr-xr-x - hbase hdfs 0 2018-07-10 11:14 /apps/hbase/data/WALs/node231,16020,1531191741322
drwxr-xr-x - hbase hdfs 0 2018-07-10 18:14 /apps/hbase/data/WALs/node231,16020,1531192949461
drwxr-xr-x - hbase hdfs 0 2018-07-11 17:06 /apps/hbase/data/WALs/node231,16020,1531219308266-splitting
-rw-r--r-- 3 hbase hdfs 91 2018-07-11 17:06 /apps/hbase/data/WALs/node231,16020,1531219308266-splitting/node231%2C16020%2C1531219308266..meta.1531298682095.meta
drwxr-xr-x - hbase hdfs 0 2018-10-09 17:28 /apps/hbase/data/WALs/node231,16020,1537352815235
-rw-r--r-- 3 hbase hdfs 83 2018-10-09 17:28 /apps/hbase/data/WALs/node231,16020,1537352815235/node231%2C16020%2C1537352815235..meta.1539077336609.meta
-rw-r--r-- 3 hbase hdfs 83 2018-10-09 17:28 /apps/hbase/data/WALs/node231,16020,1537352815235/node231%2C16020%2C1537352815235.default.1539077320915
drwxr-xr-x - hbase hdfs 0 2018-07-09 17:31 /apps/hbase/data/WALs/node232,16020,1531128455707-splitting
-rw-r--r-- 3 hbase hdfs 814 2018-07-09 17:29 /apps/hbase/data/WALs/node232,16020,1531128455707-splitting/node232%2C16020%2C1531128455707..meta.1531128485582.meta
drwxr-xr-x - hbase hdfs 0 2018-07-17 11:59 /apps/hbase/data/WALs/node233,16020,1531794848784-splitting
-rw-r--r-- 3 hbase hdfs 1867 2018-07-17 12:59 /apps/hbase/data/WALs/node233,16020,1531794848784-splitting/node233%2C16020%2C1531794848784..meta.1531794867296.meta
-rw-r--r-- 3 hbase hdfs 3490 2018-07-17 12:59 /apps/hbase/data/WALs/node233,16020,1531794848784-splitting/node233%2C16020%2C1531794848784..meta.1531796373363.meta
-rw-r--r-- 3 hbase hdfs 83 2018-07-17 11:59 /apps/hbase/data/WALs/node233,16020,1531794848784-splitting/node233%2C16020%2C1531794848784..meta.1531799973563.meta
drwxr-xr-x - hbase hdfs 0 2018-07-25 15:11 /apps/hbase/data/WALs/node233,16020,1531878086352
drwxr-xr-x - hbase hdfs 0 2018-10-09 17:26 /apps/hbase/data/WALs/node233,16020,1537352818756
-rw-r--r-- 3 hbase hdfs 83 2018-10-09 17:26 /apps/hbase/data/WALs/node233,16020,1537352818756/node233%2C16020%2C1537352818756.default.1539077334363
drwxr-xr-x - hbase hdfs 0 2018-07-10 10:22 /apps/hbase/data/WALs/node234,16020,1531188485576
drwxr-xr-x - hbase hdfs 0 2018-07-10 11:02 /apps/hbase/data/WALs/node234,16020,1531191251113
drwxr-xr-x - hbase hdfs 0 2018-07-10 11:14 /apps/hbase/data/WALs/node234,16020,1531191744628
drwxr-xr-x - hbase hdfs 0 2018-07-10 11:22 /apps/hbase/data/WALs/node234,16020,1531192469368
drwxr-xr-x - hbase hdfs 0 2018-07-10 18:14 /apps/hbase/data/WALs/node234,16020,1531192953492
drwxr-xr-x - hbase hdfs 0 2018-07-10 18:41 /apps/hbase/data/WALs/node234,16020,1531218644614
drwxr-xr-x - hbase hdfs 0 2018-07-17 09:56 /apps/hbase/data/WALs/node234,16020,1531736897611
drwxr-xr-x - hbase hdfs 0 2018-10-09 17:30 /apps/hbase/data/WALs/node234,16020,1537352822378
-rw-r--r-- 3 hbase hdfs 83 2018-10-09 17:30 /apps/hbase/data/WALs/node234,16020,1537352822378/node234%2C16020%2C1537352822378..meta.1539077644098.meta
-rw-r--r-- 3 hbase hdfs 83 2018-10-09 17:25 /apps/hbase/data/WALs/node234,16020,1537352822378/node234%2C16020%2C1537352822378.default.1539077313538
drwxr-xr-x - hbase hdfs 0 2018-07-10 18:41 /apps/hbase/data/WALs/node235,16020,1531218644231
drwxr-xr-x - hbase hdfs 0 2018-07-17 10:33 /apps/hbase/data/WALs/node235,16020,1531792606380
drwxr-xr-x - hbase hdfs 0 2018-07-25 15:11 /apps/hbase/data/WALs/node235,16020,1531878078376
drwxr-xr-x - hbase hdfs 0 2018-07-09 17:27 /apps/hbase/data/WALs/hregion-32519348
drwxr-xr-x - hbase hdfs 0 2018-10-09 02:25 /apps/hbase/data/archive
drwxr-xr-x - hbase hdfs 0 2018-07-09 17:31 /apps/hbase/data/corrupt
drwxr-xr-x - hbase hdfs 0 2018-07-09 17:28 /apps/hbase/data/data
drwxr-xr-x - hbase hdfs 0 2018-09-28 17:54 /apps/hbase/data/data/default
drwxr-xr-x - hbase hdfs 0 2018-07-09 17:32 /apps/hbase/data/data/default/socialSecurity
drwxr-xr-x - hbase hdfs 0 2018-07-17 18:09 /apps/hbase/data/data/default/socialSecurity/.tabledesc
-rw-r--r-- 3 hbase hdfs 673 2018-07-17 18:09 /apps/hbase/data/data/default/socialSecurity/.tabledesc/.tableinfo.0000000006
drwxr-xr-x - hbase hdfs 0 2018-07-17 18:09 /apps/hbase/data/data/default/socialSecurity/.tmp
drwxr-xr-x - hbase hdfs 0 2018-09-24 15:06 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8
-rw-r--r-- 3 hbase hdfs 49 2018-07-09 17:32 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/.regioninfo
drwxr-xr-x - hbase hdfs 0 2018-10-09 02:17 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/.tmp
drwxr-xr-x - hbase hdfs 0 2018-09-19 18:32 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/recovered.edits
-rw-r--r-- 3 hbase hdfs 0 2018-09-19 18:32 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/recovered.edits/228.seqid
drwxr-xr-x - hbase hdfs 0 2018-10-07 00:17 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/tag
-rw-r--r-- 3 hbase hdfs 102271 2018-10-07 00:17 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/tag/306820d94f9d463bba02b6949032ff3d
drwxr-xr-x - hbase hdfs 0 2018-10-09 02:17 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/userInfo
-rw-r--r-- 3 hbase hdfs 101512 2018-10-09 02:17 /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/userInfo/549a3822eb484e32a12842287293435a
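The listing shows that store files live at `<root>/data/<namespace>/<table>/<region>/<family>/<hfile>`. A small Python sketch can split such a path into its parts; the root `/apps/hbase/data` is taken from this particular cluster, and `parse_store_file`, `StoreFilePath` and the set of excluded directories are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class StoreFilePath(NamedTuple):
    namespace: str
    table: str
    region: str   # encoded region name (the hash-like directory)
    family: str   # column family directory
    hfile: str    # store file name


# Region subdirectories seen in the listing that are not column families.
NON_STORE_DIRS = {"recovered.edits", ".tmp"}


def parse_store_file(path: str, root: str = "/apps/hbase/data") -> StoreFilePath:
    # Layout as seen in the listing: <root>/data/<namespace>/<table>/<region>/<family>/<hfile>
    prefix = root.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path} is not under {root}")
    parts = path[len(prefix):].split("/")
    if (len(parts) != 6 or parts[0] != "data"
            or parts[4] in NON_STORE_DIRS or parts[5].startswith(".")):
        raise ValueError(f"{path} does not look like a store file path")
    return StoreFilePath(*parts[1:])
```

Applied to the `tag` file above, it yields namespace `default`, table `socialSecurity`, region `d31a708f6a7b307c9bb2aa6818b790f8`, family `tag` and file `306820d94f9d463bba02b6949032ff3d`, which is exactly the path passed to the HFile tool.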

Inspecting an HFile

[root@node-231 ~]# hbase org.apache.hadoop.hbase.io.hfile.HFile -f /apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/tag/306820d94f9d463bba02b6949032ff3d -v -m -p
(part of the output omitted)...
K: 653128197810268592/tag:basicTag_1400/1531132350695/Put/vlen=4/seqid=0 V: 1400
K: 653224199208208529/tag:basicTag_1364/1531129064927/Put/vlen=4/seqid=0 V: 1364
K: 653224199208208529/tag:basicTag_1367/1531130762508/Put/vlen=4/seqid=0 V: 1367
K: 653224199208208529/tag:basicTag_1374/1531130561522/Put/vlen=4/seqid=0 V: 1374
K: 653224199208208529/tag:basicTag_1399/1531132350695/Put/vlen=4/seqid=0 V: 1399
K: 654324197605195204/tag:basicTag_1364/1531129064915/Put/vlen=4/seqid=0 V: 1364
K: 654324197605195204/tag:basicTag_1368/1531130762525/Put/vlen=4/seqid=0 V: 1368
K: 654324197605195204/tag:basicTag_1373/1531130561519/Put/vlen=4/seqid=0 V: 1373
K: 654324197605195204/tag:basicTag_1400/1531132350695/Put/vlen=4/seqid=0 V: 1400
K: 659000198306113231/tag:basicTag_1363/1531129064927/Put/vlen=4/seqid=0 V: 1363
K: 659000198306113231/tag:basicTag_1367/1531130762508/Put/vlen=4/seqid=0 V: 1367
K: 659000198306113231/tag:basicTag_1371/1531130561522/Put/vlen=4/seqid=0 V: 1371
K: 659000198306113231/tag:basicTag_1399/1531132350695/Put/vlen=4/seqid=0 V: 1399
Block index size as per heapsize: 480
reader=/apps/hbase/data/data/default/socialSecurity/d31a708f6a7b307c9bb2aa6818b790f8/tag/306820d94f9d463bba02b6949032ff3d,
compression=none,
cacheConf=CacheConfig:disabled,
firstKey=110115199402265244/tag:basicTag_1364/1531129064927/Put,
lastKey=659000198306113231/tag:basicTag_1399/1531132350695/Put,
avgKeyLen=46,
avgValueLen=4,
entries=1682,
length=102271
Trailer:
fileinfoOffset=97813,
loadOnOpenDataOffset=97650,
dataIndexCount=2,
metaIndexCount=0,
totalUncomressedBytes=102109,
entryCount=1682,
compressionCodec=NONE,
uncompressedDataIndexSize=89,
numDataIndexLevels=1,
firstDataBlockOffset=0,
lastDataBlockOffset=65593,
comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
encryptionKey=NONE,
majorVersion=3,
minorVersion=0
Fileinfo:
DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
EARLIEST_PUT_TS = \x00\x00\x01d~gm\xD3
MAJOR_COMPACTION_KEY = \xFF
MAX_SEQ_ID_KEY = 29
TIMERANGE = 1531129064915....1531132350695
hfile.AVG_KEY_LEN = 46
hfile.AVG_VALUE_LEN = 4
hfile.CREATE_TIME_TS = \x00\x00\x01fJ.8!
hfile.LASTKEY = \x00\x12659000198306113231\x03tagbasicTag_1399\x00\x00\x01d~\x99\x90\xE7\x04
Mid-key: \x00\x0544023\x00\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF
Bloom filter:
Not present
Delete Family Bloom filter:
Not present
Scanned kv count -> 1682

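The first part of the output is the actual data stored in the serialized KeyValue instances. The second part dumps the internal HFile.Reader properties and the details of the trailer block. The last part, starting with Fileinfo, lists the values of the file info block, followed by the mid-key and the Bloom filter status.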
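The key/value lines in the first part follow a regular `row/family:qualifier/timestamp/type/vlen=N/seqid=N V: value` pattern, so they are easy to post-process. This Python sketch assumes row keys contain no `/` and family names no `:` (qualifiers containing `/` are handled by regex backtracking); `KV_LINE` and `parse_kv_line` are hypothetical names, not part of HBase.

```python
import re

# Matches the key/value lines printed with -p, e.g.
# K: 653128197810268592/tag:basicTag_1400/1531132350695/Put/vlen=4/seqid=0 V: 1400
KV_LINE = re.compile(
    r"^K: (?P<row>[^/]*)/(?P<family>[^:/]*):(?P<qualifier>.*)"
    r"/(?P<timestamp>\d+)/(?P<type>\w+)/vlen=(?P<vlen>\d+)/seqid=(?P<seqid>\d+)"
    r" V: (?P<value>.*)$"
)


def parse_kv_line(line: str) -> dict:
    m = KV_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a key/value line: {line!r}")
    kv = dict(m.groupdict())
    for field in ("timestamp", "vlen", "seqid"):
        kv[field] = int(kv[field])   # timestamp is in milliseconds since the epoch
    return kv
```

Parsing the dump this way makes simple checks possible, for example verifying that each printed value has `vlen` characters or counting cells per row key.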

Reposted from www.cnblogs.com/EnzoDin/p/9766290.html