3.7-3.9 HBase table properties

One, table compression

1、HBase Snappy

 1) Configure Hadoop compression and verify it:
    [beifeng@hadoop-senior hadoop-2.5.0]$ bin/hadoop checknative
 2) Configure HBase
     >> copy the hadoop-snappy jar into HBase's lib directory
    [root@hadoop-senior lib]# cp hadoop-snappy-0.0.1-SNAPSHOT.jar /opt/modules/hbase-0.98.6-hadoop2/lib/

     >> the native libraries are needed as well
    [root@hadoop-senior lib]# mkdir /opt/modules/hbase-0.98.6-hadoop2/lib/native
    [root@hadoop-senior lib]# cd /opt/modules/hbase-0.98.6-hadoop2/lib/native/
    [root@hadoop-senior native]# ln -s /opt/modules/hadoop-2.5.0/lib/native ./Linux-amd64-64
    [root@hadoop-senior native]# ll
    total 0
    lrwxrwxrwx. 1 root root 36 May 27 11:41 Linux-amd64-64 -> /opt/modules/hadoop-2.5.0/lib/native
 3) Restart HBase
    [root@hadoop-senior hbase-0.98.6-hadoop2]# bin/hbase-daemon.sh stop master
    [root@hadoop-senior hbase-0.98.6-hadoop2]# bin/hbase-daemon.sh stop regionserver

    [root@hadoop-senior hbase-0.98.6-hadoop2]# bin/hbase-daemon.sh start master
    [root@hadoop-senior hbase-0.98.6-hadoop2]# bin/hbase-daemon.sh start regionserver
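
Coming back to step 1): if the native libraries are wired up correctly, bin/hadoop checknative should report snappy as available. The lines below are only a sketch of what the output typically looks like on a setup like this one; the exact library paths, versions and codec list will differ.

    [beifeng@hadoop-senior hadoop-2.5.0]$ bin/hadoop checknative
    Native library checking:
    hadoop: true /opt/modules/hadoop-2.5.0/lib/native/libhadoop.so
    zlib:   true /lib64/libz.so.1
    snappy: true /opt/modules/hadoop-2.5.0/lib/native/libsnappy.so.1
    lz4:    true revision:99
    bzip2:  false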


2、Test

##Enable Snappy compression on the RegionServer, hbase-site.xml
<property>
  <name>io.compression.codecs</name>
  <value>snappy</value>
  <description>A list of the compression codec classes that can be used 
               for compression/decompression.</description>
</property>
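
HBase also has a fail-fast check related to this: if hbase.regionserver.codecs lists a codec, a RegionServer will refuse to start when that codec is not usable, so a broken Snappy install shows up at startup instead of at table-creation time. A minimal sketch, to be verified against your HBase version:

<property>
  <name>hbase.regionserver.codecs</name>
  <value>snappy</value>
  <description>Codecs that must be available on the region server,
               otherwise it fails to start.</description>
</property>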




##Create a table with Snappy compression in the hbase shell
hbase(main):002:0> create 't_snappy', {NAME => 'f1', COMPRESSION => 'SNAPPY'}
0 row(s) in 0.4710 seconds

=> Hbase::Table - t_snappy

hbase(main):003:0> describe 't_snappy'
DESCRIPTION                                                                                                     ENABLED                                                     
 't_snappy', {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERS true                                                        
 IONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BL                                                             
 OCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                                            
1 row(s) in 0.0310 seconds
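
Two more ways to confirm that the codec really works end to end: the CompressionTest utility shipped with HBase, and simply writing and reading a row in the new table. The commands below are only a sketch; /tmp/snappy-test.txt is just an arbitrary scratch path for the test HFile.

    [beifeng@hadoop-senior hbase-0.98.6-hadoop2]$ bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/snappy-test.txt snappy

    hbase(main):004:0> put 't_snappy', 'row1', 'f1:c1', 'hello snappy'
    hbase(main):005:0> scan 't_snappy'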


Two, table versions and BlockCache

1、Memstore & BlockCache

The memory of an HBase RegionServer is divided into two parts: one part is the MemStore, used mainly for writes; the other is the BlockCache, used mainly for reads.

A write request is first written into the MemStore (each region on a RegionServer has its own MemStore); when a MemStore fills up to 64MB, a flush to disk is started.
When the total size of all MemStores exceeds the limit (heapsize * hbase.regionserver.global.memstore.upperLimit * 0.9), a flush is forced, starting from the largest MemStore, until the total drops back below the limit.

A read request first searches the MemStore for the data; if it is not found there, the BlockCache is checked next; if it is still not found, the data is read from disk and the result is put into the BlockCache.
Because the BlockCache uses an LRU policy, once it reaches its upper limit (heapsize * hfile.block.cache.size * 0.85) the eviction mechanism starts and the oldest batch of data is evicted.

In scenarios that care mainly about read response time, the BlockCache can be set larger and the MemStore smaller, in order to raise the cache hit rate.
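
For a read-heavy cluster the knobs mentioned above map to hbase-site.xml roughly as follows. This is only a hedged sketch with illustrative values (property names as in HBase 0.98; later versions rename the memstore limits), and HBase refuses to start if the block cache ratio plus the memstore upper limit claim too large a share of the heap, so keep their sum moderate.

<!-- give a larger share of the heap to the read cache (illustrative value) -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.5</value>
</property>
<!-- ...and a smaller share to the write buffers (illustrative values) -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.25</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.2</value>
</property>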


Reference blog post: https://blog.51cto.com/12445535/2363376

Background:
 1、caching is extremely important for a database;
 2、ideally all data would be cached in memory, so that no read or write ever needs file IO and performance is at its best;
 3、in practice we do not need to cache all the data: by the Pareto rule, 80% of the requests concentrate on 20% of the data (the hot data),
 4、so caching that 20% of the data is already enough to greatly improve system performance.

In its implementation HBase provides two cache structures: the MemStore and the BlockCache.
MemStore
 1、the write cache is called the MemStore;
 2、an HBase write puts the data into the MemStore and appends it sequentially to the HLog
 // looking at the code, the actual order is: first the data is written sequentially to the HLog, then into the MemStore
 3、when certain conditions are met, the MemStore is flushed to disk in one batch; this design greatly improves HBase write performance;
 4、the MemStore is also crucial for read performance: without it, reading data that has just been written would require IO against the files, which is clearly an expensive price to pay.

BlockCache
 1、the read cache is called the BlockCache;
 2、HBase caches the data Blocks it reads from files in the BlockCache, so that later requests for the same or adjacent data can be served directly from memory and expensive IO operations are avoided.
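
Whether a column family's blocks go through the BlockCache at all is itself a per-family table property (the BLOCKCACHE attribute visible in the describe output earlier). As a sketch, a hypothetical family that is only ever read by large sequential scans could opt out of the read cache so that it does not evict hotter data:

    hbase(main):006:0> create 't_scan_only', {NAME => 'f1', BLOCKCACHE => 'false'}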


##
 1、the BlockCache is at the Region Server level:
 2、a Region Server has only one BlockCache, and it is initialized when the Region Server starts.
 3、so far HBase has had three BlockCache implementations: LRUBlockCache is the original and still the default one; HBase 0.92 added a second option, SlabCache (see HBASE-4027); and from HBase 0.96 on there is another alternative, BucketCache (see HBASE-7404).
 4、the three schemes differ in how they manage memory:
 5、LRUBlockCache keeps all the data inside the JVM heap and leaves memory management to the JVM,
 6、while SlabCache and BucketCache use different mechanisms to store part of the data off-heap, managed by HBase itself.
 7、the reason for this evolution is that JVM garbage collection of the LRUBlockCache often causes long pauses, whereas managing the data in off-heap memory avoids that.
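
As a sketch of the off-heap option from point 7: BucketCache is normally switched on in hbase-site.xml along the lines below. The property names and defaults vary between HBase versions, the size is purely illustrative, and the JVM also needs enough direct (off-heap) memory configured in hbase-env.sh for this to work.

<!-- keep cached blocks outside the JVM heap, managed by HBase itself -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<!-- size of the off-heap cache in MB (illustrative) -->
<property>
  <name>hbase.bucketcache.size</name>
  <value>4096</value>
</property>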


LRUBlockCache

The default BlockCache implementation in HBase:
 1、the memory is logically divided into three areas: the single-access area, the multi-access area and the in-memory area, taking 25%, 50% and 25% of the total BlockCache size respectively;
 2、on a random read, a block loaded from HDFS is first placed into the single-access area;
 3、if the same data is requested again later, it is promoted into the multi-access area;
 4、the in-memory area holds data that should stay resident in memory, typically small amounts of frequently accessed data such as meta data. A user can place a column family into the in-memory area by setting IN_MEMORY => 'true' on that column family when creating the table (see the shell sketch after this list); // this part refers to the discussion of the IN_MEMORY parameter in "HBase table-creation statement parsing", http://hbasefly.com/2016/03/23/hbase_create_table/
 5、clearly this design strategy resembles the JVM's young, old and perm generations.
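
A minimal shell sketch of point 4, using a made-up table name: the column family below is pinned into the in-memory area of the LRUBlockCache.

    hbase(main):007:0> create 't_hot_dim', {NAME => 'f1', IN_MEMORY => 'true'}
    hbase(main):008:0> describe 't_hot_dim'     # the family now shows IN_MEMORY => 'true'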

Phase summary:
LRUBlockCache mechanism: much like the JVM's young, old and perm generations, it divides the cache into a single-access area, a multi-access area and an in-memory area (taking 25%, 50% and 25% of the total BlockCache size respectively). Data loaded from HDFS on a random access goes into the single-access area first; if there are further requests for that data it is moved into the multi-access area;
and the in-memory area holds data that can stay resident in memory, typically small amounts of frequently accessed data such as metadata.
// once the total BlockCache usage reaches a certain threshold, the eviction mechanism starts: the least recently used Blocks are evicted to make room for newly loaded Blocks.
Disadvantage: because of the CMS GC policy, the LRUBlockCache caching mechanism produces a lot of memory fragmentation, which can eventually trigger the infamous Full GC and its dreadful 'stop-the-world' pause, seriously affecting the services above.


Three, table Compaction

1、Compaction

As the MemStore keeps flushing data to disk, more and more HFiles are produced. HBase has an internal housekeeping mechanism for this, compaction, which merges multiple files into one larger file.
There are two kinds of compaction: minor compaction and major compaction. A minor compaction rewrites several small files into a smaller number of larger files, reducing the number of store files;
the process is essentially a multi-way merge. Because every HFile is already sorted, the merge is fast and limited mainly by disk I/O performance.

A major compaction rewrites all the HFiles of one column family of a region into a single new HFile, and compared with a minor compaction it has one more distinctive feature: it scans all the key/value pairs and, while sequentially rewriting the data,
skips entries that carry a delete marker. This is where deletes really take effect: such data, together with versions beyond the configured maximum and data whose time-to-live has expired, is simply not written back to disk during the rewrite.


######## 
StoreFiles are monitored by a background thread of the HRegionServer so that their number stays under control. As MemStores keep being flushed, more and more StoreFiles accumulate on disk,
since every flush produces a new StoreFile. When the number of StoreFiles meets a certain condition (tunable via configuration parameters; a sketch follows below), a minor compaction is triggered, merging several relatively small StoreFiles
into one large StoreFile; this repeats until a merged file exceeds the maximum size a single region is allowed to reach, which triggers an automatic region split, i.e. one region is divided into two.
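
The tunable conditions mentioned above correspond mainly to the hbase-site.xml properties sketched below, shown with their usual 0.98-era defaults; check the values shipped with your version, and note that the exact split behaviour also depends on the configured split policy.

<!-- a store is considered for (minor) compaction once it holds this many StoreFiles -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value>
</property>
<!-- upper bound on store file size before the region is split (bytes) -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>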


################ 
Minor compaction
lightweight
Merges the first few StoreFiles that meet the conditions into one larger StoreFile. It does not remove data marked as "deleted" or expired data, and after a minor compaction there are still multiple StoreFiles, so more minor compactions will be performed later.

major compaction
heavyweight
Merges all StoreFiles into a single StoreFile. During the merge, data marked as "deleted" and expired data are removed, and the operation blocks all client requests for the region it belongs to until the merge is completed; once finished, the pre-merge StoreFiles are deleted.
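
Both kinds of compaction can also be requested by hand from the hbase shell, which is handy for example after a large batch of deletes. A quick sketch, reusing the t_snappy table from above:

    hbase(main):009:0> compact 't_snappy'          # ask for a minor compaction
    hbase(main):010:0> major_compact 't_snappy'    # full rewrite, dropping deleted and expired cells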


Source: www.cnblogs.com/weiyiming007/p/10931183.html