HBase performance optimization

 

1. Garbage collection optimization

Garbage collection options can be set in the hbase-env.sh file through HBASE_OPTS or HBASE_REGIONSERVER_OPTS. The latter affects only the region server process and is the recommended place for these settings.

Increase the size of the young generation to reduce how often young generation collections run

-XX:MaxNewSize=8g -XX:NewSize=8g

 

Modify garbage collection policy

-XX:+UseParNewGC 

This flag makes the young generation use the Parallel New collector, which stops the running Java process while it empties the young generation heap. Because the young generation is small compared to the old generation, the pause is short, usually a few hundred milliseconds.

 

-XX:+UseConcMarkSweepGC

If the same stop-the-world strategy were used for the old generation, the region server could pause for seconds or even minutes. If a pause exceeds the ZooKeeper session timeout, the master considers the server crashed and abandons it.

The Concurrent Mark-Sweep (CMS) collector mitigates this. It tries to do as much of its work as possible concurrently, without stopping the running Java process. This costs more CPU but avoids the pause otherwise needed to rewrite the old generation, unless a promotion failure occurs, in which case the collector must suspend the running Java process and defragment memory.

 

 

2. Memstore-local allocation buffers (MSLAB)

As the memstore keeps allocating and releasing memory, it leaves holes in the old generation heap. Once fragmentation is severe enough that a new allocation cannot find enough contiguous space, the JRE falls back to a stop-the-world collection that rewrites the entire heap and compacts the surviving objects.

 

MSLAB (memstore-local allocation buffers) addresses this with fixed-size buffers that hold KeyValue instances of varying sizes. When a buffer cannot fit a newly added KeyValue, it is considered full and a new buffer of the same fixed size is created. When these buffers are eventually reclaimed, they leave fixed-size holes in the heap that later fixed-size buffer allocations can reuse, so the JRE does not need to pause the application to compact the heap.

 

MSLAB also has side effects: it wastes more heap space, and copying data into the buffers is extra work, which makes it slightly slower than using KeyValue instances directly.
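The buffering scheme can be illustrated with a toy Java model. This is only a sketch, not HBase's implementation: the MslabSketch class, its method names, and the tiny 16-byte chunk size are all hypothetical, chosen to make the behavior easy to follow. It shows the two properties described above: every chunk has the same size, and each value is copied into a chunk.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of memstore-local allocation buffers; hypothetical, not HBase code. */
final class MslabSketch {
    private final int chunkSize;                  // every chunk has this fixed size
    private final List<byte[]> chunks = new ArrayList<>();
    private int offsetInCurrent;                  // next free byte in the newest chunk

    MslabSketch(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    /**
     * Copies data into the newest chunk, opening a new chunk when it does not fit.
     * Returns the index of the chunk that received the data.
     */
    int allocate(byte[] data) {
        if (data.length > chunkSize) {
            // Simplification of this toy: oversized values are simply rejected.
            throw new IllegalArgumentException("value larger than one chunk");
        }
        if (chunks.isEmpty() || offsetInCurrent + data.length > chunkSize) {
            chunks.add(new byte[chunkSize]);      // the old chunk is treated as full
            offsetInCurrent = 0;
        }
        byte[] current = chunks.get(chunks.size() - 1);
        // This copy is the extra work mentioned as a side effect.
        System.arraycopy(data, 0, current, offsetInCurrent, data.length);
        offsetInCurrent += data.length;
        return chunks.size() - 1;
    }

    int chunkCount() {
        return chunks.size();
    }

    public static void main(String[] args) {
        MslabSketch mslab = new MslabSketch(16);
        mslab.allocate(new byte[10]);             // goes into chunk 0
        mslab.allocate(new byte[10]);             // does not fit, opens chunk 1
        mslab.allocate(new byte[4]);              // still fits into chunk 1
        System.out.println("chunks in use: " + mslab.chunkCount());
    }
}
```

In this run the first chunk is closed with 6 unused bytes, which is the kind of wasted heap space the side effects refer to.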

 

MSLAB is controlled by hbase.hregion.memstore.mslab.enabled in hbase-site.xml. The default value is true.

 

 

3. Compression

Unless you are storing content that is already compressed, such as JPEG images, compression usually improves performance, because the CPU time spent compressing and decompressing is less than the time needed to read and write the extra data from disk.

Algorithm      Compressed size (% of original)   Compress (MB/s)   Decompress (MB/s)
GZIP           13.4                              21                118
LZO            20.5                              135               410
Zippy/Snappy   22.2                              172               409
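The trade-off can be made concrete with a rough calculation. The Java sketch below rests on several assumptions that are not in the original: the ratio column is read as compressed size as a percentage of the original, the decompression rate is measured on uncompressed output, disk reads and decompression happen one after the other, and the disk delivers a hypothetical 100 MB/s. The class and method names are made up for illustration.

```java
import java.util.Locale;

/** Back-of-the-envelope read time estimate; all inputs are assumptions. */
final class CompressionReadEstimate {

    /** Seconds to read logicalMb of data that is stored uncompressed. */
    static double uncompressedSeconds(double logicalMb, double diskMbPerSec) {
        return logicalMb / diskMbPerSec;
    }

    /**
     * Seconds to read the same data when stored compressed: first read the
     * smaller on-disk size, then decompress back to the logical size.
     */
    static double compressedSeconds(double logicalMb, double remainingPercent,
                                    double decompressMbPerSec, double diskMbPerSec) {
        double onDiskMb = logicalMb * remainingPercent / 100.0;
        return onDiskMb / diskMbPerSec + logicalMb / decompressMbPerSec;
    }

    public static void main(String[] args) {
        double logicalMb = 1000.0;
        double diskMbPerSec = 100.0;   // hypothetical disk throughput
        double plain = uncompressedSeconds(logicalMb, diskMbPerSec);
        // Zippy/Snappy row of the table: 22.2 % remaining, 409 MB/s decompression.
        double snappy = compressedSeconds(logicalMb, 22.2, 409.0, diskMbPerSec);
        System.out.println(String.format(Locale.ROOT,
                "uncompressed: %.2f s, snappy: %.2f s", plain, snappy));
    }
}
```

Under these assumptions the compressed read takes roughly half as long. On a much faster disk the advantage shrinks, so the numbers should be treated as an illustration rather than a measurement.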

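By default HBase does not compress store files. To check a table's current setting, run describe 'tablename' in the HBase shell and look at the COMPRESSION attribute.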

 

 

4. Optimize splits and compactions

HBase usually handles region splitting automatically. Once a region reaches a configured size threshold, it is split into two regions, each of which then accepts new data and continues to grow. When all of a user's regions grow at roughly the same rate, they reach the threshold and split at about the same time. Each split is followed by a compaction of the region's store files that rewrites the split data, so disk I/O spikes. This is called a "split/compaction storm".

Instead of relying on automatic splitting, you can turn it off and split manually with the split and major_compact commands.

 

To prevent automatic splitting, set hbase.hregion.max.filesize to a very large value, such as 100 GB. Then write a client that calls split() and majorCompact() through the administrative API. You can also run the corresponding shell commands interactively, or schedule them with cron.

 

Another option is to pre-split the table when creating it:

create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
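Split points for such a command can be generated instead of typed by hand. The Java sketch below assumes, purely for illustration, that row keys start with a zero-padded decimal prefix (for example 00 to 49). The PreSplitKeys class and its methods are hypothetical helpers, not part of HBase. Zero padding matters because HBase orders row keys lexicographically as bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

/** Hypothetical helper that computes evenly spaced split points. */
final class PreSplitKeys {

    /**
     * Returns regions - 1 boundaries that divide the prefixes 0..keySpace-1
     * into equal ranges, zero-padded to a common width.
     */
    static List<String> splitPoints(int keySpace, int regions) {
        if (regions < 2 || keySpace < regions) {
            throw new IllegalArgumentException("need at least 2 regions and keySpace >= regions");
        }
        int width = String.valueOf(keySpace - 1).length();
        List<String> points = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            long boundary = (long) i * keySpace / regions;
            points.add(String.format(Locale.ROOT, "%0" + width + "d", boundary));
        }
        return points;
    }

    /** Formats the split points as an HBase shell create command. */
    static String createCommand(String table, String family, List<String> points) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (String point : points) {
            joiner.add("'" + point + "'");
        }
        return "create '" + table + "', '" + family + "', SPLITS => " + joiner;
    }

    public static void main(String[] args) {
        // Five regions over the prefixes 00..49 reproduce the command shown above.
        System.out.println(createCommand("t1", "f1", splitPoints(50, 5)));
    }
}
```

Evenly spaced boundaries only balance the load if the keys themselves are evenly distributed over the prefix range, which is an assumption of this sketch.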

 

5. Load Balancing

The master has a built-in feature called the balancer, which by default runs every 5 minutes (configurable via hbase.balancer.period). When it runs, it tries to distribute regions evenly across all region servers. The balancer can be switched on or off with the shell's balance_switch command.

Besides relying on the balancer, users can explicitly move a region to another region server with the move command.

 

 

6. Merge regions

When a user deletes a large amount of data and wants to reduce the number of regions managed by each server, the merge_region command can be used to merge adjacent regions.

 

 

7. Client API best practices

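Disable auto-flush

table.setAutoFlush(false)

setAutoFlush() is called on the HTable instance. With auto-flush disabled, puts are collected in a client-side write buffer and sent in batches rather than one RPC per put.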
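To illustrate why batching helps, here is a toy Java model of a client-side write buffer. It is not HBase code: the WriteBufferSketch class and its count-based flush threshold are hypothetical, chosen only to keep the example self-contained. It counts the simulated round trips needed with and without buffering.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a client-side write buffer; hypothetical, not the HBase API. */
final class WriteBufferSketch {
    private final int flushThreshold;             // assumed: flush after this many puts
    private final List<String> buffer = new ArrayList<>();
    private int rpcCount;                         // simulated round trips to the server

    WriteBufferSketch(int flushThreshold) {
        if (flushThreshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.flushThreshold = flushThreshold;
    }

    /** Buffers one row and flushes when the buffer is full. */
    void put(String row) {
        buffer.add(row);
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    /** Sends everything buffered so far in a single simulated round trip. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        rpcCount++;
        buffer.clear();
    }

    int rpcCount() {
        return rpcCount;
    }

    public static void main(String[] args) {
        WriteBufferSketch autoFlush = new WriteBufferSketch(1);   // one round trip per put
        WriteBufferSketch buffered = new WriteBufferSketch(100);  // batches of 100 puts
        for (int i = 0; i < 1050; i++) {
            autoFlush.put("row" + i);
            buffered.put("row" + i);
        }
        buffered.flush();                         // push out the last partial batch
        System.out.println(autoFlush.rpcCount() + " round trips vs " + buffered.rpcCount());
    }
}
```

The explicit flush at the end matters: with buffering enabled, the last partial batch stays on the client until the buffer is flushed or the table is closed.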

 

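Use scanner caching

scan.setCaching(1000);

Scanner caching sets how many rows are fetched from the server per RPC while iterating over the scan results.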

 

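Limit the scan scope

Select only the data you need. If only one column family is required, restrict the scan to it, for example with scan.addFamily().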

 

Close ResultScanner instances

Always close the ResultScanner in the finally clause of a try/catch block, so that it is released even when an exception occurs.
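Since ResultScanner implements java.io.Closeable, a try-with-resources statement gives the same guarantee as an explicit finally clause. The Java sketch below shows the pattern with a stand-in: StubScanner and countRows are hypothetical and only record whether close() was called, so the example runs without an HBase cluster.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Demonstrates closing a scanner reliably; StubScanner is a hypothetical stand-in. */
final class ScannerCloseSketch {

    /** Stand-in for a ResultScanner that only tracks whether close() ran. */
    static final class StubScanner implements Closeable, Iterable<String> {
        private final List<String> rows;
        private boolean closed;

        StubScanner(List<String> rows) {
            this.rows = rows;
        }

        @Override
        public Iterator<String> iterator() {
            return rows.iterator();
        }

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Counts rows and fails on purpose at failOn, to show that cleanup still happens. */
    static int countRows(StubScanner scanner, String failOn) {
        try (StubScanner s = scanner) {           // close() runs on every exit path
            int count = 0;
            for (String row : s) {
                if (row.equals(failOn)) {
                    throw new IllegalStateException("simulated failure at " + row);
                }
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        StubScanner scanner = new StubScanner(Arrays.asList("r1", "r2", "r3"));
        try {
            countRows(scanner, "r2");
        } catch (IllegalStateException e) {
            System.out.println("failed, scanner closed: " + scanner.isClosed());
        }
    }
}
```

With a real ResultScanner the structure of the try block would be the same, with the scanner obtained from the table inside the resource declaration.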

 

Block cache usage

scan.setCacheBlocks() controls whether a scan uses the region server's block cache. Enabling the block cache is recommended for frequently accessed rows.

 

Optimize fetching row keys

When only the row keys are needed, use the setFilter() method of Scan to add a FilterList with MUST_PASS_ALL that contains two filters, FirstKeyOnlyFilter and KeyOnlyFilter. This combination returns only the first KeyValue of each row, without its value, which minimizes the data sent to the client.

 

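Turn off the WAL on puts

When the application can tolerate losing recent writes if a region server fails, call setWriteToWAL(false) on the Put to skip the WAL. Newer releases express the same choice with setDurability(Durability.SKIP_WAL).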
