HBase Optimization Journey (1): Exploring RegionServer Parameter Optimization

 

Optimization goals

Our online HBase cluster uses RegionServer grouping, but we had never tuned the configuration for the workload characteristics of each group, so HBase's service capacity was not fully realized.

This article documents a round of parameter tuning for one business group, taking the opportunity to learn how different configurations affect RegionServer monitoring metrics and machine load. After tuning, single-RegionServer query latency dropped, disk IO fell, and the system runs more stably. That raises throughput, which in turn allows fewer machines, better resource utilization, and cost savings.

Problems to solve

The problems found so far are mainly in this group's write path. Although resource utilization is not high, RPC p99 latency spikes from time to time, and monitoring shows intermittently long queue wait times. The RegionServer log shows that the hlog limit triggers flushes of too many regions at once, concentrating flush and compaction pressure; many small files get flushed, and the write amplification is obvious.

Problem analysis

The group has 17 RegionServers, each with roughly 520 regions, and request volume is fairly stable through the day. The analysis below looks at one machine.

Physical resource usage

Machine type

2 × E5-2650v4 (2 × 12 cores), 8 × 32GB RAM, 2 × 150GB (120GB) SATA SSD + 12 × 800GB (960GB) SATA SSD

Machine load

CPU usage and load average are not high.

 

 

Memory: because the application's memory allocation was conservative, the RegionServer uses 150GB in total (on-heap plus off-heap) and the DataNode 4GB, leaving a large slice to the operating system read cache. But the RegionServer already has a 100GB read cache of its own, so the OS read cache is largely wasted.
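A rough sketch of one machine's memory budget (the 256GB total follows from the 8 × 32GB spec above; the 50GB heap plus 100GB off-heap cache split is taken from the appendix hbase-env.sh, and the remainder is inferred, not measured):

# Rough memory budget for one machine (remainder inferred, not measured).
total_gb = 8 * 32              # 256 GB installed
regionserver_gb = 150          # RS total: 50 GB heap (Xmx50g) + 100 GB off-heap bucket cache
datanode_gb = 4
os_cache_gb = total_gb - regionserver_gb - datanode_gb
print(os_cache_gb)             # ~102 GB left to the OS page cache, which largely
                               # duplicates the RS's own 100 GB read cache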

 

 

Disk traffic is high: reads and writes both around 50MB/s. Network in is about 20MB/s; network out fluctuates between 40 and 80MB/s.

 

Disk reads come mainly from the local RegionServer's business reads and compaction reads; disk writes come mainly from the local RegionServer's WAL writes, flushes, and compactions, plus replica writes from other DataNodes.

Network in is mainly business write requests plus replica-write requests from other DataNodes; network out is mainly query responses plus the replica-write traffic that this RegionServer's WAL writes, flushes, and compactions send to other DataNodes.

Since most business queries are served from the in-memory cache (95% hit rate) and hfiles compress very well (about 1:5), then ignoring server-side data filtering, the disk read IO caused by queries should be only about one percent of the query-response network out traffic.

Server-side filtering does matter, but it should not push disk reads far above the normal network out. Compaction reads small files and writes large ones at a better compression rate, which can also make disk reads exceed network out. So the bulk of the disk read traffic should come from compactions; since no major compaction was running at the time, this suggests many small flushed files are causing write amplification.
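A back-of-the-envelope version of that estimate, as a sketch (the 95% hit rate and 1:5 compression are the figures above; server-side filtering is ignored, as in the text):

# Query-driven disk read IO as a fraction of query-response network out.
cache_hit = 0.95
on_disk_bytes_per_returned_byte = 1 / 5   # 1:5 hfile compression
disk_read_fraction = (1 - cache_hit) * on_disk_bytes_per_returned_byte
print(disk_read_fraction)                 # 0.01 -> about 1%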

 

RegionServer usage

RegionServer configuration

Version 0.98.21; configuration is in the appendix.

Monitoring metrics

Queue wait p99 spikes up to 150ms, while execution time p99 (ProcessCallTime_99th_percentile) stays within 45ms.

Read cache hit rate is 95%, which is fine.

There are some slow puts.

 

The total memstore drops in a mass flush about every 20 minutes, and slow puts appear while the memstores flush.

 

The compaction queue spikes at basically the same frequency as the memstore flushes, with no major compactions; this shows the periodic mass memstore flushes are what drive the compactions.

 

Flush queueing also occurs at each periodic memstore flush.

 

The RegionServer log shows that at regular intervals the hlog limit is hit and many regions flush at once: a 22MB memstore flushes to a 5MB hfile, lots of small files are flushed at the same time and immediately compacted, and the write amplification is obvious.

 

09:34:04,075 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=52; forcing flush of 69 regions(s) ...

09:34:27,339 INFO  [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=52; forcing flush of 222 regions(s) ...

...

09:34:27,601 INFO  [MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=311232252, memsize=22.6 M, hasBloomFilter=true, into tmp file ...

09:34:27,608 INFO  [MemStoreFlusher.1] regionserver.HStore: Added ...., entries=27282, sequenceid=311232252, filesize=5.4 M

09:34:27,608 INFO  [MemStoreFlusher.1] regionserver.HRegion: Finished memstore flush of ~23.6 M/24788698, currentsize=0/0 for region .... in 267ms, sequenceid=311232252, compaction requested=true

 

 

Heap usage and GC fluctuate on the same cycle, mainly when the memstores flush.

Problem summary

1. hbase.regionserver.maxlogs is too small.

According to the relation between hlog count and memstore size:

 

 hbase.regionserver.hlog.blocksize * hbase.regionserver.logroll.multiplier * hbase.regionserver.maxlogs
 >= hbase.regionserver.global.memstore.lowerLimit * HBASE_HEAPSIZE

 

It works out to at least 95 (see the calculation below). With maxlogs set too low, the hlog cap triggers frequently, forcing hundreds of regions to flush at once; the flushed files are all small, and flush and compaction pressure piles up.
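Plugging in this cluster's numbers, as a sketch (the 256MB block size and 0.95 roll multiplier are the values used later in this article, the 50g heap is from the appendix, and the 0.45 lowerLimit is an assumption inferred from the ~22.5GB memstore ceiling mentioned below, since the original hbase-site.xml is omitted):

# Minimum maxlogs so the WAL cap is not hit before the memstore lowerLimit.
blocksize_mb = 256       # hbase.regionserver.hlog.blocksize (set explicitly later in this article)
roll_multiplier = 0.95   # hbase.regionserver.logroll.multiplier default
heap_gb = 50             # original Xmx50g (appendix)
lower_limit = 0.45       # assumed; implies the ~22.5 GB ceiling cited below
min_maxlogs = lower_limit * heap_gb * 1024 / (blocksize_mb * roll_multiplier)
print(min_maxlogs)       # ~94.7 -> at least 95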

2. Too much memory sits idle; reserving about 20% headroom is enough. Note, though, that an overly large heap may lengthen G1 GC pauses.

3. There are too many regions, so most regions' memstores never reach the 128MB flush size before being force-flushed. Ideally each region should get a full flush-size worth of memstore, so flushes write files as large as possible and subsequent compaction overhead drops; see the sketch below. Merging regions would cut the count, but that is not the first optimization here; instead we ease the problem by adding memory.
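The sketch below shows how little memstore each region can get on average (the region count is from the analysis above; the ~22.5GB global ceiling is the one derived in the maxlogs calculation):

# Average memstore share per region under the global memstore ceiling.
regions = 520
global_memstore_gb = 22.5   # lowerLimit * heap, from the maxlogs calculation
per_region_mb = global_memstore_gb * 1024 / regions
print(per_region_mb)        # ~44 MB -- far below the 128 MB flush size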

 

Parameter optimization

First revision

The goal is to stop the hlog cap from force-flushing memstores.

hbase.regionserver.maxlogs changed from 52 to 95.

Restart the RegionServers.

 

The hlog cap now forces a mass memstore flush every 47 minutes instead of every 20.

The log shows the hlog cap triggering about every 47 minutes and flushing 257 regions. As with the forced flushes before the change, most are ~20MB memstores compressing to ~5MB hfiles. Peak total memstore is at most 18.35GB; some regions flush early at 128MB, so the theoretical 22.5GB ceiling is never reached.

Compaction queue spikes become less frequent.

The interval between queue-wait peaks grows longer: an improvement.

 

GC counts and typical pause times drop slightly, but the peaks rise.

 

Slow puts decreased slightly.

The optimization had some effect.

Second revision

Because the region count is so high, a single region may still be under 128MB when the hlog cap forces a flush. So this revision adds memory, raising the memstore quota and the hlog cap; by the earlier relation, the hlog cap should now take more than an hour to fill, so the hourly periodic flush should fire first. The handler thread count is also raised.

hbase-env.sh

 

1. JVM -XX:MaxDirectMemorySize raised to 110GB.

2. On-heap Xmx raised to 80g.

 

hbase-site.xml

 

3. hbase.bucketcache.size 112640 (110g); hfile.block.cache.size 0.14.

4. hbase.regionserver.global.memstore.upperLimit 0.56 (44.8g);

 hbase.regionserver.global.memstore.lowerLimit 0.5 (40g).

5. hbase.regionserver.hlog.blocksize 268435456 (256MB), now configured explicitly; the default follows the HDFS block size, which is the same value.

hbase.regionserver.maxlogs 177

6. hbase.regionserver.handler.count 384
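A sanity check of the new sizing against the hlog/memstore relation, as a sketch (all numbers are the settings above):

# Memstore quota vs. WAL cap after the second revision.
heap_gb = 80
upper_gb = 0.56 * heap_gb                    # 44.8 GB upper limit
lower_gb = 0.50 * heap_gb                    # 40.0 GB lower limit
wal_cap_gb = 177 * 256 * 0.95 / 1024         # maxlogs * blocksize * roll multiplier
print(upper_gb, lower_gb, round(wal_cap_gb, 1))   # WAL cap ~42 GB >= 40 GB lowerLimit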

 

Restart the RegionServers and watch the monitoring.

 

The total memstore now drops once an hour, triggered not by the hlog cap but by the hourly periodic flush.

 

In the log, from 16:31 onward regions flush one after another at 128MB. At 16:59, with total memstore at 17.5GB and 125 hlogs, most regions hit the one-hour periodic flush, spread over a few seconds to tens of seconds. The flushes are more bunched in time, and flush and compaction queueing appear. Around six o'clock the compaction volume is large and sustained; presumably many regions with larger hfiles needed compacting.

 

Slow puts increase, peak RPC execution time rises, and the worst queue-wait p99 actually hits 2s!

 

The cause is higher GC peaks, which hurt RegionServer traffic.

 

With a bigger heap, each GC has more to reclaim, and GC pauses grow longer.

These optimization side effects have to be eliminated.

 

 

Third revision

This revision targets the GC problems caused by the larger heap. It also changes the periodic memstore flush interval to 2 hours and the flush size to 256MB.

 

1. -XX:InitiatingHeapOccupancyPercent changed from 65 to 75. Under the old maxlogs=52 cap the memstore actually peaked at about 12g, so the on-heap read and write caches together came to roughly 0.45 of the heap; now they can reach about 0.7, so the GC trigger threshold was raised. (A bad change, as it turned out.)

2. -XX:ConcGCThreads=8, -XX:ParallelGCThreads=30, -XX:G1MixedGCCountTarget=32: raise GC parallelism. The docs recommend 5/8 of the logical cores, which for 48 cores is 30; concurrent marking threads are scaled up with it. The aim is to split GC into more rounds and shorten each pause.

3. hbase.regionserver.optionalcacheflushinterval changed to 7200000 (2 hours).

4. hbase.hregion.memstore.flush.size 268435456: a 128MB flush compresses to only about a 22MB hfile, which is not big, so the per-memstore flush size is raised to 256MB (see the sketch below).
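Expected hfile size at each flush size, as a sketch (the ~5.8:1 effective ratio is taken from the 128MB-to-22MB figure above):

# Expected hfile size per flush, using the effective ratio implied by 128 MB -> ~22 MB.
ratio = 128 / 22
for flush_mb in (128, 256):
    print(flush_mb, "MB memstore ->", round(flush_mb / ratio), "MB hfile")
# 128 MB -> ~22 MB hfile, 256 MB -> ~44 MB hfile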

 

Restart and observe.

 

With the interval and flush size both enlarged at once, the hlog cap now forces a mass flush every 1 hour 40 minutes.

 

The log shows fewer regions flushed at once, about 170, and flush and compaction queueing have improved.

 

Peak RPC and queue-wait p99 are still too high, and there are many slow puts.

 

Overall GC count is down, but peak GC time is even higher.

The GC log shows that even with GC parallelism raised, system time during GC is still very long; tuning must continue.

Fourth and fifth revisions

Lower the GC trigger threshold so collection starts earlier, dial GC parallelism back, and let each GC reclaim at most 10% of the G1 regions. Change the flush interval back to one hour. This took two rounds; the final changes:

1. -XX:InitiatingHeapOccupancyPercent back to 65, then down again to 60, to start GC earlier.

2. -XX:ConcGCThreads=4, -XX:ParallelGCThreads=16, -XX:G1OldCSetRegionThresholdPercent=10: raise the maximum number of old-gen regions reclaimed in one GC (see the sketch below).

3. hbase.regionserver.optionalcacheflushinterval removed, reverting to one hour.
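What the 10% cap works out to in G1 regions, as a sketch (the 32m G1 region size is from the appendix hbase-env.sh; the 80g heap is from the second revision):

# Old-gen regions reclaimable per mixed GC under G1OldCSetRegionThresholdPercent=10.
heap_mb = 80 * 1024
g1_region_mb = 32                            # -XX:G1HeapRegionSize=32m (appendix)
total_g1_regions = heap_mb // g1_region_mb
print(total_g1_regions, total_g1_regions * 10 // 100)   # 2560 regions, up to 256 per mixed GC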

 

 

Restart and observe.

Memstores still mass-flush many regions every hour; since the process has only been up a short while, the cycles have not yet spread out.

 

Flush and compaction queueing still appear hourly.

 

Slow puts are fewer, and RPC and queue-wait p99 dropped, but we want them lower still.

 

GC did not change much overall, but peak GC time is still costly. The GC log shows some very long young GCs, and mixed GCs still take quite a while.

Tuning has to continue.

Sixth revision

Continue tuning GC parameters, capping the young generation size. The memstore quota can be raised further, with the periodic flush scheduled every 2 hours. Because there are so many regions, the 256MB flush size is removed.

-XX:G1MixedGCCountTarget=16    raise the max number of mixed GCs after one global marking cycle
-XX:G1HeapWastePercent=10    allow keeping up to 10% garbage
-XX:G1MaxNewSizePercent=20    cap young gen at 20%, so an oversized young gen doesn't slow GC
hbase.regionserver.global.memstore.upperLimit 0.6    raise the upper limit
hbase.regionserver.global.memstore.lowerLimit 0.54
hbase.regionserver.maxlogs 200    raise the cap
hbase.regionserver.optionalcacheflushinterval 7200000    flush every 2 hours
hbase.hregion.memstore.flush.size removed, reverting to the default 128MB
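Checking the new WAL cap against the new lowerLimit with the earlier relation, as a sketch (the values are the settings above plus the 256MB block size and 0.95 multiplier from before):

# WAL cap vs. memstore lowerLimit after the sixth revision.
wal_cap_gb = 200 * 256 * 0.95 / 1024         # maxlogs * blocksize * roll multiplier
lower_gb = 0.54 * 80                         # lowerLimit * heap
print(round(wal_cap_gb, 1), lower_gb)        # ~47.5 GB cap >= 43.2 GB lowerLimit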

 

RegionServer metrics

Memstores now mass-flush every 2 hours.

 

Flush and compaction queueing are both reduced.

 

Slow puts are down; RPC queue wait time and overall p99 dropped, and the peaks dropped a lot.

GC is more even, and the peaks are shorter.

 

Machine load

CPU load remains low.

 

Disk read IO is down to a quarter of what it was, and network out declined.

 

Memory still has 20%+ free; a bit more could be used.

The optimization worked.

Summary

Results

Tuning the RegionServer parameters and GC parameters reduced small-file flushes, which reduced compaction and write amplification. Average response time fell, and response-time spikes are fewer. Disk read and write IO dropped, which extends the disks' service life.

Directions for further optimization

Reduce the region count

The region count per RegionServer is on the high side.

The regions were pre-split, with the split count computed from the estimated total data size divided by the target region size, but hfile compression was not considered. Even the biggest regions hold hfiles amounting to only a fraction of the region size cap, so regions could be merged to bring the count down.
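A rough estimate of the actual data per region, as a sketch (the 36TB logical total and the 17 × ~520 region counts are from this article; treating the logical HDFS size as compressed hfile bytes is an assumption):

# Average on-disk data per region across the group.
total_tb = 36.0                              # HDFS logical usage (assumed compressed bytes)
regions = 17 * 520
avg_region_gb = total_tb * 1024 / regions
print(round(avg_region_gb, 1))               # ~4.2 GB per region -- small enough to merge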

Increase memory allocation

Reserving 10% idle would be enough.

In-depth GC tuning

This round of GC tuning had no clear target and no methodology; it was blind trial and error with duplicated effort. Next we need to dig into the details of the GC parameters and tune toward a clear goal with a clear method. Stay tuned.

Reduce the server count

The likely bottleneck is storage capacity. HDFS logical usage is 36.0TB, and 17 RegionServers can support roughly 50TB logical, so a few machines could be removed in stages to save cost.
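One way the ~50TB figure can be checked, as a sketch (the 12 × 800GB data disks are from the hardware spec; 3× HDFS replication and dedicating all data disks to HDFS are assumptions):

# Logical (pre-replication) storage ceiling of the group.
servers = 17
raw_tb_per_server = 12 * 0.8                 # 12 x 800 GB data SSDs, assumed all for HDFS
replication = 3                              # assumed HDFS replication factor
logical_tb = servers * raw_tb_per_server / replication
print(round(logical_tb, 1))                  # ~54.4 TB -- consistent with the ~50 TB estimate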

Appendix

Original hbase-site.xml configuration

(omitted)

Original hbase-env.sh configuration

export HBASE_REGIONSERVER_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
 -Xloggc:${HBASE_LOG_DIR}/gc-`date +'%Y%m%d%H%M'` -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
 -XX:GCLogFileSize=512M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${HBASE_LOG_DIR}/hbase.heapdump -XX:ErrorFile=${HBASE_LOG_DIR}/hs_err_pid%p.log
 -XX:+PrintAdaptiveSizePolicy -XX:+PrintFlagsFinal -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+UnlockExperimentalVMOptions 
 -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 -XX:G1NewSizePercent=5 
-XX:G1MaxNewSizePercent=60 -XX:MaxTenuringThreshold=1 -XX:G1HeapRegionSize=32m  -XX:G1MixedGCCountTarget=8 -XX:InitiatingHeapOccupancyPercent=65
 -XX:MaxDirectMemorySize=100g -XX:G1OldCSetRegionThresholdPercent=5 -Xmx50g -Xms50g"

 

G1 reference

https://www.oracle.com/technetwork/tutorials/tutorials-1876574.html
