HBase Best Practices: CMS GC Tuning

 

HBase has been optimized continuously throughout its development, and GC optimization is among the most important of those efforts. Version 0.94 introduced the MemStoreLAB and MemStore Chunk Pool strategies to optimize the write cache (MemStore); version 0.96 introduced BucketCache, an off-heap scheme, to optimize the read cache (BlockCache); and version 2.0 is expected to bring in still more off-heap memory. Using off-heap memory has clearly become a strategic direction for optimizing GC in HBase. However, no matter how much off-heap memory is introduced, the read and write paths cannot avoid JVM heap memory entirely. Take BucketCache in offheap mode: even though data blocks are cached off-heap, a read still first loads the off-heap block into JVM heap memory and only then returns it to the user. Since JVM memory cannot be bypassed, GC itself still has to be tuned. This article covers tuning the CMS GC strategy for HBase; a follow-up will cover tuning G1GC, another GC strategy the industry has begun to adopt, for HBase.

 

How CMS GC works

If you are already familiar with how CMS GC works, skip this section and go straight to the next one. If not, see the author's earlier article "The Past and Present of HBase GC: Background", which covers the JVM memory structure and CMS GC in considerable detail. For the discussion below, the key points are summarized here:

1. The JVM heap as a whole consists of three regions: the Young generation, the Tenured generation, and the Perm generation. The Young generation is further divided into one Eden space and two Survivor spaces.

2. A brief description of the object lifecycle (make sure you know this well, since it is used repeatedly below):

(1) Young generation: After an object is initialized, it first enters Eden. When Eden fills up, a Minor GC is triggered. The Minor GC checks whether each object in Eden is still alive (that is, whether other objects still reference it). Live objects are copied from Eden to a Survivor space and their age is incremented by one; dead objects are garbage collected. Eden is then free to be filled with new objects, and when it fills up again another Minor GC is triggered, and so on. Note that every Minor GC increments the age of each surviving object by one.

(2) Tenured generation: Once a live object's age exceeds a certain threshold, it is promoted to the Tenured generation, which can be thought of as the storage area for long-lived objects. As time goes by, the Tenured generation also fills up, which triggers a CMS GC (old GC). This GC is more complex and consists of five steps; see the referenced article for details.

3. Both Minor GC and CMS GC 'Stop-The-World' (STW), that is, they pause all user threads and leave only the GC threads running to reclaim garbage. For Minor GC, most of the STW time is spent in the copying phase; for CMS GC, most of the STW time is spent in the phases that mark garbage objects.
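
The lifecycle in point 2 can be sketched as a toy simulation. This is a simplified model under stated assumptions, not how HotSpot is implemented: the `Obj` class, its `lifetime` field (how many Minor GCs an object stays reachable), and the two sample objects are hypothetical, and promotion simply follows the description above (age exceeds the threshold).

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    lifetime: int  # hypothetical: number of Minor GCs the object stays reachable
    age: int = 0


def minor_gc(young, tenured, max_tenuring_threshold):
    """Run one simplified Minor GC and return the objects left in Young."""
    survivors = []
    for obj in young:
        if obj.lifetime <= 0:
            continue  # no longer referenced: garbage collected
        obj.lifetime -= 1
        obj.age += 1  # every survived Minor GC adds one to the age
        if obj.age > max_tenuring_threshold:
            tenured.append(obj)  # promoted to the Tenured generation
        else:
            survivors.append(obj)  # copied to a Survivor space
    return survivors


def simulate(objects, gc_count, max_tenuring_threshold=15):
    """Apply gc_count Minor GCs and return (young, tenured)."""
    young, tenured = list(objects), []
    for _ in range(gc_count):
        young = minor_gc(young, tenured, max_tenuring_threshold)
    return young, tenured


young, tenured = simulate(
    [Obj("rpc_request", lifetime=0), Obj("memstore_chunk", lifetime=1000)],
    gc_count=20,
)
print([o.name for o in young], [o.name for o in tenured])
```

In this sketch the short-lived object is collected at the first Minor GC, while the long-lived one keeps being copied between Survivor spaces until it is promoted.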

 

 

GC tuning goals

The previous section briefly described the JVM memory structure and the basics of Java GC. Building on that, the following sections present tuning tips for several GC parameters in an HBase cluster. Before getting into specifics, it helps to state the ultimate goals and basic principles of GC tuning:

1. The average Minor GC time should be as short as possible. The whole Minor GC runs in STW, so shorter Minor GCs make user reads and writes more stable and their latency more predictable.

2. CMS GCs should be as few and as short as possible. On one hand, a CMS GC generally pauses the application for a second or more, which has a significant impact on user reads and writes; on the other hand, frequent CMS GCs produce heavy memory fragmentation, which in severe cases triggers a Full GC and can bring the RegionServer down.

The tuning tips below all try to follow these principles. For a latency-sensitive system like HBase in particular, the aim is to make GC more stable and pauses shorter so that GC does not seriously affect user reads and writes.

 

CMS GC Optimization Tips

This section analyzes the JVM GC parameters for the HBase scenario in three phases. Phase one covers GC parameters that apply to all scenarios; they need little explanation. Phases two and three each tune one group of parameters whose best values depend on the scenario. Given the complexity of these two groups, each is explained in turn through theory plus experiment.

 

Phase One: default recommended configuration

Before the specific tuning tips, here are the parameters involved in CMS GC. The most common ones are:

 

-Xmx -Xms -Xmn -Xss -XX:MaxPermSize=M -XX:SurvivorRatio=S -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=N -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=C -XX:-DisableExplicitGC

 

Based on the meaning of these parameters, the recommended phase-one settings are as follows; this configuration applies to essentially all scenarios:

 

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -XX:-DisableExplicitGC

 

 

 

Tuning preparation

The basic recommended settings above came from the meaning of the individual GC parameters. A few parameters with a major impact on performance were also mentioned: Xmn, SurvivorRatio, and MaxTenuringThreshold. The following sections tune them for HBase through theoretical reasoning plus experimental verification. Before that, three pieces of background make the analysis of the experimental data easier to follow: the test environment and basic test conditions, how to read the GC log, and an analysis of HBase memory usage.

Test environment

The hardware, software configuration, and test data used in the experiments are as follows:


Note that HBase is configured in BucketCache mode rather than LruBlockCache mode in all tests. Using a large amount of off-heap memory as the read cache improves GC considerably, as shown below:


The figure compares GC performance under the two caching strategies. BucketCache mode performs much better than LruBlockCache mode, so BucketCache mode is strongly recommended for production. Many readers who have measured GC, throughput, latency, and other read/write metrics in both modes may be puzzled by this, because in their tests BucketCache came out a lot worse than LruBlockCache. The author was puzzled too, and later understood why: those tests were run with essentially all data in memory, and in that case BucketCache is indeed worse. Readers who want the reasoning can refer to the earlier blog post "BlockCache program performance comparison test report". Then again, in a big-data workload, how often does everything fit in memory?

GC log analysis

With the experimental conditions covered, here is a brief explanation of the GC log to make the later log analysis easier. Note that this log is only printed when the parameter -XX:+PrintTenuringDistribution is added; enabling it on production clusters is strongly recommended. A log fragment looks like this:

2016-07-26T10:37:16.933+0800: 227753.150: [GC2016-07-26T10:37:16.933+0800: 227753.150: [ParNew
Desired survivor size 268435456 bytes, new threshold 5 (max 15)
- age   1:   57523184 bytes,   57523184 total
- age   2:   80236520 bytes,  137759704 total
- age   3:   73226496 bytes,  210986200 total
- age   4:   50318392 bytes,  261304592 total
- age   5:   63166384 bytes,  324470976 total
- age   6:        240 bytes,  324471216 total
: 1268903K->305311K(1572864K), 0.0840620 secs] 26598675K->25635082K(66584576K), 0.0844700 secs] [Times: user=1.82 sys=0.08, real=0.08 secs]

The log fragment above has three parts:

Part I: basic information. Two things deserve attention. One is Desired survivor size 268435456 bytes. This value is calculated as SurvivorSize * TargetSurvivorRatio, where TargetSurvivorRatio defaults to 50%; for example, with Xmn set to 5g and SurvivorRatio set to 8, each Survivor space is 512M and the Desired survivor size is 256M. The other is new threshold 5 (max 15). The max 15 in parentheses means the maximum age threshold for promotion to the old generation is 15, and new threshold 5 means the threshold has been adjusted to 5. The threshold is therefore adjusted continuously: it is the smallest age at which the cumulative size of all objects up to that age exceeds the Desired survivor size, capped at the maximum threshold (15).

Part II: the age distribution of objects. The first column lists the ages of the objects in the Young generation, here 1 to 6. The second column is the total memory occupied by objects of that age; for example, all objects of age 2 total 80236520 bytes. The third column is the cumulative memory occupied by all objects up to and including that age; for example, 137759704 total on the age 2 row is the total size of all objects of age 1 and age 2.

Part III: memory reclamation information. The first column covers the Young generation: 1268903K->305311K means Young generation usage was 1268903K before the collection and 305311K after it. The second column covers the whole JVM heap: 26598675K->25635082K(66584576K) means the JVM heap capacity is 66584576K, and objects occupied 26598675K before the collection and 25635082K after it. The third column is the time taken; real is the STW time consumed by this GC, that is, how long user requests were paused.
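
The two Part I calculations can be reproduced with a short sketch. It assumes the layout described earlier (Young = Eden + two Survivor spaces, Eden = SurvivorRatio * Survivor) and ignores any size alignment a real JVM may apply; the function names are illustrative, not JVM APIs.

```python
GIB = 1024 ** 3


def desired_survivor_size(xmn_bytes, survivor_ratio, target_survivor_ratio=50):
    """SurvivorSize * TargetSurvivorRatio, with Young = Eden + 2 Survivor spaces."""
    survivor_size = xmn_bytes // (survivor_ratio + 2)
    return survivor_size * target_survivor_ratio // 100


def new_threshold(bytes_per_age, desired_size, max_threshold=15):
    """Smallest age whose cumulative size exceeds desired_size, capped at max."""
    total = 0
    for age, size in enumerate(bytes_per_age, start=1):
        total += size
        if total > desired_size:
            return min(age, max_threshold)
    return max_threshold


# Second column of the age table in the log fragment above (age 1 to 6).
AGE_TABLE = [57523184, 80236520, 73226496, 50318392, 63166384, 240]

desired = desired_survivor_size(5 * GIB, survivor_ratio=8)
print(desired)                            # 268435456
print(new_threshold(AGE_TABLE, desired))  # 5
```

The cumulative size first exceeds 268435456 bytes at age 5 (324470976 total), which matches the new threshold 5 printed in the log.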

HBase memory usage analysis

Every application has its own object profile, which broadly falls into two categories. In the first, short-lived objects (objects that survive only briefly, such as temporary variables) dominate; in most pure HTTP request-processing projects, for example, short-lived objects may account for about 70% of all objects. In the second, long-lived objects (objects that survive a long time, such as cache entries with a long TTL) dominate; large-memory systems such as HBase and Spark fall into this category. For HBase specifically, the main in-memory objects are:

1. RPC request objects, such as Request and Response objects. These are generally destroyed when the short-lived RPC connection ends, so they can be regarded as short-lived objects.

2. MemStore objects. MemStore objects generally survive for a long time: once user data is written into the MemStore, it stays there until the MemStore fills up and is flushed to HDFS. Under high write QPS, filling a MemStore usually takes about an hour, so MemStore objects are clearly long-lived. MemStore objects are also relatively large, 2M by default.

3. BlockCache objects. Like MemStore objects, BlockCache objects generally stay in memory for a long time and are long-lived. Their default size is 64K.

So HBase is a system dominated by long-lived objects, and GC works best when the short-lived RPC objects are eliminated in the Young generation.

 

Phase II: NewParSize Tuning

Theoretical analysis

NewParSize is the size of the Young generation (set with Xmn in the experiments below), and it directly determines the frequency of Minor GC. Minor GC frequency in turn determines two things. First, the duration of a single Minor GC: the more frequent the GC, the shorter each one. Second, the volume of objects promoted to the old generation: the more frequent the GC, the more objects are promoted. In more detail:

1. Increasing the Young generation lowers Minor GC frequency, but each GC takes longer (a larger Young generation means more objects to copy per GC, which inevitably takes longer), so read/write latency jitter is larger. Conversely, a smaller Young generation gives smaller latency jitter and more stable reads and writes.

2. Reducing the Young generation raises Minor GC frequency, which speeds up promotion to the old generation (every GC increments an object's age by one, and an object is promoted once its age exceeds the threshold, so the more frequent the GC, the faster ages grow). This potentially increases the risk of old GC.

Setting NewParSize is therefore a balance: it can be neither too large nor too small.
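
The trade-off can be made concrete with a back-of-the-envelope model. It is only a sketch under simplifying assumptions that are not from the experiments: a constant allocation rate (the 200 MiB/s value is hypothetical), a Minor GC exactly when Eden is full, a fixed tenuring threshold of 6, and SurvivorRatio=8.

```python
MIB = 1024 ** 2


def eden_bytes(xmn_bytes, survivor_ratio=8):
    """Eden size when Young = Eden + 2 Survivor spaces and Eden = ratio * Survivor."""
    return xmn_bytes * survivor_ratio // (survivor_ratio + 2)


def minor_gc_interval_s(xmn_bytes, alloc_bytes_per_s, survivor_ratio=8):
    """Seconds between Minor GCs if Eden fills at a constant allocation rate."""
    return eden_bytes(xmn_bytes, survivor_ratio) / alloc_bytes_per_s


def seconds_to_promotion(xmn_bytes, alloc_bytes_per_s, threshold, survivor_ratio=8):
    """Rough time a live object stays in Young before it reaches the threshold."""
    return threshold * minor_gc_interval_s(xmn_bytes, alloc_bytes_per_s, survivor_ratio)


ALLOC_RATE = 200 * MIB  # hypothetical allocation rate: 200 MiB per second

for xmn_mib in (512, 2048, 5120):
    interval = minor_gc_interval_s(xmn_mib * MIB, ALLOC_RATE)
    promotion = seconds_to_promotion(xmn_mib * MIB, ALLOC_RATE, threshold=6)
    print(f"Xmn={xmn_mib}m: Minor GC every {interval:.1f}s, "
          f"promotion after about {promotion:.1f}s")
```

In this model a smaller Xmn shortens both the interval between Minor GCs and the time an object needs to survive before it is promoted, which is why short-lived objects are more likely to leak into the old generation when the Young generation is small.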

 

Experimental results

 

Experimental conditions: independent controlled trials in which three RegionServers are set to Xmn 512m, 2g, and 5g respectively (the larger Xmn, the larger the Young generation); SurvivorRatio and MaxTenuringThreshold keep their default values.

Experimental result curves:

 


 

Result analysis

1. Figure 1 plots overall GC time for the different Xmn settings. The x-axis is the GC count and the y-axis is the GC time (STW) in ms. Note that the three curves cover the same period of time, so during this period Xmn=512m has the most GCs and Xmn=5g has the fewest.

2. The green line has many high peaks overall, which means CMS GC is frequent, but its main body lies below the red and blue lines, which means its average Minor GC time is lower. The blue line has the fewest GCs but more prominent peaks, and its Minor GCs take longer than those of the red and green lines. The red line's Minor GC time lies between green and blue, and its peaks are relatively flat, which means its CMS GCs are relatively short. Overall, the Xmn=2g scenario shown by the red line is reasonable for both CMS GC and Minor GC, whereas the other two scenarios each have a significant drawback, so Xmn=2g is the best choice. Figure 1 only gives an intuitive impression; more precise results come from Figures 2 and 3.

3. Figure 2 shows the main Minor GC metrics: the total number of GCs and the average time of a single Minor GC. Of the two, the latter matters more because it determines the latency and stability of reads and writes. The figure shows that Xmn=512m has the lowest average Minor GC time, followed by Xmn=2g, with Xmn=5g the worst at about 130ms, which means every read and write in flight during such a Minor GC is delayed by at least 130ms. This is easy to understand: the smaller the Young generation, the higher the Minor GC frequency, the fewer objects each Minor GC has to copy, and the less time it takes.

4. Figure 3 shows the main CMS GC (old-generation GC) metrics: the number of CMS GCs and the average time of a single old-generation GC (STW time only). The figure shows that Xmn=2g is better in both count and time, while Xmn=512m is the worst choice. The explanation is simple: when the Young generation is too small, Minor GC is frequent and object ages grow quickly, so many objects may be promoted to the old generation because their age exceeds the threshold (6 by default), which makes it more likely that large numbers of short-lived objects are promoted. Short-lived objects such as requests and responses tend to be small, and once many small objects enter the old generation, CMS GC has to mark more objects, which inevitably takes longer.

 

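Experimental conclusion

The test results are basically consistent with the theoretical analysis: setting Xmn too small leads to poor CMS GC performance, and setting it too large leads to poor Minor GC performance. The recommendation is therefore to set Xmn between 1g and 3g when the JVM heap is 64g or more, and between 512m and 1g when it is below 32g; ideally, confirm the exact value with some simple testing in production. One point deserves special emphasis: the author has seen many production HBase clusters with a very large Xmn, for example Xmx of 48g with Xmn of 10g, and their logs show extremely poor GC performance, with single Minor GCs mostly between 300ms and 500ms and many CMS GCs exceeding 1s. Increasing Xmn brings no benefit to GC (Minor GC or CMS GC), so do not set it too large.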

 

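Phase III: Increasing the Survivor Size (Decreasing SurvivorRatio) & Increasing MaxTenuringThreshold

Theoretical analysis

As described above, a Minor GC copies live objects from Eden (and from the Survivor from space) to the Survivor to space, so a larger Survivor space can hold more live objects. This prevents many live objects from going straight into the old generation before reaching MaxTenuringThreshold just because the Survivor space is too small, which would potentially increase the frequency of old GC. A Survivor space that is too large has its own problem, though: objects can then 'stay' in the Young generation for a long time, and in scenarios with many long-lived objects (such as HBase), large numbers of long-lived objects sit in the Young generation and are copied 'needlessly' many times, which adds to Minor GC overhead.

Increasing MaxTenuringThreshold, in turn, raises the bar for entering the old generation and effectively limits the number of objects promoted. As with the Survivor size, adjusting MaxTenuringThreshold is a trade-off: too small a value increases the frequency and duration of CMS GC, while too large a value increases Minor GC overhead in scenarios with many long-lived objects. In general, the default MaxTenuringThreshold=15 is already fairly large and needs no adjustment.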

 

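Experimental results

Experimental conditions: independent controlled trials in which three RegionServers are set to SurvivorRatio 2, 8, and 15 respectively (the larger SurvivorRatio, the smaller the Survivor space); MaxTenuringThreshold keeps its default value; other settings: -Xmx64g, -Xmn2g.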
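
To see what these three settings mean in terms of space, the sketch below computes the Eden and Survivor sizes for -Xmn2g. It assumes Young = Eden + two Survivor spaces with Eden = SurvivorRatio * Survivor, and ignores any alignment a real JVM may apply; `young_layout_mib` is an illustrative helper, not a JVM API.

```python
def young_layout_mib(xmn_mib, survivor_ratio):
    """Return (eden, survivor) sizes in MiB for Young = Eden + 2 Survivor spaces."""
    survivor = xmn_mib / (survivor_ratio + 2)
    eden = survivor * survivor_ratio
    return eden, survivor


for ratio in (2, 8, 15):
    eden, survivor = young_layout_mib(2048, ratio)  # -Xmn2g
    print(f"SurvivorRatio={ratio}: Eden={eden:.0f}M, each Survivor={survivor:.0f}M")
```

With -Xmn2g, SurvivorRatio=2 gives each Survivor space 512M and leaves 1024M for Eden, while SurvivorRatio=15 leaves only about 120M per Survivor space.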

Experimental result curves:

 

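Result analysis

1. Figure 1 plots GC performance for the three SurvivorRatio settings. Broadly, the blue line has the most Minor GCs, and the green line has too many peaks, meaning its CMS GC performance is the worst. Figures 2 and 3 give the details.

2. Figure 2 shows the main Minor GC metrics. The average time of a single Minor GC is about the same in all three cases, slightly higher for SurvivorRatio=2. This is because SurvivorRatio=2 gives a larger Survivor space, which lets objects 'stay' in the Young generation for a long time; with as many long-lived objects as HBase has, this can add some needless 'copying' overhead (explained in detail below through log analysis). Minor GC frequency is also higher with SurvivorRatio=2, probably because, for a fixed total Young size, a larger Survivor space means a smaller Eden and therefore more frequent Minor GCs. So Minor GC performance is slightly worse with SurvivorRatio=2.

3. Figure 3 shows the main CMS GC metrics. The number of CMS GCs is about the same in all three cases. SurvivorRatio=2 has the shortest single CMS GC time, about 30% less than SurvivorRatio=8, and performs best; SurvivorRatio=15 has the longest and performs quite poorly. With SurvivorRatio=2, live objects can stay in the Young generation long enough to be thoroughly weeded out, so fewer short-lived small objects are promoted to the old generation and CMS GC performs well. With SurvivorRatio=15, the Survivor space is too small, so many short-lived small objects 'overflow' into the old generation before they can be weeded out, and CMS performance is very poor.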

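Experimental conclusion

Again the test results are basically consistent with the theoretical analysis. For Minor GC, the SurvivorRatio setting has little effect. For CMS GC, setting SurvivorRatio too large is a disaster, with extremely poor performance. Compared with the default SurvivorRatio=8, a smaller SurvivorRatio lets short-lived small objects be weeded out more thoroughly, so SurvivorRatio=2 is recommended.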

CMS tuning conclusions

1. Use the BucketCache strategy in offheap mode as the cache mode.

2. For large memory (more than 64G), use the following configuration:

 

-Xmx64g -Xms64g -Xmn2g -Xss256k -XX:MaxPermSize=256m -XX:SurvivorRatio=2  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
-XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=15 -XX:+UseCMSCompactAtFullCollection  -XX:+UseCMSInitiatingOccupancyOnly        
-XX:CMSInitiatingOccupancyFraction=75 -XX:-DisableExplicitGC

 

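Xmn can be increased moderately as the Java heap grows, but must not exceed 4g; the recommended range is 1g to 3g. SurvivorRatio should generally be set to 2, and MaxTenuringThreshold to 15.

3. For small memory (less than 64G), simply change Xmn in the configuration above to 512m-1g.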
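
The three conclusions can be folded into a small helper that assembles the flag string for a given heap size. This is only an illustrative sketch: `cms_gc_flags` is a hypothetical helper, and the concrete Xmn picks (2g for 64G and above, 1g below) are single values chosen from the recommended ranges, so adjust them after testing on your own cluster.

```python
def cms_gc_flags(heap_gb):
    """Build the CMS flag set from the conclusions above for a heap of heap_gb GB."""
    # Illustrative picks inside the recommended ranges (1g-3g and 512m-1g).
    xmn = "2g" if heap_gb >= 64 else "1g"
    return [
        f"-Xmx{heap_gb}g", f"-Xms{heap_gb}g", f"-Xmn{xmn}", "-Xss256k",
        "-XX:MaxPermSize=256m", "-XX:SurvivorRatio=2",
        "-XX:+UseConcMarkSweepGC", "-XX:+UseParNewGC",
        "-XX:+CMSParallelRemarkEnabled", "-XX:MaxTenuringThreshold=15",
        "-XX:+UseCMSCompactAtFullCollection", "-XX:+UseCMSInitiatingOccupancyOnly",
        "-XX:CMSInitiatingOccupancyFraction=75", "-XX:-DisableExplicitGC",
    ]


print(" ".join(cms_gc_flags(64)))
```

For a 64G heap this prints the same flags as the large-memory configuration above, joined on a single line.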

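Summary

This article first gave a fairly systematic introduction to CMS GC, and then walked through the tuning of the important GC parameters in an HBase cluster in three phases. The last two phases in particular adjusted two groups of core parameters through theoretical reasoning and experimental verification, arriving at a fairly complete CMS GC configuration. Readers can adjust their clusters with reference to this configuration and then check the effect in the GC logs.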

 

 

 

 

 
 
 
 

范欣欣

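Works in the Database Technology Group of the Backend Technology Center at NetEase Hangzhou Research Institute, on HBase development and operations, with a strong interest in HBase-related technologies. Email: [email protected]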


Origin www.cnblogs.com/wanxqing/p/10930387.html