JVM Series: Garbage Collectors in Detail, Common JVM Parameters, Performance Monitoring Tools, and a Tuning Case

Overview

  This post follows the previous one in the series, which explained garbage collection algorithms in detail. It assumes you already have some understanding of the JVM memory structure and of the garbage collection algorithms; if you don't, I recommend reading the previous post first (experts can skip ahead). With the theory in place, it is time to put it into practice. This post covers the classification and selection of JVM garbage collectors, commonly used JVM parameters, performance monitoring tools, and a hands-on tuning case, taking you step by step through the mysteries of the JVM.

JVM garbage collectors

  As we know, the JVM heap is divided into a young generation and an old generation. The young generation is collected with the copying algorithm, and the old generation with the mark-sweep or mark-compact algorithm. The garbage collectors described next are the concrete implementations of these algorithms.
  There are currently seven JVM garbage collectors: Serial, ParNew, Parallel Scavenge, Serial Old, Parallel Old, CMS, and G1. They can be grouped as follows:
  Young generation collectors: Serial, ParNew, Parallel Scavenge
  Old generation collectors: CMS, Serial Old, Parallel Old
  Whole-heap collector: G1
  Young generation collectors collect only young generation garbage, old generation collectors collect only old generation garbage, and G1 handles both across the whole heap. It is worth noting that G1 divides the entire Java heap into multiple independent regions of equal size. It keeps the concepts of young and old generation, but the two are no longer physically separated; each is a collection of regions that need not be contiguous. This heap layout differs considerably from the one described earlier: G1 records which generation each region belongs to and collects the regions accordingly.
  The following figure shows the JVM garbage collectors and how they can be paired:
Garbage collector illustrated
  In the figure above, collectors joined by a line can be used together. You may wonder why Parallel Scavenge cannot be used with CMS, why CMS can be paired with Serial Old, and why Parallel Old can only be used with Parallel Scavenge. To the first question: Parallel Scavenge does not use the common GC framework that the other collectors share, so the two cannot be mixed. Why it does not use the same framework is purely a matter of history rather than technology (perhaps the two will share one in the future). To the second question: the two are not really used side by side. CMS, a concurrent collector, uses the mark-sweep algorithm and runs concurrently with user threads, so it produces floating garbage (garbage created after marking has finished). When the memory reserved during a CMS cycle cannot satisfy the running program, a backup plan kicks in: Serial Old collects the old generation (mark-compact) to free up memory. To the third question: Parallel Old, like Parallel Scavenge, is a parallel collector built outside the common framework, which is the same reason it cannot be paired with Serial or ParNew.
  There are two concepts to distinguish here: parallel collectors and concurrent collectors. Take the user threads as the reference: a collector whose GC threads run at the same time as the user threads is concurrent, while one whose multiple GC threads work together while the user threads are paused is parallel.
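  To check which of these collectors a running JVM is actually using, one option is to ask the JVM itself through the standard java.lang.management API. The sketch below is only an illustration: the class name CollectorNames is made up for this example, and the names printed depend on the JVM vendor, version, and flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class CollectorNames {
    /** Returns the names of the collectors that the running JVM reports. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Typically one entry covers the young generation and one the old generation.
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

  Running it with different collector flags is a quick way to confirm that a pairing from the figure is really in effect.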

Serial collector

  The Serial collector is the most basic and oldest garbage collector. As the name suggests, it is single-threaded, but more than that: while it collects garbage, it suspends all other worker threads until the collection finishes, which is known as "stop the world". Imagine watching a movie that pauses for a few seconds every five minutes; that is clearly unacceptable. The Serial collector runs as follows:
serial running processes
  The virtual machine developers fully understand how bad the "stop the world" experience is, but they also feel wronged: "When your mother cleans your room, she makes you sit still on a chair or leave the room. If you kept throwing confetti on the floor while she cleaned, the room would never get clean." That sounds reasonable, and it is. Meanwhile, the virtual machine development teams have kept working to eliminate or reduce the pauses caused by memory reclamation.

ParNew collector

  The ParNew collector is essentially a multithreaded version of the Serial collector. Apart from using multiple threads to collect garbage, its behavior and implementation are the same as Serial's. ParNew only shows its advantages (faster multithreaded collection and shorter pauses) on multi-core CPUs; on a single-core CPU it performs no better than Serial, and possibly worse, because of thread-switching overhead. It runs as follows:
ParNew running processes

Parallel Scavenge collector

  Parallel Scavenge is similar to ParNew: it is also a multithreaded parallel collector. Unlike ParNew, its goal is to reach a controllable throughput, where throughput = user code run time / (user code run time + garbage collection time). If the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute of that, the throughput is (100 - 1) / 100 = 99%.
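  The throughput definition is easy to check with a few lines of code. This is a minimal sketch; the class and method names are invented for illustration.

```java
public class ThroughputDemo {
    /** throughput = user code run time / (user code run time + garbage collection time) */
    static double throughput(double userMinutes, double gcMinutes) {
        return userMinutes / (userMinutes + gcMinutes);
    }

    public static void main(String[] args) {
        // 100 minutes of total run time, 1 of which was spent collecting garbage.
        double t = throughput(99, 1);
        System.out.printf("throughput = %.0f%%%n", t * 100);
    }
}
```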
  Shorter pauses suit programs that interact with users, where good response times improve the user experience. High throughput makes efficient use of the CPU and mainly suits background computation that needs little user interaction.
  Parallel Scavenge provides two parameters to control throughput precisely: -XX:MaxGCPauseMillis (the maximum pause time in milliseconds, which the JVM tries not to exceed) and -XX:GCTimeRatio (the throughput setting, an integer greater than 0 and less than 100). Do not assume that a small pause-time setting combined with a large throughput setting makes garbage collection faster. Shorter pauses are bought at the cost of throughput and young generation space: the system shrinks the young generation, say from 1000 MB to 700 MB. Collecting 700 MB is certainly faster than collecting 1000 MB, but collections become more frequent. Where a collection used to happen every 10 s with a 100 ms pause, it now happens every 5 s with a 70 ms pause, which adds up to 140 ms of pause per 10 s. Each pause is indeed shorter, but throughput has dropped.
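  The trade-off can be worked through numerically. The sketch below reproduces the 100 ms versus 140 ms comparison from the example. It also shows how a -XX:GCTimeRatio value n translates into a target GC share of 1 / (1 + n); that relation comes from the HotSpot documentation rather than from this post. All names are invented for illustration.

```java
public class PauseTradeOff {
    /** Total pause time (ms) accumulated in a window, given the GC interval and the pause per GC. */
    static double pausePerWindowMs(double windowSeconds, double gcIntervalSeconds, double pauseMs) {
        return windowSeconds / gcIntervalSeconds * pauseMs;
    }

    /** Target share of run time spent in GC implied by -XX:GCTimeRatio=n, i.e. 1 / (1 + n). */
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // Large young generation: one 100 ms pause every 10 s.
        System.out.println("before: " + pausePerWindowMs(10, 10, 100) + " ms per 10 s");
        // Smaller young generation: one 70 ms pause every 5 s.
        System.out.println("after:  " + pausePerWindowMs(10, 5, 70) + " ms per 10 s");
        // A ratio of 99 asks for at most 1% of the time in GC.
        System.out.println("GC share for ratio 99: " + maxGcFraction(99));
    }
}
```

  Each individual pause shrinks from 100 ms to 70 ms, yet the time lost per 10 s grows, which is exactly the throughput drop described above.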
  For this reason Parallel Scavenge is also called the "throughput-first collector". It has one more parameter: -XX:+UseAdaptiveSizePolicy. With it enabled, there is no need to set the young generation size or the Eden-to-Survivor ratio by hand; the JVM adjusts them dynamically based on how the system is running to provide the most suitable pause time or the highest throughput. This is called the GC adaptive sizing policy, and it is an important difference between Parallel Scavenge and ParNew.

Serial Old collector

  Serial Old, like Serial, is a single-threaded collector; it uses the mark-compact algorithm. It can serve as the fallback for the CMS collector, and it runs the same way as the Serial collector.

Parallel Old collector

  Parallel Old, like Parallel Scavenge, is a multithreaded parallel collector; it uses the mark-compact algorithm. Paired with Parallel Scavenge, it suits applications that care about throughput and are sensitive to CPU resources. Its workflow is as follows:
parallel old workflow

CMS collector

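  CMS stands for Concurrent Mark Sweep. It focuses on concurrent collection and low pauses, and suits the server side of B/S (browser/server) systems; the well-known Taobao website uses the CMS collector. Its collector threads can work alongside user threads, which is what sets it apart from the parallel collectors. It runs as follows:
CMS running processes
  As the figure above shows, although CMS can run alongside user threads, it cannot avoid "stop the world" either. The previous post covered reachability analysis: the initial mark and remark phases, which verify whether objects are alive, also pause the worker threads, only for a short time. CMS is an excellent garbage collector, but it has the following drawbacks:
  1. It is sensitive to CPU resources. In fact, programs designed for concurrency are generally sensitive to CPU resources. During the concurrent phases CMS does not pause user threads, but it does take up part of the CPU, which slows the application down and lowers total throughput.
  2. It cannot handle floating garbage, and a "Concurrent Mode Failure" may trigger another Full GC (here, a collection of the old generation is called a Full GC). While CMS sweeps concurrently, user threads keep running and keep producing garbage. This garbage appears after marking is over, so it cannot be cleaned up in the current GC; it is called floating garbage. Because user threads are still running during collection, CMS cannot wait until the old generation is almost full like other collectors; it has to reserve some space for the program to use during the concurrent phases. The occupancy percentage that triggers a collection can be tuned with -XX:CMSInitiatingOccupancyFraction, though there is usually no need to touch it. If the reserved space cannot satisfy the program, a Concurrent Mode Failure occurs and Serial Old takes the stage: the JVM temporarily falls back to Serial Old to collect the old generation again, which means a longer pause. Setting this parameter too high therefore easily causes many Concurrent Mode Failures and hurts performance.
  3. It produces a lot of memory fragmentation. CMS uses the mark-sweep algorithm (which is faster than mark-compact), and this inevitably fragments memory. When fragmentation is severe, there may be plenty of free space in total and still no contiguous block large enough for a new large object, which forces an early Full GC. The -XX:+UseCMSCompactAtFullCollection parameter (enabled by default) addresses this: when CMS can no longer cope and has to run a Full GC, it compacts the fragmented memory, at the cost of a longer pause because compaction cannot run concurrently. -XX:CMSFullGCsBeforeCompaction sets how many Full GCs run without compaction before one runs with compaction. The default is 0, meaning every Full GC compacts.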
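  The fragmentation problem in point 3 can be pictured with a toy heap. The sketch below is a deliberately simplified model and not how HotSpot manages memory: the heap is an array of slots, true means occupied, and all names are invented for illustration. It shows that after a sweep there can be plenty of free slots without a single contiguous run large enough for a big object, and that compaction restores one.

```java
public class FragmentationDemo {
    /** Counts the free slots in a toy heap, where true means "occupied". */
    static int totalFree(boolean[] heap) {
        int free = 0;
        for (boolean occupied : heap) {
            if (!occupied) free++;
        }
        return free;
    }

    /** Length of the longest run of consecutive free slots. */
    static int largestFreeRun(boolean[] heap) {
        int best = 0;
        int run = 0;
        for (boolean occupied : heap) {
            run = occupied ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    /** Mark-compact in miniature: slide every live slot to the front of the heap. */
    static boolean[] compact(boolean[] heap) {
        boolean[] compacted = new boolean[heap.length]; // all false, i.e. all free
        int live = heap.length - totalFree(heap);
        for (int i = 0; i < live; i++) {
            compacted[i] = true;
        }
        return compacted;
    }

    public static void main(String[] args) {
        // State after a mark-sweep pass: live and free slots alternate.
        boolean[] heap = {true, false, true, false, true, false, true, false};
        System.out.println("free slots:       " + totalFree(heap));
        System.out.println("largest free run: " + largestFreeRun(heap));
        System.out.println("after compaction: " + largestFreeRun(compact(heap)));
    }
}
```

  Half of the toy heap is free, but an object that needs two adjacent slots cannot be placed until the heap is compacted.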

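G1 collector

  The G1 collector is one of the most advanced results of collector technology today. It is a garbage collector aimed at server-side applications.
  G1 has the following characteristics:
  1. Parallel and concurrent
  2. Generational collection
  3. Space compaction: viewed as a whole it is based on the mark-compact algorithm, and viewed locally (between two regions) it is based on the copying algorithm. This means it produces no memory fragmentation while running, which helps long-running programs: allocating a large object will not trigger an early Full GC for lack of contiguous space.
  4. Predictable pauses: users can specify that within a time slice of M milliseconds, garbage collection should take no more than N milliseconds. It runs as follows: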
G1 running processes
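  One way to picture how a region-based collector can respect a pause budget is a greedy pick: collect the regions holding the most garbage first, and stop adding regions once the budget is used up. The sketch below is a toy model under that assumption, not G1's actual implementation. The Region fields, the cost figures, and all names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RegionPicker {
    /** A toy region: how much garbage it holds and how long collecting it is estimated to take. */
    static final class Region {
        final String name;
        final int garbageMb;
        final int costMs;

        Region(String name, int garbageMb, int costMs) {
            this.name = name;
            this.garbageMb = garbageMb;
            this.costMs = costMs;
        }
    }

    /** Greedily picks the regions with the most garbage that still fit into the pause budget. */
    static List<String> pick(List<Region> regions, int pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingInt((Region r) -> r.garbageMb).reversed());
        List<String> chosen = new ArrayList<>();
        int remainingMs = pauseBudgetMs;
        for (Region r : sorted) {
            if (r.costMs <= remainingMs) {
                chosen.add(r.name);
                remainingMs -= r.costMs;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("A", 90, 6),
                new Region("B", 50, 5),
                new Region("C", 10, 4));
        // With a 12 ms budget, A and B fit (11 ms in total) and C has to wait.
        System.out.println(pick(regions, 12));
    }
}
```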
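  Although G1 has many advantages, real-world use cases are still few and related performance tests are lacking. I believe G1 will be the eventual winner, though, and we can keep watching. If your current collector gives you no trouble, you can stay with it; and if your application is after throughput, G1 will not bring you any particular benefit.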

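The best garbage collector?

  By now you should see that there is no such thing as the best garbage collector. The choice depends on the actual business scenario: for low pauses, consider the ParNew + CMS combination; for high throughput, consider Parallel Scavenge + Parallel Old; on a single-core CPU, the classic Serial combination is also an option.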
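  As a rough guide, the three recommendations map onto HotSpot command-line flags like this. The sketch assumes a JDK 8 HotSpot JVM; several of these flags were removed in later JDK releases, so check the documentation for your version. The Goal enum, the class name, and app.jar are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class CollectorFlags {
    enum Goal { LOW_PAUSE, HIGH_THROUGHPUT, SINGLE_CORE }

    /** Maps a tuning goal to the JDK 8 HotSpot flags that select the matching collector pair. */
    static List<String> flagsFor(Goal goal) {
        switch (goal) {
            case LOW_PAUSE:        // ParNew + CMS
                return Arrays.asList("-XX:+UseParNewGC", "-XX:+UseConcMarkSweepGC");
            case HIGH_THROUGHPUT:  // Parallel Scavenge + Parallel Old
                return Arrays.asList("-XX:+UseParallelGC", "-XX:+UseParallelOldGC");
            case SINGLE_CORE:      // Serial + Serial Old
                return Arrays.asList("-XX:+UseSerialGC");
            default:
                throw new IllegalArgumentException("unknown goal: " + goal);
        }
    }

    public static void main(String[] args) {
        for (Goal goal : Goal.values()) {
            // app.jar is a placeholder for your own application.
            System.out.println(goal + ": java " + String.join(" ", flagsFor(goal)) + " -jar app.jar");
        }
    }
}
```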

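Common JVM parameters

  JVM parameters let us customize the JVM and improve system performance. To use them, just add them after the java command, for example java -Xmx100m hello. They are just as easy to set in Eclipse and IDEA; search online for the exact steps.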
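  To confirm that a parameter such as -Xmx really took effect, a program can print what the JVM reports about itself. The sketch below uses only standard APIs (Runtime and RuntimeMXBean); the class name is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class ShowJvmSettings {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** The arguments passed to the JVM itself, such as -Xmx100m (not the program arguments). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMb() + " MB");
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

  Started as java -Xmx100m ShowJvmSettings, it should report a maximum heap close to 100 MB and list -Xmx100m among the JVM arguments. The reported value can be slightly below the configured one.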

Heap parameters

Heap parameter
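  In JDK 8 the permanent generation was removed and replaced by Metaspace, which lives in native memory; look up the corresponding parameters as needed. By default its maximum size is not fixed and is limited only by available native memory, which is enough for typical applications. Among the heap parameters, making the young generation somewhat larger can effectively reduce the number of Full GCs and improve the system's response times.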

Collector parameters

Recovery parameters
  The parameters in the table above select which garbage collector to use. Common collector parameter combinations are introduced later.

Common parameters

Common parameters
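  As the table above shows, the last few parameters print GC logs. In addition, -Xloggc:log/gc.log sets the location of the GC log so that collections can be reviewed, and on an OOM a dump file can be generated at a specified path for analysis with performance monitoring tools. The -Xss parameter deserves attention: it is the memory allocated to each thread's stack, generally no more than 2 MB. The larger -Xss is, the fewer threads can run in total, but the deeper each thread's stack can grow, making stack overflow less likely; the smaller it is, the more likely stack overflow becomes. This is why some companies ban recursion outright, because it keeps pushing frames onto the stack. In general the JVM default is sufficient and needs no special tuning.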
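  The link between stack size and recursion depth can be observed directly. The sketch below recurses until the stack overflows, once in a thread with a small requested stack and once with a large one. It uses the Thread constructor that accepts a stack size instead of -Xss so that both cases run in one process. Per the Javadoc that size is only a hint that some platforms ignore, and the exact depths vary by machine, so no particular numbers are promised. The class and method names are invented for illustration.

```java
public class StackDepthDemo {
    /** Recurses until the thread's stack overflows and counts the frames reached. */
    static final class Probe implements Runnable {
        int depth = 0;

        private void recurse() {
            depth++;
            recurse();
        }

        @Override
        public void run() {
            try {
                recurse();
            } catch (StackOverflowError expected) {
                // depth now holds the number of frames that fit on this stack
            }
        }
    }

    /** Runs the probe in a thread created with the requested stack size (a hint to the JVM). */
    static int maxDepth(long stackSizeBytes) {
        Probe probe = new Probe();
        Thread thread = new Thread(null, probe, "stack-probe", stackSizeBytes);
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return probe.depth;
    }

    public static void main(String[] args) {
        System.out.println("512 KB stack: depth " + maxDepth(512 * 1024));
        System.out.println("8 MB stack:   depth " + maxDepth(8L * 1024 * 1024));
    }
}
```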

Common collector combinations

Common collector composition
  As the table above shows, the second and third combinations are the most widely used.

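Performance monitoring tools

  To analyze how the JVM is running in more depth, a good monitoring tool matters a great deal. Fortunately the JDK ships with many excellent small tools in its bin directory, such as jps (prints JVM process information), jstat (shows runtime information), jinfo (views and modifies virtual machine configuration), and jmap (generates dump files). The best known is jvisualvm, which integrates almost all of the JVM command-line tools behind a graphical interface. Note that these tools are small, from a few dozen KB to a few hundred KB at most. They are only thin wrappers: the real implementation is packaged in tools.jar, and interested readers can decompile it to read the source.
  Before using jvisualvm, let's start a Tomcat application as an example so that we can monitor its process in a moment.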
Start tomcat
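  As shown above, I started Tomcat successfully and used the jps command to find that its process ID is 875 (Bootstrap is Tomcat's startup class). I then used jstat to print five lines of runtime information for the JVM process, sampled 200 ms apart (look up what each column means if you are interested). Note the column named MC, which was PC before JDK 1.8: MC is the Metaspace capacity (native memory) and PC was the memory allocated to the permanent generation. This also shows that JDK 1.8 dropped the permanent generation.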
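  Similar figures are available from inside a Java process through the standard memory pool beans. The sketch below lists each memory pool with its used bytes. On a JDK 8 or later HotSpot JVM one would expect a pool named Metaspace and no permanent generation pool, but pool names are implementation-specific, so treat that as an assumption. The class name is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

public class MemoryPools {
    /** Maps each memory pool name to the number of bytes currently used in it. */
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                result.put(pool.getName(), usage.getUsed());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> entry : usedBytesByPool().entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue() / 1024 + " KB used");
        }
    }
}
```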
  Back to the topic: run jvisualvm (if the environment variables are set up correctly, typing the command is enough to start it), as follows:
jvisualvm
  Double-click Tomcat to monitor its process state, as follows:
tomcat monitoring
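  A dump file can be loaded here. Looking at the number of instances per class shows how many instances were created; if a class has too many instances or takes up too much memory, suspect a memory leak (invalid references that are not released in time and waste memory).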
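  jvisualvm also supports plugins that make monitoring the JVM more convenient. A popular one is Visual GC: go to https://visualvm.github.io/pluginscenters.html, download the version that matches yours, and install it from the Tools > Plugins menu.
  Restart after installation; the result looks like this: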
visual gc
  That covers the commonly used features. There are plenty of tutorials online for remote connections. Next, let's look at a tuning case.

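Tuning case

  Because of environment constraints the problem is hard to reproduce locally, so I will tell it mainly as a first-person story.
  Our company built a data utilization system for a customer, which lets users query data along multiple dimensions and export it to Excel. The system was configured as follows:
  Server: one CentOS 7 machine
  Memory: 64 GB
  JDK version: 1.8
  Web server: Tomcat embedded in Spring Boot
  Users: about one thousand
  Frankly, for a thousand users this server is lavish, so it felt like nothing could go wrong. Later, however, the customer reported that the system occasionally froze for a long time when exporting to Excel, taking as long as half a minute to respond in the worst case, so we quickly stepped in to investigate.
  The first suspect was the SQL, but the freezes were occasional and responses were normally fast, so that was ruled out for the time being.
  Next I asked the operations staff whether any server maintenance had been going on. The answer was no, which left us puzzled.
  With nothing else to go on, the cause had to be in the JVM, so I monitored the system with jvisualvm. The heap size gave me a shock: it was set to a full 40 GB. It turned out that the colleague who deployed the program did not want to waste precious server resources and had deliberately made the heap very large. Most of us have tuned Eclipse or IDEA: the default -Xmx is small, and raising it makes the IDE noticeably faster. With that in mind, he had set the heap to 40 GB without hesitation.
  At this point the problem was fairly clear. Most likely a Full GC on such a large heap took a long time (stop the world), so the system paused for too long and users got no timely response. The GC log confirmed that Full GC was the culprit; with a 40 GB heap, half a minute per collection is no surprise. Why did Full GCs happen? The reason is simple: the workbook object created during an Excel export wraps a large amount of data, so it is a large object and is allocated directly in the old generation. Over time large objects pile up, and once the old generation runs out of space a Full GC is triggered. It is worth noting that if a program is known to create no large objects, objects may never get the chance to reach the old generation; garbage collection then happens mainly in the young generation (minor GC), which is very fast, and the system "flies". Such an ideal situation is rare, though.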
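  Besides reading the GC log, a rough way to check whether a particular operation spends its time in garbage collection is to compare the JVM's accumulated collection time before and after it. The sketch below does that with the standard GarbageCollectorMXBean API. The allocation loop is only a hypothetical stand-in for an export, and all names are invented for illustration. Note that for concurrent collectors the reported time is not identical to stop-the-world pause time, so treat the result as an approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeProbe {
    /** Accumulated collection time in ms across all collectors; undefined (-1) values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Runs the task and returns how many ms of collection time accumulated while it ran. */
    static long gcMillisDuring(Runnable task) {
        long before = totalGcMillis();
        task.run();
        return totalGcMillis() - before;
    }

    public static void main(String[] args) {
        long gcMs = gcMillisDuring(() -> {
            // Hypothetical stand-in for an export: allocate many short-lived 1 MB buffers.
            long checksum = 0;
            for (int i = 0; i < 2_000; i++) {
                byte[] buffer = new byte[1024 * 1024];
                buffer[0] = (byte) i;
                checksum += buffer[0];
            }
            System.out.println("checksum: " + checksum);
        });
        System.out.println("GC time during task: " + gcMs + " ms");
    }
}
```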
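  With the cause of the freezes located, the fix was obvious too, and the simplest one is to shrink the heap. Besides that, the following optimizations were made:
1. Reduced the heap to 4 GB
2. Deployed 6 nodes on the single machine behind nginx load balancing to improve CPU utilization
3. Switched to the ParNew + CMS collector combination (the JDK default is Parallel Scavenge + Parallel Old)
  After these optimizations the problem was solved.
  I hope this case gives you some ideas. Many people believe that deploying a single application on a dedicated server is the best choice because no other program competes for the CPU, but that is not always true; each situation needs its own analysis. Also, if you do not understand the JVM well enough, do not set parameters rashly or you may dig yourself a deep hole; the defaults rarely cause serious problems.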
  After all, JVM tuning is only a supplementary measure. In most cases the problem lies at the code level, and optimizing the code, using design patterns sensibly, and optimizing the database structure and indexes are what make a system run stably and smoothly.

Summary

  That is all for this post; I hope it gave you some ideas. Next I will introduce the JVM class loader and implement a simple hot-deployment plugin by hand. Thanks for reading, bye!

Origin blog.csdn.net/m0_37719874/article/details/103801893