求助!Java gc问题分析

1、简介

     最近负责的一个平台项目,主要功能是使用模型打分,最后排序返回,最近响应时间经常出问题,有些机器在早上会出现响应时间变长,没有查到原因,现在将整体现象和部分逻辑功能描述一下,希望各路大神帮忙分析。

     平台现状:

                     机器配置:5台机器,机器物理内存128g,jvm配置 32g,新生代7.5g,gc使用cms

Java HotSpot(TM) 64-Bit Server VM (25.102-b14) for linux-amd64 JRE (1.8.0_102-b14), built on Jun 22 2016 18:43:17 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 132244088k(35649492k free), swap 33554428k(33529800k free)

          jvm部分参数配置:

-XX:+AlwaysPreTouch -XX:AutoBoxCacheMax=20000 -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSParallelInitialMarkEnabled -XX:ErrorFile=/apps/logs/osp/hs_err_%p.log -XX:+ExplicitGCInvokesConcurrent -XX:+GCLockerInvokesConcurrent -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/apps/logs/osp/ -XX:InitialHeapSize=34359738368 -XX:InitialTenuringThreshold=4 -XX:+ManagementServer -XX:MaxDirectMemorySize=2147483648 -XX:MaxHeapSize=34359738368 -XX:MaxMetaspaceSize=536870912 -XX:MaxTenuringThreshold=4 -XX:MetaspaceSize=536870912 -XX:NewRatio=3 -XX:OldPLABSize=16 -XX:-OmitStackTraceInFastThrow -XX:ParGCCardsPerStrideChunk=1024 -XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:+PrintCommandLineFlags -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:ReservedCodeCacheSize=251658240 -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:-UseBiasedLocking -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:-UseCounterDecay -XX:+UseGCTaskAffinity -XX:+UseParNewGC 

         业务:

                  平台请求峰值在50/s,峰值请求时段ygc平均25s一次,加载的模型文件存储在堆内hashmap,存储空间大约占用15g(模型文件本身很大,最大1.1g),每天晚上定时更新 模型文件,最大的模型文件大约在凌晨4点更新,更新这儿因为之前业务会将模型文件县加载到ArrayList,然后再封装成hashmap,相当于模型文件会存储两份,等到加载完会清除掉ArrayList,                 只保留hashmap。

问题描述:

           五台机器不固定出现响应慢的情况,今天可能是1、2,重启后可能是2、3

           每天凌晨4点更新大模型文件时,会触发cms gc,因为模型太大,导致concurrent mode failure,这时候会继续触发cms,经历两三次concurrent mode failure,最后cms成功回收掉内存,但是接着以后有问题的机器的ygc 变了,原来eden区百分之百,新生代89%就回收

ParNew: 6728007K->16851K(7549760K), 0.0127716 secs]

           之后变成新生代100% 才回收:

ParNew: 7549759K->22038K(7549760K), 0.0131831 secs

没有问题的机器,ygc回收依然跟之前一样。这中间 不正常的机器是通过 cms 回收掉内存,正常的机器在最后触发了一次 fullgc 回收掉内存。

接着分析,发现服务响应时间变长是在更新完大模型文件后,最开始响应时间有10ms变成40ms,慢慢的变成1s甚至3s,而且不是 随机的,是在每隔一段时间,后来发现每次慢响应结束的时间跟 ygc结束的时间大致一致,响应开始慢都是在ygc前几秒,但是 ygc 清理时间没有问题

以10:00:15 的ygc 为例

[GC (Allocation Failure) 2018-08-07T10:00:15.745+0800: 58657.032: [ParNew: 7549759K->41540K(7549760K), 0.0140605 secs] 24027593K->16519445K(32715584K), 0.0142925 secs] [Times: user=0.20 sys=0.00, real=0.02 secs] 

对应的响应时间:

[2018-08-07 10:00:15.793] [thread-165]  [-8288830327696948464 10.200.52.121 10.200.52.121:40412 10.200.43.89:8080 43910 *.rerank()_1.0.0 200 0 28669 28329 0ms 0ms 1ms 27ms 67ms]
[2018-08-07 10:00:15.794] [thread-96]  [-8357411984988133029 10.214.11.166 10.214.11.71:57478 10.200.43.89:8080 35074 *.rerank()_1.0.0 200 0 28713 28357 2ms 0ms 0ms 2079ms 2081ms]
[2018-08-07 10:00:15.795] [thread-115]  [-9109509378261846079 10.208.40.79 10.208.40.79:48106 10.200.43.89:8080 23738 *.rerank()_1.0.0 200 0 28631 28330 3ms 0ms 0ms 89ms 236ms]
[2018-08-07 10:00:15.796] [thread-145]  [-8323442836846079900 10.200.52.117 10.200.52.117:44302 10.200.43.89:8080 43418 *.rerank()_1.0.0 200 0 28675 28330 0ms 0ms 0ms 34ms 120ms]
[2018-08-07 10:00:15.797] [thread-160]  [-8357867963540901220 10.200.52.121 10.200.52.121:39612 10.200.43.89:8080 44120 *.rerank()_1.0.0 200 0 40556 40233 1ms 0ms 0ms 36ms 89ms]
[2018-08-07 10:00:15.799] [thread-68]  [-8292392222823446982 10.200.52.121 10.200.52.121:39612 10.200.43.89:8080 44113 *.rerank()_1.0.0 200 0 56546 56231 0ms 0ms 0ms 2437ms 2661ms]
[2018-08-07 10:00:15.800] [thread-184]  [-8305021004930562527 10.200.52.119 10.200.52.119:19262 10.200.43.89:8080 9622 *.rerank()_1.0.0 200 0 28646 28330 5ms 0ms 1ms 187ms 238ms]
[2018-08-07 10:00:15.801] [thread-174]  [-8338195371701998057 10.200.52.125 10.200.52.125:10554 10.200.43.89:8080 43726 *.rerank()_1.0.0 200 0 2517 2195 0ms 0ms 0ms 3ms 4ms]
[2018-08-07 10:00:15.801] [thread-158]  [-8295171650903359196 10.208.40.84 10.208.40.84:57058 10.200.43.89:8080 4841 *.rerank()_1.0.0 200 0 56512 56231 0ms 0ms 0ms 2537ms 2768ms]
[2018-08-07 10:00:15.801] [thread-46]  [-9129656671070767850 10.200.52.126 10.200.52.126:42266 10.200.43.89:8080 44160 *.rerank()_1.0.0 200 0 54028 53719 2ms 0ms 0ms 250ms 584ms]
[2018-08-07 10:00:15.802] [thread-16]  [-8346462911910830241 10.208.40.76 10.208.40.76:63356 10.200.43.89:8080 23666 *.rerank()_1.0.0 200 0 56561 56229 0ms 0ms 3ms 39ms 77ms]
[2018-08-07 10:00:15.807] [thread-140]  [-8296785686930875249 10.200.52.117 10.200.52.117:45540 10.200.43.89:8080 44073 *.rerank()_1.0.0 200 0 56557 56230 6ms 0ms 0ms 46ms 134ms]
[2018-08-07 10:00:15.809] [thread-62]  [-8330646766934052660 10.208.40.82 10.208.40.82:28314 10.200.43.89:8080 23539 *.rerank()_1.0.0 200 0 42622 42279 2ms 0ms 2ms 31ms 98ms]
[2018-08-07 10:00:15.816] [thread-9]  [-8295019583619171506 10.200.52.117 10.200.52.117:45540 10.200.43.89:8080 44072 *.rerank()_1.0.0 200 0 42626 42280 2ms 0ms 1ms 55ms 218ms]
[2018-08-07 10:00:15.827] [thread-178]  [-8305067902129936638 10.208.40.83 10.208.40.83:15934 10.200.43.89:8080 23693 *.rerank()_1.0.0 200 0 28653 28329 0ms 0ms 0ms 20ms 20ms]
[2018-08-07 10:00:15.872] [thread-122]  [-8341096053262505245 10.214.1.6 10.214.1.37:58726 10.200.43.89:8080 35194 *.rerank()_1.0.0 200 0 7657 7336 0ms 0ms 0ms 7ms 7ms]
[2018-08-07 10:00:15.889] [thread-34]  [-9100444074134353504 10.214.27.245 10.214.27.122:37014 10.200.43.89:8080 34898 *.rerank()_1.0.0 200 0 910 732 0ms 0ms 0ms 1ms 1ms]
[2018-08-07 10:00:15.921] [thread-50]  [-8294018645016351829 10.208.40.81 10.208.40.81:22062 10.200.43.89:8080 23516 *.rerank()_1.0.0 200 0 28652 28329 1ms 0ms 0ms 14ms 15ms]
[2018-08-07 10:00:15.957] [thread-119]  [-9132118080387802790 10.200.52.122 10.200.52.122:61174 10.200.43.89:8080 43952 *.rerank()_1.0.0 200 0 3008 2660 0ms 0ms 0ms 3ms 4ms]
[2018-08-07 10:00:15.967] [thread-167]  [-8290516706153420296 10.200.52.122 10.200.52.122:61174 10.200.43.89:8080 43951 *.rerank()_1.0.0 200 0 42595 42279 0ms 0ms 0ms 29ms 30ms]
[2018-08-07 10:00:15.976] [thread-21]  [-8357788417187591931 10.208.40.84 10.208.40.84:57058 10.200.43.89:8080 4843 *.rerank()_1.0.0 200 0 28678 28328 0ms 0ms 0ms 14ms 14ms]
[2018-08-07 10:00:16.047] [thread-1]  [-8301189045126531877 10.214.27.245 10.214.27.122:43674 10.200.43.89:8080 34860 *.rerank()_1.0.0 200 0 28693 28355 0ms 0ms 1ms 11ms 13ms]
[2018-08-07 10:00:16.061] [thread-124]  [-8289545017613729362 10.200.52.119 10.200.52.119:19262 10.200.43.89:8080 9623 *.rerank()_1.0.0 200 0 42596 42278 0ms 0ms 0ms 15ms 17ms]
[2018-08-07 10:00:16.083] [thread-78]  [-8331749461723576302 10.200.52.122 10.200.52.122:61174 10.200.43.89:8080 43953 *.rerank()_1.0.0 200 0 28652 28327 0ms 0ms 0ms 9ms 9ms]
[2018-08-07 10:00:16.089] [thread-97]  [-8355833413237133244 10.200.52.119 10.200.52.119:65224 10.200.43.89:8080 44185 *.rerank()_1.0.0 200 0 4775 4427 0ms 0ms 0ms 7ms 8ms]
[2018-08-07 10:00:16.132] [thread-84]  [-8298776592520918742 10.214.11.167 10.214.11.22:45330 10.200.43.89:8080 35016 *.rerank()_1.0.0 200 0 28671 28355 0ms 0ms 0ms 22ms 23ms]
[2018-08-07 10:00:16.140] [thread-112]  [-8355451418324574725 10.214.11.166 10.214.11.71:52898 10.200.43.89:8080 35695 *.rerank()_1.0.0 200 0 13803 13471 0ms 0ms 0ms 14ms 14ms]
[2018-08-07 10:00:16.176] [thread-181]  [-8317321704782385856 10.208.40.83 10.208.40.83:14200 10.200.43.89:8080 23768 *.rerank()_1.0.0 200 0 37209 36884 0ms 0ms 0ms 18ms 18ms]
[2018-08-07 10:00:16.183] [thread-134]  [-8329585309970522186 10.214.25.50 10.214.25.176:46984 10.200.43.89:8080 582 *.rerank()_1.0.0 200 0 2539 2232 0ms 0ms 0ms 5ms 5ms]

对应 ygc 时间 0.0142925 secs

同时,使用 jstat jcutil 一秒监控一次的数据:

 非正常机器(Eden区满了之后需要四秒才做回收):

S		S		E		O		M		CCS	YGC		YGCT	FGC	FGCT	GCT
3.51	3.51	55.88	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	59.35	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	64.84	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	68.40	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	72.20	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	74.82	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	78.40	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	83.64	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	87.87	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	91.42	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	96.51	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
3.51	3.51	99.66	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
21.72	21.85	100.00	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
38.89	39.05	100.00	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
57.46	57.68	100.00	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
78.06	78.38	100.00	65.56	16.79	0	1520	81.162	14	331.849	413.01099999999997	
4.06	4.06	3.11	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	7.98	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	11.77	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	15.31	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	18.46	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	22.79	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	26.79	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	31.30	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	34.87	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	38.15	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	41.82	65.56	16.79	0	1521	81.176	14	331.849	413.025	
4.06	4.06	45.41	65.56	16.79	0	1521	81.176	14	331.849	413.025	

正常机器(Eden区满了立马回收):

S		S		E		O		M		CCS	YGC		YGCT	FGC	FGCT	GCT
0.34	0.34	70.50	65.49	16.79	0	1552	70.31	12	384.598	454.908	
0.34	0.34	72.48	65.49	16.79	0	1552	70.31	12	384.598	454.908	
0.34	0.34	75.19	65.49	16.79	0	1552	70.31	12	384.598	454.908	
0.34	0.34	79.18	65.49	16.79	0	1552	70.31	12	384.598	454.908	
0.34	0.34	83.25	65.49	16.79	0	1552	70.31	12	384.598	454.908	
0.34	0.34	86.90	65.49	16.79	0	1552	70.31	12	384.598	454.908	
0.34	0.34	90.91	65.49	16.79	0	1552	70.31	12	384.598	454.908	
0.34	0.34	94.94	65.49	16.79	0	1552	70.31	12	384.598	454.908	
0.34	0.34	98.14	65.49	16.79	0	1552	70.31	12	384.598	454.908	
0.82	0.82	1.42	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	4.85	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	8.16	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	12.08	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	16.31	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	21.45	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	26.60	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	29.62	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	33.99	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	37.71	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	41.35	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	46.32	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	49.87	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	53.91	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	58.62	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	61.42	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	66.22	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	70.84	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	73.24	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	77.48	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	81.86	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	86.28	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	90.45	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	94.98	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.82	0.82	98.17	65.49	16.79	0	1553	70.322	12	384.598	454.92	
0.61	0.61	1.25	65.49	16.79	0	1554	70.334	12	384.598	454.932	
0.61	0.61	4.43	65.49	16.79	0	1554	70.334	12	384.598	454.932	
0.61	0.61	8.14	65.49	16.79	0	1554	70.334	12	384.598	454.932	
0.61	0.61	11.87	65.49	16.79	0	1554	70.334	12	384.598	454.932	
0.61	0.61	16.30	65.49	16.79	0	1554	70.334	12	384.598	454.932	

机器 load:

具体问题就是 上面描述的,请各路大神帮忙分析,附上 gc日志:

链接: https://pan.baidu.com/s/1Nd1EE7F91vZl7VisOYxv0Q 密码: 8v95

猜你喜欢

转载自blog.csdn.net/LVZHULIU/article/details/81586389