Troubleshooting Aliyun ONS MQ: consumption slows down after running for a while and messages pile up

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_32447321/article/details/83474513

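1: Symptoms:

After the instance had been running for a while (about 20 h), its consumption rate dropped; after a restart it consumed normally and then degraded again. I first suspected the ONS client version and upgraded from 1.2.7 to the latest, 1.7.7.Final, with no effect. Experience told me the real cause had to be code that had not been optimized properly.

Step one: I simply assumed that GC was not keeping up and that there were big objects. Running ./jmap turned up MsgContent; searching the project showed a ConcurrentHashMap<String, MsgContent> that was only ever added to and never removed from. So I added a removal step and also set value = null to help GC. It did not make much difference.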

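public static ConcurrentHashMap<String, MsgContent> map = new ConcurrentHashMap<String, MsgContent>();

// Walk the values in the map and remove every entry whose time is more than two minutes old.
public static void removeInvalidKey(ConcurrentHashMap<String, MsgContent> map) {
    long now = System.currentTimeMillis();
    for (MsgContent value : map.values()) {
        if (now - value.getTime() > 2 * 60 * 1000) {
            // Remove from the map that was passed in (the original always used MsgMatch.map).
            map.remove(value.getUid());
            // This only clears the loop variable; the object becomes collectable
            // because it was removed from the map, not because of this line.
            value = null;
        }
    }
}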

jmap class histogram (excerpt):

 num     #instances         #bytes  class name
----------------------------------------------
   1:        651850      208798320  [C
   2:        651267       15630408  java.lang.String
   3:         71571       10226008  <constMethodKlass>
   4:         71571        9172944  <methodKlass>
   5:          6020        6965584  <constantPoolKlass>
   6:         20793        5553840  [I
   7:        153195        4902240  java.util.HashMap$Entry
   8:         24879        4784448  [B
   9:        189633        4551192  java.util.concurrent.ConcurrentLinkedDeque$Node
  10:          6020        4496624  <instanceKlassKlass>
  11:          5076        4044384  <constantPoolCacheKlass>
  12:         78356        2507392  java.util.concurrent.ConcurrentHashMap$HashEntry
  13:         64274        2506768  com.xxx.xxx.access.mysql.entity.MsgContent
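
The expiry cleanup from step one can also be written with the map's own iterator, so that the entry is removed directly instead of being looked up again by key. The following is only a minimal sketch under assumptions: `TimedValue` is a hypothetical stand-in for MsgContent that carries just the timestamp, and `ExpiringMapSketch` and `removeExpired` are illustrative names, not part of the original project.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for MsgContent: only the timestamp matters here.
final class TimedValue {
    final long time;

    TimedValue(long time) {
        this.time = time;
    }
}

final class ExpiringMapSketch {
    // Two minutes, the same threshold as in the article's cleanup method.
    static final long TTL_MILLIS = 2 * 60 * 1000L;

    // Removes every entry older than TTL_MILLIS relative to `now`
    // and returns how many entries were removed.
    static int removeExpired(ConcurrentHashMap<String, TimedValue> map, long now) {
        int removed = 0;
        Iterator<Map.Entry<String, TimedValue>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, TimedValue> entry = it.next();
            if (now - entry.getValue().time > TTL_MILLIS) {
                // Dropping the entry from the map is what makes the value collectable.
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, TimedValue> map = new ConcurrentHashMap<String, TimedValue>();
        long now = System.currentTimeMillis();
        map.put("old", new TimedValue(now - 3 * 60 * 1000L));
        map.put("fresh", new TimedValue(now));
        System.out.println(removeExpired(map, now) + " removed, " + map.size() + " left");
    }
}
```

Passing `now` in as a parameter keeps the method easy to check with fixed timestamps; in a real service the call would typically come from a periodic task.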

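Step two: after step one the instance ran normally for longer (30 h), but I still had not found the root cause. With no other option left, I took thread stack dumps and found the consumer threads RUNNABLE at the same place (at this point the JVM was starting to give the cause away), as shown below: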

"ConsumeMessageThread_7" prio=10 tid=0x00007f6498008000 nid=0x43 runnable [0x00007f6558c82000]
   java.lang.Thread.State: RUNNABLE
	at java.util.concurrent.ConcurrentLinkedDeque.contains(ConcurrentLinkedDeque.java:1085)
	at com.xxx.xxxx.access.alimq.EvMsgRtListener.consume(EvMsgRtListener.java:169)
	at com.aliyun.openservices.ons.api.impl.rocketmq.ConsumerImpl$MessageListenerImpl.consumeMessage(ConsumerImpl.java:97)
	at com.aliyun.openservices.shade.com.alibaba.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService$ConsumeRequest.run(ConsumeMessageConcurrentlyService.java:417)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- <0x000000070841ea38> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"ConsumeMessageThread_5" prio=10 tid=0x00007f6498004000 nid=0x42 runnable [0x00007f6558d83000]
   java.lang.Thread.State: RUNNABLE
	at java.util.concurrent.ConcurrentLinkedDeque.contains(ConcurrentLinkedDeque.java:1085)
	at com.xxx.xxxx.access.alimq.EvMsgRtListener.consume(EvMsgRtListener.java:169)
	at com.aliyun.openservices.ons.api.impl.rocketmq.ConsumerImpl$MessageListenerImpl.consumeMessage(ConsumerImpl.java:97)
	at com.aliyun.openservices.shade.com.alibaba.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService$ConsumeRequest.run(ConsumeMessageConcurrentlyService.java:417)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- <0x000000070841f708> (a java.util.concurrent.ThreadPoolExecutor$Worker)

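 The code at this spot is the ConcurrentLinkedDeque.contains call at EvMsgRtListener.java:169 in the stack traces above.

 The queue here is also worked on by a scheduled task that traverses it and removes keys. These operations on a ConcurrentLinkedDeque drag performance down badly: contains() and remove(Object) each walk the linked list node by node, so their cost grows with the number of elements, and the histogram above shows about 190,000 ConcurrentLinkedDeque$Node instances. For details, see the article "JAVA集合框架中的常用集合及其特点、适用场景、实现原理简介" (an overview of the common collections in the Java Collections Framework, their characteristics, use cases and implementation principles).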
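
If the deque is used only to check whether a key has already been seen (an assumption; the original code is not shown), a hash-based concurrent set avoids the linear scan. The sketch below uses `Collections.newSetFromMap` over a `ConcurrentHashMap`, which is also available on older JDKs; `SeenKeys` and its method names are hypothetical. If insertion order also matters, this replacement does not apply as is.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class SeenKeys {
    // Hash-based concurrent set: add/contains/remove are expected constant time,
    // whereas ConcurrentLinkedDeque.contains walks the whole list.
    private final Set<String> keys =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    // Returns true only the first time a key is added.
    boolean markSeen(String key) {
        return keys.add(key);
    }

    boolean contains(String key) {
        return keys.contains(key);
    }

    // Intended for the periodic cleanup task; returns true if the key was present.
    boolean forget(String key) {
        return keys.remove(key);
    }

    int size() {
        return keys.size();
    }

    public static void main(String[] args) {
        SeenKeys seen = new SeenKeys();
        System.out.println(seen.markSeen("msg-1")); // true: first time
        System.out.println(seen.markSeen("msg-1")); // false: already present
        System.out.println(seen.contains("msg-1")); // true
        System.out.println(seen.forget("msg-1"));   // true: removed
        System.out.println(seen.size());            // 0
    }
}
```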

 

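 With the problem located, commenting out this code resolved it. To summarize: I have run into the same scenario several times before, where CPU spikes and consumption capacity drops after the service has been running for a while. Those cases also involved a remote HTTP call with SocketTimeout(5000). Changing 5000 ms to 1 s shortens the wait, so that consumer threads are not blocked for a long time waiting for a response.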

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpPost http = new HttpPost(url);

/*
 * setConnectTimeout: timeout for establishing the connection, in milliseconds.
 * setConnectionRequestTimeout: timeout for obtaining a connection from the connection manager, in milliseconds.
 * setSocketTimeout: timeout for waiting for response data, in milliseconds. If the remote
 * interface returns no data within this time, the call is abandoned.
 */
RequestConfig requestConfig = RequestConfig.custom()
        .setConnectTimeout(5000)
        .setConnectionRequestTimeout(1000)
        .setSocketTimeout(5000) // the value that was later reduced to 1000 (1 s)
        .build();
http.setConfig(requestConfig);
HttpEntity inEntity = EntityBuilder.create().setText(json).setContentType(ContentType.APPLICATION_JSON).build();
http.setEntity(inEntity);
CloseableHttpResponse response = httpclient.execute(http);
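
A rough calculation shows why the shorter timeout matters. The sketch assumes the worst case, in which every consumer thread blocks for the full socket timeout on each message; the thread count of 20 is a hypothetical figure, not taken from the setup above, and `TimeoutBudget` is an illustrative name.

```java
final class TimeoutBudget {
    // Messages per second the pool can finish when every call
    // blocks for the full timeout before giving up.
    static double worstCaseThroughput(int consumerThreads, long timeoutMillis) {
        return consumerThreads * 1000.0 / timeoutMillis;
    }

    public static void main(String[] args) {
        int threads = 20; // hypothetical consumer thread count
        System.out.println(worstCaseThroughput(threads, 5000)); // 4.0 messages per second
        System.out.println(worstCaseThroughput(threads, 1000)); // 20.0 messages per second
    }
}
```

Under these assumptions, cutting the timeout from 5 s to 1 s raises the worst-case rate fivefold, which is the difference between a backlog that keeps growing and one that can drain when the remote service is slow.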

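P.S. Speaking of caches: when designing one, be clear about the performance and the trade-offs of each component.

The simplest option is a HashMap. As noted above, the question when cleaning out invalid data is how to make sure it really gets collected, so that accumulated entries do not exhaust memory. A good alternative is WeakHashMap, which holds its keys through weak references, so an entry is dropped once its key is no longer strongly reachable. As a full-fledged cache, however, it falls somewhat short in features. See "WeakHashMap和HashMap的区别" (the differences between WeakHashMap and HashMap), and for more detail "话说ReferenceQueue" (on ReferenceQueue).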
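
A minimal sketch of the WeakHashMap behaviour described above; `WeakCacheSketch` is a hypothetical wrapper, not code from the project. Note that `System.gc()` is only a hint, so the size printed after it is not guaranteed, and that string literals used as keys stay strongly reachable and are therefore never dropped.

```java
import java.util.Map;
import java.util.WeakHashMap;

final class WeakCacheSketch {
    // WeakHashMap is not thread-safe; wrap it with Collections.synchronizedMap
    // before sharing it between consumer threads.
    private final Map<Object, String> cache = new WeakHashMap<Object, String>();

    void put(Object key, String value) {
        cache.put(key, value);
    }

    String get(Object key) {
        return cache.get(key);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) throws InterruptedException {
        WeakCacheSketch cache = new WeakCacheSketch();
        Object key = new Object(); // strongly referenced by this local variable
        cache.put(key, "payload");
        System.out.println("while key is held: " + cache.get(key));

        key = null;        // drop the only strong reference to the key
        System.gc();       // a hint only; collection is not guaranteed
        Thread.sleep(100);
        System.out.println("entries after gc hint: " + cache.size());
    }
}
```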
