记一次线上请求缓慢问题处理

9月的某天晚上,接到通知,线上环境请求非常慢,页面打开要1分钟,遂按照系统参数,数据库,应用,web服务器状态逐步排查的步骤,最终定位到问题并得以解决,此处做个检查流程梳理,供大家参考。

问题描述:正式环境请求响应异常慢,页面打开需要1分钟左右时间等。

第一步 系统参数检查 
1 df -h 检查硬盘空间 


重点看 Use这一栏,可以看到空间足够,硬盘空间ok

2 free 检查内存使用情况 


对比total和used两栏,或者直接看free这一栏,默认是单位是KB 
free [-b -k -m] [-o] [-s delay] [-t] [-V] 
-b -k -m:分别以字节(KB、MB)为单位显示内存使用情况 
可以发现内存也是ok 
3 top命令检查线程的cpu负载情况 


重点看CPU这一栏,我这个图是后来补的图,现场的图忘记保存了,当时发现有个java线程的CPU的使用率超过了100%,此处可发现有问题 
4 通过jstack 命令查看线程内存信息 
jstack pid

root@172_16_1_214 webapps]# jstack 9495
9495: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding

root@172_16_1_214 logs]# tail -n 800 catalina.out 
    at java.lang.Thread.run(Thread.java:745)

"updatePromotionStatusTask-8-exe0" daemon prio=10 tid=0x00000000017cb000 nid=0x257b runnable [0x00007f9475ca8000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.net.SocketInputStream.read(SocketInputStream.java:108)
    at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:196)
    at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)
    at redis.clients.jedis.Protocol.process(Protocol.java:151)
    at redis.clients.jedis.Protocol.read(Protocol.java:215)
    at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340)
    at redis.clients.jedis.Connection.getIntegerReply(Connection.java:265)
    at redis.clients.jedis.Jedis.hexists(Jedis.java:779)
    at cn.htd.common.dao.util.RedisDB.getHash(RedisDB.java:255)
    at cn.htd.common.util.DictionaryUtils.getDictionaryOptList(DictionaryUtils.java:250)
    at cn.htd.common.util.DictionaryUtils.getValueByCode(DictionaryUtils.java:207)
    at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask.execute(UpdatePromotionStatusTask.java:143)
    at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask.execute(UpdatePromotionStatusTask.java:32)
    at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask$$FastClassBySpringCGLIB$$b13b01ff.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.aop.framework.adapter.AfterReturningAdviceInterceptor.invoke(AfterReturningAdviceInterceptor.java:52)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at cn.htd.common.interceptor.MethodAccessLogInterceptor.invoke(MethodAccessLogInterceptor.java:48)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:655)
    at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask$$EnhancerBySpringCGLIB$$688d5abb.execute(<generated>)
    at com.taobao.pamirs.schedule.taskmanager.TBScheduleProcessorSleep.run(TBScheduleProcessorSleep.java:239)
    at java.lang.Thread.run(Thread.java:745)

"DubboServerHandler-172.16.1.214:28098-thread-25" daemon prio=10 tid=0x00007f9408032000 nid=0x257a waiting on condition [0x00007f9475daa000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000c5fefe50> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
    at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"DubboServerHandler-172.16.1.214:28098-thread-24" daemon prio=10 tid=0x00007f9408030000 nid=0x2577 waiting on condition [0x00007f9475eab000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000c5fefe50> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
    at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"cleanPromotionBargainScheduleTask-3-exe4" daemon prio=10 tid=0x00007f948002c800 nid=0x2576 in Object.wait() [0x00007f9475fac000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:503)
    at com.taobao.pamirs.schedule.taskmanager.LockObject.waitCurrentThread(LockObject.java:14)
    - locked <0x00000000c60fabf0> (a java.lang.Object)
    at com.taobao.pamirs.schedule.taskmanager.TBScheduleProcessorSleep.run(TBScheduleProcessorSleep.java:296)
    at java.lang.Thread.run(Thread.java:745)
JNI global references: 214

Heap
 PSYoungGen      total 278528K, used 253517K [0x00000000ec000000, 0x0000000100000000, 0x0000000100000000)
  eden space 253952K, 99% used [0x00000000ec000000,0x00000000fb7953a0,0x00000000fb800000)
  from space 24576K, 0% used [0x00000000fb800000,0x00000000fb800000,0x00000000fd000000)
  to   space 38400K, 0% used [0x00000000fda80000,0x00000000fda80000,0x0000000100000000)
 ParOldGen       total 655360K, used 655313K [0x00000000c4000000, 0x00000000ec000000, 0x00000000ec000000)
  object space 655360K, 99% used [0x00000000c4000000,0x00000000ebff47c8,0x00000000ec000000)
 PSPermGen       total 51200K, used 50805K [0x00000000bee00000, 0x00000000c2000000, 0x00000000c4000000)
  object space 51200K, 99% used [0x00000000bee00000,0x00000000c1f9d6e0,0x00000000c2000000)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
看最后几行,可以看到PSYoungGen ,ParOldGen,PSPermGen都已达到99%的使用率,由此可推断应用中存在线程阻塞,那么阻塞在什么地方呢,仔细查看上面的日志信息,可以看到 
t redis.clients.jedis.Jedis.hexists(Jedis.java:779) 
at cn.htd.common.dao.util.RedisDB.getHash(RedisDB.java:255) 
at cn.htd.common.util.DictionaryUtils.getDictionaryOptList(DictionaryUtils.java:250) 
这两行代码可以看到是redis获取数据时出了问题,那么是不是redis有问题呢,后证实确实是redis出了问题。 
我们首选会推测redis服务是不是挂了,赶紧通过本地连线上redis,发现可以连通,也可以正常查看key信息。 
第二步 检查redis服务 
1 先看redis服务器负载,发现CPU居高不下 
2 连上redis服务,通过keys * 查看存储情况,发现多出来7w+的促销券的key,且还在增加

由此,推断出此时由于redis在执行大量的插入操作,而redis是单线程的,我们出异常的应用刚好有个频繁的定时任务要读redis,reids服务器正忙,定时任务执行缓慢,不断堆积,从而挤爆了内存空间,导致应用响应缓慢。

解决方案: 
1 临时方案:kill促销发券进程,重启出问题的应用 
2 迁移促销发券到单独的redis
--------------------- 

转载:https://blog.csdn.net/mrzhangxl/article/details/77864497

猜你喜欢

转载自blog.csdn.net/yjh314/article/details/83539811