A Spring Boot service hangs in production while CPU and memory look normal: what is going on?


Background

Every developer knows that when an online service hangs, the cause is usually insufficient CPU or memory, frequent GC, an OOM, or something similar. This article covers a hang that is different from all of those.

Remember the sneaky 0 in the Bilibili 713 incident?

Yes, that 0. It has nothing to do with this incident, but it was a real inspiration.

Troubleshooting

The symptom was the familiar one: several nodes of the same service stopped responding in a cluster environment. If this is not resolved in time, it can cascade into an avalanche effect.

First check the service log for errors, then, as a matter of habit, check the CPU and memory usage of the service. If the log shows no errors but CPU or memory is abnormal, follow the steps below to troubleshoot.

Routine investigation

1. View the thread status inside the service process

top -H -p pid

ps -mp pid -o THREAD,tid,time

2. Convert the decimal ID of the suspicious thread (tid) to hexadecimal

printf "%x\n" tid

3. View the stack of the suspicious thread; the hexadecimal value appears as nid=0x... in the jstack output

jstack pid | grep -A 30 hex_tid

View the classes that occupy the most memory (first 100 lines of the histogram)

jmap -histo pid | head -100

Export the thread dump to a file

jstack -l PID >> a.log

Or dump the heap and inspect it with a tool such as MAT or JProfiler

jmap -dump:live,format=b,file=/dump.bin pid

The steps above are enough to resolve routine problems of this kind, which are usually caused by runaway recursion or loops, or by slow database queries.
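When shell access to the host is not available, a similar view can be obtained from inside the JVM. The following is a minimal sketch using the standard java.lang.management API; the class name BusyThreads and the idea of calling it from some admin hook are assumptions for illustration, not part of the original investigation. Note that the IDs used here are Java thread IDs, not the native nid values printed by jstack.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BusyThreads {

    /** Returns up to {@code limit} live threads, ordered by consumed CPU time, highest first. */
    static List<ThreadInfo> topByCpu(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<long[]> samples = new ArrayList<>(); // each entry is {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            // -1 means "unknown": measurement unsupported, disabled, or the thread already died
            long cpu = mx.isThreadCpuTimeSupported() ? mx.getThreadCpuTime(id) : -1L;
            samples.add(new long[] {id, cpu});
        }
        samples.sort(Comparator.comparingLong((long[] s) -> s[1]).reversed());

        List<ThreadInfo> result = new ArrayList<>();
        for (long[] s : samples) {
            if (result.size() == limit) break;
            ThreadInfo info = mx.getThreadInfo(s[0], 20); // keep up to 20 stack frames
            if (info != null) result.add(info);           // null if the thread has died meanwhile
        }
        return result;
    }

    public static void main(String[] args) {
        for (ThreadInfo info : topByCpu(5)) {
            System.out.println(info.getThreadName() + " [" + info.getThreadState() + "]");
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

This roughly combines what top -H and jstack show, but only for the JVM it runs in, so it complements the command-line steps rather than replacing them.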

Using MAT

MAT reports two kinds of size:

  • Shallow size: the memory occupied by the object itself, excluding the objects it references.
  • Retained size: the shallow size of the object plus the size of every object that is kept alive only through it, directly or indirectly. Put simply, it is the total memory that would be freed if the object were garbage collected.
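The difference between the two sizes is easiest to see on a tiny object graph. The sketch below is a toy model, not how MAT is implemented: the class RetainedSizeDemo, the object names, and the byte counts are all made up for illustration. It computes the retained size of an object as the sum of the shallow sizes of everything that becomes unreachable from the GC roots once that object is taken away.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RetainedSizeDemo {
    // Toy heap: object name -> shallow size in bytes, and object name -> outgoing references.
    final Map<String, Long> shallow = new HashMap<>();
    final Map<String, List<String>> refs = new HashMap<>();

    void add(String name, long shallowBytes, String... references) {
        shallow.put(name, shallowBytes);
        refs.put(name, List.of(references));
    }

    /** Objects reachable from the GC roots, pretending that {@code removed} does not exist. */
    Set<String> reachable(Set<String> roots, String removed) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (current.equals(removed) || !seen.add(current)) continue;
            todo.addAll(refs.getOrDefault(current, List.of()));
        }
        return seen;
    }

    /** Retained size: shallow sizes of everything that becomes unreachable once {@code target} is gone. */
    long retainedSize(Set<String> roots, String target) {
        Set<String> freed = reachable(roots, null);
        freed.removeAll(reachable(roots, target));
        return freed.stream().mapToLong(shallow::get).sum();
    }

    public static void main(String[] args) {
        RetainedSizeDemo heap = new RetainedSizeDemo();
        heap.add("root", 16, "cache", "shared");
        heap.add("cache", 32, "entry", "shared");
        heap.add("entry", 64);
        heap.add("shared", 128);
        // "cache" has a shallow size of 32; collecting it would also free "entry",
        // but not "shared", which is still referenced by "root".
        System.out.println(heap.retainedSize(Set.of("root"), "cache")); // prints 96
    }
}
```

The object "shared" is deliberately referenced from two places, which is why it does not count toward the retained size of "cache".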

Histogram view

This view lists the number of objects of each class, keyed by class name. Sizes are shown in bytes by default.

To change the display unit, click Window -> Preferences, select the last item, and click Apply and Close.

Then reopen the Histogram view for the change to take effect.

Leak Suspects

The report presents a pie chart in which the dark slices mark the objects suspected of leaking memory.

Starting from these suspects, you can quickly narrow down the class and method in which the leak occurs.

This incident

1. Information collection and analysis

The service's health check did not respond, yet CPU and memory were normal, so the next step was to go straight to the thread dump and see what the threads were doing.

jstack -l PID >> a.log

In jstack output, Java threads are mainly in one of the following states:

  • RUNNABLE: the thread is running or waiting on I/O
  • BLOCKED: the thread is waiting for a monitor lock (synchronized keyword)
  • TIMED_WAITING: the thread is waiting to be woken up, with a time limit
  • WAITING: the thread is waiting indefinitely to be woken up

All of the request threads turned out to be in the WAITING state.

"http-nio-8888-exec-6666" #8833 daemon prio=5 os_prio=0 tid=0x00001f2f0016e100 nid=0x667d waiting on condition [0x00002f1de3c5200]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000007156a29c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:1897)
    at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1458)
    at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1253)
    at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4619)
    at com.alibaba.druid.filter.stat.StatFilter.dataSource_getConnection(StatFilter.java:680)
    at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4615)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1231)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1223)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:90)
    at com.baomidou.dynamic.datasource.ds.ItemDataSource.getConnection(ItemDataSource.java:56)
    at com.baomidou.dynamic.datasource.ds.AbstractRoutingDataSource.getConnection(AbstractRoutingDataSource.java:48)
    at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
    at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
    at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:82)
    at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:68)
    at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:336)
    at org.apache.ibatis.executor.SimpleExecutor.prepareStatement(SimpleExecutor.java:84)
    at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:62)
    at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:324)
    at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
    at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
    at com.github.pagehelper.PageInterceptor.intercept(PageInterceptor.java:143)
    at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
    at com.sun.proxy.$Proxy571.query(Unknown Source)
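With hundreds of threads in a dump, reading stacks one by one is slow. A small summary helps confirm that "everything is WAITING in the same place". The sketch below assumes the dump was written to a.log as in the command above; the class name ThreadDumpSummary and its two helper methods are hypothetical, and the parsing relies only on the "java.lang.Thread.State:" and "at ..." line shapes visible in the excerpt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreadDumpSummary {
    private static final Pattern STATE = Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    /** Counts how many threads in a jstack dump are in each state. */
    static Map<String, Integer> countStates(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = STATE.matcher(line);
            if (m.find()) counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    /** Counts stack frames that mention {@code frame}, e.g. "DruidDataSource.takeLast". */
    static long countFrames(List<String> dumpLines, String frame) {
        return dumpLines.stream()
                .filter(l -> l.trim().startsWith("at ") && l.contains(frame))
                .count();
    }

    public static void main(String[] args) throws IOException {
        // Path of the dump file; defaults to the a.log used earlier in this article.
        List<String> lines = Files.readAllLines(Paths.get(args.length > 0 ? args[0] : "a.log"));
        System.out.println(countStates(lines));
        System.out.println("parked in takeLast: " + countFrames(lines, "DruidDataSource.takeLast"));
    }
}
```

If the number of threads parked in takeLast is close to the size of the Tomcat worker pool, the connection pool is the bottleneck rather than CPU or memory.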

2. Locate key information and track source code

    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:1897)

DruidConnectionHolder takeLast() throws InterruptedException, SQLException {
    try {
        while (poolingCount == 0) {
            emptySignal(); // send signal to CreateThread create connection

            if (failFast && isFailContinuous()) {
                throw new DataSourceNotAvailableException(createError);
            }

            notEmptyWaitThreadCount++;
            if (notEmptyWaitThreadCount > notEmptyWaitThreadPeak) {
                notEmptyWaitThreadPeak = notEmptyWaitThreadCount;
            }
            try {
                // The database connections have not been released and are all occupied,
                // so the pool has no connection available and the request blocks here.
                notEmpty.await(); // signal by recycle or creator
            } finally {
                notEmptyWaitThreadCount--;
            }
            notEmptyWaitCount++;

            if (!enable) {
                connectErrorCountUpdater.incrementAndGet(this);
                throw new DataSourceDisableException();
            }
        }
    } catch (InterruptedException ie) {
        notEmpty.signal(); // propagate to non-interrupted thread
        notEmptySignalCount++;
        throw ie;
    }

    decrementPoolingCount();
    DruidConnectionHolder last = connections[poolingCount];
    connections[poolingCount] = null;

    return last;
}

Combined with the errors in the log, this pinpoints the problem code. An exception was thrown, the connection in use was never returned to the pool, and every later caller stayed parked in await.
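The mechanism can be reproduced without a database. The sketch below is a toy model under stated assumptions: ToyPool is a hypothetical stand-in built on a Semaphore, not Druid's implementation, and the 100 ms wait is arbitrary. It shows that a request which fails and swallows the exception without returning its connection permanently shrinks the pool, while the same request with a finally block does not.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class LeakyPoolDemo {
    /** Toy pool: each permit stands for one pooled connection. */
    static final class ToyPool {
        private final Semaphore permits;

        ToyPool(int maxActive) { permits = new Semaphore(maxActive); }

        /** Waits at most maxWaitMillis for a connection; returns false on timeout. */
        boolean borrow(long maxWaitMillis) {
            try {
                return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        void giveBack() { permits.release(); }

        int idle() { return permits.availablePermits(); }
    }

    /** Simulates a request whose work fails; the connection is returned only if closeInFinally is set. */
    static void failingRequest(ToyPool pool, boolean closeInFinally) {
        if (!pool.borrow(100)) return; // pool already exhausted, give up after the wait
        try {
            throw new IllegalStateException("simulated SQL error");
        } catch (IllegalStateException e) {
            // swallowed, like e.printStackTrace() in the problem code
        } finally {
            if (closeInFinally) pool.giveBack();
        }
    }

    /** Runs the given number of failing requests and reports how many connections remain idle. */
    static int idleAfter(int maxActive, int requests, boolean closeInFinally) {
        ToyPool pool = new ToyPool(maxActive);
        for (int i = 0; i < requests; i++) failingRequest(pool, closeInFinally);
        return pool.idle();
    }

    public static void main(String[] args) {
        System.out.println("leaking: idle = " + idleAfter(3, 3, false)); // prints leaking: idle = 0
        System.out.println("closing: idle = " + idleAfter(3, 3, true));  // prints closing: idle = 3
    }
}
```

The toy pool always waits with a timeout. The stack above shows the untimed takeLast path instead, which, as far as I can tell from the Druid source, is taken when maxWait is not set to a positive value; in that case a fourth request would not time out but wait forever.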

The problem code is as follows:

try {
    SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH);
    TestMapper mapper = sqlSession.getMapper(TestMapper.class);
    mapper.insetList(list);
    sqlSession.flushStatements();
} catch (Exception e) {
    e.printStackTrace();
}

Reproducing the problem

The problem was reproduced in the multi-active environment using the information above. The health check does not respond because every worker thread is occupied and waiting.
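Why a trivial health check stops answering can be shown with a plain thread pool. In the sketch below, a fixed-size ExecutorService stands in for Tomcat's worker threads and a CountDownLatch stands in for the connection that never comes back; both are simplifications chosen for illustration, and the class name StarvedHealthCheck is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StarvedHealthCheck {

    /** Returns true if a health check submitted to a fully blocked worker pool answers in time. */
    static boolean healthCheckAnswers(int workers, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers); // stands in for Tomcat's workers
        CountDownLatch connectionReturned = new CountDownLatch(1);    // stays at 1 while requests hang
        try {
            for (int i = 0; i < workers; i++) {
                pool.submit(() -> {
                    connectionReturned.await(); // like notEmpty.await(): waiting for a connection
                    return null;
                });
            }
            Future<String> health = pool.submit(() -> "UP"); // queued behind the blocked requests
            try {
                health.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // no free worker picked the health check up in time
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            connectionReturned.countDown(); // unblock the workers so the demo can exit
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("health check answered: " + healthCheckAnswers(4, 200));
        // prints health check answered: false
    }
}
```

The health check itself does no database work, yet it times out, because it shares the worker pool with the requests that are stuck waiting for a connection.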

The Tomcat worker threads are all occupied.

Tomcat default parameters:

Maximum number of worker threads, default 200.

server.tomcat.max-threads=200

Maximum number of connections, default 10000.

server.tomcat.max-connections=10000

Length of the waiting queue, default 100.

server.tomcat.accept-count=100

Minimum number of idle worker threads, default 10.

server.tomcat.min-spare-threads=10

The default parameters of the Druid connection pool:

[Figure: Druid connection pool default parameters]

The configuration parameters of the Druid connection pool in this service:

[Figure: Druid connection pool configuration]

Solution

1. Configure timeout parameters for the Druid connection pool

spring:
  redis:
    host: localhost
    port: 6379
    password:
  datasource:
    druid:
      stat-view-servlet:
        enabled: true
        loginUsername: admin
        loginPassword: 123456
    dynamic:
      druid:
        initial-size: 5
        min-idle: 5
        maxActive: 20
        maxWait: 60000
        timeBetweenEvictionRunsMillis: 60000
        minEvictableIdleTimeMillis: 300000
        validationQuery: SELECT 1 FROM DUAL
        testWhileIdle: true
        testOnBorrow: false
        testOnReturn: false
        poolPreparedStatements: true
        maxPoolPreparedStatementPerConnectionSize: 20
        filters: stat,slf4j,wall
        connectionProperties: druid.stat.mergeSql=true;druid.stat.slowSqlMillis=5000

2. Close the session promptly, even when an exception occurs

sqlSession.close();
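The safest place for that close() call is a finally block or a try-with-resources statement, so that it runs whether or not the batch insert throws. The sketch below demonstrates the pattern with FakeSession, a hypothetical stand-in that only models open/close bookkeeping, so that the example runs without MyBatis on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CloseOnFailureDemo {
    static final AtomicInteger openSessions = new AtomicInteger();

    /** Hypothetical stand-in for a MyBatis SqlSession; only open/close bookkeeping is modelled. */
    static final class FakeSession implements AutoCloseable {
        FakeSession() { openSessions.incrementAndGet(); }

        void flushStatements() { throw new IllegalStateException("simulated batch failure"); }

        @Override
        public void close() { openSessions.decrementAndGet(); }
    }

    /** Runs a failing batch insert; the session is closed even though the work throws. */
    static boolean insertBatch() {
        try (FakeSession session = new FakeSession()) {
            session.flushStatements();
            return true;
        } catch (IllegalStateException e) {
            // Log and handle here; close() has already run by the time this block executes.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("succeeded: " + insertBatch());               // prints succeeded: false
        System.out.println("sessions still open: " + openSessions.get()); // prints sessions still open: 0
    }
}
```

With the real API the same shape should apply, since SqlSession is closeable: open the session in the try header, as in try (SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH)) { ... }, and keep the mapper calls inside the block.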

Source: blog.csdn.net/zhangcongyi420/article/details/131139599

Origin blog.csdn.net/zhaomengsen/article/details/131364136