Background
Developers all know that online services usually hang for a handful of reasons: insufficient CPU or memory, frequent GC, OOM, and so on. This article describes a service hang that does not fall into any of those categories.
Remember the sneaky 0 in the Bilibili 713 incident?
Yes, that 0. It has nothing to do with this incident, but it was a real inspiration.
Troubleshooting
As usual, the symptom was that several nodes of the same service stopped responding in a cluster environment. If this is not resolved in time, it can trigger an avalanche effect.
First check the service log for errors, then, as a matter of habit, check the service's CPU and memory usage. If the log shows no errors but CPU or memory is abnormal, follow the steps below to troubleshoot.
Routine investigation
1. View the thread status in the service process
top -H -p pid
or
ps -mp pid -o THREAD,tid,time
2. Convert the ID of the abnormal thread to hexadecimal (tid is the decimal thread ID found in step 1)
printf "%x\n" tid
3. View the stack information of the abnormal thread (number is the hexadecimal value from step 2, which appears in the nid field of the jstack output)
jstack pid | grep number
View the top 100 entries of the heap histogram, sorted by memory usage
jmap -histo pid|head -100
Export the thread stacks to a file
jstack -l PID >> a.log
Or dump the heap and inspect it with a tool such as MAT or JProfiler
jmap -dump:live,format=b,file=/dump.bin pid
These steps are usually enough to resolve routine problems of this kind, which are typically caused by runaway recursion or loops, or by slow database queries.
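The hexadecimal conversion in the routine steps is the part that is easiest to get wrong by hand, so here is a minimal Java sketch of the same arithmetic. The class name `ThreadIdHex`, the helper `toNid`, and the thread ID 26237 are illustrative assumptions, not part of any tool.

```java
public class ThreadIdHex {
    // Formats a decimal OS thread ID the way jstack prints it in the nid field.
    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    public static void main(String[] args) {
        // 26237 is a made-up thread ID, as it might appear in the output of top -H.
        System.out.println(toNid(26237L)); // prints nid=0x667d
    }
}
```

The resulting string is what you would search for in the jstack output.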
Using MAT
MAT reports two kinds of size:
- Shallow size: the memory occupied by the object itself, excluding the objects it refers to.
- Retained size: the shallow size of the object plus the sizes of all objects that are kept alive only through it, directly or indirectly. Put simply, it is the total memory that would be freed once the object is garbage collected.
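The difference between the two sizes is easier to see on a toy object graph. The sketch below is not how MAT computes anything internally; it is a simplified model in which retained size is "what becomes unreachable from the GC roots once the object is removed". The class `RetainedSizeSketch`, the object names, and the byte counts are all made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RetainedSizeSketch {
    // Toy heap: shallow size in bytes and outgoing references per object name.
    private final Map<String, Long> shallow = new HashMap<>();
    private final Map<String, List<String>> refs = new HashMap<>();

    void add(String name, long shallowSize, String... targets) {
        shallow.put(name, shallowSize);
        refs.put(name, Arrays.asList(targets));
    }

    // Objects reachable from the GC roots when 'removed' (may be null) is skipped.
    Set<String> reachable(Collection<String> roots, String removed) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            String cur = todo.pop();
            if (cur.equals(removed) || !seen.add(cur)) {
                continue;
            }
            todo.addAll(refs.getOrDefault(cur, Collections.<String>emptyList()));
        }
        return seen;
    }

    // Retained size: total shallow size of everything that is only alive through 'target'.
    long retainedSize(Collection<String> roots, String target) {
        Set<String> freed = reachable(roots, null);
        freed.removeAll(reachable(roots, target));
        long total = 0;
        for (String name : freed) {
            total += shallow.get(name);
        }
        return total;
    }

    static RetainedSizeSketch example() {
        RetainedSizeSketch heap = new RetainedSizeSketch();
        heap.add("root", 0, "cache", "shared");
        heap.add("cache", 16, "entryA", "entryB");
        heap.add("entryA", 24);
        heap.add("entryB", 24, "shared"); // 'shared' is also referenced by the root
        heap.add("shared", 100);
        return heap;
    }

    public static void main(String[] args) {
        List<String> roots = Collections.singletonList("root");
        // Shallow size of cache is 16; retained size is 16 + 24 + 24 = 64.
        System.out.println(example().retainedSize(roots, "cache")); // prints 64
    }
}
```

Note that `shared` does not count toward the retained size of `cache`, because the root still references it after `cache` is gone.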
Histogram view
The Histogram view lists the number of objects of each class, keyed by class name. Sizes are shown in bytes by default.
To change the display unit, click Window->Preferences, select the last item, and click Apply and Close.
Then reopen the Histogram view for the setting to take effect.
Leak Suspects
The report shows a pie chart in which the dark slices mark the suspected memory leaks.
From this report you can quickly narrow a leak down to the class, method, and line of code where it occurs.
Troubleshooting this incident
1. Information collection and analysis
Because the service health check did not respond while CPU and memory were normal, go straight to the stack information to see what the threads are doing.
jstack -l PID >> a.log
In jstack output, Java threads are mainly in one of the following states:
- RUNNABLE: the thread is running or waiting for I/O
- BLOCKED: the thread is waiting for a monitor lock (synchronized keyword)
- TIMED_WAITING: the thread is waiting to be woken up, with a time limit
- WAITING: the thread is waiting indefinitely to be woken up
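With hundreds of threads in a dump, counting the states by eye is tedious. A minimal sketch, assuming the dump text contains the usual `java.lang.Thread.State: ...` lines, could tally them like this. The class `ThreadStateCounter` and the three-thread sample dump are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreadStateCounter {
    // Captures the state name that follows "java.lang.Thread.State: ".
    private static final Pattern STATE =
            Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    // Counts how many threads in the dump text are in each state.
    static Map<String, Integer> count(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = STATE.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    // A made-up, heavily shortened dump with three threads.
    static String sample() {
        return String.join("\n",
                "\"exec-1\" #11 daemon prio=5",
                "   java.lang.Thread.State: WAITING (parking)",
                "\"exec-2\" #12 daemon prio=5",
                "   java.lang.Thread.State: WAITING (parking)",
                "\"acceptor\" #13 daemon prio=5",
                "   java.lang.Thread.State: RUNNABLE");
    }

    public static void main(String[] args) {
        System.out.println(count(sample())); // prints {RUNNABLE=1, WAITING=2}
    }
}
```

For a real dump, the string passed to `count` would be the contents of the a.log file exported above.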
It turned out that the threads were all in the WAITING state.
"http-nio-8888-exec-6666" #8833 daemon prio=5 os_prio=0 tid=0x00001f2f0016e100 nid=0x667d waiting on condition [0x00002f1de3c5200]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007156a29c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:1897)
at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1458)
at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1253)
at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4619)
at com.alibaba.druid.filter.stat.StatFilter.dataSource_getConnection(StatFilter.java:680)
at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4615)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1231)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1223)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:90)
at com.baomidou.dynamic.datasource.ds.ItemDataSource.getConnection(ItemDataSource.java:56)
at com.baomidou.dynamic.datasource.ds.AbstractRoutingDataSource.getConnection(AbstractRoutingDataSource.java:48)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:82)
at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:68)
at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:336)
at org.apache.ibatis.executor.SimpleExecutor.prepareStatement(SimpleExecutor.java:84)
at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:62)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:324)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
at com.github.pagehelper.PageInterceptor.intercept(PageInterceptor.java:143)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
at com.sun.proxy.$Proxy571.query(Unknown Source)
2. Locate key information and track source code
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:1897)
DruidConnectionHolder takeLast() throws InterruptedException, SQLException {
    try {
        while (poolingCount == 0) {
            emptySignal(); // send signal to CreateThread create connection
            if (failFast && isFailContinuous()) {
                throw new DataSourceNotAvailableException(createError);
            }
            notEmptyWaitThreadCount++;
            if (notEmptyWaitThreadCount > notEmptyWaitThreadPeak) {
                notEmptyWaitThreadPeak = notEmptyWaitThreadCount;
            }
            try {
                // The borrowed connections have not been released, so the pool has
                // no connection available and the request blocks here.
                notEmpty.await(); // signal by recycle or creator
            } finally {
                notEmptyWaitThreadCount--;
            }
            notEmptyWaitCount++;
            if (!enable) {
                connectErrorCountUpdater.incrementAndGet(this);
                throw new DataSourceDisableException();
            }
        }
    } catch (InterruptedException ie) {
        notEmpty.signal(); // propagate to non-interrupted thread
        notEmptySignalCount++;
        throw ie;
    }
    decrementPoolingCount();
    DruidConnectionHolder last = connections[poolingCount];
    connections[poolingCount] = null;
    return last;
}
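To make the blocking behaviour concrete, here is a stripped-down pool built on the same lock-and-condition pattern. It is only a sketch and not Druid code: `TinyPool`, `take`, `poll`, and `recycle` are names invented for this example. It contrasts an untimed wait, which can block forever if nothing is ever recycled, with a timed wait that gives up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class TinyPool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Deque<String> idle = new ArrayDeque<>();

    TinyPool(int size) {
        for (int i = 0; i < size; i++) {
            idle.push("conn-" + i);
        }
    }

    // Untimed borrow: waits until a connection is recycled, however long that takes.
    String take() throws InterruptedException {
        lock.lock();
        try {
            while (idle.isEmpty()) {
                notEmpty.await();
            }
            return idle.pop();
        } finally {
            lock.unlock();
        }
    }

    // Timed borrow: returns null if no connection shows up within the timeout.
    String poll(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (idle.isEmpty()) {
                if (nanos <= 0) {
                    return null;
                }
                nanos = notEmpty.awaitNanos(nanos);
            }
            return idle.pop();
        } finally {
            lock.unlock();
        }
    }

    void recycle(String conn) {
        lock.lock();
        try {
            idle.push(conn);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    static String demo() {
        TinyPool pool = new TinyPool(1);
        try {
            String borrowed = pool.take();                         // the only connection is now in use
            String first = pool.poll(100, TimeUnit.MILLISECONDS);  // pool is empty, so this times out
            pool.recycle(borrowed);
            String second = pool.poll(100, TimeUnit.MILLISECONDS); // a connection is available again
            return first + "," + second;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints null,conn-0
    }
}
```

Had the first `poll` been a `take`, the calling thread would have parked indefinitely, which is the state every request thread was in here.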
Combining this with the errors in the log pinpoints the problem code. An exception prevented the borrowed connection from being released, so the other threads stayed stuck in await().
The problem code is as follows:
try {
    // Problem: sqlSession is never closed, so its connection is never returned to the pool
    SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH);
    TestMapper mapper = sqlSession.getMapper(TestMapper.class);
    mapper.insetList(list);
    sqlSession.flushStatements();
} catch (Exception e) {
    e.printStackTrace();
}
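The leak pattern in the snippet above can be reproduced without a database. The sketch below uses a hypothetical `FakeSession` as a stand-in for a MyBatis `SqlSession` (which is `Closeable`, so it can be used in try-with-resources the same way) and counts how many sessions remain open after a failing insert.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SessionLeakSketch {
    // Hypothetical stand-in for a session that holds one pooled connection while open.
    static class FakeSession implements AutoCloseable {
        private final AtomicInteger open;

        FakeSession(AtomicInteger open) {
            this.open = open;
            open.incrementAndGet();
        }

        void insertList() {
            throw new IllegalStateException("simulated SQL error");
        }

        @Override
        public void close() {
            open.decrementAndGet();
        }
    }

    // Same shape as the problem code: the exception is swallowed and close() is never called.
    static int leaky() {
        AtomicInteger open = new AtomicInteger();
        try {
            FakeSession session = new FakeSession(open);
            session.insertList();
        } catch (Exception e) {
            // swallowed, session still open
        }
        return open.get();
    }

    // try-with-resources calls close() before the catch block runs.
    static int safe() {
        AtomicInteger open = new AtomicInteger();
        try (FakeSession session = new FakeSession(open)) {
            session.insertList();
        } catch (Exception e) {
            // session already closed here
        }
        return open.get();
    }

    public static void main(String[] args) {
        System.out.println(leaky()); // prints 1: one session left open
        System.out.println(safe());  // prints 0: nothing left open
    }
}
```

Every call to the leaky variant leaves one more connection checked out, so a pool with a fixed upper limit eventually runs dry.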
Reproducing the problem
Reproduce the problem in the multi-active environment using the information above. The health check does not respond because all worker threads are occupied and waiting.
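A scaled-down simulation shows why the health check goes silent. This is only a sketch under stated assumptions: a fixed thread pool stands in for the Tomcat worker threads, a `Semaphore` stands in for the connection pool, and every request borrows a connection without returning it. The class `ExhaustionSketch` and all the numbers are made up.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ExhaustionSketch {
    // Returns true if a health check submitted after the requests answers within 300 ms.
    static boolean healthCheckResponds(int workers, int connections, int requests) {
        ExecutorService workerPool = Executors.newFixedThreadPool(workers);
        Semaphore connectionPool = new Semaphore(connections);
        try {
            for (int i = 0; i < requests; i++) {
                workerPool.submit(() -> {
                    try {
                        connectionPool.acquire(); // borrow a connection and never release it
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            Future<String> health = workerPool.submit(() -> "UP");
            try {
                health.get(300, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException | ExecutionException e) {
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        } finally {
            workerPool.shutdownNow(); // interrupt the stuck workers so the JVM can exit
        }
    }

    public static void main(String[] args) {
        // 4 workers, 2 connections, 6 leaking requests: all workers end up waiting.
        System.out.println(healthCheckResponds(4, 2, 6)); // prints false
        // Enough connections for every request: workers stay free.
        System.out.println(healthCheckResponds(4, 6, 6)); // prints true
    }
}
```

In the first case two requests drain the pool, the other four park on `acquire()`, and the health check sits in the queue with no thread left to run it.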
The Tomcat threads are all occupied:
[Picture: Tomcat threads fully occupied]
Tomcat default parameters:
The maximum number of worker threads, 200 by default.
server.tomcat.max-threads=200
The maximum number of connections, 10000 by default.
server.tomcat.max-connections=10000
The length of the waiting queue, 100 by default.
server.tomcat.accept-count=100
The minimum number of idle worker threads, 10 by default.
server.tomcat.min-spare-threads=10
The default parameters of the Druid connection pool are as follows:
[Picture: Druid connection pool default parameters]
The configuration parameters of the Druid connection pool are as follows:
Solution
1. Configure timeout parameters for the Druid connection pool
spring:
  redis:
    host: localhost
    port: 6379
    password:
  datasource:
    druid:
      stat-view-servlet:
        enabled: true
        loginUsername: admin
        loginPassword: 123456
    dynamic:
      druid:
        initial-size: 5
        min-idle: 5
        maxActive: 20
        # maximum time to wait for a connection, in milliseconds
        maxWait: 60000
        timeBetweenEvictionRunsMillis: 60000
        minEvictableIdleTimeMillis: 300000
        validationQuery: SELECT 1 FROM DUAL
        testWhileIdle: true
        testOnBorrow: false
        testOnReturn: false
        poolPreparedStatements: true
        maxPoolPreparedStatementPerConnectionSize: 20
        filters: stat,slf4j,wall
        connectionProperties: druid.stat.mergeSql=true;druid.stat.slowSqlMillis=5000
2. Close the connection promptly, including when an exception occurs
sqlSession.close();
Source: blog.csdn.net/zhangcongyi420/article/details/131139599