Case analysis: a system crash caused by improper use of a thread pool

A few days ago, a web service on one of our Alibaba Cloud servers became unavailable. Remote SSH login failed at first. After several attempts I managed to log in, but every command I ran printed:

-bash: fork: Cannot allocate memory

At first glance this looked like a memory leak that had exhausted memory. Since no command could be executed, the only way to restore the service was to restart the server from the cloud console.

 

Preliminary investigation

After the service recovered, I checked the system log. On Linux the system log is at /var/log/messages, and it can also be viewed with the journalctl command, for example:

journalctl --since="2019-06-12 06:00:00" --until="2019-06-12 10:00:00"

This shows the log entries between the --since and --until times. Apart from the error crond[14954]: (CRON) CAN'T FORK (do_command): Cannot allocate memory, there was nothing abnormal (the sshd[10764]: error: fork: Cannot allocate memory entries further down should be the logs of the failed SSH login attempts).

[Screenshot: linux-log]

In the Alibaba Cloud console, under Cloud Monitor, host monitoring, the memory usage metric stayed below 40% throughout this period, which basically rules out memory exhaustion.

A web search showed that exceeding the operating system's limit on the number of processes can also cause the bash: fork: Cannot allocate memory error (reference: https://blog.csdn.net/wangshuminjava/article/details/80603847).

Running ps -eLf|wc -l shows the current number of threads (ps -ef prints only processes, while ps -eLf prints every thread). It reported only a little over 1,000. There was no way to know how many threads the system had been running when the fault occurred, so all I could do was keep monitoring.
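For the follow-up monitoring, the JVM can also report its own thread count from inside the process. The following is a minimal sketch, not what was used in this incident; the class name ThreadCountProbe is hypothetical. Note that it only sees threads of the JVM it runs in, whereas ps -eLf counts every thread on the host.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountProbe {

    /** Number of live threads (daemon and non-daemon) in this JVM. */
    public static int liveThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.getThreadCount();
    }

    /** Peak live thread count since the JVM started or the peak was last reset. */
    public static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) {
        // Print once; a real monitor would log this periodically.
        System.out.println("live=" + liveThreads() + " peak=" + peakThreads());
    }
}
```

Logging these two values at a fixed interval would have made the steady growth in thread count visible before the host ran out of resources.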

Identifying the problem

A few days later, ps -eLf|wc -l showed that the number of threads had grown to more than 16,000. Running ps -eLf directly showed that a large number of them belonged to the Tomcat process, so I suspected a deadlock or something similar that left many threads hanging without ever finishing.

Running jstack <pid> > ~/jstack.txt dumps the threads of the process to a file for analysis. The dump contained a large number of threads in the WAITING state, like this:

"pool-19-thread-1" #254 prio=5 os_prio=0 tid=0x00007f0b700a6000 nid=0x29a9 waiting on condition [0x00007f0b274df000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006ce3d8790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

 

The dump shows the thread waiting on a condition inside the LinkedBlockingQueue.take method. According to the method's Javadoc, when the queue is empty the caller waits until an element becomes available.

/**
 * Retrieves and removes the head of this queue, waiting if necessary
 * until an element becomes available.
 *
 * @return the head of this queue
 * @throws InterruptedException if interrupted while waiting
 */
E take() throws InterruptedException;
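The behavior in the stack trace can be reproduced in isolation. The sketch below (the class name IdleWorkerDemo is hypothetical) builds a pool with the same constructor arguments as in this case, runs one trivial task, and then observes the state of the now idle worker thread, which is parked inside take().

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class IdleWorkerDemo {

    /** Returns the state of a core worker thread once it has no more tasks to run. */
    public static Thread.State idleWorkerState() throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        AtomicReference<Thread> worker = new AtomicReference<>();
        try {
            // Run one task so the pool creates its core thread, and remember that thread.
            pool.submit(() -> worker.set(Thread.currentThread())).get();

            // Poll (up to 5 seconds) until the worker has parked in the queue's take().
            Thread t = worker.get();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (t.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return t.getState();
        } finally {
            // Without this call the worker would stay parked for the life of the JVM.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(idleWorkerState());
    }
}
```

Core threads do not time out by default, so a keep-alive of 0 does not make the idle worker exit.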

  

I asked my colleagues where LinkedBlockingQueue was being used. One of them recalled recently adding a feature that uploads files to Alibaba Cloud OSS in append mode through a thread pool. Reviewing that code revealed the problem: the thread pool was never shut down. To keep the fragments of a file from getting mixed up when saved, a new thread pool object was created every time a file was saved:

ThreadPoolExecutor saveImgThreadPool = new ThreadPoolExecutor(1, 1, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

But after the work was done, the thread pool object was never shut down. The pool's worker thread fetches its next task from the waiting queue with take, and when the queue is empty it waits there indefinitely. As a result, a large number of threads were left hanging at this point (essentially, each call to the method left one more thread hanging).
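One possible fix, keeping the pool-per-call design, is to shut the pool down once its tasks have been submitted. This is a minimal sketch under that assumption, not necessarily the change that was made in the real service; the class name PerCallPool, the method runInOrder, and the 30-second timeout are hypothetical.

```java
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PerCallPool {

    /**
     * Runs the tasks one after another on a dedicated single-thread pool,
     * then releases the pool's thread. Returns true if every task finished
     * within the timeout.
     */
    public static boolean runInOrder(List<Runnable> tasks) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        try {
            for (Runnable task : tasks) {
                pool.execute(task); // a single worker keeps submission order
            }
        } finally {
            // shutdown() lets queued tasks finish; the worker thread then exits
            // instead of parking forever in LinkedBlockingQueue.take().
            pool.shutdown();
        }
        boolean finished = pool.awaitTermination(30, TimeUnit.SECONDS);
        if (!finished) {
            pool.shutdownNow(); // give up on remaining tasks after the timeout
        }
        return finished;
    }
}
```

Another option would be a single long-lived pool shared by all calls, provided the ordering requirement for each file's fragments can still be met.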

Further notes

  1. Thread status is "waiting for monitor entry":
    The thread is waiting to enter a critical section, so it is queued in the monitor's "Entry Set". The thread state is generally BLOCKED:
    java.lang.Thread.State: BLOCKED (on object monitor)

  2. Thread status is "waiting on condition":
    The thread is waiting for some other condition to occur that will wake it up, or it has simply called sleep(N). The thread state is roughly one of the following:
    java.lang.Thread.State: WAITING (parking): the thread waits indefinitely for the condition to occur (this is the case described in this article);
    java.lang.Thread.State: TIMED_WAITING (parking or sleeping): a timed wait; if the condition does not occur, the thread wakes itself when the timeout expires.

  3. If a large number of threads are in "waiting for monitor entry": a global lock may be blocking many threads. If thread dumps taken a short time apart show more and more threads in waiting for monitor entry, with no sign of the number decreasing, some threads may be staying in the critical section too long, so that newly arriving threads cannot enter it.

  4. If a large number of threads are in "waiting on condition": they may be fetching a third-party resource, especially a third-party network resource, and not getting a response for a long time, which leaves many threads waiting. So if you find a large number of threads in waiting on condition and their stacks show them waiting on network reads or writes, this could be a sign of a network bottleneck, where congestion keeps the threads from proceeding. As this article shows, it can also be caused by improper programming.
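To see at a glance which of these states dominates a dump, the state lines in the jstack output can be tallied. The sketch below assumes the dump uses the "java.lang.Thread.State: ..." line format shown earlier; the class name JstackStateCounter is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class JstackStateCounter {

    private static final String MARKER = "java.lang.Thread.State: ";

    /** Counts threads per state string, e.g. "WAITING (parking)", in a jstack dump. */
    public static Map<String, Integer> countStates(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dump.split("\\R")) {
            int idx = line.indexOf(MARKER);
            if (idx >= 0) {
                String state = line.substring(idx + MARKER.length()).trim();
                counts.merge(state, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java JstackStateCounter <path to the dump file>
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        countStates(dump).forEach((state, n) -> System.out.println(n + "\t" + state));
    }
}
```

Comparing the counts from two dumps taken a short time apart shows whether a given state is growing, as described in items 3 and 4.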





My personal blog: http://blog.jboost.cn
My Toutiao page: https://www.toutiao.com/c/user/5833678517/#mid=1636101215791112
My GitHub: https://github.com/ronwxy
My WeChat official account: jboost-ksxy


Origin www.cnblogs.com/spec-dog/p/11032779.html