Oh no, code that has been running in production at a bank for a year just had an incident


Introduction

While chatting in a group chat over the weekend, I saw that a friend had run into a production problem:

In the thread pool, only one thread is RUNNABLE and all the others are WAITING. What could cause this?

Insert picture description here

The thread pool has 25 threads. Only one of them, stuck on a network read, is RUNNABLE; all the others are WAITING.
Insert picture description here

Some readers may never have used this tool, so here is a brief introduction to JMC (Java Mission Control). JMC is a set of monitoring and management tools that originated with the JRockit JVM. Oracle has bundled it with the JDK since Java 7u40 (Java 7 Update 40), so with those JDKs there is nothing extra to download. (From JDK 11 on, JMC is no longer bundled and is a separate download.)

Just run jmc on the command line.

Start the application with the following JVM parameters:

-Dcom.sun.management.jmxremote.port=7091 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false

Connect JMC to the port configured above to see the various monitoring metrics.
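If JMC is not at hand, the per-thread states it shows can also be read from inside the JVM through the standard ThreadMXBean. The following is only a minimal sketch: the class name ThreadStateSummary and the method countByState are made up for illustration, and it inspects the current JVM rather than a remote one.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of the original project.
class ThreadStateSummary {

    // Counts the live threads of the current JVM by Thread.State.
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip monitor and synchronizer details, only the state is needed
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

A summary like "1 RUNNABLE, 24 WAITING" is exactly the kind of picture my friend was looking at.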

At first I wanted my friend to send the code over so I could take a look, but he said it was a bank project with no Internet access, so all he could do was point his phone's camera at the screen on a video call and give me an overview. Below I reconstruct the code from that call. I expect many readers will spot the problem right away, because I have left out the irrelevant code and kept only the part that causes the problem.

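import java.net.Socket;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import lombok.SneakyThrows;

public class BankDemo {

    public ExecutorService service = Executors.newFixedThreadPool(5);

    public static class Task implements Runnable {

        private CountDownLatch latch;

        public void setLatch(CountDownLatch latch) {
            this.latch = latch;
        }

        @SneakyThrows
        @Override
        public void run() {
            // Open a Socket connection and send data
            Socket socket = new Socket("127.0.0.1", 10006);
            // ...
            // The last thing the task does is call the following method
            latch.countDown();
        }
    }

    // In the real code, a batch of tasks is put into the thread pool each time;
    // the next batch is submitted only after the current batch has finished.
    // In other words, the following method is called in a loop.
    @SneakyThrows
    public void runTask(List<Task> taskList) {
        CountDownLatch latch = new CountDownLatch(5);
        taskList.forEach(item -> {
            item.setLatch(latch);
            service.submit(item);
        });
        latch.await();
    }
}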

A hint: the threads in the WAITING state are blocked in LockSupport.park() (as seen with the JMC tool in the figure above).

A brief aside: my friend kept stressing that this code had been running in production for a year without any problem, so how could it suddenly break? His approach was therefore to keep going over the parts he had recently changed, but he never found the problem there.

My thinking was different. Some bugs only appear in certain scenarios, so do not assume that old code is free of problems. Start from the problem itself.

Java thread states

The basics still matter when you are tracking down a problem, so let's review them.

The simple thread state is shown in the figure below.
Insert picture description here
The Java Thread class contains a nested enum, State, which defines the thread states at the Java language level:

  1. NEW (new state)
  2. RUNNABLE (runnable/running state)
  3. BLOCKED (blocked state)
  4. WAITING (waiting without a time limit)
  5. TIMED_WAITING (waiting with a time limit)
  6. TERMINATED (terminated state)

Java subdivides the operating system's blocked state into three states: BLOCKED, WAITING, and TIMED_WAITING

NEW (new state): the thread has been created but not yet started. There are three ways to create a thread

  1. Extend the Thread class
  2. Implement the Runnable interface
  3. Implement the Callable interface (run through a FutureTask or a thread pool)

Implementing an interface is the approach we use most often. Runnable and Callable differ as follows

  1. Runnable cannot return a value, while Callable can
  2. Runnable.run() cannot throw checked exceptions, while Callable.call() can
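To make the two differences concrete, here is a small sketch. The class and method names (RunnableVsCallable, callableResult, callableFailure) are invented for this illustration; the executor and Future calls are the standard java.util.concurrent API.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RunnableVsCallable {

    // Returns the value produced by a Callable that ran on a pool thread.
    static int callableResult() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> 1 + 1;      // call() returns a value
            Future<Integer> future = pool.submit(task);
            return future.get();                       // blocks until the task is done
        } finally {
            pool.shutdown();
        }
    }

    // Returns the checked exception that a Callable threw inside the pool.
    static Throwable callableFailure() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // call() is declared "throws Exception", so a checked exception is allowed
            Callable<Integer> task = () -> { throw new IOException("simulated failure"); };
            try {
                pool.submit(task).get();
                return null;
            } catch (ExecutionException e) {
                return e.getCause();                   // the original IOException
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // run() returns void and cannot declare checked exceptions
        Runnable r = () -> System.out.println("Runnable: nothing to return");
        new Thread(r).start();
        System.out.println("Callable result: " + callableResult());
        System.out.println("Callable failure: " + callableFailure());
    }
}
```

Note that the exception thrown by the Callable only surfaces when Future.get() is called, wrapped in an ExecutionException.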

RUNNABLE (ready state): start() has been called and the thread is waiting to be scheduled
RUNNING (running state): the thread is executing. Thread.State has no RUNNING value; a running thread is also reported as RUNNABLE
BLOCKED (blocked state): the thread is suspended, in one of the following ways

  1. BLOCKED (synchronous blocking): the lock is held by another thread, for example while waiting to enter a synchronized method or code block
  2. WAITING (waiting without a time limit): after calling Object.wait(), Thread.join(), etc.
  3. TIMED_WAITING (waiting with a time limit): after calling Object.wait(long), Thread.sleep(long), etc.

DEAD (terminated state): the thread has finished executing; Thread.State calls this TERMINATED

Finally, here is the thread state diagram with the methods that trigger each transition added.
Insert picture description here
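The transitions described above can be observed directly with Thread.getState(). This is only a sketch to experiment with: ThreadStatesDemo, waitFor and observe are names I made up, and the polling helper exists only because a freshly started thread needs a moment to reach the state we want to look at.

```java
import java.util.ArrayList;
import java.util.List;

class ThreadStatesDemo {

    // Polls for up to about 2 s until t reports the expected state; returns the last state seen.
    static Thread.State waitFor(Thread t, Thread.State expected) throws InterruptedException {
        for (int i = 0; i < 200 && t.getState() != expected; i++) {
            Thread.sleep(10);
        }
        return t.getState();
    }

    // Drives three threads through the states and records what getState() reports.
    static List<Thread.State> observe() throws InterruptedException {
        List<Thread.State> seen = new ArrayList<>();
        Object lock = new Object();

        Thread sleeper = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* interrupted: just exit */ }
        });
        seen.add(sleeper.getState());                             // NEW: created, not started
        sleeper.start();
        seen.add(waitFor(sleeper, Thread.State.TIMED_WAITING));   // inside Thread.sleep(long)

        Thread contender = new Thread(() -> { synchronized (lock) { } });
        synchronized (lock) {                                     // hold the monitor first
            contender.start();
            seen.add(waitFor(contender, Thread.State.BLOCKED));   // waiting to enter synchronized
        }

        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException e) { /* interrupted: just exit */ }
            }
        });
        waiter.start();
        seen.add(waitFor(waiter, Thread.State.WAITING));          // inside Object.wait()

        // Clean up: wake the two suspended threads and wait for all three to finish.
        sleeper.interrupt();
        waiter.interrupt();
        sleeper.join();
        waiter.join();
        contender.join();
        seen.add(sleeper.getState());                             // TERMINATED: run() has returned
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(observe());
    }
}
```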

Reproducing the scene

A thread usually ends up in WAITING because it called one of the following three methods

  1. Object.wait()
  2. Thread.join()
  3. LockSupport.park()
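The third case is the one that matters here. As a quick check, the sketch below starts a thread that calls CountDownLatch.await(), waits until it reports WAITING, and then looks for a LockSupport.park() frame in its stack trace. LatchParkDemo and its methods are hypothetical names used only for this illustration.

```java
import java.util.concurrent.CountDownLatch;

class LatchParkDemo {

    // True if any frame of the thread's current stack is LockSupport.park().
    static boolean isParked(Thread t) {
        for (StackTraceElement frame : t.getStackTrace()) {
            if (frame.getClassName().equals("java.util.concurrent.locks.LockSupport")
                    && frame.getMethodName().equals("park")) {
                return true;
            }
        }
        return false;
    }

    // Starts a thread that awaits a latch, waits until it is WAITING,
    // checks whether it is parked, then releases it.
    static boolean awaitingThreadIsParked() throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        Thread awaiter = new Thread(() -> {
            try { latch.await(); } catch (InterruptedException e) { /* interrupted: just exit */ }
        }, "awaiter");
        awaiter.start();
        // Poll for up to about 2 s until the thread has reached await()
        for (int i = 0; i < 200 && awaiter.getState() != Thread.State.WAITING; i++) {
            Thread.sleep(10);
        }
        boolean parked = awaiter.getState() == Thread.State.WAITING && isParked(awaiter);
        latch.countDown();   // release the awaiter so it can finish
        awaiter.join();
        return parked;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("await() parks the thread: " + awaitingThreadIsParked());
    }
}
```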

The troubleshooting process is as follows

  1. Once it was clear that the code does not call Object.wait() or Thread.join(), it was basically certain that the blocking came from a utility class in the java.util.concurrent package, because the tools in that package use LockSupport.park() heavily

  2. That pins the problem on the CountDownLatch: the other tasks have finished and only one task is still running, so everything else is parked. The thread that called runTask is waiting in latch.await(), and the idle pool threads are waiting for the next batch

  3. So what was the RUNNABLE thread doing, and why hadn't it finished? The screenshot at the beginning of the article points the way: this thread was stuck on a network read. (Java reports a thread blocked in a socket read as RUNNABLE, which is why it does not show up as WAITING.)

  4. Since it was stuck on a network read, the code most likely set neither a connect timeout nor a read timeout. I asked, and it was exactly as I suspected: neither was set
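Here is a minimal sketch of what setting both timeouts can look like with java.net.Socket. It is not my friend's actual fix, which I never saw: the class name TimeoutDemo, the method names and the timeout values are all assumptions. The countDown() in a finally block and the bounded latch.await(...) are extra safeguards I would add on top of the timeouts, so that one failed task cannot stall a whole batch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class TimeoutDemo {

    // Hypothetical values; tune them to the real service.
    static final int CONNECT_TIMEOUT_MS = 1_000;
    static final int READ_TIMEOUT_MS = 500;

    // Connects and reads one byte with both timeouts set; returns a short outcome label.
    static String callServer(String host, int port) {
        try (Socket socket = new Socket()) {                   // unconnected socket
            socket.connect(new InetSocketAddress(host, port), CONNECT_TIMEOUT_MS);
            socket.setSoTimeout(READ_TIMEOUT_MS);              // bounds every blocking read
            InputStream in = socket.getInputStream();
            return in.read() < 0 ? "closed by server" : "ok";
        } catch (SocketTimeoutException e) {
            return "timed out";
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        }
    }

    // Runs one batch of calls; returns true if every task finished before the deadline.
    static boolean runBatch(ExecutorService pool, int tasks, String host, int port)
            throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(tasks);      // sized to the batch
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    callServer(host, port);
                } finally {
                    latch.countDown();                         // runs even if the call fails
                }
            });
        }
        return latch.await(5, TimeUnit.SECONDS);               // bounded wait instead of await()
    }

    public static void main(String[] args) throws Exception {
        // A local server socket that completes connections but never replies.
        try (ServerSocket silent = new ServerSocket(0)) {
            ExecutorService pool = Executors.newFixedThreadPool(5);
            try {
                boolean done = runBatch(pool, 5, "127.0.0.1", silent.getLocalPort());
                System.out.println("batch finished in time: " + done);
            } finally {
                pool.shutdown();
            }
        }
    }
}
```

Without setSoTimeout, the read against the silent server would block indefinitely, which is the situation in the JMC screenshot.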

After setting the timeouts he ran it locally. It ran normally at first, and then started throwing exceptions outright
Insert picture description here
SocketTimeoutException: connect timed out (the connection attempt timed out)
SocketException: Connection reset (the server closed the connection while the client was still reading from it)

So why did the program run normally at first and only report these connection errors later?

  1. The server really was under too much concurrent load
  2. The server handles network requests with BIO (blocking I/O): one thread is created per request, so it cannot sustain high concurrency on its own
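For readers who have not seen the thread-per-request style, the sketch below shows the general shape of a BIO server. I never saw the bank server's code, so this is a generic illustration and not their implementation; BioServerSketch, start, handle and the handlerThreads counter are names invented here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

class BioServerSketch {

    // Number of handler threads currently tied up by open connections.
    static final AtomicInteger handlerThreads = new AtomicInteger();

    // Accept loop: every accepted connection gets a brand-new thread.
    static Thread start(ServerSocket server) {
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket client = server.accept();            // blocks until a client connects
                    new Thread(() -> handle(client)).start();   // one thread per connection
                } catch (IOException e) {
                    return;                                     // server socket was closed
                }
            }
        }, "bio-acceptor");
        acceptor.start();
        return acceptor;
    }

    // Echoes bytes until the client disconnects; the thread is occupied the whole time.
    static void handle(Socket client) {
        handlerThreads.incrementAndGet();
        try (Socket c = client) {
            InputStream in = c.getInputStream();
            OutputStream out = c.getOutputStream();
            int b;
            while ((b = in.read()) != -1) {                     // blocking read
                out.write(b);
            }
        } catch (IOException e) {
            // connection dropped; the socket is closed by try-with-resources
        } finally {
            handlerThreads.decrementAndGet();
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            start(server);
            Socket[] clients = new Socket[3];
            for (int i = 0; i < clients.length; i++) {
                clients[i] = new Socket("127.0.0.1", server.getLocalPort());
            }
            Thread.sleep(200);   // give the handler threads a moment to start
            System.out.println("handler threads: " + handlerThreads.get());
            for (Socket c : clients) {
                c.close();
            }
        }
    }
}
```

Each idle connection still occupies a thread, so the thread count grows with the number of clients. That is the scaling limit point 2 refers to.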

And the real reason? I had my friend track down the server's developer, who confirmed that the server really was implemented with BIO. Handling network requests without Netty, how willful!

Look out for my follow-up article on Netty, so that this kind of thing does not happen again.


Origin blog.csdn.net/zzti_erlie/article/details/108681060