A request has not returned after 10s: how to troubleshoot it

Foreword

The ability to locate problems is an important skill for a programmer. Suppose something goes wrong in production and a request sent by a client has still not returned after 10s. How would you investigate it?

jstack

jstack is a powerful command that generates a thread snapshot of a specified process at the current moment. Before looking at how to use jstack for troubleshooting, we need some prerequisite knowledge.

Thread state transitions

State          Description
NEW            Initial state: the thread has been constructed, but start() has not been called yet
RUNNABLE       Running state: covers both threads that are ready to run and threads that are actually running
WAITING        Waiting state: the thread is waiting for another thread to perform a specific action (a notification or an interrupt)
TIMED_WAITING  Timed waiting state: unlike WAITING, the thread returns on its own once the specified time elapses
BLOCKED        Blocked state: the thread is blocked waiting for a lock
TERMINATED     Terminated state: the thread has finished executing
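
The states in the table can be observed directly with Thread.getState(). The following is a minimal sketch, not part of the original article: the class name ThreadStateDemo and the helper names are made up for illustration, and the polling helper simply waits until the JVM reports the expected state.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadStateDemo {

    // Polls until the thread reaches the expected state, giving up after about 5 seconds.
    static Thread.State awaitState(Thread thread, Thread.State expected) {
        try {
            for (int i = 0; i < 500 && thread.getState() != expected; i++) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return thread.getState();
    }

    // Drives a few threads through several states and records what getState() reports.
    static List<Thread.State> observeStates() {
        List<Thread.State> seen = new ArrayList<>();
        Object lock = new Object();

        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // interrupted on purpose at the end of the demo
            }
        }, "sleeper");
        seen.add(sleeper.getState());                              // constructed, not started
        sleeper.start();
        seen.add(awaitState(sleeper, Thread.State.TIMED_WAITING)); // inside Thread.sleep

        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait();
                } catch (InterruptedException ignored) {
                    // interrupted on purpose at the end of the demo
                }
            }
        }, "waiter");
        waiter.start();
        seen.add(awaitState(waiter, Thread.State.WAITING));        // inside Object.wait

        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; the thread only needs the monitor
            }
        }, "contender");
        synchronized (lock) {                                      // main holds the monitor
            contender.start();
            seen.add(awaitState(contender, Thread.State.BLOCKED));
        }

        sleeper.interrupt();
        waiter.interrupt();
        seen.add(awaitState(sleeper, Thread.State.TERMINATED));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observeStates());
    }
}
```

Running jstack against this process while it is paused in one of these steps should show the same state names in the "java.lang.Thread.State" line of each thread entry.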


With these states in mind, the dump output of jstack is much easier to understand. In the dump, pay attention to the following states:

  1. Deadlock
  2. Waiting on condition (waiting for a resource)
  3. Waiting on monitor entry (waiting to acquire a monitor; this usually indicates a problem)
  4. Blocked
  5. Runnable (executing)
  6. Suspended

Simulate Waiting on condition
public class ThreadSleepTest {

    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(10000000);
    }

}

Simulate deadlock
public class ThreadDeadLockTest {

    public static void main(String[] args) {

        StringBuilder a = new StringBuilder();
        StringBuilder b = new StringBuilder();

        Thread thread1 = new Thread(() -> {
            synchronized (a) {
                a.append("a");
                b.append("b");

                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }

                synchronized (b) {
                    b.append("c");
                    a.append("d");
                }
            }
        });

        Thread thread2 = new Thread(() -> {
            synchronized (b) {
                b.append("b");
                a.append("a");

                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                synchronized (a) {
                    a.append("c");
                    b.append("d");
                }
            }
        });
        thread1.setName("Thread 1");
        thread2.setName("Thread 2");

        thread1.start();
        thread2.start();
    }
}

In the dump, both thread 1 and thread 2 are waiting for a lock and are in the BLOCKED state.


Further down in the output, jstack reports the deadlock explicitly ("Found one Java-level deadlock").
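
The same check can be made from inside the JVM. The sketch below is an illustration rather than the article's method: it uses the standard ThreadMXBean.findDeadlockedThreads() call, and the class name DeadlockDetector, the helper names, and the thread names are assumptions. A CountDownLatch forces both threads to hold their first lock before asking for the second, so the deadlock is deterministic.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDetector {

    // Starts a daemon thread that takes `first`, waits until its peer holds its own
    // first lock, then tries to take `second`.
    static Thread startLocker(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread thread = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached once the two threads deadlock
                }
            }
        }, name);
        thread.setDaemon(true); // lets the JVM exit even though the threads stay stuck
        thread.start();
        return thread;
    }

    // Returns info on deadlocked threads, or an empty array when there are none.
    static ThreadInfo[] findDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null means no deadlock
        return ids == null ? new ThreadInfo[0] : threads.getThreadInfo(ids);
    }

    // Creates a two-thread deadlock and polls (up to about 5 seconds) until it is detected.
    static ThreadInfo[] createAndDetect() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startLocker("locker-1", a, b, bothHoldFirst);
        startLocker("locker-2", b, a, bothHoldFirst);
        ThreadInfo[] found = findDeadlocks();
        try {
            for (int i = 0; i < 500 && found.length == 0; i++) {
                Thread.sleep(10);
                found = findDeadlocks();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return found;
    }

    public static void main(String[] args) {
        for (ThreadInfo info : createAndDetect()) {
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```

A check like findDeadlocks() could be called periodically from a health endpoint or a scheduled task, so that a deadlock is noticed without someone having to run jstack by hand.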


Simulate Runnable
public class ThreadRunningTest {

    public static void main(String[] args) throws InterruptedException {
        int sum = 0;
        while (true) {
            sum += 1;
        }
    }

}

How to find problem threads

The examples above are simulated, so the problem thread is easy to find. But in a real production environment with heavy request traffic, how do you find the problem thread?

The classic approach is the following four steps: find the thread that uses the most CPU, then print its stack.

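# Find the process that uses the most CPU
top
# Find the thread that uses the most CPU within that process
top -Hp <pid>
# Convert that thread ID to hexadecimal
printf "%x\n" <thread-id>
# Print the stack of that thread (jstack shows the native thread ID as nid=0x...;
# -A 20 also prints the 20 lines after the matching header line)
jstack <pid> | grep -A 20 "nid=0x<hex-thread-id>"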

TIME+ (a column in the top output): the cumulative CPU time the thread has used
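
The last two steps (convert the thread ID to hexadecimal, then look it up in the jstack output) can also be scripted. The following is a minimal sketch under stated assumptions: the class name JstackThreadFinder and the method findThread are hypothetical, and SAMPLE_DUMP is a made-up, shortened dump that only imitates the general shape of jstack output, where thread entries are separated by blank lines.

```java
import java.util.Optional;
import java.util.regex.Pattern;

public class JstackThreadFinder {

    // Made-up, shortened dump for illustration; real jstack output has more threads and frames.
    static final String SAMPLE_DUMP = String.join("\n",
            "\"main\" #1 prio=5 os_prio=31 tid=0x00007f8a1c009800 nid=0x1003 runnable [0x000070000a1e2000]",
            "   java.lang.Thread.State: RUNNABLE",
            "\tat ThreadRunningTest.main(ThreadRunningTest.java:7)",
            "",
            "\"Reference Handler\" #2 daemon prio=10 os_prio=31 tid=0x00007f8a1c80f000 nid=0x3003 in Object.wait() [0x000070000a8f7000]",
            "   java.lang.Thread.State: WAITING (on object monitor)",
            "\tat java.lang.Object.wait(Native Method)");

    // Returns the dump entry whose header carries nid=0x<hex of osThreadId>, if any.
    static Optional<String> findThread(String dump, long osThreadId) {
        // \b keeps nid=0x100 from matching nid=0x1003
        Pattern nid = Pattern.compile("\\bnid=0x" + Long.toHexString(osThreadId) + "\\b");
        for (String entry : dump.split("\\R{2,}")) { // entries are separated by blank lines
            if (nid.matcher(entry).find()) {
                return Optional.of(entry.trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // 4099 is the decimal thread ID as top -Hp would show it; 4099 == 0x1003
        System.out.println(findThread(SAMPLE_DUMP, 4099).orElse("not found"));
    }
}
```

Compared with a plain grep, returning the whole entry avoids having to guess how many lines of stack to print after the header.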


There is also a more ingenious method: take one thread dump, then take another one after a short interval.


Use Compare Files in IDEA to compare the two dump files. A request that has not returned for more than 10s should, in theory, appear in both files with the same stack, which lets you identify the thread ID. This is a classic technique, and it is also useful when many requests have been captured.
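
The comparison idea can be sketched in-process as well. This is an analogue of diffing two jstack files, not a replacement for it: it takes two snapshots with Thread.getAllStackTraces() and reports threads whose stack is identical in both. The class name StuckThreadFinder, the helper names, and the "slow-request" thread are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class StuckThreadFinder {

    // Names of threads that appear in both snapshots with exactly the same stack trace.
    static List<String> unchangedThreads(Map<Thread, StackTraceElement[]> first,
                                         Map<Thread, StackTraceElement[]> second) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : first.entrySet()) {
            StackTraceElement[] later = second.get(entry.getKey());
            if (later != null && later.length > 0 && Arrays.equals(entry.getValue(), later)) {
                names.add(entry.getKey().getName());
            }
        }
        return names;
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Simulates a stuck request, then compares two snapshots taken gapMillis apart.
    static List<String> demo(long gapMillis) {
        Thread slow = new Thread(() -> pause(60_000), "slow-request");
        slow.setDaemon(true); // lets the JVM exit while the thread is still sleeping
        slow.start();
        // Wait until the simulated request is parked in sleep before the first snapshot.
        for (int i = 0; i < 500 && slow.getState() != Thread.State.TIMED_WAITING; i++) {
            pause(10);
        }
        Map<Thread, StackTraceElement[]> first = Thread.getAllStackTraces();
        pause(gapMillis);
        Map<Thread, StackTraceElement[]> second = Thread.getAllStackTraces();
        return unchangedThreads(first, second);
    }

    public static void main(String[] args) {
        System.out.println(demo(2_000));
    }
}
```

Threads that are idle by design, such as JVM housekeeping threads or parked pool workers, also show up as unchanged, so the result is a list of candidates to inspect rather than a verdict.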


Database

When a request takes too long to return, I usually check the database first. If the database is overwhelmed, the consequences are very serious.

First, check whether the slow query log is enabled:

mysql> show variables like '%slow_query_log%';
+---------------------+----------------------------------------------------------+
| Variable_name       | Value                                                    |
+---------------------+----------------------------------------------------------+
| slow_query_log      | ON                                                       |
| slow_query_log_file | /usr/local/mysql/data/zhangxiaobindeMacBook-Pro-slow.log |
+---------------------+----------------------------------------------------------+
2 rows in set (0.01 sec)

Execute a slow SQL statement:

mysql> select sleep(100), id from user where id = 1;
+------------+----+
| sleep(100) | id |
+------------+----+
|          0 |  1 |
+------------+----+
1 row in set (1 min 40.24 sec)

We can find this SQL in the slow query log and see how long it took to execute:

/usr/local/mysql/bin/mysqld, Version: 5.7.27-log (MySQL Community Server (GPL)). started with:
Tcp port: 3306  Unix socket: /tmp/mysql.sock
Time                 Id Command    Argument
# Time: 2021-07-03T08:33:40.680324Z
# User@Host: root[root] @ localhost []  Id:     7
# Query_time: 100.238479  Lock_time: 0.168766 Rows_sent: 1  Rows_examined: 1
use test;
SET timestamp=1625301220;
select sleep(100), id from user where id = 1;

Once the SQL is found, the rest is easy: use the EXPLAIN keyword to see how the SQL is executed.
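
When the slow query log is long, the entries above a threshold can be picked out with a small script. The sketch below assumes the entry layout shown in the log excerpt above (a "# Query_time:" line followed by "use", "SET timestamp=" and the statement itself); the class name SlowLogScanner and the method slowStatements are hypothetical, and it does not handle server banner lines that reappear in the middle of a log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlowLogScanner {

    private static final Pattern QUERY_TIME = Pattern.compile("^# Query_time: (\\d+(?:\\.\\d+)?)");

    // The entry from the slow query log excerpt shown earlier in the article.
    static final List<String> SAMPLE_LOG = Arrays.asList(
            "# Time: 2021-07-03T08:33:40.680324Z",
            "# User@Host: root[root] @ localhost []  Id:     7",
            "# Query_time: 100.238479  Lock_time: 0.168766 Rows_sent: 1  Rows_examined: 1",
            "use test;",
            "SET timestamp=1625301220;",
            "select sleep(100), id from user where id = 1;");

    // Returns the statements of all entries whose Query_time exceeds thresholdSeconds.
    static List<String> slowStatements(List<String> lines, double thresholdSeconds) {
        List<String> result = new ArrayList<>();
        boolean overThreshold = false;
        for (String line : lines) {
            Matcher m = QUERY_TIME.matcher(line);
            if (m.find()) {
                overThreshold = Double.parseDouble(m.group(1)) > thresholdSeconds;
            } else if (line.startsWith("#")) {
                overThreshold = false; // header of the next entry
            } else if (overThreshold && !line.startsWith("use ") && !line.startsWith("SET timestamp=")) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Statements that ran longer than 10 seconds
        System.out.println(slowStatements(SAMPLE_LOG, 10.0));
    }
}
```

Each statement returned this way is a candidate to run through EXPLAIN.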

show processlist

The SQL may also not have reached the slow query log yet, for example because it is still running. In that case, use the show processlist command to check whether any SQL has been executing for too long, which locates the problematic SQL more quickly.

mysql> show processlist;
+----+------+-----------------+------+---------+------+------------+----------------------------------------------+
| Id | User | Host            | db   | Command | Time | State      | Info                                         |
+----+------+-----------------+------+---------+------+------------+----------------------------------------------+
|  2 | root | localhost:54718 | test | Sleep   |   35 |            | NULL                                         |
|  3 | root | localhost:54725 | NULL | Sleep   |   42 |            | NULL                                         |
|  4 | root | localhost:51794 | test | Sleep   |   62 |            | NULL                                         |
|  5 | root | localhost:51795 | NULL | Sleep   |   12 |            | NULL                                         |
|  7 | root | localhost       | test | Query   |   23 | User sleep | select sleep(100), id from user where id = 1 |
|  9 | root | localhost       | NULL | Query   |    0 | starting   | show processlist                             |
+----+------+-----------------+------+---------+------+------------+----------------------------------------------+
6 rows in set (0.00 sec)

References

"Java Concurrent Programming"

Origin juejin.im/post/6980666498151006245