Troubleshooting high memory usage in a Java service

1. Background

One day, while I was writing code as usual, an ops colleague suddenly pulled the team into a group chat: an online machine had run out of memory, the OOM killer had killed the mesh client registered by our service, and some service calls were failing. The ops colleague checked the machine load and found that the memory occupied by one of our team's Java services was abnormal: the startup command specified `-Xmx128m`, a maximum heap of only 128 MB, yet the whole process occupied 640 MB of memory, which was obviously a problem.

2. Online investigation

Once the ops screenshots were posted, there was no passing the buck; I logged into the online machine to investigate properly. With memory usage this high, the first thought is a memory leak, so I used `jmap -histo $pid > heap.log` to dump the heap object histogram to a file. Looking through the file, most of the heap memory was held by various arrays, and nothing obviously wrong stood out. With nothing better to try, I used `top -H -p $pid` to check the state of the threads running inside the process, and finally found a suspicious point: this Java service had about 5000 child threads running, and almost all of them were in the Sleeping state.
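As a side note, the same thread counts can also be read from inside the JVM through the JMX thread MXBean. A minimal standalone sketch, not part of the original investigation:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public class ThreadCountProbe {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            // live threads right now, including daemon threads
            System.out.println("Live threads:   " + threads.getThreadCount());
            // threads created since the JVM started; a huge value here hints at thread churn
            System.out.println("Started so far: " + threads.getTotalStartedThreadCount());
        }
    }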
The first thing that comes to mind in this situation is a thread deadlock, with resource contention blocking a large number of threads. I used `jstack -l $pid > stack.log` to dump the thread stacks to a file, but searching it was disappointing: there was no deadlock at all, and most threads were simply in the TIMED_WAITING state. Reading down line by line, though, I found another suspicious point: there were far too many `OkHttp ConnectionPool` threads like the one below, with the thread number (the JVM's running count of threads created so far) as high as 1082707:

"OkHttp ConnectionPool" #1082707 daemon prio=5 os_prio=0 tid=0x00007f564c18f000 nid=0x1a4d in Object.wait() [0x00007f5602cb4000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:460)
        at okhttp3.ConnectionPool$1.run(ConnectionPool.java:67)
        - locked <0x00000000fc30fb30> (a okhttp3.ConnectionPool)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000fc305f98> (a java.util.concurrent.ThreadPoolExecutor$Worker)

Looking back at the heap histogram with this in mind, I found the record below. Although the `okhttp3.ConnectionPool` instances did not occupy much memory themselves, only about 300 KB in total, the instance count was 7876. There was no doubt something was wrong:

  40:          7876         315040  okhttp3.ConnectionPool

3. Code troubleshooting

Since the suspicious point was the number of `ConnectionPool` objects, I searched the project for the call sites where `ConnectionPool` objects are initialized, and found that the `OkHttpClient` constructor creates a `ConnectionPool`; that is, every `OkHttpClient` instance comes with its own connection pool. And in the project code, a brand-new `OkHttpClient` object was created for every RPC call. At this point everything became clear.
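To make the pattern concrete, here is a minimal sketch of what such a call path looks like; the class, method, and variable names are hypothetical, since the post does not show the original project code:

    import java.io.IOException;
    import okhttp3.OkHttpClient;
    import okhttp3.Request;
    import okhttp3.Response;

    public class RpcCaller {
        // Problematic pattern: a fresh client per call, and with it a fresh ConnectionPool
        public String call(String url) throws IOException {
            OkHttpClient client = new OkHttpClient(); // implicitly creates a new ConnectionPool
            Request request = new Request.Builder().url(url).build();
            try (Response response = client.newCall(request).execute()) {
                return response.body().string();
            }
        }
    }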

  • Root cause
    Each `OkHttpClient` object does become eligible for garbage collection after use, but in the OkHttp source the default `ConnectionPool` constructor allows up to 5 idle connections with a keep-alive time of 5 minutes; in other words, after a network request is made, the connection is not torn down for another 5 minutes. So when an `OkHttpClient` object is about to be reclaimed by the JVM, its `ConnectionPool` still has threads holding live connections to the server, so neither the pool nor its threads can be reclaimed within those 5 minutes, and the thread resources are naturally not released. Threads occupy memory, so as time accumulates the thread count keeps growing and the memory occupied by the process grows with it. The default constructor in the OkHttp source:
    /**
     * Create a new connection pool with tuning parameters appropriate for a single-user application.
     * The tuning parameters in this pool are subject to change in future OkHttp releases. Currently
     * this pool holds up to 5 idle connections which will be evicted after 5 minutes of inactivity.
     */
    public ConnectionPool() {
        this(5, 5, TimeUnit.MINUTES);
    }
    

4. Solution

The root of the problem had been found, so the fix was straightforward. Creating a new `OkHttpClient` object for every RPC call is a huge waste of resources; it is enough to keep a single shared `OkHttpClient` (for example in a static field) and reuse it for every network request.
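A minimal sketch of that fix; the holder class name and timeout values are illustrative, not from the original project:

    import java.util.concurrent.TimeUnit;
    import okhttp3.OkHttpClient;

    public final class HttpClientHolder {
        // One shared client for the whole service: OkHttpClient is thread-safe,
        // so all requests can reuse its connection pool and dispatcher.
        public static final OkHttpClient CLIENT = new OkHttpClient.Builder()
                .connectTimeout(3, TimeUnit.SECONDS)  // illustrative timeouts
                .readTimeout(5, TimeUnit.SECONDS)
                .build();

        private HttpClientHolder() {}
    }

With this in place, each RPC call goes through `HttpClientHolder.CLIENT.newCall(request).execute()`, so the threads of a single `ConnectionPool` are reused instead of piling up.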

Origin blog.csdn.net/weixin_45505313/article/details/104992561