After I learned this performance-optimization methodology, my manager gave me a six-month year-end bonus

Contents

Overview

Preliminary knowledge

Preparing the simulation environment

CPU at 100%

Memory leak

Deadlock

Frequent thread switching

Summary


Overview

Performance optimization has always been a key focus of back-end services, but online performance failures do not happen often, and many of us, the author included, are limited by the business or product and rarely get to see real performance problems. So, to build up this knowledge in advance and avoid panicking when a problem does appear, this article simulates several common Java performance failures and walks through how to analyze and locate them.

Hello everyone, let me introduce myself first. I go by Brother Code. I am an ordinary graduate from an ordinary undergraduate program, and I believe most programmers, and most people who want to become programmers, come from ordinary families like mine. Through my own efforts I went from joining a traditional company after graduation, to changing jobs without a single failed interview, to working at a large Internet company today. I hope my sharing can help you too.

I have prepared 16 technical columns to learn along with you:

"Billion-level traffic distributed system combat"

"Battery Factory Interview Must-Ask Series"

"Technical Talk"

"Zero Foundation takes you to learn java tutorial column"

"Take you to learn springCloud column"

"Take you to learn SpringCloud source code column"

"Take you to learn distributed system column"

"Take you to learn cloud native column"

"Take you to learn springboot source code"

"Take you to learn netty principles and practical column"

"Take you to learn Elasticsearch column"

"Take you to learn mysql column"

"Take you to learn JVM principle column"

"Take you to learn Redis principle column"

"Take you to learn java advanced column"

"Take you to learn big data column"

Preliminary knowledge

Since we are trying to locate problems, we definitely need tools. Let's first look at which tools can help us locate them.

top command

top is one of the most commonly used Linux commands. It shows, in real time, system information such as the CPU and memory usage of the currently running processes. top -Hp pid shows the resource usage of each thread in a process.
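
As a complement to top -Hp, a thread's CPU time can also be read from inside the JVM through the standard ThreadMXBean API. Below is a minimal sketch (the class name and output format are mine):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadCpuTop {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // enable per-thread CPU time measurement if the JVM supports it
        if (mx.isThreadCpuTimeSupported() && !mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if unsupported or the thread has died
            if (info != null && cpuNanos >= 0) {
                System.out.printf("%-40s cpu=%d ms%n", info.getThreadName(), cpuNanos / 1_000_000);
            }
        }
    }
}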

vmstat command

vmstat is a virtual-memory monitoring tool. You specify the sampling interval and the number of samples, and it reports memory, CPU, and swap usage. Another important and commonly used feature is observing the context switches of processes. Its fields are described as follows:

  • r: the number of processes in the run queue (when this exceeds the number of CPU cores, processes are queuing for CPU time)

  • b: the number of processes waiting for IO

  • swpd: the amount of virtual memory (swap) in use

  • free: the amount of free physical memory

  • buff: memory used as buffers for block devices

  • cache: memory used as page cache (caching file data read from disk)

  • si: the amount swapped in from disk to memory per second

  • so: the amount swapped out from memory to disk per second

  • bi: blocks read per second

  • bo: blocks written per second

  • in: interrupts per second, including clock interrupts

  • cs: context switches per second

  • us: percentage of CPU time spent running user processes (user time)

  • sy: percentage of CPU time spent in the kernel (system time)

  • wa: percentage of CPU time spent waiting for IO

  • id: percentage of CPU time that is idle

pidstat command

pidstat is part of the Sysstat package and is also a powerful performance monitoring tool. While top and vmstat monitor memory, CPU, and I/O at the process level, pidstat can go down to the thread level. The thread-switching fields reported by pidstat are described as follows:

  • UID : The real user ID of the monitored task.

  • TGID : Thread group ID.

  • TID: Thread ID.

  • cswch/s: voluntary context switches per second, i.e. switches that happen because the thread blocks on a resource, such as waiting for a lock.

  • nvcswch/s: involuntary context switches per second, i.e. switches forced by the CPU scheduler, for example when the time slice runs out.

jstack command

jstack is a JDK command-line tool for analyzing thread stacks. Its most common use is jstack pid to view the stack traces of a process's threads, and it is often used to troubleshoot deadlocks.
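
Besides running jstack from the shell, the JVM can report deadlocks programmatically through the same thread-management machinery, via ThreadMXBean.findDeadlockedThreads(). A minimal sketch (the class name is illustrative):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockDetector {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        if (ids == null) {
            System.out.println("No deadlock detected");
            return;
        }
        // print which lock each deadlocked thread is waiting for and who holds it
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.println(info.getThreadName() + " is waiting for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}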

jstat command

It monitors the runtime state of a Java program in real time, including heap memory and garbage-collection information, and we most often use it to check garbage collection. A commonly used command is jstat -gc pid. Its fields are described as follows (a small programmatic complement follows the list):

  • S0C: capacity of Survivor 0 in the young generation (KB)

  • S1C: capacity of Survivor 1 in the young generation (KB)

  • S0U: space currently used in Survivor 0 (KB)

  • S1U: space currently used in Survivor 1 (KB)

  • EC: capacity of Eden in the young generation (KB)

  • EU: space currently used in Eden (KB)

  • OC: capacity of the old generation (KB)

  • OU: space currently used in the old generation (KB)

  • MC: capacity of the metaspace (KB)

  • MU: space currently used in the metaspace (KB)

  • YGC: number of young-generation GCs from application startup to sampling time

  • YGCT: time (s) spent in young-generation GC from application startup to sampling time

  • FGC: number of old-generation collections (Full GC) from application startup to sampling time

  • FGCT: time (s) spent in old-generation collections (Full GC) from application startup to sampling time

  • GCT: total time (s) spent in GC from application startup to sampling time
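
As a small programmatic complement to jstat (the class name is mine), GC counts and times can also be read from inside a running JVM via the standard GarbageCollectorMXBean and MemoryMXBean APIs:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcStats {
    public static void main(String[] args) {
        // one bean per collector, e.g. young-generation and old-generation collectors
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used=%d KB, committed=%d KB%n",
                heap.getUsed() / 1024, heap.getCommitted() / 1024);
    }
}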

jmap command

jmap is also a JDK tool. It can show how the heap was configured and how heap memory is being used, and it can also generate a dump file for detailed analysis. The command to view the heap is jmap -heap pid.
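
For completeness, a heap dump equivalent to what jmap produces can also be triggered from code on a HotSpot JVM through com.sun.management.HotSpotDiagnosticMXBean. A minimal sketch, with an illustrative output path:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws IOException {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // live=true dumps only reachable objects, similar to jmap's live option
        diag.dumpHeap("/tmp/manual-heapdump.hprof", true);
    }
}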

MAT memory tool

MAT (Memory Analyzer Tool) is an Eclipse plug-in (it can also be used standalone). When analyzing a large heap dump, it lets you see intuitively how much memory each object in the heap occupies, the number of instances of each class, and object reference relationships; it supports OQL object queries and makes it easy to find an object's GC Roots.

IDEA also has a comparable plug-in: JProfiler.

Related reading: "Quick Start and Best Practices for Performance Diagnostic Tool JProfiler"

Preparing the simulation environment

The base environment is JDK 1.8, and we use the Spring Boot framework to write a few interfaces that trigger the simulated scenarios. The first scenario is a CPU pegged at 100%.

CPU at 100%

Simulating a pegged CPU is relatively simple: just write an infinite loop doing some computation to burn CPU.

    /**
     * Simulate a CPU pegged at 100%
     */
    @GetMapping("/cpu/loop")
    public void testCPULoop() throws InterruptedException {
        System.out.println("Request: CPU infinite loop");
        Thread.currentThread().setName("loop-thread-cpu");
        int num = 0;
        while (true) {
            num++;
            if (num == Integer.MAX_VALUE) {
                System.out.println("reset");
                num = 0; // reset the counter so the loop keeps spinning
            }
        }
    }

Request the interface to test it: curl localhost:8080/cpu/loop. The CPU immediately soars to 100%.

Execute top -Hp 32805 to view the threads of the Java process.

Execute printf '%x' 32826 to convert the thread ID to hexadecimal so we can look it up in the dump; the result is 803a. Finally, execute jstack 32805 | grep -A 20 803a to view the detailed dump information.

The dump information here points directly to the offending method and line of code, which locates the cause of the maxed-out CPU.

Memory leak

We simulate a memory leak with the help of ThreadLocal. A ThreadLocal is a thread-private variable: it is bound to the thread and lives for the thread's whole lifetime. ThreadLocal is implemented on top of ThreadLocalMap, whose Entry extends WeakReference, with the Entry key wrapping a weak reference to the ThreadLocal itself. In other words, the key is a weak reference and will be reclaimed on the next GC. If we never touch the ThreadLocal again after calling set, GC clears the key, but because the thread is still alive the value is never reclaimed, and eventually a memory leak occurs. (A sketch of the usual fix follows the demo code below.)

    /**
     * Simulate a memory leak
     */
    @GetMapping(value = "/memory/leak")
    public String leak() {
        System.out.println("Simulating a memory leak");
        ThreadLocal<Byte[]> localVariable = new ThreadLocal<Byte[]>();
        localVariable.set(new Byte[4096 * 1024]); // bind a large variable to the thread
        return "ok";
    }
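
For contrast, a sketch of the conventional fix: call remove() on the ThreadLocal once it is no longer needed, typically in a finally block, so the value can be reclaimed. The endpoint name below is illustrative:

    @GetMapping(value = "/memory/no-leak")
    public String noLeak() {
        ThreadLocal<Byte[]> localVariable = new ThreadLocal<Byte[]>();
        try {
            localVariable.set(new Byte[4096 * 1024]); // bind a large variable to the thread
            return "ok";
        } finally {
            localVariable.remove(); // removes the entry so the value can be garbage collected
        }
    }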

At startup we limit the heap size and configure a heap snapshot and GC log output to be produced when memory overflows.

java -jar -Xms500m -Xmx500m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/tmp/heaplog.log analysis-demo-0.0.1-SNAPSHOT.jar

After the startup succeeds, we call the interface 500 times in a loop: for i in {1..500}; do curl localhost:8080/memory/leak; done. Before the loop finishes, the system starts returning 500 errors, and the following exception appears in the system log:

java.lang.OutOfMemoryError: Java heap space

We use the jstat -gc pid command to see the GC situation of the program.

Clearly the memory has overflowed: even after 45 Full GCs the heap has not freed any usable memory, which means the objects currently in the heap are all alive, referenced from GC Roots, and cannot be reclaimed. So why did the memory overflow? Should we just add more memory? For an ordinary overflow, enlarging the heap might be enough, but with a memory leak the enlarged heap will fill up again in short order, so we first need to determine whether this is a leak. We saved the heap dump file earlier, so now we analyze it with MAT. Import the dump and choose Leak Suspects Report, and the tool lists the suspected problems for you.

There are 4 suspected memory leaks listed here; we click one of them to see the details.

It points out that nearly 50 MB of memory is held by threads, and the objects holding it are ThreadLocals. If you want to dig in manually, you can open the Histogram view to see which objects occupy the most memory, and then analyze their reference chains to determine who caused the overflow.

The figure above shows that the objects occupying the most memory are Byte arrays. Let's check whether they are referenced from a GC Root and therefore not collected. Following the steps highlighted in the red box above, the result is as follows:

We find that the Byte arrays are referenced by thread objects: the figure shows that the GC Root of each Byte array is a thread, so they will not be collected. Expanding the details shows that the memory is ultimately held through ThreadLocal, which matches what the MAT tool's automatic analysis told us.

Deadlock

A deadlock exhausts thread resources and holds on to memory. It shows up as rising memory usage; the CPU does not necessarily spike (it depends on the scenario). If threads are created directly with new Thread, JVM memory is eventually exhausted and an error about being unable to create threads is reported, which is one more argument for using thread pools.

    ExecutorService service = new ThreadPoolExecutor(4, 10,
            0, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(1024),
            Executors.defaultThreadFactory(),
            new ThreadPoolExecutor.AbortPolicy());

    /**
     * Simulate a deadlock
     */
    @GetMapping("/cpu/test")
    public String testCPU() throws InterruptedException {
        System.out.println("Request: deadlock");
        Object lock1 = new Object();
        Object lock2 = new Object();
        service.submit(new DeadLockThread(lock1, lock2), "deadLockThread-" + new Random().nextInt());
        service.submit(new DeadLockThread(lock2, lock1), "deadLockThread-" + new Random().nextInt());
        return "ok";
    }

public class DeadLockThread implements Runnable {
    private Object lock1;
    private Object lock2;

    public DeadLockThread(Object lock1, Object lock2) {
        this.lock1 = lock1;
        this.lock2 = lock2;
    }

    @Override
    public void run() {
        synchronized (lock2) {
            System.out.println(Thread.currentThread().getName() + " got lock2 and is waiting for lock1");
            try {
                TimeUnit.MILLISECONDS.sleep(2000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            synchronized (lock1) {
                System.out.println(Thread.currentThread().getName() + " got lock1 and lock2");
            }
        }
    }
}

We request the interface 2,000 times in a loop. After a while the system logs an error: the thread pool and its queue are full. Since I chose the policy of rejecting tasks when the queue is full, the system throws the exception directly.

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@2760298 rejected from java.util.concurrent.ThreadPoolExecutor@7ea7cd51[Running, pool size = 10, active threads = 10, queued tasks = 1024, completed tasks = 846]

Find the pid of the Java process with ps -ef|grep java, then execute jstack pid to get the Java thread stacks. Five deadlocks are reported here; we list only one of them. Clearly, thread pool-1-thread-2 holds lock 0x00000000f8387d88 while waiting for 0x00000000f8387d98, and thread pool-1-thread-1 holds 0x00000000f8387d98 while waiting for 0x00000000f8387d88: a deadlock has occurred. (A sketch of the conventional fix follows the stack trace below.)

Java stack information for the threads listed above:
===================================================
"pool-1-thread-2":
        at top.luozhou.analysisdemo.controller.DeadLockThread2.run(DeadLockThread.java:30)
        - waiting to lock <0x00000000f8387d98> (a java.lang.Object)
        - locked <0x00000000f8387d88> (a java.lang.Object)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
"pool-1-thread-1":
        at top.luozhou.analysisdemo.controller.DeadLockThread1.run(DeadLockThread.java:30)
        - waiting to lock <0x00000000f8387d88> (a java.lang.Object)
        - locked <0x00000000f8387d98> (a java.lang.Object)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

 Found 5 deadlocks.
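
As a side note, the conventional cure for this kind of deadlock is to make every task acquire the locks in one globally consistent order, so a circular wait can never form. A minimal sketch of the runnable rewritten that way (the class name and ordering rule are illustrative):

public class OrderedLockThread implements Runnable {
    private final Object lockA;
    private final Object lockB;

    public OrderedLockThread(Object lockA, Object lockB) {
        this.lockA = lockA;
        this.lockB = lockB;
    }

    @Override
    public void run() {
        // Pick one global order (here: identity hash code) so that both submissions
        // acquire the same lock first, even if the constructor arguments are swapped.
        Object first = System.identityHashCode(lockA) <= System.identityHashCode(lockB) ? lockA : lockB;
        Object second = (first == lockA) ? lockB : lockA;
        synchronized (first) {
            System.out.println(Thread.currentThread().getName() + " got the first lock");
            synchronized (second) {
                System.out.println(Thread.currentThread().getName() + " got both locks");
            }
        }
    }
}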

Frequent thread switching

Context switching wastes a lot of CPU time on saving and restoring registers, kernel stacks, and virtual-memory state, which drags down overall system performance. When you notice a significant drop in system performance, consider whether a large number of thread context switches is the cause.

    @GetMapping(value = "/thread/swap")
    public String theadSwap(int num) {
        System.out.println("Simulating thread switching");
        for (int i = 0; i < num; i++) {
            new Thread(new ThreadSwap1(new AtomicInteger(0)), "thread-swap" + i).start();
        }
        return "ok";
    }

public class ThreadSwap1 implements Runnable {
    private AtomicInteger integer;

    public ThreadSwap1(AtomicInteger integer) {
        this.integer = integer;
    }

    @Override
    public void run() {
        while (true) {
            integer.addAndGet(1);
            Thread.yield(); // yield the CPU
        }
    }
}

Here I create multiple threads that each perform a simple atomic increment and then yield the CPU, so that in theory the CPU will schedule another thread. We request the interface to create 100 threads: curl localhost:8080/thread/swap?num=100. Once the request succeeds, we execute vmstat 1 10, which prints once per second for 10 samples. The context-switching results are as follows:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
101  0 128000 878384    908 468684    0    0     0     0 4071 8110498 14 86  0  0  0
100  0 128000 878384    908 468684    0    0     0     0 4065 8312463 15 85  0  0  0
100  0 128000 878384    908 468684    0    0     0     0 4107 8207718 14 87  0  0  0
100  0 128000 878384    908 468684    0    0     0     0 4083 8410174 14 86  0  0  0
100  0 128000 878384    908 468684    0    0     0     0 4083 8264377 14 86  0  0  0
100  0 128000 878384    908 468688    0    0     0   108 4182 8346826 14 86  0  0  0

Here we focus on four indicators: r, cs, us, and sy.

r=100: there are 100 processes waiting in the run queue, so threads are queuing for the CPU.

cs=over 8 million: more than 8 million context switches per second, which is an extremely large number.

us=14: user mode gets only 14% of the CPU time to run application logic.

sy=86: kernel mode occupies 86% of the CPU, and it is clearly spending that time on context switching.

Using the top command and top -Hp pid to check the process- and thread-level CPU, we find that the Java process's CPU is maxed out, but each thread's CPU usage is fairly even; no single thread is consuming all the CPU.

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 87093 root      20   0 4194788 299056  13252 S 399.7 16.1  65:34.67 java
 PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 87189 root      20   0 4194788 299056  13252 R  4.7 16.1   0:41.11 java
 87129 root      20   0 4194788 299056  13252 R  4.3 16.1   0:41.14 java
 87130 root      20   0 4194788 299056  13252 R  4.3 16.1   0:40.51 java
 87133 root      20   0 4194788 299056  13252 R  4.3 16.1   0:40.59 java
 87134 root      20   0 4194788 299056  13252 R  4.3 16.1   0:40.95 java

Putting this together: user mode uses only 14% of the CPU while kernel mode uses 86%, so we can basically conclude that context switching among the Java program's threads is causing the performance problem.

We use the pidstat command to look at the thread-switching data inside the Java process. Execute pidstat -p 87093 -w 1 10; the collected data is as follows:

11:04:30 PM   UID       TGID       TID   cswch/s nvcswch/s  Command
11:04:30 PM     0         -     87128      0.00     16.07  |__java
11:04:30 PM     0         -     87129      0.00     15.60  |__java
11:04:30 PM     0         -     87130      0.00     15.54  |__java
11:04:30 PM     0         -     87131      0.00     15.60  |__java
11:04:30 PM     0         -     87132      0.00     15.43  |__java
11:04:30 PM     0         -     87133      0.00     16.02  |__java
11:04:30 PM     0         -     87134      0.00     15.66  |__java
11:04:30 PM     0         -     87135      0.00     15.23  |__java
11:04:30 PM     0         -     87136      0.00     15.33  |__java
11:04:30 PM     0         -     87137      0.00     16.04  |__java

From the data collected above, each Java thread switches context about 15 times per second; under normal circumstances this should be in the single digits or below. Combining this information, we can conclude that too many Java threads were created, causing frequent context switching that hurts overall performance.

So why does the system as a whole perform more than 8 million context switches per second, while an individual thread in the Java process switches only about 15 times?

System context switching falls into three categories:

1. Multitasking: in a multitasking environment, one process is switched off the CPU so that another can run; this is a context switch.

2. Interrupt handling: when a hardware interrupt occurs, the hardware switches context to handle it; this corresponds to the in column in vmstat.

3. User/kernel mode switching: when the operating system crosses between user mode and kernel mode, for example during a system call, a context switch is also required.

Linux maintains a ready queue for each CPU, sorts runnable processes by priority and by how long they have waited for the CPU, and then picks the process that needs the CPU most, i.e. the one with the highest priority and the longest wait. This corresponds to the r column in vmstat.

So, when will the process be scheduled to run on the CPU?

  • When a process finishes and terminates, the CPU it was using is released, and a new process is taken from the ready queue to run.
  • To ensure all processes are scheduled fairly, CPU time is divided into time slices that are allocated to processes in turn; when a process uses up its slice, it is suspended and another process waiting for the CPU runs.
  • When system resources are insufficient, a process cannot run until its resources are available; it is suspended and the system schedules another process to run.
  • When a process voluntarily suspends itself, for example via the sleep function, it is also rescheduled.
  • When a higher-priority process becomes runnable, the current process is suspended so the higher-priority process can run.
  • When a hardware interrupt occurs, the process on the CPU is suspended by the interrupt and the kernel's interrupt service routine runs instead.

Putting this together with our earlier analysis: the ready queue holds about 100 runnable threads while our CPU has only 4 cores, so this alone can drive the context-switch rate very high; add roughly 4,000 interrupts per second plus system calls and the like, and 8 million context switches per second across the whole system is not surprising. The reason thread switching inside the Java process is only about 15 per second is that the threads use Thread.yield() to give up the CPU, but the scheduler may well keep running the same thread, in which case no thread switch actually happens; that is why the per-thread switch counts are not very high.
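
As a rough sketch of the usual remedy (the class and variable names are mine): instead of creating one raw thread per task, cap concurrency with a fixed-size pool close to the number of CPU cores, so far fewer runnable threads compete for the CPU and the run queue stays short:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class PooledSwapDemo {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService workers = Executors.newFixedThreadPool(cores); // at most `cores` busy threads
        for (int i = 0; i < 100; i++) {
            // the same busy task as ThreadSwap1 above, but only `cores` of them run concurrently,
            // so the run queue stays short and context switching stays low
            workers.submit(new ThreadSwap1(new AtomicInteger(0)));
        }
    }
}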

Summary

This article simulated common performance-problem scenarios and analyzed how to locate 100% CPU usage, memory leaks, deadlocks, and frequent thread switching. To analyze such problems we need to do two things well: first, master the underlying principles; second, use good tools. The article also listed the common tools and commands for analyzing these problems, and I hope it helps you solve yours. Of course, a real production environment can be far more complex than a simulated one, but the principles are the same and the symptoms are similar. If we focus on the principles and learn to apply them, I believe even complex online problems can be solved smoothly.

Remember to like, subscribe, and follow so you don't lose this article next time.

Your support is what keeps me creating!!!


Origin blog.csdn.net/weixin_44302240/article/details/123391997