Performance testing is not as difficult as you think; after reading this article you will understand why

01 Common concepts


Throughput (TPS, QPS)

        Simply put, it is the number of transactions or queries completed per second. A larger throughput means the system can process more requests per unit of time, so a higher TPS is usually desirable.

Response time

        That is, the time from sending a request to receiving the system's response. A plain average is generally not used; instead, the average is taken after removing unstable values. For example, the commonly used 90% response time is the average of the most stable 90% of samples after discarding the slowest 10%. From a clustering point of view, this amounts to removing outliers.
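        To make the calculation above concrete, here is a minimal Java sketch (the sample data is made up for illustration) that computes the 90% response time as described: sort the samples, drop the slowest 10%, and average the rest.

```java
import java.util.Arrays;

public class ResponseTimeStats {

    // "90% response time" as described above: discard the slowest 10% of
    // samples and average the remaining, stable 90%. Times are in milliseconds.
    static double percentile90Average(long[] responseTimesMs) {
        long[] sorted = responseTimesMs.clone();
        Arrays.sort(sorted);                                  // fastest first
        int keep = (int) Math.floor(sorted.length * 0.9);     // keep the stable 90%
        long sum = 0;
        for (int i = 0; i < keep; i++) {
            sum += sorted[i];
        }
        return keep == 0 ? 0 : (double) sum / keep;
    }

    public static void main(String[] args) {
        long[] samples = {80, 85, 90, 78, 82, 88, 95, 84, 81, 900}; // one slow outlier
        System.out.printf("90%% response time: %.1f ms%n", percentile90Average(samples));
    }
}
```

        With the 900 ms outlier discarded, the result is about 84.8 ms, which matches the idea of removing outliers before averaging.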

Error rate

        That is, the ratio of failed requests to total requests. As the pressure increases, the system may start failing to process some requests, and the number of errors keeps growing.

        The three are highly correlated, and no single number in isolation explains anything. A typical relationship is that as throughput increases, response latency is likely to increase, and the error rate is likely to increase as well. Therefore, quoting a TPS of 100,000 on its own says very little.

Ideas for performance tuning

        In general, tuning has one prerequisite: whether you use real online traffic or an offline stress test, the problem must first be magnified until it becomes obvious.

        Based on these relatively obvious symptoms, make a preliminary judgment about the problem, collect evidence to verify that judgment, then analyze the cause of the phenomenon and try to solve it.

02 Performance test

        For a new system, or a system that has undergone major code changes, a thorough test is still necessary. Generally speaking, the expected test is a single-machine stress test. Stress testing can tell you what the system's limit TPS is, whether any errors or problems are exposed as the pressure rises, what the system's overall resource usage looks like, and where its likely performance bottlenecks are.

The configuration and results of a thorough test are as follows.

        This is the result of stress testing 10 machines with 12,000 concurrent users. It can be seen that the TPS is over 70,000, the average response time is 82 ms, and the error rate is 2.5%.

        What else can be learned from the figure? First, TPS dropped rapidly in the later stage; the system could in fact no longer support such a large amount of concurrency, that is, it entered the collapse zone. There are several possibilities here:

        One is that the system simply cannot bear such a large amount of concurrency

        The second is that there is a problem in the middle of the system that causes the TPS to drop.

        Second, as time goes by, the error rate increases significantly, indicating that the system can no longer handle so many requests. Combining the previous two points with the relatively stable average response time, it can be roughly inferred that the system cannot bear such a large concurrency. In addition, since there are 10 machines, the TPS of a single machine is about 7,000, which can serve as a baseline for future tuning.

        The characteristics of the application should also be analyzed at this point, that is, the resources the application is likely to consume: for example, whether it is a CPU-intensive or an IO-intensive application (the latter can be further subdivided into disk-intensive or network-intensive).


03 Define the goal of performance optimization

        I often hear people say that for performance optimization, the higher the throughput the better; or that the target TPS of a performance test is 50,000. With only this information, can you actually run a performance test? Is such a goal clear enough?

        In fact, in my opinion, performance testing without clearly defined goals is just fooling around.

        The goal of performance optimization is generally stated as a target throughput, a target 90% response time, and a target error rate. At the same time, other indicators also need attention, such as CPU usage, memory usage, disk usage, and bandwidth usage. For problems already found in the preliminary test, the optimization can target that specific problem. For example, if the load is high and CPU consumption is excessive, the goal may be to reduce the CPU load while keeping TPS, response time, and error rate unchanged. Or, if memory grows too fast and GC is frequent, the goal may be to find possible memory leaks, or to perform the related JVM memory tuning. In short, the goal can be adjusted flexibly, but it must be clear.

04 Analysis

        The analysis process is quite flexible; essentially, every system behaves differently, and it is impossible to cover everything here. I will just describe some common methods, tools, and ideas.

01 for CPU

        For CPU monitoring, Linux already provides two useful tools: top and vmstat. I will not go into the details of these two commands. Regarding the CPU, we mainly focus on four values: us (user), sy (system), wa (wait), and id (idle). In theory they should add up to 100%. If any of the first three values is too high, it may indicate a problem.

us is too high:

Code problems

        For example, a time-consuming loop without a sleep, or CPU-intensive computation that is handled poorly (such as XML parsing, encryption and decryption, compression and decompression, or heavy data calculation).

Frequent GC

        A problem that is easy to miss is that us tends to be high when GC is frequent, because garbage collection involves a large amount of computation. High CPU caused by frequent GC is usually accompanied by large fluctuations in memory usage, so it is better to diagnose and solve this kind of problem from the memory side.

Tips: How to locate the thread with too high us and view its status.

        a. Use top to find the pid of the process whose us is too high

        b. top -Hp pid to find the busiest thread tid

        c. printf '%x' tid to convert the tid to hexadecimal (tid16)

        d. jstack pid | grep -C 20 tid16 to find the thread's stack
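        As a complement to the command-line steps above, the same information can be pulled from inside the JVM with the standard ThreadMXBean. This is only a rough sketch and assumes the JVM supports per-thread CPU time measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class HotThreadDump {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadCpuTimeSupported()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        // Print CPU time, state, and stack for every live thread; the threads
        // with the largest CPU time are the same ones top -Hp and jstack point to.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            long cpuNanos = mx.getThreadCpuTime(info.getThreadId());
            System.out.printf("%-40s cpu=%dms state=%s%n",
                    info.getThreadName(), cpuNanos / 1_000_000, info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```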

sy too high:

Too many context switches. Usually this means the system has a large number of threads and switches between them frequently. Since preemption by the scheduler is generally reasonable in both frequency and cost, a high sy usually comes from threads actively giving up the CPU, for example through sleep, lock waits, or I/O waits.

wa is too high:

The proportion of CPU time spent waiting for I/O is large.

Note the difference from the previous case. High sy caused by I/O waits refers to threads constantly waiting on I/O and then being woken up; because this happens so many times, it produces many context switches, emphasizing a dynamic process. High wa caused by I/O waits means that threads waiting on I/O take up a large proportion of time: whichever thread the CPU switches to is waiting on I/O, so overall the CPU spends a high proportion of its time waiting.

id too high:

Many people think a high id is good. In fact, in a performance test a high id indicates that resources are not fully utilized, or that the test did not apply enough pressure, which is not a good thing.

02 for memory

        Regarding the memory of Java applications, usually only the JVM memory needs attention, but in some special cases physical memory also matters. For JVM memory, common tools include jstat, jmap, pidstat, vmstat, and top.

JVM memory:

Usually a GC occurs because some region runs out of space, which triggers the collection. Many cases of abnormal GC come from holding unnecessary references that are not released in time. For example, caches are easy to get wrong and can cause memory leaks that lead to abnormal GC, as in the sketch below.
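A minimal sketch of the kind of cache mistake described above (the class and field names are hypothetical, not from any real project): an unbounded static map that keeps every entry alive forever, so the old generation fills up and GC runs more and more often.

```java
import java.util.HashMap;
import java.util.Map;

public class UserProfileCache {                            // hypothetical example class

    // Grows without bound: entries are added on every miss but never evicted,
    // so these references keep piling up in the old generation until GC
    // becomes frequent or an OOM occurs.
    private static final Map<String, byte[]> CACHE = new HashMap<>();

    public static byte[] load(String userId) {
        return CACHE.computeIfAbsent(userId, id -> new byte[64 * 1024]); // fake profile data
    }
}
```

Bounding the cache with a size limit, an expiry policy, or soft/weak references keeps these references from pinning memory.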

It is also possible that the program's behavior is normal, but abnormal GC is caused by inappropriate GC parameters. In this case it is usually necessary to tune the GC parameters or the sizes of the heap generations.

What triggers a Full GC:

The permanent generation (perm) is full

The old generation is full

The average size of objects promoted by minor GCs is greater than the remaining space in the old generation

Promotion failed or concurrent mode failure in CMS GC

OOM:

OOM is often accompanied by abnormal GC. The reason it is listed separately is that it is more harmful. At worst, abnormal GC means collections run too often or fail to reclaim memory, but there is at least some buffer time; once an OOM occurs, the problem is serious.

In the heap area: too many objects are created, too many stale references are held (leaks), or the allocated heap memory is insufficient. Use jmap to look at the distribution of objects in memory, and use ps to find the corresponding process and its initial memory configuration.

In the stack area: incorrect recursive calls.

In the perm area: too many classes are loaded at startup and the allocated memory is insufficient.

In the off-heap memory area: allocated ByteBuffers are not released, as sketched below.
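As an illustration of the last item, here is a hedged sketch of how retained direct ByteBuffers exhaust off-heap memory; run with a small -XX:MaxDirectMemorySize and it fails quickly with "OutOfMemoryError: Direct buffer memory".

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectBufferLeak {
    public static void main(String[] args) {
        List<ByteBuffer> retained = new ArrayList<>();
        while (true) {
            // Each allocation consumes native (off-heap) memory; holding the
            // reference prevents the buffer from ever being reclaimed, so this
            // loop eventually throws OutOfMemoryError: Direct buffer memory.
            retained.add(ByteBuffer.allocateDirect(1024 * 1024));
        }
    }
}
```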

03 for I/O

        Useful tools for network I/O include sar and netstat, two very powerful commands that can help troubleshoot many problems. Tools for file I/O include pidstat and iostat.

1. File I/O:

Technically speaking, the measure that can be taken for heavy file I/O is asynchronous batch processing: buffer writes asynchronously to smooth out peaks, then flush them in batches. Batching makes disk access more sequential and therefore faster. A sketch of this idea follows.
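A minimal sketch of the batching idea (queue size, batch size, and file name are illustrative, not from the original article): producers only enqueue, and a single background thread drains the queue and writes each batch with one flush.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncBatchWriter {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

    // Producers never touch the disk; they just enqueue (and block if the buffer is full).
    public void write(String line) throws InterruptedException {
        queue.put(line);
    }

    public void start() {
        Thread flusher = new Thread(() -> {
            List<String> batch = new ArrayList<>();
            try (BufferedWriter out = Files.newBufferedWriter(
                    Paths.get("app.log"), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                while (true) {
                    batch.clear();
                    batch.add(queue.take());            // block until at least one line arrives
                    queue.drainTo(batch, 999);          // then grab up to 1000 lines per flush
                    for (String line : batch) {
                        out.write(line);
                        out.newLine();
                    }
                    out.flush();                        // one mostly sequential write per batch
                }
            } catch (IOException | InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "batch-flusher");
        flusher.setDaemon(true);
        flusher.start();
    }
}
```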

2. Network I/O: network I/O problems are more complicated; here are just a few common ones.

Lots of TIME_WAIT.

According to the TCP protocol, the party that actively closes the connection sends its close first, then receives the close from the passive side, changes its state to TIME_WAIT, and waits 2MSL so that its final acknowledgement can reach the other party. If a large amount of TIME_WAIT is found on a server, it means the server is actively closing connections. When would a server actively close connections? Most likely the client forgot to close them, so a typical case is JDBC connections that are never closed, which can leave the database server with a large number of connections in TIME_WAIT.
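The fix for the JDBC case above is simply to make sure the connection is always closed (or returned to a pool). Here is a hedged sketch using try-with-resources; the URL, credentials, and table are made up for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OrderDao {                                     // hypothetical DAO class
    public int countOrders() throws SQLException {
        String url = "jdbc:mysql://db-host:3306/shop";      // illustrative connection URL
        // try-with-resources guarantees the connection is closed even on errors,
        // so the database server does not have to time the connection out and
        // close it itself (which is what leaves TIME_WAIT piling up there).
        try (Connection conn = DriverManager.getConnection(url, "app", "secret");
             PreparedStatement ps = conn.prepareStatement("SELECT COUNT(*) FROM orders");
             ResultSet rs = ps.executeQuery()) {
            return rs.next() ? rs.getInt(1) : 0;
        }
    }
}
```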

Lots of CLOSE_WAIT.

After the passive side receives the close from the party that actively closes the connection, it enters the CLOSE_WAIT state. If it then hangs and never performs its own close, a large number of CLOSE_WAIT connections appear. Under what circumstances will it hang? Continuing the example of the forgotten database connections: on the application server, a large number of browser requests come in, but because the connection pool has no free connections, the handling threads block. After waiting a while, the browsers time out and send requests to close the connection, but since the servlet threads on the application server are still blocked, the second half of the close never happens, so the application server accumulates a large number of CLOSE_WAIT connections.

Another example is a pitfall of HttpClient: it does not close the underlying input stream until the entity returned by response.getEntity() is consumed; if the code returns before consuming the entity, the connection is never released, which is a disaster. A safer pattern is sketched below.
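A hedged sketch of the safer pattern with Apache HttpClient 4.x: always consume the entity and close the response before returning, even on error paths, so the connection goes back to the pool.

```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpFetcher {
    public static String fetch(String url) throws Exception {
        CloseableHttpClient client = HttpClients.createDefault();
        try (CloseableHttpResponse response = client.execute(new HttpGet(url))) {
            // EntityUtils.toString fully consumes the entity's input stream,
            // which releases the underlying connection back to the pool.
            return EntityUtils.toString(response.getEntity());
        } finally {
            client.close();
        }
    }
}
```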
————————————————
Copyright statement: This article is an original article by the CSDN blogger "Rejoice in the Testing World", licensed under CC 4.0 BY-SA. Please include the original source link and this statement when reprinting.
Original link: https://blog.csdn.net/m0_67695717/article/details/128865876
