Performance test monitoring indicators and analysis and tuning guide

1. What factors will become the bottleneck of the system?

CPU: Computation-heavy code can occupy the CPU continuously for a long time, so other tasks cannot get CPU time and respond slowly, which shows up as a system performance problem. Typical causes are frequent Full GC and the frequent context switching caused by multi-threading, both of which keep the CPU busy. As a rule of thumb, CPU usage should stay below 75%.

Memory: A Java program's memory is allocated and managed by the JVM, and the JVM heap mainly stores the objects the program creates. Memory is very fast to read and write, but its capacity is limited. When the heap fills up with objects that cannot be reclaimed (for example, because of a memory leak), the program runs out of memory.

Disk I/O: Disks offer far more storage space than memory, but they are much slower to read and write. Even with solid-state drives (SSDs), disk speed still cannot match memory speed.

Network: Bandwidth has a large effect on data transfer. As concurrency increases, the network can easily become the bottleneck.

Exception: Throwing and catching an exception in Java has a cost, because the exception object must be constructed and the handler located. If exceptions are thrown and handled continuously under high concurrency, system performance suffers.
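Part of the cost of an exception is capturing the stack trace when the exception object is constructed. The following is a minimal sketch of one common mitigation, assuming a hypothetical BusinessException that is used for expected, high-frequency failures where the stack trace is not needed. The class name and error code are illustrative and do not come from the article.

```java
// Hypothetical exception type for expected, high-frequency failures
// (for example, validation errors). Not part of the original article.
final class BusinessException extends RuntimeException {
    private final int code;

    BusinessException(int code, String message) {
        // enableSuppression = false, writableStackTrace = false:
        // the stack trace is not captured when this exception is created.
        super(message, null, false, false);
        this.code = code;
    }

    int code() {
        return code;
    }
}

final class StacklessExceptionDemo {
    // Throws and catches the exception, then reports how many
    // stack frames it recorded.
    static int recordedFrames() {
        try {
            throw new BusinessException(4001, "quota exceeded");
        } catch (BusinessException e) {
            return e.getStackTrace().length;
        }
    }

    public static void main(String[] args) {
        System.out.println("frames recorded: " + recordedFrames());
    }
}
```

The trade-off is that such exceptions are harder to debug, so this only suits failures that are part of normal control flow.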

Database: Database operations generally involve disk I/O reading and writing. A large number of database reading and writing operations will lead to disk I/O performance bottlenecks, which in turn will lead to delays in database operations.

Lock contention: In concurrent programs, multiple threads often operate on the same resource, and locks are needed to keep those operations atomic. Lock contention causes context switching, which adds performance overhead. JDK 1.6 added several lock optimizations to reduce this cost: biased locking, spin locks, lightweight locks, lock coarsening, and lock elimination.
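As an illustration of keeping a shared counter atomic, the sketch below increments the same logical counter in two ways: through a synchronized method, and through java.util.concurrent.atomic.LongAdder, which is designed to reduce contention when many threads update one value. The CounterDemo class and the thread counts are assumptions made for the example. Both variants produce the correct total; which one is faster depends on the workload and should be measured.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class CounterDemo {
    private long lockedCount = 0;                 // guarded by "this"
    private final LongAdder adder = new LongAdder();

    synchronized void incrementLocked() { lockedCount++; }
    synchronized long lockedCount() { return lockedCount; }

    void incrementAdder() { adder.increment(); }
    long adderCount() { return adder.sum(); }

    // Starts `threads` workers that each increment both counters
    // `incrementsPerThread` times, then waits for them to finish.
    static CounterDemo run(int threads, int incrementsPerThread) {
        CounterDemo demo = new CounterDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    demo.incrementLocked();
                    demo.incrementAdder();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return demo;
    }

    public static void main(String[] args) {
        CounterDemo demo = run(8, 100_000);
        System.out.println("synchronized: " + demo.lockedCount());
        System.out.println("LongAdder:    " + demo.adderCount());
    }
}
```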

2. What indicators are used to measure the performance of the system?

1. RT (response time)

Database response time: the time spent on database operations.

Server response time: the time spent distributing requests through Nginx plus the time spent executing the server program.

Network response time: the time spent on network transmission and on network hardware parsing the transmitted requests.

Client response time: for ordinary Web and App clients this is negligible, but it can grow if the client performs a large amount of logic processing.

2. TPS (throughput)

Disk throughput: There are two measures. IOPS (input/output operations per second) is the number of I/O requests, usually reads or writes, that the system can handle per unit time. It reflects random read/write performance and matters for applications with frequent random reads and writes, such as small-file storage and mail servers. Data throughput is the amount of data that can be transferred per unit time. It matters for applications that perform large sequential reads and writes of continuous data, such as video editing.

Network throughput: the maximum data rate a device can accept without frame loss during network transmission. It depends not only on bandwidth but also on CPU processing power, the network card, the firewall, and I/O. In general it is determined by the processing power of the network card, the program's internal algorithms, and the bandwidth.
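The two disk measures are linked by the size of each I/O request: data throughput is roughly IOPS multiplied by the request size. The sketch below works through that arithmetic. The IOPS figures and request sizes are made-up illustrative values, not measurements of any device, and 1 MB is taken as 1024 KB.

```java
final class DiskThroughput {
    // Data throughput in MB/s for a given IOPS figure and request size in KB.
    static double megabytesPerSecond(long iops, int requestSizeKb) {
        return iops * (double) requestSizeKb / 1024.0;
    }

    public static void main(String[] args) {
        // Many small random requests: high IOPS, modest data throughput.
        System.out.println(megabytesPerSecond(10_000, 4));   // 39.0625
        // Few large sequential requests: low IOPS, high data throughput.
        System.out.println(megabytesPerSecond(200, 1024));   // 200.0
    }
}
```

This is why a random-access workload is judged by IOPS, while a sequential workload is judged by data throughput.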

3. Resource usage

CPU usage: first get the basic CPU information, including the number of physical CPUs and the number of cores per CPU, and then check usage with commands such as vmstat, mpstat, and top.

Memory usage: free -m, vmstat, top

Disk I/O: iostat, iotop

Network I/O: netstat, ifconfig, tcpstat
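The commands above show the operating system's view. From inside a Java process, a rough complementary view is available through the standard java.lang.management API. The sketch below is one way to read a few of these values; the ResourceSnapshot class name is an assumption, and it is not a replacement for the OS tools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

final class ResourceSnapshot {
    // Number of logical processors available to the JVM.
    static int logicalCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Heap memory currently in use, in bytes.
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // System load average over the last minute.
    // A negative value means the platform does not provide it.
    static double systemLoadAverage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return os.getSystemLoadAverage();
    }

    public static void main(String[] args) {
        System.out.println("logical CPUs:   " + logicalCpus());
        System.out.println("heap used (MB): " + heapUsedBytes() / (1024 * 1024));
        System.out.println("load average:   " + systemLoadAverage());
    }
}
```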

3. Issues to note when performing performance testing

During a performance test, the system appears to get faster and faster: later requests can be several times faster than the first one. This comes from how Java code is compiled and run. A .java file is first compiled into a .class file, and at run time the interpreter translates the .class bytecode into native machine instructions as it executes.

To balance memory use and execution efficiency, the JVM initially runs code through the interpreter. As the code is executed more often, the virtual machine notices that a particular method or block runs especially frequently and marks it as hot spot code.

To run hot code more efficiently, the virtual machine uses the just-in-time (JIT) compiler to compile it at run time into machine code for the local platform and stores the result in memory. Later executions fetch the compiled code directly from memory. This is why the first access is slow and subsequent accesses are several times faster.
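A practical consequence is that measurements should start only after a warm-up phase. The sketch below shows the idea with a hypothetical CPU-bound method: one batch is run and discarded, and only a later batch is reported. The workload, batch sizes, and class name are assumptions, and the timings vary by machine. For serious microbenchmarks a dedicated harness such as JMH is the safer choice.

```java
final class WarmupDemo {
    // Written to after each batch so the JIT cannot discard the work as unused.
    static volatile long blackhole;

    // Hypothetical workload under test.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Runs the workload `iterations` times and returns the elapsed nanoseconds.
    static long timeBatch(int iterations, int n) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return elapsed;
    }

    public static void main(String[] args) {
        long cold = timeBatch(2_000, 1_000);   // includes interpretation and JIT compilation
        timeBatch(20_000, 1_000);              // extra warm-up, result discarded
        long warm = timeBatch(2_000, 1_000);   // the batch that is actually reported
        System.out.println("cold batch (ns): " + cold);
        System.out.println("warm batch (ns): " + warm);
    }
}
```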

Even when every test run processes the same data set, the results differ. Tests are affected by many unstable factors, such as other processes on the machine, network fluctuations, and the JVM being at a different stage of garbage collection in each run. To compensate, run the test several times and average the results. If the average is within a reasonable range and the fluctuation is small, the performance test passes.
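"The fluctuation is small" can be made concrete in several ways. One possible check, sketched below, computes the mean and standard deviation of the per-run results and compares their ratio (the coefficient of variation) against a limit. The 10% limit and the sample values are assumptions chosen for illustration, not a standard.

```java
import java.util.Arrays;

final class RunStats {
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Population standard deviation of the run results.
    static double stdDev(double[] values) {
        double m = mean(values);
        double sumSquares = 0;
        for (double v : values) {
            sumSquares += (v - m) * (v - m);
        }
        return Math.sqrt(sumSquares / values.length);
    }

    // True when the relative spread (stdDev / mean) is within maxCv.
    static boolean isStable(double[] values, double maxCv) {
        return stdDev(values) / mean(values) <= maxCv;
    }

    public static void main(String[] args) {
        double[] avgRtMsPerRun = {100, 110, 90};   // hypothetical results of three runs
        System.out.println("mean:   " + mean(avgRtMsPerRun));
        System.out.println("stable: " + isStable(avgRtMsPerRun, 0.10));
    }
}
```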

4. When locating performance problems, you can use bottom-up strategy analysis and troubleshooting.

After a stress test, we produce a performance test report that includes RT, TPS, TP99, the CPU, memory, and I/O of the server under test, and the GC frequency of the JVM. These indicators reveal performance bottlenecks, which we then analyze bottom-up.
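To make the report indicators concrete, the sketch below derives average RT, TP99, and TPS from a list of recorded request latencies. TP99 is computed here with the nearest-rank method, which is one of several percentile definitions in use. The LatencyReport class and the sample data are assumptions for the example.

```java
import java.util.Arrays;

final class LatencyReport {
    // Nearest-rank percentile: the smallest recorded latency such that
    // at least p percent of all requests were at or below it.
    static long percentile(long[] latenciesMs, int p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double averageMs(long[] latenciesMs) {
        return Arrays.stream(latenciesMs).average().orElse(Double.NaN);
    }

    // Completed requests per second over the test duration.
    static double tps(long completedRequests, double durationSeconds) {
        return completedRequests / durationSeconds;
    }

    public static void main(String[] args) {
        long[] latenciesMs = new long[100];
        for (int i = 0; i < latenciesMs.length; i++) {
            latenciesMs[i] = i + 1;                 // hypothetical samples: 1..100 ms
        }
        System.out.println("avg RT (ms): " + averageMs(latenciesMs));
        System.out.println("TP99 (ms):   " + percentile(latenciesMs, 99));
        System.out.println("TPS:         " + tps(6_000, 60.0));
    }
}
```

TP99 is reported alongside the average because a small share of very slow requests barely moves the average but shows up clearly in the tail.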

1. First, from the operating system level, check whether the system's CPU, memory, I/O, and network usage are abnormal, then use commands to find abnormal logs, and finally find the cause of the bottleneck through log analysis.

2. You can also check the JVM's garbage collection frequency and memory allocation from the JVM level of the Java application to see if there are any abnormalities, analyze the garbage collection logs, and find the cause of the bottleneck.

3. If nothing is abnormal at the system and JVM levels, check the business layer of the application service for performance bottlenecks, such as Java programming issues or database read/write bottlenecks.
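For the JVM-level check in step 2, garbage collection counts and accumulated collection time can be read from inside the process through the standard GarbageCollectorMXBean interface. The sketch below is one way to take such a snapshot; the GcSnapshot class name is an assumption. Detailed analysis normally relies on GC logs, which are enabled with JVM options such as -Xlog:gc* on JDK 9 and later.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcSnapshot {
    // Total number of collections across all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 when undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds.
    static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();     // -1 when undefined
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```

Sampling these values before and after a test run gives the GC frequency for that run.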

5. When optimizing performance issues, you can use top-down strategies for optimization.

The overall tuning sequence can be from business tuning to programming tuning, and finally to system tuning.

1. Application layer tuning

The first is code optimization. Code problems usually surface as excessive consumption of system resources. For example, code that leaks memory exhausts the JVM heap and triggers frequent Full GC, which drives CPU usage up.

The second is design optimization, mainly of the business layer and middleware layer code. For example, in a scenario where objects are created on every call, you can use the proxy pattern to share one already-created object and avoid the cost of creating it repeatedly.

The third is algorithm optimization: choose an appropriate algorithm to reduce time complexity.
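One possible reading of the proxy example is sketched below: callers create lightweight proxy objects freely, while the expensive object behind them is created once and shared. PriceService, RealPriceService, and PriceServiceProxy are hypothetical names invented for this illustration, and the pricing logic is a placeholder.

```java
import java.util.concurrent.atomic.AtomicInteger;

interface PriceService {
    int priceOf(String sku);
}

// Stands in for an object that is expensive to construct.
final class RealPriceService implements PriceService {
    static final AtomicInteger CREATED = new AtomicInteger();

    RealPriceService() {
        CREATED.incrementAndGet();   // counts how often the expensive object is built
    }

    @Override
    public int priceOf(String sku) {
        return sku.length() * 100;   // placeholder logic
    }
}

// Cheap to create; every proxy delegates to the same shared real object.
final class PriceServiceProxy implements PriceService {
    // Holder idiom: the shared instance is created lazily, exactly once,
    // when Holder is first used, and class initialization is thread-safe.
    private static final class Holder {
        static final PriceService SHARED = new RealPriceService();
    }

    @Override
    public int priceOf(String sku) {
        return Holder.SHARED.priceOf(sku);
    }
}

final class ProxyDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            new PriceServiceProxy().priceOf("abc");
        }
        System.out.println("real objects created: " + RealPriceService.CREATED.get());
    }
}
```

Sharing only works safely when the shared object is stateless or thread-safe, as the placeholder is here.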
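As a small example of reducing time complexity, the sketch below counts how many queried IDs exist in a collection, first by scanning a list for every query and then by looking each query up in a HashSet. With n stored IDs and m queries, the first version does O(n * m) work and the second roughly O(n + m). The LookupDemo class and the data are assumptions for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LookupDemo {
    // Scans the whole list for each query: O(n * m).
    static int countHitsWithList(List<Integer> ids, List<Integer> queries) {
        int hits = 0;
        for (Integer query : queries) {
            if (ids.contains(query)) {
                hits++;
            }
        }
        return hits;
    }

    // Builds a hash set once, then does a constant-time lookup
    // per query on average: O(n + m).
    static int countHitsWithSet(List<Integer> ids, List<Integer> queries) {
        Set<Integer> idSet = new HashSet<>(ids);
        int hits = 0;
        for (Integer query : queries) {
            if (idSet.contains(query)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4);
        List<Integer> queries = Arrays.asList(2, 4, 6);
        System.out.println(countHitsWithList(ids, queries));   // 2
        System.out.println(countHitsWithSet(ids, queries));    // 2
    }
}
```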

2. Middleware tuning: MySQL tuning

1) Table structure and index optimization

This covers database design, table structure design, and index settings. When designing the table structure, consider the database's ability to scale horizontally and vertically, plan ahead for growth in data volume and read/write volume, and plan how to split databases and tables. Choose appropriate data types for fields, preferring smaller ones.

2) SQL statement optimization

This covers the SQL statements themselves. Use EXPLAIN to view the execution plan and see whether an index is used and which one. You can also use the Profile command to analyze the time spent at each step of statement execution.

3) MySQL parameter optimization

This covers the configuration of the MySQL service, such as managing the number of connections and sizing the various caches (index cache, query cache, sort cache, and so on).

4) Hardware and system configuration

This covers hardware devices and operating system settings, such as adjusting operating system parameters, disabling swap, adding memory, and upgrading to solid-state drives.

3. System tuning

The first is operating system tuning. Linux kernel parameters can be tuned to get higher performance.

The second is JVM tuning: set a reasonable JVM memory size and garbage collection algorithm to improve performance. For example, if the business logic creates large objects, the JVM can be configured to allocate them directly in the old generation. This reduces the frequency of Young GC in the young generation and lowers CPU usage.

4. Tuning strategies

The first is to trade time for space. Sometimes the system does not need fast queries but has strict storage requirements. In that case we can accept slower access in exchange for using less space.

The second is to trade space for time: use more storage to improve access speed. A typical example is splitting MySQL databases and tables. When a single MySQL table holds more than tens of millions of rows, read and write performance declines. Splitting the data keeps each table small, so queries run faster.
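A minimal sketch of how rows might be routed once a table has been split, assuming a hypothetical orders table divided into a fixed number of tables by user ID. The table name, the shard count, and the simple modulo rule are all illustrative. Real deployments often use sharding middleware and have to plan for changing the shard count later.

```java
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // Maps a user ID to a shard index in [0, shardCount).
    // floorMod keeps the result non-negative even for negative IDs.
    int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) shardCount);
    }

    // Name of the physical table that holds this user's rows.
    String tableFor(long userId) {
        return "orders_" + shardFor(userId);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        System.out.println(router.tableFor(10));   // orders_2
        System.out.println(router.tableFor(7));    // orders_3
    }
}
```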

5. Fallback strategy

Even after tuning, performance problems can still occur, so we need a fallback strategy. The first is rate limiting: set a maximum access limit at the system entrance and apply circuit breaking, so that requests the system cannot serve are returned as failures right away. The second is horizontal scaling: when traffic exceeds a certain threshold, the system automatically adds service instances.
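The access limit can take several forms. The sketch below shows one simple variant, a cap on the number of requests processed at the same time, built on java.util.concurrent.Semaphore. Requests over the cap get a fallback result immediately instead of queuing. The ConcurrencyLimiter class is an assumption for the example, and it is neither a rate-per-second limiter nor a full circuit breaker, both of which are usually taken from an existing library.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the action if a permit is free; otherwise returns the
    // fallback at once without waiting.
    <T> T call(Supplier<T> action, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return action.get();
        } finally {
            permits.release();   // always give the permit back
        }
    }

    public static void main(String[] args) {
        ConcurrencyLimiter limiter = new ConcurrencyLimiter(1);
        // The inner call arrives while the outer one still holds the only permit.
        String nested = limiter.call(() -> limiter.call(() -> "served", "busy"), "busy");
        System.out.println(nested);                                  // busy
        System.out.println(limiter.call(() -> "served", "busy"));   // served
    }
}
```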


Origin blog.csdn.net/YLF123456789000/article/details/133273735