Problems with stress testing

(What) What is a stress test

Software stress testing is a fundamental quality assurance activity that is part of every major software testing effort. The basic idea of software stress testing is simple: instead of running manual or automated tests under normal conditions, run tests under conditions with a small number of computers or low system resources. Resources that are typically software stress tested include internal memory, CPU availability, disk space, and network bandwidth.

Stress testing covers performance testing, load testing, concurrency testing, etc. These test points are often intertwined and coupled together.

What's wrong with stress testing

Let me summarize the most common problems:

The operating system is left at its default installation, and the stress test is run without any tuning
The impact of disk IO on the software is not considered
The impact of network bandwidth on the software is not considered
When testing network software, the characteristics of TCP are not taken into account
Timeout parameters are not tuned
The test client is not optimized
Concurrency is misunderstood
The web server, database, and other servers are not optimized

If these items are not addressed, the stress test data has little reference value, and leaving even one of them untuned will skew your results. Let me explain them one by one:

Operating system: An operating system is general-purpose software. Its factory defaults are tuned for the general public, not for any particular field, so the first step is to tune the operating system itself: kernel parameters on Linux, the registry on Windows, and so on.
Disk IO: This is where a bottleneck is most likely to occur. Often the disk is overwhelmed long before the CPU reaches its limit.
Network IO: The same applies as for disk IO.
TCP: Almost all B/S and C/S software that serves TCP connections uses multi-threading or multi-process techniques, usually with the number of threads scaling automatically. When the process starts, only a small number of threads are started. As connections increase, the number of threads gradually grows, and as threads finish, it gradually shrinks. This design uses hardware more efficiently, ceding resources to other processes while the program is idle; few programs are designed to monopolize resources as soon as the service starts. As a result, a stress test should not fire a large number of concurrent requests all at once but should ramp up gradually. Otherwise some of the concurrent requests in the first run will not be answered in time, which skews the test data. You can also run the stress test several times (to let the threads spin up), record data from the third run onward, and ignore the first two.
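
The ramp-up and warm-up advice above can be sketched in a few lines of Python. This is only an illustration: the helper names ramp_schedule and steady_state_mean, the four ramp steps, and the sample throughput figures are all assumptions made for the example, not part of any particular test tool.

```python
from statistics import mean

def ramp_schedule(target, steps):
    """Return non-decreasing concurrency levels that end exactly at `target`."""
    if target < 1 or steps < 1:
        raise ValueError("target and steps must be positive")
    steps = min(steps, target)  # never more steps than connections
    return [max(1, round(target * i / steps)) for i in range(1, steps + 1)]

def steady_state_mean(round_results, warmup_rounds=2):
    """Average the per-round results after dropping the first warm-up rounds."""
    kept = round_results[warmup_rounds:]
    if not kept:
        raise ValueError("need more rounds than warm-up rounds")
    return mean(kept)

# Ramp up to 4000 concurrent connections in 4 steps (illustrative numbers).
print(ramp_schedule(4000, 4))

# Made-up requests-per-second figures for five runs; the first two are discarded.
print(steady_state_mean([310.0, 620.0, 900.0, 910.0, 890.0]))
```

Each level in the schedule would be handed to whatever load generator you use; only the rounds after the warm-up are averaged.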

Tip: Another issue is TCP connection multiplexing (connection reuse), which is also an important configuration item. If it is not configured, the test data will also be biased.
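
To see why connection reuse changes what you measure, the following sketch counts how many TCP connections a server accepts for the same five requests, with and without reuse. It is a minimal, self-contained illustration using Python's standard library on the loopback interface; CountingHandler, accepted, and count_connections are names invented for this example, and the toy server stands in for the real system under test.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the connection open by default

    def setup(self):
        super().setup()
        # One handler instance is created per accepted TCP connection.
        self.server.accepted.append(self.client_address)

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

def count_connections(reuse, requests=5):
    """Send `requests` GETs and return how many TCP connections the server saw."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    server.accepted = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = http.client.HTTPConnection(host, port, timeout=5)
        for _ in range(requests):
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
            if not reuse:
                conn.close()
                conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(server.accepted)

print("with reuse:", count_connections(reuse=True))
print("without reuse:", count_connections(reuse=False))
```

Without reuse, every request pays for a new TCP handshake, so the same request count puts a different kind of load on the server.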

Timeout parameters: Timeouts are very important in stress testing. For example, suppose the connection timeout from the web tier to the database is 60 seconds. If an SQL query runs for more than 300 seconds, subsequent requests keep waiting in the queue, and once the number of connections reaches the database's maximum, all subsequent requests fail. Our web server timeout usually does not exceed 30 seconds, and sometimes I set it to 10 seconds. Once a timeout occurs, it is better to let that connection time out than to let it affect the overall service.
Client: Much network software has to be driven by stress test requests sent from a client, so the client must be optimized too. Otherwise the client cannot generate enough pressure, and the server never comes under real load.
Concurrency: Many people think concurrency is the maximum number of simultaneous connections, which is wrong. If you write multithreaded programs, you will find that threads take turns: they run sequentially from a queue, not truly at the same time. So concurrency refers to the total number of connections that can be completed in a given unit of time, for example per second or per minute; we usually use seconds. The operating systems we use today are time-sharing systems, which is what makes multi-user, multi-tasking operation possible. The operating system queues processes (by priority) and runs them round robin, but it switches so quickly that multiple processes appear to run at the same time.
Server optimization: In a typical B/S software stress test, the web, cache, database, and other servers each need to be tuned to their best state.
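
The definition of concurrency given above, connections completed per unit of time, and the role of a timeout can be illustrated with a small measurement harness. This is a sketch under stated assumptions: fake_request is a hypothetical stand-in that sleeps instead of doing network I/O, and the timeout is applied after the fact to classify slow requests, whereas a real client would pass the timeout to its socket or HTTP library.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(delay):
    """Hypothetical stand-in for one client request: sleeps instead of doing I/O."""
    time.sleep(delay)

def timed(request, arg):
    """Run one request and return how long it took, in seconds."""
    start = time.perf_counter()
    request(arg)
    return time.perf_counter() - start

def measure(request, args, workers, timeout):
    """Run all requests on `workers` threads and summarize the round."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        durations = list(pool.map(lambda arg: timed(request, arg), args))
    wall = time.perf_counter() - wall_start
    completed = sum(1 for d in durations if d <= timeout)
    return {
        "completed": completed,                   # finished within the timeout
        "timed_out": len(durations) - completed,  # counted as failures
        "per_second": completed / wall,           # completed requests per second
    }

# Eight fast requests and one slow one that exceeds the 0.15 s timeout.
print(measure(fake_request, [0.01] * 8 + [0.3], workers=4, timeout=0.15))
```

The figure to report is per_second, not the number of worker threads: four workers here say nothing about how many requests actually complete each second.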

(Why) Why do stress tests

If these problems are taken into account at the software design stage and strictly enforced during development, then much software hardly needs this tiring stress test at all.

Therefore, flexibility, scalability, reliability and performance must be considered in the software design stage, as well as high availability and load balancing.

At the same time, software optimization should accompany development, continuous integration, continuous testing, and continuous deployment.

(Where) Where to do the stress test

Some software requires testing in a closed environment and cannot be tested in an environment with shared resources. You therefore need VLAN isolation, or even separate routers and switches, to test in a closed network.

(When) When to do the stress test

Stress testing can be done at any time, so why do I single out "time"? Because of the earth's rotation, leap seconds are occasionally inserted, but you don't need to consider this issue.

(Who) People involved in the stress testing process

Operation and maintenance department
Development department
Testing department

(How) How to do a stress test

Below we give some examples to describe the stress testing method. Due to the limited space, it is impossible to cover everything. I just provide you with ideas.

Before testing, you need some monitoring tools to monitor server resource changes.
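
As a rough idea of what such monitoring collects, here is a minimal sketch that samples the load average and disk usage with Python's standard library only. It is a stand-in for a real monitoring tool, not a replacement: sample_resources and monitor are names invented for this example, os.getloadavg is available on Unix-like systems only, and memory and network counters are left out.

```python
import os
import shutil
import time

def sample_resources(path="/"):
    """Take one snapshot of the load average and disk usage (Unix only)."""
    load_1m, _load_5m, _load_15m = os.getloadavg()
    disk = shutil.disk_usage(path)
    return {
        "time": time.time(),
        "load_1m": load_1m,
        "disk_used_pct": 100.0 * disk.used / disk.total,
    }

def monitor(samples, interval, path="/"):
    """Collect `samples` snapshots, sleeping `interval` seconds between them."""
    snapshots = []
    for index in range(samples):
        snapshots.append(sample_resources(path))
        if index < samples - 1:
            time.sleep(interval)
    return snapshots

for snapshot in monitor(samples=3, interval=0.5):
    print(snapshot)
```

Run it on the server alongside the stress test so that each test round can be lined up with the resource readings taken at the same time.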

For example, take a web server stress test in which the server under test is nginx:

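    worker_processes  8;            # number of processors (one worker process each)
    worker_rlimit_nofile 65530;     # maximum number of open files allowed
    events {
        worker_connections  4096;   # maximum number of connections
    }
    http {
        keepalive_timeout  65;      # keep connections open for reuse
        gzip  on;                   # compress transferred data
    }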

How should you test it? Are you after maximum performance, or relative performance? We usually want relative performance that meets our needs rather than maximum performance. Why? Because maximizing performance requires many configuration sacrifices, such as turning off logging, disabling access-time updates, and so on.

According to the above configuration, your test case should send 8,000 to 10,000 requests in total at a concurrency of about 4,000. You cannot test a concurrency of 8,000 with only 4,000 requests. This is a mistake many people make, so the tester needs to relate the test parameters to the system's configuration and cannot simply pick numbers blindly.
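
One way to tie a test plan to the configuration is a small sanity check run before the test. The sketch below assumes the plan is expressed as a total request count and a concurrency level (the two numbers a tool such as ab takes with -n and -c), and it treats worker_connections as the ceiling, as the text does; nginx applies that limit per worker process, so the real ceiling may be higher. The function check_plan and its messages are hypothetical.

```python
def check_plan(total_requests, concurrency, worker_connections):
    """Return a list of problems with a test plan; an empty list means it looks sane."""
    problems = []
    if total_requests < 1 or concurrency < 1:
        problems.append("total requests and concurrency must be positive")
    if concurrency > total_requests:
        # You cannot keep N connections busy with fewer than N requests.
        problems.append("concurrency exceeds total requests")
    if concurrency > worker_connections:
        # Assumption: worker_connections is the ceiling to test against.
        problems.append("concurrency exceeds worker_connections")
    return problems

# 10,000 requests at a concurrency of 4,000 against worker_connections 4096.
print(check_plan(10000, 4000, 4096))

# 4,000 requests at a concurrency of 8,000: the mistake described in the text.
print(check_plan(4000, 8000, 4096))
```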

As I said above, threads are opened gradually as requests arrive, so the data from the first run is inaccurate; you can watch the number of threads with the pstree command. After the third run, the number of threads has gradually increased to 4096 and the previously opened TCP connections can be reused. At this point the test results are more convincing.
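
If you would rather record the thread count from a script than read it off pstree, on Linux the same number is exposed in the Threads: field of /proc/<pid>/status. The following is a minimal sketch, assuming a Linux system; parse_thread_count and thread_count are names invented for this example.

```python
import re

def parse_thread_count(status_text):
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' line found")
    return int(match.group(1))

def thread_count(pid="self"):
    """Read the current thread count of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_thread_count(handle.read())

# Parsing a shortened, made-up sample of the status file.
print(parse_thread_count("Name:\tdemo\nThreads:\t4\n"))
```

Calling thread_count with the server's process ID between test rounds shows whether the count has stopped growing, which is the point at which the results become comparable.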