What is stress testing?

Stress testing is a testing method that evaluates the performance, stability, and reliability of the system under test by simulating the concurrency, traffic, or load of real users in different scenarios. It helps us understand the carrying capacity of the system, identify performance bottlenecks and potential problems, and gather data to support system optimization.
Stress testing is valuable for the following reasons:
- Improve system performance: Stress testing exposes performance bottlenecks, so the architecture, code, and database can be optimized to improve performance and response speed.
- Guarantee system stability: Stress testing simulates highly concurrent requests to find the system's capacity limits and peak values, which helps avoid downtime and crashes caused by excessive load.
- Ensure system reliability: Stress testing simulates user behavior in various scenarios and measures system behavior under different loads, which helps keep the system reliable and improves the user experience.
- Reduce wasted cost: Problems found during stress testing can be fixed before they reach the production environment, which reduces the cost of failures.
In short, stress testing is an important testing method that helps us understand system performance and stability, improve user experience and satisfaction, and reduce failure rates and costs.

Requirements

A file lists several interfaces, and each of them needs to be stress tested. The concurrency and the stress test duration must be configurable.
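
One way to meet this requirement is to read the interface URLs from a text file and take the concurrency and duration from command-line flags. The sketch below is only an illustration: the file format (one URL per line, with # for comments) and the flag names -c and -d are assumptions, not part of the original tool.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// options holds the two configurable values named in the requirement.
type options struct {
	Concurrency int // number of simulated users
	Minutes     int // stress test duration in minutes
}

// parseOptions reads the hypothetical -c and -d flags from args.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("stress", flag.ContinueOnError)
	var o options
	fs.IntVar(&o.Concurrency, "c", 100, "number of concurrent users")
	fs.IntVar(&o.Minutes, "d", 10, "stress test duration in minutes")
	err := fs.Parse(args)
	return o, err
}

// parseURLs extracts one URL per line from text (for example the contents
// returned by os.ReadFile), skipping blank lines and lines starting with '#'.
func parseURLs(text string) []string {
	var urls []string
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		urls = append(urls, line)
	}
	return urls
}

func main() {
	o, err := parseOptions([]string{"-c", "10", "-d", "1"})
	if err != nil {
		fmt.Println("bad flags:", err)
		return
	}
	urls := parseURLs("# interfaces under test\nhttp://localhost:8080/a\n\nhttp://localhost:8080/b\n")
	fmt.Println(o.Concurrency, o.Minutes, urls)
}
```

Passing the arguments as a slice, rather than reading os.Args directly, keeps the parsing easy to exercise without starting a real test run.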

Scenario simulation

Suppose the concurrency is 100 and the stress test duration is 10 minutes.
Imagine a scenario in which 100 people each call an interface and keep sending requests for 10 minutes.
1. In reality, some people operate faster and some slower, so one more parameter is needed: the request frequency. A frequency of 5 requests per second means one request every 200 ms.
2. The corresponding scenario is then 100 people sending requests continuously, each at 5 requests per second.
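
The arithmetic behind this scenario can be sketched in a few lines. The helper names intervalFor and maxRequests are hypothetical, and the request count is only an upper bound that assumes every user manages to send on every tick.

```go
package main

import (
	"fmt"
	"time"
)

// intervalFor converts a per-user request rate into the gap between two
// requests. requestsPerSecond must be positive.
func intervalFor(requestsPerSecond int) time.Duration {
	return time.Second / time.Duration(requestsPerSecond)
}

// maxRequests is the upper bound on the number of requests sent when every
// user sends exactly one request per interval for the whole duration.
func maxRequests(users int, duration, interval time.Duration) int64 {
	return int64(users) * int64(duration/interval)
}

func main() {
	interval := intervalFor(5)
	fmt.Println(interval)                                   // 200ms
	fmt.Println(maxRequests(100, 10*time.Minute, interval)) // 300000
}
```

In practice the real count is lower, because slow responses keep a user busy past one or more ticks.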

How to develop?

The implementation is inspired by the code of the open source repository go-stress-testing. The development steps are as follows.

Step 1: Create a scene

First, create the scene. The scene takes the stress test duration and the request interval as parameters; the concurrency is the number of tasks added to the scene in step 2.

func NewSceneWithInterval(name string, duration int, interval int) *Scene {
    return &Scene{
        Name: name,
        // Default frequency: once per second
        Frequency: time.Second,
        // Run the scene for duration minutes
        Duration: time.Duration(duration) * time.Minute,
        // Interval between requests, in milliseconds
        Interval: time.Duration(interval) * time.Millisecond,
    }
}
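
The constructor relies on a Scene type that is not shown here. The following is a minimal sketch, assuming the struct has only the four fields the constructor sets (the real type presumably also holds the task list). It also shows the unit conversions: an int has to be converted to time.Duration before it can be multiplied by time.Minute or time.Millisecond.

```go
package main

import (
	"fmt"
	"time"
)

// Scene is a hypothetical reconstruction with only the fields that the
// constructor sets; it is not the original definition.
type Scene struct {
	Name      string
	Frequency time.Duration
	Duration  time.Duration
	Interval  time.Duration
}

// NewSceneWithInterval takes the duration in minutes and the request
// interval in milliseconds.
func NewSceneWithInterval(name string, duration int, interval int) *Scene {
	return &Scene{
		Name:      name,
		Frequency: time.Second,
		Duration:  time.Duration(duration) * time.Minute,
		Interval:  time.Duration(interval) * time.Millisecond,
	}
}

func main() {
	// The scenario from earlier: 10 minutes, one request every 200 ms.
	s := NewSceneWithInterval("demo", 10, 200)
	fmt.Println(s.Duration, s.Interval) // 10m0s 200ms
}
```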

Step 2: Create the tasks

Create one task per concurrent user.
Let m be the number of interfaces to test and n the concurrency.
If n > m, some interfaces are reused so that n tasks are created.
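
A simple way to reuse interfaces is to cycle through them in round-robin order. The sketch below assumes that policy; the helper name assignURLs is hypothetical, and the source does not say how interfaces are chosen or what happens when n < m (here only the first n interfaces would be used).

```go
package main

import "fmt"

// assignURLs returns one URL per task, cycling through the m input URLs
// until n tasks have been assigned.
func assignURLs(urls []string, n int) []string {
	if len(urls) == 0 {
		return nil
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, urls[i%len(urls)])
	}
	return out
}

func main() {
	// m = 3 interfaces, n = 5 concurrent tasks
	fmt.Println(assignURLs([]string{"/a", "/b", "/c"}, 5)) // [/a /b /c /a /b]
}
```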

func NewTask(task *stress.Task, url string) {
	*task = *stress.NewTask("test", func() *stress.TestResult {
		var errors uint64
		var success uint64
		// Record the start time
		start := time.Now()
		resp, err := http.Get(url)
		// Measure how long the request took
		elapsed := time.Since(start)
		if err != nil {
			// The request itself failed; resp is nil in this case
			errors = 1
		} else {
			// Close the body so the connection is not leaked
			defer resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				// Request succeeded
				success = 1
			} else {
				// Any other status code counts as a failure
				errors = 1
			}
		}

		// Save the counters and the elapsed time in the test result
		result := &stress.TestResult{
			Requests: 1,
			Errors:   errors,
			Success:  success,
			Rps:      0,
			Elapsed:  elapsed,
		}
		task.Result = result

		return result
	})
}

After each task is created, add it to the scene created in step 1.

Step 3: Run the scene

The main difficulty is simulating realistic load.
The implementation uses Go's concurrency model and makes full use of goroutines.

func (s *Scene) RunWitTime() *TestResult {
    // Concurrency equals the number of tasks
    concurrent := len(s.tasks)

    // Waits for all in-flight requests to finish
    var wg sync.WaitGroup

    // Token bucket that limits the number of concurrent requests
    tokens := make(chan bool, concurrent)
    for i := 0; i < concurrent; i++ {
        tokens <- true
    }

    // Stop signal that limits how long the scene runs
    stop := make(chan bool)
    go func() {
        time.Sleep(s.Duration)
        close(stop)
    }()

    // Ticker that paces requests at the configured interval
    ticker := time.NewTicker(s.Interval)
    defer ticker.Stop()

    // Aggregated statistics: request, error, and success counts,
    // total elapsed time, and elapsed time of successful requests
    result := &TestResult{}

    // run executes one task in its own goroutine; the caller must hold a token
    run := func(task *Task) {
        wg.Add(1)
        go func() {
            // Release the token once the request has finished
            defer func() {
                tokens <- true
                wg.Done()
            }()
            // Execute the task and handle its result
            //  xxx
        }()
    }

loop:
    for {
        select {
        case <-stop:
            // Stop the scene
            break loop
        case <-ticker.C:
            // On each tick, acquire a token for every task and send its request
            for _, task := range s.tasks {
                select {
                case <-tokens:
                    run(task)
                case <-stop:
                    break loop
                case <-time.After(50 * time.Millisecond):
                    // No token became available in time: back off for a
                    // random interval of up to one second, then move on
                    time.Sleep(time.Duration(rand.Intn(1000)) * time.Millisecond)
                }
            }
        }
    }

    // Wait for in-flight requests, then return the result set
    wg.Wait()
    return result
}
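
To see the three moving parts (token bucket, ticker, stop signal) work together, here is a self-contained sketch that can be run on its own. It is a simplification under stated assumptions: tasks are plain func() values instead of the Scene and Task types, a task is skipped for the current tick when no token is free, and the run and demo helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// run fires every task on each tick until duration has elapsed, never letting
// more than len(tasks) calls run at once. It returns the number of completed
// calls and the highest number of calls observed in flight.
func run(tasks []func(), interval, duration time.Duration) (done, peak int64) {
	// Token bucket: one token per allowed concurrent call.
	tokens := make(chan struct{}, len(tasks))
	for range tasks {
		tokens <- struct{}{}
	}
	stop := time.After(duration)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var wg sync.WaitGroup
	var inFlight int64
loop:
	for {
		select {
		case <-stop:
			break loop
		case <-ticker.C:
			for _, task := range tasks {
				select {
				case <-tokens:
				default:
					continue // no free token: skip this task on this tick
				}
				if n := atomic.AddInt64(&inFlight, 1); n > peak {
					peak = n
				}
				wg.Add(1)
				go func(task func()) {
					// Give the token back once the call has finished.
					defer func() {
						atomic.AddInt64(&inFlight, -1)
						tokens <- struct{}{}
						wg.Done()
					}()
					task()
					atomic.AddInt64(&done, 1)
				}(task)
			}
		}
	}
	wg.Wait() // let in-flight calls finish before reporting
	return done, peak
}

// demo runs three slow tasks every 10 ms for 300 ms.
func demo() (done, peak int64) {
	slow := func() { time.Sleep(30 * time.Millisecond) }
	return run([]func(){slow, slow, slow}, 10*time.Millisecond, 300*time.Millisecond)
}

func main() {
	done, peak := demo()
	fmt.Println("completed:", done, "peak in flight:", peak)
}
```

Because each task takes longer than one tick, most ticks find the bucket empty, so the peak never exceeds the number of tasks even though the ticker fires far more often.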

Step 4: Test and process the stress test data

The test uses four interfaces with a concurrency of 10, a duration of 1 minute, and a request interval of 500 ms.
The remaining metrics are calculated from the raw results, and the report is converted to a JSON string.
The test results are as follows:

{
    "total": 815,
    "success": 657,
    "error": 158,
    "successRate": "80.61%",
    "rps": 11,
    "avgRt": "532.00ms",
    "minRt": "53.76ms",
    "maxRt": "1033.55ms",
    "p90Rt": "708.68ms",
    "p95Rt": "727.30ms",
    "p99Rt": "758.09ms",
    "successTime": "349762.00ms",
    "allTime": "366977.00ms"
}
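
A report type that produces these keys might look like the sketch below. The field names follow the report struct used in the calculation code, but the field types and JSON tags are inferred from the output above rather than taken from the original source, and the encode helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StressTestReportForAddress mirrors the JSON report. Types and tags are
// assumptions inferred from the sample output.
type StressTestReportForAddress struct {
	Total       uint64 `json:"total"`
	Success     uint64 `json:"success"`
	Error       uint64 `json:"error"`
	SuccessRate string `json:"successRate"`
	Rps         uint64 `json:"rps"`
	AvgRt       string `json:"avgRt"`
	MinRt       string `json:"minRt"`
	MaxRt       string `json:"maxRt"`
	P90Rt       string `json:"p90Rt"`
	P95Rt       string `json:"p95Rt"`
	P99Rt       string `json:"p99Rt"`
	SuccessTime string `json:"successTime"`
	AllTime     string `json:"allTime"`
}

// encode serializes the report, returning an empty string on failure.
func encode(r StressTestReportForAddress) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Only the counters are filled in here; the latency fields stay empty.
	r := StressTestReportForAddress{Total: 815, Success: 657, Error: 158, SuccessRate: "80.61%", Rps: 11}
	fmt.Println(encode(r))
}
```

json.Marshal writes fields in declaration order, which is why the struct lists them in the same order as the report.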

Calculation logic

    // Success rate as a percentage
    successRate := fmt.Sprintf("%.2f", float64(ret.Success)/float64(ret.Requests)*100) + "%"
    // Successful requests divided by the total test time in seconds
    rps := uint64(math.Round(float64(ret.Success) / float64(runTime*60)))
    // Average response time of successful requests, converted from nanoseconds
    // to milliseconds; dividing as float64 keeps the fractional part
    avgRt := fmt.Sprintf("%.2f", float64(ret.SuccessTime)/1000000/float64(ret.Success)) + "ms"
    // Sort the successful response times in ascending order
    sort.Slice(ret.SuccessElapseds, func(i, j int) bool {
        return ret.SuccessElapseds[i] < ret.SuccessElapseds[j]
    })
    minRt := fmt.Sprintf("%.2f", float64(ret.SuccessElapseds[0])/1000000) + "ms"
    length := len(ret.SuccessElapseds)
    maxRt := fmt.Sprintf("%.2f", float64(ret.SuccessElapseds[length-1])/1000000) + "ms"
    // Compute the 90th, 95th, and 99th percentile response times
    TempP90 := float64(ret.SuccessElapseds[int(float64(length)*0.9)]) / 1000000
    p90 := strconv.FormatFloat(TempP90, 'f', 2, 64) + "ms"

    TempP95 := float64(ret.SuccessElapseds[int(float64(length)*0.95)]) / 1000000
    p95 := strconv.FormatFloat(TempP95, 'f', 2, 64) + "ms"

    TempP99 := float64(ret.SuccessElapseds[int(float64(length)*0.99)]) / 1000000
    p99 := strconv.FormatFloat(TempP99, 'f', 2, 64) + "ms"

    allElapsedTime := fmt.Sprintf("%.2f", float64(ret.ElapsedTime)/1000000) + "ms"
    allSuccessTime := fmt.Sprintf("%.2f", float64(ret.SuccessTime)/1000000) + "ms"

    // Print each value; successRate is already a formatted string
    fmt.Printf("successRate: %v\n", successRate)
    fmt.Printf("rps: %d\n", rps)
    fmt.Printf("avgRt: %v\n", avgRt)
    fmt.Printf("minRt: %v\n", minRt)
    fmt.Printf("maxRt: %v\n", maxRt)
    fmt.Printf("p90: %v\n", p90)
    fmt.Printf("p95: %v\n", p95)
    fmt.Printf("p99: %v\n", p99)
    fmt.Printf("ElapsedTime: %v\n", allElapsedTime)
    fmt.Printf("SuccessTime: %v\n", allSuccessTime)
    fmt.Printf("Total: %v\n", ret.Requests)
    fmt.Printf("Success: %v\n", ret.Success)
    fmt.Printf("Error: %v\n", ret.Errors)

    report := StressTestReportForAddress{
        Total:       ret.Requests,
        Success:     ret.Success,
        Error:       ret.Errors,
        SuccessRate: successRate,
        Rps:         rps,
        AvgRt:       avgRt,
        MinRt:       minRt,
        MaxRt:       maxRt,
        P90Rt:       p90,
        P95Rt:       p95,
        P99Rt:       p99,
        SuccessTime: allSuccessTime,
        AllTime:     allElapsedTime,
    }

    jsonBytes, err := json.Marshal(report)
    if err != nil {
        fmt.Println(err)
        return ""
    }

    jsonString := string(jsonBytes)
    fmt.Println(jsonString)
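
The same calculations can be isolated into small functions that are easy to check by hand. This is a sketch under assumptions: samples are time.Duration values, the helper names are hypothetical, and the percentile uses the same index rule as the code above (index int(n*p) of the sorted samples), with a clamp added so that p = 1 cannot index past the end.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// percentile returns the sample at index int(n*p) of an ascending slice.
func percentile(sorted []time.Duration, p float64) time.Duration {
	if len(sorted) == 0 {
		return 0
	}
	i := int(float64(len(sorted)) * p)
	if i >= len(sorted) {
		i = len(sorted) - 1
	}
	return sorted[i]
}

// ms formats a duration as milliseconds with two decimals.
func ms(d time.Duration) string {
	return fmt.Sprintf("%.2fms", float64(d)/float64(time.Millisecond))
}

// successRate formats success/total as a percentage.
func successRate(success, total uint64) string {
	if total == 0 {
		return "0.00%"
	}
	return fmt.Sprintf("%.2f%%", float64(success)/float64(total)*100)
}

// rps is the number of successful requests per second, rounded to the
// nearest integer. d must be positive.
func rps(success uint64, d time.Duration) uint64 {
	return uint64(math.Round(float64(success) / d.Seconds()))
}

// demoSamples returns ten made-up response times, 10 ms to 100 ms, sorted.
func demoSamples() []time.Duration {
	s := []time.Duration{70, 10, 100, 40, 20, 90, 30, 60, 50, 80}
	for i := range s {
		s[i] *= time.Millisecond
	}
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	return s
}

func main() {
	s := demoSamples()
	fmt.Println(successRate(657, 815))  // 80.61%
	fmt.Println(rps(657, time.Minute))  // 11
	fmt.Println(ms(percentile(s, 0.5))) // 60.00ms
	fmt.Println(ms(percentile(s, 0.9))) // 100.00ms
}
```

The first two calls reproduce the successRate and rps values from the report above, using its success and total counts and the 1-minute duration.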

Notes on developing a stress testing requirement