How to test algorithms in software testing?

In a broad sense, an algorithm is any solution to a problem, from solving a math problem to formulating a business strategy.
In this article, "algorithm" has its computing meaning: a program or set of instructions that a computer uses to handle a complex problem.

With the rapid development of artificial intelligence and related fields in recent years, algorithms have received unprecedented attention, and algorithm testing has emerged along with them.

To give readers a basic understanding of algorithm testing, here, as usual, is the outline of this article:

1. What does algorithm testing measure?
2. How to test an algorithm?
3. An algorithm testing example
4. Answering questions

1. What does algorithm testing measure?

Let's first look at a few well-known examples of algorithms applied successfully:

1. AlphaGo defeated human Go players using sophisticated artificial intelligence algorithms.

2. Apps such as Toutiao and Douyin use interest-based recommendation algorithms, so what they push to you is what you are interested in.

3. Shopping apps keep recommending any product you have searched for, which also relies on a recommendation algorithm.

Algorithm testing has a different focus from general functional testing.

General functional testing focuses on whether the server returns correct data after front-end operations (create, read, update, delete), while algorithm testing focuses on whether the change in the target metrics (positive or negative) meets expectations once the algorithm (model) is enabled.

Two examples:

Example 1: An app has updated its ad recommendation algorithm, which is expected to raise the recommendation conversion rate by several percentage points.

[Test focus] Whether the new algorithm raises the conversion rate, and whether the increase reaches the expected level.

Example 2: A face recognition application has updated its recognition algorithm, which is expected to reduce recognition time.

[Test focus] Whether accuracy has dropped while recognition time has been reduced.
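The two test focuses above can be written as simple pass/fail checks. This is a minimal sketch: the function names, the tolerance parameter, and all numbers are assumptions made for illustration and do not come from any real system.

```python
def conversion_uplift_ok(baseline_pct: float, candidate_pct: float,
                         expected_uplift_pct: float) -> bool:
    """Example 1: did the conversion rate rise by at least the expected
    number of percentage points after the new algorithm was enabled?"""
    return (candidate_pct - baseline_pct) >= expected_uplift_pct


def recognition_update_ok(old_ms: float, new_ms: float,
                          old_accuracy: float, new_accuracy: float,
                          max_accuracy_drop: float = 0.0) -> bool:
    """Example 2: recognition must get faster without accuracy falling
    by more than the tolerated drop (hypothetical tolerance)."""
    faster = new_ms < old_ms
    accuracy_kept = (old_accuracy - new_accuracy) <= max_accuracy_drop
    return faster and accuracy_kept


# Made-up numbers for illustration only.
print(conversion_uplift_ok(4.0, 5.5, expected_uplift_pct=1.0))  # True
print(recognition_update_ok(120.0, 90.0, 0.98, 0.95))           # False
```

In the second call recognition is faster, but accuracy fell from 0.98 to 0.95, so the check fails. That is exactly the regression the test focus in Example 2 is meant to catch.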

2. How to test an algorithm?

1. Methods shared with ordinary functional testing

1) Algorithm stability test

a) During long runs, does the algorithm crash?

b) As the data volume grows, do the results of the algorithm model still meet expectations?

2) Algorithm performance test

a) Response time of the algorithm model

b) CPU and disk consumption of the algorithm model

3) Algorithm compatibility test

a) With input data set within different ranges (such as user age, region, gender, etc.), are the algorithm results stable?

Do the test methods above look familiar? That's right: they are basically the same as those used in ordinary functional testing.
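As a rough sketch of the stability and response-time checks listed above, the harness below calls a model repeatedly, counts crashes, and reports latency statistics. `measure_latency` and the idea of passing the model as a plain callable are assumptions for illustration. CPU and disk consumption are not measured here and would need an external monitoring tool.

```python
import math
import statistics
import time


def measure_latency(model, inputs, repeats=100):
    """Call model(x) for every input, `repeats` times over, and return
    crash counts plus latency statistics in milliseconds."""
    samples_ms = []
    errors = 0
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            try:
                model(x)
            except Exception:
                errors += 1  # a crash counts against stability
                continue
            samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    stats = {"calls": repeats * len(inputs), "errors": errors}
    if samples_ms:
        # Nearest-rank 95th percentile of the successful calls.
        p95_index = math.ceil(0.95 * len(samples_ms)) - 1
        stats["mean_ms"] = statistics.fmean(samples_ms)
        stats["p95_ms"] = samples_ms[p95_index]
        stats["max_ms"] = samples_ms[-1]
    return stats


if __name__ == "__main__":
    # Stand-in for a real model: any callable taking one input.
    print(measure_latency(lambda x: sum(range(x)), [10, 1000, 100000]))
```

For a long-running stability test, the same loop could be driven by a wall-clock deadline instead of a fixed `repeats` count.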

2. Test methods not used in ordinary functional testing

1) Algorithm PK (horse racing)

For one requirement, different people or teams may design different algorithm models. Mule or horse, a trial run shows at a glance which one is more reliable.

This step is a key part of algorithm testing. Test cases are designed mainly with the scenario method: enumerate different scenarios, test and verify each algorithm in every scenario, then weigh the models' performance across all scenarios and select the top few.

You may wonder: why keep the top few? Isn't choosing the first place enough?

The test cases for this step are illustrated in the example in section 3, which makes it easier to see why they are designed this way, so they are not listed here.
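The "select the top few" step of the PK can be sketched as follows. Ranking by mean score across scenarios is an assumption chosen for simplicity; the function name, the scenario names, and the scores are all made up.

```python
def top_algorithms(scores, top_k=2):
    """scores: {scenario: {algorithm: score}}, higher is better.
    Returns the top_k algorithms by mean score over all scenarios."""
    per_algorithm = {}
    for scenario_scores in scores.values():
        for algorithm, score in scenario_scores.items():
            per_algorithm.setdefault(algorithm, []).append(score)

    means = {a: sum(v) / len(v) for a, v in per_algorithm.items()}
    # Sort by mean score (descending); break ties by name for determinism.
    return sorted(means, key=lambda a: (-means[a], a))[:top_k]


# Hypothetical scores on a 1-10 scale.
pk_scores = {
    "scenario A": {"algo_1": 9, "algo_2": 7, "algo_3": 5},
    "scenario B": {"algo_1": 6, "algo_2": 8, "algo_3": 9},
    "scenario C": {"algo_1": 8, "algo_2": 7, "algo_3": 4},
}
print(top_algorithms(pk_scores, top_k=2))  # ['algo_1', 'algo_2']
```

Note that `algo_3` wins scenario B outright yet does not make the top two overall, which is why the comparison looks at all scenarios together rather than at any single one.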

2) A/B testing

The accuracy of an algorithm is affected by the test data. In a test environment, the data is usually inserted into the database manually or imported from production.

Although the test data is close to real data, problems such as insufficient coverage of data types and insufficient data volume remain. Therefore, even if the algorithm model passes acceptance testing in the test environment, it still cannot be rolled out to all production traffic at once.

The usual method is to take 5%-10% of online traffic and use part of it as the control group and the rest as one or more experimental groups (the experimental groups use the top algorithms from the PK). Data from the control and experimental groups is tagged with different labels. After a period of time, the metrics of the control and experimental groups are computed and compared, and the effectiveness of the algorithm is verified against the key metrics.
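One common way to carve out a fixed share of traffic and label each user's group is deterministic hashing of the user ID. The sketch below assumes this approach; the article does not say how the split is implemented, and `assign_group`, the group labels, and the even split between groups are all hypothetical.

```python
import hashlib


def assign_group(user_id: str, experiment_share: float = 0.10,
                 variants=("control", "experiment")) -> str:
    """Map a user to a stable label. Only `experiment_share` of users
    (for example 0.05 to 0.10) enter the test; the rest are untouched."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform in [0, 1)
    if bucket >= experiment_share:
        return "not_in_experiment"
    # Split the sampled traffic evenly across control and experiment groups.
    index = int(bucket / experiment_share * len(variants))
    return variants[min(index, len(variants) - 1)]


if __name__ == "__main__":
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, assign_group(uid))
```

Because the label depends only on the user ID, the same user always lands in the same group, which keeps the tagged data consistent over the whole test period. Passing more names in `variants` gives several experimental groups.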

In actual testing, the choice of algorithm is rarely determined by a single indicator; it usually comes from a comprehensive comparison of several indicators.

If you are still a little confused at this point, don't worry: the following example illustrates the process.

3. Algorithm testing example

A navigation app needs to upgrade its route recommendation algorithm, which is expected to find less time-consuming routes and recommend them to users.

First, let's understand the keyword "less time-consuming" in the requirement: travel time is not the same as distance. A short route with a traffic jam may actually take longer than a detour.
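A small worked calculation makes the point concrete. The distances and speeds below are invented for illustration.

```python
def travel_minutes(segments):
    """segments: list of (distance_km, average_speed_kmh) pairs."""
    return sum(distance * 60 / speed for distance, speed in segments)


short_but_jammed = [(8, 10)]  # 8 km crawling at 10 km/h
longer_detour = [(14, 40)]    # 14 km flowing at 40 km/h

print(travel_minutes(short_but_jammed))  # 48.0
print(travel_minutes(longer_detour))     # 21.0
```

The detour is 6 km longer but 27 minutes faster, so a test that only checked route length would reward the wrong algorithm.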

After n days of research and development, the algorithm engineers delivered 3 optimized algorithm models, and now it is time to test and verify them.

For convenience, the old navigation route recommendation algorithm is called Algorithm 0, and the new ones are Algorithm 1, Algorithm 2, and Algorithm 3.

1. Algorithm PK (scenario-based test cases are listed below)

For the same route, use the current time and weather conditions by default. After multiple rounds of testing, select the optimal algorithm; assume it is Algorithm 2.

For the same route, set different time periods (morning and evening peak hours, working days, holidays, etc.) and find the optimal algorithm; assume it is Algorithm 1.

For the same route, set different weather conditions (rain, snow, thunder, sand, hail, etc.) and find the optimal algorithm; assume it is Algorithm 3.

There are many other scenarios to test, so I won't list them one by one here...
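Scenario lists like these grow quickly, so it can help to generate the cases rather than write them by hand. The sketch below crosses the time periods and weather conditions named above with the four algorithms. Crossing the factors is an assumption (the cases above vary one factor at a time), and the route placeholder and field names are hypothetical.

```python
from itertools import product

TIME_PERIODS = ["morning peak", "evening peak", "working day", "holiday"]
WEATHER = ["rain", "snow", "thunder", "sand", "hail"]
ALGORITHMS = ["Algorithm 0", "Algorithm 1", "Algorithm 2", "Algorithm 3"]


def build_cases(route):
    """Return one test case per (time period, weather, algorithm) combination."""
    return [
        {"route": route, "time_period": t, "weather": w, "algorithm": a}
        for t, w, a in product(TIME_PERIODS, WEATHER, ALGORITHMS)
    ]


cases = build_cases("A -> B")  # placeholder route
print(len(cases))  # 80
print(cases[0])
```

Keeping Algorithm 0 in the matrix gives every scenario a baseline to compare the new algorithms against.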

2. Stability test

Run the algorithm model for a long time (more than 24 hours) and check whether its performance stays stable. Assume the optimal algorithm here is Algorithm 1.

Test the algorithm model on ultra-long distances (over 1000 km) and compare whether the recommendation quality stays stable. Assume the optimal algorithm here is Algorithm 2.

……

3. Compatibility test

Run road tests in different areas (such as Chongqing, Guizhou, etc.) and test the performance of the navigation recommendation algorithm under road conditions such as mountain roads, climbs, and sharp turns. Assume the optimal algorithm here is Algorithm 1.

Select roads with different traffic capacities (urban roads, country roads, etc.) and test whether the recommendation algorithm performs stably on small and narrow roads. Assume the optimal algorithm here is Algorithm 3.

……

4. Performance test

On the same route, compare how much time the different algorithm models take. Assume the one that takes the least time is Algorithm 3.

On the same route, compare the load each algorithm model places on the server. Assume Algorithm 2 places the least load on the server.

……

5. A/B test

After the tests above, combine all the results. Assume Algorithm 1 and Algorithm 3 are the ones finally selected.

Run an online gray release: select the target users, who will receive a prompt such as "Would you like to join the closed beta / gray release?"

After a period of A/B testing, collect the real data, and finally select an algorithm that meets expectations after discussion with the architects, R&D managers, and product managers.
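Before the real data goes into that discussion, it is worth checking that a difference between the control and experimental groups is not just noise. The article does not name a statistical method; the sketch below uses a standard two-proportion z-test as one possible choice. The metric (share of trips that arrived within the estimated time) and all counts are hypothetical.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference between the
    success rates of group B and group A."""
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("both groups have identical all-or-nothing rates")
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Made-up counts: trips that arrived within the estimated time.
z, p = two_proportion_z_test(success_a=400, total_a=1000,   # control
                             success_b=500, total_b=1000)   # experiment
print(f"z = {z:.2f}, p = {p:.6f}")
```

A small p-value suggests the experimental group's improvement is unlikely to be chance, but it covers only one indicator; the final choice still weighs several indicators together.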

4. Answering questions

In the example above, assume Algorithm 2 is finally adopted. You may have questions such as:

1. Algorithm 1 performs best during morning and evening peak hours, and I only use navigation for commuting. Why not use it?
2. Algorithm 3 takes the least time, and I just want the recommended route to be faster. Why not use it?

Algorithm testing evaluates the effect comprehensively across multiple steps, so an algorithm that performs well in one step may still not be selected in the end.

Factors such as effect, cost, and stability are weighed together, and the final choice is often a compromise.
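One simple way to picture such a compromise is a weighted score over the factors. This is only an illustrative sketch: the article does not describe a scoring formula, and the weights and scores below are invented (a higher cost score means cheaper to run).

```python
def pick_compromise(metrics, weights):
    """metrics: {algorithm: {factor: score}}, higher is better.
    weights: {factor: weight}. Returns (best algorithm, all totals)."""
    totals = {
        algorithm: sum(weights[f] * scores[f] for f in weights)
        for algorithm, scores in metrics.items()
    }
    best = max(totals, key=totals.get)
    return best, totals


# Hypothetical scores on a 1-10 scale and hypothetical weights.
metrics = {
    "Algorithm 1": {"effect": 9, "cost": 4, "stability": 6},
    "Algorithm 2": {"effect": 7, "cost": 8, "stability": 8},
    "Algorithm 3": {"effect": 8, "cost": 6, "stability": 5},
}
weights = {"effect": 5, "cost": 2, "stability": 3}

best, totals = pick_compromise(metrics, weights)
print(best)    # Algorithm 2
print(totals)  # {'Algorithm 1': 71, 'Algorithm 2': 75, 'Algorithm 3': 67}
```

With these made-up numbers, Algorithm 2 has the lowest effect score of the three yet comes out ahead overall, which mirrors the answer to the two questions above.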

That is how algorithms are tested in software. Is it clear now?

Origin blog.csdn.net/nhb687096/article/details/131807925