Delta Debugging
Author: Andreas Zeller
Motivation
Tudou was unstoppable: he changed a thousand lines overnight, and Huang Laoxian felt his thinking was clear and his logic sound — how could it not run smoothly!
With 18,000 lines, Xiao Ming leading the dragons — this optimization should be a piece of cake.
Oh come on, this can't be jinxed. How could this possibly be jinxed? I'm a professional coder, okay? How could this be jinxed? You think a professional coder can't read a situation like this?
F5! F5! F-F-F-F5! Let me show you what a professional coder looks like, okay? This is a total face-roll. This is what "dragon rides your face" means.
What good is a well-chosen test suite anyway?
...Eh? ...No, wait?! Ah~! Oh no~! This commentary cannot go on, oh no!! Argh!
Why is this happening? Don't make it this dramatic!! Don't drag me into this!
I cannot carry this blame!! I cannot bear this blame!! Oh no... I can't take it anymore...
Now comes the problem. Huang Laoxian has jinxed Tudou's program, and he has to find a way to help debug it, but he is too lazy to roll up his sleeves himself. What should he do?
Shortcomings of traditional solutions
Regression containment uses a linear search over the changes, which is effective in many situations, but it performs poorly in the following cases:
- Interference: each change works fine on its own, but the combination fails (a common situation in collaborative development)
- Inconsistency: some combinations of changes cannot even produce a testable program (for example, compilation fails)
- Granularity: a single logical change may touch thousands of lines of code, while only a few of them actually cause the error; pointing at the whole chunk is useless, you need to locate the fault more precisely
The author claims that his dd+ algorithm (an imaginative abbreviation) can:
- detect interference in linear time
- detect independent failure-inducing changes in logarithmic time
- handle inconsistency efficiently, supporting finer-grained (fine-granular) changes as input
Basic definition
Configuration: A subset c of the full change set C is called a configuration.
Baseline: The empty configuration is called the baseline (it corresponds to "yesterday's" version, with no changes applied).
Three output possibilities:
- Pass ✔
- Fail ✖
- Unresolved ?
Test: A function that maps each configuration to one of the three outcomes is called a test.
Failure-inducing change set: if every configuration c′ ⊆ C with c ⊆ c′ satisfies test(c′) ≠ ✔, then c is called a failure-inducing change set.
Minimal failure-inducing set: a failure-inducing set B is minimal if no proper subset of B tests as ✖.
- Obviously, a minimal failure-inducing set is what we are searching for
Ideal situation
Monotonicity: if test(c) = ✖, then every configuration containing c also tests ✖.
- Corollary: if test(c) = ✔, then no subset of c tests ✖.
Unambiguity (i.e., the configuration that essentially causes the failure is unique): if test(c1) = ✖ and test(c2) = ✖, then the test result of their intersection is not ✔.
- Unambiguity rules out two or more disjoint configurations each causing a failure on their own, which means the cause of the failure is, in a sense, unique.
- This saves a lot of work: once you find a failure in a subset c of C, you need not consider the complement of c (if the complement also failed, then by unambiguity the baseline ∅ = c ∩ c̄ could not test ✔, contradicting that the baseline passes).
Consistency: no configuration ever tests as ?.
Optimal disposal method
The ideal situation is when the test is monotone, unambiguous, and consistent. In that case we can find the failure by binary search:
- When test(left half) = ✖, continue searching in the left half
- When test(right half) = ✖, continue searching in the right half
- When both halves test ✔, some changes fail only in combination (interference)
- In that case, first keep one half fully applied and find the changes in the other half necessary to cause the failure; then keep the other half fully applied and find the necessary changes on the first side
- The union of the two necessary parts is the failure-inducing set we are after
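The binary search above, including the interference case, can be sketched as follows. This is a minimal sketch under the ideal assumptions; the function names and the sample test are mine, not the paper's pseudocode.

```python
# Sketch of ideal-case dd: binary search over the change set,
# with interference handled by minimizing each half while the
# other half stays applied.
PASS, FAIL = "PASS", "FAIL"

def dd(config, test):
    """Return a minimal failure-inducing subset of `config` (ideal case)."""
    config = list(config)
    if len(config) == 1:
        return set(config)
    mid = len(config) // 2
    left, right = set(config[:mid]), set(config[mid:])
    if test(left) == FAIL:            # failure is isolated in the left half
        return dd(left, test)
    if test(right) == FAIL:           # failure is isolated in the right half
        return dd(right, test)
    # Both halves pass: interference. Minimize each half while keeping
    # the other half applied, then combine the two necessary parts.
    return dd2(left, right, test) | dd2(right, left, test)

def dd2(config, applied, test):
    """Minimize `config` such that `config | applied` still fails."""
    config = list(config)
    if len(config) == 1:
        return set(config)
    mid = len(config) // 2
    left, right = set(config[:mid]), set(config[mid:])
    if test(left | applied) == FAIL:
        return dd2(left, applied, test)
    if test(right | applied) == FAIL:
        return dd2(right, applied, test)
    # Interference within this half as well: recurse on both sides.
    return dd2(left, right | applied, test) | dd2(right, left | applied, test)

# Hypothetical example: changes 3 and 5 interfere to cause the failure.
failing = lambda c: FAIL if {3, 5} <= c else PASS
print(dd(range(8), failing))   # → {3, 5}
```

Note how `dd2` implements exactly the interference rule from the bullets above: one side is held fixed in `applied` while the other side is narrowed down.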
How to deal with non-ideal situations
Ambiguity
When several groups of changes each cause a failure, dd correctly returns one of them.
In that case, simply remove that group from the full set and call dd again to find the next failure-inducing set; repeating this process yields all failure-inducing sets.
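The "remove and rerun" loop can be sketched like this. The inner `dd` here is a deliberately naive stand-in (greedy one-change-at-a-time minimization, O(n²)), just so the outer loop is self-contained; the scenario with two independent causes is hypothetical.

```python
# Enumerate all disjoint failure-inducing sets by repeatedly removing
# the one found and rerunning delta debugging on the rest.
PASS, FAIL = "PASS", "FAIL"

def dd(config, test):
    # Naive stand-in for delta debugging: drop one change at a time
    # as long as the remainder still fails.
    config = set(config)
    for change in sorted(config):
        if test(config - {change}) == FAIL:
            config = config - {change}
    return config

def all_failure_inducing(C, test):
    C, found = set(C), []
    while test(C) == FAIL:
        culprit = dd(C, test)
        found.append(culprit)
        C -= culprit            # delete this group, then look for the rest
    return found

# Hypothetical test: two independent causes, {1} and {4, 6}.
t = lambda c: FAIL if (1 in c or {4, 6} <= c) else PASS
print(all_failure_inducing(range(8), t))   # → [{4, 6}, {1}]
```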
Non-monotonicity
If test(a) = ✖ but there exists some b ⊇ a with test(b) = ✔, then strictly speaking a is not failure-inducing (the bug in configuration a has been fixed in configuration b). But C, which contains b, still fails, so the failure-inducing part must lie in C∖b (or be an interference). After eliminating all failures that have already been fixed in this way, the remaining ones can be found by dd.
Inconsistency (the complex case)
Arbitrarily chosen configurations easily run into inconsistency. A few causes:
- Integration failure: a change cannot be applied. Perhaps it builds on earlier changes that are not in the configuration; perhaps changes a and b conflict, a conflict-resolving change c was written, but c was not added to the configuration.
- Construction failure: even though the selected changes all apply, they may produce syntax or semantic errors, so no program can be built.
- Execution failure: with some parts missing, the program may not run correctly, and the output of the test is undefined.
Verifying the consistency of all combinations in advance is unrealistic, so we consider how to handle unresolved outcomes.
Consider the worst case: after splitting C into subsets, every subset tests unresolved. Which combinations should we try next to reach a resolved result as quickly as possible?
- Apply as few changes as possible (stay close to "yesterday"); the more subsets we split into, the smaller each subset, and the more likely it is to yield a consistent result
- Apply as many changes as possible (stay close to "today")
Before moving on to the dd+ algorithm, we define the following cases:
- Found: if test(ci) = ✖, then ci contains a failure-inducing set; proceed as in dd.
- Interference: if some subset c and its complement both test ✔, then c and its complement interfere, again as in dd.
- Preference: if test(c) = ? and the complement of c tests ✔, then search next among d = c̄ ∪ c′ where c′ ⊆ c. This amounts to using the complement of c as the new baseline and searching within c (which effectively narrows down where the failure-inducing set can be).
- Try again: if none of the above applies, double the number of subsets to 2n, re-partition, and run again.
- After each run, any ci that passes can be removed from C, since it cannot be the configuration that causes the failure.
- Similarly, if the complement of c fails the test, then c can be kept in the applied set from then on.
Avoid Inconsistency
Reconsider the inconsistency problem. If we suspect from the start that some changes are related, we can merge them into a single change early on, or always keep them in the same subset; this removes many unresolved test cases.
Prior Knowledge
- Changes close to each other in time and origin are more likely to be related
- Changes that touch the same file or the same directory are more likely to be related
- Changes that reference the same or similar identifiers are more likely to be related
- Changes that affect program entities in the same function or module are more likely to be related
- Changes with a similar semantic impact are more likely to be related
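One of the heuristics above can be illustrated with a toy grouping step (the change records and file names are hypothetical): changes touching the same file are merged into one atomic "change", so related changes always land in the same subset.

```python
# Group changes by the file they modify, so that dd treats each
# group as a single atomic change. Data is purely illustrative.
from collections import defaultdict

# Each change records an id and which file it modifies (assumed structure).
changes = [
    ("c1", "parser.c"), ("c2", "parser.c"),
    ("c3", "lexer.c"),  ("c4", "util.c"), ("c5", "lexer.c"),
]

def group_by_file(changes):
    groups = defaultdict(list)
    for change_id, path in changes:
        groups[path].append(change_id)
    # Each group is now treated as one atomic "change" by dd.
    return list(groups.values())

print(group_by_file(changes))   # → [['c1', 'c2'], ['c3', 'c5'], ['c4']]
```

The same shape works for any of the other heuristics (time windows, shared identifiers, PDG slices); only the grouping key changes.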
Predicting Test Outcomes
- If we know from the start that certain changes depend on each other, we can predict some ? outcomes without actually running the tests
- If the changes are ordered and each change is always applied together with its predecessors, many configurations never need to be tested
Time complexity summary
All three good properties
Worst case: O(n)
When the failure-inducing set is a single change: O(log n)
Ambiguous (multiple failure-inducing sets)
The dd algorithm still returns one of them in O(n) time
Not Monotone
Suppose a fails, a is a subset of b, and b passes; then dd returns a failure in C∖b in O(n) time
Inconsistent
The author does not seem to give an analysis
Future Work
Further work on Avoid Inconsistency
Exploit domain knowledge (the author does not elaborate; I cannot tell what this means)
Use a full-fledged change/configuration management system to restrict invalid combinations (also unclear to me)
The most promising direction for improvement is to use semantic-level relevance. For a program, we can easily maintain a PDG (Program Dependency Graph) describing the relationships between functions and modules, so that when we apply a change to a module, we can easily find which parts it relates to. Based on the correlation between changes and nodes, we can divide the whole graph into slices; changes in the same slice (that is, semantically related changes) are placed in the same subset.
Remove unexecuted code
Some changes are never executed at run time. The author wants to use code-coverage tools to eliminate such changes directly.