Paper Reading Note: Delta Debugging

Delta Debugging

Author: Andreas Zeller

Motivation

Tudou was in god mode, changing a thousand lines overnight, and Huang Laoxian felt his thinking was so clear and logical that the code couldn't possibly fail to run smoothly!

With 18,000 lines, Xiao Ming leading the flying dragons this round, doing this optimization will be a piece of cake.

Oh come on, no amount of praise could jinx this, how could it possibly die? I'm a professional coder, okay? How could this die? Can't a professional coder read a situation like this?

F5! F5! FFFF5! Let me show you what a professional coder looks like, okay? This is riding right in their face, okay? This is what "flying dragons riding your face" means.

What's the use of picking a good test set, anyway?

...Eh? ...No, wait?! Ah~! Ya~! I can't keep commentating, oh no!! Argh!

Why did this happen? Don't play it so dangerously!! Don't drag me into this!

I can't take the blame for this!! I can't carry this blame, I swear!! Ugh... I can't take it anymore...

Now here comes the problem. Huang Laoxian has jinxed Tudou with his praise (in caster slang, he "milked him to death"), so he has to find a way to help Tudou debug. But he is too lazy to get started personally, so what should he do?

Shortcomings of traditional solutions

Regression Containment uses a linear search over changes, which is effective in many situations. But it performs poorly in the following respects:

  • Interference: each change passes on its own, but some combination of changes fails (this is common in collaborative development)
  • Inconsistency: some combinations of changes cannot even produce a program that can be tested (for example, compilation fails)
  • Granularity: a logical change may affect thousands of lines of code, but only a few of them actually cause the error. Pointing out the whole big chunk is useless; you need to locate the faulty place more precisely.

The author claims that his dd+ algorithm (an imaginative abbreviation) can

  • Detect interference in linear time
  • Detect independent failure-inducing changes in logarithmic time
  • Handle inconsistency efficiently, supporting finer-grained (fine-granular) changes as input

Basic definition

Configuration: a subset c of the complete change set C is called a configuration

Baseline: the empty configuration is called the baseline

Three output possibilities:

  • Pass ✔
  • Fail ✖
  • Unresolved

Test: a function that maps a configuration to one of the three outcomes above is called a test

Failure-inducing change set: if for every configuration c′ with c ⊆ c′, the test result of c′ is not ✔, then c is called a failure-inducing change set

Minimal failure-inducing set: a failure-inducing set B is minimal if no proper subset of B still tests as ✖

  • Obviously, a minimal failure-inducing set is what we are searching for

Ideal situation

Monotonicity: if test(c) = ✖, then every configuration containing c also tests ✖

  • Corollary: if test(c) = ✔, then no subset of c tests ✖

Unambiguity (i.e., the configuration that essentially causes the error is unique): if test(c₁) = ✖ and test(c₂) = ✖, then test(c₁ ∩ c₂) ≠ ✔

  • Unambiguity rules out two or more disjoint configurations each producing the error on their own, which means the cause of the error is, in a certain sense, unique
  • This can save a lot of overhead: when you find an error in a subset c of C, you do not need to consider the complement of c (the complement cannot test ✖, since by unambiguity the baseline ∅ = c ∩ c̄ would then not test ✔, contradicting that the baseline passes)

Consistency: no configuration has an Unresolved test result

Optimal disposal method

The so-called ideal situation is when the test is monotone, unambiguous, and consistent. In that case we find the error by binary search:

  • When test(left) = ✖, continue searching in the left half of the set
  • When test(right) = ✖, continue searching in the right half of the set
  • When both halves test ✔, some changes from the two halves jointly cause the error (interference)
    • In that case, first keep one half fully applied and find the changes in the other half necessary to cause the error; then swap roles and do the same for the first half
    • The union of the two necessary parts is the detection target

How to deal with non-ideal situations

ambiguous

When there are multiple groups of changes that each cause a failure, dd correctly returns one of them.

At this point, simply remove that group from the complete set and call dd again to find another failure-inducing set; repeating this process yields all the failure-inducing sets.

not monotone

If test(a) = ✖ but there exists b containing a with test(b) = ✔, then strictly speaking a is not failure-inducing (the bug in configuration a has been fixed in configuration b). But configurations containing b can still fail, which means the failure-inducing part should lie in C∖b (or be interference). Eliminate all the errors that have already been fixed, and the remaining errors can be found through dd.

inconsistency (complex)

Arbitrarily chosen configurations easily run into inconsistency. Here are a few reasons:

  • Integration failure: a change cannot be applied. Perhaps it is based on some earlier change that was not added to the configuration; or perhaps changes a and b conflict, a conflict-resolving change c was written, but c was not added to the configuration
  • Construction failure: although all the selected changes apply, they may cause syntax or semantic errors, so no program can be built
  • Execution failure: with some parts missing, the program may not execute correctly, and the test outcome is undefined

It is unrealistic to verify the consistency of all combinations in advance, so we consider instead how to handle the Unresolved outcomes.

Consider the worst case: after we divide C into subsets, all the subsets test Unresolved. Then what combinations should we try in order to find a resolved result as soon as possible?

  • Apply as few changes as possible (staying close to "yesterday"): the more subsets we divide into, the smaller each subset, and the greater the chance of obtaining a consistent result
  • Apply as many changes as possible (staying close to "today")

Before moving on to the dd+ algorithm, we define the following cases

  • Found: if test(cᵢ) = ✖, then cᵢ contains a failure-inducing set; this is the same as in dd
  • Interference: if c and its complement both test ✔, then c and its complement form an interference relationship; also the same as in dd
  • Preference: if test(c) is Unresolved and the complement of c passes, then next search d = c̄ ∪ c′ where c′ ⊆ c. This is equivalent to using the complement of c as the baseline and searching subsets of c (which effectively narrows the possible range of the failure-inducing set)
  • Try again: if none of the above conditions is satisfied, we double the number of subsets to 2n, re-partition, and run again
    • After each run, any cᵢ that tests ✔ can be removed from C, because it cannot be the configuration that causes the error
    • Similarly, if the complement of c fails the test, then c can permanently be placed in the applied set
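As a rough illustration, the "found" and "try again" cases above might look like the following sketch. This is heavily simplified and my own construction, not the paper's algorithm: the interference and preference cases are omitted, and removing passing subsets assumes monotonicity.

```python
def split(lst, n):
    # Partition lst into n (nearly) equal contiguous pieces.
    k, m = divmod(len(lst), n)
    return [lst[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

def dd_plus_sketch(changes, test):
    c, n = list(changes), 2
    while len(c) > 1:
        results = [(s, test(s)) for s in split(c, n) if s]
        failing = [s for s, r in results if r == "✖"]
        if failing:
            c, n = failing[0], 2          # "found": zoom into this subset
            continue
        kept = [x for s, r in results if r != "✔" for x in s]
        if len(kept) < len(c):
            c = kept                      # passing subsets removed from C
            continue
        if n >= len(c):                   # already testing single changes
            break
        n = min(len(c), 2 * n)            # "try again" with 2n subsets
    return c

def toy_test(cfg):
    s = set(cfg)
    if s == {6}:                          # assumed: change 6 alone won't build
        return "?"
    return "✖" if 5 in s else "✔"

print(dd_plus_sketch(range(1, 9), toy_test))  # [5]
```

Unresolved ("?") subsets are neither recursed into nor removed; they simply stay in the candidate set until a finer partition resolves them.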

Avoid Inconsistency

Reconsidering the issue of inconsistency: if we suspect from the start that some changes are related to each other, we can treat them as a single change early on, or always place them in the same subset. This eliminates many unresolved test cases.

Prior Knowledge

  • Changes that are close in time and come from the same source are more likely to be related
  • Changes that operate on the same file or the same directory are more likely to be related
  • Changes that reference the same variables or use similar identifiers are more likely to be related
  • Changes that affect entities in the same function or module are more likely to be related
  • Changes that have a similar semantic impact are more likely to be related
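Such heuristics can be applied mechanically before running dd. The sketch below (my own toy example; the file names and change IDs are made up) groups changes by the file they touch, so each group enters dd/dd+ as one atomic change:

```python
from collections import defaultdict

# Hypothetical change records: (change_id, file_touched).
changes = [(1, "parser.c"), (2, "parser.c"), (3, "lexer.c"),
           (4, "lexer.c"), (5, "main.c")]

groups = defaultdict(list)
for cid, path in changes:
    groups[path].append(cid)      # same file -> same group

# Each group is now treated as a single atomic "change",
# which avoids many inconsistent (unresolved) configurations.
atomic_units = list(groups.values())
print(atomic_units)  # [[1, 2], [3, 4], [5]]
```

The same pattern works for any of the other heuristics; only the grouping key changes (timestamp bucket, directory, identifier, module, and so on).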

Predicting Test Outcomes

  1. If we knew from the start that certain changes are interconnected, we could actually predict some of the Unresolved results without running the tests at all
  2. If our changes are ordered and each change always builds on the earlier ones, we can find that many configurations do not need to be tested

Time complexity summary

All three good properties

Worst case: O(n)
If the failure-inducing set contains only one change: O(log n)

Ambiguous (with multiple errors)

The dd algorithm still returns one of the failure-inducing sets in O(n) time

Not Monotone

Suppose a fails, a is a subset of b, and b passes; then the dd algorithm returns an error within C∖b in O(n) time

inconsistent

The author does not seem to give an analysis

Future Work

Further work on avoiding inconsistency:

Exploit domain knowledge ("by exploiting domain knowledge"; I can't tell what this means at all)

Use a complete change and configuration management system to restrict changes (also unintelligible to me)

The main direction for improvement is to exploit semantic-level relevance. For a program, we can easily maintain a PDG (Program Dependence Graph) describing the relationships between functions and modules, so that when we apply a change to a module we can easily find which parts it is related to. Based on the correlation between changes and graph nodes, we can divide the whole graph into slices; changes in the same slice (that is, semantically related changes) are placed in the same subset.

Remove gray code

Some changes are never executed at run time. The author proposes using a code coverage tool to eliminate these changes directly.
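A sketch of the idea (my own illustration; the line sets are made up and would in practice come from a coverage tool such as gcov or coverage.py):

```python
# Changed lines from the diff, and lines actually executed by the test run.
changed_lines = {("util.c", 10), ("util.c", 42), ("io.c", 7)}
executed_lines = {("util.c", 10), ("main.c", 1)}

# A change on a line that was never executed cannot have caused the
# failure, so only changes on executed lines need to enter delta debugging.
relevant = changed_lines & executed_lines
print(sorted(relevant))  # [('util.c', 10)]
```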


Origin blog.csdn.net/Kaiser_syndrom/article/details/105302327